bluegraph.downstream package

Data structures

class bluegraph.downstream.data_structures.ElementClassifier(model, feature_vector_prop=None, feature_props=None, **kwargs)

Interface for graph element classification models.

It wraps a predictive classification model provided by the user, together with a set of configs that allow the user to fit the model and make predictions on the input PGFrames. Its main goal is to hide the details of converting element (node or edge) properties into data tables that can be passed to the predictive model.

fit(pgframe, train_elements=None, labels=None, label_prop=None, **kwargs)

Fit the classifier.

predict(pgframe, predict_elements=None)

Run prediction on the input graph.
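The conversion that ElementClassifier hides can be pictured with plain NumPy. The sketch below is not BlueGraph code: the node dictionary, the helper to_feature_table and the toy NearestCentroid model are hypothetical stand-ins. It assumes that feature_vector_prop names a single vector-valued property, that feature_props names several scalar properties stacked into columns, and that the user-supplied model follows the scikit-learn fit/predict convention.

```python
import numpy as np

# Hypothetical node properties: one vector-valued property ("embedding")
# and several scalar ones ("age", "degree"), plus a class label.
nodes = {
    "a": {"embedding": [0.0, 0.1], "age": 1.0, "degree": 3.0, "label": "x"},
    "b": {"embedding": [0.1, 0.0], "age": 2.0, "degree": 2.0, "label": "x"},
    "c": {"embedding": [1.0, 0.9], "age": 9.0, "degree": 1.0, "label": "y"},
    "d": {"embedding": [0.9, 1.0], "age": 8.0, "degree": 1.0, "label": "y"},
}


def to_feature_table(nodes, element_ids, feature_vector_prop=None, feature_props=None):
    """Hypothetical helper: turn element properties into a 2-D feature matrix."""
    if feature_vector_prop is not None:
        # A single property already holds the whole feature vector.
        return np.array([nodes[i][feature_vector_prop] for i in element_ids], dtype=float)
    # Otherwise stack several scalar properties column by column.
    return np.array([[nodes[i][p] for p in feature_props] for i in element_ids], dtype=float)


class NearestCentroid:
    """Toy fit/predict model standing in for a user-provided classifier."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        y = np.asarray(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every row to every class centroid; pick the closest.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return [self.classes_[j] for j in d.argmin(axis=1)]


train_ids = ["a", "c"]
X_train = to_feature_table(nodes, train_ids, feature_vector_prop="embedding")
y_train = [nodes[i]["label"] for i in train_ids]
model = NearestCentroid().fit(X_train, y_train)

X_test = to_feature_table(nodes, ["b", "d"], feature_vector_prop="embedding")
print(model.predict(X_test))  # ['x', 'y']
```

The point of the wrapper is that callers only name properties and element IDs; building X and y from the graph happens out of sight.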

Similarity

class bluegraph.downstream.similarity.FaissSimilarityIndex(dimension=None, similarity='euclidean', initial_vectors=None, n_segments=1)

Similarity index based on Faiss indices.

This class builds vector similarity indices using Faiss. It can segment the search space into Voronoi cells (see example: https://github.com/facebookresearch/faiss/wiki/Faster-search), which speeds up the search.
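To illustrate what segmenting the space into Voronoi cells buys, here is a NumPy-only sketch of an inverted-file (IVF) search: vectors are clustered with a few k-means iterations, each vector is stored under its nearest centroid, and a query scans only the n_probe closest cells. It does not call Faiss (whose own IVF indices, such as IndexIVFFlat with its nprobe setting, do this efficiently); the helper names are hypothetical, and reading n_segments as the number of cells is an assumption.

```python
import numpy as np


def build_ivf(vectors, n_cells, n_iter=10, seed=0):
    """Partition vectors into Voronoi cells around k-means centroids."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_cells, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid, i.e. its Voronoi cell.
        assign = np.linalg.norm(vectors[:, None] - centroids[None], axis=2).argmin(axis=1)
        for c in range(n_cells):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment with the final centroids defines the cells.
    assign = np.linalg.norm(vectors[:, None] - centroids[None], axis=2).argmin(axis=1)
    cells = {c: np.flatnonzero(assign == c) for c in range(n_cells)}
    return centroids, cells


def ivf_search(query, vectors, centroids, cells, k, n_probe=1):
    """Scan only the n_probe cells whose centroids are closest to the query."""
    nearest_cells = np.linalg.norm(centroids - query, axis=1).argsort()[:n_probe]
    candidates = np.concatenate([cells[c] for c in nearest_cells])
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    order = dists.argsort()[:k]
    return candidates[order], dists[order]


# Two well-separated clusters of 4-dimensional points.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.0, 0.1, (50, 4)), rng.normal(5.0, 0.1, (50, 4))])
centroids, cells = build_ivf(data, n_cells=2)
ids, dists = ivf_search(data[0], data, centroids, cells, k=3, n_probe=1)
```

Probing fewer cells scans fewer candidates but can miss true neighbors that fall in an unprobed cell; with n_probe equal to the number of cells the search is exact.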

add(vectors)

Add new vectors to the index.

static export_index(index, path)

Dump backend-specific similarity index object.

static load_index(path)

Load backend-specific similarity index object.

reconstruct(index)

Get a vector by its integer index.

search(vectors, k)

Search for k nearest neighbors to the provided vectors.

class bluegraph.downstream.similarity.NodeSimilarityProcessor(pgframe, vector_property, similarity='euclidean', index_configs=None)

Node similarity processor.

This class builds and queries node similarity indices using Faiss. It wraps the underlying graph object and the vector similarity processor.

class bluegraph.downstream.similarity.ScikitLearnSimilarityIndex(dimension, similarity='euclidean', initial_vectors=None, leaf_size=40, index_type='balltree')

Similarity index based on scikit-learn indices.

This class builds vector similarity indices using scikit-learn. It supports various distance metrics with KDTree and BallTree indices.
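The scikit-learn structures the description refers to are sklearn.neighbors.BallTree and sklearn.neighbors.KDTree. The snippet below uses them directly on random data rather than through BlueGraph; the data and the choice of metrics are illustrative only.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

rng = np.random.default_rng(0)
vectors = rng.random((100, 8))  # indexed points
queries = rng.random((3, 8))    # points to look up

# leaf_size=40 mirrors the default in the signature above.
ball = BallTree(vectors, leaf_size=40, metric="euclidean")
dist, ind = ball.query(queries, k=5)  # both arrays have shape (3, 5)

# A KD-tree answers the same kind of query; it supports fewer metrics
# than a ball tree, but Manhattan (L1) distance is among them.
kd = KDTree(vectors, leaf_size=40, metric="manhattan")
dist_l1, ind_l1 = kd.query(queries, k=5)
```

Both trees are built once from a fixed array and have no insertion method, so adding vectors to a tree-based index presumably means rebuilding the tree; how ScikitLearnSimilarityIndex.add handles this is not stated here.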

add(vectors)

Add new vectors to the index.

static export_index(index, path)

Dump backend-specific similarity index object.

static load_index(path)

Load backend-specific similarity index object.

reconstruct(index)

Get a vector by its integer index.

search(vectors, k)

Search for k nearest neighbors to the provided vectors.

class bluegraph.downstream.similarity.SimilarityIndex

An interface for similarity indices.

This class specifies an interface for vector similarity indices that can be plugged into BlueGraph’s SimilarityProcessor.

exception IndexException
exception QueryException
exception SimilarityException
exception SimilarityWarning
exception TrainException
abstract add(vectors)

Add new vectors to the index.

export(path, index_path)

Dump index object.

abstract static export_index(self, index, path)

Dump backend-specific similarity index object.

static load(path, index_path)

Load index object.

abstract static load_index(self, path)

Load backend-specific similarity index object.

abstract reconstruct(index)

Get a vector by its integer index.

abstract search(vectors, k)

Search for k nearest neighbors to the provided vectors.
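A backend only has to supply the abstract methods listed above. The following minimal sketch assumes exact (brute-force) Euclidean search and pickle-based persistence. SimilarityIndexSketch and BruteForceIndex are hypothetical names; the base class merely re-declares the abstract methods instead of importing BlueGraph's SimilarityIndex, and returning (distances, indices) from search follows the Faiss convention as an assumption.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod

import numpy as np


class SimilarityIndexSketch(ABC):
    """Hypothetical stand-in re-declaring the abstract methods listed above."""

    @abstractmethod
    def add(self, vectors): ...

    @abstractmethod
    def reconstruct(self, index): ...

    @abstractmethod
    def search(self, vectors, k): ...

    @staticmethod
    @abstractmethod
    def export_index(index, path): ...

    @staticmethod
    @abstractmethod
    def load_index(path): ...


class BruteForceIndex(SimilarityIndexSketch):
    """Exact Euclidean k-NN over an in-memory array (no acceleration)."""

    def __init__(self, dimension):
        self.vectors = np.empty((0, dimension))

    def add(self, vectors):
        self.vectors = np.vstack([self.vectors, np.asarray(vectors, dtype=float)])

    def reconstruct(self, index):
        return self.vectors[index]

    def search(self, vectors, k):
        # Pairwise distances: one row per query, one column per stored vector.
        q = np.asarray(vectors, dtype=float)
        d = np.linalg.norm(q[:, None] - self.vectors[None], axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return np.take_along_axis(d, idx, axis=1), idx

    @staticmethod
    def export_index(index, path):
        # Persist the backend object (here simply the vector array).
        with open(path, "wb") as f:
            pickle.dump(index, f)

    @staticmethod
    def load_index(path):
        with open(path, "rb") as f:
            return pickle.load(f)


index = BruteForceIndex(dimension=2)
index.add([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
dist, idx = index.search([[0.9, 0.1]], k=2)  # idx -> [[1, 0]]

path = os.path.join(tempfile.mkdtemp(), "index.pkl")
BruteForceIndex.export_index(index.vectors, path)
restored = BruteForceIndex.load_index(path)
```

Swapping in a faster backend would only change the bodies of these methods, which is what lets different indices plug into the same processor.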

class bluegraph.downstream.similarity.SimilarityProcessor(similarity_index, point_ids=None)

Vector similarity processor.

This class wraps the indices (names or IDs) of the points, the vector space, and the similarity measure configs.

exception QueryException
exception SimilarityException
exception SimilarityProcessorWarning
add(vectors, point_ids=None)

Add new points to the index.

get_neighbors(vectors=None, point_ids=None, existing_points=None, k=10, add_to_index=False)

Get the top k most similar points.

get_vectors(existing_points)

Get vectors for the given point indices.

query_existing(existing_points, k=10)

Query existing points.

query_new(vectors, k=10)

Query input vectors.
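One way to picture what SimilarityProcessor adds on top of an index is a mapping between the caller's point IDs and the integer positions the index uses. The class below is a hypothetical brute-force sketch of that bookkeeping, with get_vectors, query_new and query_existing named after the methods above; it does not reproduce BlueGraph's implementation or its return format.

```python
import numpy as np


class IdMappedProcessor:
    """Hypothetical sketch: maps external point IDs to integer index positions."""

    def __init__(self, point_ids, vectors):
        self.vectors = np.asarray(vectors, dtype=float)
        self.point_ids = list(point_ids)
        # Reverse lookup: point ID -> row in the vector array.
        self.position = {pid: i for i, pid in enumerate(self.point_ids)}

    def get_vectors(self, existing_points):
        return self.vectors[[self.position[p] for p in existing_points]]

    def query_new(self, vectors, k=10):
        q = np.asarray(vectors, dtype=float)
        d = np.linalg.norm(q[:, None] - self.vectors[None], axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        # Translate integer positions back into the caller's IDs.
        return [[self.point_ids[j] for j in row] for row in idx]

    def query_existing(self, existing_points, k=10):
        # Existing points are looked up by ID, then queried like new vectors.
        return self.query_new(self.get_vectors(existing_points), k=k)


proc = IdMappedProcessor(
    point_ids=["alice", "bob", "carol"],
    vectors=[[0, 0], [0, 1], [10, 10]],
)
print(proc.query_existing(["alice"], k=2))  # [['alice', 'bob']]
```

Querying an existing point returns the point itself as its own nearest neighbor; whether BlueGraph's query_existing filters it out is not stated here.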

bluegraph.downstream.similarity.kl_divergence(v1, v2)

Compute the Kullback–Leibler divergence between two vectors.
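For reference, the standard definition is D_KL(P || Q) = sum_i p_i * log(p_i / q_i). The sketch below assumes both vectors are probability distributions (non-negative, summing to 1) with q_i > 0 wherever p_i > 0; whether BlueGraph normalizes its inputs or smooths zeros is not stated, and the function name is hypothetical.

```python
import numpy as np


def kl_divergence_sketch(v1, v2):
    """Standard KL divergence D(v1 || v2) for two discrete distributions."""
    p = np.asarray(v1, dtype=float)
    q = np.asarray(v2, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


print(kl_divergence_sketch([0.5, 0.5], [0.25, 0.75]))  # ≈ 0.1438
print(kl_divergence_sketch([0.25, 0.75], [0.5, 0.5]))  # ≈ 0.1308
```

The divergence is not symmetric, so strictly it is not a metric: swapping the arguments changes the value.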

bluegraph.downstream.similarity.poincare_distance(v1, v2)

Compute the Poincaré distance between two vectors.
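The Poincaré distance between points u and v inside the open unit ball is d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). The sketch below implements this standard formula, assuming both vectors lie strictly inside the unit ball; the function name is hypothetical and BlueGraph's exact implementation is not shown here.

```python
import numpy as np


def poincare_distance_sketch(v1, v2):
    """Distance in the Poincaré ball model (inputs must have norm < 1)."""
    u = np.asarray(v1, dtype=float)
    v = np.asarray(v2, dtype=float)
    sq_gap = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return float(np.arccosh(1 + 2 * sq_gap / denom))


print(poincare_distance_sketch([0.0, 0.0], [0.5, 0.0]))  # ≈ 1.0986 (= ln 3)
```

Distances grow rapidly near the boundary of the ball: the same Euclidean gap of 0.1 is worth about 0.20 between 0.0 and 0.1 but about 0.75 between 0.8 and 0.9, which is what lets hierarchies embed with low distortion.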

bluegraph.downstream.similarity.wasserstein_metric(v1, v2)

Compute the Wasserstein metric between two vectors.
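A Wasserstein (earth mover's) distance between two vectors can be read in more than one way. This sketch assumes each vector is a set of equally weighted one-dimensional samples; for equal-length inputs the 1-Wasserstein distance is then the mean absolute difference between the sorted values, the same quantity scipy.stats.wasserstein_distance computes for such inputs. Whether BlueGraph uses this reading or treats entries as histogram weights is not stated, and the function name is hypothetical.

```python
import numpy as np


def wasserstein_1d_sketch(v1, v2):
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    a = np.sort(np.asarray(v1, dtype=float))
    b = np.sort(np.asarray(v2, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch only handles equal-length vectors")
    # Optimal transport in 1-D matches the i-th smallest value to the i-th smallest.
    return float(np.mean(np.abs(a - b)))


print(wasserstein_1d_sketch([0, 1, 3], [5, 6, 8]))  # 5.0
print(wasserstein_1d_sketch([3, 0, 1], [0, 1, 3]))  # 0.0
```

Because entries are treated as samples, reordering a vector does not change the result, unlike a Euclidean distance.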

Node classification

class bluegraph.downstream.node_classification.NodeClassifier(model, feature_vector_prop=None, feature_props=None, **kwargs)

Interface for node classification models.

This wrapper allows building classification models for PGFrame nodes.

Edge prediction

This module is inspired by the following StellarGraph demo (licensed under Apache 2) https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/node2vec-link-prediction.html.

Interface for edge prediction models.

This wrapper allows building predictive models for PGFrame edges that discriminate between true and false edges among the given node pairs.
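In the StellarGraph demo cited above, node embeddings are turned into edge features with a binary operator (Hadamard product, L1, L2 or average) before a binary classifier is trained on true edges (label 1) and false edges (label 0). The sketch below reproduces only that feature step with NumPy; the OPERATORS table and the edge_features helper are hypothetical, and whether EdgePredictor exposes the same operator choices is an assumption.

```python
import numpy as np

# Binary operators that combine two node vectors into one edge vector.
OPERATORS = {
    "hadamard": lambda u, v: u * v,
    "l1": lambda u, v: np.abs(u - v),
    "l2": lambda u, v: (u - v) ** 2,
    "average": lambda u, v: (u + v) / 2.0,
}


def edge_features(embeddings, node_pairs, operator="hadamard"):
    """Build one feature row per (source, target) pair."""
    op = OPERATORS[operator]
    return np.array([op(embeddings[s], embeddings[t]) for s, t in node_pairs])


embeddings = {
    "a": np.array([1.0, 2.0]),
    "b": np.array([3.0, 4.0]),
    "c": np.array([0.0, 1.0]),
}
true_edges = [("a", "b")]
false_edges = [("a", "c")]

X = edge_features(embeddings, true_edges + false_edges, operator="hadamard")
y = np.array([1] * len(true_edges) + [0] * len(false_edges))
# X and y can now be passed to any binary classifier's fit method.
```

The Hadamard and average operators are symmetric in (u, v), and so are L1 and L2, so these features ignore edge direction; a directed setting would need an order-sensitive operator such as concatenation.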

Generate false edges of the input PGFrame.

Parameters
  • pgframe (bluegraph.core.io.PGFrame) – The input graph

  • p (float, optional) – Fraction of graph edges to use as the number of false edges. If the input graph has N edges, int(N * p) false edges will be generated.

  • directed (bool, optional) – Flag indicating whether the input graph should be interpreted as directed.

  • edges_to_exclude (collection of tuples) – Additional edges to exclude from generation (these edges are not necessarily in the set of edges of the input graph).

Returns

negative_edges – List of false edges

Return type

list of tuples
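The procedure described above can be sketched without a PGFrame by passing plain node and edge lists. This hypothetical version enumerates every non-edge pair and samples int(N * p) of them, treating (u, v) and (v, u) as the same pair unless directed is set. The function name, the nodes/edges inputs, the seed argument and the default p=0.5 are illustrative assumptions, not BlueGraph's signature.

```python
import itertools
import random


def generate_false_edges_sketch(nodes, edges, p=0.5, directed=False,
                                edges_to_exclude=None, seed=None):
    """Sample int(len(edges) * p) node pairs that are not (excluded) edges."""
    rng = random.Random(seed)
    forbidden = set(edges) | set(edges_to_exclude or [])
    if not directed:
        # (u, v) and (v, u) denote the same undirected edge.
        forbidden |= {(v, u) for u, v in forbidden}
    # Candidate pairs of distinct nodes (no self-loops).
    pairs = itertools.permutations(nodes, 2) if directed else itertools.combinations(nodes, 2)
    candidates = [pair for pair in pairs if pair not in forbidden]
    n_false = int(len(edges) * p)
    return rng.sample(candidates, min(n_false, len(candidates)))


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
negative_edges = generate_false_edges_sketch(nodes, edges, p=1.0, seed=0)
# The only non-adjacent pairs are (a, c), (a, d) and (b, d), so all three are drawn.
```

Enumerating all pairs costs O(n²) memory, so this sketch only suits small graphs; a scalable generator would sample random pairs and reject existing edges instead.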