bluegraph.downstream package¶
Data structures¶
- class bluegraph.downstream.data_structures.ElementClassifier(model, feature_vector_prop=None, feature_props=None, **kwargs)¶
Interface for graph element classification models.
It wraps a predictive classification model provided by the user, together with a set of configs that allow the user to fit the model and make predictions on input PGFrames. Its main goal is to hide the details of converting element (node or edge) properties into data tables that can be passed to the predictive model.
- fit(pgframe, train_elements=None, labels=None, label_prop=None, **kwargs)¶
Fit the classifier.
- predict(pgframe, predict_elements=None)¶
Run prediction on the input graph.
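The conversion that the classifier hides can be pictured with a minimal, library-free sketch. Everything below (the `nodes` dictionary, `to_feature_table`, and `MajorityClassModel`) is a hypothetical stand-in for illustration, not BlueGraph's implementation. It assumes only that the wrapped model exposes scikit-learn-style `fit` and `predict` methods.

```python
from collections import Counter

# Hypothetical node properties keyed by node ID (a stand-in for a PGFrame).
nodes = {
    "a": {"age": 30, "degree": 2, "label": "x"},
    "b": {"age": 41, "degree": 5, "label": "y"},
    "c": {"age": 35, "degree": 3, "label": "x"},
}


def to_feature_table(nodes, element_ids, feature_props):
    """Turn the selected properties of the selected elements into rows."""
    return [[nodes[e][p] for p in feature_props] for e in element_ids]


class MajorityClassModel:
    """Toy model with an assumed scikit-learn-style fit/predict interface."""

    def fit(self, X, y):
        # Remember the most frequent training label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the remembered label for every input row.
        return [self.majority_ for _ in X]


train_ids = ["a", "b", "c"]
X_train = to_feature_table(nodes, train_ids, ["age", "degree"])
y_train = [nodes[e]["label"] for e in train_ids]

model = MajorityClassModel().fit(X_train, y_train)
predictions = model.predict(to_feature_table(nodes, ["b"], ["age", "degree"]))
```

The caller only names elements and properties; the table construction stays in one helper, which is the kind of detail the wrapper is described as hiding.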
Similarity¶
- class bluegraph.downstream.similarity.FaissSimilarityIndex(dimension=None, similarity='euclidean', initial_vectors=None, n_segments=1)¶
Similarity index based on Faiss indices.
This class builds vector similarity indices using Faiss. It can segment the search space into Voronoi cells (see the example at https://github.com/facebookresearch/faiss/wiki/Faster-search), which speeds up the search.
- add(vectors)¶
Add new vectors to the index.
- static export_index(index, path)¶
Dump backend-specific similarity index object.
- static load_index(path)¶
Load backend-specific similarity index object.
- reconstruct(index)¶
Get a vector by its integer index.
- search(vectors, k)¶
Search for k nearest neighbors to the provided vectors.
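The idea behind segmenting the search space into Voronoi cells can be sketched without Faiss. The sketch below is an illustration in plain Python, not the class's implementation: the centroids are hard-coded (hypothetical values), whereas a real index learns them during training, and only a single cell is searched per query.

```python
import math


def nearest(point, candidates):
    """Return the position of the candidate closest to `point`."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


# Hypothetical cell centroids and stored vectors.
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.5, 0.1), (9.0, 9.5), (1.0, 1.2), (10.5, 9.8)]

# Assign every stored vector to the cell of its nearest centroid.
cells = {i: [] for i in range(len(centroids))}
for position, vector in enumerate(vectors):
    cells[nearest(vector, centroids)].append(position)


def search_one_cell(query, k):
    """Search only the cell whose centroid is closest to the query."""
    members = cells[nearest(query, centroids)]
    ranked = sorted(members, key=lambda i: math.dist(query, vectors[i]))
    return ranked[:k]


result = search_one_cell((0.9, 1.0), k=2)
```

Restricting the scan to one cell reduces the number of distance computations, at the cost of possibly missing a true neighbor that lies just across a cell boundary.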
- class bluegraph.downstream.similarity.NodeSimilarityProcessor(pgframe, vector_property, similarity='euclidean', index_configs=None)¶
Node similarity processor.
This class builds and queries node similarity indices using Faiss. It wraps the underlying graph object and the vector similarity processor.
- class bluegraph.downstream.similarity.ScikitLearnSimilarityIndex(dimension, similarity='euclidean', initial_vectors=None, leaf_size=40, index_type='balltree')¶
Similarity index based on scikit-learn indices.
This class builds vector similarity indices using scikit-learn. It supports various distance metrics with KDTrees and BallTrees.
- add(vectors)¶
Add new vectors to the index.
- static export_index(index, path)¶
Dump backend-specific similarity index object.
- static load_index(path)¶
Load backend-specific similarity index object.
- reconstruct(index)¶
Get a vector by its integer index.
- search(vectors, k)¶
Search for k nearest neighbors to the provided vectors.
- class bluegraph.downstream.similarity.SimilarityIndex¶
An interface for similarity indices.
This class specifies an interface for vector similarity indices that can be plugged into BlueGraph’s SimilarityProcessor.
- exception IndexException¶
- exception QueryException¶
- exception SimilarityException¶
- exception SimilarityWarning¶
- exception TrainException¶
- abstract add(vectors)¶
Add new vectors to the index.
- export(path, index_path)¶
Dump index object.
- abstract static export_index(self, index, path)¶
Dump backend-specific similarity index object.
- static load(path, index_path)¶
Load index object.
- abstract static load_index(self, path)¶
Load backend-specific similarity index object.
- abstract reconstruct(index)¶
Get a vector by its integer index.
- abstract search(vectors, k)¶
Search for k nearest neighbors to the provided vectors.
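To make the shape of this interface concrete, here is a minimal brute-force index that mirrors the method names `add`, `reconstruct`, and `search`. It is a hypothetical stand-alone class that does not subclass `SimilarityIndex`, and the `(distances, positions)` return value of `search` is an assumption rather than something this reference specifies.

```python
import math


class BruteForceIndex:
    """Hypothetical stand-in mirroring the method names listed above."""

    def __init__(self):
        self._vectors = []

    def add(self, vectors):
        """Append new vectors; their positions become their integer indices."""
        self._vectors.extend(list(v) for v in vectors)

    def reconstruct(self, index):
        """Return the stored vector at the given integer position."""
        return self._vectors[index]

    def search(self, vectors, k):
        """Return (distances, positions) of the k nearest stored vectors per query."""
        all_distances, all_positions = [], []
        for query in vectors:
            scored = sorted(
                (math.dist(query, v), i) for i, v in enumerate(self._vectors)
            )[:k]
            all_distances.append([d for d, _ in scored])
            all_positions.append([i for _, i in scored])
        return all_distances, all_positions


index = BruteForceIndex()
index.add([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
distances, positions = index.search([[0.0, 0.0]], k=2)
```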
- class bluegraph.downstream.similarity.SimilarityProcessor(similarity_index, point_ids=None)¶
Vector similarity processor.
This class wraps the point indices (names or IDs), the vector space, and the similarity measure configs.
- exception QueryException¶
- exception SimilarityException¶
- exception SimilarityProcessorWarning¶
- add(vectors, point_ids=None)¶
Add new points to the index.
- get_neighbors(vectors=None, point_ids=None, existing_points=None, k=10, add_to_index=False)¶
Get the top k most similar points.
- get_vectors(existing_points)¶
Get vectors for passed point indices.
- query_existing(existing_points, k=10)¶
Query existing points.
- query_new(vectors, k=10)¶
Query input vectors.
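The difference between querying existing points and querying new vectors comes down to an ID-to-position lookup. The sketch below illustrates that mapping in plain Python; the data, the `position_of` dictionary, and the function bodies are hypothetical and only borrow the method names, so the real processor may behave differently (for example, in what it returns).

```python
import math

# Hypothetical data: external point IDs in insertion order and their vectors.
point_ids = ["paper-1", "paper-2", "paper-3"]
vectors = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
position_of = {pid: i for i, pid in enumerate(point_ids)}


def query_new(query, k):
    """Return the IDs of the k stored points closest to a new vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return [point_ids[i] for i in ranked[:k]]


def query_existing(point_id, k):
    """Look up the stored vector of a known point, then query with it."""
    return query_new(vectors[position_of[point_id]], k)


neighbors = query_existing("paper-1", k=2)
```

In this sketch a stored point is always its own nearest neighbor, so callers who want only other points would request `k + 1` results and drop the first.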
- bluegraph.downstream.similarity.kl_divergence(v1, v2)¶
Compute Kullback–Leibler divergence on two vectors.
- bluegraph.downstream.similarity.poincare_distance(v1, v2)¶
Compute Poincare distance between two vectors.
- bluegraph.downstream.similarity.wasserstein_metric(v1, v2)¶
Compute Wasserstein metric on two vectors.
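For orientation, the textbook definitions of two of these measures can be written in a few lines. The functions below are independent re-implementations under stated assumptions (discrete probability vectors for the KL divergence, points strictly inside the unit ball for the Poincaré distance); this reference does not say which conventions the library functions follow. The Wasserstein metric is left out because the reference does not state how the input vectors are interpreted.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def poincare_distance(u, v):
    """arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))), for |u|, |v| < 1."""
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u = sum(a * a for a in u)
    norm_v = sum(b * b for b in v)
    return math.acosh(1 + 2 * diff / ((1 - norm_u) * (1 - norm_v)))


same = kl_divergence([0.5, 0.5], [0.5, 0.5])
skewed = kl_divergence([0.5, 0.5], [0.9, 0.1])
from_origin = poincare_distance([0.0, 0.0], [0.5, 0.0])
```

The KL divergence is zero for identical distributions and positive otherwise, and it is not symmetric in its arguments. The Poincaré distance grows without bound as either point approaches the boundary of the unit ball.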
Node classification¶
- class bluegraph.downstream.node_classification.NodeClassifier(model, feature_vector_prop=None, feature_props=None, **kwargs)¶
Interface for node classification models.
This wrapper builds classification models for PGFrame nodes.
Edge prediction¶
This module is inspired by the following StellarGraph demo (licensed under Apache 2) https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/node2vec-link-prediction.html.
- class bluegraph.downstream.link_prediction.EdgePredictor(model, feature_vector_prop=None, feature_props=None, operator='hadamard', directed=True)¶
Interface for edge prediction models.
This wrapper builds predictive models for PGFrame edges that discriminate between true and false edges for the given node pairs.
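The default `operator='hadamard'` refers to the element-wise product of the two endpoint vectors, a common way to turn a pair of node embeddings into a single edge feature vector. The sketch below shows that operator on hypothetical embeddings; it illustrates the general technique, not the predictor's internal code.

```python
def hadamard(u, v):
    """Element-wise product of two equally long vectors."""
    return [a * b for a, b in zip(u, v)]


# Hypothetical node embeddings and candidate edges.
embeddings = {"a": [1.0, 2.0], "b": [3.0, 0.5], "c": [0.0, 4.0]}
edges = [("a", "b"), ("b", "c")]

# One feature row per candidate edge, ready for a binary classifier.
edge_features = [hadamard(embeddings[s], embeddings[t]) for s, t in edges]
```

Because the product is symmetric, the feature vector for `(a, b)` equals the one for `(b, a)`, so this operator by itself does not encode edge direction.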
- bluegraph.downstream.link_prediction.generate_negative_edges(pgframe, p=0.5, directed=True, edges_to_exclude=None)¶
Generate false edges of the input PGFrame.
- Parameters
pgframe (bluegraph.core.io.PGFrame) – The input graph
p (float, optional) – Fraction of graph edges to use as the number of false edges. If the input graph has N edges, int(N * p) false edges will be generated.
directed (bool, optional) – Flag indicating whether the input graph should be interpreted as directed.
edges_to_exclude (collection of tuples) – Additional edges to exclude from generation (these edges are not necessarily in the set of edges of the input graph).
- Returns
negative_edges – List of false edges
- Return type
list of tuples
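The behavior described by these parameters can be approximated with a short stand-alone sketch. The function `sample_negative_edges`, its `nodes` and `edges` arguments, and the `seed` parameter are hypothetical and operate on plain lists rather than a PGFrame; the library's actual sampling strategy may differ.

```python
import itertools
import random


def sample_negative_edges(nodes, edges, p=0.5, directed=True,
                          edges_to_exclude=None, seed=0):
    """Sample int(len(edges) * p) node pairs that are not edges."""
    forbidden = set(edges) | set(edges_to_exclude or [])
    if directed:
        pairs = itertools.permutations(nodes, 2)
    else:
        # Treat (s, t) and (t, s) as the same edge.
        forbidden |= {(t, s) for s, t in forbidden}
        pairs = itertools.combinations(nodes, 2)
    candidates = [pair for pair in pairs if pair not in forbidden]
    # Never request more samples than there are candidate pairs.
    n_samples = min(int(len(edges) * p), len(candidates))
    return random.Random(seed).sample(candidates, n_samples)


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
negative_edges = sample_negative_edges(nodes, edges, p=0.5)
```

With four edges and `p=0.5`, two false edges are drawn. Enumerating all candidate pairs is quadratic in the number of nodes, so this approach suits only small graphs.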