bluegraph.downstream package

Data structures

class bluegraph.downstream.data_structures.ElementClassifier(model, feature_vector_prop=None, feature_props=None, **kwargs)

Interface for graph element classification models.

It wraps a predictive classification model provided by the user and a set of configs that allow the user to fit the model and make predictions on the input PGFrames. Its main goal is to hide the details of converting element (node or edge) properties into the data tables expected by the predictive model.

fit(pgframe, train_elements=None, labels=None, label_prop=None, **kwargs)

Fit the classifier.

predict(pgframe, predict_elements=None)

Run prediction on the input graph.
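To illustrate the kind of conversion this wrapper hides, here is a minimal, dependency-free sketch. It assumes elements are given as a dict mapping element IDs to property dicts; `build_feature_table` and `NearestCentroid` are hypothetical helpers written for this example, not part of BlueGraph, and the toy classifier only stands in for the user-supplied `model`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Props = Dict[str, object]


def build_feature_table(
    elements: Dict[str, Props],
    feature_props: Sequence[str],
    ids: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[List[float]]]:
    """Turn element property dicts into rows of a feature table (one row per element)."""
    ids = list(elements) if ids is None else list(ids)
    rows = [[float(elements[i][p]) for p in feature_props] for i in ids]
    return ids, rows


class NearestCentroid:
    """Toy stand-in for the user-provided predictive model (hypothetical)."""

    def fit(self, X: List[List[float]], y: List[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        # One centroid (mean feature vector) per class label.
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X: List[List[float]]) -> List[str]:
        def sq_dist(a: List[float], b: List[float]) -> float:
            return sum((x - z) ** 2 for x, z in zip(a, b))

        return [
            min(self.centroids, key=lambda c: sq_dist(row, self.centroids[c]))
            for row in X
        ]


# Hypothetical node properties: two numeric features and an optional label.
nodes = {
    "a": {"x": 0.0, "y": 0.1, "label": "small"},
    "b": {"x": 0.2, "y": 0.0, "label": "small"},
    "c": {"x": 5.0, "y": 5.1, "label": "big"},
    "d": {"x": 4.8, "y": 5.3, "label": "big"},
    "e": {"x": 4.9, "y": 4.7},  # unlabeled node to classify
}

train_ids = ["a", "b", "c", "d"]
_, X_train = build_feature_table(nodes, ["x", "y"], train_ids)
y_train = [nodes[i]["label"] for i in train_ids]
model = NearestCentroid().fit(X_train, y_train)

pred_ids, X_pred = build_feature_table(nodes, ["x", "y"], ["e"])
print(dict(zip(pred_ids, model.predict(X_pred))))  # {'e': 'big'}
```

The user only deals with element IDs and property names; building the table, selecting labels and mapping predictions back to IDs is the wrapper's job.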

class bluegraph.downstream.data_structures.EmbeddingPipeline(preprocessor=None, embedder=None, similarity_processor=None)

Data structure for stacking embedding pipelines.

In this context, an embedding pipeline consists of the following steps:

  1. preprocess

  2. embed

  3. build a similarity index
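The three steps can be sketched with plain callables. This is a hypothetical, simplified sketch: `MiniPipeline`, the lowercase-splitting preprocessor and the bag-of-words embedder are stand-ins chosen for illustration, and the similarity "index" is a plain dict searched by brute force rather than a Faiss index.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


class MiniPipeline:
    """Hypothetical stand-in: preprocess -> embed -> similarity index."""

    def __init__(self, preprocess: Callable[[str], List[str]],
                 embed: Callable[[List[str]], Vector]):
        self.preprocess = preprocess  # step 1
        self.embed = embed            # step 2
        self.index: Dict[str, Vector] = {}  # step 3: point ID -> vector

    def run_fitting(self, data: Sequence[str], index: Sequence[str]) -> None:
        for point_id, item in zip(index, data):
            self.index[point_id] = self.embed(self.preprocess(item))

    def get_similar_points(self, point_id: str, k: int = 2) -> List[Tuple[str, float]]:
        query = self.index[point_id]
        dists = [
            (other, math.dist(query, vec))
            for other, vec in self.index.items()
            if other != point_id  # skip the query point itself
        ]
        return sorted(dists, key=lambda pair: pair[1])[:k]


VOCAB = ["graph", "node", "edge", "protein", "gene"]


def preprocess(text: str) -> List[str]:
    return text.lower().split()


def embed(tokens: List[str]) -> Vector:
    # Bag-of-words counts over a fixed toy vocabulary.
    return [float(tokens.count(word)) for word in VOCAB]


pipeline = MiniPipeline(preprocess, embed)
pipeline.run_fitting(
    ["Graph node edge", "node edge graph graph", "protein gene gene"],
    index=["doc1", "doc2", "doc3"],
)
print(pipeline.get_similar_points("doc1", k=1))  # [('doc2', 1.0)]
```

Keeping each step behind its own callable is what makes the steps swappable: a different preprocessor or embedder changes nothing in the indexing and querying code.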

exception EmbeddingPipelineException

Pipeline exception class.

exception EmbeddingPipelineWarning

Pipeline warning class.

generate_embedding_table()

Generate embedding table from similarity index.

get_index()

Get index of existing points.

get_similar_points(vectors=None, existing_indices=None, k=10, preprocessor_kwargs=None, embedder_kwargs=None)

Get the k most similar points for the input indices.

is_inductive()

Return flag indicating if the embedder is inductive.

is_transductive()

Return flag indicating if the embedder is transductive.

classmethod load(path, embedder_interface=None, embedder_ext='pkl')

Load a dumped embedding pipeline.

retrieve_embeddings(indices)

Get embedding vectors for the input indices.

run_fitting(data, index=None, preprocessor_kwargs=None, embedder_kwargs=None)

Run fitting of the pipeline components.

run_prediction(data, preprocessor_kwargs=None, embedder_kwargs=None, data_indices=None, add_to_index=False)

Run prediction using the pipeline components.

save(path, compress=False)

Save the pipeline.

Similarity

class bluegraph.downstream.similarity.NodeSimilarityProcessor(pgframe, vector_property, similarity='euclidean')

Node similarity processor.

This class allows building and querying node similarity indices using Faiss. It wraps the underlying graph object and the vector similarity processor.

class bluegraph.downstream.similarity.SimilarityProcessor(dimension, similarity='euclidean', initial_vectors=None, initial_index=None, n_segments=1)

Vector similarity processor.

This class allows building vector similarity indices using Faiss. It wraps the indices (names or IDs) of the points, the vector space and the similarity measure configs. It also allows segmenting the search space into Voronoi cells (see example: https://github.com/facebookresearch/faiss/wiki/Faster-search), which speeds up the search.
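The idea behind segmenting the search space can be sketched without Faiss. The sketch below is a simplified, hypothetical inverted-file index: each vector is assigned to the cell of its nearest centroid, and a query scans only the `n_probe` closest cells, trading exactness for speed. Centroids are picked naively here (the first `n_segments` vectors), whereas Faiss trains them with k-means; `ToyIVFIndex` and `n_probe` are illustrative names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class ToyIVFIndex:
    """Hypothetical inverted-file index: vectors are grouped into Voronoi cells."""

    def __init__(self, vectors: Sequence[Vector], n_segments: int):
        # Naive "training": use the first n_segments vectors as cell centroids.
        self.centroids = [list(v) for v in vectors[:n_segments]]
        self.cells: Dict[int, List[Tuple[int, Vector]]] = {
            c: [] for c in range(len(self.centroids))
        }
        # Each vector lives in the cell of its nearest centroid.
        for i, v in enumerate(vectors):
            self.cells[self._nearest_cells(v, 1)[0]].append((i, v))

    def _nearest_cells(self, v: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)),
                       key=lambda c: math.dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query: Vector, k: int, n_probe: int = 1) -> List[int]:
        # Only vectors in the n_probe closest cells are compared with the query.
        candidates = [
            (math.dist(query, v), i)
            for c in self._nearest_cells(query, n_probe)
            for i, v in self.cells[c]
        ]
        return [i for _, i in sorted(candidates)[:k]]


points = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (9.5, 10.5), (1.0, 1.0)]
index = ToyIVFIndex(points, n_segments=2)
print(index.search((0.4, 0.4), k=2))             # [2, 0]
print(index.search((0.4, 0.4), k=2, n_probe=2))  # [2, 0]
```

With `n_probe=1` the query never looks at the far cell; raising `n_probe` makes the search more exhaustive (and slower), and a true neighbour stored in an unprobed cell can be missed.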

exception IndexException
exception QueryException
exception SimilarityException
exception SimilarityWarning
exception TrainException

add(vectors, vector_indices=None)

Add new points to the index.

get_similar_points(vectors=None, vector_indices=None, existing_indices=None, k=10, add_to_index=False)

Get the top k similar points.

get_vectors(existing_indices)

Get vectors for the given point indices.

query_existing(existing_indices, k=10)

Query existing points.

query_new(vectors, k=10)

Query input vectors.
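The query methods above can be illustrated with an exact (brute-force) Euclidean index. This is a hypothetical sketch, not the `SimilarityProcessor` implementation: `FlatIndex` keeps vectors in a dict keyed by point index, and, like any plain nearest-neighbour search over stored vectors, it returns an existing point as its own nearest neighbour at distance 0.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Result = List[List[Tuple[str, float]]]


class FlatIndex:
    """Hypothetical exact-search index mirroring the method names above."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        self.vectors: Dict[str, Vector] = {}

    def add(self, vectors: Sequence[Vector],
            vector_indices: Optional[Sequence[str]] = None) -> None:
        if vector_indices is None:
            # Default to consecutive integer IDs, stored as strings.
            start = len(self.vectors)
            vector_indices = [str(start + i) for i in range(len(vectors))]
        for idx, vec in zip(vector_indices, vectors):
            if len(vec) != self.dimension:
                raise ValueError(f"expected dimension {self.dimension}, got {len(vec)}")
            self.vectors[idx] = list(vec)

    def get_vectors(self, existing_indices: Sequence[str]) -> List[Vector]:
        return [self.vectors[i] for i in existing_indices]

    def query_new(self, vectors: Sequence[Vector], k: int = 10) -> Result:
        results = []
        for query in vectors:
            scored = sorted(
                ((idx, math.dist(query, vec)) for idx, vec in self.vectors.items()),
                key=lambda pair: pair[1],
            )
            results.append(scored[:k])
        return results

    def query_existing(self, existing_indices: Sequence[str], k: int = 10) -> Result:
        # Look up the stored vectors, then reuse the new-vector query.
        return self.query_new(self.get_vectors(existing_indices), k=k)


index = FlatIndex(dimension=2)
index.add([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]], vector_indices=["a", "b", "c"])
print(index.query_new([[0.9, 0.0]], k=2))
print(index.query_existing(["b"], k=2))
```

Implementing `query_existing` on top of `get_vectors` and `query_new` keeps a single search path; an approximate index only needs to replace the body of `query_new`.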

Node classification

class bluegraph.downstream.node_classification.NodeClassifier(model, feature_vector_prop=None, feature_props=None, **kwargs)

Interface for node classification models.

This wrapper allows building classification models for PGFrame nodes.

Edge prediction

This module is inspired by the following StellarGraph demo (licensed under Apache 2) https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/node2vec-link-prediction.html.

Interface for edge prediction models.

This wrapper allows building predictive models for PGFrame edges that discriminate between true and false edges for the given node pairs.
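A common way to turn node pairs into inputs for such a discriminator, used in the StellarGraph demo cited above, is to combine the embeddings of the two endpoints with a binary operator such as the element-wise (Hadamard) product, the L1 or L2 difference, or the average. The sketch below is hypothetical: the toy embeddings and the `edge_features` helper are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

# Element-wise binary operators for combining two endpoint embeddings.
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "hadamard": lambda u, v: u * v,
    "l1": lambda u, v: abs(u - v),
    "l2": lambda u, v: (u - v) ** 2,
    "average": lambda u, v: (u + v) / 2.0,
}


def edge_features(embeddings: Dict[str, Vector],
                  pairs: Sequence[Tuple[str, str]],
                  operator: str = "hadamard") -> List[Vector]:
    """Combine the two endpoint embeddings of each pair element-wise."""
    op = OPERATORS[operator]
    return [
        [op(u, v) for u, v in zip(embeddings[s], embeddings[t])]
        for s, t in pairs
    ]


# Hypothetical 2-dimensional node embeddings.
emb = {"a": [1.0, 2.0], "b": [3.0, 0.5], "c": [-1.0, 4.0]}
true_edges = [("a", "b")]
false_edges = [("a", "c")]

X = edge_features(emb, true_edges + false_edges)
y = [1] * len(true_edges) + [0] * len(false_edges)
print(X)  # [[3.0, 1.0], [-1.0, 8.0]]
print(y)  # [1, 0]
```

Any binary classifier can then be trained on `X` and `y`; the false edges come from a negative-sampling step such as the one documented below.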

Generate false edges of the input PGFrame.

Parameters
  • pgframe (bluegraph.core.io.PGFrame) – The input graph

  • p (float, optional) – Fraction of graph edges to use as the number of false edges. If the input graph has N edges, int(N * p) false edges will be generated.

  • directed (bool, optional) – Flag indicating whether the input graph should be interpreted as directed.

  • edges_to_exclude (collection of tuples) – Additional edges to exclude from generation (these edges are not necessarily in the set of edges of the input graph).

Returns

negative_edges – List of false edges

Return type

list of tuples
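The generation described by these parameters can be sketched as sampling from the set of non-edges. This is a minimal, hypothetical sketch: the name `sample_false_edges`, the `seed` argument, taking plain node and edge lists instead of a PGFrame, and excluding self-loops are all assumptions. It draws `int(N * p)` distinct node pairs that are neither graph edges nor in `edges_to_exclude`, and for undirected graphs treats `(u, v)` and `(v, u)` as the same edge.

```python
import itertools
import random
from typing import Hashable, Iterable, List, Optional, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def sample_false_edges(nodes: Sequence[Hashable], edges: Iterable[Edge],
                       p: float = 0.5, directed: bool = False,
                       edges_to_exclude: Iterable[Edge] = (),
                       seed: Optional[int] = None) -> List[Edge]:
    """Sample int(len(edges) * p) node pairs that are not edges (hypothetical)."""
    edges = list(edges)

    def key(u: Hashable, v: Hashable) -> Hashable:
        # Undirected graphs: (u, v) and (v, u) denote the same edge.
        return (u, v) if directed else frozenset((u, v))

    forbidden: Set[Hashable] = {key(u, v) for u, v in edges}
    forbidden |= {key(u, v) for u, v in edges_to_exclude}

    # Enumerate all candidate pairs (no self-loops) that are not forbidden.
    pairs = (itertools.permutations(nodes, 2) if directed
             else itertools.combinations(nodes, 2))
    candidates = [(u, v) for u, v in pairs if key(u, v) not in forbidden]

    n_target = int(len(edges) * p)
    if n_target > len(candidates):
        raise ValueError("not enough non-edges to sample from")
    return random.Random(seed).sample(candidates, n_target)


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
false = sample_false_edges(nodes, edges, p=0.5, seed=0)
print(sorted(false))  # [('a', 'c'), ('b', 'd')]
```

Enumerating every candidate pair is quadratic in the number of nodes but cannot loop forever; for large graphs, rejection sampling of random pairs is the usual alternative.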