v0.1.2 Release Notes¶
This release includes several major bug fixes, new features and API changes, described below.
Blue Graph’s core¶
PGFrame¶
Updates to the PGFrame interface include:
- Added methods: rename_node_properties and rename_edge_properties for changing property names; add_nodes_from_df and add_edges_from_df for adding nodes and edges using dataframes.
- Added the from_ontology classmethod for importing ontologies (e.g. from Webprotege) as property graphs.
- Property values that are added to existing properties are now aggregated into sets (and not replaced, as was the case before).
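The set-aggregation behaviour described above can be illustrated with a minimal plain-Python sketch. It does not use the PGFrame API: the helper add_property_values and the dictionary layout are hypothetical, and keeping a single first value as a scalar is an assumption made only for this illustration.

```python
def add_property_values(properties, node_id, prop, value):
    """Merge `value` into an existing property instead of overwriting it.

    Hypothetical helper: it only mimics the aggregation semantics and is
    not part of BlueGraph.
    """
    node_props = properties.setdefault(node_id, {})
    if prop not in node_props:
        # First value for this property: store it as given (assumption).
        node_props[prop] = value
        return
    existing = node_props[prop]
    existing_set = existing if isinstance(existing, set) else {existing}
    new_set = value if isinstance(value, set) else {value}
    # Old and new values are combined into one set rather than replaced.
    node_props[prop] = existing_set | new_set


properties = {}
add_property_values(properties, "n1", "name", "Alice")
add_property_values(properties, "n1", "name", "Alicia")
print(properties["n1"]["name"] == {"Alice", "Alicia"})  # True
```

Before this release, the second call would have corresponded to replacing "Alice" with "Alicia".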
Backend support¶
graph-tool¶
Fixed a major bug occurring during node merging.
Neo4j¶
Several minor bug fixes for the Neo4j backend are included in this release. In addition, the interface of pgframe_to_neo4j has changed:
- NaN properties are skipped;
- Node types can be used as Neo4j node labels;
- Edge types can be used as Neo4j relationship types: an edge with multiple types results in multiple Neo4j relationships, one per type, each with the edge's properties replicated (this behaviour is implemented because a Neo4j relationship can have exactly one relationship type).
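The edge-to-relationship rule above can be sketched without a database. The following plain-Python example is only an illustration of the described behaviour, not the pgframe_to_neo4j implementation: the function edge_to_relationships and the record layout are hypothetical.

```python
import math


def edge_to_relationships(source, target, edge_types, properties):
    """Return one relationship record per edge type (hypothetical helper)."""
    # Drop NaN-valued properties, mirroring the "NaN properties are skipped" rule.
    clean = {
        key: value
        for key, value in properties.items()
        if not (isinstance(value, float) and math.isnan(value))
    }
    # Each relationship gets exactly one type and its own copy of the properties.
    return [
        {
            "source": source,
            "target": target,
            "type": edge_type,
            "properties": dict(clean),
        }
        for edge_type in sorted(edge_types)
    ]


relationships = edge_to_relationships(
    "a", "b", {"KNOWS", "WORKS_WITH"}, {"since": 2015, "weight": float("nan")}
)
for rel in relationships:
    print(rel)
```

Here a single edge with two types yields two records, each carrying the property since and neither carrying the NaN-valued weight.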
Graph preprocessing with BlueGraph¶
Semantic property encoding¶
Updates to the encoders:
- Word2VecModel is renamed to Doc2VecEncoder and now inherits from bluegraph.downstream.Preprocessor;
- scikit-learn’s TfidfVectorizer is wrapped into TfIdfEncoder, which inherits from bluegraph.downstream.Preprocessor.
The above-mentioned changes allow using BlueGraph’s encoders as a part of EmbeddingPipeline.
Downstream tasks with BlueGraph¶
Similarity API¶
Similarity processor updates:
- Smarter handling of elements not existing in the index (when vectors or similar points are requested for such elements, None is returned).
- A segmented Faiss index can be initialized without vectors; the model can be trained on the first call to add.
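Both behaviours above (returning None for unknown elements and deferring training until the first call to add) can be illustrated with a toy index. This is a minimal sketch under stated assumptions: the class TinyIndex and its methods are hypothetical stand-ins and do not reproduce the BlueGraph similarity processor or the Faiss API.

```python
class TinyIndex:
    """Toy stand-in for a similarity index; not the BlueGraph or Faiss API."""

    def __init__(self):
        # The index starts empty and untrained: no vectors are required.
        self._vectors = {}
        self.is_trained = False
        self.dimension = None

    def add(self, vectors):
        if not vectors:
            return
        # Deferred training: the first non-empty batch "trains" the index.
        if not self.is_trained:
            self._train(vectors)
        self._vectors.update(vectors)

    def _train(self, vectors):
        # A real segmented index would learn its partitioning here;
        # this sketch only records the vector dimension.
        self.dimension = len(next(iter(vectors.values())))
        self.is_trained = True

    def get_vectors(self, keys):
        # Unknown keys yield None instead of raising an error.
        return [self._vectors.get(key) for key in keys]


index = TinyIndex()
index.add({"a": [0.0, 1.0], "b": [1.0, 0.0]})
print(index.get_vectors(["a", "missing"]))  # [[0.0, 1.0], None]
```

Returning None per missing element keeps the output aligned with the requested keys, so callers can tell which elements were not found.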
Embedding pipelines¶
Embedding pipeline updates:
- Added a basic prediction interface (the run_prediction method);
- Modified get_similar_points so that similarity can be queried for unknown vectors;
- Optimized embedding pipeline memory usage: the embedding table is not stored explicitly, but is a part of the Faiss index.
Services¶
Embedder¶
Embedder is a mini-service for retrieving embedding vectors and similar points distributed as a part of BlueGraph. A detailed description of the API can be found here. Two examples can be found in the Embedder API for NCIt term embedding notebook and Embedder API for node embedding.
This release includes the following updates to the service:
- Embedder app can predict vectors for unseen points. The following formats can be passed as input:
  - raw: raw data as is;
  - json_pgframe: a JSON representation of a PGFrame;
  - nexus_dataset: endpoint, bucket, resource ID and a Nexus token (in the request header); the app fetches the dataset by resource ID, downloads it and creates a PGFrame (the dataset is a JSON representation of a PGFrame).
- API changes: the POST method for embedding/ and similar-points/ operates on unseen points;
- Dockerfile fix (smaller image size), dockerignore updates;
- Embedder app can fetch local models from the directory (specified in the configs).