
DGL-KE: Deep Graph Library (DGL)

Source: https://towardsdatascience.com/introduction-to-knowledge-graph-embedding-with-dgl-ke-77ace6fb60ef

Amazon recently launched DGL-KE, a software package that simplifies the process of computing knowledge graph embeddings with simple command-line scripts. With DGL-KE, users can generate embeddings for very large graphs 2–5x faster than competing techniques. DGL-KE gives users the flexibility to select the models used to generate embeddings and to optimize performance by configuring the hardware, the data-sampling parameters, and the loss function. To use the package effectively, however, it is important to understand how embeddings work and which optimizations are available to compute them. This two-part blog series is designed to provide this information and get you ready to start taking advantage of DGL-KE.

Another class of graphs that is especially important for knowledge graphs is multigraphs: graphs that can have multiple (directed) edges between the same pair of nodes and can also contain self-loops. These multiple edges are typically of different types, so most multigraphs are heterogeneous. Graphs that do not allow multiple edges and self-loops are called simple graphs.

A knowledge graph (KG) is a directed heterogeneous multigraph whose node and relation types have domain-specific semantics. KGs allow us to encode knowledge in a form that is human-interpretable and amenable to automated analysis and inference.
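As a minimal illustration (the entity and relation names below are invented for this example, not taken from any real dataset), a KG can be stored as a set of directed, typed triples (head, relation, tail):

# Toy knowledge graph as (head, relation, tail) triples.
# Entity and relation names are made up for illustration.
triples = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_in", "France"),   # second edge type between the same pair: a multigraph
    ("France", "part_of", "Europe"),
]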

Knowledge graph embeddings (KGEs) are low-dimensional representations of the entities and relations in a knowledge graph. They provide a generalizable context about the overall KG that can be used to infer relations.

The knowledge graph embeddings are computed so that they satisfy certain properties; i.e., they follow a given KGE model. A KGE model defines a score function that measures the distance between two entities relative to their relation type in the low-dimensional embedding space. These score functions are used to train the KGE models so that entities connected by a relation end up close to each other, while entities that are not connected end up far apart.
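As a sketch of that training idea, here is a generic margin-based ranking loss, not DGL-KE's exact objective (DGL-KE also supports variants such as a logsigmoid loss with self-adversarial negative sampling, enabled by the -adv flag used in the training command below):

import torch

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    # Push scores of observed triples above scores of corrupted
    # (negative) triples by at least `margin`; a higher score
    # means a more plausible triple.
    return torch.clamp(margin - pos_score + neg_score, min=0).mean()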

There are many popular KGE models such as TransE, TransR, RESCAL, DistMult, ComplEx, and RotatE, which define different score functions to learn entity and relation embeddings. DGL-KE makes these implementations accessible with a simple input argument in the command line script.

TransE
The translation-based embedding model (TransE) is a representative translational-distance model. It represents entities and relations as vectors in the same semantic space ℝᵈ, where d is the dimension of the (reduced) target space. A fact in the source graph is represented as a triplet (h, r, t), where h is the head, r the relation, and t the tail. The relation is interpreted as a translation vector, so that entities connected by relation r lie a short distance apart in the embedding space. [3, 4]

In terms of vector computation, this means that adding the head to the relation should approximate the tail: h + r ≈ t.

TransE performs a linear translation, and its scoring function is the negative distance between the translated head h + r and the tail t:

f(h, r, t) = −‖h + r − t‖

where the norm can be L1 or L2 (DGL-KE exposes these as the TransE_l1 and TransE_l2 models).
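A minimal PyTorch sketch of this score (assuming h, r, and t are embedding vectors that have already been looked up; p=2 corresponds to the TransE_l2 variant used in the training command below):

import torch

def transe_score(h, r, t, p=2):
    # Negative Lp distance between the translated head (h + r) and
    # the tail t; a higher (less negative) score means a more
    # plausible triple.
    return -torch.norm(h + r - t, p=p, dim=-1)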

TransE cannot model relations that are not 1-to-1, as it learns only one aspect of similarity. TransR addresses this issue by separating the relation space from the entity space, with h, t ∈ ℝᵏ and r ∈ ℝᵈ. The two semantic spaces need not have the same dimension.

 

The score function in TransR has the same form as in TransE and measures the Euclidean distance between h + r and t, but the distance is computed in the relation-specific space: each entity vector is first projected into that space with a relation-specific matrix Mᵣ (hᵣ = h·Mᵣ, tᵣ = t·Mᵣ).
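A corresponding sketch for TransR (assuming a relation-specific projection matrix M_r of shape (k, d), here passed in as a plain tensor):

import torch

def transr_score(h, r, t, M_r, p=2):
    # Project entity vectors from the entity space R^k into the
    # relation-specific space R^d, then score as in TransE.
    h_r = h @ M_r  # (k,) @ (k, d) -> (d,)
    t_r = t @ M_r
    return -torch.norm(h_r + r - t_r, p=p, dim=-1)

Keeping a separate projection per relation is what lets TransR handle 1-to-N, N-to-1, and N-to-N relations that TransE collapses.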

Source: https://towardsdatascience.com/optimize-knowledge-graph-embeddings-with-dgl-ke-1fff4ab275f2

What is DGL-KE

To recap, DGL-KE is a high-performance, easy-to-use, and scalable toolkit for generating knowledge graph embeddings from large graphs. It is built on top of the Deep Graph Library (DGL), an open-source library for implementing graph neural networks (GNNs).


Installation on VM029

mkdir dgl-ke

cd dgl-ke
conda create -n dgl_pytorch python=3.8
conda activate dgl_pytorch

conda install numpy
conda install pytorch torchvision cudatoolkit=10 -c pytorch
pip install dgl dglke
python --version
python
>>> import torch
>>> torch.__version__
'1.9.1' 

>>> import dgl
Using backend: pytorch
>>> import dglke
>>> dgl.__version__, dglke.__version__
('0.6.1', '0.1.2')

>>> quit()

mkdir my_task && cd my_task
DGLBACKEND=pytorch dglke_train --model_name TransE_l2 --dataset FB15k --batch_size 1000 \
--neg_sample_size 200 --hidden_dim 400 --gamma 19.9 --lr 0.25 --max_step 500 \
--log_interval 100 --batch_size_eval 16 -adv --regularization_coef 1.00E-09 \
--test --num_thread 1 --num_proc 8

The test above failed with the following error:

Using backend: pytorch
Reading train triples....
Finished. Read 483142 train triples.
Reading valid triples....
Finished. Read 50000 valid triples.
Reading test triples....
Finished. Read 59071 test triples.
|Train|: 483142
random partition 483142 edges into 8 parts
part 0 has 60393 edges
part 1 has 60393 edges
part 2 has 60393 edges
part 3 has 60393 edges
part 4 has 60393 edges
part 5 has 60393 edges
part 6 has 60393 edges
part 7 has 60391 edges
/home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/base.py:45: DGLWarning: Recommend creating graphs by `dgl.graph(data)` instead of `dgl.DGLGraph(data)`.
  return warnings.warn(message, category=category, stacklevel=1)
/home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/base.py:45: DGLWarning: Keyword arguments ['readonly', 'multigraph', 'sort_csr'] are deprecated in v0.5, and can be safely removed in all cases.
  return warnings.warn(message, category=category, stacklevel=1)
Traceback (most recent call last):
  File "/home/cloud-di/miniconda3/envs/dgl_pytorch/bin/dglke_train", line 8, in <module>
    sys.exit(main())
  File "/home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dglke/train.py", line 118, in main
    train_sampler_head = train_data.create_sampler(args.batch_size,
  File "/home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dglke/dataloader/sampler.py", line 379, in create_sampler
    return EdgeSampler(self.g,
  File "/home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/contrib/sampling/sampler.py", line 683, in __init__
    self._sampler = _CAPI_CreateUniformEdgeSampler(
  File "dgl/_ffi/_cython/./function.pxi", line 287, in dgl._ffi._cy3.core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 232, in dgl._ffi._cy3.core.FuncCall
  File "dgl/_ffi/_cython/./base.pxi", line 155, in dgl._ffi._cy3.core.CALL
dgl._ffi.base.DGLError: [11:53:32] /opt/dgl/include/dgl/packed_func_ext.h:117: Check failed: ObjectTypeChecker<TObjectRef>::Check(sptr.get()): Expected type graph.Graph but get graph.HeteroGraph
Stack trace:
  [bt] (0) /home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f85552fe03f]
  [bt] (1) /home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/libdgl.so(dgl::GraphRef dgl::runtime::DGLArgValue::AsObjectRef<dgl::GraphRef>() const+0x264) [0x7f85554824a4]
  [bt] (2) /home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/libdgl.so(+0x9b1690) [0x7f8555a97690]
  [bt] (3) /home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/libdgl.so(+0x9b2264) [0x7f8555a98264]
  [bt] (4) /home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7f85559f1d48]
  [bt] (5) /home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so(+0x15d3e) [0x7f8554c4ed3e]
  [bt] (6) /home/cloud-di/miniconda3/envs/dgl_pytorch/lib/python3.8/site-packages/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so(+0x1626b) [0x7f8554c4f26b]
  [bt] (7) /home/cloud-di/miniconda3/envs/dgl_pytorch/bin/python(_PyObject_MakeTpCall+0x22f) [0x55e80661331f]
  [bt] (8) /home/cloud-di/miniconda3/envs/dgl_pytorch/bin/python(_PyEval_EvalFrameDefault+0x485) [0x55e80669a945]
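The stack trace points to a version mismatch rather than a problem with the command itself: from DGL 0.5 onward, dgl.DGLGraph produces a heterogeneous graph object (graph.HeteroGraph), while dglke 0.1.2 still drives the deprecated dgl.contrib sampler, which expects the old homogeneous graph.Graph. A commonly reported workaround is to downgrade DGL to a release that DGL-KE supports, e.g.:

pip install dgl==0.4.3

and then rerun the dglke_train command above.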

