
Gremlin x Query using Context Meta-information (Edge Properties)

Download the TinkerPop Gremlin Console with the Cypher->Gremlin translation plugin

https://github.com/opencypher/cypher-for-gremlin/releases

Unpack it on VM029 and run the gremlin.sh script from inside the bin folder

// To translate Cypher into Gremlin

gremlin> :plugin use opencypher.gremlin
gremlin> :remote connect opencypher.gremlin conf/remote-objects.yaml translate gremlin 
gremlin> :> EXPLAIN  {Cypher command}

Script for an in-memory TinkerGraph

// Create an empty graph database (Groovy code; Groovy is a Java-like language)

graph = TinkerGraph.open()

// Create the TraversalSource

g = graph.traversal()

// Create vertex with properties and save a pointer to vertex for future use


c1 = g.addV('Country')\
      .property(id, 1)\
      .property('name','Germany') \
      .property('language', 'German') \
      .property('continent', 'Europe') \
      .property('population', 83000000)\
      .next()

c2 = g.addV('Country')\
      .property(id, 2)\
      .property('name','France') \
      .property('language', 'French') \
      .property('continent', 'Europe') \
      .property('population', 67000000)\
      .next()

c3 = g.addV('Country')\
      .property(id, 3)\
      .property('name','United Kingdom') \
      .property('language', 'English') \
      .property('continent', 'Europe') \
      .property('population', 66000000)\
      .next()

p1 = g.addV('Person')\
      .property(id, 4)\
      .property('name','John') \
      .next()

p2 = g.addV('Person')\
      .property(id, 5)\
      .property('name','Harry') \
      .next()

p3 = g.addV('Person')\
      .property(id, 6)\
      .property('name','Anna') \
      .next()

// Create edges between vertices with properties

g.addE('LIVING_IN')\
 .from(p1)\
 .to(c1)\
 .property(id, 7)\
 .property('date_of_start', 2014)

g.addE('LIVING_IN')\
 .from(p2)\
 .to(c3)\
 .property(id, 8)\
 .property('date_of_start', 2013)

g.addE('LIVING_IN')\
 .from(p3)\
 .to(c1)\
 .property(id, 9)\
 .property('date_of_start', 2014)

g.addE('LIVING_IN')\
 .from(p3)\
 .to(c3)\
 .property(id, 10)\
 .property('date_of_start', 2014)

g.addE('WORKING_IN')\
 .from(p1)\
 .to(c2)\
 .property(id, 11)\
 .property('date_of_start', 2014)

g.addE('FRIENDS_WITH')\
 .from(p1)\
 .to(p2)\
 .property(id, 12)\
 .property('date_of_start', 2011)

g.addE('FRIENDS_WITH')\
 .from(p3)\
 .to(p1)\
 .property(id, 13)\
 .property('date_of_start', 2012)

g.addE('FRIENDS_WITH')\
 .from(p3)\
 .to(p2)\
 .property(id, 14)\
 .property('date_of_start', 2014)

// Get ALL edges and vertices

g.V()
g.E()
g.V().outE()
g.V().out()

// Starting from the target vertex and moving toward the source vertex
// (date_of_start is stored as an integer year, so the filter compares against 2014)

g.V()\
 .has('Country', 'name', 'United Kingdom')\
 .inE()\
 .has('date_of_start', 2014)\
 .outV()\
 .values('name')

// Starting from the source vertex
// (date_of_start is stored as an integer year, so the filter compares against 2014)

g.V()\
 .has('Person', 'name', 'John')\
 .outE()\
 .has('date_of_start', 2014)\
 .inV()\
 .values('name')
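
As a cross-check on what the two edge-filtered traversals above are meant to return when filtering on the year 2014, the sample graph can be modeled in plain Python. This is only a sketch: it does not use any Gremlin API, and the `NAMES` and `EDGES` structures and the two helper functions are hypothetical names introduced here for illustration.

```python
# Plain-Python model of the sample graph (not Gremlin); all names are illustrative.
NAMES = {1: "Germany", 2: "France", 3: "United Kingdom",
         4: "John", 5: "Harry", 6: "Anna"}

# (label, out_vertex_id, in_vertex_id, date_of_start), copied from the script.
EDGES = [
    ("LIVING_IN", 4, 1, 2014),
    ("LIVING_IN", 5, 3, 2013),
    ("LIVING_IN", 6, 1, 2014),
    ("LIVING_IN", 6, 3, 2014),
    ("WORKING_IN", 4, 2, 2014),
    ("FRIENDS_WITH", 4, 5, 2011),
    ("FRIENDS_WITH", 6, 4, 2012),
    ("FRIENDS_WITH", 6, 5, 2014),
]

def sources_of(target_name, year):
    """Model of has(name).inE().has('date_of_start', year).outV().values('name')."""
    return [NAMES[out_v] for _, out_v, in_v, start in EDGES
            if NAMES[in_v] == target_name and start == year]

def targets_of(source_name, year):
    """Model of has(name).outE().has('date_of_start', year).inV().values('name')."""
    return [NAMES[in_v] for _, out_v, in_v, start in EDGES
            if NAMES[out_v] == source_name and start == year]

print(sources_of("United Kingdom", 2014))  # ['Anna']
print(targets_of("John", 2014))            # ['Germany', 'France']
```

The second call returns both a LIVING_IN and a WORKING_IN target because neither traversal restricts the edge label.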

// Specify the edge type and filter

g.V()\
 .as('n')\
 .hasLabel('Person')\
 .bothE('FRIENDS_WITH')\
 .dedup().by(__.path())\
 .as('k')\
 .has('date_of_start', gt(2010))\
 .otherV()\
 .as('f')\
 .select('n', 'f', 'k')\
 .project('n.name', 'f.name', 'since')\
 .by(__.select('n').choose(__.values('name'), __.values('name'), __.constant('  cypher.null')))\
 .by(__.select('f').choose(neq('  cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant('  cypher.null'))))\
 .by(__.select('k').choose(neq('  cypher.null'), __.choose(__.values('date_of_start'), __.values('date_of_start'), __.constant('  cypher.null'))))

--

:> EXPLAIN MATCH (n:Person)-[k:FRIENDS_WITH]-(f) WHERE k.date_of_start > 2010 RETURN n.name, f.name, k.date_of_start AS since


// Specify a common subject (n) and compare the edge properties

g.V()\
 .as('n')\
 .hasLabel('Person')\
 .outE('FRIENDS_WITH')\
 .as('k1')\
 .inV()\
 .select('n')\
 .outE('FRIENDS_WITH')\
 .as('k2')\
 .inV()\
 .as('f2')\
 .where(__.select('k1').values('date_of_start').as('  GENERATED4').select('k2').values('date_of_start').where(gt('  GENERATED4')))\
 .select('n', 'f2', 'k2')\
 .project('n.name', 'f2.name', 'since')\
 .by(__.select('n').choose(__.values('name'), __.values('name'), __.constant('  cypher.null')))\
 .by(__.select('f2').choose(neq('  cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant('  cypher.null'))))\
 .by(__.select('k2').choose(neq('  cypher.null'), __.choose(__.values('date_of_start'), __.values('date_of_start'), __.constant('  cypher.null'))))
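
The comparison between two edges that leave the same subject can be modeled in plain Python as a nested loop over the FRIENDS_WITH edges. This is a sketch under assumptions: the `FRIENDS_WITH` list and `later_friendships` are hypothetical names, the data is copied from the script above, and no Gremlin API is involved.

```python
# Plain-Python model (not Gremlin); names are illustrative.
NAMES = {4: "John", 5: "Harry", 6: "Anna"}

# (out_vertex_id, in_vertex_id, date_of_start) for every FRIENDS_WITH edge.
FRIENDS_WITH = [(4, 5, 2011), (6, 4, 2012), (6, 5, 2014)]

def later_friendships(edges):
    """For each subject n, keep pairs (k1, k2) of outgoing edges where
    k2.date_of_start > k1.date_of_start and report (n.name, f2.name, since)."""
    rows = []
    for n, _, start1 in edges:                # k1 leaves n
        for n_again, f2, start2 in edges:     # k2 must leave the same n
            if n_again == n and start2 > start1:
                rows.append((NAMES[n], NAMES[f2], start2))
    return rows

print(later_friendships(FRIENDS_WITH))  # [('Anna', 'Harry', 2014)]
```

Only Anna has two outgoing friendships, so she is the only subject for which one edge can be later than another.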

// Specify two disconnected subgraphs and join them on the edge properties

g.V()\
 .as('n1')\
 .hasLabel('Person')\
 .outE('FRIENDS_WITH')\
 .as('k1')\
 .inV()\
 .as('n3')\
 .hasLabel('Person')\
 .V()\
 .as('n2')\
 .hasLabel('Person')\
 .outE('FRIENDS_WITH')\
 .as('k2')\
 .inV()\
 .as('n4')\
 .hasLabel('Person')\
 .where(__.and(__.select('k1').values('date_of_start').as('  GENERATED3').select('k2').values('date_of_start')\
    .where(eq('  GENERATED3')), \
    __.select('n1').where(neq('n2'))))\
 .select('n1', 'n3', 'n2', 'n4', 'k2')\
 .project('n1.name', 'n3.name', 'n2.name', 'n4.name', 'since')\
 .by(__.select('n1').choose(__.values('name'), __.values('name'), __.constant('  cypher.null')))\
 .by(__.select('n3').choose(neq('  cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant('  cypher.null'))))\
 .by(__.select('n2').choose(neq('  cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant('  cypher.null'))))\
 .by(__.select('n4').choose(neq('  cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant('  cypher.null'))))\
 .by(__.select('k2').choose(neq('  cypher.null'), __.choose(__.values('date_of_start'), __.values('date_of_start'), __.constant('  cypher.null'))))

--

:> EXPLAIN MATCH (n1:Person)-[k1:FRIENDS_WITH]->(n3:Person) MATCH (n2:Person)-[k2:FRIENDS_WITH]->(n4:Person) WHERE k2.date_of_start = k1.date_of_start and n1 <> n2 RETURN n1.name, n3.name, n2.name, n4.name, k2.date_of_start AS since

// Specify two disconnected subgraphs and join them on the edge properties

g.V()\
 .as('n1')\
 .hasLabel('Person')\
 .outE('LIVING_IN')\
 .as('l1')\
 .inV()\
 .as('c1')\
 .hasLabel('Country')\
 .V()\
 .as('n2')\
 .hasLabel('Person')\
 .outE('LIVING_IN')\
 .as('l2')\
 .inV()\
 .as('c2')\
 .hasLabel('Country')\
 .where(__.and(__.select('l2').values('date_of_start').as('  GENERATED3').select('l1').values('date_of_start')\
   .where(eq('  GENERATED3')), \
 __.select('n1').where(neq('n2'))))\
 .select('n1', 'c1', 'n2', 'c2', 'l2')\
 .project('n1.name', 'c1.name', 'n2.name', 'c2.name', 'since')\
 .by(__.select('n1').choose(__.values('name'), __.values('name'), __.constant('  cypher.null')))\
 .by(__.select('c1').choose(neq('  cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant('  cypher.null'))))\
 .by(__.select('n2').choose(neq('  cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant('  cypher.null'))))\
 .by(__.select('c2').choose(neq('  cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant('  cypher.null'))))\
 .by(__.select('l2').choose(neq('  cypher.null'), __.choose(__.values('date_of_start'), __.values('date_of_start'), __.constant('  cypher.null'))))

--

:> EXPLAIN MATCH (n1:Person)-[l1:LIVING_IN]->(c1:Country) MATCH (n2:Person)-[l2:LIVING_IN]->(c2:Country) WHERE l1.date_of_start = l2.date_of_start  and n1 <> n2 RETURN n1.name, c1.name, n2.name, c2.name, l2.date_of_start AS since
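
The join of two independent LIVING_IN patterns on the edge property amounts to a filtered Cartesian product. The sketch below models that in plain Python; `LIVING_IN` and `join_on_start_year` are hypothetical names, the data is copied from the script above, and nothing here calls Gremlin or Cypher.

```python
from itertools import product

# Plain-Python model (not Gremlin); names are illustrative.
NAMES = {1: "Germany", 3: "United Kingdom", 4: "John", 5: "Harry", 6: "Anna"}

# (person_id, country_id, date_of_start) for every LIVING_IN edge.
LIVING_IN = [(4, 1, 2014), (5, 3, 2013), (6, 1, 2014), (6, 3, 2014)]

def join_on_start_year(edges):
    """Pair every edge l1 with every edge l2 and keep the pairs where
    l1.date_of_start == l2.date_of_start and the two persons differ."""
    rows = []
    for (n1, c1, s1), (n2, c2, s2) in product(edges, repeat=2):
        if s1 == s2 and n1 != n2:
            rows.append((NAMES[n1], NAMES[c1], NAMES[n2], NAMES[c2], s2))
    return rows

for row in join_on_start_year(LIVING_IN):
    print(row)
```

On the sample data this yields four rows, all for 2014, pairing John with Anna in both orders. Harry's 2013 edge matches no other edge and drops out.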


// Specify a variable-length path and filter on the edge properties

-- The plugin does not translate Cypher that uses ALL

:> EXPLAIN WITH 2010 AS since MATCH rels= (p1:Person) - [:FRIENDS_WITH*1..2]->(p2:Person) WHERE ALL(k1 in relationships(rels) WHERE k1.date_of_start > since) RETURN rels;

:> EXPLAIN WITH 2010 AS since MATCH rels= (p1:Person) - [:FRIENDS_WITH*1..2]->(p2:Person) RETURN rels;

g.V()\
 .as('p1')\
 .hasLabel('Person')\
 .emit(__.loops()\
 .is(gte(1)))\
 .repeat(__.outE('FRIENDS_WITH')\
 .as('  UNNAMED44')\
 .has('date_of_start', gt(2010))\
 .aggregate('  cypher.path.edge.rels')\
 .inV())\
 .times(2)\
 .hasLabel('Person')\
 .path()\
 .from('p1')\
 .as('rels')\
 .optional(__.select(all, '  UNNAMED44')\
 .as('  UNNAMED44'))\
 .select('rels')\
 .project('rels')\
 .by(__.is(neq('  cypher.unused'))\
 .project('  cypher.relationship', '  cypher.element')\
 .by(__.select('  cypher.path.edge.rels')\
 .unfold().project('  cypher.id', '  cypher.inv', '  cypher.outv')\
 .by(__.id())\
 .by(__.inV().id())\
 .by(__.outV().id())\
 .fold())\
 .by(__.unfold().is(neq('  cypher.start'))\
 .valueMap().with('~tinkerpop.valueMap.tokens')\
 .fold()))


g.V()\
 .as('p1')\
 .hasLabel('Person')\
 .emit(__.loops()\
 .is(gte(1)))\
 .repeat(__.outE('FRIENDS_WITH')\
 .as('  UNNAMED44')\
 .has('date_of_start', lte(2013))\
 .aggregate('  cypher.path.edge.rels')\
 .inV())\
 .hasLabel('Person')\
 .path()\
 .from('p1')\
 .as('rels')\
 .optional(__.select(all, '  UNNAMED44')\
 .as('  UNNAMED44'))\
 .select('rels')\
 .project('rels')\
 .by(__.is(neq('  cypher.unused'))\
 .project('  cypher.relationship', '  cypher.element')\
 .by(__.select('  cypher.path.edge.rels')\
 .unfold().project('  cypher.id', '  cypher.inv', '  cypher.outv')\
 .by(__.id())\
 .by(__.inV().id())\
 .by(__.outV().id())\
 .fold())\
 .by(__.unfold().is(neq('  cypher.start'))\
 .valueMap().with('~tinkerpop.valueMap.tokens')\
 .fold()))
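
The intent of the ALL predicate over a variable-length path (every FRIENDS_WITH edge on a path of one or two hops must satisfy the date filter) can be modeled in plain Python. This is a sketch under assumptions: `FRIENDS_WITH` and `filtered_paths` are hypothetical names, the data is copied from the script above, and the code does not use Gremlin or Cypher.

```python
# Plain-Python model (not Gremlin); names are illustrative.
NAMES = {4: "John", 5: "Harry", 6: "Anna"}

# (out_vertex_id, in_vertex_id, date_of_start) for every FRIENDS_WITH edge.
FRIENDS_WITH = [(4, 5, 2011), (6, 4, 2012), (6, 5, 2014)]

def filtered_paths(edges, since=2010, max_hops=2):
    """Return directed paths of 1..max_hops edges in which every edge has
    date_of_start > since. Dropping failing edges first is equivalent to
    checking ALL edges of each path afterwards."""
    kept = [e for e in edges if e[2] > since]
    results = []
    frontier = [[e] for e in kept]            # all one-hop paths
    for _ in range(max_hops):
        results.extend(frontier)
        # Extend each path by one edge that starts where the path ends,
        # never reusing an edge within the same path.
        frontier = [p + [e] for p in frontier for e in kept
                    if e[0] == p[-1][1] and e not in p]
    return [[NAMES[p[0][0]]] + [NAMES[e[1]] for e in p] for p in results]

print(filtered_paths(FRIENDS_WITH, since=2010))
print(filtered_paths(FRIENDS_WITH, since=2011))
```

With `since=2010` every edge qualifies, so the two-hop path Anna, John, Harry is returned along with the three single edges. With `since=2011` the John to Harry edge is dropped and that two-hop path disappears.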

YouTube tutorials on Gremlin

https://youtu.be/DiN_aeW0NdY
https://youtu.be/LUSHQXVkDIc
https://youtu.be/etLaa_rPhms

Documentation

https://tinkerpop.apache.org/gremlin.html

Imperative & Declarative Traversals

A Gremlin traversal can be written in either an imperative (procedural) manner, a declarative (descriptive) manner, or in a hybrid manner containing both imperative and declarative aspects. An imperative Gremlin traversal tells the traversers how to proceed at each step in the traversal ...

A declarative Gremlin traversal does not tell the traversers the order in which to execute their walk, but instead, allows each traverser to select a pattern to execute from a collection of (potentially nested) patterns. The declarative traversal on the left yields the same result as the imperative traversal above. However, the declarative traversal has the added benefit that it leverages not only a compile-time query planner (like imperative traversals), but also a runtime query planner that chooses which traversal pattern to execute next based on the historic statistics of each pattern -- favoring those patterns which tend to reduce/filter the most data.

The user can write their traversals in any way they choose. However, ultimately when their traversal is compiled, and depending on the underlying execution engine (i.e. an OLTP graph database or an OLAP graph processor), the user's traversal is rewritten by a set of traversal strategies which do their best to determine the most optimal execution plan based on an understanding of graph data access costs as well as the underlying data system's unique capabilities (e.g. fetch the Gremlin vertex from the graph database's "name"-index). Gremlin has been designed to give users flexibility in how they express their queries and graph system providers flexibility in how to efficiently evaluate traversals against their TinkerPop-enabled data system.


https://tinkerpop.apache.org/docs/current/reference/#tinkergraph-gremlin

TinkerGraph is a single machine, in-memory (with optional persistence), non-transactional graph engine that provides both OLTP and OLAP functionality. It is deployed with TinkerPop and serves as the reference implementation for other providers to study in order to understand the semantics of the various methods of the TinkerPop API. Its status as a reference implementation does not however imply that it is not suitable for production. TinkerGraph has many practical use cases in production applications and their development. Some examples of TinkerGraph use cases include:

TinkerPop requires every Element to have a single, immutable string label (i.e. a Vertex, Edge, and VertexProperty). In Neo4j, a Node (vertex) can have an arbitrary number of labels while a Relationship (edge) can have one and only one. Furthermore, in Neo4j, Node labels are mutable while Relationship labels are not. In order to handle this mismatch, three Neo4jVertex specific methods exist in Neo4j-Gremlin.

The difference between these languages in this case is that in Cypher we can use a Kleene star operator to find paths between any two given nodes in a graph database. In Gremlin, however, we have to define all such paths explicitly. But we can use the repeat operator in Gremlin to find multiple occurrences of such explicit paths in a graph database. Iterating over explicit structures, however, is not possible in Cypher.

https://tinkerpop.apache.org/docs/current/reference/#vertex-properties

Vertex Properties

TinkerPop introduces the concept of a VertexProperty<V>. All the properties of a Vertex are a VertexProperty. A VertexProperty implements Property and as such, it has a key/value pair. However, VertexProperty also implements Element and thus, can have a collection of key/value pairs. Moreover, while an Edge can only have one property of key "name" (for example), a Vertex can have multiple "name" properties. With the inclusion of vertex properties, two features are introduced which ultimately advance the graph modeler's toolkit:

    Multiple properties (multi-properties): a vertex property key can have multiple values. For example, a vertex can have multiple "name" properties.

    Properties on properties (meta-properties): a vertex property can have properties (i.e. a vertex property can have key/value data associated with it).

Possible use cases for meta-properties:

    Permissions: Vertex properties can have key/value ACL-type permission information associated with them.

    Auditing: When a vertex property is manipulated, it can have key/value information attached to it saying who the creator, deletor, etc. are.

    Provenance: The "name" of a vertex can be declared by multiple users. For example, there may be multiple spellings of a name from different sources.


    A vertex can have zero or more properties with the same key associated with it.

    If a property is added with a cardinality of Cardinality.list, an additional property with the provided key will be added.

    A vertex property can have standard key/value properties attached to it.

    Vertex property removal is identical to property removal.

    Gets the meta-properties of each vertex property.

    A vertex property can have any number of key/value properties attached to it.

    property(...) will remove all existing keyed properties before adding the new single property (see VertexProperty.Cardinality).

    If only the value of a property is needed, then values() can be used.
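
The notes above on multi-properties, meta-properties and cardinality can be made concrete with a small plain-Python model. This is only a sketch of the data model, not the TinkerPop API: the `Vertex` and `VertexProperty` classes, the `set_property` method, the string cardinality values and the `source` meta-property key are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Plain-Python model of the vertex property concepts (not the TinkerPop API).

@dataclass
class VertexProperty:
    key: str
    value: object
    meta: dict = field(default_factory=dict)  # meta-properties: properties on the property

@dataclass
class Vertex:
    label: str
    props: list = field(default_factory=list)

    def set_property(self, key, value, cardinality="single", **meta):
        # "single" removes every existing property with this key first;
        # "list" appends an additional property with the same key.
        if cardinality == "single":
            self.props = [p for p in self.props if p.key != key]
        self.props.append(VertexProperty(key, value, dict(meta)))
        return self

    def values(self, key):
        # Only the values, without the meta-properties.
        return [p.value for p in self.props if p.key == key]

v = Vertex("Person")
v.set_property("name", "Anna", cardinality="list", source="passport")
v.set_property("name", "Ana", cardinality="list", source="blog")
print(v.values("name"))             # ['Anna', 'Ana']
print([p.meta for p in v.props])    # [{'source': 'passport'}, {'source': 'blog'}]

v.set_property("name", "Anna")      # single cardinality replaces both entries
print(v.values("name"))             # ['Anna']
```

The `source` meta-property mirrors the provenance use case: two spellings of the same name, each tagged with where it came from.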

