
GQL by Alastair Green @ LinkedIn

PostgreSQL, Oracle ... graph query language standards adoption begins

Link -> https://www.linkedin.com/pulse/postgresql-oracle-graph-query-language-standards-adoption-green

April/2020

SQL/PGQ is planned as Part 16 of the SQL standard, and is likely to be adopted as a final ISO/IEC international standard in 2021. It describes a language for read-only graph queries, operating over schema-defined property "graph views", which are declared by mapping SQL tables to graphs using DDL.
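The following minimal Python sketch illustrates the idea of a graph view. It does not use SQL/PGQ syntax. Rows of two hypothetical tables, `person` and `knows`, are mapped to vertices and edges, and the derived view is then queried read-only. The table names, columns and helper functions are illustrative assumptions. In SQL/PGQ the mapping is declared with DDL rather than written as code.

```python
from dataclasses import dataclass, field

# Hypothetical source tables, modelled as lists of rows (column name -> value).
person_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
knows_rows = [{"src": 1, "dst": 2, "since": 1936}]

@dataclass
class Vertex:
    key: int
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: int  # key of the source vertex
    target: int  # key of the target vertex
    label: str
    properties: dict = field(default_factory=dict)

def vertex_table(rows, label, key_col):
    """Each row becomes a vertex; the remaining columns become its properties."""
    return [Vertex(r[key_col], label, {c: v for c, v in r.items() if c != key_col})
            for r in rows]

def edge_table(rows, label, src_col, dst_col):
    """Each row becomes an edge whose endpoint columns reference vertex keys."""
    return [Edge(r[src_col], r[dst_col], label,
                 {c: v for c, v in r.items() if c not in (src_col, dst_col)})
            for r in rows]

# The "graph view" is derived from the tables, not stored separately.
vertices = {v.key: v for v in vertex_table(person_rows, "Person", "id")}
edges = edge_table(knows_rows, "KNOWS", "src", "dst")

def neighbour_names(name, edge_label):
    """Read-only one-hop query, roughly MATCH (a {name: $name})-[:KNOWS]->(b) RETURN b.name."""
    return [vertices[e.target].properties["name"] for e in edges
            if e.label == edge_label and vertices[e.source].properties["name"] == name]

print(neighbour_names("Ada", "KNOWS"))  # ['Alan']
```

The sketch only reads from the view. This mirrors the read-only scope of SQL/PGQ, where updates still go through the underlying tables.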

However, from a wider perspective, the PGQ query language is also seen as a subset of the emerging CRUD graph query language GQL.

Both PGQ and GQL are developed by the same ISO/IEC joint (JTC1) SC32/WG3 committee that has developed the SQL language over the past thirty-plus years.

SQL/PGQ's queries are based on the path pattern matching syntax and semantics of Cypher.  

The shared appetite to leverage prior work on conjunctive regular path queries has also shown up in recent additions for edge patterns in TigerGraph's GSQL language. TigerGraph Inc. (whose Chief Scientist Alin Deutsch is a noted researcher in the database field) are also actively contributing their learnings, including in a recent consensus paper on path syntax co-authored with Oracle and Neo4j experts for the SQL/PGQ query language.

However, it seems that allowing a SQL-like closed schema is the first critical step, and the LDBC Property Graph Schema working group is focussing its efforts on proposing solutions for that problem.

This community working group is considering three main aspects: the model for property values, the topological structure of the graph, and the definition of key and cardinality constraints. All of these investigations are being measured against the yardstick of the extended Entity Relationship Model, to ensure that a proposed schema or graph typing system will work well with prevalent techniques of conceptual data modelling. The fact that an ERM looks a lot like a property graph is a very important advantage of the graph data model.

The data model sub-group has focussed on two related issues: the nature of the data that can be attached as a property value, and the problem of "metaproperties" or annotations which convey information like the provenance or source of a property value. There is a consensus that property values should not be graph elements like nodes or edges (or graphs): the property graph model has become popular because it divides graph topology from the attribution associated with elements. 

Schema: metaproperties

Back to property values: let's assume that we have a nested record structure, with collections. What about meta-properties? How do we annotate a value with some comment or qualification? These are important requirements, particularly for knowledge graphs, as Bei Li from Google has stressed, alongside others, in the LDBC schema discussions. Wikidata qualifiers are a great example of this requirement in practice.

Annotation can be achieved by allowing a property to be attached to a property (in the manner of TinkerPop). Josh Shinavier of Uber, co-author of an important paper on Algebraic Property Graphs (APG), is part of the LDBC schema working group and is also working on TinkerPop 4, so we've been able to get some very interesting insights into the way in which metaproperties were conceived and implemented in that world (where properties are considered first-class graph elements, like edges and nodes).

However, APG's current design does not seem to allow properties to be attached to the members of a collection of properties.

I have proposed an approach to introducing metaproperties into the nested record model that is based on a generalization of XML's idea of "mixed content", and can be seen in the data structures of existing OSS tree-data libraries for, e.g., C++ and C#.

This "knowledge tree" structure differs from the models of JSON or XML, because it allows any node (including an inner node of the tree) to have a value as well as children. A node may have no children and only a value (like a leaf node in JSON), or it may have children and no value (like an inner node in JSON), or it may have both (like a mixed content node in an XML document tree, although mixed content "text children" are limited by data type and cannot be subtrees themselves).
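The three kinds of node described above can be sketched in Python as follows. This is only an illustration under stated assumptions. The `KTNode` class, its `kind()` method and the `_NO_VALUE` sentinel are hypothetical names, and children are assumed to be keyed by name.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

_NO_VALUE = object()  # sentinel: "this node has no value" (distinct from a stored None)

@dataclass
class KTNode:
    """A knowledge-tree node: an optional value plus zero or more named children."""
    value: Any = _NO_VALUE
    children: dict[str, KTNode] = field(default_factory=dict)

    def has_value(self) -> bool:
        return self.value is not _NO_VALUE

    def kind(self) -> str:
        if self.has_value() and self.children:
            return "mixed"  # like XML mixed content, but children may be whole subtrees
        if self.has_value():
            return "leaf"   # value only, like a JSON scalar
        if self.children:
            return "inner"  # children only, like a JSON object
        return "empty"

email = KTNode("alastair@acm.org")
coordinates = KTNode(children={"email": email})
annotated = KTNode("alastair@acm.org", {"since": KTNode(1992)})

print(email.kind(), coordinates.kind(), annotated.kind())  # leaf inner mixed
```

Using a sentinel rather than `None` keeps "no value" distinct from a value that happens to be null.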

If a node has both a value and children, then a child node can be seen as annotation on the value of the parent node. In this world (like in every lockdown family), children most certainly get to comment on their parents.

Another way of looking at this is that every value has an annotation, which is a record. A record is a set of attribute values (name, type, value), and it may be empty. In some business domains annotation records may be empty 99.9% of the time, while in a knowledge base they may be ubiquitous (for example, every fact must have a source); the data model handles both cases. Any field in the annotation record can have a value which itself has an annotation, so we can qualify a source with our confidence in that source, and so on. Note that this model allows the elements of a collection, as well as the collection itself, to be independently annotated.
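Here is a minimal Python sketch of this "every value carries a record" view. Each value is paired with an annotation record that may be empty. A collection and its elements are annotated independently, and an annotation can itself be annotated. The `Annotated` class, the source name and the confidence figure are hypothetical. The "type" part of each attribute is simply left to Python's runtime type of the value.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

@dataclass
class Annotated:
    """A value plus its annotation record: attribute name -> annotated value (may be empty)."""
    value: Any
    annotation: dict[str, Annotated] = field(default_factory=dict)

# The collection and each of its elements carry independent annotation records.
emails = Annotated(
    [
        Annotated("alastair@acm.org", {"since": Annotated(1992)}),
        Annotated("someone@example.org"),  # empty annotation record: the common case
    ],
    # An annotation can itself be annotated: qualify the source with a confidence.
    {"source": Annotated("hypothetical-registry", {"confidence": Annotated(0.9)})},
)

first = emails.value[0]
print(first.value, first.annotation["since"].value)                # alastair@acm.org 1992
print(emails.annotation["source"].annotation["confidence"].value)  # 0.9
print(emails.value[1].annotation)                                  # {}
```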

One of the drivers for the explicit modelling of metaproperties on top of nested records is the desire to avoid changing the meaning of paths which identify nodes or subtrees within a tree.

The simple dotted notation (with index and key subscripts) which would allow us to talk about myNode.name or person.email[3] could easily be extended to handle nested records and collections: person.coordinates["email"].address[3]. In such paths, a path to a leaf node is expected to evaluate simply to that node's value. So we would expect such a path to evaluate to something like "alastair@acm.org".
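The following Python sketch evaluates such dotted paths over nested records (dicts) and collections (lists). A path to a leaf returns the leaf's value. Several details are assumptions for illustration, because the article does not fix them. These are the step grammar (`.name`, `[integer]`, `["key"]`), zero-based indexing, the function names and the placeholder addresses.

```python
import re
from typing import Any

# One path step: an optionally dot-prefixed name, an integer index, or a quoted string key.
_STEP = re.compile(r'\.?([A-Za-z_]\w*)|\[(\d+)\]|\["([^"]*)"\]')

def parse_path(path: str) -> list:
    """Split e.g. 'person.coordinates["email"].address[3]' into a list of steps."""
    steps, pos = [], 0
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"bad path syntax at offset {pos}: {path[pos:]!r}")
        name, index, key = m.groups()
        if index is not None:
            steps.append(int(index))
        else:
            steps.append(name if name is not None else key)
        pos = m.end()
    return steps

def evaluate(root: dict, path: str) -> Any:
    """String steps look up record fields; integer steps index into collections."""
    node: Any = root
    for step in parse_path(path):
        node = node[step]
    return node

env = {"person": {"coordinates": {"email": {"address": [
    "x0@example.org", "x1@example.org", "x2@example.org", "alastair@acm.org",
]}}}}

print(evaluate(env, 'person.coordinates["email"].address[3]'))  # alastair@acm.org
```

In this sketch the bracketed key form and the dotted form are interchangeable, so `person.coordinates.email.address[3]` reaches the same leaf.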

If there are children, then for an inner node path languages would normally return the subtree (levels 1 and deeper). However it is achieved (and there are several syntactic options), we want to allow a value to be returned for a path expression, but also to allow children (annotations) to be returned. This suggests something like person.coordinates.email.address[3].since, allowing a path to "step past" the value itself to return the value of a child annotation, in this case perhaps a date like 1992. But if we wanted to specify the subtree of an inner node, we would need a distinguished syntax, like person.coordinates.email.address[3].since., where the final period indicates "ignore the value, only give me the subtree: the children and their descendants". No conclusions have been drawn in discussions to date on syntactic issues like this.
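The syntax is still open, so the following Python sketch shows only one possible reading of it. A path ordinarily returns a node's value. A further `.since` step moves past that value to an annotation child. A trailing period returns the node's children (its subtree) instead of its value. The `KTNode` class, the `resolve` function and the placeholder data are hypothetical. For brevity, this sketch also treats `None` as "no value".

```python
import re
from dataclasses import dataclass, field
from typing import Any

@dataclass
class KTNode:
    value: Any = None                             # simplification: None stands for "no value"
    children: dict = field(default_factory=dict)  # name (str) or index (int) -> KTNode

_STEP = re.compile(r'\.?([A-Za-z_]\w*)|\[(\d+)\]')

def resolve(root: KTNode, path: str) -> Any:
    """Value at `path`; with a trailing '.', the children of that node (its subtree) instead."""
    want_subtree = path.endswith(".")
    if want_subtree:
        path = path[:-1]
    node, pos = root, 0
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"bad path syntax at offset {pos}: {path[pos:]!r}")
        name, index = m.groups()
        # Each step moves to a child, "stepping past" the current node's own value.
        node = node.children[int(index) if index is not None else name]
        pos = m.end()
    return node.children if want_subtree else node.value

# address[3] has a value and one annotation child, "since".
addr3 = KTNode("alastair@acm.org", {"since": KTNode(1992)})
address = KTNode(children=dict(enumerate(
    [KTNode("x0@example.org"), KTNode("x1@example.org"), KTNode("x2@example.org"), addr3])))
root = KTNode(children={"person": KTNode(children={"coordinates": KTNode(children={
    "email": KTNode(children={"address": address})})})})

p = "person.coordinates.email.address[3]"
print(resolve(root, p))              # alastair@acm.org
print(resolve(root, p + ".since"))   # 1992
print(list(resolve(root, p + ".")))  # ['since']
```

In this data `since` is a leaf, so `p + ".since."` yields an empty subtree.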

First GQL research implementation from Olof Morra at TU Eindhoven!

Link -> https://www.linkedin.com/pulse/first-gql-research-implementation-from-olof-morra-tu-eindhoven-green?trk=pulse-article_more-articles_related-content-card

September/2021

You can find out all about Olof's work on his ANTLR-based parser at his Github project: https://github.com/OlofMorra/GQL-parser. 

 

 
