PostgreSQL, Oracle ... graph query language standards adoption begins
Link -> https://www.linkedin.com/pulse/postgresql-oracle-graph-query-language-standards-adoption-green
April/2020
SQL/PGQ is planned as Part 16 of the SQL standard, and is likely to be adopted as a final ISO/IEC international standard in 2021. It describes a language for read-only graph queries, operating over schema-defined property "graph views", which are declared by mapping SQL tables to graphs using DDL.
However, in a wider perspective the PGQ query language is also seen as a subset of the emergent CRUD graph query language GQL.
Both PGQ and GQL are developed by the same ISO/IEC joint (JTC1) SC32/WG3 committee that has developed the SQL language over the past thirty-plus years.
SQL/PGQ's queries are based on the path pattern matching syntax and semantics of Cypher.
The shared appetite to leverage prior work on conjunctive regular path queries has also shown up in recent additions for edge patterns in TigerGraph's GSQL language. TigerGraph Inc. (whose Chief Scientist Alin Deutsch is a noted researcher in the database field) are also actively contributing their learnings, including in a recent consensus paper on path syntax co-authored with Oracle and Neo4j experts for the SQL/PGQ query language.
However, it seems that allowing a SQL-like closed schema is the first critical step, and the LDBC Property Graph Schema working group is focussing its efforts on proposing solutions for that problem.
This community working group is considering three main aspects: the model for property values, the topological structure of the graph, and the definition of key and cardinality constraints. All of these investigations are being measured against the yardstick of the extended Entity Relationship Model, to ensure that a proposed schema or graph typing system will work well with prevalent techniques of conceptual data modelling. The fact that an ERM looks a lot like a property graph is a very important advantage of the graph data model.
The data model sub-group has focussed on two related issues: the nature of the data that can be attached as a property value, and the problem of "metaproperties" or annotations which convey information like the provenance or source of a property value. There is a consensus that property values should not be graph elements like nodes or edges (or graphs): the property graph model has become popular because it divides graph topology from the attribution associated with elements.
Schema: metaproperties
Back to property values: let's assume that we have a nested record structure, with collections. What about metaproperties? How do we annotate a value with some comment or qualification? These are important requirements, particularly for knowledge graphs, as Bei Li from Google has stressed, alongside others, in the LDBC schema discussions. Wikidata qualifiers are a great example of this requirement in practice.
Annotation can be achieved by allowing a property to be attached to a property (in the manner of TinkerPop). Josh Shinavier at Uber, a co-author of an important paper on Algebraic Property Graphs (APG), is part of the LDBC schema working group and is also working on TinkerPop 4, so we've been able to get some very interesting insights into the way in which metaproperties were conceived and implemented in that world (where properties are considered first-class graph elements, like edges and nodes).
However, APG's current design does not seem to allow properties to be attached to the members of a collection of properties.
I have proposed an approach to introducing metaproperties into the nested record model that is based on a generalization of XML's idea of "mixed content", and can be seen in the data structures of existing OSS tree-data libraries for e.g. C++ and C#.
This "knowledge tree" structure differs from the models of JSON or XML, because it allows any node (including an inner node of the tree) to have a value, as well as children. A node may have a value and no children (like a leaf node in JSON), or it may have children and no value (like an inner node in JSON), or it may have both (like a mixed content node in an XML document tree, although mixed content "text children" are limited by data type and cannot be subtrees themselves).
If a node has both a value and children, then a child node can be seen as annotation on the value of the parent node. In this world (like in every lockdown family), children most certainly get to comment on their parents.
Another way of looking at this is that every value has an annotation, which is a record. A record is a set of attribute values (name, type, value), and it may be empty. So, in some business domains annotation records may be empty 99.9% of the time, and in a knowledge base they may be ubiquitous (for example, every fact must have a source), but the data model allows both cases to be handled. Any field in the annotation record can have a value which also has an annotation, so we can qualify a source with our confidence in the source, etc. Note that this model allows the elements of a collection, as well as a collection itself, to be independently annotated.
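To make the shape of this structure concrete, here is a minimal Python sketch. It is not taken from any of the libraries or proposals mentioned above: the class name KNode, the methods annotate and kind, the use of None to mean "no value", the use of integer keys for collection members, and the sample source and confidence values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Union

Key = Union[str, int]  # assumption: str keys name fields, int keys index collection members


@dataclass
class KNode:
    """Hypothetical 'knowledge tree' node: an optional value plus keyed children."""
    value: Optional[Any] = None  # simplification: None stands for "no value"
    children: Dict[Key, "KNode"] = field(default_factory=dict)

    def annotate(self, key: Key, value: Any = None) -> "KNode":
        """Attach a child node; on a node that has a value, the child acts as an annotation."""
        child = KNode(value)
        self.children[key] = child
        return child

    def kind(self) -> str:
        """Classify the node by the three cases described in the text."""
        has_value = self.value is not None
        has_children = bool(self.children)
        if has_value and has_children:
            return "mixed"   # value plus annotations
        if has_value:
            return "leaf"    # value only, like a JSON leaf
        if has_children:
            return "inner"   # children only, like a JSON inner node
        return "empty"


# Illustrative data only.
person = KNode()
emails = person.annotate("email")                  # a collection: children, no value
first = emails.annotate(0, "alastair@acm.org")     # a collection member with a value
source = first.annotate("source", "example source")  # annotation on that member
source.annotate("confidence", 0.9)                 # annotation on the annotation
```

In this sketch, first is a "mixed" node (a value with annotations), and the source annotation is itself annotated with a confidence, mirroring the "qualify a source with our confidence in the source" case.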
One of the drivers for the explicit modelling of metaproperties on top of nested records is the desire to avoid changing the meaning of paths which identify nodes or subtrees within a tree.
The simple dotted notation (with index and key subscripts) which would allow us to talk about myNode.name or person.email[3] could easily be extended to handle nested records and collections: person.coordinates["email"].address[3]. But in such paths, it is expected that the path to a leaf node simply evaluates to that node's value. So we would expect to see such a path evaluate to something like "alastair@acm.org".
If there are children, then path languages would normally return the subtree levels 1 and deeper for an inner node. However it is achieved, and there are syntactic options, we want to allow a value to be returned for a path expression, but to also allow children (annotations) to be returned. This would suggest something like person.coordinates.email.address[3].since, allowing a path to "step past" the value itself, to return the value of a child annotation, in this case perhaps a date like 1992. But if we wanted to specify the subtree of an inner node then we would need a distinguished syntax, like person.coordinates.email.address[3].since., where the final period indicates "ignore the value, only give me the subtree, the children and their descendants". No conclusions have been drawn in discussions to date on syntactic issues like this.
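One way to read the two path forms above is sketched below in Python. Since no syntax has been agreed, this is only an illustration of the idea under stated assumptions: the Node class, the parse, to_plain and evaluate functions, the restriction to dotted names and numeric subscripts, and the sample tree are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple, Union

Key = Union[str, int]


@dataclass
class Node:
    """Hypothetical tree node that may carry a value, children, or both."""
    value: Optional[Any] = None
    children: Dict[Key, "Node"] = field(default_factory=dict)


# One path step: an optionally dot-prefixed name, or a numeric subscript like [3].
_STEP = re.compile(r"\.?([A-Za-z_]\w*)|\[(\d+)\]")


def parse(path: str) -> Tuple[List[Key], bool]:
    """Split a path into keys; a trailing period requests the subtree, not the value."""
    subtree = path.endswith(".")
    body = path[:-1] if subtree else path
    keys: List[Key] = []
    pos = 0
    while pos < len(body):
        match = _STEP.match(body, pos)
        if match is None:
            raise ValueError(f"cannot parse path at offset {pos}: {path!r}")
        name, index = match.groups()
        keys.append(name if name is not None else int(index))
        pos = match.end()
    return keys, subtree


def to_plain(node: Node) -> Dict[Key, Any]:
    """Render only the children and their descendants, ignoring the node's own value."""
    return {key: {"value": child.value, "children": to_plain(child)}
            for key, child in node.children.items()}


def evaluate(root: Node, path: str) -> Any:
    """Follow the path from root; return the value, or the subtree if the path ends in '.'."""
    keys, subtree = parse(path)
    node = root
    for key in keys:
        node = node.children[key]
    return to_plain(node) if subtree else node.value


# Illustrative tree matching the example path in the text.
address3 = Node("alastair@acm.org", {"since": Node(1992)})
address = Node(children={3: address3})
email = Node(children={"address": address})
coordinates = Node(children={"email": email})
root = Node(children={"person": Node(children={"coordinates": coordinates})})

print(evaluate(root, "person.coordinates.email.address[3]"))        # alastair@acm.org
print(evaluate(root, "person.coordinates.email.address[3].since"))  # 1992
print(evaluate(root, "person.coordinates.email.address[3]."))
# {'since': {'value': 1992, 'children': {}}}
```

The design choice here is that a path always resolves to a node first, and only the final step decides whether the caller receives the node's value or its subtree, so adding annotations to a node never changes what an existing path returns.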
First GQL research implementation from Olof Morra at TU Eindhoven!
Link -> https://www.linkedin.com/pulse/first-gql-research-implementation-from-olof-morra-tu-eindhoven-green?trk=pulse-article_more-articles_related-content-card
September/2021
You can find out all about Olof's work on his ANTLR-based parser at his GitHub project: https://github.com/OlofMorra/GQL-parser.