ADBIS 2023 - No Intelligence Without Knowledge

Keynote on YouTube -> https://youtu.be/DZ6NlcW4YV8?si=4Z5zDA1Vx_D10GKz

No Intelligence Without Knowledge
Katja Hose
TU Wien, Austria

Abstract. Knowledge graphs and graph data in general are becoming more and more essential components of intelligent systems. This does not only include native graph data, such as social networks or Linked Data on the Web. The flexibility of the graph model and its ability to store data relationships explicitly enables the integration and exploitation of data from very diverse sources. However, to truly exploit their potential, it becomes crucial to provide intelligent systems with verifiable knowledge, reliable facts, patterns, and a deeper understanding of the underlying domains. This talk will therefore chart a number of challenges for exploiting graphs to manage and bring meaning to large amounts of heterogeneous data and discuss opportunities with, without, and for artificial intelligence emerging from research situated at the confluence of data management, knowledge engineering, and machine learning.

Knowledge Engineering in the Era of Artificial Intelligence
Katja Hose
TU Wien, Vienna, Austria
katja.hose@tuwien.ac.at

Abstract. Knowledge engineering with respect to knowledge graphs and graph data in general is becoming a more and more essential component of intelligent systems. Such systems benefit from the wealth of structured knowledge, which does not only include native graph data, such as social networks or Linked Data on the Web, but also general knowledge describing particular topics of interest. Furthermore, the flexibility of the graph model and its ability to store data relationships explicitly enables the integration and exploitation of data from very diverse sources. Hence, to truly exploit their potential, it becomes crucial to provide intelligent systems with verifiable knowledge, reliable facts, patterns, and a deeper understanding of the underlying domains. This paper will therefore chart a number of current challenges in knowledge engineering and discuss opportunities.

1 Introduction

Most recently, large language models, and in particular ChatGPT, have gained a lot of attention. Obviously, it is very appealing to simply formulate questions in natural language and receive elaborate and detailed replies that explain an extremely broad range of complex topics. While this system seems to be intelligent, it suffers from a similar problem as other large language models and machine learning approaches in general: the answer it returns is the most probable answer; it cannot be certain about its correctness. In the context of ChatGPT the latter is commonly referred to as hallucinations [11], i.e., the answer does not necessarily reflect reality but can be “made up”.

[LLMs guess plausible answers]

2 Modeling and Storing Knowledge

There are also interoperability issues between the different graph models, query languages, and standards that hamper the efficient use of graph data.

[But there are also questions about the expressiveness of the model]

When all data is converted directly into an integrated knowledge graph, it can be queried in a single system – not only with standard queries. There are some works [23,51] on setting up semantic data warehouses, including spatio-temporal extensions.

[Location and temporal context is important in KGs because knowledge changes]
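
As a minimal sketch of this "queried in a single system" idea (using rdflib; the namespace, sensor, and facts are invented for illustration), the snippet below loads data arriving in two different serializations into one graph and answers a query that spans both sources:

```python
# A minimal sketch, assuming rdflib; namespace and data are invented.
from rdflib import Graph

TURTLE_SOURCE = """
@prefix ex: <http://example.org/> .
ex:sensor1 ex:locatedIn ex:Vienna .
"""

JSONLD_SOURCE = """
{
  "@context": {"ex": "http://example.org/"},
  "@id": "ex:Vienna",
  "ex:partOf": {"@id": "ex:Austria"}
}
"""

g = Graph()
g.parse(data=TURTLE_SOURCE, format="turtle")   # source 1: Turtle
g.parse(data=JSONLD_SOURCE, format="json-ld")  # source 2: JSON-LD

# A single SPARQL query joins facts that originated in different sources.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?sensor ?country WHERE {
  ?sensor ex:locatedIn ?city .
  ?city   ex:partOf    ?country .
}
"""
for row in g.query(QUERY):
    # prints http://example.org/sensor1 http://example.org/Austria
    print(row.sensor, row.country)
```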

An interesting observation here is that publishing data and making it available in this way is very easy, as the publishers do not need to conform to a common integrated schema. However, this comes at the expense of query formulation and optimization, which then is considerably more complex. To formulate a query, users themselves have to know how the information in the different sources is connected – whereas this would typically be done when defining a common schema or table in a traditional relational database scenario.

[Schema on read ... handling the inconsistencies and incompleteness]
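
One way to picture this burden is a federated SPARQL query, where it is the user, not a global schema, who spells out how the sources connect via shared variables. A sketch using SPARQLWrapper follows; the endpoint URLs and vocabulary are hypothetical placeholders, not real services:

```python
# Sketch of a federated SPARQL query: the join variable ?drug encodes
# the user's own knowledge of how two independently published sources
# connect. Endpoints and vocabulary are hypothetical placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?drug ?trial WHERE {
  ?drug ex:treats ex:Diabetes .            # answered by the local source
  SERVICE <http://other.example.org/sparql> {
    ?trial ex:evaluates ?drug .            # shipped to the remote source
  }
}
"""

endpoint = SPARQLWrapper("http://local.example.org/sparql")
endpoint.setQuery(QUERY)
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for b in results["results"]["bindings"]:
    print(b["drug"]["value"], b["trial"]["value"])
```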

3 Querying Knowledge

The way in which knowledge is queried very much depends on the chosen data model and the way the data is physically stored.

However, many users are not familiar with the details, content, and schema of a knowledge graph and therefore have difficulties formulating structured queries. 

[The KG does not always have a known schema, or a single schema]

To help such users, the literature has proposed exploratory query techniques and the query-by-example paradigm [44,45]. In this case, users do not formulate structured queries directly but provide the system with examples of potential answers – the system then tries to reverse engineer a query from the desired output, executes it, and presents the results to the user who can then iteratively refine the query until the information need is met. This is even possible for complex setups incl. analytical queries over statistical knowledge graphs [43]. 
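
A toy version of this reverse-engineering step might look as follows (assuming rdflib, invented data, and the simplification that the example entities only share IRI-valued properties): intersect the outgoing (predicate, object) pairs of the examples and turn the intersection into a conjunctive query:

```python
# Toy query-by-example over an rdflib graph; data and names are invented.
from rdflib import Graph, URIRef

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:Vienna  ex:type ex:City ; ex:country ex:Austria .
ex:Graz    ex:type ex:City ; ex:country ex:Austria .
ex:Linz    ex:type ex:City ; ex:country ex:Austria .
ex:Berlin  ex:type ex:City ; ex:country ex:Germany .
""", format="turtle")

examples = [URIRef("http://example.org/Vienna"),
            URIRef("http://example.org/Graz")]

# Intersect the outgoing (predicate, object) pairs of all examples.
shared = set(g.predicate_objects(examples[0]))
for e in examples[1:]:
    shared &= set(g.predicate_objects(e))

# Reverse-engineer a conjunctive query from the shared pairs
# (assumes IRI-valued objects for simplicity).
patterns = " ".join(f"?x <{p}> <{o}> ." for p, o in shared)
query = f"SELECT ?x WHERE {{ {patterns} }}"

for row in g.query(query):
    print(row.x)   # Vienna, Graz, and also Linz -- but not Berlin
```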

Exploratory techniques for knowledge graphs cover a broad range of methods that include data profiling [1] as well as skyline queries [39].

[39] Keles, I., Hose, K.: Skyline queries over knowledge graphs. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11778, pp. 293–310. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30793-6

[Would a skyline query be the same as a star query?]
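
As far as the two terms go, they differ: a star query is a graph pattern in which several triple patterns share one subject variable, whereas a skyline query returns the entities that are not dominated by any other entity with respect to several numeric criteria (a Pareto front). A minimal sketch of the dominance test, on invented hotel data where lower is better in both dimensions:

```python
# Minimal skyline (Pareto-front) computation; (price, distance) invented.
hotels = {
    "A": (50, 3.0),
    "B": (70, 1.0),
    "C": (60, 2.5),
    "D": (80, 3.5),   # dominated by A, B, and C
}

def dominates(p, q):
    """True if p is <= q in all dimensions and < in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

skyline = [name for name, vec in hotels.items()
           if not any(dominates(other, vec)
                      for o, other in hotels.items() if o != name)]
print(skyline)   # ['A', 'B', 'C']
```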

Assuming that the user was able to formulate a structured query that expresses the information need ...

The OneGraph vision [41], for instance, sketches a scenario where the data model no longer determines the query languages and would allow formulating Cypher queries over an RDF store.

[Amazon Neptune's 1G model -> https://versant-pesquisadedoutorado.blogspot.com/2022/01/graph-yes-which-one-help-leitura-de.html]

4 Knowledge Quality and Metadata

Nevertheless, while OWL and RDFS have been developed for capturing the meaning of data by defining proper classes, hierarchies, and constraints, SHACL has been proposed more recently as a standard to define constraints on the structure of knowledge graphs – without the need to define a proper full-fledged ontology and capture the meaning of the data. SHACL allows defining graph patterns, referred to as shapes, along with constraints that subgraphs matching the patterns/shapes should fulfill. While SHACL is becoming more and more adopted by the community, it still remains a challenge to avoid having to define shapes manually [56] and instead be offered semi-automatic solutions for creating them given a knowledge graph as input.

[Is SHACL used at the time of inserting/updating information or when verifying/validating the content of the KG? How could SHACL be used at query time? Trigger reasoning to complete the answers?]
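
A minimal validation sketch with pyshacl follows (shapes and data invented); note that pyshacl can optionally run RDFS or OWL-RL inference before validating (inference="rdfs"), which at least partially connects validation with reasoning:

```python
# Minimal SHACL validation sketch with pyshacl; shapes and data invented.
from rdflib import Graph
from pyshacl import validate

DATA = """
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:age "thirty" .
"""

SHAPES = """
@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:integer ;   # "thirty" violates this constraint
        sh:maxCount 1 ;
    ] .
"""

data_graph = Graph().parse(data=DATA, format="turtle")
shapes_graph = Graph().parse(data=SHAPES, format="turtle")

# inference="rdfs" runs RDFS reasoning on the data before validation.
conforms, report_graph, report_text = validate(
    data_graph, shacl_graph=shapes_graph, inference="rdfs")
print(conforms)     # False
print(report_text)  # human-readable validation report
```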

While mining shapes from large knowledge graphs meets scalability issues, it is also important to mine meaningful shapes [57] and avoid spurious ones, i.e., those that do not occur frequently or are fulfilled by only a small proportion of matching subgraphs. Once determined, such shapes can not only be used to create validation reports but they can also be used in a more interactive fashion in a similar way as mined association rules [20], e.g., to help experts find outliers and erroneous information so that the data can be corrected and the quality can be improved [58].

[Rule and pattern mining in KG profiling -> https://versant-pesquisadedoutorado.blogspot.com/2023/02/rule-mining-with-amie-trabalho.html]
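
The support idea behind "meaningful" shapes can be illustrated with a toy miner (data, class, and threshold invented): count how many instances of a class use each predicate and keep only the predicates above a support threshold as candidate property shapes:

```python
# Toy shape miner: the support threshold separates meaningful candidate
# shapes from spurious ones. Data, class, and threshold are invented.
from collections import Counter
from rdflib import Graph, URIRef
from rdflib.namespace import RDF

g = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:a a ex:Person ; ex:name "A" ; ex:email "a@x" .
ex:b a ex:Person ; ex:name "B" ; ex:email "b@x" .
ex:c a ex:Person ; ex:name "C" ; ex:shoeSize "41" .
""", format="turtle")

person = URIRef("http://example.org/Person")
instances = list(g.subjects(RDF.type, person))

# Count, per predicate, how many Person instances use it.
counts = Counter(p for s in instances for p in g.predicates(subject=s)
                 if p != RDF.type)

MIN_SUPPORT = 0.6  # keep predicates used by >= 60% of instances
for pred, n in counts.items():
    support = n / len(instances)
    if support >= MIN_SUPPORT:
        print(f"candidate property shape: sh:path <{pred}> (support {support:.0%})")
    # ex:shoeSize (support 33%) falls below the threshold and is discarded
```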

Another way of improving quality and trust in knowledge is to provide metadata. While metadata in property graphs can be expressed by adding attributes to nodes and edges, this is not straightforward for knowledge graphs. The latter require special constructs, such as reification, singleton properties [52], named graphs [13], or RDF-star. 

While reification leads to a large increase in the number of triples (because subject, predicate, and object of the original triple are separated into their own triples), singleton properties (instantiating a unique subproperty for each triple with metadata) and named graph solutions (in the worst case creating a separate named graph for each single triple) typically also suffer from scalability issues and require verbose query constructs since existing engines are not designed to efficiently support such use cases.

[Difficulties in representing context with reification]

On the other hand, RDF-star proposes nesting triples, i.e., using a complete triple in the subject or object position of another triple. While this is very elegant from a modeling perspective, it poses several challenges for data organization and querying, since nesting has not been a typical requirement so far. Still, many triple stores already support RDF-star, so it can already be used in practice.
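
The blow-up caused by reification is easy to see in code. The sketch below (rdflib; the data and the ex:source metadata property are invented) attaches one piece of metadata to a single triple and counts the resulting triples; the RDF-star equivalent appears as a comment:

```python
# Reification blow-up: 1 triple with metadata becomes 4 reification
# triples plus the metadata triple. Data and ex:source are invented.
from rdflib import Graph, URIRef, BNode, Literal
from rdflib.namespace import RDF

EX = "http://example.org/"
s, p, o = URIRef(EX + "Vienna"), URIRef(EX + "population"), Literal(2000000)

g = Graph()
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))      # 1: the statement node
g.add((stmt, RDF.subject, s))               # 2: reified subject
g.add((stmt, RDF.predicate, p))             # 3: reified predicate
g.add((stmt, RDF.object, o))                # 4: reified object
g.add((stmt, URIRef(EX + "source"), URIRef(EX + "census2023")))  # metadata

print(len(g))  # 5 triples to express what RDF-star states in one line:
# << ex:Vienna ex:population 2000000 >> ex:source ex:census2023 .
```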

Provenance, in the sense of explaining the origin of data, is an important kind of metadata. In this sense it is often desired to capture information about who created the data, how and when it was obtained, how it was processed, etc. In RDF, such workflow provenance [19,24] can for instance be encoded using the PROV-O ontology, which offers several classes with well-defined meaning for this purpose. Another type of provenance, how-provenance [21,28], describes how an answer to a particular query was derived from a given input dataset. This approach allows directly tracing the input tuples/triples/edges that were combined to derive a particular answer to a query – in addition, how-provenance also returns a polynomial describing how these tuples/triples/edges have been combined for a given query answer. In general, all flavors of provenance help explain answers to structured queries and in doing so increase the trust users can have in a system. To the best of our knowledge, however, there is currently no system for knowledge graphs combining workflow provenance with how-provenance.

[Trust would be directly linked to provenance and the ability to explain how the results were generated]
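
How-provenance can be sketched in a few lines in the provenance-semiring style: annotate every input edge with a variable, multiply annotations on joins, and add them on unions, so each answer carries a polynomial recording its derivations (data and query invented):

```python
# Toy how-provenance in the provenance-semiring style; data invented.
from itertools import product

# Input edges, each annotated with a provenance variable.
treats  = [("aspirin", "pain", "t1"), ("ibuprofen", "pain", "t2")]
sold_as = [("aspirin", "BrandA", "s1"), ("aspirin", "BrandB", "s2"),
           ("ibuprofen", "BrandA", "s3")]

# Query: which brands are sold for a drug that treats pain?
answers = {}
for (d1, cond, a1), (d2, brand, a2) in product(treats, sold_as):
    if d1 == d2 and cond == "pain":
        term = f"{a1}*{a2}"                              # join  => product
        answers[brand] = (answers[brand] + " + " + term  # union => sum
                          if brand in answers else term)

for brand, poly in sorted(answers.items()):
    print(brand, "<-", poly)
# BrandA <- t1*s1 + t2*s3   (two derivations)
# BrandB <- t1*s2           (one derivation)
```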

Incompleteness -> https://youtu.be/DZ6NlcW4YV8?si=mpJoAroHpvEBjTOq&t=2377
