
Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users’ Questions - Reading Notes 2

Hogan, A., Dong, X.L., Vrandečić, D., & Weikum, G. (2025). Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions.

3 THE PERSPECTIVE OF INFORMATION-SEEKING USERS

All of these categories may be present in decision-making processes.

Facts: Users seek objective, verifiable information that may be satisfied by a simple answer: a name, number, list, table, etc.

Some facts vary over time; others require exploring several edges (paths).

Explanations: Users seek an explanation as to what something is, what caused it, what properties it has, how it works, etc., based principally on objective criteria.

Exploratory queries involve gaining initial understanding into an unfamiliar topic, or further understanding into a familiar one. In this case, the user may not know precisely what they are looking for, but will rather hope to recognize it when they see it.

Incomplete questions.

Planning: Users seek information to help them take a particular action or make a more informed decision. Such needs may involve a mix of objective and subjective criteria.

Recommendation queries seek general suggestions on a preferred course of action based on subjective and objective criteria .... KGs cannot directly address such questions, and have no inherent recommendation faculties, ....

Not even recommendations based on node and edge similarity?
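The note above can be made concrete. A minimal sketch, assuming a toy KG of illustrative triples: entities are ranked by Jaccard similarity of their neighbor sets, a simple structural stand-in for the recommendation faculty that KGs lack natively. All entities and edges here are hypothetical.

```python
# Hypothetical sketch: recommendation over a toy KG by structural similarity
# (shared neighbors). All entities and edges below are illustrative.

def neighbors(kg, node):
    """Set of nodes adjacent to `node`, ignoring edge direction and label."""
    return ({t for s, _, t in kg if s == node}
            | {s for s, _, t in kg if t == node})

def jaccard(a, b):
    """Overlap of two neighbor sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(kg, liked, candidates):
    """Rank candidate entities by neighbor-set similarity to a liked entity."""
    base = neighbors(kg, liked)
    scored = [(c, jaccard(base, neighbors(kg, c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

kg = [
    ("Inception", "genre", "SciFi"),
    ("Inception", "director", "Nolan"),
    ("Interstellar", "genre", "SciFi"),
    ("Interstellar", "director", "Nolan"),
    ("Amelie", "genre", "Romance"),
]

print(recommend(kg, "Inception", ["Interstellar", "Amelie"]))
```

Embedding-based similarity (e.g. over node vectors) would generalize this beyond exact neighbor overlap, but the point stands: the recommendation logic lives outside the KG itself.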

Spatio-temporal queries seek information in a particular geographical or temporal setting, relating to, for example, events, places, times, etc. ... Many KGs contain rich geographic information that can be queried using geospatial operators, but they again may not be fully up-to-date, and be only indirectly helpful for assessing subjective criteria.

When updated, will the information be correctly contextualized in the CoaKG?

Advice: Users seek general counsel that may range from matters of self-improvement to the ethical or philosophical. Such questions are often more open-ended than those covered in the other categories, and may again blur the lines of the subjective and the objective

Not a strong suit of KGs (similar to recommendations).

Questions may involve deixis (context-dependent reference), including expressions such as “here”, “yesterday”, “me”, etc. that must be resolved in the user’s context; as an example, an analytical query might ask “How many U.S. Congress Members are younger than me?". Such questions highlight challenges on how such expressions can be resolved while maintaining the privacy of users.

Context can be explicit in the query, but it can also be implicit.
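One way to address the privacy challenge above, sketched minimally: resolve deictic tokens against the user's local context *before* the query leaves the device, so the backend never sees the raw personal data. All names and context values here are hypothetical.

```python
# Hypothetical sketch: local resolution of deictic expressions ("me", "here",
# "yesterday") against a user-side context dictionary. Values are illustrative.
import datetime
import re

def resolve_deixis(question, context):
    """Replace whole-word deictic tokens with explicit context values."""
    substitutions = {
        "me": f"a person born on {context['birth_date']}",
        "here": context["location"],
        "yesterday": str(context["today"] - datetime.timedelta(days=1)),
    }
    for token, value in substitutions.items():
        # \b avoids rewriting tokens embedded in longer words.
        question = re.sub(rf"\b{token}\b", value, question)
    return question

ctx = {"birth_date": "1990-05-01",
       "location": "Rio de Janeiro",
       "today": datetime.date(2025, 3, 10)}

print(resolve_deixis("How many U.S. Congress Members are younger than me?", ctx))
```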

Questions may involve ambiguity, as in the case of the factual question “Who is the mayor of Springfield?”, where there are a great many places called Springfield. Such ambiguity may require interaction with the user to resolve, independently of the technology used.

The ambiguity of natural language when expressing a query/question.
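The paper's point that ambiguity "may require interaction with the user" can be sketched as a simple guard: if a surface name maps to several KG entities, hand the candidates back for clarification instead of guessing. The mini-index below is hypothetical.

```python
# Hypothetical sketch: ambiguity detection before answering. The entity
# index below is illustrative, not a real KG lookup.

ENTITY_INDEX = {
    "Springfield": ["Springfield, Illinois", "Springfield, Massachusetts",
                    "Springfield, Missouri"],
    "Paris": ["Paris, France"],
}

def answer_or_clarify(name):
    """Answer only when the name is unambiguous; otherwise ask the user."""
    candidates = ENTITY_INDEX.get(name, [])
    if len(candidates) == 1:
        return ("answer", candidates[0])
    return ("clarify", candidates)

print(answer_or_clarify("Springfield"))
print(answer_or_clarify("Paris"))
```

Note this guard is technology-independent, as the paper says: the same interaction step is needed whether the backend is a KG, an SE or an LLM.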

Questions may be the subject of controversy or varying opinions, such as the question “Would building more nuclear power plants help the environment?”. Such questions may require a nuanced way to respond that reflects potentially divergent viewpoints.

CoaKGs allow controversial viewpoints to be represented explicitly and in context.

4 RESEARCH DIRECTIONS

what combination of SEs, KGs and LLMs can go beyond the individual technologies in supporting more categories of information needs? The ideal solutions should provide Internet-scale coverage and freshness, give precise answers with user-friendly explanations of provenance, and support the entire spectrum from fact lookups to analytic queries and personalized advice, all with low computational cost.
The idea of combining LLMs with other technologies is not new. Most notably, KGs have been leveraged by SEs for entity-centric queries, and recent LLMs are coupled with SE techniques for RAG.

4.1 Augmenting Language Models

KG for LLM: Curated Knowledge. Per the old idiom, “just because it’s said doesn’t make it true”, much of the text on which LLMs are trained is not factual in nature. KGs as a curated source of structured knowledge can thus enhance LLMs in a variety of ways [31, 53], particularly for addressing complex (e.g., multi-hop or aggregation-based) factual questions. This can involve pre-training and fine-tuning enhancements, or inference-time injection of factual knowledge.
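The inference-time injection mentioned above can be sketched as follows, with the KG as an in-memory dictionary and the LLM call left out: curated triples matching the question are retrieved and prepended to the prompt, so the model answers from verified facts rather than parametric memory. All data and function names are hypothetical.

```python
# Hypothetical sketch: inference-time injection of KG facts into an LLM
# prompt. The mini-KG below is illustrative.

KG = {
    ("Venezuela", "capital"): "Caracas",
    ("Venezuela", "population"): "28,300,000",
}

def retrieve_facts(question):
    """Naive retrieval: keep triples whose subject appears in the question."""
    return [f"{s} {p} {o}." for (s, p), o in KG.items() if s in question]

def build_prompt(question):
    """Prepend retrieved facts so a downstream LLM can ground its answer."""
    facts = "\n".join(retrieve_facts(question))
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What is the capital of Venezuela?"))
```

A real system would replace the dictionary lookup with entity linking plus a KG query, and pass the prompt to an actual model.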

GraphRAG. 

This assumes that KGs reflect facts rather than claims, which would need to be contextualized. But what if the KG is built collaboratively, or automatically from sources that are not necessarily reliable?

4.2 Augmenting Search Engines

LLM for SE: AI-Assisted Search. LLMs have the capability to enrich SE functionality on both the user-input side and the way results are presented [60]. On the input side, LLMs provide users with natural-language dialogue that humans appreciate, particularly for explanations, planning and advice. Queries are seamlessly derived from the user’s utterances, based on the LLM’s skills in language

Asking in natural language and receiving answers in natural language would feel more organic to humans.

KG for SE: Semantic Search. Major search engines have deployed semantic search functionalities by means of their back-end KGs, which allow for answering entity-centric queries with KG excerpts, sometimes called “knowledge panels” or “knowledge cards”. 

Google Search using Wikidata.

4.3 Augmenting Knowledge Graphs

LLM for KG: Knowledge Generation. The provocatively titled work “Language Models as Knowledge Bases?” [34] initiated a wave of research to investigate if and to what extent an LLM can generate a full-fledged KG, based on advanced prompt engineering and few-shot in-context learning, and sometimes even with supervised fine-tuning. Such techniques typically take as input a subject–predicate pair, such as Venezuela–capital, plus further prompting and context, and generate one or more objects that complete the desired fact.

A triple-generation task. But what if context needs to be added?
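The subject–predicate completion technique described above can be sketched like this; the model call is stubbed with a canned completion so the example runs offline, and the few-shot pairs are hypothetical.

```python
# Hypothetical sketch: few-shot triple completion. A prompt is built from a
# subject-predicate pair; the LLM is stubbed with a canned lookup.

FEW_SHOT = [
    (("France", "capital"), "Paris"),
    (("Japan", "capital"), "Tokyo"),
]

def build_prompt(subject, predicate):
    """Few-shot prompt ending with the pair to be completed."""
    lines = [f"{s} | {p} -> {o}" for (s, p), o in FEW_SHOT]
    lines.append(f"{subject} | {predicate} ->")
    return "\n".join(lines)

def stub_llm(prompt):
    # Stand-in for a real model call; returns a canned completion.
    return "Caracas" if prompt.endswith("Venezuela | capital ->") else "UNKNOWN"

def complete_fact(subject, predicate):
    """Return the generated triple (subject, predicate, object)."""
    return (subject, predicate, stub_llm(build_prompt(subject, predicate)))

print(complete_fact("Venezuela", "capital"))
```

Attaching context (time, source, viewpoint) to each generated triple would require extending the output format beyond a bare object, which is exactly where the plain triple abstraction falls short.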

There is then the danger that the hallucinations of LLMs could lead to polluting KGs, dragging down their correctness, and the quality of responses to factual queries (in particular). Rather than trying to re-create KGs of this scale and quality, an interesting direction would be to identify gaps in the KG – in terms of missing entities and facts about entities – and use LLMs to strategically fill such gaps. In particular, less prominent entities in the long tail would call for extending KGs, but studies so far show that the LLM ability to generate facts rapidly degrades for long-tail entities that appear more rarely in the training corpus.

Hallucinations could introduce errors into the KG, but automatic triple generation from text corpora can do the same.

For this reason, KGs in practice often adorn the core graph abstraction with additional elements to capture such nuances; for example, Wikidata uses qualifiers, ranks and references to add additional context, priority and provenance to statements forming edges in the KG....

Models such as the hyper-relational or the multi-layer graph.
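A minimal sketch of the hyper-relational idea mentioned above: a Wikidata-style statement whose core edge is adorned with qualifiers (context), a rank (priority) and references (provenance). The field names are illustrative, not Wikidata's actual API.

```python
# Hypothetical sketch: a hyper-relational (Wikidata-style) statement with
# qualifiers, rank and references. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Statement:
    subject: str
    predicate: str
    obj: str
    qualifiers: dict = field(default_factory=dict)  # temporal/spatial context
    rank: str = "normal"                            # preferred/normal/deprecated
    references: list = field(default_factory=list)  # provenance sources

stmt = Statement(
    subject="Venezuela", predicate="capital", obj="Caracas",
    qualifiers={"start_time": "1577"},
    rank="preferred",
    references=["https://example.org/source"],
)

def current_value(statements):
    """Prefer 'preferred'-ranked statements and ignore deprecated ones."""
    live = [s for s in statements if s.rank != "deprecated"]
    preferred = [s for s in live if s.rank == "preferred"]
    return (preferred or live)[0].obj if live else None

print(current_value([stmt]))
```

Ranks let the KG keep superseded values (e.g. former capitals) without them leaking into default answers, which is the nuance the plain triple model cannot express.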

SE for KG: Knowledge Refinement. SEs can be used to refine the knowledge of KGs in various ways relating to updates, verification, provenance, negation, etc. ... SEs are also vital for checking and validating or invalidating statements in a KG by finding supportive or contradictory evidence, boosting correctness.

Claim verification: this would serve the Trust Layer.

As KGs follow the principle of the Open World Assumption, statements that are not in KG can either be false or merely missing. As an example, a KG may state that Manuel Blum won the Turing Award but not list him among ACM Fellows. However, we do not know whether the KG has a complete list of ACM Fellows – such that Blum is not a Fellow – or Blum is just missing from what is an incomplete list.

Combining two open-world sources, but both should be considered DOWA.
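The Manuel Blum example above reduces to a three-valued lookup: a missing statement is "unknown", not "false", unless the KG declares completeness for that part of the data. A minimal sketch, with an illustrative per-predicate completeness declaration:

```python
# Hypothetical sketch: Open World Assumption with partial completeness
# declarations. KG contents and completeness claims are illustrative.

KG = {("Manuel Blum", "award", "Turing Award")}
COMPLETE_PREDICATES = {"award"}  # predicates the KG claims to list exhaustively

def holds(s, p, o):
    """Three-valued answer: true / false / unknown."""
    if (s, p, o) in KG:
        return "true"
    if p in COMPLETE_PREDICATES:
        return "false"   # closed-world reading, but only where completeness is asserted
    return "unknown"     # open world: absence is not denial

print(holds("Manuel Blum", "award", "Turing Award"))   # true
print(holds("Manuel Blum", "fellow_of", "ACM"))        # unknown
```

Without the completeness declaration, the KG simply cannot say that Blum is *not* an ACM Fellow; an SE finding supportive or contradictory evidence is one way to settle such gaps.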

4.4 KG + LLM + SE

Augmentation→Ensemble→Federation→Amalgamation. To combine all three technologies, we envision research following a natural progression through four phases.

The combination of the three options.

Moving away from augmenting a main technology, one can consider an ensemble approach, where KGs, LLMs and SEs are peers in the information infrastructure, and a user query is delegated to the technology best adapted to address the particular type of information need. Such an ensemble would likely have a natural language interface powered by an LLM, but underneath it could call KGs, LLMs or SEs.

Since LLMs are language models, they should be used only as interfaces, not as information sources in decision-making tasks.
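The ensemble phase described above can be sketched as a router: a front end classifies the information need and delegates to the best-suited back end. Classifier and back ends are stubs; a real system would use an LLM for both the classification and the dialogue layer.

```python
# Hypothetical sketch: ensemble routing of a query to KG, LLM or SE back
# ends. The keyword classifier and the back ends are illustrative stubs.

def classify(query):
    """Crude stand-in for an LLM-based intent classifier."""
    q = query.lower()
    if any(w in q for w in ("how many", "who is", "capital of")):
        return "fact"
    if any(w in q for w in ("should i", "recommend", "advice")):
        return "advice"
    return "exploratory"

BACKENDS = {
    "fact": lambda q: f"KG answer for: {q}",
    "advice": lambda q: f"LLM answer for: {q}",
    "exploratory": lambda q: f"SE results for: {q}",
}

def route(query):
    """Delegate the query to the technology best suited to its need."""
    return BACKENDS[classify(query)](query)

print(route("Who is the mayor of Springfield?"))
print(route("Should I buy an electric car?"))
```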

Another idea would be to explore “dual neural knowledge” whereby LLMs and KGs are applied for a dual encoding of popular entities, whereas KGs and SEs are used for long-tail information. Whatever the particular research direction, in this phase, the lines between SEs, KGs and LLMs become increasingly blurred, leading to an amalgam technology that aims to surpass the sum of its parts.

Routing tasks based on their long-tail distribution.

5 CONCLUSIONS

Regarding users’ information needs, KGs excel on complex factual queries, but do not cope well with non-factual categories. SEs provide support for factual and non-factual categories, but are effective only on simple queries and are inconvenient for questions whose answers do not lie in a single document; similar limitations arise also when structured data needs to be aggregated. LLMs also partially cover both factual and non-factual needs, but are prone to hallucinations and bias, have no formal operators for analytical queries, and can be trapped by false premises in questions.

The limitations of each approach must be clearly understood so that they can be combined.

