Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users’ Questions – Article Reading 2
Hogan, A., Dong, X.L., Vrandečić, D., & Weikum, G. (2025). Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions.
3 THE PERSPECTIVE OF INFORMATION-SEEKING USERS
All of these categories can be present in decision-making processes.
Facts: Users seek objective, verifiable information that may be satisfied by a simple answer: a name, number, list, table, etc.
Some vary over time; others traverse multiple edges (paths).
Explanations: Users seek an explanation as to what something is, what caused it, what properties it has, how it works, etc., based principally on objective criteria.
Exploratory queries involve gaining initial understanding into an unfamiliar topic, or further understanding into a familiar one. In this case, the user may not know precisely what they are looking for, but will rather hope to recognize it when they see it.
Incomplete questions.
Planning: Users seek information to help them take a particular action or make a more informed decision. Such needs may involve a mix of objective and subjective criteria.
Recommendation queries seek general suggestions on a preferred course of action based on subjective and objective criteria .... KGs cannot directly address such questions, and have no inherent recommendation faculties, ....
Not even recommendations using node and edge similarity?
Spatio-temporal queries seek information in a particular geographical or temporal setting, relating to, for example, events, places, times, etc. ... Many KGs contain rich geographic information that can be queried using geospatial operators, but they again may not be fully up-to-date, and be only indirectly helpful for assessing subjective criteria.
Once updated, the information will be correctly contextualized in the CoaKG.
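The geospatial operators mentioned in the excerpt can be illustrated with a minimal bounding-box filter over a toy KG of places (entities and coordinates are merely illustrative, not drawn from any real KG):

```python
from dataclasses import dataclass

@dataclass
class Place:
    name: str
    lat: float
    lon: float

# Hypothetical mini-KG of geolocated entities
kg = [
    Place("Berlin", 52.52, 13.40),
    Place("Potsdam", 52.40, 13.06),
    Place("Munich", 48.14, 11.58),
]

def within_box(p, lat_min, lat_max, lon_min, lon_max):
    """A toy 'geospatial operator': bounding-box containment."""
    return lat_min <= p.lat <= lat_max and lon_min <= p.lon <= lon_max

# Places inside a box roughly covering the Berlin area
hits = [p.name for p in kg if within_box(p, 52.0, 53.0, 12.5, 14.0)]
```

Real KGs expose far richer operators (distance, containment in polygons, temporal validity), but the principle is the same: structured coordinates make such filters exact rather than approximate.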
Advice: Users seek general counsel that may range from matters of self-improvement to the ethical or philosophical. Such questions are often more open-ended than those covered in the other categories, and may again blur the lines of the subjective and the objective.
Not a strong point of KGs (similar to recommendations).
Questions may involve deixis, including expressions such as “here”, “yesterday”, “me”, etc. that must be resolved in the user’s context; as an example, an analytical query might ask “How many U.S. Congress Members are younger than me?”. Such questions highlight challenges on how such expressions can be resolved while maintaining the privacy of users.
Context can be explicit in the query, but it can also be implicit.
Questions may involve ambiguity, as in the case of the factual question “Who is the mayor of Springfield?”, where there are a great many places called Springfield. Such ambiguity may require interaction with the user to resolve, independently of the technology used.
The ambiguity of language when expressing a query/question.
Questions may be the subject of controversy or varying opinions, such as the question “Would building more nuclear power plants help the environment?”. Such questions may require a nuanced way to respond that reflects potentially divergent viewpoints.
CoaKGs allow controversial viewpoints to be represented explicitly and in context.
4 RESEARCH DIRECTIONS
what combination of SEs, KGs and LLMs can go beyond the individual technologies in supporting more categories of information needs? The ideal solutions should provide Internet-scale coverage and freshness, give precise answers with user-friendly explanations of provenance, and support the entire spectrum from fact lookups to analytic queries and personalized advice, all with low computational cost.
The idea of combining LLMs with other technologies is not new. Most notably, KGs have been leveraged by SEs for entity-centric queries, and recent LLMs are coupled with SE techniques for RAG.
4.1 Augmenting Language Models
KG for LLM: Curated Knowledge. Per the old idiom, “just because it’s said doesn’t make it true”, much of the text on which LLMs are trained is not factual in nature. KGs as a curated source of structured knowledge can thus enhance LLMs in a variety of ways [31, 53], particularly for addressing complex (e.g., multi-hop or aggregation based) factual questions. This can involve pre-training and fine-tuning enhancements, or inference-time injection of factual knowledge.
GraphRAG.
This assumes that KGs reflect facts rather than claims, which should be contextualized. But what if the KG is built collaboratively, or automatically from sources that are not necessarily reliable?
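The inference-time injection mentioned above can be sketched as a GraphRAG-style prompt builder: retrieve triples relevant to the question and prepend them to the prompt. Everything here is a hypothetical sketch (the triple store, the naive retrieval heuristic, and the prompt format); the actual LLM call is left abstract.

```python
# Hypothetical triple store; values illustrative
kg_triples = [
    ("Venezuela", "capital", "Caracas"),
    ("Caracas", "population", "2,245,744"),
]

def retrieve_facts(question, triples):
    """Naive retrieval: keep triples whose subject appears in the question."""
    return [t for t in triples if t[0].lower() in question.lower()]

def build_prompt(question, triples):
    """Inject retrieved KG facts into the LLM prompt as grounding context."""
    facts = "\n".join(f"{s} {p} {o}." for s, p, o in retrieve_facts(question, triples))
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"

prompt = build_prompt("What is the capital of Venezuela?", kg_triples)
```

Production systems replace the subject-matching heuristic with entity linking and graph traversal, but the injection step itself is this simple: curated facts become part of the context window.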
4.2 Augmenting Search Engines
LLM for SE: AI-Assisted Search. LLMs have the capability to enrich SE functionality on both the user-input side and the way results are presented [60]. On the input side, LLMs provide users with natural-language dialogue that humans appreciate, particularly for explanations, planning and advice. Queries are seamlessly derived from the user’s utterances, based on the LLM’s skills in language ...
Asking in natural language and being answered in kind would be more organic for humans.
KG for SE: Semantic Search. Major search engines have deployed semantic search functionalities by means of their back-end KGs, which allow for answering entity-centric queries with KG excerpts, sometimes called “knowledge panels” or “knowledge cards”.
Google Search using Wikidata.
4.3 Augmenting Knowledge Graphs
LLM for KG: Knowledge Generation. The provocatively titled work “Language Models as Knowledge Bases?” [34] initiated a wave of research to investigate if and to what extent an LLM can generate a full-fledged KG, based on advanced prompt engineering and few-shot in-context learning, and sometimes even with supervised fine-tuning. Such techniques typically take as input a subject–predicate pair, such as Venezuela–capital, plus further prompting and context, and generate one or more objects that complete the desired fact.
A triple-generation task. But what if context needs to be added?
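The subject–predicate completion setup described in the excerpt can be sketched as a few-shot prompt template. The demonstrations and formatting are illustrative assumptions; the LLM call that would complete the prompt is left abstract.

```python
def completion_prompt(subject, predicate, examples):
    """Build a few-shot prompt asking an LLM to complete (subject, predicate, ?).

    `examples` are in-context demonstrations of completed facts.
    """
    shots = "\n".join(f"{s} | {p} -> {o}" for s, p, o in examples)
    return f"{shots}\n{subject} | {predicate} -> "

prompt = completion_prompt(
    "Venezuela", "capital",
    [("France", "capital", "Paris"), ("Japan", "capital", "Tokyo")],
)
```

An LLM given this prompt is expected to emit the missing object; the open question raised above remains: nothing in this format carries the context (time, source, viewpoint) under which the generated fact would hold.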
There is then the danger that the hallucinations of LLMs could lead to polluting KGs, dragging down their correctness, and the quality of responses to factual queries (in particular). Rather than trying to re-create KGs of this scale and quality, an interesting direction would be to identify gaps in the KG – in terms of missing entities and facts about entities – and use LLMs to strategically fill such gaps. In particular, less prominent entities in the long tail would call for extending KGs, but studies so far show that the LLM ability to generate facts rapidly degrades for long-tail entities that appear more rarely in the training corpus.
This introduces LLM hallucinations into the KG, but automatic triple generation based on text corpora can do the same.
For this reason, KGs in practice often adorn the core graph abstraction with additional elements to capture such nuances; for example, Wikidata uses qualifiers, ranks and references to add additional context, priority and provenance to statements forming edges in the KG....
Models such as the hyper-relational or multi-layer graph models.
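The Wikidata-style adornments from the excerpt (qualifiers, ranks, references) can be modeled as a hyper-relational statement. The field names follow Wikidata's terminology, but the class itself and the example values are an illustrative sketch, not Wikidata's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """A KG edge adorned with context, priority and provenance."""
    subject: str
    predicate: str
    obj: str
    qualifiers: dict = field(default_factory=dict)   # context, e.g. temporal scope
    rank: str = "normal"                             # preferred / normal / deprecated
    references: list = field(default_factory=list)   # provenance of the claim

stmt = Statement(
    "Venezuela", "capital", "Caracas",
    qualifiers={"point in time": "2024"},   # illustrative temporal qualifier
    rank="preferred",
    references=["https://www.wikidata.org/"],
)
```

This is exactly the kind of structure that lets a KG store claims rather than bare facts: the same subject–predicate pair can carry several ranked, sourced, time-scoped objects.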
SE for KG: Knowledge Refinement. SEs can be used to refine the knowledge of KGs in various ways relating to updates, verification, provenance, negation, etc. ... SEs are also vital for checking and validating or invalidating statements in a KG by finding supportive or contradictory evidence, boosting correctness.
Claim verification; this would serve the Trust Layer.
As KGs follow the principle of the Open World Assumption, statements that are not in KG can either be false or merely missing. As an example, a KG may state that Manuel Blum won the Turing Award but not list him among ACM Fellows. However, we do not know whether the KG has a complete list of ACM Fellows – such that Blum is not a Fellow – or Blum is just missing from what is an incomplete list.
Combining two Open World sources, but both should be considered DOWA.
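The Open World Assumption from the excerpt can be made concrete with a toy lookup that never returns False for an absent statement, only "unknown". The triple store is a minimal illustration; the Turing Award fact is the one given in the excerpt above.

```python
# Under the Open World Assumption, absence of a statement means
# "unknown", not "false".
kg = {
    ("Manuel Blum", "won", "Turing Award"),
}

def holds(s, p, o, kg):
    """Return True if the statement is in the KG; otherwise 'unknown'.

    A closed-world system would return False here instead; under OWA we
    cannot distinguish a false statement from a merely missing one.
    """
    return True if (s, p, o) in kg else "unknown"

stated = holds("Manuel Blum", "won", "Turing Award", kg)
absent = holds("Manuel Blum", "memberOf", "ACM Fellows", kg)
```

This is why SE-based evidence retrieval matters for refinement: external evidence is what turns "unknown" into a justified True or False.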
4.4 KG + LLM + SE
Augmentation→Ensemble→Federation→Amalgamation. To combine all three technologies, we envision research following a natural progression through four phases.
The combination of all three options.
Moving away from augmenting a main technology, one can consider an ensemble approach, where KGs, LLMs and SEs are peers in the information infrastructure, and a user query is delegated to the technology best adapted to address the particular type of information need. Such an ensemble would likely have a natural language interface powered by an LLM, but underneath it could call KGs, LLMs or SEs.
Since LLMs are language models, they should be used only in interfaces, not as sources of information in decision-making tasks.
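The ensemble delegation described in the excerpt can be sketched as a toy query router: classify the information need, then hand the query to the best-suited peer. The keyword heuristic and the backend labels are purely illustrative assumptions; a real system would use a learned classifier.

```python
def classify(query):
    """Crude need-type classifier: route by surface cues in the query."""
    q = query.lower()
    if q.startswith(("who ", "when ", "how many ")):
        return "KG"    # precise factual lookups suit the knowledge graph
    if "should i" in q or "advice" in q:
        return "LLM"   # open-ended advice suits the language model
    return "SE"        # broad or exploratory needs go to the search engine

def route(query):
    backend = classify(query)
    return backend, f"[{backend}] handles: {query}"

backend, _ = route("Who is the mayor of Springfield?")
```

Under the ensemble view, the LLM also sits on top as the natural-language interface, so even queries routed to the KG or SE can have their results verbalized by the LLM on the way back to the user.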
Another idea would be to explore “dual neural knowledge” whereby LLMs and KGs are applied for a dual encoding of popular entities, whereas KGs and SEs are used for long-tail information. Whatever the particular research direction, in this phase, the lines between SEs, KGs and LLMs become increasingly blurred, leading to an amalgam technology that aims to surpass the sum of its parts.
Routing tasks based on their long-tail distribution.
5 CONCLUSIONS
Regarding users’ information needs, KGs excel on complex factual queries, but do not cope well with non-factual categories. SEs provide support for factual and non-factual categories, but are effective only on simple queries and are inconvenient for questions whose answers do not lie in a single document; similar limitations arise also when structured data needs to be aggregated. LLMs also partially cover both factual and non-factual needs, but are prone to hallucinations and bias, have no formal operators for analytical queries, and can be trapped by false premises in questions.
The limitations of each approach must be clearly understood before they can be combined.