
Hierarchical knowledge graphs: A novel information representation for exploratory search tasks - Paper Reading

Sarrafzadeh, B., Roegiest, A., & Lank, E. (2020). Hierarchical knowledge graphs: A novel information representation for exploratory search tasks. arXiv preprint arXiv:2005.01716. ACM Transactions on Information Systems, Vol. 4, No. TOIS, Article 1. Publication date: April 2020.


5 IMPACT OF INFORMATION EXTRACTION ERRORS ON HKGS

In this section, we evaluate the performance of HKGs in light of errors in information extraction. To understand why we wish to explore the impact of errors in information extraction, consider Figure 5. In typical web search, users formulate queries, inspect retrieved documents, and either view documents or, if they find that the returned documents are not exactly appropriate, reformulate queries to refine the set of documents retrieved. Because a user can directly examine the results of a query retrieval operation, the user can refine the search query to modify the retrieved documents as needed. However, when performing information extraction, one challenge that the user faces is a limited ability to influence the quality of extracted information. Even if the set of retrieved documents is correct, errors in information extraction propagate through the representation of the entity-relationship tuples.

[Errors introduced by the IE tools affect the visualization hierarchy]

5.1 Experimental Design

To examine the impact that different levels of precision and recall have on exploratory search while using HKGs, we use two different information extraction outputs. One set represents the raw, uncorrected output of an IE algorithm; the second represents human-corrected output, used in the previous section to evaluate the potential of HKGs. We use these two outputs to populate our hierarchical knowledge graphs and leverage the interface that we designed (described in Section 3) to support interaction with these HKGs.  

[Run the tasks on one version of the KG with curation and another without]

Characterizing Precision and Recall of Automatic vs Hand-Tuned IE.
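
As an illustration of what this characterization involves, the following is a minimal sketch, not the authors' evaluation code. It assumes the hand-corrected output can serve as the reference set and that both outputs are sets of (entity, relation, entity) tuples compared by exact match. The function name `precision_recall` and the toy tuples are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def precision_recall(extracted: Set[Triple], curated: Set[Triple]) -> Tuple[float, float]:
    """Score automatically extracted triples against a curated reference set.

    Assumption: a triple counts as correct only if it matches a curated
    triple exactly; real evaluations may use looser matching.
    """
    true_positives = len(extracted & curated)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
curated = {
    ("A", "relates_to", "B"),
    ("A", "relates_to", "C"),
    ("B", "part_of", "D"),
    ("C", "part_of", "D"),
}
automatic = {
    ("A", "relates_to", "B"),
    ("B", "part_of", "D"),
    ("A", "part_of", "D"),  # spurious tuple: lowers precision
}

precision, recall = precision_recall(automatic, curated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the spurious tuple reduces precision, while the two curated tuples missing from the automatic output reduce recall. Both kinds of error would carry over into the graph shown to the user.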

5.1.9 Hypotheses and Research Questions. Quantitative data allows us to test the following hypotheses:
• Automatically generated hierarchical knowledge graphs result in a lower performance (i.e. task outcomes - measured by essay qualities) than do manually curated HKGs.
• Automatically generated hierarchical knowledge graphs result in more document views and more time spent reading documents (i.e. proxies for effort) than do manually curated HKGs.

5.4 Discussion

The goal of this section was to explore the impact of error prone information extraction on exploratory search tasks supported with HKGs. From our quantitative results, we note that Group, rather than Error condition or Task, resulted in significantly different performance on exploratory search tasks, as highlighted by the dependent measure Mark. To investigate this further, we looked more closely at any confounds within each group that could potentially impact the variance we see in search performance and behavior.

Summarizing these observations, while we observe no initial effect of error on performance, combining qualitative data with post-hoc statistical analysis, we find some evidence that precision and recall rates may impact one of our tasks, the History task, more than the Politics task. This potentially makes sense; because the History task is an investigate task with, as noted qualitatively by our participants, a set of answers that are targeted rather than open-ended, errors in precision and/or recall might result in concepts useful to the search task being omitted from the data set.

[The investigative task appears to have been more affected. Which other types of Exploratory Search tasks would also be affected?]

6 SYNTHESIS, IMPLICATIONS, AND FUTURE WORK

Inspired by our earlier work contrasting the support networks and hierarchies provide for exploratory search, this paper explores a novel data structure, hierarchical knowledge graphs. The goal of HKGs is to combine the complementary advantages of individual data structures. Specifically, we observed that hierarchies provide better sensemaking for searchers new to a topic area by structuring the information space; whereas networks contain greater information within the data structure, thus reducing the need to read documents to acquire information.

[Hierarchy made more of a difference for those who knew less about the subject]

Alongside this demonstration of the complementary nature of networks and hierarchies, in our second experiment, we demonstrate a disassociation between output and outcomes within our exploratory search system. While benchmarking demonstrates error-prone information extraction, our measures of user performance demonstrate only limited impacts, and then only qualitative, of errors in information extraction (the output) with respect to user performance (the outcomes). Obviously, there exist a number of domains where output accuracy is highly relevant: specific document lookup, legal document discovery, and research reference lookup are all examples of such. However, it is possible that other domains, such as exploratory search, may be more resilient to errors in output accuracy.

[Could errors also be omissions? KGs are incomplete. The domain could make a difference because the user has prior knowledge.]

6.1 Limitations and Future Work

... another obvious area of future work is to add additional exploratory search tasks from Marchionini’s taxonomy [41]. Our emergent qualitative results in our second experiment indicate that, within the broad category of exploratory search tasks, different types of exploratory search may be impacted differently by errors. Understanding where and how the current levels of precision for IE algorithms impact each of these different task types will help to clarify where and how useful variable IE accuracy is for different types of tasks at a more fine-grained level.

[Relationship between the other Exploratory Search tasks and the accuracy of IE algorithms]

... future work is to investigate other potential factors that may affect user performance in search tasks. For example, cognitive biases can result in irrational search behavior and influence searchers’ relevance judgment of information. As well, bounded rationality impacts the way information seekers optimize their information processing efforts even at the cost of achieving a sub-optimal outcome. Our qualitative data provides some evidence that biases might be salient: participants who approached the exploratory Politics task with an intention of finding evidence to support the power of the president they believed is more powerful limited browsing behaviors because participants felt ‘already informed’ on the topic. Given the difficulty in fully controlling for cognitive biases and ability, this final research question would require extensive studies to both identify factors and to model them in a tractable way to measure their effects.

[How to isolate possible bias from what the user believes they already know when performing an exploratory search: scout mindset vs. soldier mindset ... Beliefs]

7 CONCLUSION

In the field of information retrieval, alongside document retrieval, issues of representations that allow users to make sense of information and interfaces that allow users to interact with search results are important areas of inquiry. In this paper, we explore hierarchical knowledge graphs, an extension of knowledge graphs that leverages connectivity to generate hierarchies from the underlying knowledge graphs. Our mixed method experimental results argue that hierarchical knowledge graphs support the overview advantages of hierarchical representations, the information content advantages of knowledge graphs, and exhibit resilience to information extraction error rates common in contemporary information extraction algorithms.  

[Hierarchical knowledge graphs, an extension of KGs that leverages connectivity to generate hierarchies from the KG using vertex degree. They offer the overview advantages of hierarchical representations and the information content advantages of KGs at the detail level, and exhibit resilience to the information extraction error rates common in contemporary information extraction algorithms.]
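
To make the idea of deriving a hierarchy from graph connectivity concrete, here is a rough sketch under stated assumptions. It is not the algorithm from the paper; these excerpts only say that the hierarchy is generated from connectivity, with vertex degree mentioned in the note above. The sketch assumes an undirected graph, takes the highest-degree node of each component as a root, and attaches every other node to the first node that reaches it in a breadth-first traversal that visits higher-degree neighbors first. The function name `degree_hierarchy` and the example edges are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Optional, Tuple


def degree_hierarchy(edges: Iterable[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Return a child -> parent map (roots map to None).

    Assumption: higher-degree nodes are treated as more general concepts
    and therefore sit closer to the top of the hierarchy.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {node: len(adj) for node, adj in neighbors.items()}

    def rank(node: str) -> Tuple[int, str]:
        # Descending degree, then name, so the result is deterministic.
        return (-degree[node], node)

    parent: Dict[str, Optional[str]] = {}
    for root in sorted(neighbors, key=rank):
        if root in parent:
            continue  # already placed in an earlier component's tree
        parent[root] = None
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for nb in sorted(neighbors[node], key=rank):
                if nb not in parent:
                    parent[nb] = node
                    queue.append(nb)
    return parent


# Hypothetical toy graph, not data from the paper.
edges = [
    ("search", "query"),
    ("search", "document"),
    ("search", "user"),
    ("document", "entity"),
    ("entity", "relation"),
]
hierarchy = degree_hierarchy(edges)
for child, par in sorted(hierarchy.items()):
    print(child, "<-", par)
```

One consequence of a degree-driven layout like this is that extraction errors which add or remove edges can change which nodes appear near the top, which is one way IE errors could affect the overview a user sees.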

      


 

