
Hierarchical knowledge graphs: A novel information representation for exploratory search tasks - Paper Reading Notes

Sarrafzadeh, B., Roegiest, A., & Lank, E. (2020). Hierarchical knowledge graphs: A novel information representation for exploratory search tasks. arXiv preprint arXiv:2005.01716. ACM Transactions on Information Systems, Vol. 4, No. TOIS, Article 1. Publication date: April 2020.


5 IMPACT OF INFORMATION EXTRACTION ERRORS ON HKGS

In this section, we evaluate the performance of HKGs in light of errors in information extraction. To understand why we wish to explore the impact of errors in information extraction, consider Figure 5. In typical web search, users formulate queries, inspect retrieved documents, and either view documents or, if they find that the returned documents are not exactly appropriate, reformulate queries to refine the set of documents retrieved. Because a user can directly examine the results of a query retrieval operation, the user can refine the search query to modify the retrieved documents as needed. However, when performing information extraction, one challenge that the user faces is a limited ability to influence the quality of extracted information. Even if the set of retrieved documents is correct, errors in information extraction propagate through the representation of the entity-relationship tuples.

[Errors introduced by IE tools affect the visualization hierarchy]

5.1 Experimental Design

To examine the impact that different levels of precision and recall have on exploratory search while using HKGs, we use two different information extraction outputs. One set represents the raw, uncorrected output of an IE algorithm; the second represents human-corrected output, used in the previous section to evaluate the potential of HKGs. We use these two outputs to populate our hierarchical knowledge graphs and leverage the interface that we designed (described in Section 3) to support interaction with these HKGs.  

[Run the tasks on one version of the KG with curation and on another without it]

Characterizing Precision and Recall of Automatic vs Hand-Tuned IE.
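
As an illustration of what characterizing precision and recall can mean here, the following is a minimal sketch that compares automatically extracted entity-relationship tuples against a hand-corrected set. The function name `precision_recall` and the example tuples are hypothetical and are not taken from the paper; exact tuple matching is also an assumption, since the excerpt does not say how the authors matched tuples.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted tuples against a gold set.

    Assumption: a tuple counts as correct only on an exact match with a
    hand-corrected tuple. The paper's matching criterion may differ.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical hand-corrected (curated) tuples.
gold_tuples = {
    ("lincoln", "signed", "emancipation proclamation"),
    ("lincoln", "was", "president"),
    ("congress", "passes", "laws"),
    ("president", "vetoes", "laws"),
}

# Hypothetical raw IE output: two correct tuples, one spurious, two missed.
auto_tuples = {
    ("lincoln", "signed", "emancipation proclamation"),
    ("congress", "passes", "laws"),
    ("lincoln", "passes", "laws"),
}

precision, recall = precision_recall(auto_tuples, gold_tuples)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the spurious tuple lowers precision and the two missing tuples lower recall, which are the two kinds of error the experiment contrasts.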

5.1.9 Hypotheses and Research Questions. Quantitative data allows us to test the following hypotheses:
• Automatically generated hierarchical knowledge graphs result in a lower performance (i.e. task outcomes - measured by essay qualities) than do manually curated HKGs.
• Automatically generated hierarchical knowledge graphs result in more document views and more time spent reading documents (i.e. proxies for effort) than do manually curated HKGs.

5.4 Discussion

The goal of this section was to explore the impact of error prone information extraction on exploratory search tasks supported with HKGs. From our quantitative results, we note that Group, rather than Error condition or Task, resulted in significantly different performance on exploratory search tasks, as highlighted by the dependent measure Mark. To investigate this further, we looked more closely at any confounds within each group that could potentially impact the variance we see in search performance and behavior.

Summarizing these observations, while we observe no initial effect of error on performance, combining qualitative data with post-hoc statistical analysis, we find some evidence that precision and recall rates may impact one of our tasks, the History task, more than the Politics task. This potentially makes sense: because the History task is an investigate task with, as noted qualitatively by our participants, a set of answers that are targeted rather than open-ended, errors in precision and/or recall might result in concepts useful to the search task being omitted from the data set.

[The investigative task would have suffered more impact. Which other types of exploratory search tasks would also be affected?]

6 SYNTHESIS, IMPLICATIONS, AND FUTURE WORK

Inspired by our earlier work contrasting the support networks and hierarchies provide for exploratory search, this paper explores a novel data structure, hierarchical knowledge graphs. The goal of HKGs is to combine the complementary advantages of individual data structures. Specifically, we observed that hierarchies provide better sensemaking for searchers new to a topic area by structuring the information space; whereas networks contain greater information within the data structure, thus reducing the need to read documents to acquire information.

[The hierarchy made more of a difference for those who knew less about the subject]

Alongside this demonstration of the complementary nature of networks and hierarchies, in our second experiment, we demonstrate a disassociation between output and outcomes within our exploratory search system. While benchmarking demonstrates error-prone information extraction, our measures of user performance demonstrate only limited impacts, and then only qualitative, of errors in information extraction (the output) with respect to user performance (the outcomes). Obviously, there exist a number of domains where output accuracy is highly relevant: specific document lookup, legal document discovery, and research reference lookup are all examples of such. However, it is possible that other domains, such as exploratory search, may be more resilient to errors in output accuracy.

[Could errors also be omissions? KGs are incomplete. The domain could make a difference because users bring their own prior knowledge.]

6.1 Limitations and Future Work

... another obvious area of future work is to add additional exploratory search tasks from Marchionini’s taxonomy [41]. Our emergent qualitative results in our second experiment indicate that, within the broad category of exploratory search tasks, different types of exploratory search may be impacted differently by errors. Understanding where and how the current levels of precision for IE algorithms impact each of these different task types will help to clarify where and how useful variable IE accuracy is for different types of tasks at a more fine-grained level.

[Relationship between the remaining exploratory search tasks and the accuracy of IE algorithms]

... future work is to investigate other potential factors that may affect user performance in search tasks. For example, cognitive biases can result in irrational search behavior and influence searchers’ relevance judgment of information. As well, bounded rationality impacts the way information seekers optimize their information processing efforts even at the cost of achieving a sub-optimal outcome. Our qualitative data provides some evidence that biases might be salient: participants who approached the exploratory Politics task with an intention of finding evidence to support the power of the president they believed is more powerful limited browsing behaviors because participants felt ‘already informed’ on the topic. Given the difficulty in fully controlling for cognitive biases and ability, this final research question would require extensive studies to both identify factors and to model them in a tractable way to measure their effects.

[How to isolate possible bias from what users believe they already know when performing an exploratory search: scout mindset vs. soldier mindset ... Beliefs]

7 CONCLUSION

In the field of information retrieval, alongside document retrieval, issues of representations that allow users to make sense of information and interfaces that allow users to interact with search results are important areas of inquiry. In this paper, we explore hierarchical knowledge graphs, an extension of knowledge graphs that leverages connectivity to generate hierarchies from the underlying knowledge graphs. Our mixed method experimental results argue that hierarchical knowledge graphs support the overview advantages of hierarchical representations, the information content advantages of knowledge graphs, and exhibit resilience to information extraction error rates common in contemporary information extraction algorithms.  

[Hierarchical knowledge graphs, an extension of KGs that leverages connectivity to generate hierarchies from the KG using vertex degree. They offer the overview advantages of hierarchical representations and the information content advantages of KGs at the level of detail, and they exhibit resilience to the information extraction error rates common in contemporary information extraction algorithms.]
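
To make the idea of deriving a hierarchy from connectivity concrete, here is a minimal sketch of one possible degree-based approach: take the highest-degree vertex as the root and expand breadth-first, visiting better-connected neighbors first. This is not the authors' algorithm, which these notes do not reproduce; the function `degree_hierarchy` and the example edges are hypothetical.

```python
from collections import defaultdict, deque


def degree_hierarchy(edges):
    """Build a tree over an undirected graph, rooted at its highest-degree node.

    Assumption: the hierarchy is a breadth-first spanning tree in which
    neighbors are visited in order of decreasing degree (ties broken by name).
    Only the connected component containing the root is covered.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    if not neighbors:
        return None, {}

    def rank(node):
        # Higher degree sorts first; the name keeps the order deterministic.
        return (-len(neighbors[node]), node)

    root = min(neighbors, key=rank)
    children = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(neighbors[node], key=rank):
            if neighbor not in children:  # not yet placed in the tree
                children[neighbor] = []
                children[node].append(neighbor)
                queue.append(neighbor)
    return root, children


# Hypothetical entity graph for a politics topic.
example_edges = [
    ("president", "congress"),
    ("president", "veto"),
    ("president", "cabinet"),
    ("president", "executive order"),
    ("congress", "senate"),
    ("congress", "house"),
]

root, children = degree_hierarchy(example_edges)
print(root, children[root])
```

The most connected entity ends up at the top as the overview level, while the original graph edges remain available for the detail level.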

      


 

