
Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks - Paper Reading I

Sarrafzadeh, B., Roegiest, A., & Lank, E. (2020). Hierarchical knowledge graphs: A novel information representation for exploratory search tasks. arXiv preprint arXiv:2005.01716.
 
In exploratory search tasks, alongside information retrieval, information representation is an important factor in sensemaking.
 
[Information representation (not presentation)]
 
In this paper, we explore a multi-layer extension to knowledge graphs, hierarchical knowledge graphs (HKGs), that combines hierarchical and network visualizations into a unified data representation as a tool to support exploratory search.
 
[Here it already talks about presentation / visualization]

We describe our algorithm to construct these visualizations, analyze interaction logs to quantitatively demonstrate performance parity with networks and performance advantages over hierarchies, and synthesize data from interaction logs, interviews, and thinkalouds on a testbed data set to demonstrate the utility of the unified hierarchy+network structure in our HKGs.
 
[H stands for hierarchical, not hyper-relational]
 
Alongside the above study, we perform an additional mixed methods analysis of the effect of precision and recall on the performance of hierarchical knowledge graphs for two different exploratory search tasks.
 
[Efficiency in answering, not response time or throughput]

While the quantitative data shows a limited effect of precision and recall on user performance and user effort, qualitative data combined with post-hoc statistical analysis provides evidence that the type of exploratory search task (e.g., learning versus investigating) can be impacted by precision and recall.
 
[Efficiency in meeting the information need, closing the knowledge gap]
 
Furthermore, our qualitative analyses find that users are unable to perceive differences in the quality of extracted information. We discuss the implications of our results and analyze other factors that more significantly impact exploratory search performance in our experimental tasks.
 
1 INTRODUCTION
 
Information Retrieval (IR) research explores a wide range of questions connected to the goal of helping a user find useful information in response to a need ... there exist two broad categories of search: look-up searches, which leverage a “search by query” strategy for information seeking; and browsing (i.e., “search by navigation”) ... in the second category of search, and, in particular, in the broad category of information seeking that has been classified as Exploratory Search, where the goal involves “learning” (i.e., developing new knowledge) or “investigating” (i.e., applying analysis, synthesis, and evaluation), rather than simply “looking-up”.
 
[Here IR refers to the research field, not to a system. But Information Seeking comes from Information Science, not from Computer Science.]
 
The 2018 SWIRL Workshop emphasizes the need for research in supporting complex, evolving and exploratory information seeking goals. ... To these ends, this paper focuses on two aspects that are critical for success in exploratory search tasks: challenges in the algorithms that are used to identify and extract information that is relevant to a searcher’s query; and, on the interfaces that visualize the output of these algorithms and facilitate a searcher’s interaction with and exploration of the retrieved information.
 
[Importance for IR research -> https://sigir.org/wp-content/uploads/2018/07/p034.pdf]
 
In hierarchical representations, information is represented using categorical labels, such as those present in faceted browsing and automatic clustering of research results.
 
[The hierarchy is a way of grouping or filtering]
 
While our expectation was that precision and recall would impact user performance (i.e., success in an exploratory search task) or the effort expended during search (e.g., the number of documents viewed), our results indicate that neither performance nor effort was significantly impacted by differing levels of precision and recall (Section 5.2). To probe this result in greater detail, we analyze qualitative data collected via observations and interviews.
 
[Classical IR effectiveness metrics are not adequate for information seeking]
 
Our qualitative data indicates that task characteristics may be an important factor to consider in exploratory search (Section 5.3). Specifically, for investigate-style tasks where there is a defined set of facts to retrieve, recall may impact user behavior because it is necessary to find specific facts within the information presented. In contrast, more open-ended learn/comprehension or comparison tasks, where salient data can be more flexibly applied by the user, seem more resilient to lower recall rates.
 
[Coverage (recall) vs. Information Seeking tasks]
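To make the precision/recall discussion concrete, here is a minimal sketch of how the quality of extracted information could be scored against a manually assessed gold set. It assumes the extracted information is a set of (subject, relation, object) triples; the function name `precision_recall` and the toy Canada-history triples are hypothetical and are not the paper's data or evaluation code.

```python
# Hypothetical sketch: precision and recall of extracted
# (subject, relation, object) triples against a manually assessed gold set.
# The triples below are toy data, not the paper's data set.
Triple = tuple[str, str, str]


def precision_recall(extracted: set[Triple], gold: set[Triple]) -> tuple[float, float]:
    """Return (precision, recall) of `extracted` with respect to `gold`."""
    correct = len(extracted & gold)  # extracted triples that are also in the gold set
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = {
    ("Constitutional Act 1791", "created", "Upper Canada"),
    ("Constitutional Act 1791", "created", "Lower Canada"),
    ("Kingston", "was capital of", "Province of Canada"),
    ("Ottawa", "is capital of", "Canada"),
}
extracted = {
    ("Constitutional Act 1791", "created", "Upper Canada"),
    ("Ottawa", "is capital of", "Canada"),
    ("Ottawa", "is located in", "Quebec"),  # deliberate extraction error
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Read against the qualitative finding quoted above, the missing Kingston triple is the kind of gap that would hurt an investigate-style task needing that specific fact, while an open-ended learning task might tolerate it.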
 
2 BACKGROUND
 
While there has been research on understanding complex and exploratory search (see [76, 78] for a survey), there are many open questions when it comes to the design and evaluation of IR systems that provide tailored and adaptive support for different search tasks. Given our interest in exploratory search, in this section we first survey three areas of past research that explore support for users with more complex, exploratory search tasks. First, there has been a growing body of work in the IR community that aims to deliver “information” and not documents.
 
[76] Ryen W White and Resa A Roth. 2009. Exploratory search: Beyond the query-response paradigm. Synthesis Lectures on Information Concepts, Retrieval, and Services 1, 1 (2009), 1–98.

[78] Max L Wilson, Bill Kules, Ben Shneiderman, et al. 2010. From keyword search to exploration: Designing future search interfaces for the web. Foundations and Trends in Web Science 2, 1 (2010), 1–97.
 
Alongside research on supporting exploratory search, our second research question probes the effect of error on information seeking tasks. It is often assumed if an evaluation measure coupled with a test collection reveals that system A provides higher quality output than system B, then the user will both prefer system A and that system A will more effectively support the user’s information seeking task. However, in our analysis of IR research on this topic, we have found that the relationship between output quality and system efficacy is not clear. This ambiguity is highlighted in the second part of this Background section.
 
[It is not possible to claim a direct relationship between success in an exploratory search task and the quality of the information source]
 
2.1 System Support for Exploratory Search
 
To support exploratory search, a system must have two components: an information extraction system that can identify and extract relevant information from a corpus (e.g., search engine results); and an interactive UI that presents the information to users and allows users to browse the extracted information for sensemaking.
 
[What about the Knowledge Base? Is it just the corpus? The Knowledge Graph here is an intermediate result of the query process, not an artifact in its own right.]
 
2.1.1 Organizing Search Results.  
 
Information seekers often express a need for tools that organize search results into meaningful constructs in order to support sensemaking and navigation. Because of the importance of structure in search, there have been efforts to contrast strengths and weaknesses of different spatial representations and groupings of search results.
 
[The interface and the way information is presented]
 
Given the lack of intuitiveness associated with clustering and a desire for understandable hierarchies in which categories are presented at uniform levels of granularity, alongside specified hierarchies such as tables-of-contents, researchers have explored faceted categories, i.e. categories that are semantically related to the search task of the user, to organize search results.
 
In other words, exploratory tasks (e.g. learning or investigating) are precisely those tasks where interactions between facets are needed.
 
2.2 Evaluating Exploratory Search Systems
 
Given a system to support exploratory search, we must determine how best to characterize the performance of an exploratory search system. There are two aspects to system performance: accuracy and effectiveness. Assessing the accuracy of an algorithm can be performed through benchmarking and/or combined efforts tasks (e.g., TREC or CLEF tasks). System effectiveness for exploratory search, on the other hand, requires evaluating how well the system aids in the exploratory search tasks it is designed around.
 
[Accuracy and effectiveness, not efficiency]
 
2.2.1 System Accuracy.  
 
Evaluating information extraction is challenging. There are no clear guidelines as to what constitutes a valid proposition to be extracted, and most information extraction evaluations consist of a post-hoc manual evaluation of a small output sample.
 
[Accuracy of the information extracted from the corpus ... this is not directly related to my research]
 
2.2.2 System Effectiveness.  
 
It has long been understood in IR that a system's understanding of relevance is not always consistent with what a user desires, and so we must also understand how systems impact user performance. The recent SWIRL Workshop has identified the most relevant research questions to be addressed in order to develop new evaluation models that are suited for complex and exploratory information seeking.
 
[The relevance of results depends on the task that motivated the search]
 
A major step towards this goal is to design and study characteristics of search tasks that elicit exploratory behavior. These studies, in turn, provide data on searchers performing these tasks, specifically focused on task outcomes and searcher behaviors. Designing tasks for exploratory search studies can be especially difficult since inducing exploratory style search requires the searcher to individually interpret the tasks, results, and their relevance which is at odds with maintaining some level of experimental control and consistency.
 
[These are qualitative and quantitative studies]
 
To aid in the creation of appropriate exploratory search tasks, we look to Marchionini, referencing Bloom’s taxonomy of educational objectives, who distinguishes three broad categories of search tasks as Lookup, Learn, and Investigate. While these categories are depicted as overlapping activities, exploratory search is more pertinent to the Learn and Investigate activities. As a result, exploratory search is defined as searching that supports learning, investigating, comparing or discovering. From this understanding, we can distill exploratory search tasks into two themes. The first theme includes those tasks that facilitate learning to achieve knowledge acquisition, comprehension of concepts, interpretation of ideas and comparison or aggregation of concepts. The second theme covers those investigative tasks that involve discovery, analysis, synthesis and evaluation.
 
[Characteristics of Exploratory Search tasks]
 
Based upon the aforementioned works and a survey of existing classifications by Li and Belkin, we believe that exploratory search tasks should: provide uncertainty and ambiguity about the information need and in how to satisfy it; suggest a specific knowledge acquisition, comparison or discovery task; be in an unfamiliar domain for the searcher; represent a situation that a user can relate to and identify with; be of sufficient interest to test users; and, be formulated such that the user has enough imaginative context to facilitate immersion in the task. Any task that meets these criteria provides sufficient complexity that the end-to-end experience with an exploratory search system can be fully and properly assessed.
 
[Characteristics of Exploratory Search tasks]
 
2.2.3 Impact of Accuracy on Effectiveness.
 
Synthesizing past research, we see ambiguity in the effect of errors in the domain of information retrieval. Coupled with this, we note that exploratory search tasks require systems that support browsing, and, within information retrieval, this has given rise to systems that retrieve and present information to users in formats that support browsing. Absent from past research is an assessment of the effect of information extraction errors on exploratory search interfaces. While we concur that IE systems would ideally have perfect precision and recall, in the near term it seems unlikely that computational information extraction will be perfected, further motivating exploration of the effect of information extraction errors on interfaces that support exploratory search. Possibly due to the ambiguous link between system performance and effectiveness, there have been calls to extend evaluation of IR systems from an analysis of the output of the system to the outcome of the search task. Furthermore, there is also an evolving drive toward evaluations of how effectively IR systems support complex, evolving, long-term information seeking goals, such as learning and exploration.
 
[The relationship between the accuracy of the graph's content and search task performance is not deterministic. The user can still inspect the document to obtain more answers.]
 
3 HIERARCHICAL KNOWLEDGE GRAPHS
 
In this section, we describe hierarchical knowledge graphs, an extension of knowledge graphs that include hierarchical information about the lower level graphical structures. Our past work argues that hierarchies provide a breadth-first exploration of the information allowing the user to iteratively reduce confusion, obtain an overview, and slowly exploit detail (i.e., they provide a structured way to navigate from more general concepts to more fine grained data) and are valuable when people feel a need to orient themselves ...
 
[Navigation between different levels of aggregation]
 
3.1 Visualization Design and Creation
 
3.1.1 Document Retrieval. The Document Retrieval component aims at creating an initial document collection based on a user’s query. This collection will then be used as an input for the Knowledge Graph Generation component and will represent the top view of the target hierarchy.
 
[A subset of the corpus, created from a keyword query, is used to build the KG. Is this set of keywords meaningful enough to represent the information need?]
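The excerpt does not say how the initial collection is retrieved, so the following is only a minimal sketch, assuming a naive term-count score and a tiny stop-word list, of how a keyword query could select the documents that feed KG generation. The names `retrieve`, `tokenize`, `STOPWORDS` and the toy corpus are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"a", "an", "and", "in", "of", "the"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with stop words removed."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def retrieve(query: str, corpus: dict[str, str], k: int = 10) -> list[str]:
    """Return ids of the top-k documents, scored by occurrences of query terms."""
    terms = set(tokenize(query))
    scores: dict[str, int] = {}
    for doc_id, text in corpus.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:  # documents sharing no query term never enter the collection
            scores[doc_id] = score
    # Highest score first; ties broken by document id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


corpus = {
    "d1": "The Constitutional Act of 1791 created Upper Canada and Lower Canada.",
    "d2": "Ottawa became the capital of Canada; Kingston was an earlier capital.",
    "d3": "The impeachment of the President of Russia involves the State Duma.",
}
print(retrieve("capital of Canada", corpus, k=2))  # ['d2', 'd1']
```

Whatever the real retrieval model is, the filter illustrates the concern in the note: a document that shares no term with the query, like d3 here, never reaches the knowledge graph, so the choice of keywords bounds what the graph can show.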
 
3.1.2 Knowledge Graph Generation. To create our knowledge graph, we designed an Open Information Extraction system that processes a text collection and generates (entity-relation-entity) triples. This module is implemented in four phases. During the first phase we create the input corpus by collecting retrieved documents based on a given query.

[The KG is generated in an open manner; it has no predefined schema]
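The Open IE extraction itself (the paper's four phases) is not reproduced here. Assuming the triples have already been extracted, a minimal sketch of indexing them as a schema-free graph: relation labels stay as free text, and each entity points to every triple it takes part in. `build_graph` and the sample triples are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (entity, relation, entity); relation is free text


def build_graph(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Index triples by entity so the graph can be browsed from either endpoint.

    No schema is enforced: any string is accepted as an entity or relation,
    as in Open Information Extraction output.
    """
    graph: dict[str, list[Triple]] = defaultdict(list)
    for triple in triples:
        subj, _, obj = triple
        graph[subj].append(triple)
        if obj != subj:  # avoid listing a self-loop twice under the same entity
            graph[obj].append(triple)
    return dict(graph)


triples = [
    ("Constitutional Act 1791", "created", "Upper Canada"),
    ("Constitutional Act 1791", "created", "Lower Canada"),
    ("Act of Union 1840", "merged", "Upper Canada"),
]
graph = build_graph(triples)
for subj, rel, obj in graph["Upper Canada"]:
    print(f"{subj} --{rel}--> {obj}")
# Constitutional Act 1791 --created--> Upper Canada
# Act of Union 1840 --merged--> Upper Canada
```

Keeping whole triples (rather than bare neighbour names) preserves the direction of each relation, which a UI needs in order to label edges correctly.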
 
3.1.3 Minimap Generation. The final component of this system generates a hierarchical representation of the search results by extracting a middle layer from the input Knowledge graph tuples and provides bidirectional mappings between all three layers. As noted earlier, we call this layer the minimap layer.
 
[Hierarchy layers for visualization. A KG Profiling approach.]
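The quoted text does not say how the bidirectional mappings between the three layers are stored. Assuming each triple keeps a reference to the document it came from, and that the minimap concepts have already been chosen, one simple option is a set of paired dictionaries: documents (top) to concepts (minimap) to triples (bottom), and back. `Layers`, `build_layers` and the provenance format are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class Layers:
    """Bidirectional links: documents <-> minimap concepts <-> KG triples."""
    doc_to_concepts: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    concept_to_docs: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    concept_to_triples: dict[str, set[Triple]] = field(default_factory=lambda: defaultdict(set))
    triple_to_concepts: dict[Triple, set[str]] = field(default_factory=lambda: defaultdict(set))


def build_layers(provenance: list[tuple[str, Triple]], concepts: set[str]) -> Layers:
    """Link every (doc_id, triple) pair through the minimap concepts it mentions."""
    layers = Layers()
    for doc_id, triple in provenance:
        subj, _, obj = triple
        for concept in {subj, obj} & concepts:  # only entities chosen for the minimap
            layers.doc_to_concepts[doc_id].add(concept)
            layers.concept_to_docs[concept].add(doc_id)
            layers.concept_to_triples[concept].add(triple)
            layers.triple_to_concepts[triple].add(concept)
    return layers


provenance = [
    ("d1", ("Constitutional Act 1791", "created", "Upper Canada")),
    ("d1", ("Constitutional Act 1791", "created", "Lower Canada")),
    ("d2", ("Act of Union 1840", "merged", "Upper Canada")),
]
layers = build_layers(provenance, concepts={"Upper Canada"})
print(sorted(layers.concept_to_docs["Upper Canada"]))  # ['d1', 'd2']
print(len(layers.concept_to_triples["Upper Canada"]))  # 2
```

With both directions indexed, a click on a document can highlight its concepts, and a click on a concept can list both its source documents and its detailed triples, without rescanning the graph.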
 
A natural result of the entity-relationship tuples extracted above is that some entities have a higher number of edges, i.e., are of higher degree. A higher edge count implies a larger number of connections to other entities in the graph; in other words, those entities with higher edge counts were more frequently linked with other entities in the document. We call these higher degree vertices, central concepts, and hypothesize that one alternative to hierarchical faceted structures is to consider a multi-level view of a knowledge graph around central concepts.
 
[Using node degree to highlight concepts. A KG Profiling approach.]
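A minimal sketch of the central-concept idea: count each entity's degree over the extracted triples and keep the highest-degree entities as minimap candidates. The cut-off `top_n`, the alphabetical tie-breaking and the toy triples are assumptions; the paper's exact selection rule is not given in this excerpt, and here degree is counted over the whole graph rather than per document.

```python
from collections import Counter

Triple = tuple[str, str, str]


def central_concepts(triples: list[Triple], top_n: int = 3) -> list[str]:
    """Return the top_n entities by degree (each triple adds 1 to both endpoints)."""
    degree = Counter()
    for subj, _, obj in triples:
        degree[subj] += 1
        degree[obj] += 1
    # Highest degree first; ties broken alphabetically (an arbitrary choice).
    return sorted(degree, key=lambda e: (-degree[e], e))[:top_n]


triples = [
    ("Constitutional Act 1791", "created", "Upper Canada"),
    ("Constitutional Act 1791", "created", "Lower Canada"),
    ("Act of Union 1840", "merged", "Upper Canada"),
    ("Act of Union 1840", "merged", "Lower Canada"),
    ("Act of Union 1840", "created", "Province of Canada"),
    ("Kingston", "was capital of", "Province of Canada"),
]
print(central_concepts(triples, top_n=2))  # ['Act of Union 1840', 'Constitutional Act 1791']
```

Note how many entities tie at degree 2 even in this tiny graph; in practice the tie-breaking rule (or a normalised centrality) can change which concepts appear in the minimap.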
 
3.2 Prototype Development
 
Based on established literature and pilot studies we found that knowledge graphs can become overwhelming or confusing for participants. The overwhelming nature of the full knowledge graph leads to a need to create filtered views of our graph. These filtered views draw inspiration from the “expand-from-known” paradigm in information visualization.
 
[This expansion is a form of drill-down]
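One way to read "expand-from-known" as a drill-down is to start from an entity the user already knows and show only the part of the graph within a few hops of it. A minimal breadth-first sketch follows; `expand_from`, the `hops` parameter and the toy triples are hypothetical, and the paper's actual filtering rules may differ.

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]


def expand_from(triples: list[Triple], seed: str, hops: int = 1) -> list[Triple]:
    """Return the triples whose endpoints both lie within `hops` steps of `seed`."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for subj, _, obj in triples:  # treat edges as undirected for navigation
        neighbours[subj].add(obj)
        neighbours[obj].add(subj)

    distance = {seed: 0}
    queue = deque([seed])
    while queue:  # breadth-first search, stopping at the hop limit
        node = queue.popleft()
        if distance[node] == hops:
            continue
        for nb in neighbours[node]:
            if nb not in distance:
                distance[nb] = distance[node] + 1
                queue.append(nb)

    return [t for t in triples if t[0] in distance and t[2] in distance]


triples = [
    ("Constitutional Act 1791", "created", "Upper Canada"),
    ("Constitutional Act 1791", "created", "Lower Canada"),
    ("Act of Union 1840", "merged", "Upper Canada"),
    ("Act of Union 1840", "merged", "Lower Canada"),
    ("Act of Union 1840", "created", "Province of Canada"),
    ("Kingston", "was capital of", "Province of Canada"),
]
print(len(expand_from(triples, "Upper Canada", hops=1)))  # 2
print(len(expand_from(triples, "Upper Canada", hops=2)))  # 5
```

Each user action that asks for "more" could simply re-run the filter with `hops + 1`, so the view grows gradually instead of showing the whole, possibly overwhelming, graph at once.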
 
4 EVALUATING HIERARCHICAL KNOWLEDGE GRAPHS
 
In our earlier work, the specific, complementary benefits of hierarchies and knowledge graphs were that hierarchies support a better global view of the search space, allowing participants to gain an appreciation of important topics whereas networks provide information on low-level entities and their relationship, thus reducing the need to read documents.
 
[The graph view would provide a summary of the documents, removing the need to inspect each document]
 
4.1 Experimental Design

4.1.2 Data Set. To populate our interactive applications, we created two distinct data sets: one focusing on history and the second on global politics.
 
[Topics associated with the exploratory search tasks]
 
4.1.3 Search Tasks. Search tasks can be either simple (e.g., question answering) or complex (e.g., essay writing). With respect to the complexity level, each participant performed one Simple and one Complex task. We also used two different topics (i.e., History and Politics) to investigate the relation between the topic and content knowledge with the structure used to organize the retrieved information.
 
[Two simple tasks and two complex ones. The simple task is not a Q&A question; it is an ambiguous question that depends on interpretation.]
 
The queries we asked people to find information to satisfy in our study were the following:
 
Simple Politics: What governmental body or bodies are involved in the impeachment of the President of Iran and of Russia?
 
Complex Politics: Imagine you are a high school student who is going to write an essay on the Political Systems of Iran and Russia. Knowing little about the presidents of these two countries, you wish to determine which president has more power. Find at least 3 arguments to justify your answer.
 
Simple History: As a result of which act were Upper and Lower Canada formed?
 
Complex History: Imagine you are a high school student who is going to write an essay on the History of Canada. Knowing little about Canadian History, you wish to know which cities have served as a capital for Canada. You would also like to understand the reasons behind moving the capital from one city to another.
 
[For the capital task, an analogous task could be built for Brazil. The same goes for impeachment.]
 
To design these exploratory search tasks, we were guided by Marchionini’s work on exploratory search. These tasks combine aspects of knowledge acquisition/comparison (Marchionini’s learn subcategory) with analysis, synthesis, and evaluation (Marchionini’s investigate subcategory). In addition, the task descriptions closely follow Bystrom and Hansen’s [10] recommendation that three levels of description should be used to specify a search task: a contextual description, a situational description, and a topical description and query.
 
[Three descriptions for the search task: (1) the user's context; (2) the task's context; and (3) the information need that motivates the search]

4.1.4 Study Design.
4.1.5 Participants.
 
4.1.6 Procedure. After introducing the study, participants were presented with an experimental interface (populated with an unrelated data set), and were given time to familiarize themselves with the interface and data structure. Once participants had developed some comfort with the features of the interface (∼ 3 minutes), participants completed a questionnaire assessing their familiarity with the topic used for the first task. They were then given the description of their task (see above), and were asked to complete the task using the interface (15 minutes per task). Participants completed a post-task questionnaire that evaluated the experience; we used questionnaires provided by TREC-9 Interactive Searching track modified to fit our experiment. The same process was repeated for the second task.
 
[Familiarization with the tool using a different data set before performing the tasks. Questionnaire.]
 
At the end of the second task, a semi-structured interview explored participants’ experience using the interface. Interviews explored the conceptual usability of the visualization, the technical usability of the application and the efficacy of the interface for different types of search tasks.

[About the interface, not about the task]
 
4.1.7 Data Collection. Alongside a mixed design of within subject and between subject factors, we perform a mixed methods analysis of both quantitative and qualitative data. Data was captured as follows:
(a) The interface was instrumented with a logger ...
(b) Two assessors evaluated the quality of answers provided by the participants for each of the search tasks independently. Simple queries were rated as either correct or incorrect. Complex questions were rated on a scale.
(c) We captured field notes during participant interactions, audio recorded all sessions, transcribed final interviews, and collected questionnaire data.

[End-to-end task evaluation. How would I evaluate this in my case, where there is no interface?]
 
4.1.8 Hypotheses and Research Questions. Quantitative data allows us to test the following hypotheses:
• Hierarchical knowledge graphs result in fewer document views and less time spent reading documents than do hierarchical trees.
• Hierarchical knowledge graphs exhibit statistically similar behaviors to Knowledge Graphs.
 
4.3 Results: Qualitative Analysis

4.3.1 Supporting Exploratory Search Tasks. As noted in our study design, we incorporate two exploratory information seeking tasks with different levels of complexity. In post-experiment interviews the participants were able to compare how different task complexities are supported by the assigned interface. The hierarchical graph representation was found to provide more support for the Complex Task (i.e., more open ended and exploratory tasks such as essay writing or learning) versus Simple tasks (such as question answering and specific knowledge finding). This observation seems to be true for any multi-level structure which provides an overview and allows a gradual immersion into details: Finding a specific piece of information to satisfy a simple query is best done using a traditional search engine.

[If the task is a specific lookup, traditional IR tools are more appropriate]
 
As White and Roth point out, exploratory search is motivated by complex information problems, poor understanding of terminology and information space structure, and often a “desire to learn.” Vakkari also argues “more support is needed in the initial stages of a task,” when users have an unstructured mental model. Inspired by Kim, in our prior work we found that hierarchical trees provide this benefit in unfamiliar domains. A strength of our design of hierarchical knowledge graphs is that it enables the user to engage in two alternative navigation paradigms. Users can exploit overview layers to explore the collection at a higher level followed by targeted immersion in the detailed view.

[Exploration helps in understanding the domain]
 
4.3.2 Imposing a Structure versus Open Exploration.
 
One interesting perspective of the multi-layer graph representation, which presents central concepts of a domain as an overview for each document, is that it reflects the knowledge graph concepts. This reflection made it, for many participants, more flexible and exploratory, a window into the knowledge graph. Many participants commented on this phenomenon, noting it was “guiding but not imposing” ...

[Hierarchy and drill-down]
 
4.4 Discussion
 
Qualitative data from our participants indicate that hierarchies grounded in tables-of-contents are more familiar, easier to follow, and more focused. This is primarily because the tree layout explicitly represents connections between nodes, which helps with understanding how and where a concept fits in a bigger picture. This in turn helps users orient themselves in the data.
 
The goal of hierarchies in HKG was to help users self-orient within the data, to develop an overview of the data. This was one identified benefit of the hierarchical view provided by the tree interface.
 
To read:
[42] Gary Marchionini. 2019. Search, sense making and learning: closing gaps. Information and Learning Sciences (2019).     

                     
