
Proposal Evaluation - KG Exploration

Use Cases (Schwabe's student)

The criteria to select the cases were the following:

1. The case should be published as a difficult case in the area. Since the case is published, we infer that the case is a real problematic situation faced by a community of data users, with reasonable complexity.

[The case is not invented by the researcher, so there is no construction bias toward validating the proposal, but there may be selection bias]

2. The case is difficult to solve with the operators available in state-of-the-art tools. The rationale for this criterion is that the case studies let us compare the expressivity of our model against state-of-the-art tools on the same tasks.

[Allows comparison with other exploration tools]

5.1. Case Study 1: Discovering Technological Trends

Changes in the technological landscape can be identified by analyzing patents published in different time periods. They are observed by answering four main questions:
• Which industry fields received more attention over the given periods?
• Which industry fields received less attention over the given periods?
• Which industry fields started to be addressed during the given periods?
• Which industry fields stopped being addressed during the given periods?
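
The four questions above can be sketched as a comparison of per-field patent counts between two periods. This is only a minimal illustration, not the model from the thesis: it assumes that "level of attention" is measured by the number of patents per field, and the function name `classify_trends`, the `(year, field)` tuples, and the sample data are all hypothetical.

```python
from collections import Counter

def classify_trends(patents, period_a, period_b):
    """Compare patent counts per field between two inclusive (start, end) periods.

    `patents` is a list of (year, field) tuples (hypothetical input shape).
    """
    def count(period):
        start, end = period
        return Counter(field for year, field in patents if start <= year <= end)

    before, after = count(period_a), count(period_b)
    trends = {"increased": [], "decreased": [], "started": [], "stopped": []}
    for field in sorted(set(before) | set(after)):
        b, a = before[field], after[field]  # a Counter returns 0 for missing keys
        if b == 0:
            trends["started"].append(field)
        elif a == 0:
            trends["stopped"].append(field)
        elif a > b:
            trends["increased"].append(field)
        elif a < b:
            trends["decreased"].append(field)
    return trends

# Made-up sample data, for illustration only.
sample_patents = [
    (2001, "batteries"), (2002, "batteries"), (2011, "batteries"),
    (2001, "optics"), (2011, "optics"), (2012, "optics"),
    (2003, "fax"), (2012, "drones"),
]
trends = classify_trends(sample_patents, (2000, 2009), (2010, 2019))
print(trends)
```

Fields with the same count in both periods fall into none of the four groups, which matches the four questions as stated.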

5.2. Case Study 2: Evaluating a Scientific Paper

Consider a reviewer evaluating a scientific paper. To do so, the reviewer can adopt the following strategy:
1. Analyze the age of the citations: the reviewer extracts the year of each citation and calculates, for example, the mean year;
2. Check for missing citations to relevant publications: the reviewer extracts the keywords of the paper and issues a keyword search for related papers, ranks the resulting articles by number of incoming citations, keeps the top 20, and then takes the difference between this set and the paper's bibliography to verify which articles are not cited;
3. Analyze the degree of "self-citation": the reviewer analyzes how self-referential the paper is. A self-citation can be either a citation of previous work by one of the authors or a citation of work by authors from the same research group;
4. Evaluate whether the paper fits the scope of a venue: the reviewer might count the number of cited works published in the same venue as an indicator of how adequate the paper is for the targeted venue.
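
The four review steps can be sketched as one function over a small in-memory record of the paper. This is a hedged illustration under several assumptions: the dictionary layout, the name `citation_report`, and the sample data are hypothetical; the related papers are assumed to come from a keyword search done elsewhere; and self-citation is detected only through shared author names, not through research-group membership.

```python
from statistics import mean

def citation_report(paper, related, limit=20):
    """Compute the four review indicators for a paper (hypothetical data layout)."""
    cites = paper["citations"]

    # Step 1: age of the citations, summarized as the mean year.
    mean_year = mean(c["year"] for c in cites)

    # Step 2: top related papers (by incoming citations) absent from the bibliography.
    cited_ids = {c["id"] for c in cites}
    ranked = sorted(related, key=lambda p: p["incoming"], reverse=True)[:limit]
    missing = [p["id"] for p in ranked if p["id"] not in cited_ids]

    # Step 3: share of citations that have at least one author in common with the paper.
    authors = set(paper["authors"])
    self_cites = sum(1 for c in cites if authors & set(c["authors"]))

    # Step 4: number of cited works published in the targeted venue.
    same_venue = sum(1 for c in cites if c["venue"] == paper["venue"])

    return {
        "mean_year": mean_year,
        "missing": missing,
        "self_citation_ratio": self_cites / len(cites),
        "same_venue": same_venue,
    }

# Made-up sample data, for illustration only.
paper = {
    "authors": ["Author A", "Author B"],
    "venue": "Venue X",
    "citations": [
        {"id": "p1", "year": 2010, "authors": ["Author A"], "venue": "Venue X"},
        {"id": "p2", "year": 2020, "authors": ["Author C"], "venue": "Venue Y"},
    ],
}
related = [
    {"id": "p2", "incoming": 500},
    {"id": "p3", "incoming": 900},
    {"id": "p4", "incoming": 10},
]
report = citation_report(paper, related, limit=2)
print(report)
```

The `limit` parameter stands in for the "top 20" cutoff in step 2; it is set to 2 here only because the sample is tiny.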

The main goal of reuse in this case is the transfer of knowledge, not only regarding the results of the tasks but also regarding the resolution processes. New users can therefore draw upon the experience of previous users to guide their own task resolution strategies.

[The exploration process itself can be knowledge worth sharing]

5.3. Case Study 3: Summarizing Gene Clusters

The task consists of cross-referencing the gene identifiers of the cluster with a bibliographic dataset in order to find the terms that best describe the genes in the cluster. The strategy employed is the following:
1. A gene can have many identifiers. Therefore, the user tries to obtain all identifiers for each gene in the cluster;
2. With a more complete set of identifiers, the user queries a bibliographic dataset to find all publications that mention the gene identifiers;
3. From the publications, the user finds the terms that best describe the genes;
4. The user ranks the terms using specific ranking criteria designed to extract different information from the cluster.
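
The strategy above can be sketched as a short pipeline: expand identifiers, match publications, collect terms, and rank them. This is a minimal sketch under stated assumptions: the name `summarize_cluster`, the data layout, and the placeholder gene identifiers are hypothetical, and plain term frequency is used as the ranking criterion only because the source does not say which criteria were applied.

```python
from collections import Counter

def summarize_cluster(cluster, synonyms, publications, top_n=3):
    """Return the top_n terms describing a gene cluster (hypothetical data layout)."""
    # Step 1: expand each gene to all of its known identifiers.
    identifiers = set()
    for gene in cluster:
        identifiers.add(gene)
        identifiers.update(synonyms.get(gene, ()))

    # Step 2: keep the publications that mention at least one identifier.
    matched = [p for p in publications if identifiers & set(p["genes"])]

    # Step 3: collect the terms attached to the matched publications.
    counts = Counter(term for p in matched for term in p["terms"])

    # Step 4: rank by frequency (an assumed criterion), breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]

# Placeholder identifiers and terms, for illustration only.
synonyms = {"G1": {"G1-alias"}, "G2": {"G2-alias"}}
publications = [
    {"genes": ["G1-alias"], "terms": ["term-b", "term-a"]},
    {"genes": ["G2-alias", "G1"], "terms": ["term-c", "term-a"]},
    {"genes": ["G9"], "terms": ["term-z"]},
]
top_terms = summarize_cluster(["G1", "G2"], synonyms, publications, top_n=2)
print(top_terms)
```

Swapping the sort key in step 4 is where a different ranking criterion would plug in, which is the point of keeping ranking as a separate step.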

• The framework is useful for formally describing relatively complex exploration tasks in different domains of knowledge;
• The framework supports analyses of the task resolution process while abstracting away interface and interaction details. It can therefore be used as an epistemic tool for design decisions: designers could use it to devise alternative exploration paths and analyze which sequence is most suitable given the task execution context and the user profile;
• Once the solution strategies employed can be formally represented, they can be shared and reused, either in different scenarios within the same domain or across different domains. Reuse across domains requires adapting the schema used in the task, which can be achieved through parameterization. Moreover, representing exploration strategies formally makes it possible to audit them for result validation purposes, which is of great value for validating scientific results, for example.

[The framework defines generic operations, and the case studies showed how to apply these operations to reach a goal]

Paper: Exploring KG for Exploratory Search (2014)

An interface for navigating between the documents and the graph. Two exploration tasks: one simple and one complex. The document set was pre-selected, and the OIE output was evaluated and corrected by an expert (but this is not possible at query time). The simple task is closer to Q&A, while the complex task is a search scenario with no exact answer defined.

Complex: 10 minutes, with the question: What is the position of the State of California on stem cell research?

Simple: the topic Lyme disease, with a few specific questions.

Questionnaires covered prior knowledge of the subject and what participants learned from the searches. The navigation log of the interface was also recorded to evaluate access patterns to the graph versus the documents.

Could I manage to:

  1. (Researcher) Select a question-answering benchmark that uses a hyper-relational KG;
  2. (Researcher) Group the benchmark questions that concern the same subject or entity;
  3. (Researcher) Analyze the schema/summary of the hyper-relational KG to identify contextual dimensions of interest and possible rules to apply;
  4. (Researcher) Formulate an information need that involves learning about or investigating a subject or entity (or more than one related entity) found in the benchmark, also taking the available context dimensions into account;
  5. (Researcher) Present the exploratory search tasks and the contextualized hyper-relational KG to the users;
  6. (User) Explore the KG and record the queries (with their respective answer types);
  7. (Researcher) Prepare a post-task questionnaire;
  8. (User) Answer the questionnaire to test the knowledge acquired;
  9. (Researcher) Analyze query patterns;
  10. (Researcher) Analyze the users' responses to the questionnaire.
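
Step 2 of this plan, grouping benchmark questions by entity, could be sketched as below. This is only an illustration: it assumes each benchmark question already comes annotated with the entities it mentions, and the name `group_by_entity`, the record layout, and the sample questions are hypothetical rather than taken from any specific benchmark.

```python
from collections import defaultdict

def group_by_entity(questions, min_size=2):
    """Group question texts by mentioned entity, keeping groups of at least min_size."""
    groups = defaultdict(list)
    for question in questions:
        for entity in question["entities"]:
            groups[entity].append(question["text"])
    # Entities with too few questions are unlikely to support an exploratory task.
    return {entity: texts for entity, texts in groups.items() if len(texts) >= min_size}

# Made-up questions with placeholder entities, for illustration only.
questions = [
    {"text": "When did X hold office?", "entities": ["X"]},
    {"text": "Which party did X belong to while in office?", "entities": ["X"]},
    {"text": "Where was Y born?", "entities": ["Y"]},
]
grouped = group_by_entity(questions)
print(grouped)
```

Entities that survive the `min_size` filter are candidates for the information needs formulated in step 4.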



