
Proposal evaluation - KG exploration

Use Cases (Schwabe's student)

The criteria to select the cases were the following:

1. The case should be published as a difficult case in the area. Since the case is published, we infer that it is a real, reasonably complex problem faced by a community of data users.

[Not invented by the researcher, so there would be no construction bias toward validating the proposal, but there may be selection bias]

2. The case is difficult to solve with the operators available in state-of-the-art tools. The rationale for this criterion is that we can use the case studies to compare the expressivity of our model against state-of-the-art tools using the same tasks.

[Allows comparison with other exploration tools]

5.1. Case Study 1: Discovering Technological Trends

The changes in the technological landscape that can be identified by analyzing published patents in different time periods are observed by answering four main questions:
• Which industry fields have increased the level of attention throughout given periods?
• Which industry fields have decreased the level of attention throughout given periods?
• Which industry fields started to be addressed throughout given periods?
• Which industry fields stopped being addressed throughout given periods?
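
The four questions above can be read as a comparison of patent counts per industry field between two periods. The following is only a minimal sketch under that assumption; the function name `classify_trends` and the sample field labels are hypothetical and do not come from the original case study.

```python
from collections import Counter


def classify_trends(earlier, later):
    """Compare patent counts per industry field between two periods.

    Each argument is a list with one industry field label per patent.
    """
    before, after = Counter(earlier), Counter(later)
    trends = {"increased": [], "decreased": [], "started": [], "stopped": []}
    for field in sorted(set(before) | set(after)):
        b, a = before[field], after[field]
        if b == 0 and a > 0:
            trends["started"].append(field)      # absent before, present now
        elif b > 0 and a == 0:
            trends["stopped"].append(field)      # present before, absent now
        elif a > b:
            trends["increased"].append(field)
        elif a < b:
            trends["decreased"].append(field)
    return trends


# Hypothetical data: one field label per published patent.
period_1 = ["optics", "optics", "textiles", "batteries"]
period_2 = ["optics", "batteries", "batteries", "batteries", "robotics"]
trends = classify_trends(period_1, period_2)
print(trends)
```

Fields with the same count in both periods are left out of all four groups, since none of the questions asks about them.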

5.2. Case Study 2: Evaluating a scientific paper

Consider a reviewer evaluating a scientific paper. To do so, the reviewer can adopt the following strategy:
1. Analyze the age of the citations: the reviewer extracts the year of each citation and calculates, for example, the mean year;
2. Check for missing citations to relevant publications: the reviewer extracts the keywords of the paper and issues a keyword search for related papers; ranks the articles by the number of incoming citations and keeps the first 20; then computes the difference between the two sets to verify which of these articles are not in the bibliography of the paper;
3. Analyze the degree of "self-citations": the reviewer analyzes how self-referential the paper is. A self-citation can be either a citation of previous works of one of the authors or a citation of authors from the same research group;
4. Evaluate whether the paper fits the scope of a venue: the reviewer might count the number of citations published in the same venue as an indicator of how adequate the paper is to the targeted venue.
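
As an illustration, the four steps above could be sketched over plain Python dictionaries. This is a minimal sketch and not the operators of the framework itself: the record fields (`id`, `year`, `authors`, `venue`, `incoming_citations`) and all sample values are assumptions, and the self-citation check only covers shared authors, not members of the same research group.

```python
from statistics import mean


def mean_citation_year(citations):
    """Step 1: mean publication year of the cited works."""
    return mean(c["year"] for c in citations)


def missing_relevant(citations, related, top_n=20):
    """Step 2: top-cited related papers absent from the bibliography."""
    ranked = sorted(related, key=lambda p: p["incoming_citations"], reverse=True)
    cited_ids = {c["id"] for c in citations}
    return [p["id"] for p in ranked[:top_n] if p["id"] not in cited_ids]


def self_citation_ratio(citations, paper_authors):
    """Step 3: share of citations that have an author in common with the paper."""
    authors = set(paper_authors)
    own = sum(1 for c in citations if authors & set(c["authors"]))
    return own / len(citations)


def same_venue_count(citations, venue):
    """Step 4: number of citations published in the targeted venue."""
    return sum(1 for c in citations if c["venue"] == venue)


# Hypothetical bibliography and keyword-search results.
citations = [
    {"id": "p1", "year": 2010, "authors": ["Ana"], "venue": "ISWC"},
    {"id": "p2", "year": 2014, "authors": ["Bob"], "venue": "WWW"},
    {"id": "p3", "year": 2018, "authors": ["Ana", "Eve"], "venue": "ISWC"},
    {"id": "p4", "year": 2018, "authors": ["Dan"], "venue": "ESWC"},
]
related = [
    {"id": "p2", "incoming_citations": 50},
    {"id": "p9", "incoming_citations": 120},
    {"id": "p7", "incoming_citations": 5},
]

print(mean_citation_year(citations))
print(missing_relevant(citations, related, top_n=2))
print(self_citation_ratio(citations, ["Ana"]))
print(same_venue_count(citations, "ISWC"))
```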

The main goal of reuse in this case is the transfer of knowledge, not only with regard to the results of the tasks but also concerning the resolution processes. Therefore, new users can draw upon the experience of previous users to aid their own task resolution strategies.

[The exploration process itself can be knowledge to be shared]

5.3. Case Study 3: Summarizing Gene Clusters

The task consists of cross-referencing the gene identifiers of the cluster with a bibliographic dataset in order to find the terms that best describe the genes in the cluster. The strategy employed is the following:
1. A gene can have many identifiers. Therefore, the user tries to obtain all identifiers for each gene in the cluster;
2. With a more complete set of identifiers, the user queries a bibliographic dataset in order to find all publications that mention the gene identifiers;
3. From the publications, find the terms that best describe the genes;
4. Rank the terms using specific ranking criteria designed to extract different information from the cluster.
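
The strategy above can be sketched as three small functions. This is only a toy sketch: the synonym table, the publication records, and the ranking criterion (plain term frequency) are assumptions made for illustration, whereas the case study mentions specific ranking criteria that are not detailed here.

```python
from collections import Counter


def expand_identifiers(cluster, synonyms):
    """Step 1: collect every known identifier for each gene in the cluster."""
    ids = set()
    for gene in cluster:
        ids.add(gene)
        ids.update(synonyms.get(gene, ()))
    return ids


def publications_mentioning(ids, publications):
    """Step 2: keep publications that mention at least one identifier."""
    return [p for p in publications if ids & set(p["genes"])]


def rank_terms(pubs, top_n=3):
    """Steps 3 and 4: rank descriptive terms by frequency (assumed criterion)."""
    counts = Counter(term for p in pubs for term in p["terms"])
    return counts.most_common(top_n)


# Toy data, for illustration only.
synonyms = {"TP53": ["P53", "LFS1"], "BRCA1": ["RNF53"]}
cluster = ["TP53", "BRCA1"]
publications = [
    {"genes": ["P53"], "terms": ["apoptosis", "tumor suppression"]},
    {"genes": ["RNF53"], "terms": ["DNA repair", "tumor suppression"]},
    {"genes": ["MYC"], "terms": ["proliferation"]},
    {"genes": ["TP53", "BRCA1"], "terms": ["tumor suppression", "DNA repair"]},
]

ids = expand_identifiers(cluster, synonyms)
pubs = publications_mentioning(ids, publications)
top_terms = rank_terms(pubs, top_n=2)
print(top_terms)
```

Swapping `rank_terms` for another criterion is the kind of parameterization that would let the same strategy be reused with a different ranking.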

• The framework is useful for formally describing relatively complex exploration tasks in different domains of knowledge;
• The framework supports analyses of the task resolution process while abstracting interface and interaction details. Therefore, it can be used as an epistemic tool for design decisions: designers can use it to devise alternative exploration paths and analyze which sequence is most appropriate given the task execution context and user profile;
• Once the solution strategies employed can be formally represented, they can be shared and reused either in different scenarios within the same domain or across different domains. Reuse in different domains will require adaptations of the schema used in the task, which can be achieved by parameterization. Moreover, by representing exploration strategies formally, it is possible to audit the strategies for result validation purposes, which is of great value for validating scientific results, for example.

[The framework defines generic operations, and the case studies showed how to apply those operations to reach a goal]

Paper: Exploring KG for Exploratory Search (2014)

Interface for navigating between the documents and the graph. Two exploration tasks: one simple and one complex. Pre-selected document set, with the OIE output evaluated and corrected by an expert (but this is not possible at query time). The simple task is closer to Q&A, and the complex one is a search scenario with no exact result defined.

Complex: 10 minutes, with the question: What is the State of California's position on stem cell research?

Simple: the topic Lyme disease and a few specific questions.

Questionnaires about prior knowledge of the subject and what participants learned from the searches. The navigation log of the interface was also recorded to evaluate the access patterns to the graph versus the documents.

Could I:

  1. (Researcher) Select a question-answering benchmark that uses a hyper-relational KG;
  2. (Researcher) Group the questions of this benchmark that are about the same subject or entity;
  3. (Researcher) Analyze the schema/summary of the hyper-relational KG to identify contextual dimensions of interest and possible rules to be applied;
  4. (Researcher) Formulate an information need that involves learning about or investigating a subject or entity (or more than one related entity) found in the benchmark, while also considering the available context dimensions;
  5. (Researcher) Present the exploratory search tasks and the contextualized hyper-relational KG to the users;
  6. (User) Explore the KG and record the queries (with the respective type of answers);
  7. (Researcher) Prepare a post-task questionnaire;
  8. (User) Answer the questionnaire to test the knowledge acquired;
  9. (Researcher) Analyze query patterns;
  10. (Researcher) Analyze the users' answers to the questionnaire.
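
As a rough idea of what steps 2 and 9 could look like in practice, here is a minimal sketch. Everything in it is an assumption: the record layout (`text`, `entities`, `user`, `answer_type`), the function names, and the sample data are hypothetical and do not refer to any particular benchmark or logging tool.

```python
from collections import Counter, defaultdict


def group_by_entity(questions):
    """Step 2: group benchmark questions by the entities they mention."""
    groups = defaultdict(list)
    for q in questions:
        for entity in q["entities"]:
            groups[entity].append(q["text"])
    return dict(groups)


def query_type_profile(log):
    """Step 9: count, per user, how many queries of each answer type were issued."""
    profile = defaultdict(Counter)
    for entry in log:
        profile[entry["user"]][entry["answer_type"]] += 1
    return {user: dict(counts) for user, counts in profile.items()}


# Hypothetical benchmark questions, already annotated with entities.
questions = [
    {"text": "Who was the president of X in 1990?", "entities": ["X"]},
    {"text": "When did X join organization Y?", "entities": ["X", "Y"]},
    {"text": "Where is Y headquartered?", "entities": ["Y"]},
]

# Hypothetical query log recorded during step 6.
log = [
    {"user": "u1", "answer_type": "entity"},
    {"user": "u1", "answer_type": "entity"},
    {"user": "u1", "answer_type": "statement with qualifiers"},
    {"user": "u2", "answer_type": "count"},
]

groups = group_by_entity(questions)
profile = query_type_profile(log)
print(groups)
print(profile)
```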



