Matteo Lissandrini, Torben Bach Pedersen, Katja Hose, and Davide Mottin. 2020. Knowledge graph exploration: where are we and where are we going? SIGWEB Newsl., Summer 2020, Article 4 (Summer 2020), 8 pages. https://doi.org/10.1145/3409481.3409485
ABSTRACT...
We speak of knowledge graph exploration as of the gradual discovery and understanding of the contents of a large and unfamiliar KG.
[They only considered KGs in RDF]
In this paper, we present an overview of the state-of-the-art approaches for KG exploration. We divide them into three areas: profiling, search, and analysis and we argue that, while KG profiling and KG exploratory search received considerable attention, exploratory KG analytics is still in its infancy.
[The approach with the most research opportunities would be exploratory analytics rather than exploratory search]
1. INTRODUCTION
These networks of rich connections among entities are called knowledge graphs (KGs) ... thus, the contents of these KGs have become less and less familiar even to domain experts and almost impenetrable to first-time users, calling for exploratory methods on graphs .... In this context, we speak of knowledge graph exploration .... as of the machine-assisted process of progressive analysis of the contents of a KG with the goal of
(1) understanding the structure and nature of the dataset at hand,
(2) identifying whether the dataset can satisfy the current information need or research question, and
(3) retrieving the portion of the dataset that is pertinent to an often vague and hard-to-express information need.
[Definition of KG exploration: machine-assisted progressive analysis of the contents of a KG with the goal of: (1) understanding the structure and nature of the KG; (2) identifying whether the KG is a data source that can satisfy the information need; and (3) retrieving the subgraph of the KG that is relevant to an information need that is inherently vague and hard to express as a query.]
These goals are achieved through three main tasks: (i) summarization and profiling, (ii) exploratory search, and (iii) exploratory data analytics.
2. METHODS FOR KNOWLEDGE GRAPH EXPLORATION
(1) methods for KG profiling and summarization to distill the most important features and characteristics both of the structure and the contents of a KG;
(2) exploratory search methods for a gradual discovery and understanding of the items that are pertinent to a vague or underspecified information need; and
(3) techniques for exploratory analytics to distill salient features from different data subsets.
2.1 Profiling & Summarization
Data profiling ... it computes basic statistics. For instance, counting the number of classes (e.g., Movie) and their instances or summarizing value distributions for specific attributes (e.g., averaging the release year). Their focus is then on frequencies and statistical measures.
[KG statistics]
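The profiling statistics mentioned above (class counts, instance counts, attribute averages) can be illustrated with a minimal sketch. It assumes the KG is available as an in-memory list of (subject, predicate, object) triples; the data and the function names `class_counts`, `attribute_mean`, and `predicate_frequencies` are hypothetical and not taken from the paper.

```python
from collections import Counter
from statistics import mean

# Toy KG as (subject, predicate, object) triples; all identifiers are illustrative.
TRIPLES = [
    ("m1", "rdf:type", "Movie"),
    ("m2", "rdf:type", "Movie"),
    ("m3", "rdf:type", "Movie"),
    ("p1", "rdf:type", "Person"),
    ("m1", "releaseYear", 1999),
    ("m2", "releaseYear", 2003),
    ("m3", "releaseYear", 2010),
    ("m1", "directedBy", "p1"),
]

def class_counts(triples):
    """Count the instances of each class using the rdf:type triples."""
    return Counter(o for _, p, o in triples if p == "rdf:type")

def attribute_mean(triples, predicate):
    """Average the values of one numeric attribute (None if it never occurs)."""
    values = [o for _, p, o in triples if p == predicate]
    return mean(values) if values else None

def predicate_frequencies(triples):
    """Count how often each predicate is used in the KG."""
    return Counter(p for _, p, _ in triples)

counts = class_counts(TRIPLES)
avg_year = attribute_mean(TRIPLES, "releaseYear")
pred_freq = predicate_frequencies(TRIPLES)
```

On a real KG the same counts would normally be computed by the triple store itself (for example with aggregate queries) rather than in application memory.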
Structural summarization .. and pattern mining approaches have been applied to KGs to facilitate understanding the structure of the data as well as to obtain concise representations of the most salient features of their contents. In general, KG summaries either (i) present a compact representation of the main features of the original graph; or (ii) define a new graph derived from the original graph.
[A KG summary is itself a new KG]
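As a rough illustration of option (ii), a summary that is a new graph derived from the original one, the sketch below collapses every instance into its class and counts the relations between classes. This is only one simple kind of structural summary, chosen for illustration; it assumes each node has a single rdf:type, and the function name `type_summary` and the data are hypothetical.

```python
from collections import Counter

# Toy KG; identifiers are illustrative only.
TRIPLES = [
    ("m1", "rdf:type", "Movie"),
    ("m2", "rdf:type", "Movie"),
    ("p1", "rdf:type", "Person"),
    ("p2", "rdf:type", "Person"),
    ("m1", "directedBy", "p1"),
    ("m2", "directedBy", "p1"),
    ("m1", "starring", "p2"),
]

def type_summary(triples, type_predicate="rdf:type"):
    """Derive a class-level graph: edges are (source class, predicate, target class).

    Assumption: each node has exactly one type. Triples whose subject or
    object has no known type (e.g. literals) are skipped.
    """
    node_type = {s: o for s, p, o in triples if p == type_predicate}
    summary = Counter()
    for s, p, o in triples:
        if p == type_predicate:
            continue
        if s in node_type and o in node_type:
            summary[(node_type[s], p, node_type[o])] += 1
    return summary

summary = type_summary(TRIPLES)
```

The resulting counter is itself a small weighted graph: its keys are the summary edges and its values tell how many original edges each one stands for.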
Overall, these approaches require no specific domain knowledge and they return a high-level overview of the data. Thus, they are helpful in the initial exploratory stages since they can assist in evaluating whether a dataset matches the domain of interest, whether any data cleaning is required, and they can help in formulating initial research questions.
[Present a summary of the graph data before the exploratory search activities]
[My own example: a tag cloud per instance type, where the size of each tag is proportional to the node's degree]
[Could this be part of preparing the KG for Exploratory Search?]
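The tag-cloud idea from the notes above could be prototyped along these lines. The sketch assumes an in-memory triple list with one rdf:type per node, and it scales font sizes linearly between two arbitrary bounds; `degrees`, `tag_cloud`, and the size limits are hypothetical choices, not something prescribed by the paper.

```python
from collections import Counter, defaultdict

# Toy KG; identifiers are illustrative only.
TRIPLES = [
    ("m1", "rdf:type", "Movie"),
    ("m2", "rdf:type", "Movie"),
    ("p1", "rdf:type", "Person"),
    ("p2", "rdf:type", "Person"),
    ("m1", "directedBy", "p1"),
    ("m2", "directedBy", "p1"),
    ("m1", "starring", "p2"),
]

def degrees(triples, type_predicate="rdf:type"):
    """Count incoming plus outgoing edges per node, ignoring type triples."""
    deg = Counter()
    for s, p, o in triples:
        if p == type_predicate:
            continue
        deg[s] += 1
        deg[o] += 1
    return deg

def tag_cloud(triples, min_size=10, max_size=40, type_predicate="rdf:type"):
    """Return {class: {node: font size}} with size proportional to node degree."""
    deg = degrees(triples, type_predicate)
    node_type = {s: o for s, p, o in triples if p == type_predicate}
    top = max(deg.values(), default=1)
    cloud = defaultdict(dict)
    for node, cls in node_type.items():
        size = min_size + (max_size - min_size) * deg[node] / top
        cloud[cls][node] = round(size)
    return cloud

cloud = tag_cloud(TRIPLES)
```

Computing these sizes once, ahead of any search session, is what would make this a KG preparation step rather than part of the search itself.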
2.2 Exploratory Search
... exploratory search instead delves into the data itself with the goal of retrieving specific portions of it that are relevant to the current information need (e.g., zooming-in to a subset of items of interest). Yet, contrary to traditional search, where the desired result is well-defined, exploratory search usually starts from a tentative query that hopefully leads to answers that are at least partially relevant and that provide cues for the next queries, ....
[An iterative process of successive refinements while one learns more about the domain, the KG, and the information need itself]
Hence, exploratory queries change the traditional semantics of the search input: instead of a strict prescription of the desired result set, they provide a hint of what is relevant. This shift in semantics has led to (i) a number of methods following the search-by-example paradigm and (ii) methods and interfaces that help the user formalize their intent into a domain-specific query construct that is usually an expansion of the input .... Both have the common goal to overcome one of the main challenges in enabling exploratory search: to avoid complicated declarative languages (e.g., SPARQL) and at the same time retain the flexibility and expressiveness of such languages.
[The interface and the approach aim to avoid the use of a KG-specific query language]
[But this is not a concern for my proposal, since I will assume that "someone" has already converted a natural-language query, or some other form of input, into a function/API call]
Search-by-example methods receive as input a set of example members of the answer set .... The search system then infers the entire answer set based on the given examples and any additional information provided by the underlying database ... . This allows retrieving a set of entities similar to some entities of interest ..., or complex structures matching some relevant structure known by the user.
[Could compute similarity between embeddings of the example and the contents of the KG]
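The embedding-based idea in the note above might look like the following sketch: the example entities are averaged into a query vector, and the remaining entities are ranked by cosine similarity to it. The embedding vectors are made up for illustration, and `search_by_example` is a hypothetical helper; in practice the vectors would come from a KG embedding model, which is outside the scope of this sketch.

```python
import math

# Hypothetical 3-dimensional entity embeddings (values invented for illustration).
EMBEDDINGS = {
    "Inception": [0.9, 0.1, 0.0],
    "Interstellar": [0.8, 0.2, 0.1],
    "Titanic": [0.1, 0.9, 0.2],
    "Nolan": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def search_by_example(examples, embeddings, k=2):
    """Rank entities not in `examples` by similarity to the examples' centroid."""
    query = centroid([embeddings[e] for e in examples])
    scored = [
        (cosine(query, vec), name)
        for name, vec in embeddings.items()
        if name not in examples
    ]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

results = search_by_example(["Inception"], EMBEDDINGS, k=2)
```

With these toy vectors, the entity closest to the single example "Inception" is "Interstellar".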
Node and Entity search allows for automatic completion of a set of seed entities (persons, organizations, places). Example-based graph search works similarly to node search but requires a full example (a subgraph or a tuple) to be provided as input. For instance, it is possible to support by-example reverse engineering of (SPARQL) queries from example tuples ....
[The input examples are subgraphs, tuples, or SPARQL queries]
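A heavily simplified sketch of by-example query reverse engineering is given below. It handles only single-entity examples (not full tuples or subgraphs): it collects the (predicate, object) pairs shared by all examples, renders them as a SPARQL SELECT query, and evaluates the same constraints locally. The `http://example.org/` namespace, the data, and all function names are assumptions made for illustration; the methods surveyed in the paper are considerably more sophisticated.

```python
EX = "http://example.org/"  # placeholder namespace, assumed for illustration

# Toy KG; identifiers are illustrative only.
TRIPLES = [
    ("Inception", "type", "Movie"),
    ("Inception", "director", "Nolan"),
    ("Inception", "genre", "SciFi"),
    ("Dunkirk", "type", "Movie"),
    ("Dunkirk", "director", "Nolan"),
    ("Dunkirk", "genre", "War"),
    ("Interstellar", "type", "Movie"),
    ("Interstellar", "director", "Nolan"),
    ("Interstellar", "genre", "SciFi"),
    ("Titanic", "type", "Movie"),
    ("Titanic", "director", "Cameron"),
]

def shared_constraints(triples, examples):
    """Return the (predicate, object) pairs that hold for every example entity."""
    per_example = [{(p, o) for s, p, o in triples if s == e} for e in examples]
    return sorted(set.intersection(*per_example)) if per_example else []

def to_sparql(constraints):
    """Render the shared constraints as a SPARQL SELECT query string."""
    patterns = " ".join(f"?x <{EX}{p}> <{EX}{o}> ." for p, o in constraints)
    return f"SELECT ?x WHERE {{ {patterns} }}"

def evaluate(triples, constraints):
    """Evaluate the constraints locally: subjects satisfying all of them."""
    facts = set(triples)
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects if all((s, p, o) in facts for p, o in constraints)
    )

constraints = shared_constraints(TRIPLES, ["Inception", "Dunkirk"])
query = to_sparql(constraints)
answers = evaluate(TRIPLES, constraints)
```

Here the two examples share only their type and director, so the inferred query generalizes to every movie by the same director, including one the user did not mention.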
To further facilitate the user to formulate a query written in an unfamiliar language and over an unfamiliar dataset, different studies have proposed query suggestion and refinement techniques ... and graphical user interfaces .... Yet, while by-example methods allow for rather vague information needs, query formulation interfaces are designed to help users with a clear information need in writing (relatively simple) queries about specific entities.
[Suggestion and reformulation are better suited to cases where the user has a clearer view of the information need]
Therefore, exploratory search approaches are particularly useful in the later stages of the exploration since they support the user in identifying specific entities, relationships, and structures of interest. They help in answering more fine-grained and specialized information needs but still take into account that the user is not familiar with the dataset. For this reason, particular focus is given to approximate methods ... and to query suggestion and query refinement techniques.
[Methods that use approximate queries would be better suited to exploratory search]
2.3 Exploratory Analytics
Exploratory Analytics is an iterative, integrated process of data discovery and analytical querying on data which is not well known to the user, e.g., external data. The ability to support analytical workflows for rich KGs has recently [2014, 2015, recent?] received increased attention .... The idea is to provide functionalities typical of relational data warehouses, i.e., multi-dimensional analysis over knowledge graphs by describing multi-dimensional and statistical data within the KG model ... All these approaches enable a similar approach: to obtain analytical insights on RDF graphs by means of “views” and aggregation operations.
[Create views and perform aggregation operations]
[Views and aggregations could also be based on context]
[Representing the context key/value pairs of the statements, and identifying which contexts these qualifiers belong to, is essential for aggregating by context and for building views by context: the metaphor of a window that circumscribes the statements]
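The context-based views and aggregations suggested in the notes above can be sketched as follows. The sketch assumes each statement carries its context as a dictionary of qualifier key/value pairs; the statement layout, the fictitious data, and the helper names `view` and `aggregate` are all hypothetical.

```python
from collections import defaultdict

# Fictitious statements; "ctx" holds the context qualifiers as key/value pairs.
STATEMENTS = [
    {"s": "CountryA", "p": "population", "o": 100, "ctx": {"year": 2010, "source": "census"}},
    {"s": "CountryA", "p": "population", "o": 120, "ctx": {"year": 2020, "source": "census"}},
    {"s": "CountryB", "p": "population", "o": 50, "ctx": {"year": 2010, "source": "census"}},
    {"s": "CountryB", "p": "population", "o": 40, "ctx": {"year": 2020, "source": "estimate"}},
]

def view(statements, **context):
    """A 'window' over the KG: keep statements matching every given qualifier."""
    return [
        st for st in statements
        if all(st["ctx"].get(key) == value for key, value in context.items())
    ]

def aggregate(statements, qualifier, func=sum):
    """Group statement values by one context qualifier and aggregate each group."""
    groups = defaultdict(list)
    for st in statements:
        groups[st["ctx"].get(qualifier)].append(st["o"])
    return {key: func(values) for key, values in groups.items()}

by_year = aggregate(STATEMENTS, "year")
census_by_year = aggregate(view(STATEMENTS, source="census"), "year")
```

Composing `view` and `aggregate` mirrors the window metaphor: the view circumscribes the statements of one context, and the aggregation then operates only inside that window.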
Finally, outlier detection approaches identify elements that are interesting because they are very different from the rest of the elements ...
[Outlier detection]
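As one possible reading of "very different from the rest", the sketch below flags attribute values using the modified z-score based on the median absolute deviation (MAD). This is a generic statistical technique swapped in for illustration, not a method from the paper; the data, the function name `mad_outliers`, and the 3.5 threshold (a common rule of thumb) are assumptions.

```python
from statistics import median

# Fictitious release years per movie node.
RELEASE_YEARS = {"m1": 1999, "m2": 2001, "m3": 2003, "m4": 2005, "m5": 1895}

def mad_outliers(values, threshold=3.5):
    """Return the names whose modified z-score exceeds `threshold`.

    modified z = 0.6745 * |x - median| / MAD, where MAD is the median of the
    absolute deviations from the median. Returns [] when MAD is zero.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []
    return sorted(
        name for name, v in values.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )

outliers = mad_outliers(RELEASE_YEARS)
```

The same function could be applied to structural measures such as node degree instead of attribute values.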
In conclusion, exploratory analytics is effective to enable users to identify high-level details w.r.t. facets of the data tailored to specific user needs. In contrast, data summarization approaches are agnostic of the user’s information need and only provide a global overview of the data. On the other hand, exploratory search digs into specific data items (entities and relationships) but these searches return very large result sets instead of a more useful aggregate analysis identifying trends and common patterns. Hence, exploratory analytics techniques are a middle ground, where specific summarization methods are applied over large results of an exploratory search. Yet, current approaches usually mimic the same operators proposed for relational data, providing no graph-centric analyses. Moreover, in these approaches, either the user is required to be familiar with the (complex) query language, or the system is not able to accept any user input to customize the output. Thus, analytical approaches for KGs are currently missing the ability to reverse engineer analytical queries as well as to suggest appropriate query refinements based on user interactions.
3. FUTURE DIRECTIONS
Analyzing the state of the art (Figure 2), we identify 3 important research avenues for KG exploration: (1) example-based exploratory analytics methods, (2) enhanced interactivity and personalization through machine learning and active learning, and (3) KG exploration applied to the exploration of other datasets, e.g., documents and semantic data lakes.
Exploratory analytics should combine techniques from both summarization and exploratory search: on the one side, similar to the exploratory search case, the user can identify a (usually large) set of elements of interest. Then, these elements are not presented verbatim to the user, instead, data summarization and profiling techniques should be employed to extract context-specific insights
[Apply exploratory search and then summarize the resulting subgraph using the summarization techniques]
Enhanced interactivity and personalization. Data exploration in general, and KG exploration in particular, is a process that cannot be disconnected from the specific user need. The two core tenets are interactivity and personalization. The two are tightly connected: during interaction with the user, the system can improve and learn more about the user needs to enable personalization. Machine learning and active search are a promising ground to learn user preferences from interactions and adapt to the user needs.
[Take the user's query history into account. It would be stateful rather than stateless]
Cross-domain applications. In recent years, KGs have proven highly effective to model heterogeneous data, by mapping entities and concepts that appear in different repositories to equivalent nodes in a KG. This characteristic facilitates data exchange through the integration of different datasets and data models within large and unstructured repositories of data, e.g., data lakes, in this case, denoted semantic data lakes. As such, a KG exploration process is paramount for cross-model and cross-domain exploration workflows. KGs also simplify and represent semantic connections between Web documents .... Hence, KG exploration techniques could assist the exploration of both Linked Open Data as well as Web documents seamlessly.
[KG + Documents]
ČEBIRIĆ, Š., GOASDOUÉ, F., KONDYLAKIS, H., KOTZINOS, D., MANOLESCU, I., TROULLINOU, G., AND ZNEIKA, M. 2019. Summarizing semantic graphs: a survey. The VLDB Journal 28, 3, 295–327.
LISSANDRINI, M., MOTTIN, D., PALPANAS, T., AND VELEGRAKIS, Y. 2020. Graph-query suggestions for knowledge graph exploration. In The Web Conference 2020. ACM, New York, USA, 2549–2555.
KGTK performs data profiling with degree, PageRank, and HITS, as well as top relations:
https://kgtk.readthedocs.io/en/latest/analysis/graph_statistics/