Vague Queries

https://youtu.be/7tmqQ-y-hNQ

Vague queries: queries that accept results approximating what is being sought

https://dl.acm.org/doi/pdf/10.1145/45945.48027

Uses distance and similarity metrics to compute the result

1988

Requirements

  • Conceptual simplicity
  • Adaptability
  • Externality to the DBMS

Extends the relational model with a single concept: a similarity metric in the query language, exposed as a new comparator

The user chooses which similarity / distance metric to use

Externality first, for later incorporation into the DBMS (as Daniel commented, this is usual in databases)

Interactive: asks the user which interpretation of "similar" applies, which criterion should order the result, and whether to relax the query further (when there is no result)

It is not natural language; it is the database language (SQL), extended

Extracted from the text

A specific query establishes a rigid qualification and is concerned only with data that match it precisely. A vague query establishes a target qualification and is concerned also with data that are close to this target.

To determine similarity between data values we introduce the notion of distance. Each database domain is provided with a definition of distance between its values called data metric.

Often, distances between values of a given domain may be measured according to various metrics. 
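The contrast between a specific and a vague qualification, and the effect of choosing a different metric for the same domain, can be sketched as follows. This is only an illustration: the `similar_to` helper, the `radius` threshold, the two metrics, and the hotel data are assumptions made for the example, not syntax or definitions taken from the paper.

```python
from typing import Callable, Dict, List

Metric = Callable[[float, float], float]

def absolute_distance(a: float, b: float) -> float:
    """Hypothetical metric 1: absolute difference."""
    return abs(a - b)

def relative_distance(a: float, b: float) -> float:
    """Hypothetical metric 2: difference relative to the larger value."""
    return abs(a - b) / max(abs(a), abs(b), 1e-9)

def similar_to(rows: List[Dict], attribute: str, target: float,
               metric: Metric, radius: float) -> List[Dict]:
    """Keep rows whose attribute lies within `radius` of `target`, nearest first."""
    scored = [(metric(row[attribute], target), row) for row in rows]
    close = [pair for pair in scored if pair[0] <= radius]
    close.sort(key=lambda pair: pair[0])  # stable sort: ties keep input order
    return [row for _, row in close]

# Illustrative data only.
hotels = [
    {"name": "A", "price": 80.0},
    {"name": "B", "price": 100.0},
    {"name": "C", "price": 130.0},
    {"name": "D", "price": 95.0},
]

# Specific query: rigid qualification, exact match only.
specific = [h for h in hotels if h["price"] == 90.0]

# Vague query: target qualification, answered under two different metrics.
by_absolute = similar_to(hotels, "price", 90.0, absolute_distance, radius=10.0)
by_relative = similar_to(hotels, "price", 90.0, relative_distance, radius=0.15)
```

With this data the specific query returns nothing, while both vague queries return hotels A, B and D; the two metrics rank A and B differently, which is why the choice of metric is left to the user.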

1.3.4 Query Constructors, Browsers, and Cooperative Interfaces.

In our model, vague queries are distinct from specific queries only by their “soft” selection qualification. Thus, the same level of expertise is required to issue specific or vague queries.

Another kind of vague request occurs when the user does not possess the knowledge required for formulating a proper query (this may be because the user is not familiar with the data model, the query language, the organization of the particular database, or because the user does not have a well-defined retrieval goal). This problem has been approached in two ways.
(1) Interactive query constructors help users crystalize their requests. A notable example is RABBIT [26], which applies a paradigm of repetitive reformulation of an initial goal. At each iteration in the construction process the user is presented with the answer to the current query. Having observed the answer, the user can then refine the query by critiquing it in one of several ways available.
(2) Browsers, such as TIMBER [22], SDMS [7], BAROQUE [15] or KIVIEW [18], provide users with a variety of features for exploratory searches. Often, the information is represented as a network, and the retrieval process is iterative. At each iteration the user is presented with information that corresponds to the current location on the network. The user can then issue a new command to advance the search in a particular direction. Elements of browsers are also present in the ME system [9]. The ME database is a network of files connected through links which represent weighted terms. A retrieval request is a set of terms, and a spreading activation process is used to match the files that are most relevant. As the user changes the terms of the query in one terminal window, the window that shows the matched files is updated dynamically.

 A user interface to databases that is capable of handling vague requests appears to be more “intelligent.” This is because answering questions with information that is only close to what was requested, or somehow related to it, is a common feature of human interaction. Such interaction is known as cooperative behavior, and there has been much focus on how to improve man-machine interaction by emulating such behavior through various techniques. Various cooperative interfaces (including those mentioned above) are discussed in [11]. Not surprisingly, this added intelligence is made possible by including additional semantic information in the database, namely distances.

In particular, we distinguish between attributes and domains. An attribute is a named column in a relation. A domain is a set of values (possibly infinite). Each attribute is associated with one domain. The domain contains all the values that may appear in that attribute.

Often, database domains are numerical, and the absolute value distance is satisfactory. Sometimes, although a domain is nonnumerical, its values are strictly ordered (for example, a domain RANK with values such as Excellent, Good, Fair, and Poor). Such domains are easily metricized by mapping the domain onto a range of integers (while preserving the order), using the absolute value metric to derive distances, and then storing the distances in a table. Metrics can also be derived from domain partitions. Assume that a domain can be partitioned into a collection of disjoint sets called clusters, each containing values that are judged to be similar. A tabular metric can then be defined as follows: All intracluster distances (distances between two values that are in the same cluster) are set to 0, and all intercluster distances (distances between two values in different clusters) are set to 1. This metric can be refined if a hierarchical partitioning of the domain is available (i.e., clusters are possibly partitioned into further subclusters). The metric is derived from the clusters at the bottom level (level 0). Again, the distance between two values that are in the same cluster is set to 0. The distance between two values that are not in the same cluster is set to the level of the cluster that contains both.
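The three tabular metrics described in this excerpt can be sketched in a few lines. The function names, the cuisine domain, and the cluster labels are hypothetical; for the hierarchical case the sketch assumes that "the cluster that contains both" means the lowest-level cluster shared by the two values.

```python
from itertools import product

def ordered_metric(values_in_order):
    """Strictly ordered domain: map values to integers, store |i - j| in a table."""
    position = {value: index for index, value in enumerate(values_in_order)}
    return {(a, b): abs(position[a] - position[b])
            for a, b in product(values_in_order, repeat=2)}

def cluster_metric(clusters):
    """Flat partition: 0 inside a cluster, 1 between different clusters."""
    cluster_of = {value: i for i, cluster in enumerate(clusters) for value in cluster}
    return {(a, b): 0 if cluster_of[a] == cluster_of[b] else 1
            for a, b in product(cluster_of, repeat=2)}

def hierarchical_metric(paths):
    """Hierarchical partition.

    `paths` maps each value to its cluster labels from level 0 (bottom) upward.
    The distance is the level of the lowest cluster shared by both values
    (an assumption about how to read the excerpt).
    """
    def distance(a, b):
        for level, (ca, cb) in enumerate(zip(paths[a], paths[b])):
            if ca == cb:
                return level
        raise ValueError("values share no cluster")
    return {(a, b): distance(a, b) for a, b in product(paths, repeat=2)}

# RANK comes from the excerpt; the cuisine domain is illustrative only.
rank = ordered_metric(["Excellent", "Good", "Fair", "Poor"])
flat = cluster_metric([{"Italian", "Spanish"}, {"Thai"}])
tree = hierarchical_metric({
    "Italian": ["mediterranean", "european", "any"],
    "Spanish": ["mediterranean", "european", "any"],
    "German": ["central-european", "european", "any"],
    "Thai": ["southeast-asian", "asian", "any"],
})
```

Here `rank[("Excellent", "Poor")]` is 3, `flat[("Italian", "Thai")]` is 1, and `tree` separates German (distance 1 from Italian) from Thai (distance 2 from Italian), a distinction the flat partition cannot express.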

The extensions to the relational data model that have been described in this article should become an integral part of the database system. However, it is also possible to provide similar functionalities by constructing a simple system “on top” of existing database systems. The advantage of this approach is that it can also be implemented in cases in which the database system in use cannot be modified. 
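One way to picture the "on top" approach is a thin layer that rewrites the vague condition into a specific range query, lets the unmodified DBMS execute it, and ranks the answer by distance outside the DBMS. The sketch below uses Python's standard `sqlite3` module; the `paper` table, the `vague_select_year` function, and the radius parameter are assumptions for illustration and do not reproduce the paper's architecture.

```python
import sqlite3

def vague_select_year(conn, target_year, radius):
    """Answer 'year similar-to target_year' without modifying the DBMS."""
    low, high = target_year - radius, target_year + radius
    # Step 1: the DBMS only sees an ordinary, specific SQL query.
    rows = conn.execute(
        "SELECT title, year FROM paper WHERE year BETWEEN ? AND ?",
        (low, high),
    ).fetchall()
    # Step 2: the external layer applies the metric and orders the answer.
    ranked = sorted(rows, key=lambda row: abs(row[1] - target_year))
    return [title for title, _ in ranked]

# Illustrative in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paper (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO paper VALUES (?, ?)",
    [("P", 1985), ("Q", 1988), ("R", 1990), ("S", 1987)],
)
result = vague_select_year(conn, 1988, 2)
```

The range rewrite works here because the metric is the absolute value distance; a tabular metric would need the set of sufficiently close values to be looked up first and passed to the DBMS as an `IN` list.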

In recent years (!) there has been much interest in issues regarding databases with incomplete information (for a review of this topic see [13, chap. 12]). Incomplete information in metricized databases involves two new issues, first, how the availability of distances affects the conventional approaches to incomplete information, and, second, how to deal with incompleteness of the distance information itself.

1. BOLC, L., AND JARKE, M., Eds. Cooperative Interfaces to Information Systems. Topics in Information Systems, Springer-Verlag, Berlin, West Germany, 1986.

11. KAPLAN, S. J. Cooperative responses from a portable natural language query system. Artif. Intell. 19, 2 (Oct. 1982), 165-187.

13. MAIER, D. The Theory of Relational Databases. Computer Science Press, Rockville, Md., 1983.

 =======================================================================

https://www.vldb.org/conf/1990/P696.PDF

1990

Uses VAGUE

4 variations: Crisp Data, Crisp Query and Crisp Result .... Fuzzy Data, Fuzzy Query and Fuzzy Result

VQL language: more concepts added to SQL

Has an architecture for extending a DBMS

Extracted from the text

Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objects retrieved, the ranking for the actual query as well as the overall retrieval quality of the system can be further improved.

On the other hand, handling of user requests that cannot be expressed in two-valued logic is difficult with current DBMSs.

In all these applications, the query languages of current DBMSs offer little support. Mostly, users are forced to submit a series of queries in order to retrieve some objects that are possible solutions to their problem. Moreover, they often cannot be sure if they tried the query that retrieves the optimum solution.

For a vague query, a system based on our approach first will yield an initial ranking of possible answers. Then the user is asked to give relevance judgements for some of the answers, that is, he must decide whether an answer is an acceptable solution to his problem. From this relevance feedback data, the system can derive an improved ranking of the answers for the current request.

A second kind of weighting (called query condition weighting below) refers to the different criteria specified by the user, which may not be of equal importance for him.
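The effect of query condition weighting on the ranking can be illustrated with a simple weighted average of per-condition scores. This is not the probabilistic model of the paper, only a stand-in that shows how unequal importance changes the order of the answers; the scoring functions, the weights, and the flat data are all assumptions.

```python
def rank(objects, conditions):
    """Rank objects by a weighted average of condition scores.

    `conditions` is a list of (weight, score_fn) pairs, where score_fn(obj)
    returns a value in [0, 1] (1 = condition fully satisfied).
    """
    total_weight = sum(weight for weight, _ in conditions)

    def score(obj):
        return sum(weight * fn(obj) for weight, fn in conditions) / total_weight

    return sorted(objects, key=score, reverse=True)

# Hypothetical vague conditions: "rent around 1000" and "close to the centre".
def cheap(flat):
    return max(0.0, 1 - abs(flat["rent"] - 1000) / 500)

def central(flat):
    return max(0.0, 1 - flat["km"] / 10)

flats = [
    {"id": "x", "rent": 1000, "km": 8},
    {"id": "y", "rent": 1200, "km": 1},
    {"id": "z", "rent": 1500, "km": 0},
]

rent_first = rank(flats, [(3, cheap), (1, central)])
centre_first = rank(flats, [(1, cheap), (3, central)])
```

When rent carries three times the weight of location the order is x, y, z; swapping the weights gives y, z, x, even though the data and the two conditions are unchanged.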

=====================================================================

Old concepts that may be useful for the proposal .....
