Data fusion by Luna Dong and Divesh Srivastava

Data fusion by Luna Dong and Divesh Srivastava - Paper Reading

Dong, X. L., Berti-Equille, L., & Srivastava, D. (2013). Data fusion: Resolving conflicts from multiple sources. Handbook of Data Quality: Research and Practice, 293–318. https://doi.org/10.1007/978-3-642-36257-6_13

[In my proposal we do not want to resolve conflicts at the KG layer. The upper layer that explores the KG (human or machine) can apply its own conflict-resolution policies, taking into account the context of the answers returned by the best-possible-answer approach]

Abstract. Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values.

... approach that finds true values from conflicting information when there are a large number of sources, among which some may copy from others.

In addition to enabling the availability of useful information, the Web has also eased the ability to publish and spread false information across multiple sources. Widespread availability of conflicting information (some true, some false) makes it hard to separate the wheat from the chaff. Simply using the information that is asserted by the largest number of data sources (i.e., naive voting) is clearly inadequate since biased (and even malicious) sources abound, and plagiarism (i.e., copying without proper attribution) between sources may be widespread. Data fusion aims at resolving conflicts from different sources and finding values that reflect the real world.
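To make the weakness of naive voting concrete, here is a minimal Python sketch. The source names, the values, and the `naive_vote` helper are all hypothetical illustrations, not taken from the paper. Two independent sources agree on one value, while three sources that copy from each other repeat a different one.

```python
from collections import Counter


def naive_vote(claims):
    """Return the value asserted by the largest number of sources.

    claims: dict mapping a source name to the value it provides
    for a single object. Ties are not handled in this sketch.
    """
    counts = Counter(claims.values())
    value, _ = counts.most_common(1)[0]
    return value


# Hypothetical data: S1 and S2 are independent; S4 and S5 copy S3.
claims = {
    "S1": "Publisher A",
    "S2": "Publisher A",
    "S3": "Publisher B",
    "S4": "Publisher B",  # copied from S3
    "S5": "Publisher B",  # copied from S3
}

winner = naive_vote(claims)
print(winner)  # Publisher B
```

The copied value wins three votes to two, even though only one source originated it. Voting counts assertions, not independent evidence.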

[Context allows claims that appear conflicting, controversial, or mutually inconsistent to be represented together, since Truth is Relative and not Absolute]

First, we often do not know a priori the trustworthiness of a source and that depends on how much of its provided data are correct, but the correctness of data, on the other hand, needs to be decided by considering the number and trustworthiness of the providers; thus, it is a chicken-and-egg problem.
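One common way to break this kind of circular dependency is to iterate: guess the trustworthiness, vote with it, re-estimate the trustworthiness from the result, and repeat until nothing changes. The sketch below is a deliberately simplified illustration of that loop and is not the model used in the paper. The function name `discover_truth`, the initial trust of 0.8, and the sample data are assumptions.

```python
from collections import defaultdict


def discover_truth(claims, rounds=10, initial_trust=0.8):
    """Alternate between voting on values and re-scoring sources.

    claims: dict mapping source -> dict mapping object -> claimed value.
    Returns (truth, trust): the chosen value per object and a score
    in [0, 1] per source.
    """
    trust = {source: initial_trust for source in claims}
    truth = {}
    for _ in range(rounds):
        # Step 1: trust-weighted vote for every object.
        scores = defaultdict(lambda: defaultdict(float))
        for source, values in claims.items():
            for obj, value in values.items():
                scores[obj][value] += trust[source]
        new_truth = {obj: max(vals, key=vals.get) for obj, vals in scores.items()}

        # Step 2: a source's trust is the fraction of its claims
        # that agree with the currently chosen values.
        for source, values in claims.items():
            if values:
                agree = sum(1 for o, v in values.items() if new_truth[o] == v)
                trust[source] = agree / len(values)

        if new_truth == truth:  # fixed point reached
            break
        truth = new_truth
    return truth, trust


# Hypothetical claims about three books.
claims = {
    "S1": {"book1": "A", "book2": "X", "book3": "P"},
    "S2": {"book1": "A", "book2": "X", "book3": "Q"},
    "S3": {"book1": "B", "book2": "Y", "book3": "Q"},
}
truth, trust = discover_truth(claims)
```

Note that this loop still treats sources as independent. Handling copying between sources, which the paper addresses, would need an additional step.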

[Provenance is one type of context, but there are also temporal and spatial context]

[Which other context dimensions are implicit when one searches for the Absolute Truth?]

2.1 Data Fusion

Among different values provided for an object, one correctly describes the real world and is true, and the rest are false. ... Note that this problem definition focuses on static information that does not evolve over time, such as authors and publishers of books, and we refer our readers to [8] for data fusion for evolving values.

[This adds one more contextual dimension: the Temporal one]

5 Related Work and Conclusions

Our work is closely related to Data Provenance, which has been a topic of research for a decade [4, 5]. Whereas research on data provenance is focused on how to represent and analyze available provenance information, our work on copy detection helps detect provenance and in particular copying relationships between dependent data sources.

[Provenance context is represented in the KG and can use ontologies]

8. Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. 2009. Truth discovery and copying detection in a dynamic world. Proc. VLDB Endow. 2, 1 (August 2009), 562–573. https://doi.org/10.14778/1687627.1687691

ABSTRACT
... When these data sources model a dynamically changing world (e.g., people’s contact information changes over time, restaurants open and go out of business), sources often provide out-of-date data. Errors can also creep into data when sources are updated often. Given out-of-date and erroneous data provided by different, possibly dependent, sources, it is challenging for data integration systems to provide the true values.

[Information contextualized in time could not be considered wrong if Truth is Relative. It is only seen as wrong or out of date because the Current Temporal context is implicitly the only context accepted for the Absolute Truth]

1. INTRODUCTION

Modern information management applications often require integrating data from a variety of data sources.

[KGs can also be built from several data sources. The construction approach may assume that these conflicts are resolved at KG creation time (a priori), or that potential conflicts coexist as long as the context of each piece of information is preserved. Only in the second case can resolution be left to the upper layer at query time (a posteriori), and only then can new resolution rules be added after the KG has been created]
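The a posteriori option can be sketched in Python as follows. This is only an illustration under assumed names: `Claim`, `ContextualKG`, and the two policies are hypothetical, and a real KG would use a graph store rather than a list. Every claim is kept together with its provenance and temporal context, and the caller chooses a resolution policy at query time.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    value: str
    source: str      # provenance context
    valid_year: int  # temporal context (yearly granularity is an assumption)


class ContextualKG:
    def __init__(self):
        self.claims = []

    def add(self, claim):
        # No conflict resolution at creation time: every claim is kept.
        self.claims.append(claim)

    def query(self, subject, predicate, policy=None):
        matches = [c for c in self.claims
                   if c.subject == subject and c.predicate == predicate]
        if policy is None:
            return matches  # all claims, with their context
        return policy(matches) if matches else None


# Two interchangeable policies owned by the upper layer.
def most_recent(claims):
    return max(claims, key=lambda c: c.valid_year).value


def majority(claims):
    return Counter(c.value for c in claims).most_common(1)[0][0]


kg = ContextualKG()
kg.add(Claim("Org", "chair", "Alice", "S1", 2015))
kg.add(Claim("Org", "chair", "Alice", "S2", 2016))
kg.add(Claim("Org", "chair", "Bob", "S3", 2021))

all_claims = kg.query("Org", "chair")
latest = kg.query("Org", "chair", policy=most_recent)
popular = kg.query("Org", "chair", policy=majority)
```

Here `most_recent` and `majority` give different answers over the same stored claims, and a third policy could be added later without rebuilding the KG.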

First, true values can evolve over time and in many applications we are interested in the whole history or a fragment of the history of true values for particular items (e.g., a person’s addresses in the past five years, the history of a customer’s billing information, and the previous chairs of an organization).

....

7. RELATED WORK

There are three bodies of work related to our research: truth discovery, copying detection and data freshness. Recent work on truth discovery considers a snapshot of data (surveyed in [6]). We consider discovering the whole life span of an object from history of source updates and we use more fine-grained source-quality measures: coverage, exactness, and freshness.

[These metrics could be included in the provenance context as attributes of the data sources at KG creation time and returned with the answers, so that the trust layer can apply whatever policies it sees fit]
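A minimal Python sketch of this idea follows. The metric names (coverage, exactness, freshness) come from the paper, but the numeric values, the `SourceQuality` and `Answer` types, and the `pick_by` policy are hypothetical. How the metrics are actually computed is left to the paper and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceQuality:
    # Stored as attributes of the source when the KG is created.
    coverage: float
    exactness: float
    freshness: float


@dataclass(frozen=True)
class Answer:
    value: str
    source: str
    quality: SourceQuality  # returned together with the answer


def pick_by(answers, metric):
    """Trust-layer policy: prefer the answer whose source scores
    highest on the chosen metric."""
    return max(answers, key=lambda a: getattr(a.quality, metric)).value


# Hypothetical answers for one person's address.
answers = [
    Answer("12 Main St", "S1", SourceQuality(0.9, 0.95, 0.4)),
    Answer("34 Oak Ave", "S2", SourceQuality(0.6, 0.70, 0.9)),
]

by_freshness = pick_by(answers, "freshness")
by_exactness = pick_by(answers, "exactness")
```

The same set of answers yields "34 Oak Ave" when the consumer values freshness and "12 Main St" when it values exactness, so the choice stays with the trust layer rather than the KG.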

8. CONCLUSIONS

For future work, one direction is to apply our techniques in Web 2.0 applications to identify sources or users that are trustable.

[Classify social-network users/profiles as trustworthy or not]

Another direction is to optimize query answering in data integration with knowledge of source quality and dependence.

[Here the system resolves conflicts according to its own reliability criteria in order to give the answer, but does that cover every case?]

Other references

1. D. Artz and Y. Gil. A survey of trust in computer science and the semantic web. Journal of Web Semantics, 5(2), 2010.

11. X. Li, X. L. Dong, K. B. Lyons, W. Meng, and D. Srivastava. Truth finding on the deep web: Is the problem solved? PVLDB, 6(2), 2013.

Veronica's PhD Research
