
Data fusion by Luna Dong and Divesh Srivastava - Paper Reading

Dong, X. L., Berti-Equille, L., & Srivastava, D. (2013). Data fusion: Resolving conflicts from multiple sources. Handbook of Data Quality: Research and Practice, 293–318. https://doi.org/10.1007/978-3-642-36257-6_13

[In my proposal we do not want to resolve conflicts at the KG layer. The upper layer that explores the KG (human or machine) can apply its own conflict-resolution policies, taking into account the context of the answers returned by the best-possible-answer approach.]

Abstract. Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values.

... approach that finds true values from conflicting information when there are a large number of sources, among which some may copy from others. 

In addition to enabling the availability of useful information, the Web has also eased the ability to publish and spread false information across multiple sources. Widespread availability of conflicting information (some true, some false) makes it hard to separate the wheat from the chaff. Simply using the information that is asserted by the largest number of data sources (i.e., naive voting) is clearly inadequate since biased (and even malicious) sources abound, and plagiarism (i.e., copying without proper attribution) between sources may be widespread. Data fusion aims at resolving conflicts from different sources and find values that reflect the real world.

[Context allows claims that appear conflicting, controversial, or mutually inconsistent to be represented together, since Truth is Relative, not Absolute.]

First, we often do not know a priori the trustworthiness of a source and that depends on how much of its provided data are correct, but the correctness of data, on the other hand, needs to be decided by considering the number and trustworthiness of the providers; thus, it is a chicken-and-egg problem.
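
The chicken-and-egg loop described above can be illustrated with a small fixed-point iteration. This is only a minimal sketch under my own assumptions, not the authors' algorithm: it ignores copying between sources, and the names `naive_vote`, `iterative_truth_discovery`, the initial trust of 0.8, and the toy claims are all hypothetical. Each round scores a value by the summed trust of its providers, then re-estimates each source's trust as the fraction of its claims that match the current winners.

```python
from collections import Counter, defaultdict


def naive_vote(claims):
    """Pick, for each object, the value asserted by the most sources."""
    counts = defaultdict(Counter)
    for _, obj, val in claims:
        counts[obj][val] += 1
    return {obj: c.most_common(1)[0][0] for obj, c in counts.items()}


def iterative_truth_discovery(claims, rounds=20, initial_trust=0.8):
    """Alternate between choosing values and re-estimating source trust.

    claims: list of (source, object, value) tuples.
    Returns (truth, trust) once both stop changing or `rounds` is reached.
    """
    sources = {s for s, _, _ in claims}
    trust = {s: initial_trust for s in sources}  # assumed starting point
    truth = {}
    for _ in range(rounds):
        # Step 1: score each value by the summed trust of its providers.
        scores = defaultdict(lambda: defaultdict(float))
        for s, obj, val in claims:
            scores[obj][val] += trust[s]
        # sorted() makes tie-breaking deterministic (alphabetical).
        new_truth = {obj: max(sorted(vals), key=vals.get)
                     for obj, vals in scores.items()}
        # Step 2: trust = fraction of a source's claims matching the winners.
        correct, total = defaultdict(int), defaultdict(int)
        for s, obj, val in claims:
            total[s] += 1
            correct[s] += int(new_truth[obj] == val)
        new_trust = {s: correct[s] / total[s] for s in sources}
        if new_truth == truth and new_trust == trust:
            break
        truth, trust = new_truth, new_trust
    return truth, trust


# Toy data: s1 and s4 are consistent; s2 and s3 agree only on o1.
claims = [
    ("s1", "o1", "A"), ("s2", "o1", "B"), ("s3", "o1", "B"),
    ("s1", "o2", "A"), ("s4", "o2", "A"), ("s2", "o2", "B"), ("s3", "o2", "C"),
    ("s1", "o3", "A"), ("s4", "o3", "A"), ("s2", "o3", "B"), ("s3", "o3", "C"),
    ("s1", "o4", "A"), ("s4", "o4", "A"), ("s2", "o4", "B"), ("s3", "o4", "C"),
]

voted = naive_vote(claims)
truth, trust = iterative_truth_discovery(claims)
print(voted["o1"], truth["o1"], trust)
```

On this toy data naive voting picks "B" for o1 (two votes against one), while the iteration settles on "A" because s2 and s3 lose trust on the other objects. Note that the result is still a single "winner" per object, which is exactly the kind of resolution I want to leave to the upper layer.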

[Provenance is one type of context, but there are also temporal and spatial context.]

[Which other context dimensions are implicit when one searches for Absolute Truth?]

2.1 Data Fusion

Among different values provided for an object, one correctly describes the real world and is true, and the rest are false. ... Note that this problem definition focuses on static information that does not evolve over time, such as authors and publishers of books, and we refer our readers to [8] for data fusion for evolving values.

[This brings in one more contextual dimension: the Temporal one.]

5 Related Work and Conclusions

Our work is closely related to Data Provenance, which has been a topic of research for a decade [4, 5]. Whereas research on data provenance is focused on how to represent and analyze available provenance information, our work on copy detection helps detect provenance and in particular copying relationships between dependent data sources.

[Provenance context is represented in the KG and can use ontologies.]

8. Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. 2009. Truth discovery and copying detection in a dynamic world. Proc. VLDB Endow. 2, 1 (August 2009), 562–573. https://doi.org/10.14778/1687627.1687691

ABSTRACT
... When these data sources model a dynamically changing world (e.g., people’s contact information changes over time, restaurants open and go out of business), sources often provide out-of-date data. Errors can also creep into data when sources are updated often. Given out-of-date and erroneous data provided by different, possibly dependent, sources, it is challenging for data integration systems to provide the true values.

[Information contextualized in time could not be considered wrong if Truth is Relative. It is only seen as wrong or out of date because the Current Temporal context is implicitly the only context accepted for Absolute Truth.]

1. INTRODUCTION

Modern information management applications often require integrating data from a variety of data sources.

[KGs can also be built from multiple data sources. The construction approach can assume either that these conflicts are resolved when the KG is created (a priori) or that these potential conflicts coexist, as long as the context of each piece of information is preserved. Only in the second case can resolution be left to the upper layer at query time (a posteriori), and only then can new resolution rules be added after the KG has been created.]
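
To make the a posteriori option concrete, here is a minimal sketch of a store that keeps conflicting claims side by side with their context and lets the caller pass a resolution policy at query time. Everything in it is an assumption made for illustration: the `Claim` and `ContextualKG` classes, the `valid_on` policy, and the sample data are hypothetical and do not come from the papers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    value: str
    source: str                       # provenance context
    valid_from: date                  # temporal context (start)
    valid_to: Optional[date] = None   # None means no end date was recorded


# A policy is any function that filters or reorders a list of claims.
Policy = Callable[[List[Claim]], List[Claim]]


class ContextualKG:
    """Stores every claim; never resolves conflicts at insertion time."""

    def __init__(self) -> None:
        self._claims: List[Claim] = []

    def add(self, claim: Claim) -> None:
        self._claims.append(claim)

    def query(self, subject: str, predicate: str,
              policy: Optional[Policy] = None) -> List[Claim]:
        matches = [c for c in self._claims
                   if c.subject == subject and c.predicate == predicate]
        # Resolution, if any, is applied by the caller's policy.
        return policy(matches) if policy is not None else matches


def valid_on(day: date) -> Policy:
    """Policy factory: keep only claims whose interval contains `day`."""
    def policy(claims: List[Claim]) -> List[Claim]:
        return [c for c in claims
                if c.valid_from <= day
                and (c.valid_to is None or day <= c.valid_to)]
    return policy


kg = ContextualKG()
kg.add(Claim("alice", "livesIn", "Lisbon", "registry",
             date(2015, 1, 1), date(2019, 12, 31)))
kg.add(Claim("alice", "livesIn", "Porto", "social_profile",
             date(2020, 1, 1)))

all_claims = kg.query("alice", "livesIn")
in_2018 = kg.query("alice", "livesIn", valid_on(date(2018, 6, 1)))
in_2021 = kg.query("alice", "livesIn", valid_on(date(2021, 6, 1)))
print([c.value for c in all_claims],
      [c.value for c in in_2018],
      [c.value for c in in_2021])
```

Because a policy is just a function, new resolution rules can be written after the KG is built without touching the stored claims, which is the property the second construction approach depends on.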

First, true values can evolve over time and in many applications we are interested in the whole history or a fragment of the history of true values for particular items (e.g., a person’s addresses in the past five years, the history of a customer’s billing information, and the previous chairs of an organization).

....

7. RELATED WORK

There are three bodies of work related to our research: truth discovery, copying detection and data freshness. Recent work on truth discovery considers a snapshot of data (surveyed in [6]). We consider discovering the whole life span of an object from history of source updates and we use more fine-grained source-quality measures: coverage, exactness, and freshness.

[These metrics could be included in the provenance context as attributes of the data sources when the KG is created, and returned with the answers so that the trust layer can apply whatever policies it finds appropriate.]
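
A rough sketch of that idea follows, with loud caveats: the paper defines coverage, exactness, and freshness precisely, and I do not reproduce those definitions here. The scores below are made-up numbers in [0, 1], and `SourceQuality`, `rank_answers`, and the weight dictionaries are hypothetical names. The point is only that the same answers, returned with their source attributes, can be ordered differently by different trust-layer policies.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SourceQuality:
    """Per-source attributes stored as provenance context (assumed in [0, 1])."""
    coverage: float
    exactness: float
    freshness: float


def rank_answers(answers: List[Tuple[str, str]],
                 quality: Dict[str, SourceQuality],
                 weights: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Order (value, source) answers by a weighted sum of source attributes.

    The weights belong to the consumer (trust layer), not to the KG.
    """
    scored = []
    for value, source in answers:
        q = quality[source]
        score = (weights["coverage"] * q.coverage
                 + weights["exactness"] * q.exactness
                 + weights["freshness"] * q.freshness)
        scored.append((value, source, score))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Illustrative values only; in practice they would be computed at KG build time.
quality = {
    "registry": SourceQuality(coverage=0.9, exactness=0.95, freshness=0.4),
    "social_profile": SourceQuality(coverage=0.6, exactness=0.7, freshness=0.9),
}
answers = [("Lisbon", "registry"), ("Porto", "social_profile")]

prefer_fresh = rank_answers(
    answers, quality, {"coverage": 0.1, "exactness": 0.2, "freshness": 0.7})
prefer_exact = rank_answers(
    answers, quality, {"coverage": 0.1, "exactness": 0.8, "freshness": 0.1})
print(prefer_fresh[0][0], prefer_exact[0][0])
```

With these numbers a freshness-heavy policy ranks the "Porto" answer first, while an exactness-heavy policy ranks "Lisbon" first; neither answer is discarded by the KG itself.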

8. CONCLUSIONS

For future work, one direction is to apply our techniques in Web 2.0 applications to identify sources or users that are trustable. 

[Classifying social network users/profiles as trustworthy or not.]

Another direction is to optimize query answering in data integration with knowledge of source quality and dependence.

[Here the system resolves conflicts according to its own trustworthiness criteria in order to give the answer, but does that cover every case?]

Other references

1. D. Artz and Y. Gil. A survey of trust in computer science and the semantic web. Journal of Web Semantics, 5(2), 2010.

11. X. Li, X. L. Dong, K. B. Lyons, W. Meng, and D. Srivastava. Truth finding on the deep web: Is the problem solved? PVLDB, 6(2), 2013.
