
Truth and Trust on the Web

References extracted from the article on controversies in WD recommended by professor Altigran at the proposal defense

D. Artz and Y. Gil. A survey of trust in computer science and the semantic web. Journal of Web Semantics, 5(2), 2007.

Abstract
Trust is an integral component in many kinds of human interaction, allowing people to act under uncertainty and with the risk of negative consequences. For example, exchanging money for a service, giving access to your property, and choosing between conflicting sources of information all may utilize some form of trust. In computer science, trust is a widely used term whose definition differs among researchers and application areas. Trust is an essential component of the vision for the Semantic Web, where both new problems and new applications of trust are being studied.
This paper gives an overview of existing trust research in computer science and the Semantic Web.

[Trusting to the point of acting, even while assuming risks]

1. Introduction
Trust is a central component of the Semantic Web vision [1–3]. The Semantic Web stack [3,4] has included all along a trust layer to assimilate the ontology, rules, logic, and proof layers. Trust often refers to mechanisms to verify that the source of information is really who the source claims to be. 

[The Semantic Web stack]

In addition, proofs should provide a tractable way to verify that a claim is valid. In this sense, any information provider should be able to supply upon request a proof that can be easily checked that certifies the origins of the information, rather than expect consumers to have to generate those proofs themselves through a computationally expensive process. The web motto “Anyone can say anything about anything” makes the web a unique source of information, but we need to be able to understand where we are placing our trust. Trust has another important role in the Semantic Web, as agents and automated reasoners need to make trust judgements when alternative sources of information are available.

[The Trust Layer would apply trust policies and rules to support the user in the decision process that takes place in the layer above. But these policies and rules require additional context information]

These trust judgements are made by humans based on their prior knowledge about a source’s perceived reputation, or past personal experience about its quality relative to other alternative sources they may consider. Humans also bring to bear vast amounts of knowledge about the world they live in and the humans that populate the web with information about it.

[In everyday tasks it is the human agent who applies their own policies to select the most trustworthy information for decision making]

Reasoners will need to judge which of the many information sources available, at times contradicting one another, are more adequate for answering a question.

Trust is not a new research topic in computer science, spanning areas as diverse as security and access control in computer networks, reliability in distributed systems, game theory and agent systems, and policies for decision making under uncertainty. The concept of trust in these different communities varies in how it is represented, computed, and used. While trust in the Semantic Web presents unique challenges, prior work in these areas is relevant and should be the basis for future research.

2. Modeling and reasoning about trust

A unifying theme is that trust is only worth modeling when there is a possibility of deception, that is, when there is a chance of a different outcome than what is expected or has been agreed upon.

[In decision support systems]

Two common ways of determining trust are through using policies or reputation. We adopt these categories from Bonatti et al. [8], as they best describe the distinction we observe between the “hard evidence” used in policies, and the estimation of trust used in reputation systems. Policies describe the conditions necessary to obtain trust, and can also prescribe actions and outcomes if certain conditions are met. Policies frequently involve the exchange or verification of credentials, which are information issued (and sometimes endorsed using a digital signature) by one entity, and may describe qualities or features of another entity.
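As a toy illustration of the policy route, a trust decision can be sketched as checking presented credentials against a policy's conditions. The attribute names below are invented, and real systems would also verify the issuer's digital signature; this is only a sketch of the idea, not anything prescribed by the survey:

```python
# Toy policy-based trust check (illustrative only): trust is granted
# when the presented credentials satisfy every condition in the policy.
# A real system would also verify the issuer's digital signature.

def satisfies(policy, credentials):
    """policy: required (attribute, value) pairs;
    credentials: attributes asserted by some issuer."""
    return all(credentials.get(attr) == val for attr, val in policy.items())

policy = {"role": "librarian", "org": "example.edu"}
creds = {"role": "librarian", "org": "example.edu", "issuer": "registrar"}
print(satisfies(policy, creds))  # True
```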

Trust in information resources. Trust is an increasingly common theme in Web related research regarding whether Web resources and Web sites are reliable. Moreover, trust on the Web has its own range of varying uses and meanings, including capturing ratings from users about the quality of information and services they have used, how web site design influences trust on content and content providers, propagating trust over links, etc. With the advent of the Semantic Web, new work in trust is harnessing both the potential gained from machine understanding, and addressing the problems of reliance on the content available in the web so that agents in the Semantic Web can ultimately make trust decisions autonomously. Provenance of information is key to support trust decisions, as is automated detection of opinions as distinct from objective information.

[Trust in provenance: not only in the source, but also in how the data was computed or obtained]

All of these approaches to computation over a web of trust do not consider context, and as a result do not differentiate between “topic specific trust” and referral trust. In contrast, Ding et al. [60], presents a method of computing within a web of trust that also considers the domain of knowledge (context), and does so separately from referral trust. This work enumerates several kinds of referral (trust in ability to recommend) and associative (two agents being similar) trust as a result: domain expert (trust in an agent’s domain knowledge), recommendation expert (trust in an agent’s ability to refer other agents), similar trusting (two agents having similar trust in other agents), and similar cited (two agents being similarly trusted by other agents).

[Here, context is only the domain of knowledge]
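The separation between referral trust and topic-specific (domain) trust can be sketched as follows. This is not Ding et al.'s actual formulation; the agents, numbers, and discounting rule are made up to show the idea of discounting a final domain-trust judgment by the referral hops that led to it:

```python
# referral[(a, b)]: how much a trusts b's ability to recommend others (0..1)
referral = {("alice", "bob"): 0.9, ("bob", "carol"): 0.8}

# domain[(a, b, topic)]: how much a trusts b's knowledge of `topic` (0..1)
domain = {("bob", "carol", "flights"): 0.7}

def transitive_domain_trust(path, topic):
    """Discount the last hop's domain trust by the referral trust
    accumulated along the earlier hops of the path."""
    trust = 1.0
    for a, b in zip(path[:-2], path[1:-1]):  # referral hops
        trust *= referral[(a, b)]
    trust *= domain[(path[-2], path[-1], topic)]  # topic-specific final hop
    return trust

print(round(transitive_domain_trust(["alice", "bob", "carol"], "flights"), 2))  # 0.63
```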

5.1. General considerations and properties of trust

Several papers in social sciences, similar to this survey, have put forth an interpretation of existing research in trust. A frequently cited work is McKnight and Chervany [64], which is noted for its effort to integrate existing work and for its resulting classification of types of trust. The goal of this work was to highlight and find common ground between the many different uses of the word “trust” in social sciences research. Of key importance are the four qualities that McKnight and Chervany identify as being significant when making a trust decision: competence (ability to give accurate information), benevolence (willingness to expend the effort), integrity (adherence to honest behavior), and predictability (evidence to support that the desired outcome will occur).

[From the perspective of the Social Sciences]

5.2. Computational and online trust models

The widely cited 1994 Ph.D. dissertation by Marsh [68] is considered the first prominent, comprehensive, formal, computational model of trust. His intent was to address “an imperfect understanding, a plethora of definitions, and informal use in the literature and in everyday life” with regard to trust. Marsh proposed a set of (subjectively set) variables, and a way to combine them to arrive at one continuous value of trust in the range [−1,1]. While the intuitive explanation of this range may be complete distrust to full trust, Marsh actually argues against these meanings at the extremes, saying neither full trust nor distrust is actually possible. Marsh identified three types of trust: basic, over all contexts; general, between two people and all their contexts occurring together; and situational, between two people in a specific context. In addition to context, Marsh also identified time as being relevant to each of the variables used to comprise trust.

[Trust would be on a scale from -1 to 1, but Daniel argues it is 0 or 1 if we consider the action: either one acts on the information or one does NOT act on it]
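Marsh's situational trust is commonly summarized as the product of the utility and importance of the situation with the general trust in the other agent. The sketch below follows that commonly cited form, with made-up variable values, and clamps the result to reflect his argument that the extremes are unreachable:

```python
def situational_trust(utility, importance, general_trust):
    """Trust of agent x in agent y for a specific situation, kept inside
    Marsh's open range: complete distrust (-1) and full trust (1) are
    argued to be unattainable, so the value never reaches the extremes."""
    t = utility * importance * general_trust
    return max(-0.999, min(0.999, t))

# x finds the situation moderately useful and quite important,
# and generally trusts y somewhat.
print(round(situational_trust(utility=0.6, importance=0.8, general_trust=0.5), 2))  # 0.24
```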

A key point presented is that simply performing a task is not the same as providing good service or being high quality, which is a problem with automated reputation systems that fail to capture this subtle difference. Also made prominent is the idea that people trust people, not technology, which itself earns (or loses) our trust as an extension of trust in people.

[Language models converse like humans; would humans trust them more?]

6. Trust in information resources

6.2. Trust concerns on the Semantic Web

Declaring that there is more to trust than reputation, Bizer and Oldakowski [84] make several claims with the Semantic Web in mind. First, any statements contained in the Semantic Web must be considered as claims rather than facts until trust can be established. Second, this work makes the case that it is too much of a burden to provide trust information that is current. Third, context-based trust matters; in this case, context refers to the circumstances and associations of the target of the trust decision.

6.6. Subjectivity analysis

Although information retrieval pioneered some of the approaches used now on the Web for locating relevant information sources, trust-based retrieval is a relatively recent focus in that area of research. Trust in information retrieval is motivated by the need for not just relevant documents, but high-quality documents as well [96]. One approach to this is subjectivity analysis, which aims at distinguishing true facts from subjective opinions [97].
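A crude way to see what subjectivity analysis does is a lexicon lookup: flag sentences that contain opinion-bearing words. The word list below is invented for illustration; actual systems such as those surveyed in [97] use far richer cues:

```python
# Naive lexicon-based subjectivity detection (illustration only).
OPINION_WORDS = {"great", "terrible", "believe", "think", "awful", "best"}

def is_subjective(sentence):
    """Flag a sentence as subjective if it contains an opinion word."""
    return any(word.strip(".,!?").lower() in OPINION_WORDS
               for word in sentence.split())

print(is_subjective("The flight departed at 19:02."))      # False
print(is_subjective("I think this is the best airline."))  # True
```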

Trust is also an important area in question answering, since contradictory answers can be obtained from diverse sources in answer to a question. Opinions are often filtered out in question answering tasks so that only objective facts are returned as answers [98]. In other contexts, detecting opinions is useful when no single ground truth can be provided in answer to a question, and instead multiple perspectives are summarized as the answer provided [99].

[Beyond decision support systems, trust would also be needed in information retrieval, question answering, and recommender systems]

X. Li, X. L. Dong, K. B. Lyons, W. Meng, and D. Srivastava. Truth finding on the deep web: Is the problem solved? PVLDB, 6(2), 2013. 

Xin Luna Dong & Divesh Srivastava

ABSTRACT

The amount of useful information available on the Web has been growing at a dramatic pace in recent years and people rely more and more on the Web to fulfill their information needs. 

[Utility: information is only useful if it is actually used to make a decision]

In this paper, we study truthfulness of Deep Web data in two domains where we believed data are fairly clean and data quality is important to people’s lives: Stock and Flight. 

[Decisions that impact people's lives: financial decisions and risk of death]

We further applied on these two data sets state-of-the-art data fusion methods that aim at resolving conflicts and finding the truth, analyzed their strengths and limitations, and suggested promising research directions. 

[Resolving conflicts in the data to find THE TRUTH, but is there only one Truth? What if the Truth is subjective, not in the sense of opinion, but in the sense of assessing usefulness according to whoever makes the decision?]

We wish our study can increase awareness of the seriousness of conflicting data on the Web and in turn inspire more research in our community to tackle this problem.

[It is an important problem, but is resolving conflicts at the data level really the best approach? And what if the resolution criteria are loaded with bias?]

1. INTRODUCTION

Compared with traditional media, information on the Web can be published fast, but with fewer guarantees on quality and credibility. While conflicting information is observed frequently on the Web, typical users still trust Web data.

[And KGs are being built automatically from this information, or fed by people with the same ease with which they publish information on the Web]

Even for these domains that most people consider as highly reliable, we observed a large amount of inconsistency: for 70% data items more than one value is provided. Among them, nearly 50% are caused by various kinds of ambiguity, although we have tried our best to resolve heterogeneity over attributes and instances; 20% are caused by out-of-date data; and 30% seem to be caused purely by mistakes. Only 70% correct values are provided by the majority of the sources (over half of the sources); and over 10% of them are not even provided by more sources than their alternative values are. Although well-known authoritative sources, such as Google Finance for stock and Orbitz for flight, often have fairly high accuracy, they are not perfect and often do not have full coverage, so it is hard to recommend one as the “only” source that users need to care about.

[Out-of-date values, or values that did not make clear which reference date they applied to? And the sources were reputable according to whom?]

2. PROBLEM DEFINITION AND DATA SETS

The distribution observes Zipf’s law; that is, only a small portion of attributes have a high coverage and most of the “tail” attributes have a low coverage ... In both domains we observe that the distributions of the attributes observe Zipf’s Law and only a small percentage of attributes are popular among all sources.

[The long-tail aspect]
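The coverage skew they describe can be reproduced at toy scale: for each attribute, count the fraction of sources that provide it. The sources and attribute names below are hypothetical:

```python
from collections import Counter

# Hypothetical Stock-domain sources and the attributes each one provides.
sources = {
    "src1": {"price", "volume", "open", "52wk_high", "eps"},
    "src2": {"price", "volume", "open"},
    "src3": {"price", "volume"},
    "src4": {"price"},
}

# coverage[attr] = number of sources providing attr
coverage = Counter(attr for attrs in sources.values() for attr in attrs)
for attr, n in coverage.most_common():
    print(attr, n / len(sources))
# A few "head" attributes (price, volume) are provided by most sources,
# while "tail" attributes (52wk_high, eps) appear in only one.
```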

3. WEB DATA QUALITY

 (2) Out-of-date data causes 11% of the inconsistency; for example, even when a flight is already canceled, a source might still report its actual departure and arrival time (the latter is marked as “estimated”).

[Values that were true at a given reference time t0 but should have been supplemented with new true values at t1. Temporal KGs (like temporal DBs) also follow this approach. And a decision made at t1 would be valid for the values in effect at that moment]

4. DATA FUSION

Data fusion aims at resolving conflicts and finding the true values. A basic fusion strategy that considers the dominant value (i.e., the value with the largest number of providers) as the truth works well when the dominant value is provided by a large percentage of sources (i.e., a high dominance factor), but fails quite often otherwise. 
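The basic strategy can be sketched in a few lines (the flight-time values are made up):

```python
from collections import Counter

def dominant_value(values):
    """Basic fusion: return the value provided by the most sources,
    together with its dominance factor (fraction of sources providing it)."""
    counts = Counter(values)
    value, n = counts.most_common(1)[0]
    return value, n / len(values)

# Five sources report the departure time of the same flight.
observed = ["19:02", "19:02", "19:02", "21:49", "19:02"]
value, dominance = dominant_value(observed)
print(value, dominance)  # 19:02 0.8
```

With a high dominance factor like 0.8 the strategy is safe; when the providers split more evenly, the dominant value is often wrong, which is what motivates the trust-weighted methods reviewed next.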

[In our approach we will not resolve conflicts at the data layer; instead we will present the data, with any potential conflicts, to the Trust Layer, which would apply rules and policies to inform the Decision Layer which information is most trustworthy]

4.1 Review of data-fusion methods

In our data collections each source provides at most one value on a data item and each data item is associated with a single true value. We next review existing fusion methods suitable for this context. Before we jump into descriptions of each method, we first enumerate the many insights that have been considered in fusion.

• Number of providers: A value that is provided by a large number of sources is considered more likely to be true.
• Trustworthiness of providers: A value that is provided by trustworthy sources is considered more likely to be true.
• Difficulty of data items: The error rate on each particular data item is also considered in the decision.
• Similarity of values: The provider of a value v is also considered as a partial provider of values similar to v.
• Formatting of values: The provider of a value v is also considered as a partial provider of a value that subsumes v. For example, if a source typically rounds to million and provides “8M”, it is also considered as a partial provider of “7,528,396”.
• Popularity of values: Popularity of wrong values is considered in the decision.
• Copying relationships: A copied value is ignored in the decision.

All fusion methods more or less take a voting approach; that is, accumulating votes from providers for each value on the same data item and choosing the value with the highest vote as the true one. The vote count of a source is often a function of the trustworthiness of the source. Since source trustworthiness is typically unknown a priori, they proceed in an iterative fashion: computing value vote and source trustworthiness in each round until the results converge.
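A minimal version of that iterative loop, with hypothetical flight data, alternates between (1) trust-weighted voting per data item and (2) re-estimating each source's trustworthiness as the fraction of its claims that agree with the current truths:

```python
from collections import defaultdict

# claims[data_item] = {source: the value it provides}
claims = {
    "AA-101.departure": {"s1": "19:02", "s2": "19:02", "s3": "21:49"},
    "AA-101.gate":      {"s1": "B7",    "s2": "B7",    "s3": "B7"},
    "DL-230.departure": {"s1": "08:15", "s2": "08:20", "s3": "08:15"},
}

trust = {"s1": 0.8, "s2": 0.8, "s3": 0.8}  # uniform prior trustworthiness

for _ in range(10):  # iterate until the results (quickly) stabilize
    # (1) vote: a value's score is the sum of its providers' trust
    truths = {}
    for item, by_source in claims.items():
        votes = defaultdict(float)
        for src, val in by_source.items():
            votes[val] += trust[src]
        truths[item] = max(votes, key=votes.get)
    # (2) trustworthiness = fraction of a source's claims matching truths
    for src in trust:
        hits = [truths[item] == vals[src]
                for item, vals in claims.items() if src in vals]
        trust[src] = sum(hits) / len(hits)

print(truths)  # s3's lone "21:49" and s2's lone "08:20" are outvoted
print(trust)   # s1 ends fully trusted; s2 and s3 each made one mistake
```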

5. FUTURE RESEARCH DIRECTIONS

Improving evaluation: ... One major challenge in evaluation is to construct the gold standard. In our experiments our gold standards trust data from certain sources, but as we observed, this sometimes puts wrong values or coarse-grained values in the gold standard. Can we improve gold standard construction, and can we capture our uncertainty for some data items in the gold standard?

[Does a gold standard assume Absolute Truth? Or would it rather be a Claim with Context, for comparability?]

