
THE OPEN WORLD ASSUMPTION: ELEPHANT IN THE ROOM - Article reading

author:Mike Bergman

description: In speaking of the semantic Web, it is not infrequent that the open world assumption (OWA) gets mentioned. What this post argues is that this somewhat obscure concept may hold within it the key as to why there have been decades of too-frequent failures in the enterprise in business intelligence, data warehousing, data integration and federation, and knowledge management.

Source: https://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/

The main argument is that the closed world assumption (CWA) and its prevalent mindset in traditional database systems have hindered the ability of enterprises and the vendors that support them to adopt incremental, low-risk approaches to knowledge systems and management. CWA, in turn, has led to over-engineered schema, overly complicated architectures and massive specification efforts that have resulted in high deployment costs, blown schedules and brittleness.

[CWA would be a reason for the failure of data integration projects]

Relational Approach - Closed World Assumption (CWA)

That which is not known to be true is presumed to be false; it needs to be explicitly stated as true. Negation as failure (NAF) is a related assumption, since it assumes as false every predicate that cannot be proven to be true. Under CWA, any statement not known to be true is false. Everything is prohibited until it is permitted.

(Open) Semantic Web Approach - Open World Assumption (OWA)

The lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity.
Everything is permitted until it is prohibited.
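
As a minimal sketch of the contrast between the two assumptions, the snippet below answers the same query against the same facts under each reading. The fact tuples and the function names `ask_cwa` and `ask_owa` are illustrative assumptions, not something taken from the article, and a real OWA reasoner does far more than this three-valued lookup.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical knowledge base: positive facts plus explicitly stated negative facts.
KNOWN_TRUE: Set[Fact] = {("acme", "locatedIn", "Boston")}
KNOWN_FALSE: Set[Fact] = {("acme", "locatedIn", "Paris")}


def ask_cwa(fact: Fact) -> bool:
    """Closed world: anything not known to be true is presumed false."""
    return fact in KNOWN_TRUE


def ask_owa(fact: Fact) -> Optional[bool]:
    """Open world: True, False, or None when the fact is simply not known."""
    if fact in KNOWN_TRUE:
        return True
    if fact in KNOWN_FALSE:
        return False
    return None
```

For a fact that was never asserted either way, such as `("acme", "locatedIn", "Tokyo")`, `ask_cwa` answers `False` while `ask_owa` answers `None`: the lack of knowledge is kept distinct from falsity.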

Relational Approach - Unique Name Assumption (UNA)

The unique name assumption (UNA) holds that different names always refer to different entities in the world.

(Open) Semantic Web Approach - Duplicate Labels Allowed

OWL allows different synonym labels to be used for the same object; same names may refer to different objects. Identity assertions must be explicitly stated.

[sameAs in ontologies]
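
The effect of dropping the unique name assumption can be sketched by counting entities with and without explicit identity assertions. The function `count_entities` and the sample names are hypothetical; the merge uses a small union-find, which is one simple way to apply sameAs-style links and is not presented as how any OWL reasoner works.

```python
from typing import Dict, Iterable, List, Tuple


def count_entities(
    names: Iterable[str],
    same_as: List[Tuple[str, str]],
    unique_names: bool,
) -> int:
    """Count distinct entities among the given names.

    With unique_names=True (UNA), every distinct name is its own entity.
    Otherwise, names linked by an explicit identity assertion are merged.
    Without UNA the result is only an upper bound: two unlinked names
    might still turn out to denote the same thing.
    """
    parent: Dict[str, str] = {n: n for n in names}
    if unique_names:
        return len(parent)

    def find(n: str) -> str:
        # Follow parent links to the representative, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in same_as:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)

    return len({find(n) for n in parent})
```

With the names `"IBM"`, `"International Business Machines"` and `"Apple"`, plus one identity assertion linking the first two, the UNA count is 3 and the merged count is 2. This is why sums and counts need more care once synonyms are allowed.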

Relational Approach - Complete Information

The data system at hand is assumed to be complete. (Missing information is often handled via the null statement in SQL, but that has been controversial and contentious in its own right.) This is also known as the domain-closure assumption.

(Open) Semantic Web Approach - Incomplete Information

A central tenet of OWA is that information is incomplete. A corollary is that the attributes of specific objects or instances may also be incomplete or partially known.

[KBs are inherently incomplete]

Relational Approach - Single Schema (one world)

A single schema is necessary to define the scope and interpretation of the world (domain at hand).

(Open) Semantic Web Approach - Many World Interpretations

Schema and data instance assertions are kept separate. Multiple interpretations (worlds) for the same data are possible.

[Schema-full vs. schemaless]

Relational Approach - Integrity Constraints

Integrity constraints prevent “incorrect” values from being asserted in the relational model. It is useful for validation/parsing/data input and is related to the single model that contains only the facts asserted. Strict cardinality is used for checking validation.

(Open) Semantic Web Approach - Logical Axioms (restrictions) 

Logical axioms provide restrictions through property domains and ranges. Everything can be true unless proven otherwise, and multiple possible models can satisfy the axioms. This provides more powerful inferencing, though it can also be unintuitive at times. Cardinality and range restrictions exhibit different behavior for objects (inferred) or datatypes.

[Early binding vs. late binding]
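
The difference between a constraint that rejects data and an axiom that licenses an inference can be sketched as follows. The property `supplies`, the class `Supplier`, and both function names are assumptions made for illustration; the open variant mimics the way a domain axiom is read as an inference rule rather than as a validation check.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical axiom: the domain of "supplies" is Supplier.
DOMAIN: Dict[str, str] = {"supplies": "Supplier"}


def validate_closed(types: Dict[str, Set[str]], triple: Triple) -> None:
    """Constraint reading: reject the statement unless the subject is already typed."""
    subject, predicate, _ = triple
    required = DOMAIN.get(predicate)
    if required is not None and required not in types.get(subject, set()):
        raise ValueError(f"{subject} is not a known {required}")


def infer_open(types: Dict[str, Set[str]], triple: Triple) -> None:
    """Axiom reading: accept the statement and infer the subject's type from the domain."""
    subject, predicate, _ = triple
    inferred = DOMAIN.get(predicate)
    if inferred is not None:
        types.setdefault(subject, set()).add(inferred)
```

Given an empty `types` map and the triple `("acme", "supplies", "bolts")`, `validate_closed` raises `ValueError`, whereas `infer_open` records that `acme` is a `Supplier`. The second behavior is the kind that can feel unintuitive to someone expecting validation.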

Relational Approach - Non-monotonic Logic

The set of conclusions warranted on the basis of a given knowledge base does not increase (in fact, it likely shrinks) with the size of the knowledge base [5].

(Open) Semantic Web Approach - Monotonic Logic

The hypotheses of any derived fact may be freely extended with additional assumptions. Additional assertions narrow the set of models that satisfy the axioms, but a new piece of knowledge cannot reduce what is known [5]. New knowledge can arise through inference.

[Inference, deduction]
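
A small sketch can make the monotonicity contrast concrete. It assumes a toy setting with no inference rules: facts are plain strings, and `cwa_conclusions` and `owa_conclusions` are hypothetical helper names. Under the closed world reading, adding a fact withdraws a negative conclusion drawn earlier; under the open world reading, earlier conclusions always survive.

```python
from typing import Set


def cwa_conclusions(facts: Set[str], candidates: Set[str]) -> Set[str]:
    """Closed world: conclude each fact, plus the negation of every absent candidate."""
    return set(facts) | {f"not {c}" for c in candidates if c not in facts}


def owa_conclusions(facts: Set[str]) -> Set[str]:
    """Open world (no rules in this toy setting): conclude only what is asserted."""
    return set(facts)


candidates = {"supplier(acme)", "supplier(bolt_co)"}
before = {"supplier(acme)"}
after = before | {"supplier(bolt_co)"}

# Non-monotonic: "not supplier(bolt_co)" was concluded before and is withdrawn after.
cwa_is_monotonic = cwa_conclusions(before, candidates) <= cwa_conclusions(after, candidates)

# Monotonic: everything concluded before is still concluded after.
owa_is_monotonic = owa_conclusions(before) <= owa_conclusions(after)
```

Here `cwa_is_monotonic` is `False` and `owa_is_monotonic` is `True`.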

Relational Approach - Fixed and Brittle

Changing the schema requires re-architecting the database; not inherently extensible.

(Open) Semantic Web Approach - Reusable and Extensible

Designed from the ground up to reuse existing ontologies (axioms) and to be extensible. Database design and management can be more agile, with schema evolving incrementally.

[Reuse ontologies; integrate with sources that use those ontologies]

Relational Approach - Flat Structure; Strong Typing

Information organized into flat tables; linkages and connections between tables based on foreign keys or joins. Strong data typing orientation.

(Open) Semantic Web Approach - Graph Structure; Open Typing 

Inherent graph structure, supporting linkage and connectivity analysis. Datatypes are inherently loose, though axioms can add strong types. Datatypes are treated in the same way as classes, and datatype values are treated in the same way as individual identifiers (i.e., a data value is treated as referring to an object).

[Data types]

Relational Approach - Querying and Tooling

SQL and query optimizers well developed. Tooling well developed. Disjunction not supported; negation must be accommodated through approaches such as NAF. Sums and counts are easier due to unique name premise. Answer closure (one answer passable to a next calculation) is easier than under OWA. Most tools are not suitable for any arbitrary schema.

(Open) Semantic Web Approach - Querying and Tooling

SPARQL and emerging rule languages used for querying; performance at scale and with broad distribution a concern. Queries require contextual information for proper set selection. Negation and disjunction are allowed and are powerful constructs. Tools generally less developed. Exciting opportunities for ontology-driven applications working against a small set of generic tools. 

[SQL vs. SPARQL: maturity and adoption]

The number of negative facts about a given domain is typically much greater than the number of the positive ones. So, in many bounded applications, the number of negative facts is so large that their explicit representation can become practically impossible [7]. In such cases, it is simpler and shorter to state known “true” statements than to enumerate all “false” conditions.

[CWA: whatever is not in the relation is in the relation's complement and does not need to be represented explicitly]
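
The imbalance between negative and positive facts is simple arithmetic. The sketch below assumes a single binary relation over a finite set of entities; the function name and the figures in the usage note are illustrative and do not come from the article.

```python
def count_negative_facts(num_entities: int, num_positive_pairs: int) -> int:
    """Number of ordered pairs that are NOT in a binary relation over num_entities entities."""
    total_pairs = num_entities * num_entities
    if not 0 <= num_positive_pairs <= total_pairs:
        raise ValueError("num_positive_pairs must be between 0 and num_entities squared")
    return total_pairs - num_positive_pairs
```

For 1,000 entities and 5,000 stored pairs, `count_negative_facts(1000, 5000)` gives 995,000 pairs that would have to be written out if falsity had to be stated explicitly.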

However, the relational model is a paradigm where the information must be complete and it must be described by a single schema. ... This makes CWA and its related assumptions a very poor choice when attempting to combine information from multiple sources, to deal with uncertainty or incompleteness in the world, or to try to integrate internal, proprietary information with external data.

OWA allows suppliers without cities and names to be stored alongside suppliers with that information. ... Duplicate checking now occurs based on the logic of the system and not unique name evaluations.
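
One way to picture suppliers with and without city and name living side by side is to store each known attribute as its own statement, so that a missing attribute is simply absent rather than a null in a fixed column. The function `to_triples`, the `type` predicate, and the sample identifiers are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def to_triples(supplier_id: str, attributes: Dict[str, Optional[str]]) -> List[Triple]:
    """Emit a type statement plus one triple per known attribute.

    Attributes whose value is None are omitted: under an open world reading
    they are unknown, not false and not empty.
    """
    triples: List[Triple] = [(supplier_id, "type", "Supplier")]
    for key, value in attributes.items():
        if value is not None:
            triples.append((supplier_id, key, value))
    return triples
```

A fully described supplier yields three triples and a supplier with no known name or city yields one, and both sets can be added to the same store without changing any schema.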

  • Knowledge is never complete
  • Knowledge is found in structured, semi-structured and unstructured forms
  • Knowledge can be found anywhere
  • Knowledge structure evolves with the incorporation of more information
  • Knowledge is contextual: the importance or meaning of given information changes by perspective and context. Further, exactly the same information may be used differently or given different importance depending on circumstance. Still further, what is important to describe (the “attributes”) about certain information also varies by context and perspective. Large knowledge management initiatives that attempt to use the relational model and single perspectives or schema to capture this information are doomed in one of two ways: either they fail to capture the relevant perspectives of some users; or they take forever and massive dollars and effort to embrace all relevant stakeholders’ contexts.
  • Knowledge should be coherent
  • Knowledge is about connections
  • Knowledge is about its users defining its structure and use — since knowledge is a state of understanding by practitioners and experts in a given domain, it is also important that those very same users be active in its gathering, organization (structure) and use.

Open world is simply a way to think about the information we have and how we act on it. OWA technologies are neutral to the question of open or public sources.

Thus, open world frameworks provide some incredibly important benefits for knowledge management applications in the enterprise:
• Domains can be analyzed and inspected incrementally
• Schema can be incomplete and developed and refined incrementally
• The data and the structures within these open world frameworks can be used and expressed in a piecemeal or incomplete manner
• We can readily combine data with partial characterizations with other data having complete characterizations
• Systems built with open world frameworks are flexible and robust; as new information or structure is gained, it can be incorporated without negating the information already resident, and
• Open world systems can readily bridge or embrace closed world subsystems.

There are also questions about performance and scalability with open semantic technologies.

