
Knowledge Graphs - Tracking the historical events that lead to the interweaving of data and knowledge - Article Reading

Claudio Gutierrez and Juan F. Sequeda. 2021. Knowledge graphs. Commun. ACM 64, 3 (March 2021), 96–104. https://doi.org/10.1145/3418294

=========================================================================

Data was traditionally considered a material object, tied to bits, with no semantics per se.
Knowledge was traditionally conceived as the immaterial object, living only in people’s minds and language.
The destinies of data and knowledge became bound together, becoming almost inseparable, by the emergence of digital computing in the mid-20th century.
Knowledge Graphs can be considered the coming of age of the integration of knowledge and data at large scale with heterogeneous formats.

[KG as the junction of Data/Bits and Knowledge/Semantics]

... Automation of reasoning, ... connection between theorem proving and deduction in databases by developing question answering systems.
Researchers recognized the process of searching in large spaces represented a form of “intelligence” or “reasoning.”
Having an understanding of such space would ease searching.

[Search over large volumes as a way to generate Knowledge]

The ideas go ... “As We May Think” but were developed systematically in the 1950s. A milestone was Bertram Raphael’s “SIR: A Computer Program for Semantic Information Retrieval” (1964). This system demonstrated what could be called an ability to “understand” semantic information. It uses word associations and property lists for the relational information normally conveyed in conversational statements. A format-matching procedure extracts semantic content from English sentences.

[associations should be relationships]

Graphical representation of knowledge. Semantic networks were introduced in 1956 by Richard H. Richens, a botanist and computational linguist, as a tool in the area of machine translation of natural languages. 

[Use for text translation]

the need to understand natural language and other human representations of knowledge; the potential of semantic nets (and graphical representations in general) as abstraction layers;

DATA. The growth in data processing needs brought a division of labor expressed in the notion of representational independence.

This idea is at the core of Edgar Codd’s paper “A Relational Model of Data for Large Shared Data Banks” that describes the use of relations as a mathematical model to provide representational independence; Codd calls this “data independence.”
This theory and design philosophy fostered database management systems and modeling tools such as Peter Chen's Entity-Relationship (ER) model.
Such ER models incorporated semantic information about the real world in the form of graphs.

[Data independence, levels of abstraction, isolation]

KNOWLEDGE. While the data stream was focusing on the structure of data and creating systems to best manage it, the knowledge stream was focusing on the meaning of data.

... a network data structure for organizing and retrieving semantic information. These ideas were implemented in the Semantic Network Processing System (SNePS), which can be considered one of the first stand-alone KRR systems.
Researchers focused on extending semantic networks with formal semantics. An early approach to providing structure and extensibility to local and minute knowledge was the notion of frames. This was introduced by Marvin Minsky in his 1974 article “A Framework for Representing Knowledge.” A frame was defined as a network of nodes and relations.
In 1976, John Sowa introduced Conceptual Graphs in his paper “Conceptual Graphs for a Data Base Interface.” Conceptual graphs serve as an intermediate language to map natural language queries and assertions to a relational database.
The formalism represented a sorted logic with types for concepts and relations.

[Knowledge Representation, Knowledge Base]
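
To make the frame/semantic-network idea concrete, a minimal sketch in Python (all names and slots below are invented for illustration): each frame is a set of named slots, and slots that point to other frames form a small network that can be traversed.

# A minimal, hypothetical sketch of frame-style knowledge as plain Python dicts.
# Each frame is a node; slot values that name other frames act as relations (edges).
frames = {
    "Bird":   {"is_a": "Animal", "can": "fly", "has_part": "wings"},
    "Animal": {"is_a": "LivingThing"},
    "Tweety": {"instance_of": "Bird", "color": "yellow"},
}

def ancestors(frame_name):
    # Follow instance_of / is_a links upward through the network.
    chain = []
    current = frames.get(frame_name, {})
    nxt = current.get("instance_of") or current.get("is_a")
    while nxt is not None:
        chain.append(nxt)
        nxt = frames.get(nxt, {}).get("is_a")
    return chain

print(ancestors("Tweety"))  # ['Bird', 'Animal', 'LivingThing']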

DATA + KNOWLEDGE. ... use of logic as both a declarative and procedural representation of knowledge, a field now known as logic programming.
These ideas were implemented by Alain Colmerauer in PROLOG.
Expert systems were early knowledge-based systems that could reason over encoded knowledge to solve complex problems.
These systems encoded domain knowledge as if-then rules.

Important notions such as the Closed World Assumption ... and Negation as Failure ... were introduced, which can be considered the birth of the logical approach to data.

[CWA]
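
A rough illustration of these two notions, sketched in Python rather than in a logic language (the facts are invented): under the Closed World Assumption, whatever cannot be derived from the stored facts is taken to be false, and negation as failure is the query-time version of that assumption.

# Minimal sketch of Closed World Assumption / Negation as Failure over a fact base.
# (Illustrative facts; in Prolog-style systems this behavior is built into the language.)
facts = {("parent", "ana", "bob"), ("parent", "bob", "eve")}

def holds(pred, *args):
    return (pred, *args) in facts

def not_(pred, *args):
    # Negation as failure: "not P" succeeds exactly when P cannot be proven.
    return not holds(pred, *args)

print(holds("parent", "ana", "bob"))   # True: stated fact
print(not_("parent", "eve", "ana"))    # True: absent fact is assumed false (CWA)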

DATA. ... the need of representational independence led to a separation of the software program from the data. This drove the need to find ways to combine object-oriented programming languages with databases.

Graphs started to be investigated as a representation for object-oriented data, graphical and visual interfaces, hypertext, etc. An early case was Harel’s higraphs, which formalize relations in a visual structure, and are now widely used in UML.

KNOWLEDGE. ... the trade-off between the expressive power of a logic language and the computational complexity of reasoning tasks.

This led to research on trade-offs along the expressivity continuum, giving rise to a new family of logics called Description Logics.
F-Logic was heavily influenced by objects and frames, allowing it to reason about schema and object structures within the same declarative language.

Description Logics would become the underpinning of OWL, the ontology language for the Semantic Web.

Additionally, non-monotonic reasoning techniques were developed during this time, with the introduction of numerous formalisms including circumscription, default logic, autoepistemic logics, and conditional logics.

DATA + KNOWLEDGE. ... the Cyc project, which came out of MCC, had the goal of creating the world’s largest knowledge base of common sense to be used for applications performing human-like reasoning.

On the academic side, an initial approach of combining logic and data was to layer logic programming on top of relational databases. Given that logic programs specify functionality (“the what”) without specifying an algorithm (“the how”), optimization plays a key role and was considered much harder than the relational query optimization problem. This gave rise to deductive databases systems, which natively extended relational databases with recursive rules. Datalog, a subset of Prolog for relational data with a clean semantics, became the query language for deductive databases.
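
A minimal sketch of what a recursive Datalog rule computes, written here in Python rather than Datalog (the edge relation is invented): the classic transitive-closure program reachable(X,Y) :- edge(X,Y). reachable(X,Y) :- edge(X,Z), reachable(Z,Y). can be evaluated bottom-up until a fixpoint, which is roughly what a deductive database engine does natively.

# Naive bottom-up (fixpoint) evaluation of a recursive Datalog rule: transitive closure.
edge = {("a", "b"), ("b", "c"), ("c", "d")}  # illustrative base relation

reachable = set(edge)                        # rule 1: every edge is reachable
while True:
    # rule 2: join edge with the facts derived so far
    new = {(x, y)
           for (x, z) in edge
           for (z2, y) in reachable
           if z == z2} - reachable
    if not new:                              # fixpoint reached: no new facts
        break
    reachable |= new

print(sorted(reachable))  # includes derived pairs such as ('a', 'd')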

By the end of this decade, the first systematic study with the term “Knowledge Graph” appeared. It was the Ph.D. thesis of R.R. Bakker, “Knowledge Graphs: Representation and Structuring of Scientific Knowledge.”

[First mention found of the term: 1991]

Two main limitations deserve to be highlighted: the fact that negation was a hard problem and was still not well understood at this time; and that reasoning at large scale was an insurmountable problem—in particular, hardware was not ready for the task.
This would be known as the knowledge acquisition bottleneck.

The 1990s witnessed two phenomena that would change the world. First, the emergence of the World Wide Web, the global information infrastructure that revolutionized traditional data, information, and knowledge practices. ... Second, the digitization of almost all aspects of our society. Everything started to move from paper to electronic.
These phenomena paved the way to what is known today as Big Data.

DATA. A key result of fulfilling these goals was semistructured data models, such as Object Exchange Model (OEM), Extensible Markup Language (XML), and Resource Description Framework (RDF), among others.
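
For a feel of the RDF data model mentioned here, a minimal sketch in Python (the IRIs and facts are made up for illustration; real RDF would use full IRIs and a library such as rdflib): data is kept as subject-predicate-object triples, and querying is pattern matching over the triple set rather than over a fixed table schema.

# Minimal sketch of the RDF triple model: (subject, predicate, object) statements.
triples = {
    ("ex:Tim", "ex:invented", "ex:WWW"),
    ("ex:WWW", "ex:launched", "1989"),
    ("ex:Tim", "ex:worksAt",  "ex:W3C"),
}

def match(s=None, p=None, o=None):
    # Return triples matching a pattern; None acts as a wildcard (like a SPARQL variable).
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or s == ts)
            and (p is None or p == tp)
            and (o is None or o == to)]

print(match(s="ex:Tim"))        # everything stated about ex:Tim
print(match(p="ex:invented"))   # who invented what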

KNOWLEDGE. Researchers realized that knowledge acquisition was the bottleneck in implementing knowledge-based and expert systems.

The topic evolved and grew into the fields of knowledge engineering and ontology engineering.

The need to elevate from administrative metadata to formal semantic descriptions gave rise to the spread of languages to describe and reason over taxonomies and ontologies. The notion of ontology was defined as a “shared and formal specification of a conceptualization” by Gruber.
Among the first scientists arguing the relevance of ontologies were N. Guarino, M. Uschold, and M. Gruninger.
Research focused on methodologies to design and maintain ontologies, ...

[Semantic formalism through Logic]

DATA + KNOWLEDGE. The combination of data and knowledge in database management systems was manifested through Deductive Databases. Specialized workshops on Deductive Databases (1990–1999) and Knowledge Representation meets Databases (1994–2003) were centers of activity for the field.

The Semantic Web project is an endeavor to combine knowledge and data on the Web.
The goal was to converge technologies such as knowledge representation, ontologies, logic, databases, and information retrieval on the Web.

We entered the Big Data revolution. During this era, we see the rise of statistical methods with the introduction of deep learning into AI.
 
... MapReduce. The emergence of non-relational, distributed data stores boomed with systems such as CouchDB, Google Bigtable, and Amazon Dynamo. This gave rise to "NoSQL" databases that (re-)popularized database management systems for Column, Document, Key-Value, and Graph data models.
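
As a reminder of what the MapReduce programming model looks like, a tiny single-process sketch in Python (the input documents are invented): a map function emits key-value pairs, a shuffle step groups values by key, and a reduce function aggregates each group, which is the part the distributed runtimes parallelize.

# Toy sketch of the MapReduce model: word count.
from collections import defaultdict

docs = ["knowledge graphs", "graphs of data", "data and knowledge"]  # illustrative input

def map_fn(doc):
    for word in doc.split():
        yield (word, 1)              # map: emit (key, value) pairs

def reduce_fn(word, counts):
    return (word, sum(counts))       # reduce: aggregate values per key

grouped = defaultdict(list)          # shuffle: group values by key
for doc in docs:
    for word, count in map_fn(doc):
        grouped[word].append(count)

print(dict(reduce_fn(w, c) for w, c in grouped.items()))
# {'knowledge': 2, 'graphs': 2, 'of': 1, 'data': 2, 'and': 1}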

... advances in NLP, and so on consolidated the notion that “data” is well beyond tables of values.
The data management research community continued its research on data integration problems such as schema matching, entity linking, and XML processing.

KNOWLEDGE. The Description Logic research community continued to study trade-offs and define new profiles of logic for knowledge representation.
Reasoning algorithms were implemented in software systems (for example, FaCT, HermiT, Pellet).
 
Big Data drove statistical applications to knowledge via machine learning and neural networks. Statistical techniques advanced applications that deduced new facts from already known facts.

The original attempts of the 1960s to model knowledge directly through neural networks now started to work in practice.

DATA + KNOWLEDGE. The connection between data and knowledge was developed in this period along two lines, namely logical and statistical.
On the logical thread, the Semantic Web project was established, built upon previous results like the graph data model, description logics, and knowledge engineering.

The technologies underpinning the Semantic Web were being developed simultaneously by academia and industry through the World Wide Web Consortium (W3C) standardization efforts.

This gave rise to the Linked Open Data (LOD) project and large RDF graph-based knowledge bases such as DBpedia and Freebase, which would eventually lead to Wikidata. The LOD project was a demonstration of how data could be integrated at Web scale. In 2011, the major search engines released schema.org, a lightweight ontology, as a way to improve the semantic annotation of Web pages. These efforts were built on the results of the Semantic Web research community.
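
To make the schema.org annotation concrete, a small sketch in Python that builds the kind of JSON-LD snippet a page could embed (the values are invented for illustration; the vocabulary terms Person, name, jobTitle, and sameAs do exist in schema.org).

# Minimal sketch of a schema.org annotation serialized as JSON-LD.
# A web page would embed the printed output inside a <script type="application/ld+json"> tag.
import json

annotation = {
    "@context": "https://schema.org",
    "@type": "Person",                 # schema.org type
    "name": "Ada Lovelace",            # illustrative values
    "jobTitle": "Mathematician",
    "sameAs": "https://en.wikipedia.org/wiki/Ada_Lovelace",
}

print(json.dumps(annotation, indent=2))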

... speech recognition, NLP, and image processing. This motivated Halevy, Norvig, and Pereira to speak of "the unreasonable effectiveness of data." This is probably one of the drivers that motivated the search for new forms of storing, managing, and integrating data and knowledge in the world of Big Data and the emergence of the notion of Knowledge Graph.

Data science

... people realized the need to combine logical and statistical techniques, but little is yet known about how to integrate these approaches. Another important limitation is that statistical methods, particularly neural networks, are still opaque regarding the explanation of their results.

[Research opportunity in integration and explainability]

In 2012, Google announced a product called the Google Knowledge Graph. Old ideas achieved worldwide popularity as technical limitations were overcome and the approach was adopted by large companies.

[Revival of the ideas]

Later, myriad companies and organizations started to use the Knowledge Graph keyword to refer to the integration of data, giving rise to entities and relations forming graphs. Academia began to adopt this keyword to loosely designate systems that integrate data with some graph structure, a reincarnation of the Semantic Web and Linked Data. In fact, today the notion of Knowledge Graph can be considered, more than a precise notion or system, an evolving project and a vision.

The ongoing area of Knowledge Graphs represents in this sense a convergence of data and knowledge techniques around the old notion of graphs or networks.

[Graph Theory]

On the other hand, we see a wealth of knowledge technologies addressing the graph model: on the logical side, the materialization and implementation of old ideas like semantic networks and frames, or, more recently, the Semantic Web and Linked Data projects.

[Technology to realize the ideas]

 Halevy, A.Y., Norvig, P. and Pereira, F. The unreasonable effectiveness of data. IEEE Intell. Syst. 24, 2 (2009), 8–12.
