
Wikidata: The Making Of - Article Reading Notes

Video -> https://youtu.be/P3-nklyrDx4

Denny Vrandečić, Lydia Pintscher, and Markus Krötzsch. 2023. Wikidata: The Making Of. In Companion Proceedings of the ACM Web Conference 2023 (WWW '23 Companion). Association for Computing Machinery, New York, NY, USA, 615–624. https://doi.org/10.1145/3543873.3585579

1 INTRODUCTION

(5) Verifiability, not truth: Wikidata relies on external sources for confirmation; statements can come with references; conflicting or debated standpoints may co-exist.

The data collected in most of these projects can also be considered knowledge graphs, i.e., structured data collections that encode meaningful information in terms of (typed, directed) connections between concepts. Nevertheless, the actual data sets are completely different, both in their vocabulary and their underlying data model. In comparison to other approaches, Wikidata has one of the richest graph formats, where each statement (edge in the graph) can have user-defined annotations (e.g., validity time) and references.
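As a minimal sketch of this statement-centric model (the dictionary layout below is illustrative and not the exact Wikibase JSON serialization; the item and property IDs are real Wikidata IDs, the population value is illustrative), each edge can be thought of as a statement object carrying optional qualifiers and references:

```python
# Simplified sketch of a single Wikidata-style statement (edge) with
# user-defined annotations. The layout is illustrative, not the exact
# Wikibase JSON serialization.
statement = {
    "subject": "Q64",            # Berlin
    "property": "P1082",         # population
    "value": 3_664_088,          # illustrative value
    "qualifiers": {
        "P585": "2020-12-31",    # point in time (validity of the count)
    },
    "references": [
        {"P854": "https://www.statistik-berlin-brandenburg.de/"},  # reference URL
    ],
}

# Because the annotations live on the statement itself, two conflicting
# population counts can coexist, each backed by its own reference,
# which is what makes "verifiability, not truth" workable in practice.
```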

3 SEMANTIC WIKIPEDIA

4 MOVING SIDEWAYS (2005–2010)

5 EVOLUTION OF AN IDEA

Another important realization was that verifability would have to play a central role.

The project developed ideas for handling contradicting and incomplete knowledge, and analyzed Wikipedia to understand the necessity for such approaches [63].

6 PROJECT PROPOSAL

Thanks to the ongoing collaboration in RENDER, Pavel Richter, then Executive Director of Wikimedia Deutschland, took the proposal to WMDE’s Board, which decided to accept Wikidata as a new Wikimedia project in June 2011, provided that sufficient funding would be available. For Richter and Wikimedia Deutschland this was a major step, as the planned development team would significantly enlarge Wikimedia Deutschland, and necessitate a sudden transformation of the organization, which Richter managed in the years to come.

While looking for funding, at least one major donor dropped out because the project proposal insisted that the ontology of Wikidata had to be community-controlled, and would be neither pre-defined by professional ontologists nor imported from existing ontologies.

7 EARLY DEVELOPMENT AND LAUNCH

8 EARLY WIKIDATA (2013–2015)

The editor community started rallying around the tasks that could be done with the limited functionality and started forming task forces (later becoming WikiProjects) to collect and expand data around topics such as countries and Pokémon, or to improve the language coverage for certain languages.

It has been a challenge to make the idea of a knowledge graph accessible and attractive to an audience that is not familiar with the ideas of the Semantic Web. Data is abstract, and it takes creativity and effort to see the potential in linking this data and making it machine-readable. A few key applications were instrumental in sparking excitement by showing what is and will become possible once Wikidata grew. Chief among the people who made this possible was Magnus Manske, who developed Reasonator,27 an alternative view on Wikidata; Wiri,28 an early question answering demo; and Wikidata Query, the first query tool for Wikidata.

27 https://reasonator.toolforge.org 

28 https://magnus-toolserver.toolforge.org/thetalkpage

WDQS is a Blazegraph-based SPARQL endpoint that gives access to the RDF-ized version [16, 21] of the data in Wikidata in real time, through live updates [37]. Its goal is to enable applications and services on top of Wikidata, as well as to support the editor community, especially in improving data quality.
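As a minimal sketch of how an application can sit on top of WDQS (assuming the public endpoint at https://query.wikidata.org/sparql and the Python `requests` library; the query below is the usual "instances of house cat" example, not taken from the paper), a SPARQL query can be sent over HTTP and the results read back as JSON:

```python
import requests

# Public WDQS SPARQL endpoint; see the WDQS documentation for usage
# policies such as providing a descriptive User-Agent header.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Ten items that are instances of (P31) "house cat" (Q146), with English labels.
query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

response = requests.get(
    WDQS_ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "wikidata-making-of-notes/0.1 (example script)"},
)
response.raise_for_status()

# Print each result binding: the item IRI and its label.
for binding in response.json()["results"]["bindings"]:
    print(binding["item"]["value"], binding["itemLabel"]["value"])
```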

9 TEENAGE WIKIDATA (2015–2022)

10 OUTLOOK

Indeed, even the original concept has not been fully realized yet. The initial Wikidata proposal (Section 6) was split into three phases: first sitelinks, second statements, third queries. The third phase, though, has not yet been realized. It was planned to allow the community to define queries, to store and visualize the results in Wikidata, and to include these results in Wikipedia. This would have served as a forcing function to increase the uniformity of Wikidata’s structure.

By selecting a flexible, statement-centric data model – inspired by SMW, and in turn by RDF – Wikidata does not enforce a fixed schema upon groups of concepts.

Wikifunctions in turn is envisioned as a wiki-based repository of executable functions, described in community-curated source code. These functions will in particular be used to access and transform data in Wikidata, in order to generate views on the data. These views – tables, graphs, text – can then be integrated into Wikipedia. This is a return to the goals of the original Phase 3, which would increase both the incentives to make the data more coherent, and the visibility and reach of the data as such. This may then lead to improved correctness and completeness of the data, since only data that is used is data that is good (a corollary to Linus’s law of “given enough eyeballs, all bugs are shallow” [54]).
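As a purely hypothetical sketch of the kind of function envisioned (the function name, signature, and the hard-coded values are invented for illustration; Wikifunctions’ actual calling conventions are not described in the paper), such a community-curated function might take values drawn from Wikidata statements and render a small textual view that a Wikipedia article could embed:

```python
# Hypothetical sketch of a Wikifunctions-style function: it takes data
# already retrieved from Wikidata (passed in here as plain Python values)
# and renders a short textual view for inclusion in an article.
def render_population_sentence(place_label: str, population: int, as_of: str) -> str:
    """Turn a population statement into a human-readable sentence."""
    return f"{place_label} had a population of {population:,} as of {as_of}."


if __name__ == "__main__":
    # Example values; in the envisioned setup these would come from
    # Wikidata statements rather than being hard-coded.
    print(render_population_sentence("Berlin", 3_664_088, "2020-12-31"))
```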

Another aspect of Wikidata that we think needs further development is how to more effectively share semantics – within Wikidata itself, with other Wikimedia projects, and with the world in general. Wikidata is not based on a standard semantics such as OWL [22], although community modeling is strongly inspired by some of the expressive features developed for ontologies. The intended modeling of data is communicated through documentation on wikidata.org, shared SPARQL query patterns, and Entity Schemas in ShEx [52]. Nevertheless, the intention of modeling patterns and individual statements often remains informal, vague, and ambiguous. As Krötzsch argued in his ISWC 2022 keynote [32], a single, fixed semantic model could not be enough for all uses and perspectives required for Wikidata (or the Web as a whole), yet some sufficiently formal, unambiguous, and declarative way of sharing intended interpretations is still needed. A variety of powerful knowledge representation languages could be used for this purpose, but we still lack both infrastructure and best practices to use them effectively in such complex applications.


