
New OpenLink Virtuoso hosted Wikidata Knowledge Graph

Wikidata (WD) as of December 2022

From: Kingsley Idehen <kidehen@openlinksw.com>
Subject: Announce: New OpenLink Virtuoso hosted Wikidata Knowledge Graph Release
Date: 11 January 2023 17:51:49 GMT-3
To: wikidata@lists.wikimedia.org, "public-lod@w3.org" <public-lod@w3.org>
Resent-From: public-lod@w3.org

All,

We are pleased to announce immediate availability of a new Virtuoso-hosted Wikidata instance based on the most recent datasets. This instance comprises 17 billion+ RDF triples.

Host Machine Info:

Item      Value
CPU       2x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Cores     24
Memory    378 GB
SSD       4x Crucial M4 SSD 500 GB

Cloud related costs for a self-hosted variant, assuming:

    dedicated machine for 1 year without upfront costs
    128 GiB memory
    16 cores or more
    512GB SSD for the database
    3T outgoing internet traffic (based on our DBpedia statistics)

SPARQL Query and Full Text Search service endpoints:

    https://wikidata.demo.openlinksw.com/sparql -- SPARQL Query Services Endpoint

    https://wikidata.demo.openlinksw.com/fct -- Faceted Search & Browsing

Additional Information

    Loading the Wikidata dataset 2022/12 into Virtuoso Open Source - Announcements - OpenLink Software Community (openlinksw.com)

=============================================================

I ran the following query for the "disputed by" qualifier (P1310) on this endpoint:

SELECT (count(distinct ?statement) AS ?count)
WHERE
{
  ?item ?predicate ?statement.
  ?item ?predicate ?value.
  ?statement pq:P1310 ?qualivalue
}

It returned 1926 (for the Dec/22 dump). On WDQS it returned 1936 (as of today). In the kgtk dataset we have 1577 (as of June/22).

The multiple-values query can be run by setting the timeout to 120000:

SELECT distinct ?item ?predicate ?value1 ?value2
WHERE
{
# ?item wdt:P31 wd:Q5.
  ?item ?predicate ?value1.
  ?item ?predicate ?value2.
  FILTER (?value1 < ?value2).
  FILTER (?predicate not in (schema:description, rdfs:label))
}
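The timeout setting mentioned above can also be supplied when calling the endpoint from a script. The sketch below only builds the request URL and does not send it. It assumes the Virtuoso endpoint reads a `timeout` request parameter expressed in milliseconds and a `format` parameter for the result type; `build_request_url` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the announcement above.
ENDPOINT = "https://wikidata.demo.openlinksw.com/sparql"

# The multiple-values query from this post, unchanged.
QUERY = """
SELECT distinct ?item ?predicate ?value1 ?value2
WHERE
{
  ?item ?predicate ?value1.
  ?item ?predicate ?value2.
  FILTER (?value1 < ?value2).
  FILTER (?predicate not in (schema:description, rdfs:label))
}
"""


def build_request_url(query, timeout_ms=120000):
    """Hypothetical helper: encode the query and timeout as GET parameters.

    Assumption: the endpoint interprets `timeout` in milliseconds.
    """
    params = {
        "query": query,
        "format": "application/sparql-results+json",
        "timeout": timeout_ms,
    }
    return ENDPOINT + "?" + urlencode(params)


url = build_request_url(QUERY)

# Inspect the encoded parameters without contacting the server.
print(parse_qs(urlsplit(url).query)["timeout"])
```

The resulting URL can be passed to any HTTP client; the request itself is left out here so the sketch runs without network access.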

But I tried adding a few more filters to remove the triples that come from reification, and the query started timing out:

SELECT distinct ?item ?predicate ?value1 ?value2
WHERE
{
# ?item wdt:P31 wd:Q5.
  ?item ?predicate ?value1.
  ?item ?predicate ?value2.
  FILTER (?value1 < ?value2).
  FILTER (strstarts(str(?item), 'http://www.wikidata.org/entity/Q')).
  FILTER (str(?predicate) not in ('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'))
}
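To make explicit what the two extra filters keep and discard, here is a minimal Python sketch of the same conditions applied to a few rows. The sample rows are hypothetical and only illustrate the IRI shapes involved (an item, a statement node, an `rdf:type` triple); `keep_row` is an illustrative name. This shows the filter logic only and is not a fix for the timeout.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ENTITY_Q_PREFIX = "http://www.wikidata.org/entity/Q"


def keep_row(item, predicate):
    """Mirror the two SPARQL filters: subject is a Q item, predicate is not rdf:type."""
    return item.startswith(ENTITY_Q_PREFIX) and predicate != RDF_TYPE


# Hypothetical (item, predicate) pairs, for illustration only.
rows = [
    # A direct claim on an item: kept.
    ("http://www.wikidata.org/entity/Q42",
     "http://www.wikidata.org/prop/direct/P569"),
    # A statement node used for reification: dropped by the prefix test,
    # because its IRI continues with "statement/" rather than "Q".
    ("http://www.wikidata.org/entity/statement/Q42-EXAMPLE",
     "http://www.wikidata.org/prop/qualifier/P1310"),
    # An rdf:type triple on an item: dropped by the predicate test.
    ("http://www.wikidata.org/entity/Q42", RDF_TYPE),
]

kept = [row for row in rows if keep_row(*row)]
print(kept)
```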

I managed to run the query below to check the counts of the top-10 most used qualifiers over the complete dataset:

SELECT ?qualifier (count(distinct ?statement) AS ?c_quali)
WHERE
{
  ?statement ?qualifier ?qualivalue.
  FILTER (?qualifier in (pq:P407, pq:P577, pq:P304, pq:P478, pq:P291, pq:P2093, pq:P1476, pq:P813, pq:P1343, pq:P958))
}
GROUP BY ?qualifier

But I found some of the counts very different. Could the removal of scientific articles explain this difference?

Qualifier                                        Virtuoso count    kgtk count
http://www.wikidata.org/prop/qualifier/P407      1410475           1242876
http://www.wikidata.org/prop/qualifier/P577      1003312           537468
http://www.wikidata.org/prop/qualifier/P304      841445            441380
http://www.wikidata.org/prop/qualifier/P478      513899            187030
http://www.wikidata.org/prop/qualifier/P2093     370076            98772
http://www.wikidata.org/prop/qualifier/P291      113265            105546
http://www.wikidata.org/prop/qualifier/P958      108660            45364
http://www.wikidata.org/prop/qualifier/P1476     98247             92759
http://www.wikidata.org/prop/qualifier/P813      79584             68490
http://www.wikidata.org/prop/qualifier/P1343     36831             50777
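To see where the two sources diverge most, the counts above can be compared as ratios. The sketch below is plain arithmetic over the numbers in this post. It does not separate the possible causes: the two datasets also come from different dates (Dec/22 for Virtuoso, June/22 for kgtk, as noted earlier), so the ratios alone cannot confirm the scientific-articles hypothesis.

```python
# (Virtuoso count, kgtk count) per qualifier, copied from the table in this post.
counts = {
    "P407": (1410475, 1242876),
    "P577": (1003312, 537468),
    "P304": (841445, 441380),
    "P478": (513899, 187030),
    "P2093": (370076, 98772),
    "P291": (113265, 105546),
    "P958": (108660, 45364),
    "P1476": (98247, 92759),
    "P813": (79584, 68490),
    "P1343": (36831, 50777),
}

# Ratio > 1 means Virtuoso has more statements with this qualifier than kgtk.
ratios = {q: virtuoso / kgtk for q, (virtuoso, kgtk) in counts.items()}

# Print from the largest divergence to the smallest.
for qualifier, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    virtuoso, kgtk = counts[qualifier]
    print(f"{qualifier:6} virtuoso={virtuoso:8d} kgtk={kgtk:8d} ratio={ratio:.2f}")
```

Sorting by ratio puts the qualifiers with the largest gap at the top, which is where checking for a link to scientific articles would be most informative.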



