
New OpenLink Virtuoso hosted Wikidata Knowledge Graph

WD (Wikidata) from December 2022

From: Kingsley Idehen <kidehen@openlinksw.com>
Subject: Announce: New OpenLink Virtuoso hosted Wikidata Knowledge Graph Release
Date: 11 January 2023 17:51:49 GMT-3
To: wikidata@lists.wikimedia.org, "public-lod@w3.org" <public-lod@w3.org>
Resent-From: public-lod@w3.org

All,

We are pleased to announce immediate availability of a new Virtuoso-hosted Wikidata instance based on the most recent datasets. This instance comprises 17 billion+ RDF triples.

Host Machine Info:

Item      Value
CPU       2x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Cores     24
Memory    378 GB
SSD       4x Crucial M4 SSD 500 GB

Cloud-related costs for a self-hosted variant, assuming:

    dedicated machine for 1 year without upfront costs
    128 GiB memory
    16 cores or more
    512 GB SSD for the database
    3 TB outgoing internet traffic (based on our DBpedia statistics)

SPARQL Query and Full Text Search service endpoints:

    https://wikidata.demo.openlinksw.com/sparql -- SPARQL Query Services Endpoint

    https://wikidata.demo.openlinksw.com/fct -- Faceted Search & Browsing

Additional Information

    "Loading the Wikidata dataset 2022/12 into Virtuoso Open Source", OpenLink Software Community (openlinksw.com), Announcements

=============================================================

I ran the following query on this endpoint to count the "disputed by" statements (qualifier P1310):

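PREFIX pq: <http://www.wikidata.org/prop/qualifier/>

SELECT (COUNT(DISTINCT ?statement) AS ?disputed)
WHERE
{
  ?item ?predicate ?statement .
  # always matched by ?value = ?statement, so it does not change the count
  ?item ?predicate ?value .
  ?statement pq:P1310 ?qualivalue .
}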

It returned 1926 (December 2022 dump). WDQS returned 1936 (as of today), and the kgtk dataset has 1577 (June 2022).

The multiple-values query can be run by setting the timeout to 120000 ms:

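PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?item ?predicate ?value1 ?value2
WHERE
{
  # ?item wdt:P31 wd:Q5 .   (optional: restrict to humans)
  ?item ?predicate ?value1 .
  ?item ?predicate ?value2 .
  FILTER (?value1 < ?value2)
  FILTER (?predicate NOT IN (schema:description, rdfs:label))
}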
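To run the query outside the web form, here is a minimal Python sketch using only the standard library. It assumes the endpoint accepts the standard SPARQL Protocol `query` parameter plus Virtuoso's `timeout` parameter (in milliseconds, like the form's execution timeout), and that it returns JSON results for the `Accept` header shown; `build_request` and `run_query` are hypothetical helper names, not part of any OpenLink tooling.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://wikidata.demo.openlinksw.com/sparql"


def build_request(query: str, timeout_ms: int = 120000,
                  endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request for the endpoint.

    `query` is the standard protocol parameter; `timeout` is ASSUMED to be
    Virtuoso's execution-timeout parameter, in milliseconds.
    """
    params = urllib.parse.urlencode({"query": query, "timeout": timeout_ms})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(query: str, timeout_ms: int = 120000) -> list:
    """Send the request and return the result bindings (needs network access)."""
    request = build_request(query, timeout_ms)
    # Client-side socket timeout in seconds, a bit longer than the server-side limit
    with urllib.request.urlopen(request, timeout=timeout_ms / 1000 + 30) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Passing the multiple-values query text to `run_query` should return the same rows as the form, as a list of JSON bindings.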

But when I tried to add a few more filters to remove the reification triples, the query started to time out:

SELECT DISTINCT ?item ?predicate ?value1 ?value2
WHERE
{
  # ?item wdt:P31 wd:Q5 .
  ?item ?predicate ?value1 .
  ?item ?predicate ?value2 .
  FILTER (?value1 < ?value2)
  # keep only item subjects (wd:Q IRIs); statement nodes (wds:) are dropped
  FILTER (STRSTARTS(STR(?item), "http://www.wikidata.org/entity/Q"))
  FILTER (STR(?predicate) NOT IN ("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
}
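To see what these filters remove, here is a small in-memory Python sketch of the same logic over a handful of hypothetical triples (the `multiple_values` helper and the toy IRIs are illustrative, not real Wikidata data). One caveat when porting the query to other engines: SPARQL 1.1 does not define `<` between two IRIs, so a strictly conforming engine treats `?value1 < ?value2` on IRI values as an error and drops the row; `STR(?value1) < STR(?value2)` is the portable form. The sketch compares plain strings.

```python
from itertools import combinations

ITEM_PREFIX = "http://www.wikidata.org/entity/Q"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def multiple_values(triples):
    """Return (subject, predicate, value1, value2) for every subject/predicate
    pair with two or more distinct values, mirroring the query's filters."""
    values_by_key = {}
    for subject, predicate, value in triples:
        # Same filters as the query: item subjects only, and no rdf:type
        if not subject.startswith(ITEM_PREFIX) or predicate == RDF_TYPE:
            continue
        values_by_key.setdefault((subject, predicate), set()).add(value)
    results = set()
    for (subject, predicate), values in values_by_key.items():
        # Sorting and taking pairs reports each unordered pair once,
        # which is what FILTER (?value1 < ?value2) achieves in the query
        for value1, value2 in combinations(sorted(values), 2):
            results.add((subject, predicate, value1, value2))
    return results


# Hypothetical toy triples (not real Wikidata data)
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"
PQ = "http://www.wikidata.org/prop/qualifier/"
TOY_TRIPLES = [
    (WD + "Q100", WDT + "P31", WD + "Q5"),
    (WD + "Q100", WDT + "P31", WD + "Q200"),
    (WD + "Q100", RDF_TYPE, "http://wikiba.se/ontology#Item"),
    (WD + "Q100", RDF_TYPE, "http://example.org/OtherType"),
    # Statement node (reification): excluded by the item-prefix filter
    (WD + "statement/Q100-abc", PQ + "P580", "2000"),
    (WD + "statement/Q100-abc", PQ + "P580", "2001"),
]

print(multiple_values(TOY_TRIPLES))
```

Only the two P31 values of the toy item survive; the rdf:type pair and the statement node's qualifier pair are filtered out.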

I managed to run the query below to get the counts of the top-10 most used qualifiers over the full dataset:

PREFIX pq: <http://www.wikidata.org/prop/qualifier/>

SELECT ?qualifier (COUNT(DISTINCT ?statement) AS ?c_quali)
WHERE
{
  ?statement ?qualifier ?qualivalue .
  FILTER (?qualifier IN (pq:P407, pq:P577, pq:P304, pq:P478, pq:P291, pq:P2093, pq:P1476, pq:P813, pq:P1343, pq:P958))
}
GROUP BY ?qualifier
ORDER BY DESC(?c_quali)
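The Virtuoso results name qualifiers by full IRI while the kgtk side uses short property IDs, so a small normalization step helps before comparing. This Python sketch assumes the endpoint returns the standard SPARQL 1.1 Query Results JSON format; `qualifier_counts` is a hypothetical helper, and the sample document is hand-written using two values from the results table below, not captured endpoint output.

```python
import json

PQ = "http://www.wikidata.org/prop/qualifier/"


def qualifier_counts(results_json: str) -> dict:
    """Map SPARQL 1.1 JSON results of the qualifier query to {"P407": count, ...}."""
    data = json.loads(results_json)
    counts = {}
    for binding in data["results"]["bindings"]:
        iri = binding["qualifier"]["value"]
        # Strip the pq: namespace so keys match the short IDs used on the kgtk side
        short_id = iri[len(PQ):] if iri.startswith(PQ) else iri
        # Aggregate values arrive as strings in the JSON format
        counts[short_id] = int(binding["c_quali"]["value"])
    return counts


# Hand-written sample in the SPARQL 1.1 JSON results format
SAMPLE = """
{"head": {"vars": ["qualifier", "c_quali"]},
 "results": {"bindings": [
   {"qualifier": {"type": "uri", "value": "http://www.wikidata.org/prop/qualifier/P407"},
    "c_quali": {"type": "literal",
                "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                "value": "1410475"}},
   {"qualifier": {"type": "uri", "value": "http://www.wikidata.org/prop/qualifier/P1343"},
    "c_quali": {"type": "literal",
                "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                "value": "36831"}}
 ]}}
"""

print(qualifier_counts(SAMPLE))
```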

But some of the counts are very different from the kgtk ones. Could the removal of scientific articles account for this difference?

Qualifier   Virtuoso count   kgtk count
pq:P407            1410475      1242876
pq:P577            1003312       537468
pq:P304             841445       441380
pq:P478             513899       187030
pq:P2093            370076        98772
pq:P291             113265       105546
pq:P958             108660        45364
pq:P1476             98247        92759
pq:P813              79584        68490
pq:P1343             36831        50777

(pq: = http://www.wikidata.org/prop/qualifier/)
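To make the gap easier to read, a short Python sketch can compute, per qualifier, the absolute difference and the kgtk/Virtuoso ratio from the counts in the table. The counts are copied from the table; the `compare` helper is hypothetical and only does arithmetic, so it cannot by itself say which items (scientific articles or others) account for the difference.

```python
# Counts copied from the table above: qualifier -> (Virtuoso, kgtk)
COUNTS = {
    "P407": (1410475, 1242876),
    "P577": (1003312, 537468),
    "P304": (841445, 441380),
    "P478": (513899, 187030),
    "P2093": (370076, 98772),
    "P291": (113265, 105546),
    "P958": (108660, 45364),
    "P1476": (98247, 92759),
    "P813": (79584, 68490),
    "P1343": (36831, 50777),
}


def compare(counts):
    """Return (qualifier, Virtuoso - kgtk, kgtk / Virtuoso) rows, smallest ratio first."""
    rows = []
    for qualifier, (virtuoso, kgtk) in counts.items():
        rows.append((qualifier, virtuoso - kgtk, kgtk / virtuoso))
    return sorted(rows, key=lambda row: row[2])


for qualifier, diff, ratio in compare(COUNTS):
    print(f"pq:{qualifier:<6} diff={diff:>8}  kgtk/Virtuoso={ratio:.2f}")
```

With these numbers, P2093, P478 and P958 have the lowest kgtk/Virtuoso ratios, while P1343 is the only qualifier where kgtk has more statements than Virtuoso.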



