Searching DBLP for Wikidata

On 19/04/2021 I searched DBLP for "Wikidata" to identify what kinds of research are being done with and about Wikidata. The results span 2012 to 2021. I downloaded the references in BibTeX format for use in Mendeley.
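A quick way to check the year range of an exported BibTeX file is to tally its `year` fields. The sketch below is a minimal illustration, assuming each entry has its `year` field on its own line (as in typical BibTeX exports); the sample entries and their keys are placeholders, not real search results.

```python
import re
from collections import Counter

# Matches lines such as:   year = {2012},   or   year = "2012"
YEAR_FIELD = re.compile(
    r'^\s*year\s*=\s*[{"]?(\d{4})[}"]?', re.IGNORECASE | re.MULTILINE
)


def entries_per_year(bibtex_text):
    """Count BibTeX entries per publication year."""
    return Counter(int(year) for year in YEAR_FIELD.findall(bibtex_text))


# Placeholder entries, for illustration only.
SAMPLE = """@inproceedings{hypothetical-key-1,
  title     = {First placeholder title},
  year      = {2012}
}
@inproceedings{hypothetical-key-2,
  title     = {Second placeholder title},
  year      = {2016}
}
@article{hypothetical-key-3,
  title     = {Third placeholder title},
  year      = {2016}
}
"""

if __name__ == "__main__":
    for year, count in sorted(entries_per_year(SAMPLE).items()):
        print(year, count)
```

For a real export, the file contents would be read with `open(path, encoding="utf-8").read()` and passed to `entries_per_year`.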

The first paper, from the WWW 2012 conference, marks the official launch of the initiative by the WMF.

Denny Vrandečić. 2012. Wikidata: a new platform for collaborative data collection. In Proceedings of the 21st International Conference on World Wide Web (WWW '12 Companion). Association for Computing Machinery, New York, NY, USA, 1063–1064. DOI:https://doi.org/10.1145/2187980.2188242 

At the 2015 International Semantic Web Conference (ISWC), a paper compared four approaches to reifying Wikidata data:

  1. standard reification (sr) whereby an RDF resource is used to denote the triple itself, denoting its subject, predicate and object as attributes and allowing additional meta-information to be added. 
  2. n-ary relations (nr) whereby an intermediate resource is used to denote the relationship, allowing it to be annotated with meta-information. 
  3. singleton properties (sp) whereby a predicate unique to the statement is created, which can be linked to the high-level predicate indicating the relationship, and onto which can be added additional meta-information. 
  4. Named Graphs (ng) whereby triples (or sets thereof) can be identified in a fourth field using, e.g., an IRI, onto which meta-information is added 

across five graph databases: 4store, BlazeGraph, GraphDB, Jena TDB, and Virtuoso.

Hernández, D., A. Hogan and M. Krötzsch. “Reifying RDF: What Works Well With Wikidata?” SSWS@ISWC (2015).
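The four reification schemes listed above can be made concrete with a small sketch. This is an illustration under stated assumptions, not code from the paper: the `ex:` identifiers, the statement node name `ex:stmt1`, the `/statement` and `/value` property suffixes and `ex:singletonPropertyOf` are all placeholder names chosen here. Each function encodes one statement `(s, p, o)` plus its meta-information as a list of triples (or quads, for named graphs).

```python
from typing import List, NamedTuple, Optional, Tuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str] = None  # graph name; None means the default graph


Meta = List[Tuple[str, str]]  # (predicate, value) pairs describing a statement


def _annotate(node: str, meta: Meta) -> List[Quad]:
    """Attach each piece of meta-information to the given node."""
    return [Quad(node, mp, mo) for mp, mo in meta]


def standard_reification(s, p, o, meta, stmt="ex:stmt1"):
    # A resource denotes the triple; subject, predicate, object are attributes.
    return [
        Quad(stmt, "rdf:subject", s),
        Quad(stmt, "rdf:predicate", p),
        Quad(stmt, "rdf:object", o),
    ] + _annotate(stmt, meta)


def nary_relation(s, p, o, meta, stmt="ex:stmt1"):
    # An intermediate resource sits between subject and object.
    return [
        Quad(s, p + "/statement", stmt),
        Quad(stmt, p + "/value", o),
    ] + _annotate(stmt, meta)


def singleton_property(s, p, o, meta, stmt="ex:stmt1"):
    # A predicate unique to this statement, linked to the general predicate.
    return [
        Quad(s, stmt, o),
        Quad(stmt, "ex:singletonPropertyOf", p),
    ] + _annotate(stmt, meta)


def named_graph(s, p, o, meta, stmt="ex:stmt1"):
    # The fourth field identifies the triple; meta-information hangs off it.
    return [Quad(s, p, o, g=stmt)] + _annotate(stmt, meta)


if __name__ == "__main__":
    meta = [("ex:pointInTime", "2015")]  # placeholder qualifier
    for scheme in (standard_reification, nary_relation,
                   singleton_property, named_graph):
        quads = scheme("ex:subject", "ex:predicate", "ex:object", meta)
        print(scheme.__name__, len(quads))
```

The sketch makes the storage trade-off visible: standard reification needs three triples for the statement itself, n-ary relations and singleton properties need two, and named graphs need one quad, before any meta-information is added.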

In 2016, at the same conference, another paper was published comparing a relational database (PostgreSQL) with graph databases (Virtuoso, Blazegraph, Neo4J) on the performance of queries over Wikidata.

Hernández D., Hogan A., Riveros C., Rojas C., Zerega E. (2016) Querying Wikidata: Comparing SPARQL, Relational and Graph Databases. In: Groth P. et al. (eds) The Semantic Web – ISWC 2016. ISWC 2016. Lecture Notes in Computer Science, vol 9982. Springer, Cham. https://doi.org/10.1007/978-3-319-46547-0_10

Hernandez et al. have studied the performance of query answering over Wikidata using several graph databases, including BlazeGraph and Virtuoso, and several ways of encoding statements. They concluded that best performance might be achieved using Virtuoso and named graphs, which might seem at odds with our positive experience with BlazeGraph and statement reification. However, it is hard to apply their findings to our case, since they used BlazeGraph on spinning disks rather than SSD, which we discovered to have a critical impact on performance. Moreover, they used a plain version of BlazeGraph without our customisations, and focused on hypothetical query loads that heavily rely on accessing statements in full detail. It is therefore hard to tell if Virtuoso could retain a performance advantage under realistic conditions, making it an interesting topic for future investigations.

At WWW 2016, a paper was published on the migration from Freebase to Wikidata.

Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert, Thomas Steiner, and Lydia Pintscher. 2016. From Freebase to Wikidata: The Great Migration. In Proceedings of the 25th International Conference on World Wide Web (WWW '16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1419–1428. DOI:https://doi.org/10.1145/2872427.2874809 

The first Scholia paper dates from 2017, and a second one appeared in the same year.

Nielsen, F., Mietchen, D., & Willighagen, E. (2017). Scholia and scientometrics with Wikidata. ArXiv, abs/1703.04222.

Also in 2017, there is a paper on inference with Wikidata.

Marx, M., & Krötzsch, M. (2017). SQID: Towards Ontological Reasoning for Wikidata. International Semantic Web Conference.

And the first comparative study with DBpedia.

Abián D., Guerra F., Martínez-Romanos J., Trillo-Lado R. (2018) Wikidata and DBpedia: A Comparative Study. In: Szymański J., Velegrakis Y. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2017. Lecture Notes in Computer Science, vol 10546. Springer, Cham. https://doi.org/10.1007/978-3-319-74497-1_14

Another graph database benchmark using Wikidata was published in 2019. The work compared Blazegraph, JanusGraph and Neo4J and three types of reification (but only Neo4J was tested with all three).

Kovács, T., Simon, G., & Mezei, G. (2019). Benchmarking Graph Database Backends—What Works Well with Wikidata?. Acta Cybernetica, 24(1), 43-60. https://doi.org/10.14232/actacyb.24.1.2019.5

Notes on a systematic review and a systematic mapping study, both from 2019.

The systematic mapping study analyzed 67 peer-reviewed papers from journals and conferences and classified each proposal/study into the sub-categories listed below. The most popular conferences among the publications were: TheWebConf (The Web Conference), ISWC (International Semantic Web Conference), OpenSym (The International Symposium on Open Collaboration), ESWC (Extended Semantic Web Conference), WSDM (ACM International Conference on Web Search and Data Mining) and MTSR (Research Conference on Metadata and Semantics Research).

4.1 Community-oriented Research
Is Wikidata just another peer production system? The research in this category reflects on Wikidata’s goals and features, existing design decisions (esp. multilingualism), analyzes the Wikidata community and their participation patterns.

4.2 Engineering-oriented Research
New approaches and features that enhance Wikidata’s functionality. These features are programmed for two main purposes: first, for improving the quality by adding new data or by interlinking with other sources, and second, for vandalism detection.

The ACM International Conference on Web Search and Data Mining held a competition for developing vandalism detection mechanisms for Wikidata, the WSDM Cup 2017.

4.3 Application Use Cases
Wikidata has received much attention from members of various research fields. Many articles described possible use cases for utilizing Wikidata as a central data hub.

Burgstaller-Muehlbacher et al. [10] import all human and mouse genes, and all human and mouse proteins into Wikidata to improve the state of biological data, and facilitate data management and data dissemination using the WDQS of Wikidata. Although Wikidata is widely used in bioinformatics, it is still a challenging task for biologists to use it efficiently. One major issue is, for example, that the “structured query languages like SPARQL are not commonly part of a researcher’s toolkit”. Thus, E. Putman et al. [16] describe WikiGenomes, a web application based on Wikidata, that facilitates the “consumption and curation of genomic data by the entire biomedical researcher community”. WikiGenomes provides access to the centralized biomedical data and a simple user interface for non-developer biologists.

4.4 Knowledge Graph Oriented Research
Wikidata is maintained by an active community of contributors who create a large amount of structured data. The knowledge base relies on the MediaWiki infrastructure. At the same time, Wikidata’s structured data is stored in RDF and is accessible through SPARQL. Wikidata belongs, therefore, to a group of other general purpose knowledge graphs, such as DBpedia, YAGO, and Cyc.
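As a concrete illustration of SPARQL access, the sketch below builds a request URL for the public Wikidata Query Service (WDQS) endpoint without sending it. The query (items that are an instance of house cat, `wd:Q146`) and the helper name `build_wdqs_url` are choices made here for illustration, not something taken from the mapping study.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# wdt:P31 is "instance of"; wd:Q146 is "house cat".
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_wdqs_url(sparql: str) -> str:
    """Return a GET URL that asks WDQS for JSON results (hypothetical helper)."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    print(build_wdqs_url(QUERY))
```

The resulting URL can be fetched with any HTTP client; sending a descriptive `User-Agent` header is advisable when querying the public endpoint.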

4.4.1 Wikidata as Linked Data Provider. We summarize all articles that propose approaches for storing Wikidata’s structured data in RDF and on the other hand, suggest how projects in Wikimedia’s ecosystem can use the RDF data.

Reification, triplification, SPARQL queries

Yang et al. [74] use the data for improving Wikipedia. They discuss that KGs can help machines to analyze plain texts, and propose a Relation Linking System for Wikidata (RLSW) which links the Wikidata KG to data in plain text format in Wikipedia.

4.4.2 Comparison of KGs. Next, we discuss articles which compare Wikidata with other general domain knowledge graphs.

Quality, completeness

4.4.3 Common Issues of KGs.

Integration, temporality

Krötzsch [39] discusses modern knowledge representation technologies, such as description logics, their advantages in information management and their contribution to knowledge graphs, and motivates Wikidata as a use case.

[39] Markus Krötzsch. 2017. Ontologies for Knowledge Graphs?. In Proceedings of the 30th International Workshop on Description Logics (CEUR Workshop Proceedings), Vol. 1879. CEUR-WS.org, France.

4.5 Data-oriented Research
Data-oriented research makes use of data from the Wikidata knowledge base and the knowledge graph. Some papers belong to KB and some to KG, while all focus on their defined category.

4.5.1 Data Quality.
Brasileiro et al. [9] discuss the quality of taxonomic hierarchies in Wikidata to have a consistent data model and representation schema.

Freddy Brasileiro, João Paulo A. Almeida, Victorio A. Carvalho, and Giancarlo Guizzardi. 2016. Applying a multi-level modeling theory to assess taxonomic hierarchies in Wikidata. In Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 975–980.

4.5.2 Tools & Datasets. This category contains research that resulted in the development of new tools, which mainly use Wikidata as a backend data source.

Scholia

Excerpt from the paper's Discussion section:

User studies concerning aspects such as the learnability or explainability are still rare on Wikidata. From the authors' own experiences on conducting Wikidata workshops, it can be said that people struggle with understanding Wikidata’s central concepts, for example, the difference between a class and an instance. It seems that Wikidata has still untapped potential in becoming accessible for non-technical experts.

The systematic review analyzed 57 peer-reviewed papers from journals and conferences (the majority) in which Wikidata was used in experimental studies and applications (the majority). The studies were classified as follows:

 

 4.2.2 Knowledge organization. According to Brasileiro et al. (2016), “the quality of taxonomic structures is key to properly capturing knowledge in Wikidata” and, after assessing the taxonomic hierarchies in Wikidata, they identified a significant number of issues, such as problematic classification and taxonomic statements, related to an inadequate use of instantiation and sub-classing in certain Wikidata hierarchies. For them, support to contributors would be beneficial in order to improve the quality of the Wikidata content.
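To make the confusion between instantiation and sub-classing concrete, the sketch below flags items that are stated to be both an instance of (`P31`) and a subclass of (`P279`) the same class. This is a deliberately simple illustrative check, not the multi-level modeling analysis applied by Brasileiro et al.; the `ex:` items are placeholders and the function name is hypothetical.

```python
from collections import defaultdict


def both_instance_and_subclass(statements):
    """Return sorted (item, class) pairs asserted with both P31 and P279.

    `statements` is an iterable of (subject, property, object) tuples using
    Wikidata property IDs: P31 = instance of, P279 = subclass of.
    """
    pairs_by_property = defaultdict(set)
    for subject, prop, obj in statements:
        pairs_by_property[prop].add((subject, obj))
    return sorted(pairs_by_property["P31"] & pairs_by_property["P279"])


if __name__ == "__main__":
    # Placeholder data: ex:A is suspicious, ex:C and ex:D are not.
    sample = [
        ("ex:A", "P31", "ex:B"),
        ("ex:A", "P279", "ex:B"),
        ("ex:C", "P31", "ex:B"),
        ("ex:D", "P279", "ex:B"),
    ]
    for item, cls in both_instance_and_subclass(sample):
        print(item, "is both an instance and a subclass of", cls)
```

A pair flagged this way is only a candidate for review by a contributor, since whether it is actually an error depends on the intended modeling.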

Excerpt from the paper's Discussion section:

This is further amplified by the fact that most of the papers are descriptions, proposals or implementations of applications, models or tools that take advantage of Wikidata’s structure or knowledge graph, demonstrating how present efforts are mostly restricted to finding uses for Wikidata instead of conceptualizing its raison d'être or going further and deeper in some of its potential fields of application, which might bring new approaches and contribute to a real breakthrough in Wikidata’s research, use and purpose.

As for applications, most of the works are dedicated to natural language (either in processing or generation), data quality and IR. Such applications, however, are mainly reflexive; they are mostly limited to Wikidata itself (improving its data, expanding its capabilities or integrating more knowledge) and are rarely linked to disciplines outside information systems.

As of today, most of the existing research and, in particular, applications, are centered around another growing field: NLP and NLG. However, this is not the only application that could greatly benefit from a large-scale integrated knowledge base; information extraction and retrieval, fact checking, content enrichment, recommendation systems, alert systems and others could as well.

Comments

  1. DBLP has fewer papers on Wikidata than the systematic reviews because it focuses on Computer Science.

