I searched DBLP for "Wikidata" on 19/04/2021 to identify what kinds of research are being done with and about Wikidata. The results span articles from 2012 to 2021. I downloaded the references in BibTeX format to use in Mendeley.
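A search like this could also be scripted. The sketch below is only an illustration, not how the search above was actually done: it builds a URL for DBLP's publication search API and counts hits per year from an already-parsed JSON response. The helper names and the sample payload are hypothetical, and the assumed response layout (result, hits, hit, info, year) should be checked against a real response.

```python
from collections import Counter
from urllib.parse import urlencode

# DBLP publication search endpoint.
DBLP_API = "https://dblp.org/search/publ/api"


def dblp_search_url(query, max_hits=1000):
    """Build a search URL asking for JSON output (hypothetical helper)."""
    return DBLP_API + "?" + urlencode({"q": query, "format": "json", "h": max_hits})


def count_by_year(payload):
    """Count hits per publication year in a parsed JSON response.

    Assumes the layout result -> hits -> hit -> info -> year.
    """
    hits = payload.get("result", {}).get("hits", {}).get("hit", [])
    return Counter(h["info"]["year"] for h in hits if "year" in h.get("info", {}))


# Made-up payload for illustration only; titles are placeholders.
sample = {
    "result": {
        "hits": {
            "hit": [
                {"info": {"title": "placeholder A", "year": "2012"}},
                {"info": {"title": "placeholder B", "year": "2016"}},
                {"info": {"title": "placeholder C", "year": "2016"}},
            ]
        }
    }
}

print(dblp_search_url("Wikidata"))
print(count_by_year(sample))
```

Fetching the URL and passing the decoded JSON to count_by_year would give the distribution of articles per year.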
The first article, from the WWW conference in 2012, makes the launch of the initiative by the WMF official.
Denny Vrandečić. 2012. Wikidata: a new platform for collaborative data collection. In Proceedings of the 21st International Conference on World Wide Web (WWW '12 Companion). Association for Computing Machinery, New York, NY, USA, 1063–1064. DOI:https://doi.org/10.1145/2187980.2188242
At the International Semantic Web Conference (ISWC) in 2015, a paper compared four approaches to reifying Wikidata data:
- standard reification (sr) whereby an RDF resource is used to denote the triple itself, denoting its subject, predicate and object as attributes and allowing additional meta-information to be added.
- n-ary relations (nr) whereby an intermediate resource is used to denote the relationship, allowing it to be annotated with meta-information.
- singleton properties (sp) whereby a predicate unique to the statement is created, which can be linked to the high-level predicate indicating the relationship, and onto which can be added additional meta-information.
- named graphs (ng) whereby triples (or sets thereof) can be identified in a fourth field using, e.g., an IRI, onto which meta-information is added.
and evaluated them on five graph databases: 4store, BlazeGraph, GraphDB, Jena TDB and Virtuoso.
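To make the four reification approaches listed above more concrete, here is a minimal sketch that encodes one statement with one piece of meta-information in each style, using plain tuples instead of an RDF library. All identifiers (ex:item, ex:stmt1, the _s and _v suffixes, and so on) are hypothetical placeholders, and the encodings are simplified relative to what the paper evaluates.

```python
def standard_reification(s, p, o, meta, stmt):
    """sr: a resource denotes the triple; subject/predicate/object are its attributes."""
    triples = [(stmt, "rdf:subject", s), (stmt, "rdf:predicate", p), (stmt, "rdf:object", o)]
    return triples + [(stmt, k, v) for k, v in meta]


def nary_relation(s, p, o, meta, stmt):
    """nr: an intermediate resource sits between subject and object.

    The "_s" and "_v" suffixes are an illustrative naming choice.
    """
    return [(s, p + "_s", stmt), (stmt, p + "_v", o)] + [(stmt, k, v) for k, v in meta]


def singleton_property(s, p, o, meta, stmt):
    """sp: a predicate unique to this statement, linked back to the general predicate."""
    unique_p = p + "#" + stmt  # hypothetical way of minting the unique predicate
    triples = [(s, unique_p, o), (unique_p, "singletonPropertyOf", p)]
    return triples + [(unique_p, k, v) for k, v in meta]


def named_graph(s, p, o, meta, stmt):
    """ng: quads; the fourth field names the graph that holds the statement."""
    return [(s, p, o, stmt)] + [(stmt, k, v, "default") for k, v in meta]


statement = ("ex:item", "ex:prop", "ex:value")
meta = [("ex:qualifier", "ex:qualValue")]

for encode in (standard_reification, nary_relation, singleton_property, named_graph):
    rows = encode(*statement, meta, "ex:stmt1")
    print(encode.__name__, len(rows))
    for row in rows:
        print("   ", row)
```

In this toy encoding the same statement costs four triples under standard reification, three under n-ary relations and singleton properties, and two quads under named graphs, which hints at why the choice of encoding can affect storage and query performance.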
In 2016, at the same conference, another paper was published comparing a relational database (PostgreSQL) with graph databases (Virtuoso, Blazegraph, Neo4J) in terms of query performance over Wikidata:
Hernández D., Hogan A., Riveros C., Rojas C., Zerega E. (2016) Querying Wikidata: Comparing SPARQL, Relational and Graph Databases. In: Groth P. et al. (eds) The Semantic Web – ISWC 2016. ISWC 2016. Lecture Notes in Computer Science, vol 9982. Springer, Cham. https://doi.org/10.1007/978-3-319-46547-0_10
Hernandez et al. have studied the performance of query answering over Wikidata using several graph databases, including BlazeGraph and Virtuoso, and several ways of encoding statements. They concluded that best performance might be achieved using Virtuoso and named graphs, which might seem at odds with our positive experience with BlazeGraph and statement reification. However, it is hard to apply their findings to our case, since they used BlazeGraph on spinning disks rather than SSD, which we discovered to have a critical impact on performance. Moreover, they used a plain version of BlazeGraph without our customisations, and focused on hypothetical query loads that heavily rely on accessing statements in full detail. It is therefore hard to tell if Virtuoso could retain a performance advantage under realistic conditions, making it an interesting topic for future investigations.
At WWW 2016, a paper was published on the migration from Freebase to Wikidata.
Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert, Thomas Steiner, and Lydia Pintscher. 2016. From Freebase to Wikidata: The Great Migration. In Proceedings of the 25th International Conference on World Wide Web (WWW '16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1419–1428. DOI:https://doi.org/10.1145/2872427.2874809
The first Scholia paper is from 2017, and a second paper appeared in the same year.
Also in 2017 there is a paper on inference with Wikidata.
And the first comparative study with DBpedia.
Abián D., Guerra F., Martínez-Romanos J., Trillo-Lado R. (2018) Wikidata and DBpedia: A Comparative Study. In: Szymański J., Velegrakis Y. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2017. Lecture Notes in Computer Science, vol 10546. Springer, Cham. https://doi.org/10.1007/978-3-319-74497-1_14
Another graph database benchmark using Wikidata was published in 2019. The work compared Blazegraph, JanusGraph and Neo4J and three types of reification (but only Neo4J was tested with all three types).
Kovács, T., Simon, G., & Mezei, G. (2019). Benchmarking Graph Database Backends—What Works Well with Wikidata?. Acta Cybernetica, 24(1), 43-60. https://doi.org/10.14232/actacyb.24.1.2019.5
Notes on a systematic review and a systematic mapping, both from 2019.
The systematic mapping analyzed 67 peer-reviewed articles from journals and conferences and classified each proposal/study into the sub-categories listed below. The most popular conferences among the publications were: TheWebConf (The Web Conference), ISWC (International Semantic Web Conference), OpenSym (The International Symposium on Open Collaboration), ESWC (Extended Semantic Web Conference), WSDM (ACM International Conference on Web Search and Data Mining) and MTSR (Research Conference on Metadata and Semantics Research).
4.1 Community-oriented Research
Is Wikidata just another peer production system? The research in this category reflects on Wikidata’s goals and features and existing design decisions (esp. multilingualism), and analyzes the Wikidata community and its participation patterns.
4.2 Engineering-oriented Research
New approaches and features that enhance Wikidata’s functionality. These features are programmed for two main purposes: first, for improving the quality by adding new data or by interlinking with other sources, and second, for vandalism detection.
The ACM International Conference on Web Search and Data Mining held a competition for developing vandalism detection mechanisms for Wikidata, the WSDM Cup 2017.
4.3 Application Use Cases
Wikidata has received much attention from members of various research fields. Many articles described possible use cases for utilizing Wikidata as a central data hub.
Burgstaller-Muehlbacher et al. [10] import all human and mouse genes, and all human and mouse proteins into Wikidata to improve the state of biological data, and facilitate data management and data dissemination using the WDQS of Wikidata. Although Wikidata is widely used in bioinformatics, it is still a challenging task for biologists to use it efficiently. One major issue is, for example, that “structured query languages like SPARQL are not commonly part of a researcher’s toolkit”. Thus, E. Putman et al. [16] describe WikiGenomes, a web application based on Wikidata that facilitates the “consumption and curation of genomic data by the entire biomedical researcher community”. WikiGenomes provides access to the centralized biomedical data and a simple user interface for non-developer biologists.
4.4 Knowledge Graph Oriented Research
Wikidata is maintained by an active community of contributors who create a large amount of structured data. The knowledge base relies on the MediaWiki infrastructure. At the same time, Wikidata’s structured data is stored in RDF and is accessible through SPARQL. Wikidata belongs, therefore, to a group of other general purpose knowledge graphs, such as DBpedia, YAGO, and Cyc.
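As a small illustration of what access through SPARQL looks like in practice, the sketch below builds a query for instances of a given class and the corresponding request URL for the public Wikidata Query Service endpoint. The function names are hypothetical; the code only constructs strings and leaves the HTTP request itself to the reader.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service (WDQS).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def instances_query(class_qid, limit=10):
    """Return a SPARQL query listing items that are 'instance of' (P31) a class."""
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def wdqs_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = instances_query("Q5")  # Q5 is the Wikidata item for "human"
print(query)
print(wdqs_url(query))
```

The wd: and wdt: prefixes are predefined on the query service, so the query can also be pasted directly into its web interface.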
4.4.1 Wikidata as Linked Data Provider. We summarize all articles that propose approaches for storing Wikidata’s structured data in RDF and on the other hand, suggest how projects in Wikimedia’s ecosystem can use the RDF data.
Reification, triplification, SPARQL queries
Yang et al. [74] use the data for improving Wikipedia. They discuss that KGs can help machines to analyze plain texts, and propose a Relation Linking System for Wikidata (RLSW) which links the Wikidata KG to data in plain text format in Wikipedia.
4.4.2 Comparison of KGs. Next, we discuss articles which compare Wikidata with other general domain knowledge graphs.
Quality, completeness
4.4.3 Common Issues of KGs.
Integration, temporality
Krötzsch [39] discusses modern knowledge representation technologies, such as description logics, their advantages in information management and their contribution to knowledge graphs, and motivates Wikidata as a use case.
[39] Markus Krötzsch. 2017. Ontologies for Knowledge Graphs?. In Proceedings of the 30th International Workshop on Description Logics (CEUR Workshop Proceedings), Vol. 1879. CEUR-WS.org, France.
4.5 Data-oriented Research. Articles in this category make use of data from the Wikidata knowledge base and the knowledge graph. Some papers belong to KB and some to KG, while all focus on their defined category.
4.5.1 Data Quality.
Brasileiro et al. [9] discuss the quality of taxonomic hierarchies in Wikidata to have a consistent data model and representation schema.
Freddy Brasileiro, João Paulo A. Almeida, Victorio A. Carvalho, and Giancarlo Guizzardi. 2016. Applying a multi-level modeling theory to assess taxonomic hierarchies in Wikidata. In Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 975–980.
4.5.2 Tools & Datasets. This category contains research that resulted in the development of new tools, which mainly use Wikidata as a backend data source.
Scholia
Extracted from the paper's Discussion section:
User studies concerning aspects such as the learnability or explainability are still rare on Wikidata. From the authors' own experiences of conducting Wikidata workshops, it can be said that people struggle with understanding Wikidata’s central concepts, for example, the difference between a class and an instance. It seems that Wikidata still has untapped potential in becoming accessible for non-technical experts.
The systematic review analyzed 57 peer-reviewed articles from journals and conferences (the majority) in which Wikidata was used in experimental studies and applications (the majority). The studies were classified as follows:
4.2.2 Knowledge organization. According to Brasileiro et al. (2016), “the quality of taxonomic structures is key to properly capturing knowledge in Wikidata” and, after assessing the taxonomic hierarchies in Wikidata, they identified a significant number of issues, such as problematic classification and taxonomic statements, related to an inadequate use of instantiation and sub-classing in certain Wikidata hierarchies. For them, support to contributors would be beneficial in order to improve the quality of the Wikidata content.
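As a rough illustration of what an inadequate use of instantiation and sub-classing can look like, the sketch below flags items that are declared both an instance (P31) and a subclass (P279) of the same class. Treating this as a problem is an assumption made here for illustration; it is not claimed to reproduce the exact checks in Brasileiro et al., and the item identifiers are hypothetical placeholders.

```python
def instance_and_subclass_conflicts(statements):
    """Return (item, class) pairs where item is both P31 and P279 of the same class.

    `statements` is an iterable of (subject, property, object) tuples.
    """
    instance_of = set()
    subclass_of = set()
    for subject, prop, obj in statements:
        if prop == "P31":  # instance of
            instance_of.add((subject, obj))
        elif prop == "P279":  # subclass of
            subclass_of.add((subject, obj))
    return sorted(instance_of & subclass_of)


# Placeholder data, not real Wikidata items.
statements = [
    ("Q_a", "P31", "Q_class"),
    ("Q_a", "P279", "Q_class"),  # suspicious: also a subclass of the same class
    ("Q_b", "P31", "Q_class"),
    ("Q_c", "P279", "Q_class"),
]

print(instance_and_subclass_conflicts(statements))
```

A check like this could be one way to point contributors at statements worth reviewing, in the spirit of the support the authors call for.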
Extracted from the paper's Discussion section:
This is further amplified by the fact that most of the papers are descriptions, proposals or implementations of applications, models or tools that take advantage of Wikidata’s structure or knowledge graph, demonstrating how present efforts are mostly restricted to finding uses for Wikidata instead of conceptualizing its raison d'être or going further and deeper in some of its potential fields of application, which might bring new approaches and contribute to a real breakthrough in Wikidata’s research, use and purpose.
As for applications, most of the works are dedicated to natural language (either in processing or generation), data quality and IR. Such applications, however, are mainly reflexive; they are mostly limited to Wikidata itself (improving its data, expanding its capabilities or integrating more knowledge) and are rarely linked to disciplines outside information systems.
As of today, most of the existing research and, in particular, applications, are centered around another growing field: NLP and NLG. However, this is not the only application that could greatly benefit from a large-scale integrated knowledge base; information extraction and retrieval, fact checking, content enrichment, recommendation systems, alert systems and others could as well.
DBLP has fewer articles on Wikidata than the systematic reviews because it is focused on computer science.