
Disputes and Ranking in Wikidata (WD) - statistics

WD from June 2022

# and % of "disputed by" statements, with the distribution of % by property

559,038,971 CLAIMS
1,577 disputed by

0.00028 %

Command

(base) root@vm096:/app/kgtk/temp# zcat /app/kgtk/data/my-tsv/disputedBy-claims-sorted.tsv.gz | wc -l
1,578
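`wc -l` counts every line of the file, including the KGTK header row, so the 1,578 lines above correspond to 1,577 statements. The sketch below shows the header adjustment and the percentage-of-all-claims calculation used in each section of this post. It is plain Python; `data_rows` and `share_of_claims` are illustrative helper names, not KGTK functions.

```python
TOTAL_CLAIMS = 559_038_971  # claims in the June 2022 WD claims file


def data_rows(wc_lines: int) -> int:
    """Number of edges in a KGTK TSV, given `wc -l` (which also counts the header)."""
    return wc_lines - 1


def share_of_claims(count: int, total: int = TOTAL_CLAIMS) -> float:
    """Percentage of all claims represented by `count`."""
    return count / total * 100


disputed = data_rows(1_578)
print(disputed)                               # 1577
print(f"{share_of_claims(disputed):.5f} %")   # 0.00028 %
```

The same two steps give 0.0129 % for the 72,235 lines of the P7452 file and 0.00088 % for the 4,928 lines of the P2241 file.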

TOP 10 PROPERTIES (Disputed By)

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/disputedBy-claims-pred-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 | head
node1   label   node2   node1;distribution      node1;label
P17     count   561     35.5739 'country'@en
P3355   count   186     11.7945 'negative therapeutic predictor'@en
P3354   count   140     8.8776  'positive therapeutic predictor'@en
P131    count   106     6.7216  'located in the administrative territorial entity'@en
P31     count   78      4.9461  'instance of'@en
P460    count   43      2.7267  'said to be the same as'@en
P3359   count   29      1.8389  'negative prognostic predictor'@en
P40     count   20      1.2682  'child'@en
P39     count   19      1.2048  'position held'@en
P170    count   18      1.1414  'creator'@en

node1;distribution = (node2 / 1,577) * 100
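The `node1;distribution` column can be reproduced from the counts alone. Here is a minimal sketch. It assumes a tab-separated file with at least the `node1` and `node2` columns shown above. `SAMPLE` holds the first three rows of the disputed-by table, trimmed to node1/label/node2, and `add_distribution` is a hypothetical helper.

```python
import csv
import io

# First three rows of the disputed-by table above (node1 = property,
# node2 = number of "disputed by" statements for that property).
SAMPLE = (
    "node1\tlabel\tnode2\n"
    "P17\tcount\t561\n"
    "P3355\tcount\t186\n"
    "P3354\tcount\t140\n"
)


def add_distribution(tsv_text: str, total: int) -> list:
    """Add node1;distribution = (node2 / total) * 100, rounded to 4 decimals."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    for row in rows:
        row["node1;distribution"] = round(int(row["node2"]) / total * 100, 4)
    return rows


for row in add_distribution(SAMPLE, total=1_577):
    print(row["node1"], row["node2"], row["node1;distribution"])
# P17 561 35.5739
# P3355 186 11.7945
# P3354 140 8.8776
```

The printed values match the `node1;distribution` column of the table.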

# and % of "ranked" statements

559,038,971 CLAIMS
553,558,105 normal rank
5,480,866 preferred rank
0 deprecated rank

0.98 % preferred rank

Command

(base) root@vm096:/app/kgtk/temp# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; preferred ;  "  | wc -l
5,480,867

(base) root@vm096:/app/kgtk/temp# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; deprecated ;  "  | wc -l
1

(base) root@vm096:/home/cloud-di# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; normal ;  "  | wc -l
553,558,106
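Each of the three `kgtk filter` runs scans the claims file for a single rank value. The counts suggest that `--label rank` makes the `rank` column fill the middle (label) position of the `-p " ; preferred ; "` pattern. Assuming that holds, the same counts can be taken in a single pass. Below is a sketch over a hypothetical miniature claims file; its column layout is illustrative, not the exact KGTK import schema.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature claims file (illustrative columns, not the exact
# KGTK import schema); what matters here is the `rank` column.
CLAIMS = (
    "id\tnode1\tlabel\tnode2\trank\n"
    "Q1-P31-1\tQ1\tP31\tQ5\tnormal\n"
    "Q1-P569-1\tQ1\tP569\t^1900-01-01T00:00:00Z/11\tpreferred\n"
    "Q1-P569-2\tQ1\tP569\t^1901-01-01T00:00:00Z/11\tnormal\n"
)


def count_by_rank(tsv_text: str) -> Counter:
    """Count claims per value of the `rank` column in one pass (header excluded)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["rank"] for row in reader)


counts = count_by_rank(CLAIMS)
print(counts["normal"], counts["preferred"], counts["deprecated"])  # 2 1 0
```

A missing rank simply reads as 0 from the `Counter`. This mirrors the deprecated run, which returns only its header line.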

*** The CLAIMS file has only normal or preferred in the rank column; it contains no deprecated statements.
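Every `kgtk filter` output begins with a header line, which is why the deprecated run reports 1 line and no data. Subtracting that line from each `wc -l` result lets the three ranks be reconciled with the total number of claims. A quick check in Python:

```python
TOTAL_CLAIMS = 559_038_971

# `wc -l` results of the three kgtk filter commands above (header included)
wc_lines = {"normal": 553_558_106, "preferred": 5_480_867, "deprecated": 1}

# Drop the header line from each count
ranked = {rank: lines - 1 for rank, lines in wc_lines.items()}

print(ranked)  # {'normal': 553558105, 'preferred': 5480866, 'deprecated': 0}
print(sum(ranked.values()) == TOTAL_CLAIMS)  # True
```

With the header removed, normal rank accounts for 553,558,105 claims. Normal plus preferred then equals the claim total exactly.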

TOP 10 PROPERTIES (Preferred Rank) with Distribution of % of "ranked" statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/preferredRank-claims-pred-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 | head
node1   label   node2   node1;distribution      node1;label
P1215   count   3914843 71.4275 'apparent magnitude'@en
P1082   count   225689  4.1178  'population'@en
P131    count   185778  3.3896  'located in the administrative territorial entity'@en
P8687   count   174665  3.1868  'social media followers'@en
P17     count   88322   1.6115  'country'@en
P31     count   77369   1.4116  'instance of'@en
P569    count   76903   1.4031  'date of birth'@en
P150    count   61848   1.1284  'contains administrative territorial entity'@en
P764    count   59749   1.0901  'OKTMO ID'@en
P1540   count   50994   0.9304  'male population'@en

node1;distribution = (node2 / 5,480,866) * 100
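The post does not show how the `*-pred-count-label.tsv` files were built. Conceptually, they count statements per property (the claim's `label` column). The `kgtk sort --reverse-columns node2 --numeric-columns node2` step then orders those counts from largest to smallest. Below is a minimal Python sketch of that count-and-rank step. It runs `collections.Counter` over a hypothetical list of property IDs; `top_properties` is an illustrative name.

```python
from collections import Counter


def top_properties(labels, n=10):
    """Count statements per property and return the n largest as (property, count).

    Counter.most_common sorts by count, descending, like a reverse numeric
    sort on node2; ties keep the order in which properties first appear.
    """
    return Counter(labels).most_common(n)


# Hypothetical `label` column of a handful of preferred-rank claims
labels = ["P1215", "P1082", "P1215", "P131", "P1215", "P1082"]
for prop, count in top_properties(labels, n=2):
    print(prop, "count", count)
# P1215 count 3
# P1082 count 2
```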

# and % of statements with the reason for preferred rank qualifier (P7452)

559,038,971 CLAIMS
72,234 preferred rank qualifier (P7452)

0.0129 %

Command

(base) root@vm096:/home/cloud-di# zcat /app/kgtk/data/my-tsv/preferredRank-pred-P7452-sorted.tsv.gz | wc -l
72,235

TOP 10 PROPERTIES (reason for preferred rank, P7452) with Distribution of % of qualified statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/preferredRank-pred-P7452-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 | head
node1   label   node2   node1;distribution      node1;label
P569    count   34068   47.1634 'date of birth'@en
P570    count   17140   23.7284 'date of death'@en
P131    count   15474   21.422  'located in the administrative territorial entity'@en
P625    count   471     0.652   'coordinate location'@en
P19     count   371     0.5136  'place of birth'@en
P571    count   307     0.425   'inception'@en
P735    count   298     0.4125  'given name'@en
P856    count   297     0.4112  'official website'@en
P20     count   262     0.3627  'place of death'@en
P31     count   244     0.3378  'instance of'@en

node1;distribution = (node2 / 72,234) * 100
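A table like this one needs the property of the statement that each P7452 qualifier is attached to. Below is a minimal sketch. It assumes, as in KGTK's Wikidata qualifier files, that a qualifier edge's `node1` holds the `id` of the claim it qualifies. The miniature `claims` and `qualifiers` data and the `properties_with_qualifier` helper are hypothetical.

```python
from collections import Counter

# Hypothetical claims: claim id -> property (the claim's `label` column)
claims = {
    "Q1-P569-1": "P569",
    "Q1-P570-1": "P570",
    "Q2-P569-1": "P569",
}

# Hypothetical qualifier edges as (node1, label, node2);
# node1 is the id of the qualified claim.
qualifiers = [
    ("Q1-P569-1", "P7452", "Q_reason_a"),
    ("Q1-P570-1", "P7452", "Q_reason_a"),
    ("Q2-P569-1", "P7452", "Q_reason_b"),
    ("Q2-P569-1", "P_other", "Q_value"),  # a different qualifier, ignored
]


def properties_with_qualifier(claims, qualifiers, qualifier_prop):
    """Count, per claim property, the distinct claims carrying `qualifier_prop`."""
    qualified = {node1 for node1, label, _ in qualifiers if label == qualifier_prop}
    return Counter(claims[cid] for cid in qualified if cid in claims)


print(properties_with_qualifier(claims, qualifiers, "P7452"))
# Counter({'P569': 2, 'P570': 1})
```

Collecting ids in a set counts each statement once, even if it has two P7452 qualifiers. The `wc -l` figure in this section counts qualifier rows instead. The two coincide only when no statement repeats the qualifier.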

# and % of statements with the reason for deprecated rank qualifier (P2241)

559,038,971 CLAIMS
4,927 deprecated rank qualifier (P2241)

0.00088 %

Command

(base) root@vm096:/home/cloud-di# zcat /app/kgtk/data/my-tsv/deprecatedRank-pred-P2241-sorted.tsv.gz | wc -l
4928

TOP 10 PROPERTIES (reason for deprecated rank, P2241) with Distribution of % of qualified statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/deprecatedRank-pred-P2241-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 | head
node1   label   node2   node1;distribution      node1;label
P2276   count   1579    32.0479 'UEFA player ID'@en
P856    count   695     14.1059 'official website'@en
P8580   count   561     11.3862 'NHK Archives Portal person ID'@en
P1367   count   306     6.2107  'Art UK artist ID'@en
P4762   count   139     2.8212  'Common Database on Designated Areas ID'@en
P809    count   138     2.8009  'WDPA ID'@en
P345    count   111     2.2529  'IMDb ID'@en
P136    count   95      1.9282  'genre'@en
P569    count   71      1.441   'date of birth'@en
P5161   count   69      1.4004  'Trustpilot company ID'@en

node1;distribution = (node2 / 4,927) * 100
