Disputes and Ranking in WD (Wikidata): statistics

WD as of June 2022

# and % and distribution of % of "disputed by" statements

559,038,971 CLAIMS
1,577 disputed by

0.00028 %

Command

(base) root@vm096:/app/kgtk/temp# zcat /app/kgtk/data/my-tsv/disputedBy-claims-sorted.tsv.gz | wc -l
1,578
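The numbers above can be rechecked with a little arithmetic. This is a minimal sketch, assuming the TSV file carries exactly one header line (which would explain why wc -l reports 1,578 lines for 1,577 statements). The helpers data_rows and pct are hypothetical names introduced for illustration; they are not KGTK commands.

```shell
# Hypothetical helpers (not part of KGTK) to recheck the figures above.

# data_rows LINES: subtract the single header line that wc -l also counts.
# Assumption: the file has exactly one header line.
data_rows() {
  echo $(( $1 - 1 ))
}

# pct PART TOTAL DECIMALS: PART as a percentage of TOTAL,
# printed with DECIMALS digits after the decimal point.
pct() {
  awk -v part="$1" -v total="$2" -v d="$3" \
    'BEGIN { printf("%." d "f\n", part / total * 100) }'
}

disputed=$(data_rows 1578)
echo "$disputed disputed-by statements"
echo "$(pct "$disputed" 559038971 5) % of all claims"
```

The same two helpers apply to the other wc -l results in this post, since each of them appears to be one line larger than the statement count reported next to it.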

TOP 10 PROPERTIES (Disputed By)

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/disputedBy-claims-pred-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 / head
node1   label   node2   node1;distribution   node1;label
P17     count   561     35.5739              'country'@en
P3355   count   186     11.7945              'negative therapeutic predictor'@en
P3354   count   140     8.8776               'positive therapeutic predictor'@en
P131    count   106     6.7216               'located in the administrative territorial entity'@en
P31     count   78      4.9461               'instance of'@en
P460    count   43      2.7267               'said to be the same as'@en
P3359   count   29      1.8389               'negative prognostic predictor'@en
P40     count   20      1.2682               'child'@en
P39     count   19      1.2048               'position held'@en
P170    count   18      1.1414               'creator'@en

node1;distribution = (node2 / 1,577) * 100
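The distribution column follows directly from this formula. A minimal sketch, assuming tab-separated rows with the count in the third column, as in the output above; the two sample rows are copied from the table, and distribution is a hypothetical helper name rather than a KGTK command.

```shell
# distribution TOTAL: read tab-separated rows (node1, label, node2) on stdin
# and append node2 / TOTAL * 100, formatted with four decimals.
distribution() {
  awk -F'\t' -v OFS='\t' -v total="$1" \
    '{ print $1, $2, $3, sprintf("%.4f", $3 / total * 100) }'
}

# Two rows from the "disputed by" table, divided by the 1,577 total.
printf 'P17\tcount\t561\nP3355\tcount\t186\n' | distribution 1577
```

Passing a different total (for example 5,480,866 for the preferred-rank table) reuses the same helper for the other distributions in this post.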

# and % of "ranked" statements

559,038,971 CLAIMS
553,558,105 normal rank
5,480,866 preferred rank
0 deprecated rank

0.98 % preferred rank

Command

(base) root@vm096:/app/kgtk/temp# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; preferred ; " | wc -l
5,480,867

(base) root@vm096:/app/kgtk/temp# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; deprecated ; " | wc -l
1

(base) root@vm096:/home/cloud-di# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; normal ; " | wc -l
553,558,106

*** The CLAIMS file has only normal or preferred in the rank column; it has no deprecated values
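As a sanity check, the per-rank counts should add up to the total number of claims. The sketch below assumes that each wc -l result includes one header line written by kgtk filter, so one line is subtracted from each before summing; the variable names are illustrative only.

```shell
# wc -l results copied from the commands above. Assumption: each includes
# one header line, so the statement count is the line count minus one.
preferred_lines=5480867
deprecated_lines=1
normal_lines=553558106
total_claims=559038971

preferred=$(( preferred_lines - 1 ))
deprecated=$(( deprecated_lines - 1 ))
normal=$(( normal_lines - 1 ))

# Compare the sum of the three ranks with the total number of claims.
sum=$(( normal + preferred + deprecated ))
if [ "$sum" -eq "$total_claims" ]; then
  echo "ranks add up: $sum"
else
  echo "mismatch: $sum vs $total_claims"
fi
```

Under that assumption the three ranks account for every claim, with 553,558,105 normal, 5,480,866 preferred and 0 deprecated statements.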

TOP 10 PROPERTIES (Preferred Rank) with Distribution of % of "ranked" statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/preferredRank-claims-pred-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 / head
node1   label   node2     node1;distribution   node1;label
P1215   count   3914843   71.4275              'apparent magnitude'@en
P1082   count   225689    4.1178               'population'@en
P131    count   185778    3.3896               'located in the administrative territorial entity'@en
P8687   count   174665    3.1868               'social media followers'@en
P17     count   88322     1.6115               'country'@en
P31     count   77369     1.4116               'instance of'@en
P569    count   76903     1.4031               'date of birth'@en
P150    count   61848     1.1284               'contains administrative territorial entity'@en
P764    count   59749     1.0901               'OKTMO ID'@en
P1540   count   50994     0.9304               'male population'@en

node1;distribution = (node2 / 5,480,866) * 100

# and % of statements with preferred rank qualifier (P7452)

559,038,971 CLAIMS
72,234 preferred rank qualifier (P7452)

0.0129 %

Command

(base) root@vm096:/home/cloud-di# zcat /app/kgtk/data/my-tsv/preferredRank-pred-P7452-sorted.tsv.gz | wc -l
72,235

TOP 10 PROPERTIES (Preferred Rank qualifier, P7452) with Distribution of % of "ranked" statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/preferredRank-pred-P7452-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 / head
node1   label   node2   node1;distribution   node1;label
P569    count   34068   47.1634              'date of birth'@en
P570    count   17140   23.7284              'date of death'@en
P131    count   15474   21.422               'located in the administrative territorial entity'@en
P625    count   471     0.652                'coordinate location'@en
P19     count   371     0.5136               'place of birth'@en
P571    count   307     0.425                'inception'@en
P735    count   298     0.4125               'given name'@en
P856    count   297     0.4112               'official website'@en
P20     count   262     0.3627               'place of death'@en
P31     count   244     0.3378               'instance of'@en

node1;distribution = (node2 / 72,234) * 100

# and % of statements with deprecated rank qualifier (P2241)

559,038,971 CLAIMS
4,927 deprecated rank qualifier (P2241)

0.00088 %

Command

(base) root@vm096:/home/cloud-di# zcat /app/kgtk/data/my-tsv/deprecatedRank-pred-P2241-sorted.tsv.gz | wc -l
4928

TOP 10 PROPERTIES (Deprecated Rank qualifier, P2241) with Distribution of % of "ranked" statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/deprecatedRank-pred-P2241-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 / head
node1   label   node2   node1;distribution   node1;label
P2276   count   1579    32.0479              'UEFA player ID'@en
P856    count   695     14.1059              'official website'@en
P8580   count   561     11.3862              'NHK Archives Portal person ID'@en
P1367   count   306     6.2107               'Art UK artist ID'@en
P4762   count   139     2.8212               'Common Database on Designated Areas ID'@en
P809    count   138     2.8009               'WDPA ID'@en
P345    count   111     2.2529               'IMDb ID'@en
P136    count   95      1.9282               'genre'@en
P569    count   71      1.441                'date of birth'@en
P5161   count   69      1.4004               'Trustpilot company ID'@en

node1;distribution = (node2 / 4,927) * 100

Veronica's Doctoral Research
