WD (Wikidata) from June 2022
# and % and distribution of % of controversial statements
559,038,971 CLAIMS
132,552,453 potentially controversial
23.71%
(base) root@vm096:/home/cloud-di# zcat /app/kgtk/data/wikidata/claims.tsv.gz | wc -l
559,038,972
(base) root@vm096:/home/cloud-di# more /app/kgtk/data/my-tsv/filtered-claims-sorted-uniq.tsv.gz | wc -l
132,552,454
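The counts above come from `wc -l`, which also counts the header line of each KGTK TSV file. That is why 559,038,972 lines correspond to 559,038,971 claims. The following minimal shell sketch does the subtraction and the percentage in one place. The helper names `data_rows` and `pct` are hypothetical and are not part of KGTK.

```shell
# Hypothetical helpers (not from the original notes).

# data_rows FILE: number of data rows, i.e. all lines minus the header line.
# gzip -cdf decompresses .gz input and passes plain text through unchanged,
# so the same helper works for both .tsv and .tsv.gz files.
data_rows() {
  gzip -cdf -- "$1" | awk 'END { print (NR > 0 ? NR - 1 : 0) }'
}

# pct PART TOTAL: PART as a percentage of TOTAL, with two decimals.
pct() {
  awk -v p="$1" -v t="$2" 'BEGIN { printf "%.2f%%\n", 100 * p / t }'
}
```

For example, `pct 132552453 559038971` prints `23.71%`, matching the figure above.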
# and % and distribution of % of controversial properties
9,653 PROPERTIES (All)
2,143 potentially controversial
22.20%
Command
(base) root@vm096:/app/kgtk/temp# cat /app/kgtk/data/my-tsv/all-claims-pred-counted.tsv | wc -l
9654
(base) root@vm096:/app/kgtk/temp# cat /app/kgtk/data/my-tsv/filtered-pred-count-sorted.tsv | wc -l
2144
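The `*-pred-counted.tsv` files hold one row per property (`node1`) with its number of claims (`node2`). Their line count minus the header is therefore the number of distinct properties. The notes don't show how these files were built. The following minimal awk sketch produces a file of the same shape. It assumes the claims file has a header with a `label` column holding the property id. The helper `count_by_label` is hypothetical.

```shell
# Hypothetical sketch: count claims per property and write a KGTK-style
# node1/label/node2 file, like the *-pred-counted.tsv files above.
# Assumes tab-separated input whose header names a "label" column.
count_by_label() {
  gzip -cdf -- "$1" | awk -F '\t' -v OFS='\t' '
    NR == 1 {                               # locate the label column by name
      for (i = 1; i <= NF; i++) if ($i == "label") col = i
      if (!col) { bad = 1; exit 1 }
      next
    }
    { n[$col]++ }                           # one more claim for this property
    END {
      if (bad) exit 1
      print "node1", "label", "node2"
      for (p in n) print p, "count", n[p]
    }'
}
```

The data rows come out in no particular order, because awk array traversal order is unspecified. Sort them as in the top-10 listings.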
TOP 10 PROPERTIES (All)
(base) root@vm096:/app/kgtk/temp# kgtk sort -i /app/kgtk/data/my-tsv/all-claims-pred-counted.tsv -c node2 --reverse-columns node2 --numeric-columns node2 | head
node1 label node2
P31 count 59717980
P1215 count 33122376
P528 count 28738709
P17 count 14996553
P131 count 11371144
P106 count 9608349
P625 count 9267000
P2215 count 8207685
P3083 count 8150658
P6257 count 8091255
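The top-10 listings sort the count files by `node2` in descending numeric order and keep the first lines. By default, `head` prints 10 lines, and the header is one of them. The following coreutils-only sketch prints the header separately and then the top N rows. The helper `top_n` is hypothetical. It assumes `node2` is the third column, as in the outputs above.

```shell
# Hypothetical helper: print the header and the N rows with the largest
# numeric value in column 3 (node2), e.g. the top-10 property counts.
top_n() {
  local file=$1 n=${2:-10}
  head -n 1 -- "$file"                      # keep the header line
  # -n compares numerically (so 100 > 20), -r puts the largest first
  tail -n +2 -- "$file" | sort -t "$(printf '\t')" -k3,3nr | head -n "$n"
}
```

For example, `top_n /app/kgtk/data/my-tsv/all-claims-pred-counted.tsv 10` prints the header followed by 10 rows.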
TOP 10 PROPERTIES (potentially controversial)
(base) root@vm096:/app/kgtk/temp# kgtk sort -i /app/kgtk/data/my-tsv/filtered-pred-count-sorted.tsv -c node2 --reverse-columns node2 --numeric-columns node2 | head
node1 label node2
P1215 count 32818006
P528 count 26283488
P2215 count 8207274
P31 count 5407077
P684 count 4306414
P106 count 3987622
P1087 count 2866023
P1082 count 2016139
P361 count 1940350
P527 count 1679292
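The two top-10 lists can be compared property by property. P1215, for example, has 32,818,006 potentially controversial claims out of 33,122,376 in total (about 99.08%). P31 has 5,407,077 out of 59,717,980 (about 9.05%). The following sketch computes this share for every property by joining the two count files on `node1`. The helper `share_by_key` is hypothetical. It assumes both files have the `node1 label node2` layout shown above.

```shell
# Hypothetical helper: for each key in the filtered count file, print
# node1, filtered count, total count and the filtered share in percent.
# Both inputs are node1/label/node2 files with a header line.
share_by_key() {
  local filtered=$1 all=$2
  awk -F '\t' -v OFS='\t' '
    BEGIN { print "node1", "filtered", "all", "percent" }
    FNR == 1 { next }                       # skip the header of each file
    NR == FNR { total[$1] = $3; next }      # first file read: all counts
    ($1 in total) && total[$1] > 0 {
      printf "%s\t%s\t%s\t%.2f\n", $1, $3, total[$1], 100 * $3 / total[$1]
    }' "$all" "$filtered"
}
```

Usage: `share_by_key filtered-pred-count-sorted.tsv all-claims-pred-counted.tsv`. The same call works on the two qualifier count files.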
# and % and distribution of % of qualifications for controversial statements
141,983,745 QUALIFICATIONS FOR CLAIMS (All)
104,362,344 qualifications for potentially controversial claims
73.50%
Command
(base) cloud-di@vm096:~$ zcat /app/kgtk/data/my-tsv/quals-sorted.tsv.gz | wc -l
141,983,746
(base) root@vm096:/home/cloud-di# zcat /app/kgtk/data/my-tsv/filtered-quals-sorted.tsv.gz | wc -l
104,362,345
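The following sketch assumes the usual KGTK convention that a qualifier is an edge whose `node1` is the `id` of the claim it qualifies. Under that convention, the qualifications of the potentially controversial claims are the qualifier rows whose `node1` appears in the `id` column of the filtered claims. The notes don't show how `filtered-quals-sorted.tsv.gz` was built. The sketch is a minimal awk semi-join that selects such rows. The helper `quals_of_claims` is hypothetical. It assumes both files have headers naming `id` and `node1` columns.

```shell
# Hypothetical sketch: keep the qualifier rows whose node1 is the id of one
# of the given claims (a semi-join). Header columns are located by name.
quals_of_claims() {
  # $1: claims file, $2: qualifiers file (either may be gzip-compressed)
  gzip -cdf -- "$2" | CLAIMS=$1 awk -F '\t' '
    BEGIN {
      # Read the claims through gzip; sh expands $CLAIMS from the environment.
      cmd = "gzip -cdf -- \"$CLAIMS\""
      if ((cmd | getline line) <= 0) exit 1   # claims header
      n = split(line, h, "\t")
      for (i = 1; i <= n; i++) if (h[i] == "id") idcol = i
      if (!idcol) exit 1
      while ((cmd | getline line) > 0) {
        split(line, f, "\t")
        keep[f[idcol]] = 1                     # id of a selected claim
      }
      close(cmd)
    }
    FNR == 1 {                                  # qualifiers header
      for (i = 1; i <= NF; i++) if ($i == "node1") qcol = i
      if (!qcol) exit 1
      print; next
    }
    ($qcol in keep)                             # qualifier of a kept claim
  '
}
```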
# and % and distribution of % of distinct qualifiers in controversial statements
9,905 QUALIFIERS (All)
8,700 potentially controversial
87.83%
Command
(base) root@vm096:/app/kgtk/temp# more /app/kgtk/data/my-tsv/quals-counted.tsv | wc -l
9,907
(base) root@vm096:/app/kgtk/temp# more /app/kgtk/data/my-tsv/filtered-quals-counted.tsv | wc -l
8,701
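The `*-quals-counted.tsv` files have one row per qualifier property. Their line count minus the header is therefore the number of distinct qualifiers. The same number can be obtained directly from a qualifiers file. The following sketch assumes the qualifier property is in the column named `label`. The helper `distinct_in_column` is hypothetical.

```shell
# Hypothetical helper: number of distinct values in the named column of a
# tab-separated file with a header (e.g. distinct qualifier properties).
distinct_in_column() {
  gzip -cdf -- "$1" | awk -F '\t' -v name="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { bad = 1; exit 1 }
      next
    }
    !seen[$col]++ { n++ }                   # first time this value appears
    END { if (bad) exit 1; print n + 0 }'
}
```

For example, `distinct_in_column /app/kgtk/data/my-tsv/quals-sorted.tsv.gz label` returns the count of distinct qualifier properties.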
TOP 10 QUALIFIERS (All)
(base) root@vm096:/app/kgtk/temp# kgtk sort -i /app/kgtk/data/my-tsv/quals-counted.tsv -c node2 --reverse-columns node2 --numeric-columns node2 | head
node1 label node2
P1227 count 33122324
P972 count 23776643
P585 count 10432968
P642 count 8966228
P459 count 7930570
P580 count 7048511
P703 count 4317830
P582 count 3601028
P1545 count 3597770
P1057 count 2776317
TOP 10 QUALIFIERS (potentially controversial)
(base) root@vm096:/app/kgtk/temp# kgtk sort -i /app/kgtk/data/my-tsv/filtered-quals-counted.tsv -c node2 --reverse-columns node2 --numeric-columns node2 | head
node1 label node2
P1227 count 32818028
P972 count 22180541
P642 count 8480507
P585 count 7860132
P459 count 5364796
P703 count 4304082
P580 count 3331943
P582 count 2525992
P1545 count 2474157
P1013 count 813579
# and % and distribution of % of controversial statements without qualifiers (contextually incomplete)
132,552,453 potentially controversial
2,599,600 potentially controversial due to incompleteness
1.96%
Command
(base) root@vm096:/home/cloud-di# more /app/kgtk/data/my-tsv/filtered-claims-sorted-uniq.tsv.gz | wc -l
132,552,454
(base) root@vm096:/app/kgtk/temp# more /app/kgtk/data/my-tsv/filtered-claims-without-quals-sorted.tsv.gz | wc -l
2,599,601
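Here a claim counts as contextually incomplete when no qualifier refers to it. The following anti-join sketch uses the same KGTK convention (a qualifier's `node1` is the `id` of the claim it qualifies). It keeps the claims whose `id` never appears as a qualifier `node1`. The helper `claims_without_quals` is hypothetical, and the notes do not show how `filtered-claims-without-quals-sorted.tsv.gz` was produced.

```shell
# Hypothetical sketch: print the claims (header included) whose id never
# appears as node1 in the qualifiers file (an anti-join).
claims_without_quals() {
  # $1: claims file, $2: qualifiers file (either may be gzip-compressed)
  gzip -cdf -- "$1" | QUALS=$2 awk -F '\t' '
    BEGIN {
      # Read qualifiers through gzip; sh expands $QUALS from the environment.
      cmd = "gzip -cdf -- \"$QUALS\""
      if ((cmd | getline line) <= 0) exit 1   # qualifiers header
      n = split(line, h, "\t")
      for (i = 1; i <= n; i++) if (h[i] == "node1") qcol = i
      if (!qcol) exit 1
      while ((cmd | getline line) > 0) {
        split(line, f, "\t")
        qualified[f[qcol]] = 1                 # this claim id has a qualifier
      }
      close(cmd)
    }
    FNR == 1 {                                  # claims header
      for (i = 1; i <= NF; i++) if ($i == "id") idcol = i
      if (!idcol) exit 1
      print; next
    }
    !($idcol in qualified)                      # claim with no qualifier
  '
}
```

With over 100 million qualifier rows, holding every qualifier `node1` in an awk array may need a lot of RAM. Sorting both files on the key and using `join -v 1` is the usual lower-memory alternative.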