
Disputes and Ranking in WD (Wikidata) - statistics

WD dump of June 2022

# and % and distribution of % of "disputed by" statements

559,038,971 CLAIMS
1,577 disputed by

0.00028 %

Command

(base) root@vm096:/app/kgtk/temp# zcat /app/kgtk/data/my-tsv/disputedBy-claims-sorted.tsv.gz | wc -l
1,578
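
The line count reported by wc -l is one higher than the number of statements, which suggests that the TSV header row is being counted too. A minimal sketch of the percentage calculation, assuming exactly one header line per file (the variable names are illustrative, not part of the original workflow):

```python
# Counts taken from the post. The header adjustment is an assumption:
# one header row per KGTK TSV file.
TOTAL_CLAIMS = 559_038_971
lines_in_file = 1_578          # output of `wc -l`
HEADER_LINES = 1

disputed = lines_in_file - HEADER_LINES
share_pct = disputed / TOTAL_CLAIMS * 100

print(f"{disputed:,} disputed statements = {share_pct:.5f} % of all claims")
```

The share is so small that at least five decimal places are needed before any significant digit shows up.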

TOP 10 PROPERTIES (Disputed By)

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/disputedBy-claims-pred-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 / head
node1   label   node2   node1;distribution      node1;label
P17     count   561     35.5739 'country'@en
P3355   count   186     11.7945 'negative therapeutic predictor'@en
P3354   count   140     8.8776  'positive therapeutic predictor'@en
P131    count   106     6.7216  'located in the administrative territorial entity'@en
P31     count   78      4.9461  'instance of'@en
P460    count   43      2.7267  'said to be the same as'@en
P3359   count   29      1.8389  'negative prognostic predictor'@en
P40     count   20      1.2682  'child'@en
P39     count   19      1.2048  'position held'@en
P170    count   18      1.1414  'creator'@en

node1;distribution = (node2 / 1,577) * 100
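
The distribution column can be recomputed from the counts. The sketch below uses the first five rows of the table and assumes the values were rounded to four decimal places, which matches the figures shown; the helper name `distribution` is hypothetical.

```python
# Counts copied from the table above; TOTAL is the number of
# "disputed by" statements reported in the post.
TOTAL = 1_577
counts = {"P17": 561, "P3355": 186, "P3354": 140, "P131": 106, "P31": 78}


def distribution(count, total):
    """Share of `total` represented by `count`, in percent (4 decimals)."""
    return round(count / total * 100, 4)


shares = {prop: distribution(n, TOTAL) for prop, n in counts.items()}
for prop, pct in shares.items():
    print(f"{prop}\t{counts[prop]}\t{pct}")
```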

# and % of "ranked" statements

559,038,971 CLAIMS
553,558,105 normal rank
5,480,866 preferred rank
0 deprecated rank

0.98 % preferred rank

Command

(base) root@vm096:/app/kgtk/temp# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; preferred ;  "  | wc -l
5,480,867

(base) root@vm096:/app/kgtk/temp# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; deprecated ;  "  | wc -l
1

(base) root@vm096:/home/cloud-di# kgtk filter -i $GRAPH_CLAIMS --label rank -p " ; normal ;  "  | wc -l
553,558,106

*** The CLAIMS file has only normal or preferred in the rank column; it has no deprecated entries (the single line counted for deprecated appears to be the header).
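
The three line counts can be cross-checked against the total number of claims. This is a sketch that assumes every kgtk filter output carries exactly one header line, so each wc -l result is one higher than the number of statements:

```python
# Line counts as printed by `wc -l` in the post. Assumption: every
# kgtk filter output carries exactly one header line.
wc_lines = {"normal": 553_558_106, "preferred": 5_480_867, "deprecated": 1}
TOTAL_CLAIMS = 559_038_971

statements = {rank: n - 1 for rank, n in wc_lines.items()}
preferred_pct = statements["preferred"] / TOTAL_CLAIMS * 100

# The three ranks should account for every claim.
print(statements, sum(statements.values()) == TOTAL_CLAIMS)
print(f"preferred: {preferred_pct:.2f} %")
```

Under that assumption the header-adjusted counts add up to exactly 559,038,971, and the deprecated count drops to zero.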

TOP 10 PROPERTIES (Preferred Rank) with Distribution of % of "ranked" statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/preferredRank-claims-pred-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 / head
node1   label   node2   node1;distribution      node1;label
P1215   count   3914843 71.4275 'apparent magnitude'@en
P1082   count   225689  4.1178  'population'@en
P131    count   185778  3.3896  'located in the administrative territorial entity'@en
P8687   count   174665  3.1868  'social media followers'@en
P17     count   88322   1.6115  'country'@en
P31     count   77369   1.4116  'instance of'@en
P569    count   76903   1.4031  'date of birth'@en
P150    count   61848   1.1284  'contains administrative territorial entity'@en
P764    count   59749   1.0901  'OKTMO ID'@en
P1540   count   50994   0.9304  'male population'@en

node1;distribution = (node2 / 5,480,866) * 100
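
The kgtk sort ... / head pipeline used for these tables orders the rows by node2 as a number, largest first, and keeps the first ten data rows. A rough Python equivalent is sketched below; it is not how KGTK implements the step, and the in-memory sample and the function name `top_n` are illustrative only.

```python
import csv
import io

# Tiny in-memory stand-in for a KGTK count file (tab-separated,
# columns node1 / label / node2); the rows are taken from the table above.
SAMPLE = (
    "node1\tlabel\tnode2\n"
    "P17\tcount\t88322\n"
    "P1215\tcount\t3914843\n"
    "P1082\tcount\t225689\n"
)


def top_n(tsv_text, n=10):
    """Sort rows by node2 as a number, largest first, and keep the first n."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    rows.sort(key=lambda row: int(row["node2"]), reverse=True)
    return rows[:n]


for row in top_n(SAMPLE):
    print(row["node1"], row["node2"])
```

Converting node2 to an integer before sorting matters: a plain string sort would place 88322 ahead of 3914843.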

# and % of statements with preferred rank qualifier (P7452)

559,038,971 CLAIMS
72,234 preferred rank qualifier (P7452)

0.0129 %

Command

(base) root@vm096:/home/cloud-di# zcat /app/kgtk/data/my-tsv/preferredRank-pred-P7452-sorted.tsv.gz | wc -l
72,235

TOP 10 PROPERTIES (Preferred Rank) with Distribution of % of "ranked" statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/preferredRank-pred-P7452-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 / head
node1   label   node2   node1;distribution      node1;label
P569    count   34068   47.1634 'date of birth'@en
P570    count   17140   23.7284 'date of death'@en
P131    count   15474   21.422  'located in the administrative territorial entity'@en
P625    count   471     0.652   'coordinate location'@en
P19     count   371     0.5136  'place of birth'@en
P571    count   307     0.425   'inception'@en
P735    count   298     0.4125  'given name'@en
P856    count   297     0.4112  'official website'@en
P20     count   262     0.3627  'place of death'@en
P31     count   244     0.3378  'instance of'@en

node1;distribution = (node2 / 72,234) * 100

# and % of statements with deprecated rank qualifier (P2241)

559,038,971 CLAIMS
4,927 deprecated rank qualifier (P2241)

0.00088 %

Command

(base) root@vm096:/home/cloud-di# zcat /app/kgtk/data/my-tsv/deprecatedRank-pred-P2241-sorted.tsv.gz | wc -l
4,928

TOP 10 PROPERTIES (Deprecated Rank) with Distribution of % of "ranked" statements

(base) root@vm096:/home/cloud-di# kgtk sort -i /app/kgtk/data/my-tsv/deprecatedRank-pred-P2241-count-label.tsv -c node2 --reverse-columns node2 --numeric-columns node2 / head
node1   label   node2   node1;distribution      node1;label
P2276   count   1579    32.0479 'UEFA player ID'@en
P856    count   695     14.1059 'official website'@en
P8580   count   561     11.3862 'NHK Archives Portal person ID'@en
P1367   count   306     6.2107  'Art UK artist ID'@en
P4762   count   139     2.8212  'Common Database on Designated Areas ID'@en
P809    count   138     2.8009  'WDPA ID'@en
P345    count   111     2.2529  'IMDb ID'@en
P136    count   95      1.9282  'genre'@en
P569    count   71      1.441   'date of birth'@en
P5161   count   69      1.4004  'Trustpilot company ID'@en

node1;distribution = (node2 / 4,927) * 100
