
The Future of Knowledge Graphs in a World of Large Language Models - Video

 https://youtu.be/WqYBx2gB6vA

 https://www.linkedin.com/feed/update/urn:li:activity:7067193794328154112?utm_source=share&utm_medium=member_desktop

In this talk, he argued that, despite #AI's great strides in text processing, the compliance burden benefits most from more straightforward, structured ways of encoding and sharing knowledge that fill the gap left by modern risk-based, implementation-specific approaches.

Human effort involved in building KGs

Large Language Models ... models based on neural networks

ChatGPT provides context just as Google search does

The cost of training the model would be much higher than that of maintaining a GraphDB

Example of a conflict over place of birth: WD, Wikipedia, Google, and ChatGPT in two languages give different answers. There is no source (which would be the context). The answer in Croatian was based on probability rather than on a source (but it should come with an "accuracy" context)

Has ChatGPT memorized all the QIDs?

ChatGPT is aware that it may give inconsistent answers, and that is fine ... but so may WD, because it is not the source and even so does not require references.

Enrich LLMs with other resources such as KGs

Use LLMs to populate KGs by extracting knowledge from text

KGs can be edited, analyzed, maintained, ... to preserve knowledge and cover the long tail (of what is less searched for, publicized, disseminated, ...)

WD allows recording that we do not know something, as well as negative facts (but it does not allow negating facts, although it handles exceptions). WD is still not explicit about what cannot be expressed with the KG, that is, what would be too complicated to build a statement/claim for.
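
The special values noted above can be sketched as a small data model. This is only an illustration, not Wikidata's implementation: the strings "novalue" and "somevalue" follow the names Wikidata uses for "no value" and "unknown value", while `COMPLICATED`, the `Statement` class, and the `describe` helper are hypothetical names introduced here. The two sample statements come from the talk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SnakType(Enum):
    VALUE = "value"              # an ordinary, known value
    NO_VALUE = "novalue"         # we assert that no value exists
    SOME_VALUE = "somevalue"     # a value exists, but we do not know it
    COMPLICATED = "complicated"  # hypothetical: the graph cannot express the answer


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    kind: SnakType
    value: Optional[str] = None

    def __post_init__(self):
        # Only ordinary statements carry a value; special ones must not.
        if (self.kind is SnakType.VALUE) != (self.value is not None):
            raise ValueError("value must be set if and only if kind is VALUE")


def describe(st: Statement) -> str:
    """Render a statement, making the special cases explicit."""
    if st.kind is SnakType.VALUE:
        return f"{st.subject} {st.prop}: {st.value}"
    if st.kind is SnakType.NO_VALUE:
        return f"{st.subject} {st.prop}: none (asserted)"
    if st.kind is SnakType.SOME_VALUE:
        return f"{st.subject} {st.prop}: exists, but unknown"
    return f"{st.subject} {st.prop}: cannot be expressed in this graph"


# Examples taken from the talk.
statements = [
    Statement("Elizabeth I", "child", SnakType.NO_VALUE),
    Statement("Adam Smith", "father", SnakType.SOME_VALUE),
]

for st in statements:
    print(describe(st))
```

Keeping the marker separate from the value means a consumer can tell "no children" apart from "no data", which is the distinction the notes above are about.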




Comments

  1. Transcript of the final segment: We want to extract knowledge into a symbolic form. We want the system to overfit for truth.
    And this is why it makes so much sense to store the knowledge in a symbolic system.
    One that can be edited, audited, curated, understood, where we can cover the long tail by simply adding new nodes to the knowledge graph, one we don't train to return knowledge with a certain probability, to make stuff up on the fly, but one where we can simply look it up.
    And maybe not all of the pieces are in place to make this happen just yet. There are questions around identity and embeddings, how exactly do they talk with each other. There are good ideas to help with those problems. And knowledge graphs themselves should probably also evolve.
    I want to make one particular suggestion here: Freebase, the Google Knowledge Graph, Wikidata, they all have two kinds of special values or special statements: the first one is the possibility to say that a specific statement has no value. Here for example we are saying that Elizabeth I has no children. The second special value is the unknown value. That is, we know that there is a value for it but we don't know what the value is. It's like a question mark in the graph. For example, we don't know who Adam Smith's father is but we know he has one. It could be one of the existing nodes, it could be one node that we didn't represent yet, we have no idea.
    My suggestion is to introduce a third special value: "it's complicated". I usually get people laughing when I make the suggestion but I'm really serious. "It's complicated" is what you would use if the answer cannot be stated with the expressivity of your knowledge graph. This helps with maintaining the graph to mark difficult spots explicitly.
    This helps with avoiding embarrassing, wrong, or flat out dangerous answers and given the interaction with LLMs this can in particular mark areas of knowledge where we say "Don't trust the graph! Can we instead train the LLM harder on this particular question and assign a few extra parameters for that?"
    But really, what we want is to be able to make more expressive statements, in order to build a much more expressive ground truth and to be able to say sentences like this one: "Jupiter is the largest planet in the solar system".
    That's what we are working on right now: with Abstract Wikipedia and Wikifunctions, we aim to vastly extend the limited expressivity of Wikidata so that complicated things become stateable. This way we hope to provide a ground truth for large language models.
    In summary: large language models are truly awesome. They are particularly awesome as an incredibly enabling UX tool. It's just breathtaking, honestly, things are happening which I didn't think possible in my lifetime. But they hallucinate. They need ground truth. They just make up stuff. They are expensive to train and to run. They're difficult to fix and repair, which isn't great if you have to explain to someone "hey sorry, I cannot fix your problem. The thing is making a mistake but I don't have a clue how to make it better."
    They are hard to audit and explain, which is crucial in areas like finance and medicine.
    They give inconsistent answers. They struggle with low resource languages.
    And they have a coverage gap on long tail entities which is not easily overcome.
    All of these problems can be solved with knowledge graphs which is why I think that the future of knowledge graphs is brighter than ever especially thanks to a world that has large language models in it.
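
The division of labor described in the transcript (look facts up in the graph, and treat "it's complicated" as a signal not to trust the graph) could be sketched roughly as follows. Everything here is an assumption made for illustration: the toy `KG` dictionary, the sentinel objects, the `answer` function, and `stub_llm`, which stands in for a real language model call. Sending missing entries to the fallback is also a design choice of this sketch, not something the talk prescribes.

```python
from typing import Callable, Dict, Tuple

# Sentinels for the special values discussed in the talk.
NO_VALUE = object()     # the subject has no value for this property
UNKNOWN = object()      # a value exists but is not known
COMPLICATED = object()  # proposed marker: the graph cannot express the answer

# Toy graph keyed by (subject, property); contents are illustrative only.
KG: Dict[Tuple[str, str], object] = {
    ("Douglas Adams", "place of birth"): "Cambridge",
    ("Elizabeth I", "child"): NO_VALUE,
    ("Adam Smith", "father"): UNKNOWN,
    ("Example entity", "example property"): COMPLICATED,
}


def answer(subject: str, prop: str,
           fallback: Callable[[str, str], str]) -> Tuple[str, str]:
    """Return (answer, source), where source is 'kg' or 'fallback'."""
    key = (subject, prop)
    # Missing or explicitly "complicated" entries are not answered from the graph.
    if key not in KG or KG[key] is COMPLICATED:
        return fallback(subject, prop), "fallback"
    value = KG[key]
    if value is NO_VALUE:
        return "none", "kg"
    if value is UNKNOWN:
        return "unknown", "kg"
    return str(value), "kg"


def stub_llm(subject: str, prop: str) -> str:
    # Hypothetical stand-in for a language model; it generates nothing real.
    return f"(generated text about {subject} / {prop})"


for subject, prop in KG:
    print(subject, prop, answer(subject, prop, stub_llm))
```

Returning the source next to the answer keeps provenance visible, so a caller can tell a looked-up fact from generated text.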
