https://youtu.be/WqYBx2gB6vA
https://www.linkedin.com/feed/update/urn:li:activity:7067193794328154112
In this talk, he argues that despite #AI’s great strides in text processing, the compliance burden benefits most from simpler, structured ways of encoding and sharing knowledge, which fill the gap left by modern risk-based, implementation-specific approaches.
Human effort involved in building a KG
Large Language Models ... models based on neural networks
ChatGPT provides context in much the same way a Google search does
The cost of training the model would be far higher than that of maintaining a GraphDB
Example of a conflict over place of birth: Wikidata, Wikipedia, Google, and ChatGPT in two languages all give different answers. There is no source (which would provide the context). The answer in Croatian was based on probability rather than on a source (yet it should come with some indication of "accuracy").
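A quick way to see the difference: in Wikidata the same place-of-birth claim can carry its references, so the answer comes with its context attached. Below is a minimal sketch against the public Wikidata Query Service, using Douglas Adams (Q42) and place of birth (P19) as a stand-in example rather than the person discussed in the talk; P248 ("stated in") is used to surface the cited source.

```python
# Minimal sketch: fetch a place-of-birth claim from Wikidata together with
# the references attached to it, via the public SPARQL endpoint.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?placeLabel ?refLabel WHERE {
  wd:Q42 p:P19 ?stmt .                # place-of-birth statement node
  ?stmt ps:P19 ?place .               # the claimed value
  OPTIONAL {                          # any reference attached to the claim
    ?stmt prov:wasDerivedFrom ?refNode .
    ?refNode pr:P248 ?ref .           # P248 = "stated in"
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(SPARQL_ENDPOINT,
                    params={"query": QUERY, "format": "json"},
                    headers={"User-Agent": "kg-notes-example/0.1"})
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    place = row["placeLabel"]["value"]
    ref = row.get("refLabel", {}).get("value", "(no reference)")
    print(f"place of birth: {place}  source: {ref}")
```

An LLM answer, by contrast, returns only the string itself, with no handle on the source that produced it.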
Has ChatGPT memorized all the QIDs??????
ChatGPT is aware that it may give inconsistent answers, and that is considered acceptable ... but the same goes for Wikidata, since it is not the primary source and even so does not require references.
Enrich LLMs with other resources, such as KGs
Use LLMs to populate KGs by extracting knowledge from text
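A rough sketch of what this could look like: ask the model to emit candidate triples in a fixed line format and parse them for later entity matching and human review. Here `call_llm` is a hypothetical placeholder for whatever model API is used, and the prompt format is illustrative only.

```python
# Sketch of LLM-assisted KG population: prompt the model for candidate
# triples, parse them, and hand them to editors before anything enters the graph.
from typing import List, Tuple

PROMPT_TEMPLATE = (
    "Extract factual statements from the text below as one triple per line, "
    "in the form: subject | predicate | object\n\nText:\n{text}\n\nTriples:\n"
)

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model client.
    raise NotImplementedError("plug in your LLM client here")

def extract_triples(text: str) -> List[Tuple[str, str, str]]:
    raw = call_llm(PROMPT_TEMPLATE.format(text=text))
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))   # candidate fact, still needs review
    return triples
```

The candidate triples would then be matched against existing KG entities and reviewed before being stored, which keeps the graph editable and auditable.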
KGs can be edited, analyzed, maintained, ... to preserve knowledge and to cover the long tail (of what is less searched for, publicized, disseminated, ...)
Wikidata allows us to record that we do not know a value, as well as negative facts (but it does not allow negating arbitrary facts, even though it handles exceptions). Wikidata is still not explicit about what cannot be expressed within the KG, i.e. what would be too complicated to turn into a statement/claim.
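These two special values are visible directly in Wikidata's API: every claim's mainsnak has a snaktype of "value", "novalue", or "somevalue" (the "unknown value"). A minimal sketch, assuming Q7207 is the item for Elizabeth I, whose "child" property (P40) is the classic "no value" example mentioned later in the transcript:

```python
# Minimal sketch: inspect the snaktype of a Wikidata claim via the
# wbgetclaims API. "novalue" = explicitly no value, "somevalue" = unknown.
import requests

API = "https://www.wikidata.org/w/api.php"

def snak_types(entity: str, prop: str):
    resp = requests.get(API, params={
        "action": "wbgetclaims", "entity": entity,
        "property": prop, "format": "json"},
        headers={"User-Agent": "kg-notes-example/0.1"})
    resp.raise_for_status()
    claims = resp.json().get("claims", {}).get(prop, [])
    return [c["mainsnak"]["snaktype"] for c in claims]

# Q7207 / P40 are assumptions for illustration (Elizabeth I, "child").
print(snak_types("Q7207", "P40"))   # a "no children" claim would show as "novalue"
```

The "unknown value" case would surface the same way, as snaktype "somevalue", the question mark in the graph.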
Transcript of the closing passage: We want to extract knowledge into a symbolic form. We want the system to overfit for truth.
And this is why it makes so much sense to store the knowledge in a symbolic system.
One that can be edited, audited, curated, understood, where we can cover the long tail by simply adding new nodes to the knowledge graph, one we don't train to return knowledge with a certain probability, to make stuff up on the fly, but one where we can simply look it up.
And maybe not all of the pieces are in place to make this happen just yet. There are questions around identity and embeddings, how exactly do they talk with each other. There are good ideas to help with those problems. And knowledge graphs themselves should probably also evolve.
I want to make one particular suggestion here: Freebase, the Google Knowledge Graph, Wikidata, they all have two kinds of special values or special statements: the first one is the possibility to say that a specific statement has no value. Here for example we are saying that Elizabeth I has no children. The second special value is the unknown value. That is, we know that there is a value for it but we don't know what the value is. It's like a question mark in the graph. For example, we don't know who Adam Smith's father is but we know he has one. It could be one of the existing nodes, it could be one node that we didn't represent yet, we have no idea.
My suggestion is to introduce a third special value: "it's complicated". I usually get people laughing when I make the suggestion but I'm really serious. "It's complicated" is what you would use if the answer cannot be stated with the expressivity of your knowledge graph. This helps with maintaining the graph to mark difficult spots explicitly.
This helps with avoiding embarrassing, wrong, or flat out dangerous answers and given the interaction with LLMs this can in particular mark areas of knowledge where we say "Don't trust the graph! Can we instead train the LLM harder on this particular question and assign a few extra parameters for that?"
But really what we want to be able to say are more expressive statements.
In order to build a much more expressive ground truth, to be able to say sentences like these: "Jupiter is the largest planet in the solar system".
That's what we are working on right now with Abstract Wikipedia and Wikifunctions: we aim to vastly extend the limited expressivity of Wikidata so that complicated things become stateable. This way we hope to provide a ground truth for large language models.
In summary: large language models are truly awesome. They are particularly awesome as an incredibly enabling UX tool. It's just breathtaking, honestly; things are happening which I didn't think possible in my lifetime. But they hallucinate. They need ground truth. They just make up stuff. They are expensive to train and to run. They're difficult to fix and repair, which isn't great if you have to explain to someone: "Hey, sorry, I cannot fix your problem. The thing is making a mistake, but I don't have a clue how to make it better."
They are hard to audit and explain, which in areas like finance and medicine is crucial.
They give inconsistent answers. They struggle with low resource languages.
And they have a coverage gap on long tail entities which is not easily overcome.
All of these problems can be solved with knowledge graphs, which is why I think that the future of knowledge graphs is brighter than ever, especially thanks to a world that has large language models in it.