
Situating Search - Paper Reading

Shah, C., & Bender, E. M. (2022, March). Situating Search. In ACM SIGIR Conference on Human Information Interaction and Retrieval (pp. 221-232).

Presentation video: https://youtu.be/VY1GHbU_FYs

Google's proposal is to replace search engines with chatbots (simulating a conversation with an expert).
 
ABSTRACT
 
Search systems, like many other applications of machine learning, have become increasingly complex and opaque.

[Opaque about the choices regarding what is assumed to be implicit in the question]
 
The notions of relevance, usefulness, and trustworthiness with respect to information were already overloaded and often difficult to articulate, study, or implement. Newly surfaced proposals that aim to use large language models to generate relevant information for a user’s needs pose even greater threat to transparency, provenance, and user interactions in a search system.
 
In this perspective paper we revisit the problem of search in the larger context of information seeking and argue that removing or reducing interactions in an effort to retrieve presumably more relevant information can be detrimental to many fundamental aspects of search, including information verification, information literacy, and serendipity.
 
[Using ML algorithms without explainability makes the search process opaque, with no explanation for the results. Requirements such as relevance, usefulness, and trustworthiness of the retrieved information, in the context of the information need, may not be met. Focusing on reducing retrieval effort may undermine other benefits of the search process.]

[By retrieving contextualized claims, we avoid intermediate interactions just to answer questions about the context of the answers]
 
1 INTRODUCTION
 
[Focuses on keyword-based Web search because it is the most widely used, but the argument would also apply to other types of search and to the broader view of Information Seeking]
 
We observe a trend towards valuing speed and convenience and ask: Is getting the user to a piece of relevant information as fast as possible the only or the most important goal of a search system? We argue that it should not be; that a search system needs to support more than matching or generating an answer; that an information processing system should provide more ways to interact with and make sense out of information than simply retrieving it based on programmed-in notions of relevance and usefulness. More importantly, we argue that searching is a socially and contextually situated activity with a diverse set of goals and needs for support that must not be boiled down to a combination of text matching and text generating algorithms.
 
[Efficiency in terms of response time does not translate into effectiveness in terms of satisfying the user's information need. Search is a social process, situated in the task to be performed with the retrieved information]
 
2 RETHINKING SEARCH
 
In ‘Rethinking Search’, Metzler et al. [49] propose a vision for the future of search which builds on today’s large language models and imagines users who wish for search engines to function as ‘domain experts’ able to answer their questions directly, rather than as tools for finding documents which may contain the information sought, or for other types of interactions with information.
 
[Search as interaction with an expert on the user's information need: the system would become the sole arbiter of truth, an Oracle in the most literal sense. Similar to the Social Search that happens in forums and social networks.]
 
In brief, the argument from [8] is that a machine learning model cannot possibly learn what is not in its data, and the data for language models does not provide the machine with any signal it can use about meaning. Languages are systems of signs (pairings of form and meaning [20]). Once a person or other agent has acquired that system, they can use the form to reconstruct meaning, but the acquisition requires access to both. Thus, while the distributional information absorbed by language models can make them extremely useful components of larger systems, the fact that it also enables them to generate seemingly relevant and coherent text does not make them trustworthy sources of information, even as sounding conversational makes people more likely to trust them [1, 24].
 
[Making search an interaction that simulates a conversation with a human may lead people to believe the answers even without sufficient evidence]
 
3 SITUATING SEARCH WITHIN SOCIETY AND TECH DEVELOPMENT

As the amount of information produced and made available online has increased dramatically, these tools and services have evolved in their ability to capture, store, and serve information. On the other hand, the users of these services have also changed how they use the systems, what they expect in return, and what makes them satisfied [44, 45, 75]. The question to consider now is: how should these services and the usage patterns they support develop next? Should the systems provide more or different ways to interact with information? Should they focus on reducing the cognitive load of users by offloading some of their thinking or decision-making? Should the users develop better literacy with respect to the tools for accessing information or expect these tools to become more amenable to their current practices?
 
[The gains in information literacy that contextualized results can bring. Keep the decision-making power with the user, based on their own analysis of the answers and their own trust rules]

3.1 Search and society shape each other
 
[At SIGIR]
 
Norbert Fuhr (2012 winner) contrasted search systems with database systems and proposed to address information object needs as well as task needs.
 
[Meeting the information need in the context of the task. Vertical search engines.]
 
Nicholas J. Belkin (2015 winner), a strong proponent of interactive IR, envisioned how we could build search systems that incorporate utility of information to the user rather than objective relevance only.
 
[Utility instead of relevance... but I am not dealing with relevance; rather, I am giving users options so that they can identify, from the context of the claims, what would be useful for the task they intend to perform]
 
Kalervo P. Jarvelin (2018 winner) emphasized how important it is to understand and model the context in which the information interactions take place, in order to serve the information seekers.
 
[Context of the interaction with the information system, not of the task or of the claim]
 
Most recently, ChengXiang Zhai (2021 winner) presented a view of search systems where the notion of ‘intelligence’ has been shifting from system-centered to user-centered.

[The user is the one who decides what is true and useful]

[Conferences to publish in / explore: CHIIR, The Web Conference, WSDM]

3.2 Searching has evolved
 
Searching is no longer only about finding relevant information from a few select sources. Since almost anyone can produce and disseminate information, knowing who created information and with what agenda became increasingly important for finding useful and trustworthy information [18, 62].

[Search in the era of Big Data and post-truth: fake news, disinformation. Contextualize the retrieved information.]

In short, searching for relevant or useful information is not a simple problem of matching a clearly expressed information need to well-articulated answers from trusted sources. Information sources as well as people’s information seeking behavior have become more diverse, which in turn increases the need for flexible tools that can support diverse modes of usage.
 
[User search behavior has changed.]
 
3.3 Searching beyond lookup
 
Google VP Nayak’s MUM blog post presents two kinds of queries from the perspective of a prospective mountain climber. The first is open-ended: “I’ve hiked Mt. Adams and now I want to hike Mt. Fuji. What should I do differently to prepare?” The second is especially specific: The user uploads a photo of the hiking boots they wore on Mt. Adams and asks if they would be appropriate for Mt. Fuji. In both cases, Nayak invites us to imagine the search engine as an expert which is able to fill in relevant information (e.g., summit height, trail difficulty, weather conditions, what type of boots are pictured and what properties they have, etc.) and both provide the user with answers and direct them to further resources.
 
[The first question is open-ended and exploratory: investigate the differences, compare. The second may be a lookup if there are reviews of the equipment on mountaineering sites or if the question has already been answered in some forum]
 
In order to design systems that support people in their search activities — or more broadly, in the activities that include search as a component — it is critical to first understand what those activities are and how search fits in. Marchionini [46] divides search behaviors into three types that he calls lookup, learn, and investigate. Lookup is the most basic kind of search task and has been the main focus of scholarly work on Web search engines and information retrieval (IR) techniques. In the remainder of this section, we briefly review the literature on search activities that go beyond lookup.

[Cites Marchionini's classic work on Exploratory Search]
 
Searching as exploration. White and Roth [71, p. 38] define exploratory search as a “sense making activity focused on the gathering and use of information to foster intellectual development.” Users who conduct exploratory searches are generally unfamiliar with the domain of their goals, and unsure about how to achieve them [71]. Many scholars have investigated the main factors relating to this type of dynamic task, such as uncertainty, creativity, innovation, knowledge discovery, serendipity, convergence of ideas, learning, and investigation [2, 46, 71].

[Exploratory search for learning, which can later be used to carry out tasks]
 
Searching to accomplish tasks. Beyond searching as exploration, there are also many other tasks of which search serves as a component, and scholars such as Belkin [3], Wilson [73], Dervin [22], and Shah and White [63] urge us to study search in that broader context. These tasks can be clearly defined (e.g., looking for hiking boots) or open-ended (e.g., ideas for organizing a birthday party in a pandemic). Such macro tasks can call for a search task [15, 67]. Search tasks can vary in complexity as they involve different activities and contextual factors. Some search tasks, such as simple fact-finding, require few interactions with the information systems and can be completed in a short period of time with one or two queries. On the other hand, accomplishing a complex search task requires completing multiple sub-tasks in multi-round search sessions with multiple queries and interactions with multiple information objects (i.e., documents, items) [70]. Being able to identify users’ overall tasks and sub-tasks enables systems to provide people with better access to information [48].

[How can the task context be identified? In a specialized system such as Quem@PUC, we know the task, and the context can be modeled as a default context.]
 
The majority of search task and intent identifying methods take a contextual approach to understand task intents by analyzing searchers’ explicit and implicit behavioral actions recorded in search logs such as queries, clicks, time and other contextual information [40, 53, 76, 77]. Boiling down the richness of context, task, and user intents to their query or question may generate incomplete or incorrect results. That is why search experts (e.g., librarians) use interactive methods such as sense-making questionnaires [23] to elicit more information about the user’s task and purposes behind seeking information before trying to find and recommend relevant resources.

[The search log can provide insights into the task context if the user specified it in the query]
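To make the log-based approach concrete, here is a minimal sketch of one common first step: grouping a user's queries into sessions using an inactivity timeout. The LogEntry record, the 30-minute threshold, and the example queries are all assumptions for illustration; the paper does not prescribe any of them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    user: str
    timestamp: datetime
    query: str


def segment_sessions(entries, timeout=timedelta(minutes=30)):
    """Split each user's queries into sessions separated by inactivity gaps."""
    sessions = []  # list of (user, [queries]) tuples
    prev = None
    for entry in sorted(entries, key=lambda e: (e.user, e.timestamp)):
        starts_new_session = (
            prev is None
            or entry.user != prev.user
            or entry.timestamp - prev.timestamp > timeout
        )
        if starts_new_session:
            sessions.append((entry.user, [entry.query]))
        else:
            sessions[-1][1].append(entry.query)
        prev = entry
    return sessions


# Hypothetical log echoing the scenarios discussed in the paper.
log = [
    LogEntry("u1", datetime(2022, 3, 1, 9, 0), "mt fuji hiking season"),
    LogEntry("u1", datetime(2022, 3, 1, 9, 5), "mt fuji trail difficulty"),
    LogEntry("u1", datetime(2022, 3, 1, 14, 0), "mattress for side sleepers"),
    LogEntry("u2", datetime(2022, 3, 1, 9, 1), "who can help me avoid eviction"),
]

sessions = segment_sessions(log)
for user, queries in sessions:
    print(user, queries)
```

The two Mt. Fuji queries end up in one session and the mattress query in another, but nothing in the log says why the user was searching. That is exactly the gap the paper says librarians close with sense-making questions.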
 
Searching as learning. When thinking of search, one might often think first of gathering information. However, there is also another important type of activity that we carry out via search: searching as learning [68] ... Various theories and studies in information science literature have tried connecting the search process to the dimension of knowledge [21, 31, 35, 59]. As information seekers find information to fill in the gaps in their knowledge, they also learn about the task and the topic [59]. This, in turn, changes what information they seek and how. Finding information and restructuring knowledge or learning can go hand-in-hand. In other words, information search is a sense-making process [21], bridging the uncertainty (gap in knowledge) between the expected and observed situation.

[Search as learning about what one does not yet know about the problem. Knowledge graph exploration]
 
4 LM-BASED DIALOGUE AGENTS IN DIFFERENT SEARCH SCENARIOS
 
4.1 Information seeking strategies (ISS)
 
Belkin et al.’s [5] model of information seeking behaviors posits four dimensions (Figure 2): method of interaction (searching/scanning), goal of interaction (selection/learning), mode of retrieval (specification/recognition), and resource considered (information/meta-information).
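Since each of the four dimensions is binary, the model yields 2^4 = 16 information-seeking strategies, which is why the paper can refer to scenarios such as ISS-5 and ISS-15. A minimal sketch enumerating them follows; the ordering and the resulting 1 to 16 numbering are arbitrary choices of this sketch and may not match the ISS numbering in the paper's Figure 2.

```python
from itertools import product

# The four binary dimensions of Belkin et al.'s model, as listed in the text.
DIMENSIONS = {
    "method": ("searching", "scanning"),
    "goal": ("selection", "learning"),
    "mode": ("specification", "recognition"),
    "resource": ("information", "meta-information"),
}


def enumerate_strategies():
    """Return all 2**4 = 16 value combinations, numbered from 1 (arbitrary order)."""
    names = list(DIMENSIONS)
    combos = product(*DIMENSIONS.values())
    return {i: dict(zip(names, values)) for i, values in enumerate(combos, start=1)}


strategies = enumerate_strategies()
print(len(strategies))
print(strategies[1])
```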

 

4.2 Addressing ISS-based search scenarios
 
The user might enter a query such as “Who can help me avoid being evicted?” The language-model-based agent envisioned by Metzler et al. [49] might synthesize some text based on any combination of those sites and then generate an associated citation in the form of a link to one or more of them. Nothing in that system design ensures a solid, reliable link between the synthesized text and the cited resource. But perhaps more importantly for this scenario, it does not display a range of possible resources, and thus prevents the user from being able to build their own model of the space of possibilities available.

[The options are not presented to users, so they cannot decide for themselves]
[Give users options so that they can apply their own trust model]
 
The range of search activities that map to ISS-5 include cases where the user would scan through a list of options to find the best one, detect duplicates, evaluate one or more of the options for correctness, evaluate the usefulness of the options, find one specific one, identify additional options beyond those already known, etc. These all involve cases of browsing and sense-making. 

Imagine a user who is trying to decide on a new mattress to purchase. The user may not even have a good sense of how much a mattress should cost or the set of criteria to use for filtering through a wide range of possibilities. The question of what mattress is ‘best’ of course is highly dependent on many subjective factors. A query such as “What is the best mattress?” or even “What are the best deals on good mattresses for side sleepers?” or similar posed to an LM-based dialogue agent does not provide the user with a list of options which they can explore according to their own criteria.
 
[Again, give users options: what is good for one person may not be good for another. Long tail]
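The core of the mattress example is that the same list of options should be rankable by each user's own criteria, not collapsed into a single "best" answer. Here is a minimal sketch of that idea. The mattress attributes, the scoring formula, and both preference profiles are invented for illustration and do not come from the paper. The system keeps the whole list, and the user's budget and weights decide what survives and in what order.

```python
from dataclasses import dataclass


@dataclass
class Mattress:
    name: str
    price: float                # hypothetical price
    firmness: int               # hypothetical scale: 1 (soft) to 10 (firm)
    side_sleeper_rating: float  # hypothetical review score, 0 to 5


@dataclass
class Preferences:
    max_price: float
    preferred_firmness: int
    weight_side_sleeping: float
    weight_firmness: float


def rank_options(options, prefs):
    """Filter by the user's budget, then order by the user's own weighted criteria."""
    def score(m):
        firmness_penalty = abs(m.firmness - prefs.preferred_firmness)
        return (prefs.weight_side_sleeping * m.side_sleeper_rating
                - prefs.weight_firmness * firmness_penalty)

    affordable = [m for m in options if m.price <= prefs.max_price]
    return sorted(affordable, key=score, reverse=True)


options = [
    Mattress("Model A", 900, 4, 4.5),
    Mattress("Model B", 600, 7, 3.8),
    Mattress("Model C", 1500, 5, 4.9),
]
budget_soft = Preferences(max_price=1000, preferred_firmness=4,
                          weight_side_sleeping=1.0, weight_firmness=0.5)
firm_no_budget = Preferences(max_price=2000, preferred_firmness=7,
                             weight_side_sleeping=0.5, weight_firmness=1.0)

print([m.name for m in rank_options(options, budget_soft)])
print([m.name for m in rank_options(options, firm_no_budget)])
```

The two profiles receive different orderings of the same catalogue, and each user can see, and change, the criteria that produced their ordering.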

  

[Retrieve both information and meta-information. In a KG it is hard to separate information from meta-information, which is why the mappings are important for contextualization]
 
Finally, ISS-15 refers to a scenario where the user understands the problem quite well and can specify exactly what they are looking for. This includes simple cases of direct lookup, which would seem to be well supported by the Google proposals. However, as Dinan et al. [24] argue, there are safety concerns when the queries touch on sensitive topics.

[What is the impact if the user takes the answer as true and complete when it is not, or if the answer reinforces stereotypes and causes harm in society?]
 
First, the system is likely to come across as too authoritative, as providing answers to questions rather than pointers for where to look further suggests a finality to the answer. A case in point is what the search system should do with questions that embed false presuppositions.
 
Second, by synthesizing results from multiple different sources and thus masking the range that is available, rather than providing a range of sources, the system cuts off the user’s ability to explore that space. We note that Metzler et al. do consider the problem of handling ‘controversial’ queries, and in this context propose to provide a range of answers. However, just knowing a range of viewpoints exists, without any contextualization of how widely supported each is or what kinds of source documents support each, does not position users to build on their information literacy [64].
 
[Give users the chance to develop critical thinking by presenting them with different, contextualized points of view]
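The paper's criticism is that listing viewpoints is not enough: users also need to know how widely each one is supported and by what kinds of sources. Here is a minimal sketch of that missing contextualization. It assumes each retrieved document has already been labeled with the viewpoint it supports and a source type. Those labels, the Source record, and the example URLs are all hypothetical; producing reliable labels is the hard part and is simply taken for granted here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    source_type: str  # hypothetical labels such as "peer-reviewed", "news", "blog"
    viewpoint: str    # the answer this source supports (assumed already labeled)


def contextualize_viewpoints(sources):
    """Per viewpoint: how many sources support it, its share, and which kinds of sources."""
    grouped = defaultdict(list)
    for s in sources:
        grouped[s.viewpoint].append(s)
    return {
        view: {
            "support": len(supporters),
            "share": len(supporters) / len(sources),
            "source_types": Counter(s.source_type for s in supporters),
            "urls": [s.url for s in supporters],  # keep every source visible
        }
        for view, supporters in grouped.items()
    }


sources = [
    Source("https://example.org/study-1", "peer-reviewed", "viewpoint A"),
    Source("https://example.org/study-2", "peer-reviewed", "viewpoint A"),
    Source("https://example.org/study-3", "peer-reviewed", "viewpoint A"),
    Source("https://example.org/report", "news", "viewpoint A"),
    Source("https://example.org/post", "blog", "viewpoint B"),
]

summary = contextualize_viewpoints(sources)
for view, info in summary.items():
    print(view, info["support"], f"{info['share']:.0%}", dict(info["source_types"]))
```

Even this crude summary shows that one view rests mostly on peer-reviewed studies and the other on a single blog post, which a bare "there are two sides" flag would hide.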
 
Modern systems, which allow purveyors of misinformation and other fringe elements to SEO their way into search results to be presented side-by-side with credible sources, are clearly insufficient. But what is needed here is not a system that purports to answer questions and flags cases of ‘disagreement’ or ‘controversy’, while generating synthetic links to possible sources for ‘both sides’, but rather information exploration tools that help users to differentiate among information sources.

[WD (Wikidata) already identifies controversies, but from the perspective of the community that builds WD, not of those who will use the information]
[The cognitive work of recognizing the truth should be the user's, but the system can support them by offering the Best Possible Answer]
 
4.3 Addressing different types of searchers
 
The issue of serving people with low information literacy has been raised by many, but the gap between a user’s need and an information system may not be exclusively due to low information literacy. When it comes to accessing information, as many scholars have pointed out, people don’t know what they don’t know [61]. Relying on the user of a search system to provide a clear articulation of their information need may be insufficient in many cases.

[Completing the query with the mapped context, and indicating the default context, is a way to help users when they don't know what to ask]
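Here is a minimal sketch of the annotation's idea. It assumes a specialized system (like the Quem@PUC case noted earlier) where the task is known in advance, so any context the user leaves unspecified can be filled from defaults. The context dimensions and default values are hypothetical. The function also returns the list of dimensions it assumed, so the interface can show users which context was filled in for them instead of applying it silently.

```python
# Hypothetical default context for a specialized system whose task is known in advance.
DEFAULT_CONTEXT = {"organization": "home institution", "time": "now"}


def complete_query(user_context, defaults=DEFAULT_CONTEXT):
    """Fill unspecified context dimensions with defaults and report which were assumed."""
    completed = {**defaults, **user_context}  # values given by the user take precedence
    assumed = sorted(k for k in defaults if k not in user_context)
    return completed, assumed


query, assumed = complete_query({"topic": "knowledge graphs"})
print(query)
print("assumed:", assumed)
```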
 
Smith and Rieh [64] argue in their CHIIR perspective paper that search engines should support information-literate actions such as comparing, evaluating, and differentiating between information sources. While this argument can be debated (and indeed it was debated extensively at the CHIIR 2019 conference), it is clear that people do not use search engines for only finding specific information based on preconceived notion of a need; instead, they are also using it to learn, explore, and make decisions. More importantly, many people could use more support and guidance in their search process than simply responding to queries or questions.

[Using WD content alone would not be enough. The trust rules sit in a layer above it.]
 
4.4 Addressing bias in search results
 
A shift to placing language modeling at the core of search risks further exacerbating this problem, both in terms of increasing the range and extent of harmful biases amplified by the system and in terms of decreasing users’ ability to recognize and refute those biases.

[The system reflects what exists in the corpus, which in turn reflects how people actually are. Ready-made answers can confirm biases and reduce users' ability to question them]
 
Nonetheless, looking at the arrayed results, the user is positioned to ask: Where do these come from? What else is in the corpus but not returned (or not in the first page of results)? What else is not in the corpus (is not indexed by the search engine), and why not?
 
[Encourage users to ask questions about the returned answers: about the spatial and temporal context, the provenance, and the identity of the entities involved]
 
Where are the toe-holds that would allow a user to start to understand where the results are coming from, what biases the source data might contain, how those data were collected, and how modeling decisions might have amplified biases?
 
5 PATHS FORWARD
 
5.1 Deploying guardrails for status quo
 
We argue that to the extent that language-model-based dialogue agents are used in search scenarios, there is an urgent need for transparency along many dimensions: such systems should be transparent to their users about their limitations, about the nature of their source corpus and any other data used in training system components, about the economic forces that shape search results, about the potential for the system to reflect and amplify societal biases, and about options for redress when examples of bias perpetuation are found.
 
[Transparency about incompleteness, about which contextual dimensions are represented, and about how relative contexts are generated]
 
5.2 A new vision
 
We should not, however, assume that language-model-based dialogue agents are the only possible future for search. In this section, we briefly lay out an alternative vision. We first present desiderata for building an ideal search system.
 
The system should provide sufficient transparency about the sources where the information objects are coming from, as well as the process through which they are either ranked or consolidated and presented. The system should support users in increasing their information literacy [64].
 
Rather than mapping rich contexts and variety of tasks to query-document or question-passage mappings for quick retrieval, the system should instead first focus on better understanding those contexts and tasks through a combination of context extraction techniques, dialogue with the user, and support for interaction.
 
[Our approach aims to be one-shot/stateless and collaborative, retrieving all the contextualized claims that answer the query]
 
Finally, as the ability to understand the context and provenance of information is critical to users’ ability to vet it and, if appropriate, integrate it into their own mental models, the system should foreground sources and avoid decontextualizing snippets of text (or ‘information’). On a broader scale, preservation of context is crucial to combating the pernicious effects of pattern recognition over datasets expressing harmful social biases: The search system of the future should support curation of datasets, transparent documentation of the types of sources contained in a source corpus, and democratic governance of the overall information system.

[The result will always be contextualized claims, never bare claims, so that claims are not taken out of their context.]
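Here is a minimal sketch of what returning contextualized claims instead of bare claims could look like. It is loosely modeled on the shape of a Wikidata statement: a value, plus qualifiers for context and references for provenance. The Claim class, its fields, the entity names, and the URLs are all hypothetical. This illustrates the idea only; it is not the author's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    subject: str
    predicate: str
    value: str
    qualifiers: dict = field(default_factory=dict)  # contextual dimensions, e.g. temporal
    references: list = field(default_factory=list)  # provenance of the claim


def contextualized_answer(claims, subject, predicate):
    """Return every matching claim with its context, in one stateless call."""
    return [c for c in claims if c.subject == subject and c.predicate == predicate]


def render(claim):
    """Format a claim so its context and sources are always shown next to the value."""
    context = ", ".join(f"{k}={v}" for k, v in claim.qualifiers.items()) or "no context recorded"
    sources = "; ".join(claim.references) or "no source recorded"
    return f"{claim.value} [{context}] (source: {sources})"


# Hypothetical knowledge base entries.
claims = [
    Claim("Researcher X", "employer", "University Y",
          {"start time": "2015", "end time": "2020"}, ["https://example.org/cv"]),
    Claim("Researcher X", "employer", "University Z",
          {"start time": "2020"}, ["https://example.org/news"]),
    Claim("Researcher X", "field", "Information Retrieval"),
]

answers = contextualized_answer(claims, "Researcher X", "employer")
for a in answers:
    print(render(a))
```

A bare answer such as "University Z" would be correct only for one time span. Returning both claims with their qualifiers and sources lets users judge which one fits their task.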
 
6 CONCLUSION
 
In seeking to support those searchers, we should be looking to build tools that help users find and make sense of information rather than tools that purport to do it all for them. 

[Develop critical thinking that can be applied to other forms of information seeking]
    
[64] Catherine L. Smith and Soo Young Rieh. 2019. Knowledge-Context in Search Systems: Toward Information-Literate Actions. In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (Glasgow, Scotland UK) (CHIIR ’19). Association for Computing Machinery, New York, NY, USA, 55–62. https://doi.org/10.1145/3295750.3298940

[I have already read and commented on the reference above -> https://versant-pesquisadedoutorado.blogspot.com/2022/06/knowledge-context-in-search-systems.html]
 
