Abstract ... While ILP is a mature field, mining logical rules from KBs is difficult, because KBs make an open-world assumption. This means that absent information cannot be taken as counterexamples. ...
1 Introduction
In recent years, however, the KBs have become so large that they can themselves be mined for information. It is possible to find rules in the KBs that describe common correlations in the data.
[The rules rely on the instances to extract these IF-THEN patterns]
Since RDF has only positive inference rules, these KBs contain only positive statements and no negations. Furthermore, they operate under the open-world assumption (OWA). Under the OWA, a statement that is not contained in the KB is not necessarily false; it is just unknown. This is a crucial difference to many standard database settings that operate under the closed-world assumption (CWA). Consider an example KB that does not contain the information that a particular person is married. Under the CWA, we can conclude that the person is not married. Under the OWA, however, the person could be either married or single.
[OWA vs. CWA]
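The OWA/CWA contrast above can be made concrete with a small sketch (entity names are made up for illustration): the same absent fact is false under the CWA, but merely unknown under the OWA.

```python
# Toy KB with a single positive statement.
kb = {("Adam", "isMarriedTo", "Eve")}

def holds_cwa(fact):
    # Closed-world assumption: whatever is not in the KB is false.
    return fact in kb

def holds_owa(fact):
    # Open-world assumption: whatever is not in the KB is unknown (None).
    return True if fact in kb else None

# Bob's marriage is simply not recorded in the KB.
print(holds_cwa(("Bob", "isMarriedTo", "Alice")))  # False under the CWA
print(holds_owa(("Bob", "isMarriedTo", "Alice")))  # None: unknown under the OWA
```

This is why absent statements cannot serve as counterexamples under the OWA: the `None` case carries no evidence either way.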
Yet, classical ILP systems cannot be applied to semantic KBs for two reasons: First, they usually require negative statements as counterexamples. Semantic KBs, however, usually do not contain negative statements. The semantics of RDF schema are too weak to deduce negative evidence from the facts in a KB. Because of the OWA, absent statements cannot serve as counterevidence either.
[Effectiveness]
Second, today’s ILP systems are slow and cannot handle the huge amount of data that KBs provide.
[Efficiency]
With the AMIE project [17], we have shown how to mine logical rules from KBs despite the absence of explicit counterexamples. The key technique was the partial completeness assumption (PCA). It allowed AMIE to “guess” counterexamples for rules and thus estimate their quality even under the OWA.
We also show how the precision of the predictions can be increased to up to 70% by using type information and joint reasoning.
[In the previous work they had removed the type information]
More precisely, our contributions are as follows:
A comprehensive investigation and description of the AMIE approach, including a description of our in-memory database and an evaluation of AMIE's fundamental assumption, the PCA.
[Understand the PCA better, and whether it applies when we assume a Dual Open World Assumption]
2 Related work
2.1 Association rule mining
However, these are not the kind of rules that we aim to mine in this paper; we aim at mining Horn rules. ... also makes use of a language bias [3] to reduce the search space.
[Understand better what a language bias is]
2.2 Logical rule mining
Sherlock [38] is an unsupervised ILP method to learn first-order Horn clauses from open domain facts. The WARMR system [14,15] mines patterns in databases that correspond to conjunctive queries. It uses a declarative language bias to reduce the search space. An extension of the system, WARMER [18], modified the approach to support a broader range of conjunctive queries and increase efficiency of search space exploration.
2.3 Expert rule mining
2.4 Generating schemas
In this paper, we aim to generate Horn rules on a KB. Other approaches use rule mining to generate the schema or taxonomy of a KB.
[But the constraints that could be generated from the rules and created in WD are part of the schema]
2.5 Relational machine learning
Some approaches learn new associations from semantic data without mining explicit logical rules. For example, relational machine learning methods propose a holistic statistical approach that considers both the attribute information and the relationships between entities to learn new links and concepts.
[Link Prediction]
In both approaches, however, the predictions are opaque. It is possible to generate predictions, but not to derive general structural knowledge about the data that can explain the reasons why the predictions were made. For example, these approaches will tell us that Michelle Obama most likely lives in Washington, but they will not tell us that this is because her husband lives in Washington and people tend to live in the same place as their spouses. Our approach, in contrast, aims at mining explicit logical rules that capture the correlations in the data. These can then be used to derive new facts and also to explain why these facts were derived.
[Explainability of the rules, which the ML approaches do not provide]
2.6 Learning rules from hybrid sources
2.7 Further applications of rule mining
Jozefowska et al. [23] propose an algorithm for frequent pattern mining in KBs that uses DL-safe rules. Such KBs can be transformed into a disjunctive datalog program, which allows seeing patterns as queries. This approach does not mine the Horn rules that we aim at. Some approaches use rule mining for ontology merging and alignment [13,30,36].
[There must be approaches for Entity Matching as well]
Several approaches, such as Markov Logic [37] or URDF [33] use Horn rules to perform reasoning. These approaches can be consumers of the rules we mine with AMIE.
3 Preliminaries
3.1 RDF KBs
In this paper, we focus on RDF [44] knowledge bases. We follow here the introduction of the preliminaries from [17]. An RDF KB can be considered a set of facts, where each fact is a triple of the form <x, r, y> with x denoting the subject, r the relation (or predicate), and y the object of the fact. While the A-Box contains instance data, the T-Box is the subset of facts that define classes, domains, ranges for predicates, and the class hierarchy. Although T-Box information can also be used by our mining approach, we are mainly concerned with the A-Box, i.e., the set of facts relating one particular entity to another.
[The rules are generated from the instances, not from the schema]
3.2 Functions
A function is a relation r that has at most one object for every subject, i.e., ∀x : |{y : r(x, y)}| ≤ 1. Similarly, a relation is an inverse function if each of its objects has at most one subject. Since RDF KBs are usually noisy, even relations that should be functions (such as hasBirthdate) may exhibit two objects for the same subject. Vice versa, there are relations that are not functions in the strict sense, but that exhibit a similar behavior. For example, hasNationality can give several nationalities to a person, but the vast majority of people have only one nationality. Therefore, we use the notion of functionality [39]. The functionality of a relation r is a value between 0 and 1, which is 1 if r is a function: fun(r) := (#x : ∃y : r(x, y)) / (#(x, y) : r(x, y)), where #x : X denotes the number of distinct x that fulfill condition X.
[The function is intended to yield a SINGLE output value (object) for a given input value (subject)]
[But there are cases in practice where more than one value may be recorded in the KG. In WD, by allowing different perspectives, all relations could have more than one value for distinct sources, despite constraints of the single-value and single-best-value type]
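A minimal sketch of the functionality measure of Sect. 3.2, i.e., the number of distinct subjects of a relation divided by its number of facts (the KB contents below are made up):

```python
def functionality(kb, r):
    # Number of distinct subjects over the number of facts of r.
    # Equals 1 exactly when every subject has a single object.
    facts = [(x, y) for (x, rel, y) in kb if rel == r]
    subjects = {x for (x, _) in facts}
    return len(subjects) / len(facts)

kb = {
    ("Adam", "hasNationality", "French"),
    ("Adam", "hasNationality", "Italian"),  # a second, possibly noisy, value
    ("Bob", "hasNationality", "Swiss"),
    ("Carl", "hasNationality", "German"),
}
print(functionality(kb, "hasNationality"))  # 3 subjects / 4 facts = 0.75
```

A high value like this is what makes hasNationality a "quasi-function" even though it is not a function in the strict sense.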
3.3 Rules
An atom is a fact that can have variables at the subject and/or object position. A (Horn) rule consists of a head and a body, where the head is a single atom and the body is a set of atoms. AMIE, like other ILP systems, does not mine general Horn clauses, but uses a language bias (constraints on the form of the mined rules) in order to restrict the size of the search space. Language biases offer a trade-off between the expressiveness of the mined rules and the speed of the mining process. As an example, rules with three atoms can capture more complicated correlations than rules with two atoms, but come with a larger search space and thus with a much slower performance. The less restrictive the language bias is, the more expressive the rules can potentially be, the larger the search space grows, and the less tractable the search becomes.
[Trade-off between expressiveness and computability: CONNECTED and CLOSED rules]
AMIE’s language bias requires rules to be connected. We say that two atoms in a rule are connected if they share a variable or an entity. A rule is connected if every atom is connected transitively to every other atom of the rule. The restriction to connected rules avoids mining rules with completely unrelated atoms, such as diedIn(x, y) ⇒ wasBornIn(w, z).
[Effectiveness in the quality of the generated rules]
AMIE also requires the rules to be closed. A variable in a rule is closed if it appears at least twice in the rule. A rule is closed if all its variables are closed. The restriction to closed rules avoids mining rules that predict merely the existence of a fact, as in diedIn(x, y) ⇒ ∃z : wasBornIn(x, z).
[Effectiveness in the quality of the generated rules]
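Both language-bias checks are cheap to implement. A sketch, where atoms are (relation, subject, object) tuples and variables are strings starting with "?" (this representation is an assumption of the sketch, not AMIE's internal one):

```python
from collections import Counter
from itertools import combinations

def terms(atom):
    # atom = (relation, subject, object); variables start with "?".
    _, s, o = atom
    return [s, o]

def is_connected(rule):
    # Atoms are nodes; an edge joins two atoms that share a variable or an
    # entity. The rule is connected if this graph has a single component.
    adj = {i: set() for i in range(len(rule))}
    for i, j in combinations(range(len(rule)), 2):
        if set(terms(rule[i])) & set(terms(rule[j])):
            adj[i].add(j)
            adj[j].add(i)
    seen, stack = set(), [0]
    while stack:
        i = stack.pop()
        if i not in seen:
            seen.add(i)
            stack.extend(adj[i] - seen)
    return len(seen) == len(rule)

def is_closed(rule):
    # Every variable must occur at least twice in the rule.
    counts = Counter(t for atom in rule for t in terms(atom) if t.startswith("?"))
    return all(c >= 2 for c in counts.values())

# diedIn(x, y) => wasBornIn(w, z): neither connected nor closed.
rule1 = [("wasBornIn", "?w", "?z"), ("diedIn", "?x", "?y")]
# livesIn(x, y) => wasBornIn(x, y): connected and closed.
rule2 = [("wasBornIn", "?x", "?y"), ("livesIn", "?x", "?y")]
print(is_connected(rule1), is_closed(rule1))  # False False
print(is_connected(rule2), is_closed(rule2))  # True True
```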
AMIE omits also reflexive rules, i.e., rules with atoms of the form r (x, x), as they are typically of less interest in real world KBs. However, unlike some other ILP systems, AMIE allows mining recursive rules. These are rules that contain the head relation in the body, as e.g., isMarriedTo(x, z) ∧ hasChild(z, y) ⇒ hasChild(x, y).
[Rules can reinforce biases, but at least they are explainable, whereas language models are not]
3.4 Measures of significance
Normally, data mining systems define a notion of significance or support for rules, which quantifies the amount of evidence for the rule in the data. If a rule applies only to a few instances, it is too risky to use it to draw conclusions. For this reason, data mining systems frequently report only rules above a given support threshold. In the following, we define this metric for AMIE’s setting and introduce another notion of significance, the head coverage.
Support In our context, the support of a rule quantifies the number of correct predictions in the existing data. One desired property for support is monotonicity, that is, the addition of more atoms and constraints to the rule should always decrease its support. Note that the support is defined even for rules that are not yet closed. This allows for early pruning of unpromising candidate rules.
[The higher the support in the current data, the better the chances the rule has of predicting new correct values]
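For a rule with a single body atom, support reduces to counting the (x, y) pairs for which both body and head hold in the KB. A sketch on a toy KB (the same data as Table 1 below):

```python
def support(kb, body_rel, head_rel):
    # Number of (x, y) pairs for which body_rel(x, y) and head_rel(x, y)
    # both appear in the KB.
    body = {(x, y) for (x, r, y) in kb if r == body_rel}
    head = {(x, y) for (x, r, y) in kb if r == head_rel}
    return len(body & head)

kb = {
    ("Adam", "livesIn", "Paris"), ("Adam", "livesIn", "Rome"),
    ("Bob", "livesIn", "Zurich"),
    ("Adam", "wasBornIn", "Paris"), ("Carl", "wasBornIn", "Rome"),
}
# livesIn(x, y) => wasBornIn(x, y): only (Adam, Paris) satisfies both.
print(support(kb, "livesIn", "wasBornIn"))  # 1
```

Head coverage then divides this support by the size of the head relation (here 1/2), which makes the threshold independent of how many facts the head relation has.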
4 Confidence measures
The support of a rule quantifies the number of known correct predictions of the rule. However, it does not take into account the false predictions of the rule. Following [17], we will now describe measures that judge the quality of a rule.
[But besides maximizing the hits, it is necessary to minimize the errors]
4.1 Challenges
[The figure with the four areas]
Semantic KBs do not contain negative evidence. Thus, the area KBfalse is empty. This is the central challenge of our setting: to provide counterexamples for the rule mining. These can take the role of KBfalse so that we can estimate the areas NEWtrue and NEWfalse. We describe two approaches to this problem: creating counterexamples according to the closed-world assumption (CWA) that traditional association rule mining systems apply and according to the partial completeness assumption (PCA) that we propose.
4.2 The CWA and standard confidence
The standard confidence measure takes all facts that are not in the KB (i.e., NEWtrue and NEWfalse) as negative evidence.
[Not good, and it does not apply to our case, since not even what is present in the KG can be considered true regardless of context]
4.3 The PCA and the PCA confidence
In AMIE, we generate negative examples for a rule by means of the partial completeness assumption (PCA). The PCA is the assumption that if r(x, y) ∈ KBtrue for some x, y, then ∀y′ : r(x, y′) ∈ KBtrue ∪ NEWtrue ⇒ r(x, y′) ∈ KBtrue.
In other words, we assume that if we know one y for a given x and r, then we know all y for that x and r. This assumption allows us to generate counterexamples in a way that is less restrictive than the standard confidence.
[If <s1,p1,o1> is in the KB, we assume all objects <s1,p1,*> are known, so a predicted <s1,p1,o'> that is not in the KB is a counterexample; if no <s2,p2,*> is known at all, a predicted <s2,p2,o2> is simply disregarded]
Consider again the KB given in Table 1 and the rule R: livesIn(x, y) ⇒ wasBornIn(x, y). In this case, conf_pca(R) = 1/2. This is because (a) there is one positive example for the rule, wasBornIn(Adam, Paris), and (b) the prediction wasBornIn(Adam, Rome) is counted as a negative example, because we already know a different place of birth for Adam. The prediction wasBornIn(Bob, Zurich) is completely disregarded as evidence, because we neither know where Bob was born nor where he was not born.
Table 1 An example KB containing two relations between people and cities

livesIn: (Adam, Paris), (Adam, Rome), (Bob, Zurich)
wasBornIn: (Adam, Paris), (Carl, Rome)
[Adam was born in and lived in Paris (+1), but also lived in Rome (-1)]
[The rule does not apply to Bob because there is no info about where he was born]
[The rule would allow us to deduce that Carl lived in Rome]
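The worked example can be replayed in a few lines. The sketch below computes both the standard (CWA) confidence and the PCA confidence for R: livesIn(x, y) ⇒ wasBornIn(x, y) on the data of Table 1 (the rule representation is simplified to single-atom bodies):

```python
def confidences(kb, body_rel, head_rel):
    body = {(x, y) for (x, r, y) in kb if r == body_rel}
    head = {(x, y) for (x, r, y) in kb if r == head_rel}
    support = len(body & head)
    # Standard confidence (CWA): every body instantiation counts as
    # positive or negative evidence.
    std = support / len(body)
    # PCA confidence: only body instantiations whose subject already has
    # some known object for the head relation are counted.
    known = {x for (x, _) in head}
    pca_body = {(x, y) for (x, y) in body if x in known}
    pca = support / len(pca_body)
    return std, pca

kb = {
    ("Adam", "livesIn", "Paris"), ("Adam", "livesIn", "Rome"),
    ("Bob", "livesIn", "Zurich"),
    ("Adam", "wasBornIn", "Paris"), ("Carl", "wasBornIn", "Rome"),
}
std, pca = confidences(kb, "livesIn", "wasBornIn")
print(std)  # 1/3: one correct prediction out of three livesIn facts
print(pca)  # 1/2: Bob is disregarded, since his birthplace is unknown
```

The difference between 1/3 and 1/2 is exactly the fact wasBornIn(Bob, Zurich), which the CWA counts as wrong and the PCA ignores.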
In spite of being an assumption, the PCA is certainly true for functions, such as birthdate and capital. The PCA also holds for relations that are not functions but that have a high functionality, as we shall see in our qualitative analysis of the PCA in Sect. 7.4. The PCA has been applied in the Google Knowledge Vault under the name “local completeness assumption” [16].
[Capital is a relation that changes over time; it is only a function if one is interested only in the current value]
16. Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: KDD (2014)
5 AMIE
5.1 Algorithm
Refinement
One of the major challenges of rule mining is to find an efficient way to explore the search space. ...Hence, we explore the search space by iteratively extending rules using a set of mining operators (line 10 of Algorithm 1). ... Note that all above operators create connected rules.
When to output
Not every rule that the mining algorithm dequeues is output. This is because some rules may not be closed, or may not be better than rules that have already been output. ... The algorithm first checks if the rule is of the form described in Sect. 3 (i.e., closed and connected). The refinement operators used by AMIE (see Sect. 5.1) always produce connected rules. So, at this point, the algorithm only checks if the rule is closed. Then, the algorithm calculates the confidence of the rule and performs a quality check. The rule should have a confidence value that (1) passes the confidence threshold (line 1) and (2) improves over the confidence of all its parents (line 7).
Parameters and pruning ... Interestingly, a larger number of rules is not necessarily a good thing. For instance, a rule that covers only 1% or less of the instances of a relation is probably not interesting.
That being said, if the user is interested in less confident, more complex, or less supported rules, she can change these thresholds. However, we believe that there is no good reason to deviate from the default values. In particular, relaxing these values will not output better rules. This makes AMIE a system that can be run off the shelf, without the need for parameter tuning.
[Tuning the parameters for rule generation may not be necessary]
Duplicate elimination
Multithreading
5.2 Count projection queries
AMIE tries to expand a given rule by applying all mining operators defined in the last section (one at a time).
Count projection queries
Assume that AMIE needs to add the atom r (x, y) to a rule. For efficiency reasons, we do not blindly try all possible relations in the place of r . Instead, we first find all relations that lead to a new rule that passes the head coverage threshold.
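A rough sketch of this idea (AMIE actually runs count projection queries against its in-memory database; this Python version only illustrates the counting): for a given head relation, count for all candidate relations r at once how many head facts would be covered after adding the dangling atom r(x, z), and keep only the relations whose head coverage passes the threshold.

```python
from collections import Counter

def promising_relations(kb, head_rel, min_hc):
    # Head facts of the rule to refine.
    head = {(x, y) for (x, r, y) in kb if r == head_rel}
    # Relations observed for each subject.
    by_subject = {}
    for (x, r, y) in kb:
        by_subject.setdefault(x, set()).add(r)
    # For every candidate r, count the head facts (x, y) with some r(x, z):
    # this is the support of the refined rule r(x, z) => head_rel(x, y).
    counts = Counter()
    for (x, _) in head:
        for r in by_subject.get(x, ()):
            counts[r] += 1
    # Keep relations whose head coverage (support / |head|) passes min_hc.
    return {r: c / len(head) for r, c in counts.items() if c / len(head) >= min_hc}

kb = {
    ("Adam", "livesIn", "Paris"), ("Adam", "livesIn", "Rome"),
    ("Bob", "livesIn", "Zurich"),
    ("Adam", "wasBornIn", "Paris"), ("Carl", "wasBornIn", "Rome"),
}
print(promising_relations(kb, "wasBornIn", 0.5))
```

The point is that one pass over the data prices all candidate relations simultaneously, instead of instantiating and re-evaluating one refined rule per relation.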
5.3 Query implementation details
[Efficiency]
6 Scalability improvements: AMIE+
7.1 Experimental setup
For Wikidata, we used a dump from December 2014, available for download at http://tools.wmflabs.org/wikidata-exports/rdf/exports/20140420/.
[Broken link, and it does not describe whether these are truthy triples]
7.2 AMIE versus AMIE+
In this section, we discuss the runtime improvements in AMIE+ over the previous version AMIE.
We first note that AMIE is not able to finish within a day for YAGO2s, DBpedia 2.0, DBpedia 3.8, and Wikidata. In contrast, AMIE+ can mine rules on all these datasets in a matter of hours, and even minutes.
7.3 AMIE(+) versus state-of-the-art systems
In this section, we compare AMIE and AMIE+ to two state-of-the-art rule mining systems that are publicly available: WARMR [18] and ALEPH [32]. We compare the systems in terms of runtime and quality of the produced rules.
[Effectiveness and Efficiency]
Results After filtering out non-connected rules, WARMR mined 41 closed rules. AMIE and AMIE+, in contrast, mined 75 closed rules, which included the ones mined by WARMR. We checked back with the WARMR team and learned that, for a given set of atoms B1, . . . , Bn, WARMR will mine only one rule, picking one of the atoms as the head atom.
Moreover, AMIE(+) does not make a closed-world assumption as WARMR does. In Sect. 7.4 we show that the PCA confidence defined by AMIE(+) is more suitable than the standard confidence to identify predictive rules in a web extracted KB designed under an open-world assumption.
7.4 Evaluation of the PCA
The partial completeness assumption (PCA) says that if, for a given subject s and a given relation r, the KB knows one object o with r (s, o), then the KB knows all objects o' with r (s, o') (Sect. 4). The original AMIE paper used the PCA but it did not evaluate whether this assumption is true or not [17]. Since the PCA is one of the basic ingredients of AMIE(+)’s mining model, we wanted to know to what extent this assumption holds in a real-world KB.
Therefore, we resorted to a qualitative analysis. We analyzed each of the relations manually, and grouped the relations into categories. Some relations fall into multiple categories.
The PCA extends well to relations that are strictly speaking not functions, but that have a high functionality. These are relations that usually have one object per subject, even though there could be several objects. For example, a person can graduate from several universities, but most people graduate from a single university. We call these relations quasi functions.
[In WD: single-best-value]
The PCA worked very well also on these, and predicted completeness correctly for 73–100% of the subjects under investigation. Since the PCA takes into account the direction of the functionality, the PCA also holds for quasi-inverse functional relations such as directed.
Granularity differences
Some relations, such as locatedIn and livesIn, hold between an entity and a geographical region. In that case, the region can be given at the granularity of a city, a region, a country, or a continent. Naturally, if YAGO contains one of these, the others are possible options. Hence, PCA fails and we found rather low-precision values. However, these cases could be addressed if one restricts the range of the relation (say, to cities). With such a restriction, the relations become functions or quasi-functions, which lifts them into the category where the PCA works well. As we will see in Sect. 7.5, the use of types can significantly improve the performance of AMIE.
[Multiple values with different granularities]
Source incompleteness
For many relations, the source itself (Wikipedia) is incomplete. Usually, these relations have, for each subject, some objects that are undisputed. For example, it is undisputed that Albert Einstein is interested in physics. However, these relations also have objects that are less important, disputed, or unknown. For example, Albert Einstein might also be interested in music (he played the violin), but maybe also in pancakes. These less prominent objects are a lot less likely to appear in Wikipedia, or indeed on any Web page. Even if they do, we can never be sure whether there is not still something else that Einstein was interested in. For these relations, the knowledge sources are often incomplete by nature.
Extraction incompleteness
For a large number of relations, the Wikipedia page contains more objects for a given subject than the KB. These are cases where the extraction process was incomplete.
Discussion In summary, our analysis shows that it depends on the nature of the relation and on its type signature whether the PCA holds or not. There is a large number of relations for which the PCA is reasonable. These are not just functions and inverse functions, but also relations that exhibit a similar behavior.
[The PCA may or may not work depending on the relation, but in WD it can always be Dual]
For many other cases, the PCA does not hold. In these cases, AMIE(+) will falsely assume that a rule is making incorrect predictions—although, in reality, the predictions might be correct. Thus, when the PCA does not hold, AMIE(+) will err on the side of caution. At the same time, the PCA is not as restrictive as the closed-world assumption (CWA): the PCA admits that there can be facts that are true, but not known to the KB.
[Under the CWA it makes no sense to predict new triples!!!!]
For example, if a person has a birth date, then both the CWA and PCA would not admit another birth date. However, if a person does not have a birth date, then the PCA will admit that there can be a birth date, while the CWA will assume that there cannot be a birth date. Thus, the PCA is more permissive than the CWA. This encourages us to use the PCA for the definition of our confidence. In the following, we will show that this definition of confidence produces more predictive and more accurate rules than the standard confidence, which is based on the CWA.
7.5 Predicting facts
Standard versus PCA confidence
Our first goal is to see whether the PCA confidence or the standard confidence perform better in this task.
[This comparison concerns effectiveness, in the sense of generating relevant rules that produce new correct values]
As we see, ranking the rules by standard confidence is a very conservative approach: it identifies rules with reasonable precision, but these do not produce many predictions. Going down in the list of ranked rules, the rules produce more predictions—but at lower precision.
The top 30 rules produce three times more predictions than the top 30 rules by standard confidence—at comparable precision. This is because the PCA confidence is less conservative than the standard confidence.
Using type information
The previous experiment showed us the precision of individual rules for prediction. To make more accurate predictions, we have to combine these rules with more signals.
Joint prediction
Our second observation is that the same prediction can be fired from multiple rules. If we consider rules as signals of evidence, then facts predicted by more rules should get a higher confidence score.
[The same triple generated by different rules would have higher confidence]
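One simple way to realize this intuition is a noisy-OR combination of the confidences of all rules that predict a fact. This is only an illustration of the joint-prediction idea, not AMIE's actual scoring model:

```python
def joint_score(rule_confidences):
    # Noisy-OR: the predicted fact is wrong only if every rule that
    # fired for it is wrong, treating the rules as independent signals.
    p_all_wrong = 1.0
    for c in rule_confidences:
        p_all_wrong *= (1.0 - c)
    return 1.0 - p_all_wrong

# A fact predicted by two rules scores higher than by either rule alone.
print(joint_score([0.6]))       # one rule with confidence 0.6
print(joint_score([0.6, 0.5]))  # two agreeing rules: 1 - 0.4 * 0.5
```

The independence assumption is of course a simplification, but it captures the monotonicity we want: more agreeing rules, higher score.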
8 Conclusion
In this paper, we have presented AMIE, an approach to mine Horn rules on large RDF knowledge bases. AMIE is based on a formal model for rule mining under the open-world assumption, a method to simulate counterexamples, and a scalable mining algorithm. In contrast to state-of-the-art approaches, AMIE requires no input other than the KB and does not need configurations or parameter tuning.
If we combine these rules with simple heuristics for type checking and joint prediction, we can use them to predict facts with a precision of about 70%.