Content deleted Content added
→Approaches to ILP: rm link to self, bold-->italics, per MOS:NOBOLD |
m formatting |
||
Line 5:
* Schema: ''positive examples'' + ''negative examples'' + ''background knowledge'' ⇒ ''hypothesis''.
Inductive logic programming is particularly useful in [[bioinformatics]] and [[natural language processing]].
== History ==
Building on earlier work on [[Inductive inference]], [[Gordon Plotkin]] was the first to formalise induction in a [[Horn clause|clausal]] setting around 1970, adopting an approach of generalising from examples.<ref name=":
Several inductive logic programming systems that proved influential appeared in the early 1990s. [[First-order inductive learner|FOIL]], introduced by [[Ross Quinlan]] in 1990<ref>{{Cite journal |last=Quinlan |first=J. R. |date=August 1990 |title=Learning logical definitions from relations |journal=Machine Learning |volume=5 |issue=3 |pages=239–266 |doi=10.1007/bf00117105 |issn=0885-6125|doi-access=free }}</ref> was based on upgrading [[Propositional calculus|propositional]] learning algorithms [[AQ (machine learning)|AQ]] and [[ID3 algorithm|ID3]].<ref name=":
At around the same time, the first practical applications emerged, particularly in [[bioinformatics]], where by 2000 inductive logic programming had been successfully applied to drug design, carcinogenicity and mutagenicity prediction, and elucidation of the structure and function of proteins.<ref>{{Citation |last=Džeroski |first=Sašo |title=Relational Data Mining Applications: An Overview |date=2001 |url=http://link.springer.com/10.1007/978-3-662-04599-2_14 |work=Relational Data Mining |pages=339–364 |editor-last=Džeroski |editor-first=Sašo |access-date=2023-11-27 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |language=en |doi=10.1007/978-3-662-04599-2_14 |isbn=978-3-642-07604-6 |editor2-last=Lavrač |editor2-first=Nada}}</ref> Unlike the focus on [[automatic programming]] inherent in the early work, these fields used inductive logic programming techniques from a viewpoint of [[relational data mining]]. The success of those initial applications and the lack of progress in recovering larger traditional logic programs shaped the focus of the field.<ref>{{Citation |last=De Raedt |first=Luc |editor-first1=<!-- Deny Citation Bot--> |editor-last1=<!-- Deny Citation Bot--> |url=http://dx.doi.org/10.1007/978-3-540-68856-3 |title=Logical and Relational Learning |series=Cognitive Technologies |page=14|place=Berlin, Heidelberg |publisher=Springer |year=2008 |doi=10.1007/978-3-540-68856-3 |bibcode=2008lrl..book.....D |isbn=978-3-540-20040-6}}</ref>
Recently, classical tasks from automated programming have moved back into focus, as the introduction of meta-interpretative learning makes predicate invention and learning recursive programs more feasible. This technique was pioneered with the [[Metagol]] system introduced by Muggleton, Dianhuan Lin, Niels Pahlavi and Alireza Tamaddoni-Nezhad in 2014.<ref>{{Cite journal |last1=Muggleton |first1=Stephen H. |last2=Lin |first2=Dianhuan |last3=Pahlavi |first3=Niels |last4=Tamaddoni-Nezhad |first4=Alireza |date=2013-05-01 |title=Meta-interpretive learning: application to grammatical inference |url=http://dx.doi.org/10.1007/s10994-013-5358-3 |journal=Machine Learning |volume=94 |issue=1 |pages=25–49 |doi=10.1007/s10994-013-5358-3 |s2cid=254738603 |issn=0885-6125}}</ref> This allows ILP systems to work with fewer examples, and brought successes in learning string transformation programs, answer set grammars and general algorithms.<ref>{{Cite journal |last1=Cropper |first1=Andrew |last2=Dumančić |first2=Sebastijan |last3=Evans |first3=Richard |last4=Muggleton |first4=Stephen |date=2022 |title=Inductive logic programming at 30 |journal=Machine Learning |language=en |volume=111 |issue=1 |pages=147–172 |doi=10.1007/s10994-021-06089-1 |issn=0885-6125|doi-access=free }}</ref>
Line 19:
Inductive logic programming has adopted several different learning settings, the most common of which are learning from [[entailment]] and learning from interpretations.<ref name="setting">{{Cite journal |last1=Cropper |first1=Andrew |last2=Dumančić |first2=Sebastijan |date=2022-06-15 |title=Inductive Logic Programming At 30: A New Introduction |journal=Journal of Artificial Intelligence Research |volume=74 |pages=779{{endash}}782 |doi=10.1613/jair.1.13507 |issn=1076-9757 |doi-access=free}}</ref> In both cases, the input is provided in the form of ''background knowledge {{mvar|B}}'', a logical theory (commonly in the form of [[Clause (logic)|clauses]] used in [[logic programming]]), as well as positive and negative examples, denoted <math display="inline">E^+</math> and <math display="inline">E^{-}</math> respectively. The output is given as a ''hypothesis'' ''{{mvar|H}}'', itself a logical theory that typically consists of one or more clauses.
The two settings differ in the format of examples presented.
=== Learning from entailment ===
Line 36:
Completeness requires any generated hypothesis ''{{mvar|h}}'' to explain all positive examples <math display="inline">E^+</math>, and consistency forbids generation of any hypothesis ''{{mvar|h}}'' that is inconsistent with the negative examples <math display="inline">E^{-}</math>, both given the background knowledge ''{{mvar|B}}''.
In Muggleton's setting of concept learning,<ref name="setting2">{{cite journal |last1=Muggleton |first1=Stephen |year=1999 |title=Inductive Logic Programming: Issues, Results and the Challenge of Learning Language in Logic |journal=Artificial Intelligence |volume=114 |issue=1–2 |pages=283–296 |doi=10.1016/s0004-3702(99)00067-3 |doi-access=}}; here: Sect.2.1</ref> "completeness" is referred to as "sufficiency", and "consistency" as "strong consistency". Two further conditions are added: "''Necessity''", which postulates that ''{{mvar|B}}'' does not entail <math display="inline">E^+</math>, does not impose a restriction on ''{{mvar|h}}'', but forbids any generation of a hypothesis as long as the positive facts are explainable without it. . "Weak consistency", which states that no contradiction can be derived from <math display="inline">B\land H</math>, forbids generation of any hypothesis ''{{mvar|h}}'' that contradicts the background knowledge ''{{mvar|B}}''. Weak consistency is implied by strong consistency; if no negative examples are given, both requirements coincide. Weak consistency is particularly important in the case of noisy data, where completeness and strong consistency cannot be guaranteed.<ref name="setting2" />
=== Learning from interpretations ===
In learning from interpretations, the ''positive'' and ''negative'' examples are given as a set of complete or partial [[Herbrand structure
== Approaches to ILP ==
Line 47:
=== Bottom-up search ===
Bottom-up methods to search the subsumption lattice have been investigated since Plotkin's first work on formalising induction in clausal logic in 1970.<ref name=":02"
==== Least general generalisation ====
Line 54:
To account for background knowledge, inductive logic programming systems employ ''relative least general generalisations'', which are defined in terms of subsumption relative to a background theory. In general, such relative least general generalisations are not guaranteed to exist; however, if the background theory ''{{mvar|B}}'' is a finite set of [[Ground expression|ground]] [[Literal (mathematical logic)|literals]], then the negation of ''{{mvar|B}}'' is itself a clause. In this case, a relative least general generalisation can be computed by disjoining the negation of ''{{mvar|B}}'' with both <math display="inline">C_1</math> and <math display="inline">C_2</math> and then computing their least general generalisation as before.<ref>{{Cite book |last1=Nienhuys-Cheng |first1=Shan-hwei |title=Foundations of inductive logic programming |last2=Wolf |first2=Ronald de |date=1997 |publisher=Spinger |isbn=978-3-540-62927-6 |series=Lecture notes in computer science Lecture notes in artificial intelligence |___location=Berlin Heidelberg |page=286}}</ref>
Relative least general generalisations are the foundation of the bottom-up system [[Golem (ILP)|Golem]].<ref name=":12"/><ref name="Springer/Ohmsha"/>
==== Inverse resolution ====
Line 61:
Inverse resolution takes information about the [[Resolvent (logic)|resolvent]] of a resolution step to compute possible resolving clauses. Two types of inverse resolution operator are in use in inductive logic programming: V-operators and W-operators. A V-operator takes clauses <math display="inline">R</math> and <math display="inline">C_1</math>as input and returns a clause <math display="inline">C_2</math> such that <math display="inline">R</math> is the resolvent of <math display="inline">C_1</math> and <math display="inline">C_2</math>. A W-operator takes two clauses <math display="inline">R_1</math> and <math display="inline">R_2</math> and returns thre clauses <math display="inline">C_1</math>, <math display="inline">C_2</math> and <math display="inline">C_3</math> such that <math display="inline">R_1</math> is the resolvent of <math display="inline">C_1</math> and <math display="inline">C_2</math> and <math display="inline">R_2</math> is the resolvent of <math display="inline">C_2</math> and <math display="inline">C_3</math>.<ref name="invres">{{Cite book |last1=Nienhuys-Cheng |first1=Shan-hwei |title=Foundations of inductive logic programming |last2=Wolf |first2=Ronald de |date=1997 |publisher=Spinger |isbn=978-3-540-62927-6 |series=Lecture notes in computer science Lecture notes in artificial intelligence |___location=Berlin Heidelberg |page=197}}</ref>
Inverse resolution was first introduced by [[Stephen Muggleton]] and Wray Buntine in 1988 for use in the inductive logic programming system Cigol.<ref
=== Top-down search ===
Line 94:
Probabilistic inductive logic programming adapts the setting of inductive logic programming to learning [[Probabilistic logic programming|probabilistic logic programs]]. It can be considered as [[statistical relational learning]] within the formalism of probabilistic logic programming.<ref>{{Citation |last1=De Raedt |first1=Luc |title=Probabilistic Inductive Logic Programming |date=2008 |url=http://dx.doi.org/10.1007/978-3-540-78652-8_1 |pages=1–27 |access-date=2023-12-09 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |isbn=978-3-540-78651-1 |last2=Kersting |first2=Kristian|doi=10.1007/978-3-540-78652-8_1 }}</ref><ref name="pilp">{{Cite journal |last1=Riguzzi |first1=Fabrizio |last2=Bellodi |first2=Elena |last3=Zese |first3=Riccardo |date=2014-09-18 |title=A History of Probabilistic Inductive Logic Programming |journal=Frontiers in Robotics and AI |volume=1 |doi=10.3389/frobt.2014.00006 |issn=2296-9144 |doi-access=free }}</ref>
Given
# background knowledge as a probabilistic logic program {{Mvar|B}}, and
Line 101:
the goal of probabilistic inductive logic programming is to find a probabilistic logic program <math display="inline">H</math> such that the probability of positive examples according to <math display="inline">{H \cup B}</math> is maximized and the probability of negative examples is minimized.<ref name="pilp" />
This problem has two variants: parameter learning and structure learning. In the first, one is given the structure (the clauses) of {{Mvar|H}} and the goal is to infer the probabilities annotations of the given clauses, while in the second the goal was to infer both the structure and the probability parameters of {{Mvar|H}}. Just as in classical inductive logic programming, the examples can be given as examples or as (partial) interpretations.<ref name="pilp" />
=== Parameter Learning ===
Parameter learning for languages following the distribution semantics has been performed by using an [[expectation-maximisation algorithm]] or by [[gradient descent]].
An expectation-maximisation algorithm consists of a cycle in which the steps of expectation and maximization are repeatedly performed. In the expectation step, the distribution of the hidden variables is computed according to the current values of the probability parameters, while in the maximisation step, the new values of the parameters are computed.
Gradient descent methods compute the gradient of the target function and iteratively modify the parameters moving in the direction of the gradient.<ref name="pilp" />
=== Structure Learning ===
Line 113:
In 2008, [[Luc De Raedt|De Raedt]] et al. presented an algorithm for performing [[theory compression]] on [[ProbLog]] programs, where theory compression means removing as many clauses as possible from the theory in order to maximize the probability. No new clause can be added to the theory.<ref name="pilp" />
In the same year, Meert, W. et al. introduced a method for learning parameters and structure of [[Ground term|ground]] probabilistic logic programs by considering the [[Bayesian network
ProbFOIL, introduced by De Raedt and Ingo Thon in 2010, combined the inductive logic programming system [[First-order inductive learner|FOIL]] with [[ProbLog]]. Logical rules are learned from probabilistic data in the sense that both the examples themselves and their classifications can be probabilistic. The set of rules has to allow to predict the probability of the examples from their description. In this setting, the parameters (the probability values) are fixed and the structure has to be learned.<ref>{{Citation |last1=De Raedt |first1=Luc |title=Probabilistic Rule Learning |date=2011 |url=http://link.springer.com/10.1007/978-3-642-21295-6_9 |work=Inductive Logic Programming |volume=6489 |pages=47–58 |editor-last=Frasconi |editor-first=Paolo |access-date=2023-12-09 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/978-3-642-21295-6_9 |isbn=978-3-642-21294-9 |last2=Thon |first2=Ingo |s2cid=11727522 |editor2-last=Lisi |editor2-first=Francesca A.}}</ref><ref name="pilp" />
Line 132:
{{reflist}}
{{Free-content attribution|author=Fabrizio Riguzzi, Elena Bellodi and Riccardo Zese |date=2014-09-18 |title=A History of Probabilistic Inductive Logic Programming |documentURL=http://journal.frontiersin.org/article/10.3389/frobt.2014.00006/ |publisher=[[Frontiers Media]]|license=CC-BY 4.0|license statement URL = http://creativecommons.org/licenses/by/4.0/}}
==Further reading==
{{Refbegin}}
|