Inductive logic programming: Difference between revisions

Content deleted Content added
Approaches to ILP: rm link to self, bold-->italics, per MOS:NOBOLD
m formatting
Line 5:
* Schema: ''positive examples'' + ''negative examples'' + ''background knowledge'' ⇒ ''hypothesis''.
 
Inductive logic programming is particularly useful in [[bioinformatics]] and [[natural language processing]].
 
== History ==
Building on earlier work on [[Inductive inference]], [[Gordon Plotkin]] was the first to formalise induction in a [[Horn clause|clausal]] setting around 1970, adopting an approach of generalising from examples.<ref name=":002">{{Cite book |last1=Nienhuys-Cheng |first1=Shan-hwei |title=Foundations of inductive logic programming |last2=Wolf |first2=Ronald de |date=1997 |publisher=Spinger |isbn=978-3-540-62927-6 |series=Lecture notes in computer science Lecture notes in artificial intelligence |___location=Berlin Heidelberg |pages=174–177}}</ref><ref>{{cite thesis |first=G.D. |last=Plotkin |title=Automatic Methods of Inductive Inference |date=1970 |type=PhD |publisher=University of Edinburgh |url=https://www.era.lib.ed.ac.uk/bitstream/handle/1842/6656/Plotkin1972.pdf |hdl=1842/6656}}</ref> In 1981, [[Ehud Shapiro]] introduced several ideas that would shape the field in his new approach of model inference, an algorithm employing refinement and backtracing to search for a complete axiomatisation of given examples.<ref name=":002" /><ref>{{cite tech report|first=Ehud Y.|last=Shapiro|title=Inductive inference of theories from facts|id=192|date=1981|publisher=Department of Computer Science, Yale University|url=http://ftp.cs.yale.edu/publications/techreports/tr192.pdf}} Reprinted in {{cite book |title=Computational logic : essays in honor of Alan Robinson |publisher=MIT Press |year=1991 |isbn=978-0-262-12156-9 |editor1-last=Lassez |editor1-first=J.-L. |pages=199–254 |editor2-last=Plotkin |editor2-first=G.}}</ref> His first implementation was the [[Model Inference System]] in 1981:<ref>{{cite book |last=Shapiro |first=Ehud Y. |url= |title=Proceedings of the 7th international joint conference on Artificial intelligence |date=1981 |publisher=Morgan Kaufmann |isbn= |volume=2 |___location= |pages=1064 |chapter=The model inference system |chapter-url=https://www.ijcai.org/Proceedings/81-2/Papers/100.pdf}}</ref><ref>{{cite book |last=Shapiro |first=Ehud Y. |url= |title=Algorithmic program debugging |date=1983 |publisher=MIT Press |isbn=0-262-19218-7 |___location= |pages=}}</ref> a [[Prolog]] program that inductively inferred [[Horn clause]] logic programs from positive and negative examples.<ref name=":002" /> The term ''Inductive Logic Programming'' was first introduced in a paper by [[Stephen Muggleton]] in 1990, defined as the intersection of machine learning and logic programming.<ref name=":002" /> Muggleton and Wray Buntine introduced predicate invention and [[inverse resolution]] in 1988.<ref name=":002" /><ref name="Proceedings of the 5th Internationa">{{cite book |last1=Muggleton |first1=S.H. |url= |title=Proceedings of the 5th International Conference on Machine Learning |last2=Buntine |first2=W. |date=1988 |isbn=978-0-934613-64-4 |pages=339–352 |chapter=Machine invention of first-order predicate by inverting resolution |doi=10.1016/B978-0-934613-64-4.50040-2}}</ref>
 
Several inductive logic programming systems that proved influential appeared in the early 1990s. [[First-order inductive learner|FOIL]], introduced by [[Ross Quinlan]] in 1990<ref>{{Cite journal |last=Quinlan |first=J. R. |date=August 1990 |title=Learning logical definitions from relations |journal=Machine Learning |volume=5 |issue=3 |pages=239–266 |doi=10.1007/bf00117105 |issn=0885-6125|doi-access=free }}</ref> was based on upgrading [[Propositional calculus|propositional]] learning algorithms [[AQ (machine learning)|AQ]] and [[ID3 algorithm|ID3]].<ref name=":112">{{Cite book |last1=Nienhuys-Cheng |first1=Shan-hwei |title=Foundations of inductive logic programming |last2=Wolf |first2=Ronald de |date=1997 |publisher=Spinger |isbn=978-3-540-62927-6 |series=Lecture notes in computer science Lecture notes in artificial intelligence |___location=Berlin Heidelberg |pages=354–358}}</ref> [[Golem (ILP)|Golem]], introduced by Muggleton and Feng in 1990, went back to a restricted form of Plotkin's least generalisation algorithm.<ref name=":112" /><ref name="Springer/Ohmsha">{{Cite journal |last1=Muggleton |first1=Stephen H. |last2=Feng |first2=Cao |date=1990 |editor-last=Arikawa |editor-first=Setsuo |editor2-last=Goto |editor2-first=Shigeki |editor3-last=Ohsuga |editor3-first=Setsuo |editor4-last=Yokomori |editor4-first=Takashi |title=Efficient Induction of Logic Programs |url=https://dblp.org/rec/conf/alt/MuggletonF90.bib |journal=Algorithmic Learning Theory, First International Workshop, ALT '90, Tokyo, Japan, October 8-108–10, 1990, Proceedings |publisher=Springer/Ohmsha |pages=368–381}}</ref> The [[Progol]] system, introduced by Muggleton in 1995, first implemented inverse entailment, and inspired many later systems.<ref name=":112" /><ref name=":3">{{Cite journal |last1=Cropper |first1=Andrew |last2=Dumančić |first2=Sebastijan |date=2022-06-15 |title=Inductive Logic Programming At 30: A New Introduction |journal=Journal of Artificial Intelligence Research |volume=74 |page=808 |doi=10.1613/jair.1.13507 |issn=1076-9757|doi-access=free }}</ref><ref name=":2">{{cite journal |last1=Muggleton |first1=S.H. |year=1995 |title=Inverting entailment and Progol |journal=New Generation Computing |volume=13 |issue=3–4 |pages=245–286 |citeseerx=10.1.1.31.1630 |doi=10.1007/bf03037227 |s2cid=12643399}}</ref> [[Aleph (ILP)|Aleph]], a descendant of Progol introduced by Ashwin Srinivasan in 2001, is still one of the most widely used systems {{As of|2022|lc=y|bare=}}.<ref name=":3" />
 
At around the same time, the first practical applications emerged, particularly in [[bioinformatics]], where by 2000 inductive logic programming had been successfully applied to drug design, carcinogenicity and mutagenicity prediction, and elucidation of the structure and function of proteins.<ref>{{Citation |last=Džeroski |first=Sašo |title=Relational Data Mining Applications: An Overview |date=2001 |url=http://link.springer.com/10.1007/978-3-662-04599-2_14 |work=Relational Data Mining |pages=339–364 |editor-last=Džeroski |editor-first=Sašo |access-date=2023-11-27 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |language=en |doi=10.1007/978-3-662-04599-2_14 |isbn=978-3-642-07604-6 |editor2-last=Lavrač |editor2-first=Nada}}</ref> Unlike the focus on [[automatic programming]] inherent in the early work, these fields used inductive logic programming techniques from a viewpoint of [[relational data mining]]. The success of those initial applications and the lack of progress in recovering larger traditional logic programs shaped the focus of the field.<ref>{{Citation |last=De Raedt |first=Luc |editor-first1=<!-- Deny Citation Bot--> |editor-last1=<!-- Deny Citation Bot--> |url=http://dx.doi.org/10.1007/978-3-540-68856-3 |title=Logical and Relational Learning |series=Cognitive Technologies |page=14|place=Berlin, Heidelberg |publisher=Springer |year=2008 |doi=10.1007/978-3-540-68856-3 |bibcode=2008lrl..book.....D |isbn=978-3-540-20040-6}}</ref>
 
Recently, classical tasks from automated programming have moved back into focus, as the introduction of meta-interpretative learning makes predicate invention and learning recursive programs more feasible. This technique was pioneered with the [[Metagol]] system introduced by Muggleton, Dianhuan Lin, Niels Pahlavi and Alireza Tamaddoni-Nezhad in 2014.<ref>{{Cite journal |last1=Muggleton |first1=Stephen H. |last2=Lin |first2=Dianhuan |last3=Pahlavi |first3=Niels |last4=Tamaddoni-Nezhad |first4=Alireza |date=2013-05-01 |title=Meta-interpretive learning: application to grammatical inference |url=http://dx.doi.org/10.1007/s10994-013-5358-3 |journal=Machine Learning |volume=94 |issue=1 |pages=25–49 |doi=10.1007/s10994-013-5358-3 |s2cid=254738603 |issn=0885-6125}}</ref> This allows ILP systems to work with fewer examples, and brought successes in learning string transformation programs, answer set grammars and general algorithms.<ref>{{Cite journal |last1=Cropper |first1=Andrew |last2=Dumančić |first2=Sebastijan |last3=Evans |first3=Richard |last4=Muggleton |first4=Stephen |date=2022 |title=Inductive logic programming at 30 |journal=Machine Learning |language=en |volume=111 |issue=1 |pages=147–172 |doi=10.1007/s10994-021-06089-1 |issn=0885-6125|doi-access=free }}</ref>
Line 19:
Inductive logic programming has adopted several different learning settings, the most common of which are learning from [[entailment]] and learning from interpretations.<ref name="setting">{{Cite journal |last1=Cropper |first1=Andrew |last2=Dumančić |first2=Sebastijan |date=2022-06-15 |title=Inductive Logic Programming At 30: A New Introduction |journal=Journal of Artificial Intelligence Research |volume=74 |pages=779{{endash}}782 |doi=10.1613/jair.1.13507 |issn=1076-9757 |doi-access=free}}</ref> In both cases, the input is provided in the form of ''background knowledge {{mvar|B}}'', a logical theory (commonly in the form of [[Clause (logic)|clauses]] used in [[logic programming]]), as well as positive and negative examples, denoted <math display="inline">E^+</math> and <math display="inline">E^{-}</math> respectively. The output is given as a ''hypothesis'' ''{{mvar|H}}'', itself a logical theory that typically consists of one or more clauses.
 
The two settings differ in the format of examples presented.
 
=== Learning from entailment ===
Line 36:
Completeness requires any generated hypothesis ''{{mvar|h}}'' to explain all positive examples <math display="inline">E^+</math>, and consistency forbids generation of any hypothesis ''{{mvar|h}}'' that is inconsistent with the negative examples <math display="inline">E^{-}</math>, both given the background knowledge ''{{mvar|B}}''.
 
In Muggleton's setting of concept learning,<ref name="setting2">{{cite journal |last1=Muggleton |first1=Stephen |year=1999 |title=Inductive Logic Programming: Issues, Results and the Challenge of Learning Language in Logic |journal=Artificial Intelligence |volume=114 |issue=1–2 |pages=283–296 |doi=10.1016/s0004-3702(99)00067-3 |doi-access=}}; here: Sect.2.1</ref> "completeness" is referred to as "sufficiency", and "consistency" as "strong consistency". Two further conditions are added: "''Necessity''", which postulates that ''{{mvar|B}}'' does not entail <math display="inline">E^+</math>, does not impose a restriction on ''{{mvar|h}}'', but forbids any generation of a hypothesis as long as the positive facts are explainable without it. . "Weak consistency", which states that no contradiction can be derived from <math display="inline">B\land H</math>, forbids generation of any hypothesis ''{{mvar|h}}'' that contradicts the background knowledge ''{{mvar|B}}''. Weak consistency is implied by strong consistency; if no negative examples are given, both requirements coincide. Weak consistency is particularly important in the case of noisy data, where completeness and strong consistency cannot be guaranteed.<ref name="setting2" />
 
=== Learning from interpretations ===
In learning from interpretations, the ''positive'' and ''negative'' examples are given as a set of complete or partial [[Herbrand structure|Herbrand structures]]s, each of which are themselves a finite set of ground literals. Such a structure ''{{mvar|e}}'' is said to be a model of the set of clauses <math display="inline">B \cup H</math> if for any [[Substitution (logic)|substitution]] <math display="inline">\theta</math> and any clause <math display="inline">\mathrm{head} \leftarrow \mathrm{body}</math> in <math display="inline">B \cup H</math> such that <math display="inline">\mathrm{body}\theta \subseteq e</math>, <math>\mathrm{head}\theta \subseteq e</math> also holds. The goal is then to output a hypothesis that is ''complete,'' meaning every positive example is a model of <math display="inline">B \cup H</math>, and ''consistent,'' meaning that no negative example is a model of <math display="inline">B \cup H</math>.<ref name="setting" />
 
== Approaches to ILP ==
Line 47:
 
=== Bottom-up search ===
Bottom-up methods to search the subsumption lattice have been investigated since Plotkin's first work on formalising induction in clausal logic in 1970.<ref name=":02">{{Cite book |last1=Nienhuys-Cheng |first1=Shan-hwei |title=Foundations of inductive logic programming |last2=Wolf |first2=Ronald de |date=1997 |publisher=Spinger |isbn=978-3-540-62927-6 |series=Lecture notes in computer science Lecture notes in artificial intelligence |___location=Berlin Heidelberg |pages=174–177}}</ref><ref>{{cite thesis |first=G.D. |last=Plotkin |title=Automatic Methods of Inductive Inference |date=1970 |type=PhD |publisher=University of Edinburgh |url=https://www.era.lib.ed.ac.uk/bitstream/handle/1842/6656/Plotkin1972.pdf |hdl=1842/6656}}</ref> Techniques used include least general generalisation, based on [[Anti-unification (computer science)|anti-unification]], and inverse resolution, based on inverting the [[Resolution (logic)|resolution]] inference rule.
 
==== Least general generalisation ====
Line 54:
To account for background knowledge, inductive logic programming systems employ ''relative least general generalisations'', which are defined in terms of subsumption relative to a background theory. In general, such relative least general generalisations are not guaranteed to exist; however, if the background theory ''{{mvar|B}}'' is a finite set of [[Ground expression|ground]] [[Literal (mathematical logic)|literals]], then the negation of ''{{mvar|B}}'' is itself a clause. In this case, a relative least general generalisation can be computed by disjoining the negation of ''{{mvar|B}}'' with both <math display="inline">C_1</math> and <math display="inline">C_2</math> and then computing their least general generalisation as before.<ref>{{Cite book |last1=Nienhuys-Cheng |first1=Shan-hwei |title=Foundations of inductive logic programming |last2=Wolf |first2=Ronald de |date=1997 |publisher=Spinger |isbn=978-3-540-62927-6 |series=Lecture notes in computer science Lecture notes in artificial intelligence |___location=Berlin Heidelberg |page=286}}</ref>
 
Relative least general generalisations are the foundation of the bottom-up system [[Golem (ILP)|Golem]].<ref name=":12"/><ref name="Springer/Ohmsha"/>
Relative least general generalisations are the foundation of the bottom-up system [[Golem (ILP)|Golem]].<ref name=":12">{{Cite book |last1=Nienhuys-Cheng |first1=Shan-hwei |title=Foundations of inductive logic programming |last2=Wolf |first2=Ronald de |date=1997 |publisher=Spinger |isbn=978-3-540-62927-6 |series=Lecture notes in computer science Lecture notes in artificial intelligence |___location=Berlin Heidelberg |pages=354–358}}</ref><ref>{{Cite journal |last1=Muggleton |first1=Stephen H. |last2=Feng |first2=Cao |date=1990 |editor-last=Arikawa |editor-first=Setsuo |editor2-last=Goto |editor2-first=Shigeki |editor3-last=Ohsuga |editor3-first=Setsuo |editor4-last=Yokomori |editor4-first=Takashi |title=Efficient Induction of Logic Programs |url=https://dblp.org/rec/conf/alt/MuggletonF90.bib |journal=Algorithmic Learning Theory, First International Workshop, ALT '90, Tokyo, Japan, October 8-10, 1990, Proceedings |publisher=Springer/Ohmsha |pages=368–381}}</ref>
 
==== Inverse resolution ====
Line 61:
Inverse resolution takes information about the [[Resolvent (logic)|resolvent]] of a resolution step to compute possible resolving clauses. Two types of inverse resolution operator are in use in inductive logic programming: V-operators and W-operators. A V-operator takes clauses <math display="inline">R</math> and <math display="inline">C_1</math>as input and returns a clause <math display="inline">C_2</math> such that <math display="inline">R</math> is the resolvent of <math display="inline">C_1</math> and <math display="inline">C_2</math>. A W-operator takes two clauses <math display="inline">R_1</math> and <math display="inline">R_2</math> and returns thre clauses <math display="inline">C_1</math>, <math display="inline">C_2</math> and <math display="inline">C_3</math> such that <math display="inline">R_1</math> is the resolvent of <math display="inline">C_1</math> and <math display="inline">C_2</math> and <math display="inline">R_2</math> is the resolvent of <math display="inline">C_2</math> and <math display="inline">C_3</math>.<ref name="invres">{{Cite book |last1=Nienhuys-Cheng |first1=Shan-hwei |title=Foundations of inductive logic programming |last2=Wolf |first2=Ronald de |date=1997 |publisher=Spinger |isbn=978-3-540-62927-6 |series=Lecture notes in computer science Lecture notes in artificial intelligence |___location=Berlin Heidelberg |page=197}}</ref>
 
Inverse resolution was first introduced by [[Stephen Muggleton]] and Wray Buntine in 1988 for use in the inductive logic programming system Cigol.<ref>{{cite book |last1=Muggleton |first1=S.H. |url= |titlename="Proceedings of the 5th International Conference on Machine Learning |last2=Buntine |first2=W. |date=1988 |isbn=978-0-934613-64-4 |pages=339–352 |chapter=Machine invention of first-order predicate by inverting resolution |doi=10.1016/B978-0-934613-64-4.50040-2}}<Internationa"/ref> By 1993, this spawned a surge of research into inverse resolution operators and their properties.<ref name="invres" />
 
=== Top-down search ===
Line 94:
Probabilistic inductive logic programming adapts the setting of inductive logic programming to learning [[Probabilistic logic programming|probabilistic logic programs]]. It can be considered as [[statistical relational learning]] within the formalism of probabilistic logic programming.<ref>{{Citation |last1=De Raedt |first1=Luc |title=Probabilistic Inductive Logic Programming |date=2008 |url=http://dx.doi.org/10.1007/978-3-540-78652-8_1 |pages=1–27 |access-date=2023-12-09 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |isbn=978-3-540-78651-1 |last2=Kersting |first2=Kristian|doi=10.1007/978-3-540-78652-8_1 }}</ref><ref name="pilp">{{Cite journal |last1=Riguzzi |first1=Fabrizio |last2=Bellodi |first2=Elena |last3=Zese |first3=Riccardo |date=2014-09-18 |title=A History of Probabilistic Inductive Logic Programming |journal=Frontiers in Robotics and AI |volume=1 |doi=10.3389/frobt.2014.00006 |issn=2296-9144 |doi-access=free }}</ref>
 
Given
 
# background knowledge as a probabilistic logic program {{Mvar|B}}, and
Line 101:
the goal of probabilistic inductive logic programming is to find a probabilistic logic program <math display="inline">H</math> such that the probability of positive examples according to <math display="inline">{H \cup B}</math> is maximized and the probability of negative examples is minimized.<ref name="pilp" />
 
This problem has two variants: parameter learning and structure learning. In the first, one is given the structure (the clauses) of {{Mvar|H}} and the goal is to infer the probabilities annotations of the given clauses, while in the second the goal was to infer both the structure and the probability parameters of {{Mvar|H}}. Just as in classical inductive logic programming, the examples can be given as examples or as (partial) interpretations.<ref name="pilp" />
 
=== Parameter Learning ===
Parameter learning for languages following the distribution semantics has been performed by using an [[expectation-maximisation algorithm]] or by [[gradient descent]].
An expectation-maximisation algorithm consists of a cycle in which the steps of expectation and maximization are repeatedly performed. In the expectation step, the distribution of the hidden variables is computed according to the current values of the probability parameters, while in the maximisation step, the new values of the parameters are computed.
Gradient descent methods compute the gradient of the target function and iteratively modify the parameters moving in the direction of the gradient.<ref name="pilp" />
 
=== Structure Learning ===
Line 113:
In 2008, [[Luc De Raedt|De Raedt]] et al. presented an algorithm for performing [[theory compression]] on [[ProbLog]] programs, where theory compression means removing as many clauses as possible from the theory in order to maximize the probability. No new clause can be added to the theory.<ref name="pilp" />
 
In the same year, Meert, W. et al. introduced a method for learning parameters and structure of [[Ground term|ground]] probabilistic logic programs by considering the [[Bayesian network|Bayesian networks]]s equivalent to them and applying techniques for learning Bayesian networks.<ref>{{Citation |last1=Blockeel |first1=Hendrik |title=Towards Learning Non-recursive LPADs by Transforming Them into Bayesian Networks |url=http://dx.doi.org/10.1007/978-3-540-73847-3_16 |work=Inductive Logic Programming |pages=94–108 |access-date=2023-12-09 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |isbn=978-3-540-73846-6 |last2=Meert |first2=Wannes|series=Lecture Notes in Computer Science |date=2007 |volume=4455 |doi=10.1007/978-3-540-73847-3_16 }}</ref><ref name="pilp" />
 
ProbFOIL, introduced by De Raedt and Ingo Thon in 2010, combined the inductive logic programming system [[First-order inductive learner|FOIL]] with [[ProbLog]]. Logical rules are learned from probabilistic data in the sense that both the examples themselves and their classifications can be probabilistic. The set of rules has to allow to predict the probability of the examples from their description. In this setting, the parameters (the probability values) are fixed and the structure has to be learned.<ref>{{Citation |last1=De Raedt |first1=Luc |title=Probabilistic Rule Learning |date=2011 |url=http://link.springer.com/10.1007/978-3-642-21295-6_9 |work=Inductive Logic Programming |volume=6489 |pages=47–58 |editor-last=Frasconi |editor-first=Paolo |access-date=2023-12-09 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/978-3-642-21295-6_9 |isbn=978-3-642-21294-9 |last2=Thon |first2=Ingo |s2cid=11727522 |editor2-last=Lisi |editor2-first=Francesca A.}}</ref><ref name="pilp" />
Line 132:
{{reflist}}
{{Free-content attribution|author=Fabrizio Riguzzi, Elena Bellodi and Riccardo Zese |date=2014-09-18 |title=A History of Probabilistic Inductive Logic Programming |documentURL=http://journal.frontiersin.org/article/10.3389/frobt.2014.00006/ |publisher=[[Frontiers Media]]|license=CC-BY 4.0|license statement URL = http://creativecommons.org/licenses/by/4.0/}}
 
==Further reading==
{{Refbegin}}