Inductive logic programming: Difference between revisions

Content deleted Content added
OAbot (talk | contribs)
m Open access bot: doi updated in citation with #oabot.
Formal definition: Rewrite and expand the section on the formal setting using material sourced to an extensive 2022 review paper.
Line 16:
Recently, classical tasks from automated programming have moved back into focus, as the introduction of meta-interpretive learning makes predicate invention and learning recursive programs more feasible. This technique was pioneered with the [[Metagol]] system introduced by Muggleton, Dianhuan Lin, Niels Pahlavi and Alireza Tamaddoni-Nezhad in 2014.<ref>{{Cite journal |last=Muggleton |first=Stephen H. |last2=Lin |first2=Dianhuan |last3=Pahlavi |first3=Niels |last4=Tamaddoni-Nezhad |first4=Alireza |date=2013-05-01 |title=Meta-interpretive learning: application to grammatical inference |url=http://dx.doi.org/10.1007/s10994-013-5358-3 |journal=Machine Learning |volume=94 |issue=1 |pages=25–49 |doi=10.1007/s10994-013-5358-3 |issn=0885-6125}}</ref> This allows ILP systems to work with fewer examples, and has brought successes in learning string transformation programs, answer set grammars and general algorithms.<ref>{{Cite journal |last=Cropper |first=Andrew |last2=Dumančić |first2=Sebastijan |last3=Evans |first3=Richard |last4=Muggleton |first4=Stephen |date=2022 |title=Inductive logic programming at 30 |url=https://link.springer.com/10.1007/s10994-021-06089-1 |journal=Machine Learning |language=en |volume=111 |issue=1 |pages=147–172 |doi=10.1007/s10994-021-06089-1 |issn=0885-6125|doi-access=free }}</ref>
 
== Setting ==
Inductive logic programming has adopted several different learning settings, the most common of which are learning from [[entailment]] and learning from interpretations.<ref name=":32">{{Cite journal |last=Cropper |first=Andrew |last2=Dumančić |first2=Sebastijan |date=2022-06-15 |title=Inductive Logic Programming At 30: A New Introduction |url=http://dx.doi.org/10.1613/jair.1.13507 |journal=Journal of Artificial Intelligence Research |volume=74 |pages=779–782 |doi=10.1613/jair.1.13507 |issn=1076-9757 |doi-access=free}}</ref> In both cases, the input is provided in the form of ''background knowledge'' {{mvar|B}}, a logical theory (commonly in the form of [[Clause (logic)|clauses]] used in [[logic programming]]), as well as positive and negative examples, denoted <math display="inline">E^+</math> and <math display="inline">E^{-}</math> respectively. The output is given as a ''hypothesis'' {{mvar|H}}, itself a logical theory that typically consists of one or more clauses.
 
The two settings differ in the format of examples presented.
 
=== Learning from entailment ===
{{As of|2022}}, learning from entailment is by far the most popular setting for inductive logic programming.<ref name=":32" /> In this setting, the ''positive'' and ''negative'' examples are given as finite sets <math display="inline">E^+</math> and <math display="inline">E^{-}</math> of positive and negated [[Ground expression|ground]] [[Literal (mathematical logic)|literals]], respectively. A ''correct hypothesis'' {{mvar|H}} is a set of clauses satisfying the following requirements, where the turnstile symbol <math>\models</math> stands for [[logical entailment]]:<ref name=":32" /><ref>{{cite book |last1=Džeroski |first1=Sašo |title=Advances in Knowledge Discovery and Data Mining |publisher=MIT Press |year=1996 |editor1-last=Fayyad |editor1-first=U.M. |pages=117–152 See §5.2.4 |chapter=Inductive Logic Programming and Knowledge Discovery in Databases |access-date=2021-09-27 |editor2-last=Piatetsky-Shapiro |editor2-first=G. |editor3-last=Smith |editor3-first=P. |editor4-last=Uthurusamy |editor4-first=R. |chapter-url=http://kt.ijs.si/SasoDzeroski/pdfs/1996/Chapters/1996_InductiveLogicProgramming.pdf |archive-url=https://web.archive.org/web/20210927141157/http://kt.ijs.si/SasoDzeroski/pdfs/1996/Chapters/1996_InductiveLogicProgramming.pdf |archive-date=2021-09-27 |url-status=dead}}</ref><ref>{{Cite journal |last=De Raedt |first=Luc |date=1997 |title=Logical settings for concept-learning |url=https://linkinghub.elsevier.com/retrieve/pii/S0004370297000416 |journal=Artificial Intelligence |language=en |volume=95 |issue=1 |pages=187–201 |doi=10.1016/S0004-3702(97)00041-6}}</ref><math display="block">\begin{array}{llll}
\text{Completeness:} & B \cup H & \models & E^+ \\
\text{Consistency:} & B \cup H \cup E^- & \not\models & \textit{false}
\end{array}</math>
Completeness requires any generated hypothesis {{mvar|H}} to explain all positive examples <math display="inline">E^+</math>, and consistency forbids generation of any hypothesis {{mvar|H}} that is inconsistent with the negative examples <math display="inline">E^{-}</math>, both given the background knowledge {{mvar|B}}.

In Muggleton's setting of concept learning,<ref name=":02">{{cite journal |last1=Muggleton |first1=Stephen |year=1999 |title=Inductive Logic Programming: Issues, Results and the Challenge of Learning Language in Logic |journal=Artificial Intelligence |volume=114 |issue=1–2 |pages=283–296 |doi=10.1016/s0004-3702(99)00067-3 |doi-access=}}; here: Sect. 2.1</ref> "completeness" is referred to as "sufficiency", and "consistency" as "strong consistency". Two further conditions are added: "''Necessity''", which postulates that {{mvar|B}} does not entail <math display="inline">E^+</math>, does not impose a restriction on {{mvar|H}}, but forbids any generation of a hypothesis as long as the positive facts are explainable without it. "''Weak consistency''", which states that no contradiction can be derived from <math display="inline">B\land H</math>, forbids generation of any hypothesis {{mvar|H}} that contradicts the background knowledge {{mvar|B}}. Weak consistency is implied by strong consistency; if no negative examples are given, both requirements coincide. Weak consistency is particularly important in the case of noisy data, where completeness and strong consistency cannot be guaranteed.<ref name=":02" />
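
As a small illustration of these conditions (the predicates and constants here are chosen purely for exposition and are not taken from the cited sources), let the background knowledge and examples be
<math display="block">\begin{array}{lll}
B &=& \{\textit{parent}(\textit{ann},\textit{bob}),\ \textit{parent}(\textit{bob},\textit{carl}),\ \textit{parent}(\textit{eve},\textit{dan})\} \\
E^+ &=& \{\textit{grandparent}(\textit{ann},\textit{carl})\} \\
E^- &=& \{\neg\,\textit{grandparent}(\textit{ann},\textit{dan})\}.
\end{array}</math>
The hypothesis <math display="inline">H = \{\textit{grandparent}(X,Y) \leftarrow \textit{parent}(X,Z) \land \textit{parent}(Z,Y)\}</math> is correct in this setting: it is complete, because <math display="inline">B \cup H \models \textit{grandparent}(\textit{ann},\textit{carl})</math> under the substitution <math display="inline">\{X/\textit{ann},\, Z/\textit{bob},\, Y/\textit{carl}\}</math>, and it is consistent, because <math display="inline">B \cup H</math> does not entail <math display="inline">\textit{grandparent}(\textit{ann},\textit{dan})</math>, so no contradiction with <math display="inline">E^-</math> can be derived.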
 
=== Learning from interpretations ===
"''Necessity''" does not impose a restriction on {{mvar|h}}, but forbids any generation of a hypothesis as long as the positive facts are explainable without it.
In learning from interpretations, the ''positive'' and ''negative'' examples are given as sets of complete or partial [[Herbrand structure|Herbrand structures]], each of which is itself a finite set of ground literals. Such a structure {{mvar|e}} is said to be a model of the set of clauses <math display="inline">B \cup H</math> if for any [[Substitution (logic)|substitution]] <math display="inline">\theta</math> and any clause <math display="inline">\mathrm{head} \leftarrow \mathrm{body}</math> in <math display="inline">B \cup H</math> such that <math display="inline">\mathrm{body}\theta \subseteq e</math>, <math display="inline">\mathrm{head}\theta \subseteq e</math> also holds. The goal is then to output a hypothesis that is ''complete'', meaning every positive example is a model of <math display="inline">B \cup H</math>, and ''consistent'', meaning that no negative example is a model of <math display="inline">B \cup H</math>.<ref name=":32" />
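
For a small illustration (again with predicates and constants chosen for exposition rather than drawn from the cited sources), suppose the background knowledge is empty and the hypothesis consists of the single clause <math display="inline">\textit{grandparent}(X,Y) \leftarrow \textit{parent}(X,Z) \land \textit{parent}(Z,Y)</math>. The interpretation <math display="inline">e_1 = \{\textit{parent}(\textit{ann},\textit{bob}),\ \textit{parent}(\textit{bob},\textit{carl}),\ \textit{grandparent}(\textit{ann},\textit{carl})\}</math> is a model of <math display="inline">B \cup H</math>: under the substitution <math display="inline">\theta = \{X/\textit{ann},\, Z/\textit{bob},\, Y/\textit{carl}\}</math> the body of the clause is contained in <math display="inline">e_1</math>, and so is the head. The interpretation <math display="inline">e_2 = \{\textit{parent}(\textit{ann},\textit{bob}),\ \textit{parent}(\textit{bob},\textit{carl})\}</math> is not a model, because the body holds under the same substitution while the head does not; <math display="inline">e_1</math> could therefore serve as a positive example for this hypothesis and <math display="inline">e_2</math> as a negative one.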
"''Sufficiency''" requires any generated hypothesis {{mvar|h}} to explain all positive examples <math>E^+</math>.
"''Weak consistency''" forbids generation of any hypothesis {{mvar|h}} that contradicts the background knowledge {{mvar|B}}.
"''Strong consistency''" also forbids generation of any hypothesis {{mvar|h}} that is inconsistent with the negative examples <math>E^-</math>, given the background knowledge {{mvar|B}}; it implies "''Weak consistency''"; if no negative examples are given, both requirements coincide. Džeroski <ref>{{cite book|first1=Sašo|last1=Džeroski|chapter=Inductive Logic Programming and Knowledge Discovery in Databases|pages=117–152 See §5.2.4|editor1-first=U.M.|editor1-last=Fayyad|editor2-first=G.|editor2-last=Piatetsky-Shapiro|editor3-first=P.|editor3-last=Smith|editor4-first=R.|editor4-last=Uthurusamy|title=Advances in Knowledge Discovery and Data Mining|publisher=MIT Press|year=1996|chapter-url=http://kt.ijs.si/SasoDzeroski/pdfs/1996/Chapters/1996_InductiveLogicProgramming.pdf|access-date=2021-09-27|archive-date=2021-09-27|archive-url=https://web.archive.org/web/20210927141157/http://kt.ijs.si/SasoDzeroski/pdfs/1996/Chapters/1996_InductiveLogicProgramming.pdf|url-status=dead}}</ref> requires only "''Sufficiency''" (called "Completeness" there) and "''Strong consistency''".
 
==Example==