{{short description|Paradigm of rule-based machine learning methods}}
[[File:Function approximation with LCS rules.jpg|thumb|2D visualization of LCS rules learning to approximate a 3D function. Each blue ellipse represents an individual rule covering part of the solution space. (Adapted from images taken from XCSF<ref name=":9">{{Cite journal|last1=Stalph|first1=Patrick O.|last2=Butz|first2=Martin V.|date=2010-02-01|title=JavaXCSF: The XCSF Learning Classifier System in Java|journal=ACM SIGEVOlution|volume=4|issue=3|pages=16–19|doi=10.1145/1731888.1731890|s2cid=16861908|issn=1931-8499}}</ref> with permission from Martin Butz)]]
'''Learning classifier systems''', or '''LCS''', are a paradigm of [[rule-based machine learning]] methods that combine a discovery component (e.g. typically a [[genetic algorithm]] in [[evolutionary computation]]) with a learning component (performing either [[supervised learning]], [[reinforcement learning]], or [[unsupervised learning]]).<ref name=":1">{{Cite journal|last1=Urbanowicz|first1=Ryan J.|last2=Moore|first2=Jason H.|date=2009-09-22|title=Learning Classifier Systems: A Complete Introduction, Review, and Roadmap|journal=Journal of Artificial Evolution and Applications|language=en|volume=2009|pages=1–25|doi=10.1155/2009/736398|issn=1687-6229|doi-access=free}}</ref> Learning classifier systems seek to identify a set of context-dependent rules that collectively store and apply knowledge in a [[piecewise]] manner in order to make predictions (e.g. [[behavior modeling]],<ref>{{Cite journal|last=Dorigo|first=Marco|title=Alecsys and the AutonoMouse: Learning to control a real robot by distributed classifier systems|journal=Machine Learning|language=en|volume=19|issue=3|pages=209–240|doi=10.1007/BF00996270|issn=0885-6125|year=1995|doi-access=free}}</ref> [[Statistical classification|classification]],<ref>{{Cite journal|last1=Bernadó-Mansilla|first1=Ester|last2=Garrell-Guiu|first2=Josep M.|date=2003-09-01|title=Accuracy-Based Learning Classifier Systems: Models, Analysis and Applications to Classification Tasks|journal=Evolutionary Computation|volume=11|issue=3|pages=209–238|doi=10.1162/106365603322365289|pmid=14558911|s2cid=9086149|issn=1063-6560}}</ref><ref name=":0">{{Cite journal|last1=Urbanowicz|first1=Ryan J.|last2=Moore|first2=Jason H.|date=2015-04-03|title=ExSTraCS 2.0: description and evaluation of a scalable learning classifier system|journal=Evolutionary Intelligence|language=en|volume=8|issue=2–3|pages=89–116|doi=10.1007/s12065-015-0128-8|issn=1864-5909|pmc=4583133|pmid=26417393}}</ref> [[data mining]],<ref name=":0" /><ref>{{Cite book|title=Advances 
in Learning Classifier Systems|url=https://archive.org/details/advanceslearning00lanz|url-access=limited|last1=Bernadó|first1=Ester|last2=Llorà|first2=Xavier|last3=Garrell|first3=Josep M.|chapter=XCS and GALE: A Comparative Study of Two Learning Classifier Systems on Data Mining |date=2001-07-07|publisher=Springer Berlin Heidelberg|isbn=9783540437932|editor-last=Lanzi|editor-first=Pier Luca|series=Lecture Notes in Computer Science|volume=2321 |pages=[https://archive.org/details/advanceslearning00lanz/page/n120 115]–132|language=en|doi=10.1007/3-540-48104-4_8|editor-last2=Stolzmann|editor-first2=Wolfgang|editor-last3=Wilson|editor-first3=Stewart W.}}</ref><ref>{{Cite book|title=Learning Classifier Systems|url=https://archive.org/details/learningclassifi00kova_690|url-access=limited|last1=Bacardit|first1=Jaume|last2=Butz|first2=Martin V.|chapter=Data Mining in Learning Classifier Systems: Comparing XCS with GAssist |date=2007-01-01|publisher=Springer Berlin Heidelberg|isbn=9783540712305|editor-last=Kovacs|editor-first=Tim|series=Lecture Notes in Computer Science|volume=4399 |pages=[https://archive.org/details/learningclassifi00kova_690/page/n291 282]–290|language=en|doi=10.1007/978-3-540-71231-2_19|editor-last2=Llorà|editor-first2=Xavier|editor-last3=Takadama|editor-first3=Keiki|editor-last4=Lanzi|editor-first4=Pier Luca|editor-last5=Stolzmann|editor-first5=Wolfgang|editor-last6=Wilson|editor-first6=Stewart W.|citeseerx = 10.1.1.553.4679}}</ref> [[Regression analysis|regression]], and related tasks).
The founding concepts behind learning classifier systems came from attempts to model [[complex adaptive system]]s, using rule-based agents to form an artificial cognitive system (i.e. [[artificial intelligence]]).
== Methodology ==
The architecture and components of a given learning classifier system can be quite variable.
=== Elements of a generic LCS algorithm ===
[[File:Generic Michigan-style Supervised LCS Schematic.png|thumb|A step-wise schematic illustrating a generic Michigan-style learning classifier system learning cycle performing supervised learning]]
Keeping in mind that LCS is a paradigm for genetic-based machine learning rather than a specific method, the following outlines key elements of a generic, modern (i.e. post-XCS) LCS algorithm.
==== Environment ====
The environment is the source of data upon which an LCS learns.
==== Rule/classifier/population ====
A rule is a context-dependent relationship between state values and some prediction.
Rules can be represented in many different ways to handle different data types (e.g. binary, discrete-valued, ordinal, continuous-valued).
In any LCS, the trained model is a set of rules/classifiers, rather than any single rule/classifier. In Michigan-style LCS, the entire trained (and optionally, compacted) classifier population forms the prediction model.
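As a concrete illustration, a rule over binary inputs is often encoded with a ternary alphabet, where '#' is a "don't care" wildcard. The following is a minimal sketch; the class and field names are illustrative rather than taken from any specific LCS implementation:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    condition: str   # one character per feature: '0', '1', or '#' (wildcard)
    action: int      # the predicted class/action
    fitness: float = 1.0
    numerosity: int = 1   # number of identical copies this rule represents
    experience: int = 0   # times this rule has appeared in a match set

    def matches(self, instance: str) -> bool:
        """A rule matches when every non-wildcard position agrees."""
        return all(c == '#' or c == x for c, x in zip(self.condition, instance))

# The rule "IF bit 1 is 1 AND bit 3 is 0 THEN class 1":
rule = Rule(condition='1#0', action=1)
print(rule.matches('110'))  # True
print(rule.matches('011'))  # False
```

The wildcard is what allows a single rule to generalize over many instances; a condition of all '#' characters would match everything.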
==== Matching ====
One of the most critical and often time-consuming elements of an LCS is the matching process.
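Forming the match set [M] can be sketched as a scan of the entire rule population against the current instance, which is why matching dominates runtime for large populations. This sketch assumes rules are (condition, action) pairs with a ternary '#'-wildcard condition; the names are illustrative:

```python
def matches(condition: str, instance: str) -> bool:
    """True when every non-wildcard position of the condition agrees."""
    return all(c == '#' or c == x for c, x in zip(condition, instance))

def build_match_set(population, instance):
    # Every rule in [P] is checked on every learning iteration,
    # making this the typical performance bottleneck of an LCS.
    return [(cond, act) for cond, act in population if matches(cond, instance)]

population = [('1#0', 1), ('0##', 0), ('11#', 1)]
print(build_match_set(population, '110'))  # [('1#0', 1), ('11#', 1)]
```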
==== Covering ====
At this point in the learning cycle, if no classifiers made it into either [M] or [C] (as would be the case when the population starts off empty), the covering mechanism is applied (fifth step).
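A covering sketch, under the assumption of binary inputs and a supervised setting where the correct action is known: a new rule is generated directly from the unmatched instance, with each feature generalized to '#' with some user-set probability (the parameter name and value here are illustrative):

```python
import random

def cover(instance: str, correct_action: int, p_hash: float = 0.33):
    """Create a new rule that is guaranteed to match the given instance,
    generalizing each position to '#' with probability p_hash."""
    condition = ''.join('#' if random.random() < p_hash else bit
                        for bit in instance)
    return (condition, correct_action)

random.seed(0)
rule = cover('1011', correct_action=1)
# Covering guarantees the new rule matches the instance that spawned it.
assert all(c == '#' or c == x for c, x in zip(rule[0], '1011'))
```

This guarantees that every training instance is covered by at least one rule, which is how an LCS can begin learning from an empty population.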
==== Parameter updates/credit assignment/learning ====
In the sixth step, the rule parameters of any rule in [M] are updated to reflect the new experience gained from the current training instance.
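In a supervised (UCS-like) setting this update can be sketched as re-estimating each matching rule's accuracy from running counts; the dictionary keys here are illustrative names, not fields of any particular system:

```python
def update_rule(stats: dict, was_correct: bool) -> None:
    """Update one matching rule's bookkeeping after seeing an instance."""
    stats['experience'] += 1          # one more time in a match set
    if was_correct:
        stats['correct'] += 1         # prediction agreed with the label
    stats['accuracy'] = stats['correct'] / stats['experience']

stats = {'experience': 0, 'correct': 0, 'accuracy': 0.0}
for outcome in (True, True, False, True):
    update_rule(stats, outcome)
print(stats['accuracy'])  # 0.75
```

Reinforcement-learning variants such as XCS instead update a payoff prediction and derive fitness from prediction accuracy, but the principle is the same: rule parameters are incrementally revised from experience.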
==== Subsumption ====
In the seventh step, a ''subsumption'' mechanism is typically applied.
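The generality check at the heart of subsumption can be sketched as follows for ternary conditions: an accurate, experienced rule may absorb another rule whose condition it strictly generalizes (the function name is illustrative):

```python
def is_more_general(general: str, specific: str) -> bool:
    """True when `general` covers every input `specific` does,
    and has strictly more wildcards (ternary '#' encoding)."""
    return (all(g == '#' or g == s for g, s in zip(general, specific))
            and general.count('#') > specific.count('#'))

print(is_more_general('1##', '1#0'))  # True: '1##' covers everything '1#0' does
print(is_more_general('1#0', '1##'))  # False: the reverse does not hold
```

When subsumption fires, the specific rule is typically deleted and its numerosity credited to the general subsumer, pushing the population toward compact, maximally general rules.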
==== Rule discovery/genetic algorithm ====
In the eighth step, LCS adopts a highly elitist [[genetic algorithm]] (GA) that selects two parent classifiers based on fitness (survival of the fittest). Parents are typically selected from [C] using [[tournament selection]].
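A sketch of this rule-discovery step, assuming ternary conditions and a supervised correct set [C]: tournament selection picks two parents, then one-point crossover and a mutation that keeps offspring matching the current instance produce new rules (parameter values and names here are illustrative):

```python
import random

def tournament(correct_set, fitness, k=3):
    """Return the fittest of k randomly sampled candidates from [C]."""
    return max(random.sample(correct_set, min(k, len(correct_set))), key=fitness)

def crossover(cond_a: str, cond_b: str) -> str:
    """One-point crossover of two ternary conditions."""
    point = random.randrange(1, len(cond_a))
    return cond_a[:point] + cond_b[point:]

def mutate(cond: str, instance: str, mu: float = 0.04) -> str:
    """Toggle positions between '#' and the instance's value, so the
    offspring still matches the instance that triggered discovery."""
    return ''.join(('#' if c != '#' else x) if random.random() < mu else c
                   for c, x in zip(cond, instance))

random.seed(1)
correct_set = ['1#0', '11#', '1##']
fitness = {'1#0': 0.9, '11#': 0.6, '1##': 0.4}.get
p1 = tournament(correct_set, fitness)
p2 = tournament(correct_set, fitness)
child = mutate(crossover(p1, p2), instance='110')
assert len(child) == 3 and all(c in '01#' for c in child)
```

Because both parents match the current instance and mutation only moves positions between '#' and the instance's own bits, the offspring is guaranteed to match that instance as well.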
==== Deletion ====
The last step in a generic LCS learning cycle maintains the maximum population size. The deletion mechanism selects classifiers for deletion (commonly using roulette wheel selection).
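Roulette-wheel deletion can be sketched as removal with probability proportional to a deletion "vote"; here the vote is simply inverse fitness, whereas real systems also commonly weight by numerosity and estimated niche size (the names are illustrative):

```python
import random

def delete_one(population, fitness):
    """Remove one rule, chosen with probability proportional to its
    deletion vote (here: inverse fitness, so weak rules die sooner)."""
    votes = [1.0 / max(fitness[r], 1e-9) for r in population]
    victim = random.choices(population, weights=votes, k=1)[0]
    population.remove(victim)

random.seed(2)
pop = ['1#0', '0##', '###']
fit = {'1#0': 0.9, '0##': 0.5, '###': 0.1}
delete_one(pop, fit)
print(len(pop))  # 2
```

The stochastic choice preserves diversity: even fit rules can occasionally be deleted, but low-fitness rules are removed far more often.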
==== Training ====
LCS will cycle through these steps repeatedly for some user-defined number of training iterations, or until some user-defined termination criteria have been met.
==== Rule compaction ====
Once training is complete, the rule population will inevitably contain some poor, redundant and inexperienced rules.
==== Prediction ====
Whether or not rule compaction has been applied, the output of an LCS algorithm is a population of classifiers that can be applied to make predictions on previously unseen instances.
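Prediction over the evolved population is commonly implemented as a fitness-weighted vote among the rules that match the unseen instance. This sketch assumes (condition, action, fitness) triples with ternary conditions; the names are illustrative:

```python
from collections import defaultdict

def matches(condition: str, instance: str) -> bool:
    return all(c == '#' or c == x for c, x in zip(condition, instance))

def predict(population, instance):
    """Each matching rule casts a vote weighted by its fitness;
    the best-supported action wins (None if nothing matches)."""
    votes = defaultdict(float)
    for condition, action, fitness in population:
        if matches(condition, instance):
            votes[action] += fitness
    return max(votes, key=votes.get) if votes else None

population = [('1#0', 1, 0.9), ('0##', 0, 0.8), ('##0', 0, 0.3)]
print(predict(population, '110'))  # 1
```

This collective vote is what makes the population, rather than any single rule, the prediction model.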
==== Interpretation ====
Individual LCS rules are typically human-readable IF:THEN expressions (e.g. IF 'feature A = 1 AND feature C = 0' THEN 'class 1').
== History ==
on artificial intelligence. Morgan Kaufmann, Los Altos, pp 421–425
</ref><ref>De Jong KA (1988) Learning with genetic algorithms: an overview. Mach Learn 3:121–138</ref> This new approach was more similar to a standard genetic algorithm but evolved independent sets of rules. Since that time LCS methods inspired by the online learning framework introduced by Holland at the [[University of Michigan]] have been referred to as '''Michigan-style LCS''', and those inspired by Smith and De Jong at the [[University of Pittsburgh]] have been referred to as '''Pittsburgh-style LCS'''.<ref name=":1" /><ref name=":3" /> In 1986, Holland developed what would be considered the standard Michigan-style LCS for the next decade.<ref name=":4">[http://dl.acm.org/citation.cfm?id=216016 Holland, John H. "Escaping brittleness: the possibilities of general purpose learning algorithms applied to parallel rule-based system." ''Machine learning''(1986): 593-623.]</ref>
Other important concepts that emerged in the early days of LCS research included (1) the formalization of a ''bucket brigade algorithm'' (BBA) for credit assignment/learning,<ref>{{Cite book|last=Holland|first=John H.|date=1985-01-01|title=Properties of the Bucket Brigade|url=http://dl.acm.org/citation.cfm?id=645511.657087|journal=Proceedings of the 1st International Conference on Genetic Algorithms|___location=Hillsdale, NJ, USA|publisher=L. Erlbaum Associates Inc.|pages=1–7|isbn=978-0805804263}}</ref> (2) selection of parent rules from a common 'environmental niche' (i.e. the ''match set'' [M]) rather than from the whole ''population'' [P],<ref>{{Cite thesis|last=Booker|first=L|title=Intelligent Behavior as
of the 3rd Annual Genetic Programming Conference, pp.
Interest in learning classifier systems was reinvigorated in the mid-1990s, largely due to two events: the development of the [[Q-learning|Q-Learning]] algorithm<ref>Watkins, Christopher John Cornish Hellaby. "Learning from delayed rewards." PhD diss., University of Cambridge, 1989.</ref> for [[reinforcement learning]], and the introduction of significantly simplified Michigan-style LCS architectures by Stewart Wilson.<ref name=":10">{{Cite journal|last=Wilson|first=Stewart W.|date=1995-06-01|title=Classifier Fitness Based on Accuracy|journal=Evol. Comput.|volume=3|issue=2|pages=149–175|doi=10.1162/evco.1995.3.2.149|issn=1063-6560|citeseerx=10.1.1.363.2210|s2cid=18341635}}</ref><ref name=":6">{{Cite journal|last=Wilson|first=Stewart W.|date=1994-03-01|title=ZCS: A Zeroth Level Classifier System|journal=Evolutionary Computation|volume=2|issue=1|pages=1–18|doi=10.1162/evco.1994.2.1.1|issn=1063-6560|citeseerx=10.1.1.363.798|s2cid=17680778}}</ref> Wilson's '''Zeroth-level Classifier System (ZCS)'''<ref name=":6" /> focused on increasing algorithmic understandability relative to Holland's standard LCS implementation.<ref name=":4" /> This was done, in part, by removing rule-bidding and the internal message list (essential to the original BBA credit assignment) and replacing them with a hybrid BBA/[[Q-learning|Q-Learning]] strategy. ZCS demonstrated that a much simpler LCS architecture could perform as well as the original, more complex implementations. However, ZCS still suffered from performance drawbacks, including the proliferation of over-general classifiers.
In 1995, Wilson published his landmark paper, "Classifier fitness based on accuracy" in which he introduced the classifier system '''XCS'''.<ref name=":10" /> XCS took the simplified architecture of ZCS and added an accuracy-based fitness, a niche GA (acting in the action set [A]), an explicit generalization mechanism called ''subsumption'', and an adaptation of the [[Q-learning|Q-Learning]] credit assignment. XCS was popularized by its ability to reach optimal performance while evolving accurate and maximally general classifiers as well as its impressive problem flexibility (able to perform both [[reinforcement learning]] and [[supervised learning]]). XCS later became the best known and most studied LCS algorithm and defined a new family of ''accuracy-based LCS''. ZCS alternatively became synonymous with ''strength-based LCS''. XCS is also important, because it successfully bridged the gap between LCS and the field of [[reinforcement learning]]. Following the success of XCS, LCS were later described as reinforcement learning systems endowed with a generalization capability.<ref>{{Cite journal|last=Lanzi|first=P. L.|title=Learning classifier systems from a reinforcement learning perspective|journal=Soft Computing|language=en|volume=6|issue=3–4|pages=162–170|doi=10.1007/s005000100113|issn=1432-7643|year=2002|s2cid=39103390}}</ref> [[Reinforcement learning]] typically seeks to learn a value function that maps out a complete representation of the state/action space. Similarly, the design of XCS drives it to form an all-inclusive and accurate representation of the problem space (i.e. a ''complete map'') rather than focusing on high payoff niches in the environment (as was the case with strength-based LCS). Conceptually, complete maps don't only capture what you should do, or what is correct, but also what you shouldn't do, or what's incorrect. 
In contrast, most strength-based LCSs, or exclusively supervised learning LCSs, seek a rule set of efficient generalizations in the form of a ''best action map'' (or a ''partial map''). Comparisons between strength- vs. accuracy-based fitness and complete vs. best action maps have since been examined in greater detail.<ref>Kovacs, Timothy Michael Douglas. ''A Comparison of Strength and Accuracy-based Fitness in Learning and Classifier Systems''. 2002.</ref>
=== In the wake of XCS ===
XCS inspired the development of a whole new generation of LCS algorithms and applications. In 1995, Congdon was the first to apply LCS to real-world [[Epidemiology|epidemiological]] investigations of disease,<ref name=":8" /> followed closely by Holmes, who developed '''BOOLE++''',<ref>{{Cite journal|last=Holmes|first=John H.|date=1996-01-01|title=A Genetics-Based Machine Learning Approach to Knowledge Discovery in Clinical Data|journal=Proceedings of the AMIA Annual Fall Symposium|pages=883|issn=1091-8280|pmc=2233061}}</ref> '''EpiCS''',<ref>Holmes, John H. "[https://web.archive.org/web/20180820234915/https://pdfs.semanticscholar.org/71e4/eb6c630dee4b762e74b2970f6dc638a351ab.pdf Discovering Risk of Disease with a Learning Classifier System]." In ''ICGA'', pp. 426-433. 1997.</ref> and later '''EpiXCS'''<ref>Holmes, John H., and Jennifer A. Sager. "[https://link.springer.com/10.1007%2F11527770_60 Rule discovery in epidemiologic surveillance data using EpiXCS: an evolutionary computation approach]." In ''Conference on Artificial Intelligence in Medicine in Europe'', pp. 444-452. Springer Berlin Heidelberg, 2005.</ref> for [[Epidemiology|epidemiological]] classification. These early works inspired later interest in applying LCS algorithms to complex and large-scale [[data mining]] tasks epitomized by [[bioinformatics]] applications. In 1998, Stolzmann introduced '''anticipatory classifier systems (ACS)''', which included rules in the form of 'condition-action-effect', rather than the classic 'condition-action' representation.<ref name=":7" /> ACS was designed to predict the perceptual consequences of an action in all possible situations in an environment. In other words, the system evolves a model that specifies not only what to do in a given situation, but also what will happen after a specific action is executed.
This family of LCS algorithms is best suited to multi-step problems, planning, speeding up learning, or disambiguating perceptual aliasing (i.e. where the same observation is obtained in distinct states but requires different actions). Butz later pursued this anticipatory family of LCS developing a number of improvements to the original method.<ref>Butz, Martin V. "[https://web.archive.org/web/20180820234943/https://pdfs.semanticscholar.org/3572/7a56fcce7a73ccc43e5bfa19389780e6d436.pdf Biasing exploration in an anticipatory learning classifier system]." In ''International Workshop on Learning Classifier Systems'', pp. 3-22. Springer Berlin Heidelberg, 2001.</ref> In 2002, Wilson introduced '''XCSF''', adding a computed action in order to perform function approximation.<ref>{{Cite journal|last=Wilson|first=Stewart W.|title=Classifiers that approximate functions|journal=Natural Computing|language=en|volume=1|issue=2–3|pages=211–234|doi=10.1023/A:1016535925043|issn=1567-7818|year=2002|s2cid=23032802}}</ref> In 2003, Bernado-Mansilla introduced a '''sUpervised Classifier System (UCS)''', which specialized the XCS algorithm to the task of [[supervised learning]], single-step problems, and forming a best action set. UCS removed the [[reinforcement learning]] strategy in favor of a simple, accuracy-based rule fitness as well as the explore/exploit learning phases, characteristic of many reinforcement learners. Bull introduced a simple accuracy-based LCS '''(YCS)'''<ref>Bull, Larry. "[https://web.archive.org/web/20180820234941/https://pdfs.semanticscholar.org/120c/8f5057995c36ee60ec320c2263b20af05444.pdf A simple accuracy-based learning classifier system]." ''Learning Classifier Systems Group Technical Report UWELCSG03-005, University of the West of England, Bristol, UK'' (2003).</ref> and a simple strength-based LCS '''Minimal Classifier System (MCS)'''<ref>Bull, Larry. 
"[https://link.springer.com/chapter/10.1007/978-3-540-30217-9_104 A simple payoff-based learning classifier system]." In''International Conference on Parallel Problem Solving from Nature'', pp. 1032-1041. Springer Berlin Heidelberg, 2004.</ref> in order to develop a better theoretical understanding of the LCS framework. Bacardit introduced '''GAssist'''<ref>Peñarroya, Jaume Bacardit. "Pittsburgh genetic-based machine learning in the data mining era: representations, generalization, and run-time." PhD diss., Universitat Ramon Llull, 2004.</ref> and '''BioHEL''',<ref>{{Cite journal|last1=Bacardit|first1=Jaume|last2=Burke|first2=Edmund K.|last3=Krasnogor|first3=Natalio|date=2008-12-12|title=Improving the scalability of rule-based evolutionary learning|journal=Memetic Computing|language=en|volume=1|issue=1|pages=55–67|doi=10.1007/s12293-008-0005-4|s2cid=775199|issn=1865-9284}}</ref> Pittsburgh-style LCSs designed for [[data mining]] and [[scalability]] to large datasets in [[bioinformatics]] applications. In 2008, Drugowitsch published the book titled "Design and Analysis of Learning Classifier Systems" including some theoretical examination of LCS algorithms.<ref>{{Cite book|title=Design and Analysis of Learning Classifier Systems - Springer|volume = 139|doi=10.1007/978-3-540-79866-8|series = Studies in Computational Intelligence|year = 2008|isbn = 978-3-540-79865-1|last1 = Drugowitsch|first1 = Jan}}</ref> Butz introduced the first rule online learning visualization within a [[Graphical user interface|GUI]] for XCSF<ref name=":9" /> (see the image at the top of this page). Urbanowicz extended the UCS framework and introduced '''ExSTraCS,''' explicitly designed for [[supervised learning]] in noisy problem domains (e.g. epidemiology and bioinformatics).<ref>Urbanowicz, Ryan J., Gediminas Bertasius, and Jason H. Moore. 
"[http://www.seas.upenn.edu/~gberta/uploads/3/1/4/8/31486883/urbanowicz_2014_exstracs_algorithm.pdf An extended michigan-style learning classifier system for flexible supervised learning, classification, and data mining] {{Webarchive|url=https://web.archive.org/web/20170829055517/http://www.seas.upenn.edu/~gberta/uploads/3/1/4/8/31486883/urbanowicz_2014_exstracs_algorithm.pdf |date=2017-08-29 }}." In ''International Conference on Parallel Problem Solving from Nature'', pp. 211-221. Springer International Publishing, 2014.</ref> ExSTraCS integrated (1) expert knowledge to drive covering and genetic algorithm towards important features in the data,<ref>Urbanowicz, Ryan J., Delaney Granizo-Mackenzie, and Jason H. Moore. "[https://web.archive.org/web/20180820234834/https://pdfs.semanticscholar.org/b407/8f8bb6aa9e39e84b0b20874662a6ed8b7df1.pdf Using expert knowledge to guide covering and mutation in a michigan style learning classifier system to detect epistasis and heterogeneity]." In''International Conference on Parallel Problem Solving from Nature'', pp. 266-275. Springer Berlin Heidelberg, 2012.</ref> (2) a form of long-term memory referred to as attribute tracking,<ref>{{Cite book|last1=Urbanowicz|first1=Ryan|last2=Granizo-Mackenzie|first2=Ambrose|last3=Moore|first3=Jason
== Variants ==
== Terminology ==
The name, "Learning Classifier System (LCS)", is somewhat misleading since there are many [[machine learning]] algorithms that 'learn to classify' (e.g. [[decision tree]]s, [[artificial neural network]]s) but are not LCSs. The term '[[rule-based machine learning]]' is useful, as it more clearly captures the essential rule-based component of these systems.
Until the 2000s, nearly all learning classifier system methods were developed with reinforcement learning problems in mind. As a result, the term 'learning classifier system' was commonly defined as the combination of 'trial-and-error' reinforcement learning with the global search of a genetic algorithm. Interest in supervised learning applications, and even unsupervised learning, has since broadened the use and definition of this term.
== See also ==
* [[Rule-based machine learning]]
* [[Production system (computer science)|Production system]]
* [[Expert system]]