{{short description|Paradigm of rule-based machine learning methods}}
[[File:Function approximation with LCS rules.jpg|thumb|2D visualization of LCS rules learning to approximate a 3D function. Each blue ellipse represents an individual rule covering part of the solution space. (Adapted from images taken from XCSF.<ref name=":9">{{Cite journal}}</ref>)]]
'''Learning classifier systems''', or '''LCS''', are a paradigm of [[rule-based machine learning]] methods that combine a discovery component (typically a [[genetic algorithm]] in [[evolutionary computation]]) with a learning component (performing either [[supervised learning]], [[reinforcement learning]], or [[unsupervised learning]]).<ref name=":1">{{Cite journal}}</ref>
The founding concepts behind learning classifier systems came from attempts to model [[complex adaptive system]]s, using rule-based agents to form an artificial cognitive system (i.e. [[artificial intelligence]]).
== Methodology ==
The architecture and components of a given learning classifier system can be quite variable.
=== Elements of a generic LCS algorithm ===
[[File:Generic Michigan-style Supervised LCS Schematic.png|thumb|A step-wise schematic illustrating a generic Michigan-style learning classifier system learning cycle performing supervised learning]]
Keeping in mind that LCS is a paradigm for genetic-based machine learning rather than a specific method, the following outlines key elements of a generic, modern (i.e. post-XCS) LCS algorithm.
==== Environment ====
The environment is the source of data upon which an LCS learns.
==== Rule/classifier/population ====
A rule is a context-dependent relationship between state values and some prediction.
Rules can be represented in many different ways to handle different data types (e.g. binary, discrete-valued, ordinal, continuous-valued).
In any LCS, the trained model is a set of rules/classifiers, rather than any single rule/classifier. In Michigan-style LCS, the entire trained (and optionally, compacted) classifier population forms the prediction model.
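For illustration, a minimal Michigan-style classifier over binary features, using the classic ternary {0,1,#} condition encoding, might be sketched as follows. The field names and default values are hypothetical; real systems such as XCS track further parameters (e.g. prediction error and action-set size estimates).

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str             # ternary string, e.g. "1#0"; '#' = wildcard ("don't care")
    action: int                # the class/action this rule advocates
    prediction: float = 10.0   # expected payoff (initial value is arbitrary)
    fitness: float = 0.1       # accuracy-based fitness
    numerosity: int = 1        # identical rule copies this population entry represents
    experience: int = 0        # number of match sets the rule has appeared in

# The trained model is the whole population of rules, never a single rule:
population = [Classifier("1#0", 1), Classifier("##1", 0)]
```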
==== Matching ====
One of the most critical and often time-consuming elements of an LCS is the matching process, in which the condition of each rule in the population [P] is compared against the current training instance; the rules that match form the match set [M].
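With the ternary encoding, matching reduces to a per-bit comparison; a sketch (the rule and state encodings here are illustrative assumptions):

```python
def matches(condition: str, state: str) -> bool:
    # A condition matches a state when every specified bit agrees;
    # '#' is a wildcard that matches either value.
    return all(c in ('#', s) for c, s in zip(condition, state))

def match_set(conditions, state):
    # [M]: the subset of rules whose condition covers the current instance.
    return [c for c in conditions if matches(c, state)]

print(match_set(["1#0", "##1", "010"], "110"))  # ['1#0']
```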
==== Covering ====
At this point in the learning cycle, if no classifiers made it into either [M] or [C] (as would be the case when the population starts off empty), the covering mechanism is applied (fifth step).
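Covering can be sketched as generating a new condition that is guaranteed to match the current instance, while generalizing each position to '#' with some probability (the wildcard probability, often written P#, is an assumed parameter here):

```python
import random

def cover(state: str, wildcard_prob: float = 0.33) -> str:
    # Build a condition guaranteed to match `state`: copy each bit,
    # or generalize it to '#' with probability wildcard_prob.
    return ''.join('#' if random.random() < wildcard_prob else bit
                   for bit in state)

new_condition = cover("101")  # e.g. "1#1" — always matches "101"
```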
==== Parameter updates/credit assignment/learning ====
In the sixth step, the rule parameters of any rule in [M] are updated to reflect the new experience gained from the current training instance.
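As a sketch of how such updates often look in XCS-like systems, a Widrow-Hoff (delta-rule) update moves a rule's prediction and error estimates toward the newly observed payoff. The learning rate β = 0.2 is a commonly cited default, assumed here:

```python
BETA = 0.2  # learning rate β (assumed value)

def update_rule(prediction: float, error: float, payoff: float):
    # Widrow-Hoff style updates: nudge the error estimate toward the
    # absolute prediction error, then the prediction toward the payoff.
    error += BETA * (abs(payoff - prediction) - error)
    prediction += BETA * (payoff - prediction)
    return prediction, error

p, e = update_rule(prediction=10.0, error=0.0, payoff=1000.0)
```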
==== Subsumption ====
In the seventh step, a ''subsumption'' mechanism is typically applied.
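In the ternary encoding, one rule can subsume another only if its condition is strictly more general while still covering everything the other covers; real systems additionally require the subsumer to be sufficiently accurate and experienced. A sketch of the generality check alone:

```python
def can_subsume(general_cond: str, specific_cond: str) -> bool:
    # Every bit of the would-be subsumer must be '#' or agree with the
    # other rule's bit, and it must carry strictly more wildcards.
    covers = all(g in ('#', s) for g, s in zip(general_cond, specific_cond))
    return covers and general_cond.count('#') > specific_cond.count('#')

print(can_subsume("1##", "1#0"))  # True: "1##" covers everything "1#0" does
print(can_subsume("0##", "1#0"))  # False: disagrees on the first bit
```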
==== Rule discovery/genetic algorithm ====
In the eighth step, the LCS adopts a highly elitist [[genetic algorithm]] (GA) that selects two parent classifiers based on fitness (survival of the fittest). Parents are typically selected from [C] using [[tournament selection]].
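A sketch of this step — tournament selection of parents, followed by single-point crossover and point mutation on the ternary condition strings. The tournament size and mutation rate are assumed values:

```python
import random

def tournament(rules, k=2):
    # rules: list of (condition, fitness) pairs; return the fittest of k samples.
    return max(random.sample(rules, k), key=lambda r: r[1])

def crossover(cond_a: str, cond_b: str) -> str:
    # Single-point crossover between two parent conditions.
    point = random.randrange(1, len(cond_a))
    return cond_a[:point] + cond_b[point:]

def mutate(cond: str, rate: float = 0.04) -> str:
    # Flip each position to a random ternary symbol with small probability.
    return ''.join(random.choice('01#') if random.random() < rate else c
                   for c in cond)

correct_set = [("1#0", 0.8), ("##0", 0.3), ("1#1", 0.5)]
parent_a = tournament(correct_set)
parent_b = tournament(correct_set)
child = mutate(crossover(parent_a[0], parent_b[0]))
```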
==== Deletion ====
The last step in a generic LCS learning cycle is to maintain the maximum population size. The deletion mechanism will select classifiers for deletion (commonly using roulette wheel selection).
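Roulette-wheel deletion can be sketched as sampling an index with probability proportional to each rule's deletion weight. The bare weights below are an illustrative assumption; real systems typically derive a deletion vote from fitness, numerosity, and niche-size estimates:

```python
import random

def roulette_select(weights):
    # Return an index sampled with probability proportional to its weight.
    pick = random.uniform(0, sum(weights))
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if pick <= running:
            return i
    return len(weights) - 1  # guard against floating-point edge cases

# A rule with weight 3.0 is three times as likely to be deleted as weight 1.0.
doomed = roulette_select([1.0, 3.0, 2.0])
```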
==== Training ====
LCS will cycle through these steps repeatedly for some user-defined number of training iterations, or until some user-defined termination criteria have been met.
==== Rule compaction ====
Once training is complete, the rule population will inevitably contain some poor, redundant and inexperienced rules.
==== Prediction ====
Whether or not rule compaction has been applied, the output of an LCS algorithm is a population of classifiers which can be applied to making predictions on previously unseen instances.
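Prediction on a new instance can be sketched as a fitness-weighted vote across the matching rules. Rules here are (action, fitness) pairs — an illustrative simplification of a full prediction array:

```python
from collections import defaultdict

def vote(match_set):
    # Sum each advocated action's fitness; the largest total wins.
    totals = defaultdict(float)
    for action, fitness in match_set:
        totals[action] += fitness
    return max(totals, key=totals.get)

# Three matching rules; a weaker and a stronger rule both advocate class 1:
print(vote([(1, 0.9), (0, 0.4), (1, 0.2)]))  # 1
```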
==== Interpretation ====
Individual LCS rules are typically human-readable IF:THEN expressions.
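A sketch of rendering a ternary rule as such a statement (the feature names are hypothetical placeholders):

```python
def as_if_then(condition: str, action, features=("x1", "x2", "x3")) -> str:
    # Only specified (non-'#') positions appear in the IF clause.
    terms = [f"{name}={bit}" for name, bit in zip(features, condition)
             if bit != '#']
    return f"IF {' AND '.join(terms)} THEN class={action}"

print(as_if_then("1#0", 1))  # IF x1=1 AND x3=0 THEN class=1
```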
== History ==
=== Early years ===
[[John Henry Holland]] was best known for his work popularizing [[genetic algorithm]]s (GA) through his ground-breaking book "Adaptation in Natural and Artificial Systems", and, together with Judith Reitman, he provided an early detailed description of what would become known as the first learning classifier system.<ref>{{Cite book|title=Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence|last=Holland|first=John|publisher=University of Michigan Press|year=1975|isbn=9780262581110}} For the first LCS description, see: Holland JH, Reitman JS (1978) Cognitive systems based on adaptive algorithms. Reprinted in: Fogel DB (ed) Evolutionary computation: the fossil record. IEEE Press, New York, 1998.
</ref> This first system, named '''Cognitive System One (CS-1)''' was conceived as a modeling tool, designed to model a real system (i.e. ''environment'') with unknown underlying dynamics using a population of human readable rules. The goal was for a set of rules to perform [[online machine learning]] to adapt to the environment based on infrequent payoff/reward (i.e. reinforcement learning) and apply these rules to generate a behavior that matched the real system. This early, ambitious implementation was later regarded as overly complex, yielding inconsistent results.<ref name=":1" /><ref name=":3">{{Cite journal|last=Lanzi|first=Pier Luca|date=2008-02-08|title=Learning classifier systems: then and now
|journal=Evolutionary Intelligence}}</ref>
Beginning in 1980, [[Kenneth A De Jong|Kenneth De Jong]] and his student Stephen Smith took a different approach to rule-based machine learning, in which learning was viewed as an offline optimization process rather than an online adaptation process.<ref>Smith SF (1980) A learning system based on genetic adaptive algorithms. Ph.D. thesis, Department of Computer Science, University of Pittsburgh
</ref><ref>Smith S (1983) [https://www.researchgate.net/profile/Stephen_Smith14/publication/220815785_Flexible_Learning_of_Problem_Solving_Heuristics_Through_Adaptive_Search/links/0deec52c18dbd0dd53000000.pdf Flexible learning of problem solving heuristics through adaptive search]. In: Eighth international joint conference on artificial intelligence. Morgan Kaufmann, Los Altos, pp 421–425
</ref><ref>De Jong KA (1988) Learning with genetic algorithms: an overview. Mach Learn 3:121–138</ref> This new approach was more similar to a standard genetic algorithm but evolved independent sets of rules. Since that time LCS methods inspired by the online learning framework introduced by Holland at the [[University of Michigan]] have been referred to as '''Michigan-style LCS''', and those inspired by Smith and De Jong at the [[University of Pittsburgh]] have been referred to as '''Pittsburgh-style LCS'''.<ref name=":1" /><ref name=":3" /> In 1986, Holland developed what would be considered the standard Michigan-style LCS for the next decade.<ref name=":4">[http://dl.acm.org/citation.cfm?id=216016 Holland, John H. "Escaping brittleness: the possibilities of general purpose learning algorithms applied to parallel rule-based system." ''Machine learning''(1986): 593-623.]</ref>
Other important concepts that emerged in the early days of LCS research included the formalization of a ''bucket brigade algorithm'' (BBA) for credit assignment/learning and, among many later innovations, the introduction of the first [[Q-learning]]-like credit assignment technique.
=== The revolution ===
Interest in learning classifier systems was reinvigorated in the mid-1990s, driven largely by advances in [[reinforcement learning]] and by Stewart Wilson's introduction of significantly simplified Michigan-style LCS architectures, beginning with '''ZCS'''.
In 1995, Wilson published his landmark paper, "Classifier fitness based on accuracy" in which he introduced the classifier system '''XCS'''.<ref name=":10" /> XCS took the simplified architecture of ZCS and added an accuracy-based fitness, a niche GA (acting in the action set [A]), an explicit generalization mechanism called ''subsumption'', and an adaptation of the [[Q-learning|Q-Learning]] credit assignment. XCS was popularized by its ability to reach optimal performance while evolving accurate and maximally general classifiers, as well as its impressive problem flexibility (able to perform both [[reinforcement learning]] and [[supervised learning]]).
=== In the wake of XCS ===
XCS inspired the development of a whole new generation of LCS algorithms and applications. In 1995, Congdon was the first to apply LCS to real-world [[Epidemiology|epidemiological]] investigations of disease,<ref name=":8" /> followed closely by Holmes who developed '''BOOLE++''',<ref>{{Cite journal|last=Holmes|first=John H.|date=1996-01-01|title=A Genetics-Based Machine Learning Approach to Knowledge Discovery in Clinical Data|journal=Proceedings of the AMIA Annual Fall Symposium|pages=883|issn=1091-8280|pmc=2233061}}</ref> '''EpiCS''',<ref>Holmes, John H. "[https://web.archive.org/web/20180820234915/https://pdfs.semanticscholar.org/71e4/eb6c630dee4b762e74b2970f6dc638a351ab.pdf Discovering Risk of Disease with a Learning Classifier System]." In ''ICGA'', pp. 426-433. 1997.</ref> and later '''EpiXCS'''<ref>Holmes, John H., and Jennifer A. Sager. "[https://link.springer.com/10.1007%2F11527770_60 Rule discovery in epidemiologic surveillance data using EpiXCS: an evolutionary computation approach]." In ''Conference on Artificial Intelligence in Medicine in Europe'', pp. 444-452. Springer Berlin Heidelberg, 2005.</ref> for [[Epidemiology|epidemiological]] classification. These early works inspired later interest in applying LCS algorithms to complex and large-scale [[data mining]] tasks epitomized by [[bioinformatics]] applications. In 1998, Stolzmann introduced '''anticipatory classifier systems (ACS)''', which included rules in the form of 'condition-action-effect', rather than the classic 'condition-action' representation.<ref name=":7" /> ACS was designed to predict the perceptual consequences of an action in all possible situations in an environment. In other words, the system evolves a model that specifies not only what to do in a given situation, but also provides information about what will happen after a specific action is executed.
This family of LCS algorithms is best suited to multi-step problems, planning, speeding up learning, or disambiguating perceptual aliasing (i.e. where the same observation is obtained in distinct states but requires different actions). Butz later pursued this anticipatory family of LCS developing a number of improvements to the original method.<ref>Butz, Martin V. "[https://web.archive.org/web/20180820234943/https://pdfs.semanticscholar.org/3572/7a56fcce7a73ccc43e5bfa19389780e6d436.pdf Biasing exploration in an anticipatory learning classifier system]." In ''International Workshop on Learning Classifier Systems'', pp. 3-22. Springer Berlin Heidelberg, 2001.</ref> In 2002, Wilson introduced '''XCSF''', adding a computed action in order to perform function approximation.<ref>{{Cite journal|last=Wilson|first=Stewart W.|title=Classifiers that approximate functions|journal=Natural Computing|year=2002}}</ref>
== Variants ==
=== Pittsburgh-Style Learning Classifier System ===
Pittsburgh-Style LCSs are characterized by a population of variable-length rule-sets where each rule-set is a potential solution. The genetic algorithm typically operates at the level of an entire rule-set. Pittsburgh-style systems also typically learn offline, in the batch-learning tradition from which they originated.
=== Hybrid systems ===
* Adaptive: They can acclimate to a changing environment in the case of online learning.
* Model free: They make limited assumptions about the environment, or the patterns of association within the data.
** They make no assumptions about the number of predictive vs. non-predictive features in the data.
* Ensemble Learner: No single model is applied to a given instance to universally provide a prediction. Instead, a relevant and often conflicting set of rules contribute a 'vote', which can be interpreted as a fuzzy prediction.
* Stochastic Learner: Non-deterministic learning is advantageous in large-scale or high complexity problems where deterministic or exhaustive learning becomes intractable.
* Implicitly Multi-objective: Rules evolve towards accuracy with implicit and explicit pressures encouraging maximal generality/simplicity. This implicit generalization pressure is unique to LCS. Effectively, more general rules appear more often in match sets. In turn, they have more frequent opportunities to be selected as parents and to pass on their more general genomes to offspring rules.
* Interpretable: In the interest of data mining and knowledge discovery, individual LCS rules are logical and can be made human-interpretable IF:THEN statements. Effective strategies have also been introduced to allow for global knowledge discovery, identifying significant features and patterns of association from the rule population as a whole.<ref name=":11" />
* Flexible application
** Single or multi-step problems
== Disadvantages ==
* Limited Software Availability: There are a limited number of open source, accessible LCS implementations, and even fewer that are designed to be user friendly or accessible to machine learning practitioners.
* Interpretation: While LCS algorithms are certainly more interpretable than some advanced machine learners, users must interpret a set of rules (sometimes a large set of rules) to comprehend the LCS model.
* Theory/Convergence Proofs: There is a relatively small body of theoretical work behind LCS algorithms. This is likely due to their relative algorithmic complexity (applying a number of interacting components) as well as their stochastic nature.
* Overfitting: Like any machine learner, LCS can suffer from [[overfitting]] despite implicit and explicit generalization pressures.
* Game-Play
* Image Classification
* Knowledge
* Medical Diagnosis
* Modeling
== Terminology ==
The name, "Learning Classifier System (LCS)", is a bit misleading since there are many [[machine learning]] algorithms that 'learn to classify' (e.g. [[decision tree]]s, [[artificial neural network]]s), but are not LCSs. The term 'rule-based machine learning ([[Rule-based machine learning|RBML]])' is useful, as it more clearly captures the essential 'rule-based' component of these systems, but it also generalizes to methods that are not considered to be LCSs (e.g. [[association rule learning]], or [[artificial immune system]]s). More general terms such as, 'genetics-based machine learning', and even 'genetic algorithm'<ref name=":8">Congdon, Clare Bates. "A comparison of genetic algorithms and other machine learning systems on a complex classification task from common disease research." PhD diss., The University of Michigan, 1995.</ref> have also been applied to refer to what would be more characteristically defined as a learning classifier system. Due to their similarity to [[genetic algorithm]]s, Pittsburgh-style learning classifier systems are sometimes generically referred to as 'genetic algorithms'. Beyond this, some LCS algorithms, or closely related methods, have been referred to as 'cognitive systems',<ref name=":2" /> 'adaptive agents', '[[production system (computer science)|production system]]s', or generically as a 'classifier system'.<ref>{{Cite journal|
Up until the 2000s, nearly all learning classifier system methods were developed with reinforcement learning problems in mind.
== See also ==
* [[Rule-based machine learning]]
* [[Production system (computer science)|Production system]]
* [[Expert system]]