Learning classifier system: Difference between revisions

Content deleted Content added
GenEars (talk | contribs)
m Terminology: Link to Rule-based_machine_learning
m replaced: time consuming → time-consuming
Line 22:
 
==== Matching ====
One of the most critical and often time -consuming elements of an LCS is the matching process. The first step in an LCS learning cycle takes a single training instance from the environment and passes it to [P] where matching takes place. In step two, every rule in [P] is now compared to the training instance to see which rules match (i.e. are contextually relevant to the current instance). In step three, any matching rules are moved to a ''match set'' [M]. A rule matches a training instance if all feature values specified in the rule condition are equivalent to the corresponding feature value in the training instance. For example, assuming the training instance is (001001 ~ 0), these rules would match: (###0## ~ 0), (00###1 ~ 0), (#01001 ~ 1), but these rules would not (1##### ~ 0), (000##1 ~ 0), (#0#1#0 ~ 1). Notice that in matching, the endpoint/action specified by the rule is not taken into consideration. As a result, the match set may contain classifiers that propose conflicting actions. In the fourth step, since we are performing supervised learning, [M] is divided into a correct set [C] and an incorrect set [I]. A matching rule goes into the correct set if it proposes the correct action (based on the known action of the training instance), otherwise it goes into [I]. In reinforcement learning LCS, an action set [A] would be formed here instead, since the correct action is not known.
 
==== Covering ====
Line 78:
Interest in learning classifier systems was reinvigorated in the mid 1990s largely due to two events; the development of the [[Q-learning|Q-Learning]] algorithm<ref>Watkins, Christopher John Cornish Hellaby. "Learning from delayed rewards." PhD diss., University of Cambridge, 1989.</ref> for [[reinforcement learning]], and the introduction of significantly simplified Michigan-style LCS architectures by Stewart Wilson.<ref name=":10">{{Cite journal|last=Wilson|first=Stewart W.|date=1995-06-01|title=Classifier Fitness Based on Accuracy|journal=Evol. Comput.|volume=3|issue=2|pages=149–175|doi=10.1162/evco.1995.3.2.149|issn=1063-6560|citeseerx=10.1.1.363.2210}}</ref><ref name=":6">{{Cite journal|last=Wilson|first=Stewart W.|date=1994-03-01|title=ZCS: A Zeroth Level Classifier System|journal=Evolutionary Computation|volume=2|issue=1|pages=1–18|doi=10.1162/evco.1994.2.1.1|issn=1063-6560|citeseerx=10.1.1.363.798}}</ref> Wilson's '''Zeroth-level Classifier System (ZCS)'''<ref name=":6" /> focused on increasing algorithmic understandability based on Hollands standard LCS implementation.<ref name=":4" /> This was done, in part, by removing rule-bidding and the internal message list, essential to the original BBA credit assignment, and replacing it with a hybrid BBA/[[Q-learning|Q-Learning]] strategy. ZCS demonstrated that a much simpler LCS architecture could perform as well as the original, more complex implementations. However, ZCS still suffered from performance drawbacks including the proliferation of over-general classifiers.
 
In 1995, Wilson published his landmark paper, "Classifier fitness based on accuracy" in which he introduced the classifier system '''XCS'''.<ref name=":10" /> XCS took the simplified architecture of ZCS and added an accuracy-based fitness, a niche GA (acting in the action set [A]), an explicit generalization mechanism called ''subsumption'', and an adaptation of the [[Q-learning|Q-Learning]] credit assignment. XCS was popularized by its ability to reach optimal performance while evolving accurate and maximally general classifiers as well as its impressive problem flexibility (able to perform both [[reinforcement learning]] and [[supervised learning]]) . XCS later became the best known and most studied LCS algorithm and defined a new family of ''accuracy-based LCS''. ZCS alternatively became synonymous with ''strength-based LCS''. XCS is also important, because it successfully bridged the gap between LCS and the field of [[reinforcement learning]]. Following the success of XCS, LCS were later described as reinforcement learning systems endowed with a generalization capability.<ref>{{Cite journal|last=Lanzi|first=P. L.|title=Learning classifier systems from a reinforcement learning perspective|journal=Soft Computing|language=en|volume=6|issue=3–4|pages=162–170|doi=10.1007/s005000100113|issn=1432-7643|year=2002}}</ref> [[Reinforcement learning]] typically seeks to learn a value function that maps out a complete representation of the state/action space. Similarly, the design of XCS drives it to form an all-inclusive and accurate representation of the problem space (i.e. a ''complete map'') rather than focusing on high payoff niches in the environment (as was the case with strength-based LCS). Conceptually, complete maps don't only capture what you should do, or what is correct, but also what you shouldn't do, or what's incorrect. Differently, most strength-based LCSs, or exclusively supervised learning LCSs seek a rule set of efficient generalizations in the form of a ''best action map'' (or a ''partial map''). Comparisons between strength vs. accuracy-based fitness and complete vs. best action maps have since been examined in greater detail.<ref>Kovacs, Timothy Michael Douglas. ''A Comparison of Strength and Accuracy-based Fitness in Learning and Classifier Systems''. 2002.</ref><ref>[https://link.springer.com/chapter/10.1007/3-540-48104-4_6 Kovacs, Tim. "Two views of classifier systems." In ''International Workshop on Learning Classifier Systems'', pp. 74-87. Springer Berlin Heidelberg, 2001]</ref>
 
=== In the wake of XCS ===