Revision as of 14:34, 19 February 2020 edit GenEars (talk \| contribs) 12 edits m →Terminology: Link to Rule-based_machine_learning ← Previous edit		Revision as of 15:11, 22 February 2020 edit undo Chris the speller (talk \| contribs) Autopatrolled, Extended confirmed users, Pending changes reviewers 894,124 edits m replaced: time consuming → time-consuming Tag: AWB Next edit →
Line 22: ==== Matching ==== One of the most critical and often time -consuming elements of an LCS is the matching process. The first step in an LCS learning cycle takes a single training instance from the environment and passes it to [P] where matching takes place. In step two, every rule in [P] is now compared to the training instance to see which rules match (i.e. are contextually relevant to the current instance). In step three, any matching rules are moved to a ''match set'' [M]. A rule matches a training instance if all feature values specified in the rule condition are equivalent to the corresponding feature value in the training instance. For example, assuming the training instance is (001001 ~ 0), these rules would match: (###0## ~ 0), (00###1 ~ 0), (#01001 ~ 1), but these rules would not (1##### ~ 0), (000##1 ~ 0), (#0#1#0 ~ 1). Notice that in matching, the endpoint/action specified by the rule is not taken into consideration. As a result, the match set may contain classifiers that propose conflicting actions. In the fourth step, since we are performing supervised learning, [M] is divided into a correct set [C] and an incorrect set [I]. A matching rule goes into the correct set if it proposes the correct action (based on the known action of the training instance), otherwise it goes into [I]. In reinforcement learning LCS, an action set [A] would be formed here instead, since the correct action is not known. ==== Covering ==== Line 78: Interest in learning classifier systems was reinvigorated in the mid 1990s largely due to two events; the development of the [[Q-learning\|Q-Learning]] algorithm<ref>Watkins, Christopher John Cornish Hellaby. "Learning from delayed rewards." PhD diss., University of Cambridge, 1989.</ref> for [[reinforcement learning]], and the introduction of significantly simplified Michigan-style LCS architectures by Stewart Wilson.<ref name=":10">{{Cite journal\|last=Wilson\|first=Stewart W.\|date=1995-06-01\|title=Classifier Fitness Based on Accuracy\|journal=Evol. Comput.\|volume=3\|issue=2\|pages=149–175\|doi=10.1162/evco.1995.3.2.149\|issn=1063-6560\|citeseerx=10.1.1.363.2210}}</ref><ref name=":6">{{Cite journal\|last=Wilson\|first=Stewart W.\|date=1994-03-01\|title=ZCS: A Zeroth Level Classifier System\|journal=Evolutionary Computation\|volume=2\|issue=1\|pages=1–18\|doi=10.1162/evco.1994.2.1.1\|issn=1063-6560\|citeseerx=10.1.1.363.798}}</ref> Wilson's '''Zeroth-level Classifier System (ZCS)'''<ref name=":6" /> focused on increasing algorithmic understandability based on Hollands standard LCS implementation.<ref name=":4" /> This was done, in part, by removing rule-bidding and the internal message list, essential to the original BBA credit assignment, and replacing it with a hybrid BBA/[[Q-learning\|Q-Learning]] strategy. ZCS demonstrated that a much simpler LCS architecture could perform as well as the original, more complex implementations. However, ZCS still suffered from performance drawbacks including the proliferation of over-general classifiers. In 1995, Wilson published his landmark paper, "Classifier fitness based on accuracy" in which he introduced the classifier system '''XCS'''.<ref name=":10" /> XCS took the simplified architecture of ZCS and added an accuracy-based fitness, a niche GA (acting in the action set [A]), an explicit generalization mechanism called ''subsumption'', and an adaptation of the [[Q-learning\|Q-Learning]] credit assignment. XCS was popularized by its ability to reach optimal performance while evolving accurate and maximally general classifiers as well as its impressive problem flexibility (able to perform both [[reinforcement learning]] and [[supervised learning]]) . XCS later became the best known and most studied LCS algorithm and defined a new family of ''accuracy-based LCS''. ZCS alternatively became synonymous with ''strength-based LCS''. XCS is also important, because it successfully bridged the gap between LCS and the field of [[reinforcement learning]]. Following the success of XCS, LCS were later described as reinforcement learning systems endowed with a generalization capability.<ref>{{Cite journal\|last=Lanzi\|first=P. L.\|title=Learning classifier systems from a reinforcement learning perspective\|journal=Soft Computing\|language=en\|volume=6\|issue=3–4\|pages=162–170\|doi=10.1007/s005000100113\|issn=1432-7643\|year=2002}}</ref> [[Reinforcement learning]] typically seeks to learn a value function that maps out a complete representation of the state/action space. Similarly, the design of XCS drives it to form an all-inclusive and accurate representation of the problem space (i.e. a ''complete map'') rather than focusing on high payoff niches in the environment (as was the case with strength-based LCS). Conceptually, complete maps don't only capture what you should do, or what is correct, but also what you shouldn't do, or what's incorrect. Differently, most strength-based LCSs, or exclusively supervised learning LCSs seek a rule set of efficient generalizations in the form of a ''best action map'' (or a ''partial map''). Comparisons between strength vs. accuracy-based fitness and complete vs. best action maps have since been examined in greater detail.<ref>Kovacs, Timothy Michael Douglas. ''A Comparison of Strength and Accuracy-based Fitness in Learning and Classifier Systems''. 2002.</ref><ref>[https://link.springer.com/chapter/10.1007/3-540-48104-4_6 Kovacs, Tim. "Two views of classifier systems." In ''International Workshop on Learning Classifier Systems'', pp. 74-87. Springer Berlin Heidelberg, 2001]</ref> === In the wake of XCS ===

Learning classifier system: Difference between revisions