Alpha algorithm: Difference between revisions

Content deleted Content added
Ɯ (talk | contribs)
No edit summary
Citation bot (talk | contribs)
Added bibcode. Removed URL that duplicated identifier. Removed parameters. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 109/1032
 
(54 intermediate revisions by 37 users not shown)
Line 1:
The '''α-algorithm''' or '''α-miner''' is an algorithm used in [[process mining]], aimed at reconstructing causality from a set of [[sequence of events|sequences of events]].
{{Userspace draft|source=ArticleWizard|date=April 2010}}
It was first put forward by [[Wil van der Aalst|van der Aalst]], Weijters and Măruşter.<ref>van der Aalst, W M P and Weijters, A J M M and Maruster, L (2004). "Workflow Mining: Discovering process models from event logs", ''IEEE Transactions on Knowledge and Data Engineering'', vol 16</ref> The goal of Alpha miner is to convert the event log into a workflow-net based on the relations between various activities in the event log. An event log is a multi-set of traces, and a trace is a sequence of activity names. Several extensions or modifications of it have since been presented, which will be listed below.
{{DISPLAYTITLE:α-algorithm}}
{{stub}}
 
Alpha miner was the first [[Business process discovery|process discovery]] algorithm ever proposed, and it gives a good overview of the aim of process discovery and how various activities within the process are executed. Alpha miner was also the basis for the development of many other process mining techniques such as [https://www.researchgate.net/publication/229124308_Process_Mining_with_the_Heuristics_Miner-algorithm heuristic miner], [https://www.researchgate.net/publication/220783863_Genetic_Process_Mining genetic mining] was developed based on the idea alpha miner is built on.<ref name=":02">{{Cite journal|last1=van der Aalst|first1=W.|last2=Weijters|first2=T.|last3=Maruster|first3=L.|date=September 2004|title=Workflow mining: discovering process models from event logs|journal=IEEE Transactions on Knowledge and Data Engineering|volume=16|issue=9|pages=1128–1142|doi=10.1109/TKDE.2004.47|bibcode=2004IDSO...16.1128V |s2cid=5282914 |issn=1558-2191}}</ref>
The '''α-algorithm''' is an algorithm used in [[process mining]].
It was first put forward by [[Wil van der Aalst|van der Aalst]], Weijter and Maruster <ref>van der Aalst, W M P and Weijter, A J M M and Maruster, L (2003). "Workflow Mining: Discovering process models from event logs", ''IEEE Transactions on Knowledge and Data Engineering'', vol 16</ref>. Several extensions or modifications of it have since been presented, which will be listed below.
 
==Short description==
It constructs [[petri nets|P/T nets]] with special properties ([[workflow nets]]) from event logs (as might be collected by an [[ERP]] system). Each transition in the net corresponds to an observed task.
The algorithm takes a workflow log <math>W\subseteq T^{*}</math> as input and results in a workflow net being constructed.
 
== Short description ==
The algorithm takes a workflow log <math>W\subseteq T^{*}</math> as input and results in a [[workflow net]] being constructed.
 
It does so by examining causal relationships observed between tasks. For example, one specific task might always precede another specific task in every execution trace, which would be useful information.
 
===Definitions used===
It can be shown <ref>van der Aalst et al. 2003</ref> that in the case of a complete workflow log generated by a [[sound SWF net]], the net generating it can be reconstructed.
 
=== Definitions used ===
* A '''workflow trace''' or '''execution trace''' is a [[String (computer science)|string]] over an [[alphabet (computer science)|alphabet]] <math>T</math> of ''tasks''.
* A '''workflow log''' is a set of workflow traces.
* A '''complete workflow log''' of a workflow net <math>N</math> is the set of firing sequences of <math>(N,m_i)</math> resulting in the end marking <math>m_o</math>.
<!--- need definitions of SWF net etc. --->
 
== DescriptionEvent log ==
Declaratively, the algorithm can be presented as follows.
Three sets of tasks are determined:
* <math>T_W</math> is the set of all tasks which occur in at least one trace
* <math>T_I</math> is the set of all tasks which occur trace-initially
* <math>T_O</math> is the set of all tasks which occur trace-terminally
 
Event log is the primary requirement for applying any process discovery algorithm. An event log consists of a unique identifier for a case, activity name describing the action occurring in the process and timestamp. An event log can be represented as a multi-set of activities. For the sake of simplicity the following example would use alphabetic letter to represent an activity. Consider an example event log shown in the following figure:
Basic ordering relations are determined (<math>\succ_{W}</math> first, the latter three can be constructed therefrom)
{| class="wikitable"
* <math>a \succ_W b</math> iff b directly precedes a in some trace
|+Sample event log
* <math>a\rightarrow_W b</math> iff <math>a\succ_Wb \wedge b\not\succ_Wa</math>
!Case id
* <math>a\#{}_Wb</math> iff <math>a\not\succ_Wb \wedge b\not\succ_Wa</math>
!Activity
* <math>a\Vert_Wb</math> iff <math>a\succ_Wb \wedge b\succ_Wa</math>
!Timestamp
|-
|1
|A
|2019/10/09 14:50:17.000
|-
|1
|B
|2019/10/09 15:50:17.000
|-
|1
|C
|2019/10/09 15:56:17.000
|-
|1
|D
|2019/10/10 13:50:17.000
|-
|2
|A
|2019/10/10 14:30:17.000
|-
|2
|C
|2019/10/10 14:50:14.000
|-
|2
|B
|2019/10/11 09:50:17.000
|-
|2
|D
|2019/10/11 10:50:17.000
|-
|3
|A
|2019/10/11 12:50:17.000
|-
|3
|E
|2019/10/21 14:50:17.000
|-
|3
|D
|2019/10/21 15:50:17.000
|}
An event log is a multi set of traces, and a trace is a sequence of activities. Thus, an event log such as above can be represented using the following notation:
 
<math>L1 = [<A,B,C,D>, <A,C,B,D>, <A,E,D>]</math>
Places are discovered. Each place is identified with a pair of ''sets of'' tasks, in order to keep the number of places low.
<!--- as would be the result of merging --->
* <math>Y_W</math> is the set of all maximal pairs <math>(A,B)</math> of sets of tasks such that
* Neither <math>A \times A</math> and <math>B \times B</math> contain any members of <math>\succ_W</math> and
* <math>A \times B</math> is a subset of <math>\rightarrow_W</math>
* <math>P_W</math> contains one place <math>p_{(A,B)}</math> for every member of <math>Y_W</math>,
plus the input place <math>i_W</math> and the output place <math>o_W</math>
 
Every event log can be boiled down into a multi-set of traces, and such traces can be further used to break down relationships between various activities in the process. According to the rules of alpha miner, activities belonging to various cases can have 4 types of relationships between them:<ref name=":02"/>
The flow relation <math>F_W</math> is the union of the following:
* <math>\{(a,p_{(A,B)}) | (A,B) \in Y_W \wedge a \in A\}</math>
* <math>\{(p_{(A,B)},v) | (A,B) \in Y_W \wedge b \in B\}</math>
* <math>\{(i_W,t) | t\in T_I\}</math>
* <math>\{(t,i_O) | t\in T_O\}</math>
 
* '''Direct Succession: x > y''' if and only if some relation x is directly following by y. In our example, we can consider that '''A > B''', '''A > E''', '''A > C.'''
The result is
* '''Causality: x → y''' iff x > y and not y > x. In our example, we can consider that '''A → E.'''
* a petri net structure <math>\alpha(W) = (P_W,T_W,F_W)</math>
* '''Parallel: x || y''' iff x > y and y > x. In our example, we have '''B || C.'''
* with one input place <math>i_W</math> and one output place <math>o_W</math>
* '''Choice: x # y''' iff not(x > y) and not(y > x). In our example, we have '''A # D.'''
 
== Patterns ==
<!--- show: which is indeed always a workflow net because there is always a <math>F_W</math>-path from <math>i_W</math> through every transition to <math>o_W</math> --->
[[File:Sequence_pattern_for_petri_nets.png|alt=Sequence Pattern|349x349px]] '''Sequence Pattern: A → B'''
 
[[File:Xor-split_pattern_for_petri_nets.png|alt=XOR-split Pattern|321x321px]] '''XOR-split Pattern: A → B, A → C,''' and '''B # C'''
<!--- consequence: short loops never coour. --->
 
[[File:And-split_pattern_for_petri_nets.png|alt=AND-split Pattern|321x321px]] '''AND-split Pattern: A → B, A → C,''' and '''B || C'''
== Limitations ==
Workflow nets may contain several types of constructs <ref>A. de Medeiros, A K and van der Aalst, W M P and Weijters, A J M M (2003). "Workflow Mining: Current Status and Future Directions". in: "volume 2888 of Lecture Notes in Computer Science", Springer-Verlag</ref> which the α-algorithm cannot rediscover.
 
==Description==
{{section stub}}
The alpha miner starts with converting an event log into directly-follows, sequence, parallel, and choice relations, and using them to create a petri net describing the process model. Initially the algorithm constructs a footprint matrix. Using the footprint matrix and the above shown pattern, one can construct a process model.
Based on the four relations described earlier a footprint based matrix is first discovered. Using the footprint based matrix places are discovered. Each place is identified with a pair of ''sets of'' tasks, in order to keep the number of places low.
{| class="wikitable"
|+Footprint matrix for the log L1
!
!a
!b
!c
!d
!e
|-
|'''a'''
|#
|'''→'''
|'''→'''
|#
|'''→'''
|-
|'''b'''
|←
|#
|<nowiki>||</nowiki>
|'''→'''
|#
|-
|'''c'''
|←
|<nowiki>||</nowiki>
|#
|'''→'''
|#
|-
|'''d'''
|#
|←
|←
|#
|←
|-
|'''e'''
|←
|#
|#
|'''→'''
|#
|}
* <math>Y_W</math> is the set of all pairs <math>(A,B)</math> of maximal sets of tasks such that
** Neither <math>A \times A</math> and <math>B \times B</math> contain any members of '''>''' and
** <math>A \times B</math> is a subset of '''→'''
* <math>P_W</math> contains one place <math>p_{(A,B)}</math> for every member of <math>Y_W</math>, plus the input place <math>i_W</math> and the output place <math>o_W</math>
 
The flow relation <math>F_W</math> is the union of the following:
Constructing <math>Y_W</math> takes exponential time in the number of tasks, since <math>\succ_W</math> is not constrained and arbitrary subsets of <math>T_W</math> must be considered.
* <math>\{(a,p_{(A,B)}) | (A,B) \in Y_W \wedge a \in A\}</math>
<!--- double-check, don't swallow! --->
* <math>\{(p_{(A,B)},b) | (A,B) \in Y_W \wedge b \in B\}</math>
* <math>\{(i_W,t) | t\in T_I\}</math>
* <math>\{(t,i_O) | t\in T_O\}</math>
 
The result is
== Extensions ==
* a [[Petri net]] structure <math>\alpha(W) = (P_W,T_W,F_W)</math>
{{section stub}}
* with one input place <math>i_W</math> and one output place <math>o_W</math>
for example <ref name="extending_the">A. de Medeiros, A K and van Dongen, B F and van der Aalst, W M P and Weijters, A (2004). "Process mining: extending the α-algorithm to mine short loops"</ref>
* because every transition of <math>T_W</math> is on a <math>F_W</math>-path from <math>i_W</math> to <math>o_W</math>, it is indeed a workflow net.
<ref name="wen2007mining">Wen, L and van der Aalst, W M P and Wang, J and Sun, J (2007). "Mining process models with non-free-choice constructs",
For the example given above, the following petri net would be resultant of the application of alpha miner.
"Data Mining and Knowledge Discovery" vol 15, p. 145--180, Springer-Verlag</ref>
 
[[File:Petri_net_for_alpha_miner_example.png|alt=Petri net for alpha miner example|448x448px]]
== References ==
{{Reflist}}
 
==Properties==
<!--- Categories --->
It can be shown <ref>van der Aalst et al. 2003</ref> that in the case of a complete workflow log generated by a [[sound SWF net]], the net generating it can be reconstructed. Complete means that its <math>\succ_W</math> relation is maximal. It is ''not'' required that all possible traces be present (which would be countably infinite for a net with a loop).
[[Category:Articles created via the Article Wizard]]
 
[[Category:Algorithms]]
==Limitations==
 
* '''Implicit places:''' Alpha miner cannot distinguish between implicit and required places and thus might result in additional non required places in the discovered petri net.<ref>{{Cite web|title=Discovering Petri Nets from Event Logs|url=https://www.researchgate.net/publication/265764352|access-date=2021-08-31|website=ResearchGate|language=en}}</ref>
* '''Loops:''' Alpha miner cannot discover loops of the length 1 and 2 in the process model.<ref name=":1">{{Cite web|title=Limitations of Alpha miner|url=https://courses.edsa-project.eu/pluginfile.php/281/mod_resource/content/0/17%20Alpha%20Algorithm%20-%20Limitations.pdf|url-status=live|archive-url=https://web.archive.org/web/20210831175934/https://courses.edsa-project.eu/pluginfile.php/281/mod_resource/content/0/17%20Alpha%20Algorithm%20-%20Limitations.pdf |archive-date=2021-08-31 }}</ref>
* Local dependencies are often missed in alpha miner.<ref name=":1" />
* '''Representational bias:''' Alpha miner can only discover petri net thus adding representational bias such as requirement on unique visible labels for every transition.<ref name=":1" />
 
==References==
{{Reflist}}
 
<!--- [[de:Alpha-Algorithmus]] --->
<!--- [[fr:Algorithme alpha]] --->
 
{{DEFAULTSORT:Alpha Algorithm}}
[[Category:Process mining]]
[[Category:Data mining algorithms]]