[[File:Global key-route main paths for a citation network.svg|thumb|'''Main path analysis uncovers the most significant paths, or citation chains, in a citation network.''' '''The figure shows the global key-route main paths (in red) for a sample citation network (based on search path count and at key-route 1).''']]
Main path analysis was first proposed by Hummon and Doreian.<ref name=":0">{{Cite journal|last=Hummon|first=Norman P.|last2=Doreian|first2=Patrick|date=|title=Connectivity in a citation network: The development of DNA theory|url=https://doi.org/10.1016/0378-8733(89)90017-8|journal=Social Networks|volume=11|issue=1|pages=39–63|doi=10.1016/0378-8733(89)90017-8|via=}}</ref> It is a mathematical tool to identify the major paths in a [[citation network]], which is one form of a [[directed acyclic graph]] (DAG). The method begins by measuring the significance of all the links in a citation network through the concept of "traversal count" and then sequentially chains the most significant links into a "main path", which is deemed the most significant historical path in the target citation network. The method is applicable to any human activity that can be organized in the form of a citation network. It is commonly applied to trace the knowledge flow paths or development trajectories of a science or technology field, through bibliographic citations or patent citations.<ref name=":2">{{Cite journal|last=Liu|first=John S.|last2=Lu|first2=Louis Y.Y.|last3=Lu|first3=Wen-Min|last4=Lin|first4=Bruce J.Y.|title=Data envelopment analysis 1978–2010: A citation-based literature survey|url=https://doi.org/10.1016/j.omega.2010.12.006|journal=Omega|volume=41|issue=1|pages=3–15|doi=10.1016/j.omega.2010.12.006}}</ref><ref name="Verspagen 93–115">{{Cite journal|last=Verspagen|first=Bart|date=2007-03-01|title=Mapping technological trajectories as patent citation networks: a study on the history of fuel cell research|url=http://www.worldscientific.com/doi/abs/10.1142/S0219525907000945|journal=Advances in Complex Systems|volume=10|issue=01|pages=93–115|doi=10.1142/S0219525907000945|issn=0219-5259}}</ref><ref name=":3">{{Cite journal|last=Lucio-Arias|first=Diana|last2=Leydesdorff|first2=Loet|date=2008-10-01|title=Main-path analysis and path-dependent transitions in HistCite™-based historiograms|url=http://onlinelibrary.wiley.com/doi/10.1002/asi.20903/abstract|journal=Journal of the American Society for Information Science and Technology|language=en|volume=59|issue=12|pages=1948–1962|doi=10.1002/asi.20903|issn=1532-2890}}</ref> It has also been applied to judicial decisions to trace the evolving changes of legal opinions.<ref name=":4">{{Cite journal|last=Liu|first=John S.|last2=Chen|first2=Hsiao-Hui|last3=Ho|first3=Mei Hsiu-Ching|last4=Li|first4=Yu-Chen|date=2014-12-01|title=Citations with different levels of relevancy: Tracing the main paths of legal opinions|url=http://onlinelibrary.wiley.com/doi/10.1002/asi.23135/abstract|journal=Journal of the Association for Information Science and Technology|language=en|volume=65|issue=12|pages=2479–2488|doi=10.1002/asi.23135|issn=2330-1643}}</ref>
== History ==
== The method ==
Main path analysis operates in two steps. The first step obtains the traversal count of each link in a citation network. Several types of traversal counts are mentioned in the literature. The second step searches for the main paths by chaining the significant links according to the size of their traversal counts. One needs to prepare a citation network before proceeding with main path analysis.
=== Preparing a citation network ===
It is necessary to prepare a [[citation network]] before starting main path analysis. In a citation network, the nodes represent documents such as academic articles, patents, or legal cases. These nodes are connected using citation information. Citation networks are by nature directed because the two nodes at opposite ends of a link are not symmetrical in their roles. As regards direction, this article adopts the convention that the cited node points to the citing node, signifying that knowledge in the cited node flows to the citing node. A citation network is also by nature acyclic, which means that a node can never chain back to itself if one moves along the links following their direction.
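
The preparation step can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the sample records are hypothetical, and the input is assumed to be a list of (citing, cited) pairs. Links are stored from the cited node to the citing node, following the convention above, and acyclicity is checked with Kahn's topological-sort algorithm.

```python
from collections import defaultdict, deque

def build_citation_network(citations):
    """Build successor sets from (citing, cited) pairs.

    Each link runs from the cited node to the citing node.
    """
    successors = defaultdict(set)
    nodes = set()
    for citing, cited in citations:
        successors[cited].add(citing)
        nodes.update((citing, cited))
    return nodes, successors

def is_acyclic(nodes, successors):
    """Return True if no directed cycle exists (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    for tail in successors:
        for head in successors[tail]:
            indegree[head] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for head in successors.get(node, ()):
            indegree[head] -= 1
            if indegree[head] == 0:
                queue.append(head)
    # Every node is visited only when the network has no cycle.
    return visited == len(nodes)

# Hypothetical records: (citing document, cited document).
records = [("D", "B"), ("F", "D"), ("I", "D"), ("I", "F")]
nodes, successors = build_citation_network(records)
print(is_acyclic(nodes, successors))
```

Real citation data sometimes contains cycles, for example when two documents cite each other, so a check of this kind is a reasonable precaution before computing traversal counts.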
Several terms related to a citation network are defined here before proceeding further. Heads are the nodes the direction arrow leads to. Tails are the nodes at the other end of the direction arrow. Sources are the nodes that are cited but cite no others. Sinks cite other nodes but are not cited. Ancestors are the nodes that can be traced back to from a target node. Descendants are the nodes that one can reach from a target node if one moves along the links following their direction.
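
These terms can be illustrated with a short Python sketch. The edge list <code>EDGES</code> is an assumption: it contains only the links named in the worked examples of this article and is not the complete sample network shown in the figures. The function names are likewise hypothetical.

```python
# Hypothetical partial edge list as (tail, head) pairs, cited -> citing.
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "H"),
         ("D", "F"), ("D", "I"), ("F", "H"), ("F", "I"),
         ("H", "K"), ("I", "L"), ("I", "M"), ("M", "N")]

def sources_and_sinks(edges):
    """Sources are tails that are never heads; sinks are the reverse."""
    tails = {t for t, _ in edges}
    heads = {h for _, h in edges}
    return tails - heads, heads - tails

def reachable(start, edges, forward=True):
    """Descendants (forward=True) or ancestors (forward=False) of start."""
    neighbours = {}
    for tail, head in edges:
        a, b = (tail, head) if forward else (head, tail)
        neighbours.setdefault(a, []).append(b)
    seen, stack = set(), [start]
    while stack:
        for nxt in neighbours.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

sources, sinks = sources_and_sinks(EDGES)
ancestors_of_c = reachable("C", EDGES, forward=False)
descendants_of_h = reachable("H", EDGES, forward=True)
```

In this partial network the sources are A and B, the ancestors of C are A and B, and the only descendant of H is K.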
[[File:SPC values for a citation network.png|thumb|Figure 1. SPC values for a sample citation network]]
=== Traversal counts ===
Traversal counts measure the significance of a link. The literature discusses several types of traversal counts, including search path count (SPC), search path link count (SPLC), search path node pair (SPNP), and other variations.<ref name=":5" /> These traversal counts are collectively denoted SPX.
[[File:SPLC values for a citation network.png|thumb|Figure 2. SPLC values for a sample citation network]]
==== Search path count (SPC) ====
A link’s SPC is the number of times the link is traversed if one runs through all possible paths from all the sources to all the sinks. SPC was first proposed by [[Vladimir Batagelj]].<ref>Batagelj, V. (2003). Efficient algorithms for citation network analysis. ''arXiv preprint cs/0309023''.</ref> SPC values for each link in a sample citation network are shown in Figure 1. The SPC value for the link (B, D) is 5 because five paths (B-D-F-H-K, B-D-F-I-L, B-D-F-I-M-N, B-D-I-L, and B-D-I-M-N) traverse it.
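
The definition can be followed literally with a brute-force Python sketch that enumerates every source-to-sink path and counts how often each link is traversed. The edge list is an assumption built only from the links named in this article's examples, so it is a partial network, and the function names are hypothetical. Enumeration is used here for clarity only; it becomes impractical on large networks.

```python
from collections import Counter

# Hypothetical partial edge list as (tail, head) pairs, cited -> citing.
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "H"),
         ("D", "F"), ("D", "I"), ("F", "H"), ("F", "I"),
         ("H", "K"), ("I", "L"), ("I", "M"), ("M", "N")]

def all_paths(node, succ):
    """Yield every path from node to a sink as a list of nodes."""
    if node not in succ:
        yield [node]
        return
    for nxt in succ[node]:
        for rest in all_paths(nxt, succ):
            yield [node] + rest

def search_path_count(edges):
    """Return (link counts, list of all source-to-sink paths)."""
    succ = {}
    for tail, head in edges:
        succ.setdefault(tail, []).append(head)
    heads = {h for _, h in edges}
    sources = sorted(set(succ) - heads)
    counts, paths = Counter(), []
    for source in sources:
        for path in all_paths(source, succ):
            paths.append(path)
            # Count every consecutive node pair on the path as one traversal.
            counts.update(zip(path, path[1:]))
    return counts, paths

spc, paths = search_path_count(EDGES)
through_bd = ["-".join(p) for p in paths if "-".join(p).startswith("B-D")]
print(spc[("B", "D")], through_bd)
```

On this partial network the link (B, D) receives an SPC of 5, and <code>through_bd</code> contains the five paths that start with B-D.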
[[File:SPNP values for a citation network.png|thumb|Figure 3. SPNP values for a sample citation network]]
==== Search path link count (SPLC) ====
A link’s SPLC is the number of times the link is traversed if one runs through all possible paths from all the ancestors of the tail node (including itself) to all the sinks. SPLC was first proposed by Hummon and Doreian.<ref name=":0" /> Figure 2 presents the SPLC values for each link in the same citation network as shown in Figure 1. Six paths traverse the link (D, F), giving it an SPLC value of 6. They are B-D-F-H-K, B-D-F-I-L, B-D-F-I-M-N, D-F-H-K, D-F-I-L, and D-F-I-M-N. Note that every path begins either from B, the only ancestor of D, or from D itself.
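
SPLC can be computed without enumerating paths by multiplying two counts: the number of paths that end at the tail node (starting from the tail itself or any of its ancestors) and the number of paths from the head node to the sinks. The Python sketch below assumes a partial edge list made up only of links named in this article's examples; the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical partial edge list as (tail, head) pairs, cited -> citing.
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "H"),
         ("D", "F"), ("D", "I"), ("F", "H"), ("F", "I"),
         ("H", "K"), ("I", "L"), ("I", "M"), ("M", "N")]

def search_path_link_count(edges):
    succ, pred = {}, {}
    for tail, head in edges:
        succ.setdefault(tail, []).append(head)
        pred.setdefault(head, []).append(tail)

    @lru_cache(maxsize=None)
    def paths_ending_at(node):
        # One path that starts at the node itself, plus every path
        # that arrives through one of its predecessors.
        return 1 + sum(paths_ending_at(p) for p in pred.get(node, []))

    @lru_cache(maxsize=None)
    def paths_to_sinks(node):
        # A sink contributes exactly one (empty) continuation.
        if node not in succ:
            return 1
        return sum(paths_to_sinks(s) for s in succ[node])

    return {(t, h): paths_ending_at(t) * paths_to_sinks(h) for t, h in edges}

splc = search_path_link_count(EDGES)
print(splc[("D", "F")])
```

For the link (D, F), two paths end at D (D alone and B-D) and three paths lead from F to a sink, which gives 2 × 3 = 6.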
==== Search path node pair (SPNP) ====
A link’s SPNP is the number of times the link is traversed if one runs through all possible paths from all the ancestors of the tail node (including itself) to all the descendants of the head node (including itself). SPNP was first proposed by Hummon and Doreian.<ref name=":0" /> The SPNP value of the link (C, H) is 6 because there are six paths that begin from A, B, or C (A and B are C's ancestors) and end at H or K (K is H's descendant). These paths are A-C-H, A-C-H-K, B-C-H, B-C-H-K, C-H, and C-H-K.
[[File:Local main paths SPC.png|thumb|Figure 4. Local main paths in a sample citation network]]
==== Key-route search ====
Key-route search is designed to avoid the problem of missing significant links in both the local and global search. The problem can be seen in the local and global main paths shown above, in which one of the most important links, (H, K), is not included in the main paths. As described in Liu and Lu (2012),<ref name=":1" /> the approach searches main paths from specified links (key-routes) and thus guarantees that those links are included. One can also specify multiple links to obtain multiple main paths. An additional advantage of the key-route approach is that one can control the level of detail of the main paths by varying the number of key-routes: the more key-routes are specified, the more detail is revealed. When the number of key-routes increases to a certain point, the search returns the whole citation network. Figures 6 and 7 show the local key-route and global key-route main paths of the sample citation network. In both, the number of key-routes is set to 1, i.e., the search is based on only the top links. Since there are two top links, (B, D) and (H, K), the resulting main paths include both of them.
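
One possible reading of a local-style key-route search is sketched below in Python. It is an illustration under stated assumptions, not the procedure as published: the function name is hypothetical, the traversal counts are illustrative values for a hypothetical partial network, ties are kept at every step, and "number of key-routes" is interpreted as the number of top distinct traversal-count values. From each key-route, the sketch extends forward from the head and backward from the tail, always following the link or links with the largest count.

```python
def key_route_local_search(weights, num_key_routes=1):
    """Greedy key-route search over {(tail, head): traversal count}."""
    out_links, in_links = {}, {}
    for tail, head in weights:
        out_links.setdefault(tail, []).append((tail, head))
        in_links.setdefault(head, []).append((tail, head))

    # Assumption: key-routes are all links holding one of the top values.
    top = sorted(set(weights.values()), reverse=True)[:num_key_routes]
    key_routes = [link for link, w in weights.items() if w in top]

    def extend(node, links_at, forward):
        # Follow the heaviest link(s) away from node until none remain.
        found, frontier, seen = set(), [node], set()
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            candidates = links_at.get(current, [])
            if not candidates:
                continue
            best = max(weights[link] for link in candidates)
            for link in candidates:
                if weights[link] == best:
                    found.add(link)
                    frontier.append(link[1] if forward else link[0])
        return found

    main_path = set(key_routes)
    for tail, head in key_routes:
        main_path |= extend(head, out_links, forward=True)
        main_path |= extend(tail, in_links, forward=False)
    return main_path

# Illustrative traversal counts for a hypothetical partial network.
WEIGHTS = {("A", "C"): 1, ("B", "C"): 1, ("B", "D"): 5, ("C", "H"): 2,
           ("D", "F"): 3, ("D", "I"): 2, ("F", "H"): 1, ("F", "I"): 2,
           ("H", "K"): 3, ("I", "L"): 2, ("I", "M"): 2, ("M", "N"): 2}

main_path = key_route_local_search(WEIGHTS, num_key_routes=1)
print(sorted(main_path))
```

Because the search starts from the key-route itself, that link is always part of the result. Raising <code>num_key_routes</code> adds further starting links and therefore more detail.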
== The Variants ==
In addition to the key-route search approach, variations of the method include an approach that is aggregative and stochastic<ref>{{Cite journal|last=Yeo|first=Woondong|last2=Kim|first2=Seonho|last3=Lee|first3=Jae-Min|last4=Kang|first4=Jaewoo|date=2014-01-01|title=Aggregative and stochastic model of main path identification: a case study on graphene|url=https://link.springer.com/article/10.1007/s11192-013-1140-3|journal=Scientometrics|language=en|volume=98|issue=1|pages=633–655|doi=10.1007/s11192-013-1140-3|issn=0138-9130}}</ref> and one that considers decay in knowledge diffusion.<ref name=":5">{{Cite journal|last=Liu|first=John S.|last2=Kuan|first2=Chung-Huei|date=2016-02-01|title=A new approach for main path analysis: Decay in knowledge diffusion|url=http://onlinelibrary.wiley.com/doi/10.1002/asi.23384/abstract|journal=Journal of the Association for Information Science and Technology|language=en|volume=67|issue=2|pages=465–476|doi=10.1002/asi.23384|issn=2330-1643}}</ref>
== Applications ==
=== Academic article ===
Academic citation databases such as [[Web of Science]] and [[Scopus]] include comprehensive digitized citation information. This information makes it possible to apply main path analysis to examine the knowledge structure or trace the knowledge flow of any scientific field. Some early applications explore the subjects of centrality-productivity<ref>{{Cite journal|last=Hummon|first=Norman P.|last2=Doreian|first2=Patrick|last3=Freeman|first3=Linton C.|date=2016-08-18|title=Analyzing the Structure of the Centrality-Productivity Literature Created Between 1948 and 1979|url=http://journals.sagepub.com/doi/10.1177/107554709001100405|journal=Knowledge|language=en|volume=11|issue=4|pages=459–480|doi=10.1177/107554709001100405}}</ref> and conflict resolution.<ref>{{Cite journal|last=Carley|first=Kathleen M.|last2=Hummon|first2=Norman P.|last3=Harty|first3=Martha|date=2016-08-17|title=Scientific Influence|url=https://doi.org/10.1177/107554709301400406|journal=Knowledge|language=en|volume=14|issue=4|pages=417–447|doi=10.1177/107554709301400406}}</ref>
More recent applications include fullerenes,<ref name=":3" /> nanotubes,<ref name=":3" /> data envelopment analysis,<ref name=":2" /><ref>{{Cite journal|last=Liu|first=John S.|last2=Lu|first2=Louis Y.Y.|last3=Lu|first3=Wen-Min|title=Research fronts in data envelopment analysis|url=https://doi.org/10.1016/j.omega.2015.04.004|journal=Omega|volume=58|pages=33–45|doi=10.1016/j.omega.2015.04.004}}</ref><ref>{{Cite journal|last=Liu|first=John S.|last2=Lu|first2=Louis Y.Y.|last3=Lu|first3=Wen-Min|last4=Lin|first4=Bruce J.Y.|title=A survey of DEA applications|url=https://doi.org/10.1016/j.omega.2012.11.004|journal=Omega|volume=41|issue=5|pages=893–902|doi=10.1016/j.omega.2012.11.004}}</ref> supply chain management,<ref>{{Cite journal|last=Claudia Colicchia|last2=Fernanda Strozzi|date=2012-06-15|title=Supply chain risk management: a new methodology for a systematic literature review|url=http://www.emeraldinsight.com/doi/abs/10.1108/13598541211246558|journal=Supply Chain Management: An International Journal|volume=17|issue=4|pages=403–418|doi=10.1108/13598541211246558|issn=1359-8546}}</ref> corporate social responsibility,<ref>{{Cite journal|last=Lu|first=Louis Y.Y.|last2=Liu|first2=John S.|date=2014-03-01|title=The Knowledge Diffusion Paths of Corporate Social Responsibility – From 1970 to 2011|url=http://onlinelibrary.wiley.com/doi/10.1002/csr.1309/abstract|journal=Corporate Social Responsibility and Environmental Management|language=en|volume=21|issue=2|pages=113–128|doi=10.1002/csr.1309|issn=1535-3966}}</ref> IT outsourcing,<ref>{{Cite journal|last=Liang|first=Huigang|last2=Wang|first2=Jian-Jun|last3=Xue|first3=Yajiong|last4=Cui|first4=Xiaocong|title=IT outsourcing research from 1992 to 2013: A literature review based on main path analysis|url=http://dx.doi.org/10.1016/j.im.2015.10.001|journal=Information & Management|volume=53|issue=2|pages=227–251|doi=10.1016/j.im.2015.10.001}}</ref> and medical tourism.<ref>{{Cite journal|last=Chuang|first=Thomas C.|last2=Liu|first2=John S.|last3=Lu|first3=Louis Y.Y.|last4=Lee|first4=Yachi|title=The main paths of medical tourism: From transplantation to beautification|url=https://doi.org/10.1016/j.tourman.2014.03.016|journal=Tourism Management|volume=45|pages=49–58|doi=10.1016/j.tourman.2014.03.016}}</ref>
=== Patent ===
It is common practice for patents to reference prior art. For example, each United States patent document includes a "References Cited" section that lists the prior art of the patent. Patent databases such as [[Clarivate Analytics]] and Webpat provide digitized patent citation information. Verspagen (2007)<ref>{{Cite journal|last=Verspagen|first=Bart|date=2007-03-01|title=Mapping technological trajectories as patent citation networks: a study on the history of fuel cell research|url=http://www.worldscientific.com/doi/abs/10.1142/S0219525907000945|journal=Advances in Complex Systems|volume=10|issue=01|pages=93–115|doi=10.1142/S0219525907000945|issn=0219-5259}}</ref> and Mina et al. (2007)<ref>{{Cite journal|last=Mina|first=A.|last2=Ramlogan|first2=R.|last3=Tampubolon|first3=G.|last4=Metcalfe|first4=J.S.|title=Mapping evolutionary trajectories: Applications to the growth and transformation of medical knowledge|url=https://doi.org/10.1016/j.respol.2006.12.007|journal=Research Policy|volume=36|issue=5|pages=789–806|doi=10.1016/j.respol.2006.12.007}}</ref> are two early works that apply main path analysis to patent data.
=== Judicial document ===
After traversal counts are computed, the following command sequences find the main paths.
For local main paths:
''<small>Network → Acyclic Network → Create (Sub)Network → Main Paths → Local Search → Forward</small>''
== External links ==
* [http://mrvar.fdv.uni-lj.si/pajek/ Pajek], a free social network analysis software.
__FORCETOC__