Bioinformatics workflow management system

{{Redirect|IDBS||IDB (disambiguation){{!}}IDB}}
 
{{Example farm|date=February 2012}}
A '''bioinformatics workflow management system''' is a specialized form of [[workflow management system]] designed to compose and execute a [[workflows|workflow]], that is, a series of computational or data manipulation steps, in the field of [[bioinformatics]].
 
Many workflow systems exist. Some have been developed more generally as [[scientific workflow system]]s for use by scientists from many disciplines, such as [[astronomy]] and [[earth science]]. All such systems are based on an abstract representation of a computation as a directed graph, in which each node represents a task to be executed and each edge represents either data flow or an execution dependency between tasks. Each system typically provides a visual front-end that lets the user build and modify complex applications with little or no programming expertise.<ref>{{Cite journal | last1 = Oinn | first1 = T. | last2 = Greenwood | first2 = M. | last3 = Addis | first3 = M. | last4 = Alpdemir | first4 = M. N. | last5 = Ferris | first5 = J. | last6 = Glover | first6 = K. | last7 = Goble | first7 = C. | authorlink7 = Carole Goble| last8 = Goderis | first8 = A. | last9 = Hull | first9 = D. | doi = 10.1002/cpe.993 | last10 = Marvin | first10 = D. | last11 = Li | first11 = P. | last12 = Lord | first12 = P. | last13 = Pocock | first13 = M. R. | last14 = Senger | first14 = M. | last15 = Stevens | first15 = R. | last16 = Wipat | first16 = A. | last17 = Wroe | first17 = C. | title = Taverna: Lessons in creating a workflow environment for the life sciences | journal = Concurrency and Computation: Practice and Experience | volume = 18 | issue = 10 | pages = 1067–1100 | year = 2006 | pmid = | pmc = | url = https://eprints.soton.ac.uk/260908/1/taverna-ccpe-reviewed.pdf }}</ref><ref>{{Cite journal | last1 = Yu | first1 = J. | last2 = Buyya | first2 = R. | doi = 10.1145/1084805.1084814 | title = A taxonomy of scientific workflow systems for grid computing | journal = ACM SIGMOD Record | volume = 34 | issue = 3 | pages = 44 | year = 2005 | pmid = | pmc = | citeseerx = 10.1.1.63.3176 }}</ref><ref name="CIBEC 2008">{{Cite book | last1 = Curcin | first1 = V. | last2 = Ghanem | first2 = M. | title = Scientific workflow systems - can one size fit all? | doi = 10.1109/CIBEC.2008.4786077 | pages = 1–9 | year = 2008 | pmid = | pmc = | journal=2008 Cairo International Biomedical Engineering Conference| isbn = 978-1-4244-2694-2 }}</ref>
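
The directed-graph model described above can be illustrated with a minimal Python sketch. It does not reproduce the implementation of any system listed in this article; the function <code>run_workflow</code>, the task names, and the toy read data are all hypothetical. Each node is a task, each edge is a dependency, and tasks run in a topological order so that every task sees the results of the tasks it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run every task after all of the tasks it depends on.

    tasks: maps a task name to a function that takes a dict of
        upstream results (task name -> result) and returns a result.
    dependencies: maps a task name to the set of task names it depends on.
    Returns a dict mapping each task name to its result.
    """
    results = {}
    # static_order() yields each node only after all of its predecessors;
    # it raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return results


# Hypothetical three-step workflow: load reads, filter them, count them.
tasks = {
    "load_reads": lambda up: ["ACGT", "ACGG", "TTGA"],
    "filter_reads": lambda up: [r for r in up["load_reads"] if r.startswith("AC")],
    "count_reads": lambda up: len(up["filter_reads"]),
}
dependencies = {
    "load_reads": set(),
    "filter_reads": {"load_reads"},
    "count_reads": {"filter_reads"},
}

results = run_workflow(tasks, dependencies)
print(results["count_reads"])
```

In this sketch the edges carry data flow, because each task receives its upstream results. A real workflow system would typically add scheduling, parallel execution, provenance tracking, and error recovery on top of this ordering step.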
 
==Examples==
In alphabetical order, some examples of bioinformatics workflow management systems include:
* [[Anduril (workflow engine)|Anduril]]: bioinformatics and image analysis<ref>{{Cite web|url=http://www.anduril.org|title=Anduril workflow website}}</ref><ref>{{Cite journal|last=Ovaska|first=Kristian|last2=Laakso|first2=Marko|last3=Haapa-Paananen|first3=Saija|last4=Louhimo|first4=Riku|last5=Chen|first5=Ping|last6=Aittomäki|first6=Viljami|last7=Valo|first7=Erkka|last8=Núñez-Fontarnau|first8=Javier|last9=Rantanen|first9=Ville|date=2010-09-07|title=Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme|journal=Genome Medicine|volume=2|issue=9|pages=65|doi=10.1186/gm186|issn=1756-994X|pmc=3092116|pmid=20822536}}</ref>
* [[BioBIKE]]: a Web-based, programmable, integrated biological knowledge base<ref>{{Cite journal
| last1 = Elhai | first1 = J.
| last2 = Taton | first2 = A.
| last3 = Massar | first3 = J.
| last4 = Myers | first4 = J. K.
| last5 = Travers | first5 = M.
| last6 = Casey | first6 = J.
| last7 = Slupesky | first7 = M.
| last8 = Shrager | first8 = J.
| doi = 10.1093/nar/gkp354
| title = BioBIKE: A Web-based, programmable, integrated biological knowledge base
| journal = Nucleic Acids Research
| volume = 37
| issue = Web Server issue
| pages = W28–W32
| year = 2009
| pmid = 19433511
| pmc =2703918
}}</ref>
* [[CLC bio]]: a bioinformatics analysis and workflow management platform from [[Qiagen|QIAGEN Digital Insights]]
* [https://nus.edu/3ipotMy CSI NGS Portal]: An online platform for automated NGS data analysis and sharing<ref>{{Cite journal
| last = An | first = Omer
| last2 = Tan | first2 = Kar-Tong
| last3 = Li | first3 = Ying
| last4 = Li | first4 = Jia
| last5 = Wu | first5 = Chan-Shuo
| last6 = Zhang | first6 = Bin
| last7 = Chen | first7 = Leilei
| last8 = Yang | first8 = Henry
| url = https://www.mdpi.com/1422-0067/21/11/3828
| doi = 10.3390/ijms21113828
| title = CSI NGS Portal: An Online Platform for Automated NGS Data Analysis and Sharing
| journal = Int. J. Mol. Sci.
| year = 2020
}}</ref>
*[[Cuneiform (programming language)|Cuneiform]]: A functional workflow language for large-scale data analysis<ref>{{Cite journal
| last1 = Brandt | first1 = Jörgen
| last2 = Bux | first2 = Marc N.
| last3 = Leser | first3 = Ulf
| title = Cuneiform: A functional language for large scale scientific data analysis
| journal = Proceedings of the Workshops of the EDBT/ICDT
| volume = 1330
| pages = 17–26
| year = 2015
| url = http://ceur-ws.org/Vol-1330/paper-03.pdf
}}</ref>
* [[Discovery Net]]: one of the earliest examples of a scientific workflow system, later commercialized as InforSense, which was then acquired by IDBS.{{citation needed|date=September 2016}}
*[[Galaxy (computational biology)|Galaxy]]: initially targeted at [[genomics]]<ref>{{Cite journal
| last1 = Goecks | first1 = J.
| last2 = Nekrutenko | first2 = A.
| last3 = Taylor | first3 = J.
| last4 = Galaxy Team | first4 = T.
| title = Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences
| doi = 10.1186/gb-2010-11-8-r86
| journal = Genome Biology
| volume = 11
| issue = 8
| pages = R86
| year = 2010
| pmid = 20738864
| pmc =2945788
}}</ref>
* [[GenePattern]]: a scientific workflow system that provides access to hundreds of genomic analysis tools.<ref>{{Cite journal
| pmid = 16642009
| year = 2006
| author1 = Reich
| first1 = Michael
| title = GenePattern 2.0
| journal = Nature Genetics
| volume = 38
| issue = 5
| pages = 500–501
| doi = 10.1038/ng0506-500
|display-authors=etal}}</ref>
* [[KNIME]]: the Konstanz Information Miner<ref>{{Cite journal | doi = 10.1016/j.compbiolchem.2007.08.009| pmid = 17931570| title = Workflow based framework for life science informatics| journal = Computational Biology and Chemistry| year = 2007| volume=31| issue = 5–6| pages=305–319| last1 = Tiwari| first1 = Abhishek| last2 = Sekhar| first2 = Arvind K.T.}}</ref>
* [[OnlineHPC]]: an online workflow designer based on [[Taverna workbench|Taverna]]{{citation needed|date=September 2016}}
*[[UGENE]] provides a workflow management system that is installed on a local computer<ref>{{Cite journal
| pmid = 22368248
| year = 2012
| author1 = Okonechnikov
| first1 = K
| title = Unipro UGENE: A unified bioinformatics toolkit
| journal = Bioinformatics
| volume = 28
| issue = 8
| pages = 1166–7
| last2 = Golosova
| first2 = O
| last3 = Fursov
| first3 = M
| last4 = Ugene
| first4 = Team
| doi = 10.1093/bioinformatics/bts091
| doi-access = free
}}</ref>
* [[VisTrails]]<ref>{{Cite book | doi = 10.1109/VISUAL.2005.1532788 | title = VisTrails: enabling interactive multiple-view visualizations| year = 2005 | journal=VIS 05. IEEE Visualization, 2005.| pages = 135–142| last1 = Bavoil| first1 = L.| last2 = Callahan| first2 = S.P.| last3 = Crossno| first3 = P.J.| last4 = Freire| first4 = J.| last5 = Scheidegger| first5 = C.E.| last6 = Silva| first6 = C.T.| last7 = Vo| first7 = H.T.| isbn = 978-0-7803-9462-9}}</ref>
 
==Comparisons between workflow systems==
With a large number of bioinformatics workflow systems to choose from,<ref>{{cite web|url=https://s.apache.org/existing-workflow-systems|title=Existing Workflow systems|website=Common Workflow Language wiki|archive-url=https://web.archive.org/web/20191017094453/https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems|archive-date=2019-10-17|url-status=live|access-date=2019-10-17}}</ref> it is difficult to understand and compare their features. Little work has been done to evaluate and compare the systems from a bioinformatician's perspective, particularly with regard to the data types they can handle, the built-in functionality they provide to the user, or their performance and usability. Examples of existing comparisons include:
 
* The paper "Scientific workflow systems - can one size fit all?"<ref name="CIBEC 2008"/> provides a high-level framework for comparing workflow systems based on their control flow and data flow properties. The systems compared include [[Discovery Net]], [[Taverna workbench|Taverna]], Triana, and [[Kepler scientific workflow system|Kepler]], as well as YAWL and [[Business Process Execution Language|BPEL]].
* The paper "Meta-workflows: pattern-based interoperability between Galaxy and Taverna"<ref>
{{Cite book | last1 = Abouelhoda | first1 = M. | last2 = Alaa | first2 = S. | last3 = Ghanem | first3 = M. | doi = 10.1145/1833398.1833400 | chapter = Meta-workflows | title = Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science - Wands '10 | pages = 1 | year = 2010 | isbn = 9781450301886 | pmid = | pmc = }}</ref> provides a more user-oriented comparison between [[Taverna workbench|Taverna]] and [[Galaxy (computational biology)|Galaxy]] in the context of enabling interoperability between the two systems.
 
* The infrastructure paper "Delivering ICT Infrastructure for Biomedical Research"<ref>
{{Citation
| last1 = Nyrönen | first1 = TH
| last2 = Laitinen | first2 = J
| title = Delivering ICT infrastructure for biomedical research
| pages = 37–44
| series = Proceedings of the WICSA/ECSA 2012 Companion Volume (WICSA/ECSA '12)
| year = 2012
| publisher = ACM
| doi = 10.1145/2361999.2362006
|display-authors=etal| isbn = 9781450315685
}}
</ref> compares two workflow systems, [[Anduril (workflow engine)|Anduril]] and Chipster,<ref name=chipster>{{Cite journal
| pmid = 21999641
| pmc = 3215701
| year = 2011
| author1 = Kallio
| first1 = M. A.
| title = Chipster: User-friendly analysis software for microarray and other high-throughput data
| journal = BMC Genomics
| volume = 12
| pages = 507
| last2 = Tuimala
| first2 = J. T.
| last3 = Hupponen
| first3 = T
| last4 = Klemelä
| first4 = P
| last5 = Gentile
| first5 = M
| last6 = Scheinin
| first6 = I
| last7 = Koski
| first7 = M
| last8 = Käki
| first8 = J
| last9 = Korpelainen
| first9 = E. I.
| doi = 10.1186/1471-2164-12-507
}}</ref> in terms of infrastructure requirements in a cloud-delivery model.
 
* The paper "A review of bioinformatic pipeline frameworks"<ref>{{cite journal |last=Leipzig |first=J |date=2016 |title=A review of bioinformatic pipeline frameworks |url=http://bib.oxfordjournals.org/content/early/2016/03/23/bib.bbw020.full |journal=Briefings in Bioinformatics |volume=18 |issue=3 |pages=530–536 |access-date=23 March 2016 |doi=10.1093/bib/bbw020 |pmid=27013646 |pmc=5429012 |name-list-format=vanc }}
</ref> attempts to classify workflow management systems based on three dimensions: "using an implicit or explicit syntax, using a configuration, convention or class-based design paradigm and offering a command line or workbench interface".
 
==References==
{{Reflist}}
 
{{DEFAULTSORT:Bioinformatics workflow management systems}}
[[Category:Bioinformatics]]
[[Category:Bioinformatics software]]
[[Category:Workflow applications]]