Bioinformatics Open Source Conference
 
== Conference Highlights ==
=== BOSC 2024 ===
 
BOSC 2024 was held as part of the [https://www.iscb.org/ismb2024/home Intelligent Systems for Molecular Biology (ISMB) 2024 conference] in '''Montreal, Canada'''. The 2024 event also marked the conference's 25th anniversary.
 
The conference was held in a hybrid format, with around 200 people attending in person and many others following the presentations online.
 
The conference covered a wide variety of topics, with the main theme focusing on approaches to using '''Artificial Intelligence''' and '''Machine Learning''' in '''Bioinformatics'''.
 
==== Event Highlights ====
 
The conference featured two keynote speakers.
 
The first, Dr. Mélanie Courtot, gave a presentation titled '''"The Data Shows We Need Better Data"''' on day one of the conference. She discussed resources for obtaining free, high-quality data and open-source software for conducting research, and introduced the '''TRUE principles''' for preparing data for use with AI tools. '''TRUE''' is an acronym for '''Tracked''', '''Reasonable''', '''Understandable''', and '''Ethical'''.
 
Dr. Courtot explained that '''Tracked''' data for AI is data whose origin is known: there should be evidence supporting the claims made about the data, and the authors who released it should be properly credited. The final part of this principle is that the data should be computationally manageable.
 
The '''Reasonable''' principle states that data should be organized logically, so that new inferences and conclusions can be drawn from it.
 
The '''Understandable''' principle requires that data can be processed by open-source AI models. Examples she cited included [https://ai.meta.com/llama/ LLaMA] and [https://mistral.ai Mistral].
 
Finally, the '''Ethical''' principle holds that available data should promote diversity, equity, and inclusion while protecting the privacy of the people it may be linked to.
 
The second keynote, on day two, was given by '''Andrew Su''' and titled '''"Open Data, Knowledge Graphs, and Large Language Models"'''. He discussed how large language models (LLMs), although useful for retrieving data and answering specific questions, are not always accurate, so the responses they generate still need to be verified.
 
One solution he presented was [[Retrieval-augmented generation|Retrieval-Augmented Generation (RAG)]], which he described as a way to improve the accuracy of LLM answers by grounding them in information retrieved from a well-organized knowledge source.
 
He also discussed tools for testing the accuracy of LLM-generated answers and rating their quality.
 
==== Timeline ====
 
===== Day 1 =====
 
Besides the keynotes, a total of 36 talks and 23 posters were selected for presentation at the conference. One of the sessions on day one was '''Data Analysis''', whose presentations covered open-source approaches to analyzing biomedical data, types of data that are freely available for use, and research conducted with these open-source tools and data. Presentations in this session included:
* '''"Gemma: Curation, Re-analysis and Dissemination of 18,000 Gene Expression Studies"''' by Paul Pavlidis
**[https://www.youtube.com/watch?v=vpqd5nt5Juc Recorded presentation]
* '''"ROC Picker: Propagating Statistical and Systematic Uncertainties in Biological Analysis"''' by Jeffery Roskes
**[https://www.youtube.com/watch?v=8tq_aBc1nh4 Recorded presentation]
* '''"Antimicrobial Resistance Prediction of Non-Tuberculosis Mycobacteria from Whole Genome Sequence Data"''' by Idowu Olawoye
**[https://www.youtube.com/watch?v=ya723kllWfo Recorded presentation]
 
The next session of day one, the '''Open Data Session''', included presentations about databases, data portals, and platforms used by researchers around the world. Presentations in this session included:
* '''"Creating an Open-source Data Platform"''' by Mitchell Shiell
**[https://www.youtube.com/watch?v=vv_gT6cwJPM Recorded presentation]
* '''"Going Viral: The Development of the VirusSeq Data Portal"''' by Justin Richardsson
**[https://www.youtube.com/watch?v=Qb9Kn75kkgA Recorded presentation]
* '''"intermine.bio2rdf.org: A QLever SPARQL Endpoint for InterMine Databases"''' by Francois Belleau
**[https://www.youtube.com/watch?v=YuCdruCgX_Y Recorded presentation]
 
The next session, '''Visualization''', featured new integrations between visualization tools, such as genome browsers, and existing databases. Presentations in this session included:
* '''"Connecting Integrated Genome Browser to a Huge Genome Database Using Its Own API Solves One Problem and Creates Another"''' by Ann Loraine
**[https://www.youtube.com/watch?v=xT330tEGvJ8 Recorded presentation]
* '''"Collaborating Our Way to Optimal Integration Between Tripal 4 and JBrowse 2"''' by Carolyn T. Caron
**[https://www.youtube.com/watch?v=UDYQ6FlazZo Recorded presentation]
* '''"An Integrated Environment for Browsing 3-D Protein Structures and Multiple Sequence Alignment in JBrowse 2"''' by Colin Diesh
**[https://www.youtube.com/watch?v=EQmUowU6Y8A Recorded presentation]
 
The last session of day one, '''Developer Tools and Libraries''', showcased open-source tools for analyzing data. Presentations in this session included:
* '''"Codefair: Make Biomedical Research FAIR Without Breaking a Sweat"''' by Bhavesh Patel
**[https://www.youtube.com/watch?v=8OBm0SsJw7s Recorded presentation]
* '''"An Open-source Ecosystem for Scalable and Computationally Efficient Nanopore Data Processing"''' by Avishai Weissberg
**[https://www.youtube.com/watch?v=VaSctMRQYxQ Recorded presentation]
* '''"Tattaki: Enhancing the Robustness of Bioinformatics Workflows with Simple, Tolerant File Format Detection"''' by Masaki Fuki
**[https://www.youtube.com/watch?v=7GGluYq7qD8 Recorded presentation]
 
===== Day 2 =====
The first session of day two was '''Standards and Frameworks for Open Science''', which focused on creating consistent, reusable, and long-lasting software. Presentations in this session included:
 
*'''"Enhancing Reproducibility in Immunogenetics: Leveraging Containerization Technology for Bioinformatics Workflows"''' by Rayo Suseno
**[https://www.youtube.com/watch?v=5k_32AYe-iw Recorded presentation]
*'''"Breaking the silo: composable bioinformatics through cross-disciplinary open standards"''' by Nezar Abdennur
**[https://www.youtube.com/watch?v=mzkE-O8Jrq0 Recorded presentation]
*'''"For long-term sustainable software in bioinformatics: a manifesto"''' by Luis Pedro Coelho
**[https://www.youtube.com/watch?v=u9h83qnCEsI Recorded presentation]
 
The next session, '''Open Approaches to AI/ML''', was about using machine learning to solve biological problems. Presentations in this session included:
*'''"Gene Set Summarization Using Large Language Models"''' by Marcin Joachimiak
**[https://www.youtube.com/watch?v=hRWbBkKqjakf Recorded presentation]
*'''"FAIR, modular and reproducible image-based ML workflows for biologists: a template and case study from imageomics"''' by Hilmar Lapp
**[https://www.youtube.com/watch?v=PUusHdapEss Recorded presentation]
*'''"Trust and Transparency in Reporting Machine Learning: The DOME-GigaScience Press Trial"''' by Chris Armit
**[https://www.youtube.com/watch?v=XDh9N4c68pA Recorded presentation]
 
===== Open Panel Discussion =====
 
The events of day two concluded with an open panel discussion titled '''“Open Source AI/ML: A Game Changer for Bioinformatics?”'''.
The panelists were Lawrence Hunter, Thomas Hervé Mboa Nkoudou, Mélanie Courtot, and Andrew Su, and the panel was moderated by Monica Munoz-Torres. The discussion centered on the potential gains and pitfalls of using AI and ML methods in bioinformatics research.
 
After each panelist had explained their position, the discussion was opened to the audience. By the end of a long discussion, the panelists' stances were split: half considered the use of AI and ML to have been important and beneficial for bioinformatics, while the other half remained wary of its potential harms.
<ref>{{cite web |title=BOSC 2024 |url=https://www.open-bio.org/events/bosc-2024/ |website=Open Bioinformatics Foundation |access-date=April 21, 2025}}</ref>
<ref>{{cite web |last=Courtot |first=Mélanie |title=BOSC keynote |url=https://courtotlab.genomeinformatics.org/2024/07/15/BOSC-keynote.html |website=Courtot Lab Genome Informatics |date=July 15, 2024 |access-date=April 21, 2025}}</ref>
<ref>{{cite web |title=BOSC 2024 Schedule |url=https://www.open-bio.org/events/bosc-2024/bosc-2024-schedule/ |website=Open Bioinformatics Foundation |access-date=April 21, 2025}}</ref>
<ref>{{cite journal |last=Harris |first=Nomi L. |last2=Hokamp |first2=Karsten |last3=Maia |first3=Jessica |last4=Ménager |first4=Hervé |last5=Munoz-Torres |first5=Monica C. |last6=Sawant |first6=Swapnil |last7=Unni |first7=Deepak |last8=Williams |first8=Jason |title=25 Years of BOSC, the Bioinformatics Open Source Conference [version 1; peer review: not peer reviewed] |journal=F1000Research |date=September 27, 2024 |volume=13 |pages=1100 |doi=10.12688/f1000research.156426.1 |url=https://f1000research.com/articles/13-1100 |access-date=April 21, 2025}}</ref>
 
=== BOSC 2023 ===