== Data analysis ==
Transcriptomics methods are highly parallel and require significant computation to produce meaningful data for both microarray and RNA-Seq experiments.<ref name="Thind">{{cite journal | vauthors = Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B| title = Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology | journal = Briefings in Bioinformatics | volume = 22 | issue = 6 | date = Nov 2021 | pmid = 34329375 | doi = 10.1093/bib/bbab259}}</ref><ref name="#25605792" /><ref name="#19910308" /><ref name="#25633503" /><ref>{{Cite book|title=Bioinformatics and Computational Biology Solutions Using R and Bioconductor|last=Smyth|first=G. K.|date=2005|publisher=Springer, New York, NY|isbn=9780387251462|series=Statistics for Biology and Health|pages=397–420|language=en|doi=10.1007/0-387-29362-0_23|chapter = Limma: Linear Models for Microarray Data|citeseerx = 10.1.1.361.8519}}</ref> Microarray data is recorded as [[Image resolution|high-resolution]] images, requiring [[Feature detection (computer vision)|feature detection]] and spectral analysis.<ref>{{Cite book|title=Microarray Technology in Practice|last=Russell|first=Steve|date=2008|publisher=Elsevier|others=Meadows, Lisa A.|isbn=9780080919768|location=Burlington|oclc=437246554}}</ref> Each raw microarray image file is about 750 MB, whereas the processed intensities are around 60 MB. Multiple short probes matching a single transcript can reveal details about the [[intron]]-[[exon]] structure, so statistical models are needed to determine whether the resulting signal is genuine. RNA-Seq studies produce billions of short DNA sequences, which must be aligned to [[reference genome]]s composed of millions to billions of base pairs. 
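To illustrate what aligning short reads to a reference involves, the following is a minimal Python sketch of exact-match alignment using a k-mer index: the reference is indexed once, and each read is looked up by its first k bases and then verified in full. It is a toy under stated assumptions, not how any particular aligner works. The reference and reads are made-up strings, and the function names <code>build_kmer_index</code> and <code>align_exact</code> are hypothetical.

```python
"""Toy illustration of read alignment by k-mer seeding.

Hypothetical example: real aligners use far more sophisticated indexes and
tolerate errors and splicing. The sequences used here are made up.
"""
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def align_exact(read: str, reference: str,
                index: dict[str, list[int]], k: int) -> list[int]:
    """Return reference positions where the whole read matches exactly.

    The read's first k-mer is used as a seed; each candidate position is
    then verified against the full read.
    """
    if len(read) < k:
        return []
    candidates = index.get(read[:k], [])
    return [pos for pos in candidates
            if reference[pos:pos + len(read)] == read]


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATACGTTAGC"  # made-up reference sequence
    k = 4
    index = build_kmer_index(reference, k)
    for read in ["ACGTTAGC", "GCCGATA", "TTTTTTTT"]:
        print(read, align_exact(read, reference, index, k))
```

Because the sketch seeds only on the first k bases and then demands a perfect match, any read containing a sequencing error is left unaligned. Practical RNA-Seq aligners must tolerate mismatches, handle reads that span exon junctions and index genomes billions of bases long.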
[[De novo transcriptome assembly|''De novo'' assembly of reads]] within a dataset requires the construction of highly complex [[sequence graph]]s.<ref name="#23845962" /> RNA-Seq operations are highly repetitive and benefit from [[Parallel computing|parallelised computation]], but modern algorithms make consumer computing hardware sufficient for simple transcriptomics experiments that do not require ''de novo'' assembly of reads.<ref name="Pertea_2015" /> A human transcriptome could be accurately captured using RNA-Seq with 30 million 100 bp sequences per sample.<ref name="#23961961">{{cite journal | vauthors = Hart SN, Therneau TM, Zhang Y, Poland GA, Kocher JP | title = Calculating sample size estimates for RNA sequencing data | journal = Journal of Computational Biology | volume = 20 | issue = 12 | pages = 970–8 | date = December 2013 | pmid = 23961961 | pmc = 3842884 | doi = 10.1089/cmb.2012.0283 }}</ref><ref name="#26813401">{{cite journal | vauthors = Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A | title = A survey of best practices for RNA-seq data analysis | journal = Genome Biology | volume = 17 | pages = 13 | date = January 2016 | pmid = 26813401 | pmc = 4728800 | doi = 10.1186/s13059-016-0881-8 }}</ref> Such a sample would require approximately 1.8 gigabytes of disk space when stored in compressed [[FASTQ format]]. Processed count data for each gene would be much smaller, comparable in size to processed microarray intensities. 
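The 1.8 gigabyte figure can be roughly reproduced with a back-of-envelope calculation. The Python sketch below counts the bytes in one four-line FASTQ record (header, sequence, <code>+</code> separator, quality string) and scales up to 30 million reads. It assumes 40-character read headers, a bare <code>+</code> separator line and a 4:1 gzip compression ratio. These are illustrative assumptions, not values from the cited sources, and real header lengths and compression ratios vary between instruments and datasets. The helper names are hypothetical.

```python
"""Back-of-envelope FASTQ storage estimate for one RNA-Seq sample.

Illustrative assumptions (not measured): 40-character header lines,
a bare '+' separator line and a 4:1 gzip compression ratio.
"""


def fastq_record_bytes(read_length: int, header_length: int = 40) -> int:
    """Bytes for one uncompressed four-line FASTQ record, newlines included."""
    header = header_length + 1   # '@read-id...' plus newline
    sequence = read_length + 1   # one character per base plus newline
    separator = 1 + 1            # bare '+' plus newline
    quality = read_length + 1    # one quality character per base plus newline
    return header + sequence + separator + quality


def fastq_sample_gb(n_reads: int, read_length: int,
                    compression_ratio: float = 4.0,
                    header_length: int = 40) -> tuple[float, float]:
    """Return (uncompressed_gb, compressed_gb) in decimal gigabytes."""
    raw_bytes = n_reads * fastq_record_bytes(read_length, header_length)
    return raw_bytes / 1e9, raw_bytes / compression_ratio / 1e9


if __name__ == "__main__":
    raw_gb, gz_gb = fastq_sample_gb(n_reads=30_000_000, read_length=100)
    print(f"uncompressed: {raw_gb:.2f} GB, compressed: {gz_gb:.2f} GB")
```

Under these assumptions each 100 bp record takes 245 bytes, so 30 million reads occupy about 7.35 GB uncompressed and about 1.84 GB after 4:1 compression, in line with the approximate figure above.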
Sequence data may be stored in public repositories, such as the [[Sequence Read Archive]] (SRA).<ref name="#22009675">{{cite journal | vauthors = Kodama Y, Shumway M, Leinonen R | title = The Sequence Read Archive: explosive growth of sequencing data | journal = Nucleic Acids Research | volume = 40 | issue = Database issue | pages = D54–6 | date = January 2012 | pmid = 22009675 | pmc = 3245110 | doi = 10.1093/nar/gkr854 }}</ref> RNA-Seq datasets can be uploaded via the Gene Expression Omnibus.<ref name="#11752295" />
=== Image processing ===