Revision as of 06:48, 1 June 2023 by Chompy Ace (no edit summary; tags: Mobile edit, Mobile web edit); revision as of 14:06, 25 July 2023 by 146.203.121.12 (→Data analysis)
== Data analysis ==
Transcriptomics methods are highly parallel and require significant computation to produce meaningful data for both microarray and RNA-Seq experiments.<ref name="Thind">{{cite journal \| vauthors = Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B\| title = Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology \| journal = Briefings in Bioinformatics \| volume = 22 \| issue = 6 \| date = Nov 2021 \| pmid = 34329375 \| doi = 10.1093/bib/bbab259}}</ref><ref name="#25605792" /><ref name="#19910308" /><ref name="#25633503" /><ref>{{Cite book\|title=Bioinformatics and Computational Biology Solutions Using R and Bioconductor\|last=Smyth\|first=G. K.\|date=2005\|publisher=Springer, New York, NY\|isbn=9780387251462\|series=Statistics for Biology and Health\|pages=397–420\|language=en\|doi=10.1007/0-387-29362-0_23\|chapter = Limma: Linear Models for Microarray Data\|citeseerx = 10.1.1.361.8519}}</ref> Microarray data is recorded as [[Image resolution\|high-resolution]] images, requiring [[Feature detection (computer vision)\|feature detection]] and spectral analysis.<ref>{{Cite book\|title=Microarray Technology in Practice.\|last=Steve.\|first=Russell\|date=2008\|publisher=Elsevier\|others=Meadows, Lisa A.\|isbn=9780080919768\|location=Burlington\|oclc=437246554}}</ref> Raw microarray image files are each about 750 MB in size, while the processed intensities are around 60 MB. Multiple short probes matching a single transcript can reveal details about the [[intron]]-[[exon]] structure, requiring statistical models to determine the authenticity of the resulting signal. RNA-Seq studies produce billions of short DNA sequences, which must be aligned to [[reference genome]]s composed of millions to billions of base pairs.
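As a rough illustration of what aligning short sequences to a reference involves, the following is a minimal sketch of a seed-and-verify search over a ''k''-mer index. It is a toy under stated assumptions, not the method used by any production aligner: the function names, the seed length <code>k</code> and the <code>max_mismatches</code> threshold are all hypothetical, and gaps, reverse-complement strands and splicing are ignored.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read, reference, index, k, max_mismatches=2):
    """Return sorted (start, mismatches) pairs for ungapped placements.

    Each k-mer of the read is looked up in the index (the "seed" step);
    every implied start position is then checked base by base (the
    "verify" step) and kept if it has at most max_mismatches mismatches.
    """
    hits = {}
    seen = set()  # candidate starts already verified
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            if start in seen:
                continue
            seen.add(start)
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"  # made-up toy sequence
    index = build_kmer_index(reference, k=4)
    print(align_read("TAGCCGTT", reference, index, k=4))
```

Indexing the reference once and reusing it for every read is the design choice that makes the problem tractable at scale, since each read lookup then avoids scanning the whole reference.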
[[De novo transcriptome assembly\|''De novo'' assembly of reads]] within a dataset requires the construction of highly complex [[sequence graph]]s.<ref name="#23845962" /> RNA-Seq operations are highly repetitious and benefit from [[Parallel computing\|parallelised computation]], but modern algorithms mean that consumer computing hardware is sufficient for simple transcriptomics experiments that do not require ''de novo'' assembly of reads.<ref name="Pertea_2015" /> A human transcriptome can be accurately captured using RNA-Seq with 30 million 100 bp sequences per sample.<ref name="#23961961">{{cite journal \| vauthors = Hart SN, Therneau TM, Zhang Y, Poland GA, Kocher JP \| title = Calculating sample size estimates for RNA sequencing data \| journal = Journal of Computational Biology \| volume = 20 \| issue = 12 \| pages = 970–8 \| date = December 2013 \| pmid = 23961961 \| pmc = 3842884 \| doi = 10.1089/cmb.2012.0283 }}</ref><ref name="#26813401">{{cite journal \| vauthors = Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A \| title = A survey of best practices for RNA-seq data analysis \| journal = Genome Biology \| volume = 17 \| pages = 13 \| date = January 2016 \| pmid = 26813401 \| pmc = 4728800 \| doi = 10.1186/s13059-016-0881-8 }}</ref> This example would require approximately 1.8 gigabytes of disk space per sample when stored in compressed [[FASTQ format]]. Processed count data for each gene would be much smaller, comparable in size to processed microarray intensities.
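The storage figure above can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes a four-line FASTQ record per read, a 50-byte header line and a 4:1 compression ratio; the header length and compression ratio are illustrative assumptions rather than values taken from the cited sources, and real files vary with read naming and quality-score diversity.

```python
def estimate_fastq_bytes(n_reads, read_length, header_bytes=50,
                         compression_ratio=4.0):
    """Estimate uncompressed and compressed FASTQ size in bytes.

    header_bytes and compression_ratio are assumed, illustrative values.
    """
    # One record = header line, sequence line, '+' line, quality line,
    # each terminated by a single newline byte.
    record_bytes = (header_bytes + 1) + (read_length + 1) + 2 + (read_length + 1)
    uncompressed = n_reads * record_bytes
    compressed = uncompressed / compression_ratio
    return uncompressed, compressed


if __name__ == "__main__":
    raw, packed = estimate_fastq_bytes(n_reads=30_000_000, read_length=100)
    print(f"uncompressed: {raw / 1e9:.2f} GB, compressed: {packed / 1e9:.2f} GB")
```

Under these assumed values the estimate comes to about 7.65 GB uncompressed and about 1.9 GB compressed, which is of the same order as the approximately 1.8 gigabytes quoted above.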
Sequence data may be stored in public repositories, such as the [[Sequence Read Archive]] (SRA).<ref name="#22009675">{{cite journal \| vauthors = Kodama Y, Shumway M, Leinonen R \| title = The Sequence Read Archive: explosive growth of sequencing data \| journal = Nucleic Acids Research \| volume = 40 \| issue = Database issue \| pages = D54–6 \| date = January 2012 \| pmid = 22009675 \| pmc = 3245110 \| doi = 10.1093/nar/gkr854 }}</ref> RNA-Seq datasets can be uploaded via the Gene Expression Omnibus.<ref name="#11752295" />

=== Image processing ===

Transcriptomics technologies: Difference between revisions