Transcriptomics technologies: Difference between revisions

Dd659 (talk | contribs)
Principles and advances: Removed "linear" as this is often incorrect. Modified existing vague statement (cellular structures).
Line 178:
 
=== Image processing ===
[[File:Microarray and sequencing flow cell.svg|thumb|300x300px|''Microarray and sequencing flow cell''. Microarrays and RNA-seq rely on image analysis in different ways. In a microarray chip, each spot is a defined oligonucleotide probe, and fluorescence intensity directly detects the abundance of a specific sequence (Affymetrix). In a high-throughput sequencing flow cell, spots are sequenced one nucleotide at a time, with the colour at each round indicating the next nucleotide in the sequence (Illumina HiSeq). Other variations of these techniques use more or fewer colour channels.<ref name="Lowe_2017" /><ref>{{Cite journal|last1=Petrov|first1=Anton|last2=Shams|first2=Soheil | name-list-style = vanc |date=2004-11-01|title=Microarray Image Processing and Quality Control |journal=Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology |volume=38|issue=3|pages=211–226|doi=10.1023/B:VLSI.0000042488.08307.ad |s2cid=31598448}}</ref>]]
Microarray [[image processing]] must correctly identify the [[regular grid]] of features within an image and independently quantify the fluorescence [[Luminous intensity|intensity]] for each feature. [[Visual artefact|Image artefacts]] must also be identified and excluded from the analysis. Fluorescence intensities directly indicate the abundance of each sequence, since the sequence of each probe on the array is already known.<ref name="PetrovShams2004">{{cite journal | year=2004|title=Microarray Image Processing and Quality Control|journal=The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology|volume=38|issue=3|pages=211–226|doi=10.1023/B:VLSI.0000042488.08307.ad|last1=Petrov|first1=Anton |last2=Shams|first2=Soheil |s2cid=31598448| name-list-style = vanc }}</ref>
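
The grid-and-quantify step can be illustrated with a minimal sketch. It assumes an idealised image in which the grid is axis-aligned, fills the whole image, and has cells of equal size; real scanner software must first locate and possibly rotate the grid, and must flag artefacts, none of which is attempted here. The function names <code>quantify_grid</code> and <code>make_test_image</code> and all pixel values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median


def quantify_grid(image, rows, cols):
    """Return a rows x cols table of background-corrected spot intensities.

    Assumption (illustrative only): the grid is axis-aligned, covers the
    whole image, and every cell has the same size.
    """
    cell_h = len(image) // rows
    cell_w = len(image[0]) // cols
    table = []
    for r in range(rows):
        row_values = []
        for c in range(cols):
            # Cut out the pixels belonging to this grid cell.
            cell = [line[c * cell_w:(c + 1) * cell_w]
                    for line in image[r * cell_h:(r + 1) * cell_h]]
            # Border pixels of the cell estimate the local background.
            border = [cell[y][x]
                      for y in range(cell_h) for x in range(cell_w)
                      if y in (0, cell_h - 1) or x in (0, cell_w - 1)]
            # Interior pixels are treated as the spot itself.
            interior = [cell[y][x]
                        for y in range(1, cell_h - 1)
                        for x in range(1, cell_w - 1)]
            row_values.append(mean(interior) - median(border))
        table.append(row_values)
    return table


def make_test_image(spot_values, cell=4, background=10):
    """Build a synthetic image: one cell per spot with a bright interior."""
    rows, cols = len(spot_values), len(spot_values[0])
    image = [[background] * (cols * cell) for _ in range(rows * cell)]
    for r in range(rows):
        for c in range(cols):
            for y in range(1, cell - 1):
                for x in range(1, cell - 1):
                    image[r * cell + y][c * cell + x] = (
                        background + spot_values[r][c])
    return image


# Four synthetic spots on a 2 x 2 grid with a uniform background of 10.
image = make_test_image([[100, 0], [50, 200]])
intensities = quantify_grid(image, rows=2, cols=2)
```

Subtracting a per-cell background estimate, rather than a single global value, is one common design choice because background fluorescence can vary across a slide; the median is used here only because it is less sensitive to a few bright pixels than the mean.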
 
Line 267:
 
==== Quantification ====
[[File:Transcriptomes heatmap example.svg|thumb|upright=1.5|''[[Heatmap]] identification of gene co-expression patterns across different samples.'' Each column contains the measurements for gene expression change for a single sample. Relative gene expression is indicated by colour: high-expression (red), median-expression (white) and low-expression (blue). Genes and samples with similar expression profiles can be automatically grouped (left and top trees). Samples may be different individuals, tissues, environments or health conditions. In this example, expression of gene set 1 is high and expression of gene set 2 is low in samples 1, 2, and 3.<ref name="Lowe_2017" /><ref>{{cite journal | vauthors = Gehlenborg N, O'Donoghue SI, Baliga NS, Goesmann A, Hibbs MA, Kitano H, Kohlbacher O, Neuweger H, Schneider R, Tenenbaum D, Gavin AC | title = Visualization of omics data for systems biology | language = En | journal = Nature Methods | volume = 7 | issue = 3 Suppl | pages = S56–68 | date = March 2010 | pmid = 20195258 | doi = 10.1038/nmeth.1436 | s2cid = 205419270 }}</ref>]]
Quantification of sequence alignments may be performed at the gene, exon, or transcript level.<ref name="Thind">{{cite journal | vauthors = Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B| title = Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology | journal = Briefings in Bioinformatics | volume = 22 | issue = 6 | date = Nov 2021 | pmid = 34329375 | doi = 10.1093/bib/bbab259}}</ref><ref name="#24020486" /> Typical outputs include a table of read counts for each feature supplied to the software; for example, for genes in a [[general feature format]] file. Gene and exon read counts can be calculated relatively easily, for example using HTSeq.<ref name="#25260700">{{cite journal | vauthors = Anders S, Pyl PT, Huber W | title = HTSeq—a Python framework to work with high-throughput sequencing data | journal = Bioinformatics | volume = 31 | issue = 2 | pages = 166–9 | date = January 2015 | pmid = 25260700 | pmc = 4287950 | doi = 10.1093/bioinformatics/btu638 }}</ref> Quantification at the transcript level is more complicated and requires probabilistic methods to estimate transcript isoform abundance from short-read information, for example using the Cufflinks software.<ref name="#20436464" /> Reads that align equally well to multiple locations must be identified and either removed, assigned to one of the possible locations, or assigned to the most probable location.
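
Gene-level counting can be illustrated with a simplified sketch. It is not HTSeq's implementation or API: the gene coordinates, the read alignments, and the category labels (<code>__ambiguous</code>, <code>__multi_mapped</code>, <code>__no_feature</code>) are hypothetical. Of the three options for multi-mapped reads, it takes the simplest one and removes them.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end),
# with 1-based inclusive coordinates.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 450, 900),
    "geneC": ("chr2", 1, 300),
}


def genes_overlapping(chrom, start, end):
    """Return the set of gene names whose interval overlaps the alignment."""
    return {name for name, (g_chrom, g_start, g_end) in GENES.items()
            if g_chrom == chrom and start <= g_end and end >= g_start}


def count_reads(alignments):
    """Count reads per gene from (read_id, chrom, start, end) alignments."""
    # Group alignments by read so that multi-mapped reads can be detected.
    per_read = {}
    for read_id, chrom, start, end in alignments:
        per_read.setdefault(read_id, []).append((chrom, start, end))

    counts = Counter()
    for hits in per_read.values():
        if len(hits) > 1:
            # The read aligns to several locations: discard it.
            counts["__multi_mapped"] += 1
            continue
        genes = genes_overlapping(*hits[0])
        if len(genes) == 1:
            counts[genes.pop()] += 1
        elif genes:
            # The read overlaps more than one gene.
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


alignments = [
    ("r1", "chr1", 120, 170),  # inside geneA only
    ("r2", "chr1", 460, 510),  # overlaps geneA and geneB
    ("r3", "chr2", 50, 100),   # inside geneC only
    ("r4", "chr1", 600, 650),  # aligned twice, so multi-mapped
    ("r4", "chr2", 200, 250),
    ("r5", "chr2", 400, 450),  # overlaps no gene
    ("r6", "chr1", 200, 250),  # inside geneA only
]
counts = count_reads(alignments)
```

The linear scan over all genes is adequate only for a toy annotation; production tools typically use interval indexes and also account for strand and spliced alignments, which are ignored here.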
 
Some quantification methods can circumvent the need for an exact alignment of a read to a reference sequence altogether. The kallisto software method combines pseudoalignment and quantification into a single step that runs two orders of magnitude faster than contemporary methods such as those used by the TopHat/Cufflinks software, with less computational burden.<ref name="#27043002">{{cite journal | vauthors = Bray NL, Pimentel H, Melsted P, Pachter L | title = Near-optimal probabilistic RNA-seq quantification | journal = Nature Biotechnology | volume = 34 | issue = 5 | pages = 525–7 | date = May 2016 | pmid = 27043002 | doi = 10.1038/nbt.3519 | s2cid = 205282743 }}</ref>
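
The idea behind pseudoalignment can be shown with a toy sketch: rather than placing a read at an exact position, the method records only which transcripts are compatible with the read's ''k''-mers. The example below is a simplification under stated assumptions and is not kallisto's implementation; it uses a plain ''k''-mer dictionary, considers the forward strand only, and omits the subsequent abundance estimation. The transcript sequences, reads, and the value of <code>K</code> are hypothetical.

```python
from collections import Counter, defaultdict

K = 5  # toy k-mer length chosen for this example

# Hypothetical transcripts; tx1 and tx2 share their first 12 bases.
TRANSCRIPTS = {
    "tx1": "ACGTACGTTGCAAGCT",
    "tx2": "ACGTACGTTGCATTTA",
    "tx3": "GGGCCCAAATTTGGGC",
}


def kmers(seq, k=K):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(transcripts):
    """Map every k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for kmer in kmers(seq):
            index[kmer].add(name)
    return index


def pseudoalign(read, index):
    """Return the transcripts compatible with all indexed k-mers of a read."""
    compatible = None
    for kmer in kmers(read):
        hits = index.get(kmer)
        if hits is None:
            continue  # k-mer not in the index (e.g. a sequencing error)
        compatible = set(hits) if compatible is None else compatible & hits
    return frozenset(compatible or ())


READS = [
    "ACGTACGTT",  # shared region of tx1 and tx2
    "TTGCAAGCT",  # end of tx1
    "GCATTTA",    # end of tx2
    "CCCAAATTT",  # tx3
    "ACGTACGTT",  # shared region again
    "TTTTTTTT",   # matches nothing
]

index = build_index(TRANSCRIPTS)
# Reads with the same compatible set form one equivalence class.
classes = Counter(pseudoalign(read, index) for read in READS)
```

Reads are grouped by their set of compatible transcripts, so a later estimation step would operate on a handful of class counts rather than on individual reads; this grouping is one reason such approaches can be fast.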
Line 325:
Transcriptomic profiling also provides crucial information on mechanisms of [[drug resistance]]. Analysis of over 1000 isolates of ''[[Plasmodium falciparum]]'', a virulent parasite responsible for malaria in humans,<ref name="Rich et al">{{cite journal | vauthors = Rich SM, Leendertz FH, Xu G, LeBreton M, Djoko CF, Aminake MN, Takang EE, Diffo JL, Pike BL, Rosenthal BM, Formenty P, Boesch C, Ayala FJ, Wolfe ND | title = The origin of malignant malaria | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 106 | issue = 35 | pages = 14902–7 | date = September 2009 | pmid = 19666593 | pmc = 2720412 | doi = 10.1073/pnas.0907740106 | bibcode = 2009PNAS..10614902R | doi-access = free }}</ref> identified that upregulation of the [[unfolded protein response]] and slower progression through the early stages of the asexual intraerythrocytic [[Plasmodium falciparum#Life cycle|developmental cycle]] were associated with [[Artemisinin#Resistance|artemisinin resistance]] in isolates from [[Southeast Asia]].<ref name="#25502316">{{cite journal | vauthors = Mok S, Ashley EA, Ferreira PE, Zhu L, Lin Z, Yeo T, Chotivanich K, Imwong M, Pukrittayakamee S, Dhorda M, Nguon C, Lim P, Amaratunga C, Suon S, Hien TT, Htut Y, Faiz MA, Onyamboko MA, Mayxay M, Newton PN, Tripura R, Woodrow CJ, Miotto O, Kwiatkowski DP, Nosten F, Day NP, Preiser PR, White NJ, Dondorp AM, Fairhurst RM, Bozdech Z | display-authors = 6 | title = Drug resistance. Population transcriptomics of human malaria parasites reveals the mechanism of artemisinin resistance | journal = Science | volume = 347 | issue = 6220 | pages = 431–5 | date = January 2015 | pmid = 25502316 | pmc = 5642863 | doi = 10.1126/science.1260403 | bibcode = 2015Sci...347..431M }}</ref>
 
Transcriptomics is also important for investigating responses in the marine environment.<ref name=":0"> {{Cite journal |last1=Page |first1=Tessa M. |last2=Lawley |first2=Jonathan W. |date=2022 |title=The Next Generation Is Here: A Review of Transcriptomic Approaches in Marine Ecology |journal=Frontiers in Marine Science |volume=9 |doi=10.3389/fmars.2022.757921 |issn=2296-7745|doi-access=free }}</ref> In marine ecology, "[[Stress (biology)|stress]]" and "[[adaptation]]" have been among the most common research topics, especially in relation to anthropogenic stress, such as [[global change]] and [[pollution]].<ref name=":0" /> Most of the studies in this area have been done in [[animal]]s, although [[invertebrate]]s have been underrepresented.<ref name=":0" /> One remaining issue is a deficiency in functional genetic studies, which hampers [[gene annotation]], especially for non-model species, and can lead to vague conclusions on the effects of the responses studied.<ref name=":0" />
 
=== Gene function annotation ===
All transcriptomic techniques have been particularly useful in [[Gene annotation|identifying the functions of genes]] and identifying those responsible for particular phenotypes. Transcriptomics of ''Arabidopsis'' [[ecotype]]s that [[Hyperaccumulator|hyperaccumulate metals]] correlated genes involved in [[Bioinorganic chemistry#Metal ion transport and storage|metal uptake]], tolerance, and [[homeostasis]] with the phenotype.<ref name="#19192189">{{cite journal | vauthors = Verbruggen N, Hermans C, Schat H | title = Molecular mechanisms of metal hyperaccumulation in plants | journal = The New Phytologist | volume = 181 | issue = 4 | pages = 759–76 | date = March 2009 | pmid = 19192189 | doi = 10.1111/j.1469-8137.2008.02748.x | url = https://dipot.ulb.ac.be/dspace/bitstream/2013/58126/3/58126.pdf }}</ref> Integration of RNA-Seq datasets across different tissues has been used to improve annotation of gene functions in commercially important organisms (e.g. [[Cucumis sativus|cucumber]])<ref name="#22047402">{{cite journal | vauthors = Li Z, Zhang Z, Yan P, Huang S, Fei Z, Lin K | title = RNA-Seq improves annotation of protein-coding genes in the cucumber genome | journal = BMC Genomics | volume = 12 | pages = 540 | date = November 2011 | pmid = 22047402 | pmc = 3219749 | doi = 10.1186/1471-2164-12-540 | doi-access = free }}</ref> or threatened species (e.g. [[koala]]).<ref name="#25214207">{{cite journal | vauthors = Hobbs M, Pavasovic A, King AG, Prentis PJ, Eldridge MD, Chen Z, Colgan DJ, Polkinghorne A, Wilkins MR, Flanagan C, Gillett A, Hanger J, Johnson RN, Timms P | title = A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity | journal = BMC Genomics | volume = 15 | pages = 786 | date = September 2014 | issue = 1 | pmid = 25214207 | pmc = 4247155 | doi = 10.1186/1471-2164-15-786 | doi-access = free }}</ref>