Transcriptomics technologies: Difference between revisions

Undid revision 1149303835 by 2001:648:2010:2107:809A:5135:DB63:F888 (talk)
m Cleaned up using AutoEd
Line 12:
Transcriptomics has been characterised by the development of new techniques which have redefined what is possible every decade or so and rendered previous technologies obsolete. The first attempt at capturing a partial human transcriptome was published in 1991 and reported 609 [[Messenger RNA|mRNA]] sequences from the [[human brain]].<ref name="ref2047873"/> In 2008, two human transcriptomes, composed of millions of transcript-derived sequences covering 16,000 genes, were published,<ref name="#18978789">{{cite journal | vauthors = Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ | title = Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing | journal = Nature Genetics | volume = 40 | issue = 12 | pages = 1413–5 | date = December 2008 | pmid = 18978789 | doi = 10.1038/ng.259 | s2cid = 9228930 }}</ref><ref name="#18599741">{{cite journal | vauthors = Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, Schmidt D, O'Keeffe S, Haas S, Vingron M, Lehrach H, Yaspo ML | display-authors = 6 | title = A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome | journal = Science | volume = 321 | issue = 5891 | pages = 956–60 | date = August 2008 | pmid = 18599741 | doi = 10.1126/science.1160342 | bibcode = 2008Sci...321..956S | s2cid = 10013179 }}</ref> and by 2015 transcriptomes had been published for hundreds of individuals.<ref name="#24037378">{{cite journal | vauthors = Lappalainen T, Sammeth M, Friedländer MR, 't Hoen PA, Monlong J, Rivas MA, Gonzàlez-Porta M, Kurbatova N, Griebel T, Ferreira PG, Barann M, Wieland T, Greger L, van Iterson M, Almlöf J, Ribeca P, Pulyakhina I, Esser D, Giger T, Tikhonov A, Sultan M, Bertier G, MacArthur DG, Lek M, Lizano E, Buermans HP, Padioleau I, Schwarzmayr T, Karlberg O, Ongen H, Kilpinen H, Beltran S, Gut M, Kahlem K, Amstislavskiy V, Stegle O, Pirinen M, Montgomery SB, Donnelly 
P, McCarthy MI, Flicek P, Strom TM, Lehrach H, Schreiber S, Sudbrak R, Carracedo A, Antonarakis SE, Häsler R, Syvänen AC, van Ommen GJ, Brazma A, Meitinger T, Rosenstiel P, Guigó R, Gut IG, Estivill X, Dermitzakis ET | display-authors = 6 | title = Transcriptome and genome sequencing uncovers functional variation in humans | journal = Nature | volume = 501 | issue = 7468 | pages = 506–11 | date = September 2013 | pmid = 24037378 | pmc = 3918453 | doi = 10.1038/nature12531 | bibcode = 2013Natur.501..506L }}</ref><ref name="#25954002">{{cite journal | vauthors = Melé M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, Young TR, Goldmann JM, Pervouchine DD, Sullivan TJ, Johnson R, Segrè AV, Djebali S, Niarchou A, Wright FA, Lappalainen T, Calvo M, Getz G, Dermitzakis ET, Ardlie KG, Guigó R | display-authors = 6 | title = Human genomics. The human transcriptome across tissues and individuals | journal = Science | volume = 348 | issue = 6235 | pages = 660–5 | date = May 2015 | pmid = 25954002 | pmc = 4547472 | doi = 10.1126/science.aaa0355 | bibcode = 2015Sci...348..660M }}</ref> Transcriptomes of different [[disease]] states, [[Tissue (biology)|tissues]], or even single [[Cell (biology)|cells]] are now routinely generated.<ref name="#25954002" /><ref name="#24524133">{{cite journal | vauthors = Sandberg R | title = Entering the era of single-cell transcriptomics in biology and medicine | journal = Nature Methods | volume = 11 | issue = 1 | pages = 22–4 | date = January 2014 | pmid = 24524133 | doi = 10.1038/nmeth.2764 | s2cid = 27632439 | url = https://zenodo.org/record/890299 }}</ref><ref name="#26000846">{{cite journal | vauthors = Kolodziejczyk AA, Kim JK, Svensson V, Marioni JC, Teichmann SA | title = The technology and biology of single-cell RNA sequencing | journal = Molecular Cell | volume = 58 | issue = 4 | pages = 610–20 | date = May 2015 | pmid = 26000846 | doi = 10.1016/j.molcel.2015.04.005 | doi-access = free }}</ref> This explosion in 
transcriptomics has been driven by the rapid development of new technologies with improved sensitivity and economy.<ref name="#23290152">{{cite journal | vauthors = McGettigan PA | title = Transcriptomics in the RNA-seq era | journal = Current Opinion in Chemical Biology | volume = 17 | issue = 1 | pages = 4–11 | date = February 2013 | pmid = 23290152 | doi = 10.1016/j.cbpa.2012.12.008 }}</ref><ref name="#19015660">{{cite journal | vauthors = Wang Z, Gerstein M, Snyder M | title = RNA-Seq: a revolutionary tool for transcriptomics | journal = Nature Reviews Genetics | volume = 10 | issue = 1 | pages = 57–63 | date = January 2009 | pmid = 19015660 | pmc = 2949280 | doi = 10.1038/nrg2484 }}</ref><ref name="#21191423">{{cite journal | vauthors = Ozsolak F, Milos PM | title = RNA sequencing: advances, challenges and opportunities | journal = Nature Reviews Genetics | volume = 12 | issue = 2 | pages = 87–98 | date = February 2011 | pmid = 21191423 | pmc = 3031867 | doi = 10.1038/nrg2934 }}</ref><ref name="#19715439">{{cite journal | vauthors = Morozova O, Hirst M, Marra MA | title = Applications of new sequencing technologies for transcriptome analysis | journal = Annual Review of Genomics and Human Genetics | volume = 10 | pages = 135–51 | date = 2009 | pmid = 19715439 | doi = 10.1146/annurev-genom-082908-145957 }}</ref>
 
=== Before transcriptomics ===
Studies of individual [[Primary transcript|transcripts]] were being performed several decades before any transcriptomics approaches were available. [[CDNA library|Libraries]] of [[Antheraea polyphemus|silkmoth]] mRNA transcripts were collected and converted to [[complementary DNA]] (cDNA) for storage using [[reverse transcriptase]] in the late 1970s.<ref name="#519770">{{cite journal | vauthors = Sim GK, Kafatos FC, Jones CW, Koehler MD, Efstratiadis A, Maniatis T | title = Use of a cDNA library for studies on evolution and developmental expression of the chorion multigene families | journal = Cell | volume = 18 | issue = 4 | pages = 1303–16 | date = December 1979 | pmid = 519770 | doi = 10.1016/0092-8674(79)90241-1 | doi-access = free }}</ref> In the 1980s, low-throughput sequencing using the [[Sanger sequencing|Sanger]] method was used to sequence random transcripts, producing [[expressed sequence tag]]s (ESTs).<ref name="ref2047873">{{cite journal | vauthors = Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF | display-authors = 6 | title = Complementary DNA sequencing: expressed sequence tags and human genome project | journal = Science | volume = 252 | issue = 5013 | pages = 1651–6 | date = June 1991 | pmid = 2047873 | doi = 10.1126/science.2047873 | bibcode = 1991Sci...252.1651A | s2cid = 13436211 }}</ref><ref name="#6956902">{{cite journal | vauthors = Sutcliffe JG, Milner RJ, Bloom FE, Lerner RA | title = Common 82-nucleotide sequence unique to brain RNA | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 79 | issue = 16 | pages = 4942–6 | date = August 1982 | pmid = 6956902 | pmc = 346801 | bibcode = 1982PNAS...79.4942S | doi = 10.1073/pnas.79.16.4942 | doi-access = free }}</ref><ref name="#6687628">{{cite journal | vauthors = Putney SD, Herlihy WC, Schimmel P | title = A new troponin T and cDNA clones for 13 different muscle proteins, found by 
shotgun sequencing | journal = Nature | volume = 302 | issue = 5910 | pages = 718–21 | date = April 1983 | pmid = 6687628 | doi = 10.1038/302718a0 | bibcode = 1983Natur.302..718P | s2cid = 4364361 }}</ref><ref name="#9448457" /> The [[Sanger sequencing|Sanger method of sequencing]] was predominant until the advent of [[DNA sequencing#High-throughput methods|high-throughput methods]] such as [[sequencing by synthesis]] (Solexa/Illumina). [[Expressed sequence tag|ESTs]] came to prominence during the 1990s as an efficient method to determine the [[gene annotation|gene content]] of an organism without [[Whole genome sequencing|sequencing]] the entire [[genome]].<ref name="#9448457">{{cite journal | vauthors = Marra MA, Hillier L, Waterston RH | title = Expressed sequence tags—ESTablishing bridges between genomes | journal = Trends in Genetics | volume = 14 | issue = 1 | pages = 4–7 | date = January 1998 | pmid = 9448457 | doi = 10.1016/S0168-9525(97)01355-3 }}</ref> Amounts of individual transcripts were quantified using [[Northern blotting]], [[Reverse northern blot|nylon membrane arrays]], and later [[Reverse transcription polymerase chain reaction|reverse transcriptase quantitative PCR]] (RT-qPCR) methods,<ref name="#414220">{{cite journal | vauthors = Alwine JC, Kemp DJ, Stark GR | title = Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 74 | issue = 12 | pages = 5350–4 | date = December 1977 | pmid = 414220 | pmc = 431715 | doi = 10.1073/pnas.74.12.5350 | bibcode = 1977PNAS...74.5350A | doi-access = free }}</ref><ref name="#2479917">{{cite journal | vauthors = Becker-André M, Hahlbrock K | title = Absolute mRNA quantification using the polymerase chain reaction (PCR). 
A novel approach by a PCR aided transcript titration assay (PATTY) | journal = Nucleic Acids Research | volume = 17 | issue = 22 | pages = 9437–46 | date = November 1989 | pmid = 2479917 | pmc = 335144 | doi = 10.1093/nar/17.22.9437 }}</ref> but these methods are laborious and can only capture a tiny subsection of a transcriptome.<ref name="#19715439" /> Consequently, the manner in which a transcriptome as a whole is expressed and regulated remained unknown until higher-throughput techniques were developed.
 
=== Early attempts ===
The word "transcriptome" was first used in the 1990s.<ref name="#10022985">{{cite journal | vauthors = Piétu G, Mariage-Samson R, Fayein NA, Matingou C, Eveno E, Houlgatte R, Decraene C, Vandenbrouck Y, Tahi F, Devignes MD, Wirkner U, Ansorge W, Cox D, Nagase T, Nomura N, Auffray C | title = The Genexpress IMAGE knowledge base of the human brain transcriptome: a prototype integrated resource for functional and computational genomics | journal = Genome Research | volume = 9 | issue = 2 | pages = 195–209 | date = February 1999 | pmid = 10022985 | pmc = 310711 | doi=10.1101/gr.9.2.195}}</ref><ref name="#9008165">{{cite journal | vauthors = Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE, Hieter P, Vogelstein B, Kinzler KW | title = Characterization of the yeast transcriptome | journal = Cell | volume = 88 | issue = 2 | pages = 243–51 | date = January 1997 | pmid = 9008165 | doi = 10.1016/S0092-8674(00)81845-0 | s2cid = 11430660 | doi-access = free }}</ref> In 1995, one of the earliest sequencing-based transcriptomic methods was developed, [[serial analysis of gene expression]] (SAGE), which worked by [[Sanger sequencing]] of concatenated random transcript fragments.<ref name="#7570003">{{cite journal | vauthors = Velculescu VE, Zhang L, Vogelstein B, Kinzler KW | title = Serial analysis of gene expression | journal = Science | volume = 270 | issue = 5235 | pages = 484–7 | date = October 1995 | pmid = 7570003 | doi = 10.1126/science.270.5235.484 | bibcode = 1995Sci...270..484V | s2cid = 16281846 }}</ref> Transcripts were quantified by matching the fragments to known genes. 
A variant of SAGE using high-throughput sequencing techniques, called digital gene expression analysis, was also briefly used.<ref name="#23290152" /><ref name="#9331369">{{cite journal | vauthors = Audic S, Claverie JM | title = The significance of digital gene expression profiles | journal = Genome Research | volume = 7 | issue = 10 | pages = 986–95 | date = October 1997 | pmid = 9331369 | doi = 10.1101/gr.7.10.986 | doi-access = free }}</ref> However, these methods were largely overtaken by high-throughput sequencing of entire transcripts, which provided additional information on transcript structure such as [[alternative splicing|splice variants]].<ref name="#23290152" />
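
The tag-matching step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a sequenced concatemer is a plain string of fixed-length tags with no linker or anchoring sequences between them, and that a lookup table from tag to gene is already available; the function name <code>count_tags</code>, the parameter <code>tag_len</code>, and the example sequences are hypothetical and are not taken from the SAGE publication.

```python
from collections import Counter


def count_tags(concatemer, tag_to_gene, tag_len=10):
    """Count how often each known gene is hit by tags in a concatemer.

    Simplifying assumption: tags are laid end to end with a fixed length
    (tag_len) and nothing in between. Real protocols are more involved.
    """
    counts = Counter()
    # Walk the concatemer in non-overlapping windows of tag_len bases.
    for start in range(0, len(concatemer) - tag_len + 1, tag_len):
        tag = concatemer[start:start + tag_len]
        # Tags with no match in the lookup table are pooled as "unassigned".
        counts[tag_to_gene.get(tag, "unassigned")] += 1
    return counts


# Hypothetical lookup table and concatemer, for illustration only.
tag_to_gene = {"AAAAAAAAAA": "geneA", "CCCCCCCCCC": "geneB"}
concatemer = "AAAAAAAAAA" + "CCCCCCCCCC" + "AAAAAAAAAA" + "GGGGGGGGGG"
tag_counts = count_tags(concatemer, tag_to_gene)
```

In this toy model the count for each gene serves as a proxy for the abundance of its transcript, which is the sense in which fragments matched to known genes quantify expression.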
 
=== Development of contemporary techniques ===
{| class="wikitable floatright" style="width:500px"
|+ '''Comparison of contemporary methods'''<ref name="#25149683">{{cite journal | vauthors = Mantione KJ, Kream RM, Kuzelova H, Ptacek R, Raboch J, Samuel JM, Stefano GB | title = Comparing bioinformatic gene expression profiling methods: microarray and RNA-Seq | journal = Medical Science Monitor Basic Research | volume = 20 | pages = 138–42 | date = August 2014 | pmid = 25149683 | pmc = 4152252 | doi = 10.12659/MSMBR.892101 }}</ref><ref name="#24454679">{{cite journal | vauthors = Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X | title = Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells | journal = PLOS ONE | volume = 9 | issue = 1 | pages = e78644 | date = 2014 | pmid = 24454679 | pmc = 3894192 | doi = 10.1371/journal.pone.0078644 | bibcode = 2014PLoSO...978644Z | doi-access = free }}</ref><ref name="#19015660" />
Line 24:
!RNA-Seq
!Microarray
|-
|[[Throughput]]
|1 day to 1 week per experiment<ref name="#19015660" />
|1–2 days per experiment<ref name="#19015660" />
|-
|[[Nucleic acid quantitation|Input RNA amount]]
|Low ~ 1 [[Orders of magnitude (mass)|ng]] total RNA<ref name="#22939981"/>
|High ~ 1 μg mRNA<ref name="#11015604">{{cite journal | vauthors = Stears RL, Getts RC, Gullans SR | title = A novel, sensitive detection system for high-density microarrays using dendrimer technology | journal = Physiological Genomics | volume = 3 | issue = 2 | pages = 93–9 | date = August 2000 | pmid = 11015604 | doi = 10.1152/physiolgenomics.2000.3.2.93 }}</ref>
|-
|Labour intensity
|High (sample preparation and data analysis)<ref name="#19015660" /><ref name="#25149683" />
Line 40:
|None required, although a reference genome/transcriptome sequence is useful<ref name="#25149683" />
|Reference genome/transcriptome is required for design of [[Molecular probe|probes]]<ref name="#25149683" />
|-
|[[Quantification (science)|Quantitation]] accuracy
|~90% (limited by sequence coverage)<ref name="EPR">{{Cite web|url=http://www.europeanpharmaceuticalreview.com/wp-content/uploads/Illumina_whitepaper.pdf|title=RNA-Seq Data Comparison with Gene Expression Microarrays|last=Illumina|publisher=European Pharmaceutical Review|date=2011-07-11}}</ref>
Line 148:
|[[ABI Solid Sequencing|SOLiD]]
|2008
|50 bp
|320 Gbp
|99.9%
Line 178:
 
=== Image processing ===
[[File:Microarray and sequencing flow cell.svg|thumb|300x300px|''Microarray and sequencing flow cell''. Microarrays and RNA-seq rely on image analysis in different ways. In a microarray chip, each spot on a chip is a defined oligonucleotide probe, and fluorescence intensity directly detects the abundance of a specific sequence (Affymetrix). In a high-throughput sequencing flow cell, spots are sequenced one nucleotide at a time, with the colour at each round indicating the next nucleotide in the sequence (Illumina HiSeq). Other variations of these techniques use more or fewer colour channels.<ref name="Lowe_2017" /><ref>{{Cite journal|last1=Petrov|first1=Anton|last2=Shams|first2=Soheil | name-list-style = vanc |date=2004-11-01|title=Microarray Image Processing and Quality Control |journal=Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology |volume=38|issue=3|pages=211–226|doi=10.1023/B:VLSI.0000042488.08307.ad |s2cid=31598448}}</ref>]]
Microarray [[image processing]] must correctly identify the [[regular grid]] of features within an image and independently quantify the fluorescence [[Luminous intensity|intensity]] for each feature. [[Visual artefact|Image artefacts]] must be additionally identified and removed from the overall analysis. Fluorescence intensities directly indicate the abundance of each sequence, since the sequence of each probe on the array is already known.<ref name="PetrovShams2004">{{cite journal | year=2004|title=Microarray Image Processing and Quality Control|journal=The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology|volume=38|issue=3|pages=211–226|doi=10.1023/B:VLSI.0000042488.08307.ad|last1=Petrov|first1=Anton |last2=Shams|first2=Soheil |s2cid=31598448| name-list-style = vanc }}</ref>
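
As a rough illustration of the grid-and-quantify step, the following sketch divides an image into a regular grid of equally sized cells and reports a background-corrected mean intensity for each feature. It is a simplified model, not the method of any particular scanner or software package: the function name <code>quantify_grid</code>, the <code>border</code> parameter, and the use of the cell's outer ring as a local background estimate are assumptions made for this example, and artefact detection is omitted.

```python
import numpy as np


def quantify_grid(image, n_rows, n_cols, border=1):
    """Return an (n_rows, n_cols) array of background-corrected intensities.

    Assumptions: features sit on a perfectly regular grid that fills the
    image, border >= 1, and every cell is larger than 2 * border pixels
    in both dimensions.
    """
    image = np.asarray(image, dtype=float)
    cell_h = image.shape[0] // n_rows
    cell_w = image.shape[1] // n_cols
    intensities = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            cell = image[r * cell_h:(r + 1) * cell_h,
                         c * cell_w:(c + 1) * cell_w]
            # Inner region of the cell is treated as the feature itself.
            core = cell[border:-border, border:-border]
            # Outer ring of the cell is treated as local background.
            ring = np.ones(cell.shape, dtype=bool)
            ring[border:-border, border:-border] = False
            background = np.median(cell[ring])
            intensities[r, c] = core.mean() - background
    return intensities


# Synthetic 8x8 image: uniform background of 10 with one bright feature.
img = np.full((8, 8), 10.0)
img[1:3, 1:3] = 110.0
res = quantify_grid(img, n_rows=2, n_cols=2)
```

Because the probe sequence at each grid position is known in advance, each entry of the resulting array can be read directly as a relative abundance for that probe's target.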
 
Line 207:
|2008
|2011
|Low, single-threaded, high RAM requirement
|The original short read assembler. It is now largely superseded.
|-
Line 325:
Transcriptomic profiling also provides crucial information on mechanisms of [[drug resistance]]. Analysis of over 1000 isolates of ''[[Plasmodium falciparum]]'', a virulent parasite responsible for malaria in humans,<ref name="Rich et al">{{cite journal | vauthors = Rich SM, Leendertz FH, Xu G, LeBreton M, Djoko CF, Aminake MN, Takang EE, Diffo JL, Pike BL, Rosenthal BM, Formenty P, Boesch C, Ayala FJ, Wolfe ND | title = The origin of malignant malaria | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 106 | issue = 35 | pages = 14902–7 | date = September 2009 | pmid = 19666593 | pmc = 2720412 | doi = 10.1073/pnas.0907740106 | bibcode = 2009PNAS..10614902R | doi-access = free }}</ref> identified that upregulation of the [[unfolded protein response]] and slower progression through the early stages of the asexual intraerythrocytic [[Plasmodium falciparum#Life cycle|developmental cycle]] were associated with [[Artemisinin#Resistance|artemisinin resistance]] in isolates from [[Southeast Asia]].<ref name="#25502316">{{cite journal | vauthors = Mok S, Ashley EA, Ferreira PE, Zhu L, Lin Z, Yeo T, Chotivanich K, Imwong M, Pukrittayakamee S, Dhorda M, Nguon C, Lim P, Amaratunga C, Suon S, Hien TT, Htut Y, Faiz MA, Onyamboko MA, Mayxay M, Newton PN, Tripura R, Woodrow CJ, Miotto O, Kwiatkowski DP, Nosten F, Day NP, Preiser PR, White NJ, Dondorp AM, Fairhurst RM, Bozdech Z | display-authors = 6 | title = Drug resistance. Population transcriptomics of human malaria parasites reveals the mechanism of artemisinin resistance | journal = Science | volume = 347 | issue = 6220 | pages = 431–5 | date = January 2015 | pmid = 25502316 | pmc = 5642863 | doi = 10.1126/science.1260403 | bibcode = 2015Sci...347..431M }}</ref>
 
Transcriptomics is also important for investigating responses in the marine environment.<ref name=":0">{{Cite journal |last1=Page |first1=Tessa M. |last2=Lawley |first2=Jonathan W. |date=2022 |title=The Next Generation Is Here: A Review of Transcriptomic Approaches in Marine Ecology |journal=Frontiers in Marine Science |volume=9 |doi=10.3389/fmars.2022.757921 |issn=2296-7745|doi-access=free }}</ref> In marine ecology, "[[Stress (biology)|stress]]" and "[[adaptation]]" have been among the most common research topics, especially in relation to anthropogenic stress, such as [[global change]] and [[pollution]].<ref name=":0" /> Most of the studies in this area have been done in [[Animal|animal]]s, although [[Invertebrate|invertebrate]]s have been underrepresented.<ref name=":0" /> One remaining issue is a deficiency in functional genetic studies, which hampers [[gene annotation]]s, especially for non-model species, and can lead to vague conclusions on the effects of the responses studied.<ref name=":0" />
=== Gene function annotation ===
All transcriptomic techniques have been particularly useful in [[Gene annotation|identifying the functions of genes]] and identifying those responsible for particular phenotypes. Transcriptomics of ''Arabidopsis'' [[ecotype]]s that [[Hyperaccumulator|hyperaccumulate metals]] correlated genes involved in [[Bioinorganic chemistry#Metal ion transport and storage|metal uptake]], tolerance, and [[homeostasis]] with the phenotype.<ref name="#19192189">{{cite journal | vauthors = Verbruggen N, Hermans C, Schat H | title = Molecular mechanisms of metal hyperaccumulation in plants | journal = The New Phytologist | volume = 181 | issue = 4 | pages = 759–76 | date = March 2009 | pmid = 19192189 | doi = 10.1111/j.1469-8137.2008.02748.x | url = https://dipot.ulb.ac.be/dspace/bitstream/2013/58126/3/58126.pdf }}</ref> Integration of RNA-Seq datasets across different tissues has been used to improve annotation of gene functions in commercially important organisms (e.g. [[Cucumis sativus|cucumber]])<ref name="#22047402">{{cite journal | vauthors = Li Z, Zhang Z, Yan P, Huang S, Fei Z, Lin K | title = RNA-Seq improves annotation of protein-coding genes in the cucumber genome | journal = BMC Genomics | volume = 12 | pages = 540 | date = November 2011 | pmid = 22047402 | pmc = 3219749 | doi = 10.1186/1471-2164-12-540 }}</ref> or threatened species (e.g. [[koala]]).<ref name="#25214207">{{cite journal | vauthors = Hobbs M, Pavasovic A, King AG, Prentis PJ, Eldridge MD, Chen Z, Colgan DJ, Polkinghorne A, Wilkins MR, Flanagan C, Gillett A, Hanger J, Johnson RN, Timms P | title = A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity | journal = BMC Genomics | volume = 15 | pages = 786 | date = September 2014 | issue = 1 | pmid = 25214207 | pmc = 4247155 | doi = 10.1186/1471-2164-15-786 }}</ref>