Adaptive sampling: Difference between revisions

'''Adaptive sampling''' is an approach to [[Sampling (statistics)|sampling]] that uses heuristics to improve [[Efficiency (statistics)|efficiency]]. The term ''adaptive sampling'' denotes a general approach to the sampling problem rather than one specific method, so it can be combined with other suitable methods.
 
In some real-world problems, sampling is needed, implicitly or explicitly, to obtain practical solutions. Sampling consumes resources, and using those resources efficiently is usually crucial. This is why multiple sampling methods exist as alternatives to the brute-force approach.
 
Let f(x) be a function that is to be sampled. Let C(x,'''s''') be the cost of sampling at x given the previous set of samples '''s''', and let G(x,'''s''') be the gain (anti-cost) from sampling the function at x given '''s'''. For simplicity, C(x,'''s''') can often be assumed constant, since the sampling cost usually depends neither on the previous samples nor on the input x. In time-critical systems, where the cost of each sample is strongly related to computation time, C usually takes further parameters, such as the current time. As an example of a gain function, it can be assumed that G(x,'''s''')=0 if x has already been sampled. The sampling problem is then to maximize the cumulative gain minus the cumulative cost. This usually comes down to sampling the function until the estimated or deterministic cost C(x,'''s''') of the next sample is larger than its gain G(x,'''s''').
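
The stopping rule described above can be illustrated with a minimal Python sketch. The function name <code>sample_until_unprofitable</code>, the finite candidate list, and the greedy choice of the next input are assumptions made for illustration; the gain and cost functions used in the usage note are likewise hypothetical.

```python
def sample_until_unprofitable(f, candidates, gain, cost):
    """Greedily sample f until the next sample costs more than it gains.

    gain(x, s) and cost(x, s) play the roles of G(x, s) and C(x, s);
    s is a dict mapping already-sampled inputs to their values.
    """
    samples = {}
    remaining = list(candidates)
    while remaining:
        # Choose the candidate with the largest net benefit G - C.
        best = max(remaining, key=lambda x: gain(x, samples) - cost(x, samples))
        if cost(best, samples) > gain(best, samples):
            break  # the next sample's cost is larger than its gain
        samples[best] = f(best)
        remaining.remove(best)
    return samples


def distance_gain(x, samples):
    """Hypothetical gain: distance to the nearest previous sample.

    This is 0 when x has already been sampled, as in the text's example.
    """
    if not samples:
        return 1.0
    return min(abs(x - p) for p in samples)


def constant_cost(x, samples):
    """Hypothetical constant cost per sample."""
    return 0.3
```

Calling <code>sample_until_unprofitable(lambda x: x * x, [0.0, 0.25, 0.5, 0.75, 1.0], distance_gain, constant_cost)</code> samples the inputs 0.0, 1.0 and 0.5 and then stops, because the remaining candidates each offer a gain of 0.25, which is below the cost of 0.3.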
 
Adaptive sampling then assumes that, given the necessary knowledge about the problem, there is a theoretically optimal sequence '''s''' of samples that maximizes the information (gain) obtained from the samples, and that '''s''' can be estimated using [[Heuristic|heuristics]]. Adaptive sampling usually focuses on estimating the next optimal sample input x given the previous set of samples, and in this sense adapts to the current knowledge about the function.
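
One possible heuristic for choosing the next sample input is sketched below in Python for a one-dimensional function. It is only an illustration under stated assumptions: the function name <code>adaptive_sample</code> and the scoring rule (bisect the interval whose endpoints are farthest apart in the (x, f(x)) plane) are not taken from any particular source, and other heuristics are equally valid.

```python
import math


def adaptive_sample(f, lo, hi, n_samples):
    """Sample f on [lo, hi], placing each new point using previous samples.

    Heuristic (an assumption of this sketch): the interval whose endpoints
    are farthest apart in the (x, f(x)) plane is taken to hide the most
    information, so it is bisected next.
    """
    xs = [lo, hi]
    ys = [f(lo), f(hi)]
    while len(xs) < n_samples:
        # Score every interval between neighbouring samples.
        i = max(
            range(len(xs) - 1),
            key=lambda k: math.hypot(xs[k + 1] - xs[k], ys[k + 1] - ys[k]),
        )
        x_new = 0.5 * (xs[i] + xs[i + 1])
        # Insert the new sample so that xs stays sorted.
        xs.insert(i + 1, x_new)
        ys.insert(i + 1, f(x_new))
    return xs, ys


def steep_step(x):
    """Hypothetical test function with a sharp transition near x = 0.5."""
    return math.tanh(20.0 * (x - 0.5))
```

With <code>adaptive_sample(steep_step, 0.0, 1.0, 17)</code>, most of the 17 samples fall close to x = 0.5, where the function changes quickly, while the flat regions near the ends of the interval receive only a few.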
 
== Computational Molecular Biology ==
In computational [[molecular biology]], adaptive sampling is used to efficiently simulate [[protein folding]] when coupled with molecular dynamics simulations.
 
=== Background ===
Proteins spend a large portion – nearly 96% in some cases<ref name="10.1016/j.sbi.2011.12.001"/> – of their [[protein folding|folding]] time "waiting" in various [[thermodynamic free energy]] minima. Consequently, a straightforward simulation of this process would spend a great deal of computation on these states, while the transitions between the states – the aspects of protein folding of greater scientific interest – take place only rarely.<ref name="Simulation FAQ"/> Adaptive sampling exploits this property to simulate the protein's [[phase space]] in between these states. Using adaptive sampling, molecular simulations that previously would have taken decades can be performed in a matter of weeks.<ref name="10.1016/j.sbi.2010.10.006"/>
 
=== Theory ===
If a protein folds through the [[metastable state]]s A -> B -> C, researchers can calculate the transition time between A and C by simulating the A -> B and B -> C transitions separately. The protein may also fold through alternative routes that partly overlap with the A -> B -> C pathway. Decomposing the problem in this manner is efficient because each step can be simulated in parallel.<ref name="10.1016/j.sbi.2010.10.006"/>
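
As a rough illustration of how separately simulated steps might be combined, the Python sketch below computes the mean number of steps needed to reach state C from a three-state transition matrix. The matrix values, the function name <code>mean_first_passage_times</code>, and the assumption that the dynamics form a discrete-time Markov chain are all hypothetical; this is not the procedure of any specific project.

```python
def mean_first_passage_times(T):
    """Mean number of steps to reach C (index 2) from A (0) and from B (1).

    T is a 3x3 row-stochastic transition matrix whose entries are assumed
    to have been estimated from separate short simulations.
    """
    # With C treated as absorbing, the mean times m_A and m_B satisfy
    #   m_A = 1 + T[0][0] * m_A + T[0][1] * m_B
    #   m_B = 1 + T[1][0] * m_A + T[1][1] * m_B
    a11, a12 = 1.0 - T[0][0], -T[0][1]
    a21, a22 = -T[1][0], 1.0 - T[1][1]
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("state C cannot be reached from A or B")
    # Cramer's rule for the 2x2 system with right-hand side (1, 1).
    m_a = (a22 - a12) / det
    m_b = (a11 - a21) / det
    return m_a, m_b


# Hypothetical per-step transition probabilities between A, B and C.
T_example = [
    [0.9, 0.1, 0.0],  # A stays in A or moves on to B
    [0.0, 0.8, 0.2],  # B stays in B or moves on to C
    [0.0, 0.0, 1.0],  # C is the final state
]
```

For <code>T_example</code>, which has no backward transitions, the result is about 15 steps from A and 5 steps from B: the A -> C time is the sum of the A -> B time (10 steps) and the B -> C time (5 steps). When backward transitions are present, the same linear system still applies, but the total is no longer a simple sum.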
 
=== Applications ===
Adaptive sampling is used by the [[Folding@home]] distributed computing project in combination with [[Hidden Markov model|Markov state models]].<ref name="Simulation FAQ"/><ref name="10.1016/j.sbi.2010.10.006"/>
 
=== Disadvantages ===
While adaptive sampling is useful for short simulations, longer trajectories may be more helpful for certain types of biochemical problems.<ref name="10.1145/1364782.1364802"/><ref name="10.1146/annurev-biophys-042910-155245"/>
 
=== See also ===
* [[Folding@home]]
* [[Hidden Markov model]]
* [[Computational biology]]
* [[Molecular biology]]
| refs =
 
<ref name="10.1016/j.sbi.2011.12.001">{{cite journal | author = Robert B Best | title = Atomistic molecular simulations of protein folding | journal = Current Opinion in Structural Biology | year = 2012 | type = review | volume = 22 | issue = 1 | pages = 52–61 | doi = 10.1016/j.sbi.2011.12.001 | pmid = 22257762}}</ref>
 
<ref name="Simulation FAQ">{{cite web | url = http://folding.stanford.edu/English/FAQ-Simulation | title = Folding@home Simulation FAQ | author1 = TJ Lane | author2 = Gregory Bowman | author3 = Robert McGibbon | author4 = Christian Schwantes | author5 = Vijay Pande | author6 = Bruce Borden | work = Folding@home | publisher = [[Stanford University]] | date = September 10, 2012 | access-date = September 10, 2012 | archive-url = https://web.archive.org/web/20120913150805/http://folding.stanford.edu/English/FAQ-Simulation | archive-date = 2012-09-13 | url-status = dead}}</ref>
 
<ref name="10.1016/j.sbi.2010.10.006">{{cite journal | author1 = G. Bowman | author2 = V. Voelz | author3 = V. S. Pande | title = Taming the complexity of protein folding | journal = Current Opinion in Structural Biology | year = 2011 | volume = 21 | issue = 1 | pages = 4–11 | doi = 10.1016/j.sbi.2010.10.006 | pmc = 3042729 | pmid = 21081274}}</ref>
 
<ref name="10.1145/1364782.1364802">{{cite journal | author = David E. Shaw | author2 = Martin M. Deneroff | author3 = Ron O. Dror | author4 = Jeffrey S. Kuskin | author5 = Richard H. Larson | author6 = John K. Salmon | author7 = Cliff Young | author8 = Brannon Batson | author9 = Kevin J. Bowers | author10 = Jack C. Chao | author11 = Michael P. Eastwood | author12 = Joseph Gagliardo | author13 = J. P. Grossman | author14 = C. Richard Ho | author15 = Douglas J. Ierardi, Ist | title = Anton, A Special-Purpose Machine for Molecular Dynamics Simulation | journal = Communications of the ACM | volume = 51 | issue = 7 | pages = 91–97 | year = 2008 | pmid = | doi = 10.1145/1364782.1364802 | doi-access = free}}</ref>
 
<ref name="10.1146/annurev-biophys-042910-155245">{{cite journal | title = Biomolecular Simulation: A Computational Microscope for Molecular Biology | author1 = Ron O. Dror | author2 = Robert M. Dirks | author3 = J.P. Grossman | author4 = Huafeng Xu | author5 = David E. Shaw | journal = [[Annual Review of Biophysics]] | year = 2012 | volume = 41 | issue = | pages = 429–52 | doi = 10.1146/annurev-biophys-042910-155245 | pmid = 22577825}}</ref>
 
}}
 
==External links==
{{Empty section|date=September 2012|section=}}
 
[[Category:Molecular modelling]]
[[Category:Computational chemistry]]
[[Category:Hidden Markov models]]
 
{{stub}}