== Applications ==
SAM has been applied in various machine learning contexts, primarily in [[computer vision]]. Research has shown it can improve generalization performance in models such as [[Convolutional Neural Network|Convolutional Neural Networks (CNNs)]] and [[Transformer (machine learning model)|Vision Transformers (ViTs)]] on image datasets including [[ImageNet]], [[CIFAR-10]], and [[CIFAR-100]].<ref name="Foret2021"/>
 
The algorithm has also been found to be effective in training models with [[Label noise|noisy labels]], where it performs comparably to methods designed specifically for this problem.<ref name="Wen2021Mitigating">{{cite arXiv |last1=Wen |first1=Yulei |last2=Liu |first2=Zhen |last3=Zhang |first3=Zhe |last4=Zhang |first4=Yilong |last5=Wang |first5=Linmi |last6=Zhang |first6=Tiantian |title=Mitigating Memorization in Sample Selection for Learning with Noisy Labels |eprint=2110.08529 |year=2021 |class=cs.LG}}</ref><ref name="Zhuang2022Surrogate">{{cite conference |last1=Zhuang |first1=Juntang |last2=Gong |first2=Ming |last3=Liu |first3=Tong |title=Surrogate Gap Minimization Improves Sharpness-Aware Training |book-title=International Conference on Machine Learning (ICML) 2022 |year=2022 |pages=27098–27115 |publisher=PMLR |url=https://proceedings.mlr.press/v162/zhuang22d.html}}</ref> Some studies indicate that SAM and its variants can improve [[Out-of-distribution generalization|out-of-distribution (OOD) generalization]], which is a model's ability to perform well on data from distributions not seen during training.<ref name="Croce2021SAMBayes">{{cite arXiv |last1=Croce |first1=Francesco |last2=Hein |first2=Matthias |title=SAM as an Optimal Relaxation of Bayes |eprint=2110.11214 |year=2021 |class=cs.LG}}</ref><ref name="Kim2022Slicing">{{cite conference |last1=Kim |first1=Daehyeon |last2=Kim |first2=Seungone |last3=Kim |first3=Kwangrok |last4=Kim |first4=Sejun |last5=Kim |first5=Jangho |title=Slicing Aided Hyper-dimensional Inference and Fine-tuning for Improved OOD Generalization |book-title=Conference on Neural Information Processing Systems (NeurIPS) 2022 |year=2022 |url=https://openreview.net/forum?id=fN0K3jtnQG_}}</ref> Other areas where it has been applied include gradual [[___domain adaptation]] and mitigating [[overfitting]] in scenarios with repeated exposure to training examples.<ref name="Liu2021Delving">{{cite arXiv |last1=Liu |first1=Sitong |last2=Zhou |first2=Pan |last3=Zhang |first3=Xingchao |last4=Xu |first4=Zhi |last5=Wang |first5=Guang |last6=Zhao |first6=Hao |title=Delving into SAM: An Analytical Study of Sharpness Aware Minimization |eprint=2111.00905 |year=2021 |class=cs.LG}}</ref><ref name="Foret2021"/>
 
== Limitations ==
 
== Research, variants, and enhancements ==
Active research on SAM focuses on reducing its computational overhead and improving its performance. Several variants have been proposed to make the algorithm more efficient. These include methods that attempt to parallelize the two gradient computations, apply the perturbation to only a subset of parameters, or reduce the number of computation steps required.<ref name="Dou2022SAMPa">{{cite arXiv |last1=Dou |first1=Yong |last2=Zhou |first2=Cong |last3=Zhao |first3=Peng |last4=Zhang |first4=Tong |title=SAMPa: A Parallelized Version of Sharpness-Aware Minimization |eprint=2202.02081 |year=2022 |class=cs.LG}}</ref><ref name="Chen2022SSAM">{{cite arXiv |last1=Chen |first1=Wenlong |last2=Liu |first2=Xiaoyu |last3=Yin |first3=Huan |last4=Yang |first4=Tianlong |title=Sparse SAM: Squeezing Sharpness-aware Minimization into a Single Forward-backward Pass |eprint=2205.13516 |year=2022 |class=cs.LG}}</ref><ref name="Zhuang2022S2SAM">{{cite arXiv |last1=Zhuang |first1=Juntang |last2=Liu |first2=Tong |last3=Tao |first3=Dacheng |title=S2-SAM: A Single-Step, Zero-Extra-Cost Approach to Sharpness-Aware Training |eprint=2206.08307 |year=2022 |class=cs.LG}}</ref> Other approaches use historical gradient information or apply SAM steps intermittently to lower the computational burden.<ref name="He2021MomentumSAM">{{cite arXiv |last1=He |first1=Zequn |last2=Liu |first2=Sitong |last3=Zhang |first3=Xingchao |last4=Zhou |first4=Pan |last5=Zhang |first5=Cong |last6=Xu |first6=Zhi |last7=Zhao |first7=Hao |title=Momentum Sharpness-Aware Minimization |eprint=2110.03265 |year=2021 |class=cs.LG}}</ref><ref name="Liu2022LookaheadSAM">{{cite conference |last1=Liu |first1=Sitong |last2=He |first2=Zequn |last3=Zhang |first3=Xingchao |last4=Zhou |first4=Pan |last5=Xu |first5=Zhi |last6=Zhang |first6=Cong |last7=Zhao |first7=Hao |title=Lookahead Sharpness-aware Minimization |book-title=International Conference on Learning Representations (ICLR) 2022 |year=2022 |url=https://openreview.net/forum?id=7s38W2293F}}</ref>
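
To make the overhead concrete, the following is a minimal, illustrative sketch of the standard two-pass SAM update that these variants aim to cheapen, written in a generic [[PyTorch]] style. The names <code>sam_step</code> and <code>rho</code> (the neighborhood radius) are illustrative rather than taken from any particular library.

<syntaxhighlight lang="python">
import torch

def sam_step(model, loss_fn, data, target, base_optimizer, rho=0.05):
    """One SAM update: two sequential forward-backward passes per step."""
    base_optimizer.zero_grad()

    # First pass: gradient of the loss at the current weights w.
    loss_fn(model(data), target).backward()

    # Ascend to the approximate worst case within an L2 ball of radius rho.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]), 2)
    eps = []
    with torch.no_grad():
        for p in params:
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)  # w <- w + epsilon
            eps.append(e)

    # Second pass: gradient at the perturbed weights w + epsilon.
    base_optimizer.zero_grad()
    loss_fn(model(data), target).backward()

    # Restore the original weights, then update them with the base
    # optimizer using the gradient computed at w + epsilon.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
</syntaxhighlight>

The two sequential forward-backward passes are what roughly double the per-step cost relative to [[stochastic gradient descent]]; parallelized variants overlap the two computations, while sparse variants restrict the perturbation to a subset of the parameters.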
 
To improve performance and robustness, variants have been developed that adapt the neighborhood size based on model parameter scales (Adaptive SAM or ASAM)<ref name="Kwon2021ASAM"/> or incorporate information about the curvature of the loss landscape (Curvature Regularized SAM or CR-SAM).<ref name="Kim2022CRSAM">{{cite arXiv |last1=Kim |first1=Minhwan |last2=Lee |first2=Suyeon |last3=Shin |first3=Jonghyun |title=CR-SAM: Curvature Regularized Sharpness-Aware Minimization |eprint=2210.01011 |year=2022 |class=cs.LG}}</ref> Other research explores refining the perturbation step by focusing on specific components of the gradient or combining SAM with techniques like random smoothing.<ref name="Liu2023FriendlySAM">{{cite conference |last1=Liu |first1=Kai |last2=Wang |first2=Hao |last3=Li |first3=Yifan |last4=Liu |first4=Zhen |last5=Zhang |first5=Runpeng |last6=Zhao |first6=Jindong |title=Friendly Sharpness-Aware Minimization |book-title=International Conference on Learning Representations (ICLR) 2023 |year=2023 |url=https://openreview.net/forum?id=RndGzfJl4y}}</ref><ref name="Singh2021RSAM">{{cite arXiv |last1=Singh |first1=Sandeep Kumar |last2=Ahn |first2=Kyungsu |last3=Oh |first3=Songhwai |title=R-SAM: Random Structure-Aware Minimization for Generalization and Robustness |eprint=2110.07486 |year=2021 |class=cs.LG}}</ref>
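
As an illustration of the adaptive idea, ASAM rescales each component of the perturbation by the magnitude of the corresponding weight, so that the effective neighborhood grows with the scale of each parameter. The sketch below follows the same illustrative PyTorch-style conventions as above; <code>eta</code> is a small smoothing constant and the default values are indicative only.

<syntaxhighlight lang="python">
import torch

@torch.no_grad()
def asam_perturbation(params, rho=0.5, eta=0.01):
    """Apply a scale-adaptive (ASAM-style) perturbation in place.

    Returns the per-parameter perturbations so that the caller can
    subtract them again after the second backward pass. `eta` keeps
    the scaling well-defined for near-zero weights.
    """
    params = [p for p in params if p.grad is not None]
    # T_w = diag(|w| + eta) adapts the neighborhood to each weight's scale.
    scaled = [(p.abs() + eta) * p.grad for p in params]
    norm = torch.norm(torch.stack([s.norm(2) for s in scaled]), 2)
    eps = []
    for p, s in zip(params, scaled):
        # epsilon = rho * T_w^2 grad / ||T_w grad||_2
        e = (p.abs() + eta) * s * (rho / (norm + 1e-12))
        p.add_(e)
        eps.append(e)
    return eps
</syntaxhighlight>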
 
Theoretical work continues to analyze the algorithm's behavior, including its implicit bias towards flatter minima, and to develop broader frameworks for sharpness-aware optimization that use different measures of sharpness.<ref name="Wen2022SAMLandscape">{{cite arXiv |last1=Wen |first1=Yulei |last2=Zhang |first2=Zhe |last3=Liu |first3=Zhen |last4=Li |first4=Yue |last5=Zhang |first5=Tiantian |title=How Does SAM Influence the Loss Landscape? |eprint=2203.08065 |year=2022 |class=cs.LG}}</ref><ref name="Zhou2023SAMUnified">{{cite arXiv |last1=Zhou |first1=Kaizheng |last2=Zhang |first2=Yulai |last3=Tao |first3=Dacheng |title=Sharpness-Aware Minimization: A Unified View and A New Theory |eprint=2305.10276 |year=2023 |class=cs.LG}}</ref>
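
For reference, the sharpness measure that such frameworks generalize is the worst-case increase of the training loss <math>L</math> within an <math>\ell_2</math>-ball of radius <math>\rho</math> around the weights <math>w</math>, as used in the original SAM formulation:<ref name="Foret2021"/>

<math display="block">\max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon) - L(w).</math>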