Sharpness aware minimization: Difference between revisions

Revision as of 22:32, 3 July 2025 by Pichpich: +Category:Machine learning algorithms; +Category:Optimization algorithms and methods using HotCat
Latest revision as of 09:00, 27 July 2025 by Citation bot: Removed URL that duplicated identifier. Removed access-date with no URL. Removed parameters. | Use this bot. Report bugs. | Suggested by Abductive | Category:Orphaned articles from June 2025 | #UCB_Category 462/875
(One intermediate revision by one other user not shown)
Line 1:
{{Short description|~~machine~~Machine learning optimization algorithm}} {{Multiple issues| {{technical|date=June 2025}}
Line 43:
Active research on SAM focuses on reducing its computational overhead and improving its performance. Several variants have been proposed to make the algorithm more efficient. These include methods that attempt to parallelize the two gradient computations, apply the perturbation to only a subset of parameters, or reduce the number of computation steps required.<ref name="Dou2022SAMPa">{{cite arXiv |eprint=2410.10683 |class=cs.LG |first1=Wanyun |last1=Xie |first2=Thomas |last2=Pethick |title=SAMPa: Sharpness-aware Minimization Parallelized |last3=Cevher |first3=Volkan |year=2022}}</ref><ref name="u277">{{citation |last1=Mi |first1=Peng |title=Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach |date=2022 |page= |arxiv=2210.05177 |last2=Shen |first2=Li |last3=Ren |first3=Tianhe |last4=Zhou |first4=Yiyi |last5=Sun |first5=Xiaoshuai |last6=Ji |first6=Rongrong |last7=Tao |first7=Dacheng }}</ref><ref name="k651">{{cite conference |last1=Ji |first1=Jie |last2=Li |first2=Gen |last3=Fu |first3=Jingjing |last4=Afghah |first4=Fatemeh |last5=Guo |first5=Linke |last6=Yuan |first6=Xiaoyong |last7=Ma |first7=Xiaolong |date=2025-06-05 |title=Proceedings of the 38th International Conference on Neural Information Processing Systems |url=https://dl.acm.org/doi/10.5555/3737916.3739321 |publisher=Curran Associates Inc. |publication-place=Red Hook, NY, USA |volume=37 |page= |pages=44269–44290 |isbn=979-8--33131438-5 |access-date=2025-06-26}}</ref> Other approaches use historical gradient information or apply SAM steps intermittently to lower the computational burden.<ref name="Liu2022LookaheadSAM">{{cite conference |last1=Yu |first1=Runsheng |last2=Zhang |first2=Youzhi |last3=Kwok |first3=James |year=2024 |title=Improving Sharpness-Aware Minimization by Lookahead |url=https://proceedings.mlr.press/v235/yu24q.html |conference= |book-title=International Conference on Learning Representations (ICLR) 2022}}</ref> To improve performance and robustness, variants have been developed that adapt the neighborhood size based on model parameter scales (Adaptive SAM or ASAM)<ref name="Kwon2021ASAM"/> or incorporate information about the curvature of the loss landscape (Curvature Regularized SAM or CR-SAM). Other research explores refining the perturbation step by focusing on specific components of the gradient or combining SAM with techniques like random smoothing.<ref name="m141">{{cite conference |last1=Li |first1=Tao |last2=Zhou |first2=Pan |last3=He |first3=Zhengbao |last4=Cheng |first4=Xinwen |last5=Huang |first5=Xiaolin |title=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |date=2024-06-16 |chapter=Friendly Sharpness-Aware Minimization |page= ~~|chapter-url=https://ieeexplore.ieee.org/document/10657696~~ |publisher=IEEE |pages=5631–5640 |doi=10.1109/CVPR52733.2024.00538 |isbn=979-8-3503-5300-6 ~~|access-date=2025-06-26|chapter-url-access=subscription~~ }}</ref><ref name="t248">{{cite journal |last1=Liu |first1=Yong |last2=Mai |first2=Siqi |last3=Cheng |first3=Minhao |last4=Chen |first4=Xiangning |last5=Hsieh |first5=Cho-Jui |last6=You |first6=Yang |date=2022-12-06 |title=Random Sharpness-Aware Minimization |url=https://papers.nips.cc/paper_files/paper/2022/hash/9b79416c0dc4b09feaa169ed5cdd63d4-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |volume=35 |pages=24543–24556 |access-date=2025-06-26}}</ref> Theoretical work continues to analyze the algorithm's behavior, including its implicit bias towards flatter minima and the development of broader frameworks for sharpness-aware optimization that use different measures of sharpness.
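
To illustrate the variants described in the paragraph above, the following is a minimal pure-Python sketch of one SAM update, with optional switches for a sparsified perturbation and an ASAM-style element-wise scaling. The function name `sam_step`, its parameters (`lr`, `rho`, `mask`, `adaptive`), and the toy quadratic loss are assumptions made for this example and are not taken from the cited papers; the masking and scaling shown are simplified relative to the published methods.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05, mask=None, adaptive=False):
    """Return updated weights after one SAM-style step (illustrative only)."""
    # First gradient computation, at the current weights.
    g = grad_fn(w)
    # ASAM-style scaling by |w| when adaptive=True; identity for plain SAM.
    scale = [abs(wi) if adaptive else 1.0 for wi in w]
    scaled_g = [s * gi for s, gi in zip(scale, g)]
    norm = math.sqrt(sum(x * x for x in scaled_g)) + 1e-12  # avoid division by zero
    # Ascent perturbation (for plain SAM this is rho * g / ||g||).
    eps = [rho * s * x / norm for s, x in zip(scale, scaled_g)]
    # Sparsified variant (hypothetical mask): perturb only selected parameters.
    if mask is not None:
        eps = [m * e for m, e in zip(mask, eps)]
    # Second gradient computation, at the perturbed weights.
    g_sam = grad_fn([wi + e for wi, e in zip(w, eps)])
    # Descent step applied to the original (unperturbed) weights.
    return [wi - lr * gi for wi, gi in zip(w, g_sam)]


def quad_grad(w):
    """Gradient of the toy loss L(w) = 0.5 * sum(w_i ** 2)."""
    return list(w)


w = [3.0, 4.0]
w_sam = sam_step(w, quad_grad, lr=0.1, rho=0.5)
w_sparse = sam_step(w, quad_grad, lr=0.1, rho=0.5, mask=[1.0, 0.0])
w_asam = sam_step(w, quad_grad, lr=0.1, rho=0.5, adaptive=True)
```

The two calls to `grad_fn` in each step are the two gradient computations that the efficiency-oriented variants try to parallelize, sparsify, or skip.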