Sharpness aware minimization: Difference between revisions

Citation bot (talk | contribs)
Removed URL that duplicated identifier. Removed access-date with no URL. Removed parameters. | Use this bot. Report bugs. | Suggested by Abductive | Category:Orphaned articles from June 2025 | #UCB_Category 462/875
 
(One intermediate revision by one other user not shown)
Line 1:
{{Short description|Machine learning optimization algorithm}}
{{Multiple issues|
{{technical|date=June 2025}}
Line 43:
Active research on SAM focuses on reducing its computational overhead and improving its performance. Several variants have been proposed to make the algorithm more efficient. These include methods that parallelize the two gradient computations, apply the perturbation to only a subset of parameters, or reduce the number of computation steps required.<ref name="Dou2022SAMPa">{{cite arXiv |eprint=2410.10683 |class=cs.LG |first1=Wanyun |last1=Xie |first2=Thomas |last2=Pethick |title=SAMPa: Sharpness-aware Minimization Parallelized |last3=Cevher |first3=Volkan |year=2024}}</ref><ref name="u277">{{citation |last1=Mi |first1=Peng |title=Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach |date=2022 |arxiv=2210.05177 |last2=Shen |first2=Li |last3=Ren |first3=Tianhe |last4=Zhou |first4=Yiyi |last5=Sun |first5=Xiaoshuai |last6=Ji |first6=Rongrong |last7=Tao |first7=Dacheng }}</ref><ref name="k651">{{cite conference |last1=Ji |first1=Jie |last2=Li |first2=Gen |last3=Fu |first3=Jingjing |last4=Afghah |first4=Fatemeh |last5=Guo |first5=Linke |last6=Yuan |first6=Xiaoyong |last7=Ma |first7=Xiaolong |date=2025-06-05 |title=Proceedings of the 38th International Conference on Neural Information Processing Systems |url=https://dl.acm.org/doi/10.5555/3737916.3739321 |publisher=Curran Associates Inc. |publication-place=Red Hook, NY, USA |volume=37 |pages=44269–44290 |isbn=979-8-33131438-5 |access-date=2025-06-26}}</ref> Other approaches use historical gradient information or apply SAM steps only intermittently to lower the computational burden.<ref name="Liu2022LookaheadSAM">{{cite conference |last1=Yu |first1=Runsheng |last2=Zhang |first2=Youzhi |last3=Kwok |first3=James |year=2024 |title=Improving Sharpness-Aware Minimization by Lookahead |url=https://proceedings.mlr.press/v235/yu24q.html |book-title=Proceedings of the 41st International Conference on Machine Learning}}</ref>
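
The cost trade-off behind these efficiency variants can be illustrated with a minimal sketch. It is not taken from any of the cited papers: the toy quadratic loss, the function names <code>sam_gradient</code> and <code>train</code>, and the parameters <code>sam_every</code> and <code>mask</code> are assumptions made for illustration only. The sketch uses the standard two-pass SAM gradient, optionally restricts the perturbation to a subset of coordinates, and optionally falls back to a plain gradient step on some iterations, counting gradient evaluations along the way.

```python
import math

def grad(w):
    # Gradient of a toy loss L(w) = 0.5 * (10*w[0]**2 + w[1]**2).
    # The loss is a hypothetical stand-in for a real training objective.
    return [10.0 * w[0], 1.0 * w[1]]

def sam_gradient(w, rho, mask=None):
    """Two-pass SAM gradient: perturb along the normalized gradient, then
    evaluate the gradient at the perturbed point. An optional 0/1 mask
    (an illustrative assumption) limits the perturbation to some coordinates."""
    g = grad(w)
    if mask is not None:
        g = [gi * mi for gi, mi in zip(g, mask)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return grad(w)  # no direction to perturb along
    eps = [rho * gi / norm for gi in g]
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    return grad(w_adv)

def train(w, steps, lr=0.05, rho=0.1, sam_every=1, mask=None):
    """Gradient descent that uses a SAM step every `sam_every` iterations and
    a plain gradient step otherwise. Returns final weights and the number of
    gradient evaluations budgeted (2 per SAM step, 1 per plain step)."""
    grad_evals = 0
    for t in range(steps):
        if t % sam_every == 0:
            g = sam_gradient(w, rho, mask)
            grad_evals += 2
        else:
            g = grad(w)
            grad_evals += 1
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, grad_evals

if __name__ == "__main__":
    _, full = train([1.0, 1.0], steps=10, sam_every=1)
    _, sparse_in_time = train([1.0, 1.0], steps=10, sam_every=2)
    print(full, sparse_in_time)  # 20 gradient evaluations versus 15
```

In this sketch, applying SAM on every second step lowers the budget from 20 to 15 gradient evaluations over 10 steps. The published variants choose which steps or which parameters to perturb in more sophisticated ways than the fixed schedule and fixed mask assumed here.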
 
To improve performance and robustness, variants have been developed that adapt the neighborhood size based on model parameter scales (Adaptive SAM or ASAM)<ref name="Kwon2021ASAM"/> or incorporate information about the curvature of the loss landscape (Curvature Regularized SAM or CR-SAM). Other research explores refining the perturbation step by focusing on specific components of the gradient or combining SAM with techniques like random smoothing.<ref name="m141">{{cite conference |last1=Li |first1=Tao |last2=Zhou |first2=Pan |last3=He |first3=Zhengbao |last4=Cheng |first4=Xinwen |last5=Huang |first5=Xiaolin |title=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |date=2024-06-16 |chapter=Friendly Sharpness-Aware Minimization |page= |chapter-url=https://ieeexplore.ieee.org/document/10657696 |publisher=IEEE |pages=5631–5640 |doi=10.1109/CVPR52733.2024.00538 |isbn=979-8-3503-5300-6 |access-date=2025-06-26|chapter-url-access=subscription }}</ref><ref name="t248">{{cite journal |last1=Liu |first1=Yong |last2=Mai |first2=Siqi |last3=Cheng |first3=Minhao |last4=Chen |first4=Xiangning |last5=Hsieh |first5=Cho-Jui |last6=You |first6=Yang |date=2022-12-06 |title=Random Sharpness-Aware Minimization |url=https://papers.nips.cc/paper_files/paper/2022/hash/9b79416c0dc4b09feaa169ed5cdd63d4-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |volume=35 |pages=24543–24556 |access-date=2025-06-26}}</ref>
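
The difference between the fixed neighborhood of SAM and the parameter-scaled neighborhood of ASAM can be sketched as follows. This is a minimal illustration, assuming the element-wise normalization operator <math>T_w = |w| + \eta</math> and the perturbation <math>\epsilon = \rho \, T_w^2 g / \|T_w g\|_2</math>; the function names and the example values are hypothetical and are not drawn from a reference implementation.

```python
import math

def sam_perturbation(g, rho):
    """SAM perturbation: a step of length rho along the normalized gradient g."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return [0.0 for _ in g]
    return [rho * gi / norm for gi in g]

def asam_perturbation(w, g, rho, eta=0.0):
    """ASAM-style perturbation (sketch): rescale each coordinate by
    t_i = |w_i| + eta so that larger weights receive larger perturbations."""
    t = [abs(wi) + eta for wi in w]
    tg = [ti * gi for ti, gi in zip(t, g)]
    norm = math.sqrt(sum(x * x for x in tg))
    if norm == 0.0:
        return [0.0 for _ in g]
    # eps_i = rho * t_i^2 * g_i / ||T_w g||
    return [rho * ti * x / norm for ti, x in zip(t, tg)]

if __name__ == "__main__":
    w = [2.0, 0.5]   # hypothetical weights with different scales
    g = [1.0, 1.0]   # hypothetical gradient
    print(sam_perturbation(g, 0.1))       # equal perturbation per coordinate
    print(asam_perturbation(w, g, 0.1))   # larger perturbation for the larger weight
```

Under these assumptions the SAM perturbation has Euclidean norm <math>\rho</math>, whereas the ASAM perturbation has norm <math>\rho</math> only after each coordinate is divided by <math>|w_i| + \eta</math>, which is what makes the neighborhood follow the parameter scale.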
 
Theoretical work continues to analyze the algorithm's behavior, including its implicit bias towards flatter minima and the development of broader frameworks for sharpness-aware optimization that use different measures of sharpness.