Software composition analysis

{{Use dmy dates|date=February 2023}}
{{Short description|Examining the embedded components of software}}
 
'''Software composition analysis''' (SCA) is a practice in the fields of information technology and software engineering for analyzing custom-built software applications to detect embedded open-source software and determine whether those components are up to date, contain security flaws, or carry licensing requirements.<ref>
{{Cite journal
|last1=Prana|first1=Gede Artha Azriadi
|last2=Sharma|first2=Abhishek
|last3=Shar|first3=Lwin Khin
|last4=Foo|first4=Darius
|last5=Santosa|first5=Andrew E
|last6=Sharma|first6=Asankhaya
|last7=Lo|first7=David
|date=July 2021
|title= Out of sight, out of mind? How vulnerable dependencies affect open-source projects
|journal=Empirical Software Engineering
|volume=26
|issue=4
|pages=1–34
|article-number=59
|publisher=Springer
|doi=10.1007/s10664-021-09959-3
|s2cid=197679660
|url=https://ink.library.smu.edu.sg/sis_research/6048
}}</ref>
 
==Background==
It is a common software engineering practice to develop software by using different components.<ref>
{{Cite journal
|doi=10.1145/210376.210389
|s2cid=17612128
|url=https://hdl.acm.org/doi/pdf/10.1145/210376.210389
|doi-access=free
}}</ref> Using [[Component-based_software_engineering#Software_component|software components]] segments the complexity of larger elements into smaller pieces of code and increases flexibility by enabling easier reuse of components to address new requirements.<ref>
|title= Object-oriented software composition
|pages=3–28
|publisher=Prentice Hall International (UK) Ltd.
|citeseerx=10.1.1.90.8174
}}</ref> The practice has expanded widely since the late 1990s with the popularization of [[open-source software]] (OSS), which helps speed up the software development process and reduce time to market.<ref>
|title= Open source clustering software
|journal=Bioinformatics
|volume=20
|issue=9
|pages=1453–1454
|doi=10.1093/bioinformatics/bth078
|publisher=Oxford University Press
|pmid=14871861
|bibcode=2004Bioin..20.1453D
|citeseerx=10.1.1.114.3335
}}</ref>
SCA strives to detect all the third-party components in use within a software application to help reduce the risks associated with security vulnerabilities, IP licensing requirements, and obsolescence of the components being used.
 
==Principle of operation==
SCA products typically work as follows:<ref>
|issue=10
|pages=262–264
|publisher=IEEE
|doi=10.1109/MC.2020.3011082
|bibcode=2020Compr..53j.105O
|s2cid=222232127
|url=https://ieeexplore.ieee.org/document/9206429
|doi-access=free
}}</ref>
* An engine scans the software source code, and the associated artifacts used to compile a software application.
* The engine identifies the OSS components and their versions and usually stores this information in a database, creating a catalog of the OSS in use in the scanned application.
* This catalog is then compared to databases referencing known security vulnerabilities for each component, the licensing requirements for using the component, and the historical versions of the component.<ref>
{{Cite conference
|last1=Chen|first1=Yang
|last2=Santosa|first2=Andrew E
|last3=Yi|first3=Ang Ming
|last4=Sharma|first4=Abhishek
|last5=Sharma|first5=Asankhaya
|last6=Lo|first6=David
|date=2020
|title=A Machine Learning Approach for Vulnerability Curation
|conference=Proceedings of the 17th International Conference on Mining Software Repositories
|pages=32–42
|doi=10.1145/3379597.3387461
}}</ref> For security vulnerability detection, this comparison is typically made against known security vulnerabilities (CVEs) that are tracked in the [[National Vulnerability Database]] (NVD). Some products use an additional proprietary database of vulnerabilities. For [[Legal_governance,_risk_management,_and_compliance#Legal_compliance|IP / Legal Compliance]], SCA products will extract and evaluate the type of licensing used for the OSS component.<ref>
{{Cite book
|last1=Duan|first1=Ruian
|chapter-url=https://dl.acm.org/doi/pdf/10.1145/3133956.3134048
}}</ref> Versions of components are extracted from popular open-source repositories such as [[GitHub]], [[Apache Maven|Maven]], [[Python Package Index|PyPI]], [[NuGet]], and many others.
* Modern SCA systems have incorporated advanced analysis techniques to improve accuracy and reduce false positives. Notable contributions include '''vulnerable method analysis''', which determines whether vulnerable methods identified in dependencies are actually reachable from the application code. This approach, pioneered by [[Asankhaya Sharma]] and colleagues, uses call graph analysis to trace execution paths from application entry points to vulnerability-specific sinks in third-party libraries.<ref>
{{Cite arxiv
|last1=Foo|first1=Darius
|last2=Yeo|first2=Jason
|last3=Xiao|first3=Hao
|last4=Sharma|first4=Asankhaya
|title=The Dynamics of Software Composition Analysis
|date=2019
|eprint=1909.00973
|class=cs.SE
}}</ref>
* '''Hybrid static-dynamic analysis''' techniques combine statically constructed call graphs with dynamic instrumentation to improve the performance of false-positive elimination. This modular approach addresses limitations of purely static analysis, which can introduce both false positives and false negatives on real-world projects.<ref>
{{Cite arxiv
|last1=Foo|first1=Darius
|last2=Yeo|first2=Jason
|last3=Xiao|first3=Hao
|last4=Sharma|first4=Asankhaya
|title=The Dynamics of Software Composition Analysis
|date=2019
|eprint=1909.00973
|class=cs.SE
}}</ref>
* '''Machine learning-based vulnerability curation''' automates the process of building and maintaining vulnerability databases by predicting the vulnerability-relatedness of data items from various sources such as bug tracking systems, commits, and mailing lists. These systems use self-training techniques to iteratively improve model quality and include deployment stability metrics to evaluate new models before production deployment.<ref>
{{Cite conference
|last1=Chen|first1=Yang
|last2=Santosa|first2=Andrew E
|last3=Yi|first3=Ang Ming
|last4=Sharma|first4=Abhishek
|last5=Sharma|first5=Asankhaya
|last6=Lo|first6=David
|date=2020
|title=A Machine Learning Approach for Vulnerability Curation
|conference=Proceedings of the 17th International Conference on Mining Software Repositories
|pages=32–42
|doi=10.1145/3379597.3387461
}}</ref>
* '''Natural language processing techniques''' for automated vulnerability identification analyze commit messages and bug reports to identify security-related issues that may not have been publicly disclosed. This approach uses machine learning classifiers trained on textual features extracted from development artifacts to discover previously unknown vulnerabilities in open-source libraries.<ref>
{{Cite conference
|last1=Zhou|first1=Yaqin
|last2=Sharma|first2=Asankhaya
|date=2017
|title=Automated identification of security issues from commit messages and bug reports
|conference=Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
|pages=914–919
|doi=10.1145/3106237.3106293
}}</ref>
* The results are then made available to end users in different digital formats. The content and format depend on the SCA product and may include guidance to evaluate and interpret the risk, as well as recommendations, especially concerning the legal requirements of open-source components such as [[Copyleft#Strong_and_weak_copyleft|strong or weak copyleft]] licensing. The output may also contain a [[Software supply chain|Software Bill of Materials]] (SBOM) detailing all the open-source components and associated attributes used in a software application.<ref>
{{Cite journal
|date=2022
|title= Strengthening the Security of Operational Technology: Understanding Contemporary Bill of Materials
|journal=JCIP the Journal of Critical Infrastructure Policy
|volume=3
|pages=111–135
|doi=10.18278/jcip.3.1.8
|url=https://www.jcip1.org/uploads/1/3/6/5/136597491/jcip_3.1_online.pdf#page=117
}}</ref>
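
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow of any particular SCA product: the manifest format, the <code>ADVISORIES</code> table, the advisory identifiers and the licence policy are all hypothetical placeholders, whereas a real tool would query sources such as the NVD and a curated licence database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    license: str


# Hypothetical advisory data: (component name, affected versions, advisory ID).
# The IDs are placeholders, not real CVE identifiers.
ADVISORIES = [
    ("examplelib", {"1.0.0", "1.0.1"}, "ADVISORY-0001"),
    ("otherlib", {"2.3.0"}, "ADVISORY-0002"),
]

# Hypothetical licence policy: licences that should be flagged for review.
COPYLEFT_LICENSES = {"GPL-3.0-only", "AGPL-3.0-only"}


def parse_manifest(text):
    """Build the catalog from lines of the form 'name==version  # license'."""
    catalog = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        spec, _, license_id = line.partition("#")
        name, _, version = spec.strip().partition("==")
        catalog.append(
            Component(name.strip(), version.strip(), license_id.strip() or "UNKNOWN")
        )
    return catalog


def scan(catalog):
    """Compare each catalogued component with the advisory and licence data."""
    findings = []
    for comp in catalog:
        for name, affected, advisory_id in ADVISORIES:
            if comp.name == name and comp.version in affected:
                findings.append((comp.name, "vulnerability", advisory_id))
        if comp.license in COPYLEFT_LICENSES:
            findings.append((comp.name, "license", comp.license))
    return findings


MANIFEST = """
examplelib==1.0.1  # MIT
otherlib==2.4.0    # GPL-3.0-only
"""

catalog = parse_manifest(MANIFEST)  # doubles as a very small SBOM
findings = scan(catalog)
for finding in findings:
    print(finding)
```

In this sketch the catalog itself plays the role of a very small SBOM, and the findings list stands in for the report delivered to end users.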
 
==Advanced techniques==
 
Since the early 2010s, researchers have developed several advanced techniques to improve the accuracy and efficiency of SCA tools:
 
===Vulnerable method analysis===
Vulnerable method analysis addresses the problem of determining whether a vulnerability in a third-party library poses an actual risk to an application. Rather than simply detecting the presence of vulnerable libraries, this technique analyzes whether the specific vulnerable methods within those libraries are reachable from the application's execution paths. The approach involves constructing call graphs that map the relationships between application code and library methods, then determining if there exists a path from application entry points to vulnerability-specific sinks in the libraries.<ref>
{{Cite arxiv
|last1=Foo|first1=Darius
|last2=Yeo|first2=Jason
|last3=Xiao|first3=Hao
|last4=Sharma|first4=Asankhaya
|title=The Dynamics of Software Composition Analysis
|date=2019
|eprint=1909.00973
|class=cs.SE
}}</ref>
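
The reachability check described above can be illustrated with a breadth-first search over a call graph. The graph, the method names and the set of vulnerable methods below are hypothetical; the sketch assumes the call graph has already been constructed and does not reproduce the cited authors' implementation.

```python
from collections import deque


def reachable_sinks(call_graph, entry_points, vulnerable_methods):
    """Return the vulnerable methods reachable from any entry point (BFS)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen & set(vulnerable_methods)


# Hypothetical call graph: each key calls the methods listed as its value.
CALL_GRAPH = {
    "app.main": ["app.load_config", "lib.parse"],
    "app.load_config": ["lib.read_file"],
    "lib.parse": ["lib.tokenize"],
    "lib.unsafe_eval": ["lib.tokenize"],
}
# Hypothetical vulnerability-specific sinks in the third-party library.
VULNERABLE = {"lib.unsafe_eval", "lib.tokenize"}

hits = reachable_sinks(CALL_GRAPH, ["app.main"], VULNERABLE)
print(sorted(hits))
```

Here only <code>lib.tokenize</code> is reported, because no path leads from the entry point to <code>lib.unsafe_eval</code>; a tool that only checks for the presence of the library would flag both.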
 
===Machine learning for vulnerability databases===
Traditional vulnerability databases rely on manual curation by security researchers, which can be time-intensive and may miss relevant vulnerabilities. Machine learning approaches automate this process by training models to predict whether data items from various sources (such as bug reports, commits, and mailing lists) are vulnerability-related. These systems implement complete pipelines from data collection through model training and prediction, with iterative improvement mechanisms that generate better models as new data becomes available.<ref>
{{Cite conference
|last1=Chen|first1=Yang
|last2=Santosa|first2=Andrew E
|last3=Yi|first3=Ang Ming
|last4=Sharma|first4=Abhishek
|last5=Sharma|first5=Asankhaya
|last6=Lo|first6=David
|date=2020
|title=A Machine Learning Approach for Vulnerability Curation
|conference=Proceedings of the 17th International Conference on Mining Software Repositories
|pages=32–42
|doi=10.1145/3379597.3387461
}}</ref>
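
As a rough illustration of predicting vulnerability-relatedness from text, the sketch below trains a small multinomial naive Bayes classifier on labelled commit messages. Naive Bayes is used here only because it fits in a few lines; it is an assumption of this example, not the model described in the cited work, and the training messages are invented.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(samples):
    """Train a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return True if the text is more likely vulnerability-related."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Laplace smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


# Invented training data: True marks a vulnerability-related message.
TRAINING = [
    ("fix buffer overflow in parser", True),
    ("sanitize input to prevent sql injection", True),
    ("patch xss in template rendering", True),
    ("update readme with install steps", False),
    ("refactor build script", False),
    ("add unit tests for date formatting", False),
]

model = train(TRAINING)
print(predict(model, "fix overflow in input parser"))
print(predict(model, "update build script docs"))
```

An iterative pipeline of the kind described above would retrain such a model as newly labelled items arrive and compare the new model with the deployed one before replacing it.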
 
===Static analysis for library compatibility===
As SCA tools increasingly recommend library updates to address vulnerabilities, ensuring compatibility becomes critical. Advanced static analysis techniques can automatically detect [[API]] incompatibilities that would be introduced by library upgrades, enabling automated vulnerability remediation without breaking existing functionality. These lightweight analyses are designed to integrate into [[continuous integration]] and [[continuous delivery]] pipelines.<ref>
{{Cite conference
|last1=Foo|first1=Darius
|last2=Chua|first2=Hendy
|last3=Yeo|first3=Jason
|last4=Ang|first4=Ming Yi
|last5=Sharma|first5=Asankhaya
|date=2018
|title=Efficient static checking of library updates
|conference=Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
|pages=791–796
|doi=10.1145/3236024.3275535
}}</ref>
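
A much simplified version of such a compatibility check can be written with Python's standard <code>ast</code> module: extract the public functions of two library versions and report changes that affect functions the application uses. The library sources and function names below are hypothetical, and the comparison is deliberately conservative (any change to the positional parameter list is reported); it is a sketch of the idea rather than the analysis from the cited paper.

```python
import ast


def public_api(source):
    """Map each top-level public function name to its positional parameters."""
    api = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            api[node.name] = [arg.arg for arg in node.args.args]
    return api


def incompatibilities(old_source, new_source, used_functions):
    """List breaking changes that affect functions the application uses."""
    old_api, new_api = public_api(old_source), public_api(new_source)
    problems = []
    for name in sorted(used_functions):
        if name not in old_api:
            continue  # not part of the old library API; nothing to compare
        if name not in new_api:
            problems.append(f"{name}: removed")
        elif old_api[name] != new_api[name]:
            problems.append(
                f"{name}: parameters changed {old_api[name]} -> {new_api[name]}"
            )
    return problems


# Hypothetical library sources before and after an upgrade.
OLD_VERSION = """
def parse(text, strict):
    pass
def dump(obj):
    pass
def legacy_load(path):
    pass
"""
NEW_VERSION = """
def parse(text):
    pass
def dump(obj):
    pass
"""

report = incompatibilities(OLD_VERSION, NEW_VERSION, {"parse", "dump", "legacy_load"})
print(report)
```

Because the check only parses source text and never runs it, it is cheap enough to execute on every build, which is the property that makes this style of analysis suitable for CI/CD pipelines.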
 
|isbn=978-1-7281-8535-4
|s2cid=236193144
|chapter-url=https://ieeexplore.ieee.org/document/9482270
}}</ref>
 
}}</ref>
 
== Strengths ==
The automatic nature of SCA products is their primary strength. Developers do not have to do extra manual work when using and integrating OSS components.<ref>
{{Cite book
|url=https://ink.library.smu.edu.sg/sis_research/5501
|chapter-url=https://dl.acm.org/doi/pdf/10.1145/3377813.3381360
}}</ref> The automation also applies to indirect references to other OSS components within code and artifacts.<ref>
{{Cite book
|last1=Kengo Oka|first1=Dennis
|chapter= Software Composition Analysis in the Automotive Industry
|title=Building Secure Cars
|date=2021
|pages=91–110
|publisher=Wiley
|doi=10.1002/9781119710783.ch6
|isbn=9781119710783
|s2cid=233582862
}}</ref>
 
Modern SCA implementations have significantly improved accuracy through advanced analysis techniques. Vulnerable method analysis reduces false positives by determining actual reachability of vulnerable code paths, while machine learning approaches for vulnerability curation help maintain more comprehensive and up-to-date vulnerability databases. These advances address many traditional limitations of metadata-only approaches.<ref>
{{Cite arxiv
|last1=Foo|first1=Darius
|last2=Yeo|first2=Jason
|last3=Xiao|first3=Hao
|last4=Sharma|first4=Asankhaya
|title=The Dynamics of Software Composition Analysis
|date=2019
|eprint=1909.00973
|class=cs.SE
}}</ref>
 
== Weaknesses ==
Conversely, some key weaknesses of current SCA products may include:
* Complex and labor-intensive deployment that can take months to become fully operational<ref>
}}</ref>
* Lack of guidance on the legal requirements of the OSS licenses that are detected<ref>
{{Cite web
|last1=Millar|first1=Stuart
|date=November 2017
|title= Vulnerability Detection in Open Source Software: The Cure and the Cause
|publisher=Queen's University Belfast
|url=https://pureadmin.qub.ac.uk/ws/portalfiles/portal/128394396/SMillar_13616005_VulnerabilityDetectionInOSS.pdf
}}</ref>
* [[Open-source license]]
* [[Software intelligence]]
* [[Asankhaya Sharma]]
* [[Static program analysis]]
* [[Call graph]]
 
==References==
{{reflist}}
 
[[Category:Information technology governance]]
[[Category:Software]]