Code property graph: Difference between revisions

{{Short description|Representation of a computer program}}
In [[computer science]], a '''code property graph''' (CPG) is a [[computer program]] representation that captures [[Abstract syntax tree|syntactic structure]], [[Control-flow graph|control flow]], and [[data dependencies]] in a [[Graph database|property graph]]. The concept was originally introduced to identify security vulnerabilities in [[C (programming language)|C]] and [[C++]] system code,<ref>{{cite book |last1=Yamaguchi |first1=Fabian |last2=Golde |first2=Nico |last3=Arp |first3=Daniel |last4=Rieck |first4=Konrad |title=2014 IEEE Symposium on Security and Privacy |chapter=Modeling and Discovering Vulnerabilities with Code Property Graphs |date=May 2014 |pages=590–604 |doi=10.1109/SP.2014.44 |isbn=978-1-4799-4686-0 |s2cid=2231082 }}</ref> but has since been employed to analyze [[web application]]s,<ref>{{cite book |last1=Backes |first1=Michael |last2=Rieck |first2=Konrad |last3=Skoruppa |first3=Malte |last4=Stock |first4=Ben |last5=Yamaguchi |first5=Fabian |title=2017 IEEE European Symposium on Security and Privacy (EuroS&P) |chapter=Efficient and Flexible Discovery of PHP Application Vulnerabilities |date=April 2017 |pages=334–349 |doi=10.1109/EuroSP.2017.14 |isbn=978-1-5090-5762-7 |s2cid=206649536 }}</ref><ref>{{cite book |last1=Li |first1=Song |last2=Kang |first2=Mingqing |last3=Hou |first3=Jianwei |last4=Cao |first4=Yinzhi |title=Mining Node.js Vulnerabilities via Object Dependence Graph and Query |date=2022 |pages=143–160 |isbn=9781939133311 |url=https://www.usenix.org/conference/usenixsecurity22/presentation/li-song |language=en}}</ref><ref>{{cite journal |last1=Brito |first1=Tiago |last2=Lopes |first2=Pedro |last3=Santos |first3=Nuno |last4=Santos |first4=José Fragoso |title=Wasmati: An efficient static vulnerability scanner for WebAssembly |journal=Computers & Security |date=1 July 2022 |volume=118 |pages=102745 |doi=10.1016/j.cose.2022.102745 |arxiv=2204.12575 |s2cid=248405811 }}</ref><ref>{{cite book |last1=Khodayari |first1=Soheil |last2=Pellegrino |first2=Giancarlo |title=JAW: Studying Client-side CSRF with Hybrid Property Graphs and Declarative Traversals |date=2021 |pages=2525–2542 |isbn=9781939133243 |url=https://www.usenix.org/conference/usenixsecurity21/presentation/khodayari |language=en}}</ref> cloud deployments,<ref>{{cite book |last1=Banse |first1=Christian |last2=Kunz |first2=Immanuel |last3=Schneider |first3=Angelika |last4=Weiss |first4=Konrad |title=2021 IEEE 14th International Conference on Cloud Computing (CLOUD) |chapter=Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis |date=September 2021 |pages=13–19 |doi=10.1109/CLOUD53861.2021.00014 |arxiv=2206.06938 |isbn=978-1-6654-0060-2 |s2cid=243946828 }}</ref> and smart contracts.<ref>{{cite journal |last1=Giesen |first1=Jens-Rene |last2=Andreina |first2=Sebastien |last3=Rodler |first3=Michael |last4=Karame |first4=Ghassan |last5=Davi |first5=Lucas |title=Practical Mitigation of Smart Contract Bugs {{!}} TeraFlow |website=www.teraflow-h2020.eu |url=https://www.teraflow-h2020.eu/publications/practical-mitigation-smart-contract-bugs}}</ref> Beyond vulnerability discovery, code property graphs find applications in code clone detection,<ref>{{cite book |last1=Wi |first1=Seongil |last2=Woo |first2=Sijae |last3=Whang |first3=Joyce Jiyoung |last4=Son |first4=Sooel |title=Proceedings of the ACM Web Conference 2022 |chapter=HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs |date=25 April 2022 |pages=755–766 |doi=10.1145/3485447.3512235 |isbn=9781450390965 |s2cid=248367462 }}</ref><ref>{{cite book |last1=Bowman |first1=Benjamin |last2=Huang |first2=H. Howie |title=2020 IEEE European Symposium on Security and Privacy (EuroS&P) |chapter=VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets |date=September 2020 |pages=53–69 |doi=10.1109/EuroSP48549.2020.00012 |isbn=978-1-7281-5087-1 |s2cid=226268429 }}</ref> attack-surface detection,<ref>{{cite book |last1=Du |first1=Xiaoning |last2=Chen |first2=Bihuan |last3=Li |first3=Yuekang |last4=Guo |first4=Jianmin |last5=Zhou |first5=Yaqin |last6=Liu |first6=Yang |last7=Jiang |first7=Yu |title=2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) |chapter=LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment Through Program Metrics |date=May 2019 |pages=60–71 |doi=10.1109/ICSE.2019.00024 |arxiv=1901.11479 |isbn=978-1-7281-0869-8 |s2cid=59523689 }}</ref> exploit generation,<ref>{{cite book |last1=Alhuzali |first1=Abeer |last2=Gjomemo |first2=Rigel |last3=Eshete |first3=Birhanu |last4=Venkatakrishnan |first4=V. N. |title=NAVEX: Precise and Scalable Exploit Generation for Dynamic Web Applications |date=2018 |pages=377–392 |isbn=9781939133045 |url=https://www.usenix.org/conference/usenixsecurity18/presentation/alhuzali |language=en}}</ref> measuring code testability,<ref>{{cite journal |last1=Al Kassar |first1=Feras |last2=Clerici |first2=Giulia |last3=Compagna |first3=Luca |last4=Balzarotti |first4=Davide |last5=Yamaguchi |first5=Fabian |title=Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications – NDSS Symposium |journal=NDSS Symposium |url=https://www.ndss-symposium.org/ndss-paper/auto-draft-206/}}</ref> and backporting of security patches.<ref>{{cite book |last1=Shi |first1=Youkun |last2=Zhang |first2=Yuan |last3=Luo |first3=Tianhan |last4=Mao |first4=Xiangyu |last5=Cao |first5=Yinzhi |last6=Wang |first6=Ziwen |last7=Zhao |first7=Yudi |last8=Huang |first8=Zongan |last9=Yang |first9=Min |title=Backporting Security Patches of Web Applications: A Prototype Design and Implementation on Injection Vulnerability Patches |date=2022 |pages=1993–2010 |isbn=9781939133311 |url=https://www.usenix.org/conference/usenixsecurity22/presentation/shi |language=en}}</ref>
 
== Definition ==
A code property graph of a program is a graph representation of the program obtained by merging its [[abstract syntax tree]]s (AST), [[control-flow graph]]s (CFG) and [[program dependence graph]]s (PDG) at statement and predicate nodes. The resulting graph is a property graph, the underlying graph model of [[graph database]]s such as [[Neo4j]], [[JanusGraph]] and [[OrientDB]], in which data is stored in the nodes and edges as [[key-value pair]]s. As a result, code property graphs can be stored in graph databases and queried using graph query languages.
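
The merging step described above can be illustrated with a minimal sketch. It is not taken from any of the implementations listed below: the <code>PropertyGraph</code> class, the <code>merge</code> function, the node identifiers and the three statements are hypothetical and chosen only for illustration. Three separate graphs (AST, CFG and PDG) refer to the same statement and predicate nodes by identifier, and their union yields one property graph whose edges carry a <code>label</code> key.

<syntaxhighlight lang="python">
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory property graph: nodes and edges carry key-value pairs."""
    nodes: dict = field(default_factory=dict)  # node id -> {key: value}
    edges: list = field(default_factory=list)  # (source id, target id, {key: value})

    def add_node(self, node_id, **props):
        # Re-adding an existing id merges properties instead of duplicating the node.
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, src, dst, label, **props):
        # Endpoints are created on demand so a graph may mention a node by id only.
        self.add_node(src)
        self.add_node(dst)
        self.edges.append((src, dst, {"label": label, **props}))

    def out(self, node_id, label):
        """Targets of edges with the given label that leave node_id."""
        return [dst for src, dst, props in self.edges
                if src == node_id and props["label"] == label]


def merge(*graphs):
    """Union of several graphs; nodes that share an id become one node."""
    cpg = PropertyGraph()
    for graph in graphs:
        for node_id, props in graph.nodes.items():
            cpg.add_node(node_id, **props)
        for src, dst, props in graph.edges:
            cpg.edges.append((src, dst, dict(props)))
    return cpg


def build_example_cpg():
    # Hypothetical statements, assumed for illustration only.
    ast, cfg, pdg = PropertyGraph(), PropertyGraph(), PropertyGraph()

    # AST: statement and predicate nodes plus two syntactic children of s1.
    ast.add_node("s1", type="DECL", code="int x = source();")
    ast.add_node("p1", type="PRED", code="x < MAX")
    ast.add_node("s2", type="CALL", code="sink(x);")
    ast.add_node("a1", type="IDENTIFIER", code="x")
    ast.add_node("a2", type="CALL", code="source()")
    ast.add_edge("s1", "a1", "AST")
    ast.add_edge("s1", "a2", "AST")

    # CFG: execution order between statements and predicates.
    cfg.add_edge("s1", "p1", "CFG")
    cfg.add_edge("p1", "s2", "CFG", condition="true")

    # PDG: data dependencies on x and a control dependency on the predicate.
    pdg.add_edge("s1", "p1", "PDG", kind="data", var="x")
    pdg.add_edge("s1", "s2", "PDG", kind="data", var="x")
    pdg.add_edge("p1", "s2", "PDG", kind="control")

    return merge(ast, cfg, pdg)


if __name__ == "__main__":
    cpg = build_example_cpg()
    # Query: calls that depend on the statement defining x.
    print([n for n in cpg.out("s1", "PDG") if cpg.nodes[n]["type"] == "CALL"])
</syntaxhighlight>

Because all three graphs share the identifiers <code>s1</code>, <code>p1</code> and <code>s2</code>, a single traversal can combine syntactic, control-flow and dependence information. A graph database would express the same query in its own query language.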
 
== Example ==
 
Consider the function of a [[C (programming language)|C]] program:
<syntaxhighlight lang="c">
void foo() {
</syntaxhighlight>
 
The code property graph of the function is obtained by merging its abstract syntax tree, control-flow graph, and program dependence graph at statement and predicate nodes, as shown in the following figure:
 
[[File:CodePropertyGraph.png|700px|Code property graph of a sample C code snippet]]
 
== Implementations ==
 
'''Joern CPG.''' The original code property graph was implemented for C/C++ in 2013 at [[University of Göttingen]] as part of the open-source code analysis tool Joern.<ref>{{cite web |title=Joern - A Robust Code Analysis Platform for C/C++ |url=http://www.mlsec.org/joern/index.shtml |website=www.mlsec.org}}</ref> This original version has been discontinued and superseded by the open-source Joern Project,<ref>{{cite web |title=Joern - The Bug Hunter's Workbench |url=https://joern.io |website=Joern - The Bug Hunter's Workbench |language=en}}</ref> which provides a formal code property graph specification<ref>{{cite web |title=Code Property Graph Specification |url=http://cpg.joern.io/ |website=cpg-spec.github.io |language=en}}</ref> applicable to multiple programming languages. The project provides code property graph generators for C/C++, Java, Java bytecode, Kotlin, Python, JavaScript, TypeScript, LLVM bitcode, and x86 binaries (via the [[Ghidra]] disassembler).
 
'''Plume CPG.''' Developed at [[Stellenbosch University]] in 2020 and sponsored by Amazon Science, the open-source Plume<ref>{{cite web |title=Plume |url=https://plume-oss.github.io/plume-docs/ |website=plume-oss.github.io}}</ref> project provides a code property graph for Java bytecode compatible with the code property graph specification provided by the Joern project. The two projects merged in 2021.
 
'''Fraunhofer AISEC CPG.''' The {{ill|Fraunhofer Institute for Applied and Integrated Security|de|Fraunhofer-Institut für Angewandte und Integrierte Sicherheit}} provides open-source code property graph generators for C/C++, Java, Golang, Python, TypeScript and LLVM-IR.<ref>{{cite web |title=Code Property Graph |url=https://github.com/Fraunhofer-AISEC/cpg |publisher=Fraunhofer AISEC |date=31 August 2022}}</ref> It also includes a formal specification of the graph<ref>{{Cite web |title=Specifications - Code Property Graph |url=https://fraunhofer-aisec.github.io/cpg/CPG/specs/ |access-date=2025-01-10 |website=fraunhofer-aisec.github.io}}</ref> and its various node types. Furthermore, it provides the Cloud Property Graph,<ref>{{cite book |last1=Banse |first1=Christian |last2=Kunz |first2=Immanuel |last3=Schneider |first3=Angelika |last4=Weiss |first4=Konrad |title=2021 IEEE 14th International Conference on Cloud Computing (CLOUD) |chapter=Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis |date=September 2021 |pages=13–19 |doi=10.1109/CLOUD53861.2021.00014 |arxiv=2206.06938 |isbn=978-1-6654-0060-2 |s2cid=243946828 }}</ref> an extension of the code property graph concept that models details of cloud deployments.
 
'''Galois’ CPG for LLVM.''' Galois Inc. provides a code property graph based on the [[LLVM]] compiler.<ref>{{cite web |title=The Code Property Graph — MATE 0.1.0.0 documentation |url=https://galoisinc.github.io/MATE/cpg.html |website=galoisinc.github.io}}</ref> The graph represents code at different stages of the compilation and a mapping between these representations. It follows a custom schema that is defined in its documentation.
 
== Machine learning on code property graphs ==
Code property graphs provide the basis for several machine-learning-based approaches to vulnerability discovery. In particular, [[graph neural network]]s (GNN) have been employed to derive vulnerability detectors.<ref>{{cite journal |last1=Zhou |first1=Yaqin |last2=Liu |first2=Shangqing |last3=Siow |first3=Jingkai |last4=Du |first4=Xiaoning |last5=Liu |first5=Yang |title=Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks |journal=Proceedings of the 33rd International Conference on Neural Information Processing Systems |date=8 December 2019 |pages=10197–10207 |url=https://dl.acm.org/doi/10.5555/3454287.3455202 |publisher=Curran Associates Inc. |arxiv=1909.03496 }}</ref><ref>{{cite book |last1=Haojie |first1=Zhang |last2=Yujun |first2=Li |last3=Yiwei |first3=Liu |last4=Nanxin |first4=Zhou |title=2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP) |chapter=Vulmg: A Static Detection Solution for Source Code Vulnerabilities Based on Code Property Graph and Graph Attention Network |date=December 2021 |pages=250–255 |doi=10.1109/ICCWAMTIP53232.2021.9674145 |isbn=978-1-6654-1364-0 |s2cid=246039350 }}</ref><ref>{{cite book |last1=Zheng |first1=Weining |last2=Jiang |first2=Yuan |last3=Su |first3=Xiaohong |title=2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) |chapter=Vu1SPG: Vulnerability detection based on slice property graph representation learning |date=October 2021 |pages=457–467 |doi=10.1109/ISSRE52982.2021.00054 |isbn=978-1-6654-2587-2 |s2cid=246751595 }}</ref><ref>{{cite journal |last1=Chakraborty |first1=Saikat |last2=Krishna |first2=Rahul |last3=Ding |first3=Yangruibo |last4=Ray |first4=Baishakhi |title=Deep Learning based Vulnerability Detection: Are We There Yet |journal=IEEE Transactions on Software Engineering |date=2021 |volume=48 |issue=9 |pages=3280–3296 |doi=10.1109/TSE.2021.3087402 |arxiv=2009.07235 |s2cid=221703797 }}</ref><ref>{{cite book |last1=Zhou |first1=Li |last2=Huang |first2=Minhuan |last3=Li |first3=Yujun |last4=Nie |first4=Yuanping |last5=Li |first5=Jin |last6=Liu |first6=Yiwei |title=2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC) |chapter=GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network |date=October 2021 |pages=381–388 |doi=10.1109/DSC53577.2021.00060 |arxiv=2202.02501 |isbn=978-1-6654-1815-7 |s2cid=246634824 }}</ref><ref>{{cite book |last1=Ganz |first1=Tom |last2=Härterich |first2=Martin |last3=Warnecke |first3=Alexander |last4=Rieck |first4=Konrad |title=Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security |chapter=Explaining Graph Neural Networks for Vulnerability Discovery |date=15 November 2021 |pages=145–156 |doi=10.1145/3474369.3486866 |isbn=9781450386579 |s2cid=240001850 |doi-access=free }}</ref><ref>{{cite book |last1=Duan |first1=Xu |last2=Wu |first2=Jingzheng |last3=Ji |first3=Shouling |last4=Rui |first4=Zhiqing |last5=Luo |first5=Tianyue |last6=Yang |first6=Mutian |last7=Wu |first7=Yanjun |title=Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence |chapter=VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities |date=August 2019 |pages=4665–4671 |doi=10.24963/ijcai.2019/648 |isbn=978-0-9992411-4-1 |s2cid=199466292 |doi-access=free }}</ref>
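
As a rough illustration of how a code property graph can serve as input to a graph neural network, the following sketch encodes nodes as one-hot vectors of their type and performs one round of neighbourhood aggregation, kept separate per edge label. It does not reproduce any of the cited systems. The node types, edge labels and function names are assumptions made for this example, and a real GNN would additionally apply learned weight matrices and non-linearities, which are omitted here.

<syntaxhighlight lang="python">
# Assumed vocabularies for this sketch; real systems use far richer ones.
NODE_TYPES = ["DECL", "PRED", "CALL"]
EDGE_LABELS = ["AST", "CFG", "PDG"]


def one_hot(node_type):
    """Initial node feature: one-hot encoding of the node type."""
    return [1.0 if node_type == t else 0.0 for t in NODE_TYPES]


def aggregate_round(features, edges):
    """One unweighted aggregation round.

    features: node id -> feature vector
    edges:    list of (source id, target id, edge label)
    Each node keeps its own vector and appends, for every edge label,
    the sum of the vectors of its incoming neighbours over that label.
    """
    dim = len(next(iter(features.values())))
    updated = {}
    for node, own in features.items():
        parts = list(own)
        for label in EDGE_LABELS:
            total = [0.0] * dim
            for src, dst, edge_label in edges:
                if dst == node and edge_label == label:
                    total = [a + b for a, b in zip(total, features[src])]
            parts.extend(total)
        updated[node] = parts
    return updated


def readout(features):
    """Graph-level vector: element-wise sum over all node vectors."""
    dim = len(next(iter(features.values())))
    total = [0.0] * dim
    for vector in features.values():
        total = [a + b for a, b in zip(total, vector)]
    return total


if __name__ == "__main__":
    # Hypothetical three-node graph: a declaration, a predicate and a call.
    features = {"s1": one_hot("DECL"), "p1": one_hot("PRED"), "s2": one_hot("CALL")}
    edges = [
        ("s1", "p1", "CFG"), ("p1", "s2", "CFG"),
        ("s1", "p1", "PDG"), ("s1", "s2", "PDG"), ("p1", "s2", "PDG"),
    ]
    hidden = aggregate_round(features, edges)
    print(readout(hidden))  # fixed-length vector a classifier could consume
</syntaxhighlight>

Keeping one aggregation per edge label preserves the distinction between syntactic, control-flow and dependence edges, which is the information a code property graph adds over a plain graph of statements.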
 
== See also ==
* [[Abstract syntax tree]] (AST)
* [[Control-flow graph]] (CFG)
* [[Program dependence graph]] (PDG)
* [[Graph database]]
 
==References==
{{reflist}}
 
[[Category:Computer security software]]
[[Category:Application-specific graphs]]