Code property graph

{{Draft topics|stem}}
 
In [[computer science]], a '''code property graph''' (CPG) is a [[computer program]] representation that captures [[Abstract syntax tree|syntactic structure]], [[Control-flow graph|control flow]], and [[data dependencies]] in a [[Graph database|property graph]]. The concept was originally introduced to identify security vulnerabilities in [[C (programming language)|C]] and [[C++]] system code,<ref>{{cite journal |last1=Yamaguchi |first1=Fabian |last2=Golde |first2=Nico |last3=Arp |first3=Daniel |last4=Rieck |first4=Konrad |title=Modeling and Discovering Vulnerabilities with Code Property Graphs |journal=2014 IEEE Symposium on Security and Privacy |date=May 2014 |pages=590–604 |doi=10.1109/SP.2014.44}}</ref> but has since been employed to analyze [[web application]]s,<ref>{{cite journal |last1=Backes |first1=Michael |last2=Rieck |first2=Konrad |last3=Skoruppa |first3=Malte |last4=Stock |first4=Ben |last5=Yamaguchi |first5=Fabian |title=Efficient and Flexible Discovery of PHP Application Vulnerabilities |journal=2017 IEEE European Symposium on Security and Privacy (EuroS&P) |date=April 2017 |pages=334–349 |doi=10.1109/EuroSP.2017.14}}</ref><ref>{{cite journal |last1=Li |first1=Song |last2=Kang |first2=Mingqing |last3=Hou |first3=Jianwei |last4=Cao |first4=Yinzhi |title=Mining Node.js Vulnerabilities via Object Dependence Graph and Query |date=2022 |pages=143–160 |url=https://www.usenix.org/conference/usenixsecurity22/presentation/li-song |language=en}}</ref><ref>{{cite journal |last1=Brito |first1=Tiago |last2=Lopes |first2=Pedro |last3=Santos |first3=Nuno |last4=Santos |first4=José Fragoso |title=Wasmati: An efficient static vulnerability scanner for WebAssembly |journal=Computers & Security |date=1 July 2022 |volume=118 |pages=102745 |doi=10.1016/j.cose.2022.102745}}</ref><ref>{{cite journal |last1=Khodayari |first1=Soheil |last2=Pellegrino |first2=Giancarlo |title=JAW: Studying Client-side CSRF with Hybrid Property Graphs and Declarative Traversals
|date=2021 |pages=2525–2542 |url=https://www.usenix.org/conference/usenixsecurity21/presentation/khodayari |language=en}}</ref> cloud deployments,<ref>{{cite journal |last1=Banse |first1=Christian |last2=Kunz |first2=Immanuel |last3=Schneider |first3=Angelika |last4=Weiss |first4=Konrad |title=Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis |journal=2021 IEEE 14th International Conference on Cloud Computing (CLOUD) |date=September 2021 |pages=13–19 |doi=10.1109/CLOUD53861.2021.00014}}</ref> and smart contracts.<ref>{{cite journal |last1=Giesen |first1=Jens-Rene |last2=Andreina |first2=Sebastien |last3=Rodler |first3=Michael |last4=Karame |first4=Ghassan |last5=Davi |first5=Lucas |title=Practical Mitigation of Smart Contract Bugs {{!}} TeraFlow |journal=www.teraflow-h2020.eu |url=https://www.teraflow-h2020.eu/publications/practical-mitigation-smart-contract-bugs}}</ref> Beyond vulnerability discovery, code property graphs find applications in code clone detection,<ref>{{cite journal |last1=Wi |first1=Seongil |last2=Woo |first2=Sijae |last3=Whang |first3=Joyce Jiyoung |last4=Son |first4=Sooel |title=HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs |journal=Proceedings of the ACM Web Conference 2022 |date=25 April 2022 |pages=755–766 |doi=10.1145/3485447.3512235}}</ref><ref>{{cite journal |last1=Bowman |first1=Benjamin |last2=Huang |first2=H. 
Howie |title=VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets |journal=2020 IEEE European Symposium on Security and Privacy (EuroS&P) |date=September 2020 |pages=53–69 |doi=10.1109/EuroSP48549.2020.00012}}</ref> attack-surface detection,<ref>{{cite journal |last1=Du |first1=Xiaoning |last2=Chen |first2=Bihuan |last3=Li |first3=Yuekang |last4=Guo |first4=Jianmin |last5=Zhou |first5=Yaqin |last6=Liu |first6=Yang |last7=Jiang |first7=Yu |title=LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment Through Program Metrics |journal=2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) |date=May 2019 |pages=60–71 |doi=10.1109/ICSE.2019.00024}}</ref> exploit generation,<ref>{{cite journal |last1=Alhuzali |first1=Abeer |last2=Gjomemo |first2=Rigel |last3=Eshete |first3=Birhanu |last4=Venkatakrishnan |first4=V. N. |title=NAVEX: Precise and Scalable Exploit Generation for Dynamic Web Applications |date=2018 |pages=377–392 |url=https://www.usenix.org/conference/usenixsecurity18/presentation/alhuzali |language=en}}</ref> measuring code testability,<ref>{{cite journal |last1=Al Kassar |first1=Feras |last2=Clerici |first2=Giulia |last3=Compagna |first3=Luca |last4=Balzarotti |first4=Davide |last5=Yamaguchi |first5=Fabian |title=Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications – NDSS Symposium |journal=NDSS Symposium |url=https://www.ndss-symposium.org/ndss-paper/auto-draft-206/}}</ref> and backporting of security patches.<ref>{{cite journal |last1=Shi |first1=Youkun |last2=Zhang |first2=Yuan |last3=Luo |first3=Tianhan |last4=Mao |first4=Xiangyu |last5=Cao |first5=Yinzhi |last6=Wang |first6=Ziwen |last7=Zhao |first7=Yudi |last8=Huang |first8=Zongan |last9=Yang |first9=Min |title=Backporting Security Patches of Web Applications: A Prototype Design and Implementation on Injection Vulnerability Patches |date=2022 |pages=1993–2010 
|url=https://www.usenix.org/conference/usenixsecurity22/presentation/shi |language=en}}</ref>
 
== Definition ==
A code property graph of a program is a graph representation obtained by merging the program's [[abstract syntax tree]]s (AST), [[control-flow graph]]s (CFG), and [[program dependence graph]]s (PDG) at statement and predicate nodes. The resulting graph is a property graph, the underlying graph model of [[graph database]]s such as [[Neo4j]], [[JanusGraph]], and [[OrientDB]], in which data is stored in nodes and edges as [[key-value pair]]s. Consequently, code property graphs can be stored in graph databases and queried using graph query languages.
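
The merge described above can be illustrated with a minimal Python sketch. It is not the schema of any particular tool: the <code>PropertyGraph</code> class, the node identifiers (<code>s1</code>, <code>p1</code>, <code>s2</code>, <code>c1</code>), and the property keys (<code>label</code>, <code>code</code>, <code>type</code>) are assumptions made for illustration only. Three small graphs for a three-line program are built separately and then merged; because the statement and predicate nodes share identifiers across the three graphs, they become the joining points.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy property graph: nodes and edges both carry key-value pairs."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # (source, target, properties)

    def add_node(self, node_id, **props):
        # Re-adding an existing node only extends its properties.
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, src, dst, **props):
        self.edges.append((src, dst, props))

    def merge(self, other):
        # Nodes with equal ids are unified; all edges are kept.
        for node_id, props in other.nodes.items():
            self.add_node(node_id, **props)
        self.edges.extend(other.edges)

    def out(self, node_id, label):
        # Targets reachable from node_id over edges with the given label.
        return [dst for src, dst, props in self.edges
                if src == node_id and props.get("label") == label]


# Hypothetical program being modelled:
#   s1: x = source()
#   p1: if (x < MAX)
#   s2:     sink(x)

ast = PropertyGraph()
ast.add_node("s1", type="statement", code="x = source()")
ast.add_node("c1", type="call", code="source()")
ast.add_edge("s1", "c1", label="AST")

cfg = PropertyGraph()
cfg.add_node("s1", type="statement")
cfg.add_node("p1", type="predicate", code="x < MAX")
cfg.add_node("s2", type="statement", code="sink(x)")
cfg.add_edge("s1", "p1", label="CFG")
cfg.add_edge("p1", "s2", label="CFG", condition="true")

pdg = PropertyGraph()
for node_id in ("s1", "p1", "s2"):
    pdg.add_node(node_id)
pdg.add_edge("s1", "p1", label="PDG", kind="data", variable="x")
pdg.add_edge("s1", "s2", label="PDG", kind="data", variable="x")
pdg.add_edge("p1", "s2", label="PDG", kind="control")

# Merging at the shared statement and predicate nodes gives the combined graph.
cpg = PropertyGraph()
for graph in (ast, cfg, pdg):
    cpg.merge(graph)
```

In this sketch, <code>cpg.out("s1", "PDG")</code> lists the nodes that depend on the assignment, while <code>cpg.out("s1", "AST")</code> descends into its syntax. A graph database would express the same traversals in its query language.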
 
== Example ==