Profiling (computer programming)

{{Short description|Measuring the time or resources used by a section of a computer program}}
{{more citations needed|date=January 2009}}
{{Software development process|Tools}}
In [[software engineering]], '''profiling''' ("'''program profiling'''", "'''software profiling'''") is a form of [[dynamic program analysis]] that measures, for example, the space (memory) or time [[Computational complexity theory|complexity of a program]], the [[instruction set simulator|usage of particular instructions]], or the frequency and duration of function calls. Most commonly, profiling information serves to aid [[program optimization]], and more specifically, [[performance engineering]].
 
Profiling is achieved by [[Instrumentation (computer programming)|instrumenting]] either the program [[source code]] or its binary executable form using a tool called a ''profiler'' (or ''code profiler''). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.
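
As a minimal illustration of source-level instrumentation, the following Python sketch wraps a function so that every call updates a call count and a cumulative duration. The names <code>PROFILE</code>, <code>instrument</code> and <code>checksum</code> are hypothetical and chosen only for this example; production profilers typically insert such probes automatically rather than by hand.

```python
import functools
import time

# Hypothetical registry: function name -> [call count, cumulative seconds].
PROFILE = {}


def instrument(func):
    """Wrap func so each call updates its count and cumulative duration."""
    stats = PROFILE.setdefault(func.__name__, [0, 0.0])

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even if func raised an exception.
            stats[0] += 1
            stats[1] += time.perf_counter() - start

    return wrapper


@instrument
def checksum(data):
    """Toy workload used only for illustration."""
    return sum(data) % 255


for _ in range(3):
    checksum(range(1000))

# PROFILE["checksum"] now holds the call count (3) and the total time spent.
```

The timing code itself runs inside the measured program, which is one reason instrumented measurements can perturb the behaviour they are trying to observe.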
 
== Gathering program events ==
Profilers use a wide variety of techniques to collect data, including [[hardware interrupt]]s, [[Instrumentation (computer programming)|code instrumentation]], [[instruction set simulator|instruction set simulation]], operating system [[hooking|hooks]], and [[Hardware performance counter|performance counter]]s.
 
== Use of profilers ==
[[File:CodeAnalyst3.png|thumb|Graphical output of the [[CodeAnalyst]] profiler.]]
{{quotation|text=
Program analysis tools are extremely important for understanding program behavior. Computer architects need such tools to evaluate how well programs will perform on new [[computer architecture|architectures]]. Software writers need tools to analyze their programs and identify critical sections of code. [[Compiler]] writers often use such tools to find out how well their [[instruction scheduling]] or [[branch prediction]] algorithm is performing...|author=ATOM|source=[[Conference on Programming Language Design and Implementation|PLDI]] '94}}
 
The output of a profiler may be:
 
 /* ------------ source------------------------- count */
 0001             IF X = "A"                     0055
 0002                THEN DO
 0003                  ADD 1 to XCOUNT           0032
 0004                ELSE
 0005             IF X = "B"                     0055
 
* A stream of recorded events (a '''trace''')
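
The difference between a per-statement summary and a trace can be sketched in Python using the interpreter's <code>sys.settrace</code> hook. This is only an illustration: the helper <code>count_lines</code> and the toy function <code>classify</code> are hypothetical, and lines are identified by their offset from the function's <code>def</code> line.

```python
import sys
from collections import Counter


def count_lines(func, *args):
    """Run func(*args) and return (result, counts, trace).

    counts maps a line offset within func to its execution count (a summary);
    trace lists the same offsets in execution order (a trace).
    """
    code = func.__code__
    counts, trace = Counter(), []

    def local_tracer(frame, event, arg):
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            counts[offset] += 1
            trace.append(offset)
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only trace frames that execute the function of interest.
        return local_tracer if frame.f_code is code else None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever hook was installed before
    return result, counts, trace


def classify(values):          # offset 0
    a_count = 0                # offset 1
    for x in values:           # offset 2
        if x == "A":           # offset 3
            a_count += 1       # offset 4
    return a_count             # offset 5


result, counts, trace = count_lines(classify, ["A", "B", "A"])
```

Here <code>counts</code> plays the role of the count column in the listing above, while <code>trace</code> preserves the order in which the statements ran.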
 
===Input-sensitive profiler===
Input-sensitive profilers<ref name="aprof">E. Coppa, C. Demetrescu, and I. Finocchi, [https://web.archive.org/web/20180611201601/https://ieeexplore.ieee.org/document/6858059/ ''Input-Sensitive Profiling''], IEEE Trans. Software Eng. 40(12): 1185-1205 (2014); [[doi:10.1109/TSE.2014.2339825]]</ref><ref>D. Zaparanuks and M. Hauswirth, ''Algorithmic Profiling'', Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2012), ACM SIGPLAN Notices, Vol. 47, No. 6, pp. 67-76, 2012; [[doi:10.1145/2254064.2254074]]</ref><ref>T. Kustner, J. Weidendorfer, and T. Weinzierl, ''Argument Controlled Profiling'', Proceedings of Euro-Par 2009 – Parallel Processing Workshops, Lecture Notes in Computer Science, Vol. 6043, pp. 177-184, 2010; [[doi:10.1007/978-3-642-14122-5_22]]</ref> add a further dimension to flat or call-graph profilers by relating performance measures to features of the input workloads, such as input size or input values. They generate charts that characterize how an application's performance scales as a function of its input.
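
The idea of relating a cost measure to input size can be sketched by hand in Python. The example below is not how the cited tools work internally; it assumes a hypothetical workload (an insertion sort on reverse-sorted input) and uses the number of element comparisons as a stand-in for a performance measure, so that the result is reproducible.

```python
def insertion_sort(items, counter):
    """Sort a copy of items; counter[0] accumulates element comparisons."""
    data = list(items)
    for i in range(1, len(data)):
        j = i
        while j > 0:
            counter[0] += 1
            if data[j - 1] <= data[j]:
                break
            data[j - 1], data[j] = data[j], data[j - 1]
            j -= 1
    return data


def input_sensitive_profile(sizes):
    """Return [(input size, comparison count)] for reverse-sorted inputs."""
    profile = []
    for n in sizes:
        counter = [0]
        insertion_sort(range(n, 0, -1), counter)
        profile.append((n, counter[0]))
    return profile


profile = input_sensitive_profile([10, 20, 40])
# profile == [(10, 45), (20, 190), (40, 780)]
```

Each doubling of the input size roughly quadruples the measured cost, which is the kind of scaling trend an input-sensitive profile is meant to expose.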
 
==Data granularity in profiler types==
Profilers, which are also programs themselves, analyze target programs by collecting information on the target program's execution. Based on their data granularity, which depends upon how profilers collect information, they are classified as ''event-based'' or ''statistical'' profilers. Profilers interrupt program execution to collect information. Those interrupts can limit the resolution of time measurements, so timing results should be treated as approximate. [[Basic block]] profilers report a number of machine [[cycles per instruction|clock cycles]] devoted to executing each line of code, or a timing based on adding those together; the timings reported per basic block may not reflect a difference between [[CPU cache|cache]] hits and misses.<ref>{{cite web| work=OpenStax CNX Archive| title=Timing and Profiling - Basic Block Profilers| url=https://archive.cnx.org/contents/d29c016a-2960-4fc9-b431-9eda881a28f5@3/timing-and-profiling-basic-block-profilers#id6897344}}</ref><ref>{{cite journal| last1=Ball| first1=Thomas| last2=Larus| first2=James R.| journal=ACM Transactions on Programming Languages and Systems| volume=16| issue=4| pages=1319–1360| title=Optimally profiling and tracing programs| publisher=ACM Digital Library| year=1994| url=https://www.classes.cs.uchicago.edu/current/32001-1/papers/ball-larus-profiling.pdf| doi=10.1145/183432.183527| s2cid=6897138| access-date=2018-05-18| archive-url=https://web.archive.org/web/20180518195918/https://www.classes.cs.uchicago.edu/current/32001-1/papers/ball-larus-profiling.pdf| archive-date=2018-05-18| url-status=dead}}</ref>
 
===Event-based profilers===
Event-based profilers are available for the following programming languages:
* [[Java (programming language)|Java]]: the [[Java Virtual Machine Tools Interface|JVMTI]] (JVM Tools Interface) API, formerly JVMPI (JVM Profiling Interface), provides hooks to profilers for trapping events such as calls, class load and unload, and thread enter and leave.
* [[.NET Framework|.NET]]: A profiling agent can be attached as a ''COM'' server to the ''CLR'' using the Profiling ''API''. As with Java, the runtime then provides various callbacks into the agent for trapping events such as method [[Interpreter|JIT]] / enter / leave and object creation. This is particularly powerful in that the profiling agent can rewrite the target application's bytecode in arbitrary ways.
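
The callback style used by such interfaces can be illustrated with Python's own <code>sys.setprofile</code> hook, which the interpreter invokes on call and return events. The sketch below is only an analogy to the runtime interfaces listed above; the helper <code>profile_events</code> and the toy function <code>fib</code> are hypothetical names introduced for the example.

```python
import sys
from collections import Counter


def profile_events(func, *args):
    """Run func(*args) and count 'call' events per Python function name."""
    calls = Counter()

    def on_event(frame, event, arg):
        # The interpreter invokes this hook on call and return events.
        if event == "call":
            calls[frame.f_code.co_name] += 1

    previous = sys.getprofile()
    sys.setprofile(on_event)
    try:
        result = func(*args)
    finally:
        sys.setprofile(previous)  # restore whatever hook was installed before
    return result, calls


def fib(n):
    """Deliberately naive recursion so that many call events are generated."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result, calls = profile_events(fib, 5)
# result == 5 and calls["fib"] == 15
```

Because the hook fires on every event, the counts are exact, but the target program runs more slowly while the hook is installed.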
 
===Statistical profilers===
These profilers operate by [[Sampling (statistics)|sampling]]. A sampling profiler probes the target program's [[call stack]] at regular intervals using [[operating system]] [[interrupt]]s. Sampling profiles are typically less numerically accurate and specific, providing only a statistical approximation, but allow the target program to run at near full speed. "The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods."<ref>[http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html#SEC12 Statistical Inaccuracy of <code>gprof</code> Output] {{webarchive|url=https://web.archive.org/web/20120529075000/http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html |date=2012-05-29 }}</ref>
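
The quoted rule can be turned into a small calculation. The Python sketch below assumes, purely for illustration, a sampling period of 0.01 seconds and a measured value of 1 second; the function name <code>expected_sampling_error</code> is hypothetical.

```python
import math


def expected_sampling_error(measured_seconds, sampling_period):
    """Expected error, in seconds, under the square-root rule quoted above."""
    n = measured_seconds / sampling_period  # value expressed in sampling periods
    return math.sqrt(n) * sampling_period


# Illustrative values: 1 second measured with a 0.01 second sampling period.
error = expected_sampling_error(1.0, 0.01)
```

With these values ''n'' is 100, so the expected error is 10 sampling periods, or about 0.1 seconds, which is 10% of the measured value.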
 
In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program, and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode such as [[system call]] processing.
 
Unfortunately, running kernel code to handle the interrupts costs the target program a small number of CPU cycles and diverts cache usage, and the profiler cannot distinguish the various tasks occurring in uninterruptible kernel code (microsecond-range activity) from user code. Dedicated hardware can do better: ARM Cortex-M3 and some recent MIPS processors' JTAG interfaces have a PCSAMPLE register, which samples the [[program counter]] in a truly undetectable manner, allowing non-intrusive collection of a flat profile.
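
The sampling approach described in this section can be approximated in software. The Python sketch below does not use operating system interrupts or hardware registers. Instead it assumes a helper thread that periodically inspects the target thread's current frame through CPython's <code>sys._current_frames()</code>. The names <code>sample_profile</code> and <code>busy_loop</code>, and the 5 millisecond interval, are hypothetical choices for the example.

```python
import sys
import threading
import time
from collections import Counter


def sample_profile(func, interval=0.005):
    """Run func() while a helper thread samples the caller's current frame."""
    target_id = threading.get_ident()
    samples = Counter()
    done = threading.Event()

    def sampler():
        # Wake up every `interval` seconds until the target has finished.
        while not done.wait(interval):
            frame = sys._current_frames().get(target_id)
            if frame is not None:
                samples[frame.f_code.co_name] += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        func()
    finally:
        done.set()
        thread.join()
    return samples


def busy_loop(duration=0.3):
    """Toy hot spot: spin until the deadline passes."""
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


samples = sample_profile(busy_loop)
# Most samples are expected to be attributed to busy_loop.
```

The number of samples varies from run to run, which reflects the statistical nature of the method: only the relative distribution of samples across functions is meaningful.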
 
Some commonly used<ref>{{cite web| title=Popular C# Profilers| publisher=Gingtage| year=2014| url=http://www.ginktage.com/2014/10/popular-c-profilers/}}</ref> statistical profilers for Java/managed code are [[SmartBear Software]]'s [[AQtime]]<ref>{{cite web| work=AQTime 8 Reference| title=Sampling Profiler - Overview| publisher=SmartBear Software| year=2018| url=https://support.smartbear.com/viewarticle/54581/}}</ref> and [[Microsoft]]'s [[CLR Profiler]].<ref>{{cite web| work=Microsoft .NET Framework Unmanaged API Reference| last=Wenzal| first=Maira|display-authors=etal| title=Profiling Overview| publisher=Microsoft| year=2017| url=https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/profiling-overview#supported-features}}</ref> Those profilers also support native code profiling, along with [[Apple Inc.]]'s [[Apple Developer Tools#Shark|Shark]] (OSX),<ref>{{cite web| work=[[Apple Developer Tools]]| title=Performance Tools| publisher=Apple, Inc.| year=2013| url=https://developer.apple.com/library/content/documentation/Performance/Conceptual/PerformanceOverview/PerformanceTools/PerformanceTools.html}}</ref> [[OProfile]] (Linux),<ref>{{cite web| work=[[IBM DeveloperWorks]]| last1=Netto| first1=Zanella| last2=Arnold| first2=Ryan S.| title=Evaluate performance for Linux on Power| year=2012| url=https://www.ibm.com/developerworks/linux/library/l-evaluatelinuxonpower/}}</ref> [[Intel]] [[VTune]] and Parallel Amplifier (part of [[Intel Parallel Studio]]), and [[Oracle Corporation|Oracle]] [[Performance Analyzer]],<ref>{{cite conference |last1=Schmidl |first1=Dirk |first2=Christian |last2=Terboven |first3=Dieter |last3=an Mey |first4=Matthias S. |last4=Müller |title=Suitability of Performance Tools for OpenMP Task-Parallel Programs |conference=Proc. 7th Int'l Workshop on Parallel Tools for High Performance Computing |year=2013 |pages=25–37 |isbn=9783319081441 |url=https://books.google.com/books?id=-I64BAAAQBAJ&pg=PA27}}</ref> among others.
 
===Instrumentation===
* '''Interpreter debug''' options can enable the collection of performance metrics as the interpreter encounters each target statement. [[Bytecode]], [[control table]] and [[Just-in-time compilation|JIT]] interpreters are three examples that usually have complete control over execution of the target code, thus enabling extremely comprehensive data collection.
 
===Hypervisor/simulator===
* '''Hypervisor''': Data are collected by running the (usually) unmodified program under a [[hypervisor]]. Example: [[SIMMON]]
* '''Simulator''' and '''Hypervisor''': Data are collected interactively and selectively by running the unmodified program under an [[instruction set simulator]].
 
==See also==
 
<!-- Please keep entries in alphabetical order & add a short description [[WP:SEEALSO]] -->
{{div col|small=yes|colwidth=20em}}
* {{annotated link|Algorithmic efficiency}}
* {{annotated link|Benchmark (computing)|Benchmark}}
* {{annotated link|Java performance}}
* {{annotated link|List of performance analysis tools}}
* [[Performance Application Programming Interface|PAPI]] is a portable interface (in the form of a library) to hardware performance counters on modern microprocessors.
* {{annotated link|Performance engineering}}
* {{annotated link|Performance prediction}}
* {{annotated link|Performance tuning}}
* {{annotated link|Profile-guided optimization}}
* {{annotated link|Runtime verification}}
* {{annotated link|Software archaeology}}
* {{annotated link|Static code analysis}}
* {{annotated link|Worst-case execution time}} (WCET)
{{div col end}}
<!-- please keep entries in alphabetical order -->