- "Profiling" redirects here. For the science of criminal psychological analysis, see offender profiling.
In software engineering, performance analysis (also known as dynamic program analysis) is the investigation of a program's behavior using information gathered as the program runs, as opposed to static code analysis. The usual goal of performance analysis is to determine which parts of a program to optimize for speed or memory usage.
A profiler is a performance analysis tool that measures the behavior of a program as it runs, particularly the frequency and duration of function calls. The output is a stream of recorded events (a trace) or a statistical summary of the events observed (a profile). Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, operating system hooks, and performance counters.
Because a profile typically aggregates events by the source code position at which they occur, the size of the measurement data is linear in the code size of the program. The size of a trace, by contrast, is linear in the program's running time, which can make it impractical for long runs. For sequential programs a profile is usually sufficient, but performance problems in parallel programs (such as waiting for messages or synchronisation issues) often depend on the time relationship between events, so the full trace is needed to understand the problem.
- Program analysis tools are extremely important for understanding program behavior. Computer architects need such tools to evaluate how well programs will perform on new architectures. Software writers need tools to analyze their programs and identify critical pieces of code. Compiler writers often use such tools to find out how well their instruction scheduling or branch prediction algorithm is performing... (ATOM, PLDI, '94)
History
Profiler-driven program analysis dates back to 1982, with the publication of Gprof: a Call Graph Execution Profiler [1]. The paper outlined a system which later became the GNU profiler, also known as gprof.
In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation published a paper describing ATOM [2]. ATOM is a platform for converting a program into its own profiler. That is, at compile time, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique, modifying a program to analyze itself, is known as "instrumentation".
In 2004, both the Gprof and ATOM papers appeared on the list of the 20 most influential PLDI papers of all time. [3]
Performance Analysis [4]
In most organizations the key data on performance, typically derived from order processing and invoicing, are likely to be available already on their computer databases. These should provide accurate sales data split by product and by region, and should make those figures available promptly at a computer terminal. It should be recognized, however, that such systems are driven by accounting requirements, and in particular by accounting periods; they will often reflect an unbalanced picture until the month-end procedures have been completed.
In the case of non-profit organizations it is just as important to keep track of the clients (recipients, donors, patients, customers and so on), as well as the transactions related to them.
If the computer systems have been designed to cope with the level of detail needed, performance figures should be available down to individual customers or clients. On the other hand, this potentially poses the problem of 'information overload'. There will be so much information, most of it redundant, that it will effectively be useless as a management tool.
There are a number of possible answers to this potential torrent of data:
ABC analysis
Typically the reports are sorted by volume (or value) of sales, so that customers are ranked by their sales offtake, with the highest-volume (and hence most 'important') customers at the top of the list and the many low-volume customers at the bottom (since it matters less if they are not taken into account in decisions).
The 80:20 Rule says that the top 20 per cent of customers on such a list are likely to account for 80 per cent of total sales; so this approach can, in effect, be used to reduce the data to be examined by a factor of five.
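To make the mechanics concrete, the following is a minimal sketch in C of this kind of ranking, using entirely hypothetical sales figures: it sorts customers by sales volume in descending order and reports what share of total sales the top 20 per cent of customers account for. It is an illustration of the idea only, not a prescription for how such reports are produced.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical customer sales figures, in arbitrary currency units. */
static double sales[] = { 120, 95000, 340, 41000, 780, 26000, 150, 900,
                          63000, 210, 430, 18000, 95, 670, 1200, 310,
                          55, 2400, 880, 140 };
#define N (sizeof sales / sizeof sales[0])

/* qsort comparator: descending order of sales. */
static int desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

int main(void)
{
    qsort(sales, N, sizeof sales[0], desc);

    double total = 0.0, top = 0.0;
    size_t cutoff = N / 5;                 /* top 20 per cent of customers */

    for (size_t i = 0; i < N; i++) {
        total += sales[i];
        if (i < cutoff)
            top += sales[i];               /* accumulate the 'A' customers */
    }

    printf("top %zu of %zu customers account for %.1f%% of sales\n",
           cutoff, (size_t)N, 100.0 * top / total);
    return 0;
}
```

With the hypothetical figures above the top four customers account for roughly 89 per cent of sales, illustrating how heavily such lists are usually skewed towards the few largest accounts.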
Variance analysis
In this approach performance criteria (typically budgets or targets) are set, against which each of the products or customers is subsequently monitored. If performance falls outside the expected range, this is highlighted. This means that only those items where there are 'variances' need be reviewed.
However, the variances are only as good as the criteria (usually the budgets) against which they are measured, and setting these is, in practice, a major task. This is particularly problematic where parameters change over time, so this approach is often applied (if at all) only to the 20 per cent of most important items.
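A minimal sketch of the "management by exception" idea behind variance analysis, again with invented product names, budgets and a tolerance band chosen purely for illustration:

```c
#include <stdio.h>

/* Hypothetical products with a budgeted and an actual sales figure.
   Items whose variance falls outside a tolerance band are flagged for
   review; everything else is ignored ("management by exception"). */
struct item {
    const char *name;
    double budget;
    double actual;
};

int main(void)
{
    struct item items[] = {
        { "Product A", 10000, 10350 },
        { "Product B",  8000,  5600 },
        { "Product C", 12000, 14900 },
        { "Product D",  3000,  2950 },
    };
    const double tolerance = 0.10;   /* flag variances beyond +/- 10% */

    for (size_t i = 0; i < sizeof items / sizeof items[0]; i++) {
        double variance = (items[i].actual - items[i].budget) / items[i].budget;
        if (variance > tolerance || variance < -tolerance)
            printf("%s: variance %+.1f%% -- review\n",
                   items[i].name, 100.0 * variance);
    }
    return 0;
}
```

Only Product B (-30 per cent) and Product C (+24 per cent) would be reported, which is exactly the point: the bulk of the data never reaches the reviewer.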
Ad hoc database enquiries and reports
If the basic data is suitably organized, on a computer database, it may be possible to access it from terminals across the organization. The abstracted data can then be processed from a variety of perspectives. This means that ad hoc reports or enquiries may be easily prepared. Unfortunately, few organizations even now have their performance data structured in such a way that it can be used for analysis in more than a very limited fashion.
Regrettably, though, many of the key measures may not have been recorded. The data collected by the average system are driven by accounting needs and record only those transactions which result in the actual completion of a sale. Very few systems record details of sales that were lost, for example because the item wanted was out of stock or did not quite meet the specification required. Such information may be available, typically to those taking the orders, but it is usually discarded as soon as it is obvious that a sale will not be made; yet an analysis of such lost orders can be another invaluable input to marketing planning.
Methods of data gathering
Statistical profilers
Some profilers operate by sampling. A sampling profiler probes the target program's program counter at regular intervals using operating system interrupts. Sampling profiles are typically less accurate and specific, but allow the target program to run at near full speed.
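As a rough illustration of how sampling works (a minimal sketch for Linux/glibc on x86-64, not a description of how any particular profiler is implemented), the following program installs a SIGPROF handler, arranges for the signal to be delivered every 10 ms of CPU time, and records the interrupted program counter on each delivery. A real statistical profiler would resolve the sampled addresses to symbols and aggregate them into a histogram.

```c
#define _GNU_SOURCE           /* for REG_RIP in ucontext.h (glibc) */
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <ucontext.h>

#define MAX_SAMPLES 100000

static void *samples[MAX_SAMPLES];      /* sampled program-counter values */
static volatile size_t nsamples;

/* SIGPROF handler: record where the program was executing when the
   profiling timer fired. */
static void on_prof(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)si;
#if defined(__x86_64__)
    ucontext_t *uc = ctx;
    if (nsamples < MAX_SAMPLES)
        samples[nsamples++] = (void *)uc->uc_mcontext.gregs[REG_RIP];
#else
    (void)ctx;                          /* PC extraction is platform-specific */
#endif
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_prof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigaction(SIGPROF, &sa, NULL);

    /* Deliver SIGPROF every 10 ms of CPU time used by this process. */
    struct itimerval tv = { { 0, 10000 }, { 0, 10000 } };
    setitimer(ITIMER_PROF, &tv, NULL);

    /* The workload being profiled. */
    volatile double x = 0.0;
    for (long i = 0; i < 200000000L; i++)
        x += i * 0.5;

    printf("collected %zu samples, last PC = %p\n",
           (size_t)nsamples, nsamples ? samples[nsamples - 1] : NULL);
    return 0;
}
```

Note that the profiled loop runs at essentially full speed between samples, which is the main attraction of the sampling approach.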
Some profilers instrument the target program with additional instructions to collect the required information. Instrumenting the program can change its performance, potentially causing inaccurate results and heisenbugs. Instrumentation can be very specific about what it records, but the more detail that is collected, the more the target program is slowed down.
The resulting data are not exact, but a statistical approximation. The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square root of n sampling periods. [5]
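To make that bound concrete (a rough worked example, not taken from the cited source): if a routine is observed in $n$ samples taken with period $T$, then

$$\hat{t} = nT, \qquad \Delta t \approx \sqrt{n}\,T, \qquad \frac{\Delta t}{\hat{t}} \approx \frac{1}{\sqrt{n}},$$

so a routine that accumulates 100 samples at a 10 ms sampling period is reported as roughly 1 s, with an expected error of about 0.1 s, i.e. around 10 per cent of the measured time.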
Two of the most commonly used statistical profilers are GNU's gprof and SGI's Pixie.
Instrumentation
- Manual: done by the programmer, e.g. by adding instructions to explicitly calculate runtimes (a minimal sketch follows this list).
- Compiler assisted: e.g. "gcc -pg ..." for gprof.
- Binary translation: the tool adds instrumentation to a compiled binary. Example: ATOM.
- Runtime instrumentation: the code is instrumented directly before execution; the program run is fully supervised and controlled by the tool. Examples: PIN, Valgrind.
- Runtime injection: more lightweight than runtime instrumentation; code is modified at runtime to have jumps to helper functions. Example: DynInst.
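As referenced above, a minimal sketch of manual instrumentation in C: the region of interest is wrapped with reads of the POSIX monotonic clock and the elapsed time is printed. The function name work and the workload itself are purely illustrative.

```c
#include <stdio.h>
#include <time.h>

/* Placeholder for the code being measured; volatile keeps the loop
   from being optimized away. */
static void work(void)
{
    volatile double x = 0.0;
    for (long i = 0; i < 50000000L; i++)
        x += i * 0.5;
}

int main(void)
{
    struct timespec t0, t1;

    /* Manual instrumentation: read the clock before and after the region. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("work() took %.3f s\n", elapsed);
    return 0;
}
```

This is the simplest technique on the list, but it measures only the regions the programmer thought to instrument, which is why the automated approaches below it exist.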
External links
- gprof, the GNU Profiler, part of GNU Binutils (which are part of the GNU project). Its output can be visualised with the VCG tools, combined via the Call Graph Drawing Interface (CGDI); kprof is another front-end. Aimed mainly at C/C++ but works well for other languages.
- FunctionCheck, at sourceforge.net, a profiler created to address some limitations of gprof; it uses GCC's -finstrument-functions option. kprof is a front-end. For C/C++.
- Valgrind is a GPL'd system for debugging and profiling x86-Linux programs that can automatically detect many memory-management and threading bugs. Alleyoop is a front-end for Valgrind. It works for any language, including assembler.
- VTune is Intel's family of commercial performance analyzers for Windows and Linux executables on Intel CPUs. It has command-line tools, a standalone environment and plugins for Microsoft Visual Studio and Eclipse.
- Shark is Apple's free performance analyzer for Macintosh executables.
- PurifyPlus is a commercial family of performance analysis tools from IBM's Rational unit. For Linux, UNIX and Windows.
- CodeAnalyst is AMD's free performance analyzer for Windows programs on AMD hardware. AMD also has a Linux version of CodeAnalyst.
- OProfile, a statistical, kernel-based GPL profiler for Linux.
- Profiler for use with Digital Mars C, C++ and D compilers.
- JRat (Java Runtime Analysis Toolkit), an LGPL profiler.
- CLR Profiler is a free profiler provided by Microsoft for CLR (.NET) applications.
- Performance Analyzer, included with Sun Studio (now free of charge).
- YourKit, a profiler for Java and the .NET Framework.
- JProbe, a profiler by Quest Software, now part of the JProbe suite, which also includes tools such as a memory debugger.
- Article "Need for speed -- Eliminating performance bottlenecks" on doing execution time analysis of Java applications using IBM Rational Application Developer.
- Tutorial on the use of OProfile