Profilers, which are themselves programs, analyze target programs by collecting information on their execution. Based on their data granularity, which depends on how they collect information, profilers are classified as event-based or statistical. Profilers interrupt program execution to collect information, and those interrupts can limit time measurement resolution, so timing results should be taken with a grain of salt.
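Where such resolution limits matter, the effective granularity of the available timer can be estimated empirically. A minimal Python sketch (the function name `timer_resolution` is illustrative, not a standard API):

```python
import time

def timer_resolution(samples=200):
    """Estimate the smallest observable tick of time.perf_counter
    by finding the minimum nonzero delta between readings."""
    deltas = []
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        while t1 == t0:          # spin until the clock visibly advances
            t1 = time.perf_counter()
        deltas.append(t1 - t0)
    return min(deltas)

print(f"Effective timer resolution: ~{timer_resolution():.3e} s")
```

If the reported step is comparable to the durations being measured, the timings are dominated by measurement noise rather than by the code under test.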
Basic block profilers report the number of machine clock cycles devoted to executing each line of code, or a timing based on adding those together; the timings reported per basic block may not reflect the difference between cache hits and misses.
==Event-based profilers==
Event-based profilers are available for the following programming languages:
• Java: the JVMTI (JVM Tool Interface) API, formerly JVMPI (JVM Profiling Interface), provides hooks to profilers for trapping events like calls, class load/unload, and thread enter/leave.
• .NET: A profiling agent can be attached as a COM server to the CLR using the Profiling API. As in Java, the runtime then provides various callbacks into the agent, for trapping events like method JIT/enter/leave, object creation, etc. This is particularly powerful in that the profiling agent can rewrite the target application's bytecode in arbitrary ways.
• Python: Python profiling includes the profile module, hotshot (which is call-graph based), and the 'sys.setprofile' function, which traps events like c_{call,return,exception} and python_{call,return,exception}.
• Ruby: Ruby uses a similar interface to Python for profiling. A flat profiler is provided in the profile.rb module, and ruby-prof, a C extension, is also available.
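As a concrete illustration of the Python hooks mentioned above, a minimal sketch that uses `sys.setprofile` to trap call events (the `tracer` and `work` names are illustrative):

```python
import sys
from collections import Counter

call_counts = Counter()

def tracer(frame, event, arg):
    # sys.setprofile delivers 'call'/'return' events for Python
    # frames and 'c_call'/'c_return'/'c_exception' for C functions.
    if event == "call":
        call_counts[frame.f_code.co_name] += 1
    elif event == "c_call":
        call_counts[arg.__name__] += 1   # arg is the C function object

def work():
    return sum(len(str(i)) for i in range(100))

sys.setprofile(tracer)
work()
sys.setprofile(None)   # detach the profiling hook

print(call_counts.most_common(3))
```

Because the hook fires on every call event, this is an event-based (deterministic) profile: counts are exact, but the overhead grows with the number of events trapped.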
==Statistical profilers==
These profilers operate by sampling: a sampling profiler probes the target program's call stack at regular intervals using operating system interrupts.
Sampling profiles are typically less numerically accurate and specific, providing only a statistical approximation, but allow the target program to run at near full speed. "The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods."

In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also, since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode such as system call processing.

Unfortunately, running kernel code to handle the interrupts incurs a minor loss of CPU cycles from the target program, diverts cache usage, and cannot distinguish the various tasks occurring in uninterruptible kernel code (microsecond-range activity) from user code. Dedicated hardware can do better: ARM Cortex-M3 and some recent MIPS processors' JTAG interfaces have a PCSAMPLE register, which samples the
program counter in a truly undetectable manner, allowing non-intrusive collection of a flat profile.

Some commonly used statistical profilers for Java/managed code are SmartBear Software's AQtime and Microsoft's CLR Profiler. Those profilers also support native code profiling, along with Apple Inc.'s Shark (OSX), OProfile (Linux), Intel VTune and Intel Parallel Amplifier (part of Intel Parallel Studio), and Oracle Performance Analyzer, among others.
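The sampling approach itself can be sketched in a few lines of Python: a background thread periodically inspects the main thread's call stack via `sys._current_frames` (a CPython-specific facility) and tallies the executing function, yielding a flat profile. The names `sample_main_thread`, `sampler`, and `busy` are illustrative:

```python
import sys
import threading
import time
from collections import Counter

def sample_main_thread(duration=0.5, interval=0.01):
    """Start a daemon thread that samples the main thread's stack
    every `interval` seconds, tallying the currently executing
    function -- a flat statistical profile."""
    main_id = threading.main_thread().ident
    counts = Counter()

    def sampler():
        end = time.perf_counter() + duration
        while time.perf_counter() < end:
            frame = sys._current_frames().get(main_id)
            if frame is not None:
                counts[frame.f_code.co_name] += 1
            time.sleep(interval)

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    return t, counts

def busy(seconds=0.6):
    # A workload that keeps the main thread occupied.
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass

t, counts = sample_main_thread()
busy()
t.join()
print(counts.most_common())
```

Note the statistical character: the target runs at (near) full speed between samples, and functions that consume more wall-clock time simply accumulate more samples, with the error behavior quoted above.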
==Instrumentation==
This technique effectively adds instructions to the target program to collect the required information. Note that instrumenting a program can cause performance changes, and may in some cases lead to inaccurate results and/or heisenbugs. The effect will depend on what information is being collected, on the level of timing detail reported, and on whether basic block profiling is used in conjunction with instrumentation. For example, adding code to count every procedure/routine call will probably have less effect than counting how many times each statement is obeyed. A few computers have special hardware to collect information; in this case the impact on the program is minimal.

Instrumentation is key to determining the level of control and amount of time resolution available to profilers.
• Manual: Performed by the programmer, e.g. by adding instructions to explicitly calculate runtimes, simply count events, or call measurement APIs such as the Application Response Measurement standard.
• Automatic source level: Instrumentation added to the source code by an automatic tool according to an instrumentation policy.
• Intermediate language: Instrumentation added to assembly or decompiled bytecodes, giving support for multiple higher-level source languages and avoiding (non-symbolic) binary offset rewriting issues.
• Compiler assisted
• Binary translation: The tool adds instrumentation to a compiled executable.
• Runtime instrumentation: The code is instrumented directly before execution. The program run is fully supervised and controlled by the tool.
• Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions.
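A minimal sketch of the manual category above, assuming a simple Python decorator (the `instrument` name is illustrative) that counts calls and accumulates runtime for a routine:

```python
import functools
import time

def instrument(fn):
    """Manual instrumentation: wrap a routine to count its calls
    and accumulate its total runtime."""
    stats = {"calls": 0, "total_time": 0.0}

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        stats["calls"] += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            stats["total_time"] += time.perf_counter() - start

    wrapper.stats = stats
    return wrapper

@instrument
def fib(n):
    # Deliberately naive recursion: every call goes through the wrapper.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(10)
print(fib.stats)   # e.g. {'calls': 177, 'total_time': ...}
```

This also illustrates the caveat in the text: the wrapper adds overhead to every call, so a small, frequently called routine like `fib` is exactly the case where instrumentation most distorts the timings it reports.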
==Interpreter instrumentation==
• Interpreter debug options can enable the collection of performance metrics as the interpreter encounters each target statement. Bytecode, control table and JIT interpreters are three examples that usually have complete control over execution of the target code, thus enabling extremely comprehensive data collection opportunities.
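CPython exposes such an interpreter hook through `sys.settrace`, whose 'line' events fire as the interpreter encounters each statement. A minimal sketch of per-line execution counting (the `line_tracer` and `loop` names are illustrative):

```python
import sys
from collections import Counter

line_counts = Counter()

def line_tracer(frame, event, arg):
    # The interpreter calls this hook for each traced event;
    # a 'line' event fires once per source line executed.
    if event == "line":
        line_counts[(frame.f_code.co_filename, frame.f_lineno)] += 1
    return line_tracer   # keep tracing inside this frame

def loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(line_tracer)
loop(5)
sys.settrace(None)

for (filename, lineno), count in line_counts.most_common(3):
    print(f"line {lineno}: executed {count} times")
```

Because the interpreter already mediates every statement, no modification of the target code is needed; the cost is that every traced statement incurs a callback, which is why such debug options are usually off by default.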
==Hypervisor/simulator==
• Hypervisor: Data are collected by running the (usually) unmodified program under a hypervisor. Example: SIMMON.
• Simulator and Hypervisor: Data are collected interactively and selectively by running the unmodified program under an instruction set simulator.
==See also==