Measures are normally expressed as a function of the size of the input n. The two most common measures are:
• Time: how long does the algorithm take to complete?
• Space: how much working memory (typically RAM) is needed by the algorithm? This has two aspects: the amount of memory needed by the code (auxiliary space usage), and the amount of memory needed for the data on which the code operates (intrinsic space usage).
For computers whose power is supplied by a battery (e.g. laptops and smartphones), or for very long/large calculations (e.g. supercomputers), other measures of interest are:
• Direct power consumption: power needed directly to operate the computer.
• Indirect power consumption: power needed for cooling, lighting, etc.
Power consumption is growing as an important metric for computational tasks of all types and at all scales, ranging from embedded Internet of Things devices to system-on-chip devices to server farms. This trend is often referred to as green computing.
Less common measures of computational efficiency may also be relevant in some cases:
• Transmission size: bandwidth could be a limiting factor. Data compression can be used to reduce the amount of data to be transmitted (a minimal sketch appears after this list). Displaying a picture or image (e.g. the Google logo) can result in transmitting tens of thousands of bytes (48K in this case) compared with transmitting six bytes for the text "Google". This is important for I/O bound computing tasks.
• External space: space needed on a disk or other external memory device; this could be for temporary storage while the algorithm is being carried out, or it could be long-term storage needed to be carried forward for future reference.
• Response time (latency): this is particularly relevant in a real-time application, when the computer system must respond quickly to some external event.
• Total cost of ownership: particularly if a computer is dedicated to one particular algorithm.
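As an illustration of the compression point above, the following sketch uses Python's standard zlib module to shrink a deliberately repetitive payload before it would be transmitted. The payload text, the host name, and the compression level are arbitrary choices made for the example; the achievable ratio depends entirely on the data.

```python
import zlib

# Deliberately repetitive toy payload; real data will compress less predictably.
payload = ("GET /images/logo.png HTTP/1.1\r\nHost: example.com\r\n" * 200).encode("ascii")

compressed = zlib.compress(payload, level=9)  # level 9 = best compression, slowest
print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(compressed) / len(payload):.1%}")
```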
==Time==

===Theory===
Analysis of algorithms, typically using concepts like time complexity, can be used to get an estimate of the running time as a function of the size of the input data. The result is normally expressed using Big O notation. This is useful for comparing algorithms, especially when a large amount of data is to be processed. More detailed estimates are needed to compare algorithm performance when the amount of data is small, although this is likely to be of less importance. Parallel algorithms may be more difficult to analyze.
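As a small illustration of what such an estimate means, the sketch below (in Python, chosen here only for brevity) contrasts two ways of testing membership in a sorted list: a linear scan that grows as O(n) and a binary search that grows as O(log n). For a handful of elements the difference is negligible; for millions of elements it is decisive.

```python
def linear_search(sorted_values, target):
    """O(n): examine each element in turn until the target is found."""
    for value in sorted_values:
        if value == target:
            return True
    return False

def binary_search(sorted_values, target):
    """O(log n): halve the remaining search interval at every step."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return True
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return False

data = list(range(0, 1_000_000, 2))   # the even numbers below one million, already sorted
print(linear_search(data, 999_998))   # True, but only after ~500,000 comparisons
print(binary_search(data, 999_998))   # True, after about 20 comparisons
```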
===Practice===
A benchmark can be used to assess the performance of an algorithm in practice. Many programming languages have an available function which provides CPU time usage. For long-running algorithms the elapsed time could also be of interest. Results should generally be averaged over several tests. Run-based profiling can be very sensitive to hardware configuration and to the possibility of other programs or tasks running at the same time in a multi-processing and multi-programming environment. This sort of test also depends heavily on the selection of a particular programming language, compiler, and compiler options, so algorithms being compared must all be implemented under the same conditions.
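A minimal benchmarking sketch in Python is shown below: time.process_time reports CPU time, timeit measures elapsed (wall-clock) time, and the workload here is only a placeholder for whatever algorithm is actually being assessed.

```python
import time
import timeit

def algorithm_under_test():
    # Placeholder workload; substitute the algorithm actually being measured.
    return sorted(range(10_000, 0, -1))

# CPU time consumed by one call (the "CPU time usage" many languages expose).
cpu_start = time.process_time()
algorithm_under_test()
cpu_used = time.process_time() - cpu_start

# Elapsed time, repeated and averaged because single runs are sensitive to
# hardware configuration and to other programs running at the same time.
samples = timeit.repeat(algorithm_under_test, number=100, repeat=5)
print(f"CPU time for one call:         {cpu_used:.6f} s")
print(f"mean elapsed time (100 calls): {sum(samples) / len(samples):.6f} s")
```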
==Space==
This section is concerned with the use of memory resources (registers, cache, RAM, virtual memory, secondary memory) while the algorithm is being executed. As with the time analysis above, the algorithm is typically analyzed using space complexity analysis to get an estimate of the run-time memory needed as a function of the size of the input data. The result is normally expressed using
Big O notation.
There are up to four aspects of memory usage to consider:
• The amount of memory needed to hold the code for the algorithm.
• The amount of memory needed for the input data.
• The amount of memory needed for any output data. Some algorithms, such as sorting, often rearrange the input data and do not need any additional space for output data; this property is referred to as "in-place" operation.
• The amount of memory needed as working space during the calculation. This includes local variables and any stack space needed by routines called during a calculation; this stack space can be significant for algorithms which use recursive techniques. (A short sketch contrasting in-place and recursive approaches appears after the next paragraph.)
Early electronic computers, and early home computers, had relatively small amounts of working memory. For example, the 1949
Electronic Delay Storage Automatic Calculator (EDSAC) had a maximum working memory of 1024 17-bit words, while the 1980 Sinclair
ZX80 came initially with 1024 8-bit bytes of working memory. In the late 2010s, it is typical for
personal computers to have between 4 and 32 GB of RAM, an increase of millions of times as much memory.
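The sketch below, a Python illustration of the working-space aspects listed above, contrasts an in-place rearrangement (constant extra space), an out-of-place version (extra space proportional to the output), and a recursive routine whose stack usage grows with the depth of recursion. The functions are invented for the example.

```python
def reverse_in_place(values):
    """In-place: the input list is rearranged using O(1) extra working space."""
    left, right = 0, len(values) - 1
    while left < right:
        values[left], values[right] = values[right], values[left]
        left += 1
        right -= 1
    return values

def reverse_copy(values):
    """Not in place: O(n) additional space is needed for the output list."""
    return list(reversed(values))

def recursive_sum(values, index=0):
    """Each nested call occupies stack space, so the working space is O(n)
    in the recursion depth (an iterative loop would need only O(1))."""
    if index == len(values):
        return 0
    return values[index] + recursive_sum(values, index + 1)

data = list(range(500))
print(reverse_in_place(list(data))[:3])  # [499, 498, 497]
print(reverse_copy(data)[:3])            # [499, 498, 497]
print(recursive_sum(data))               # 124750
```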
===Caching and memory hierarchy===
Modern computers can have relatively large amounts of memory (possibly gigabytes), so having to squeeze an algorithm into a confined amount of memory is not the kind of problem it used to be. However, the different types of memory and their relative access speeds can be significant:
• Processor registers are the fastest memory, with the least amount of space. Most direct computation on modern computers occurs with source and destination operands in registers before being updated to the cache, main memory and virtual memory if needed. On a
processor core, there are typically on the order of hundreds of bytes or fewer of register availability, although a
register file may contain more physical registers than
architectural registers defined in the instruction set architecture.
• Cache memory is the second fastest, and second smallest, memory available in the memory hierarchy. Caches are present in processors such as CPUs or GPUs, where they are typically implemented in
static RAM, though they can also be found in peripherals such as disk drives. Processor caches often have their own
multi-level hierarchy; lower levels are larger, slower and typically
shared between
processor cores in
multi-core processors. In order to process operands in cache memory, a
processing unit must fetch the data from the cache, perform the operation in registers and write the data back to the cache. This operates at speeds comparable with (about 2 to 10 times slower than) the CPU or GPU's
arithmetic logic unit or
floating-point unit if in the
L1 cache. It is about 10 times slower if there is an L1
cache miss and it must be retrieved from and written to the
L2 cache, and a further 10 times slower if there is an L2 cache miss and it must be retrieved from an
L3 cache, if present.
• Main physical memory is most often implemented in
dynamic RAM (DRAM). The main memory is much larger (typically
gigabytes compared to ≈8
megabytes) than an L3 CPU cache, with read and write latencies typically 10 to 100 times slower. RAM is increasingly implemented on the same chip as the processor, as CPU or GPU memory.
• Paged memory, often used for
virtual memory management, is memory stored in
secondary storage such as a
hard disk, and is an extension to the
memory hierarchy which allows use of a potentially larger storage space, at the cost of much higher latency, typically around 1000 times slower than a
cache miss for a value in RAM. While originally motivated to create the impression of higher amounts of memory being available than were truly available, virtual memory is more important in contemporary usage for its
time-space tradeoff and enabling the usage of
virtual machines. Cache misses from main memory are called page faults, and incur huge performance penalties on programs.
An algorithm whose memory needs will fit in cache memory will be much faster than an algorithm which fits in main memory, which in turn will be very much faster than an algorithm which has to resort to paging. Because of this,
cache replacement policies are extremely important to high-performance computing, as are
cache-aware programming and
data alignment. To further complicate the issue, some systems have up to three levels of cache memory, with varying effective speeds. Different systems will have different amounts of these various types of memory, so the effect of algorithm memory needs can vary greatly from one system to another. In the early days of electronic computing, if an algorithm and its data would not fit in main memory then the algorithm could not be used. Nowadays the use of virtual memory appears to provide much more memory, but at the cost of performance. Much higher speed can be obtained if an algorithm and its data fit in cache memory; in this case minimizing space will also help minimize time. This is called the
principle of locality, or locality of reference, and can be subdivided into spatial locality and temporal locality. An algorithm which will not fit completely in cache memory but which exhibits locality of reference may perform reasonably well.
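To make the locality point concrete, the sketch below sums the same flat array twice: once with stride-1 (row-major) accesses that match the memory layout, and once with large strides that defeat spatial locality. It is only an illustrative sketch; in CPython the interpreter overhead can hide much of the gap that a lower-level language would show, and the matrix size chosen here is arbitrary.

```python
import timeit

n = 2_000
matrix = [0.0] * (n * n)  # flat list standing in for an n-by-n row-major matrix

def sum_by_rows():
    """Stride-1 accesses: consecutive addresses, good spatial locality."""
    total = 0.0
    for row in range(n):
        base = row * n
        for col in range(n):
            total += matrix[base + col]
    return total

def sum_by_columns():
    """Stride-n accesses: each step jumps n elements, poor spatial locality."""
    total = 0.0
    for col in range(n):
        for row in range(n):
            total += matrix[row * n + col]
    return total

print("row-major traversal:   ", min(timeit.repeat(sum_by_rows, number=1, repeat=3)))
print("column-major traversal:", min(timeit.repeat(sum_by_columns, number=1, repeat=3)))
```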