The primary learning algorithm in AlphaDev is an extension of
AlphaZero.
Encoding assembly programming into a game In order to use AlphaZero on assembly programming, the authors created a
Transformer-based vector representation of
assembly programs designed to capture their underlying structure. This finite representation allows a neural network to play assembly programming like a game with finitely many possible moves (like Go), The representation uses the following components: • A Transformer network, to encode assembly
opcodes are converted to one-hot encodings and concatenated to form the raw input sequence. • A
multilayer perceptron network, which encodes the "CPU state", that is, the states of each register and memory location for a given set of inputs,
Playing the game The game
state is the assembly program generated up to a given point. The game
move is an extra instruction appended to the current assembly program. The game's
reward is a function of the assembly program's correctness and latency. To reduce cost, AlphaDev only computes actual measured latency on less than 0.002% of generated programs, as it does not evaluate latency during the search process. Instead, it uses two functions that
estimate the correctness and latency by being trained via supervised learning using the real measured correctness and latency values. == Result ==