In 2021,
Nature published a paper titled "A graph-placement methodology for fast chip design", co-authored by 21 Google-affiliated researchers. The paper reported that an RL agent could generate macro placements for
integrated circuits "in under six hours" and achieve improvements over human-designed layouts in power, timing performance, and area (PPA): standard chip-quality metrics referring, respectively, to energy consumption, operating speed, and silicon footprint, all evaluated after wire routing. It introduced a
sequential macro placement algorithm in which macros are placed one at a time rather than optimized concurrently. At each step, the algorithm selects a location for a single macro on a discretized chip canvas, conditioning its decision on the macros already placed. This
sequential formulation converts macro placement into a long-horizon
decision process in which early placement choices constrain later ones. After macro placement, force-directed placement is applied to place standard cells connected to the macros.
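The sequential formulation can be illustrated with a minimal sketch (not Google's code): macros are placed one at a time on a small discretized canvas, and each choice is conditioned on the macros already placed. Here a trivial greedy wirelength rule stands in for the learned RL policy, and the grid size, netlist, and half-perimeter wirelength (HPWL) cost are illustrative assumptions.

```python
# Illustrative sketch of sequential macro placement on a discretized canvas.
# A greedy wirelength rule stands in for the trained policy network.
import itertools

GRID = 8  # hypothetical 8x8 canvas


def hpwl(positions):
    """Half-perimeter wirelength of a set of (row, col) pin positions."""
    rows = [r for r, _ in positions]
    cols = [c for _, c in positions]
    return (max(rows) - min(rows)) + (max(cols) - min(cols))


def place_sequentially(nets, n_macros):
    """Place macros one at a time; each choice conditions on earlier placements."""
    placed = {}  # macro id -> (row, col)
    for macro in range(n_macros):
        free = [(r, c) for r, c in itertools.product(range(GRID), repeat=2)
                if (r, c) not in placed.values()]

        # Greedy stand-in for the policy: pick the free cell minimizing total
        # wirelength over nets whose macros would all be placed so far.
        def cost(cell):
            trial = dict(placed)
            trial[macro] = cell
            return sum(hpwl([trial[m] for m in net])
                       for net in nets if all(m in trial for m in net))

        placed[macro] = min(free, key=cost)
    return placed


# Toy netlist: three macros connected by two 2-pin nets.
layout = place_sequentially(nets=[(0, 1), (1, 2)], n_macros=3)
```

Because each macro's location is frozen once chosen, an early placement permanently constrains the feasible (and low-cost) locations for every later macro, which is what makes the problem a long-horizon decision process.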
Deep reinforcement learning is used to train a
policy network to place macros by maximizing a reward that reflects final placement quality (for example, wirelength and congestion). Policy learning occurs during
self‑play on one or more circuit designs. Subsequent placement optimization refines the overall layout by balancing wirelength, density, and overlap constraints while treating the macro locations produced by the RL policy as fixed obstacles. The approach relies on
pre-training, in which the RL model is first trained on a corpus of prior designs (twenty in the
Nature paper) to learn general placement patterns before being fine-tuned on a specific chip. Circuit examples used in the study were parts of proprietary Google
TPU designs, called
blocks (or
floorplan partitions). The paper reported results on five blocks and described the approach as generalizable across chip designs.

==Controversy==