==Progressive GAN==
The key architectural choice of StyleGAN-1 is a progressive growth mechanism, similar to Progressive GAN. Each generated image starts as a constant 4\times 4 \times 512 array and is repeatedly passed through style blocks. Each style block applies a "style latent vector" via an affine transform ("adaptive instance normalization"), similar to how neural style transfer uses a
Gramian matrix. It then adds noise and normalizes (subtracts the mean, then divides by the standard deviation). At training time, usually only one style latent vector is used per generated image, but sometimes two ("mixing regularization"), in order to encourage each style block to perform its stylization independently, without expecting help from other style blocks (since they might receive an entirely different style latent vector). After training, multiple style latent vectors can be fed into each style block: those fed to the lower layers control the large-scale styles, and those fed to the higher layers control the fine-detail styles. Style-mixing between two images x, x' can be performed as well. First, run a
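The style block described above can be sketched as follows. This is an illustrative NumPy simplification, not the actual StyleGAN code: the `affine` matrix stands in for the learned affine layer mapping the style latent to per-channel scale and bias, and `noise_weight` stands in for the learned per-channel noise strength.

```python
import numpy as np

def style_block(x, w, affine, noise_weight, rng, eps=1e-8):
    """Simplified style block: AdaIN driven by a style latent, then noise.

    x: feature map of shape (C, H, W); w: style latent vector of shape (D,);
    affine: (D, 2C) matrix mapping w to per-channel scale and bias;
    noise_weight: per-channel noise scales of shape (C,).
    """
    C = x.shape[0]
    style = w @ affine                  # affine transform of the style latent
    scale, bias = style[:C], style[C:]
    # normalize each channel: subtract the mean, divide by the standard deviation
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x = (x - mu) / (sigma + eps)
    # adaptive instance normalization: re-scale and shift with the style
    x = scale[:, None, None] * x + bias[:, None, None]
    # add per-pixel noise, scaled per channel
    return x + noise_weight[:, None, None] * rng.standard_normal(x.shape)
```

After this block, each channel `c` of the output (before noise) has mean `bias[c]` and standard deviation `|scale[c]|`, which is how the style latent controls the feature statistics.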
gradient descent to find z, z' such that G(z)\approx x, G(z')\approx x'. This is called "projecting an image back to style
latent space". Then, z can be fed to the lower style blocks, and z' to the higher style blocks, to generate a composite image that has the large-scale style of x, and the fine-detail style of x'. Multiple images can also be composed this way.
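The projection step can be sketched with a toy generator. Here `G` is only a fixed linear map standing in for the real deep generator, so the gradient is analytic; with a real network one would backpropagate instead. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in generator: a fixed linear map from a 16-dim latent to 64 "pixels".
A = rng.standard_normal((64, 16))

def G(z):
    return A @ z

def project(x, steps=2000, lr=1e-3):
    """Gradient descent on z to minimize ||G(z) - x||^2 (projection)."""
    z = np.zeros(16)
    for _ in range(steps):
        grad = 2 * A.T @ (G(z) - x)   # analytic gradient of the squared error
        z -= lr * grad
    return z
```

Running `project` on each of the two target images yields z and z'; feeding z to the lower style blocks and z' to the higher ones then produces the composite described above.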
==StyleGAN2==
StyleGAN2 improves upon StyleGAN in two ways. One, it applies the style latent vector to transform the convolution layer's weights instead, thus solving the "blob" problem. The "blob" problem, roughly speaking, arises because using the style latent vector to normalize the generated image destroys useful information. Consequently, the generator learned to create a "distraction": a large blob that absorbs most of the effect of normalization (somewhat like using flares to distract a heat-seeking missile). Two, it uses residual connections, which help it avoid the phenomenon where certain features get stuck at fixed intervals of pixels. For example, the seam between two teeth may be stuck at pixel positions divisible by 32, because the generator learned to generate teeth during stage N-5, and consequently could only generate primitive teeth at that stage, before upscaling 5 times (thus intervals of 2^5 = 32). This was updated by StyleGAN2-ADA ("ADA" stands for "adaptive"), which uses
invertible data augmentation. It also tunes the amount of data augmentation applied by starting at zero and gradually increasing it until an "overfitting heuristic" reaches a target level, hence the name "adaptive".
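The weight-transforming idea behind StyleGAN2 can be sketched as weight "modulation" followed by "demodulation". This is an illustrative NumPy simplification of the idea, not the library's code: the style scales the convolution weights per input channel, and each output filter is then rescaled to unit norm, which statistically plays the role of normalization without touching the feature map itself.

```python
import numpy as np

def modulate_weights(weight, style, eps=1e-8):
    """StyleGAN2-style weight modulation and demodulation (sketch).

    weight: convolution weights of shape (out_ch, in_ch, k, k);
    style: per-input-channel scales of shape (in_ch,).
    """
    w = weight * style[None, :, None, None]                    # modulate
    demod = 1.0 / np.sqrt((w ** 2).sum(axis=(1, 2, 3)) + eps)  # per-filter norm
    return w * demod[:, None, None, None]                      # demodulate
```

Because the normalization acts on the weights rather than on the generated image, no information in the feature map is destroyed, removing the incentive for the blob "distraction".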
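The adaptive schedule can be sketched as a simple feedback controller. The function name, step size, and target value here are illustrative, not the paper's exact choices; the heuristic could be, for example, the fraction of real images the discriminator rates as real.

```python
def update_augment_p(p, overfit_heuristic, target=0.6, step=0.01):
    """One feedback step of an adaptive augmentation schedule (sketch).

    p is the probability of applying each augmentation, starting at 0.
    It is nudged up while the overfitting heuristic exceeds the target
    (discriminator looks overfit) and down otherwise, clamped to [0, 1].
    """
    if overfit_heuristic > target:
        return min(p + step, 1.0)
    return max(p - step, 0.0)
```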
==StyleGAN3==
StyleGAN3 improves upon StyleGAN2 by solving the "texture sticking" problem, which can be seen in the official videos. The authors analyzed the problem using the Nyquist–Shannon sampling theorem and argued that the layers in the generator had learned to exploit the high-frequency signal in the pixels they operate upon. To solve this, they imposed strict lowpass filters between the generator's layers, so that the generator is forced to operate on the pixels in a way faithful to the continuous signals they represent, rather than as merely discrete signals. They further imposed rotational and translational equivariance by using more signal filters. The resulting StyleGAN-3 can generate images that rotate and translate smoothly, without texture sticking.

==See also==