Generative adversarial network

A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent approach to generative artificial intelligence. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks compete with each other in a zero-sum game, where one agent's gain is another agent's loss.

Definition

Mathematical

The original GAN is defined as a two-player game between a generator and a discriminator; the objective is stated formally under "Mathematical properties". A known dataset serves as the initial training data for the discriminator. Training involves presenting it with samples from the training dataset until it achieves acceptable accuracy. The generator is trained based on whether it succeeds in fooling the discriminator. Typically, the generator is seeded with randomized input that is sampled from a predefined latent space (e.g. a multivariate normal distribution). Thereafter, candidates synthesized by the generator are evaluated by the discriminator. Independent backpropagation procedures are applied to both networks so that the generator produces better samples, while the discriminator becomes more skilled at flagging synthetic samples. When used for image generation, the generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.

Relation to other statistical machine learning methods

GANs are implicit generative models, which means that they neither explicitly model the likelihood function nor provide a means for finding the latent variable corresponding to a given sample, unlike alternatives such as flow-based generative models.

[Figure: Main types of deep generative models that perform maximum likelihood estimation]
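The alternating training procedure described above can be sketched on a one-dimensional toy problem. Everything in this sketch is an assumption made for illustration: the reference data are N(3, 1), the generator is the shift G(z) = z + theta, the discriminator is a logistic model D(x) = sigmoid(w*x + b), and the gradients are written out by hand instead of using backpropagation. It is not the setup of the original paper.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def train_toy_gan(steps=20, batch=64, lr=0.05, seed=0):
    """Alternate one discriminator step and one generator step per iteration."""
    rng = random.Random(seed)
    theta = 0.0      # generator parameter (hypothetical): G(z) = z + theta
    w, b = 0.0, 0.0  # discriminator parameters (hypothetical): D(x) = sigmoid(w*x + b)
    for _ in range(steps):
        real = [rng.gauss(3.0, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
        d_real = [sigmoid(w * x + b) for x in real]
        d_fake = [sigmoid(w * x + b) for x in fake]
        # Discriminator: gradient ascent on E[ln D(real)] + E[ln(1 - D(fake))].
        grad_w = (sum((1.0 - d) * x for d, x in zip(d_real, real))
                  - sum(d * x for d, x in zip(d_fake, fake))) / batch
        grad_b = (sum(1.0 - d for d in d_real) - sum(d_fake)) / batch
        w += lr * grad_w
        b += lr * grad_b
        # Generator: gradient descent on E[ln(1 - D(G(z)))].
        # d/dtheta ln(1 - D(z + theta)) = -D(z + theta) * w
        fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
        grad_theta = sum(-sigmoid(w * x + b) * w for x in fake) / batch
        theta -= lr * grad_theta
    return theta, w, b


theta, w, b = train_toy_gan()
print(f"theta={theta:.3f}, w={w:.3f}, b={b:.3f}")
```

In this toy setting the generator's shift starts moving toward the data mean once the discriminator has learned a positive slope. Longer runs are not guaranteed to settle, which is consistent with the convergence difficulties discussed later in the article.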

Mathematical properties

Measure-theoretic considerations

This section provides some of the mathematical theory behind these methods. In modern probability theory based on measure theory, a probability space also needs to be equipped with a σ-algebra. As a result, a more rigorous definition of the GAN game would make the following changes:

Each probability space (\Omega, \mathcal B, \mu_{\text{ref}}) defines a GAN game. The generator's strategy set is \mathcal P(\Omega, \mathcal B), the set of all probability measures \mu_G on the measurable space (\Omega, \mathcal B). The discriminator's strategy set is the set of Markov kernels \mu_D: (\Omega, \mathcal B) \to \mathcal P([0, 1], \mathcal B([0, 1])), where \mathcal B([0, 1]) is the Borel σ-algebra on [0, 1].

Since issues of measurability never arise in practice, these will not concern us further.

Choice of the strategy set

In the most generic version of the GAN game described above, the strategy set for the discriminator contains all Markov kernels \mu_D: \Omega \to {\mathcal {P}}[0,1], and the strategy set for the generator contains arbitrary probability distributions \mu_G on \Omega. However, as shown below, the optimal discriminator strategy against any \mu_G is deterministic, so there is no loss of generality in restricting the discriminator's strategies to deterministic functions D:\Omega \to [0, 1]. In most applications, D is a deep neural network function.

As for the generator, while \mu_G could theoretically be any computable probability distribution, in practice, it is usually implemented as a pushforward: \mu_G = \mu_Z \circ G^{-1}. That is, start with a random variable z \sim \mu_Z, where \mu_Z is a probability distribution that is easy to compute (such as the uniform distribution, or the Gaussian distribution), then define a function G: \Omega_Z \to \Omega. Then the distribution \mu_G is the distribution of G(z). Consequently, the generator's strategy is usually defined as just G, leaving z \sim \mu_Z implicit.
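The pushforward construction above can be illustrated with a hand-picked generator. In this sketch, which is an assumption for illustration and not a trained model, \mu_Z is the uniform distribution on [0, 1) and G(z) = -ln(1 - z), so that G(z) follows the exponential distribution with rate 1.

```python
import math
import random


def generator(z):
    """Hypothetical fixed generator G: [0, 1) -> [0, inf), the inverse CDF of Exp(1)."""
    return -math.log(1.0 - z)


def sample_mu_g(n, seed=0):
    """Draw n samples from mu_G by sampling z ~ mu_Z and returning G(z)."""
    rng = random.Random(seed)
    return [generator(rng.random()) for _ in range(n)]


samples = sample_mu_g(100_000)
mean = sum(samples) / len(samples)
print(f"empirical mean of G(z): {mean:.3f} (the Exp(1) distribution has mean 1)")
```

Sampling from \mu_G only requires sampling z and evaluating G; the density of \mu_G is never computed, which is why the generator's strategy can be identified with G alone.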
In this formalism, the GAN game objective is L(G, D) := \operatorname E_{x\sim \mu_{\text{ref}}}[\ln D(x)] + \operatorname E_{z\sim \mu_Z}[\ln (1-D(G(z)))].

Generative reparametrization

The GAN architecture has two main components. One is casting optimization into a game, of form \min_G \max_D L(G, D), which is different from the usual kind of optimization, of form \min_\theta L(\theta). The other is the decomposition of \mu_G into \mu_Z \circ G^{-1}, which can be understood as a reparametrization trick. To see its significance, one must compare GAN with previous methods for learning generative models, which were plagued with "intractable probabilistic computations that arise in maximum likelihood estimation and related strategies". Rezende et al. developed the same idea of reparametrization into a general stochastic backpropagation method. Among its first applications was the variational autoencoder.

Move order and strategic equilibria

In the original paper, as well as most subsequent papers, it is usually assumed that the generator moves first, and the discriminator moves second, thus giving the following minimax game: \min_{\mu_G}\max_{\mu_D} L(\mu_G, \mu_D) := \operatorname E_{x\sim \mu_{\text{ref}}, y\sim \mu_D(x)}[\ln y] + \operatorname E_{x\sim \mu_G, y\sim \mu_D(x)}[\ln (1-y)].

If both the generator's and the discriminator's strategy sets are spanned by a finite number of strategies, then by the minimax theorem, \min_{\mu_G}\max_{\mu_D} L(\mu_G, \mu_D)= \max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D), that is, the move order does not matter. However, since the strategy sets are both not finitely spanned, the minimax theorem does not apply, and the idea of an "equilibrium" becomes delicate.
To wit, there are the following different concepts of equilibrium:
• Equilibrium when generator moves first, and discriminator moves second: \hat \mu_G \in \arg\min_{\mu_G}\max_{\mu_D} L(\mu_G,\mu_D),\quad \hat \mu_D \in \arg\max_{\mu_D} L(\hat\mu_G, \mu_D)
• Equilibrium when discriminator moves first, and generator moves second: \hat \mu_D \in \arg\max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D), \quad \hat \mu_G \in \arg\min_{\mu_G} L(\mu_G,\hat \mu_D)
• Nash equilibrium (\hat \mu_D, \hat\mu_G), which is stable under simultaneous move order: \hat \mu_D \in \arg\max_{\mu_D} L(\hat\mu_G, \mu_D), \quad \hat \mu_G \in \arg\min_{\mu_G} L(\mu_G, \hat\mu_D)

For general games, these equilibria do not have to agree, or even to exist. For the original GAN game, these equilibria all exist, and are all equal. However, for more general GAN games, these do not necessarily exist, or agree.

Main theorems for GAN game

The original GAN paper proved the following two theorems:

Theorem. For any fixed generator strategy \mu_G, the optimal discriminator reply D^* satisfies
\begin{align} D^*(x) &= \frac{d\mu_{\text{ref}}}{d(\mu_{\text{ref}} + \mu_G)}\\[6pt] L(\mu_G, D^*) &= 2D_{JS}(\mu_{\text{ref}}; \mu_G) - 2\ln 2 \end{align}
where the derivative is the Radon–Nikodym derivative, and D_{JS} is the Jensen–Shannon divergence.

Proof. By Jensen's inequality, \operatorname E_{x\sim \mu_{\text{ref}}, y\sim \mu_D(x)}[\ln y] \leq \operatorname E_{x\sim \mu_{\text{ref}}}[\ln \operatorname E_{y\sim \mu_D(x)}[y]] and similarly for the other term. Therefore, the optimal reply can be deterministic, i.e. \mu_D(x) = \delta_{D(x)} for some function D: \Omega \to [0, 1], in which case L(\mu_G, \mu_D) = \operatorname E_{x\sim \mu_{\text{ref}}}[\ln D(x)] + \operatorname E_{x\sim \mu_{G}}[\ln (1-D(x))].

To define suitable density functions, we define a base measure \mu := \mu_{\text{ref}} + \mu_G, which allows us to take the Radon–Nikodym derivatives \rho_{\text{ref}} = \frac{d\mu_{\text{ref}}}{d\mu}, \quad \rho_{G} = \frac{d\mu_{G}}{d\mu} with \rho_{\text{ref}} + \rho_G = 1. We then have L(\mu_G, \mu_D) = \int \mu(dx) \left[\rho_{\text{ref}}(x) \ln(D(x)) + \rho_G(x) \ln(1-D(x))\right]. The integrand is just the negative cross-entropy between two Bernoulli random variables with parameters \rho_{\text{ref}}(x) and D(x). We can write this as -H(\rho_{\text{ref}}(x))-D_{KL}(\rho_{\text{ref}}(x) \parallel D(x)), where H is the binary entropy function, so L(\mu_G, \mu_D) = -\int \mu(dx) (H(\rho_{\text{ref}}(x)) + D_{KL}(\rho_{\text{ref}}(x) \parallel D(x))). This means that the optimal strategy for the discriminator is D(x) = \rho_{\text{ref}}(x), with L(\mu_G, \mu_D^*) = -\int \mu(dx) H(\rho_{\text{ref}}(x)) = 2D_{JS}(\mu_{\text{ref}}; \mu_G) - 2 \ln 2 after routine calculation.

Interpretation: For any fixed generator strategy \mu_G, the optimal discriminator keeps track of the likelihood ratio between the reference distribution and the generator distribution: \frac{D(x)}{1-D(x)} = \frac{d\mu_{\text{ref}}}{d\mu_G}(x) = \frac{\mu_{\text{ref}}(dx)}{\mu_G(dx)}; \quad D(x) = \sigma(\ln\mu_{\text{ref}}(dx) - \ln\mu_G(dx)) where \sigma is the logistic function. In particular, if the prior probability for an image x to come from the reference distribution is equal to \frac 12, then D(x) is just the posterior probability that x came from the reference distribution: D(x) = \Pr(x \text{ came from reference distribution} \mid x).
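The optimal-discriminator result above can be checked numerically on a finite sample space, where the Radon–Nikodym derivative reduces to a ratio of probability mass functions. The two distributions below are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def gan_objective(p_ref, p_gen, d):
    """L(mu_G, D) for a deterministic discriminator D on a finite sample space."""
    return sum(pr * math.log(dx) + pg * math.log(1.0 - dx)
               for pr, pg, dx in zip(p_ref, p_gen, d))


def optimal_discriminator(p_ref, p_gen):
    """D*(x) = p_ref(x) / (p_ref(x) + p_gen(x))."""
    return [pr / (pr + pg) for pr, pg in zip(p_ref, p_gen)]


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


p_ref = [0.5, 0.3, 0.2]  # illustrative reference distribution
p_gen = [0.2, 0.2, 0.6]  # illustrative generator distribution
d_star = optimal_discriminator(p_ref, p_gen)
value = gan_objective(p_ref, p_gen, d_star)
bound = 2.0 * js_divergence(p_ref, p_gen) - 2.0 * math.log(2.0)
print(f"L(mu_G, D*) = {value:.6f}, 2*JS - 2*ln 2 = {bound:.6f}")
```

The two printed numbers agree up to floating-point error, and any other choice of D gives a smaller objective value for the same pair of distributions.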
Theorem. The GAN game has a pair of strategies (\hat\mu_D, \hat\mu_G) that is both a sequential equilibrium (for either move order) and a Nash equilibrium:
\begin{align} & L(\hat\mu_G, \hat\mu_D) = \min_{\mu_G}\max_{\mu_D} L(\mu_G, \mu_D) =& \max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D) = -2\ln 2\\[6pt] & \hat \mu_D \in \arg\max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D), &\quad \hat \mu_G \in \arg\min_{\mu_G}\max_{\mu_D} L(\mu_G, \mu_D)\\[6pt] & \hat \mu_D \in \arg\max_{\mu_D} L(\hat\mu_G, \mu_D), &\quad \hat \mu_G \in \arg\min_{\mu_G} L(\mu_G, \hat\mu_D)\\[6pt] & \forall x\in \Omega, \hat \mu_D(x) = \delta_{\frac 1 2}, &\quad\hat \mu_G = \mu_{\text{ref}} \end{align}
That is, the generator perfectly mimics the reference, and the discriminator outputs \frac 12 deterministically on all inputs.

Proof. From the previous theorem, \arg\min_{\mu_G}\max_{\mu_D} L(\mu_G, \mu_D)= \mu_{\text{ref}}; \quad \min_{\mu_G}\max_{\mu_D} L(\mu_G, \mu_D) = -2\ln 2.

For any fixed discriminator strategy \mu_D, any \mu_G concentrated on the set \{ x \mid \operatorname E_{y\sim\mu_D(x)}[\ln(1-y)] = \inf_x \operatorname E_{y\sim \mu_D(x)}[\ln(1-y)] \} is an optimal strategy for the generator. Thus, \arg\max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D) = \arg\max_{\mu_D} \operatorname E_{x\sim \mu_{\text{ref}}, y\sim \mu_D(x)}[\ln y] + \inf_x \operatorname E_{y\sim \mu_D(x)}[\ln (1-y)]. By Jensen's inequality, the discriminator can only improve by adopting the deterministic strategy of always playing D(x) = \operatorname E_{y\sim \mu_D(x)}[y].

Therefore, \arg\max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D) = \arg\max_{D} \operatorname E_{x\sim \mu_{\text{ref}}}[\ln D(x)] + \inf_x \ln (1-D(x)). By Jensen's inequality,
\begin{align} & \operatorname E_{x\sim \mu_{\text{ref}}}[\ln D(x)] + \inf_x \ln (1-D(x)) \\[6pt] \leq {} & \ln \operatorname E_{x\sim \mu_{\text{ref}}}[ D(x)] + \inf_x \ln (1-D(x)) \\[6pt] = {} & \ln \operatorname E_{x\sim \mu_{\text{ref}}}[ D(x)] + \ln (1-\sup_x D(x)) \\[6pt] = {} & \ln [\operatorname E_{x\sim \mu_{\text{ref}}}[ D(x)] (1-\sup_x D(x))] \leq \ln [ \sup_x D(x) (1-\sup_x D(x))] \leq \ln\frac 14, \end{align}
with equality if D(x) = \frac 12, so \forall x\in \Omega, \hat \mu_D(x) = \delta_{\frac 1 2}; \quad \max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D) = -2\ln 2.

Finally, to check that this is a Nash equilibrium, note that when \mu_G = \mu_{\text{ref}}, we have L(\mu_G, \mu_D) = \operatorname E_{x\sim \mu_{\text{ref}}, y\sim \mu_D(x)}[\ln (y (1-y))], which is always maximized by y = \frac 12. When \forall x\in \Omega, \mu_D(x) = \delta_{\frac 1 2}, any strategy is optimal for the generator.
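The last step of the proof can be spot-checked numerically. The sketch below assumes a three-point sample space with an arbitrary reference distribution, fixes \mu_G = \mu_{\text{ref}}, and compares the constant discriminator D = 1/2 against randomly drawn deterministic discriminators. A random search is only a sanity check and not a proof.

```python
import math
import random


def objective_when_generator_matches(p_ref, d):
    """L(mu_ref, D) = sum_x p_ref(x) * ln(D(x) * (1 - D(x))) when mu_G = mu_ref."""
    return sum(p * math.log(dx * (1.0 - dx)) for p, dx in zip(p_ref, d))


p_ref = [0.5, 0.3, 0.2]  # illustrative reference distribution
equilibrium_value = objective_when_generator_matches(p_ref, [0.5] * len(p_ref))

rng = random.Random(0)
trials = [[rng.uniform(0.01, 0.99) for _ in p_ref] for _ in range(1000)]
best_random = max(objective_when_generator_matches(p_ref, d) for d in trials)

print(f"value at D = 1/2: {equilibrium_value:.6f} (-2 ln 2 = {-2 * math.log(2):.6f})")
print(f"best of 1000 random discriminators: {best_random:.6f}")
```

None of the random discriminators exceeds the value at D = 1/2, in line with the claim that the discriminator has no profitable deviation once the generator matches the reference distribution.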

Training and evaluating GAN

Training Unstable convergence While the GAN game has a unique global equilibrium point when both the generator and discriminator have access to their entire strategy sets, the equilibrium is no longer guaranteed when they have a restricted strategy set. Further, even if an equilibrium still exists, it can only be found by searching in the high-dimensional space of all possible neural network functions. The standard strategy of using gradient descent to find the equilibrium often does not work for GAN, and often the game "collapses" into one of several failure modes. To improve the convergence stability, some training strategies start with an easier task, such as generating low-resolution images and gradually increase the difficulty of the task during training. This essentially translates to applying a curriculum learning scheme. Mode collapse GANs often suffer from mode collapse where they fail to generalize properly, missing entire modes from the input data. For example, a GAN trained on the MNIST dataset containing many samples of each digit might only generate pictures of digit 0. This was termed "the Helvetica scenario". Even the state-of-the-art architecture, BigGAN (2019), could not avoid mode collapse. The authors resorted to "allowing collapse to occur at the later stages of training, by which time a model is sufficiently trained to achieve good results". They further show that this property extends to the use of the Adam optimizer, which is commonly used in stochastic gradient descent. It is important to note, however, that a local Nash equilibrium in no way signifies an absence of mode collapse - for instance, a GAN trained on MNIST collapsed to generating a single digit may satisfy the hypotheses of the paper, while still presenting mode collapse. Vanishing gradient Conversely, if the discriminator learns too fast compared to the generator, then the discriminator could almost perfectly distinguish \mu_{G_\theta}, \mu_{\text{ref}}. 
In such a case, the generator G_\theta could be stuck with a very high loss no matter which direction it changes its \theta, meaning that the gradient \nabla_\theta L(G_\theta, D_\zeta) would be close to zero. The generator then cannot learn, a case of the vanishing gradient problem.

Other evaluation methods are reviewed in the literature.
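The vanishing-gradient effect described above can be made concrete by looking at a single generated sample. The sketch assumes the discriminator output is D = sigmoid(a) for a logit a, and compares the derivative with respect to a of the generator term ln(1 - D) with that of the alternative generator loss -ln D. The comparison loss is brought in here only as a point of reference.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad_saturating(a):
    """d/da of ln(1 - sigmoid(a)), the generator term of the original objective."""
    return -sigmoid(a)


def grad_non_saturating(a):
    """d/da of -ln(sigmoid(a)), an alternative generator loss used for comparison."""
    return -(1.0 - sigmoid(a))


# A very negative logit means the discriminator confidently rejects the sample.
for a in (0.0, -2.0, -5.0, -10.0):
    print(f"logit={a:6.1f}  D={sigmoid(a):.5f}  "
          f"saturating grad={grad_saturating(a):.5f}  "
          f"non-saturating grad={grad_non_saturating(a):.5f}")
```

As the discriminator becomes more confident that a sample is synthetic, the gradient of ln(1 - D) shrinks toward zero, while the gradient of -ln D stays close to -1 in magnitude.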

Variants

There is a veritable zoo of GAN variants. Some of the most prominent are as follows:

Conditional GAN

Conditional GANs are similar to standard GANs except they allow the model to conditionally generate samples based on additional information. For example, if we want to generate a cat face given a dog picture, we could use a conditional GAN.

The generator in a GAN game generates \mu_G, a probability distribution on the probability space \Omega. This leads to the idea of a conditional GAN, where instead of generating one probability distribution on \Omega, the generator generates a different probability distribution \mu_G(c) on \Omega, for each given class label c. For example, for generating images that look like ImageNet, the generator should be able to generate a picture of a cat when given the class label "cat".

GANs with alternative architectures

The GAN game is a general framework and can be run with any reasonable parametrization of the generator G and discriminator D. In the original paper, the authors demonstrated it using multilayer perceptron networks and convolutional neural networks. Many alternative architectures have been tried.
• Deep convolutional GAN (DCGAN): For both generator and discriminator, uses only deep networks consisting entirely of convolution-deconvolution layers, that is, fully convolutional networks.
• Self-attention GAN (SAGAN): Starts with the DCGAN, then adds residually-connected standard self-attention modules to the generator and discriminator.
• Variational autoencoder GAN (VAEGAN): Uses a variational autoencoder (VAE) for the generator.
• Transformer GAN (TransGAN): Uses the pure transformer architecture for both the generator and discriminator, entirely devoid of convolution-deconvolution layers.
• Flow-GAN: Uses a flow-based generative model for the generator, allowing efficient computation of the likelihood function.
GANs with alternative objectives

Many GAN variants are merely obtained by changing the loss functions for the generator and discriminator.

Original GAN: We recast the original GAN objective into a form more convenient for comparison: \begin{cases} \min_D L_D(D, \mu_G) = -\operatorname E_{x\sim \mu_{\text{ref}}}[\ln D(x)] - \operatorname E_{x\sim \mu_G}[\ln (1-D(x))]\\ \min_G L_G(D, \mu_G) = -\operatorname E_{x\sim \mu_G}[\ln (1-D(x))] \end{cases}

Original GAN, non-saturating loss: L_G = -\operatorname E_{x\sim \mu_G}[\ln D(x)]. This objective for the generator was recommended in the original paper for faster convergence.

Original GAN, maximum likelihood: L_G = -\operatorname E_{x\sim \mu_G}[({\exp} \circ \sigma^{-1} \circ D) (x)] where \sigma is the logistic function. When the discriminator is optimal, the generator gradient is the same as in maximum likelihood estimation, even though GAN cannot perform maximum likelihood estimation itself.

Hinge loss GAN: L_D = -\operatorname E_{x\sim \mu_{\text{ref}}}\left[\min\left(0, -1 + D(x)\right)\right] -\operatorname E_{x\sim\mu_G}\left[\min\left(0, -1 - D\left(x\right)\right)\right], \quad L_G = -\operatorname E_{x\sim \mu_G} [D(x)]

Least squares GAN: L_D = \operatorname E_{x\sim \mu_{\text{ref}}}[(D(x)-b)^2] + \operatorname E_{x\sim \mu_G}[(D(x)-a)^2], \quad L_G = \operatorname E_{x\sim \mu_G}[(D(x)-c)^2] where a, b, c are parameters to be chosen. The authors recommended a = -1, b = 1, c = 0.

Wasserstein GAN (WGAN)

The Wasserstein GAN modifies the GAN game at two points:
• The discriminator's strategy set is the set of measurable functions of type D: \Omega \to \R with bounded Lipschitz norm: \|D\|_L \leq K, where K is a fixed positive constant.
• The objective is L_{WGAN}(\mu_G, D) := \operatorname E_{x\sim \mu_G}[D(x)] -\mathbb E_{x\sim \mu_{\text{ref}}}[D(x)]

One of its purposes is to solve the problem of mode collapse (see above).

Adversarial autoencoder

The adversarial autoencoder (AAE) is more autoencoder than GAN.
The idea is to start with a plain autoencoder, but train a discriminator to discriminate the latent vectors from a reference distribution (often the normal distribution).

InfoGAN

In conditional GAN, the generator receives both a noise vector z and a label c, and produces an image G(z, c). The discriminator receives image-label pairs (x, c), and computes D(x, c). When the training dataset is unlabeled, conditional GAN does not work directly.

The idea of InfoGAN is to decree that every latent vector in the latent space can be decomposed as (z, c): an incompressible noise part z, and an informative label part c, and encourage the generator to comply with the decree, by encouraging it to maximize I(c, G(z, c)), the mutual information between c and G(z, c), while making no demands on the mutual information between z and G(z, c).

Unfortunately, I(c, G(z, c)) is intractable in general. The key idea of InfoGAN is Variational Mutual Information Maximization: indirectly maximize it by maximizing a lower bound {\hat {I}}(G,Q)=\mathbb {E} _{z\sim \mu_Z, c\sim \mu _{C}}[\ln Q(c\mid G(z,c))]; \quad I(c, G(z, c)) \geq \sup_Q \hat I(G, Q) where Q ranges over all Markov kernels of type Q: \Omega_X \to \mathcal P(\Omega_C).

The InfoGAN game is defined as follows:

Three probability spaces define an InfoGAN game:
• (\Omega_X, \mu_{\text{ref}}), the space of reference images.
• (\Omega_Z, \mu_Z), the fixed random noise generator.
• (\Omega_C, \mu_C), the fixed random information generator.

There are 3 players in 2 teams: generator, Q, and discriminator. The generator and Q are on one team, and the discriminator on the other team.
The objective function is L(G, Q, D) = L_{GAN}(G, D) - \lambda \hat I(G, Q), where L_{GAN}(G, D) = \operatorname E_{x\sim \mu_{\text{ref}}}[\ln D(x)] + \operatorname E_{z\sim \mu_Z, c\sim\mu_C}[\ln (1-D(G(z, c)))] is the original GAN game objective, and \hat I(G, Q) = \mathbb E_{z\sim\mu_Z, c\sim\mu_C}[\ln Q(c \mid G(z, c))]. The generator-Q team aims to minimize the objective, and the discriminator aims to maximize it: \min_{G, Q} \max_D L(G, Q, D)

Bidirectional GAN (BiGAN)

The standard GAN generator is a function of type G: \Omega_Z\to \Omega_X, that is, it is a mapping from a latent space \Omega_Z to the image space \Omega_X. This can be understood as a "decoding" process, whereby every latent vector z\in \Omega_Z is a code for an image x\in \Omega_X, and the generator performs the decoding. This naturally leads to the idea of training another network that performs "encoding", creating an autoencoder out of the encoder-generator pair.

The BiGAN is defined as follows:

Two probability spaces define a BiGAN game:
• (\Omega_X, \mu_{X}), the space of reference images.
• (\Omega_Z, \mu_Z), the latent space.

There are 3 players in 2 teams: generator, encoder, and discriminator. The generator and encoder are on one team, and the discriminator on the other team. The generator's strategies are functions G:\Omega_Z \to \Omega_X, and the encoder's strategies are functions E:\Omega_X \to \Omega_Z. The discriminator's strategies are functions D:\Omega_X \times \Omega_Z \to [0, 1].
The objective function is L(G, E, D) = \mathbb E_{x\sim \mu_X}[\ln D(x, E(x))] + \mathbb E_{z\sim \mu_Z}[\ln (1-D(G(z), z))]. The generator-encoder team aims to minimize the objective, and the discriminator aims to maximize it: \min_{G, E} \max_D L(G, E, D)

In the paper, the authors gave a more abstract definition of the objective: L(G, E, D) = \mathbb E_{(x, z)\sim \mu_{E, X}}[\ln D(x, z)] + \mathbb E_{(x, z)\sim \mu_{G, Z}}[\ln (1-D(x, z))] where \mu_{E, X}(dx, dz) = \mu_X(dx) \cdot \delta_{E(x)}(dz) is the probability distribution on \Omega_X\times \Omega_Z obtained by pushing \mu_X forward via x \mapsto (x, E(x)), and \mu_{G, Z}(dx, dz) = \delta_{G(z)}(dx)\cdot \mu_Z(dz) is the probability distribution on \Omega_X\times \Omega_Z obtained by pushing \mu_Z forward via z \mapsto (G(z), z).

Applications of bidirectional models include semi-supervised learning, interpretable machine learning, and neural machine translation.

CycleGAN

CycleGAN is an architecture for performing translations between two domains, such as between photos of horses and photos of zebras, or photos of night cities and photos of day cities.

The CycleGAN game is defined as follows:

There are two probability spaces (\Omega_X, \mu_X), (\Omega_Y, \mu_Y), corresponding to the two domains needed for translations back and forth. There are 4 players in 2 teams: generators G_X: \Omega_X \to \Omega_Y, G_Y: \Omega_Y \to \Omega_X, and discriminators D_X: \Omega_X\to [0, 1], D_Y:\Omega_Y\to [0, 1].
The objective function is L(G_X, G_Y, D_X, D_Y) = L_{GAN}(G_X, D_Y) +L_{GAN}(G_Y, D_X) + \lambda L_{cycle}(G_X, G_Y) where \lambda is a positive adjustable parameter, L_{GAN} is the GAN game objective (each generator is paired with the discriminator on its output domain), and L_{cycle} is the cycle consistency loss: L_{cycle}(G_X, G_Y) = E_{x\sim \mu_X} \|G_Y(G_X(x)) - x\| + E_{y\sim \mu_Y} \|G_X(G_Y(y)) - y\|. The generators aim to minimize the objective, and the discriminators aim to maximize it: \min_{G_X, G_Y} \max_{D_X, D_Y} L(G_X, G_Y, D_X, D_Y)

Unlike previous work like pix2pix, which requires paired training data, CycleGAN requires no paired data. For example, to train a pix2pix model to turn a summer scenery photo into a winter scenery photo and back, the dataset must contain pairs of the same place in summer and winter, shot at the same angle; CycleGAN would only need a set of summer scenery photos, and an unrelated set of winter scenery photos.

GANs with particularly large or small scales

BigGAN

The BigGAN is essentially a self-attention GAN trained on a large scale (up to 80 million parameters) to generate large images of ImageNet (up to 512 x 512 resolution), with numerous engineering tricks to make it converge.

Invertible data augmentation

When there is insufficient training data, the reference distribution \mu_{\text{ref}} cannot be well-approximated by the empirical distribution given by the training dataset. In such cases, data augmentation can be applied, to allow training GAN on smaller datasets. Naïve data augmentation, however, brings its problems.
Consider the original GAN game, slightly reformulated as follows:\begin{cases} \min_D L_D(D, \mu_G) = -\operatorname E_{x\sim \mu_{\text{ref}}}[\ln D(x)] - \operatorname E_{x\sim \mu_G}[\ln (1-D(x))]\\ \min_G L_G(D, \mu_G) = -\operatorname E_{x\sim \mu_G}[\ln (1-D(x))] \end{cases}Now we use data augmentation by randomly sampling semantic-preserving transforms T: \Omega \to \Omega and applying them to the dataset, to obtain the reformulated GAN game:\begin{cases} \min_D L_D(D, \mu_G) = -\operatorname E_{x\sim \mu_{\text{ref}}, T\sim \mu_\text{trans}}[\ln D(T(x))] - \operatorname E_{x\sim \mu_G}[\ln (1-D(x))]\\ \min_G L_G(D, \mu_G) = -\operatorname E_{x\sim \mu_G}[\ln (1-D(x))] \end{cases}This is equivalent to a GAN game with a different distribution \mu_{\text{ref}}', sampled by T(x), with x\sim \mu_{\text{ref}}, T\sim \mu_\text{trans}. For example, if \mu_{\text{ref}} is the distribution of images in ImageNet, and \mu_\text{trans} samples identity-transform with probability 0.5, and horizontal-reflection with probability 0.5, then \mu_{\text{ref}}' is the distribution of images in ImageNet and horizontally-reflected ImageNet, combined. The result of such training would be a generator that mimics \mu_{\text{ref}}'. For example, it would generate images that look like they are randomly cropped, if the data augmentation uses random cropping. The solution is to apply data augmentation to both generated and real images:\begin{cases} \min_D L_D(D, \mu_G) = -\operatorname E_{x\sim \mu_{\text{ref}}, T\sim \mu_\text{trans}}[\ln D(T(x))] - \operatorname E_{x\sim \mu_G, T\sim \mu_\text{trans}}[\ln (1-D(T(x)))]\\ \min_G L_G(D, \mu_G) = -\operatorname E_{x\sim \mu_G, T\sim \mu_\text{trans}}[\ln (1-D(T(x)))] \end{cases}The authors demonstrated high-quality generation using just 100-picture-large datasets. The StyleGAN-2-ADA paper points out a further point on data augmentation: it must be invertible. 
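One way to read the invertibility requirement above is that the map from a distribution to its augmented version should be injective, so that matching the augmented distributions forces the underlying distributions to match. The sketch below is a hypothetical two-outcome illustration, not taken from the StyleGAN-2-ADA paper: a distribution on {A, B} is summarized by p = Pr(A), and the augmentation swaps A and B with probability q.

```python
def augment(p, q):
    """Pr(A) after swapping the outcomes A and B with probability q."""
    return p * (1.0 - q) + (1.0 - p) * q


def invert(p_aug, q):
    """Recover the original Pr(A) from the augmented one, when that is possible."""
    if abs(1.0 - 2.0 * q) < 1e-12:
        raise ValueError("swapping with probability 0.5 destroys all information")
    return (p_aug - q) / (1.0 - 2.0 * q)


# q = 0.5: very different distributions become indistinguishable after augmentation.
print(augment(0.9, 0.5), augment(0.1, 0.5))

# q = 0.2: the augmented distributions differ, and the original can be recovered.
print(augment(0.9, 0.2), augment(0.1, 0.2), invert(augment(0.9, 0.2), 0.2))
```

With q = 0.5, a generator with Pr(A) = 0.1 looks identical to a reference with Pr(A) = 0.9 once both are augmented, so the discriminator gives no useful signal. With q = 0.2 the two remain distinguishable.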
StyleGAN series

The StyleGAN family is a series of architectures published by Nvidia's research division.

Progressive GAN

Progressive GAN is a method for training GAN for large-scale image generation stably, by growing a GAN generator from small to large scale in a pyramidal fashion. Like SinGAN, it decomposes the generator as G = G_1 \circ G_2 \circ \cdots \circ G_N, and the discriminator as D = D_1 \circ D_2 \circ \cdots \circ D_N.

During training, at first only G_N, D_N are used in a GAN game to generate 4x4 images. Then G_{N-1}, D_{N-1} are added to reach the second stage of GAN game, to generate 8x8 images, and so on, until we reach a GAN game to generate 1024x1024 images.

To avoid shock between stages of the GAN game, each new layer is "blended in" (Figure 2 of the paper).

StyleGAN-1

The key architectural choice of StyleGAN-1 is a progressive growth mechanism, similar to Progressive GAN. Each generated image starts as a constant 4\times 4 \times 512 array, and is repeatedly passed through style blocks. Each style block applies a "style latent vector" via affine transform ("adaptive instance normalization"), similar to how neural style transfer uses the Gramian matrix. It then adds noise, and normalizes (subtracts the mean, then divides by the standard deviation).

At training time, usually only one style latent vector is used per image generated, but sometimes two ("mixing regularization") in order to encourage each style block to independently perform its stylization without expecting help from other style blocks (since they might receive an entirely different style latent vector).

After training, multiple style latent vectors can be fed into each style block. Those fed to the lower layers control the large-scale styles, and those fed to the higher layers control the fine-detail styles.

Style-mixing between two images x, x' can be performed as well. First, run a gradient descent to find z, z' such that G(z)\approx x, G(z')\approx x'.
This is called "projecting an image back to style latent space". Then, z can be fed to the lower style blocks, and z' to the higher style blocks, to generate a composite image that has the large-scale style of x, and the fine-detail style of x'. Multiple images can also be composed this way.

StyleGAN-2

StyleGAN-2 improves upon StyleGAN-1, by using the style latent vector to transform the convolution layer's weights instead, thus solving the "blob" problem.

This was updated by the StyleGAN-2-ADA ("ADA" stands for "adaptive"), which uses invertible data augmentation as described above. It also tunes the amount of data augmentation applied by starting at zero, and gradually increasing it until an "overfitting heuristic" reaches a target level, thus the name "adaptive".

StyleGAN-3

StyleGAN-3 improves upon StyleGAN-2 by solving the "texture sticking" problem, which can be seen in the official videos. The authors analyzed the problem using the Nyquist–Shannon sampling theorem, and argued that the layers in the generator learned to exploit the high-frequency signal in the pixels they operate upon.

To solve this, they proposed imposing strict lowpass filters between each generator's layers, so that the generator is forced to operate on the pixels in a way faithful to the continuous signals they represent, rather than operate on them as merely discrete signals. They further imposed rotational and translational invariance by using more signal filters. The resulting StyleGAN-3 is able to solve the texture sticking problem, as well as generating images that rotate and translate smoothly.
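The projection step used for style mixing in StyleGAN-1 (finding z such that G(z) ≈ x by gradient descent) can be sketched with a scalar stand-in for the generator. The function G(z) = z^3 + z, the step size, and the squared-error objective are all assumptions chosen so the example runs without any deep learning library; an actual StyleGAN projection optimizes a high-dimensional latent against an image loss.

```python
def generator(z):
    """Hypothetical scalar generator standing in for a trained network."""
    return z ** 3 + z


def generator_grad(z):
    """Derivative dG/dz of the toy generator."""
    return 3.0 * z ** 2 + 1.0


def project(x, z0=0.0, lr=0.05, steps=500):
    """Gradient descent on (G(z) - x)^2 to find a latent z with G(z) close to x."""
    z = z0
    for _ in range(steps):
        residual = generator(z) - x
        z -= lr * 2.0 * residual * generator_grad(z)
    return z


target_z = 0.7
x = generator(target_z)  # the "image" to be projected back to latent space
z_hat = project(x)
print(f"recovered z = {z_hat:.6f}, G(z) = {generator(z_hat):.6f}, x = {x:.6f}")
```

Because this toy generator is strictly increasing, the recovered latent is unique. For a real generator, several latents may reproduce an image about equally well.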

Other uses

Other than for generative and discriminative modelling of data, GANs have been used for other purposes. GANs have been used for transfer learning to enforce the alignment of the latent feature space, such as in deep reinforcement learning. This works by feeding the embeddings of the source and target task to the discriminator, which tries to guess the context. The resulting loss is then (inversely) backpropagated through the encoder.

Applications

Science

• Iteratively reconstruct astronomical images.
• Simulate gravitational lensing for dark matter research.
• Model the distribution of dark matter in a particular direction in space and predict the gravitational lensing that will occur.
• Model high energy jet formation and showers through calorimeters of high-energy physics experiments.
• Approximate bottlenecks in computationally expensive simulations of particle physics experiments. Applications in the context of present and proposed CERN experiments have demonstrated the potential of these methods for accelerating simulation and/or improving simulation fidelity.
• Reconstruct velocity and scalar fields in turbulent flows.

GAN-generated molecules were validated experimentally in mice.

Medical

One of the major concerns in medical imaging is preserving patient privacy. For this reason, researchers often face difficulties in obtaining medical images for their research purposes. GANs have been used for generating synthetic medical images, such as MRI and PET images, to address this challenge. GANs can be used to detect glaucomatous images, helping the early diagnosis that is essential to avoid partial or total loss of vision. GANs have been used to create forensic facial reconstructions of deceased historical figures.

Malicious

[Figure: An image that looks deceptively like a photograph of a real person. This image was generated by a StyleGAN based on an analysis of portraits.]

Concerns have been raised about the potential use of GAN-based human image synthesis for sinister purposes, e.g., to produce fake, possibly incriminating, photographs and videos. GANs can be used to generate unique, realistic profile photos of people who do not exist, in order to automate creation of fake social media profiles.
In 2019 the state of California considered and passed on October 3, 2019, the bill AB-602, which bans the use of human image synthesis technologies to make fake pornography without the consent of the people depicted, and bill AB-730, which prohibits distribution of manipulated videos of a political candidate within 60 days of an election. Both bills were authored by Assembly member Marc Berman and signed by Governor Gavin Newsom. The laws went into effect in 2020. DARPA's Media Forensics program studies ways to counteract fake media, including fake media produced using GANs. Fashion, art and advertising GANs can be used to generate art; The Verge wrote in March 2019 that "The images created by GANs have become the defining look of contemporary AI art." GANs can also be used to • inpaint photographs • generate fashion models, shadows, photorealistic renders of interior design, industrial design, shoes, etc. Such networks were reported to be used by Facebook. Some have worked with using GAN for artistic creativity, as "creative adversarial network". A GAN, trained on a set of 15,000 portraits from WikiArt from the 14th to the 19th century, created the 2018 painting Edmond de Belamy, which sold for US$432,500. GANs were used by the video game modding community to up-scale low-resolution 2D textures in old video games by recreating them in 4k or higher resolutions via image training, and then down-sampling them to fit the game's native resolution (resembling supersampling anti-aliasing). In 2020, Artbreeder was used to create the main antagonist in the sequel to the psychological web horror series Ben Drowned. The author would later go on to praise GAN applications for their ability to help generate assets for independent artists who are short on budget and manpower. In May 2020, Nvidia researchers taught an AI system (termed "GameGAN") to recreate the game of Pac-Man simply by watching it being played. 
In August 2019, a large dataset consisting of 12,197 MIDI songs, each with paired lyrics and melody alignment, was created for neural melody generation from lyrics using conditional GAN-LSTM (refer to sources at GitHub AI Melody Generation from Lyrics).

Miscellaneous

GANs have been used to
• show how an individual's appearance might change with age.
• reconstruct 3D models of objects from images.
• generate novel objects as 3D point clouds.
• model patterns of motion in video.
• inpaint missing features in maps, transfer map styles in cartography or augment street view imagery.
• use feedback to generate images and replace image search systems.
• visualize the effect that climate change will have on specific houses.
• reconstruct an image of a person's face after listening to their voice.
• produce videos of a person speaking, given only a single photo of that person.
• perform recurrent sequence generation.

History

In 1991, Juergen Schmidhuber published "artificial curiosity", neural networks in a zero-sum game. The first network is a generative model that models a probability distribution over output patterns. The second network learns by gradient descent to predict the reactions of the environment to these patterns. GANs can be regarded as a case where the environmental reaction is 1 or 0 depending on whether the first network's output is in a given set.

Other people had similar ideas but did not develop them similarly. An idea involving adversarial networks was published in a 2010 blog post by Olli Niemitalo. This idea was never implemented and did not involve stochasticity in the generator and thus was not a generative model. An idea similar to GANs was used to model animal behavior by Wei Li, Melvin Gauci and Roderich Gross in 2013.

Another inspiration for GANs was noise-contrastive estimation, which uses the same loss function as GANs and which Goodfellow studied during his PhD in 2010–2014.

Adversarial machine learning has other uses besides generative modeling and can be applied to models other than neural networks. In control theory, adversarial learning based on neural networks was used in 2006 to train robust controllers in a game theoretic sense, by alternating the iterations between a minimizer policy, the controller, and a maximizer policy, the disturbance.

In 2017, a GAN was used for image enhancement focusing on realistic textures rather than pixel-accuracy, producing a higher image quality at high magnification. In 2017, the first faces were generated. These were exhibited in February 2018 at the Grand Palais. Faces generated by StyleGAN in 2019 drew comparisons with Deepfakes.

Source: Wikipedia ↗
