
Introduction
In the landscape of artificial intelligence (AI), one cutting-edge technology has taken the realm of creative generation by storm: Generative Adversarial Networks (GANs). These remarkable systems can produce stunningly realistic images, videos, music, and even text, revolutionizing the way we approach creative tasks in the digital age.
So what are GANs?
Generative Adversarial Networks (GANs) are a class of machine learning models introduced by Ian Goodfellow and his colleagues in 2014. The concept behind GANs is very simple: they consist of two neural networks, a generator and a discriminator, which compete with each other to produce authentic-looking output.
The Generator: Unleashing Creativity
The generator's role in a GAN is to create something entirely new. It takes random noise as input and generates output intended to be indistinguishable from real data. In the case of images, for instance, the generator can craft very realistic pictures of non-existent landscapes, faces, or objects. This imaginative capacity extends to other domains too: GANs can compose unique pieces of music, generate lifelike human speech, and even create coherent paragraphs of text.
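As a concrete sketch, a minimal generator can be a small fully connected network that maps a random noise vector to a flattened 28×28 image. The layer sizes and latent dimension below are illustrative assumptions, not taken from the implementation linked at the end of this post:

```python
import torch
import torch.nn as nn

LATENT_DIM = 100  # size of the random-noise input (an illustrative choice)

# A minimal MLP generator: noise vector -> flattened 28x28 image.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 784),   # 784 = 28 * 28 pixels
    nn.Tanh(),             # squash pixel values into (-1, 1)
)

z = torch.randn(16, LATENT_DIM)   # a batch of 16 random noise vectors
fake_images = generator(z)        # shape: (16, 784)
```

An untrained generator like this only produces noise; it is adversarial training against the discriminator that gradually makes its outputs resemble real data.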
The Discriminator: A Critical Eye
The discriminator is the second half of the GAN equation. It acts as the critic, evaluating the generator's output and attempting to distinguish real content from generated content. As the generator keeps producing new creations, the discriminator becomes a tougher, stricter critic, identifying tiny, subtle flaws in the generated data. This back-and-forth dynamic creates a training loop in which the generator continually refines its output to outwit the discriminator.
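A matching discriminator can be sketched as a small binary classifier that maps a flattened image to a single probability of the input being real. Again, the layer sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

# A minimal MLP discriminator: flattened 28x28 image -> probability "real".
discriminator = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),          # output in (0, 1): probability the input is real
)

images = torch.randn(16, 784)     # stand-in batch of flattened images
scores = discriminator(images)    # shape: (16, 1), each score in (0, 1)
```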
Harmony in Competition
The remarkable power of GANs lies in the interplay between the generator and the discriminator. Their competition drives the network to improve iteratively, resulting in outputs that can be astonishingly realistic. As the generator refines its work to better fool the discriminator, the discriminator simultaneously hones its ability to discern the genuine from the generated. This tug-of-war culminates in outputs that push the boundaries of creativity and realism.
Training

Images in dataset: GANs are used for generating very realistic images. The goal is to train a generator to create images that are visually similar to a given dataset.
Embeddings: GANs often do not directly use embeddings. Instead, they take random noise vectors as input to the generator and try to generate images that match the target distribution. These noise vectors are points in a latent space, but they are not necessarily embeddings.
Probability Distribution in Latent Space: The latent space is a conceptual space where the generator operates. The generator takes random noise samples from this latent space and transforms them into images. The goal is to learn a mapping from this latent space to the data space (images) so that the generated images resemble the real data distribution.
Learning Features: GANs learn features implicitly during training. The generator and discriminator networks compete against each other. The generator tries to produce images that the discriminator cannot easily distinguish from real images. As a result, the generator learns to generate images with features similar to those in the real dataset.
Generate Similar Images: The trained generator can take random noise vectors from the latent space and generate images. If the GAN training is successful, the generated images should resemble the training dataset.
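The mapping from latent space to data space described above can be sketched in plain NumPy. Here the "generator" is just a randomly initialized linear map followed by tanh, a hypothetical stand-in for a trained network; the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, img_pixels = 100, 784   # illustrative sizes (28x28 images)

# Stand-in for a generator: one random linear map plus tanh.
W = rng.standard_normal((latent_dim, img_pixels)) * 0.01

def generate(z):
    """Map latent vectors z to flattened 'images' with pixels in (-1, 1)."""
    return np.tanh(z @ W)

z = rng.standard_normal((5, latent_dim))  # 5 samples from the latent space
images = generate(z)                      # shape: (5, 784)
```

With a real trained generator these samples would resemble the training data; here they are just noise, but the plumbing — sample from the latent space, push through the generator — is identical.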
Calculating loss while training:
We use binary cross-entropy (BCE) loss as the loss function, which is a common choice for GANs. BCE loss measures the dissimilarity between two probability distributions (predicted and true labels).
Discriminator Loss:
The discriminator's loss function pushes it to classify real data as real and generated data as fake.
The BCE loss for the discriminator can be written as:
L_D = -(1/n) * Σ_i [ y_i * log(D(real_data)) + (1 - y_i) * log(1 - D(fake_data)) ]
where y_i (the expected label) is 1 for real images and 0 for fake images.
D(real_data): Discriminator's output (probability) for real data being classified as real.
D(fake_data): Discriminator's output (probability) for fake data (generated by the generator) being classified as real.
The discriminator tries to minimize this loss. It wants to correctly classify real data as real (maximize log(D(real_data))) and classify fake data as fake (maximize log(1 - D(fake_data))).
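The discriminator loss above can be computed directly. The discriminator outputs below are made-up numbers used only to illustrate the formula:

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy averaged over the batch."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Hypothetical discriminator outputs: probability of "real".
d_real = np.array([0.9, 0.8, 0.95])   # scores for real images (label 1)
d_fake = np.array([0.1, 0.2, 0.05])   # scores for fake images (label 0)

y_pred = np.concatenate([d_real, d_fake])
y_true = np.concatenate([np.ones(3), np.zeros(3)])

loss_D = bce(y_true, y_pred)  # small here: the critic is confident and correct
```

If the discriminator instead scored fakes near 1.0, the `log(1 - D(fake_data))` term would blow up and the loss would be large — exactly the penalty that drives it to sharpen its critique.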
Generator Loss:
The generator's loss function aims to maximize the likelihood of the discriminator classifying fake data as real.
The BCE loss for the generator can be written as:
L_G = -(1/n) * Σ_i log(D(fake_data))
Here the target label is 1 for every sample: the generator wants its fakes classified as real.
When the generator is trained, it samples random noise and produces an output (a fake image). That output then passes through the discriminator, which classifies it as "real" or "fake". The generator's loss is calculated from the discriminator's classification: the generator is rewarded if it successfully fools the discriminator and penalized otherwise.
So it is a min-max game in which the two networks compete to outperform each other and, in the end, produce very realistic results.
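Putting the two losses together, one alternating training step can be sketched as follows. The tiny networks and the random stand-in batch are illustrative assumptions, not the linked implementation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim = 100

G = nn.Sequential(nn.Linear(latent_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(32, 784)          # stand-in for a batch of real images
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

# --- Discriminator step: push D(real) toward 1 and D(fake) toward 0 ---
z = torch.randn(32, latent_dim)
fake = G(z).detach()                 # detach: don't backprop into G here
loss_D = bce(D(real), ones) + bce(D(fake), zeros)
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# --- Generator step: make D label fresh fakes as real (target 1) ---
z = torch.randn(32, latent_dim)
loss_G = bce(D(G(z)), ones)
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

In practice these two steps alternate for many epochs over the real dataset, which is what plays out the min-max game described above.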
Applications
The applications of GANs are virtually limitless. In the world of art and design, GANs have been used to create unique paintings, generate fashion designs, and even assist architects in envisioning novel structures. Healthcare professionals use GANs to generate synthetic medical images, aiding diagnostics and research. Moreover, GANs can augment scarce datasets with synthetic samples, helping to improve the performance of machine learning models when real data is limited.
Ethical Considerations
However, the stunning capabilities of GANs also raise important ethical questions. With the ability to generate incredibly convincing fake content, there are growing concerns about the potential misuse of this technology, such as creating deepfake videos that spread misinformation.
Conclusion
GANs have already made a significant impact on how we approach creative, imaginative tasks and problem-solving. As this technology continues to advance, it's important for us to use its potential responsibly: the power of GANs shouldn't become a vehicle for ethical malpractice or misinformation.
I hope by now y'all understand exactly how GANs work. It's time to get our hands dirty and build a GAN model.
Below is the link to my implementation of a GAN that produces realistic images of handwritten digits using the MNIST dataset.
Note before opening the link:
GAN models are very complex and computationally intensive, and they take a lot of time to train (even a few days). Even though Colab runs the code on Google's GPUs, I'd suggest not training the model for long stretches or leaving your computer running continuously for hours. Maybe stop training at the 200th epoch :).