Stable Diffusion from scratch

This project implements text-to-image and image-to-image generative models from scratch. It is built from three components: a U-Net neural network, a CLIP encoder, and a Variational Autoencoder (VAE).
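As a rough illustration of how the three components connect in a text-to-image pass, here is a shape-only sketch. It is not taken from this project's code: every function name is hypothetical, and the default sizes (77 tokens, 768-dimensional embeddings, 4 latent channels, a VAE downsampling factor of 8, 50 steps) are assumptions based on the commonly used Stable Diffusion v1 configuration. Image-to-image differs mainly in the starting point: the latent comes from VAE-encoding an input image and adding noise, rather than from pure noise.

```python
# Shape-only sketch of a text-to-image pass.
# All names and default sizes are assumptions for illustration only.

def encode_text(prompt, max_tokens=77, embed_dim=768):
    """Stand-in for the CLIP encoder: prompt -> (tokens, embed_dim) context."""
    return (max_tokens, embed_dim)

def initial_latent(height, width, channels=4, factor=8):
    """Shape of the random latent that the U-Net starts denoising from."""
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the VAE factor")
    return (channels, height // factor, width // factor)

def unet_step(latent_shape, context_shape, timestep):
    """Stand-in for one U-Net denoising step: the latent shape is unchanged."""
    return latent_shape

def decode_latent(latent_shape, factor=8):
    """Stand-in for the VAE decoder: latent -> RGB image shape."""
    _, h, w = latent_shape
    return (3, h * factor, w * factor)

def text_to_image_shapes(prompt, height=512, width=512, steps=50):
    context = encode_text(prompt)            # CLIP: text conditioning
    latent = initial_latent(height, width)   # start from noise in latent space
    for t in reversed(range(steps)):         # U-Net: iterative denoising
        latent = unet_step(latent, context, t)
    image = decode_latent(latent)            # VAE: back to pixel space
    return {"context": context, "latent": latent, "image": image}

print(text_to_image_shapes("a photo of a cat"))
```

The point of the sketch is that the U-Net never sees pixels: it works on a latent that is 8 times smaller in each spatial dimension, and the VAE maps between the two spaces.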

The architecture follows the research paper included in the repo linked below.

I plan to add the notes I took while learning and building this project, and I hope to publish a blog post on the topic soon.

Code available at repo