# C1 Generative Modeling

# What is Generative Modeling

Generative modeling is a branch of machine learning that involves training a model to produce new data that is similar to a given dataset.

That is, we train a generative model on a dataset to capture the rules that govern the relationships between pixels in the images of that dataset. Sampling from this model then creates a novel image.


Each observation in the dataset consists of features, and our goal is to build a model that can generate new sets of features.

A generative model should be probabilistic rather than deterministic, so that sampling it can produce many different outputs instead of the same one every time.

# Generative vs Discriminative Modeling

  • Discriminative
    • When performing discriminative modeling, each observation in the training data has a label. The model learns to discriminate between groups/classes.
    • Discriminative modeling estimates p(y|x)
    • Discriminative modeling aims to model the probability of a label y given some observation x.
    • Has historically been more popular than generative modeling due to its ease of implementation and broad applicability.
  • Generative
    • Generative modeling does not require labeled data.
    • Generative modeling estimates p(x)
    • Generative modeling aims to model the probability of observing an observation x. Sampling from this distribution allows us to generate new observations.
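The distinction can be sketched on a toy two-class dataset (everything here is illustrative: the two-Gaussian data and the helper names `gaussian_pdf`, `p_x`, and `p_y1_given_x` are assumptions for the sketch, not any library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D labeled dataset: two classes, each drawn from a Gaussian.
x0 = rng.normal(-2.0, 1.0, size=500)   # observations with label y = 0
x1 = rng.normal(+2.0, 1.0, size=500)   # observations with label y = 1

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Generative view: model p(x) itself (here, an equal mixture of the two
# per-class Gaussian fits). Sampling it creates new observations.
def p_x(x):
    return 0.5 * gaussian_pdf(x, x0.mean(), x0.std()) \
         + 0.5 * gaussian_pdf(x, x1.mean(), x1.std())

# Discriminative view: model p(y=1 | x), the probability of the label
# given the observation (derived here from the same fits via Bayes' rule).
def p_y1_given_x(x):
    return 0.5 * gaussian_pdf(x, x1.mean(), x1.std()) / p_x(x)

# Sampling from the generative model: pick a class, then draw from it.
new_sample = rng.normal(x1.mean(), x1.std())
```

Note how the discriminative quantity is only defined relative to labels, while p(x) needs none.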

# How Generative Modeling Improves AI

  • For completeness, we should go beyond limiting the model to categorization and aim for a fuller understanding of the data distribution.
  • Generative modeling is used in reinforcement learning to train an agent to learn a world model of the environment, independent of any specific task.
  • Generative AI is another step toward AGI.

# Getting Started

  • Given a rule p_{data} that generates a set of points X, the task is to generate a new point that looks like it was produced by the same rule.
  • Intuitively, p_{model} — our estimate of p_{data} — can be sampled to obtain the point we want.
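A minimal sketch of this, assuming (purely for illustration) that p_{data} is itself a 2-D Gaussian and that p_{model} is a Gaussian fit to the observed points:

```python
import numpy as np

rng = np.random.default_rng(42)

# X: points assumed to come from the unknown rule p_data
# (here p_data is a 2-D Gaussian, chosen only so we can simulate it).
X = rng.multivariate_normal(mean=[1.0, -1.0],
                            cov=[[1.0, 0.3], [0.3, 0.5]], size=1000)

# p_model: fit a Gaussian to X -- a deliberately simple model family.
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)

# Sampling from p_model gives a new point that plausibly came from
# the same rule that generated X.
new_point = rng.multivariate_normal(mu, cov)
```

Real datasets need far richer model families, but the fit-then-sample loop is the same.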

# Generative Modeling Framework

  • We have a dataset of observations, X.
  • We assume they are generated from an unknown distribution p_{data}.
  • We want to build a p_{model} that mimics p_{data}. Sampling from it gives us similar outputs.
  • Desirable properties of p_{model} are
    • Accuracy - If p_{model} assigns high probability to an observation, that observation should look like it was drawn from p_{data} (and vice versa).
    • Generation - It should be easy to sample new observations from p_{model}.
    • Representation - Should be possible to understand how different high-level features in the data are represented.

# An example

  • Let p_{data} be a uniform distribution over the land (and zero over the ocean).
  • The box is p_{model}
    • Point A is a poor generation, as it lies in the ocean.
    • Point B could never have been generated by p_{model}, since it lies outside the box.
    • Point C is a good generation.
  • Our model is a simple representation of the underlying complex distribution.
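The example can be simulated directly. Here the "land" is a hypothetical L-shaped region (standing in for a complex landmass) and p_{model} is a uniform box over it — all shapes and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical "land": the union of two rectangles (an L-shape).
def on_land(p):
    x, y = p
    return (0 <= x <= 4 and 0 <= y <= 1) or (0 <= x <= 1 and 0 <= y <= 3)

# p_model: uniform over the bounding box [0, 4] x [0, 3].
# Any point outside this box (like point B) can never be generated,
# even if it is on land.
def sample_p_model():
    return rng.uniform([0, 0], [4, 3])

# Some samples fall in the "ocean" (like point A); the rest look like
# genuine draws from p_data (like point C).
samples = [sample_p_model() for _ in range(10_000)]
hit_rate = np.mean([on_land(p) for p in samples])
```

With these shapes the land covers half the box's area, so roughly half the samples are good generations — a concrete measure of how crude the box model is.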

# Representation Learning

  • When describing a person, we do not describe their appearance pixel by pixel but in terms of features, assuming the listener knows what a human looks like. The listener can then map those features to an image that is not perfect but is a good estimate.
  • This is the core idea behind representation learning
  • Instead of trying to model the high-dimensional sample space directly, we describe each observation by a point in a lower-dimensional latent space — a compact representation of the high-dimensional observation — and then learn a mapping function that takes a point in the latent space to a point in the original domain.
  • For example, in a tin dataset, height and width can be two latent space dimensions that best describe this dataset.
  • Mathematically speaking, encoder-decoder techniques try to transform the highly nonlinear manifold on which the data lies into a simpler latent space that can be sampled from, so that it is likely that any point in the latent space is the representation of a well-formed image.
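For the tin example, a decoder from the 2-D latent space to image space can be sketched by hand (the `decode` function, image size, and rectangle rendering are all hypothetical choices; a real decoder would be learned, not written):

```python
import numpy as np

# Hypothetical decoder for the tin dataset: the latent space is just
# (height, width), and decoding draws a centred rectangle in a 32x32
# binary image.
def decode(height, width, size=32):
    img = np.zeros((size, size))
    h0 = (size - height) // 2
    w0 = (size - width) // 2
    img[h0:h0 + height, w0:w0 + width] = 1.0
    return img

# Any point in this latent space maps to a well-formed image --
# precisely the property we want a learned latent space to have.
tall_thin = decode(height=24, width=6)
short_wide = decode(height=8, width=20)
```

Moving smoothly through the latent space (growing `height`, say) produces smoothly varying images, which is what makes such spaces useful for generation.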

# Core Probability Theory

# Generative Model Taxonomy

There are three approaches to modeling the probability density function p_{\theta}(x):

  • Explicitly model the density function, but constrain the model in some way, so that the density function is tractable.
  • Explicitly model a tractable approximation of the density function.
  • Implicitly model the density function, through a stochastic process that directly generates data without estimating the probability density at all.


Models in detail:

  • Tractable models place constraints on the model architecture, so that the density function has a form that makes it easy to calculate. For example, autoregressive models impose an ordering on the input features, so that the output can be generated sequentially — e.g., word by word, or pixel by pixel.
  • Normalizing flow models apply a series of tractable, invertible functions to a simple distribution, in order to generate more complex distributions.
  • Approximate density models include variational autoencoders, which introduce a latent variable and optimize an approximation of the joint density function.
  • Energy-based models also utilize approximate methods, but do so via Markov chain sampling, rather than variational methods.
  • Diffusion models approximate the density function by training a model to gradually denoise a given image that has been previously corrupted.
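The normalizing-flow idea from the taxonomy can be shown in one step (an illustrative sketch, not any flow library's API: the base distribution, the choice f(z) = exp(z), and the helper names are all assumptions):

```python
import numpy as np

# One-step "flow": push the simple base distribution z ~ N(0, 1) through
# the invertible function f(z) = exp(z). The density of x = f(z) stays
# tractable via the change-of-variables formula:
#   log p_x(x) = log p_z(f_inv(x)) + log |d f_inv / dx|
def base_log_pdf(z):
    return -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)

def flow_log_pdf(x):
    z = np.log(x)            # f_inv(x) = log x
    log_det = -np.log(x)     # |d f_inv / dx| = 1 / x
    return base_log_pdf(z) + log_det

# Sampling is as easy as sampling the base distribution:
rng = np.random.default_rng(0)
samples = np.exp(rng.normal(size=100_000))   # x = f(z): log-normal draws
```

Real flows stack many learned invertible layers, but each layer's contribution is exactly this one log-determinant term.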