Theory topics

This page lists materials and sources students can use to research specific application areas of generative AI. Besides these provided materials, students are encouraged to find and use additional sources such as research papers, articles, videos, and tutorials. See awesome-generative-ai for a curated list of resources on generative AI.

Text generation with LLMs

You should focus on natural language, as well as applications like chatbots.

Papers (LLMs)

(reduced list of important papers from Hannibal046's Awesome-LLM)

| Date | Keywords | Institute | Paper |
| --- | --- | --- | --- |
| 2017-06 | Transformers | Google | Attention Is All You Need |
| 2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training |
| 2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| 2019-02 | GPT 2.0 | OpenAI | Language Models are Unsupervised Multitask Learners |
| 2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models |
| 2020-05 | GPT 3.0 | OpenAI | Language Models are Few-Shot Learners |
| 2021-08 | Foundation Models | Stanford | On the Opportunities and Risks of Foundation Models |
| 2022-03 | InstructGPT | OpenAI | Training language models to follow instructions with human feedback |
| 2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models |
| 2023-03 | GPT 4 | OpenAI | GPT-4 Technical Report |

YouTube videos about LLMs

Books

Text-based generation of other modalities

You should focus on how text prompts are used to control the generation of other modalities (image, video, audio).

Papers (text to X)

See also the Papers (LLMs) section for relevant foundational papers.

| Date | Keywords | Institute | Paper |
| --- | --- | --- | --- |
| 2021-02 | CLIP | OpenAI | Learning Transferable Visual Models From Natural Language Supervision |
| 2021-11 | LAION-400M | LAION | LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs |
| 2022-12 | OpenCLIP | LAION | Reproducible scaling laws for contrastive language-image learning |
| 2024-12 | MetaCLIP | Meta AI | Demystifying CLIP Data |
| 2025-08 | MetaCLIP 2 | Meta AI | Meta CLIP 2: A Worldwide Scaling Recipe |

Blog articles about Text-based generation of other modalities

YouTube videos about Text-based generation of other modalities

Further resources about CLIP

Image generation with Foundation Models (Diffusion, GAN or others)

You should focus on individual images and graphics.

Papers (image generation)

See this awesome collection for the most important papers and an overview.

| Date | Keywords | Institute | Paper |
| --- | --- | --- | --- |
| 2014-06 | GANs | Université de Montréal (Goodfellow et al.) | Generative Adversarial Nets |
| 2021-02 | Auto-regressive (DALL-E) | OpenAI | Zero-Shot Text-to-Image Generation |
| 2021-12 | Stable Diffusion | LMU Munich | High-Resolution Image Synthesis with Latent Diffusion Models |

Blog articles about Image generation

YouTube videos about Image generation

Tools for image generation

Video generation with Foundation Models

You should focus on how short video clips are generated. What is different compared to image generation? What are the challenges and how are they addressed?

Tools for video generation

Papers (video generation)

3D model generation with Foundation Models

Papers (3D model generation)

Project pages

Tools for 3D model generation

Audio generation with Foundation Models

You should focus on music, speech, and sound effects.

Papers (audio generation)

| Date | Keywords | Institute | Paper |
| --- | --- | --- | --- |
| 2019-02 | WaveGAN | UC San Diego | WaveGAN: Spectrogram-Free Generative Adversarial Networks for Audio Synthesis |
| 2020-04 | Jukebox | OpenAI | Jukebox: A Generative Model for Music |
| 2022-07 | AudioLM | Google Research | AudioLM: a Language Modeling Approach to Audio Generation |
| 2023-01 | MusicLM | Google Research | MusicLM: Generating Music From Text |
| 2024-07 | MELLE | Microsoft | Autoregressive Speech Synthesis without Vector Quantization |

Tools for audio generation

Code generation with LLMs

You should focus on program code and on how LLMs support the software development process.

Tools for code generation

YouTube videos about Code generation

Andrej Karpathy: Software Is Changing (Again)

Papers (Code)

| Date | Keywords | Institute | Paper |
| --- | --- | --- | --- |
| 2021-08 | Codex | OpenAI | Evaluating Large Language Models Trained on Code |