Theory topics¶
This page lists materials and sources students can use to research specific application areas of generative AI. Besides the materials provided here, students are encouraged to find and use additional sources such as research papers, articles, videos, and tutorials. See awesome-generative-ai for a curated list of resources on generative AI.
Text generation with LLMs¶
You should focus on natural language, as well as applications like chatbots.
Papers (LLMs)¶
(reduced list of important papers from Hannibal046's Awesome-LLM)
| Date | keywords | Institute | Paper |
|---|---|---|---|
| 2017-06 | Transformers | Google | Attention Is All You Need |
| 2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training |
| 2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| 2019-02 | GPT 2.0 | OpenAI | Language Models are Unsupervised Multitask Learners |
| 2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models |
| 2020-05 | GPT 3.0 | OpenAI | Language models are few-shot learners |
| 2021-08 | Foundation Models | Stanford | On the Opportunities and Risks of Foundation Models |
| 2022-03 | InstructGPT | OpenAI | Training language models to follow instructions with human feedback |
| 2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models |
| 2023-03 | GPT 4 | OpenAI | GPT-4 Technical Report |
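The GPT models in the table above generate text autoregressively. At each step the model assigns a score (logit) to every token in its vocabulary, one token is chosen, and that token is appended to the input for the next step. The sketch below shows this loop with temperature sampling, which is one common decoding strategy. It assumes a hypothetical `toy_model` that stands in for a trained network. It does not reproduce any specific paper's decoder.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(next_logits, prompt, steps, temperature=1.0, seed=0):
    """Autoregressive loop: repeatedly score, sample, and append one token.

    next_logits is a hypothetical stand-in for a trained model: it maps the
    token sequence so far to one score per vocabulary entry.
    """
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(sample_next_token(next_logits(tokens), temperature, rng))
    return tokens

def toy_model(tokens):
    """Hypothetical 4-token 'model' that strongly prefers the token after the last one."""
    preferred = (tokens[-1] + 1) % 4
    return [5.0 if i == preferred else 0.0 for i in range(4)]

print(generate(toy_model, [0], steps=5, temperature=0.1))  # near-greedy
print(generate(toy_model, [0], steps=5, temperature=5.0))  # much more random
```

At low temperature the loop almost always picks the highest-scoring token. Higher temperatures flatten the distribution and produce more varied output.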
YouTube videos about LLMs¶
- MIT 6.S087: Foundation Models & Generative AI. CHAT-GPT & LLMs
- The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning: Jay Alammar explains his blog article about word2vec, a foundational concept for understanding how LLMs represent words as vectors in a continuous space.
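Word2vec maps each word to a dense vector. Words that occur in similar contexts end up close together, and closeness is usually measured with cosine similarity. The sketch below uses hand-written, hypothetical 4-dimensional vectors so the arithmetic stays readable; real word2vec vectors are learned from a corpus and have hundreds of dimensions. It reproduces the well-known analogy king - man + woman ≈ queen by searching for the nearest vector.

```python
import math

# Hypothetical toy vectors; dimensions loosely mean (royalty, male, female, food).
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.0, 0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity: 1 means same direction, 0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vocab=EMBEDDINGS):
    """Solve a : b :: c : ? by finding the word nearest to vec(b) - vec(a) + vec(c)."""
    query = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in {a, b, c})  # exclude the input words
    return max(candidates, key=lambda w: cosine(query, vocab[w]))

print(analogy("man", "king", "woman"))
```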
Books¶
- Hands-On Large Language Models by Jay Alammar and Maarten Grootendorst, O'Reilly Media, 2024 (accessible via HFU)
Text based generation of other modalities¶
You should focus on how text prompts are used to control the generation of other modalities (image, video, audio).
Papers (text to X)¶
See also the Papers (LLMs) section for relevant foundational papers.
| Date | keywords | Institute | Paper |
|---|---|---|---|
| 2021-02 | CLIP | OpenAI | Learning Transferable Visual Models From Natural Language Supervision |
| 2021-11 | LAION-400M | LAION | LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs |
| 2022-12 | OpenCLIP | LAION | Reproducible scaling laws for contrastive language-image learning |
| 2023-09 | MetaCLIP | Meta AI | Demystifying CLIP Data |
| 2025-08 | MetaCLIP 2 | Meta AI | Meta CLIP 2: A Worldwide Scaling Recipe |
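CLIP, the first paper in the table, trains an image encoder and a text encoder jointly. In a batch of matching image-text pairs, each image should be most similar to its own caption and each caption to its own image. The sketch below implements that symmetric contrastive objective in plain Python. It assumes the embeddings were already produced by some encoders, which are omitted here. It also uses a fixed temperature of 0.07; CLIP itself learns the temperature, starting from that value. The example vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of n (image, text) pairs.

    image_embs[i] and text_embs[i] are assumed to belong together; every other
    combination in the batch serves as a negative example.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: row i should put its highest score on column i
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: column j should put its highest score on row j
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Hypothetical 2-d embeddings for a batch of two pairs
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[0.9, 0.1], [0.1, 0.9]]
swapped_texts = [[0.1, 0.9], [0.9, 0.1]]
print(clip_loss(images, matched_texts))  # small: each caption matches its image
print(clip_loss(images, swapped_texts))  # large: captions are swapped
```

The shared embedding space is what lets a text prompt steer other models: a text embedding can be compared with, or used to condition on, visual content.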
Blog articles about Text based generation of other modalities¶
- CLIP: Connecting text and images, OpenAI, 2021
- A History of CLIP Model Training Data Advances, Voxel51, 2024
YouTube videos about Text based generation of other modalities¶
Further resources about CLIP¶
- OpenCLIP - an open source implementation of CLIP which is widely used in research and updated regularly
- Awesome CLIP
- Awesome CLIP Papers
Image generation with Foundation Models (Diffusion, GAN or others)¶
You should focus on individual images and graphics.
Papers (image generation)¶
See this awesome collection for an overview and the most important papers.
| Date | keywords | Institute | Paper |
|---|---|---|---|
| 2014-06 | GANs | Université de Montréal (Goodfellow et al.) | Generative Adversarial Nets |
| 2021-02 | Auto-regressive (DALL-E) | OpenAI | Zero-Shot Text-to-Image Generation |
| 2021-12 | Stable Diffusion | LMU Munich | High-Resolution Image Synthesis with Latent Diffusion Models |
Blog articles about Image generation¶
YouTube videos about Image generation¶
- MIT 6.S087: Foundation Models & Generative AI. IMAGE GENERATION by Rickard Brüel Gabrielsson
- How AI Image Generators Work (Stable Diffusion / Dall-E) - Computerphile
- Flow-Matching vs Diffusion Models explained side by side - AI Coffee Break with Letitia
- But how do AI images and videos actually work? | 3Blue1Brown guest video by Welch Labs
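The diffusion videos above describe how training gradually corrupts images with Gaussian noise and teaches a network to undo it. The sketch below shows only the forward (noising) half. It assumes the DDPM formulation (Ho et al., 2020, not listed in the table) and its linear noise schedule from 1e-4 to 0.02 over 1000 steps. In that formulation any noise level t can be reached in closed form as x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps. Latent diffusion, the Stable Diffusion paper in the table, applies the same idea to an autoencoder's latent representation rather than to raw pixels. The three-value "image" is hypothetical.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each step, increasing linearly (assumed DDPM-style values)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s): how much signal remains at step t."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Noise a clean sample x0 directly to step t in closed form.

    Returns (x_t, eps); a denoising network would be trained to predict eps from (x_t, t).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule())
x0 = [0.5, -0.2, 0.8]  # hypothetical "image" with three pixel values
print(q_sample(x0, 10, alpha_bars, rng)[0])   # still close to x0
print(q_sample(x0, 999, alpha_bars, rng)[0])  # almost pure Gaussian noise
```

Generation runs this process in reverse. Starting from pure noise, the trained network removes a little predicted noise at each step.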
Tools for image generation¶
Video generation with Foundation Models¶
You should focus on how short video clips are generated. What is different compared to image generation? What are the challenges and how are they addressed?
Tools for video generation¶
- Sora 2 from OpenAI (note that the page redirects to a German version if you are in Germany)
- Ray3 from Luma AI
- Veo 3 from Google DeepMind
- WAN-Video
Papers (video generation)¶
- Bridging Text and Video Generation: A Survey, 2025 - a recent survey paper about text-to-video generation that contains many references to earlier work
3D model generation with Foundation Models¶
Papers (3D model generation)¶
- Bolt3D: Generating 3D Scenes in Seconds - a very recent paper about generating 3D scenes that reviews earlier work in its introduction.
Project pages¶
Tools for 3D model generation¶
Audio generation with Foundation Models¶
You should focus on music, speech, and sound effects.
Papers (audio generation)¶
| Date | keywords | Institute | Paper |
|---|---|---|---|
| 2018-02 | WaveGAN | UC San Diego | Adversarial Audio Synthesis |
| 2020-04 | Jukebox | OpenAI | Jukebox: A Generative Model for Music |
| 2022-09 | AudioLM | Google Research | AudioLM: a Language Modeling Approach to Audio Generation |
| 2023-01 | MusicLM | Google Research | MusicLM: Generating Music From Text |
| 2024-07 | MELLE | Microsoft | Autoregressive Speech Synthesis without Vector Quantization |
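Language-model approaches to audio such as AudioLM first turn audio into discrete tokens so that a model can predict them like words. The MELLE title refers to dropping this vector-quantization step. The sketch below shows plain vector quantization: each short feature frame is replaced by the index of its nearest codebook vector. The codebook and frames are hypothetical and hand-written. Real systems learn the codebook and typically use more elaborate schemes, such as residual quantization with several codebooks.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook entry (one discrete token per frame)."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]

def dequantize(tokens, codebook):
    """Approximate the original frames by looking the tokens back up in the codebook."""
    return [codebook[k] for k in tokens]

# Hypothetical 2-d codebook and audio feature frames
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7]]
tokens = quantize(frames, codebook)
print(tokens)
print(dequantize(tokens, codebook))
```

The token sequence is what an autoregressive model learns to predict. The quantization error, the gap between `frames` and their dequantized version, is information that discrete-token approaches lose.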
Tools for audio generation¶
Code generation with LLMs¶
You should focus on generating program code and on support for the software development process.
Tools for code generation¶
YouTube videos about Code generation¶
- Andrej Karpathy: Software Is Changing (Again)
Papers (Code)¶
| Date | keywords | Institute | Paper |
|---|---|---|---|
| 2021-07 | Codex | OpenAI | Evaluating Large Language Models Trained on Code |
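The Codex paper evaluates code generation with pass@k: the probability that at least one of k generated programs passes the unit tests. Rather than drawing exactly k samples, it generates n ≥ k samples per problem, counts the c correct ones, and uses the unbiased estimator 1 - C(n-c, k) / C(n, k). The sketch below computes that formula for a single problem using Python's exact integer binomials. The paper instead shows a numerically stable product form. A benchmark score would be the mean over all problems. The sample counts in the example are hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one correct sample
    # Probability that a random size-k subset contains only incorrect samples, subtracted from 1
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical results: 200 samples for one problem, 13 of which pass
print(pass_at_k(n=200, c=13, k=1))
print(pass_at_k(n=200, c=13, k=10))
```

For k = 1 the estimate reduces to the fraction of correct samples, c / n. Larger k rewards models that produce at least one correct solution among several attempts.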