Session 5: First research results
Date: 2026-04-20
Overview
Recap of transformers; introduction to foundation models and their applications.
Plan
- As an introduction, we watch a short video from Stanford HAI (Stanford Institute for Human-Centered Artificial Intelligence) on foundation models: Foundation Models: An Explainer for Non-Experts. It gives a good overview of the main concepts behind foundation models, their applications, and their risks.
- We then watch the first lecture of the MIT course "Foundation Models and Generative AI" by Rickard Brüel Gabrielsson, also available on YouTube: Lecture 1: Introduction. It introduces the main ideas behind foundation models.
- We also watch the first 15 minutes of the fifth lecture, "MIT 6.S087: Foundation Models and Generative AI. ECOSYSTEM", by Rickard Brüel Gabrielsson, also available on YouTube: Lecture 5: ECOSYSTEM. It summarizes the main ideas behind foundation models.
- After clarifying open questions, we go deeper into specific applications and media generation methods. Each participant chooses one topic and uses the provided resources to prepare a presentation for the next session. See the theory page for preparation resources.
- The presentation should cover the following points:
- Overview of the application area --> which problems are solved by generative AI in this area? What are the main use cases? Which tools are used for which purposes?
- Key models and architectures --> which models have been developed over the years? Which architectures are used? What are the main differences between them?
- Relevant tools and frameworks --> which tools and frameworks are available today for working with generative AI in this area? Which ones are most popular? Which ones are most powerful? Which ones are free to use? Which ones are open source?
- Example use cases and demos --> which examples and demos are available for this application area? Which ones are most impressive? Which ones are most useful? Which ones are most creative?
- Challenges and future directions --> which challenges are still open in this application area? Which ones are most pressing? Which ones are most interesting? Which ones are most promising for future research and development?
Topic selection
- Joshua Aleth: Agentic AI with OpenClaw and others (e.g. MCP)
- Sebastian Bauer: Text-based generation of other modalities (CLIP and successors)
- Sebastian Regelmann: Image generation with foundation models (GAN, Stable Diffusion, Flow Matching)
Grading for the theory presentation
Deliverables: a presentation of exactly 15 minutes, followed by 5–15 minutes of Q&A, and a hand-out (PDF) summarizing the key points.
Grading is based on clarity, depth of research, and engagement. List all references and tools used in the hand-out.
Materials
- See the theory page for resources on preparation for the next session: theory. Pick one topic and prepare a presentation for the next session.
- Uwe shared a Bluesky post from Accented Cinema about how a joke from Reddit has become embedded in large language models and now generates fake news about Japanese culture. It is a good example of how generative AI can produce false information: the model reproduces patterns from its training data rather than verified facts, and a small change in the input can lead to a completely different output. It also shows how content generated from training data rather than reality can reproduce stereotypes and biases.
Results
We finished watching and discussing the videos on transformers and attention mechanisms. Based on the Stanford HAI and MIT videos, we discussed the term "foundation model" and its applications. We also discussed the plan for the upcoming sessions and chose the theory topics. The next step is for each participant to prepare a presentation on their chosen topic for the next session.