Session 6: Presentation of theory topics¶

Date: 2026-04-27

Overview¶

Students present their current research findings on specific application areas of generative AI.

Plan¶

Presentations of selected theory topics researched by students:

Joshua Aleth: Agentic AI with OpenClaw and others (e.g. MCP)
Sebastian Bauer: Text-based generation of other modalities (CLIP and successors)
Sebastian Regelmann: Image generation with foundation models (GAN, Stable Diffusion, Flow Matching)

Each presentation should be exactly 15 minutes, followed by 5 to 15 minutes of Q&A. A handout (PDF) summarizing the key points should be provided.

Results¶

The presentations should cover the following points:

Overview of the application area --> which problems are solved by generative AI in this area? What are the main use cases? Which tools are used for which purposes?
Key models and architectures --> which models have been developed over the years? Which architectures are used? What are the main differences between them?
Relevant tools and frameworks --> which tools and frameworks are available today for working with generative AI in this area? Which ones are most popular? Which ones are most powerful? Which ones are free to use? Which ones are open source?
Example use cases and demos --> which examples and demos are available for this application area? Which ones are most impressive? Which ones are most useful? Which ones are most creative?
Challenges and future directions --> which challenges are still open in this application area? Which ones are most pressing? Which ones are most interesting? Which ones are most promising for future research and development?

Agentic AI¶

Starting the agent from a smartphone? How exactly?
Andon Market as an example: the agent was supposed to make money from 100,000 dollars of starting capital
Agents defined: ReAct (Reasoning + Acting)
Perceive, Reason, Act, Learn cycle
Tool use: APIs, databases
Multi-step planning
Self-correction and memory
f: Do not cite sources by number only; feel free to name them as well
Media generation via pipelines, orchestrated by the agent
f: Very smooth switching between tools during the presentation
API-first vs UI-first as alternative approaches for agents
Holotab as an example of UI-first: it controls the browser autonomously
Websites can be designed to be agent-friendly
f: Why MCP? The problem was explained first, but there was too little pause before moving on to a new topic
f: Slide 17: structure stacked on top of each other?
RAG and MCP are often compared --> can they be combined? Yes, both can be used together
An agent at Meta simply deleted emails? Without an attack?
shodan.io lists exposed OpenClaw interfaces
Personal conclusion: very enriching, push notifications on the smartwatch
Websites should be optimized for agents
The future will be agents ....
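The ReAct cycle with tool use and memory from these notes can be illustrated with a small loop. This is only a sketch under stated assumptions: `scripted_policy` is a hypothetical stand-in for the LLM, the two tools and the profit of 500 are made up for illustration, and nothing here reflects how OpenClaw or any other agent framework is actually implemented.

```python
# Hypothetical knowledge base and tools; a real agent would call APIs or databases.
KNOWLEDGE = {"starting_capital": "100000"}


def lookup(key: str) -> str:
    """Toy database tool: return a stored fact or 'unknown'."""
    return KNOWLEDGE.get(key, "unknown")


def calculator(expression: str) -> str:
    """Toy calculator tool: only sums of integers such as '1+2+3'."""
    return str(sum(int(part) for part in expression.split("+")))


TOOLS = {"lookup": lookup, "calculator": calculator}


def scripted_policy(question: str, memory: list) -> dict:
    """Stand-in for the LLM: choose the next thought and action from memory."""
    if not memory:
        return {"thought": "I need the starting capital.",
                "action": "lookup", "input": "starting_capital"}
    if len(memory) == 1:
        capital = memory[0]["observation"]
        # The profit of 500 is an invented number for this example.
        return {"thought": "Add the assumed profit of 500.",
                "action": "calculator", "input": f"{capital}+500"}
    return {"thought": "I have the result.",
            "action": "finish", "input": memory[-1]["observation"]}


def react_loop(question: str, policy, tools: dict, max_steps: int = 5):
    """Alternate reasoning (policy) and acting (tool call) until 'finish'."""
    memory = []  # every step is stored so later reasoning can build on it
    for _ in range(max_steps):
        step = policy(question, memory)
        if step["action"] == "finish":
            return step["input"], memory
        tool = tools.get(step["action"])
        if tool is None:
            # Feeding the error back lets the policy correct itself next step.
            observation = f"error: unknown tool {step['action']}"
        else:
            observation = tool(step["input"])
        memory.append({**step, "observation": observation})
    return None, memory  # step budget exhausted without an answer


answer, trace = react_loop("What is the balance after the first sale?",
                           scripted_policy, TOOLS)
print(answer)
```

The step limit is a design choice worth keeping in any real agent loop: it bounds cost when the model never decides to finish.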

End: 14:25 --> very good timing

Ollama Cloud with Qwen for 20 euros per month

Agents are not trained; instead, an agent learns new "skills". These are essentially .md files that describe the skill in words.
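One way to picture skills as .md files is a loader that reads them from a directory and appends them to the agent's system prompt. This is a sketch under assumptions, not OpenClaw's actual mechanism: the directory layout, the file name `send_push.md`, and the helpers `load_skills` and `build_system_prompt` are all hypothetical.

```python
import tempfile
from pathlib import Path


def load_skills(skill_dir: str) -> dict:
    """Read every .md file in skill_dir; the file stem becomes the skill name."""
    skills = {}
    for path in sorted(Path(skill_dir).glob("*.md")):
        skills[path.stem] = path.read_text(encoding="utf-8")
    return skills


def build_system_prompt(base_prompt: str, skills: dict) -> str:
    """Append each skill description to the base prompt as plain text."""
    parts = [base_prompt]
    for name, text in skills.items():
        parts.append(f"## Skill: {name}\n{text.strip()}")
    return "\n\n".join(parts)


# Demo with a temporary directory holding one made-up skill file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "send_push.md").write_text(
        "# Send push notification\n"
        "Call the notification tool with a short message.\n",
        encoding="utf-8",
    )
    skills = load_skills(tmp)
    prompt = build_system_prompt("You are a helpful agent.", skills)

print(prompt)
```

No model weights change in this picture: adding a skill only changes the text the model sees.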

CLIP and successors¶

Start: 15:27

CLIP: Contrastive Language-Image Pretraining

by OpenAI, 2021
learns relationships between texts and images
Motivation: avoid fine-tuning? Or was ImageNet too small?
400 million image/text pairs were used for training
Shared embedding space (same dimensionality)
ResNet and ViT variants were used as image encoders
Text encoder: a GPT-2-style Transformer
What is N? (architecture slide) --> N is the batch size
Are the embeddings normalized individually?
Cross-entropy as the loss? --> Distributions are compared, and there need not be only one matching description
Use cases:
- Zero-shot image classification
- Image search --> is it actually used?
- Text-to-image models
Own implementation: invariance for images of chessboards with game positions
- 70,000 synthetic images from Blender with 3x3 squares
- ResNet18 as the backbone
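The questions about N, normalization, and cross-entropy can be made concrete with a small sketch of a symmetric contrastive loss over a batch of N image/text pairs. It assumes each embedding is L2-normalized on its own and uses a fixed temperature of 0.07 (an assumption for this sketch; in CLIP the temperature is a learned parameter). The two-dimensional embeddings are toy values, not encoder outputs.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric cross-entropy over the N x N similarity matrix.

    Pair k (image k, text k) is the positive; the other N - 1 entries
    in row k and column k serve as negatives.
    """
    n = len(image_embeddings)
    images = [l2_normalize(v) for v in image_embeddings]
    texts = [l2_normalize(v) for v in text_embeddings]
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature
               for txt in texts] for img in images]
    # Image-to-text direction: softmax over each row.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: softmax over each column.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
matching_texts = [[2.0, 0.0], [0.0, 3.0]]   # same directions, other lengths
swapped_texts = [[0.0, 3.0], [2.0, 0.0]]    # captions assigned to wrong images

loss_matching = contrastive_loss(images, matching_texts)
loss_swapped = contrastive_loss(images, swapped_texts)
print(loss_matching, loss_swapped)
```

Because of the normalization, the differing vector lengths in `matching_texts` do not affect the loss; only the directions do.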

End: 15:41

Questions:

Image captioning with CLIP as well? --> It should work, but it is not quite clear whether an additional model has to be trained.
Is there a CLIP 2.0? --> Not known.
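The zero-shot image classification use case from the notes above can be sketched as follows: each candidate label is turned into a text prompt, and the label whose text embedding is most similar to the image embedding wins. The three-dimensional embeddings and the temperature value are made-up assumptions; in practice both embeddings would come from the trained encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.01):
    """Return the best-matching prompt and a probability for every prompt."""
    labels = list(label_embeddings)
    logits = [cosine(image_embedding, label_embeddings[label]) / temperature
              for label in labels]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Hypothetical text embeddings for three prompts (toy values).
LABEL_EMBEDDINGS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.1],
    "a photo of a chessboard": [0.0, 0.1, 0.9],
}
image_embedding = [0.05, 0.2, 0.95]  # hypothetical image encoder output

prediction, probabilities = zero_shot_classify(image_embedding, LABEL_EMBEDDINGS)
print(prediction)
```

No classifier is trained for the label set; changing the classes only means changing the prompts.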

Image generation (GAN, Stable Diffusion, Flow Matching)¶

Start: 16:06

Image generation foundation models

GANs
- first model architecture
- CNNs for GANs
- invented in 2014 (Goodfellow)
- 2017/18: style transfer very successful
- Implementations of StyleGAN and CycleGAN are available
- "This person does not exist" as an example
- Training is difficult because a balance between generator and discriminator must be maintained
- no generalization, learns only from the training data
- Where does the control come from? Male/female, age, etc.?
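Why the balance in GAN training is delicate can be seen from the generator's gradient. The sketch below assumes a discriminator output D = sigmoid(a) for a generated sample with logit a and compares the original minimax generator objective log(1 - D) with the commonly used non-saturating variant -log(D). It illustrates the loss functions only; it is not a training loop.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy: real samples should score 1, generated ones 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_grad_minimax(fake_logit: float) -> float:
    """d/da log(1 - sigmoid(a)) = -sigmoid(a)."""
    return -sigmoid(fake_logit)


def generator_grad_nonsaturating(fake_logit: float) -> float:
    """d/da [-log(sigmoid(a))] = -(1 - sigmoid(a))."""
    return -(1.0 - sigmoid(fake_logit))


# The more negative the logit, the more confidently the discriminator
# rejects the generated sample.
rows = []
for logit in (0.0, -5.0, -10.0):
    rows.append((logit,
                 generator_grad_minimax(logit),
                 generator_grad_nonsaturating(logit)))
    print(rows[-1])
```

When the discriminator becomes too strong, the minimax gradient shrinks toward zero and the generator stops learning, while the non-saturating variant still provides a signal.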
Diffusion Models
- Images are noised step by step (timesteps) (encoder) and then denoised again (decoder)
- Noise is also added repeatedly during training so that the model does not just produce average images
- DDPM: 2020: pixel-based
- LDM: 2021: an autoencoder creates a latent space, and the diffusion is computed in that space
- ComfyUI as a tool
- Open source models: Qwen Image, Stable Diffusion, Playground AI
- Closed source: DALL-E 3, Midjourney up to 5.?
- tensor.art as a web service for comparing models
- Problems: typical style of each model, high compute requirements, little control via the text prompt, text is difficult to generate
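The step-by-step noising described for diffusion models has a closed form in DDPM, so any timestep can be sampled directly from the clean image. The sketch below assumes the linear schedule from the DDPM paper (1000 steps, beta from 1e-4 to 0.02), which was not stated in the talk, and uses a short list of numbers in place of image pixels.

```python
import math
import random

T = 1000                           # number of timesteps (assumed, as in DDPM)
BETA_START, BETA_END = 1e-4, 0.02  # linear noise schedule (assumed)

betas = [BETA_START + (BETA_END - BETA_START) * t / (T - 1) for t in range(T)]

# alpha_bars[t] is the product of (1 - beta) over all steps up to t.
alpha_bars = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)


def q_sample(x0, t, eps):
    """Noised sample: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


def training_example(x0, rng):
    """Draw a random timestep and noise; the network would learn to predict eps."""
    t = rng.randrange(T)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return q_sample(x0, t, eps), t, eps


rng = random.Random(0)
pixels = [0.2, -0.5, 0.9]  # toy "image"
x_t, t, eps = training_example(pixels, rng)
print(t, x_t)
print(alpha_bars[0], alpha_bars[-1])
```

At the last timestep almost nothing of the original signal remains, which is why sampling can start from pure noise.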
Flow Matching
- Noise is also added step by step, but denoising is trained in one step
  - Ordinary differential equations instead of stochastic differential equations --> take a closer look
- Models: Flux, Stable Diffusion 3.5, DALL-E 4?, Midjourney V7
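One common formulation of flow matching, assumed here and not taken from the talk, connects a noise sample x0 and a data sample x1 by the straight line x_t = (1 - t) * x0 + t * x1 and trains a network to predict the constant velocity x1 - x0. The sketch uses a toy dataset consisting of a single point, for which the exact velocity field is known, so no network is needed; `exact_velocity` stands in for the trained model.

```python
import random


def make_training_pair(noise, data, t):
    """Point on the straight path at time t and the velocity regression target."""
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    target_velocity = [d - n for n, d in zip(noise, data)]
    return x_t, target_velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2
               for p, q in zip(predicted_velocity, target_velocity)) / len(target_velocity)


def euler_sample(velocity_fn, x, steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


DATA_POINT = [3.0, -1.0]  # toy dataset with a single sample


def exact_velocity(x, t):
    """Exact velocity field for the one-point dataset (stand-in for a network)."""
    return [(d - xi) / (1.0 - t) for xi, d in zip(x, DATA_POINT)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in DATA_POINT]

x_t, target = make_training_pair(noise, DATA_POINT, t=0.25)
loss_at_optimum = flow_matching_loss(exact_velocity(x_t, 0.25), target)
sample = euler_sample(exact_velocity, noise, steps=4)
print(loss_at_optimum, sample)
```

Because the path is a straight line, a handful of deterministic Euler steps is enough in this toy case, with no noise injected during sampling.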

End: 16:29

Questions:

Open source: open model weights or open data set?

Feedback¶

All talks were very well prepared and delivered. The content was clear and informative; however, the presenters did not fully engage the audience. The use of examples and demos was particularly effective in illustrating the concepts being discussed. The Q&A sessions were also very productive, with insightful questions and thoughtful answers.

Some general advice for future presentations:

Try to engage the audience in the beginning with a strong hook or an interesting question. The quote from Dijkstra was a good start, but it could have been more effective if it had been followed by a pause to let the audience reflect on it before diving into the technical details.
When introducing new concepts, take a moment to explain them in simple terms before going into the technical details. This will help ensure that everyone in the audience is on the same page and can follow along with the presentation.
Do not try to cover too much material in a short amount of time. It is better to focus on a few key points and explain them well than to try to cover everything and risk losing the audience.
Try to build a story arc in your presentation, with a clear beginning, middle, and end. This will help keep the audience engaged and make it easier for them to follow along.
If you refer to research papers or other sources, name the authors and their affiliation, or give some context that explains the circumstances that led to the publication. Provide enough information for the audience to find the sources, and encourage them to find out more if you appreciate the source, which you should; otherwise you should not use it.

Todo¶

Ordinary differential equations instead of stochastic differential equations --> take a closer look
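As a starting point for this todo, the difference between the two equation types can be seen in their simplest numerical solvers. The sketch compares an explicit Euler step for an ODE with an Euler-Maruyama step for an SDE, using the toy drift f(x) = -x and a constant noise scale; both choices are assumptions for illustration and not the equations of any particular image model.

```python
import math
import random


def drift(x: float) -> float:
    """Toy drift term pulling x toward zero."""
    return -x


def euler_ode(x: float, steps: int, dt: float) -> float:
    """ODE dx = f(x) dt: deterministic, same result on every run."""
    for _ in range(steps):
        x = x + drift(x) * dt
    return x


def euler_maruyama(x: float, steps: int, dt: float, sigma: float, rng) -> float:
    """SDE dx = f(x) dt + sigma dW: fresh Gaussian noise is injected each step."""
    for _ in range(steps):
        x = x + drift(x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


ode_a = euler_ode(1.0, steps=100, dt=0.01)
ode_b = euler_ode(1.0, steps=100, dt=0.01)
sde_a = euler_maruyama(1.0, steps=100, dt=0.01, sigma=0.5, rng=random.Random(1))
sde_b = euler_maruyama(1.0, steps=100, dt=0.01, sigma=0.5, rng=random.Random(2))
print(ode_a, ode_b)
print(sde_a, sde_b)
```

With an ODE, the starting noise fully determines the result; with an SDE, the same start can end in different places because randomness enters at every step.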