3DCV Workshop 02 - Monocular Depth Estimation

Workshop Overview

In this workshop, you'll explore and compare real-world depth sensing with state-of-the-art AI models for monocular depth estimation. The ultimate goal is to evaluate whether traditional 3D cameras are still necessary in the age of advanced AI.

Workshop Goals

  • Gain practical experience using the Azure Kinect Depth camera.
  • Learn how to run AI models for monocular depth estimation.
  • Compare real depth data with AI-generated depth maps.
  • Reflect on the central question: Do we still need 3D cameras?

Procedure

Not everyone needs to work through every part. Form teams or pairs, assign each a part, and present your status and results every hour.


Part 1: Collecting Real-World Depth Data

Task

Capture RGB + Depth images using the Azure Kinect.

Instructions

  • Install and configure the Azure Kinect SDK:
    Azure Kinect Sensor SDK Guide
  • Write a small Python script that captures images using one of these libraries: pyKinectAzure, pyk4a (see the capture sketch after this list)
    • Alternative: Use the SDK or viewer to save synchronized RGB and Depth frames.

Hint: As an alternative, you can also reuse the webcam code from Workshop 01 (Camera Calibration) and capture the same scene with a webcam or your cell phone.

  • Research: Capture scenes with challenging lighting conditions and geometries that you expect the AI models will struggle with. Note your expectations for each sample scene. Name and sort all captured scenes and images in a folder structure that lets a script process every image automatically.
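A minimal capture sketch using pyk4a is shown below. It assumes a connected Azure Kinect; the scenes/<scene_name>/ folder layout is only a suggestion, so adapt it to your own structure:

```python
from pathlib import Path

import cv2
import pyk4a
from pyk4a import Config, PyK4A

# Open the Kinect; NFOV unbinned depth pairs well with 720p color.
k4a = PyK4A(Config(
    color_resolution=pyk4a.ColorResolution.RES_720P,
    depth_mode=pyk4a.DepthMode.NFOV_UNBINNED,
))
k4a.start()

scene = Path("scenes/window_backlight")  # hypothetical scene name
scene.mkdir(parents=True, exist_ok=True)

capture = k4a.get_capture()
# capture.color is a BGRA image; capture.transformed_depth is the 16-bit
# depth map (millimeters) reprojected into the color camera's frame.
# A robust script should check that both are not None before saving.
cv2.imwrite(str(scene / "rgb.png"), capture.color[:, :, :3])
cv2.imwrite(str(scene / "depth.png"), capture.transformed_depth)

k4a.stop()
```

Saving depth as a 16-bit PNG keeps the raw millimeter values, which you will need for the comparison in Part 3.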

Part 2: Generating Depth Maps with AI Models

Models to Test

Pick one or more of the following models to run depth estimation on exemplary RGB images:

Hint: Start with the images the authors provide for testing in order to reproduce their results. Only continue with your own images once those tests succeed.

  • Research: Check the examples from the AI models and update your expectations from Part 1. Try to automate the processing of all images using a script (a sketch follows below). Make sure each step is reproducible and documented.
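One possible route, if your chosen model is available through the Hugging Face transformers depth-estimation pipeline; the model identifier and folder layout below are placeholders, substitute your own choices:

```python
from pathlib import Path

import numpy as np
from PIL import Image
from transformers import pipeline

# Placeholder model id; swap in whichever depth model you picked.
depth_estimator = pipeline("depth-estimation",
                           model="depth-anything/Depth-Anything-V2-Small-hf")

# Assumes the scenes/<scene_name>/rgb.png layout from Part 1.
for rgb_path in sorted(Path("scenes").glob("*/rgb.png")):
    result = depth_estimator(Image.open(rgb_path))
    # result["predicted_depth"] is the raw model output as a tensor;
    # result["depth"] is a normalized PIL image for quick inspection.
    depth = result["predicted_depth"].squeeze().cpu().numpy()
    np.save(rgb_path.with_name("ai_depth.npy"), depth)
```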

Part 3: Comparing the Depth Maps

Tasks

Compare:

  • AI-generated depth maps across the different models
  • the AI-generated depth maps against those recorded with the Azure Kinect

Suggestions

  • Use color maps to visualize depth, e.g. with cv2.applyColorMap (see the sketch after this list).
  • Convert all depth maps to a common scale before comparing them; most monocular models output only relative depth, so align scale and offset to the Kinect's metric depth first.
  • Overlay AI depth maps and real Kinect depth maps on the original RGB image.
  • Measure per-pixel depth error, using the Kinect data as the ground-truth reference.

  • Challenge: Try to use metric depth models and compare the measured distances.
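A minimal comparison sketch, assuming the file names from the earlier parts (all paths are placeholders) and that the AI model outputs relative, non-inverse depth:

```python
import cv2
import numpy as np

def colorize(depth: np.ndarray) -> np.ndarray:
    """Normalize a depth map to 8 bit and apply a color map for display."""
    d = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.applyColorMap(d, cv2.COLORMAP_JET)

rgb = cv2.imread("scenes/window_backlight/rgb.png")
kinect = cv2.imread("scenes/window_backlight/depth.png",
                    cv2.IMREAD_UNCHANGED).astype(np.float32)  # millimeters
ai = np.load("scenes/window_backlight/ai_depth.npy").astype(np.float32)
ai = cv2.resize(ai, (kinect.shape[1], kinect.shape[0]))

# Align the relative AI prediction to the Kinect's metric scale with a
# least-squares fit over valid pixels (the Kinect reports 0 where it has
# no measurement). Some models predict *inverse* depth; invert first if
# your model does.
valid = kinect > 0
scale, offset = np.polyfit(ai[valid], kinect[valid], 1)
ai_metric = scale * ai + offset

# Overlay the colorized AI depth on the RGB image and report the error.
overlay = cv2.addWeighted(rgb, 0.5, colorize(ai_metric), 0.5, 0)
cv2.imwrite("scenes/window_backlight/overlay.png", overlay)
mae = np.abs(ai_metric[valid] - kinect[valid]).mean()
print(f"Mean absolute error vs. Kinect: {mae:.1f} mm")
```

For the metric-depth challenge, skip the scale/offset fit and compare the model's distances to the Kinect's millimeter values directly.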


Part 4: Discussion and Reflection

Key Question

Do we still need 3D cameras like Azure Kinect, or are monocular AI models good enough?

Points to Consider

  • Accuracy and consistency of AI models across different environments
  • Computational requirements of AI models
  • Limitations in occlusion handling and texture-less regions
  • Real-time capabilities

  • Research: Prepare a short presentation, poster, or web page summarizing your findings with visual examples.


Final Thoughts

By completing this workshop, you have:

  • Captured RGB-D data for a small research project,
  • Organized and structured the data for efficient processing,
  • Worked with depth maps from cameras as well as ones generated by AI models,
  • and presented and discussed your results in a scientific format.

Further Reading

Here are the foundational papers for each model explored:


Grading

For this workshop, the research from Part 4 must be handed in. Email the resulting document to the instructor within one week after the workshop, either as a PDF attachment or as a link to the document. If you worked in teams, state who was responsible for each part.

Your results need to be reproducible and contain references to all used sources and tools.