3DCV Workshop 02 - Monocular Depth Estimation

Workshop Overview

In this workshop, you'll explore and compare real-world depth sensing with state-of-the-art AI models for monocular depth estimation. The ultimate goal is to evaluate whether traditional 3D cameras are still necessary in the age of advanced AI.

Workshop Goals

  • Gain practical experience using the Azure Kinect Depth camera.
  • Learn how to run AI models for monocular depth estimation.
  • Compare real depth data with AI-generated depth maps.
  • Reflect on the central question: Do we still need 3D cameras?

Procedure

Not everyone needs to work through every part. Form teams or pairs, assign each a part, and present your status and results every hour.


Part 1: Collecting Real-World Depth Data

Task

Capture RGB + Depth images using the Azure Kinect.

Instructions

  • Install and configure the Azure Kinect SDK:
    Azure Kinect Sensor SDK Guide
  • Write a small Python script that captures images using one of these libraries: pyKinectAzure, pyk4a (see the capture sketch after this list)
    • Alternative: Use the SDK or viewer to save synchronized RGB and Depth frames.

Hint: As an alternative, you can also reuse the webcam code from Workshop 01 (Camera Calibration) and capture the same scene with a webcam or your cell phone.

  • Research: Capture scenes with challenging lighting conditions and geometries that you expect the AI models will struggle with. Note your expectations for each sample scene. Name and sort all captured scenes and images in a folder structure that lets a script process every image automatically.
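A minimal capture sketch using pyk4a is shown below. It assumes a connected Azure Kinect; the scenes/<scene_name>/ folder layout is only a suggestion, so adapt it to your own structure:

```python
from pathlib import Path

import cv2
import pyk4a
from pyk4a import Config, PyK4A

# Open the Kinect; NFOV unbinned depth pairs well with 720p color.
k4a = PyK4A(Config(
    color_resolution=pyk4a.ColorResolution.RES_720P,
    depth_mode=pyk4a.DepthMode.NFOV_UNBINNED,
))
k4a.start()

scene = Path("scenes/window_backlight")  # hypothetical scene name
scene.mkdir(parents=True, exist_ok=True)

capture = k4a.get_capture()
# capture.color is a BGRA image; capture.transformed_depth is the 16-bit
# depth map (millimeters) reprojected into the color camera's frame.
# A robust script should check that both are not None before saving.
cv2.imwrite(str(scene / "rgb.png"), capture.color[:, :, :3])
cv2.imwrite(str(scene / "depth.png"), capture.transformed_depth)

k4a.stop()
```

Saving depth as a 16-bit PNG keeps the raw millimeter values, which you will need for the comparison in Part 3.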

Part 2: Generating Depth Maps with AI Models

Models to Test

Pick one or more of the following models to run depth estimation on exemplary RGB images:

Hint: Start with the images the authors provide for testing in order to reproduce their results. Only continue with your own images once those tests succeed.

  • Research: Check the examples from the AI models and update your expectations from Part 1. Try to automate the processing of all images using a script (a sketch follows below). Make sure each step is reproducible and documented.
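One possible route, if your chosen model is available through the Hugging Face transformers depth-estimation pipeline; the model identifier and folder layout below are placeholders, substitute your own choices:

```python
from pathlib import Path

import numpy as np
from PIL import Image
from transformers import pipeline

# Placeholder model id; swap in whichever depth model you picked.
depth_estimator = pipeline("depth-estimation",
                           model="depth-anything/Depth-Anything-V2-Small-hf")

# Assumes the scenes/<scene_name>/rgb.png layout from Part 1.
for rgb_path in sorted(Path("scenes").glob("*/rgb.png")):
    result = depth_estimator(Image.open(rgb_path))
    # result["predicted_depth"] is the raw model output as a tensor;
    # result["depth"] is a normalized PIL image for quick inspection.
    depth = result["predicted_depth"].squeeze().cpu().numpy()
    np.save(rgb_path.with_name("ai_depth.npy"), depth)
```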

Part 3: Comparing the Depth Maps

Tasks

Compare:

  • AI-generated depth maps across the different models
  • the AI-generated depth maps against those recorded with the Azure Kinect

Suggestions

  • Use color maps to visualize depth, e.g. with cv2.applyColorMap (see the sketch after this list).
  • Convert all depth maps to a common scale before comparing them; most monocular models output only relative depth, so align scale and offset to the Kinect's metric depth first.
  • Overlay AI depth maps and real Kinect depth maps on the original RGB image.
  • Measure per-pixel depth error, using the Kinect data as the ground-truth reference.

  • Challenge: Try to use metric depth models and compare the measured distances.
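A minimal comparison sketch, assuming the file names from the earlier parts (all paths are placeholders) and that the AI model outputs relative, non-inverse depth:

```python
import cv2
import numpy as np

def colorize(depth: np.ndarray) -> np.ndarray:
    """Normalize a depth map to 8 bit and apply a color map for display."""
    d = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.applyColorMap(d, cv2.COLORMAP_JET)

rgb = cv2.imread("scenes/window_backlight/rgb.png")
kinect = cv2.imread("scenes/window_backlight/depth.png",
                    cv2.IMREAD_UNCHANGED).astype(np.float32)  # millimeters
ai = np.load("scenes/window_backlight/ai_depth.npy").astype(np.float32)
ai = cv2.resize(ai, (kinect.shape[1], kinect.shape[0]))

# Align the relative AI prediction to the Kinect's metric scale with a
# least-squares fit over valid pixels (the Kinect reports 0 where it has
# no measurement). Some models predict *inverse* depth; invert first if
# your model does.
valid = kinect > 0
scale, offset = np.polyfit(ai[valid], kinect[valid], 1)
ai_metric = scale * ai + offset

# Overlay the colorized AI depth on the RGB image and report the error.
overlay = cv2.addWeighted(rgb, 0.5, colorize(ai_metric), 0.5, 0)
cv2.imwrite("scenes/window_backlight/overlay.png", overlay)
mae = np.abs(ai_metric[valid] - kinect[valid]).mean()
print(f"Mean absolute error vs. Kinect: {mae:.1f} mm")
```

For the metric-depth challenge, skip the scale/offset fit and compare the model's distances to the Kinect's millimeter values directly.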


Part 4: Discussion and Reflection

Key Question

Do we still need 3D cameras like Azure Kinect, or are monocular AI models good enough?

Points to Consider

  • Accuracy and consistency of AI models across different environments
  • Computational requirements of AI models
  • Limitations in occlusion handling and texture-less regions
  • Real-time capabilities

  • Research: Prepare a short presentation, poster, or web page summarizing your findings with visual examples.


Final Thoughts

By completing this workshop, you have:

  • Captured RGB-D data for a small research project,
  • Organized and structured the data for efficient processing,
  • Worked with depth maps from cameras as well as ones generated by AI models,
  • and presented and discussed your results in a scientific format.

Further Reading

Here are the foundational papers for each model explored:


Grading

For this workshop, the research from Part 4 must be handed in. Email the resulting document to the instructor within one week after the workshop, either as a PDF attachment or as a link to the document. If you worked in teams, state who was responsible for each part.

Your results need to be reproducible and contain references to all used sources and tools.