Workshop 3 + 4 Submission by Maximilian Flack

Description of Task

  • Comparison of 3D image reconstruction software: SfM vs. AI
  • Used SfM software: Meshroom
  • Used model-based software: Spann3r
  • Ran on hardware:
    • Laptop with Windows 11
    • NVIDIA RTX 4060 Laptop GPU
    • AMD Ryzen 5 7535HS CPU
    • 16 GB RAM

Expected Outcomes (see Workshop)

  • A reconstructed 3D model of a small object, exported in a common format (e.g., OBJ, PLY).
  • A reconstructed 3D model of a room or a larger outdoor scene, exported in a common format (e.g., OBJ, PLY).
  • A comparison of the results from different methods and devices.
  • A reflection on the effectiveness of traditional versus model-based methods.
  • A written report summarizing the findings.
  • A discussion on the future of 3D object / scene reconstruction in the context of AI advancements.
  • A reproducible workflow that can be shared with others.

Capturing the images

Object: Mimic-Chest

I captured the images of my object using an iPhone 16 camera on a tripod, turning the object by hand. The object was placed on a black cloth to create a black background for the images. I took 30 images of the object.

Example images of object:

Mimic Front Mimic Side Mimic Back

I chose this object because it doesn't have any dark areas that could blend into the background. It also has distinct textures and forms everywhere, which should make feature detection easier.

Scene: GHB-Room

I captured the images of my scene using an iPhone 16 camera held in hand, rotating a few degrees on the same spot for each image. The scene depicts half of my room in the GHB. I took 11 images of the room using the ultra-wide lens of the iPhone and 41 images using the standard lens.

Example images of scene:

| Ultra-Wide | Normal |
| --- | --- |
| UW Room Left | Room Left |
| UW Room Right | Room Right |

The lighting conditions in the scene are not perfect and there are a few shiny and reflective surfaces, but I used the images anyway to see how the different tools would perform under suboptimal conditions.

Meshroom results

Test of functionality

To make sure Meshroom works correctly, I tested its functionality using the test dataset from this tutorial.

| Image from Dataset | 3D Reconstruction as Mesh in Blender |
| --- | --- |
| Tree Left | Tree Blender Left |
| Tree Right | Tree Blender Right |

As the reconstructed 3D model closely resembles the initial images, it can be assumed that Meshroom works correctly.

Object Reconstruction

I used all 30 images of the mimic to reconstruct it in Meshroom, and after about 12 minutes I got this result (see mimic-meshroom in the 3dmodels folder):

Meshroom Mimic Front Meshroom Mimic Side Meshroom Mimic Back

Because I didn't use SAM to segment the object from the background, Meshroom produced a few incorrect faces in the mesh, as seen here:

Meshroom Artifacts Mimic

However, this is easily fixable, and otherwise the 3D reconstruction worked well: the model closely resembles the real object.

Scene Reconstruction

I used all 52 images of my scene, i.e. the ultra-wide and standard lens images combined, to test Meshroom's capacity for recreating 3D scenes. After about 17 minutes, I got the following result (see ghb-meshroom in the 3dmodels folder):

Meshroom Room Left Meshroom Room Right

The results from Meshroom are poor: most of the room has been cut off and there are large gaps in the model.

Spann3r results

Problems installing Spann3r

I spent most of Workshop 3 and part of Workshop 4 trying to get Spann3r to work. The problem was the compatibility of the different tools needed to run Spann3r. Initially, I installed four different CUDA versions on my laptop and tried installing the necessary dependencies using Miniconda. However, some versions were too old for my Visual Studio version (2022) and required Visual Studio 2019, which can no longer be downloaded for free.

So I gave up on installing it on Windows directly and tried WSL instead. The Linux version of CUDA uses a different compiler, and the preferred version can simply be installed without complications. After getting the basic functionality of Spann3r to work on WSL, the next problem was the GUI: Gradio could not start the localhost server through which you interact with Spann3r in the browser. Because of this, I could not test some of Spann3r's functions, such as importing videos to reconstruct a scene, and had to use the more limited command-line demo of Spann3r.
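For reference, a common cause of Gradio servers being unreachable under WSL is that they bind only to 127.0.0.1 inside the Linux environment. Below is a minimal sketch of the launch flags that address this, using a dummy interface since I could not verify it against the Spann3r GUI itself:

```python
import gradio as gr

# Dummy interface standing in for the Spann3r GUI; the point is the launch()
# arguments. server_name="0.0.0.0" makes the server listen on all interfaces,
# so the Windows host browser can reach the WSL-hosted app.
def echo(text: str) -> str:
    return text

demo = gr.Interface(fn=echo, inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0", server_port=7860)
```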

Test of functionality

To test the functionality of Spann3r, I first used the sample data provided here, using only every 10th image of the 580 provided and the preset confidence threshold of 0.001.
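Picking every 10th image can be scripted instead of copying files by hand; a minimal sketch, with placeholder folder names rather than the dataset's actual layout:

```python
import shutil
from pathlib import Path

# The folder names below are illustrative placeholders, not the real layout.
src = Path("demo_images")
dst = Path("demo_images_subset")
dst.mkdir(exist_ok=True)

# Keep every 10th frame (0, 10, 20, ...) of the 580 provided images.
for i, img in enumerate(sorted(src.glob("*.jpg"))):
    if i % 10 == 0:
        shutil.copy(img, dst / img.name)
```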

| Image from Dataset | 3D Reconstruction as Point Cloud in Meshlab |
| --- | --- |
| Demo Left | Demo Meshlab Left |
| Demo Right | Demo Meshlab Right |

This result resembles the original images and the demo result video provided on GitHub, so it can be assumed that Spann3r is set up correctly.
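The exported .ply files can also be sanity-checked outside Meshlab; a minimal Open3D sketch, using the mimic.ply export from the 3dmodels folder as an example:

```python
import open3d as o3d

# Load one of the exported Spann3r point clouds and open a viewer window.
pcd = o3d.io.read_point_cloud("3dmodels/mimic.ply")
print(pcd)  # prints the number of points loaded
o3d.visualization.draw_geometries([pcd])
```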

The amount of time Spann3r took to reconstruct depended, as with Meshroom, on the number of images provided. However, most of the time was spent finding and loading the checkpoints in the project folder, while each image took only a few seconds to process. On average, the reconstruction was done in about 2 to 3 minutes.

Object Reconstruction

I tried reconstructing my object with all 30 images and afterwards with only 10 images. The results were the following:

30 images (see mimic.ply in the 3dmodels folder):

Spann3r Mimic Front Spann3r Mimic Side Spann3r Mimic Back

10 images (see mimicselected.ply in the 3dmodels folder):

Spann3r Mimic Front Spann3r Mimic Side Spann3r Mimic Back

Notice how the reconstruction using 10 images is cleaner than the one using 30 images. Creating a mesh from the 10-image point cloud in Meshlab results in the following:

Mimic Reconstruction
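The Meshlab meshing step can also be scripted; below is a minimal Open3D sketch of Poisson surface reconstruction on the 10-image point cloud. The parameters (normal-estimation radius, `depth=9`) are illustrative defaults, not the exact Meshlab settings used:

```python
import open3d as o3d

# Mesh the 10-image point cloud (mimicselected.ply from the 3dmodels folder).
pcd = o3d.io.read_point_cloud("3dmodels/mimicselected.ply")

# Poisson reconstruction needs consistently oriented normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30)
)
pcd.orient_normals_consistent_tangent_plane(k=30)

# depth controls the octree resolution: higher values give more detail
# but also reproduce more of the point cloud's noise.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9
)
o3d.io.write_triangle_mesh("mimic_mesh.ply", mesh)
```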

Using the same 10 images in Meshroom resulted in the following:

Mimic 10 image Meshroom

Here, a lot of the model is missing because Meshroom couldn't estimate the camera parameters correctly with only 10 images. This demonstrates how the AI model requires fewer images than the SfM software.

Scene Reconstruction (see zimmer.ply in the 3dmodels folder)

I used all 52 images of the scene with Spann3r. The results were:

Room Spann3r

As in Meshroom, a lot of the room is still missing, but Spann3r was able to recreate more of the left side, which Meshroom had trouble with.

Comparison with other methods

Object comparison

| Original | Meshroom | Colmap | Polycam | Kiri Engine (Texture missing) |
| --- | --- | --- | --- | --- |
| Original Mimic | Meshroom Mimic | Colmap Mimic | Polycam Mimic | Kiri Mimic |

(Credits for Reconstruction: Colmap, Polycam and Kiri done by Nico)

Notice how there is also a difference in quality among the different SfM tools. While Meshroom captured the mesh of the model very accurately, the textures are a little too bright. Colmap got the colors in the texture right, but it looks a little washed out and the mesh has a few holes. Polycam produced the best result, with a nearly perfect mesh and a very accurate texture. Kiri Engine's mesh was also good, but the texture didn't load when I opened the model after the workshop, so I can't say anything about it. All of the apps used all 30 images.

| Original | Spann3r | VGGT | Mast3r |
| --- | --- | --- | --- |
| Original Mimic | Spann3r Mimic | VGGT Mimic | Mast3r Mimic |

(Credits for Reconstruction: VGGT done by Marcel, Mast3r done by Ethan)

The results of the AI-model reconstructions of the object also varied greatly. While Spann3r and VGGT both got the general features of the object right, Mast3r produced a lot of artifacts and faces that weren't there in the original, and it could not reproduce the whole object correctly. Spann3r had the best result of the tested models, as VGGT didn't get all of the details on the surface of the model right. The Spann3r result shown here is the one using 10 images; Mast3r also used only about 10 images, while VGGT used all 30.

However, Spann3r also had a lot of trouble reconstructing some of the objects from the other workshop participants, as can be seen in their reports. Sometimes Spann3r had trouble aligning the views, resulting in multiple copies of the object being visible, or it couldn't detect some of the surfaces, resulting in huge holes. While running Spann3r, the only way I could tweak the result was the confidence threshold. A higher value produced holes and gaps in the result, and with a lower value the model detected things that weren't there in the original object. Most of the time, the best result came with the default threshold of 0.001.
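Mechanically, the confidence threshold is just a per-point mask over the predicted confidences. A minimal numpy sketch of the idea (illustrative only, not Spann3r's actual code):

```python
import numpy as np

def filter_by_confidence(points, colors, conf, thresh=0.001):
    """Keep only points whose predicted confidence exceeds the threshold.

    points: (N, 3) xyz coordinates, colors: (N, 3) RGB, conf: (N,) scores.
    Raising thresh drops uncertain points (creating holes); lowering it
    keeps spurious geometry that wasn't in the original object.
    """
    mask = conf > thresh
    return points[mask], colors[mask]

# Tiny usage example with random data.
pts = np.random.rand(1000, 3)
cols = np.random.rand(1000, 3)
conf = np.random.rand(1000)
kept_pts, kept_cols = filter_by_confidence(pts, cols, conf, thresh=0.5)
print(f"kept {len(kept_pts)} of {len(pts)} points")
```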

Scene comparison

| Meshroom | Spann3r | VGGT |
| --- | --- | --- |
| Meshroom Room | Spann3r Room | VGGT Room |

(Credits for Reconstruction: VGGT done by Marcel)

Reconstructing the scene was a huge challenge for Meshroom. The AI models delivered better results, with VGGT producing the best of the two tested. While it had a little trouble with the bad lighting conditions in the room, it reconstructed the general structure of the room fairly accurately. VGGT used only the 11 ultra-wide images, as running it with all images took a long time. The other methods used all 52 images.

Reflection

Do we still need COLMAP (feature detection) or are the AI models good enough?

In conclusion, the results of the different methods vary greatly. The SfM methods of reconstructing a 3D model from images of an object were a lot more consistent and yielded better results; Polycam in particular produced a great reconstruction of the object. The AI models had trouble recognizing some of the features and aligning the model correctly. The only areas in which the AI models had an advantage over SfM were processing time and the number of images required. While Meshroom took only about 15 minutes on my hardware, depending on the number of images, Colmap for example required a lot more time to reconstruct; Spann3r had its results ready in less than 3 minutes. Even though other AI models such as VGGT also took longer to load, the average processing time was far shorter for the AI models than for the SfM software. The AI models also required far fewer images than SfM, as seen in the example where I used only 10 images of my object. However, with just 20 more images, Meshroom produced significantly better results than the AI models. Additionally, the AI models need CUDA to run, meaning they require an NVIDIA GPU, while Meshroom also ran on hardware where CUDA wasn't installed.
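The CUDA requirement can be checked before launching any of the models; a minimal PyTorch sketch (Spann3r and the other tested models are PyTorch-based):

```python
import torch

# Without a CUDA-capable NVIDIA GPU, the models either fail to run
# or fall back to a very slow CPU path.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"CUDA available: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("No CUDA GPU found; reconstruction would be impractically slow.")
```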

In the area of scene reconstruction, as seen in my GHB-room example, the AI models had a slight advantage over the SfM software. As I only used Meshroom to reconstruct my scene, I can only speak for this method, but the two AI models produced better results, with fewer holes and a better reconstruction of the 3D space in the room. However, even though the results were better than those from the SfM software, they still weren't perfect.

As described in my problems with Spann3r, I could not test the function that accepts a video as input. So I could not test the real-time capabilities of Spann3r, where it uses the frames of a video to reconstruct the scene.

In conclusion, despite the AI models having an advantage in certain areas, such as scene reconstruction, processing time, the number of required images and real-time capabilities, the results for object reconstruction are still significantly better with SfM software. Especially professional software like Polycam yields much better results than any AI model we tested. Maybe in the future, with advancements in the field, the AI models will surpass these results, but for now I think we still need COLMAP and other SfM software to produce reliable results.