Optimization of a 3D Scene Reconstruction Application

Scene Reconstruction Pipeline

The Internet provides access to a vast, ever-growing collection of photographs of cities and landmarks from all over the world. A simple search for a city name on Flickr returns millions of photographs capturing different parts of the city from various viewpoints. Creating accurate 3D models of cities is a problem of great interest with broad applications. The 3D Scene Reconstruction application [1] that we use is software aimed at reconstructing a 3D model from an unstructured collection of photographs, which it achieves by applying new computer vision techniques.

The application has a pipeline architecture, shown in Figure 1, which consists of four parts: SiftGPU [2], KeyMatchFull [3], Bundler [4] and PMVS [5]. Each of these parts is an independent application that communicates with the rest of the pipeline through input and output files. The final output is a file in the PLY format that can be visualized as a 3D model.

The work presented in this project is a solution for optimizing an existing 3D Scene Reconstruction application [1]. The application aims to match and reconstruct scenes from extremely large unstructured collections of photographs. The reconstruction process, which runs complex algorithms ranging from image matching to large-scale optimization, can take days. In this paper we propose a way to speed up the application by porting it to GPU computing. Our experimental results demonstrate that it is now possible to reconstruct 3D objects 4 to 5.6 times faster.


The solution that we propose in this project for optimizing the 3D Scene Reconstruction application involves optimizing each of the three main steps of the pipeline: SiftGPU, KeyMatchFull and Bundler:
  • SiftGPU is the first step of the application pipeline. SIFT, which stands for Scale-Invariant Feature Transform, is a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. SiftGPU is an implementation of SIFT for GPU computing.
  • KeyMatchFull is an intermediate step between the key extraction in SiftGPU and the Bundler. It associates images that have features in common: it compares two images by calculating the nearest neighbors for all the keys of the images and produces a list of matches.
  • Bundler is an implementation of a Structure-from-Motion (SfM) algorithm that takes the set of images, a list of image features and a list of image matches as input, and produces a 3D reconstruction of the camera poses and a (sparse) scene reconstruction as output.

These three applications are independent and communicate through input and output files. Therefore, in each of them we apply a different approach to reduce the execution time, described in more detail in Section 3. To measure the performance of the application we used three different data sets:
  • Xu31: consists of 31 images of size 3456x5184 pixels.
  • Car31: consists of 31 images of size 2048x1536 pixels.
  • Car61: consists of 61 images of size 2048x1536 pixels.
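Since the pipeline stages above are separate programs that exchange data only through files, the whole pipeline can be driven by a small script. The sketch below simulates that structure with placeholder Python functions; the stage names and file contents are illustrative stand-ins, not the real tools' interfaces:

```python
import os
import tempfile

def run_stage(name, in_path, out_path):
    # A real stage would be an external binary (SiftGPU, KeyMatchFull,
    # Bundler, PMVS) invoked as a subprocess; here we just append the
    # stage name to show the file-based data flow.
    with open(in_path) as f:
        history = f.read()
    with open(out_path, "w") as f:
        f.write(history + name + "\n")

def run_pipeline(workdir, images_file):
    stages = ["sift", "keymatch", "bundler", "pmvs"]
    current = images_file
    for stage in stages:
        nxt = os.path.join(workdir, stage + ".out")
        run_stage(stage, current, nxt)
        current = nxt
    return current  # path of the final (PLY-like) output file

workdir = tempfile.mkdtemp()
images_file = os.path.join(workdir, "images.txt")
with open(images_file, "w") as f:
    f.write("")  # stand-in for the list of input images

final = run_pipeline(workdir, images_file)
print(open(final).read().split())  # ['sift', 'keymatch', 'bundler', 'pmvs']
```

Because each stage only depends on its predecessor's output file, any one stage can be replaced or optimized independently, which is exactly what the following sections do.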

Analysis and Results

Figure 2 shows the improvement achieved in the total execution time of the pipeline.



SiftGPU

SiftGPU processes pixels in parallel to build Gaussian pyramids and detect Difference-of-Gaussian key points. The result of the process contains the orientations and descriptors of the key points. The original implementation of SiftGPU extracts the key points of every image on a single GPU. To use the maximum number of GPUs on a node (four), we modified the SiftGPU implementation to process four images in parallel, using OpenMP for thread management.
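In essence, the modification runs one worker per GPU and distributes the images among them. Our implementation uses OpenMP in C++; the Python sketch below only illustrates the scheduling idea, with the device assignment and the extraction function as stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

NUM_GPUS = 4  # maximum number of GPUs on one node

def extract_keypoints(image_id, gpu_id):
    # Stand-in for SiftGPU key point extraction; a real worker would
    # first bind its CUDA context to `gpu_id` and then run SIFT there.
    return (image_id, gpu_id)

def process_dataset(num_images):
    # One worker per GPU; image i goes to GPU i % NUM_GPUS, so four
    # images are processed concurrently instead of one at a time.
    with ThreadPoolExecutor(max_workers=NUM_GPUS) as pool:
        futures = [pool.submit(extract_keypoints, i, i % NUM_GPUS)
                   for i in range(num_images)]
        return [f.result() for f in futures]

results = process_dataset(31)  # e.g. the Xu31 or Car31 data set
print(len(results))            # 31
```

A side benefit of this layout, noted in the results below, is that per-process start-up costs (library initialization, memory release) overlap across workers instead of being paid serially.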


The results in Figure 3 show significant improvements between the two implementations. The key point extraction process was sped up by a factor of 4.53 to 5.1, as shown in Figure 4. This demonstrates the benefit of using OpenMP to split the SiftGPU work across multiple processes. Figure 3 also shows different execution times of SiftGPU for the same number of images (31) in the Xu31 and Car31 data sets, because the dimensions and the number of key points of the images in those data sets differ. The achieved result even exceeded the expected factor of four, because running multiple processes also amortizes the time spent on SiftGPU initialization and memory release.


KeyMatchFull

The initial implementation uses the ANN library [3] to match the key points between two images. Although ANN is highly optimized to minimize the number of calculations needed to find the nearest neighbors, this task can be ported to the GPU. We came across a fast GPU-based k-nearest neighbor (KNN) search [6], a CUDA library that performs the same task. Although its search algorithm is much simpler and therefore performs many more calculations, it benefits greatly from parallelization. KNN was very promising, since its reported performance is up to 64 times faster than ANN. We replaced the ANN library with the KNN library in the KeyMatchFull implementation; the KNN version that we used is heavily customized to meet our needs. The results we obtain with KNN are slightly worse than the initial ones, but this does not significantly affect the whole process and is a fair trade-off for the speed-up we obtain.
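The brute-force search that the GPU library parallelizes is conceptually simple: for every key of one image, compute the distance to every key of the other, keep the two nearest, and accept the match only if the nearest is clearly closer than the second nearest (Lowe's ratio test). A minimal CPU sketch, using tiny toy descriptors instead of real 128-dimensional SIFT vectors (the 0.6 threshold is a common choice in SfM pipelines, not taken from our code):

```python
def match_keys(desc_a, desc_b, ratio=0.6):
    """Brute-force 2-nearest-neighbor matching with a ratio test.

    desc_a, desc_b: lists of descriptor vectors (lists of floats).
    Returns (i, j) pairs: key i of image A matches key j of image B.
    """
    matches = []
    for i, a in enumerate(desc_a):
        # Squared Euclidean distance from key i of A to every key of B.
        dists = [(sum((x - y) ** 2 for x, y in zip(a, b)), j)
                 for j, b in enumerate(desc_b)]
        dists.sort()
        (d1, j1), (d2, _) = dists[0], dists[1]
        # Keep only unambiguous matches: nearest neighbor must be much
        # closer than the second nearest (squared ratio threshold).
        if d1 < (ratio ** 2) * d2:
            matches.append((i, j1))
    return matches

# Toy 2-D "descriptors": key 0 of A clearly matches key 1 of B, while
# key 1 of A has two equally близ... equally close candidates and is
# rejected by the ratio test.
A = [[0.0, 0.0], [5.0, 5.0]]
B = [[9.0, 9.0], [0.1, 0.0], [5.0, 4.0], [5.0, 6.0]]
print(match_keys(A, B))  # [(0, 1)]
```

Every distance in the inner list comprehension is independent of the others, which is why this otherwise wasteful O(n*m) search maps so well onto a GPU.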


Figure 4 shows the timing results obtained by running the three versions of KeyMatchFull (ANN-CPU, KNN-GPU, KNN-4GPU) with our testing data sets. The measurements do not include the loading phase but do include the saving time. The GPU version is about 2.5 times faster than the initial one. The large gap between the speed-up that KNN promised and our final result arises because the testing data used in KNN's research paper [6] differ significantly from our case.


Bundler

From the profiler results, we noticed that the step that takes the most time during execution on the CPU is the Levenberg-Marquardt (LM) algorithm. It solves the system of equations for bundle adjustment, built from the parameters given as input to the Structure-from-Motion (SfM) process. The function that solves this system is in the sba-1.5 library [7], and it takes 22.1% of the execution time.
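A 22.1% share also tells us, via Amdahl's law, how much accelerating the solver alone can buy: even an infinitely fast LM step would only make the whole Bundler about 1.28 times faster. A quick sanity check (these numbers are derived from the profile above, not separately measured):

```python
def amdahl_speedup(parallel_fraction, stage_speedup):
    # Overall speedup when only `parallel_fraction` of the run time
    # is accelerated by a factor of `stage_speedup`.
    return 1.0 / ((1.0 - parallel_fraction)
                  + parallel_fraction / stage_speedup)

p = 0.221  # fraction of Bundler time spent in the LM solver

# Upper bound: LM made infinitely fast.
print(round(1.0 / (1.0 - p), 3))          # 1.284
# A realistic 10x speedup of the solver alone gives:
print(round(amdahl_speedup(p, 10.0), 3))  # 1.248
```

This is why the approach described next parallelizes the surrounding SfM computation (Hessian, Schur complement, Jacobian) rather than only the LM inner loop.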

Based on this result we decided that the LM procedure should be parallelized. To do so, we used an existing implementation of the Bundler that provides Multi-Core CPU and Single-GPU implementations of SfM [9]. That implementation focuses on reducing the memory use and computation time of the Hessian, the Schur complement and the Jacobian, which LM uses to reduce the size of the linear system [10]. The code is contained in the libpba library, which we integrated into the application in place of sba-1.5 [7].
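The size reduction that the Schur complement provides comes from the block structure of the bundle-adjustment normal equations. Splitting the unknowns into camera parameters and point parameters, each LM iteration solves (standard notation, not quoted from [9]):

```latex
% Bundle-adjustment normal equations, camera block (c) and point
% block (p); C is block-diagonal, so C^{-1} is cheap to compute.
\begin{bmatrix} B & E \\ E^{\top} & C \end{bmatrix}
\begin{bmatrix} \Delta x_c \\ \Delta x_p \end{bmatrix}
=
\begin{bmatrix} v \\ w \end{bmatrix}

% Eliminating the (many) point parameters leaves a much smaller
% system in the camera parameters only:
S = B - E\,C^{-1} E^{\top}, \qquad
S\,\Delta x_c = v - E\,C^{-1} w, \qquad
\Delta x_p = C^{-1}\bigl(w - E^{\top} \Delta x_c\bigr)
```

Since scenes typically have far more points than cameras, solving the reduced camera system S is much cheaper than solving the full system, and building S is itself highly parallelizable.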

After the integration, we ran experiments to compare the performance of the Single-CPU, Multi-CPU and Single-GPU approaches, using the three data sets described in the introduction of this paper. In every experiment two measurements were taken: the execution time of the SfM routine (which was parallelized) and the total execution time of the Bundler. The results are presented in Figure 5.


As can be observed in Figure 5, after the parallelization of the Structure-from-Motion routine, the reduction in its execution time also reduced the total execution time of the Bundler. The total time decreased according to the speed-up factors of the Structure-from-Motion routine, which can be observed in Figure 6.


Conclusions

  • The 3D scene reconstruction application that we optimized was initially designed to run on clusters of multiple CPUs. For very large input sets, even with large numbers of CPUs, execution can take days. The purpose of our research was to investigate whether we could speed up the whole application by using a number of GPUs and thus decrease the number of CPUs needed. Although our analysis used a single CPU, the results we obtained can be generalized and support our initial assumption. Moreover, there is still room for further optimization and better utilization of the GPU's capabilities.

  • We encountered many challenges while trying to parallelize the code base of the application. Possibilities for parallelism were sometimes limited by the nature of the initial design or the structure of the algorithms, so redesign and alternative algorithms were needed. Moreover, the code itself was complex and poorly documented, making the adaptations even more intricate. In addition, the way the input data sets affect execution time is not always apparent, which was also a challenge when evaluating our results. However, we believe that the numbers presented in this paper are very representative.

  • Our analysis shows that GPU computing can significantly reduce the execution time of this application. We have already achieved an overall speed-up of about 5 times and with further improvements this number can be increased.

References

  • [1] S. Agarwal, Y. Furukawa, N. Snavely, I. Simon. Building Rome in a Day.
  • [2] C. Wu. A GPU Implementation of Scale Invariant Feature Transform.
  • [3] D. Mount, S. Arya. A Library for Approximate Nearest Neighbor Searching.
  • [4] N. Snavely. Structure from Motion for Unordered Image Collections.
  • [5] Y. Furukawa, J. Ponce. Patch-based Multi-view Stereo Software.
  • [6] V. Garcia, E. Debreuve, F. Nielsen, M. Barlaud. K-Nearest Neighbor Search: Fast GPU-based Implementations and Application to High-dimensional Feature Matching.
  • [7] M. A. Lourakis, A. Argyros. SBA: A Software Package for Generic Sparse Bundle Adjustment.
  • [8] C. Wu. SiftGPU: A GPU Implementation of Scale Invariant Feature Transform (SIFT).
  • [9] C. Wu, S. Agarwal, B. Curless, S. M. Seitz. Multicore Bundle Adjustment.
  • [10] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, R. Szeliski. Building Rome in a Day.
  • Alex Loizidis
  • Diana Ahogado
  • Fanis Grollios
  • Ivana Kostadinovska
  • Huy Nguyen