LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images

We present LucidFusion, a flexible end-to-end feed-forward framework that generates high-resolution 3D Gaussians from sparse, unposed multi-view images, for an arbitrary number of input views.

Abstract

Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, current reconstruction methods often rely on explicit camera pose estimation or fixed viewpoints, restricting their flexibility and practical applicability. We reformulate 3D reconstruction as image-to-image translation and introduce the Relative Coordinate Map (RCM), which aligns multiple unposed images to a “main” view without pose estimation. While RCM simplifies the process, its lack of global 3D supervision can yield noisy outputs. To address this, we propose Relative Coordinate Gaussians (RCG) as an extension to RCM, which treats each pixel’s coordinates as a Gaussian center and employs differentiable rasterization for consistent geometry and pose recovery. Our LucidFusion framework handles an arbitrary number of unposed inputs, producing robust 3D reconstructions within seconds and paving the way for more flexible, pose-free 3D pipelines.

Method Overview

Pipeline overview of LucidFusion. Our framework takes a set of sparse, unposed multi-view images as input. These images are concatenated along the width dimension and passed through the Stable Diffusion model in a single feed-forward pass. The model predicts the RCM representation for the input images. In addition, the feature map from the final layer of the VAE is fed into a decoder network to predict Gaussian parameters. The RCM representation and the predicted Gaussian parameters are then fused and passed to the Gaussian renderer, which generates novel views for supervision.
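
The data flow described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the real pipeline runs a Stable Diffusion model on tensors, whereas this sketch uses nested Python lists and only shows the two bookkeeping steps, namely concatenating views along the width dimension and pairing each pixel's predicted 3D coordinate with that pixel's Gaussian parameters. All names (`concat_along_width`, `split_along_width`, `Gaussian`, `fuse`) and the contents of the parameter dictionary are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]   # one 3-channel value (RGB or XYZ)
Image = List[List[Pixel]]            # indexed as image[row][col]


def concat_along_width(views: Sequence[Image]) -> Image:
    """Place equally sized views side by side: V maps of H x W become one H x (V*W) map."""
    height = len(views[0])
    if any(len(view) != height for view in views):
        raise ValueError("all views must share the same height")
    return [[pixel for view in views for pixel in view[row]] for row in range(height)]


def split_along_width(wide: Image, num_views: int) -> List[Image]:
    """Inverse of concat_along_width: recover the per-view maps."""
    width = len(wide[0]) // num_views
    return [[row[v * width:(v + 1) * width] for row in wide] for v in range(num_views)]


@dataclass
class Gaussian:
    center: Pixel   # 3D position read from the coordinate map
    params: dict    # assumed per-pixel parameters, e.g. opacity or scale


def fuse(coord_map: Image, param_map: List[List[dict]]) -> List[Gaussian]:
    """Pair every pixel's predicted 3D coordinate with that pixel's Gaussian parameters."""
    return [
        Gaussian(center=xyz, params=params)
        for coord_row, param_row in zip(coord_map, param_map)
        for xyz, params in zip(coord_row, param_row)
    ]
```

With two views of size 1 x 2, `concat_along_width` yields a single 1 x 4 map, `split_along_width(wide, 2)` returns the two original views, and `fuse` produces one Gaussian per pixel, four in total. The resulting list stands in for what would be handed to a differentiable Gaussian renderer.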

Cross-dataset Results

Examples of cross-dataset content creation with LucidFusion, running at around 13 FPS on an A800 GPU.

BibTeX

@misc{he2024lucidfusion,
      title={LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images}, 
      author={Hao He and Yixun Liang and Luozhou Wang and Yuanhao Cai and Xinli Xu and Hao-Xiang Guo and Xiang Wen and Yingcong Chen},
      year={2024},
      eprint={2410.15636},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.15636}, 
}
