We use a text-to-image (T2I) model to generate unposed multiview images, then use LucidFusion to reconstruct the 3D object; a code sketch of this workflow follows the example prompts below.
an Iron Man from the movie
a Hulk from the movie
a Minion from the movie
a Batman from the movie
a beautiful Russian-style matryoshka doll
a little witch in tears
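Below is a minimal sketch of the text-to-3D workflow, assuming a standard diffusers T2I pipeline. How the T2I stage yields multiple views of one subject is not specified here, so repeated sampling is only a stand-in, and the `lucidfusion.reconstruct` call is a hypothetical placeholder, not the released API.

```python
# Sketch: generate unposed views with a T2I model, then reconstruct with LucidFusion.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "an Iron Man from the movie"
# Stand-in for the multiview T2I stage: sample several unposed images
# of the same subject (each call uses a different random seed).
views = [pipe(prompt).images[0] for _ in range(4)]

# Hypothetical reconstruction call: LucidFusion consumes the unposed views
# and returns 3D Gaussians (see the pipeline sketch further below).
# gaussians = lucidfusion.reconstruct(views)
```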
We present LucidFusion, a flexible end-to-end feed-forward framework that generates high-resolution 3D Gaussians from unposed, sparse, and arbitrary numbers of multiview images.
Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, these methods often struggle with controllability, as they lack information from multiple views, leading to incomplete or inconsistent 3D reconstructions. To address this limitation, we introduce LucidFusion, a flexible end-to-end feed-forward framework that leverages the Relative Coordinate Map (RCM). Unlike traditional methods that link images to the 3D world through pose, LucidFusion uses RCM to align geometric features coherently across different views, making it highly adaptable for 3D generation from arbitrary, unposed images. Furthermore, LucidFusion integrates seamlessly with the original single-image-to-3D pipeline, producing detailed 3D Gaussians at a resolution of 512×512, making it well-suited for a wide range of applications.
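To make the RCM idea concrete, here is a minimal sketch, assuming (as one would have at training time) known per-view depth and each view's pose relative to a chosen reference view; the RCM then stores, for every pixel, its 3D coordinates expressed in that shared reference frame. Names and shapes are illustrative, not the paper's code.

```python
import torch

def relative_coordinate_map(depth, K, R_rel, t_rel):
    """depth: (H, W) metric depth for one view; K: (3, 3) intrinsics;
    R_rel, t_rel: this view's pose expressed in the reference view's frame."""
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)       # (H, W, 3)
    # Back-project each pixel to a 3D point in this view's camera frame.
    cam_pts = (pix @ torch.linalg.inv(K).T) * depth.unsqueeze(-1)
    # Re-express the points in the shared reference frame: the RCM holds one
    # 3D coordinate per pixel, so pixels from different views that observe
    # the same surface point map to the same value.
    return cam_pts @ R_rel.T + t_rel                            # (H, W, 3)
```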
Pipeline Overview of LucidFusion. Our framework takes a set of sparse, unposed multi-view images as input. These images are concatenated along the width dimension and passed through the Stable Diffusion model in a single feed-forward pass. The model predicts the RCM representation for the input images. In addition, the feature map from the final layer of the VAE is fed into a decoder network to predict Gaussian parameters. The RCM representation and the predicted Gaussian parameters are then fused and passed to the Gaussian renderer to generate novel views for supervision.
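The caption above translates into roughly the following forward pass. Here `sd_backbone`, `gaussian_decoder`, and `renderer` are stand-in names for the modules described; the sketch mirrors only the data flow, not the released implementation.

```python
import torch

def lucidfusion_forward(views, sd_backbone, gaussian_decoder, renderer):
    # views: list of N unposed images, each of shape (3, H, W).
    x = torch.cat(views, dim=-1)             # concatenate along width: (3, H, N*W)
    # The Stable Diffusion backbone predicts the RCM for the concatenated
    # input; `feat` stands for the final-layer VAE feature map.
    rcm, feat = sd_backbone(x.unsqueeze(0))
    # Decode per-pixel Gaussian attributes (opacity, scale, rotation, color).
    gauss_params = gaussian_decoder(feat)
    # Fuse per-pixel 3D positions (RCM) with the decoded Gaussian attributes
    # and render novel views for supervision.
    novel_views = renderer(rcm, gauss_params)
    return novel_views
```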
Examples of cross-dataset content creation with our framework, LucidFusion, running at ~13 FPS on an A800 GPU.
@misc{he2024lucidfusion,
  title={LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images},
  author={Hao He and Yixun Liang and Luozhou Wang and Yuanhao Cai and Xinli Xu and Hao-Xiang Guo and Xiang Wen and Yingcong Chen},
  year={2024},
  eprint={2410.15636},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.15636},
}