Gaga: Group Any Gaussians via 3D-aware Memory Bank

Abstract

We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot class-agnostic segmentation models. In contrast to prior 3D scene segmentation approaches that rely on video object tracking or contrastive learning, Gaga uses spatial information to associate object masks across diverse camera poses through a novel 3D-aware memory bank. Because it does not assume continuous view changes across training images, Gaga is robust to variations in camera pose, which is particularly beneficial for sparsely sampled images, and it maintains precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and performs robustly with different open-world zero-shot class-agnostic segmentation models, which significantly enhances its versatility. Extensive qualitative and quantitative evaluations show that Gaga performs favorably against state-of-the-art methods, highlighting its potential for real-world applications such as 3D scene understanding and manipulation.

Method

Overview of Gaga. Gaga reconstructs 3D scenes using Gaussian Splatting and adopts any open-world model to generate 2D segmentation masks. To resolve the inconsistency of 2D mask labels across views, we design a mask association process in which a 3D-aware memory bank assigns each 2D mask a group ID that is consistent across views, based on the 3D Gaussians projected onto that mask. Specifically, we find the Gaussians that project onto a 2D mask and assign the mask the group ID in the memory bank that shares the largest number of overlapping Gaussians with it. After the 3D-aware mask association process, we use the masks with multi-view consistent group IDs as pseudo labels to train an identity encoding on each 3D Gaussian for segmentation rendering.
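The association step can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions. Each 2D mask is represented only by the set of Gaussian indices projected into it, with projection and visibility handling omitted. The memory bank is a plain dict from group ID to Gaussian indices. The names `MemoryBank`, `assign`, `associate` and the `min_overlap` threshold of 0.5 are all hypothetical. Overlap is computed as the fraction of a mask's Gaussians already stored in a group. For a fixed mask, this picks the same group as the raw maximum count; the fraction matters only for the threshold. Opening a new group when no overlap is large enough is also an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryBank:
    """Hypothetical 3D-aware memory bank: group ID -> set of Gaussian indices."""
    groups: dict[int, set[int]] = field(default_factory=dict)
    min_overlap: float = 0.5  # hypothetical threshold, not taken from the paper

    def assign(self, mask_gaussians: set[int]) -> int:
        """Return a group ID for one 2D mask, given the Gaussians projected onto it."""
        best_id, best_overlap = None, 0.0
        for gid, members in self.groups.items():
            # Fraction of this mask's Gaussians already stored under group `gid`.
            overlap = len(mask_gaussians & members) / max(len(mask_gaussians), 1)
            if overlap > best_overlap:
                best_id, best_overlap = gid, overlap
        if best_id is None or best_overlap < self.min_overlap:
            # Assumption: no sufficiently overlapping group, so open a new one.
            best_id = len(self.groups)
            self.groups[best_id] = set()
        # Grow the chosen group with the Gaussians observed in this view.
        self.groups[best_id] |= mask_gaussians
        return best_id


def associate(views: list[list[set[int]]], bank: MemoryBank) -> list[list[int]]:
    """Relabel every mask in every view with a multi-view consistent group ID."""
    return [[bank.assign(mask) for mask in masks] for masks in views]


# Two views of the same two objects; the 2D model lists the masks in a different order.
views = [
    [{0, 1, 2, 3}, {10, 11, 12}],  # view 1
    [{11, 12, 13}, {2, 3, 4}],     # view 2
]
labels = associate(views, MemoryBank())
print(labels)  # [[0, 1], [1, 0]]
```

The per-view mask indices are inconsistent (object A is mask 0 in view 1 but mask 1 in view 2). Matching through shared 3D Gaussians still gives each object a single group ID across views.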

Gallery

🗺️ Open-world 3D Segmentation

Gaga works well with diverse open-world 2D segmentation models.

🤝 EntitySeg

Comparison panels: RGB | Gaussian Grouping | Gaga

🤝 SAM

Comparison panels: RGB | Gaussian Grouping | Gaga

❓ Why 3D-aware

Gaga integrates spatial information to precisely locate objects and associate 2D masks, leading to multi-view consistent segmentation. In contrast, previous methods without 3D awareness struggle in the following scenarios:

👯 Similar objects present in the scene

Comparison panels: RGB | Gaussian Grouping | Gaga

📷 Camera poses change significantly

Comparison panels: RGB | Gaussian Grouping | Gaga

😭 Only a few images are available (e.g., only 9 training images)

Comparison panels: RGB | Gaussian Grouping | Gaga

🪄 Scene Editing

Gaga achieves high-quality, multi-view consistent 3D segmentation, which enables downstream tasks such as scene editing.
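The page does not describe the editing procedure itself. The following is only a hypothetical sketch of how per-Gaussian group labels could drive edits like the ones shown (recoloring, removing, or moving an object). It assumes each Gaussian already carries an integer group ID, for example the argmax of a classifier over its identity encoding. It also simplifies each Gaussian to a center and a flat RGB color, whereas real Gaussian Splatting stores spherical-harmonic coefficients, opacity, scale, and rotation, all of which would need the same filtering. The function `edit_scene` and its parameters are illustrative names, not part of any released code.

```python
import numpy as np


def edit_scene(means, colors, group_ids, target, *, recolor=None, offset=None, remove=False):
    """Hypothetical edit of every Gaussian whose group ID equals `target`.

    means: (N, 3) Gaussian centers; colors: (N, 3) simplified RGB in [0, 1];
    group_ids: (N,) integer group label per Gaussian.
    Returns edited copies; the inputs are left unchanged.
    """
    means, colors = means.copy(), colors.copy()
    sel = group_ids == target                 # boolean mask over Gaussians
    if recolor is not None:
        colors[sel] = recolor                 # e.g. turn a cushion maroon
    if offset is not None:
        means[sel] += np.asarray(offset)      # translate the whole object
    keep = ~sel if remove else np.ones(len(means), dtype=bool)
    return means[keep], colors[keep], group_ids[keep]


means = np.zeros((4, 3))
colors = np.ones((4, 3))
gids = np.array([0, 1, 1, 2])

# Recolor group 1; (0.5, 0.0, 0.0) approximates maroon (#800000).
m, c, g = edit_scene(means, colors, gids, 1, recolor=(0.5, 0.0, 0.0))
# Remove group 2 entirely.
m2, c2, g2 = edit_scene(means, colors, gids, 2, remove=True)
print(len(m2))  # 3
# Move group 0 one unit along z.
m3, c3, g3 = edit_scene(means, colors, gids, 0, offset=(0.0, 0.0, 1.0))
```

Because selection is a single boolean mask per group, segmentation quality directly bounds edit quality. Gaussians mislabeled in 3D would be left behind or moved with the wrong object.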

✨ Change the color of cushion on to maroon; remove

Comparison panels: Before Editing | Gaussian Grouping | Gaga

✨ Move closer to the window 👉

Comparison panels: Before Editing | Gaussian Grouping | Gaga

BibTeX

@misc{lyu2024gaga,
      title={Gaga: Group Any Gaussians via 3D-aware Memory Bank}, 
      author={Weijie Lyu and Xueting Li and Abhijit Kundu and Yi-Hsuan Tsai and Ming-Hsuan Yang},
      year={2024},
      eprint={2404.07977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Gaga groups any Gaussians in an open-world 3D scene and renders multi-view consistent segmentation. We show 3D segmentation rendering results using two different 2D segmentation models: Segment Anything (SAM) and EntitySeg.