SAMPart3D: Segment Any Part in 3D Objects
Abstract: 3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing. Recent methods harness powerful Vision-Language Models (VLMs) for 2D-to-3D knowledge distillation, achieving zero-shot 3D part segmentation. However, these methods are limited by their reliance on text prompts, which restricts scalability to large-scale unlabeled datasets and flexibility in handling part ambiguities. In this work, we introduce SAMPart3D, a scalable zero-shot 3D part segmentation framework that segments any 3D object into semantic parts at multiple granularities, without requiring predefined part label sets as text prompts. For scalability, we use text-agnostic vision foundation models to distill a 3D feature extraction backbone, allowing scaling to large unlabeled 3D datasets to learn rich 3D priors. For flexibility, we distill scale-conditioned part-aware 3D features for 3D part segmentation at multiple granularities. Once the segmented parts are obtained from the scale-conditioned part-aware 3D features, we use VLMs to assign semantic labels to each part based on multi-view renderings. Compared to previous methods, our SAMPart3D can scale to the recent large-scale 3D object dataset Objaverse and handle complex, non-ordinary objects. Additionally, we contribute a new 3D part segmentation benchmark to address the lack of diversity and complexity of objects and parts in existing benchmarks. Experiments show that our SAMPart3D significantly outperforms existing zero-shot 3D part segmentation methods, and can facilitate various applications such as part-level editing and interactive segmentation.
Explain it Like I'm 14
What this paper is about (big picture)
This paper introduces SAMPart3D, a computer method that can automatically split any 3D object (like a chair, a robot, or a cartoon character) into its meaningful parts (legs, seat, arms, head, etc.). It can do this at different levels of detail—coarse (few big parts) or fine (many small parts)—and it doesn’t need a pre-made list of part names to work. Later, if you want, it can also guess names for each part.
What questions the researchers wanted to answer
The authors focused on a few simple questions:
- How can we cut 3D objects into parts without any hand-made labels?
- How can we make it work on many kinds of objects, even unusual ones?
- How can we control how detailed the split is (few big parts vs. many small parts)?
- Can we name each part after we find it, even if we didn’t use text labels to find the parts?
How the method works (explained simply)
Think of a 3D object like a toy you can photograph from many angles. SAMPart3D learns to find parts using ideas from 2D image tools and brings them into 3D. It happens in three stages:
- Learn general 3D “sense” from tons of objects
- Analogy: A student (the 3D model) learns from a teacher (a powerful 2D image model) by looking at many photos of 3D objects from different angles.
- The teacher here is a 2D vision model called DINOv2 (boosted with a tool called FeatUp to sharpen details). The student is a 3D network (a modified Point Transformer) that learns to produce 3D features that match what the teacher “sees” in 2D.
- The training data is huge (Objaverse: 800,000+ 3D objects), but none of it needs part labels. This helps the model learn a strong general sense of how 3D shapes are structured.
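The distillation idea in this first stage can be sketched in a few lines. This is a toy illustration with random arrays, not the paper's implementation: the shapes, the pixel projection, and the plain MSE loss are all simplifying assumptions standing in for the real 3D backbone, the DINOv2+FeatUp teacher features, and the rendering pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical sizes): N 3D points, D-dim features.
N, D = 1024, 64
student_feats = rng.normal(size=(N, D))      # per-point output of the 3D "student"
teacher_img = rng.normal(size=(32, 32, D))   # 2D teacher feature map for one rendered view

# Assume each 3D point is known to project onto one pixel of this view.
px = rng.integers(0, 32, size=N)
py = rng.integers(0, 32, size=N)
target = teacher_img[py, px]                 # the teacher feature each point should match

def distill_loss(pred, tgt):
    """Mean squared error between student 3D features and the projected
    2D teacher features -- a label-free training signal, so the dataset
    never needs part annotations."""
    return float(np.mean((pred - tgt) ** 2))

loss = distill_loss(student_feats, target)
```

Because the supervision comes from a frozen 2D model looking at renderings, any unlabeled 3D object can contribute training signal, which is what lets the method scale to Objaverse.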
- Learn to group points into parts at different detail levels
- The system uses 2D masks from a popular image tool called SAM (“Segment Anything”) to get hints about which 2D pixels belong together. It then maps those hints into 3D.
- There’s a “scale knob” (called scale-conditioned grouping) that controls how fine the parts should be.
- Analogy: It’s like choosing how thinly to slice a cake—thick slices (coarse parts) or thin slices (fine parts).
- A small network (an MLP) learns, per object, how to group nearby 3D points into parts based on this scale. Finally, a clustering step groups the 3D points into clean part regions.
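The "scale knob" can be illustrated with a minimal sketch: condition the per-point features on a scale value, then cluster. The feature concatenation and the tiny k-means below are hypothetical stand-ins for the paper's per-object MLP and its clustering step, kept only to show how one set of features can yield coarse or fine groupings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy per-point features for one object (hypothetical sizes).
N, D = 500, 8
feats = rng.normal(size=(N, D))

def scale_conditioned_embed(feats, scale):
    """Stand-in for the per-object MLP: append the scale value to every
    point feature, so the same inputs can produce different groupings
    depending on the requested granularity."""
    s = np.full((feats.shape[0], 1), scale)
    return np.concatenate([feats, s], axis=1)

def kmeans(x, k, iters=10):
    """Minimal k-means as a stand-in for the final clustering step."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels

coarse = kmeans(scale_conditioned_embed(feats, 0.0), k=3)   # thick slices
fine = kmeans(scale_conditioned_embed(feats, 1.0), k=12)    # thin slices
```

Turning the scale up or down changes the embedding, and therefore how many parts the clustering carves the object into.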
- Name the parts (optional, after segmentation)
- After the 3D parts are found, the system renders a few images that highlight each part and asks a multimodal AI (a vision-LLM) to suggest a name (like “wing,” “handle,” or “screen”).
- Important: The naming happens after the parts are found. The parts themselves are discovered without any pre-set list of labels or text prompts.
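The naming stage is just a loop over the already-found parts. The sketch below shows that control flow only: `query_vlm` and `render_fn` are placeholders (no real VLM API or renderer is invoked), and the fixed return value is a dummy answer.

```python
def query_vlm(images, prompt):
    """Placeholder for a multimodal model call; a real VLM would look at
    the highlighted renderings and return a part name."""
    return "handle"  # dummy answer for illustration

def name_parts(part_ids, render_fn):
    """For each segmented part, render a few views highlighting it and
    ask the VLM for a name. Segmentation is already done at this point."""
    names = {}
    for pid in part_ids:
        views = [render_fn(pid, angle) for angle in (0, 90, 180)]
        names[pid] = query_vlm(views, "Name the highlighted part.")
    return names

labels = name_parts([0, 1], render_fn=lambda pid, angle: f"view_{pid}_{angle}")
```

The key point the structure makes: labels are an optional post-processing step, so the segmentation itself never depends on a vocabulary.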
A few extra touches make it work better:
- The 3D model keeps both big-picture and tiny details (similar to having both a map and a magnifying glass), so the parts line up with real edges and corners.
- By avoiding text prompts during training, the method doesn’t get stuck on a limited vocabulary and can scale to very large, unlabeled 3D datasets.
What they found and why it matters
- It works on many kinds of objects, including complex or unusual ones, and across several datasets.
- It can split objects at different granularities: broad chunks or detailed subparts, controlled by a simple scale setting.
- It outperforms previous “zero-shot” 3D part segmentation methods (methods that don’t rely on labeled training data) in both how accurate the parts are and how flexible the system is.
- The authors also created a new, challenging test set called PartObjaverse-Tiny (200 complex objects, carefully labeled). This helps measure real progress on more varied, real-world shapes.
Why this is important:
- Better part understanding makes it easier to edit 3D objects—change materials, reshape specific parts, or animate them—without hand-labeling everything.
- It can help robotics (e.g., figuring out where the handle is), 3D design, games, AR/VR, and 3D content creation.
What this could lead to (impact and future uses)
- Faster 3D editing and design: Designers can quickly select and modify specific parts (like turning a cup’s handle metallic while keeping the cup ceramic).
- Interactive tools: You can click on a spot and adjust the “detail knob” to select just a small piece or a larger region.
- Data creation: It can auto-generate part labels for huge collections of 3D assets, helping future AI models learn even more.
- Robotics and manufacturing: Machines can better understand where parts begin and end, making tasks like grasping or assembling easier.
In short, SAMPart3D is like giving computers a flexible, label-free “part finder” for 3D objects—one that can scale to massive datasets, cut objects into parts at any level of detail, and then optionally name those parts when needed.