Papers
Topics
Authors
Recent
Search
2000 character limit reached

Name That Part: 3D Part Segmentation and Naming

Published 19 Dec 2025 in cs.CV | (2512.18003v1)

Abstract: We address semantic 3D part segmentation: decomposing objects into parts with meaningful names. While datasets exist with part annotations, their definitions are inconsistent across datasets, limiting robust training. Previous methods produce unlabeled decompositions or retrieve single parts without complete shape annotations. We propose ALIGN-Parts, which formulates part naming as a direct set alignment task. Our method decomposes shapes into partlets - implicit 3D part representations - matched to part descriptions via bipartite assignment. We combine geometric cues from 3D part fields, appearance from multi-view vision features, and semantic knowledge from language-model-generated affordance descriptions. Text-alignment loss ensures partlets share embedding space with text, enabling a theoretically open-vocabulary matching setup, given sufficient data. Our efficient and novel, one-shot, 3D part segmentation and naming method finds applications in several downstream tasks, including serving as a scalable annotation engine. As our model supports zero-shot matching to arbitrary descriptions and confidence-calibrated predictions for known categories, with human verification, we create a unified ontology that aligns PartNet, 3DCoMPaT++, and Find3D, consisting of 1,794 unique 3D parts. We also show examples from our newly created Tex-Parts dataset. We also introduce 2 novel metrics appropriate for the named 3D part segmentation task.

Summary

  • The paper introduces the ALIGN-Parts framework that achieves state-of-the-art 3D part segmentation and naming using Partlets and set alignment.
  • It fuses geometry and appearance features through a BiCo module and employs optimal transport with LLM-generated descriptions for precise semantic grounding.
  • The method significantly boosts mIoU metrics by up to +58.8% while reducing runtime and scaling efficiently across diverse 3D datasets.

Semantic 3D Part Segmentation and Naming via Set Alignment: An Expert Overview of "Name That Part: 3D Part Segmentation and Naming" (2512.18003)

Introduction and Motivation

Semantic 3D part segmentation—the simultaneous task of decomposing 3D shapes into geometrically meaningful parts and consistently assigning interpretable names—remains an unresolved bottleneck in vision, robotics, and computational graphics. Existing 3D datasets present annotation inconsistency, and prior computational approaches either (a) produce unlabeled part decompositions or (b) require explicit text prompts yet fail to deliver comprehensive, automatable methods robust to new or ambiguous scenarios. "Name That Part: 3D Part Segmentation and Naming" introduces ALIGN-Parts, which reframes the problem as a direct set alignment task over geometry, appearance, and affordance-aware semantics.

The core contribution is the notion of "Partlets": instance-specific, geometric-semantic primitives that mediate between point-level features and natural language part descriptors. By bipartite-matching these Partlets with LLM-generated, contextually grounded part descriptions using optimal transport, ALIGN-Parts supports scalable, open-vocabulary 3D part annotation and robust evaluation across multiple datasets. The approach is coupled with new metrics measuring both structural and semantic alignment, and comes with a large-scale, verified dataset (TexParts) and unified ontology mapping. Figure 1

Figure 1: ALIGN-Parts segments and names 3D parts, unlike PartField, which performs unlabeled segmentation and is much slower.

ALIGN-Parts: Methodology

ALIGN-Parts comprises three interacting modules: 3D-appearance feature fusion, adaptive Partlet proposal, and semantic text-part alignment.

Dense Multi-Modal Fusion via BiCo

Each 3D point receives both geometry-based features (PartField) and appearance-based features (DINOv2). A BiCo Fusion module employs learnable, bi-directional cross-attention operating on localized kk-NN point graphs, resulting in computationally efficient and information-rich multimodal features. Relative position encoding (Fourier MLP) ensures spatial structure.

Partlet Representation and Assignment

A fixed set of K=32K = 32 learnable Partlets aggregate local features into part-level representations. For each Partlet, the model predicts a soft segmentation mask over the point cloud, a partness score, and a part-specific feature embedding. Transformer self- and cross-attention enables shape-adaptive, context-aware part grouping.

Assignment between Partlets and ground-truth semantic parts is computed via differentiable optimal transport (Sinkhorn iterations), with the assignment cost matrix balancing both mask overlap and semantic affinity (cosine distance between MPNet-encoded text embeddings). Figure 2

Figure 2: The ALIGN-Parts framework: BiCo fusion combines geometry and appearance, Partlets aggregate features, and bipartite matching aligns with semantic part descriptions.

Affordance-Aware Semantic Grounding

Part labels are generated as object-centric, affordance-based descriptions using Gemini LLM, then embedded with MPNet sentence transformers. This ensures disambiguation of lexical overlaps (e.g., "handle") by contextual functional understanding and yields a continuous semantic space suitable for alignment.

The text alignment is enforced via an InfoNCE-style contrastive loss, enabling open-vocabulary matching even for novel categories at inference time. Figure 3

Figure 3: Pairwise cosine similarity matrices show MPNet preserves fine-grained part relations, unlike SigLiP.

Evaluation Protocol and Metrics

The paper emphasizes the inadequacy of conventional, purely geometric metrics in benchmarking semantic part segmentation and introduces two new metrics:

  • Label-aware mIoU (strict, LA-mIoU): Intersection-over-union credit is assigned only when both segmentation and predicted part label match exactly.
  • Relaxed Label-aware mIoU (rLA-mIoU): Partial credit is given for semantic near-misses, weighted by cosine similarity between predicted and ground-truth MPNet text embeddings.

This dual-metric suite robustly captures both the geometric and semantic accuracy of part assignments, and correlation analysis confirms the relaxed metric tracks true semantic performance without producing significant outlier behavior. Figure 4

Figure 4: The relaxed label-aware metric (rLA-mIoU) correlates tightly with class-agnostic mIoU, validating it as a semantic scoring protocol.

Figure 5

Figure 5: Label mismatch statistics confirm relaxed metric’s robustness, showing minimal large-error cases even with label ambiguities.

Experimental Results

ALIGN-Parts is evaluated across unified 3DCoMPaT++, PartNet, and Find3D datasets using consistent vocabulary and annotation mappings. Key findings:

  • ALIGN-Parts surpasses all baselines on both class-agnostic and label-aware mIoU. On average, ALIGN-Parts exceeds PartField by +15.8% mIoU and PartField+MPNet by +58.8% LA-mIoU, with runtime improvements of 100×100\times (0.05s vs. 4s per shape).
  • The approach localizes fine-grained parts (e.g., the screw in scissors) even with sparse input data—10k points versus the 100k required by previous methods—demonstrating enhanced data efficiency and robustness to incomplete observations. Figure 6

    Figure 6: ALIGN-Parts yields correct segmentation and naming, outperforming Find3D and PartField, capturing small/mobile parts.

    Figure 7

    Figure 7: Fine-grained localization (e.g., a tiny screw) is feasible with ALIGN-Parts, illustrating efficiency at reduced point density.

Ablations show that BiCo fusion, Partlets, and explicit semantic alignment all contribute to performance; removal severely reduces both mIoU and semantic accuracy. Inference-time ablations demonstrate that explicit part count supervision is not required—the model adaptively infers cardinality, and dynamic partlets suffice. Figure 8

Figure 8: Sequential addition of components (left-to-right) leads to qualitative and quantitative improvements; full model achieves best results.

Figure 9

Figure 9: Dynamic partlet activation achieves performance similar to oracle part count input, confirming robust cardinality estimation.

Applications: Co-Segmentation, Label Transfer, and Large-Scale Annotation

ALIGN-Parts enables direct feedforward co-segmentation and label transfer, outperforming traditional clustering/initialization approaches (e.g., dependent/independent K-means transfer, as in PartField) which often fail under variable part topologies or cardinality mismatch. Figure 10

Figure 10: Direct ALIGN-Parts inference (middle) yields robust segmentation across shapes with varying part counts, outperforming clustering-based co-segmentation.

The model supports human-in-the-loop bootstrapping of TexParts—a new, large-scale, high-diversity 3D part dataset sourced from TexVerse—at 5–8× lower annotation effort than manual labeling. This accelerates the building of unified, semantically grounded 3D resources and is scalable to millions of assets. Figure 11

Figure 11: Human-in-the-loop annotation pipeline with ALIGN-Parts creates TexParts, accelerating construction of diverse semantic part datasets.

Text-driven part retrieval experiments (cf. Find3D evaluation) show ALIGN-Parts delivers more coherent, connected spatial regions corresponding to queried parts, handling hierarchical decomposition better than per-point assignment methods. Figure 12

Figure 12: Text-driven part retrieval for objects (airplane, motorbike): ALIGN-Parts (center) delivers spatially and semantically consistent retrievals, matching ground-truth (right) and outperforming Find3D (left).

Theoretical and Practical Implications

This work demonstrates that set-level alignment between geometry, appearance, and affordance-based semantics is a tractable, scalable, and highly effective paradigm for 3D part reasoning—enabling open-vocabulary, permutation-invariant, and feedforward prediction. The approach is theoretically sound (leveraging optimal transport, contrastive representation learning, and permutation-equivariant primitives) and practically efficient: annotation pipelines, dataset creation, and deployment in open-world contexts are all feasible.

By integrating LLM-generated affordance-aware descriptions, ALIGN-Parts substantially reduces semantic ambiguity and supports zero-shot transfer to novel part descriptions, although practical generalization is still naturally bounded by the diversity and coverage of training data.

Limitations and Future Directions

ALIGN-Parts currently depends on the available diversity in open 3D datasets and is potentially susceptible to semantic drift under extreme distribution shifts not spanned by PartNet, 3DCoMPaT++, or TexParts. Real-world, raw 3D scans present challenges due to non-manifold structures and noise. End-to-end geometric representation learning (e.g., unfreezing PartField) and multimodal, self-supervised objectives over larger, more diverse corpora represent clear future research axes.

Integration with generative priors, closed-loop robotics affordance learning, articulated object parsing, and part-aware foundation 3D models are also open, impactful directions.

Conclusion

"Name That Part: 3D Part Segmentation and Naming" offers a comprehensive, scalable solution to the semantic 3D part segmentation and naming problem. The introduction of Partlets and set alignment across geometry, appearance, and affordance-aware semantic spaces enables robust, interpretable, and efficient part-level understanding, with strong empirical results, generality to open-vocabulary scenarios, and clear practical benefits for annotation, retrieval, and downstream 3D reasoning. This establishes a new paradigm and benchmark for 3D semantic part research and resource creation.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Explain it Like I'm 14

What is this paper about?

This paper is about teaching computers to look at a 3D object (like a chair, a car, or a plane), split it into its different parts (seat, wheels, wings, etc.), and give each part the right name. The authors built a method called ALIGN-Parts that can do both steps—cutting the object into parts and naming them—quickly and in one go.

What questions were they trying to answer?

Here are the main questions the paper tackles:

  • How can we split a 3D object into meaningful parts and give those parts correct names, all at the same time?
  • How can we make the computer understand parts in a human-like way (for example, a “seat” is a flat surface where you sit), not just by shape?
  • How can we make this work even for new objects or parts the system hasn’t seen before?
  • How can we create a consistent list of parts across different datasets that sometimes use different names for the same thing?

How did they approach the problem?

Breaking objects into “partlets”

Instead of deciding the label for every tiny point on the object separately, the method first creates a small number of smart part proposals called “partlets.” You can think of partlets like smart baskets:

  • Each partlet gathers the points that belong to one part (like all points that make up the “seat”).
  • Each partlet also carries a kind of “name tag” that says what it might be.

Then, the system matches these partlets to a list of possible part names—one name per partlet—so every part gets a single, best-fitting name.

Using three kinds of clues

To figure out the parts and their names, the system blends three types of information:

  1. Geometry: The shape of the object in 3D (for example, “flat and horizontal” suggests a seat).
  2. Appearance: What the object looks like from different camera views (textures or colors can help tell parts apart).
  3. Language: Human-friendly descriptions about what each part does (called “affordances”), like “the part you grasp to open the door” for a door handle.

The language descriptions are written by a LLM and turned into numbers (embeddings) so the computer can compare them to the partlets.

Matching parts to names

The system lines up the partlets with the best-fitting descriptions in a single step. This makes two important things happen:

  • It’s fast, because it matches a few partlets to names, not every point.
  • It’s consistent, because each predicted part gets at most one name, and each name is used only once.

It can also work in “open vocabulary” mode, which means users can provide any set of part descriptions (even new ones), and the system can match parts to those names without retraining.

Making a shared list of parts and a new dataset

Different 3D datasets sometimes use different names for the same part. The authors built a unified part list (an “ontology”) across three big datasets (PartNet, 3DCoMPaT++, and Find3D), covering 1,794 unique parts. They also used ALIGN-Parts to help label a new collection called Tex-Parts, with 8,450 objects and around 14,000 part categories, using human checks only where the system wasn’t confident.

What did they find?

Here are the most important results:

  • The method segments and names all parts in a single, feed-forward pass—no need for multiple steps or extra prompts.
  • It runs fast: about 100× quicker at producing named segments than a strong geometry-only approach (PartField), after data prep.
  • It works for both known object categories and new ones (open vocabulary), because it matches parts to text descriptions on the fly.
  • It handles small, tricky parts (like handles) and keeps parts from overlapping, giving clean, non-overlapping results.
  • It reduces the human effort needed for labeling by 5–8×, thanks to confidence scores that only flag uncertain predictions for review.
  • It unifies part names across datasets, creating consistent training data and making comparisons fairer.
  • It introduces two new evaluation metrics that check both the quality of the part boundaries and whether the names are semantically correct (one strict metric and one relaxed metric that gives partial credit for close matches like “screen” vs. “monitor”).

Why is it important?

This research makes 3D understanding more practical and human-friendly. It can help:

  • Robotics: A robot can locate and name parts like “handle,” “button,” or “leg” to interact safely and correctly.
  • Game and movie creation: Artists can quickly edit or replace specific parts of 3D assets.
  • AR/VR and digital twins: Systems can recognize and label parts for repair, training, or simulation.
  • Big 3D libraries: Companies can automatically organize and search large collections of 3D models by parts.
  • Future AI systems: It provides a faster, scalable way to create high-quality training data by proposing named parts and letting humans verify only uncertain cases.

Overall, ALIGN-Parts moves beyond just cutting objects into shapes—it connects parts to their meaning and purpose, making 3D AI smarter, faster, and more useful in the real world.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise list of what remains missing, uncertain, or unexplored in the paper, framed to guide future work:

  • Open-vocabulary generalization is only partially demonstrated: novel-category evaluation is small (8 objects) and still depends on user/LLM-proposed candidate part lists; the method’s performance under truly open-world, noisy or incomplete candidate sets is not quantified.
  • Dependency on LLM-generated affordance descriptions is unexamined: no analysis of description quality, bias, stability across prompts/LLM versions, or how description errors propagate to segmentation and naming.
  • Text encoder choice (MPNet) is fixed and non-trainable: no study of domain adaptation, multi-lingual support, fine-tuning strategies, or robustness to paraphrases/synonyms beyond the MPNet space; rLA-mIoU depends on MPNet cosine similarity without human validation.
  • Candidate-part generation for unseen categories lacks a principled mechanism: there is no algorithmic way to propose, prune, or rank parts for novel objects without human/LLM intervention, and the effect of overcomplete or noisy candidate lists on precision/recall is untested.
  • Instance multiplicity is unsupported by design: the bipartite matching enforces “each name used at most once,” preventing separate instance-level labeling of repeated parts (e.g., four legs), which many applications and datasets require; extending to part-category and instance-level segmentation is open.
  • Hierarchical part structure is not modeled: overlapping/parent-child relations (e.g., handle composed of subcomponents) are disallowed by the exclusivity/overlap loss, limiting applicability to layered or hierarchical parts common in CAD and real objects.
  • Sensitivity to K=32 Partlets is not ablated: behavior under objects with far more/fewer parts, dynamic K selection, and trade-offs between over-/under-segmentation and runtime are not studied.
  • Fusion and alignment hyperparameters are unexamined: no sensitivity analysis of k-NN size, the geometry/semantics cost weights in OT, InfoNCE temperature, overlap/coverage loss weights, or partness thresholds.
  • Generalization to real-world scans is unclear: the method is trained/evaluated on clean CAD-like data with multi-view renders; robustness to sensor noise, partial observations, non-textured meshes, missing UVs, and scene clutter is untested.
  • Rendering/appearance dependence is a practical limitation: performance when texture is absent, low-quality, or misleading (style/material domain shift) is not quantified; inner/occluded parts not visible in renders are not addressed.
  • Runtime claims exclude preprocessing: speedups do not account for mesh normalization, multi-view rendering, and 2D feature lifting; end-to-end wall-clock cost comparisons (including preprocessing) and scaling with number of views/candidates are missing.
  • Confidence calibration is limited to closed-vocabulary: Mahalanobis-based calibration requires class-conditional statistics from training categories; calibration under open-vocabulary settings (and alternatives like deep ensembles, temperature scaling across categories) is not evaluated.
  • Ontology unification via LLMs lacks quality assurance: no precision/recall or error analysis of merges/splits, handling of granularity mismatches, or reproducibility across LLM prompts/versions; potential for silent semantic drift is not quantified.
  • New metrics need external validation: rLA-mIoU relies on MPNet cosine similarity for partial credit; correlation with human judgments of “near-miss” semantic correctness, and sensitivity to the chosen text encoder/thresholds, is not reported.
  • Limited test scale and coverage: evaluation uses 206 shapes across a few datasets; no breakdown by part frequency, size, or rarity, and no robustness curves versus part size or class imbalance.
  • Object misclassification cascades are unaddressed: in Mode 1, wrong object-class prediction can yield a wrong candidate-part list and cascading naming errors; there is no safeguard or recovery strategy.
  • Handling of ambiguous/polysemous names remains brittle: affordance descriptions help but there is no systematic disambiguation when different objects share part names/functions with divergent geometries.
  • Lack of task-driven validation: despite affordance-centric descriptions, there is no downstream validation in robotic manipulation, physical simulation, or functional reasoning to assess whether named parts align with actionable affordances.
  • No study of cross-dataset/domain adaptation: training mixes PartNet/3DCoMPaT++/Find3D after ontology alignment, but transfer to other repositories (ShapeNet variants, real scans, industry CAD) and strategies for domain adaptation are not explored.
  • Scalability with large candidate vocabularies is unclear: matching cost and accuracy when the candidate label set scales to thousands (e.g., 1,794+ parts) or includes severe distractors is not characterized.
  • Data/annotation quality for Tex-Parts is under-specified: human verification rates, inter-annotator agreement, error types, and cost/time per object are not reported; how confidence thresholds impact precision/recall in real pipelines remains open.
  • Ethical/licensing and reproducibility concerns are unaddressed: reliance on proprietary LLMs (e.g., Gemini) and external assets (e.g., TexVerse) raises issues of version drift, licensing, and reproducibility; standardized releases and ablation on open LLMs are missing.

Practical Applications

Immediate Applications

Below is a concise set of deployable use cases that leverage the paper’s ALIGN-Parts method (open-vocabulary, feed-forward 3D part segmentation and naming, confidence calibration, unified ontology, and new evaluation metrics).

  • Semi-automated 3D part annotation engine for large repositories
    • Sectors: software/AI, 3D content platforms, academia
    • Tools/Products/Workflow: “Part-Annotation-as-a-Service” API; web reviewer UI integrating fused confidence (softmax + Mahalanobis) for auto-accept/flag; integration with Labelbox/Scale AI; batch processing pipelines for CAD/mesh libraries
    • Assumptions/Dependencies: accurate closed-vocabulary calibration data; high-quality meshes or renders for DINOv2 features; access to LLM-generated affordance descriptions; GPU inference
  • Plugins for DCC/CAD/game engines to auto-segment and name parts
    • Sectors: media/games/VFX, AR/VR, industrial design
    • Tools/Products/Workflow: Blender/Maya/3ds Max/Unity/Unreal plugins that assign part groups, naming conventions, and material slots; auto-creation of physics colliders per named part
    • Assumptions/Dependencies: asset meshes are manifold/clean; material/texture views available for appearance cues; ontology mapping to studio naming standards
  • Robotics grasping and manipulation bootstrap (simulation-first)
    • Sectors: robotics, logistics/automation
    • Tools/Products/Workflow: part-aware object models in simulators (Isaac Gym, MuJoCo) to find handles/knobs/latches and pre-compute grasp regions; export “graspable part masks” to motion planners
    • Assumptions/Dependencies: domain gap from CAD/sim to real scans; grasp success still requires downstream control/planning; thin/small parts may need tuning
  • Manufacturing QA and BOM checks from 3D models
    • Sectors: manufacturing, PLM
    • Tools/Products/Workflow: automatic part counting and comparison to BOM; mismatch reports (missing/extra parts); link named parts to ERP/PLM IDs
    • Assumptions/Dependencies: reliable part-to-BOM ontology mapping; tolerance for CAD vs as-built deviations; quality of scans for assembled products
  • E-commerce 3D catalog enrichment and search
    • Sectors: retail/e-commerce
    • Tools/Products/Workflow: auto-tagging of product features (“has armrests”, “5 wheels”, “reversible door”); faceted search filters driven by detected parts; compatibility checks (e.g., mount fits)
    • Assumptions/Dependencies: standardized taxonomy across categories; high-quality 3D assets or vendor CAD; precision in counting repeated parts
  • AR maintenance and assembly guidance using CAD
    • Sectors: field service, consumer electronics, automotive service
    • Tools/Products/Workflow: part-aware overlays that highlight named components during step-by-step instructions; on-device part selection by name for exploded views
    • Assumptions/Dependencies: accurate CAD-to-physical alignment; device compute constraints; controlled lighting/occlusion conditions
  • Cross-dataset ontology unification and dataset curation
    • Sectors: academia, platform data teams
    • Tools/Products/Workflow: use the LLM+MPNet pipeline to merge taxonomies (e.g., PartNet, 3DCoMPaT++, internal ontologies); maintain alias maps; publish unified ontology with IDs
    • Assumptions/Dependencies: human-in-the-loop verification; LLM access and governance; buy-in from dataset owners
  • Adoption of label-aware segmentation metrics in evaluation
    • Sectors: academia, benchmarking consortia
    • Tools/Products/Workflow: integrate LA-mIoU and rLA-mIoU into leaderboards and CI evaluation; use relaxed metric to credit near-miss semantic predictions
    • Assumptions/Dependencies: community agreement on metric definitions and test sets
  • Content policy/moderation for 3D platforms
    • Sectors: platform trust & safety, policy/compliance
    • Tools/Products/Workflow: flag or redact regulated parts (e.g., weapon components) at upload; audit trails linking redactions to part names
    • Assumptions/Dependencies: reliable detection of sensitive parts; legal definitions mapped to ontology; human review for edge cases
  • Repair and 3D printing workflows
    • Sectors: maker/DIY, after-sales service
    • Tools/Products/Workflow: identify broken/missing parts from scans; isolate and export replacement part meshes for 3D printing; generate part lists
    • Assumptions/Dependencies: scan quality; watertight segmentation for printing; tolerances and material selection
  • Physics-aware simulation parameterization
    • Sectors: simulation, robotics, games
    • Tools/Products/Workflow: map part names to default physics/joint templates (wheels → rolling joints; doors → hinge constraints); auto-rigging of mechanical parts
    • Assumptions/Dependencies: correct joint semantics per category; manual overrides for atypical designs
  • Accessible learning content
    • Sectors: education
    • Tools/Products/Workflow: export affordance-aware part captions with highlighted 3D regions; interactive “name that part” learning modules
    • Assumptions/Dependencies: quality of LLM-generated descriptions; need for age/curriculum alignment

Long-Term Applications

These concepts require additional research, domain adaptation, standardization, or real-world validation (e.g., on scans, in-the-wild lighting, or real-time constraints).

  • Real-time, open-world 3D part understanding for mobile/robot platforms
    • Sectors: robotics, consumer AR
    • Tools/Products/Workflow: on-device models segment/name parts from RGB-D streams; voice or text queries (“find the battery cover”) with immediate highlighting
    • Assumptions/Dependencies: model compression and latency targets; robust performance on noisy, partial scans; continual learning and open-set detection
  • Universal cross-industry part ontology standard
    • Sectors: policy/standards, manufacturing, AEC, retail
    • Tools/Products/Workflow: consortium-driven standard for part naming/affordances; mappings to ISO/ETIM/IFC; compliance and interoperability frameworks
    • Assumptions/Dependencies: multi-stakeholder agreement; governance for updates; multilingual harmonization
  • Automated assembly planning and verification
    • Sectors: manufacturing, robotics
    • Tools/Products/Workflow: from part-aware CAD, auto-generate assembly sequences, torque specs, and tool paths; verify assembly completeness with 3D scans
    • Assumptions/Dependencies: high semantic fidelity; integration with process constraints and safety rules; reliable scan-to-CAD registration
  • Construction/AEC digital twins with part-level semantics
    • Sectors: AEC, facilities management
    • Tools/Products/Workflow: part-aware segmentation of building components (HVAC registers, fixtures, fasteners) for maintenance scheduling, inventory, and safety audits
    • Assumptions/Dependencies: large-scale scan robustness; mapping to IFC classes; tolerance for occlusions and clutter
  • Medical device and prosthetics design assistants (part-aware)
    • Sectors: healthcare/medtech
    • Tools/Products/Workflow: design review tools that ensure required subcomponents and connectivities are present (locks, hinges); ergonomic affordance checks
    • Assumptions/Dependencies: domain-specific ontologies and regulatory standards; medical-grade validation; privacy and IP considerations
  • Part-aware generative design/editing copilots
    • Sectors: media/entertainment, industrial design, XR
    • Tools/Products/Workflow: text-to-edit (“widen the chair armrests”, “add cable management bracket”) with reliable part localization and topology-preserving edits; closed-loop with generative 3D models
    • Assumptions/Dependencies: tight integration with 3D generative models; geometric/topological constraints; user-in-the-loop quality control
  • Autonomous inspection and compliance analytics
    • Sectors: automotive, aerospace, consumer electronics
    • Tools/Products/Workflow: detect missing/incorrect parts, improper fasteners, or safety-critical placements automatically; produce compliance reports
    • Assumptions/Dependencies: very high recall/precision; calibrated uncertainty; acceptance by regulators and auditors
  • Sustainability analytics (LCA) and materials estimation from part structure
    • Sectors: ESG/sustainability, manufacturing
    • Tools/Products/Workflow: map part counts/types to materials and processes to estimate embodied carbon; flag high-impact components for redesign
    • Assumptions/Dependencies: bill-of-material mappings to materials databases; accurate volume/thickness estimates; data-sharing agreements
  • Supply chain and catalog matching for parts
    • Sectors: procurement, manufacturing, marketplaces
    • Tools/Products/Workflow: match CAD-derived parts to supplier SKUs; cross-vendor alias resolution; substitution recommendations
    • Assumptions/Dependencies: robust semantic+geometric similarity; access to supplier catalogs; standardized metadata
  • Large-scale educational content generation
    • Sectors: education/EdTech
    • Tools/Products/Workflow: auto-create interactive part-labeled models for curricula; cross-lingual affordance descriptions; assessments using label-aware metrics
    • Assumptions/Dependencies: curriculum alignment; localization; content moderation for age-appropriateness

Cross-cutting assumptions and risks (affecting both categories)

  • Domain shift: Models trained on CAD/clean meshes may underperform on noisy, partial, or texture-less real scans without fine-tuning.
  • Affordance text quality: LLM-generated descriptions can be ambiguous; human verification remains important for high-stakes use.
  • Ontology drift: Different industries require bespoke taxonomies; mappings must be maintained over time.
  • Small/thin parts: Performance can degrade on tiny features; may require higher-resolution inputs or specialized training.
  • Compute and licensing: Dependence on DINOv2, MPNet, and commercial LLMs (e.g., Gemini) introduces cost, licensing, and privacy considerations.
  • Confidence calibration: Closed-vocabulary calibration performs best; open-vocabulary confidence is less reliable and may need conservative thresholds + reviewer workflows.

Glossary

  • AdamW: An optimizer that decouples weight decay from the gradient update to improve generalization. "We use AdamW with an initial learning rate of 3e-4 and cosine annealing to a minimum of 5e-6."
  • Affordance-aware part descriptions: Function-oriented textual descriptions that link a part’s geometry to its intended use. "Semantic knowledge comes from affordance-aware part descriptions that encode form-and-function relationships"
  • Amodal segmentation: Segmenting complete objects or parts independent of occlusion or visibility. "Therefore, most subsequent works have focused on amodal segmentation, which is a relatively easier task."
  • Bi-directional cross-attention: An attention mechanism where features from two modalities (e.g., geometry and appearance) attend to each other in both directions. "We fuse these modalities through a bi-directional cross-attention module"
  • BiCo Fusion module: A cross-modal fusion component that uses bi-directional cross-attention on local k-NN graphs to combine geometry and appearance. "BiCo Fusion module using efficient bi-directional cross-attention on local k=16k{=}16 nearest-neighbor graphs in 3D space"
  • Bipartite assignment: One-to-one matching between two sets (e.g., predicted parts and text descriptions). "matched to part descriptions via bipartite assignment."
  • Conditional Random Field Model: A probabilistic graphical model used for structured prediction over sequences or grids. "Conditional Random Field Model"
  • Cosine annealing: A learning-rate schedule that follows a cosine curve to gradually reduce the learning rate. "cosine annealing to a minimum of 5e-6."
  • Cosine similarity: A measure of similarity between two vectors based on the cosine of the angle between them. "Pairwise cosine similarity heatmaps"
  • DETR: A transformer-based object detection framework that formulates detection as set prediction. "analogous to DETR"
  • Dice loss: A segmentation loss derived from the Dice coefficient, focusing on overlap between predicted and ground-truth masks. "binary cross-entropy and Dice loss supervise masks"
  • Differentiable optimal transport: A soft, gradient-based formulation of the optimal transport problem to align probability distributions. "We establish correspondences between predicted Partlets and ground-truth parts via differentiable optimal transport."
  • Fourier-encoded relative positional biases: Sinusoidal encodings of relative positions used to modulate attention. "To incorporate 3D spatial structure through Fourier-encoded relative positional biases"
  • Gated Fusion: A mechanism using learned gates to control how much cross-modal information is incorporated into features. "Gated Fusion."
  • GELU: The Gaussian Error Linear Unit activation function used in neural network layers. "Two-layer MLP with GELU"
  • Human-in-the-loop: A workflow where automated predictions are verified or refined by human annotators. "human-in-the-loop pipeline that bootstraps TexParts"
  • InfoNCE loss: A contrastive loss that pulls matched pairs together and pushes mismatched pairs apart in embedding space. "Sinkhorn matching establishes a bipartite assignment between Partlet and text embeddings, and an InfoNCE loss further aligns them in a shared representation space"
  • Jonker-Volgenant algorithm: An efficient method for solving the linear assignment problem exactly. "At inference, we use the Jonker-Volgenant algorithm"
  • KNN graph: A graph connecting each point to its k nearest neighbors to structure local computations. "operates on a kk-nearest neighbor (KNN) graph"
  • Label-Aware mIoU (strict, LA-mIoU): An evaluation metric that credits overlap only when the predicted semantic label exactly matches the ground truth. "Label-Aware mIoU (strict, LA-mIoU)"
  • LayerNorm: A normalization technique applied across features within each layer to stabilize training. "LayerNorm"
  • Mahalanobis confidence: A confidence score based on Mahalanobis distance in embedding space using class-conditional statistics. "The Mahalanobis confidence (Eq.~\ref{eq:inference_conf_maha}) requires class-conditional statistics"
  • Mahalanobis distance: A distance measure that accounts for correlations in the data via the covariance matrix. "Mahalanobis-distance based confidence score"
  • mIoU: Mean Intersection over Union, a standard metric for evaluating segmentation quality. "class-agnostic mIoU"
  • Multi-head cross-attention: Attention with multiple heads where queries attend to keys/values from a different set or modality. "multi-head cross-attention, LayerNorm, residual connections"
  • Null class: A special class allowing the model to discard unused proposals to adapt part counts per instance. "A ``null'' class allows the model to discard unused partlets"
  • Open-vocabulary: The ability to handle and match to unseen or arbitrary text labels without retraining. "open-vocabulary matching setup"
  • Optimal assignment: Computing a one-to-one matching that minimizes total cost across pairs. "exact optimal assignment"
  • Optimal transport: A mathematical framework for moving probability mass between distributions to minimize transport cost. "Training uses contrastive alignment and differentiable optimal transport matching"
  • Overlap loss: A regularizer penalizing multiple part assignments per point to enforce exclusivity. "Overlap loss enforces mutual exclusivity"
  • Partlet: A learned part-level representation that includes a soft mask, an embedding, and a partness score. "We introduce partlets - shape-conditioned part proposals with text embeddings"
  • Partness score: A scalar indicating whether a Partlet corresponds to a valid part for the given instance. "a partness score \text{part}_k \in \mathbb{R} indicating whether it represents an actual part"
  • Permutation consistency: Ensuring each predicted part is matched to at most one name and each name is used at most once. "enabling ... permutation consistency: each predicted part receives at most one name, and each name is used at most once."
  • Permutation Invariant: A property indicating results are unaffected by permutations of inputs or labels. "Permutation Invariant"
  • Relative Positional Bias: A learned bias that depends on the relative positions of points, used in attention computations. "Relative Positional Bias."
  • Relaxed Label-Aware mIoU (rLA-mIoU): A label-aware metric that weights IoU by semantic similarity between predicted and ground-truth labels. "Relaxed Label-Aware mIoU (rLA-mIoU)"
  • Semantic grounding: Aligning visual/geometric representations with language to ensure meaningful part names. "produce semantically grounded, visually coherent partlets."
  • Sentence transformer: A transformer-based model that produces fixed-dimensional embeddings for sentences or descriptions. "embedded with a sentence transformer model, such as MPNet (all-mpnet-base-v2)"
  • Set alignment: Formulating prediction as aligning two sets (e.g., partlets and text descriptions) via matching. "treats automatic semantic 3D part segmentation as a direct set alignment problem"
  • Sinkhorn-Knopp iterations: Iterative matrix scaling used to compute soft assignments in entropic optimal transport. "Sinkhorn-Knopp iterations produce a soft assignment matrix"
  • Temperature-calibrated softmax: Softmax with a temperature parameter to adjust confidence calibration. "temperature-calibrated softmax score"
  • Unified ontology: A consolidated, consistent hierarchy and naming scheme across multiple datasets. "we create a unified ontology that aligns PartNet, 3DCoMPaT++, and Find3D"
  • Vocabulary compression: Merging duplicate or equivalent labels across datasets into canonical names to reduce ambiguity. "Our annotation alignment and vocabulary compression system"
  • Zero-shot matching: Matching to new labels or descriptions not seen during training. "supports zero-shot matching to arbitrary descriptions"

Open Problems

We found no open problems mentioned in this paper.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 2 tweets with 38 likes about this paper.

HackerNews