AnimeHair: 3D Anime Hairstyle Dataset
- AnimeHair is a large-scale dataset of 37K curated 3D anime hairstyles with separated hair cards that enable focused learning in a stylized domain.
- The dataset is paired with a control-point-based parameterization and an autoregressive transformer that models hair as a sequential 'hair language,' giving a compact and invertible representation.
- Empirical evaluations demonstrate state-of-the-art reconstruction and perceptual metrics, highlighting its effectiveness for both direct mesh editing and conditional generative modeling.
AnimeHair is a large-scale dataset of 37K high-quality anime hairstyles with separated hair cards and processed mesh data, introduced to facilitate both training and evaluation of anime hairstyle generation within the CHARM framework (He et al., 25 Sep 2025). It targets a domain in which anime hairstyles exhibit highly stylized, piecewise-structured geometry that challenges existing techniques, and it is paired with a compact, invertible control-point-based parameterization and an autoregressive generative framework that treats a hairstyle as a sequential “hair language.” In practice, AnimeHair functions both as a curated corpus of decomposed 3D hair geometry and as the empirical basis for learning-based reconstruction and conditional generation.
1. Research setting and motivation
Traditional hair modeling methods focus on realistic hair using strand-based or volumetric representations, whereas anime hairstyles exhibit highly stylized, piecewise-structured geometry. Existing works often rely on dense mesh modeling or hand-crafted spline curves, making them inefficient for editing and unsuitable for scalable learning (He et al., 25 Sep 2025).
Within that setting, AnimeHair addresses two constraints simultaneously. First, it provides a large-scale source of 3D anime-style hairstyles in a form compatible with supervised learning. Second, it aligns dataset design with a representation in which a sequence of control points represents each hair card, rather than treating hair as an undifferentiated dense mesh. This suggests a deliberate coupling between data curation and model architecture: the dataset is not merely a repository of meshes, but a substrate for sequential geometric modeling.
A common misconception is that anime hair can be handled as a minor stylistic variant of realistic hair. The CHARM formulation rejects that assumption by centering hair cards, control-point sequences, and card ordering around the head, rather than realistic strand simulation or volumetric occupancy alone. A plausible implication is that AnimeHair is best understood as a domain-specific dataset whose structure is inseparable from the stylization conventions of anime character design.
2. Dataset construction and statistical profile
AnimeHair consists of 37 000 distinct 3D anime-style hairstyles downloaded from the public VRoid-Hub repository (He et al., 25 Sep 2025). Each original character mesh was normalized to fit in a cube, after which the hair submesh was extracted by material tags. The resulting corpus is described as a dataset of fully decomposed “hair cards” harvested and cleaned from VRoid-Hub.
Preprocessing proceeds through several mesh-cleaning and structural-validation stages. Vertex merging and connected-component analysis ensure that each hair card is watertight. Endpoints are identified by locating regions where valence patterns change, and each card is then traced back to its root. Mesh components that fail to match the repeating-unit template (diamond or triangular pyramids) at a fixed recall threshold are discarded. Outliers are also filtered out, including hair cards whose width, thickness, or number of control points falls outside preset limits.
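The merging and component steps can be illustrated with a minimal Python sketch, assuming a mesh given as plain vertex and triangle lists. The function names (`merge_vertices`, `connected_components`, `is_watertight`), the grid-snapping weld, and the tolerance are hypothetical choices for illustration, not the AnimeHair pipeline itself; the watertightness test simply checks that every undirected edge is shared by exactly two faces.

```python
from collections import defaultdict


def merge_vertices(vertices, faces, tol=1e-6):
    """Weld coincident vertices by snapping coordinates to a grid of size `tol`."""
    key_to_index = {}
    remap = []
    merged = []
    for x, y, z in vertices:
        key = (round(x / tol), round(y / tol), round(z / tol))
        if key not in key_to_index:
            key_to_index[key] = len(merged)
            merged.append((x, y, z))
        remap.append(key_to_index[key])
    new_faces = [tuple(remap[i] for i in face) for face in faces]
    return merged, new_faces


def connected_components(num_vertices, faces):
    """Group face indices into components that share vertices (union-find)."""
    parent = list(range(num_vertices))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for face in faces:
        for v in face[1:]:
            ra, rb = find(face[0]), find(v)
            if ra != rb:
                parent[rb] = ra

    groups = defaultdict(list)
    for fi, face in enumerate(faces):
        groups[find(face[0])].append(fi)
    return list(groups.values())


def is_watertight(faces):
    """Closed surface check: every undirected edge is used by exactly two faces."""
    edge_count = defaultdict(int)
    for face in faces:
        for a, b in zip(face, face[1:] + face[:1]):
            edge_count[(min(a, b), max(a, b))] += 1
    return all(count == 2 for count in edge_count.values())
```

Each returned component is a list of face indices, so in this sketch a candidate card would be kept only if `is_watertight` holds for its faces.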
The dataset statistics are explicitly tuned for autoregressive sequence modeling. Each hairstyle contains 25–130 cards, and each card contains 20–60 control points. Total control points per hairstyle range from 1 000–6 000, which is stated to be suitable for autoregressive sequences. A random 100-sample held-out test set is reserved, and the remaining 36 900 hairstyles are used to train the transformer.
The reported distributions further characterize the corpus: 51.4 % of cards are “short” (below a normalized-length threshold), 33 % are medium, and 15.6 % are long. Card positions cluster near the head center and concentrate toward the top of the head, while width and thickness follow smooth but heavy-tailed distributions. These statistics indicate that AnimeHair is not only large-scale, but also distributionally structured around common anime hairstyle conventions such as concentrated scalp attachment regions and variable card extent.
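A hedged sketch of how cards could be bucketed into short, medium, and long follows. The thresholds `SHORT_MAX` and `MEDIUM_MAX` are hypothetical placeholders, since the exact cut-offs are not stated above, and length is measured as the polyline arc length through a card's control-point positions, assuming coordinates are already normalized.

```python
import math

# Hypothetical length thresholds (normalized units); not the values used for AnimeHair.
SHORT_MAX = 0.3
MEDIUM_MAX = 0.6


def arc_length(points):
    """Polyline length through consecutive control-point positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def length_class(points, short_max=SHORT_MAX, medium_max=MEDIUM_MAX):
    length = arc_length(points)
    if length < short_max:
        return "short"
    if length < medium_max:
        return "medium"
    return "long"


def class_percentages(cards):
    """Percentage of cards falling in each length class."""
    counts = {"short": 0, "medium": 0, "long": 0}
    for card in cards:
        counts[length_class(card)] += 1
    return {name: 100.0 * n / len(cards) for name, n in counts.items()}
```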
3. Control-point parameterization and invertibility
The geometric core associated with AnimeHair is a control-point-based parameterization in which each hair card is converted into a sequence of $N$ control points (He et al., 25 Sep 2025). At control point $i$, the representation stores five floats:
$$\mathbf{c}_i = (\mathbf{p}_i, w_i, h_i) \in \mathbb{R}^5,$$
where $\mathbf{p}_i \in \mathbb{R}^3$ is the 3D position, $w_i$ is the half-width, and $h_i$ is the thickness.
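Tangent estimation is performed by a fourth-order central finite difference on the five-point stencil $\mathbf{p}_{i-2}, \dots, \mathbf{p}_{i+2}$ of control-point positions, using weights $\tfrac{1}{12}(1, -8, 0, 8, -1)$:
$$\mathbf{t}_i = \frac{1}{12}\left(\mathbf{p}_{i-2} - 8\,\mathbf{p}_{i-1} + 8\,\mathbf{p}_{i+1} - \mathbf{p}_{i+2}\right).$$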
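The five-point stencil can be sketched as follows, assuming control-point positions are given as 3-tuples ordered root to tip. Clamping indices at the ends of a card and normalizing each tangent to unit length are assumptions made for this illustration; the boundary handling used by CHARM is not specified here.

```python
import math


def tangents(points):
    """Unit tangents from the fourth-order central difference
    (p[i-2] - 8 p[i-1] + 8 p[i+1] - p[i+2]) / 12.

    Indices are clamped at the ends of the card (an assumption), which lowers
    accuracy at the first and last two points. Assumes a non-degenerate card.
    """
    n = len(points)

    def p(j):
        return points[min(max(j, 0), n - 1)]

    result = []
    for i in range(n):
        d = [(p(i - 2)[k] - 8 * p(i - 1)[k] + 8 * p(i + 1)[k] - p(i + 2)[k]) / 12.0
             for k in range(3)]
        norm = math.sqrt(sum(c * c for c in d))
        result.append(tuple(c / norm for c in d))
    return result
```

On interior points of a smooth curve the stencil is fourth-order accurate, so sampling a circle at a 0.1-radian spacing recovers the analytic tangent to well under $10^{-4}$.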
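A smooth normal field $\mathbf{n}_i$ is then computed by least squares with an inter-point smoothness term, and a reference normal is fixed by PCA to avoid the trivial zero solution. Width and thickness directions are defined from the tangent $\mathbf{t}_i$ and normal $\mathbf{n}_i$ as
$$\mathbf{b}_i = \frac{\mathbf{t}_i \times \mathbf{n}_i}{\lVert \mathbf{t}_i \times \mathbf{n}_i \rVert} \ \text{(width)}, \qquad \mathbf{n}_i \ \text{(thickness)}.$$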
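Mesh reconstruction follows directly from these quantities. Given the position $\mathbf{p}_i$, half-width $w_i$, thickness $h_i$, width direction $\mathbf{b}_i$, and thickness direction $\mathbf{n}_i$, the two diamond bases of each unit are rebuilt from cross-section vertices of the form
$$\mathbf{p}_i \pm w_i\,\mathbf{b}_i, \qquad \mathbf{p}_i \pm h_i\,\mathbf{n}_i,$$
and consecutive units are linked by shared vertices with quad faces to recover the original mesh. Inverse encoding reads each unit’s face centroids to obtain $\mathbf{p}_i$ and fits the base and height to obtain $(w_i, h_i)$. The paper reports that this five-parameter model substantially compresses the original mesh.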
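The encode/decode round trip can be sketched under explicit assumptions: each cross-section is a diamond with vertices at position ± half-width along the width direction and ± thickness along the normal, the width direction is the normalized cross product of tangent and normal, and normals are unit length and orthogonal to the tangents. Recovering the position from the centroid of each four-vertex ring is a simplification of the face-centroid and base/height fitting described above, and `build_mesh` and `encode_mesh` are hypothetical names.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def offset(p, direction, scale):
    return tuple(pc + scale * dc for pc, dc in zip(p, direction))


def build_mesh(control_points, tangents, normals):
    """Decode (position, half_width, thickness) triples into a quad mesh.

    Ring i occupies vertices 4i..4i+3 in the order
    +width, +thickness, -width, -thickness.
    """
    vertices, quads = [], []
    for (p, w, h), t, n in zip(control_points, tangents, normals):
        b = normalize(cross(t, n))  # width direction
        vertices += [offset(p, b, w), offset(p, n, h), offset(p, b, -w), offset(p, n, -h)]
    for i in range(len(control_points) - 1):
        a, c = 4 * i, 4 * (i + 1)
        for k in range(4):
            k1 = (k + 1) % 4
            quads.append((a + k, a + k1, c + k1, c + k))  # side face between rings
    return vertices, quads


def encode_mesh(vertices):
    """Invert build_mesh: ring centroid -> position, diagonals -> w and h."""
    control_points = []
    for i in range(0, len(vertices), 4):
        ring = vertices[i:i + 4]
        p = tuple(sum(v[k] for v in ring) / 4.0 for k in range(3))
        w = math.dist(ring[0], ring[2]) / 2.0
        h = math.dist(ring[1], ring[3]) / 2.0
        control_points.append((p, w, h))
    return control_points
```

Because every vertex is an explicit function of the five floats and the local frame, the round trip recovers the inputs up to floating-point error in this sketch.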
This representation is characterized as compact and invertible. In practical terms, that means the dataset is not limited to passive storage: it supports direct geometric editing at the level of curve shape, cross-section width, and thickness while remaining compatible with sequence-based learning. A plausible implication is that AnimeHair is simultaneously a dataset format and an implicit interface for hairstyle manipulation.
4. Sequential formulation and generative modeling
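Within CHARM, AnimeHair is consumed by an autoregressive transformer that interprets anime hairstyles as a sequential “hair language” (He et al., 25 Sep 2025). Sequence construction begins by sorting cards counterclockwise around the head, as viewed looking down the vertical axis, in order to capture inter-card structure. Within each card, control points follow root-to-tip connectivity. The final token stream is
$$S = \big[\langle\mathrm{bos}\rangle,\ \mathbf{c}^{1}_{1}, \dots, \mathbf{c}^{1}_{N_1},\ \langle\mathrm{eoc}\rangle,\ \dots,\ \mathbf{c}^{K}_{1}, \dots, \mathbf{c}^{K}_{N_K},\ \langle\mathrm{eoc}\rangle,\ \langle\mathrm{eos}\rangle\big],$$
where $\langle\mathrm{bos}\rangle$ starts the sequence, $\langle\mathrm{eoc}\rangle$ ends each card, $\langle\mathrm{eos}\rangle$ ends the hairstyle, and $\mathbf{c}^{k}_{j}$ is the $j$-th control point of the $k$-th of $K$ cards.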
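A minimal sketch of the sequence construction follows, assuming a right-handed frame with $y$ up, in which increasing `atan2(-z, x)` of each card's root runs counterclockwise when viewed from above. The token strings, the starting angle of the sweep, and emitting an end-of-card token after every card (including the last) are illustrative assumptions.

```python
import math

BOS, EOC, EOS = "<bos>", "<eoc>", "<eos>"  # hypothetical special-token names


def root_angle(card):
    """Azimuth of the card root around the vertical (y) axis.

    With a right-handed, y-up frame viewed from above, increasing
    atan2(-z, x) runs counterclockwise on screen.
    """
    x, z = card[0][0], card[0][2]
    return math.atan2(-z, x)


def build_sequence(cards):
    """Flatten root-to-tip cards into one token stream, ordered by root azimuth."""
    sequence = [BOS]
    for card in sorted(cards, key=root_angle):
        sequence.extend(card)  # control points, root first
        sequence.append(EOC)   # end of this card
    sequence.append(EOS)       # end of the hairstyle
    return sequence
```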
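Conditioning is provided by Michelangelo [Zhao et al. ’23], which converts the input 10 000-point surface cloud into a fixed-length token sequence $\mathbf{Z}$. A control-point encoder $E$ embeds the discrete control-point tokens via learnable lookup tables and linearly projects them into hidden states:
$$\mathbf{e}_i = E(\mathbf{c}_i) = W\,\mathrm{Emb}(\mathbf{c}_i),$$
where $\mathrm{Emb}$ denotes the lookup-table embedding of control point $\mathbf{c}_i$ and $W$ the linear projection.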
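Because the encoder consumes discrete tokens, continuous control-point attributes must first be quantized. The uniform quantizer below is a hedged sketch: the bin count `NUM_BINS` and the range $[-1, 1]$ are hypothetical, and width and thickness would in practice need their own ranges. Each resulting bin index is what a learnable lookup table would be indexed by.

```python
NUM_BINS = 256      # hypothetical quantization resolution
LO, HI = -1.0, 1.0  # assumes coordinates normalized into [-1, 1]


def quantize(x, num_bins=NUM_BINS, lo=LO, hi=HI):
    """Map a continuous value to a bin index in [0, num_bins - 1]."""
    x = min(max(x, lo), hi)
    index = int((x - lo) / (hi - lo) * num_bins)
    return min(index, num_bins - 1)  # x == hi falls into the last bin


def dequantize(index, num_bins=NUM_BINS, lo=LO, hi=HI):
    """Return the bin center, i.e. the value a decoded token maps back to."""
    return lo + (index + 0.5) * (hi - lo) / num_bins
```

With these settings the round-trip error is at most half a bin width, $1/256$.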
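A decoder-only transformer $\mathcal{T}$ with 6 layers and hidden dimension 768 performs next-token prediction conditioned on the shape tokens $\mathbf{Z}$ and the past hidden states $\mathbf{e}_{<i}$:
$$\mathbf{o}_i = \mathcal{T}(\mathbf{Z}, \mathbf{e}_{<i}).$$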
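Cascaded decoders then predict attributes in the order position, width, thickness, each conditioned on the preceding predictions:
$$\hat{\mathbf{p}}_i = D_p(\mathbf{o}_i), \qquad \hat{w}_i = D_w(\mathbf{o}_i, \hat{\mathbf{p}}_i), \qquad \hat{h}_i = D_h(\mathbf{o}_i, \hat{\mathbf{p}}_i, \hat{w}_i),$$
where $\mathbf{o}_i$ is the transformer output at step $i$.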
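Training minimizes the sum of a cross-entropy over discrete token predictions plus two binary cross-entropies for the end-of-card and end-of-sequence classifiers:
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \mathcal{L}_{\mathrm{BCE}}^{\mathrm{eoc}} + \mathcal{L}_{\mathrm{BCE}}^{\mathrm{eos}}.$$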
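The loss can be written out for a single sequence as a small pure-Python sketch. Averaging each term over its own elements and weighting the three terms equally are assumptions consistent with, but not confirmed by, the description of the loss as a plain sum; the binary terms use the numerically stable logits form of binary cross-entropy.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def bce_with_logits(logit, label):
    """Binary cross-entropy on a raw logit with label in {0, 1}, in stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def total_loss(token_logits, token_targets, eoc_logits, eoc_labels, eos_logits, eos_labels):
    """L = CE(tokens) + BCE(end-of-card) + BCE(end-of-sequence), each term averaged."""
    ce = mean(cross_entropy(z, t) for z, t in zip(token_logits, token_targets))
    bce_eoc = mean(bce_with_logits(z, y) for z, y in zip(eoc_logits, eoc_labels))
    bce_eos = mean(bce_with_logits(z, y) for z, y in zip(eos_logits, eos_labels))
    return ce + bce_eoc + bce_eos
```

As a sanity check, uniform logits over four classes cost $\log 4$, and a zero logit costs $\log 2$ in each binary term, so the total is $\log 16$.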
At inference time, specialized heuristics improve robustness. Root Position Verification tests the top-k alternative predictions when a newly predicted root lies farther than a threshold distance from the scalp point cloud. Length Normalization caps control points at 80 per card via cubic-spline resampling and forcibly emits an end token at 100. These design choices indicate that AnimeHair is organized not just for storage, but for stable variable-length decoding.
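Both inference heuristics can be sketched in a few lines. In this illustration the distance threshold `max_dist` and the fallback to the closest candidate are hypothetical, and a uniform Catmull-Rom spline (a cubic Hermite spline) stands in for whatever cubic-spline variant CHARM uses. Because it operates on tuples of any length, it would resample all five floats of a control point such as (x, y, z, w, h) at once.

```python
import math


def nearest_distance(point, cloud):
    """Brute-force distance from `point` to the closest point in `cloud`."""
    return min(math.dist(point, q) for q in cloud)


def verify_root(candidates, scalp_cloud, max_dist=0.05):
    """Pick the best-ranked top-k root candidate lying near the scalp.

    `candidates` are ordered best first; if none is within `max_dist`
    (a hypothetical threshold), fall back to the closest one.
    """
    for candidate in candidates:
        if nearest_distance(candidate, scalp_cloud) <= max_dist:
            return candidate
    return min(candidates, key=lambda c: nearest_distance(c, scalp_cloud))


def catmull_rom(p0, p1, p2, p3, u):
    """Uniform Catmull-Rom segment between p1 (u = 0) and p2 (u = 1)."""
    return tuple(
        0.5 * (2 * b + (c - a) * u + (2 * a - 5 * b + 4 * c - d) * u * u
               + (-a + 3 * b - 3 * c + d) * u ** 3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def resample_card(points, max_points=80):
    """Resample a card longer than `max_points` down to exactly `max_points`
    points, keeping both endpoints; shorter cards are returned unchanged."""
    n = len(points)
    if n <= max_points:
        return list(points)

    def p(j):
        return points[min(max(j, 0), n - 1)]  # clamp at the ends

    resampled = []
    for j in range(max_points):
        s = j * (n - 1) / (max_points - 1)  # position along the original index range
        i = min(int(s), n - 2)
        resampled.append(catmull_rom(p(i - 1), p(i), p(i + 1), p(i + 2), s - i))
    return resampled
```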
5. Empirical evaluation and ablation structure
Evaluation is conducted on the 100 held-out hairstyles, with baselines MeshAnything, MeshAnything V2, BPT, and DeepMesh (He et al., 25 Sep 2025). The reported geometry metrics are Chamfer Distance, Earth Mover’s Distance, Hausdorff Distance, and Voxel IoU computed on a fixed-resolution voxel grid. Perceptual quality is assessed by average CLIP cosine similarity over eight rendered views.
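The two simplest geometry metrics can be sketched as below. Conventions for Chamfer Distance vary (squared vs. unsquared distances, sum vs. mean); this version sums the two one-way means of squared nearest-neighbour distances. The voxel IoU here voxelizes surface sample points onto a hypothetical `res`-cubed grid over $[-1, 1]^3$, which is an assumption rather than the paper's exact protocol. Both use brute-force nearest-neighbour search and are only suitable for small point sets.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: sum of the two one-way mean squared NN distances."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def voxelize(points, res=64, lo=-1.0, hi=1.0):
    """Set of occupied voxel indices on a res^3 grid spanning [lo, hi]^3."""
    cells = set()
    for point in points:
        cells.add(tuple(
            min(max(int((c - lo) / (hi - lo) * res), 0), res - 1) for c in point))
    return cells


def voxel_iou(a, b, res=64):
    """Intersection over union of the occupied voxel sets of two point clouds."""
    va, vb = voxelize(a, res), voxelize(b, res)
    union = va | vb
    return len(va & vb) / len(union) if union else 1.0
```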
On geometric comparison, CHARM is evaluated against MeshAnything V2, MeshAnything, BPT, and DeepMesh on all four geometric metrics. On the perceptual metric, the CLIP-similarity ranking is CHARM (rank 1), MeshAnything V2 (rank 2), DeepMesh (rank 3), MeshAnything (rank 4), and BPT (rank 5).
The ablations are directly relevant to AnimeHair as a dataset representation. For sequence ordering, counterclockwise ordering yields the best values on all four geometric metrics, outperforming X-axis, Y-axis, and Z-axis sorting. For parameterization, the five-parameter method yields the same best values, outperforming the extended-vector and explicit-vertex alternatives. Hair-card-level metrics are reported to show similar trends, indicating that individual card shapes are faithfully reconstructed and that the chosen ordering yields the best coherence.
These results support two specific conclusions. First, the dataset’s decomposition into card sequences is empirically consequential rather than merely descriptive. Second, AnimeHair’s control-point format is not only compact, but also associated with the strongest reconstruction and perceptual generation performance among the compared settings.
6. Editing affordances, adjacent tasks, and domain-specific limits
AnimeHair is explicitly framed as artist-friendly and scalable (He et al., 25 Sep 2025). Because each hair card is fully described by a sequence of control points, each storing five floats, artists can directly manipulate curve shape (the control-point positions), cross-section width, and thickness without wrestling with thousands of raw vertices or remeshing. The invertible pipeline lets edits round-trip between control points and full meshes losslessly. On the learning side, the drastic token compression makes the representation tractable for modern transformers, enabling autoregressive synthesis of complex, variable-length hairstyles.
A useful boundary condition emerges from adjacent work on 2D anime hair processing. “ToonOut: Fine-tuned Background-Removal for Anime Characters” reports that while state-of-the-art background removal models excel at realistic imagery, they frequently underperform in specialized domains such as anime-style content, where complex features like hair and transparency present unique challenges (Muratori et al., 8 Sep 2025). ToonOut therefore collected and annotated a custom dataset of 1,228 high-quality anime images of characters and objects, using gray-scale “alpha” annotation in which intermediate gray values represent partial transparency such as soft hair edges and stray strands. On a 126-image test split, vanilla BiRefNet scores 95.3 % Pixel Accuracy, while fine-tuning yields 99.5 %; Boundary IoU improves from 88.5 % to 95.6 %.
That comparison clarifies a common misunderstanding. AnimeHair is a 3D hairstyle dataset with separated hair cards and processed mesh data, whereas ToonOut addresses 2D foreground-background segmentation and alpha-mask reconstruction. The shared difficulty is hair-specific structure: stylized geometry in the 3D case, and thin strands plus semi-transparent regions in the 2D case. This suggests that anime hair remains a specialized subdomain across both geometry processing and image segmentation, and that domain-specific curation is central in both settings.