SARA: Makeup Transfer via Spatial Alignment

Updated 4 July 2026

SARA is a makeup-transfer method that uses spatial alignment and region-adaptive normalization to transfer makeup style while preserving facial identity.
The architecture includes three modules—spatial alignment, region-adaptive normalization, and makeup fusion—to address fine-level control and spatial misalignments.
The paper reports that SARA outperforms existing methods and achieves state-of-the-art performance on two public datasets.

SARA, short for Spatial Alignment and Region-Adaptive normalization, is a makeup-transfer method introduced to transfer the makeup style from a reference image to source images while preserving the identities of the source images. It is positioned as a controllable approach to makeup transfer, with explicit emphasis on detailed results, robustness to large spatial misalignments, and control at the level of facial parts and makeup shade. The method is organized into three modules (a spatial alignment module, a region-adaptive normalization module, and a makeup fusion module) and is reported to outperform existing methods and to achieve state-of-the-art performance on two public datasets (Zhong et al., 2023).

1. Problem setting and scope

In the formulation associated with SARA, makeup transfer denotes the process of transferring the makeup style from a reference image to source images while preserving source identity. The paper characterizes this task as highly desirable and as one that finds many applications (Zhong et al., 2023).

The central difficulty identified for prior work is that existing methods lack fine-level control of makeup style. According to the abstract, this limitation becomes especially problematic when the source and reference exhibit large spatial misalignments, since high-quality transfer is then difficult to achieve (Zhong et al., 2023). In this sense, SARA is framed not merely as a style-transfer system, but as a method for reconciling appearance transfer with structural inconsistency between faces.

A common misconception is to treat makeup transfer as a purely texture-level operation. SARA is presented against that view. Its emphasis on spatial alignment, target semantic maps, per-region encoding, and identity-feature blending indicates that the task is handled as a structured facial image-translation problem rather than as unrestricted stylization. This suggests that the method is intended to preserve facial organization and identity cues while modifying makeup-specific appearance attributes.

2. Architectural organization

SARA is explicitly described as comprising three modules: a spatial alignment module, a region-adaptive normalization module, and a makeup fusion module (Zhong et al., 2023). The architecture is therefore modular, with each component assigned a distinct role in addressing control and misalignment.

The spatial alignment module is described as preserving the spatial context of makeup and providing a target semantic map for guiding the shape-independent style codes. The region-adaptive normalization module decouples shape and makeup style using per-region encoding and normalization, thereby facilitating the elimination of spatial misalignments. The makeup fusion module blends identity features and makeup style by injecting learned scale and bias parameters (Zhong et al., 2023).

Taken together, these statements imply a decomposition of the problem into three subproblems: aligning facial makeup context, separating geometry from cosmetic appearance, and recombining preserved identity content with transferred makeup attributes. A plausible implication is that SARA attempts to move beyond globally applied style codes by structuring transfer around semantically differentiated facial regions.

3. Spatial alignment and semantic guidance

The first module is defined by two functions: it preserves the spatial context of makeup and provides a target semantic map for guiding the shape-independent style codes (Zhong et al., 2023). This is the paper’s most direct answer to the problem of large spatial misalignment.

The phrase spatial context of makeup indicates that makeup is not treated as an undifferentiated global style. Instead, the placement and local arrangement of cosmetic patterns are preserved in some structured form. The introduction of a target semantic map further indicates that the transfer process is conditioned by a representation of facial regions rather than only by appearance embeddings. This suggests that the target organization of lips, eyes, or other makeup-relevant regions is materially important to the transfer mechanism.

The reference to shape-independent style codes is equally significant. It implies that the makeup representation is intended to be abstracted from geometric discrepancies between source and reference. More broadly, this places SARA within a family of methods that seek to disentangle appearance from shape, but here the disentanglement is tied specifically to spatially guided makeup transfer rather than to generic face editing.
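The idea of shape-independent style codes guided by a target semantic map can be illustrated with a small sketch. It is not taken from the paper, whose formulas are not available here. It assumes that a style code is the average of reference features inside a facial region, and that the target semantic map is an integer label map of the source face. The function names `region_style_codes` and `broadcast_codes` are hypothetical.

```python
import numpy as np

def region_style_codes(ref_feat, ref_labels, num_regions):
    """One code per region: the mean reference feature inside that region.

    ref_feat:   (C, H, W) reference feature map (assumed input)
    ref_labels: (H, W) integer region labels in [0, num_regions)
    Averaging discards where pixels sit inside the region, so the code
    carries appearance but not the region's shape.
    """
    channels = ref_feat.shape[0]
    codes = np.zeros((num_regions, channels))
    for r in range(num_regions):
        mask = ref_labels == r
        if mask.any():
            codes[r] = ref_feat[:, mask].mean(axis=1)
    return codes

def broadcast_codes(codes, target_labels):
    """Paint each region's code onto the target layout, giving (C, H, W)."""
    return codes[target_labels].transpose(2, 0, 1)

# Toy example: the reference and target faces have different layouts.
ref_feat = np.array([[[1.0, 3.0], [5.0, 7.0]],
                     [[0.0, 0.0], [2.0, 2.0]]])
ref_labels = np.array([[0, 0], [1, 1]])
target_labels = np.array([[1, 0], [0, 1]])

codes = region_style_codes(ref_feat, ref_labels, num_regions=2)
style_map = broadcast_codes(codes, target_labels)
print(style_map.shape)
```

Because the codes are pooled per region and then laid out according to the target map, the resulting style map follows the source face's geometry even when the reference face is arranged differently.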

4. Region-adaptive normalization and decoupling of factors

The second module, the region-adaptive normalization module, is described as decoupling shape and makeup style through per-region encoding and normalization (Zhong et al., 2023). Within the conceptual design of SARA, this is the principal mechanism for eliminating spatial misalignments.

The emphasis on per-region operations matters because it localizes normalization and encoding to semantically differentiated facial regions rather than applying a single global transformation. This suggests a finer granularity of control than methods that rely on whole-image feature alignment alone. The abstract directly connects this regional treatment to the elimination of spatial misalignments, indicating that the method addresses discrepancies in local makeup placement by tailoring the representation and normalization process to specific parts of the face.

The paper’s title foregrounds Region-Adaptive Normalization, which makes this component central to the identity of the method. A plausible interpretation is that SARA treats normalization not as a generic statistical operation, but as a controllable interface between region-specific semantic structure and style transfer. The available record, however, does not reproduce the module formulas, so the exact normalization rule is not specified in the supplied materials.
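Since the exact rule is not available, the following is only a generic illustration of what per-region normalization can look like: each channel is standardized using statistics computed inside each semantic region rather than over the whole image. The function name `region_normalize` and the label-map input are assumptions made for this sketch, not details from the paper.

```python
import numpy as np

def region_normalize(feat, labels, num_regions, eps=1e-5):
    """Standardize features separately inside each semantic region.

    feat:   (C, H, W) feature map
    labels: (H, W) integer region labels in [0, num_regions)
    Each channel is shifted and scaled using the mean and standard
    deviation of the pixels belonging to one region at a time.
    """
    out = np.zeros_like(feat, dtype=float)
    for r in range(num_regions):
        mask = labels == r
        if not mask.any():
            continue  # region absent from this face
        vals = feat[:, mask]                       # (C, pixels in region)
        mean = vals.mean(axis=1, keepdims=True)
        std = vals.std(axis=1, keepdims=True)
        out[:, mask] = (vals - mean) / (std + eps)
    return out

# Toy example: top half is region 0, bottom half is region 1.
feat = np.arange(16, dtype=float).reshape(1, 4, 4)
labels = np.array([[0] * 4, [0] * 4, [1] * 4, [1] * 4])
normalized = region_normalize(feat, labels, num_regions=2)
print(normalized[0, labels == 0].mean(), normalized[0, labels == 1].mean())
```

Normalizing region by region removes each region's own feature statistics, which is one way a per-region scheme can separate regional appearance from the layout in which it occurs.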

5. Makeup fusion, identity preservation, and controllability

The third module, the makeup fusion module, is described as blending identity features and makeup style by injecting learned scale and bias parameters (Zhong et al., 2023). This formulation ties the preservation of identity directly to a parameterized fusion mechanism.

Identity preservation is not ancillary in SARA; it is part of the task definition. The method is designed to transfer makeup style while preserving the source image’s identity, and the fusion module is the point at which these two objectives are explicitly combined. The mention of learned scale and bias parameters indicates that the blending is modulated rather than simply compositional. This suggests an adaptive mechanism for reconciling transferred style with retained identity features.
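A scale-and-bias fusion of this kind can be sketched as a generic modulation step, of the sort used in conditional normalization layers. The version below is an assumption for illustration only: it standardizes the identity features per channel and then applies `gamma` (scale) and `beta` (bias), which here stand in for the learned parameters. How SARA actually predicts or injects them is not specified in the available record.

```python
import numpy as np

def fuse_identity_and_makeup(identity_feat, gamma, beta, eps=1e-5):
    """Modulate identity features with makeup-derived scale and bias.

    identity_feat: (C, H, W) features carrying the source identity
    gamma, beta:   arrays broadcastable to (C, H, W); placeholders for
                   the learned scale and bias parameters
    """
    mean = identity_feat.mean(axis=(1, 2), keepdims=True)
    std = identity_feat.std(axis=(1, 2), keepdims=True)
    normalized = (identity_feat - mean) / (std + eps)
    # The spatial structure of `normalized` comes from the identity
    # features; gamma and beta only rescale and shift it.
    return gamma * normalized + beta

rng = np.random.default_rng(0)
identity_feat = rng.normal(size=(3, 8, 8))
gamma = np.full((3, 1, 1), 2.0)   # hypothetical per-channel scale
beta = np.full((3, 1, 1), 3.0)    # hypothetical per-channel bias
fused = fuse_identity_and_makeup(identity_feat, gamma, beta)
print(fused.shape)
```

In this sketch the identity content survives as the normalized spatial pattern, while the makeup style enters only through the two modulation parameters, which matches the description of blending as modulated rather than simply compositional.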

SARA is also presented as part-specific and shade-controllable (Zhong et al., 2023). These are two of its principal controllability claims. Part-specific transfer indicates that makeup can be controlled at the level of facial regions, while shade-controllable transfer indicates that the intensity or tonal character of the makeup can be manipulated. Since the supplied record does not include operational details or an interface description, the exact control protocol is not specified, but the claims establish controllability as a defining property of the method rather than as an incidental by-product.
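One hypothetical way to realize both kinds of control, given scale and bias maps like those described for the fusion module, is to interpolate between neutral parameters (scale 1, bias 0) and the makeup parameters, and to restrict that interpolation to chosen facial parts. The sketch below is an assumption, not the paper's protocol. The name `controlled_params`, the `shade` weight in [0, 1], and the integer part labels are all introduced here for illustration.

```python
import numpy as np

def controlled_params(gamma, beta, labels, parts, shade):
    """Restrict and attenuate modulation parameters.

    gamma, beta: (C, H, W) makeup scale and bias maps
    labels:      (H, W) integer part labels
    parts:       list of part labels that should receive makeup
    shade:       float in [0, 1]; 0 leaves the face unchanged,
                 1 applies the full makeup parameters
    """
    part_mask = np.isin(labels, parts).astype(float)   # (H, W)
    weight = shade * part_mask                          # broadcasts over C
    gamma_c = 1.0 + weight * (gamma - 1.0)              # neutral scale is 1
    beta_c = weight * beta                              # neutral bias is 0
    return gamma_c, beta_c

labels = np.array([[0, 1], [1, 2]])        # e.g. 0=skin, 1=lips, 2=eyes
gamma = np.full((2, 2, 2), 3.0)
beta = np.full((2, 2, 2), 0.5)

# Half-strength makeup on part 1 only.
gamma_c, beta_c = controlled_params(gamma, beta, labels, parts=[1], shade=0.5)
print(gamma_c[0])
print(beta_c[0])
```

Outside the selected parts the parameters stay neutral, so a modulation step that uses them would leave those regions' normalized features unchanged.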

6. Reported performance and research significance

The paper reports that SARA outperforms existing methods and achieves state-of-the-art performance on two public datasets (Zhong et al., 2023). These claims place the method competitively within the makeup-transfer literature and identify empirical validation as part of its contribution.

At the same time, the supplied record is limited. The abstract states the headline claims, but the method description, module formulas, and experimental results were not included in the supplied source text. Accordingly, dataset names, evaluation protocols, ablation results, and numerical metrics are not specified in the present record. Any stronger reconstruction of the training objective, implementation, or benchmark setup would therefore exceed the information available here.

Even with that limitation, SARA’s significance is clear at the level of research framing. It is presented as a response to two persistent difficulties in makeup transfer: insufficient fine-level control and degradation under large spatial misalignment. Its design centers on spatial alignment, region-conditioned normalization, and identity-style fusion, and its claimed outcomes are detailed makeup transfer, part-specific and shade-controllable editing, and state-of-the-art empirical performance (Zhong et al., 2023). This suggests a broader methodological direction in which controllability and geometric robustness are treated as first-class objectives in facial style-transfer systems.

References

Zhong et al. (2023). SARA: Controllable Makeup Transfer with Spatial Alignment and Region-Adaptive Normalization.
