Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts (2104.00887v1)

Published 2 Apr 2021 in cs.CV

Abstract: A few-shot font generation (FFG) method has to satisfy two objectives: the generated images should preserve the underlying global structure of the target character and present the diverse local reference style. Existing FFG methods aim to disentangle content and style either by extracting a universal representation style or extracting multiple component-wise style representations. However, previous methods either fail to capture diverse local styles or cannot be generalized to a character with unseen components, e.g., unseen language systems. To mitigate the issues, we propose a novel FFG method, named Multiple Localized Experts Few-shot Font Generation Network (MX-Font). MX-Font extracts multiple style features not explicitly conditioned on component labels, but automatically by multiple experts to represent different local concepts, e.g., left-side sub-glyph. Owing to the multiple experts, MX-Font can capture diverse local concepts and show the generalizability to unseen languages. During training, we utilize component labels as weak supervision to guide each expert to be specialized for different local concepts. We formulate the component assign problem to each expert as the graph matching problem, and solve it by the Hungarian algorithm. We also employ the independence loss and the content-style adversarial loss to impose the content-style disentanglement. In our experiments, MX-Font outperforms previous state-of-the-art FFG methods in the Chinese generation and cross-lingual, e.g., Chinese to Korean, generation. Source code is available at https://github.com/clovaai/mxfont.

An Overview of MX-Font for Few-shot Font Generation

The paper introduces MX-Font, a novel few-shot font generation method designed to address challenges in generating font libraries with minimal reference glyphs. Few-shot font generation (FFG) aims to synthesize new font styles from a limited number of reference glyphs, a task that is especially relevant for complex and glyph-rich languages like Chinese or Korean. MX-Font advances beyond prior methods by leveraging multiple localized expert networks to capture a broad range of local styles without being limited to specific language systems, thereby addressing both intra-language and cross-lingual font generation.

Technical Approach

MX-Font introduces Multiple Localized Experts (MLEs) into the FFG framework, forming a multi-headed encoder architecture. The model's novelty is its ability to produce disentangled style and content representations without explicitly conditioning on component labels: each expert head learns to focus on a different sub-glyph local concept (e.g., a left-side sub-glyph), ensuring comprehensive style capture.
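To make the multi-expert idea concrete, here is a minimal sketch (not the authors' implementation, which uses learned convolutional encoders): several hand-crafted "experts" each summarize a different local region of a toy glyph bitmap, illustrating how multiple heads can yield distinct localized features for the same character.

```python
# Toy illustration of multiple localized experts: each expert summarizes
# one sub-region of a glyph bitmap with a simple "ink density" feature.
# In MX-Font the experts are learned encoder heads; this is only a sketch.

def make_region_expert(row_slice, col_slice):
    """Return an 'expert' that summarizes one sub-region of the glyph."""
    def expert(glyph):  # glyph: 2D list of 0/1 pixels
        region = [row[col_slice] for row in glyph[row_slice]]
        ink = sum(sum(r) for r in region)      # total ink in the region
        area = sum(len(r) for r in region) or 1
        return ink / area                      # local ink density
    return expert

# Three experts attend to the left, right, and bottom sub-glyphs.
experts = [
    make_region_expert(slice(None), slice(0, 2)),  # left side
    make_region_expert(slice(None), slice(2, 4)),  # right side
    make_region_expert(slice(2, 4), slice(None)),  # bottom
]

glyph = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]

# Each expert yields one localized feature for the same glyph.
features = [e(glyph) for e in experts]
print(features)  # → [0.625, 0.625, 0.75]
```

In the actual model, each head produces a style and content embedding rather than a scalar, but the principle is the same: diverse local views of one glyph.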

During training, MX-Font employs weak supervision from component labels to guide each localized expert toward a different local concept. Importantly, MX-Font treats the component assignment problem as a graph matching problem, which is solved with the Hungarian algorithm. This lets the model learn independent local concepts, facilitating robust style-content disentanglement. Independence among experts is further encouraged by an independence loss and a content-style adversarial loss.
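The assignment step can be sketched as a minimum-cost bipartite matching: given a cost matrix between experts and component labels (e.g., derived from how well each expert's feature predicts each component), find the one-to-one assignment with lowest total cost. The paper solves this with the Hungarian algorithm; for a toy matrix, a brute-force permutation search (shown below for clarity, with an invented cost matrix) finds the same optimum.

```python
# Hedged sketch of the expert-to-component assignment: minimum-cost
# one-to-one matching over a small cost matrix. The Hungarian algorithm
# solves this in O(n^3); brute force over permutations is fine for a demo.
from itertools import permutations

def min_cost_assignment(cost):
    """Return (perm, total): perm[i] is the component assigned to expert i."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[expert][perm[expert]] for expert in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Hypothetical cost matrix: cost[i][j] = cost of giving component j to expert i.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = min_cost_assignment(cost)
print(assignment, total)  # → (1, 0, 2) 5
```

In practice, `scipy.optimize.linear_sum_assignment` implements the same matching efficiently and would replace the brute-force loop.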

Empirical Results

MX-Font was evaluated on a Chinese font dataset and tested in a cross-lingual transfer scenario, generating Korean fonts while trained solely on Chinese data. This cross-lingual versatility sets MX-Font apart from competitors, enabling zero-shot generation that faithfully preserves both style and content.

Quantitative metrics showed that MX-Font outperformed prior state-of-the-art models in content and style classification accuracy and in user preference studies, in both Chinese and cross-lingual tasks. LPIPS results supported the claim of superior image quality, although the in-domain FID, while competitive, was less striking than the accuracy gains.

The method's robustness is further emphasized by qualitative analyses, demonstrating its capacity to synthesize coherent styles across complex and varied glyph structures without any direct exposure to the target language during training.

Implications and Future Directions

The introduction of MLEs represents a significant step in localized feature extraction for font synthesis, showcasing improved generalization capabilities in previously unseen linguistic systems. The paper provides a comprehensive exploration of the paradigm that font style is inherently localized and can therefore benefit from diversified feature extraction strategies.

In future directions, expanding the scope of MX-Font could include optimization for low-resource language scripts and exploring potential computational efficiency improvements. Moreover, broadening the component assignment logic beyond the Hungarian algorithm may yield faster or more nuanced disentanglement techniques.

MX-Font exemplifies how intelligent model architecture can leverage weak supervision to transcend traditional domain constraints, enabling high-quality, diverse font generation with minimal reference inputs. The solid performance in cross-domain scenarios underscores its promise in practical applications where resource efficiency is critical.

Authors (5)
  1. Song Park (12 papers)
  2. Sanghyuk Chun (49 papers)
  3. Junbum Cha (10 papers)
  4. Bado Lee (9 papers)
  5. Hyunjung Shim (47 papers)
Citations (52)