PosterDNA: Secure Poster Design Data

Updated 14 January 2026

PosterDNA is a commercial-grade, HTML-based dataset and paradigm that integrates detailed poster design data with steganographic DNA tagging for authenticity verification.
It comprises multiple annotated subsets including blueprint, background, and HTML layout rendering, supporting end-to-end generative model training with high text fidelity.
The platform connects digital poster generation with physical authentication through randomized DNA libraries, enhancing security in industrial poster production.

PosterDNA is a commercial-grade, HTML-based dataset and application paradigm that pairs poster design data for training and evaluating generative models with methods for embedding robust, steganographic molecular tags built from randomized DNA libraries. Developed initially as the data backbone for PosterVerse, a full-workflow poster generation framework, PosterDNA also draws on the POSERS system for molecular anti-counterfeiting, thereby connecting digital poster design with physical-world authentication (Liu et al., 7 Jan 2026; Yazdi et al., 1 Mar 2025).

1. Dataset Structure and Composition

PosterDNA comprises several annotated subsets, each capturing a distinct phase of the commercial poster design workflow.

Blueprint Creation Subset: 57,000 posters characterized by text-dense, professionally composed layouts. Annotated with key information including themes, color hex codes, purposes, and visual elements, as well as layered user requirements across “basic,” “medium,” and “detailed” levels.
Graphic Generation Subset: 100,000 high-resolution, text-free backgrounds, evenly partitioned into four style clusters (“Illustrative,” “Design-Oriented,” “Minimalistic,” and “Photorealistic,” each ~25,000). Curated from source material using a Chinese-CLIP model for filtering and further refined via Aesthetic-Predictor V2.5.
Unified Text-Layout Rendering Subset: 9,000 posters, each paired with manually verified HTML files delivering precise, pixel-level typographic and layout metadata.
Test Set: 1,000 external posters annotated with blueprints, HTML typesetting, and ground-truth rendered images, stratified by requirements detail.

The dataset targets commercial poster scenarios, including product marketing, event advertising, and informational displays. Annotation relies on high-throughput LLM prompting and intensive manual design review, resulting in a dataset with high text density and broad stylistic and structural diversity (Liu et al., 7 Jan 2026).

Subset	Samples	Key Characteristics
Blueprint Creation	57,000	Text-dense layouts, JSON-annotated blueprints
Graphic Generation	100,000	Text-free backgrounds in four visual styles
Unified Text-Layout Rendering	9,000	Manually verified HTML/CSS typographic metadata
Test Set	1,000	External posters with blueprints, HTML, and rendered images

2. HTML-Based Scalable Typography and Annotation

PosterDNA adopts HTML typography files as its standard for scalable, resolution-independent text rendering. Each layout-annotated file encodes spatial and style metadata as HTML <div> containers with absolute positioning and inline CSS. Because text is drawn by native web font rendering from vector glyphs, dense or small text stays sharp at any resolution, a case that diffusion-based generative models have historically struggled to render accurately.

HTML annotation includes:

Bounding boxes: CSS left, top, width, height for each text block.
Typography metadata: Inline CSS on <span> elements for font family, weight, size, color (hex), line height, text alignment, and optional effects like letter spacing or shadow.
Semantic labels: Via class or data-type attributes distinguishing roles (title, subtitle, body, contact info).
Meta-schemas: JSON blueprints encode structured information at the content planning stage; HTML files store typographic and layout schema without external XML.

By design, this annotation model bypasses the limitations of raster image synthesis for small, dense text, supporting precise post-hoc editing and robust downstream learning (Liu et al., 7 Jan 2026).
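
The annotation layout described above can be read back with a standard HTML parser. The following is a minimal sketch, assuming one absolutely positioned <div> per text block with pixel units and a role stored in the class attribute; the sample markup, the `px` helper, and the `TextBlockParser` class are hypothetical illustrations, not files or tools shipped with the dataset.

```python
from html.parser import HTMLParser

# Hypothetical annotation snippet in the style described above.
SAMPLE_HTML = """
<div class="title" style="position:absolute; left:40px; top:32px; width:640px; height:96px;">
  <span style="font-family:'Noto Sans SC'; font-weight:700; font-size:64px; color:#1A1A1A;">Spring Sale</span>
</div>
<div class="body" style="position:absolute; left:40px; top:160px; width:480px; height:200px;">
  <span style="font-size:24px; color:#333333; line-height:1.5;">Up to 50% off selected items</span>
</div>
"""


def parse_style(style):
    """Turn an inline CSS string into a {property: value} dict."""
    pairs = (item.split(":", 1) for item in style.split(";") if ":" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def px(value):
    """Convert a CSS length such as '40px' to a float (pixels assumed)."""
    return float(value[:-2]) if value.endswith("px") else float(value)


class TextBlockParser(HTMLParser):
    """Collect bounding box, role, font metadata, and text for each <div>."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = parse_style(attrs.get("style") or "")
        if tag == "div":
            # Assumes every block <div> carries left/top/width/height.
            self._current = {
                "role": attrs.get("class"),
                "box": {k: px(style[k]) for k in ("left", "top", "width", "height")},
                "font": {},
                "text": "",
            }
            self.blocks.append(self._current)
        elif tag == "span" and self._current is not None:
            self._current["font"].update(style)

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "div":
            self._current = None


def parse_blocks(html):
    parser = TextBlockParser()
    parser.feed(html)
    return parser.blocks


if __name__ == "__main__":
    for block in parse_blocks(SAMPLE_HTML):
        print(block["role"], block["box"], block["text"])
```

Because geometry and styling live in plain attributes, a text block can be moved or restyled by editing a few CSS values rather than regenerating pixels.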

3. Usage in Generative Model Training and Evaluation

PosterDNA underpins supervised and multimodal model training for end-to-end commercial poster synthesis:

Blueprint LLM Training: Qwen2.5-14B is fine-tuned on JSON triples mapping requirements of varying detail (basic/medium/detailed) to ground-truth blueprints. Supervised fine-tuning (SFT) is performed with cross-entropy loss over token streams.
Diffusion Background Models: Four LoRA-finetuned Flux.1-dev checkpoints are each specialized per visual style, trained with 100,000 prompt-image pairs and dynamic prompt sampling based on Claude-generated hierarchical prompts.
HTML Multimodal Engine: Qwen2.5-VL-7B learns to map (background, blueprint JSON) pairs to HTML layout/typography outputs using 9,000 canonical pairs.
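
For reference, the token-level cross-entropy objective mentioned for blueprint fine-tuning is usually written as below. Here $x$ denotes the tokenized user requirement, $y = (y_1, \dots, y_T)$ the ground-truth blueprint tokens, and $\theta$ the model parameters. These symbols are introduced for illustration only, and the source does not specify loss masking or length normalization.

$$\mathcal{L}_{\mathrm{SFT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid x, y_{<t}\right)$$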

Evaluation uses specialized metrics:

Text accuracy: Correct Rate (CR) and F1-score over text box outputs; PosterVerse achieves CR=92.33%, F1=78.58%.
Layout overlap: PosterVerse posts the lowest measured overlap (0.0027), indicating minimal text-box collision.
Perceptual quality: FID=62.54 (lower is better), compared with roughly 70–120 for alternatives.
IoU: Intersection over union, used for bounding-box similarity in optional evaluations.
Human and AI ratings: GPT-4o scoring and user studies both favor PosterVerse on prompt adherence, layout fidelity, and text rendering (Liu et al., 7 Jan 2026).
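
The box-level metrics in this list can be sketched as follows. IoU is the standard intersection-over-union. The source does not give the exact formula behind its overlap score, so `mean_pairwise_overlap` below is an assumed stand-in (intersection area divided by the smaller box's area, averaged over all box pairs) and should not be read as the metric used in the reported numbers. Boxes are `(left, top, width, height)` tuples.

```python
from itertools import combinations


def area(box):
    """Area of a (left, top, width, height) box."""
    _, _, width, height = box
    return max(width, 0) * max(height, 0)


def intersection_area(a, b):
    """Area of the rectangle shared by boxes a and b (0 if disjoint)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    width = min(ax + aw, bx + bw) - max(ax, bx)
    height = min(ay + ah, by + bh) - max(ay, by)
    return max(width, 0) * max(height, 0)


def iou(a, b):
    """Intersection over union of two boxes."""
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def mean_pairwise_overlap(boxes):
    """Assumed overlap score: mean of intersection / smaller-box area over pairs."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        smaller = min(area(a), area(b))
        if smaller > 0:
            total += intersection_area(a, b) / smaller
    return total / len(pairs)


if __name__ == "__main__":
    title = (0, 0, 10, 10)
    subtitle = (5, 0, 10, 10)   # overlaps half of the title box
    footer = (100, 100, 5, 5)   # disjoint from both
    print(iou(title, subtitle))
    print(mean_pairwise_overlap([title, subtitle, footer]))
```

A layout with no colliding text boxes scores 0 under this stand-in, which matches the direction of the reported metric (lower means less collision).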

4. Statistical Properties and Schema

PosterDNA is engineered with a focus on measurable text density and information-rich layouts. Key computed statistics include:

Average Text Density: $D = (1/N) \sum_{i=1}^N (T_i/A_i)$, where $T_i$ is the cumulative text bounding-box area and $A_i$ is the total area of poster $i$.
Character count per poster: $\overline{C} = (1/N) \sum_{i=1}^{N} |\text{text}_i|$.
Text-area ratio: $R_i = T_i/A_i$ per poster, so that $D$ is the mean of $R_i$ over the dataset.
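
These statistics are simple to compute once each poster's text boxes are available. The sketch below assumes a hypothetical in-memory record per poster with a `resolution` pair, a list of `(left, top, width, height)` text boxes, and the concatenated `text`; none of these field names come from the dataset itself. Since $T_i$ is defined as a cumulative area, overlapping boxes are counted twice here.

```python
def text_area_ratio(poster):
    """R_i = T_i / A_i for one poster."""
    width, height = poster["resolution"]
    text_area = sum(w * h for (_, _, w, h) in poster["boxes"])  # T_i
    return text_area / (width * height)                         # divided by A_i


def dataset_stats(posters):
    """Return (D, C_bar, ratios) as defined in the formulas above."""
    n = len(posters)
    ratios = [text_area_ratio(p) for p in posters]
    density = sum(ratios) / n                            # D
    mean_chars = sum(len(p["text"]) for p in posters) / n  # C-bar
    return density, mean_chars, ratios


if __name__ == "__main__":
    posters = [
        {"resolution": (100, 200),
         "boxes": [(0, 0, 50, 40), (0, 50, 100, 20)],
         "text": "Hello"},
        {"resolution": (100, 100),
         "boxes": [(0, 0, 100, 40)],
         "text": "Spring Sale"},
    ]
    print(dataset_stats(posters))
```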

The schema for blueprint annotation is formalized in JSON format:

{
  "textualContent": { "title": ..., "subtitle": ..., "body": ..., "contactInfo": ... },
  "backgroundAttributes": { "style": ..., "caption": ... },
  "keyParameters": {
    "resolution": [W, H],
    "theme": ...,
    "colorPalette": [...],
    "purpose": ...,
    "visualElements": [...]
  }
}

HTML annotation is fully self-contained, supporting canvas-absolute positioning and role-specific styling (Liu et al., 7 Jan 2026).
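
A quick structural check against the blueprint schema above might look like the following sketch. The field names are taken from the schema as shown; the `missing_fields` helper and the example values are hypothetical and are not part of any released tooling.

```python
# Required fields per section, copied from the blueprint schema.
REQUIRED = {
    "textualContent": {"title", "subtitle", "body", "contactInfo"},
    "backgroundAttributes": {"style", "caption"},
    "keyParameters": {"resolution", "theme", "colorPalette", "purpose", "visualElements"},
}


def missing_fields(blueprint):
    """List missing sections and 'section.field' paths, in schema order."""
    missing = []
    for section, fields in REQUIRED.items():
        if section not in blueprint:
            missing.append(section)
            continue
        for field in sorted(fields - set(blueprint[section])):
            missing.append(f"{section}.{field}")
    return missing


# Hypothetical blueprint instance; all values are made up for illustration.
EXAMPLE = {
    "textualContent": {
        "title": "Spring Sale",
        "subtitle": "This weekend only",
        "body": "Up to 50% off selected items",
        "contactInfo": "example.com",
    },
    "backgroundAttributes": {"style": "Minimalistic", "caption": "soft pastel gradient"},
    "keyParameters": {
        "resolution": [1024, 1536],
        "theme": "retail promotion",
        "colorPalette": ["#F4E1D2", "#1A1A1A"],
        "purpose": "product marketing",
        "visualElements": ["shopping bag"],
    },
}

if __name__ == "__main__":
    print(missing_fields(EXAMPLE))
```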

5. Molecular Tagging and Secure Authentication: The Role of POSERS

PosterDNA also denotes a paradigm for embedding steganographic, copy- and forgery-proof DNA tags in posters, packaging, and labels, inspired by the POSERS system (Yazdi et al., 1 Mar 2025). This approach abandons the use of predefined DNA barcode sequences, instead constructing large randomized oligo pools (libraries) with steganographically embedded “signatures.”

Key properties and mechanisms:

Randomized Library Construction:
- Given strand length $L$, select $K_1$ positions restricted to a single permitted nucleotide, $K_2$ positions restricted to two, and $K_3$ positions restricted to three. All remaining loci are uniformly random over $\{A,C,G,T\}$.
- For a fixed choice of restricted positions and permitted alphabets, the number of valid sequences is $N_{\rm POSERS} = 4^{L-(K_1+K_2+K_3)}\,2^{K_2}\,3^{K_3}$.
Security Model:
- Authentication relies on secret knowledge of which positions are restricted and permissible alphabets at those loci. Copy/forgery attempts by capturing or synthesizing DNA from an authentic item are provably detectable by Sample Combination (SC) and Sample Variety (SV) tests.
- Unforgeability and indistinguishability are proven under the assumption that the adversary only has access to sequencing and synthesis, not cryptographic primitives.
Authentication Workflow: Minute DNA samples (∼5 ng) are processed via next-generation sequencing (NGS). SC fails if any forbidden nucleotide is detected at a restricted locus; SV requires that every permitted nucleotide at each restricted locus be present.
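
The library construction and the two tests can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions: reads are error-free, aligned, and all of length $L$, and the secret is a plain mapping from restricted position to permitted nucleotides. The function names and data layout are hypothetical and are not taken from the POSERS implementation.

```python
import random

ALPHABET = "ACGT"


def library_size(L, k1, k2, k3):
    """N_POSERS = 4^(L-(K1+K2+K3)) * 2^K2 * 3^K3 for a fixed secret."""
    return 4 ** (L - (k1 + k2 + k3)) * 2 ** k2 * 3 ** k3


def make_secret(L, k1, k2, k3, rng):
    """Pick restricted positions and their permitted nucleotide sets."""
    positions = rng.sample(range(L), k1 + k2 + k3)
    sizes = [1] * k1 + [2] * k2 + [3] * k3
    return {pos: frozenset(rng.sample(list(ALPHABET), size))
            for pos, size in zip(positions, sizes)}


def sample_strand(L, secret, rng):
    """Draw one strand: restricted loci use their permitted set, others ACGT."""
    return "".join(rng.choice(sorted(secret.get(i, ALPHABET))) for i in range(L))


def sc_test(reads, secret):
    """Sample Combination: pass only if no forbidden nucleotide is observed."""
    return all(read[pos] in allowed
               for read in reads for pos, allowed in secret.items())


def sv_test(reads, secret):
    """Sample Variety: pass only if every permitted nucleotide is observed."""
    return all({read[pos] for read in reads} >= allowed
               for pos, allowed in secret.items())


if __name__ == "__main__":
    rng = random.Random(0)
    secret = make_secret(20, 2, 2, 2, rng)
    reads = [sample_strand(20, secret, rng) for _ in range(500)]
    print(library_size(10, 1, 2, 3))                   # 27648
    print(sc_test(reads, secret), sv_test(reads, secret))
    # A sample made of copies of one strand lacks variety at 2- and 3-letter loci.
    print(sv_test([reads[0]] * 100, secret))
```

In this idealized setting, a sample consisting of many copies of a single strand passes SC but fails SV, while a strand carrying a forbidden nucleotide at a restricted locus fails SC. Real workflows would also need tolerance for sequencing errors, which this sketch omits.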

A plausible implication is that the combination of digital poster artifacts and secure molecular tagging enables scalable anti-counterfeiting and digital-physical linkage at commercial scale (Yazdi et al., 1 Mar 2025).

6. Limitations, Open Challenges, and Future Directions

PosterDNA’s strengths include its status as the first large-scale Chinese poster generation resource with HTML-based scalable typography, dense annotation, and workflow-level partitioning. It supports robust small-text rendering, broad stylistic coverage, and modular model development for blueprint, visual, and layout stages (Liu et al., 7 Jan 2026).

Limitations include:

Manual annotation is labor-intensive, accounting for approximately 80% of total dataset creation effort.
Presently, the dataset is tailored to Chinese content; multilingual or cross-lingual expansion is nontrivial and demands translation strategies.
No separate dedicated validation set is available; users must subsample from training data for model validation.
DNA-based labeling, despite robust experimental authentication, still requires well-calibrated sequencing workflows and cost management as synthesis/scanning scales.

Future directions include extending PosterDNA to additional poster genres (educational, cultural), enriching semantic label sets (logos, QR codes), and incorporating semi-automated annotation pipelines. For molecular tagging, increasing the number of restricted loci $K$ and continued advances in sequencing depth are anticipated to increase the cryptographic hardness and long-term robustness of DNA-based anti-counterfeiting schemes (Yazdi et al., 1 Mar 2025).

References:

PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography (2026)

POSERS: Steganography-Driven Molecular Tagging Using Randomized DNA Sequences (2025)
