Qwen-Image-2.0-RL Technical Report
Abstract: We present Qwen-Image-2.0-RL, a post-training pipeline that applies reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to improve both the visual quality and instruction-following capability of the Qwen-Image-2.0 diffusion model. To provide reliable reward signals, we construct task-specific composite reward models by fine-tuning vision-LLMs with a pointwise scoring paradigm and chain-of-thought reasoning. For text-to-image generation, the reward models cover alignment, aesthetics, and portrait fidelity dimensions. For image editing tasks, the reward system addresses instruction-following accuracy and face identity preservation. Building on this reward system, we develop a scalable GRPO-based RL training framework, incorporating a hybrid classifier-free guidance (CFG) strategy to preserve pre-trained knowledge, prompt curation via intra-group reward range filtering, and per-category reward weight calibration. To merge the task-specialized RL policies for T2I and editing, we propose on-policy distillation as the final training stage, which consolidates multiple teachers into a single student model through trajectory-level velocity matching. Extensive evaluation shows that Qwen-Image-2.0-RL achieves 57.84 overall score on Qwen-Image-Bench (+2.61 over the base model), Elo ratings of 1193 in text-to-image arena (+78) and 1349 in image edit arena (+93), demonstrating consistent gains in aesthetic quality, prompt adherence, and editing accuracy.
First 10 authors:
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Explain it Like I'm 14
A simple explanation of “Qwen-Image-2.0-RL”
1) What is this paper about?
This paper is about teaching an image-making AI to do two things better at the same time:
- Make nicer-looking pictures from text prompts (text-to-image).
- Edit pictures as requested without messing up important details (like someone’s face).
The team does this by letting the AI practice, get scored by “judges” that act like people, and then learn from those scores. After training two expert versions (one great at making images and one great at editing), they combine both skills into a single all-in-one model.
2) What questions are they trying to answer?
- How can we give an image AI feedback that matches what people actually like (pretty, accurate, and faithful to the instructions)?
- Can we train the AI to improve both text-to-image and image editing, not just one?
- How do we merge separate expert AIs into one model without losing quality?
- How do we keep this training stable, efficient, and scalable?
3) How did they do it? (Methods in everyday terms)
Think of the AI as an art student. The researchers build a training program with coaches, rubrics, and a final exam:
- Judges that act like people: They train “judge models” (based on vision-LLMs) to score images. These judges don’t just say “A is better than B.” Instead, they give absolute scores (like 1–5 stars) for what matters. This “pointwise” scoring is like grading an essay on a rubric, not just picking a winner, and it gave better results than pairwise preferences.
- For text-to-image, judges score:
- Alignment: Did the image match the prompt (objects, colors, positions, actions)?
- Aesthetics: Is it pleasing—good lighting, composition, and texture?
- Portrait quality: Are faces attractive, proportional, and realistic (skin/hair detail, fingers, no weird distortions)?
- For image editing, judges score:
- Instruction following: Did the edit do what was asked (replace, restyle, recolor, etc.)?
- Face identity: Does the person still look like the same person? They use a special identity checker for this.
- Practice with feedback (reinforcement learning, RL): The AI generates several images for the same prompt, the judges score them, and the AI learns to make higher-scoring images next time. To keep learning stable and fair:
- They compare images within the same prompt group (so scores are comparable).
- They use a “hybrid guidance” trick: during image sampling they nudge the model to listen closely to the prompt (this is called classifier-free guidance), but during learning they remove that nudge. This keeps training stable while preserving the model’s knowledge.
- They send images to the judges asynchronously (in the background) so training doesn’t slow down.
- They pick prompts that actually help learning (prompts where images vary a lot in score), and they adjust score weights by category (e.g., portraits get more face-related weight).
- Focus on the right steps: Diffusion models create pictures by starting from noise and repeatedly refining. They emphasize learning more from the earlier, noisier steps (which decide the big layout), so the AI improves structure and avoids “gaming” the reward.
- Two expert teachers → one student (on-policy distillation): After training a top text-to-image model and a top editing model, they teach a single student model to follow each teacher’s step-by-step drawing moves on the student’s own path. Think of it like shadowing the teacher’s strokes while drawing. This merges both skills into one model without making the student chase conflicting goals at once.
4) What did they find and why does it matter?
They tested their final model, Qwen-Image-2.0-RL, against the original base model:
- On a broad benchmark (Qwen-Image-Bench), the overall score went from 55.23 to 57.84. The biggest improvements were in creative generation and real-world realism.
- In head-to-head human-style preference arenas:
- Text-to-image Elo rating: 1115 → 1193 (+78)
- Image editing Elo rating: 1256 → 1349 (+93)
Why it matters:
- Images look better (sharper textures, nicer composition).
- The AI follows prompts more accurately.
- Edits keep faces and identities consistent while doing the requested changes.
- One model can do both generation and editing well.
5) So what’s the bigger impact?
- Better tools for creators: You can describe what you want, and the AI will produce or edit images that look good and match your instructions.
- Unified workflows: Instead of juggling different models for generation and editing, you get one reliable model for both.
- A training recipe that scales: Combining human-like judges, stable RL tricks, and the “teacher-student” distillation shows a practical way to align image AIs with human preferences.
- Future directions: Stronger and fairer judges, safer and more controllable edits, applying the same ideas to video or 3D, and handling even more complex instructions with fewer mistakes.
In short, the paper shows how to teach an image AI to both follow instructions and make beautiful, accurate pictures—and then combine those skills into a single, dependable model.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
The paper introduces a promising RLHF and on-policy distillation pipeline for image generation and editing, but it leaves several aspects unresolved that future work could address:
- Reproducibility details are missing (e.g., learning rates, optimizer choices, gradient clipping, training steps/epochs, batch size, guidance scales, importance sampling clip ε, σ_t noise schedule, seed control, LoRA vs full-parameter finetuning); without these, the training procedure cannot be faithfully replicated.
- Reward model datasets are under-specified: sizes, sources, prompt distributions, demographics, and curation criteria (especially for the portrait and editing datasets) are not reported; no inter-annotator agreement or label quality analysis is provided.
- The claimed advantage of pointwise over pairwise reward training is only demonstrated qualitatively; no quantitative ablations (e.g., on held-out human judgments, correlation with human preference, calibration error, or task-specific metrics) are reported, nor is the impact of chain-of-thought prompting isolated.
- The composite reward weights w_k and the per-category calibration strategy are ad hoc and not learned; how weights are chosen/tuned, how sensitive performance is to these choices, and whether adaptive or learned weighting (e.g., via bandits or gradient-based meta-learning) would perform better remains unexplored.
- Prompt curation via intra-group reward range filtering may bias training toward prompts with high variance while ignoring important but low-variance or difficult prompts; the impact of this selection bias on generalization to real-world prompt distributions is unmeasured.
- The hybrid CFG strategy (CFG in rollout only) lacks a rigorous analysis: optimal guidance scales, stability regions, and trade-offs with knowledge retention are not quantified; how this strategy interacts with negative prompts, style prompts, or multilingual prompts is not evaluated.
- Timestep subsampling is proposed (emphasizing high-noise steps), but the exact schedule, selection criteria, and sensitivity are not described; comparisons against SNR-based weighting, full-timestep training, or curriculum schedules are absent.
- The RL framework parameters (e.g., stochastic sampler σ_t schedule, discretization steps, and the exact computation of importance ratios) and any KL/reference regularization are not reported; safeguards against over-optimization (beyond qualitative statements) are not quantified.
- Using remote VLM reward APIs introduces potential non-determinism and version drift; procedures for ensuring reward reproducibility (e.g., model version pinning, response caching, randomized seeds) and measurements of latency/throughput scaling are not detailed.
- Reliance on VLM-based rewards raises overfitting concerns to idiosyncrasies of the judge models; evidence that improvements transfer to truly independent human evaluations beyond the reported arenas (which may reflect overlapping user bases) is limited.
- The portrait “attractiveness” reward invites demographic and cultural bias; there is no fairness analysis across age, gender, skin tone, or ethnicity, nor safeguards against reinforcing biased beauty standards or identity stereotypes.
- The face identity consistency scorer is not described (architecture, training data, thresholds) and may be demographically biased; its accuracy across demographics and under challenging conditions (pose, lighting, occlusion) is not evaluated.
- Safety and policy alignment are not part of the composite reward: handling of unsafe/NSFW content, trademarked logos, deepfake risks, or image watermark/IP compliance is not discussed; integrating safety-aware rewards or constraints is an open need.
- Diversity and coverage effects of RL/OPD are unreported; there are no metrics for intra-prompt diversity, style coverage, mode collapse, or prompt coverage (e.g., LPIPS diversity, multi-seed variance, or compositional generalization tests).
- Editing evaluation focuses on portraits; systematic benchmarks across diverse edit types (non-human objects, layout changes, lighting/style, localized edits, multi-step edits) and quantitative metrics (edit success, content preservation) are missing.
- The OPD procedure raises unanswered questions: how to route ambiguous or mixed tasks (e.g., prompts that imply both generation and editing), how to handle conflicting signals when adding more than two teachers, and how much teacher capability is lost in distillation (teacher–student gap) are not quantified.
- Distilling CFG-enabled teachers into a student trained without CFG, then “integrating CFG” after OPD, is not fully specified; the mechanism for restoring/aligning the unconditional branch and the effect on inference-time behavior need clarification and ablation.
- Robustness to out-of-distribution prompts (long, compositional, multilingual, rare entities), adversarial prompts, and long-tailed categories is not assessed; the impact of the prompt curation process on OOD robustness is unknown.
- The evaluation heavily uses Qwen-Image-Bench (judged by a Qwen-derived judge) and internal arenas; independence from training/reward sources and potential leakage or shared biases between reward models and evaluators are not established.
- Theoretical grounding of OPD is incomplete in the main text (derivation truncated) and specific assumptions (e.g., discretization error, W2 upper bound conditions, teacher/student sampler mismatches) are not validated empirically.
- The effect of training with an SDE for RL while sampling with an ODE at inference is not analyzed; potential mismatches and their impact on final quality and stability remain unclear.
- Compute costs, energy footprint, hardware configuration, and wall-clock time for RL and OPD are not provided; practical scalability limits and cost–quality trade-offs are therefore unknown.
- Data governance and ethics are not discussed for the portrait/reference datasets (copyright, consent, privacy, provenance, demographic balance); policies for handling sensitive identities and preventing misuse (e.g., deepfakes) are unspecified.
- Generalization to additional modalities and tasks (inpainting, outpainting, typography-specific rewards, layout-constrained generation, video) and the extensibility of the OPD framework to more than two teachers remain open design and engineering questions.
Practical Applications
Immediate Applications
Below are concrete, immediately deployable uses that build on the paper’s methods and demonstrated performance gains.
- Bold: Creative suite upgrade for T2I and editing
- Sector: Software, Media & Entertainment, Design
- What: Integrate Qwen-Image-2.0-RL as the generative/editing backend in design tools (e.g., plugins for Adobe/Figma, web UIs) to improve prompt adherence, aesthetics, and portrait fidelity; unify T2I and editing in one model via OPD to cut serving complexity and latency.
- Tools/workflows: Unified inference service; prompt templates with per-category reward weights; CFG-enabled inference; A/B evaluation using composite rewards.
- Assumptions/dependencies: Access to the student model and CFG-compatible sampling; sufficient GPU capacity; compliance with model license.
- Bold: Marketing and ad creative generation at scale
- Sector: Advertising, E-commerce, Finance (marketing ops)
- What: Generate on-brand visuals (product hero shots, banners) with strong alignment and high aesthetic scores; rapidly create hundreds of ad variants for A/B tests.
- Tools/workflows: Prompt curation via intra-group reward range to focus prompts with high improvement headroom; automated reward dashboards (alignment/aesthetic) for QA sign-off.
- Assumptions/dependencies: Brand-safety guardrails; human review in regulated industries; dataset- and reward-model bias monitoring.
- Bold: Product imagery and catalog pipelines
- Sector: Retail/E-commerce
- What: Consistent product renders (background swaps, colorways, materials) and text-aware edits (e.g., label updates) with instruction-following rewards.
- Tools/workflows: Batch editing APIs; per-category reward calibration (e.g., typography emphasis); integration into PIM (Product Information Management) systems.
- Assumptions/dependencies: Validation set reflecting SKU constraints; robust instruction parsing for edge cases (fine text, logos).
- Bold: Portrait retouching and identity-preserving edits
- Sector: Photography, Social Media, Consumer Apps
- What: Face-preserving enhancements (skin, hair, lighting) and stylizations that retain identity (e.g., profile photo upgrades).
- Tools/workflows: Mobile/desktop photo apps with “identity lock” mode; face-ID scorer gating; safe default prompts.
- Assumptions/dependencies: Explicit user consent; privacy and local laws for face processing; accuracy of the identity reward model across demographics.
- Bold: Concept art and game asset ideation
- Sector: Gaming, Animation, Media & Entertainment
- What: Faster, more faithful concept iterations (scenes, characters, props) with improved compositional quality and alignment.
- Tools/workflows: Co-pilot UI that surfaces high-reward candidates first; hybrid CFG rollout to preserve style/world knowledge; OPD student for unified editing and generation.
- Assumptions/dependencies: Style licensing; human creative direction for narrative/worldbuilding.
- Bold: Automated post-processing for renders and scans
- Sector: Architecture/3D, Industrial Design
- What: Aesthetic refinement and artifact cleanup on 3D renders, scans, or CAD-to-image outputs guided by learned aesthetic reward.
- Tools/workflows: Command-line batch enhancer; reward-based pass/fail gate before delivery.
- Assumptions/dependencies: Reward models calibrated to domain (photoreal vs stylized); adherence to client render specs.
- Bold: Synthetic data generation for vision model training
- Sector: Autonomous Systems, Robotics, Retail, Manufacturing
- What: Generate attribute-accurate, diverse images to augment perception datasets (object count/color/pose enforced by alignment reward).
- Tools/workflows: Prompt banks with constraints; composite reward thresholds to accept/reject samples; identity-controlled human datasets for re-identification tasks.
- Assumptions/dependencies: Clear label schema; bias audits; domain gap analysis vs real data.
- Bold: MLOps patterns for diffusion-RL training
- Sector: Software/ML Infrastructure, Academia
- What: Adopt hybrid CFG training, asynchronous reward services, and multi-reward advantage computation for internal diffusion model post-training.
- Tools/workflows: Reward-as-a-service endpoints; per-prompt group normalization; high-noise timestep sampling; logging for reward hacking detection.
- Assumptions/dependencies: Stable VLM-based reward endpoints; GPU budget for RL loops; curated pointwise-labeled datasets.
- Bold: Internal QA/judging for GenAI content governance
- Sector: Policy/Compliance, Platforms
- What: Use composite reward models as automated judges to gate user-generated images pre-publication (alignment, aesthetic, identity consistency when editing faces).
- Tools/workflows: Reward thresholds + manual escalation; category-specific rules; audit logs for decisions.
- Assumptions/dependencies: Policy mapping from reward scores to enforcement; periodic re-calibration to reduce false positives/negatives.
Long-Term Applications
These opportunities require additional research, scaling, or productization beyond the reported system.
- Bold: RLHF- and OPD-aligned video generation and editing
- Sector: Media & Entertainment, Advertising, AR/VR
- What: Extend layered rewards (alignment, aesthetics, identity) to temporal consistency for video; unify T2V and video editing policies via multi-teacher OPD.
- Tools/workflows: Temporal VLM reward models; trajectory-level velocity matching across frames; hybrid CFG schedule for videos.
- Assumptions/dependencies: Reliable temporal reward models; much higher compute; robust evaluation of motion and continuity.
- Bold: Personalized aesthetic reward models
- Sector: Consumer Apps, Creative Platforms
- What: User-specific or brand-specific pointwise reward finetuning to align outputs with personal/brand style preferences.
- Tools/workflows: On-device or federated preference learning; “style profiles” driving per-category reward weights.
- Assumptions/dependencies: Consentful preference data collection; privacy-preserving training; cold start for new users/brands.
- Bold: Regulated domain adoption (e.g., medical, legal)
- Sector: Healthcare, Public Sector
- What: Instruction-faithful edits and controlled image generation for documentation, education, or synthetic data—under strict constraints.
- Tools/workflows: Domain-specific reward rubrics (e.g., anatomy correctness); hard rule-based constraints + reward gates.
- Assumptions/dependencies: Expert-validated datasets; regulatory approval; risk management for misinterpretations.
- Bold: Bias/fairness-aware reward modeling and auditing
- Sector: Policy/Compliance, Platforms, Academia
- What: Embed fairness constraints into composite rewards and report bias metrics alongside aesthetic/alignment scores.
- Tools/workflows: Diverse pointwise annotations; bias-aware per-category weights; fairness dashboards in QA.
- Assumptions/dependencies: Representative annotator pools; clear definitions of fairness for target markets; ongoing monitoring.
- Bold: Provenance and watermark-aware quality control
- Sector: Platforms, Media, Policy
- What: Couple reward-based QA with content provenance/watermarks to meet authenticity standards and labeling policies.
- Tools/workflows: Pipeline step that signs accepted images; policy-driven thresholds for “AI-labeled” vs “human-edited.”
- Assumptions/dependencies: Adoption of provenance standards (e.g., C2PA); interoperability with platform policies.
- Bold: Edge and mobile deployment via distillation
- Sector: Consumer Apps, IoT/Edge
- What: Use OPD to compress multi-teacher capabilities into smaller students for on-device T2I/editing with acceptable latency.
- Tools/workflows: Quantization-aware OPD; scheduler optimization; partial CFG or alternative guidance schemes for edge.
- Assumptions/dependencies: Model size/latency trade-offs; memory constraints; energy usage.
- Bold: Multimodal co-creative agents
- Sector: Software, Education, Enterprise Productivity
- What: Agents that reason about user intent (CoT in VLM rewards), generate and iteratively edit images to meet structured rubrics.
- Tools/workflows: Plan–generate–judge–revise loops; reward-driven stopping criteria; conversation memory with per-step reward feedback.
- Assumptions/dependencies: Robust reasoning models; guardrails against reward hacking; UX design for iterative dialogue.
- Bold: Cross-lingual and culture-aware prompt adherence
- Sector: Global Platforms, Education
- What: Train multilingual reward models to improve alignment for non-English prompts and culturally specific content.
- Tools/workflows: Pointwise annotations across languages; per-locale reward calibration; locale-sensitive QA.
- Assumptions/dependencies: High-quality multilingual datasets; cultural context expertise; continuous evaluation.
- Bold: Simulation and robotics synthetic scene generation
- Sector: Robotics, Autonomous Systems, Energy/Utilities (inspection)
- What: Generate aligned, diverse scenes for training perception systems (e.g., safety gear detection, defect spotting).
- Tools/workflows: Programmatic prompt generation for coverage; reward filters for attribute/pose/lighting correctness.
- Assumptions/dependencies: Domain transfer to real-world sensors; 3D/pose-aware reward models; scenario coverage quantification.
- Bold: Workflow automation in creative ops
- Sector: Media Production, Agencies
- What: Reward-range-based prompt selection and per-category weighting to allocate compute where returns are highest; automated iteration until targets are met.
- Tools/workflows: Orchestrators that monitor reward deltas; stop conditions; budget-aware batching.
- Assumptions/dependencies: Integration with project management and DAM (Digital Asset Management); clear KPIs tied to rewards.
Notes on Key Dependencies Across Applications
- Availability and maintenance of VLM-based composite reward services (alignment, aesthetics, portrait, instruction-following, identity).
- Access to pointwise-labeled datasets and periodic re-annotation to prevent drift and bias.
- Compute budgets for RL training and OPD; inference-time CFG support.
- Legal/ethical frameworks for identity-preserving edits and data privacy.
- Monitoring for reward hacking and over-optimization; fallbacks to human review for high-stakes use.
Glossary
- Asynchronous reward pipeline: A training design where reward computation is decoupled from model updates and executed asynchronously to hide latency and improve throughput. "Asynchronous reward pipeline."
- Bradley–Terry ranking loss: A pairwise preference-learning objective that models the probability one item is preferred over another, used to train reward models from comparative judgments. "the training objective minimizes the Bradley-Terry ranking loss:"
- Chain-of-thought (CoT) reasoning: A prompting/finetuning technique that elicits step-by-step reasoning from models to improve judgment or alignment. "pointwise scoring paradigm and chain-of-thought reasoning."
- Classifier-free guidance (CFG): An inference/training technique for conditional generative models that mixes conditional and unconditional predictions to steer outputs toward the prompt. "classifier-free guidance (CFG)"
- Diffusion models: Generative models that learn to reverse a gradual noising process to synthesize data, now dominant for high-fidelity image generation. "Diffusion models have become the dominant paradigm for high-fidelity image generation"
- Elo ratings: A comparative skill metric originally from chess, used here to aggregate human preference wins/losses between models. "Elo ratings of 1193 in text-to-image arena (+78) and 1349 in image edit arena (+93)"
- Euler–Maruyama discretization: A numerical scheme for simulating stochastic differential equations by discretizing time and approximating stochastic integrals. "Under Euler-Maruyama discretization, the transition density becomes Gaussian"
- Flow Matching: A training framework that learns a velocity field to transform noise into data by matching the time-derivative of interpolations between them. "Following the Flow Matching framework"
- Flow-based generative models: Models that learn invertible or continuous-time transformations (flows) between simple and complex distributions for generation. "Diffusion and flow-based generative models"
- Flow-GRPO: An adaptation of GRPO for flow/diffusion models that treats generation as an MDP and optimizes with group-relative advantages. "Flow-GRPO~\citep{liu2025flowgrpo} extends Group Relative Policy Optimization (GRPO,~\citealt{shao2024deepseekmath}) to flow matching models"
- Gaussian transition kernels: Transition distributions that are Gaussian, enabling tractable divergences or objectives in diffusion/flow formulations. "the KL divergence between Gaussian transition kernels reduces to a velocity-field MSE loss."
- Group-level normalization: Normalizing rewards within a sampled group (e.g., per prompt) to compute relative advantages that stabilize RL optimization. "The advantage of the i-th sample is computed by group-level normalization:"
- Group Relative Policy Optimization (GRPO): A policy-gradient method that computes advantages relative to samples in the same group to reduce variance and bias. "extends Group Relative Policy Optimization (GRPO,~\citealt{shao2024deepseekmath})"
- Group-relative advantage: An advantage signal defined relative to other samples in the same prompt group, used to scale policy updates. "a rescaled version of the group-relative advantage."
- Importance sampling ratio: The likelihood ratio between new and old policies for the same transition, used in off-policy or clipped objectives. "importance sampling ratio r_t{(i)}(\theta)"
- Kullback–Leibler (KL) divergence: A measure of distributional discrepancy often used as a regularizer or distillation target. "minimizes the forward Kullback--Leibler (KL) divergence"
- Latent diffusion models: Diffusion models operating in a compressed latent space to improve efficiency and scalability. "The field has progressed to latent diffusion models"
- LoRA fine-tuning: A parameter-efficient finetuning method that injects low-rank adapters into pretrained weights to reduce trainable parameters. "validated under LoRA fine-tuning settings"
- Markov decision process (MDP): A formalism for sequential decision-making with states, actions, transitions, and rewards; used to cast diffusion generation for RL. "as a Markov decision process (MDP)."
- On-policy distillation (OPD): A distillation method where the student learns from teacher targets along the student’s own trajectories to avoid distribution mismatch. "we propose on-policy distillation (OPD) to unify"
- Per-prompt-group normalization: Normalizing rewards within each prompt’s sampled group to balance heterogeneous reward scales before aggregation. "This per-prompt-group normalization is critical"
- Probability flow ordinary differential equation (ODE): The deterministic ODE whose solution shares marginals with the diffusion SDE, used for sample generation. "probability flow ordinary differential equation (ODE)"
- Reward hacking: Exploiting weaknesses in the reward signal to increase scores without genuinely improving desired behavior. "we observe that this leads to rapid reward hacking"
- Trajectory-level velocity matching: A distillation/learning objective that matches teacher and student velocity fields along the entire denoising trajectory. "through trajectory-level velocity matching."
- Velocity field: The vector field predicting the time-derivative of the data state in flow/diffusion models, guiding denoising dynamics. "defines the conditional velocity field"
- Vision-LLM (VLM): A model jointly processing images and text, used here as a learned reward/critic for alignment and aesthetics. "The Vision-LLM (VLM) produces scalar scores"
- W2 (2-Wasserstein) distance: An optimal transport metric on probability distributions; used theoretically to motivate OPD objectives. "deriving the objective from a upper bound."
- Wiener process: A continuous-time stochastic process with independent Gaussian increments, modeling Brownian motion in SDEs. "standard Wiener process"
Collections
Sign up for free to add this paper to one or more collections.