- The paper shows that transformational game design can prompt measurable improvements in youth socio-ethical reasoning about GenAI biases.
- The study employed structured games with prompt-engineering constraints and peer evaluation, revealing nuanced shifts in bias awareness and contextual judgments.
- Using a commercial text-to-image GenAI, the games highlighted trade-offs between constraint-based creativity and deeper ethical analysis in AI outputs.
Problem Setting and Motivation
The rapid proliferation of generative AI (GenAI) systems, and their integration into youth culture, social interaction, and learning environments, has made cultivating critical AI literacy in the next generation an urgent priority. Such literacy extends beyond technical proficiency to encompass nuanced socio-ethical reasoning about the biases, embedded values, and questions of human agency encoded in GenAI outputs. Prior literature has emphasized the prevalence of biased GenAI outputs, but interventions aimed at youth typically frame such biases as uniformly negative, omitting the complexities of necessity, context, and ethical ambiguity. The present work addresses this gap, positing that transformational games, by leveraging structured play and social interaction, offer a compelling scaffold for engaging youth with the socio-ethical dynamics of GenAI.
Game-Based Intervention Design
Two transformational games, Diversity Duel and Secret Agent, were designed to operationalize four learning goals: recognizing (1) that GenAI models exhibit bias, (2) that these biases reflect real-world biases, (3) that bias can be necessary in some contexts, and (4) that not all bias is equally harmful. Both games use a commercial text-to-image GenAI model, enforce constraint-based prompt engineering, and incorporate group mechanics.
Diversity Duel employs peer-evaluated competition to catalyze discourse on visible social biases in image outputs. Each round, groups co-construct short prompts aimed at maximizing diversity under word-count constraints, repeatedly prompting discussion of how language shapes model responses and of the surface-level acceptance of biased imagery.
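The round structure described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: the word limit, function names, and rating scale are all assumptions.

```python
# Illustrative sketch of a Diversity Duel round: legal prompts must respect a
# word-count constraint, and the winner is the team whose generated image
# receives the highest mean peer diversity rating. All names are hypothetical.
from dataclasses import dataclass, field

MAX_WORDS = 8  # assumed constraint; the actual limit is not specified in the paper

@dataclass
class Submission:
    team: str
    prompt: str
    peer_scores: list = field(default_factory=list)  # e.g., 1-5 diversity ratings

def validate_prompt(prompt: str, max_words: int = MAX_WORDS) -> bool:
    """A prompt is legal only if it fits within the word-count constraint."""
    return len(prompt.split()) <= max_words

def score(sub: Submission) -> float:
    """Mean peer-assigned diversity rating for a team's generated image."""
    return sum(sub.peer_scores) / len(sub.peer_scores)

def run_round(submissions: list) -> str:
    """Return the winning team among submissions with legal prompts."""
    legal = [s for s in submissions if validate_prompt(s.prompt)]
    return max(legal, key=score).team

round1 = [
    Submission("A", "doctors of many ages and backgrounds", [4, 5, 4]),
    # 11 words: exceeds the assumed limit, so this entry is disqualified
    Submission("B", "a diverse inclusive group of doctors from around the world working", [5, 5, 5]),
]
print(run_round(round1))  # → A
```

The constraint check is where the pedagogical pressure comes from: with only a few words available, teams must debate which identity descriptors to spend them on, which is exactly the discussion the game is designed to provoke.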
Secret Agent introduces a social deduction dynamic in which one hidden player (the "agent") subtly strives to undermine diversity in collective prompts. The group must identify the agent after the round, which requires attentive analysis of prompt decisions and reflection on how subtly bias can be introduced. Evaluation of image inclusivity by external peers raises the stakes both for critical evaluation and for the agent's success.
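The hidden-role and accusation mechanics can be sketched as follows. This is a minimal illustration under assumed rules (plurality vote decides the accusation); the names and resolution logic are not taken from the paper.

```python
# Illustrative sketch of Secret Agent's hidden-role assignment and post-round
# accusation vote. Function names and the plurality rule are assumptions.
import random
from collections import Counter

def assign_agent(players: list, rng: random.Random) -> str:
    """Secretly pick one player to covertly reduce diversity in group prompts."""
    return rng.choice(players)

def resolve_accusation(votes: dict, agent: str) -> bool:
    """Group wins the deduction phase if the plurality accusation names the agent."""
    accused, _ = Counter(votes.values()).most_common(1)[0]
    return accused == agent

rng = random.Random(7)  # seeded so the example is reproducible
players = ["Ana", "Bea", "Cam", "Dee"]
agent = assign_agent(players, rng)
votes = {p: agent for p in players if p != agent}  # everyone accuses correctly
print(resolve_accusation(votes, agent))  # → True
```

The adversarial structure mirrors the "red teaming" framing discussed later: the agent must understand how bias enters prompts well enough to introduce it undetected, while the group must understand it well enough to spot the manipulation.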
Methodology and Participant Demographics
The games were deployed in a five-day summer workshop conducted at a community-affiliated site primarily serving youth of color. All participants were teen girls (ages 13–18), with diverse racial backgrounds and heterogeneous prior computational experience. Mixed methods included pre/post questionnaires probing recognition of bias, open-ended rationales, and group discussion transcriptions subjected to inductive thematic analysis.
Results: Shifts in Socio-Ethical Reasoning and Group Discourse
Both interventions yielded measurable shifts in critical reasoning regarding GenAI bias:
- Awareness Gains: Agreement with the acceptability of biased images decreased post-gameplay. In Secret Agent, explicit recognition of the harms of AI bias nearly doubled.
- Age-Stratified Trends: Older participants demonstrated more pronounced pre-to-post gains in Diversity Duel, whereas Secret Agent facilitated greater learning among younger participants, potentially due to ceiling effects among older youth.
- Deepened Reflection: Post-intervention discourse evidenced a move from purely technical or moral interpretations of bias to situational judgments. Participants differentiated between contextual necessity (e.g., content moderation as positive bias) and systemic harms (e.g., race/gender occupational stereotyping), and began discussing human agency, accountability, and the limits of prompt engineering.
- Prompt Engineering Realizations: Players discovered how much linguistic specificity matters and how hard it is to elicit diverse representations, even with explicit intent. Limiting prompt vocabulary induced surface-level approaches to diversity, prompting recognition of the impact of word choice but also exposing limits in representing more intersectional or nuanced forms of diversity.
Game Mechanic Implications
In-game competition and peer evaluation successfully scaffolded substantive discourse on GenAI bias and ethics, confirming prior work suggesting that social motivation and deliberative group tasks are pedagogically effective for complex ethical topics. Social deduction, especially when paired with role inversion (assigning a "villain" role), not only increased engagement but also mirrored adversarial "red teaming," requiring players to both surface and mitigate bias, echoing recent best practices in AI auditing.
Constraint-based creativity, while effective in foregrounding the impact of language on AI behavior, sometimes redirected focus toward superficial identity markers because of the limited prompt granularity, highlighting a trade-off between playability and conceptual depth. Future iterations could explore open-ended rounds for comparison.
Theoretical and Practical Implications
This investigation substantiates the role of transformational games as accessible, scalable vehicles for fostering critical GenAI literacy in youth—especially those traditionally underrepresented in computing. The work explicitly bridges hands-on prompt engineering with ethical inquiry, informing future educational game design in the AI literacy domain. The findings also align with emergent perspectives challenging the monolithic framing of "bias as harm," pushing toward contextual evaluation and human-centered sensemaking.
Practically, integration into educational contexts is streamlined by the lightweight structure of the games, although facilitators should consider participant typing ability and allow for more extended exploration when deeper conceptual reasoning is prioritized.
Limitations and Directions for Future Research
The study's demographic specificity (teen girls of color in an out-of-school context, moderate sample size) limits broad generalizability; follow-up studies should diversify populations and increase scale. The tension between game pacing and depth of reflection warrants further exploration, as does the adaptation of such interventions for GenAI models generating modalities beyond images (e.g., text generation). Additionally, assessment could be enhanced by multi-phase pre/post measurement and longitudinal follow-up.
Conclusion
The investigation demonstrates that well-designed transformational games, leveraging peer evaluation, constrained creativity, and social deduction mechanics, offer a robust, engaging platform for youth to critically interrogate GenAI outputs and underlying biases. These interventions support the emergence of sophisticated, context-dependent ethical reasoning, directly linked to hands-on prompt engineering. As GenAI becomes even more ubiquitous in youth learning and culture, frameworks that interlace technical artifact analysis with socio-ethical dialogue—such as the ones developed in this study—are essential to responsible AI literacy education.