
Generative AI and Firm Productivity: Field Experiments in Online Retail (2510.12049v1)

Published 14 Oct 2025 in econ.GN, cs.AI, and q-fin.EC

Abstract: We quantify the impact of Generative Artificial Intelligence (GenAI) on firm productivity through a series of large-scale randomized field experiments involving millions of users and products at a leading cross-border online retail platform. Over six months in 2023-2024, GenAI-based enhancements were integrated into seven consumer-facing business workflows. We find that GenAI adoption significantly increases sales, with treatment effects ranging from 0% to 16.3%, depending on GenAI's marginal contribution relative to existing firm practices. Because inputs and prices were held constant across experimental arms, these gains map directly into total factor productivity improvements. Across the four GenAI applications with positive effects, the implied annual incremental value is approximately $5 per consumer, an economically meaningful impact given the retailer's scale and the early stage of GenAI adoption. The primary mechanism operates through higher conversion rates, consistent with GenAI reducing frictions in the marketplace and improving consumer experience. We also document substantial heterogeneity: smaller and newer sellers, as well as less experienced consumers, exhibit disproportionately larger gains. Our findings provide novel, large-scale causal evidence on the productivity effects of GenAI in online retail, highlighting both its immediate value and broader potential.

Summary

The paper demonstrates that Generative AI integration significantly boosts firm productivity by increasing conversion rates, with notable effects on sales across various workflows.
It uses randomized controlled trials in online retail settings to establish causal links between GenAI adoption and performance improvements in customer-facing processes.
The study reveals heterogeneous impacts, with larger gains for smaller and newer sellers, less experienced consumers, and workflows with high baseline frictions, and it underscores the importance of domain-specific fine-tuning.

Generative AI and Firm Productivity: Causal Evidence from Field Experiments in Online Retail

Introduction and Motivation

This paper presents a comprehensive empirical analysis of the productivity impact of Generative AI (GenAI) in online retail, leveraging a series of large-scale randomized field experiments on a leading global cross-border e-commerce platform. The paper addresses a critical gap in the literature: while GenAI's potential for productivity enhancement is widely discussed, there is limited causal evidence at the firm or aggregate level, especially regarding revenue-based outcomes as opposed to input-side efficiency. The authors focus on seven consumer-facing business workflows, quantifying both the magnitude and heterogeneity of GenAI-driven productivity gains.

Experimental Design and Methodology

The research design is notable for its scale and rigor. Over six months, GenAI was integrated into seven distinct workflows: Pre-sale Service Chatbot, Search Query Refinement, Product Description Generation, Marketing Push Message Creation, Google Advertising Title Optimization, Chargeback Defense, and Live Chat Translation. Each workflow was evaluated via randomized controlled trials, with treatment and control groups exposed to GenAI-enhanced and baseline (pre-GenAI) workflows, respectively. The randomization occurred at the consumer or product level, with minimal overlap across experiments, ensuring clean identification of causal effects.
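The summary does not say how units were assigned to arms. As a minimal sketch of one common approach, the snippet below hashes a unit identifier together with an experiment name to get a stable, roughly uniform assignment. The function name `assign_arm`, the experiment label, and the 50/50 split are illustrative assumptions, not details from the paper.

```python
import hashlib


def assign_arm(unit_id: str, experiment: str, treat_share: float = 0.5) -> str:
    """Deterministically map a consumer or product ID to an experimental arm.

    Hypothetical helper: the paper does not describe its assignment mechanism.
    """
    # Salting with the experiment name gives each experiment its own split,
    # so a unit's arm in one experiment does not determine its arm in another.
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex characters as an integer and scale to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treat_share else "control"


# Illustrative usage with made-up consumer IDs.
arms = [assign_arm(f"user-{i}", "presale_chatbot") for i in range(10_000)]
print("share treated:", arms.count("treatment") / len(arms))
```

Because the assignment depends only on the ID and the experiment name, the same unit always lands in the same arm, which keeps exposure consistent across sessions.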

Covariate balance checks confirm the effectiveness of randomization, as shown by the distribution of p-values across demographic and behavioral variables.

Figure 1: P-values for covariate balance checks across experiments, confirming randomization validity.
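A balance check of this kind can be sketched as a difference-in-means test on each pre-treatment covariate. The example below uses a large-sample z approximation and simulated data; the covariate names and distributions are hypothetical, and the paper may use a different test.

```python
import math
import random
from statistics import NormalDist, mean, variance


def balance_p_value(treated, control):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    z = (mean(treated) - mean(control)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(0)
# Hypothetical covariates, drawn from the same distribution in both arms,
# which is what successful randomization should look like.
covariates = {
    "account_age_days": (
        [rng.gauss(400, 120) for _ in range(5000)],
        [rng.gauss(400, 120) for _ in range(5000)],
    ),
    "past_spend_usd": (
        [rng.gauss(80, 30) for _ in range(5000)],
        [rng.gauss(80, 30) for _ in range(5000)],
    ),
}
for name, (treated, control) in covariates.items():
    print(name, round(balance_p_value(treated, control), 3))
```

Under valid randomization, p-values across many covariates should look roughly uniform on [0, 1], with no cluster near zero.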

The empirical framework interprets output gains as total factor productivity (TFP) improvements, under the assumption of constant labor and capital inputs and fixed prices. This is justified by the platform's operational context, where GenAI deployments primarily augmented or automated existing digital processes without significant changes in staffing or infrastructure.
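The mapping from revenue gains to TFP can be written out in a few lines. The derivation below is a sketch that assumes a Hicks-neutral production function; the symbols $A$ (TFP), $K$, $L$, $P$, $Y$, and $R$ are illustrative notation and may not match the paper's.

```latex
% Assumption: Hicks-neutral technology; notation is illustrative.
\begin{align}
Y &= A \, F(K, L), \qquad R = P \, Y \\
\ln R &= \ln P + \ln A + \ln F(K, L) \\
\Delta \ln R &= \Delta \ln A
  \quad \text{when } \Delta \ln P = 0 \text{ and } \Delta \ln F(K, L) = 0
\end{align}
```

Under these assumptions, a 16.3% sales lift in one workflow corresponds to $\ln(1.163) \approx 0.151$ log points of TFP for that workflow.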

Workflow-Specific Implementations

Pre-sale Service Chatbot

A GenAI-powered chatbot replaced the standard auto-response for pre-sale inquiries on self-sold products. The chatbot provided multilingual, content-rich, and context-specific answers, available 24/7.

Figure 2: Illustration of the Pre-sale Service Chatbot interface, contrasting auto-response and GenAI-driven support.

Search Query Refinement

GenAI was used to semantically refine and translate consumer search queries, particularly in under-served languages, improving the match between consumer intent and product retrieval.

Figure 3: Illustration of search query refinement, showing improved search results after GenAI-based query translation.

Product Description Generation

GenAI generated comprehensive, structured, and market-adapted product descriptions, supplementing or replacing minimal or image-based human-generated content.

Figure 4: Illustration of product description enhancement, with GenAI-generated text layered over human input.

Marketing Push Message Creation

GenAI enabled the large-scale generation of personalized marketing messages, increasing content diversity and targeting precision.

Figure 5: Illustration of marketing push messages, comparing human-generated and GenAI-generated variants.

Google Advertising Title Optimization

GenAI was used to optimize product titles for Google Shopping ads, though without domain-specific fine-tuning.

Figure 6: Illustration of Google Shopping interface, highlighting the impact of title optimization.

Chargeback Defense

A GenAI agent automated the process of contesting chargeback disputes, generating tailored defense letters and streamlining evidence collection.

Figure 7: Illustration of the chargeback defense workflow, showing GenAI-driven automation.

Live Chat Translation

GenAI provided real-time, bidirectional translation for customer service agents, enabling effective multilingual support.

Figure 8: Illustration of live chat translation, with real-time GenAI translation between consumer and agent interfaces.

Main Results

Productivity Gains

The experiments reveal substantial heterogeneity in GenAI's impact across workflows:

Pre-sale Service Chatbot: 16.3% increase in sales and 21.7% increase in conversion rate (p < 0.01).
Search Query Refinement: 2.93% increase in sales and 1.15% increase in conversion rate (p < 0.05).
Product Description Generation: 2.05% increase in sales and 1.27% increase in conversion rate (p < 0.05).
Marketing Push Message: 1.6% increase in sales (not statistically significant), but a 3% increase in conversion rate (p < 0.05).
Google Advertising Title: No significant effect; in fact, a small negative point estimate, attributed to lack of domain-specific fine-tuning.
Chargeback Defense: 15% increase in defense success rate (internal estimate).
Live Chat Translation: 5.2% increase in consumer satisfaction (internal estimate).

The aggregate annualized incremental value from the four workflows with positive sales effects is approximately $5 per consumer, representing 5.5–6% of the global per-user e-commerce revenue growth in 2023–2024.
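To make the reported lifts and significance levels concrete, here is a minimal sketch of how a relative conversion lift and its p-value could be computed from arm-level counts. It uses a pooled two-proportion z-test; the counts are hypothetical and chosen only so that the lift comes out at 21.7%. The paper's own estimator is not described in this summary and may differ.

```python
import math
from statistics import NormalDist


def conversion_lift(conv_t: int, n_t: int, conv_c: int, n_c: int):
    """Return (relative lift, z statistic, two-sided p-value) for conversion rates."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    lift = p_t / p_c - 1
    # Pooled rate under the null hypothesis of equal conversion in both arms.
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lift, z, p_value


# Hypothetical counts: 100,000 consumers per arm.
lift, z, p_value = conversion_lift(conv_t=2434, n_t=100_000, conv_c=2000, n_c=100_000)
print(f"lift = {lift:.1%}, significant at 1%: {p_value < 0.01}")
```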

Mechanisms

The primary mechanism for productivity gains is increased conversion rates, not higher average cart values. GenAI reduces market frictions—information asymmetry, search costs, and language barriers—thereby expanding the market by converting more consumers rather than increasing spend per buyer. This is consistent with the observed increases in click-through rates and order counts in relevant workflows.
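The distinction between conversion and cart value follows from an accounting identity: sales per user equals the conversion rate times the average order value among buyers. The sketch below splits a log change in sales per user into those two parts. The arm totals are hypothetical, and the dictionary keys are illustrative names.

```python
import math


def decompose_sales_change(control: dict, treatment: dict) -> dict:
    """Split the log change in sales per user into conversion and order-value parts.

    Each arm is a dict with 'users', 'buyers', and 'revenue' totals.
    """
    def rates(arm):
        conversion = arm["buyers"] / arm["users"]
        order_value = arm["revenue"] / arm["buyers"]
        return conversion, order_value

    conv_c, aov_c = rates(control)
    conv_t, aov_t = rates(treatment)
    d_conv = math.log(conv_t / conv_c)
    d_aov = math.log(aov_t / aov_c)
    # sales per user = conversion * order value, so the log changes add up.
    return {
        "d_log_sales": d_conv + d_aov,
        "d_log_conversion": d_conv,
        "d_log_order_value": d_aov,
    }


# Hypothetical arms: more buyers in treatment, same spend per buyer.
control = {"users": 100_000, "buyers": 2000, "revenue": 60_000.0}
treatment = {"users": 100_000, "buyers": 2200, "revenue": 66_000.0}
print(decompose_sales_change(control, treatment))
```

In this made-up example the entire sales change comes from the conversion term, which is the pattern the paper describes.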

Heterogeneity

Sellers: Small and less experienced sellers benefit disproportionately, with significant gains in sales and conversion rates, especially in search and marketing workflows.
Consumers: Inexperienced consumers (shorter registration, fewer logins, lower past spend) derive larger benefits from GenAI enhancements.
Products: Tail products and high-priced items often see greater gains, though the effect varies by workflow and product category concentration.
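Heterogeneity of this kind is typically read off by estimating the treatment effect separately within each subgroup. The snippet below is a minimal sketch that computes the relative conversion lift per segment from unit-level rows. The segment labels and counts are hypothetical, and the paper may instead use regression interaction terms.

```python
from collections import defaultdict


def lift_by_segment(rows):
    """Relative conversion lift per segment.

    rows: iterable of (segment, arm, converted), arm in {'treatment', 'control'}.
    Assumes every segment has units in both arms and a nonzero control rate.
    """
    # For each segment and arm, track [conversions, units].
    counts = defaultdict(lambda: {"treatment": [0, 0], "control": [0, 0]})
    for segment, arm, converted in rows:
        counts[segment][arm][0] += int(converted)
        counts[segment][arm][1] += 1
    lifts = {}
    for segment, arms in counts.items():
        rate_t = arms["treatment"][0] / arms["treatment"][1]
        rate_c = arms["control"][0] / arms["control"][1]
        lifts[segment] = rate_t / rate_c - 1
    return lifts


# Hypothetical data: new consumers respond more strongly than experienced ones.
rows = (
    [("new", "treatment", True)] * 30 + [("new", "treatment", False)] * 970
    + [("new", "control", True)] * 20 + [("new", "control", False)] * 980
    + [("experienced", "treatment", True)] * 52 + [("experienced", "treatment", False)] * 948
    + [("experienced", "control", True)] * 50 + [("experienced", "control", False)] * 950
)
print(lift_by_segment(rows))
```

Subgroup differences found this way are descriptive unless randomization was stratified on the segment variable, so they should be read with that caveat.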

Implementation Considerations

Model Selection and Fine-tuning: Domain-specific fine-tuning is critical for workflows requiring specialized knowledge (e.g., advertising titles). Generic GenAI models may underperform or even degrade outcomes in such contexts.
Workflow Integration: Productivity gains are maximized when GenAI augments or automates processes with high baseline frictions or information gaps. In workflows already optimized by human input or traditional automation, marginal gains are smaller.
Scalability: The platform scaled GenAI adoption from a handful of workflows in 2023 to over 60 by 2025, with API calls to LLMs increasing twentyfold. This demonstrates the feasibility of rapid, large-scale GenAI integration in digital retail environments.
Resource Requirements: The marginal cost of GenAI deployment is negligible relative to platform scale, given the non-rivalrous nature of digital models and existing infrastructure.
Randomized Evaluation: Rigorous A/B testing with granular transaction data is essential for causal inference and for identifying heterogeneity in treatment effects.

Theoretical and Practical Implications

The findings provide robust evidence that GenAI can drive measurable TFP gains in online retail, primarily through demand-side mechanisms. This challenges the prevailing focus on supply-side (labor-saving) productivity in the GenAI literature and highlights the importance of consumer experience and market expansion. The heterogeneity analysis suggests that GenAI adoption may reduce capability gaps among sellers and consumers, potentially increasing inclusivity and market efficiency.

From a practical perspective, the results inform platform managers and policymakers about where GenAI investments are likely to yield the highest returns—namely, in workflows with high information frictions and among user segments with lower baseline capabilities.

Limitations and Future Directions

Short-run Horizon: The experiments capture immediate effects; long-term impacts, including consumer retention and product returns, remain unmeasured.
Workflow Selection: Only a subset of possible workflows was studied, based on managerial prioritization.
General Equilibrium Effects: The analysis is platform-specific; industry-wide adoption may attenuate relative gains due to competitive dynamics.
Cost-side Adjustments: The paper focuses on revenue-based productivity; future research should examine potential labor displacement and cost savings as GenAI adoption matures.

Conclusion

This paper provides large-scale, causal evidence that GenAI can enhance firm-level productivity in online retail, primarily by reducing market frictions and improving consumer experience. The effects are economically meaningful, heterogeneous across workflows and user segments, and scalable across a global platform. The results underscore the importance of workflow-specific implementation, domain adaptation, and rigorous evaluation in realizing the productivity potential of GenAI. Future research should address long-term impacts, broader workflow coverage, and general equilibrium considerations to fully characterize the role of GenAI in digital commerce transformation.


Explain it Like I'm 14

What this paper is about

This paper asks a simple, big question: when online stores start using generative AI (GenAI)—the kind that can write text, translate, chat, and summarize—do they actually sell more without hiring more people or spending more money? The authors teamed up with a huge international shopping website and ran real-world tests to find out.

Over six months in 2023–2024, the company added GenAI to seven parts of its website and app that shoppers and sellers use. Then they ran large, randomized experiments with millions of users and products to measure what changed.

The short answer: GenAI often helped. In some places it boosted sales by up to 16.3%, mainly by making it easier for people to find what they want and decide to buy. Because prices and staffing stayed the same, these higher sales translate into real productivity gains—more output from the same inputs.

What the researchers wanted to find

In simple terms, the paper set out to answer three questions:

Does using GenAI in everyday shopping tasks make a meaningful difference to sales?
If it helps, how does it help—what’s the mechanism?
Who benefits the most—new vs. experienced buyers, small vs. big sellers?

How they tested it

Think of a fair test like flipping a coin to decide who gets a new tool and who keeps the old one. That’s what they did.

They ran randomized field experiments (real customers in a live store, not a lab). Some shoppers or products saw the GenAI-enhanced version (treatment), while similar shoppers or products saw the regular version (control). Prices and costs stayed the same, so any sales differences likely came from GenAI, not from discounts or extra staffing.

They measured:

Sales per person or per product (how much money was spent)
Conversion rate (out of people who looked, how many actually bought)
Click-throughs and satisfaction in some cases

They added GenAI to seven workflows. Here’s what those are, in everyday language:

Pre-sale Service Chatbot: a 24/7 AI helper to answer shoppers’ questions before they buy.
Search Query Refinement: smarter translation and interpretation of searches (especially in less common languages) so results match what people really mean.
Product Description Generation: clearer, structured product text in different languages, especially where products had little or no text before.
Marketing Push Messages: more personalized push notifications to users’ phones.
Google Advertising Title Optimization: improving product ad titles shown on Google Shopping.
Chargeback Defense (for sellers): AI help to respond to payment disputes.
Live Chat Translation: real-time translation so customer service can talk with anyone in their own language.

Key idea: Inputs stayed constant. The company didn’t add more workers or raise prices. So if sales went up, that’s like baking more cookies without using more flour, eggs, or time—true productivity.

What they found and why it matters

Here are the main results, explained simply:

GenAI often raised sales, but not everywhere and not equally.
- Biggest boosts: up to 16.3% more sales with the pre-sale chatbot; smaller but meaningful gains (about 2–3%) from better search and better product descriptions.
- Advertising-related tests (push messages and Google ad titles) did not show clear sales increases in this period.
- In areas without full sales data, GenAI still helped: chargeback success went up by about 15%, and live chat translation improved customer satisfaction by about 5.2%.
These gains came mostly from higher conversion rates, not bigger baskets.
- In other words, GenAI helped more people decide “Yes, I’ll buy,” rather than getting the same people to spend more per purchase.
- Why? GenAI reduced “frictions,” the little hassles that slow people down:
  - Better answers to questions (chatbot) reduce uncertainty.
  - Smarter search understands what people really want.
  - Clearer descriptions help shoppers judge products quickly.
The benefits were uneven—and that’s interesting.
- Smaller and newer sellers gained more.
- Less experienced shoppers gained more.
- This suggests GenAI is especially helpful for people who don’t already have strong tools, skills, or a big reputation.
The overall value is meaningful at scale.
- Across the four GenAI uses with positive sales effects and detailed data, the extra value is about $5 per consumer per year. For a platform with millions of users, that’s a big impact—especially this early in GenAI adoption.

What this means going forward

For businesses: GenAI can improve productivity on the demand side—by helping customers find, understand, and trust products—without cutting labor or slashing costs. That’s a different kind of “AI payoff” than just doing tasks faster.
For smaller players: GenAI can level the playing field. Newer buyers and smaller sellers benefit more because AI reduces the skill and information gaps that usually hold them back.
For consumers: Shopping becomes easier—fewer confusing searches, clearer product info, quicker answers—so more people feel confident buying.
For the bigger picture: This is early but real, large-scale, causal evidence that GenAI can move the needle on firm productivity. As companies roll out more AI across more workflows, the effects could grow. Long-term results will depend on competition: if every platform gets better, gains for any one firm may shrink, but overall consumer experience could still improve.

In short: Adding GenAI to key parts of online shopping often helps people find and buy what they want, which raises sales without raising costs. That’s genuine productivity—and a strong sign that GenAI can create real-world value, not just cool demos.


Knowledge Gaps

Unresolved Knowledge Gaps, Limitations, and Open Questions

The following list highlights what remains missing, uncertain, or unexplored, with concrete directions for future research:

Quantify net productivity and profitability: incorporate end-to-end cost data (compute/API usage, engineering time, moderation/QA, maintenance) to convert revenue lifts into profit-based TFP; explicitly address the one workflow with cost changes (Chargeback Defense) where constant-input assumptions do not hold.
Validate “constant prices/inputs” assumptions: test whether prices, promotions, markups, product mix, capital utilization (e.g., GPU/compute load), and effective labor effort truly remained constant across arms and over time; instrument or control for dynamic pricing and promotional calendars.
Long-run effects: measure retention, repeat purchase, customer lifetime value, churn, and consumer learning effects beyond the short experimental windows (days–weeks), and track durability versus novelty or decay of effects.
Post-purchase outcomes and quality: evaluate returns/refunds, chargebacks, dispute rates, fulfillment issues, and satisfaction post-purchase to confirm that demand-side gains do not increase downstream frictions or costs.
Consumer welfare and experience: move beyond conversion to quantify welfare (e.g., time-to-find, search effort, perceived relevance, trust), using direct measures such as surveys, dwell time, bounce rates, and task completion times.
Mechanism validation: provide direct evidence that GenAI reduces specific frictions (information asymmetry, search costs, personalization gaps) rather than proxying via conversion; decompose effects into search relevance, query intent resolution, description completeness, and message personalization quality.
Additivity and complementarities: the annualized per-consumer value assumes linear additivity across workflows; test interactions, synergies, and cannibalization (e.g., improved search reducing push-message impact) to avoid double-counting.
Interference and spillovers: assess cross-user and cross-workflow spillovers (e.g., site-wide traffic surges from push messages affecting control users; ad auction dynamics when treated products change click signals) and validate SUTVA.
Multiple testing and statistical rigor: pre-register hypotheses, apply multiple-hypothesis corrections across seven workflows and sub-experiments, report cluster-robust inference (consumer, product, time), and provide power calculations and minimal detectable effects.
Marketing Push Message treatment fidelity: clarify that only ~40% of “treatment-group” users actually received GenAI messages; estimate treatment-on-the-treated (TOT), dose–response, and compliance rates to avoid diluted intent-to-treat effects.
Google Advertising Title null effects: test domain-specific fine-tuning, prompt engineering, and evaluation on downstream purchases (not just clicks), and analyze heterogeneity by category, keyword competitiveness, and query intent.
Language and market generalizability: results focus on minority languages and platform self-sold products; evaluate impacts in major languages, diverse locales, and third‑party seller contexts to establish broader external validity.
Category-level heterogeneity: systematically analyze differences across product categories (long tail vs. concentrated, experience vs. search goods, price bands) and specify conditions under which GenAI helps or does not.
Seller-side outcomes: measure seller effort, adoption, compliance, and cost changes; identify whether GenAI shifts seller behavior (e.g., better listing quality) and whether benefits persist for small/new sellers without unintended burdens.
Heterogeneity design: define and validate “experience” and “size” variables; distinguish correlation from causation in subgroup effects (e.g., via stratified randomization or interaction designs), and test robustness across thresholds and continuous measures.
Quality and safety of GenAI outputs: audit hallucinations, inaccuracies, harmful or biased content, and cultural/linguistic mismatches; implement content quality metrics and error-rate tracking to link output quality to business outcomes.
Privacy, data governance, and compliance: document data usage for personalization, consent, and regulatory compliance (especially in cross-border contexts), and evaluate their impact on adoption feasibility and consumer trust.
Compute and environmental externalities: quantify energy usage and environmental footprint accompanying GenAI scaling (e.g., 20x API calls), and incorporate these costs in productivity accounting.
Equilibrium and competitive dynamics: model industry-wide adoption effects (e.g., competitive offset, ad auction re-optimization, consumer attention constraints) to estimate general equilibrium outcomes and long-run ROI.
Organizational complements: examine process redesign, workflow integration, skill upgrading, and potential labor displacement over time (contradicting “minimal displacement”), and quantify complementarities needed to fully realize gains.
Replicability and transparency: specify models (LLMs), fine-tuning data, prompts, guardrails, and deployment parameters to enable replication and benchmarking across firms and platforms.
Chargeback Defense measurement gap: move beyond success rate to quantify revenue recovered, time-to-resolution, seller satisfaction, net cost savings, and labor displacement, and reconcile with TFP interpretation given non-constant inputs.
Workflow selection bias: managers chose “promising” workflows; evaluate a broader, randomly sampled set to avoid upward bias and to identify where GenAI is unlikely to pay off.
Seasonality and timing: many experiments ran in October–December; assess whether observed effects are seasonal (holiday peaks) versus structural by replicating in off-peak periods.
Overlap and exposure mapping: although consumer overlap across experiments is <1%, map exposure across sellers/products and sessions to ensure units were not indirectly treated by other GenAI features and to validate independence of experimental arms.
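The treatment-fidelity point about push messages can be illustrated with the standard Wald rescaling: when only a share of the assigned group is actually treated, the intent-to-treat (ITT) effect divided by the difference in treatment take-up between arms gives the effect on those actually treated. The sketch below assumes the control arm received no GenAI messages and that assignment affects outcomes only through receiving them. The 40% share comes from the item above; the ITT value is hypothetical.

```python
def treatment_on_treated(
    itt_effect: float,
    share_treated_in_treatment: float,
    share_treated_in_control: float = 0.0,
) -> float:
    """Rescale an ITT effect by the take-up gap between arms (Wald estimator)."""
    first_stage = share_treated_in_treatment - share_treated_in_control
    if first_stage <= 0:
        raise ValueError("treatment arm must have higher take-up than control")
    return itt_effect / first_stage


# Hypothetical ITT effect of 0.012 (in outcome units) with ~40% take-up.
tot = treatment_on_treated(itt_effect=0.012, share_treated_in_treatment=0.40)
print(round(tot, 4))
```

With 40% take-up, the effect on treated users is 2.5 times the ITT effect, which is why a diluted ITT estimate can understate what the messages do for the users who receive them.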


Practical Applications

Immediate Applications

Below is a set of deployable applications that can be implemented now, grounded in the paper’s experimental findings and workflow innovations. Each bullet notes the sector, the tool/product/workflow that would emerge, and key assumptions or dependencies affecting feasibility.

Multilingual pre-sale customer service agents to reduce information asymmetry
- Sector: Retail/e-commerce, Customer Service
- Tool/workflow: LLM-powered, 24/7 multilingual pre-sale chatbot integrated into product pages and help centers
- Use case: Answer idiosyncratic product questions pre-purchase, especially for self-sold or long-tail products with sparse information
- Expected impact: Higher conversion rates; observed up to 16.3% sales lift in the paper
- Assumptions/dependencies: High-quality product knowledge base; guardrails for accuracy (hallucination control); escalation routing to human agents for complex cases; latency targets under peak load; cost-effective LLM access
Search query refinement for low-resource or underserved languages
- Sector: Search/software, Retail/e-commerce
- Tool/workflow: LLM-based intent understanding, semantic refinement, and translation before search retrieval
- Use case: Improve match quality by clarifying consumer intent in Arabic, Japanese, Polish, and other languages
- Expected impact: about 2.9% sales gain and 1.15% higher conversion in the paper (extensive margin)
- Assumptions/dependencies: Seamless integration with existing search stack; performant language coverage; semantic equivalence checks; real-time processing budgets
Structured, localized product description generation at scale
- Sector: Retail/e-commerce, Content Operations
- Tool/workflow: LLM-generated, market-specific descriptions and “About this item” bullet points for SKUs lacking text
- Use case: Fill description gaps for self-sold products; adapt tone and norms by market (English, Spanish, French, Portuguese, Korean)
- Expected impact: about 2% sales lift (2.05% in the paper); conversion gains without increasing cart value
- Assumptions/dependencies: Accurate attribute extraction from images/spec sheets; localization QA; consistency with brand voice; content moderation and claims verification
Real-time live chat translation for customer service agents
- Sector: Customer Service, Language Technology
- Tool/workflow: LLM/NMT-enabled translation layer for agents serving global traffic
- Use case: Enable native-language service where agents are monolingual; reduce friction across >20 languages
- Expected impact: +5.2% consumer satisfaction (paper); indirect conversion gains via better CX
- Assumptions/dependencies: Low latency; domain-adapted terminology; privacy and compliance controls; agent training on translation cues and context
Chargeback defense assistant for SMB sellers
- Sector: Finance/payments, RegTech
- Tool/workflow: Agent that analyzes disputes, collects evidence, drafts persuasive responses
- Use case: Assist sellers (especially smaller/newer) to contest chargebacks effectively
- Expected impact: +15% success rate in chargeback defense; cost reduction if replacing outsourced processors
- Assumptions/dependencies: Integration with payment networks and platform evidence (delivery logs, comms); jurisdiction-specific templates; legal review; auditability
Segment-aware deployment to maximize uplift where frictions are highest
- Sector: Analytics/marketing operations
- Tool/workflow: Uplift modeling to prioritize GenAI features for less experienced consumers and smaller/newer sellers
- Use case: Allocate agent support, refined search, and enriched descriptions to segments with largest expected gains
- Expected impact: Larger marginal conversion effects; equitable outcomes
- Assumptions/dependencies: Reliable segmentation signals; fairness monitoring; avoidance of unintended exclusion
Evidence-driven ROI benchmarks to prioritize GenAI investments
- Sector: Corporate strategy, Finance
- Tool/workflow: Revenue-based productivity tracking by workflow using A/B tests and $/user uplift (≈$5 per consumer/year across positive workflows in paper)
- Use case: Budget steering to applications with highest causal revenue impact; portfolio reprioritization
- Expected impact: Better capital allocation; early identification of low-ROI pilots (e.g., untuned ad titles)
- Assumptions/dependencies: Fixed prices/costs for clean TFP inference; robust experimentation; independence across concurrent tests
High-scale personalized push messaging with fatigue controls
- Sector: Advertising/CRM
- Tool/workflow: LLM-generated message variants with audience-level personalization and frequency capping
- Use case: Move beyond a few thousand templates to millions of individualized notifications
- Expected impact: Mixed/uncertain in paper; potential gains with domain tuning and well-calibrated targeting
- Assumptions/dependencies: Consent and preference management; message quality and relevance; throttling to avoid spam; reinforcement learning for send-time/content optimization
Domain-tuned ad title optimization for shopping channels
- Sector: Advertising/Performance Marketing
- Tool/workflow: LLM fine-tuned on ad performance data (CTR, CVR) to craft compliant, keyword-aligned titles
- Use case: Improve discoverability and click quality in Google Shopping and similar feeds
- Expected impact: No effect without tuning (paper); potential gains after domain-specific fine-tuning
- Assumptions/dependencies: Access to historical ad performance data; platform policy compliance; A/B testing at product-level; attribute correctness
Internationalization playbooks for global platforms and brands
- Sector: Globalization, Localization
- Tool/workflow: Templates and tools for query refinement, translation, description generation, and service agent enablement across markets
- Use case: Rapid expansion into new language regions without proportional staffing increases
- Expected impact: Conversion uplift via reduced linguistic/cultural frictions
- Assumptions/dependencies: Cultural nuance; ongoing evaluation by local teams; error-handling protocols
Academic and industry A/B experimentation frameworks focused on revenue-based productivity
- Sector: Academia, Software/Analytics
- Tool/workflow: Open-source or internal libraries to run randomized, workflow-level experiments with revenue outcomes under fixed input conditions
- Use case: Replicate the paper’s growth-accounting logic to attribute gains to TFP
- Expected impact: Better causal identification of GenAI value; generalizable learnings beyond worker-level pilots
- Assumptions/dependencies: Data quality; price/cost stability; non-rival model use; minimal labor/capital displacement during tests
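An experimentation framework of the kind listed above usually starts from a power calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions to estimate how many users each arm needs to detect a given relative lift in conversion. The 2% baseline conversion rate is a hypothetical value, and the defaults for `alpha` and `power` are conventional choices rather than figures from the paper.

```python
import math
from statistics import NormalDist


def users_per_arm(
    baseline_rate: float,
    relative_lift: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Approximate sample size per arm for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances in the two arms.
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical 2% baseline conversion; lifts of the size reported for
# product descriptions (1.27%) and the pre-sale chatbot (21.7%).
for lift in (0.0127, 0.217):
    print(f"{lift:.2%} lift -> {users_per_arm(0.02, lift):,} users per arm")
```

Under this assumed baseline, detecting a lift near 1% takes several million users per arm, which helps explain why experiments at this scale are needed to measure the smaller effects.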

Long-Term Applications

The following applications require further research, scaling, domain-specific fine-tuning, or ecosystem development before broad deployment.

End-to-end agentic commerce orchestrating the full customer journey
- Sector: Retail/e-commerce, Software
- Tool/product: Coordinated agents for search intent refinement, dynamic description generation, pre/post-sale support, promotions
- Value: Compound productivity gains via workflow complementarities; persistent CX improvement
- Dependencies: Reliable, controllable agents; standardized APIs; cross-workflow data sharing; real-time guardrails
Cross-industry standards for revenue-based TFP measurement in digital firms
- Sector: Academia, Policy, Industry standards
- Tool/product: Shared measurement protocols, benchmarks, and reporting templates linking output changes to TFP under fixed inputs
- Value: Comparable assessments across firms and sectors; accountability for AI ROI
- Dependencies: Privacy-preserving data access; regulator/consortium coordination; clear assumptions (prices, factor shares)
Consumer content transparency and claim verification rules for AI-generated pages
- Sector: Policy/Regulation, Consumer Protection
- Tool/product: Labeling standards; automated fact-checking for product claims; audit trails for generated content
- Value: Reduce misinformation, safeguard consumer trust, prevent deceptive advertising
- Dependencies: Regulatory harmonization; platform compliance; scalable verification pipelines
Public programs to accelerate GenAI adoption among SMEs and micro-sellers
- Sector: Economic development, Policy
- Tool/product: Vouchers, cloud credits, training, plug-and-play templates for descriptions, chargeback defense, service bots
- Value: Reduce the digital divide; build on the disproportionately larger gains observed for smaller and newer sellers
- Dependencies: Training and digital literacy; local language support; easy onboarding and support
Privacy-preserving personalization for CRM and search
- Sector: Advertising/CRM, Data Privacy
- Tool/product: Federated learning, on-device inference, differential privacy for targeting and message generation
- Value: Maintain personalization benefits while meeting privacy expectations and regulations
- Dependencies: Mobile/device capabilities; consent frameworks; model optimization for edge deployment
Multi-modal product content generation (images, video, AR/3D) to reduce information asymmetry
- Sector: Retail/e-commerce, Content Tech
- Tool/product: Generators and validators for rich media product pages; auto-summarized specs aligned with visuals
- Value: Higher consumer confidence and conversion; stronger effects than text alone for certain categories
- Dependencies: Category-specific datasets; quality assurance; compute budgets; UX integration
Equilibrium and competitive dynamics of GenAI adoption in marketplaces
- Sector: Academia, Strategy
- Tool/product: Structural models and longitudinal experiments to estimate net industry gains vs. competition-driven offsets
- Value: Inform strategic positioning and regulatory oversight
- Dependencies: Multi-period data; market-level analytics; cross-platform observations
Automated payments dispute resolution across banks/card networks
- Sector: Finance, RegTech
- Tool/product: Standardized AI pipelines that negotiate, compile evidence, and submit dispute packages across payment rails
- Value: Reduced resolution time and improved outcomes at scale
- Dependencies: Industry standards; APIs to issuers/acquirers; auditability; legal safeguards
Workforce redesign and re-skilling for AI-enabled service operations
- Sector: Policy/Labor, HR
- Tool/product: Training curricula, task reallocation, human-in-the-loop oversight roles
- Value: Smooth integration without job displacement spikes; better quality assurance
- Dependencies: Funding; curriculum development; organizational change management
Energy and sustainability governance for GenAI usage
- Sector: Energy/ESG, Operations
- Tool/product: Metering, reporting, and optimizers for LLM call volumes and model sizes; “green AI” practices
- Value: Control energy costs and carbon footprint as API calls scale (noted 20x increase in paper’s setting)
- Dependencies: Tooling integration; sustainability KPIs; executive buy-in
Cross-border compliance automation beyond chargebacks (customs, tax, documentation)
- Sector: Logistics/Compliance, RegTech
- Tool/product: AI agents that generate and verify documentation across jurisdictions
- Value: Reduced frictions in international trade; fewer delays and errors
- Dependencies: Access to regulatory corpora; continuous updates; validation by human experts
Open-source toolkits for workflow-level A/B testing of GenAI
- Sector: Academia/Software
- Tool/product: Libraries and scripts to run randomized experiments with revenue outcomes and heterogeneity analysis
- Value: Democratize rigorous evaluation; accelerate cumulative knowledge
- Dependencies: Community leadership; sample size; standardized data schemas
RL-driven optimization of push messaging to minimize fatigue and maximize utility
- Sector: Advertising/CRM
- Tool/product: Contextual bandits/RL for content, timing, and frequency decisions
- Value: Potential gains where naive scaling showed limited effects; protect CX
- Dependencies: Well-defined rewards (conversion, opt-outs, spam complaints); offline evaluation to prevent production risks
Fairness and inclusion auditing for GenAI-assisted marketplaces
- Sector: Policy/Industry governance
- Tool/product: Bias monitors ensuring uplift allocation doesn’t disadvantage certain seller/consumer groups
- Value: Preserve inclusivity benefits documented in heterogeneity analyses
- Dependencies: Labeled data; stakeholder oversight; transparent reporting
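
For the push-messaging item above, a bandit loop might look like the sketch below. It is a deliberately simplified, non-contextual Beta-Bernoulli Thompson sampler rather than the contextual bandit or full RL setup the item mentions, and it is not something the paper implements. The class name, the simulated conversion rates, and the conversion-only reward are assumptions; a production reward would also need to penalize opt-outs and spam complaints.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of message variants."""

    def __init__(self, n_arms, seed=0):
        self.successes = [1] * n_arms  # Beta prior: alpha = 1 for every arm
        self.failures = [1] * n_arms   # Beta prior: beta = 1 for every arm
        self.rng = random.Random(seed)

    def choose(self):
        # Draw one plausible conversion rate per arm and play the best draw.
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.successes, self.failures)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, converted):
        # Binary reward only (hypothetical): 1 if the push led to a purchase.
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, n_rounds=3000, seed=0):
    """Run the sampler against simulated users; returns pulls per arm."""
    sampler = ThompsonSampler(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        arm = sampler.choose()
        pulls[arm] += 1
        sampler.update(arm, env.random() < true_rates[arm])
    return pulls


if __name__ == "__main__":
    # Hypothetical conversion rates for three message variants.
    print(simulate([0.02, 0.05, 0.20]))
```

Traffic shifts toward the better-performing variant as evidence accumulates. Running such a loop on logged data first is one way to meet the offline-evaluation dependency listed in the item.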

Notes on general assumptions and dependencies across applications

Fixed inputs and prices: The paper’s productivity inference relies on constant labor/capital and fixed prices; deviations (e.g., staffing changes, dynamic pricing) complicate attribution.
Model non-rivalry and marginal cost: Gains assume negligible marginal cost per additional inference at platform scale; smaller firms may face higher per-call costs.
Domain fine-tuning: Applications like ad titles and push messages showed weak effects absent tuning; performance depends on domain-specific data and objectives.
Data quality and integration: Success requires clean product catalogs, accurate attributes, and robust pipelines into search, service, and advertising systems.
Safety, compliance, and trust: Guardrails, claim verification, and auditability are essential to sustain consumer trust and meet legal requirements.
Competition and equilibrium effects: Industry-wide adoption may compress gains over time; long-term ROI depends on complementary workflows and strategic differentiation.
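
The fixed-inputs-and-prices note can be written out with the Cobb-Douglas form quoted in the glossary. The derivation below is a sketch: $A$ (TFP) and $\alpha$ come from the quoted passages, while $Y$ (output), $K$ (capital), $L$ (labor), $P$ (price), and $R$ (revenue) are notation assumed here for illustration.

```latex
\begin{align}
Y &= A\,K^{\alpha}L^{1-\alpha} \\
\Delta\ln Y &= \Delta\ln A + \alpha\,\Delta\ln K + (1-\alpha)\,\Delta\ln L \\
\Delta\ln K = \Delta\ln L = 0 \;&\Rightarrow\; \Delta\ln A = \Delta\ln Y \\
R = P\,Y,\quad \Delta\ln P = 0 \;&\Rightarrow\; \Delta\ln R = \Delta\ln Y = \Delta\ln A
\end{align}
```

If staffing changes or dynamic pricing make $\Delta\ln K$, $\Delta\ln L$, or $\Delta\ln P$ nonzero, the measured revenue change no longer equals $\Delta\ln A$. This is the attribution problem the note describes.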


Glossary

Capital deepening: An increase in the amount of capital per worker, often contributing to productivity growth distinct from efficiency improvements. "No capital deepening: Although the platform trains and deploys its own GenAI models, these exhibit strong non-rivalry: once developed, they can be applied across millions of product listings at negligible marginal cost."
Capital utilization: The degree to which capital assets are used, holding the quantity of capital constant. "Constant utilization: Capital utilization and effective labor effort do not vary, so measured input quantities remain valid."
Chargeback: A payment reversal initiated by the consumer’s bank, typically after disputing a transaction. "As a result, more than half of chargeback disputes on the focal platform are left unaddressed by sellers."
Click-through rate (CTR): The percentage of impressions that lead to clicks, commonly used to measure ad engagement. "higher conversion rates (extensive margins) and, where applicable, click-through rates, but not with higher average cart values (intensive margins)."
Cobb–Douglas production function: A standard economic model of output as a function of capital and labor with constant elasticities. "Assume that output is produced according to a Cobb–Douglas production function"
Cold start problem: A situation where recommendation or reputation systems underperform due to insufficient historical data. "“cold start” problems \citep{Bai2022}"
Complementarities: Interactions where the joint adoption of multiple practices or technologies amplifies total gains. "whether complementarities across workflows amplify these gains"
Conversion rate: The fraction of visitors or users who complete a desired action, such as making a purchase. "we observe significantly higher conversion rates and no effect on average cart values"
Demand-side value creation: Productivity gains arising from increased demand or improved customer experience rather than reduced input costs. "enhance productivity through demand-side value creation."
Equilibrium forces: Market-wide adjustments and competitive responses that determine long-run outcomes of technological adoption. "The long-run impact will ultimately depend on equilibrium forces—whether complementarities across workflows amplify these gains or industry-wide adoption offsets them through intensified competition."
Extensive margins: Changes in outcomes due to more participants taking an action (e.g., more buyers purchasing), as opposed to changing amounts per participant. "higher conversion rates (extensive margins) and, where applicable, click-through rates, but not with higher average cart values (intensive margins)."
Factor shares: The proportions of output or income allocated to capital and labor. "Stable factor shares: Input cost shares ($\alpha$, $1-\alpha$) remain constant during the adoption period."
General-purpose technologies (GPTs): Broad-impact innovations that enable widespread productivity improvements across sectors. "past general-purpose technologies (e.g., electrification or the internet)"
Growth-accounting decomposition: A framework that breaks down output growth into contributions from TFP, capital, and labor. "yields the standard growth-accounting decomposition:"
Information asymmetry: A market condition where one side (e.g., buyers) has less information than the other (e.g., sellers), affecting decisions. "A key friction in online marketplaces is information asymmetry: buyers often cannot directly verify product quality or seller reliability prior to purchase"
Intensive margins: Changes in the amount per participant (e.g., spending per buyer) rather than the number of participants. "higher conversion rates (extensive margins) and, where applicable, click-through rates, but not with higher average cart values (intensive margins)."
JEL codes: Standardized classification codes used to categorize economics research topics. "JEL codes: C93, D24, L81, M31, O3"
Linear additivity: An assumption that effects from different sources sum linearly without interaction terms. "annualizing workflow-specific gains and assuming linear additivity—suggest that these GenAI applications generate an annual incremental value of approximately \$4.6–\$5 per consumer."
Markups: The difference between price and marginal cost, reflecting pricing power or margins. "Output prices are fixed, so revenue growth reflects real output growth rather than changes in prices or markups."
Market frictions: Impediments to efficient matching or transactions, such as search costs or information gaps. "consistent with GenAI reducing frictions in the marketplace and improving consumer experience."
Non-rivalry: A property of goods (often digital or knowledge goods) where one user’s consumption does not diminish others’ ability to consume. "these exhibit strong non-rivalry: once developed, they can be applied across millions of product listings at negligible marginal cost."
Personalized recommendations: Algorithmic suggestions tailored to individual users to improve discovery and engagement. "removing personalized recommendations discourages consumer search and purchasing, especially for small sellers and niche consumers."
Randomized field experiments: Real-world experiments with random assignment to measure causal effects in operational settings. "Each application was evaluated through randomized field experiments"
Revenue-based productivity: Productivity measured via revenue outcomes, holding inputs and prices constant to infer output changes. "measurable gains in aggregate or firm-level revenue-based productivity attributable to GenAI"
Search costs: The effort and resources required for consumers to find relevant information or products. "search costs, the effort and resources required to locate information"
Search frictions: Obstacles that reduce the effectiveness of search processes in matching consumers with products. "To mitigate search frictions, digital platforms have invested heavily in technology and design innovations"
Search targetability: The effectiveness of search systems in retrieving the most relevant items for a query. "search targetability, the effectiveness of search engines in retrieving the most relevant products."
Solow growth model: A neoclassical framework where output growth is driven by capital, labor, and total factor productivity. "We model the impact of GenAI adoption on firm productivity through the lens of the standard Solow growth model"
Total factor productivity (TFP): The efficiency term in a production function capturing output not explained by measured inputs. "and $A$ is total factor productivity (TFP)."
Treatment effect heterogeneity: Variation in the effects of an intervention across different subgroups or contexts. "and leverage consumer, seller, and product characteristics to study treatment effect heterogeneity."
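
The extensive and intensive margins defined above can be separated with a simple identity: revenue per user equals conversion rate times average cart value, so the log change in revenue splits additively into the two margins. The sketch below illustrates this. The function name, the dictionary keys, and the numbers are hypothetical and are not taken from the paper.

```python
import math


def decompose_revenue_lift(control, treatment):
    """Split the log change in revenue per user into two margins.

    Each arm is a dict with hypothetical keys "users", "orders", "revenue".
    revenue / users = (orders / users) * (revenue / orders)
                    = conversion rate * average cart value
    """
    def margins(arm):
        conversion = arm["orders"] / arm["users"]
        cart_value = arm["revenue"] / arm["orders"]
        return conversion, cart_value

    conv_c, cart_c = margins(control)
    conv_t, cart_t = margins(treatment)
    extensive = math.log(conv_t / conv_c)  # more users buying
    intensive = math.log(cart_t / cart_c)  # more spent per order
    return {"extensive": extensive,
            "intensive": intensive,
            "total": extensive + intensive}


if __name__ == "__main__":
    # Hypothetical arms: conversion rises, average cart value is unchanged.
    control = {"users": 1000, "orders": 100, "revenue": 2000.0}
    treatment = {"users": 1000, "orders": 110, "revenue": 2200.0}
    print(decompose_revenue_lift(control, treatment))
```

A result where the extensive term carries the whole lift and the intensive term is near zero matches the pattern quoted in the glossary: higher conversion rates with no effect on average cart values.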


