The Shift to Agentic AI: Evidence from Codex
Abstract: We analyze usage data from OpenAI's Codex tool to present large-scale evidence of how agentic AI technology, which can take actions on a user's behalf, changes how people work. We use an automated, privacy-protecting pipeline to contrast usage across three populations: external personal-account users, external organizational-account users, and workers within OpenAI. We find that agentic AI usage is growing rapidly: the number of active users has grown more than fivefold in the first half of 2026, with the most rapid increase occurring outside the initial audience of software developers. Uptake is uneven: within OpenAI, Codex usage is nearly universal and has largely replaced business usage of ChatGPT. We document a similar shift to agentic tooling outside OpenAI, particularly within organizations, although external adoption remains lower and more uneven. In addition to headline usage figures, we observe measures of sophistication, and find that a growing number of users have used Codex to change their workflows substantially. More than 10% of users manage three or more concurrent Codex agents at some point each week, and 26.6% use skills, which allow users to share instructions for complex workflows. Alongside these changes in usage practices, request complexity has increased: since the start of the year, the share of individual Codex users who submit at least one request for a task estimated to require more than eight hours for an experienced human to complete has increased nearly tenfold. Concurrently, output has grown rapidly: in June 2026, the median OpenAI employee in a legal role generated 13 times more monthly output tokens across Codex and ChatGPT than they did in November 2025, while the median researcher generated more than 50 times as many. We conclude by discussing the implications of these patterns for productivity, job reorganization, and workforce restructuring.
Explain it Like I'm 14
What is this paper about?
This paper looks at a new kind of AI called “agentic AI.” Unlike a normal chatbot that mainly talks with you, agentic AI can actually do things for you: open files, run tools, write and edit code, draft documents, analyze data, and more. The authors study how people are starting to use OpenAI’s agentic tool, Codex, and how that changes the way they work.
The big questions the authors asked
- Who is using agentic AI, and how fast is it spreading?
- What kinds of tasks are people giving to these AI agents?
- How are people organizing their work around agents (for example, running multiple tasks at once)?
- What does this shift mean for productivity and jobs?
How did they study it?
The researchers analyzed real-world, privacy-protected usage data from Codex. They compared three groups:
- Individual users (people with personal accounts)
- Organizational users (people using Codex through their company)
- OpenAI employees (who are very familiar with advanced AI and had strong incentives and support to try Codex)
To understand what people were doing, they used automated systems (think of them as smart sorters) that:
- Labeled tasks (e.g., coding, writing, data analysis) based on what users asked Codex to produce.
- Estimated task complexity by asking, “How long would this take an experienced human without AI?” (for example, under 1 hour, 1–8 hours, or more than 8 hours, roughly a full working day or longer).
- Measured “output tokens,” which are like tiny pieces of text (similar to Lego bricks of language). Counting tokens is a way to measure how much work the AI produced.
- Tracked how many agents people ran at the same time (parallel “threads”) and how long agents worked on a user’s behalf.
- Noted the use of “skills” (reusable instructions for complex workflows), which is like saving a recipe so you can repeat a multi-step process easily.
All of this was done without reading private messages: the pipeline produced aggregated, anonymized statistics.
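The concurrency measure in the list above (how many agents a user runs at the same time) can be illustrated with a sweep over turn start and end times. This is a minimal sketch, not the paper's pipeline: the `Turn` record, its field names, and the rule that back-to-back turns do not count as overlapping are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One agent turn; field names are hypothetical, not from the paper."""
    thread_id: str
    start: float  # seconds since an arbitrary reference time
    end: float


def max_concurrent_turns(turns: list[Turn]) -> int:
    """Return the peak number of turns running at the same moment."""
    events = []
    for t in turns:
        events.append((t.start, 1))   # a turn begins
        events.append((t.end, -1))    # a turn finishes
    # Sort by time; at equal times, ends (-1) sort before starts (+1), so a
    # turn that starts exactly when another ends is not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Made-up turns for one user across three threads.
turns = [
    Turn("a", 0, 10),
    Turn("b", 5, 15),
    Turn("c", 8, 12),
    Turn("a", 10, 20),
]
print(max_concurrent_turns(turns))  # 3
```

A weekly statistic such as "ran three or more agents at some point this week" would then be a check of whether this peak reaches 3 within the week's turns.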
What did they find?
1) Adoption is rapid but uneven
- Codex usage grew very quickly in early 2026 (more than fivefold growth in active users).
- Inside OpenAI, use is almost universal and has largely replaced standard chat use for work.
- In companies outside OpenAI, agentic AI is spreading and taking a big share of work, but it’s less widespread than inside OpenAI.
- Among individuals on personal accounts, adoption is still early and patchy.
- Measuring “how much work the AI did” (output tokens) shows an even sharper shift than just counting users—people who adopt Codex often use it heavily.
Why this matters: It shows we’re moving beyond chatting with AI and toward delegating real work—especially where organizations have the right tools, access, and training.
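The difference between counting users and counting output can be made concrete with a small calculation. The sketch below assumes a hypothetical per-user table of output tokens by product and computes a pooled user share and a pooled token share; the paper may aggregate differently, and the numbers are invented for illustration.

```python
def adoption_margins(usage: dict[str, dict[str, int]]) -> tuple[float, float]:
    """usage maps user id -> {"codex": tokens, "chatgpt": tokens}.

    Returns (user_share, token_share):
      user_share  = share of active users with any Codex output tokens
      token_share = share of all output tokens produced through Codex
    """
    # Keep only users with some output in either product.
    active = {
        user: tokens
        for user, tokens in usage.items()
        if tokens.get("codex", 0) + tokens.get("chatgpt", 0) > 0
    }
    if not active:
        return 0.0, 0.0
    codex_users = sum(1 for t in active.values() if t.get("codex", 0) > 0)
    codex_tokens = sum(t.get("codex", 0) for t in active.values())
    all_tokens = sum(
        t.get("codex", 0) + t.get("chatgpt", 0) for t in active.values()
    )
    return codex_users / len(active), codex_tokens / all_tokens


# Made-up numbers, for illustration only.
usage = {
    "u1": {"codex": 9000, "chatgpt": 1000},
    "u2": {"codex": 0, "chatgpt": 2000},
    "u3": {"codex": 0, "chatgpt": 3000},
    "u4": {"codex": 0, "chatgpt": 0},  # inactive, excluded
}
user_share, token_share = adoption_margins(usage)
print(f"user share={user_share:.2f}, token share={token_share:.2f}")
```

In this toy data one of three active users has adopted Codex, yet Codex accounts for 60% of output tokens, which is the pattern the summary describes when it says adopters tend to use the tool heavily.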
2) People are delegating real production work, not just asking for advice
- Users tell Codex to do hands-on tasks: debug code, refactor programs, validate changes, configure apps, draft documents, and analyze data.
- Over time, tasks got more complex. Many more users now ask Codex to handle jobs that would take an experienced human many hours or even more than a day.
Why this matters: Agentic AI is shifting from “answering questions” to “doing jobs,” which can change how people plan and execute their work.
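The complexity statistic from the abstract (the share of users who submit at least one request estimated at more than eight hours of human work) can be sketched as follows. The input format and the sample rows are hypothetical; only the eight-hour threshold comes from the paper's abstract, and the time estimates themselves would come from a classifier that is not reproduced here.

```python
from collections import defaultdict

LONG_TASK_HOURS = 8.0  # threshold stated in the abstract


def long_task_user_share(requests):
    """requests: iterable of (period, user_id, estimated_human_hours).

    Returns {period: share of that period's users with at least one
    request whose estimated human time exceeds LONG_TASK_HOURS}.
    """
    users = defaultdict(set)
    long_users = defaultdict(set)
    for period, user_id, hours in requests:
        users[period].add(user_id)
        if hours > LONG_TASK_HOURS:
            long_users[period].add(user_id)
    return {p: len(long_users[p]) / len(users[p]) for p in users}


# Made-up requests, for illustration only.
requests = [
    ("2026-01", "u1", 0.5),
    ("2026-01", "u2", 2.0),
    ("2026-01", "u3", 1.0),
    ("2026-01", "u4", 12.0),
    ("2026-06", "u1", 10.0),
    ("2026-06", "u2", 0.5),
    ("2026-06", "u2", 9.0),
    ("2026-06", "u3", 3.0),
    ("2026-06", "u4", 24.0),
]
shares = long_task_user_share(requests)
print(shares)
```

Counting users rather than requests means a user with many short requests and one long one still counts once, which matches the "at least one request" wording.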
3) It started with coding, but grows broader as adoption deepens
- The biggest chunk of use is still software-related: writing code, understanding large codebases, testing, managing applications, and keeping systems running.
- Where adoption is deepest (like inside OpenAI), Codex is also used for research, planning, communication, data work, recruiting, sales, and more.
Why this matters: Agentic AI can plug into the full software lifecycle and, when teams get comfortable, spreads into general knowledge work.
4) Power users run large, repeatable, and parallel workflows
- Many users now run multiple agents at the same time (think: managing a small team of AI helpers in parallel). More than 10% of users run three or more agents concurrently at least once a week.
- Heavy users rely on longer-running tasks, reusable “skills,” and complex chains of steps.
- Inside OpenAI, this looks like a new way of working: delegate, monitor, review, and coordinate several agents at once—less “type a request, get an answer,” more “manage a mini factory.”
Why this matters: When people organize work this way, AI isn’t just a smart assistant; it becomes a system you manage—like a set of digital coworkers.
Extra signs of change
- Task complexity jumped: the share of users attempting multi-hour and full-day tasks grew dramatically.
- Output exploded: for example, by June 2026 the median OpenAI employee in a legal role produced 13 times more monthly AI output tokens than in November 2025; median researchers produced over 50 times more.
Why is this important?
This shift could reshape how work is organized:
- Productivity: If agents can take on bigger, longer tasks, people can get more done—especially when they run several agents in parallel.
- Job roles: Work may shift from “doing every step yourself” to “delegating, supervising, and verifying.” Skills like planning, reviewing, and domain expertise (knowing what “good” looks like) become more valuable.
- Organizations: Big gains come when companies redesign workflows around agents—giving access to the right files and tools, setting up review steps, training people, and sharing best practices. That’s why OpenAI (with strong support and training) shifted faster than most.
In simple terms: Think of agentic AI as a team of reliable digital helpers. The people and companies that learn to plan, assign, and check their work can move faster and do more.
Bottom line
The paper shows a clear move from chat-style AI to agentic AI that actually does work. Adoption is booming where teams have support and the right setup, tasks are getting more complex, and power users are managing multiple agents at once. If organizations redesign their processes to fit this new reality—teaching people how to delegate, review, and coordinate—agentic AI could bring large and lasting productivity gains and change how many jobs are done.
Knowledge Gaps
Unresolved Knowledge Gaps, Limitations, and Open Questions
Below is a focused list of what remains missing, uncertain, or unexplored in the paper, framed to guide actionable future research:
- Causal impact on productivity: No causal identification of how agentic AI affects task time, throughput, or quality (e.g., randomized access, staggered rollout, or instrumented feature gating to estimate treatment effects).
- Output tokens vs. value: Reliance on token counts as a proxy for work/output lacks validation against business value, human time saved, or quality-adjusted deliverables.
- Quality and error rates: Absence of measures of correctness, rework, defect rates, or downstream incidents resulting from agent-generated artifacts (e.g., code bugs, legal drafting errors).
- Completion and success metrics: No tracking of whether agentic tasks reach successful completion, require human takeover, or fail silently; lack of “task success” KPIs for workflows.
- Human oversight costs: Unmeasured verification, supervision, and coordination burden imposed on users managing agents; no time-use or attention-cost accounting.
- Learning curves and skill acquisition: No longitudinal analysis of user learning, ramp-up dynamics, or which training interventions accelerate effective agentic use.
- Confounding improvements: Inability to disentangle effects of model upgrades, UI/UX changes, pricing, and organizational initiatives on adoption and intensity.
- External validity beyond OpenAI: OpenAI’s near-frictionless internal environment is acknowledged as atypical, but the paper does not quantify which complements (e.g., permissions, repo access, internal skills libraries) are necessary for replication in typical firms.
- Organizational heterogeneity: Limited evidence on differences across industries, geographies, firm size/maturity, regulatory environments, and security postures; unclear barriers to adoption in less tech-centric contexts.
- Job redesign and role evolution: No systematic measurement of how responsibilities, team structures, and managerial layers change when agentic workflows scale.
- Seniority dynamics: While task mix by seniority is described, the causal implications for delegation patterns (e.g., manager-as-orchestrator vs. IC-as-implementer) remain untested.
- Verification pipelines: Lack of concrete designs/benchmarks for effective review, audit trails, or automated guardrails that reduce supervisory burden without compromising safety.
- Safety, compliance, and data governance: No analysis of data leakage risks, policy violations, or compliance incidents arising from agent tool access and file operations.
- Economic tradeoffs and costs: No cost accounting for agent runtime, parallelism, or long-running tasks; missing cost-benefit analyses at user, team, and org levels.
- Task taxonomy fidelity: Task labels are assigned from the initial request rather than the full action graph; execution-phase actions and outcomes may diverge from the labeled intent.
- Classifier validity and bias: Automated classifiers (persona, job title, task type, complexity) have limited validation; potential misclassification across roles, domains, and languages is unexplored.
- Complexity estimation scope: Task complexity is measured on a 0.1% sample of Individual users who opted in; no analogous estimates for organizational or OpenAI users, and no external ground-truth benchmarking.
- Measuring actual “agency”: Tool invocation and thread structure are imperfect proxies for autonomous action; the paper does not specify or validate stronger agency metrics (e.g., autonomous toolchains, file diffs, multi-step execution without user input).
- Concurrency interpretation: Parallel turns are measured, but cognitive load, interruption costs, coordination strategies, and the marginal value of additional concurrent agents are unmeasured.
- Workflow persistence and reuse: Skill adoption is cited but not deeply analyzed; no measurement of retention, standardization, or performance gains from reusable skills/plugins over time.
- End-to-end software outcomes: For software tasks, there is no linkage to repo-level outcomes (commit acceptance, rollback frequency, CI/CD failures, incident rates, mean time to restore).
- Non-software work outcomes: For legal, sales, recruiting, and operations tasks, there are no domain-specific outcome metrics (e.g., contract accuracy, pipeline conversion, time-to-fill, SLA adherence).
- Human-AI task boundary: Unclear which task segments (planning, implementation, validation) benefit most from delegation; no mapping of “automation adjacency” or chainable segments that drive the largest gains.
- Inequality and distributional effects: No analysis of whether agentic AI widens performance dispersion across workers or firms, or how benefits accrue by skill level and role.
- Retention and cohort dynamics: Adoption is presented cross-sectionally; missing cohort analyses of retention, escalation from conversational to agentic use, and saturation points.
- Comparative benchmarks: No head-to-head, task-matched comparisons between Codex and conversational ChatGPT (or other agentic products) to quantify differential value on identical tasks.
- Access and permissions constraints: The role of system access (files, repos, SaaS integrations) in enabling or constraining agentic workflows is asserted but not measured or experimentally varied.
- Error recovery and escalation: No telemetry on how users detect, diagnose, and correct agent failures; absence of patterns or tools that reduce recovery cost.
- Governance of shared skills: Unexplored questions about versioning, ownership, provenance, and quality assurance for shared skills within and across organizations.
- Long-run dynamics: Short observation window (late 2025 to mid-2026) limits conclusions about durability, saturation, or post-novelty usage patterns.
- Human factors and UX: No evaluation of the cognitive ergonomics of agent orchestration (thread design, notifications, dashboards) and their impact on outcomes.
- Privacy-preserving analysis limits: The privacy pipeline restricts content inspection, which may systematically bias classification and task inference; alternative methods (e.g., secure enclaves, federated analytics) are not explored.
- Policy and regulatory implications: The organizational prerequisites and controls needed to meet sectoral regulations (e.g., finance, health, public sector) are not examined.
- Spillovers and complementarities: Interactions between agentic AI and adjacent tools (RPA, BI platforms, ticketing, design systems) and the resultant workflow synergies or conflicts are not measured.
Practical Applications
Overview
Based on the paper’s evidence about rapid but uneven diffusion of agentic AI (OpenAI Codex) across individuals, organizations, and OpenAI itself—and its documented shifts in delegation, task complexity, parallelization, and workflow design—the following practical applications emerge for industry, academia, policymakers, and daily life. Where relevant, sector links, prospective tools/products/workflows, and feasibility assumptions/dependencies are included.
Immediate Applications
These can be deployed now with today’s agentic capabilities and standard enterprise IT practices.
Industry
- Software engineering life cycle automation (software)
  - Use agents for code implementation, debugging, refactoring, code understanding, validation, engineering operations, and application management; integrate into CI/CD and repo workflows to generate/validate PRs, maintain documentation, and manage environments.
  - Tools/workflows: agentic IDEs; CI bots that run tests/lint/refactor; repo-aware “skills” libraries for repeatable runbooks; threaded agents for parallel tasks (e.g., refactor + doc + tests).
  - Dependencies: repo and ticketing access (GitHub/GitLab/Jira); permissioning and audit logs; human-in-the-loop code review; model/tool reliability; cost/latency budgeting.
- DevOps and IT operations runbooks (software, IT)
  - Encode common operational runbooks as reusable “skills” to standardize configuration, deployment, and incident triage; enable long-running agents for routine checks with human approval gates.
  - Tools/workflows: “Skills” repositories; approval workflows in chat/ITSM; agent action logs; recovery playbooks.
  - Dependencies: secure tool access; change management; rollback plans; observability integration.
- Knowledge artifact production at scale (legal, HR/recruiting, sales/marketing, product)
  - Draft and iterate memos, proposals, contracts, job descriptions, interview packets, sales collateral, and product requirement documents with agentic workflows that pull from internal files.
  - Tools/workflows: document agents tied to drive/wiki/CRM; templated skill packs; parallel thread drafting and SME review.
  - Dependencies: content governance; source-of-truth linking; PII redaction; review/approval policies.
- Data analysis and reporting (analytics, finance, operations)
  - Delegate spreadsheet transformation, EDA, charting, and recurring report generation to agents; use “skills” for repeatable pipelines; route higher-complexity requests to more capable runs.
  - Tools/workflows: analysis agents tethered to BI/warehouse; scheduled runs; thread-based revision cycles.
  - Dependencies: read-only access to data sources (initially); validation checklists; ACLs and privacy controls.
- Parallelized task management for power users (cross-function)
  - Adopt threaded, concurrent delegation (e.g., running 2–5 agents at once) for complex projects; monitor, review, and merge outputs rather than serial “ask–answer.”
  - Tools/workflows: agent dashboards; progress status summaries; concurrency guardrails (compute/priority).
  - Dependencies: user training on delegation/verification; clear escalation paths; compute quotas.
- Adoption analytics and governance (enterprise IT, risk)
  - Track shift from conversational to agentic interfaces using token-share metrics; establish autonomy tiers, tool-access policies, and auditability for agent actions.
  - Tools/workflows: usage dashboards; autonomy level catalog; action ledgers; exception monitoring.
  - Dependencies: logging standards; RBAC; data-retention policies; internal buy-in.
- Delegation and verification training (L&D, HR)
  - Teach employees to scope tasks, set acceptance criteria, design verification, and reuse “skills”; emphasize supervision, error handling, and parallel thread management.
  - Tools/workflows: internal playbooks; code/doc review checklists; “prompt-to-skill” templates.
  - Dependencies: time for upskilling; departmental champions; feedback loops.
Academia
- Privacy-preserving usage measurement and taxonomy research
  - Apply the paper’s task-taxonomy and persona/job-title classifiers to study diffusion and task mix without inspecting content; measure complexity, runtime, concurrency.
  - Tools/workflows: classifier prompts and labels; token-based intensity metrics; opt-in datasets.
  - Dependencies: IRB approvals; anonymization pipelines; institutional data-sharing agreements.
- Curricula for agent supervision and workflow design
  - Incorporate agentic task design, verification, and multi-agent orchestration into CS/IS/business courses and capstones.
  - Dependencies: access to agentic tooling; sandboxed repos/data; assessment rubrics focused on delegation.
Policy and Governance
- Enterprise guidance on agent autonomy, logging, and access
  - Issue internal standards for tool invocation, system access, and auditable action logs; define approval tiers by task risk.
  - Dependencies: CISO/legal approval; secure integration; workforce communication.
- Targeted upskilling support
  - Leverage evidence that productivity gains concentrate where complements (skills, processes) exist to fund organizational training in delegation/verification.
  - Dependencies: program funding; outcome metrics tied to adoption and quality.
Daily Life
- Personal productivity automation
  - Use agents for drafting resumes/letters, organizing notes, managing small projects, and learning tasks; encode personal “skills” (e.g., budgeting template updates, study plans).
  - Tools/workflows: file-linked agents; recurring checklist skills; parallel drafting + revision threads.
  - Dependencies: cautious file permissions; verification; awareness of privacy and data-sharing settings.
- Parallel microtasking with supervision
  - Run 2–3 concurrent threads for discrete tasks (e.g., travel plan + email draft + budget update) and review outputs.
  - Dependencies: user familiarity with threaded UI; time to validate outputs; tolerance for occasional error.
Long-Term Applications
These require additional research, scaling, integration, or organizational change; they follow from observed frontier usage (e.g., heavy concurrency, long-running tasks, complex delegation) and the paper’s emphasis on complements.
Industry
- Enterprise-wide multi-agent orchestration platforms (software, IT, cross-function)
  - Operate fleets of agents orchestrated by “supervisor” agents/humans, coordinating dozens of parallel threads across engineering, operations, and business functions.
  - Tools/products: AgentOps platforms (scheduling, retry, dependency graphs, SLAs); hierarchical agent architectures; cross-system connectors (ERP/CRM/ITSM).
  - Dependencies: robust verification pipelines; strong identity and fine-grained permissions; reliability/latency SLAs; cultural adoption.
- Delegation-first workflow redesign and new roles
  - Redesign jobs around delegation, verification, and coordination; formalize roles like “AI production manager” and “agent supervisor” overseeing throughput and quality.
  - Tools/workflows: delegation Kanban; automated acceptance tests; throughput/quality dashboards.
  - Dependencies: job architecture changes; incentives; performance evaluation aligned to oversight.
- Sector-specific scaled deployments
  - Healthcare: clinical documentation, prior authorization packets, and quality reporting, all agent-generated with clinician verification.
    - Dependencies: HIPAA/GDPR compliance; EHR integrations; medical QA; liability frameworks.
  - Education: large-scale tutoring, rubric-aligned grading assistance, courseware generation with instructor review.
    - Dependencies: academic integrity policies; LMS integrations; fairness/consistency checks.
  - Finance: reconciliations, regulatory reporting, audit preparation with immutable action ledgers.
    - Dependencies: model risk management; SOX-compliant logs; segregation of duties; data lineage.
  - Legal: e-discovery triage, contract lifecycle automation with redline explainability and approval workflows.
    - Dependencies: privilege protection; explainable diffs; firm-specific clause libraries.
  - HR/Recruiting: sourcing, screening artifact prep, structured interview packs, candidate comms.
    - Dependencies: bias mitigation; consent; ATS integrations; auditability.
  - Energy/Utilities: operations documentation, maintenance planning, grid optimization analyses.
    - Dependencies: secure OT/IT separation; systems models; fail-safe controls.
- Verification-at-scale and safety pipelines
  - Continuous verification, synthetic test generation, proof obligations for agent actions; automated “gatekeepers” for high-risk steps.
  - Tools/products: test harnesses for non-code artifacts; policy checks; red-teaming simulators.
  - Dependencies: domain test libraries; human adjudication; traceability.
- Skill marketplaces and interoperability standards
  - Internal/external marketplaces for reusable “skills” with versioning, permissions, and provenance; cross-platform skill standards.
  - Dependencies: vendor-neutral formats; signing/attestation; governance for updates and deprecation.
- Complexity-aware task routers and schedulers
  - Route tasks by estimated human-hours and risk level to appropriate agents/humans; schedule long-running jobs for off-peak compute.
  - Dependencies: accurate complexity estimation; queueing infrastructure; escalation rules.
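The complexity-aware router described in the last item above could look roughly like the following. This is an illustrative sketch only: the queue names, the risk levels, and the 1-hour and 8-hour cut-offs are hypothetical choices, not a design taken from the paper or from any existing product.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Task:
    name: str
    est_human_hours: float  # assumed to come from a complexity estimator
    risk: Risk


def route(task: Task) -> str:
    """Pick a queue for a task. Thresholds and queue names are hypothetical."""
    if task.risk is Risk.HIGH:
        # High-risk work goes to a person before any agent acts on it.
        return "human_review_first"
    if task.est_human_hours > 8:
        # Long jobs are deferred to off-peak compute.
        return "long_running_off_peak"
    if task.est_human_hours > 1:
        return "standard_agent"
    return "fast_agent"


tasks = [
    Task("fix typo in README", 0.2, Risk.LOW),
    Task("refactor billing module", 6.0, Risk.MEDIUM),
    Task("migrate test suite", 20.0, Risk.LOW),
    Task("rotate production credentials", 0.5, Risk.HIGH),
]
for task in tasks:
    print(f"{task.name}: {route(task)}")
```

Checking risk before size keeps a short but dangerous task from being sent straight to a fast agent, which is one way to express the escalation rules listed as a dependency.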
Academia
- Longitudinal studies on organizational complements and productivity
  - Measure how redesigned workflows (parallelization, skill reuse, supervision intensity) map to firm-level outcomes; replicate across industries.
  - Dependencies: multi-tenant data access; standardized metrics; cooperation from firms.
- Benchmarks and evaluation frameworks for agentic tasks
  - Create benchmarks for delegated multi-step tasks (beyond chat), including verification quality and supervision cost.
  - Dependencies: reproducible task suites; measurement standards; community adoption.
Policy and Governance
- Regulatory standards for agent action logging and accountability
  - Mandate actionable, tamper-evident logs for agent actions; define responsibility in delegated workflows (human-on-the-loop).
  - Dependencies: technical standards bodies; industry alignment; enforcement mechanisms.
- Data privacy and access frameworks for agentic execution
  - Update data-protection rules to cover agents that execute commands/read files; standardize consent and minimization for tool use.
  - Dependencies: legal clarity; certifiable controls; third-party audits.
- Workforce transition and reskilling programs at scale
  - Invest in large-scale training for agent supervision, verification, and domain expertise; support transitions as roles shift.
  - Dependencies: funding; credentialing pathways; outcome tracking.
- Interoperability for tool/skill ecosystems
  - Support open APIs and schemas so skills/agents interoperate across vendors and enterprise systems.
  - Dependencies: open standards; vendor participation; security reviews.
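One common way to make a log of agent actions tamper-evident, as the first item in this list calls for, is to chain entries by hash so that editing any earlier record invalidates every later one. The sketch below uses only Python's standard `hashlib` and `json` modules; the record fields (`agent`, `action`, `target`) are hypothetical, and a real deployment would also need to store the latest hash somewhere the log's writer cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], record: dict) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    )


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {"agent": "agent-1", "action": "read_file", "target": "report.csv"})
append_entry(log, {"agent": "agent-1", "action": "run_tests", "target": "repo"})
print(verify(log))  # True

log[0]["record"]["action"] = "delete_file"  # simulate tampering
print(verify(log))  # False
```

The chain makes silent edits detectable but does not prevent someone from rewriting the whole log, which is why the external anchor for the latest hash matters.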
Daily Life
- Personal multi-agent “household ops” (longer horizon)
  - Coordinated agents for finance management, home maintenance scheduling, learning plans, and trip logistics with shared context and calendars.
  - Dependencies: secure integrations (banks, utilities, calendars); household policy settings; robust fail-safes.
- Agent-mediated marketplaces and concierge services
  - Agents negotiate appointments, purchases, and services under user policies, with receipts and audit trails.
  - Dependencies: merchant APIs; identity/payment safeguards; dispute resolution frameworks.
Notes on Feasibility and Dependencies
- Organizational complements are critical: the paper’s evidence shows adoption is deepest where training, access, and review processes exist (e.g., within OpenAI).
- Secure integration is a gating factor: value depends on access to files, repos, and tools with proper permissions and auditability.
- Verification and supervision are central: as complexity and autonomy increase, human review and automated checks determine realized productivity gains.
- Workflow adjacency and chaining matter: productivity is highest when agents can execute contiguous task chains; fragmented processes limit gains.
- Heterogeneous adoption persists: non-technical roles can benefit, but require tailored skills, templates, and training.
- Cost, latency, and reliability constraints will shape parallelization and long-running agent use until infrastructure and models improve.
Glossary
- Agentic AI: AI systems that can autonomously take actions on a user’s behalf, beyond simple conversation. "agentic AI technology, which can take actions on a user's behalf,"
- Application management: Tasks related to configuring, operating, and maintaining software applications. "Engineering operations, code implementation, code understanding, application management, and code validation account for a large share of Codex activity across groups."
- Business-function workflows: Delegated processes tied to specific business domains (e.g., sales, recruiting, marketing). "as well as broader knowledge-work categories such as data analysis, research, knowledge artifacts, collaboration, and business-function workflows."
- Code implementation: Creating or modifying program code to add features or fix issues. "including code implementation, code understanding, code validation, engineering operations, and application management."
- Code understanding: Analyzing existing code to comprehend behavior, structure, or dependencies. "including code implementation, code understanding, code validation, engineering operations, and application management."
- Code validation: Verifying that code changes are correct, robust, and meet requirements (e.g., via tests, checks). "including code implementation, code understanding, code validation, engineering operations, and application management."
- Concurrency: Running multiple AI tasks at the same time across different threads or agents. "Concurrency measures whether users run multiple Codex turns at the same time."
- Delegated production: Having AI carry out concrete work tasks end-to-end rather than merely advising. "Codex use is strongly oriented toward delegated production."
- Delegated workflow: A sequence of steps handed off to AI to execute autonomously toward a user-defined goal. "the relevant unit of analysis is a delegated workflow rather than a conversation."
- Electrification: Historical shift to electric power used as an analogy for reorganizing production around new technology. "In the early stages of electrification, many factories replaced centralized steam engines with centralized electric motors while preserving existing factory layouts and work patterns."
- Engineering operations: Operational tasks supporting software engineering (e.g., CI/CD, environment setup, repo management). "including code implementation, code understanding, code validation, engineering operations, and application management."
- Extensive margin: Whether users adopt or use a tool at all, irrespective of intensity. "Panel A shows the extensive margin: whether active users of either product use Codex at all."
- General-purpose technologies: Broad innovations that enable widespread changes in production and productivity. "the literature on general-purpose technologies suggests that the largest productivity gains often arise when firms reorganize production around the new technology rather than merely substitute it into existing workflows."
- Human–AI collaboration: Joint work where humans and AI systems contribute complementary capabilities. "increased demand and skill complexity in jobs involving human–AI collaboration."
- Intensive margin: How much a tool is used among adopters, often measured by output share or volume. "Panel B shows the intensive margin: the share of output tokens produced through Codex rather than ChatGPT."
- Knowledge artifacts: Written or structured outputs that codify knowledge (e.g., docs, specs, reports). "Inside OpenAI, across developer and non-developer roles, knowledge artifacts, collaboration, and application management are common tasks."
- Organizational complements: Processes, skills, and structures that firms must develop to realize value from new tech. "technology diffusion, organizational complements, and workplace change."
- Output tokens: The model-generated token units used to quantify AI output volume. "Panel B shows the intensive margin: the share of output tokens produced through Codex rather than ChatGPT."
- Persona classifier: An automated method to label users by usage persona (e.g., Developer, General Knowledge Worker). "We validated the persona classifier using a small sample of employees."
- Runtime: The amount of time an agent is actively working on a user’s behalf. "Runtime measures how much active agent work occurs on a user's behalf."
- Skills: Reusable, shareable instructions or integrations that encode complex, repeatable workflows. "skills, which allow users to share instructions for complex workflows."
- Task complexity: An estimate of human time required to complete a delegated task without AI. "Task-complexity measures the estimated time it would take an experienced human to complete the tasks that users delegate."
- Task taxonomy: A structured label space for categorizing tasks delegated to AI. "we classify Codex requests into a fixed two-level task taxonomy."
- Technology diffusion: The spread of new technologies across users, firms, and contexts. "technology diffusion, organizational complements, and workplace change."
- Threaded interaction model: Interface paradigm where multiple agent threads run independently and in parallel. "Codex, like many AI agents, uses a threaded interaction model in which users can initiate multiple agents and interact with each one in a largely independent workspace."
- Tool invocation: Calls made by an AI agent to external tools or services during execution. "some tool invocations are part of simple conversational interactions"
- Turn: A discrete unit of interaction or execution within an agent thread. "we calculate the number of overlapping turns they have in different threads"
- Verification: Processes to check, review, or validate AI-produced work for correctness and quality. "making supervision, verification, and coordination central determinants of value creation"
- Workflow system: A coordinated environment for delegating, monitoring, and integrating multiple streams of AI work. "Codex is less an assistant answering requests and more like a workflow system in which the user delegates, monitors, reviews, and coordinates multiple streams of work."