Visibility into AI Agents (2401.13138v6)

Published 23 Jan 2024 in cs.CY and cs.AI

Abstract: Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work into understanding the measures and mitigating their negative impacts can help to build a foundation for the governance of AI agents.
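The three measures the abstract names (agent identifiers, real-time monitoring, and activity logging) can be illustrated with a minimal sketch. All names here (`AgentIdentifier`, `LoggedAgentSession`, the field names) are hypothetical illustrations of the concepts, not an interface proposed by the paper: the identifier travels with each action so a counterparty or monitor can see which agent acted, each action is appended to a retained activity log, and a logging hook stands in for a real-time monitoring channel.

```python
import json
import logging
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)


@dataclass
class AgentIdentifier:
    """Hypothetical 'agent card': metadata identifying an agent instance."""
    agent_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    underlying_model: str = "example-model-v1"  # assumed field, for illustration
    deployer: str = "example-org"               # assumed field, for illustration

    def as_header(self) -> str:
        # Serialized so the parties an agent interacts with can see who is acting.
        return json.dumps({
            "agent_id": self.agent_id,
            "underlying_model": self.underlying_model,
            "deployer": self.deployer,
        })


class LoggedAgentSession:
    """Attaches the identifier to every action and appends it to an activity log."""

    def __init__(self, identifier: AgentIdentifier):
        self.identifier = identifier
        self.activity_log: list[dict] = []

    def act(self, action: str, payload: str) -> dict:
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent": json.loads(self.identifier.as_header()),  # agent identifier
            "action": action,
            "payload": payload,
        }
        self.activity_log.append(record)           # activity logging
        logging.info("agent action: %s", action)   # real-time monitoring hook
        return record


session = LoggedAgentSession(AgentIdentifier())
session.act("http_request", "GET https://example.com")
```

In a real deployment these roles would be split across the supply chain the paper discusses: the deployer mints the identifier, the compute or software service provider retains the log, and a monitor consumes the real-time stream.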

  171. Simon Willison. Prompt injection: What’s the worst that can happen?, April 2023. URL https://simonwillison.net/2023/Apr/14/worst-that-can-happen/.
  172. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework. 2023. _eprint: 2308.08155.
  173. Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game, October 2023. URL https://arxiv.org/abs/2310.18940v2.
  174. Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models, November 2023. URL http://arxiv.org/abs/2311.04378. arXiv:2311.04378 [cs].
  175. Transparency, Governance and Regulation of Algorithmic Tools Deployed in the Criminal Justice System: a UK Case Study. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. ACM, July 2022. doi: 10.1145/3514094.3534200. URL https://doi.org/10.1145%2F3514094.3534200.
  176. Universal and Transferable Adversarial Attacks on Aligned Language Models, July 2023. URL https://arxiv.org/abs/2307.15043v2.
  177. Thinking about risks from AI: Accidents, misuse and structure. Lawfare. February, 11:2019, 2019.
Authors (12)
  1. Alan Chan
  2. Carson Ezell
  3. Max Kaufmann
  4. Kevin Wei
  5. Lewis Hammond
  6. Herbie Bradley
  7. Emma Bluemke
  8. Nitarshan Rajkumar
  9. David Krueger
  10. Noam Kolt
  11. Lennart Heim
  12. Markus Anderljung
Citations (9)

Summary

Visibility into AI Agents: An Academic Overview

The paper "Visibility into AI Agents" addresses the crucial topic of understanding and mitigating the risks associated with the increasing deployment of AI agents in various sectors. The authors focus on three principal measures to enhance visibility: agent identifiers, real-time monitoring, and activity logging. This structured exploration provides a framework for understanding the potential impacts of AI agents and proposes mechanisms to facilitate effective governance and oversight.

Summary of Methods

  1. Agent Identifiers: The paper discusses a method to label AI agents during interactions to discern their involvement. These identifiers could range from basic watermarks on outputs to more sophisticated headers for API calls, enabling distinction between human and AI activities. The potential inclusion of an "agent card" with additional information about the agent's underlying system, instance specifics, and associated actors could provide context and facilitate accountability.
  2. Real-Time Monitoring: This measure involves continuous oversight of AI agent activities to flag and filter problematic behaviors as they occur. Given the speed and volume of agent operations, the authors propose automated monitoring systems. They suggest that real-time monitoring can detect violations of predefined rules, such as leaking sensitive information or exceeding agreed resource-usage thresholds.
  3. Activity Logs: Maintaining detailed logs of AI agents' inputs and outputs aids post-incident analysis and forensics. The authors emphasize logging as a vital tool for retrospective audits and for understanding delayed or diffuse impacts. The granularity of logged data can vary, but detailed logs are necessary for high-risk applications.
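The three measures above can be illustrated with a minimal sketch. The class, rule, and field names below (e.g. `AgentCard`, `VisibleAgentWrapper`, the `X-Agent-Id` header) are hypothetical illustrations, not an interface proposed by the paper: a wrapper attaches an identifier to each response, checks outputs against a simple real-time rule, and appends every interaction to an activity log.

```python
import re
import time
import uuid
from dataclasses import dataclass, field

# Hypothetical "agent card": metadata about the agent instance and the
# actors behind it, in the spirit of the paper's agent-identifier proposal.
@dataclass
class AgentCard:
    agent_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    underlying_system: str = "example-llm-v1"
    deployer: str = "example-org"

class VisibleAgentWrapper:
    """Wraps an agent callable with identifiers, monitoring, and logging."""

    # Illustrative real-time rule: withhold outputs that appear to leak
    # an API-key-like token (a stand-in for "sensitive information").
    BLOCKED_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")

    def __init__(self, agent_fn, card: AgentCard):
        self.agent_fn = agent_fn
        self.card = card
        self.activity_log = []  # in practice, durable append-only storage

    def act(self, user_input: str) -> dict:
        output = self.agent_fn(user_input)

        # Real-time monitoring: flag and filter rule violations.
        violation = bool(self.BLOCKED_PATTERN.search(output))
        if violation:
            output = "[output withheld: policy violation]"

        # Activity logging: record inputs/outputs for retrospective audits.
        self.activity_log.append({
            "timestamp": time.time(),
            "agent_id": self.card.agent_id,
            "input": user_input,
            "output": output,
            "flagged": violation,
        })

        # Agent identifier: tag the response so counterparties can tell
        # they are interacting with an AI agent, and with which one.
        return {
            "X-Agent-Id": self.card.agent_id,
            "X-Agent-System": self.card.underlying_system,
            "body": output,
        }

# Usage: a trivial echo "agent" stands in for a real system.
wrapper = VisibleAgentWrapper(lambda text: f"echo: {text}", AgentCard())
resp = wrapper.act("hello")
print(resp["body"])               # echo: hello
print(len(wrapper.activity_log))  # 1
```

A real deployment would replace the in-memory list with tamper-resistant storage and the regex with richer policy checks, but the division of labor among the three measures would remain the same.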

Practical and Theoretical Implications

The measures proposed have both practical and theoretical implications. Practically, they enable regulatory bodies to better monitor and potentially intervene in AI deployments across commercial, scientific, governmental, and personal spheres. By identifying AI interactions, stakeholders can gain insight into the breadth of agent usage, facilitating timely interventions when necessary.

From a theoretical perspective, these measures invite further exploration into the socio-technical systems that underpin AI deployments. They challenge researchers to consider how visibility measures can support or hinder the development of robust governance structures. Moreover, the paper opens discussions on the balance between extensive monitoring and the ethical implications concerning privacy and concentration of power.

Future Directions and Challenges

A significant challenge outlined in the paper is balancing the detailed information needed for effective oversight against the preservation of privacy. Extending visibility measures to decentralized deployments, which depends on the cooperation of compute providers and tool or service providers, raises complex ethical considerations. Furthermore, while voluntary adoption of visibility standards is suggested, mandating compliance remains contentious.

The future of AI governance will likely involve more comprehensive research into decentralized data systems, privacy-preserving monitoring technologies, and mechanisms to ensure equitable power distribution among stakeholders. Understanding these dimensions will be crucial for developing frameworks that not only address the risks highlighted but also promote trust in AI deployments.

The deployment of AI agents poses unique challenges that necessitate innovative governance strategies. The authors of this paper provide a foundational exploration of visibility as a pivotal component in managing the risks associated with AI agents. This research represents an important step towards recognizing the need for transparency and accountability in AI systems to ensure their safe and responsible integration into society.

