Standardization Trends on Safety and Trustworthiness Technology for Advanced AI (2410.22151v1)
Abstract: AI has rapidly evolved over the past decade and has advanced in areas such as language comprehension, image and video recognition, programming, and scientific reasoning. Recent AI technologies based on large language models (LLMs) and foundation models are approaching or surpassing artificial general intelligence. These systems demonstrate superior performance in complex problem solving, natural language processing, and multi-domain tasks, and can potentially transform fields such as science, industry, healthcare, and education. However, these advancements have raised concerns regarding the safety and trustworthiness of advanced AI, including risks related to uncontrollability, ethical conflicts, long-term socioeconomic impacts, and safety assurance. Efforts are underway to develop internationally agreed-upon standards that ensure the safety and reliability of AI. This study analyzes international trends in safety and trustworthiness standardization for advanced AI, identifies key areas for standardization, proposes future directions and strategies, and draws policy implications. The goal is to support the safe and trustworthy development of advanced AI and to enhance international competitiveness through effective standardization.