Principles for Evaluation of AI/ML Model Performance and Robustness (2107.02868v1)

Published 6 Jul 2021 in cs.LG and stat.ML

Abstract: The Department of Defense (DoD) has significantly increased its investment in the design, evaluation, and deployment of Artificial Intelligence and Machine Learning (AI/ML) capabilities to address national security needs. While there are numerous AI/ML successes in the academic and commercial sectors, many of these systems have also been shown to be brittle and nonrobust. In a complex and ever-changing national security environment, it is vital that the DoD establish a sound and methodical process to evaluate the performance and robustness of AI/ML models before these new capabilities are deployed to the field. This paper reviews the AI/ML development process, highlights common best practices for AI/ML model evaluation, and makes recommendations to DoD evaluators to ensure the deployment of robust AI/ML capabilities for national security needs.

Authors (3)
  1. Olivia Brown (7 papers)
  2. Andrew Curtis (24 papers)
  3. Justin Goodwin (8 papers)
Citations (5)
