Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
88 tokens/sec
Gemini 2.5 Pro Premium
39 tokens/sec
GPT-5 Medium
27 tokens/sec
GPT-5 High Premium
22 tokens/sec
GPT-4o
88 tokens/sec
DeepSeek R1 via Azure Premium
95 tokens/sec
GPT OSS 120B via Groq Premium
465 tokens/sec
Kimi K2 via Groq Premium
226 tokens/sec
2000 character limit reached

Professional Certification Benchmark Dataset: The First 500 Jobs For Large Language Models (2305.05377v1)

Published 7 May 2023 in cs.AI and cs.LG

Abstract: The research creates a professional certification survey to test LLMs and evaluate their employable skills. It compares the performance of two AI models, GPT-3 and Turbo-GPT3.5, on a benchmark dataset of 1149 professional certifications, emphasizing vocational readiness rather than academic performance. GPT-3 achieved a passing score (>70% correct) in 39% of the professional certifications without fine-tuning or exam preparation. The models demonstrated qualifications in various computer-related fields, such as cloud and virtualization, business analytics, cybersecurity, network setup and repair, and data analytics. Turbo-GPT3.5 scored 100% on the valuable Offensive Security Certified Professional (OSCP) exam. The models also displayed competence in other professional domains, including nursing, licensed counseling, pharmacy, and teaching. Turbo-GPT3.5 passed the Financial Industry Regulatory Authority (FINRA) Series 6 exam with a 70% grade without preparation. Interestingly, Turbo-GPT3.5 performed well on customer service tasks, suggesting potential applications in human augmentation for chatbots in call centers and routine advice services. The models also score well on sensory and experience-based tests such as wine sommelier, beer taster, emotional quotient, and body language reader. The OpenAI model improvement from Babbage to Turbo resulted in a median 60% better-graded performance in less than a few years. This progress suggests that focusing on the latest model's shortcomings could lead to a highly performant AI capable of mastering the most demanding professional certifications. We open-source the benchmark to expand the range of testable professional skills as the models improve or gain emergent capabilities.

Citations (2)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to a collection.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube