• OpenAI's GPT-4 reportedly scored highly on professional licensing exams, prompting questions about its impact on professionals such as lawyers.
  • However, critics question whether those scores reflect training data contamination, and whether tests designed for humans are a valid way to evaluate AI models.

Key terms:

  • Training data contamination: The risk that a model has seen test questions or their solutions in its training data and memorized them, which can inflate its test results
  • Codeforces: A website hosting coding competitions, used by OpenAI to evaluate GPT-4's coding abilities
  • Construct validity: The degree to which a test measures what it claims to measure, a property called into question when tests designed for humans are applied to AI models
  • Real-world tasks: The actual tasks professionals perform in their jobs, which should be the focus of AI evaluation instead of standardized tests
