EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models (2405.11265v1)
Abstract: In the field of environmental science, robust evaluation metrics for large language models (LLMs) are essential to ensure their efficacy and accuracy. We propose EnviroExam, a comprehensive evaluation method designed to assess the knowledge of LLMs in environmental science. EnviroExam is based on the curricula of top international universities, covering undergraduate, master's, and doctoral courses, and includes 936 questions across 42 core courses. By conducting 0-shot and 5-shot tests on 31 open-source LLMs, EnviroExam reveals the performance differences among these models in the domain of environmental science and provides detailed evaluation standards. The results show that 61.3% of the models passed the 5-shot tests, while 48.39% passed the 0-shot tests. By introducing the coefficient of variation as an indicator, we evaluate the performance of mainstream open-source LLMs in environmental science from multiple perspectives, providing effective criteria for selecting and fine-tuning LLMs in this field. Future research will involve constructing more domain-specific test sets from specialized environmental science textbooks to further improve the accuracy and specificity of the evaluation.
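The abstract introduces the coefficient of variation (CV), the ratio of the standard deviation to the mean, as an indicator of how evenly a model scores across courses. Below is a minimal sketch of how such an indicator could be computed from per-course accuracies; the function name, the example scores, and the five-course toy data are illustrative assumptions, not the paper's actual scoring code (the real benchmark covers 42 core courses).

```python
import statistics

def coefficient_of_variation(scores):
    """Relative dispersion of per-course accuracies: std / mean.

    A lower value means the model performs more consistently across courses.
    """
    mean = statistics.mean(scores)
    if mean == 0:
        raise ValueError("mean accuracy is zero; CV is undefined")
    return statistics.stdev(scores) / mean

# Hypothetical per-course accuracies (fractions) for one model under
# 0-shot and 5-shot prompting.
zero_shot = [0.52, 0.61, 0.47, 0.58, 0.55]
five_shot = [0.63, 0.66, 0.60, 0.65, 0.62]

for label, scores in [("0-shot", zero_shot), ("5-shot", five_shot)]:
    print(f"{label}: mean={statistics.mean(scores):.3f}, "
          f"CV={coefficient_of_variation(scores):.3f}")
```

Under this reading, two models with the same mean accuracy can be distinguished by their CV: the one with the lower CV delivers more uniform performance across environmental science subfields.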