
NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism

Published 29 Feb 2024 in cs.CL and cs.AI (arXiv:2403.00862v4)

Abstract: We present NewsBench, a novel evaluation framework for systematically assessing the editorial capabilities of LLMs in Chinese journalism. The benchmark dataset covers four facets of writing proficiency and six facets of safety adherence. It comprises 1,267 carefully and manually designed test samples, in the form of multiple-choice and short-answer questions, for five editorial tasks in 24 news domains. To measure performance, we propose different GPT-4-based automatic evaluation protocols for assessing LLM generations on short-answer questions in terms of writing proficiency and safety adherence; both are validated by high correlations with human evaluations. Using this framework, we conduct a comprehensive analysis of ten popular LLMs that can handle Chinese. The experimental results highlight GPT-4 and ERNIE Bot as top performers, yet reveal a relative deficiency in journalistic safety adherence in creative writing tasks. Our findings also underscore the need for enhanced ethical guidance in machine-generated journalistic content, marking a step forward in aligning LLMs with journalistic standards and safety considerations.
