
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application (2305.17701v2)

Published 28 May 2023 in cs.CL

Abstract: LLMs learn not only natural text generation abilities but also social biases against different demographic groups from real-world data. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to differences in language and culture, both of which significantly affect the biases and targeted demographic groups. This limitation requires localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we present KoSBi, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories. We find that through filtering-based moderation, social biases in generated content can be reduced by 16.47%p on average for HyperCLOVA (30B and 82B) and GPT-3.

Summary of Formatting Guidelines for ACL 2023 Submissions

The paper provides comprehensive instructions for authors submitting to ACL 2023 using \LaTeX. It first ensures conformity with the general instructions for ACL proceedings, which are available online, and then highlights the ACL 2023 call-for-papers guidelines, supplemented by directives specific to the provided \LaTeX{} style files.

Engines and Basic Setup

For generating PDF files, the paper recommends pdf\LaTeX{} over traditional \LaTeX{} with the dvips+ps2pdf or dvipdf routes. It also acknowledges Xe\LaTeX{} for its handling of non-Latin scripts, and urges authors to consult the provided examples for accented characters so that formatting remains consistent across a diverse array of language scripts.
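As a rough sketch of how accented characters can be handled under pdf\LaTeX{} (the package choices below are common defaults, not quoted from the paper): UTF-8 source input, which is the default in modern \LaTeX{}, combined with T1 font encoding covers most accented Latin characters.

```latex
% Minimal sketch for accented characters with pdfLaTeX (assumed packages)
\documentclass{article}
\usepackage[T1]{fontenc}  % output encoding with precomposed accented glyphs
\begin{document}
Na\"{\i}ve caf\'{e}, or, with UTF-8 source, simply: Naïve café.
\end{document}
```

Under Xe\LaTeX{}, the fontspec package is the usual route to Unicode-capable system fonts instead.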

Document Structure and Citation

Authors are instructed to declare the document class at the start of the file, and the section details how to load the style for the review versus the final version. It underlines the use of Times Roman in the preamble, with alternatives such as txfonts or newtx permitted, and explains how to arrange titles and author lists with minimal but effective \LaTeX{} commands.
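A minimal preamble following these instructions might look like the sketch below; the style-file name `acl` and its `review` option are assumptions based on recent ACL templates, not quoted from the paper.

```latex
% Hypothetical minimal preamble for an ACL 2023 submission
\documentclass[11pt]{article}
\usepackage[review]{acl}  % assumed style file; drop "review" for the final version
\usepackage{times}        % Times Roman, as required; txfonts/newtx also permitted
\title{An Example Submission}
\author{First Author \\ Example Affiliation \\ \texttt{first@example.org}}
\begin{document}
\maketitle
\end{document}
```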

In the field of citations, the paper rigorously discusses syntax options supported by the style files, recommending the natbib styles for citation management. This ensures a seamless integration of references, which is a critical component for academic papers, allowing for clear and consistent attribution of previous works.
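For instance, the natbib author-year commands supported by such style files typically distinguish textual from parenthetical citations (a sketch; the `acl_natbib` bibliography-style name and the entry key are assumptions):

```latex
% Textual vs. parenthetical citations with natbib
\citet{lee2023kosbi} show that filtering helps.  % renders as: Lee et al. (2023) show that ...
Filtering-based moderation reduces bias \citep{lee2023kosbi}.  % renders as: (Lee et al., 2023)
\bibliographystyle{acl_natbib}  % assumed .bst name
\bibliography{custom}           % entries live in custom.bib
```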

Practical Application in BibTeX

Another significant section of the paper covers the \LaTeX{} and Bib\TeX{} style files, which align with the American Psychological Association format. Emphasis is placed on including DOIs and URLs in BibTeX files so that references are hyperlinked and easily accessible. Such guidance on reference management reflects the paper's focus on the discoverability and credibility of academic materials.
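A BibTeX entry carrying the recommended DOI and URL fields might look like the following; the entry details and identifiers are illustrative, not taken from the paper.

```bibtex
@inproceedings{example2023,
  title     = {An Example Paper},
  author    = {Author, First and Author, Second},
  booktitle = {Proceedings of the 61st Annual Meeting of the ACL},
  year      = {2023},
  doi       = {10.18653/v1/2023.acl-long.000},  % illustrative placeholder DOI
  url       = {https://aclanthology.org/}
}
```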

Implications for Future Use

The instructions outlined in this paper have broader implications for ensuring uniformity and standardization in the presentation of scientific research. By providing detailed instructions on document formatting, citation protocols, and reference management, the paper lays a foundation for enhancing the scholarly communication process, making it more efficient and accessible.

Frameworks like this one may shape how technical documents are prepared and shared across AI and computer science conferences, contributing to more consistent academic standards and interoperable document-preparation workflows in a rapidly advancing field.

Overall, the paper exemplifies a meticulous approach to document preparation, streamlining the dissemination and exchange of academic knowledge.

Authors (6)
  1. Hwaran Lee (31 papers)
  2. Seokhee Hong (8 papers)
  3. Joonsuk Park (24 papers)
  4. Takyoung Kim (10 papers)
  5. Gunhee Kim (74 papers)
  6. Jung-Woo Ha (67 papers)
Citations (24)