
German's Next Language Model (2010.10906v4)

Published 21 Oct 2020 in cs.CL and cs.LG

Abstract: In this work we present the experiments which led to the creation of our BERT and ELECTRA based German LLMs, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM) we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both models of base and large size. We adopt an evaluation driven approach in training these models and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that these models are the best German models to date. Our trained models will be made publicly available to the research community.
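The abstract names Whole Word Masking (WWM) without describing it. The following is a minimal sketch of the general idea, not the authors' training code: it assumes WordPiece-style tokens in which a `##` prefix marks a continuation of the previous word, and it masks all pieces of a selected word together. The function names, the 15% default, and the example tokens are illustrative assumptions; the usual 80/10/10 replacement scheme and special-token handling are left out for brevity.

```python
import random


def group_whole_words(tokens):
    """Group token indices so that '##' continuation pieces stay with their word."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)  # continuation piece: attach to the current word
        else:
            groups.append([i])    # start of a new word
    return groups


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Mask each word with probability mask_prob, replacing all of its pieces."""
    rng = random.Random(seed)
    masked = list(tokens)
    for group in group_whole_words(tokens):
        if rng.random() < mask_prob:
            for i in group:
                masked[i] = mask_token
    return masked


# Hypothetical tokenization, for illustration only.
tokens = ["die", "sprach", "##modell", "##e", "sind", "gut"]
print(group_whole_words(tokens))  # [[0], [1, 2, 3], [4], [5]]
print(whole_word_mask(tokens, mask_prob=0.5, seed=0))
```

Masking a single piece at a time could hide `##modell` while leaving `sprach` and `##e` visible; grouping first guarantees that a word is either fully masked or fully visible.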

Overview of COLING-2020 Proceedings Instructions

This paper serves as a comprehensive guide for authors submitting their work to the COLING-2020 conference, detailing specifications that ensure uniformity and adherence to the expected format. The document not only exemplifies its own guidelines but also provides historical context by tracing its evolution through previous COLING and ACL proceedings. The guidelines cover a wide array of aspects related to manuscript preparation, including formatting standards, submission requirements, and specifics for the camera-ready versions of accepted papers.

Formatting and Manuscript Preparation

The authors emphasize uniform manuscript formatting to keep the conference proceedings consistent. The document requires a single-column format on A4 paper, with strict margin and font guidelines. Times Roman or Times New Roman is recommended so that submissions look alike.

The paper includes a detailed section on electronic manuscript preparation, strongly favoring the use of \LaTeX{} over Microsoft Word due to its efficiency in creating compliant PDF files. Adherence to the COLING 2020 style file is emphasized to minimize discrepancies.

Submission and Review Process

An essential component of the document addresses the submission process, highlighting the requirement for authors to present their work anonymously to ensure a double-blind review process. The instructions elaborate on managing citations, self-references, and the presentation of author information in both the submission and the final camera-ready paper.

Numerical Results and Specifications

The paper specifies font sizes for each part of the manuscript so that all submissions follow one standard. For instance, paper titles are set in 15 pt bold, with smaller sizes for other text elements, such as 11 pt for the main document text and 10 pt for the bibliography.
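The three sizes quoted above can be written down as a small lookup table and checked mechanically. The sketch below is purely illustrative: the element names, the `check_font_sizes` helper, and the idea that observed sizes arrive as a dictionary are assumptions, not part of the COLING-2020 instructions or of any official tool.

```python
# Sizes in points, taken from the summary above; the keys are assumed names.
EXPECTED_PT = {"title": 15, "body": 11, "bibliography": 10}


def check_font_sizes(observed):
    """Return a list of human-readable problems; an empty list means no mismatch."""
    problems = []
    for element, expected in EXPECTED_PT.items():
        actual = observed.get(element)
        if actual is None:
            problems.append(f"{element}: no size reported")
        elif actual != expected:
            problems.append(f"{element}: expected {expected} pt, found {actual} pt")
    return problems


print(check_font_sizes({"title": 15, "body": 11, "bibliography": 10}))  # []
print(check_font_sizes({"title": 14, "body": 11}))
```

How the observed sizes would be extracted from a PDF or source file is deliberately left open here.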

Licensing and Ethical Considerations

The document mandates that final papers be licensed under a Creative Commons Attribution 4.0 International Licence (CC-BY). This requirement underscores the conference's commitment to open access, allowing adaptation and redistribution of research while ensuring proper author attribution.

Implications and Future Developments

The paper's meticulous approach to formatting and submission guidelines reflects an ongoing effort within the academic community to streamline the dissemination process, facilitating a barrier-free exchange of ideas and research findings. As academic conferences continue to expand their reach and audience, the emphasis on standardized guidelines will remain fundamental in managing the ever-increasing volume of submissions.

Future conferences may consider further enhancements to the submission process, possibly incorporating automated tools for format verification and increasing support for diverse manuscript preparation platforms. Additionally, embracing more inclusive policies for non-English terms could broaden the accessibility and comprehension of research for a global audience.

Conclusion

In summary, the instructions for the COLING-2020 proceedings provide a detailed and structured approach to manuscript preparation, ensuring consistency and professionalism across all submissions. By adhering to these guidelines, authors contribute to a cohesive and accessible body of work that supports the advancement of computational linguistics research.

Authors (3)
  1. Branden Chan (1 paper)
  2. Stefan Schweter (7 papers)
  3. Timo Möller (4 papers)
Citations (252)