LLM-BT-Terms: Back-Translation as a Framework for Terminology Standardization and Dynamic Semantic Embedding (2506.08174v2)

Published 9 Jun 2025 in cs.CL

Abstract: The rapid expansion of English technical terminology presents a significant challenge to traditional expert-based standardization, particularly in rapidly developing areas such as artificial intelligence and quantum computing. Manual approaches face difficulties in maintaining consistent multilingual terminology. To address this, we introduce LLM-BT, a back-translation framework powered by LLMs designed to automate terminology verification and standardization through cross-lingual semantic alignment. Our key contributions include: (1) term-level consistency validation: by performing English -> intermediate language -> English back-translation, LLM-BT achieves high term consistency across different models (such as GPT-4, DeepSeek, and Grok). Case studies demonstrate over 90 percent of terms are preserved either exactly or semantically; (2) multi-path verification workflow: we develop a novel pipeline described as Retrieve -> Generate -> Verify -> Optimize, which supports both serial paths (e.g., English -> Simplified Chinese -> Traditional Chinese -> English) and parallel paths (e.g., English -> Chinese / Portuguese -> English). BLEU scores and term-level accuracy indicate strong cross-lingual robustness, with BLEU scores exceeding 0.45 and Portuguese term accuracy reaching 100 percent; (3) back-translation as semantic embedding: we reinterpret back-translation as a form of dynamic semantic embedding that uncovers latent trajectories of meaning. In contrast to static embeddings, LLM-BT offers transparent, path-based embeddings shaped by the evolution of the models. This reframing positions back-translation as an active mechanism for multilingual terminology standardization, fostering collaboration between machines and humans - machines preserve semantic integrity, while humans provide cultural interpretation.

Summary

  • The paper introduces a back-translation framework that achieves over 90% terminology consistency across models through rigorous Term-Level Consistency Validation.
  • It develops a Multi-Path Verification Workflow yielding BLEU scores above 0.45 and 100% accuracy for Portuguese terms, ensuring robust cross-lingual performance.
  • The approach reconceptualizes back-translation as dynamic semantic embedding, paving the way for automated multilingual terminology governance in evolving scientific fields.

Overview of LLM-BT Framework for Terminology Standardization

The paper "LLM-BT: Back-Translation as a Framework for Terminology Standardization and Dynamic Semantic Embedding" introduces a novel approach to address the challenges in terminology standardization across multilingual contexts, particularly in fast-evolving disciplines such as artificial intelligence and quantum computing. This approach centers on the utilization of LLMs within a back-translation framework, aiming to enhance terminology verification and standardization through cross-lingual semantic alignment.

Key Contributions

The paper presents three main innovations. First, Term-Level Consistency Validation demonstrates high terminology consistency across major models such as GPT-4, DeepSeek, and Grok, with over 90% of terms preserved exactly or semantically in case studies. This consistency establishes both the linguistic and algorithmic feasibility of automated terminology verification; a minimal sketch of the check is given below.
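
The check behind these figures can be pictured as a round trip through an intermediate language followed by a term-matching step. The sketch below is illustrative only: the `translate` helper is a hypothetical stand-in for an LLM API call, and the exact-match rule is a simplification of the paper's exact/semantic matching.

```python
# Minimal sketch of term-level consistency validation via back-translation.
# `translate` is a hypothetical stand-in for an LLM translation call; the
# exact-match rule below simplifies the paper's exact/semantic matching.

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder: call an LLM (GPT-4, DeepSeek, Grok, ...) to translate text."""
    raise NotImplementedError

def back_translate(text: str, pivot: str) -> str:
    """English -> pivot language -> English round trip."""
    intermediate = translate(text, src="en", tgt=pivot)
    return translate(intermediate, src=pivot, tgt="en")

def term_consistency(source: str, terms: list[str], pivot: str = "zh") -> float:
    """Fraction of domain terms that survive the round trip verbatim."""
    restored = back_translate(source, pivot).lower()
    preserved = [t for t in terms if t.lower() in restored]
    return len(preserved) / len(terms) if terms else 0.0
```

In practice the exact match would be backed by a semantic fallback (for instance, embedding similarity between the original and restored term) so that paraphrased but equivalent renderings also count as preserved.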

Second, the Multi-Path Verification Workflow introduces a Retrieve -> Generate -> Verify -> Optimize pipeline that supports both serial back-translation routes (e.g., English -> Simplified Chinese -> Traditional Chinese -> English) and parallel routes (e.g., English -> Chinese / Portuguese -> English), evaluated with BLEU and term-level accuracy. The multi-path approach shows strong cross-lingual robustness, with BLEU scores exceeding 0.45 and 100% accuracy for Portuguese terms; a rough sketch of this scoring is shown below.
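
The following sketch scores serial and parallel paths with sentence-level BLEU. The path definitions, tokenization, and smoothing choice are assumptions made for illustration, not the paper's exact configuration, and `translate` is the same hypothetical helper as above.

```python
# Sketch of multi-path verification: run several back-translation paths and
# score each restored English text against the original with sentence-level BLEU.
# Path definitions and smoothing are illustrative choices, not the paper's setup.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def run_path(source: str, pivots: list[str]) -> str:
    """Serial path: English -> pivot_1 -> ... -> pivot_n -> English."""
    text, prev = source, "en"
    for lang in pivots:
        text = translate(text, src=prev, tgt=lang)
        prev = lang
    return translate(text, src=prev, tgt="en")

def score_paths(source: str, paths: dict[str, list[str]]) -> dict[str, float]:
    """Run each path independently (the parallel setting) and report BLEU per path."""
    smooth = SmoothingFunction().method1
    reference = [source.split()]
    return {
        name: sentence_bleu(reference, run_path(source, pivots).split(),
                            smoothing_function=smooth)
        for name, pivots in paths.items()
    }

# Paths mirroring the examples described in the paper:
paths = {
    "serial_zh": ["zh-Hans", "zh-Hant"],  # English -> Simplified -> Traditional -> English
    "parallel_zh": ["zh-Hans"],           # English -> Chinese -> English
    "parallel_pt": ["pt-BR"],             # English -> Portuguese -> English
}
```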

Third, Back-Translation as Semantic Embedding reinterprets the back-translation process as a form of dynamic semantic embedding. Under this view, intermediate translations trace latent trajectories of meaning construction and alignment, yielding transparent, path-based embeddings shaped by model evolution rather than static vectors. One way to make this concrete is sketched below.
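
One concrete reading of the "latent trajectory" idea is to embed every hop of a back-translation path and track how far each hop drifts from the source. The sketch below assumes a multilingual sentence-embedding model from the sentence-transformers library; the specific model name is an arbitrary choice, not something prescribed by the paper.

```python
# Sketch: treat a back-translation path as a trajectory in embedding space.
# Each hop (source, intermediate translations, restored English) is embedded,
# and cosine similarity to the source measures semantic drift at every step.
# The embedding model is an arbitrary multilingual choice for illustration.

from sentence_transformers import SentenceTransformer, util

def semantic_trajectory(hops: list[str]) -> list[float]:
    """Cosine similarity of each hop to the source text (hops[0])."""
    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    embeddings = model.encode(hops, convert_to_tensor=True)
    similarities = util.cos_sim(embeddings[0], embeddings)  # shape: (1, len(hops))
    return [float(s) for s in similarities[0]]

# hops might be: [english_source, chinese_translation, restored_english].
# A flat trajectory (values near 1.0) suggests meaning and terminology were
# preserved along the path, while a dip flags semantic drift at that hop.
```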

Implications

The LLM-BT framework has significant implications for multilingual terminology governance. It provides foundational infrastructure for human-AI collaboration, in which machines ensure semantic fidelity while human experts handle cultural and disciplinary interpretation. This division of labor supports clearer scientific discourse and terminology alignment across languages and knowledge systems.

Experimental Insights

The experiments validate the framework's term-level stability across several languages, including Simplified Chinese, Traditional Chinese, Japanese, and Brazilian Portuguese. Results show notable differences in consistency, with Traditional Chinese often outperforming Simplified Chinese, suggesting that corpus quality and training-data coverage directly affect standardization accuracy.

These findings emphasize the importance of selecting appropriate intermediate languages and leveraging high-quality corpora in LLM training to improve translation reliability and semantic alignment.

Future Directions

The research indicates potential for broader applications in terminology standardization, including automated term recommendations and multilingual knowledge graphs. Future developments could explore multimodal translation paths (e.g., integrating text with image or audio modalities) and refine techniques for emerging term discovery, particularly in fields characterized by rapid innovation.

In summary, the LLM-BT framework redefines back-translation from a passive tool into an active engine for terminology standardization, reinforcing the importance of dynamic, interpretable semantic embedding in the evolving landscape of generative AI.
