Improving Language Modelling with Noise-contrastive estimation (1709.07758v1)

Published 22 Sep 2017 in cs.CL

Abstract: Neural language models do not scale well when the vocabulary is large. Noise-contrastive estimation (NCE) is a sampling-based method that allows for fast learning with large vocabularies. Although NCE has shown promising performance in neural machine translation, it was considered an unsuccessful approach for language modelling. A sufficient investigation of the hyperparameters in NCE-based neural language models was also missing. In this paper, we showed that NCE can be a successful approach in neural language modelling when the hyperparameters of a neural network are tuned appropriately. We introduced the 'search-then-converge' learning rate schedule for NCE and designed a heuristic that specifies how to use this schedule. The impact of the other important hyperparameters, such as the dropout rate and the weight initialisation range, was also demonstrated. We showed that appropriately tuned NCE-based neural language models outperform the state-of-the-art single-model methods on a popular benchmark.
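
The abstract refers to two technique-level ideas: the NCE training objective and the 'search-then-converge' learning rate schedule. Below is a minimal PyTorch sketch of both, assuming the standard binary-classification form of NCE (classifier logit s(w) - log(k·q(w))) and the classic Darken-Moody schedule lr0 / (1 + t/τ); the function names, the noise-sample count k, lr0, and τ are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch, not the authors' implementation: NCE objective for a neural
# language model and a "search-then-converge" learning-rate schedule.
# k, lr0, and tau below are placeholder values, not values from the paper.
import torch
import torch.nn.functional as F


def nce_loss(target_scores, noise_scores, target_logq, noise_logq, k):
    """Noise-contrastive estimation: train a binary classifier that separates
    each observed word from k words sampled from a noise distribution q.

    target_scores: (batch,)    unnormalised model scores s(w_target | context)
    noise_scores:  (batch, k)  unnormalised scores for the sampled noise words
    target_logq:   (batch,)    log q(w_target)
    noise_logq:    (batch, k)  log q(w_noise)
    """
    log_k = torch.log(torch.tensor(float(k)))
    # Classifier logit for "data vs. noise": s(w) - log(k * q(w))
    pos_logits = target_scores - (log_k + target_logq)
    neg_logits = noise_scores - (log_k + noise_logq)
    # -[ log sigma(pos) + sum_i log(1 - sigma(neg_i)) ], averaged over the batch
    return -(F.logsigmoid(pos_logits) + F.logsigmoid(-neg_logits).sum(dim=1)).mean()


def search_then_converge_lr(step, lr0=1.0, tau=5000.0):
    """Darken-Moody style schedule: the rate stays near lr0 while step << tau
    ("search"), then decays roughly as lr0 * tau / step ("converge")."""
    return lr0 / (1.0 + step / tau)


# Example shapes: batch of 32 contexts, k = 25 noise samples per target,
# uniform noise over a 10,000-word vocabulary (log q = -log 10000 ≈ -9.21).
scores_t = torch.randn(32)
scores_n = torch.randn(32, 25)
logq_t = torch.full((32,), -9.21)
logq_n = torch.full((32, 25), -9.21)
print(nce_loss(scores_t, scores_n, logq_t, logq_n, k=25))
print(search_then_converge_lr(step=20000))
```

The binary form above is the usual way NCE sidesteps the softmax normalisation over a large vocabulary: only the target word and k noise words are scored per training example, so the cost no longer grows with vocabulary size.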

Citations (6)
