An Embedded Diachronic Sense Change Model with a Case Study from Ancient Greek (2311.00541v5)
Abstract: Word meanings change over time, and word senses evolve, emerge or die out in the process. For ancient languages, where the corpora are often small and sparse, modelling such changes accurately proves challenging, and quantifying uncertainty in sense-change estimates consequently becomes important. GASC (Genre-Aware Semantic Change) and DiSC (Diachronic Sense Change) are existing generative models that have been used to analyse sense change for target words from an ancient Greek text corpus, using unsupervised learning without the help of any pre-training. These models represent the senses of a given target word such as "kosmos" (meaning decoration, order or world) as distributions over context words, and sense prevalence as a distribution over senses. The models are fitted using Markov Chain Monte Carlo (MCMC) methods to measure temporal changes in these representations. This paper introduces EDiSC, an Embedded DiSC model, which combines word embeddings with DiSC to provide superior model performance. It is shown empirically that EDiSC offers improved predictive accuracy, ground-truth recovery and uncertainty quantification, as well as better sampling efficiency and scalability properties with MCMC methods. The challenges of fitting these models are also discussed.
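To make the modelling idea concrete, the sketch below simulates the generative structure the abstract describes, written in R (the language of the paper's accompanying tooling). It is a minimal illustration, not the authors' implementation: the variable names (rho, alpha, chi, phi, psi), the dimensions and the random "embeddings" are all placeholder assumptions. The key EDiSC idea is that each sense's distribution over context words is a softmax of inner products between word-embedding vectors and a learned sense embedding, while sense prevalence evolves smoothly over time.

```r
## Minimal sketch of an EDiSC-style generative process (illustrative only;
## names, sizes and the random "embeddings" below are placeholder assumptions,
## not the authors' code or data).

set.seed(1)

V  <- 1000  # vocabulary size (assumed)
D  <- 50    # embedding dimension (assumed)
K  <- 3     # senses, e.g. decoration / order / world for "kosmos"
T_ <- 9     # number of time periods (assumed)

softmax <- function(x) { e <- exp(x - max(x)); e / sum(e) }

rho   <- matrix(rnorm(V * D), V, D)  # word embeddings (pre-trained in practice)
alpha <- matrix(rnorm(K * D), K, D)  # sense embeddings (parameters to infer)

## Each sense is a distribution over context words, obtained by a softmax of
## embedding inner products: p(w | sense k) proportional to exp(rho_w . alpha_k).
psi <- t(apply(alpha, 1, function(a) softmax(rho %*% a)))  # K x V

## Sense prevalence drifts over time: softmax of a Gaussian random walk.
chi <- matrix(0, T_, K)
for (t in 2:T_) chi[t, ] <- chi[t - 1, ] + rnorm(K, sd = 0.5)
phi <- t(apply(chi, 1, softmax))  # T_ x K, rows sum to 1

## A "snippet" is the bag of context words around one usage of the target word.
generate_snippet <- function(t, n_context = 10) {
  z <- sample.int(K, 1, prob = phi[t, ])  # latent sense of this usage
  w <- sample.int(V, n_context, replace = TRUE, prob = psi[z, ])
  list(time = t, sense = z, context = w)
}

str(generate_snippet(t = 5))
```

Fitting reverses this process: given observed snippets, MCMC targets the sense embeddings and prevalence trajectories. Parameterising psi through a D-dimensional embedding space rather than directly over the V-word simplex is, plausibly, what underlies the improved sampling efficiency and scalability the abstract reports.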
- Apidianaki, M. (2022). From Word Types to Tokens and Back: A Survey of Approaches to Word Meaning Representation and Interpretation. Computational Linguistics, 49(2), 465–523. ISSN 0891-2017. 10.1162/coli_a_00474. URL https://doi.org/10.1162/coli_a_00474.
- Beskos, A., Pillai, N., Roberts, G., Sanz-Serna, J. M., and Stuart, A. (2013). Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli, 19(5A), 1501–1534. 10.3150/12-BEJ414.
- Bevilacqua, M., Pasini, T., Raganato, A., and Navigli, R. (2021). Recent trends in word sense disambiguation: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21. International Joint Conference on Artificial Intelligence, Inc.
- Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
- Blei, D. M. and Lafferty, J. D. (2006). Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning, pages 113–120.
- Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.
- Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135–146. ISSN 2307-387X. 10.1162/tacl_a_00051. URL https://doi.org/10.1162/tacl_a_00051.
- Churchill, R. and Singh, L. (2022). The evolution of topic modeling. ACM Computing Surveys, 54(10s), 1–35.
- Davies, M. (2010). The Corpus of Historical American English: 400 million words, 1810–2009.
- Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. 10.18653/v1/N19-1423. URL https://aclanthology.org/N19-1423.
- Dieng, A. B., Ruiz, F. J. R., and Blei, D. M. (2019). The dynamic embedded topic model. arXiv preprint arXiv:1907.05545.
- Dieng, A. B., Ruiz, F. J. R., and Blei, D. M. (2020). Topic Modeling in Embedding Spaces. Transactions of the Association for Computational Linguistics, 8, 439–453. ISSN 2307-387X. 10.1162/tacl_a_00325. URL https://doi.org/10.1162/tacl_a_00325.
- Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216–222. 10.1016/0370-2693(87)91197-X.
- Dubossarsky, H., Hengchen, S., Tahmasebi, N., and Schlechtweg, D. (2019). Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change. arXiv e-prints, art. arXiv:1906.01688.
- Frermann, L. and Lapata, M. (2016). A Bayesian model of diachronic meaning change. Transactions of the Association for Computational Linguistics, 4, 31–45.
- Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721–741.
- Geyer, C. J. (1991). Markov chain Monte Carlo maximum likelihood. Interface Foundation of North America.
- Hajek, B. (1988). Cooling schedules for optimal annealing. Mathematics of Operations Research, 13(2), 311–329.
- Hamilton, W. L., Leskovec, J., and Jurafsky, D. (2016). Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. arXiv e-prints, art. arXiv:1605.09096.
- Hoffman, M. D. and Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.
- Kucukelbir, A., Ranganath, R., Gelman, A., and Blei, D. M. (2015). Automatic Variational Inference in Stan. arXiv e-prints, art. arXiv:1506.03431. 10.48550/arXiv.1506.03431.
- Kulkarni, V., Al-Rfou, R., Perozzi, B., and Skiena, S. (2015). Statistically significant detection of linguistic change. In Proceedings of the 24th International Conference on World Wide Web, pages 625–635.
- Kutuzov, A., Øvrelid, L., Szymanski, T., and Velldal, E. (2018). Diachronic word embeddings and semantic shifts: a survey. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1384–1397, Santa Fe, New Mexico, USA, August 2018. Association for Computational Linguistics. URL https://aclanthology.org/C18-1117.
- McGillivray, B., Hengchen, S., Lähteenoja, V., Palma, M., and Vatri, A. (2019). A computational approach to lexical polysemy in Ancient Greek. Digital Scholarship in the Humanities, 34(4), 893–907. ISSN 2055-7671. 10.1093/llc/fqz036. URL https://doi.org/10.1093/llc/fqz036.
- Melamud, O., Goldberger, J., and Dagan, I. (2016). context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 51–61.
- Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv e-prints, art. arXiv:1301.3781.
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.
- Mikolov, T., Yih, W.-t., and Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746–751.
- Mitra, S., Mitra, R., Riedl, M., Biemann, C., Mukherjee, A., and Goyal, P. (2014). That’s sick dude!: Automatic identification of word sense change across different timescales. arXiv e-prints, art. arXiv:1405.4392.
- Mitra, S., Mitra, R., Maity, S. K., Riedl, M., Biemann, C., Goyal, P., and Mukherjee, A. (2015). An automatic approach to identify word sense changes in text media across timescales. Natural Language Engineering, 21(5), 773–798.
- Montanelli, S. and Periti, F. (2023). A Survey on Contextualised Semantic Shift Detection. arXiv e-prints, art. arXiv:2304.01666. 10.48550/arXiv.2304.01666.
- Naseem, U., Razzak, I., Khan, S. K., and Prasad, M. (2021). A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models. ACM Trans. Asian Low-Resour. Lang. Inf. Process., 20(5). ISSN 2375-4699. 10.1145/3434237. URL https://doi.org/10.1145/3434237.
- Neal, R. M. (2012). MCMC using Hamiltonian dynamics. arXiv e-prints, art. arXiv:1206.1901.
- Patel, K. and Bhattacharyya, P. (2017). Towards lower bounds on number of dimensions for word embeddings. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 31–36, Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing. URL https://aclanthology.org/I17-2006.
- Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
- Perrone, V., Palma, M., Hengchen, S., Vatri, A., Smith, J. Q., and McGillivray, B. (2019). GASC: Genre-aware semantic change for Ancient Greek. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 56–66, Florence, Italy, August 2019. Association for Computational Linguistics. 10.18653/v1/W19-4707.
- Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. 10.18653/v1/N18-1202. URL https://aclanthology.org/N18-1202.
- R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
- Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B, 60(1), 255–268. 10.1111/1467-9868.00123.
- Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4), 341–363. ISSN 13507265.
- Rodda, M. A., Senaldi, M. S. G., and Lenci, A. (2019). Vector space models of Ancient Greek word meaning, and a case study on Homer. Traitement Automatique des Langues, 60(3), 63–87.
- Rudolph, M. and Blei, D. (2018). Dynamic embeddings for language evolution. In Proceedings of the 2018 World Wide Web Conference, pages 1003–1011.
- Selivanov, D. (2022). GloVe word embeddings. URL https://cran.r-project.org/web/packages/text2vec/vignettes/glove.html. Accessed 2022-06-01.
- Stan Development Team (2023a). RStan: the R interface to Stan. URL https://mc-stan.org/. R package version 2.32.3.
- Stan Development Team (2023b). Stan modeling language user’s guide and reference manual. URL https://mc-stan.org/docs/2_26/reference-manual/index.html. Stan version 2.26.1.
- Syed, S., Bouchard-Côté, A., Deligiannidis, G., and Doucet, A. (2022). Non-Reversible Parallel Tempering: A Scalable Highly Parallel MCMC Scheme. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2), 321–350. ISSN 1369-7412. 10.1111/rssb.12464. URL https://doi.org/10.1111/rssb.12464.
- Tahmasebi, N. and Risse, T. (2017). Finding individual word sense changes and their delay in appearance. In RANLP, pages 741–749.
- Tahmasebi, N., Borin, L., and Jatowt, A. (2021). Survey of computational approaches to lexical semantic change detection. Computational Approaches to Semantic Change, pages 1–91.
- Tang, X. (2018). A state-of-the-art of semantic change computation. Natural Language Engineering, 24(5).
- Vatri, A. and McGillivray, B. (2018). The Diorisis Ancient Greek Corpus. Research Data Journal for the Humanities and Social Sciences, 3(1), 55–65.
- Ancient Greek semantic change - annotated datasets and code.
- Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.
- Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(12).
- Yin, Z. and Shen, Y. (2018). On the dimensionality of word embedding. Advances in Neural Information Processing Systems, 31.
- Semantic change detection with Gaussian word embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3349–3361. 10.1109/TASLP.2021.3120645.
- Zafar, S. and Nicholls, G. K. (2022). Measuring diachronic sense change: New models and Monte Carlo methods for Bayesian inference. Journal of the Royal Statistical Society: Series C (Applied Statistics), 71(5), 1569–1604. https://doi.org/10.1111/rssc.12591.