Syntactic Variation Across the Grammar: Modelling a Complex Adaptive System (2309.11869v1)

Published 21 Sep 2023 in cs.CL

Abstract: While language is a complex adaptive system, most work on syntactic variation observes a few individual constructions in isolation from the rest of the grammar. This means that the grammar, a network which connects thousands of structures at different levels of abstraction, is reduced to a few disconnected variables. This paper quantifies the impact of such reductions by systematically modelling dialectal variation across 49 local populations of English speakers in 16 countries. We perform dialect classification with both the entire grammar and isolated nodes within the grammar in order to characterize the syntactic differences between these dialects. The results show, first, that many individual nodes within the grammar are subject to variation but, in isolation, none perform as well as the grammar as a whole. This indicates that an important part of syntactic variation consists of interactions between different parts of the grammar. Second, the results show that the similarity between dialects depends heavily on the subset of the grammar being observed: for example, New Zealand English could be more similar to Australian English in phrasal verbs but at the same time more similar to UK English in dative phrases.
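The second finding, that which dialect counts as "most similar" depends on the part of the grammar being observed, can be illustrated with a minimal sketch. It assumes each dialect is summarized as a vector of construction frequencies and compares dialects with cosine similarity restricted to one subset of features. The dialect labels X, Y and Z, the feature names, the subset groupings and all values are hypothetical toy data, not the paper's data, and cosine similarity is a stand-in for illustration rather than the paper's classification-based method.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy data: relative frequencies of four invented grammatical
# features per dialect. These are NOT results from the paper.
FEATURES: List[str] = ["phrasal_a", "phrasal_b", "dative_a", "dative_b"]

DIALECTS: Dict[str, List[float]] = {
    "X": [0.9, 0.1, 0.2, 0.8],
    "Y": [0.8, 0.2, 0.7, 0.3],
    "Z": [0.2, 0.8, 0.3, 0.7],
}

# Hypothetical grouping of features into sub-parts (nodes) of the grammar.
SUBSETS: Dict[str, List[str]] = {
    "phrasal_verbs": ["phrasal_a", "phrasal_b"],
    "datives": ["dative_a", "dative_b"],
    "all": FEATURES,
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def restrict(vec: Sequence[float], names: Sequence[str]) -> List[float]:
    """Keep only the feature values named in `names`, in that order."""
    return [vec[FEATURES.index(n)] for n in names]


def most_similar(target: str, subset: str) -> str:
    """Return the dialect closest to `target` when only `subset` is observed."""
    names = SUBSETS[subset]
    t = restrict(DIALECTS[target], names)
    best, best_sim = "", -1.0
    for other, vec in DIALECTS.items():
        if other == target:
            continue  # skip comparing a dialect with itself
        sim = cosine(t, restrict(vec, names))
        if sim > best_sim:
            best, best_sim = other, sim
    return best


if __name__ == "__main__":
    for name in SUBSETS:
        print(name, "->", most_similar("X", name))
```

With these toy values, X is closest to Y on the phrasal-verb features but closest to Z on the dative features, the same kind of subset-dependent pattern the abstract describes for New Zealand, Australian and UK English.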
