
An unsupervised learning approach to evaluate questionnaire data -- what one can learn from violations of measurement invariance (2312.06309v1)

Published 11 Dec 2023 in cs.LG and stat.AP

Abstract: In several branches of the social sciences and humanities, surveys based on standardized questionnaires are a prominent research tool. While there are a variety of ways to analyze the data, some standard procedures have become established. When those surveys want to analyze differences in the answer patterns of different groups (e.g., countries, gender, age, ...), these procedures can only be carried out in a meaningful way if there is measurement invariance, i.e., the measured construct has psychometric equivalence across groups. As recently raised as an open problem by Sauerwein et al. (2021), new evaluation methods that work in the absence of measurement invariance are needed. This paper promotes an unsupervised learning-based approach to such research data by proposing a procedure that works in three phases: data preparation, clustering of questionnaires, and measuring similarity based on the obtained clustering and the properties of each group. We generate synthetic data in three data sets, which allows us to compare our approach with the PCA approach under measurement invariance and under violated measurement invariance. As a main result, we obtain that the approach provides a natural comparison between groups and a natural description of the response patterns of the groups. Moreover, it can be safely applied to a wide variety of data sets, even in the absence of measurement invariance. Finally, this approach allows us to translate (violations of) measurement invariance into a meaningful measure of similarity.
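The three phases named in the abstract (data preparation, clustering of questionnaires, similarity between groups based on the clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since the abstract does not state them: the toy Likert data, the rescaling to [0, 1], the use of plain k-means with a deterministic farthest-point start, and the similarity score (1 minus the total variation distance between each group's distribution over clusters). The function names `kmeans`, `cluster_profiles`, and `similarity` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=50):
    """Plain Lloyd's k-means (a stand-in clustering step); returns one label per row."""
    # Deterministic farthest-point initialisation so the sketch is reproducible.
    centers = [list(rows[0])]
    while len(centers) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centers[c])) for row in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_profiles(labels, groups, k):
    """For each group, the share of its questionnaires that fall into each cluster."""
    profiles = {}
    for g in sorted(set(groups)):
        counts = Counter(lab for lab, grp in zip(labels, groups) if grp == g)
        total = sum(counts.values())
        profiles[g] = [counts.get(c, 0) / total for c in range(k)]
    return profiles


def similarity(p, q):
    """Assumed score: 1 minus total variation distance (1.0 means identical profiles)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy answers on a 1-5 Likert scale (invented data, three items per questionnaire).
answers = [
    [5, 5, 4], [4, 5, 5], [5, 4, 5],             # group A
    [1, 2, 1], [2, 1, 1], [1, 1, 2],             # group B
    [5, 4, 4], [1, 1, 1], [4, 5, 5], [2, 1, 2],  # group C
]
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 4

# Phase 1: data preparation (here only a rescaling of each answer to [0, 1]).
prepared = [[(x - 1) / 4 for x in row] for row in answers]

# Phase 2: cluster all questionnaires together, ignoring group membership.
k = 2
labels = kmeans(prepared, k)

# Phase 3: compare groups through their distributions over the clusters.
profiles = cluster_profiles(labels, groups, k)
for g, h in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(g, h, similarity(profiles[g], profiles[h]))
```

Because the clustering never looks at the group labels, the comparison in the last step does not presuppose that items mean the same thing in every group; groups are compared only through how their respondents spread over the shared clusters. In this toy data, groups A and B land in different clusters, while group C is split evenly between them.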

References (39)
  1. “Weighted clustering: Towards solving the user’s dilemma” In Pattern Recognition 120 Elsevier BV, 2021, pp. 108152 DOI: 10.1016/j.patcog.2021.108152
  2. M S Bartlett “Tests of significance in factor analysis” In Br. J. Stat. Psychol. 3.2 Wiley, 1950, pp. 77–85
  3. “Synthetic and Natural Noise Both Break Neural Machine Translation” In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings OpenReview.net, 2018 URL: https://openreview.net/forum?id=BJ8vJebC-
  4. “A dendrite method for cluster analysis” In Communications in Statistics - Theory and Methods 3.1 Informa UK Limited, 1974, pp. 1–27 DOI: 10.1080/03610927408827101
  5. “SMOTE: Synthetic Minority over-Sampling Technique” In J. Artif. Int. Res. 16.1 El Segundo, CA, USA: AI Access Foundation, 2002, pp. 321–357
  6. R.M. Cormack “A Review of Classification” In Journal of the Royal Statistical Society. Series A (General) 134.3 JSTOR, 1971, pp. 321 DOI: 10.2307/2344237
  7. Anna B. Costello and Jason Osborne “Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis” University of Massachusetts Amherst, 2005 DOI: 10.7275/JYJ1-4868
  8. “Nearest neighbor pattern classification” In IEEE Transactions on Information Theory 13.1, 1967, pp. 21–27 DOI: 10.1109/TIT.1967.1053964
  9. Lee J Cronbach “Coefficient alpha and the internal structure of tests” In Psychometrika 16.3 Springer Science+Business Media LLC, 1951, pp. 297–334
  10. “Robust clustering in high dimensional data using statistical depths” In BMC Bioinformatics 8.S7 Springer Science+Business Media LLC, 2007 DOI: 10.1186/1471-2105-8-s7-s8
  11. Charles D Dziuban and Edwin C Shirkey “When is a correlation matrix appropriate for factor analysis? Some decision rules” In Psychol. Bull. 81.6 American Psychological Association (APA), 1974, pp. 358–361
  12. Viktoria Feucht, Paul Wilhelm Dierkes and Matthias Winfried Kleespies “The different values of nature: a comparison between university students’ perceptions of nature’s instrumental, intrinsic and relational values” In Sustainability Science 18.5 Springer Science+Business Media LLC, 2023, pp. 2391–2403 DOI: 10.1007/s11625-023-01371-8
  13. “Handling missing values in multiple factor analysis” In Food Quality and Preference 30.2 Elsevier BV, 2013, pp. 77–85 DOI: 10.1016/j.foodqual.2013.04.013
  14. Florian G. Kaiser “A General Measure of Ecological Behavior” In Journal of Applied Social Psychology 28.5 Wiley, 1998, pp. 395–422 DOI: 10.1111/j.1559-1816.1998.tb01712.x
  15. Henry F. Kaiser “A second generation little jiffy” In Psychometrika 35.4 Springer Science+Business Media LLC, 1970, pp. 401–415 DOI: 10.1007/bf02291817
  16. Matthias Winfried Kleespies and Paul Wilhelm Dierkes “Impact of biological education and gender on students’ connection to nature and relational values” In PLOS ONE 15.11 Public Library of Science (PLoS), 2020, pp. e0242004 DOI: 10.1371/journal.pone.0242004
  17. “Assessing dimensions of inclusion from students’ perspective – measurement invariance across students with learning disabilities in different educational settings” In European Journal of Special Needs Education 35.3 Informa UK Limited, 2019, pp. 287–302 DOI: 10.1080/08856257.2019.1646958
  18. “Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation” In Bioinformatics 23.1 Oxford University Press (OUP), 2006, pp. 127–128 DOI: 10.1093/bioinformatics/btl529
  19. “Promoting connectedness with nature through environmental education” In Environmental Education Research 19.3 Informa UK Limited, 2013, pp. 370–384 DOI: 10.1080/13504622.2012.697545
  20. “Understanding and Enhancement of Internal Clustering Validation Measures” In IEEE Trans. Cybern. 43.3, 2013, pp. 982–994 DOI: 10.1109/TSMCB.2012.2220543
  21. F.Stephan Mayer and Cynthia McPherson Frantz “The connectedness to nature scale: A measure of individuals’ feeling in community with nature” In Journal of Environmental Psychology 24.4 Elsevier BV, 2004, pp. 503–515 DOI: 10.1016/j.jenvp.2004.10.001
  22. “Training and assessing classification rules with imbalanced data” In Data Mining and Knowledge Discovery 28.1 Springer Science+Business Media LLC, 2012, pp. 92–122 DOI: 10.1007/s10618-012-0295-5
  23. Taciano L. Milfont and John Duckitt “The environmental attitudes inventory: A valid and reliable measure to assess the structure of environmental attitudes” In Journal of Environmental Psychology 30.1 Elsevier BV, 2010, pp. 80–94 DOI: 10.1016/j.jenvp.2009.09.001
  24. “Syntactic Data Augmentation Increases Robustness to Inference Heuristics” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 Association for Computational Linguistics, 2020, pp. 2339–2352 DOI: 10.18653/V1/2020.ACL-MAIN.212
  25. Mojgan Mohajer, Karl-Hans Englmeier and Volker J. Schmid “A comparison of Gap statistic definitions with and without logarithm function” In LMU Department of Statistics: Technical Reports 96, 2010 DOI: 10.5282/ubm/epub.11920
  26. Diane L Putnick and Marc H Bornstein “Measurement invariance conventions and reporting: The state of the art and future directions for psychological research” In Dev. Rev. 41 Elsevier BV, 2016, pp. 71–90
  27. Peter J. Rousseeuw “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis” In Journal of Computational and Applied Mathematics 20 Elsevier BV, 1987, pp. 53–65 DOI: 10.1016/0377-0427(87)90125-7
  28. “New ways of dealing with lacking measurement invariance” In Accountability and Educational Improvement Cham: Springer International Publishing, 2021, pp. 63–82
  29. “Measurement invariance: Review of practice and implications” In Hum. Resour. Manag. Rev. 18.4 Elsevier BV, 2008, pp. 210–222
  30. Benedikt Szmrecsanyi “Studies in English language: Grammatical variation in British English dialects: A study in corpus-based dialectometry”, Studies in English language Cambridge, England: Cambridge University Press, 2012
  31. Kim-Pong Tam and Taciano L. Milfont “Towards cross-cultural environmental psychology: A state-of-the-art review and recommendations” In Journal of Environmental Psychology 71 Elsevier BV, 2020, pp. 101474 DOI: 10.1016/j.jenvp.2020.101474
  32. Robert Tibshirani, Guenther Walther and Trevor Hastie “Estimating the Number of Clusters in a Data Set Via the Gap Statistic” In Journal of the Royal Statistical Society Series B: Statistical Methodology 63.2 Oxford University Press (OUP), 2001, pp. 411–423 DOI: 10.1111/1467-9868.00293
  33. “Missing value estimation methods for DNA microarrays” In Bioinformatics 17.6 Oxford University Press (OUP), 2001, pp. 520–525 DOI: 10.1093/bioinformatics/17.6.520
  34. “Editorial: Measurement invariance” In Front. Psychol. 6 Frontiers Media SA, 2015, pp. 1064
  35. Joe H. Ward “Hierarchical Grouping to Optimize an Objective Function” In Journal of the American Statistical Association 58.301 Informa UK Limited, 1963, pp. 236–244 DOI: 10.1080/01621459.1963.10500845
  36. “Exploratory factor analysis and reliability analysis with missing data: A simple method for SPSS users” In The Quantitative Methods for Psychology 10.2 The Quantitative Methods for Psychology, 2014, pp. 143–152 DOI: 10.20982/tqmp.10.2.p143
  37. Alfred Wehrl “General properties of entropy” In Reviews of Modern Physics 50.2 American Physical Society (APS), 1978, pp. 221–260 DOI: 10.1103/revmodphys.50.221
  38. An Gie Yong and Sean Pearce “A Beginner’s Guide to Factor Analysis: Focusing on Exploratory Factor Analysis” In Tutorials in Quantitative Methods for Psychology 9.2 The Quantitative Methods for Psychology, 2013, pp. 79–94 DOI: 10.20982/tqmp.09.2.p079
  39. Weihang Zhang, Yuma Kinoshita and Hitoshi Kiya “Image-Enhancement-Based Data Augmentation for Improving Deep Learning in Image Classification Problem” In IEEE International Conference on Consumer Electronics - Taiwan, ICCE-TW 2020, Taoyuan, Taiwan, September 28-30, 2020 IEEE, 2020, pp. 1–2 DOI: 10.1109/ICCE-TAIWAN49838.2020.9258292