An unsupervised learning approach to evaluate questionnaire data -- what one can learn from violations of measurement invariance (2312.06309v1)
Abstract: In several branches of the social sciences and humanities, surveys based on standardized questionnaires are a prominent research tool. Although such data can be analyzed in many ways, a few standard procedures have become established. When researchers want to analyze differences in the answer patterns of different groups (e.g., by country, gender, or age), these procedures are only meaningful if measurement invariance holds, i.e., if the measured construct is psychometrically equivalent across groups. Sauerwein et al. (2021) recently raised as an open problem the need for new evaluation methods that work in the absence of measurement invariance. This paper promotes an unsupervised learning-based approach to such research data by proposing a procedure that works in three phases: data preparation, clustering of questionnaires, and measuring similarity based on the obtained clustering and the properties of each group. We generate three synthetic data sets, which allows us to compare our approach with the PCA approach both when measurement invariance holds and when it is violated. Our main result is that the approach provides a natural comparison between groups and a natural description of their response patterns. Moreover, it can be safely applied to a wide variety of data sets, even in the absence of measurement invariance. Finally, this approach allows us to translate (violations of) measurement invariance into a meaningful measure of similarity.
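The three phases named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the preparation step, the clustering algorithm, or the similarity measure. The sketch assumes 5-point Likert items rescaled to [0, 1], uses a plain k-means with a deterministic farthest-point start as a stand-in clustering, and compares groups by one minus the total variation distance between their cluster profiles. All function names and the toy data are hypothetical.

```python
from collections import Counter

def prepare(rows, low=1, high=5):
    """Phase 1 (assumed): rescale Likert answers from [low, high] to [0, 1]."""
    return [[(x - low) / (high - low) for x in row] for row in rows]

def sqdist(p, q):
    """Squared Euclidean distance between two answer vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Phase 2 (stand-in): k-means with a deterministic farthest-point start."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster runs empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_profile(labels, groups, k):
    """Share of each group's questionnaires that falls into each cluster."""
    profile = {}
    for g in sorted(set(groups)):
        counts = Counter(lab for lab, gg in zip(labels, groups) if gg == g)
        n = sum(counts.values())
        profile[g] = [counts.get(c, 0) / n for c in range(k)]
    return profile

def similarity(p, q):
    """Phase 3 (assumed measure): 1 - total variation distance of two profiles."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Toy data (hypothetical): nine questionnaires with four items, three groups.
answers = [
    [5, 4, 5, 4], [4, 5, 5, 5], [5, 5, 4, 4],   # group A: high agreement
    [5, 5, 5, 4], [4, 4, 5, 5], [1, 2, 1, 1],   # group B: mostly high
    [1, 1, 2, 1], [2, 1, 1, 2], [1, 2, 2, 1],   # group C: low agreement
]
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

k = 2
labels = kmeans(prepare(answers), k)
profile = cluster_profile(labels, groups, k)
for g1, g2 in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(g1, g2, round(similarity(profile[g1], profile[g2]), 3))
```

The clustering is run on all questionnaires pooled together, without any assumption that the items measure the same construct in every group. Groups are then compared only through how their members are distributed over the clusters, which is why a comparison of this kind does not depend on measurement invariance.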
- “Weighted clustering: Towards solving the user’s dilemma” In Pattern Recognition 120 Elsevier BV, 2021, pp. 108152 DOI: 10.1016/j.patcog.2021.108152
- M S Bartlett “Tests of significance in factor analysis” In Br. J. Stat. Psychol. 3.2 Wiley, 1950, pp. 77–85
- “Synthetic and Natural Noise Both Break Neural Machine Translation” In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings OpenReview.net, 2018 URL: https://openreview.net/forum?id=BJ8vJebC-
- “A dendrite method for cluster analysis” In Communications in Statistics - Theory and Methods 3.1 Informa UK Limited, 1974, pp. 1–27 DOI: 10.1080/03610927408827101
- “SMOTE: Synthetic Minority over-Sampling Technique” In J. Artif. Int. Res. 16.1 El Segundo, CA, USA: AI Access Foundation, 2002, pp. 321–357
- R.M. Cormack “A Review of Classification” In Journal of the Royal Statistical Society. Series A (General) 134.3 JSTOR, 1971, pp. 321 DOI: 10.2307/2344237
- Anna B. Costello and Jason Osborne “Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis” University of Massachusetts Amherst, 2005 DOI: 10.7275/JYJ1-4868
- “Nearest neighbor pattern classification” In IEEE Transactions on Information Theory 13.1, 1967, pp. 21–27 DOI: 10.1109/TIT.1967.1053964
- Lee J Cronbach “Coefficient alpha and the internal structure of tests” In Psychometrika 16.3 Springer Science+Business Media LLC, 1951, pp. 297–334
- “Robust clustering in high dimensional data using statistical depths” In BMC Bioinformatics 8.S7 Springer Science+Business Media LLC, 2007 DOI: 10.1186/1471-2105-8-s7-s8
- Charles D Dziuban and Edwin C Shirkey “When is a correlation matrix appropriate for factor analysis? Some decision rules” In Psychol. Bull. 81.6 American Psychological Association (APA), 1974, pp. 358–361
- Viktoria Feucht, Paul Wilhelm Dierkes and Matthias Winfried Kleespies “The different values of nature: a comparison between university students’ perceptions of nature’s instrumental, intrinsic and relational values” In Sustainability Science 18.5 Springer Science+Business Media LLC, 2023, pp. 2391–2403 DOI: 10.1007/s11625-023-01371-8
- “Handling missing values in multiple factor analysis” In Food Quality and Preference 30.2 Elsevier BV, 2013, pp. 77–85 DOI: 10.1016/j.foodqual.2013.04.013
- Florian G. Kaiser “A General Measure of Ecological Behavior” In Journal of Applied Social Psychology 28.5 Wiley, 1998, pp. 395–422 DOI: 10.1111/j.1559-1816.1998.tb01712.x
- Henry F. Kaiser “A second generation little jiffy” In Psychometrika 35.4 Springer Science+Business Media LLC, 1970, pp. 401–415 DOI: 10.1007/bf02291817
- Matthias Winfried Kleespies and Paul Wilhelm Dierkes “Impact of biological education and gender on students’ connection to nature and relational values” In PLOS ONE 15.11 Public Library of Science (PLoS), 2020, pp. e0242004 DOI: 10.1371/journal.pone.0242004
- “Assessing dimensions of inclusion from students’ perspective – measurement invariance across students with learning disabilities in different educational settings” In European Journal of Special Needs Education 35.3 Informa UK Limited, 2019, pp. 287–302 DOI: 10.1080/08856257.2019.1646958
- “Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation” In Bioinformatics 23.1 Oxford University Press (OUP), 2006, pp. 127–128 DOI: 10.1093/bioinformatics/btl529
- “Promoting connectedness with nature through environmental education” In Environmental Education Research 19.3 Informa UK Limited, 2013, pp. 370–384 DOI: 10.1080/13504622.2012.697545
- “Understanding and Enhancement of Internal Clustering Validation Measures” In IEEE Trans. Cybern. 43.3, 2013, pp. 982–994 DOI: 10.1109/TSMCB.2012.2220543
- F. Stephan Mayer and Cynthia McPherson Frantz “The connectedness to nature scale: A measure of individuals’ feeling in community with nature” In Journal of Environmental Psychology 24.4 Elsevier BV, 2004, pp. 503–515 DOI: 10.1016/j.jenvp.2004.10.001
- “Training and assessing classification rules with imbalanced data” In Data Mining and Knowledge Discovery 28.1 Springer Science+Business Media LLC, 2012, pp. 92–122 DOI: 10.1007/s10618-012-0295-5
- Taciano L. Milfont and John Duckitt “The environmental attitudes inventory: A valid and reliable measure to assess the structure of environmental attitudes” In Journal of Environmental Psychology 30.1 Elsevier BV, 2010, pp. 80–94 DOI: 10.1016/j.jenvp.2009.09.001
- “Syntactic Data Augmentation Increases Robustness to Inference Heuristics” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 Association for Computational Linguistics, 2020, pp. 2339–2352 DOI: 10.18653/V1/2020.ACL-MAIN.212
- Mojgan Mohajer, Karl-Hans Englmeier and Volker J. Schmid “A comparison of Gap statistic definitions with and without logarithm function” In LMU Department of Statistics: Technical Reports 96, 2010 DOI: 10.5282/ubm/epub.11920
- Diane L Putnick and Marc H Bornstein “Measurement invariance conventions and reporting: The state of the art and future directions for psychological research” In Dev. Rev. 41 Elsevier BV, 2016, pp. 71–90
- Peter J. Rousseeuw “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis” In Journal of Computational and Applied Mathematics 20 Elsevier BV, 1987, pp. 53–65 DOI: 10.1016/0377-0427(87)90125-7
- “New ways of dealing with lacking measurement invariance” In Accountability and Educational Improvement Cham: Springer International Publishing, 2021, pp. 63–82
- “Measurement invariance: Review of practice and implications” In Hum. Resour. Manag. Rev. 18.4 Elsevier BV, 2008, pp. 210–222
- Benedikt Szmrecsanyi “Grammatical variation in British English dialects: A study in corpus-based dialectometry”, Studies in English Language Cambridge, England: Cambridge University Press, 2012
- Kim-Pong Tam and Taciano L. Milfont “Towards cross-cultural environmental psychology: A state-of-the-art review and recommendations” In Journal of Environmental Psychology 71 Elsevier BV, 2020, pp. 101474 DOI: 10.1016/j.jenvp.2020.101474
- Robert Tibshirani, Guenther Walther and Trevor Hastie “Estimating the Number of Clusters in a Data Set Via the Gap Statistic” In Journal of the Royal Statistical Society Series B: Statistical Methodology 63.2 Oxford University Press (OUP), 2001, pp. 411–423 DOI: 10.1111/1467-9868.00293
- “Missing value estimation methods for DNA microarrays” In Bioinformatics 17.6 Oxford University Press (OUP), 2001, pp. 520–525 DOI: 10.1093/bioinformatics/17.6.520
- “Editorial: Measurement invariance” In Front. Psychol. 6 Frontiers Media SA, 2015, pp. 1064
- Joe H. Ward “Hierarchical Grouping to Optimize an Objective Function” In Journal of the American Statistical Association 58.301 Informa UK Limited, 1963, pp. 236–244 DOI: 10.1080/01621459.1963.10500845
- “Exploratory factor analysis and reliability analysis with missing data: A simple method for SPSS users” In The Quantitative Methods for Psychology 10.2 The Quantitative Methods for Psychology, 2014, pp. 143–152 DOI: 10.20982/tqmp.10.2.p143
- Alfred Wehrl “General properties of entropy” In Reviews of Modern Physics 50.2 American Physical Society (APS), 1978, pp. 221–260 DOI: 10.1103/revmodphys.50.221
- An Gie Yong and Sean Pearce “A Beginner’s Guide to Factor Analysis: Focusing on Exploratory Factor Analysis” In Tutorials in Quantitative Methods for Psychology 9.2 The Quantitative Methods for Psychology, 2013, pp. 79–94 DOI: 10.20982/tqmp.09.2.p079
- Weihang Zhang, Yuma Kinoshita and Hitoshi Kiya “Image-Enhancement-Based Data Augmentation for Improving Deep Learning in Image Classification Problem” In IEEE International Conference on Consumer Electronics - Taiwan, ICCE-TW 2020, Taoyuan, Taiwan, September 28-30, 2020 IEEE, 2020, pp. 1–2 DOI: 10.1109/ICCE-TAIWAN49838.2020.9258292