Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices (2311.09088v3)

Published 15 Nov 2023 in cs.HC

Abstract: Machine learning (ML) models are fundamentally shaped by data, and building inclusive ML systems requires significant considerations around how to design representative datasets. Yet, few novice-oriented ML modeling tools are designed to foster hands-on learning of dataset design practices, including how to design for data diversity and inspect for data quality. To this end, we outline a set of four data design practices (DDPs) for designing inclusive ML models and share how we designed a tablet-based application called Co-ML to foster learning of DDPs through a collaborative ML model building experience. With Co-ML, beginners can build image classifiers through a distributed experience where data is synchronized across multiple devices, enabling multiple users to iteratively refine ML datasets in discussion and coordination with their peers. We deployed Co-ML in a 2-week-long educational AIML Summer Camp, where youth ages 13-18 worked in groups to build custom ML-powered mobile applications. Our analysis reveals how multi-user model building with Co-ML, in the context of student-driven projects created during the summer camp, supported development of DDPs including incorporating data diversity, evaluating model performance, and inspecting for data quality. Additionally, we found that students' attempts to improve model performance often prioritized learnability over class balance. Through this work, we highlight how the combination of collaboration, model testing interfaces, and student-driven projects can empower learners to actively engage in exploring the role of data in ML systems.

Authors (8)
  1. Tiffany Tseng
  2. Matt J. Davidson
  3. Luis Morales-Navarro
  4. Jennifer King Chen
  5. Victoria Delaney
  6. Mark Leibowitz
  7. Jazbo Beason
  8. R. Benjamin Shapiro