
NLP4Gov: A Comprehensive Library for Computational Policy Analysis (2404.03206v1)

Published 4 Apr 2024 in cs.HC

Abstract: Formal rules and policies are fundamental in formally specifying a social system: its operation, boundaries, processes, and even ontology. Recent scholarship has highlighted the role of formal policy in collective knowledge creation, game communities, the production of digital public goods, and national social media governance. Researchers have shown interest in how online communities convene tenable self-governance mechanisms to regulate member activities and distribute rights and privileges by designating responsibilities, roles, and hierarchies. We present NLP4Gov, an interactive kit to train and aid scholars and practitioners alike in computational policy analysis. The library explores and integrates methods and capabilities from computational linguistics and NLP to generate semantic and symbolic representations of community policies from text records. Versatile, documented, and accessible, NLP4Gov provides granular and comparative views into institutional structures and interactions, along with other information extraction capabilities for downstream analysis.


Summary

  • The paper introduces NLP4Gov as a toolkit that leverages NLP methods to extract semantic and symbolic representations from policy texts.
  • It employs techniques such as coreference resolution, policy parsing, and topic modeling to reveal granular insights into online governance.
  • The toolkit supports comparative policy analysis and governance tracking, empowering research on digital communities.

NLP4Gov: An Interactive Toolkit for Computational Policy Analysis

Overview

NLP4Gov is an interactive toolkit that applies NLP and computational linguistics methods to community policies drawn from text records. Developed by researchers at the University of California, Davis, and the University of Massachusetts Amherst, it helps scholars and practitioners generate semantic and symbolic representations of policy documents. The toolkit provides granular and comparative views into institutional structures and interactions, along with other information extraction capabilities for downstream analysis.

Motivation

The motivation behind NLP4Gov is growing research interest in how online communities self-govern by regulating member activities through formal policy. Community policies allocate rights, designate responsibilities, and support the sustainable management of resources, much like traditional governance structures in public administration and natural resource management. Analyzing them is difficult, however, because of the volume of documents and the complexity of policy language. NLP4Gov addresses this gap by applying computational methods to analyze policy texts systematically, which makes it relevant to researchers in socio-technical systems, social science, anthropology, economics, and public policy.

Applications and Functionalities

NLP4Gov encompasses several applications, each designed to provide distinct insights into the governance mechanisms of online communities. The toolkit includes functionalities for:

  • Coreference Resolution: Identifying references to the same entity across sentences and substituting them with that entity, so that context carries over from one sentence to the next within a document.
  • Policy Parsing: Employing Semantic Role Labeling (SRL) and dependency parsing to extract the primary components of policies (attributes, objects, aims, and deontics) following the Institutional Grammar framework.
  • Clustering & Topic Modeling: Using topic models such as BERTopic to group policies or their components by theme and to discover patterns in regulatory practices.
  • Comparative Policy Analysis: Drawing upon semantic similarity metrics to compare policy documents across different communities, identifying overlaps and divergences in governance approaches.
  • Governance Tracking: Offering tools to retrieve and analyze discourse from community threads related to specific policies, providing valuable insights into how policies are interpreted and enacted by community members.
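
As a rough illustration of the policy parsing item in the list above, the sketch below splits a rule sentence into the four Institutional Grammar components named there (attribute, deontic, aim, object). It is a toy regular-expression heuristic, not NLP4Gov's parser, which relies on Semantic Role Labeling and dependency parsing. The function name `parse_rule` and the deontic vocabulary are assumptions made for this example.

```python
import re

# Hypothetical deontic vocabulary for illustration only. Longer phrases come
# first so that "must not" is matched before "must".
DEONTICS = ["must not", "may not", "should not", "shall not",
            "must", "may", "should", "shall"]

_DEONTIC_RE = re.compile(r"\b(" + "|".join(DEONTICS) + r")\b", re.IGNORECASE)


def parse_rule(sentence):
    """Split a rule into attribute, deontic, aim, and object (toy heuristic)."""
    match = _DEONTIC_RE.search(sentence)
    if match is None:
        return None  # no deontic found, so nothing to split on
    # Everything before the deontic is taken to be the attribute (the actor).
    attribute = sentence[:match.start()].strip()
    rest = sentence[match.end():].strip().rstrip(".")
    # Assumption: the first word after the deontic is the verb (aim) and the
    # remainder is its object. Real parsers use SRL or dependency trees here.
    aim, _, obj = rest.partition(" ")
    return {
        "attribute": attribute,
        "deontic": match.group(1).lower(),
        "aim": aim,
        "object": obj,
    }


print(parse_rule("Moderators must remove spam posts."))
```

A heuristic like this breaks on passive constructions, conditions, and multi-word verbs, which is the kind of case that motivates using SRL and dependency parsing instead.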

These applications are made accessible through an interactive interface powered by Google Colaboratory, requiring minimal setup and allowing users of varied computational backgrounds to utilize the toolkit effectively.
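
The comparative policy analysis item above rests on scoring how similar two policy texts are. A minimal sketch of that idea follows, using cosine similarity over word counts. This is a deliberate simplification: the summary does not say which similarity models the toolkit uses, so word-count vectors stand in for learned sentence embeddings, and the names `bag_of_words`, `cosine_similarity`, and `most_similar` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(rule, candidates):
    """Return (score, candidate) for the candidate rule closest to `rule`."""
    query = bag_of_words(rule)
    scored = [(cosine_similarity(query, bag_of_words(c)), c) for c in candidates]
    return max(scored)


# Made-up rules from two imaginary communities, for illustration only.
rule_a = "Users must not post spam or advertisements."
rules_b = ["Do not post spam or ads.", "Moderators may ban repeat offenders."]
score, match = most_similar(rule_a, rules_b)
print(f"{score:.2f}  {match}")
```

Word overlap misses paraphrases ("ads" and "advertisements" count as different words here), which is why embedding-based similarity is the usual choice for comparing policies across communities.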

Implications and Future Directions

NLP4Gov advances computational policy analysis and supports interdisciplinary research on governance in digital platforms. By providing a structured framework to analyze and compare policies across communities, the toolkit helps researchers uncover underlying governance mechanisms, evaluate policy effectiveness, and potentially inform the design of more equitable and sustainable governance structures. Development is ongoing, with plans to incorporate more components of the Institutional Grammar framework and to use LLMs for a deeper understanding of policy semantics and structure.

Conclusion

NLP4Gov helps bridge the gap between the capabilities of computational linguistics and the needs of policy analysis in socio-digital systems. By making NLP tools for analyzing governance mechanisms accessible to non-specialists, it supports a more nuanced understanding of how policies shape and are shaped by the dynamics of online communities. As digital platforms take on a larger social role, tools like NLP4Gov can inform discussion of, and intervention in, how these spaces are governed.
