Pabulib: Standardized Participatory Budgeting Database
- Pabulib is a web-based library that standardizes participatory budgeting data from municipalities globally, offering a unified machine-readable format.
- It enables researchers to conduct reproducible experiments with PB voting rules by providing both real-world and synthetic datasets.
- The platform supports robust metadata validation and parsing, facilitating empirical evaluation of aggregation methods like the greedy rule.
Pabulib is a web-based library designed to collect, standardize, and disseminate participatory budgeting (PB) instances from municipalities worldwide. It provides a common, machine-readable file format (.pb) for storing both real-world and synthetic datasets, improving the reproducibility and comparability of computational research into PB voting rules, objective functions, and aggregation mechanisms. The Pabulib platform hosts a curated catalogue of downloadable PB instances, furnishing essential infrastructure for empirical evaluation of PB algorithms (Stolicki et al., 2020).
1. Motivation and Purpose
Pabulib responds to the rapid global adoption of participatory budgeting, where citizens directly allocate public funds through democratic voting processes. The proliferation of research into new aggregation rules, optimization procedures, and fairness axioms in computational social choice has engendered strong demand for standardized benchmark datasets. Pabulib's objectives are threefold:
- Collection and Standardization: Assemble PB data from diverse jurisdictions into a unified format, encompassing both metadata and granular voting records.
- Format Specification: Ensure data is stored in a machine-readable, human-auditable way, covering general metadata, project lists with costs, and complete voter-level ballots.
- Reproducibility and Accessibility: Enable robust, reproducible computational experiments, and allow practitioners to evaluate and compare aggregation methods on authoritative datasets.
By lowering the barriers to dataset access and providing a canonical file structure, Pabulib is positioned as a reference point for both empirical study and operational deployment of participatory budgeting algorithms (Stolicki et al., 2020).
2. Platform Architecture and Workflow
Pabulib is implemented as a lightweight, user-accessible web portal (http://pabulib.org/) featuring the following core components:
- Catalogue of Instances: Each PB instance is represented as a single .pb file and indexed for browsing by country, year (instance identifier), vote type, total budget, and project category.
- Browsing and Filtering: Users can navigate instances via metadata-based filters and obtain direct download links for each .pb file.
- Contribution Mechanism: Researchers or municipal staff submit new instances through the "Contribute" interface. Submissions are reviewed for syntactic correctness and completeness before integration.
The submission and review process ensures that all entries adhere strictly to format specifications and completeness requirements. Immediately upon acceptance, new datasets become available for community use, accelerating research and practitioner engagement (Stolicki et al., 2020).
3. Formal Specification of the .pb Format
Each .pb file is a UTF-8 encoded text file comprising exactly three top-level sections: META, PROJECTS, and VOTES. The format is specified by an EBNF grammar, ensuring syntactic rigor and ease of parsing. The organizational structure is as follows:
| Section | Content | Required/Optional Fields |
|---|---|---|
| META | Global metadata (description, country, instance, etc.) | description, country, unit, instance, num_projects, num_votes, budget, rule, vote_type, plus type-specific parameters |
| PROJECTS | List of projects with costs and optional metadata | project_id, cost, optional (name, category, group, etc.) |
| VOTES | Voter records, demographics, and vote payloads | voter_id, voter_meta (e.g., age, sex), vote structure |
Syntactic Elements
The EBNF grammar (ISO 14977-inspired) defines the parsing rules. For example, the META section contains case-sensitive key-value pairs, the PROJECTS section enumerates projects with positive costs, and the VOTES section admits demographic metadata and payloads determined by vote_type (approval, ordinal, cumulative, or scoring). Project and voter IDs are arbitrary integers and need not be contiguous.
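To make the three-section layout concrete, the following sketch embeds a hypothetical miniature instance in a Python string and checks that the section headers appear in the expected order. The semicolon delimiter, the per-section header rows, and all values (place names, costs, ballots) are assumptions made for illustration; the format specification remains the authority on the exact syntax.

```python
from __future__ import annotations

# Hypothetical toy instance. The ';' delimiter and the header row at the top
# of each section are assumptions, not quotations from the specification.
SAMPLE_PB = """\
META
key;value
description;Toy instance for illustration
country;Nowhere
unit;Toytown
instance;2020
num_projects;3
num_votes;3
budget;1000
rule;greedy
vote_type;approval
min_length;1
max_length;2
PROJECTS
project_id;cost;name
7;600;Playground
12;500;Bike racks
30;400;Library books
VOTES
voter_id;age;sex;vote
101;34;F;7,12
205;51;M;12
310;27;F;12,30
"""

SECTION_HEADERS = ("META", "PROJECTS", "VOTES")


def section_order(text: str) -> list[str]:
    """Return the section headers in the order they appear in the text."""
    return [line.strip() for line in text.splitlines()
            if line.strip() in SECTION_HEADERS]


print(section_order(SAMPLE_PB))  # ['META', 'PROJECTS', 'VOTES']
```

Note that the project and voter IDs in the toy instance are deliberately non-contiguous, which the format permits.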
Semantic Structure and Constraints
Define:
- $P$: project set, with $c(p)$ the cost of project $p \in P$
- $V$: voter set
- $B$: total budget
In the META section, all critical parameters (project count, vote count, budget, vote_type, and rule) are accompanied by type-specific constraints (e.g., min_length, max_length for approval/ordinal; max_sum_points and point ranges for cumulative/scoring).
In the PROJECTS section, each project is uniquely represented and annotated with optional metadata (e.g., category, target group).
In the VOTES section, each row is a unique voter, listing demographics and an encoded payload whose structure is dictated by vote_type:
- Approval: Voter $v$ approves a subset $A_v \subseteq P$ of the projects, subject to minimum/maximum cardinality and total-cost limits
- Ordinal: Voter provides a strict ranking (permutation) of a project subset
- Cumulative: Voter assigns non-negative scores to projects, with sum constrained by max_sum_points
- Scoring: Voter gives scores in a predefined interval, with a default for omitted projects
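The per-type constraints listed above can be sketched as small validation functions. This is an illustrative sketch, not the reference implementation: `min_length`, `max_length`, and `max_sum_points` follow the parameter names mentioned in the text, while `max_sum_cost`, `min_points`, `max_points`, and `default_score` are hypothetical names chosen here for the remaining limits.

```python
from __future__ import annotations


def check_approval(ballot, costs, min_length=1, max_length=None, max_sum_cost=None):
    """Approval: a duplicate-free subset of known projects within size/cost limits."""
    if len(set(ballot)) != len(ballot) or any(p not in costs for p in ballot):
        return False
    if len(ballot) < min_length:
        return False
    if max_length is not None and len(ballot) > max_length:
        return False
    if max_sum_cost is not None and sum(costs[p] for p in ballot) > max_sum_cost:
        return False
    return True


def check_ordinal(ranking, costs, min_length=1, max_length=None):
    """Ordinal: a strict ranking, i.e. an ordered list of projects without repeats."""
    return check_approval(ranking, costs, min_length, max_length)


def check_cumulative(points, costs, max_sum_points):
    """Cumulative: non-negative points whose total stays within max_sum_points."""
    return (all(p in costs and v >= 0 for p, v in points.items())
            and sum(points.values()) <= max_sum_points)


def check_scoring(points, costs, min_points, max_points):
    """Scoring: every explicit score lies in the interval [min_points, max_points]."""
    return all(p in costs and min_points <= v <= max_points
               for p, v in points.items())


def fill_defaults(points, costs, default_score=0):
    """Scoring: projects the voter omitted receive the default score."""
    return {p: points.get(p, default_score) for p in costs}


# Hypothetical project costs keyed by project id.
costs = {7: 600, 12: 500, 30: 400}
print(check_approval([7, 12], costs, min_length=1, max_length=2))  # True
print(check_cumulative({7: 4, 12: 2}, costs, max_sum_points=5))    # False
print(fill_defaults({7: 4}, costs))  # {7: 4, 12: 0, 30: 0}
```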
4. Objective Functions and Aggregation
The principal aggregation mechanism is the "greedy" rule:
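- Each project's score is aggregated over all voters: $s(p) = \sum_{v \in V} u_v(p)$, where $u_v(p)$ is the value that voter $v$ assigns to project $p$ (for example, 1 if $v$ approves $p$ and 0 otherwise under approval voting).
- Projects are selected in decreasing order of $s(p)$, subject to $\sum_{p \in W} c(p) \le B$, where $W$ is the set of selected projects, $c(p)$ is the cost of project $p$, and $B$ is the total budget.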
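The two steps of the greedy rule can be sketched for approval ballots as follows. Two details are assumptions of this sketch rather than statements from the source: a project that no longer fits in the remaining budget is skipped and the scan continues (some variants stop at the first project that does not fit), and ties are broken by the lower project id.

```python
from __future__ import annotations


def approval_scores(ballots: list[list[int]]) -> dict[int, int]:
    """Count, for every project, how many ballots approve it."""
    scores: dict[int, int] = {}
    for ballot in ballots:
        for p in ballot:
            scores[p] = scores.get(p, 0) + 1
    return scores


def greedy(costs: dict[int, int], scores: dict[int, int], budget: int) -> list[int]:
    """Scan projects by decreasing score and fund each one that still fits."""
    selected, remaining = [], budget
    # Assumed tie-break: lower project id first.
    for p in sorted(costs, key=lambda p: (-scores.get(p, 0), p)):
        if costs[p] <= remaining:
            selected.append(p)
            remaining -= costs[p]
    return selected


# Hypothetical toy data: three projects, three approval ballots.
costs = {7: 600, 12: 500, 30: 400}
ballots = [[7, 12], [12], [12, 30]]
print(greedy(costs, approval_scores(ballots), budget=1000))  # [12, 30]
```

In the toy run, project 12 is funded first with three approvals. Project 7 then no longer fits in the remaining 500, so it is skipped and project 30 is funded instead.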
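For generalization, the PB selection task can be formulated as an integer programming problem of knapsack type:

$$\max_{x \in \{0,1\}^{|P|}} \; \sum_{p \in P} s(p)\, x_p \quad \text{subject to} \quad \sum_{p \in P} c(p)\, x_p \le B,$$

where $x_p = 1$ if project $p$ is funded, $s(p)$ is the aggregated score of $p$, $c(p)$ is its cost, and $B$ is the total budget.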
Only the greedy rule is required in the current (v1.0) Pabulib reference implementation, but the structure supports the future addition of proportional and exact-optimization aggregation procedures (Stolicki et al., 2020).
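For comparison with the greedy rule, the integer program can be solved exactly on small instances with a standard 0/1 knapsack dynamic program. The sketch below is not part of the Pabulib reference implementation. The function name and the toy data are hypothetical, and the table of reachable cost totals can grow large on real instances.

```python
from __future__ import annotations


def knapsack(costs: dict[int, int], scores: dict[int, int], budget: int):
    """Maximise total score subject to total cost <= budget (0/1 knapsack)."""
    # best[c] = (total score, chosen projects) of the best bundle costing exactly c
    best: dict[int, tuple[int, tuple[int, ...]]] = {0: (0, ())}
    for p, cost in costs.items():
        # Iterate over a snapshot so that each project is used at most once.
        for c, (score, chosen) in list(best.items()):
            new_cost = c + cost
            if new_cost > budget:
                continue
            candidate = (score + scores.get(p, 0), chosen + (p,))
            if new_cost not in best or candidate[0] > best[new_cost][0]:
                best[new_cost] = candidate
    total, chosen = max(best.values(), key=lambda entry: entry[0])
    return total, list(chosen)


# Hypothetical instance on which score-ordered greedy selection is suboptimal:
# greedy funds project 1 alone (score 4), the optimum funds 2 and 3 (score 6).
costs = {1: 700, 2: 500, 3: 500}
scores = {1: 4, 2: 3, 3: 3}
print(knapsack(costs, scores, budget=1000))  # (6, [2, 3])
```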
5. Parsing, Validation, and Computational Usage
Proper utilization of .pb files involves sequential parsing and validation steps:
- Section Delimitation: Read file as UTF-8, splitting into META, PROJECTS, and VOTES at corresponding headers.
- Metadata Validation: Parse and typecast META entries, check for presence and valid ranges of required keys.
- Project List Consistency: Parse each project row; validate that costs are strictly positive and the project count matches num_projects.
- Vote Integrity: For each vote, ensure unique voter_id, demographic correctness, and that the size and structure of votes comply with the type-specific constraints (e.g., length, sum-of-points). Verify all referenced project_ids exist in PROJECTS.
- Structure Construction: Build internal structures such as a cost vector $c$ indexed by project, a voter-by-project vote matrix $A$, and metadata dictionaries.
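The steps above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the official parser: it assumes semicolon-separated rows with a header row per section, a `vote` column holding comma-separated project ids, integer budget and costs, and approval ballots only. The sample instance and the function names are hypothetical. Text read from disk would be opened with `encoding="utf-8"`.

```python
from __future__ import annotations

import csv
import io

SECTIONS = ("META", "PROJECTS", "VOTES")
REQUIRED_META = ("description", "country", "unit", "instance", "num_projects",
                 "num_votes", "budget", "rule", "vote_type")

# Hypothetical two-project instance used only to exercise the sketch.
SAMPLE = """\
META
key;value
description;Toy instance
country;Nowhere
unit;Toytown
instance;2020
num_projects;2
num_votes;2
budget;1000
rule;greedy
vote_type;approval
PROJECTS
project_id;cost
7;600
12;500
VOTES
voter_id;vote
101;7,12
205;12
"""


def split_sections(text):
    """Cut the text at the section headers, then read ';'-separated rows."""
    bodies, current = {}, None
    for line in text.splitlines():
        if line.strip() in SECTIONS:
            current = line.strip()
            bodies[current] = []
        elif current is not None and line.strip():
            bodies[current].append(line)
    missing = [s for s in SECTIONS if s not in bodies]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return {name: list(csv.DictReader(io.StringIO("\n".join(rows)), delimiter=";"))
            for name, rows in bodies.items()}


def load_pb(text):
    """Validate META, projects and approval votes; return plain dictionaries."""
    raw = split_sections(text)
    meta = {row["key"]: row["value"] for row in raw["META"]}
    absent = [k for k in REQUIRED_META if k not in meta]
    if absent:
        raise ValueError(f"missing META keys: {absent}")
    if int(meta["budget"]) <= 0:
        raise ValueError("budget must be positive")

    costs = {int(r["project_id"]): int(r["cost"]) for r in raw["PROJECTS"]}
    if len(costs) != len(raw["PROJECTS"]) or len(costs) != int(meta["num_projects"]):
        raise ValueError("duplicate project ids or num_projects mismatch")
    if any(c <= 0 for c in costs.values()):
        raise ValueError("project costs must be strictly positive")

    votes = {}
    for r in raw["VOTES"]:
        voter = int(r["voter_id"])
        ballot = [int(p) for p in r["vote"].split(",") if p.strip()]
        if voter in votes:
            raise ValueError(f"duplicate voter_id {voter}")
        if any(p not in costs for p in ballot):
            raise ValueError(f"voter {voter} references an unknown project")
        votes[voter] = ballot
    if len(votes) != int(meta["num_votes"]):
        raise ValueError("num_votes mismatch")
    return meta, costs, votes


meta, costs, votes = load_pb(SAMPLE)
print(meta["vote_type"], costs, votes)
```

Returning plain dictionaries keyed by the original ids avoids any assumption that project or voter ids are contiguous. A dense cost vector or vote matrix can be derived from them once an index order is fixed.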
Once validated, these objects can be passed directly to PB solvers. For the greedy rule, the common sequence is to sum votes by project, sort, and greedily select projects until the budget limit is reached.
6. Extensions and Future Developments
Current limitations of Pabulib v1.0 include exclusive support for the greedy rule and four basic vote types. Announced extension directions are:
- Advanced Aggregation: Introduction of proportional aggregation rules and knapsack-based exact optimizers.
- Metadata Enrichment: Addition of geo-coordinates, detailed project descriptions, and expanded socio-economic attributes for voters.
- Multi-Jurisdictional Support: Enablement of nested unit/subunit hierarchies to capture multi-level PB processes.
- Programmatic Access: Development of a web API for querying and filtering by budget, location, and vote type.
These trajectories are intended to solidify Pabulib's status as the standard empirical and operational resource for participatory budgeting research and implementation (Stolicki et al., 2020).