Cast vote records: A database of ballots from the 2020 U.S. Election

Published 24 Oct 2024 in cs.CY and stat.AP | (2411.05020v1)

Abstract: Ballots are the core records of elections. Electronic records of actual ballots cast (cast vote records) are available to the public in some jurisdictions. However, they have been released in a variety of formats and have not been independently evaluated. Here we introduce a database of cast vote records from the 2020 U.S. general election. We downloaded publicly available unstandardized cast vote records, standardized them into a multi-state database, and extensively compared their totals to certified election results. Our release includes vote records for President, Governor, U.S. Senate and House, and state upper and lower chambers -- covering 42.7 million voters in 20 states who voted for more than 2,204 candidates. This database serves as a uniquely granular administrative dataset for studying voting behavior and election administration. Using this data, we show that in battleground states, 1.9 percent of solid Republicans (as defined by their congressional and state legislative voting) in our database split their ticket for Joe Biden, while 1.2 percent of solid Democrats split their ticket for Donald Trump.


Authors (14)

Summary

The paper presents a database of cast vote records covering 42.7 million voters in 20 states, enabling granular analysis of voting behavior in the 2020 U.S. election.
The authors standardized heterogeneous source files into a common format and validated the resulting totals against official tallies, with a maximum discrepancy of 1%.
The database supports advanced electoral studies, including the analysis of split-ticket voting and precinct-level voting patterns in key battleground states.

Cast Vote Records: A Dataset for Comprehensive Electoral Analysis

In "Cast vote records: A database of ballots from the 2020 U.S. Election," the authors construct a database of cast vote records (CVRs), the electronic records of individual ballots, from the 2020 U.S. general election, covering 42.7 million voters across 20 states. The collection substantially expands the availability of granular, ballot-level voting data and gives researchers a resource for analyzing voting behavior and election administration in far more detail than aggregate returns allow.

Key Contributions and Data Insights

The authors compiled CVRs from publicly available sources, standardized them into a single dataset, and validated it against certified election results. The dataset covers votes for President, Governor, U.S. Senate, U.S. House, and the upper and lower chambers of state legislatures, and is detailed down to the precinct level where possible. It also yields substantive findings on split-ticket voting: in battleground states, 1.9% of "solid Republicans" (defined by their congressional and state legislative votes) voted for Joe Biden, while 1.2% of "solid Democrats" voted for Donald Trump.
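
The standardization step can be pictured as reshaping each jurisdiction's export into one long table with one row per ballot and contest. The Python sketch below assumes a hypothetical vendor format with a `BallotID` column and one 0/1 column per candidate; the column names, the `Vote` record, and `standardize` are illustrative assumptions, not the authors' actual schema or code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Vote:
    """One standardized choice: a single ballot's vote in a single contest."""
    ballot_id: str
    county: str
    office: str
    party: str


# Hypothetical mapping for one vendor's export: raw column header -> (office, party).
EXAMPLE_COLUMN_MAP: Dict[str, Tuple[str, str]] = {
    "PRES_DEM": ("President", "DEM"),
    "PRES_REP": ("President", "REP"),
    "USH_DEM": ("US House", "DEM"),
    "USH_REP": ("US House", "REP"),
}


def standardize(raw_rows: Iterable[Dict[str, object]], county: str,
                column_map: Dict[str, Tuple[str, str]]) -> List[Vote]:
    """Reshape wide rows (one 0/1 column per candidate) into long Vote records."""
    votes: List[Vote] = []
    for row in raw_rows:
        # Prefix with the county so ballot IDs stay unique after merging counties.
        ballot_id = f"{county}-{row['BallotID']}"
        for column, value in row.items():
            # Only mapped candidate columns with a mark (1) become votes;
            # the ID column and unmarked candidates are skipped.
            if column in column_map and value == 1:
                office, party = column_map[column]
                votes.append(Vote(ballot_id, county, office, party))
    return votes
```

A real pipeline would need one column map per vendor and jurisdiction format, plus handling for overvotes, undervotes, and write-ins, none of which this sketch covers.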

Methodological Rigor

Because data formats vary across jurisdictions and voting-machine vendors, the authors paid particular attention to the standardization step and followed it with a multi-step validation procedure. This included comparing CVR-derived vote tallies to official election results and addressing discrepancies; only counties whose totals differed from official results by at most 1% were included in the release. This validation gives users a documented basis for trusting the released data.
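
A minimal version of the county-level check might look like the following. It assumes tallies keyed by county and candidate and uses a per-candidate relative discrepancy with a 1% tolerance; the exact discrepancy metric the authors used may differ, and the function names are hypothetical.

```python
from typing import Dict, List

Tallies = Dict[str, Dict[str, int]]  # county -> {candidate: votes}


def relative_discrepancy(cvr_votes: int, certified_votes: int) -> float:
    """Absolute gap between CVR and certified tallies, relative to the certified tally."""
    if certified_votes == 0:
        return 0.0 if cvr_votes == 0 else float("inf")
    return abs(cvr_votes - certified_votes) / certified_votes


def counties_within_tolerance(cvr: Tallies, certified: Tallies,
                              tolerance: float = 0.01) -> List[str]:
    """Counties whose worst per-candidate discrepancy is at most `tolerance`."""
    passing = []
    for county, official in certified.items():
        derived = cvr.get(county, {})  # a county with no CVRs gets all-zero tallies
        worst = max(
            (relative_discrepancy(derived.get(cand, 0), votes)
             for cand, votes in official.items()),
            default=0.0,
        )
        if worst <= tolerance:
            passing.append(county)
    return sorted(passing)
```

Taking the worst candidate rather than an average keeps a large error for one candidate from being masked by accurate totals for the others.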

Applications and Implications

The dataset is valuable across several research domains. In political science and sociology, it supports the study of phenomena such as split-ticket voting, which can be measured directly at the ballot level rather than inferred from aggregate totals. In election law and administration, it supports forensic analyses and investigations of election integrity, and can help counter misinformation about election outcomes.
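
The gap between aggregate and ballot-level measurement can be made concrete. With only precinct totals, the number of split tickets is known only up to deterministic (Fréchet) bounds; with CVRs it can be counted exactly. The sketch below assumes a two-party setting in which every ballot votes in both the presidential and the U.S. House race; the function names are illustrative.

```python
from typing import List, Tuple


def split_ticket_bounds(n: int, pres_dem: int, house_dem: int) -> Tuple[int, int]:
    """Range of split-ticket ballots consistent with aggregate totals alone.

    Assumes every one of the n ballots casts a two-party vote in both races;
    pres_dem and house_dem are the Democratic vote counts in each race.
    """
    if not (0 <= pres_dem <= n and 0 <= house_dem <= n):
        raise ValueError("vote counts must lie between 0 and n")
    # Fewest splits: Democratic presidential and House voters overlap as much as possible.
    low = abs(pres_dem - house_dem)
    # Most splits: the two Democratic groups overlap as little as possible.
    high = min(pres_dem + house_dem, 2 * n - pres_dem - house_dem)
    return low, high


def count_splits(ballots: List[Tuple[str, str]]) -> int:
    """Exact split count from ballot-level (president_party, house_party) pairs."""
    return sum(1 for pres, house in ballots if pres != house)
```

For a precinct of 100 ballots split 50/50 in both races, `split_ticket_bounds(100, 50, 50)` returns `(0, 100)`: the aggregates are consistent with anything from no split tickets to every ballot being split, whereas ballot-level records give the exact count.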

Practical Implications

One practical application is the study of split-ticket voting. The dataset makes it possible to examine how vote choices vary across offices at different levels of government and across geographic areas, shedding light on "swing" voters and, in turn, on campaign strategy and electoral outcomes.
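
A simplified version of the paper's headline measurement, the share of solid partisans who crossed over at the top of the ticket, is sketched below. It assumes each ballot is a mapping from office to party and treats a voter as a "solid" partisan when every congressional and state legislative vote on the ballot goes to the same major party. The office labels and function names are hypothetical, and the authors' exact classification rule (for example, how uncontested races are handled) may differ.

```python
from typing import Dict, List, Optional

# Assumed office labels for the down-ballot contests used to classify voters.
DOWN_BALLOT = ("US House", "State Senate", "State House")


def solid_party(ballot: Dict[str, str]) -> Optional[str]:
    """Return 'DEM' or 'REP' if all down-ballot votes share that party, else None."""
    parties = {ballot[office] for office in DOWN_BALLOT if office in ballot}
    if len(parties) == 1:
        (party,) = parties
        if party in ("DEM", "REP"):
            return party
    return None  # no down-ballot votes, mixed parties, or a third-party vote


def crossover_rate(ballots: List[Dict[str, str]], solid: str, pres_party: str) -> float:
    """Share of `solid` partisans (with a presidential vote) who chose `pres_party`."""
    group = [b for b in ballots if solid_party(b) == solid and "President" in b]
    if not group:
        return float("nan")
    return sum(b["President"] == pres_party for b in group) / len(group)
```

Under these assumptions, `crossover_rate(ballots, "REP", "DEM")` corresponds to the share of solid Republicans who voted for Biden, and `crossover_rate(ballots, "DEM", "REP")` to the share of solid Democrats who voted for Trump.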

Theoretical Implications

Theoretically, this dataset can underpin models of voting behavior, partisan loyalty, and electoral competition. It provides empirical grounding for theories of political behavior in highly polarized environments and can inform debates about what drives ticket-splitting and how it affects legislative outcomes.

Challenges and Considerations

Assembling the dataset highlights challenges inherent in working with CVRs, chiefly protecting ballot secrecy while preserving detailed vote information. The study addresses this with privacy-protecting measures such as precinct aggregation, though releasing data this detailed still requires a careful balance between transparency and voter confidentiality.
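
One way to picture precinct aggregation is as a minimum-cell-size rule: precincts with too few ballots lose their individual label so that no single ballot can be tied to a tiny group of voters. The threshold of 10 ballots, the `POOLED` label, and the function name below are assumptions for illustration; the paper's actual privacy procedure may differ.

```python
from collections import Counter
from typing import List, Tuple


def pool_small_precincts(records: List[Tuple[str, str]],
                         min_ballots: int = 10) -> List[Tuple[str, str]]:
    """Relabel precincts with fewer than `min_ballots` ballots as a county-level 'POOLED' unit.

    `records` holds one (county, precinct) pair per ballot; order is preserved.
    """
    counts = Counter(records)
    return [
        (county, precinct if counts[(county, precinct)] >= min_ballots else "POOLED")
        for county, precinct in records
    ]
```

A single pass is not always enough: if a county's pooled unit is itself smaller than the threshold, it would need to be merged further or suppressed before release.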

Future Directions

This dataset is a key resource for current research and also sets a precedent for future data collection and sharing. Expanding it to more states and subsequent elections could provide longitudinal insight into how voter behavior changes over time. Integrating additional data, such as campaign spending or media exposure, could further improve understanding of what influences electoral outcomes.

Conclusion

In summary, this dataset of cast vote records opens new avenues for analyzing electoral processes and behavior. Its careful compilation and validation set a benchmark for future research on voting trends and election integrity. The implications extend beyond the 2020 U.S. election and may inform future policy-making and scholarship on democratic participation.
