Firebird: Data-Driven Fire Inspection Tool
- The Firebird framework is a data-driven system that consolidates heterogeneous municipal records, applies machine learning to predict fire risk, and uses interactive mapping to prioritize fire inspections.
- The system integrates data through geocoding, fuzzy string matching, and multi-key joins, spatially resolving nearly all targeted properties and producing consistent, linked records.
- Properties are ranked by fire risk predicted with Support Vector Machine (SVM) and Random Forest models, enabling evidence-based inspection scheduling and better-targeted fire risk management.
The Firebird framework is a comprehensive, data-driven system for identifying and prioritizing fire inspections of commercial properties. It integrates heterogeneous municipal datasets, applies machine learning to predict fire risk, and presents the results through interactive spatial visualization. Developed in direct collaboration with the Atlanta Fire Rescue Department (AFRD), Firebird supports two tasks: discovering properties that have not previously been inspected, and ranking existing inspection lists by computed fire risk so that limited inspection resources go where they are most needed.
1. Data Integration and Record Matching
A foundational component of Firebird is its capacity to consolidate and harmonize disparate municipal records spanning multiple agencies and third-party sources. Data are ingested from eight sources, including AFRD's historical fire incident reports and inspection permits, the City of Atlanta's parcel and business license data, Atlanta Police Department datasets (crime records and liquor licenses), the Google Places API, and U.S. Census Bureau demographic variables. The sources identify properties in different ways (longitude/latitude coordinates, street addresses, or parcel IDs), and their address fields differ widely in formatting, capitalization, abbreviations, and missing values.
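Before addresses from different sources can be compared, they need a common canonical form. The following is a minimal sketch of such normalization, assuming a small hypothetical abbreviation table (a real system would need a far larger one, such as the USPS street-suffix list); it is not Firebird's actual cleaning code.

```python
import re

# Hypothetical, deliberately small abbreviation table for illustration only.
ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "av": "ave", "road": "rd",
    "drive": "dr", "boulevard": "blvd",
    # Atlanta addresses carry quadrant suffixes such as NE and NW.
    "northeast": "ne", "northwest": "nw", "southeast": "se", "southwest": "sw",
}


def normalize_address(raw):
    """Return a canonical lowercase form of a street address, or None if it is missing."""
    if raw is None or not raw.strip():
        return None
    text = raw.lower()
    # Replace punctuation such as '.', ',' and '#' with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # split() also collapses repeated whitespace.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalize_address("123 Peachtree Street, Northeast"))
print(normalize_address("123 PEACHTREE ST. NE"))
```

Both calls print `123 peachtree st ne`, so two differently formatted records for the same building now compare equal.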
A deterministic and probabilistic preprocessing sequence is employed to facilitate record linkage and entity resolution:
- Geocoding operations (using Google’s Geocoding API) standardize and enrich address-based fields.
- Fuzzy string matching, implemented in Python with edit-distance algorithms, reconciles inconsistent spellings of the same name or address and flags near-duplicate records.
- Multi-key joins on spatial coordinates, addresses, and parcel identifiers maximize property coverage; after cleaning, nearly 100% of targeted properties are spatially resolved.
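The edit-distance matching step can be sketched as follows. This assumes plain Levenshtein distance normalized by string length and a hypothetical acceptance threshold of 0.85; the text does not say which edit-distance variant or threshold Firebird used.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(query, candidates, threshold=0.85):
    """Return the most similar candidate, or None if none reaches the (assumed) threshold."""
    scored = [(similarity(query, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored)
    return candidate if score >= threshold else None


print(best_match("123 peachtree st ne", ["123 peachtre st ne", "125 pine st nw"]))
```

The one-character typo in the first candidate still clears the threshold, while the unrelated address does not. Comparing every pair is quadratic, so at city scale candidates would typically be narrowed first (for example by zip code) before scoring.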
The integrated records are then aggregated into one feature vector per property, suitable as input to machine learning risk models.
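One way to realize the multi-key joins and the per-property aggregation is a linkage cascade: try the most reliable key first and fall back to weaker ones. The sketch below assumes the order parcel ID, then normalized address, then nearest property within a hypothetical 30 m radius; the field names (`property_id`, `parcel_id`, and so on) and the sample records are made up for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def link_record(record, properties, by_parcel, by_address, max_dist_m=30.0):
    """Return the property_id a record refers to, or None (assumed key order)."""
    parcel = record.get("parcel_id")
    if parcel is not None and parcel in by_parcel:
        return by_parcel[parcel]
    address = record.get("address")
    if address is not None and address in by_address:
        return by_address[address]
    lat, lon = record.get("lat"), record.get("lon")
    if lat is not None and lon is not None and properties:
        # Brute-force nearest neighbor; a spatial index would be used at scale.
        dist, nearest = min(
            (haversine_m(lat, lon, p["lat"], p["lon"]), p["property_id"]) for p in properties
        )
        if dist <= max_dist_m:
            return nearest
    return None


def build_features(properties, records_by_source):
    """Count linked records per property and source, e.g. {"P1": {"n_incidents": 2}}."""
    by_parcel = {p["parcel_id"]: p["property_id"] for p in properties}
    by_address = {p["address"]: p["property_id"] for p in properties}
    features = {p["property_id"]: Counter() for p in properties}
    for source, records in records_by_source.items():
        for record in records:
            pid = link_record(record, properties, by_parcel, by_address)
            if pid is not None:
                features[pid][f"n_{source}"] += 1
    return features


# Made-up sample data; each incident uses a different key.
properties = [
    {"property_id": "P1", "parcel_id": "14-0001", "address": "123 peachtree st ne", "lat": 33.7590, "lon": -84.3880},
    {"property_id": "P2", "parcel_id": "14-0002", "address": "500 spring st nw", "lat": 33.7700, "lon": -84.3890},
]
records_by_source = {
    "incidents": [
        {"parcel_id": "14-0001"},
        {"address": "500 spring st nw"},
        {"lat": 33.75901, "lon": -84.38801},
    ],
    "crimes": [{"lat": 33.8000, "lon": -84.3000}],  # far from both properties, stays unlinked
}
features = build_features(properties, records_by_source)
print(dict(features["P1"]), dict(features["P2"]))
```

The counts per source become columns of the property-level feature vector; the unlinked crime record is simply dropped here, although a real pipeline would likely log it for review.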
2. Machine Learning for Fire Risk Prediction
Firebird uses a machine learning pipeline to estimate the probability of a fire at each parcel. The pipeline comprises three stages:
- Feature Engineering and Selection: The aggregated dataset starts with 252 candidate variables per property, covering quantitative features (floor size, land area, number of units, property value) and categorical descriptors (zoning, occupancy type, zip code). Manual review, together with forward and backward feature selection, reduces these to 58 predictive features, which are expanded into 1,127 binary (dummy) variables for model input.
- Modeling Algorithms: Four supervised learning algorithms (Logistic Regression, Gradient Boosting, Support Vector Machines, and Random Forests) are implemented with the Python scikit-learn library, and each model's hyperparameters are tuned by grid search with 10-fold cross-validation.
- Evaluation and Selection: The SVM and Random Forest classifiers perform best; the SVM achieves a true positive rate (TPR) of 71.36% at a 20% false positive rate (FPR). The operating point is chosen according to AFRD's operational priorities: a high TPR is favored to protect life safety, at the cost of a moderate FPR.
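The tuning and evaluation steps above can be sketched with scikit-learn's `GridSearchCV` and `roc_curve`. This is a minimal illustration, not Firebird's code: it uses a synthetic, imbalanced dataset from `make_classification` in place of the property features, shows only the two best-performing model families, and the hyperparameter grids are hypothetical. `tpr_at_fpr` is a helper introduced here that reads the highest TPR off the ROC curve subject to FPR of at most 20%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the property feature matrix (NOT the Firebird data):
# roughly 15% positives, mimicking a rare "fire occurred" label.
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # 10-fold CV
candidates = {
    # probability=True enables predict_proba (Platt scaling) for the SVM.
    "svm": (make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
            {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}),
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 150], "max_depth": [None, 8]}),
}


def tpr_at_fpr(y_true, scores, max_fpr=0.20):
    """Highest TPR among ROC operating points whose FPR does not exceed max_fpr."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    return float(tpr[fpr <= max_fpr].max())


results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)
    probabilities = search.predict_proba(X_test)[:, 1]  # P(fire) per held-out row
    results[name] = (search.best_params_, tpr_at_fpr(y_test, probabilities))
    print(name, search.best_params_, f"TPR at 20% FPR = {results[name][1]:.3f}")
```

Fixing the FPR budget and maximizing TPR mirrors the operating-point choice described above: missing a truly risky property costs more than an extra inspection.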
Risk predictions are post-processed into actionable scores for inspectors by discretizing the probability p output by the SVM or Random Forest model onto a 1–10 scale. Visual analysis of the resulting score distribution groups the scores into low (1), medium (2–5), and high (6–10) risk tiers.
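The exact probability-to-score mapping is not given here. The sketch below assumes a simple choice, score = ⌈10p⌉ clamped to the range 1–10, which maps probabilities onto ten equal-width bins, and then applies the tier boundaries stated above. The function names are hypothetical.

```python
import math


def risk_score(p: float) -> int:
    """Map a fire probability p in [0, 1] to an integer score from 1 to 10.

    ASSUMPTION: ceil(10 * p), clamped so that p = 0 still yields 1.
    Probabilities lying exactly on a bin edge may fall on either side
    because of floating-point rounding.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    return min(10, max(1, math.ceil(10 * p)))


def risk_tier(score: int) -> str:
    """Tiers from the text: low (1), medium (2-5), high (6-10)."""
    if score == 1:
        return "low"
    if score <= 5:
        return "medium"
    return "high"


for p in (0.03, 0.42, 0.58):
    s = risk_score(p)
    print(p, s, risk_tier(s))
```

Under this assumed mapping, a low-base-rate outcome such as fire would place most properties in score 1, which is consistent with a single-score "low" tier.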
3. Geospatial Processing and Interactive Visualization
The spatial dimension is central to Firebird’s architecture, both for data integration and for delivery of inspection guidance:
- Geocoding and Spatial Joins: Geocoding underpins address cleansing, facilitates cross-source entity resolution, and ensures near-complete spatial join coverage across datasets with differing standards.
- Interactive Mapping Interface: The system’s visualization layer employs Mapbox and Leaflet as mapping backends, with D3.js handling dynamic overlays and data interaction. The map interface distinguishes fire incidents (red markers), currently inspected properties (green), and newly identified or potentially inspectable properties (blue).
- Operational Overlays: Neighborhood planning units, battalions, and council districts are available as selection layers, supporting both operational and strategic decision-making. Filter controls enable exploration by risk tier, property type, and time window.
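Web mapping libraries such as Leaflet and Mapbox commonly consume point data as GeoJSON. The sketch below shows, under that assumption, how a back end might package properties into a GeoJSON layer carrying the marker colors described above and apply the risk-tier and property-type filters. The category keys, field names, and the `color` property are hypothetical, and the time-window filter is omitted for brevity.

```python
import json

# Marker colors as described in the text; the category names are hypothetical.
MARKER_COLORS = {"fire_incident": "red", "inspected": "green", "candidate": "blue"}


def to_geojson(items, tiers=None, property_types=None):
    """Build a GeoJSON FeatureCollection, optionally filtered by risk tier and property type."""
    features = []
    for item in items:
        if tiers is not None and item.get("risk_tier") not in tiers:
            continue
        if property_types is not None and item.get("property_type") not in property_types:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [item["lon"], item["lat"]]},
            "properties": {
                "id": item["id"],
                "category": item["category"],
                "color": MARKER_COLORS[item["category"]],
                "risk_tier": item.get("risk_tier"),
                "property_type": item.get("property_type"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


items = [
    {"id": "A", "category": "candidate", "risk_tier": "high",
     "property_type": "restaurant", "lat": 33.759, "lon": -84.388},
    {"id": "B", "category": "inspected", "risk_tier": "low",
     "property_type": "warehouse", "lat": 33.770, "lon": -84.389},
    {"id": "C", "category": "fire_incident", "lat": 33.748, "lon": -84.390},
]
layer = to_geojson(items, tiers={"high"})
print(json.dumps(layer, indent=2))
```

Filtering on the server keeps the payload sent to the browser small; the front end then only needs to style each point by its `color` property.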
This approach enables inspectors and command staff to focus on high-risk geographies and adjust interventions in real time.
4. Inspection Prioritization and Organizational Impact
Firebird addresses AFRD's need to move beyond inspection scheduling that has historically relied on intuition:
- Discovery of Uninspected Properties: Using occupancy use types drawn from Florida state inspection datasets, Firebird identifies 19,397 uninspected properties in Atlanta. Filtering these to the 100 most frequently inspected property types yields a high-priority list of 6,096 new candidate properties for inspection.
- Risk-Based Scheduling: Because inspection resources are limited, scheduling is guided by the risk scores. Of the 8,669 properties on the inspection schedule, 5,022 receive a machine learning risk score, allowing a graded, evidence-based inspection cadence.
- Operational Outcomes: Early use of the system identified 48 code violations among high-risk properties that likely would not have been prioritized under legacy methods. The system's outputs have informed ongoing discussions of personnel allocation, and plans are being explored to refresh risk scores monthly and to adjust inspection frequency as risk assessments change.
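The discovery and scheduling steps above reduce to set filtering and sorting. The sketch below illustrates them on made-up records, assuming each property carries a hypothetical `occupancy_type` field and that the set of inspectable occupancy types (in Firebird's case derived from the Florida datasets) is supplied as input; `top_n=100` matches the text, but the example uses `top_n=1` to fit the tiny dataset.

```python
from collections import Counter


def discover_candidates(all_properties, inspected_ids, inspectable_types, top_n=100):
    """Return (uninspected, prioritized) property lists.

    uninspected: not yet inspected, but of an inspectable occupancy type.
    prioritized: the subset whose type is among the top_n most frequently
                 inspected occupancy types.
    """
    inspected = [p for p in all_properties if p["id"] in inspected_ids]
    type_counts = Counter(p["occupancy_type"] for p in inspected)
    top_types = {t for t, _ in type_counts.most_common(top_n)}
    uninspected = [p for p in all_properties
                   if p["id"] not in inspected_ids
                   and p["occupancy_type"] in inspectable_types]
    prioritized = [p for p in uninspected if p["occupancy_type"] in top_types]
    return uninspected, prioritized


def schedule(properties, scores):
    """Order properties by risk score, highest first; unscored properties go last."""
    # sorted() is stable, so ties keep their original relative order.
    return sorted(properties, key=lambda p: scores.get(p["id"], 0), reverse=True)


# Made-up example data.
props = [
    {"id": 1, "occupancy_type": "restaurant"},
    {"id": 2, "occupancy_type": "restaurant"},
    {"id": 3, "occupancy_type": "warehouse"},
    {"id": 4, "occupancy_type": "restaurant"},
    {"id": 5, "occupancy_type": "nightclub"},
    {"id": 6, "occupancy_type": "single_family"},
    {"id": 7, "occupancy_type": "restaurant"},
]
uninspected, prioritized = discover_candidates(
    props, inspected_ids={1, 2, 3},
    inspectable_types={"restaurant", "warehouse", "nightclub"}, top_n=1)
print([p["id"] for p in uninspected], [p["id"] for p in prioritized])
print([p["id"] for p in schedule(uninspected, {4: 3, 5: 9})])
```

Placing unscored properties last reflects the text's observation that only some scheduled properties (5,022 of 8,669) have a machine learning score; how Firebird actually ordered unscored properties is not stated.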
5. Wider Implications, Scalability, and Recognition
Firebird has established several precedents and broader impacts beyond its immediate operational context:
- Model for Other Cities: Firebird's approach to data integration, risk modeling, and geospatial visualization offers a transferable template for municipalities facing similar challenges of siloed data and limited analytic capacity.
- Data Collaboration Needs: The framework’s dependence on cross-agency records underscores the value of standardized data identifiers (such as a unified Building Identification Number) to optimize future urban analytics and predictive efforts.
- National Recognition: The National Fire Protection Association (NFPA) highlighted Firebird as a best practice in data-driven fire inspection at the NFPA Smart Enforcement Workshop, making it a reference point for other municipal fire departments.
Firebird’s deployment demonstrates that carefully engineered machine learning systems—supported by comprehensive data integration and visualization—can directly enhance public safety operations and inform policy at both local and broader scales.