Episcopal Genealogy Dataset: Church Lineages

Updated 2 July 2025
  • The Episcopal Genealogy Dataset is a comprehensive repository mapping structured consecrational relationships among over 35,000 Catholic bishops.
  • It employs network science and machine learning techniques to analyze hierarchical succession and doctrinal transmission in a directed acyclic graph.
  • The dataset offers practical insights into ecclesiastical mentorship, community structures, and institutional influence within the Roman Catholic Church.

The Episcopal Genealogy Dataset is a comprehensive, large-scale, temporally deep representation of consecrational relationships among bishops of the Roman Catholic Church. It emphasizes both the structural properties of episcopal lineage and its institutional and ideological significance. The dataset forms the empirical basis for studies on ecclesiastical succession, hierarchical influence, and the quantitative analysis of doctrinal transmission within the global Church network.

1. Definition and Structural Overview

The Episcopal Genealogy Dataset consists of structured records for over 35,000 Catholic bishops, with data primarily drawn from public sources such as Catholic-Hierarchy.org. Each bishop's entry encodes the canonical tripartite consecration: one principal consecrator and two co-consecrators. These directed relationships are modeled as a directed acyclic graph (DAG), with nodes representing individual bishops and edges denoting consecrational acts (i.e., bishop $i$ consecrating bishop $j$ as principal or co-consecrator). The network thus encodes both direct mentorship and extended apostolic succession, supporting analysis from the local (dyadic or triadic) to global (network-wide) levels (Baratto et al., 27 Jun 2025).
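As a minimal sketch of this structure, the snippet below derives directed edges from consecration records and checks acyclicity with Kahn's algorithm. The record layout and the field names `principal` and `co_consecrators` are assumptions made for illustration; they are not taken from the dataset itself.

```python
from collections import defaultdict, deque

# Hypothetical toy records: bishop -> principal consecrator and co-consecrators.
records = {
    "B": {"principal": "A", "co_consecrators": []},
    "C": {"principal": "A", "co_consecrators": ["B"]},
    "D": {"principal": "C", "co_consecrators": ["A", "B"]},
}

def build_edges(records):
    """Return directed edges (consecrator, consecrated, role)."""
    edges = []
    for bishop, rec in records.items():
        edges.append((rec["principal"], bishop, "principal"))
        for co in rec["co_consecrators"]:
            edges.append((co, bishop, "co"))
    return edges

def is_acyclic(edges):
    """Kahn's algorithm: a graph is a DAG iff all nodes can be removed in topological order."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, _role in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(nodes)

edges = build_edges(records)
print(len(edges), is_acyclic(edges))
```

A cycle would mean that some bishop appears among his own consecrational ancestors, so a failed check usually points to a data-entry or record-linkage error.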

2. Algorithmic Representations and Analytical Foundations

The dataset is processed and analyzed using methodologies from network science and computational genealogy. The core data structure is a graph database, allowing scalable traversal and extraction of genealogy trees for any focal individual (Anil et al., 2018). The relationship matrix is typically stored as an adjacency matrix $A$, with $A[i,j]=1$ indicating that individual $i$ consecrated individual $j$.
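The adjacency-matrix convention can be illustrated with a small sketch. The edge list and the helper name `build_adjacency` are hypothetical; only the convention that entry $(i, j)$ equals 1 when $i$ consecrated $j$ comes from the text.

```python
def build_adjacency(edges):
    """Build a dense 0/1 matrix A with A[i][j] == 1 when bishop i consecrated bishop j."""
    names = sorted({name for edge in edges for name in edge})
    index = {name: k for k, name in enumerate(names)}
    n = len(names)
    A = [[0] * n for _ in range(n)]
    for consecrator, consecrated in edges:
        A[index[consecrator]][index[consecrated]] = 1
    return A, index

# Hypothetical toy edges (consecrator, consecrated).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
A, index = build_adjacency(edges)

# Row sums count consecrations performed; column sums count consecrators received.
performed = {name: sum(A[i]) for name, i in index.items()}
received = {name: sum(row[i] for row in A) for name, i in index.items()}
print(performed, received)
```

A dense matrix for roughly 35,000 bishops would hold over a billion entries, almost all zero, so a sparse or adjacency-list representation is likely the more practical choice at full scale.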

Block matrix and adjacency matrix partitioning techniques are used to identify local subtrees (descendants, ancestors, and “grandchildren” or “grandparents”) efficiently, reducing computational complexity from $O(n^3)$ to $O(n)$ per local network extraction. The following pseudocode, originally applied to ancestry/genealogy graphs, is adapted for episcopal consecrational lineages:

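def get_local_network(adj_matrix, focal_bishop):
    """Return the two-generation neighbourhood of focal_bishop.

    adj_matrix[i][j] == 1 means bishop i consecrated bishop j.
    """
    n = len(adj_matrix)
    children, grandchildren = [], []  # consecrated by focal_bishop / by his children
    parents, grandparents = [], []    # consecrators of focal_bishop / of his consecrators
    for i in range(n):
        if adj_matrix[focal_bishop][i] == 1:
            children.append(i)
            for j in range(n):
                # Skip duplicates: several children may have consecrated the same bishop.
                if adj_matrix[i][j] == 1 and j not in grandchildren:
                    grandchildren.append(j)
        if adj_matrix[i][focal_bishop] == 1:
            parents.append(i)
            for j in range(n):
                # Skip duplicates: consecrators often share their own consecrators.
                if adj_matrix[j][i] == 1 and j not in grandparents:
                    grandparents.append(j)
    return {'children': children, 'grandchildren': grandchildren,
            'parent': parents, 'grandparent': grandparents}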
(Anil et al., 2018)
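The dense scan above walks a full row or column for every neighbour it finds. An alternative sketch, assuming the graph is held as adjacency lists rather than a matrix, does work proportional to the size of the local neighbourhood only. The names `build_index` and `local_network` are illustrative and do not come from the cited work.

```python
from collections import defaultdict

def build_index(edges):
    """Index edges (consecrator, consecrated) in both directions."""
    children = defaultdict(set)
    parents = defaultdict(set)
    for consecrator, consecrated in edges:
        children[consecrator].add(consecrated)
        parents[consecrated].add(consecrator)
    return children, parents

def local_network(children, parents, focal):
    """Two generations down and two generations up from the focal bishop."""
    kids = set(children.get(focal, ()))
    grandkids = set()
    for kid in kids:
        grandkids |= children.get(kid, set())
    consecrators = set(parents.get(focal, ()))
    grand_consecrators = set()
    for consecrator in consecrators:
        grand_consecrators |= parents.get(consecrator, set())
    return {
        "children": sorted(kids),
        "grandchildren": sorted(grandkids),
        "parent": sorted(consecrators),
        "grandparent": sorted(grand_consecrators),
    }

# Hypothetical toy edges.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
children, parents = build_index(edges)
print(local_network(children, parents, "C"))
```

Note that the same bishop can appear at two levels: here A is both a consecrator of C and a consecrator of B, who also consecrated C.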

Motif detection is central to quantitative analysis. Algorithms systematically enumerate and classify network motifs—including direct principal consecrator ties, shared consecrators, and various chain and cluster patterns—enabling detailed investigations into lineage effects and community structures (Baratto et al., 27 Jun 2025).
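Two of the motifs named here can be enumerated with a short sketch: pairs of bishops who share a principal consecrator, and three-step chains of principal consecration. The mapping `principal_of` and its contents are invented for illustration, and the enumeration strategy is an assumption rather than the cited authors' algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mapping: bishop -> principal consecrator.
principal_of = {"B": "A", "C": "A", "D": "A", "E": "B", "F": "E"}

def shared_principal_pairs(principal_of):
    """Unordered pairs of bishops consecrated by the same principal consecrator."""
    by_principal = defaultdict(list)
    for bishop, principal in principal_of.items():
        by_principal[principal].append(bishop)
    pairs = set()
    for siblings in by_principal.values():
        for a, b in combinations(sorted(siblings), 2):
            pairs.add((a, b))
    return pairs

def principal_chains(principal_of):
    """Chains (grandparent, parent, child) linked by principal consecration."""
    return {
        (principal_of[parent], parent, child)
        for child, parent in principal_of.items()
        if parent in principal_of
    }

print(sorted(shared_principal_pairs(principal_of)))
print(sorted(principal_chains(principal_of)))
```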

3. Data Extraction, Validation, and Standardization

A parallel research thread addresses the extraction of genealogical data from primary documentary sources, notably large-scale digitized parish records (Tarride et al., 2023). Here, a modular machine learning workflow is employed:

  • Page classification is handled as an outlier detection problem, leveraging Isolation Forests for optimal F1 and recall.
  • Text line detection utilizes fully convolutional networks (e.g., Doc-UFCN), with validated segmentation metrics (IoU, mAP) and custom quality formulas such as:

$Q_{\text{line}} = \frac{1}{|H|} \sum_{h \in H} \mathbf{1}_{h \in [\alpha \tilde{h}, (1+\alpha)\tilde{h}]}$

where $H$ is the set of line heights, $\tilde{h}$ the median line height, and $\alpha$ a tunable parameter.

  • Handwritten Text Recognition (HTR) employs hybrid models (e.g., Kaldi DNN-HMM) and sequence transcription with language modeling.
  • Named Entity Recognition (NER) draws upon libraries such as Flair and Stanza, fine-tuned on annotated genealogical data.
  • Act detection and classification integrates enriched input for neural architectures, using keyword matching to determine act (event) type.
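The line-quality formula in the list above can be computed directly. The sketch below implements the interval exactly as printed, using invented line heights and an assumed value of the tunable parameter; it is not taken from the cited pipeline.

```python
from statistics import median

def line_quality(heights, alpha):
    """Fraction of line heights h with alpha * median <= h <= (1 + alpha) * median."""
    if not heights:
        raise ValueError("heights must not be empty")
    h_med = median(heights)
    low, high = alpha * h_med, (1 + alpha) * h_med
    return sum(1 for h in heights if low <= h <= high) / len(heights)

# Hypothetical detected line heights in pixels; 30 and 4 are likely segmentation errors.
heights = [10, 12, 11, 30, 4]
q_line = line_quality(heights, alpha=0.5)
print(q_line)
```

With a median of 11 and the parameter set to 0.5, the accepted interval is [5.5, 16.5], so three of the five lines count as well formed.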

Validation is achieved through expert-designed rules: structural templates ("moulds"), mandatory field enforcement, anomaly and fusion detection, and standardization modules for names and dates. Records not meeting completeness or field consistency thresholds (e.g., missing required relationships or data fields) are flagged or rejected, yielding a linkage-valid dataset with 74% completeness in key act types (Tarride et al., 2023).
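Mandatory-field enforcement of this kind might look like the sketch below. The act types, field names, and the `REQUIRED_FIELDS` table are hypothetical stand-ins for the expert-designed templates, which the source does not spell out.

```python
# Hypothetical templates: required fields per act type.
REQUIRED_FIELDS = {
    "baptism": ("child", "father", "mother", "date"),
    "marriage": ("groom", "bride", "date"),
}

def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    act_type = record.get("act_type")
    if act_type not in REQUIRED_FIELDS:
        return [f"unknown act type: {act_type!r}"]
    return [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS[act_type]
        if not record.get(field)
    ]

def completeness(records):
    """Share of records that pass every rule."""
    if not records:
        return 0.0
    return sum(1 for r in records if not validate(r)) / len(records)

# Invented example records.
records = [
    {"act_type": "baptism", "child": "Marie", "father": "Jean", "mother": "Anne", "date": "1850-03-02"},
    {"act_type": "marriage", "groom": "Pierre", "bride": "", "date": "1851-06-10"},
]
print(validate(records[1]), completeness(records))
```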

4. Visualization and Network Exploration

The structure and dynamics of the genealogy are made accessible through modern visualization tools, most prominently JavaScript libraries such as vis.js, coupled with graph database backends (e.g., Neo4j) (Anil et al., 2018). Visualizations use color coding and interactive queries to highlight generational levels, local clusters, and community structures, enabling both macro and micro analysis—e.g., tracing a bishop’s predecessors or successors, or examining dense intra-group consecrational connections.
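One way to prepare data for such a view is to assign each bishop a generational level and emit node and edge lists as JSON. The sketch below assumes an acyclic edge list and uses the `id`, `label`, `level`, `from`, `to`, and `arrows` keys in the style of vis.js network data; the toy edges and helper names are illustrative.

```python
import json
from collections import defaultdict

def generation_levels(edges):
    """Level 0 for bishops with no recorded consecrator; otherwise 1 + deepest consecrator.

    Assumes the graph is acyclic; the recursion is fine for toy data,
    but a deep real lineage would call for an iterative version.
    """
    parents = defaultdict(set)
    nodes = set()
    for consecrator, consecrated in edges:
        parents[consecrated].add(consecrator)
        nodes.update((consecrator, consecrated))
    levels = {}

    def level(node):
        if node not in levels:
            levels[node] = 1 + max((level(p) for p in parents.get(node, ())), default=-1)
        return levels[node]

    for node in nodes:
        level(node)
    return levels

def to_network_json(edges):
    """Serialize nodes (with levels) and directed edges."""
    levels = generation_levels(edges)
    data = {
        "nodes": [{"id": n, "label": n, "level": levels[n]} for n in sorted(levels)],
        "edges": [{"from": s, "to": d, "arrows": "to"} for s, d in edges],
    }
    return json.dumps(data)

# Hypothetical toy edges (consecrator, consecrated).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
payload = to_network_json(edges)
print(payload)
```

The `level` value can drive either a hierarchical layout or a colour scale, so that each generation of a lineage is visually distinct.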

Researchers can explore local genealogical networks, identify isolated lines, or analyze the spread of particular lineages (such as those descending from specific historical figures). This infrastructure supports hypothesis-driven analysis of ecclesiastical history and institutional dynamics.

5. Ideological Analysis and Institutional Dynamics

Recent research introduces natural language processing and network regression to investigate the correlation between genealogy and doctrinal alignment among bishops and cardinals (Baratto et al., 27 Jun 2025). Public statements and biographical texts are processed using BERTopic for unsupervised topic modeling, followed by semantic polarity assessment via LLMs. For each cardinal and salient doctrinal topic, an ideology score from –2 (very conservative) to +2 (very progressive) is computed.

Statistical analyses, including univariate and multivariate linear regressions and Mantel permutation tests, test the hypothesis that genealogical motifs (specifically, sharing the same principal consecrator) predict ideological proximity. The core formulas include:

$Y_{(ij)}^{\alpha} = q + \beta X_{(ij)} + \varepsilon$

where $Y_{(ij)}^{\alpha}$ is the ideological distance for topic $\alpha$ between cardinals $i$ and $j$, and $X_{(ij)}$ is motif presence or another predictor. Findings show that shared principal consecrators significantly increase the probability of similar doctrinal orientation, with especially clear effects for lineages tracing to specific central figures (e.g., Pope John Paul II), whose “episcopal offspring” systematically align with their mentor’s doctrinal tendencies.
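The pairwise regression and the permutation test can be sketched on toy data. Everything below is illustrative: the matrices are invented, the function names are hypothetical, and the code is a plain-Python approximation of the general technique, not the cited authors' implementation. Here `X[i][j]` is 1 when cardinals `i` and `j` share a principal consecrator, and `Y[i][j]` is their ideological distance on one topic.

```python
import random
from itertools import combinations

def upper(matrix, order):
    """Upper-triangle entries of a symmetric matrix after relabelling nodes by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]

def ols(x, y):
    """Least-squares intercept q and slope beta for y = q + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel(X, Y, permutations=999, seed=0):
    """Mantel test: permute node labels of Y and compare correlations with the observed one."""
    rng = random.Random(seed)
    identity = list(range(len(X)))
    x = upper(X, identity)
    observed = pearson(x, upper(Y, identity))
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if abs(pearson(x, upper(Y, perm))) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Invented data for four cardinals: 0 and 1 share a consecrator, as do 2 and 3.
X = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
Y = [[0, 0.5, 2.0, 2.5], [0.5, 0, 2.0, 3.0], [2.0, 2.0, 0, 1.0], [2.5, 3.0, 1.0, 0]]

order = list(range(4))
q, beta = ols(upper(X, order), upper(Y, order))
observed, p_value = mantel(X, Y)
print(q, beta, observed, p_value)
```

A negative slope means that pairs sharing a principal consecrator sit closer together ideologically. The permutation test is needed because pairwise distances are not independent observations, which makes ordinary regression p-values unreliable.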

6. Community Detection and Quality Metrics

Community structures are detected via block matrix approaches and thresholding algorithms, where intra-community consecration ratios are computed as:

$\text{Community Ratio} = \frac{\text{Intra-community consecrations}}{\text{Total consecrations received or performed}}$

Nodes exceeding predefined thresholds are flagged as members of tightly bound communities or factions. These methods expose regions of dense consecrational activity or ideological coherence, illuminating how lineages may influence theological development and institutional coherence (Anil et al., 2018).
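The ratio and the thresholding step can be sketched as follows, assuming community labels are already available. The labels, edges, and threshold value are invented for illustration, and the block-matrix detection step itself is not reproduced here.

```python
from collections import defaultdict

def community_ratios(edges, community):
    """Per bishop: intra-community consecrations / all consecrations received or performed."""
    intra = defaultdict(int)
    total = defaultdict(int)
    for consecrator, consecrated in edges:
        same = community[consecrator] == community[consecrated]
        for node in (consecrator, consecrated):
            total[node] += 1
            if same:
                intra[node] += 1
    return {node: intra[node] / total[node] for node in total}

# Hypothetical toy edges and community labels.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
community = {"A": 1, "B": 1, "C": 1, "D": 2}

ratios = community_ratios(edges, community)
THRESHOLD = 0.6  # assumed cut-off, chosen for illustration only
tightly_bound = sorted(n for n, r in ratios.items() if r >= THRESHOLD)
print(ratios, tightly_bound)
```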

For metric de-biasing, the lineage-independent model adjusts significance and influence scores to differentiate between impact within one's genealogical cluster and broader church influence: $\mathrm{NGC} = Y - X$, where $Y$ is a total metric (e.g., all consecrations) and $X$ counts those confined to the individual’s lineage, ensuring recognition of wide-ranging influence and mitigating internal inflation (Anil et al., 2018).
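A small worked example of this adjustment, taking consecrations performed as the total metric. The lineage labels and edges are invented, and treating a lineage as a simple label per bishop is an assumption made for illustration.

```python
def ngc(edges, lineage, bishop):
    """Return (Y, X, Y - X) for consecrations performed by `bishop`."""
    performed = [consecrated for consecrator, consecrated in edges if consecrator == bishop]
    y = len(performed)                                               # total metric Y
    x = sum(1 for b in performed if lineage[b] == lineage[bishop])   # within own lineage X
    return y, x, y - x

# Hypothetical data: A consecrates two bishops in his own lineage and one outside it.
edges = [("A", "B"), ("A", "C"), ("A", "E"), ("B", "D")]
lineage = {"A": "L1", "B": "L1", "C": "L1", "D": "L1", "E": "L2"}

print(ngc(edges, lineage, "A"))
print(ngc(edges, lineage, "B"))
```

A has a raw count of three but an adjusted score of one, while B, whose only consecration stays inside his lineage, drops to zero.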

7. Applications, Adaptation, and Limitations

The Episcopal Genealogy Dataset provides a robust foundation for various academic endeavors: quantifying the historical impact of individual bishops, uncovering the mechanisms of doctrinal transmission, identifying institutional communities or factions, and supporting socio-historical research on the transmission of authority.

Adaptation of extraction and validation workflows—for example, from large-scale Quebec parish records to Episcopal or Protestant contexts—necessitates cultural and linguistic retraining, redefinition of template fields, and customized canonical rules (Tarride et al., 2023). Specific limitations include data scarcity for annotated non-Catholic records, structural variation in acts, and the need for rigorous manual review in cases of ambiguous or inconsistent source data.

The dataset’s modularity and methodological rigor enable its extension to other domains where hierarchical lineage or succession is consequential, and its analytical frameworks provide a generalizable model for the empirical study of institutional transmission, mentorship, and community formation.
