Davies-Bouldin Score (DBS)

Updated 30 June 2025

Davies-Bouldin Score (DBS) is defined as the average of the maximum ratios of within-cluster scatter to inter-cluster separation, measuring cluster cohesion and distinctiveness.
It is widely used for selecting the optimal number of clusters and comparing algorithms in unsupervised learning without requiring ground truth labels.
Recent studies highlight its sensitivity to noisy features and non-convex cluster shapes, prompting the development of robust variants and hybrid indices.

The Davies-Bouldin Score (DBS) is a widely adopted internal cluster validity index for evaluating the quality of clustering solutions in unsupervised learning. It averages, over all clusters, the worst-case ratio of within-cluster dispersion to between-cluster separation, providing a compact numerical summary of how cohesive and distinct the clusters are. The index is designed to be minimized: lower DBS values correspond to solutions with tighter, better-separated clusters.

1. Definition and Mathematical Formulation

The Davies-Bouldin Score is formally defined for a partition of a dataset into $k$ clusters as:

$DBS = \frac{1}{k} \sum_{i=1}^k \max_{j \neq i} \left( \frac{S_i + S_j}{M_{ij}} \right)$

where:

$S_i$ denotes the within-cluster scatter for cluster $i$, typically computed as the mean distance of members of cluster $i$ to its centroid $\mu_i$: $S_i = \frac{1}{|C_i|}\sum_{x \in C_i} \|x - \mu_i\|$.
$M_{ij}$ is the distance between the centroids $\mu_i$ and $\mu_j$ of clusters $i$ and $j$.

The index operates by, for each cluster, identifying the cluster it is "least well-separated" from (i.e., the maximum of the within-to-between ratio), then averaging these worst-case ratios across clusters. Lower DBS values indicate more desirable clustering solutions: intra-cluster distances are small, and clusters are mutually well-separated.
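
As a minimal sketch of the formula above, the following pure-Python function computes the score from a list of points and their cluster labels, using Euclidean distance for both $S_i$ and $M_{ij}$. The function name `davies_bouldin` and the toy data are illustrative assumptions rather than part of the original definition; scikit-learn offers a library implementation as `sklearn.metrics.davies_bouldin_score`.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin score with Euclidean distances (lower is better)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("the score needs at least two clusters")

    # Centroid mu_i: coordinate-wise mean of the cluster members.
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter S_i: mean distance from members to their centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    worst_ratios = []
    for i in clusters:
        # Largest ratio (S_i + S_j) / M_ij over all other clusters j.
        # Assumes distinct centroids; coincident centroids would divide by zero.
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        ))
    return sum(worst_ratios) / len(worst_ratios)


# Toy data: two tight clusters whose centroids are 10 units apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
score = davies_bouldin(points, labels)
print(round(score, 3))  # 0.2, since S_0 = S_1 = 1 and M_01 = 10
```

The sketch favors readability over speed: it evaluates every pair of clusters, which is fine for small $k$ but would usually be vectorized for large datasets.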

2. Conceptual Rationale and Role in Clustering

The principal design of the DBS is to balance two competing objectives intrinsic to clustering:

Compactness: Each cluster should consist of points that are close to one another (low within-cluster scatter).
Separation: Clusters should be well differentiated from each other (large separation between centroids).

By computing the ratio $(S_i + S_j)/M_{ij}$ for all pairs $(i, j)$ and taking, for each $i$, the worst (largest) such ratio, the DBS penalizes clusterings in which any cluster is overly dispersed or too close to another. This design aligns conceptually with ratio-type validity indices (e.g., Dunn's and Silhouette indices) but is realized using centroids and mean point-to-centroid distances rather than pairwise distances between points.
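
As a worked illustration with made-up numbers (not taken from any dataset), suppose $k = 3$ with scatters $S_1 = 1$, $S_2 = 1$, $S_3 = 2$ and centroid distances $M_{12} = 10$, $M_{13} = 12$, $M_{23} = 5$. Writing $R_{ij} = (S_i + S_j)/M_{ij}$ as shorthand for the pairwise ratio:

$R_{12} = \frac{1 + 1}{10} = 0.2, \quad R_{13} = \frac{1 + 2}{12} = 0.25, \quad R_{23} = \frac{1 + 2}{5} = 0.6$

The worst ratio is $\max(R_{12}, R_{13}) = 0.25$ for cluster 1 and $0.6$ for clusters 2 and 3, so:

$DBS = \frac{0.25 + 0.6 + 0.6}{3} \approx 0.483$

The single poorly separated pair (2, 3) sets two of the three terms, which shows how the per-cluster maximum lets one bad neighbour relationship dominate the score.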

3. Application in Clustering Algorithm Evaluation

DBS is typically used in one of two principal ways:

Selecting the Number of Clusters ($k$): DBS is computed for a range of candidate values of $k$, and the value that minimizes it is often interpreted as the "optimal" cluster count, a procedure widely seen in both application and methodology papers.
Algorithm Comparison: DBS values can be used to compare the quality of clustering solutions produced by different algorithms or different parameterizations, especially when no ground truth is available.
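
The selection procedure can be sketched as follows. This is an illustrative example under stated assumptions, not a prescribed workflow: the clustering algorithm is a small hand-written k-means with deterministic farthest-point initialization (chosen only so the example is reproducible), and the helper names `davies_bouldin`, `kmeans`, and `farthest_point_init` as well as the three-blob toy data are hypothetical.

```python
import math
from collections import defaultdict


def centroid(members):
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def davies_bouldin(points, labels):
    clusters = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)
    cents = {i: centroid(m) for i, m in clusters.items()}
    scatter = {i: sum(math.dist(p, cents[i]) for p in m) / len(m)
               for i, m in clusters.items()}
    return sum(
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters if j != i)
        for i in clusters
    ) / len(clusters)


def farthest_point_init(points, k):
    # Deterministic seeding: start at the first point, then repeatedly add
    # the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old center if a cluster happens to lose all members.
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Toy data: three compact blobs of five points each.
blob_centers = [(0, 0), (10, 0), (5, 9)]
offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Score each candidate k and keep the one with the lowest DBS.
scores = {k: davies_bouldin(points, kmeans(points, k)) for k in range(2, 6)}
best_k = min(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("selected k:", best_k)
```

On this toy data the minimum falls at $k = 3$: merging two blobs inflates the scatter, while splitting a blob creates two clusters with nearby centroids. Real data rarely gives such a clean minimum, which is one reason the score is usually read alongside other indices.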

DBS does not require labeled data or prior knowledge, making it effective for unsupervised evaluation. It is frequently paired with other internal indices such as the Silhouette coefficient and Calinski-Harabasz index for comprehensive assessment.

4. Benchmarking, Sensitivity, and Limitations

Empirical studies have evaluated the sensitivity and robustness of DBS compared to alternative validity indices:

Feature Sensitivity: DBS is sensitive to the inclusion of irrelevant or noisy features. When irrelevant variables are appended to well-defined data, DBS increases rapidly, indicating a degradation in cluster quality even if extrinsic metrics (like Adjusted Rand Index) remain stable. This sensitivity makes DBS suitable for feature selection: removing features that cause an increase in DBS typically improves clustering robustness (McCrory et al., 19 Feb 2024, Amorim et al., 1 Mar 2025).
Robustness to Cluster Shape and Density: DBS, as a centroid-based measure, can perform suboptimally for clusters that are non-convex, of differing densities, or poorly represented by centroids. Studies comparing DBS with density-based and local-neighbourhood-based indices show that DBS is less accurate in complex cluster topologies, often failing to align with expert-labelled partitions or ground truth (Liu, 2022, Gagolewski et al., 2022).
Noise Attenuation Strategies: Recent work proposes feature importance rescaling methods that weigh features by their informativeness (dispersion within clusters), substantially improving the reliability of DBS under conditions with many irrelevant variables (Amorim et al., 1 Mar 2025).
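
The feature-sensitivity effect can be reproduced on a toy example. The sketch below is an assumption-laden illustration, not the experimental protocol of the cited studies: it takes two well-separated clusters, appends one synthetic feature that follows the same pattern in both clusters (so it carries no information about membership), and compares the score before and after with the labels held fixed. The helper `davies_bouldin` and all data values are hypothetical.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    clusters = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)
    cents = {i: tuple(sum(c) / len(m) for c in zip(*m)) for i, m in clusters.items()}
    scatter = {i: sum(math.dist(p, cents[i]) for p in m) / len(m)
               for i, m in clusters.items()}
    return sum(
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters if j != i)
        for i in clusters
    ) / len(clusters)


# Two compact clusters centred at x = 0 and x = 10.
offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
clean = [(cx + dx, dy) for cx in (0, 10) for dx, dy in offsets]
labels = [0] * 5 + [1] * 5

# Irrelevant feature: the same zero-mean pattern in both clusters, so the
# centroids keep their distance while every point moves away from its centroid.
noise = [0, 6, -6, 3, -3]
noisy = [p + (noise[i % 5],) for i, p in enumerate(clean)]

dbs_clean = davies_bouldin(clean, labels)
dbs_noisy = davies_bouldin(noisy, labels)
print(round(dbs_clean, 3), round(dbs_noisy, 3))  # 0.16 0.74
```

The partition is identical in both cases, so any label-based external measure would be unchanged, yet the score rises more than fourfold because the extra dimension inflates $S_i$ without increasing $M_{ij}$.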

| Property | Davies-Bouldin Score (DBS) | Notes |
| --- | --- | --- |
| Compactness measured by | Mean distance to centroid (per cluster) | Sensitive to outliers |
| Separation measured by | Inter-centroid distances | Assumes centroid meaningfulness |
| Ground truth required? | No | Internal metric |
| Typical use case | Model/cluster selection, method comparison | Label-agnostic feature selection |
| Limitations | Less accurate for non-spherical clusters, sensitive to irrelevant features | See (Liu, 2022, Gagolewski et al., 2022) |

5. Extensions and Modern Variants

To address the limitations inherent in centroid-based indices, several approaches have emerged:

Incremental and Online DBS: For streaming and large-scale settings, incremental formulations of DBS enable efficient, online monitoring of cluster validity using summary statistics, and can be augmented with forgetting factors to make the index time-sensitive (Moshtaghi et al., 2018).
Density and Locality-Aware Indices: Modern indices leveraging density estimation or local neighbour graphs (e.g., DuNN, ambiguous/similarity indices) generally outperform DBS in complex or non-convex clustering scenarios, providing better alignment with true structure in real-world and benchmark data (Liu, 2022, Gagolewski et al., 2022).
Integration with Bayesian Frameworks: Bayesian cluster validity indices (e.g., BCVI) incorporate user expertise and prior beliefs, allowing explicit probabilistic ranking and secondary solution identification, in contrast to the single-solution focus of standard DBS (Wiroonsri et al., 3 Feb 2024).
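
To make the idea of an incremental index with summary statistics concrete, here is a deliberately simple sketch. It is not the formulation of Moshtaghi et al.; it is an illustrative approximation in which each cluster keeps only a count, a running centroid, and a running estimate of its scatter, and an optional fixed step size plays the role of a forgetting factor. The names `OnlineCluster`, `update`, `step`, and `online_dbs` are hypothetical.

```python
import math


class OnlineCluster:
    """Running summary of one cluster: count, centroid, approximate scatter."""

    def __init__(self, dim):
        self.n = 0
        self.centroid = [0.0] * dim
        self.scatter = 0.0

    def update(self, x, step=None):
        # step=None gives plain running means; a fixed step in (0, 1] weights
        # recent points more heavily (exponential forgetting).
        self.n += 1
        rate = 1.0 / self.n if step is None else max(step, 1.0 / self.n)
        self.centroid = [c + rate * (xi - c) for c, xi in zip(self.centroid, x)]
        # Approximation: distance to the *current* centroid; distances of
        # earlier points are not revisited when the centroid moves.
        d = math.dist(x, self.centroid)
        self.scatter += rate * (d - self.scatter)


def online_dbs(clusters):
    """DBS-style score from the summaries alone (needs >= 2 non-empty clusters)."""
    active = [c for c in clusters if c.n > 0]
    total = 0.0
    for a in active:
        total += max(
            (a.scatter + b.scatter) / math.dist(a.centroid, b.centroid)
            for b in active if b is not a
        )
    return total / len(active)


# Stream ten points, alternating between two clusters with known assignments.
offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
clusters = [OnlineCluster(dim=2), OnlineCluster(dim=2)]
for dx, dy in offsets:
    clusters[0].update((0 + dx, dy))
    clusters[1].update((10 + dx, dy))

value = online_dbs(clusters)
print(round(value, 3))  # 0.13 here, versus 0.16 for the batch score on the same points
```

The gap between the streamed and batch values comes from the scatter approximation: the running centroid is exact, but each distance is measured against the centroid at the time the point arrived. Whether that trade-off is acceptable depends on how quickly the clusters drift.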

6. Empirical Use and Interpretability

DBS is broadly used as a default or first-step index for unsupervised clustering quality assessment due to its simplicity and ease of computation. In practical applications, DBS:

Serves as a principal heuristic for elbow-method analysis in model selection workflows (Barron et al., 26 Jul 2024).
Objectively guides cluster and parameter optimization in application domains such as biomedical signal monitoring and NLP (Derouiche et al., 4 Sep 2024, Allen et al., 11 Apr 2025).
Provides interpretable, label-agnostic evidence of the efficacy of anomaly/outlier removal procedures (Shorewala et al., 30 May 2025).

However, benchmark comparisons consistently show that while DBS is useful for identifying poor clustering (high score), low DBS scores do not always coincide with semantically meaningful groupings or with the structure found by density- or neighborhood-aware methods. DBS is most reliable when clusters are compact, well-separated, and centroid-representable; in other contexts, multi-criterion or application-specific indices may be necessary (Gagolewski et al., 2022, Liu, 2022).

7. Summary and Contemporary Perspective

The Davies-Bouldin Score remains a standard metric for quantitative internal validation in clustering workflows. Its mathematical construction, which averages each cluster's worst-case scatter-to-separation ratio, captures a balance between cohesion and separation but inherits the limitations of centroid-based methods. Advances in the research literature emphasize complementing DBS with density-, locality-, or knowledge-based indices, especially in high-dimensional, noisy, or topologically complex datasets. As methodological sophistication in unsupervised learning increases, DBS occupies a foundational role but is rarely sufficient as the sole criterion for cluster validation in contemporary research practice.