Connectivity Score (CS) Overview
- Connectivity Score (CS) is a quantitative metric that measures the reversal of disease-induced gene expression profiles when perturbed by drugs.
- It employs a rank-based Kolmogorov–Smirnov enrichment analysis to score where the up- and down-regulated disease gene sets fall in a drug's ranked expression profile.
- Normalization of raw scores to the [-1, +1] range facilitates robust cross-instance comparisons, enhancing its use in computational drug repurposing.
The Connectivity Score (CS) is a quantitative metric introduced by the original Connectivity Map 1.0 study to measure the reversal relationship between drug- and disease-induced gene expression profiles. Designed to operationalize the principle that a therapeutic agent should invert the molecular ‘signature’ of a disease, CS has served as a foundational methodology in computational drug repurposing, notably enabling the benchmarking and comparison of drugs’ potential efficacy by assessing the strength and directionality of their perturbational impact on disease-associated gene expression (Samart et al., 2020).
1. Preliminaries: Notation and Preprocessing
CS is defined within a gene expression perturbation framework. The following notation, as unified by Samart et al., underpins its computation:
- $G_{\mathrm{drug}}$: set of all genes measured under a drug perturbation, each with differential expression $x_g$.
- $G_{\mathrm{dis}}$: set of all genes measured for the disease perturbation, with differential expression $y_g$.
- $S_{\mathrm{up}}$ and $S_{\mathrm{down}}$: sets of most significantly up- and down-regulated disease genes, respectively, defined via thresholds (e.g., by $|y_g|$, $p$-value, or top-$k$ selection).
- $G_{\mathrm{drug}}$ is typically the full ranked list of drug genes for CS; no restriction to “extreme” genes is used.
- $r(g)$: the rank of gene $g$ in $G_{\mathrm{drug}}$, ordered by decreasing $x_g$ (most positive to most negative).
Preprocessing steps: a) Compute $x_g$ and $y_g$ using standard tools (e.g., limma, DESeq2). b) Define $S_{\mathrm{up}}$ and $S_{\mathrm{down}}$. c) Rank all drug genes $g \in G_{\mathrm{drug}}$ by $x_g$. d) For each disease gene set $S \in \{S_{\mathrm{up}}, S_{\mathrm{down}}\}$, compute enrichment scores as below.
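Steps b) and c) can be sketched in a few lines of Python. This is a minimal illustration and not the reference implementation: it assumes differential expression values are already available as plain dictionaries, uses top-k selection as the threshold rule, and the function names `rank_drug_genes` and `disease_gene_sets` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_drug_genes(drug_de: Dict[str, float]) -> List[str]:
    """Order drug genes by decreasing differential expression."""
    # Ties are broken by gene name so the ranking is deterministic (an assumption).
    return sorted(drug_de, key=lambda g: (-drug_de[g], g))


def disease_gene_sets(disease_de: Dict[str, float], k: int) -> Tuple[Set[str], Set[str]]:
    """Select the top-k up- and top-k down-regulated disease genes."""
    if k <= 0:
        raise ValueError("k must be positive")
    ordered = sorted(disease_de, key=lambda g: (-disease_de[g], g))
    # Keep only genes whose sign matches the direction of the set.
    up = {g for g in ordered[:k] if disease_de[g] > 0}
    down = {g for g in ordered[-k:] if disease_de[g] < 0}
    return up, down


# Toy differential expression values (made up for illustration).
drug_de = {"A": 2.1, "B": -1.4, "C": 0.3, "D": -0.2, "E": 1.0}
disease_de = {"A": -1.8, "B": 2.2, "C": 0.1, "D": 1.5, "E": -0.9}

ranked = rank_drug_genes(drug_de)                  # ['A', 'E', 'C', 'D', 'B']
s_up, s_down = disease_gene_sets(disease_de, k=2)  # {'B', 'D'} and {'A', 'E'}
```

A p-value or effect-size cutoff could replace the top-k rule without changing the downstream steps, since only set membership is passed on.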
2. Enrichment Score Computation
The CS relies on the (one-sample) Kolmogorov–Smirnov (KS) statistic adapted as an enrichment score (ES):
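For a gene set $S$ with $t = |S \cap G_{\mathrm{drug}}|$ genes in a ranked drug gene list $G_{\mathrm{drug}}$:
- $N = |G_{\mathrm{drug}}|$ is the length of the ranked list, and $V(j)$ is the rank of the $j$-th gene of $S$ once its genes are sorted by increasing rank ($j = 1, \dots, t$).
- $a = \max_{1 \le j \le t} \left[ \frac{j}{t} - \frac{V(j)}{N} \right]$
- $b = \max_{1 \le j \le t} \left[ \frac{V(j)}{N} - \frac{j-1}{t} \right]$
The enrichment score is defined as:
$$\mathrm{ES}(S) = \begin{cases} a, & a > b \\ -b, & \text{otherwise} \end{cases}$$
This produces $\mathrm{ES}(S) \in [-1, 1]$, where positive values indicate enrichment at the top of $G_{\mathrm{drug}}$, negative at the bottom.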
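The unweighted KS enrichment score in its CMap 1.0 form can be sketched as follows. The function name `enrichment_score` is hypothetical, and returning 0.0 when no set gene appears in the ranked list is an assumption made here, not something the source specifies.

```python
from typing import Iterable, Sequence


def enrichment_score(ranked_genes: Sequence[str], gene_set: Iterable[str]) -> float:
    """Unweighted KS enrichment score of gene_set in ranked_genes."""
    n = len(ranked_genes)
    members = set(gene_set)
    # 1-based ranks of the set genes, already in increasing order.
    positions = [i for i, g in enumerate(ranked_genes, start=1) if g in members]
    t = len(positions)
    if t == 0:
        return 0.0  # assumption: no overlap means no enrichment
    a = max(j / t - v / n for j, v in enumerate(positions, start=1))
    b = max(v / n - (j - 1) / t for j, v in enumerate(positions, start=1))
    return a if a > b else -b


ranked = list("ABCDEFGHIJ")                       # most up-regulated first
es_top = enrichment_score(ranked, {"A", "B"})     # positive: set sits at the top
es_bottom = enrichment_score(ranked, {"I", "J"})  # negative: set sits at the bottom
```

In this toy example the set at the top of the list scores about 0.8 and the set at the bottom about -0.9, matching the sign convention described above.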
3. Connectivity Score Derivation
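Given the enrichment scores of the up- and down-regulated disease gene sets, $\mathrm{ES}_{\mathrm{up}} = \mathrm{ES}(S_{\mathrm{up}})$ and $\mathrm{ES}_{\mathrm{down}} = \mathrm{ES}(S_{\mathrm{down}})$, the raw connectivity score is defined as:
$$\mathrm{cs} = \begin{cases} \mathrm{ES}_{\mathrm{up}} - \mathrm{ES}_{\mathrm{down}}, & \mathrm{ES}_{\mathrm{up}} \cdot \mathrm{ES}_{\mathrm{down}} < 0 \\ 0, & \text{otherwise} \end{cases}$$
For each drug, typically multiple instances (e.g., varying cell lines, doses, timepoints) will yield a set $\{\mathrm{cs}_i\}$. These raw scores are normalized per drug:
$$\mathrm{CS}_i = \begin{cases} \mathrm{cs}_i / \max_j \mathrm{cs}_j, & \mathrm{cs}_i > 0 \\ -\,\mathrm{cs}_i / \min_j \mathrm{cs}_j, & \mathrm{cs}_i < 0 \\ 0, & \mathrm{cs}_i = 0 \end{cases}$$
Here, $\max_j$ and $\min_j$ are taken over all instances $j$ for that drug. This normalization yields $\mathrm{CS}_i \in [-1, 1]$ for each instance, with most negative $\mathrm{CS}_i$ representing strongest reversal.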
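The raw score and the per-drug normalization can be illustrated with a short sketch that takes already computed enrichment scores as input. The function names are hypothetical, and returning an empty list for a drug with no instances is an assumption.

```python
from typing import List, Sequence


def raw_connectivity(es_up: float, es_down: float) -> float:
    """Raw score: the difference when signs are opposite, zero otherwise."""
    return es_up - es_down if es_up * es_down < 0 else 0.0


def normalize_scores(raw: Sequence[float]) -> List[float]:
    """Scale positive scores by the maximum and negative scores by the minimum."""
    if not raw:
        return []
    p, q = max(raw), min(raw)
    out = []
    for s in raw:
        if s > 0:
            out.append(s / p)    # p > 0 whenever a positive score exists
        elif s < 0:
            out.append(-s / q)   # q < 0 whenever a negative score exists
        else:
            out.append(0.0)
    return out


# (ES_up, ES_down) pairs for four hypothetical instances of one drug.
es_pairs = [(-0.75, 0.75), (0.5, -0.5), (0.25, 0.125), (-0.5, 0.25)]
raw = [raw_connectivity(u, d) for u, d in es_pairs]  # [-1.5, 1.0, 0.0, -0.75]
normalized = normalize_scores(raw)                   # [-1.0, 1.0, 0.0, -0.5]
```

The third instance is zeroed because both enrichment scores are positive, and the first instance receives -1 as the strongest reversal among the four.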
4. Relationship to Alternative Connectivity Metrics
CS is part of a broader ecosystem of metrics for disease-drug connectivity, with distinct properties:
| Metric | Definition / Key Differences | Notes |
|---|---|---|
| CS | Uses unweighted ES, zeroes non-opposite-sign cases, normalizes per drug | Operates strictly on ranking |
| RGES | $\mathrm{RGES} = \mathrm{ES}_{\mathrm{up}} - \mathrm{ES}_{\mathrm{down}}$; does not zero same-sign ES or require reversal | Sign loses direct reversal meaning |
| NCS/WCS | Uses weighted (GSEA) KS; normalizes across background | More sensitive to magnitude |
| τ | Signed percentile rank of NCS vs. all drugs | Database-dependent |
CS requires reversal (opposite signs for $\mathrm{ES}_{\mathrm{up}}$ and $\mathrm{ES}_{\mathrm{down}}$); RGES does not, which weakens its direct interpretation as therapeutic inversion. NCS and τ use magnitude weighting and broader normalization, while pairwise metrics (e.g., CSS, Cosine, EWCos) operate directly on the magnitudes of both the drug and disease expression vectors (Samart et al., 2020).
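The practical difference between CS and RGES shows up when both enrichment scores share a sign. The sketch below contrasts the two on toy values; the function names are hypothetical, and the RGES form shown is the plain difference of enrichment scores described in the table, without any further summarization across instances.

```python
def rges(es_up: float, es_down: float) -> float:
    """Plain difference of enrichment scores, kept regardless of sign."""
    return es_up - es_down


def cs_raw(es_up: float, es_down: float) -> float:
    """Raw CS: the same difference, but zeroed unless the signs are opposite."""
    return es_up - es_down if es_up * es_down < 0 else 0.0


# Both disease gene sets are enriched at the bottom of the drug list.
same_sign = (rges(-0.5, -0.25), cs_raw(-0.5, -0.25))  # (-0.25, 0.0)
# Up genes at the bottom, down genes at the top: a true reversal.
opposite = (rges(-0.5, 0.25), cs_raw(-0.5, 0.25))     # (-0.75, -0.75)
```

In the same-sign case RGES is still negative even though only one side of the signature is reversed, which is why its sign alone does not certify reversal.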
5. Algorithmic Workflow for CS Computation
A high-level pseudocode specification for computing CS (for a single drug instance) is as follows:
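- Input: the drug differential expression $x_g$ for every gene in $G_{\mathrm{drug}}$, and the disease gene sets $S_{\mathrm{up}}$ and $S_{\mathrm{down}}$.
- Rank all drug genes by decreasing $x_g$.
- Compute $\mathrm{ES}_{\mathrm{up}}$ and $\mathrm{ES}_{\mathrm{down}}$ with the KS enrichment score of Section 2.
- If $\mathrm{ES}_{\mathrm{up}}$ and $\mathrm{ES}_{\mathrm{down}}$ have opposite signs, set $\mathrm{cs} = \mathrm{ES}_{\mathrm{up}} - \mathrm{ES}_{\mathrm{down}}$; otherwise set $\mathrm{cs} = 0$.
- Return $\mathrm{cs}$. Once raw scores are available for all instances of the drug, normalize them as in Section 3.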
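For a single drug instance, the workflow can be sketched end to end in Python. This is an illustrative sketch under stated assumptions, not the published implementation: inputs are plain dictionaries and sets, the helper names `ks_es` and `raw_cs_for_instance` are hypothetical, and an empty overlap is scored as 0.0.

```python
from typing import Dict, List, Set


def ks_es(ranked: List[str], gene_set: Set[str]) -> float:
    """Unweighted KS enrichment score of gene_set in a ranked gene list."""
    n = len(ranked)
    pos = [i for i, g in enumerate(ranked, start=1) if g in gene_set]
    t = len(pos)
    if t == 0:
        return 0.0  # assumption: no overlap means no enrichment
    a = max(j / t - v / n for j, v in enumerate(pos, start=1))
    b = max(v / n - (j - 1) / t for j, v in enumerate(pos, start=1))
    return a if a > b else -b


def raw_cs_for_instance(drug_de: Dict[str, float], s_up: Set[str], s_down: Set[str]) -> float:
    """Rank drug genes, score both disease sets, and combine into a raw CS."""
    ranked = sorted(drug_de, key=lambda g: (-drug_de[g], g))  # decreasing expression
    es_up = ks_es(ranked, s_up)
    es_down = ks_es(ranked, s_down)
    # Keep the difference only when the two scores have opposite signs.
    return es_up - es_down if es_up * es_down < 0 else 0.0


# Toy instance: the drug pushes disease-up genes down and disease-down genes up.
values = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]
drug_de = dict(zip("ABCDEFGHIJ", map(float, values)))
cs = raw_cs_for_instance(drug_de, s_up={"I", "J"}, s_down={"A", "B"})
```

Here the disease-up set scores about -0.9 and the disease-down set about 0.8, so the raw score is about -1.7. Normalization is left out on purpose because it needs the raw scores of every instance of the drug.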
6. Properties, Use Cases, and Limitations
Advantages:
- Directly encodes the conceptual reversal sought in computational drug repurposing.
- Rank-based; robust to outlier expression and cross-platform differences.
- Normalized per drug to the $[-1, +1]$ interval, with negative values indicating reversal.
Limitations:
- Uses only the drug gene ranking (not magnitude) and sets the score to zero for instances where $\mathrm{ES}_{\mathrm{up}}$ and $\mathrm{ES}_{\mathrm{down}}$ share a sign, so partial or one-sided enrichment is ignored.
- Does not use disease signature magnitudes; loss of granularity in differential expression.
Recommended Contexts:
- High-quality ranked drug references (e.g., original CMap microarrays).
- Scenarios requiring cross-platform or cross-lab comparability via rank-based metrics.
- Reproducibility of CMap 1.0 studies and benchmarks.
Alternatives:
- RGES: Retains partial enrichments, signs do not require reversal—negatively correlated with 6.
- NCS and τ: Incorporate magnitude weighting and normalization across cell lines and backgrounds.
- Pairwise magnitude-based metrics (CSS, Cosine, etc.): Typically outperform ES-derived metrics in large-scale benchmarks.
7. Context and Comparative Significance
CS, as originally codified in Connectivity Map 1.0 and rigorously reconciled by Samart et al., establishes a standardized procedure for quantifying drug-disease signature reversal using only gene ranks. Its continued relevance includes use as a benchmark for comparing the behavior of new connectivity metrics and as a baseline for evaluating advances in magnitude- and direction-aware similarity approaches in drug repurposing. A plausible implication is that CS is most informative where reversal is expected to be strong and differential expression magnitudes are less reliable or comparable across datasets (Samart et al., 2020).