
Connectivity Score (CS) Overview

Updated 26 May 2026
  • Connectivity Score (CS) is a quantitative metric that measures how strongly a drug-induced gene expression profile reverses a disease-associated expression profile.
  • It applies a rank-based Kolmogorov–Smirnov enrichment statistic separately to the disease's up- and down-regulated gene sets within a ranked drug expression profile.
  • Normalizing raw scores to the [-1, +1] range makes them comparable across a drug's instances, supporting its use in computational drug repurposing.

The Connectivity Score (CS) is a quantitative metric introduced in the original Connectivity Map 1.0 study to measure the reversal relationship between drug- and disease-induced gene expression profiles. It operationalizes the principle that a therapeutic agent should invert the molecular ‘signature’ of a disease, and it has served as a foundational method in computational drug repurposing, enabling drugs to be benchmarked and compared by the strength and direction of their effect on disease-associated gene expression (Samart et al., 2020).

1. Preliminaries: Notation and Preprocessing

CS is defined within a gene expression perturbation framework. The following notation, as unified by Samart et al., underpins its computation:

  • $R = \{ g_1, \ldots, g_{N_R} \}$: set of all genes measured under a drug perturbation, each with differential expression $v_\text{drg}(g_i) \in \mathbb{R}$.
  • $S = \{ g_1, \ldots, g_{N_S} \}$: set of all genes measured for the disease perturbation, each with differential expression $v_\text{dis}(g_i)$.
  • $S^{+} \subseteq S$ and $S^{-} \subseteq S$: sets of the most significantly up- and down-regulated disease genes, respectively, defined via thresholds (e.g., on $|v_\text{dis}|$, $p$-value, or top-$k$ selection).
  • For CS, $R$ is typically the full ranked list of drug genes; no restriction to “extreme” genes is applied.
  • $r(g_i)$: the rank of gene $g_i$ in $R$, ordered by decreasing $v_\text{drg}(g_i)$ (most positive to most negative).

Preprocessing steps: a) Compute $v_\text{drg}$ and $v_\text{dis}$ using standard tools (e.g., limma, DESeq2). b) Define $S^{+}$ and $S^{-}$. c) Rank all drug genes in $R$ by $v_\text{drg}$. d) For each disease gene set ($S^{+}$ and $S^{-}$), compute an enrichment score as described below.
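Steps b) and c) can be sketched in Python as follows. This is a minimal illustration, not the CMap code: it assumes differential expression values are already available as plain `{gene: value}` dicts (step a is left to limma, DESeq2, or similar), and the hypothetical helper `select_disease_sets` uses top-$k$ selection, one of the thresholding options listed above.

```python
from typing import Dict, List, Set, Tuple


def rank_drug_genes(v_drg: Dict[str, float]) -> List[str]:
    """Order drug genes by decreasing v_drg, so index 0 holds rank r(g) = 1."""
    return sorted(v_drg, key=lambda g: v_drg[g], reverse=True)


def select_disease_sets(v_dis: Dict[str, float], k: int) -> Tuple[Set[str], Set[str]]:
    """Hypothetical top-k selection of S+ (most up) and S- (most down) disease genes.

    Genes with the wrong sign are dropped, so each set may hold fewer than k genes.
    """
    ordered = sorted(v_dis, key=lambda g: v_dis[g], reverse=True)
    s_up = {g for g in ordered[:k] if v_dis[g] > 0}
    s_down = {g for g in ordered[::-1][:k] if v_dis[g] < 0}
    return s_up, s_down


v_drg = {"A": 2.0, "B": -1.5, "C": 0.3, "D": -0.2}
v_dis = {"A": -1.0, "B": 2.5, "C": 0.1, "D": -3.0}
print(rank_drug_genes(v_drg))         # ['A', 'C', 'D', 'B']
print(select_disease_sets(v_dis, 1))  # ({'B'}, {'D'})
```

In practice the threshold (top-$k$, $p$-value, or $|v_\text{dis}|$ cutoff) is a modeling choice; only the resulting sets $S^{+}$ and $S^{-}$ enter the CS computation.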

2. Enrichment Score Computation

The CS relies on the (one-sample) Kolmogorov–Smirnov (KS) statistic adapted as an enrichment score (ES):

For a gene set $Q$ (either $S^{+}$ or $S^{-}$) with $t = |Q \cap R|$ genes present in a ranked drug gene list $R$ of length $n = N_R$:

  • Let $V(1) < V(2) < \cdots < V(t)$ be the ranks $r(g)$ of the genes $g \in Q \cap R$, sorted in ascending order.
  • $a = \max_{1 \le j \le t} \left[ \frac{j}{t} - \frac{V(j)}{n} \right]$
  • $b = \max_{1 \le j \le t} \left[ \frac{V(j)}{n} - \frac{j-1}{t} \right]$

The enrichment score is defined as:

$$ES(Q) = \begin{cases} a, & \text{if } a > b, \\ -b, & \text{otherwise.} \end{cases}$$

This produces $ES(Q) \in [-1, 1]$, where positive values indicate that $Q$ is enriched at the top of $R$ (genes up-regulated by the drug) and negative values indicate enrichment at the bottom.
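The statistic can be transcribed into Python roughly as below. It is a sketch that follows the formulas of this section; skipping genes of $Q$ that are missing from $R$ and raising an error on an empty overlap are assumptions, not part of the source definition, and `ks_enrichment` is a hypothetical name.

```python
from typing import List, Set


def ks_enrichment(gene_set: Set[str], ranked_genes: List[str]) -> float:
    """KS-style enrichment score ES(Q) of gene_set within ranked_genes.

    ranked_genes must be ordered by decreasing v_drg (position 1 = top).
    """
    n = len(ranked_genes)
    rank = {g: i + 1 for i, g in enumerate(ranked_genes)}   # r(g), 1-based
    V = sorted(rank[g] for g in gene_set if g in rank)      # V(1) < ... < V(t)
    t = len(V)
    if t == 0:
        raise ValueError("gene set does not overlap the ranked list")
    a = max(j / t - V[j - 1] / n for j in range(1, t + 1))
    b = max(V[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


ranked = list("ABCDEFGHIJ")                # n = 10 genes, A is the most up-regulated
print(ks_enrichment({"A", "B"}, ranked))  # 0.8  (set at the top -> positive ES)
print(ks_enrichment({"I", "J"}, ranked))  # -0.9 (set at the bottom -> negative ES)
```

Because $b \ge V(1)/n > 0$, the score is never exactly zero: it is either $a > b > 0$ or $-b < 0$.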

3. Connectivity Score Derivation

Given $ES^{+} = ES(S^{+})$ and $ES^{-} = ES(S^{-})$, the raw connectivity score is defined as:

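$$s = \begin{cases} ES^{+} - ES^{-}, & \text{if } \operatorname{sign}(ES^{+}) \neq \operatorname{sign}(ES^{-}), \\ 0, & \text{otherwise.} \end{cases}$$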
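A direct Python transcription of this piecewise rule is sketched below; `raw_connectivity` is a hypothetical name, and the inputs are assumed to be the two enrichment scores already computed.

```python
def raw_connectivity(es_up: float, es_down: float) -> float:
    """Raw CS: ES+ - ES- when the two enrichment scores have opposite signs, else 0."""
    if es_up * es_down < 0:  # product is negative exactly when the signs differ
        return es_up - es_down
    return 0.0


print(raw_connectivity(-0.875, 0.75))  # -1.625: disease-up genes at the bottom, disease-down at the top
print(raw_connectivity(0.75, -0.875))  # 1.625: the drug mimics the disease signature
print(raw_connectivity(0.4, 0.3))      # 0.0: same sign, score zeroed
```

Since the KS enrichment score is never exactly zero, the product test is equivalent to the sign-inequality condition in the formula.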

For each drug, multiple instances (e.g., different cell lines, doses, or time points) typically yield a set of raw scores $\{ s_1, \ldots, s_m \}$. These raw scores are normalized per drug:

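$$\mathrm{CS}_i = \begin{cases} s_i / p, & \text{if } s_i > 0, \\ s_i / (-q), & \text{if } s_i < 0, \\ 0, & \text{if } s_i = 0. \end{cases}$$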

Here, $p = \max_i s_i$ and $q = \min_i s_i$ are taken over all instances $i$ of that drug. This normalization yields $\mathrm{CS}_i \in [-1, +1]$ for each instance, with the most negative $\mathrm{CS}_i$ representing the strongest reversal.
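The per-drug normalization can be sketched as follows, assuming (as a data-layout choice, not from the source) that one drug's raw scores arrive as a plain list; `normalize_per_drug` is a hypothetical name.

```python
from typing import List


def normalize_per_drug(raw_scores: List[float]) -> List[float]:
    """Map one drug's raw scores s_i to CS_i in [-1, +1] using p = max s_i and q = min s_i."""
    p, q = max(raw_scores), min(raw_scores)
    cs = []
    for s in raw_scores:
        if s > 0:
            cs.append(s / p)    # p >= s > 0, so no division by zero
        elif s < 0:
            cs.append(s / -q)   # -q >= -s > 0
        else:
            cs.append(0.0)
    return cs


print(normalize_per_drug([0.8, -1.2, 0.0, -0.6]))  # [1.0, -1.0, 0.0, -0.5]
```

Positive and negative scores are scaled separately, so the strongest mimicking instance maps to $+1$ and the strongest reversing instance maps to $-1$.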

4. Relationship to Alternative Connectivity Metrics

CS is part of a broader ecosystem of metrics for disease-drug connectivity, with distinct properties:

| Metric | Definition / key differences | Notes |
|---|---|---|
| CS | Uses unweighted ES; zeroes cases where $ES^{+}$ and $ES^{-}$ share a sign; normalizes per drug | Operates strictly on ranks |
| RGES | $\mathrm{RGES} = ES^{+} - ES^{-}$; does not zero same-sign ES or require reversal | Sign loses direct reversal meaning |
| NCS/WCS | Uses weighted (GSEA-style) KS; normalizes across background | More sensitive to magnitude |
| τ | Signed percentile rank of NCS vs. all drugs | Database-dependent |

CS requires reversal (opposite signs for $ES^{+}$ and $ES^{-}$); RGES does not, which weakens its direct interpretability as therapeutic inversion. NCS and τ use magnitude weighting and broader normalization, while pairwise metrics (e.g., CSS, Cosine, EWCos) operate directly on the magnitudes of both the drug and disease expression vectors (Samart et al., 2020).
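To make the CS/RGES contrast concrete, the sketch below scores two hypothetical pairs of enrichment scores under both rules. It assumes RGES is exactly $ES^{+} - ES^{-}$ without zeroing, as the table summarizes; published RGES implementations may differ in how the enrichment scores themselves are computed.

```python
def cs_raw(es_up: float, es_down: float) -> float:
    """CS raw score: nonzero only when ES+ and ES- have opposite signs."""
    return es_up - es_down if es_up * es_down < 0 else 0.0


def rges(es_up: float, es_down: float) -> float:
    """RGES as summarized in the table: ES+ - ES- with no sign requirement."""
    return es_up - es_down


# Hypothetical ES pairs chosen for illustration.
cases = {
    "full reversal": (-0.875, 0.75),
    "one-sided (both sets near the bottom)": (-0.5, -0.25),
}
for name, (up, down) in cases.items():
    print(f"{name}: CS raw = {cs_raw(up, down)}, RGES = {rges(up, down)}")
# full reversal: CS raw = -1.625, RGES = -1.625
# one-sided (both sets near the bottom): CS raw = 0.0, RGES = -0.25
```

In the one-sided case only the up-regulated disease genes are reversed; CS discards the instance, while RGES keeps a partial negative score.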

5. Algorithmic Workflow for CS Computation

A high-level pseudocode specification for computing CS (for a single drug instance) is as follows:

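  • Input: drug differential expression $v_\text{drg}$ over $R$, and disease gene sets $S^{+}$ and $S^{-}$.
  • Rank the genes of $R$ by decreasing $v_\text{drg}$ to obtain ranks $r(g)$.
  • Compute $ES^{+} = ES(S^{+})$ and $ES^{-} = ES(S^{-})$ with the KS statistic of Section 2.
  • If $ES^{+}$ and $ES^{-}$ have opposite signs, set $s = ES^{+} - ES^{-}$; otherwise set $s = 0$.
  • Output $s$; once all instances of the drug are scored, normalize each $s_i$ to $\mathrm{CS}_i$ using $p$ and $q$ as in Section 3.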
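For a single drug instance, the workflow can be sketched end to end in Python. This is an illustrative sketch rather than a reference implementation: inputs are assumed to be a `{gene: v_drg}` dict plus the two disease gene sets, disease genes absent from $R$ are skipped, and `ks_es` and `connectivity_score_raw` are hypothetical names. Per-drug normalization over instances is left as a separate step.

```python
from typing import Dict, List, Set


def ks_es(gene_set: Set[str], ranked: List[str]) -> float:
    """KS-style enrichment score of gene_set in ranked (ordered by decreasing v_drg)."""
    n = len(ranked)
    pos = {g: i + 1 for i, g in enumerate(ranked)}
    V = sorted(pos[g] for g in gene_set if g in pos)
    t = len(V)
    if t == 0:
        raise ValueError("gene set does not overlap R")
    a = max(j / t - V[j - 1] / n for j in range(1, t + 1))
    b = max(V[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score_raw(v_drg: Dict[str, float], s_up: Set[str], s_down: Set[str]) -> float:
    """Raw CS s for one drug instance."""
    ranked = sorted(v_drg, key=lambda g: v_drg[g], reverse=True)  # rank R by decreasing v_drg
    es_up, es_down = ks_es(s_up, ranked), ks_es(s_down, ranked)   # ES+ and ES-
    return es_up - es_down if es_up * es_down < 0 else 0.0        # zero unless opposite signs


# Toy instance: the drug pushes disease-up genes (G, H) down and disease-down genes (A, B) up.
v_drg = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.0, "E": -1.0, "F": -2.0, "G": -3.0, "H": -4.0}
print(connectivity_score_raw(v_drg, s_up={"G", "H"}, s_down={"A", "B"}))  # -1.625
```

Here $ES^{+} = -0.875$ (disease-up genes at the bottom of the drug ranking) and $ES^{-} = 0.75$ (disease-down genes at the top), so the opposite-sign condition holds and the negative raw score signals reversal.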

6. Properties, Use Cases, and Limitations

Advantages:

  • Directly encodes the conceptual reversal sought in computational drug repurposing.
  • Rank-based; robust to outlier expression and cross-platform differences.
  • Normalized per drug to the $[-1, +1]$ interval, with negative values indicating reversal.

Limitations:

  • Uses only drug gene ranks (not magnitudes) and sets the score to zero whenever $ES^{+}$ and $ES^{-}$ share a sign, thus ignoring partial or one-sided enrichment.
  • Ignores disease signature magnitudes beyond set membership, losing granularity in differential expression.

Recommended Contexts:

  • High-quality ranked drug references (e.g., original CMap microarrays).
  • Scenarios requiring cross-platform or cross-lab comparability via rank-based metrics.
  • Reproducibility of CMap 1.0 studies and benchmarks.

Alternatives:

  • RGES: Retains partial enrichments, and its sign does not require reversal; it is negatively correlated with …
  • NCS and τ: Incorporate magnitude weighting and normalization across cell lines and backgrounds.
  • Pairwise magnitude-based metrics (CSS, Cosine, etc.): Typically outperform ES-derived metrics in large-scale benchmarks.

7. Context and Comparative Significance

CS, as originally codified in Connectivity Map 1.0 and rigorously reconciled by Samart et al., establishes a standardized procedure for quantifying drug-disease signature reversal using only gene ranks. Its continued relevance includes use as a benchmark for comparing the behavior of new connectivity metrics and as a baseline for evaluating advances in magnitude- and direction-aware similarity approaches in drug repurposing. A plausible implication is that CS is most informative where reversal is expected to be strong and differential expression magnitudes are less reliable or comparable across datasets (Samart et al., 2020).
