Computational Biology

Type 2 Diabetes: A Systems Biology Approach

Large-scale computational analysis of T2D molecular mechanisms using protein interaction networks, gene expression meta-analysis, drug-target landscapes, and molecular dynamics simulations.

660 CPU Cores (BioHPC)
13.7M Protein Interactions
3,552 Drug Compounds
256 Islet Samples
100 ns MD Simulation

Why This Research Matters

The Scale of the Problem

Diabetes affects 537 million adults worldwide (IDF Diabetes Atlas, 2021), more than 90% of whom have type 2 diabetes (T2D), and accounts for roughly $966 billion in annual health expenditure. Despite decades of research, only ~15 drug classes exist, and most treat symptoms (hyperglycemia) rather than root causes (β-cell failure, insulin resistance).

Genome-wide association studies (GWAS) have identified 400+ risk loci, but how these genes interact as a system remains poorly understood. Computational approaches (network medicine, large-scale gene expression analysis, and molecular dynamics) can prioritize candidate therapeutic targets far faster than wet-lab experiments alone.

Analysis 1: Protein Interaction Network

Scientific Rationale

T2D is a systems disease: no single gene causes it. By constructing a T2D-specific subnetwork from STRING's 13.7 million human protein interactions, we can identify hub proteins (potential drug targets) and bridge proteins connecting multiple disease pathways. A related network-medicine framework was used to prioritize drug-repurposing candidates for COVID-19 (Gysi et al., PNAS 2021).

16,201 proteins in high-confidence network (score ≥700)
1,053 proteins in T2D-specific subnetwork
Z = 22.3 disease module coherence (p < 10⁻⁴⁰)
80 bridge proteins connecting ≥3 T2D pathways
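
The exact criterion for a "bridge protein" is not spelled out here. One plausible reading, sketched below, is a protein that belongs to or directly interacts with at least three pathway gene sets. The function name, the `min_pathways` parameter, and the toy edges are illustrative assumptions, not STRING data.

```python
def find_bridge_proteins(adjacency, pathways, min_pathways=3):
    """Return {protein: sorted pathway names} for proteins linked to >= min_pathways pathways.

    adjacency: dict mapping protein -> set of interacting proteins
    pathways:  dict mapping pathway name -> set of member genes
    A protein counts as linked to a pathway if it is a member or has a
    neighbor in it (an assumed definition, chosen for illustration).
    """
    bridges = {}
    for protein, neighbors in adjacency.items():
        touched = sorted(
            name for name, members in pathways.items()
            if protein in members or neighbors & members
        )
        if len(touched) >= min_pathways:
            bridges[protein] = touched
    return bridges


# Toy network with made-up edges.
toy_adjacency = {
    "AKT1": {"IRS1", "IL6", "GCK"},
    "IRS1": {"AKT1", "INSR"},
    "INSR": {"IRS1"},
    "IL6": {"AKT1"},
    "GCK": {"AKT1"},
}
toy_pathways = {
    "insulin_signaling": {"INSR", "IRS1"},
    "inflammation": {"IL6", "TNF"},
    "beta_cell": {"GCK", "INS"},
}
bridges = find_bridge_proteins(toy_adjacency, toy_pathways)
```

In this toy network only AKT1 qualifies, because its three neighbors fall in three different pathway sets.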

Hub Proteins in T2D Network

Gene | Protein | Degree | Role in T2D
IL6 | Interleukin-6 | 299 | Master inflammatory cytokine; drives insulin resistance via JAK-STAT pathway
TNF | Tumor necrosis factor | 225 | Pro-inflammatory; impairs insulin signaling through IRS-1 serine phosphorylation
DPP4 | Dipeptidyl peptidase-4 | 217 | Degrades GLP-1/GIP incretins; target of sitagliptin (Januvia)
IRS1 | Insulin receptor substrate 1 | 126 | Key signal transducer downstream of insulin receptor
INSR | Insulin receptor | 79 | Receptor tyrosine kinase; initiates insulin signaling cascade
HNF4A | Hepatocyte nuclear factor 4α | 75 | Transcription factor; MODY1 gene; regulates glucose metabolism genes
PPARG | PPARγ | 68 | Nuclear receptor; target of thiazolidinediones (pioglitazone)
INS | Insulin | 67 | The hormone itself; deficient secretion is a hallmark of T2D
GCK | Glucokinase | 59 | Glucose sensor in β-cells; MODY2 gene; drug target for GK activators
GLP1R | GLP-1 receptor | 38 | Target of semaglutide (Ozempic), liraglutide, tirzepatide
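
The degree column counts each protein's distinct interaction partners in the network. A minimal sketch of that ranking follows. It assumes edges arrive as pairs of gene symbols, and the toy edge list is made up for illustration.

```python
def rank_by_degree(edges, top_n=10):
    """Rank nodes by their number of distinct interaction partners."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {node: len(partners) for node, partners in neighbors.items()}
    # Sort by descending degree, breaking ties alphabetically.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy edge list; the duplicate TNF-IL6 edge is counted only once.
toy_edges = [
    ("IL6", "TNF"),
    ("IL6", "IRS1"),
    ("IL6", "INSR"),
    ("TNF", "IRS1"),
    ("TNF", "IL6"),
]
ranking = rank_by_degree(toy_edges)
```

Using sets for the partner lists keeps the count correct when an interaction is listed in both directions.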

Key Finding: Disease Module Coherence

T2D genes are 52.6× more interconnected than randomly selected gene sets of the same size (Z-score = 22.28, p < 10⁻⁴⁰). This strong coherence is consistent with the "disease module" hypothesis: T2D genes cluster together in the interactome, and proteins located near this module are candidates for therapeutic intervention.
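
As a sanity check on the reported bound, the sketch below converts a Z-score into a one-sided p-value. It assumes the null distribution is approximately normal, which is presumably how the p-value was obtained: with 100 random gene sets, a purely empirical p-value cannot fall below about 0.01.

```python
import math


def normal_upper_tail(z):
    """One-sided P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p_value = normal_upper_tail(22.28)

# Smallest p-value an empirical test with 100 random sets could report.
empirical_floor = 1 / (100 + 1)
```

Under that normal approximation the tail probability for Z = 22.28 is far below 10⁻⁴⁰, so the stated bound is conservative.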

Figure 1. T2D Systems Biology Overview. (A) Hub proteins ranked by network degree. (B) Disease module density vs. random baseline. (C) Drug target potency landscape. (D) Gene expression datasets used for cross-validation.

Methods

Data: STRING v12.0 human protein-protein interactions (13,715,404 edges). High-confidence subset (combined score ≥700): 473,860 edges, 16,201 nodes.
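
The high-confidence filter can be sketched as follows, assuming the input follows the space-separated `protein1 protein2 combined_score` layout of STRING's protein links download. The function name and the sample identifiers are illustrative.

```python
def load_high_confidence_edges(lines, min_score=700):
    """Parse 'protein1 protein2 combined_score' rows into unique undirected edges."""
    edges = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # skip the header and malformed rows
        a, b, score = fields[0], fields[1], int(fields[2])
        if score >= min_score and a != b:
            # Sort the pair so A-B and B-A collapse into one edge.
            edges.add(tuple(sorted((a, b))))
    return edges


# Shortened, made-up identifiers for illustration.
sample = [
    "protein1 protein2 combined_score",
    "9606.ENSP0001 9606.ENSP0002 950",
    "9606.ENSP0002 9606.ENSP0001 950",
    "9606.ENSP0001 9606.ENSP0003 420",
    "9606.ENSP0003 9606.ENSP0004 700",
]
edges = load_high_confidence_edges(sample)
nodes = {protein for edge in edges for protein in edge}
```

Whether edge totals refer to directed rows or unique pairs changes the count by a factor of two, so it is worth being explicit about which convention a reported number uses.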

Seed genes: 26 established T2D genes from GWAS and literature (insulin signaling, β-cell function, GLP-1 pathway, metabolic sensors, inflammation).

Subnetwork: 1-hop neighborhood of seed genes. Module density compared against 100 random gene sets of equal size. Z-score computed as (observed - mean_random) / std_random.
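
The subnetwork and Z-score steps can be sketched as below. The sketch assumes "module density" means the fraction of possible node pairs that are connected and that random sets are drawn uniformly from all network nodes. Both are assumptions, as are the function names and the toy network.

```python
import random
import statistics


def build_adjacency(edges):
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def one_hop_neighborhood(adjacency, seeds):
    """Seeds present in the network plus all of their direct neighbors."""
    present = {s for s in seeds if s in adjacency}
    nodes = set(present)
    for s in present:
        nodes |= adjacency[s]
    return nodes


def edge_density(adjacency, nodes):
    """Fraction of possible node pairs that are connected."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    internal = sum(len(adjacency.get(v, set()) & nodes) for v in nodes) // 2
    return internal / (n * (n - 1) / 2)


def module_zscore(adjacency, module, n_random=100, seed=0):
    """Z = (observed - mean_random) / std_random over equal-size random sets."""
    rng = random.Random(seed)
    universe = sorted(adjacency)
    observed = edge_density(adjacency, module)
    null = [
        edge_density(adjacency, rng.sample(universe, len(module)))
        for _ in range(n_random)
    ]
    mean_random = statistics.mean(null)
    std_random = statistics.stdev(null)
    z = (observed - mean_random) / std_random if std_random > 0 else float("inf")
    return observed, mean_random, z


# Toy network: a 5-node clique (the "module") attached to a sparse 30-node ring.
clique = [f"S{i}" for i in range(5)]
ring = [f"R{i}" for i in range(30)]
toy_edges = [(a, b) for i, a in enumerate(clique) for b in clique[i + 1:]]
toy_edges += [(ring[i], ring[(i + 1) % 30]) for i in range(30)]
toy_edges.append(("S0", "R0"))

adjacency = build_adjacency(toy_edges)
neighborhood = one_hop_neighborhood(adjacency, ["S0"])
observed, mean_random, z = module_zscore(adjacency, clique)
```

Sampling uniformly ignores degree, and hub-rich seed sets can look artificially dense under such a null, so a degree-matched random baseline is a common refinement.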

Analysis 2: Pancreatic Islet Gene Expression

Scientific Rationale

Individual gene expression studies are noisy and underpowered. By integrating 4 independent GEO datasets (256 human islet samples in total), we can identify genes consistently dysregulated in T2D islets across multiple cohorts. Genes that replicate are more likely to reflect genuine disease biology than cohort-specific noise, which makes them stronger candidates for follow-up as drug targets.

Dataset | Probes | Samples | Reference
GSE25724 | 22,283 | 13 | Dominguez et al. (T2D vs normal islets)
GSE38642 | 18,808 | 63 | Taneera et al., Mol Endocrinol 2012
GSE41762 | 29,096 | 77 | Fadista et al., PNAS 2014
GSE76894 | 29,530 | 103 | Segerstolpe et al., Cell Metab 2016
256 human islet samples across 4 cohorts
379 probes replicated across GSE41762 ∩ GSE76894
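
The replication criterion is not stated explicitly. A common choice, sketched below, is to keep probes that are significant in both cohorts and change in the same direction. The threshold, the input layout, and the probe IDs are illustrative assumptions.

```python
def replicated_probes(results_a, results_b, alpha=0.05):
    """Probes significant in both cohorts with the same direction of change.

    results_*: dict mapping probe ID -> (log2 fold change, adjusted p-value)
    """
    shared = results_a.keys() & results_b.keys()
    replicated = []
    for probe in sorted(shared):
        lfc_a, p_a = results_a[probe]
        lfc_b, p_b = results_b[probe]
        same_direction = lfc_a * lfc_b > 0
        if p_a < alpha and p_b < alpha and same_direction:
            replicated.append(probe)
    return replicated


# Made-up differential expression results for two cohorts.
cohort_a = {
    "probe_1": (1.2, 0.001),
    "probe_2": (-0.8, 0.01),
    "probe_3": (0.9, 0.2),
    "probe_4": (1.1, 0.003),
}
cohort_b = {
    "probe_1": (0.9, 0.004),
    "probe_2": (0.7, 0.02),   # significant but opposite direction
    "probe_3": (1.0, 0.001),
    "probe_5": (2.0, 0.001),  # not measured in cohort A
}
hits = replicated_probes(cohort_a, cohort_b)
```

Requiring a consistent sign guards against probes that pass the significance threshold in both cohorts for opposite reasons.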

Analysis 3: Drug-Target Binding Landscape

Scientific Rationale

Multi-target drugs are reshaping T2D treatment. Tirzepatide (Mounjaro), a dual GIP/GLP-1 receptor agonist, produced up to 22.5% mean body weight reduction at the highest dose in SURMOUNT-1 (Jastreboff et al., NEJM 2022). By analyzing 3,552 ChEMBL compounds across 7 T2D targets, we identify multi-target molecules as candidates for next-generation polypharmacology.

Target | ChEMBL ID | Compounds | Median IC₅₀ | % <100 nM | Approved Drug
Insulin Receptor | CHEMBL1981 | 516 | 10,000 nM | 4.8% | Insulin
GLP-1 Receptor | CHEMBL4093 | 401 | 286 nM | 35.4% | Semaglutide (Ozempic)
SGLT2 | CHEMBL3510 | 633 | 78 nM | 53.5% | Empagliflozin (Jardiance)
DPP-4 | CHEMBL284 | 711 | 475 nM | 33.0% | Sitagliptin (Januvia)
PPARγ | CHEMBL235 | 452 | 180 nM | 41.4% | Pioglitazone
AMPK | CHEMBL2842 | 757 | 52.5 nM | 53.1% | Metformin (indirect)
Glucokinase | CHEMBL3983 | 134 | 4,500 nM | 18.6% | Dorzagliatin (China, 2022)
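
The two potency columns are simple summaries of each target's IC₅₀ distribution. A minimal sketch follows, assuming values are already in nM and that there is one value per compound. The function name and sample values are made up.

```python
import statistics


def summarize_potency(ic50_nm, threshold_nm=100.0):
    """Return (median IC50 in nM, percent of compounds below threshold_nm)."""
    values = list(ic50_nm)
    median = statistics.median(values)
    pct_below = 100.0 * sum(v < threshold_nm for v in values) / len(values)
    return median, pct_below


# Made-up IC50 values (nM) for a single target.
median_nm, pct_sub_100 = summarize_potency([12, 78, 95, 450, 10000])
```

The median is reported rather than the mean because IC₅₀ values span several orders of magnitude, and a few weak compounds would otherwise dominate the summary.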
47 multi-target compounds (hitting ≥2 T2D targets)
5 compounds hitting 3 targets simultaneously
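
Counting multi-target compounds amounts to grouping activity records by compound. The sketch below uses hypothetical compound IDs, then checks that the per-target counts in the table reconcile with the 3,552 total. That check assumes the 47 multi-target compounds include the 5 triple-target ones and that no compound hits more than three targets.

```python
from collections import defaultdict


def targets_per_compound(records):
    """Group (compound_id, target) activity records by compound."""
    hits = defaultdict(set)
    for compound_id, target in records:
        hits[compound_id].add(target)
    return hits


def count_by_breadth(hits):
    """Map number of targets hit -> number of compounds."""
    breadth = defaultdict(int)
    for targets in hits.values():
        breadth[len(targets)] += 1
    return dict(breadth)


# Hypothetical records; the repeated (CPD-A, GCK) pair is counted once.
toy_records = [
    ("CPD-A", "INSR"), ("CPD-A", "GCK"), ("CPD-A", "GCK"),
    ("CPD-B", "DPP4"),
    ("CPD-C", "INSR"), ("CPD-C", "AMPK"), ("CPD-C", "GCK"),
]
breadth = count_by_breadth(targets_per_compound(toy_records))

# Reconciliation using the per-target compound counts from the table.
per_target = {"INSR": 516, "GLP1R": 401, "SGLT2": 633, "DPP4": 711,
              "PPARG": 452, "AMPK": 757, "GCK": 134}
n_multi, n_triple = 47, 5
n_dual = n_multi - n_triple
# A dual-target compound is counted twice, a triple-target compound three times.
unique_compounds = sum(per_target.values()) - n_dual - 2 * n_triple
```

Under those assumptions the per-target counts sum to 3,604 and reduce to 3,552 unique compounds, matching the headline figure.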

Notable Multi-Target Compounds

Compound | Targets (potency) | Significance
CHEMBL535 | INSR (500 nM) + AMPK (50 μM) + GCK (63 nM) | Triple-target; most potent at GCK
CHEMBL509032 | INSR (20 nM) + GCK (50 nM) | Dual potent; both sub-100 nM
CHEMBL388978 | INSR (110 nM) + GCK (61 nM) | Dual potent; balanced profile

Analysis 4: Insulin Molecular Dynamics

Scientific Rationale

Rapid-acting analogs such as lispro (Humalog) differ from regular insulin only by an inversion of two adjacent residues (Pro-Lys to Lys-Pro at B28-B29), yet their onset is roughly 2-4× faster (about 15 min versus 30-60 min). Understanding the structural dynamics, in particular how readily insulin hexamers dissociate into active monomers, is essential for designing next-generation analogs. A 100 ns molecular dynamics simulation of the monomer can sample the conformational transitions (B-chain C-terminal unfolding, hinge motions) that are thought to influence self-association and, in turn, pharmacokinetic properties.

Simulation Protocol (GROMACS 2022.1, Cornell BioHPC)

System: Human insulin monomer (PDB: 1MSO, chains A+B, 839 protein atoms) solvated in 4,024 TIP3P water molecules + 2 Na⁺ counterions, for a total of about 12,900 atoms.

Force field: AMBER99SB-ILDN (validated for peptide/protein dynamics).

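Protocol: energy minimization, NVT temperature equilibration, NPT pressure equilibration, then 100 ns of production MD.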

Compute: 48 CPU cores on cbsuecco12.biohpc.cornell.edu (Intel Xeon Gold). Estimated wall time: ~12 hours for 100 ns.
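
The run length and wall-time estimate translate into a step count and a throughput figure. The sketch below assumes a 2 fs timestep, which is a common choice but is not stated here. The helper names are illustrative, and the rendered fragment uses only the standard GROMACS `.mdp` options `integrator`, `dt`, and `nsteps`.

```python
def md_step_count(total_ns, timestep_fs=2.0):
    """Number of integration steps for a run of total_ns at timestep_fs."""
    return round(total_ns * 1_000_000 / timestep_fs)  # 1 ns = 1,000,000 fs


def throughput_ns_per_day(total_ns, wall_hours):
    """Simulated nanoseconds per day of wall-clock time."""
    return total_ns * 24.0 / wall_hours


nsteps = md_step_count(100)            # 100 ns production run
rate = throughput_ns_per_day(100, 12)  # ~12 h estimated wall time

# Minimal .mdp fragment; dt is given in ps, so 0.002 is the assumed 2 fs step.
mdp_fragment = "\n".join([
    "integrator = md",
    "dt         = 0.002",
    f"nsteps     = {nsteps}",
])
```

With these assumptions the run needs 50 million steps, and the wall-time estimate corresponds to about 200 ns/day on the 48 cores.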

Figure 2. Insulin molecular dynamics simulation results. Energy minimization convergence, NVT temperature equilibration, NPT pressure equilibration, and system summary.

Insulin Analogs: Structure-Function Comparison

Analog | Mutation | Onset | Duration | Mechanism
Regular insulin | Wild type | 30-60 min | 6-8 hr | Baseline hexamer stability
Lispro (Humalog) | B28P→K, B29K→P | 15 min | 3-5 hr | Residue swap destabilizes hexamer
Aspart (NovoRapid) | B28P→D | 15 min | 3-5 hr | Charge repulsion destabilizes hexamer
Glargine (Lantus) | A21N→G, +B31R, B32R | 2-4 hr | 20-24 hr | Shifts pI to 6.7; precipitates at pH 7.4
Degludec (Tresiba) | desB30T, B29K-C16 acyl+glu | 1 hr | 42 hr | Multi-hexamer chain via fatty acid
Figure 3. Pharmacokinetic profiles of insulin analogs. Rapid-acting analogs (lispro, aspart) have faster onset due to destabilized hexamers. Long-acting analogs (glargine, degludec) achieve extended duration through different molecular mechanisms (pH-dependent precipitation, albumin binding, multi-hexamer chains).

Structural Data: 22 Crystal/Cryo-EM Structures

PDB | Description | Atoms | Chains | Method
6PXV | Full-length insulin receptor + 4 insulins bound | 14,774 | 6 | Cryo-EM
4CFE | Full-length AMPK with activator | 13,979 | 6 | X-ray
4ZXB | Insulin receptor ectodomain | 12,601 | 5 | X-ray
6X18 | GLP-1R with GLP-1 peptide | 10,283 | 6 | Cryo-EM
7KI0 | GLP-1R with semaglutide + Gs protein | 9,137 | 6 | Cryo-EM
6HN5 | Insulin receptor with insulin bound | 7,715 | 4 | X-ray
5VEW | GLP-1R transmembrane domain (inactive state) with negative allosteric modulator PF-06372222 | 6,591 | 2 | X-ray
7VSI | SGLT2 with empagliflozin (Jardiance) | 4,706 | 2 | Cryo-EM
1MSO | Human insulin at 1.0 Å resolution | 1,712 | 4 | X-ray

All structures downloaded from RCSB Protein Data Bank (rcsb.org). Additional structures analyzed: 3I40 (insulin 0.92Å), 2MVC (insulin dimer NMR), 5EUI (DPP-4+sitagliptin), 2PRG (PPARγ+rosiglitazone), 4CFH (AMPK activated), 6CE7 (insulin degrading enzyme), 5KQV (IR kinase domain), 3W11 (GLP-1+ECD), 5NX2 (GLP1R-Gs complex), 1GCN (glucagon), 5YQZ (glucagon receptor), 6SOF (IR ectodomain apo).
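
Atom and chain counts like those in the table can be read directly from PDB-format files. The sketch below counts `ATOM` records and distinct chain identifiers in the first model. Whether the table's counts include heteroatoms or hydrogens is not stated, so `include_hetatm` is an assumption, and the toy lines are not real coordinates.

```python
def count_atoms_and_chains(pdb_lines, include_hetatm=False):
    """Count atoms and distinct chain IDs in PDB-format text (first model only)."""
    records = ("ATOM", "HETATM") if include_hetatm else ("ATOM",)
    n_atoms = 0
    chains = set()
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # stop after the first model of a multi-model file
        if record in records and len(line) > 21:
            n_atoms += 1
            chains.add(line[21])  # column 22 holds the chain identifier
    return n_atoms, len(chains)


def toy_atom_line(record, serial, name, residue, chain, resseq):
    """Build a column-aligned toy record (coordinates omitted)."""
    return f"{record:<6}{serial:>5} {name:<4} {residue:>3} {chain}{resseq:>4}"


toy_pdb = [
    "HEADER    TOY EXAMPLE",
    toy_atom_line("ATOM", 1, " N", "GLY", "A", 1),
    toy_atom_line("ATOM", 2, " CA", "GLY", "A", 1),
    toy_atom_line("ATOM", 3, " N", "PHE", "B", 1),
    toy_atom_line("HETATM", 4, " O", "HOH", "A", 101),
    "ENDMDL",
    toy_atom_line("ATOM", 5, " N", "GLY", "C", 1),
]
protein_counts = count_atoms_and_chains(toy_pdb)
all_counts = count_atoms_and_chains(toy_pdb, include_hetatm=True)
```

Very large cryo-EM entries are sometimes distributed only in mmCIF format, in which case a dedicated parser such as BioPython's would be the more practical route.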

Data Sources & Reproducibility

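All data are publicly available: STRING v12.0 (protein interactions), NCBI GEO (GSE25724, GSE38642, GSE41762, GSE76894), ChEMBL (bioactivity data), and the RCSB Protein Data Bank (structures).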

Compute platform:

Cornell BioHPC cluster: 12 nodes, 660 CPU cores, 3.5 TB RAM, 2.7 PB Lustre storage. GROMACS 2022.1, Python 3.12.7 (pandas, numpy, scipy, matplotlib, BioPython).

References

  1. IDF Diabetes Atlas, 10th edition (2021). International Diabetes Federation.
  2. Gysi DM, et al. Network medicine framework for identifying drug-repurposing opportunities. PNAS 118(19), e2025581118 (2021).
  3. Donath MY, Shoelson SE. Type 2 diabetes as an inflammatory disease. Nat Rev Immunol 11, 98-107 (2011).
  4. Hotamisligil GS. Inflammation and metabolic disorders. Nature 444, 860-867 (2006).
  5. Jastreboff AM, et al. Tirzepatide once weekly for obesity. NEJM 387, 205-216 (2022).
  6. Mathieu C, et al. Insulin analogues in type 1 diabetes mellitus. Nat Rev Endocrinol 13, 385-399 (2017).
  7. Taneera J, et al. A systems genetics approach identifies genes and pathways for type 2 diabetes. Mol Endocrinol 26, 1203-1212 (2012).
  8. Fadista J, et al. Global genomic and transcriptomic analysis of human pancreatic islets. PNAS 111, 13924-13929 (2014).
  9. Segerstolpe Å, et al. Single-cell transcriptome profiling of human pancreatic islets. Cell Metab 24, 593-607 (2016).
  10. Szklarczyk D, et al. The STRING database in 2023. Nucleic Acids Res 51, D483-D489 (2023).
  11. Mendez D, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47, D930-D940 (2019).