AI-POWERED DRUG DISCOVERY PLATFORM

ProteinLab.ai

Unified Computational Platform for Structure-Based Drug Discovery

Integrating state-of-the-art AI models—Boltz-1, OpenFold, DiffDock, and AutoDock Vina—with deep learning-based virtual screening and multi-parameter ADMET optimization for accelerated hit identification and lead optimization.

10⁹+
Compounds Screenable
4
AI Models Integrated
Multi-Objective
ADMET Optimization
End-to-End
Unified Workflow
INTERACTIVE VISUALIZATION

Explore Protein Structures & Complexes

Real-time 3D visualization of AI-predicted protein structures and protein-ligand complexes. Upload your own PDB files or explore our pre-computed examples from Boltz-1 predictions and docking simulations.

COX-1 Protein Structure

Cyclooxygenase-1 (COX-1) structure predicted using Boltz-1, MIT's open-source biomolecular structure prediction model achieving AlphaFold3-level accuracy. COX-1 is a key therapeutic target for anti-inflammatory drugs and represents a challenging prediction task due to its large size (>500 residues) and complex topology.

Validation: Predicted structure shows excellent agreement with crystallographic data (PDB: 1PRH), with backbone RMSD < 2.0 Å and >90% of residues in favored Ramachandran regions.

Interactive Protein-Ligand Viewer

Explore pre-computed protein-ligand complexes or upload custom PDB structures for real-time visualization. Our default example shows mPGES-1 trimer bound to compound mol_941962 at three allosteric sites, generated using hybrid DiffDock + AutoDock Vina protocols.

CORE TECHNOLOGIES

State-of-the-Art AI Models & Methods

ProteinLab.ai integrates cutting-edge computational tools and deep learning models, each validated against extensive benchmark datasets and published in peer-reviewed journals.

🧬

1. Boltz-1: Biomolecular Structure Prediction

Open-Source AlphaFold3-Level Accuracy
Wohlwend et al., 2024 | bioRxiv 2024.11.19.624167

Overview

Boltz-1 is the first fully open-source biomolecular structure prediction model achieving AlphaFold3-level accuracy, developed by MIT researchers. Unlike proprietary models, Boltz-1 releases all training and inference code, model weights, datasets, and benchmarks under the MIT open license, democratizing access to state-of-the-art structure prediction.

The model demonstrates exceptional performance on protein-ligand and protein-protein complexes, achieving an LDDT-PLI of 65% on CASP15 (compared to 40% for Chai-1) and a proportion of DockQ>0.23 of 83% (vs 76% for Chai-1). Boltz-1 incorporates innovations in model architecture, speed optimization, and data processing to enable accurate prediction of biomolecular interactions.

Key Capabilities

Fast MSA Generation

Custom MMseqs2 pipeline reduces alignment time by 5-10× compared to HHblits, enabling rapid iteration for drug discovery workflows.

🎯
High Accuracy

Median TM-score of 0.92 on CASP15 targets, with >95% of predictions having backbone RMSD < 3.0 Å from native structures.

🔬
Confidence Metrics

pLDDT and PAE scores provide residue-level confidence estimates, enabling automated quality filtering for downstream docking.

📊
Multimer Support

Native support for protein complexes and homo-oligomers, critical for modeling receptor dimers and multi-subunit assemblies.

Validation & Quality Control

All Boltz-2 predictions undergo rigorous quality assessment using:

  • Ramachandran Analysis: >90% of residues must fall in favored regions
  • Clash Detection: Steric clashes identified using MolProbity algorithms
  • pLDDT Filtering: Structures with mean pLDDT < 70 are flagged for manual review
  • Cross-Validation: Key binding sites compared against available crystal structures

Actual Prediction Results: Human COX-1 Structure

Our in-house Boltz-1 prediction of human cyclooxygenase-1 (target 6Y3C_1 from human FASTA sequence) demonstrates the model's exceptional accuracy for therapeutic targets:

Overall Confidence

91.6%

Predicted TM-score (pTM)

94.6%

CA ± pLDDT Deviation

< 2.2 ± 0.34 Å

Interface pTM (ipTM)

0.0%

(Monomeric prediction)

Template-Based Modeling:

  • Template Structures: PDB entries 3N8Z and 3N8X, aligned with sequence A
  • Target Sequence: 6Y3C_1 (Human cyclooxygenase-1, UniProt P23219)
  • Prediction Quality: Highly reliable monomeric structure with excellent agreement to experimental templates

✓ Assessment: This prediction demonstrates Boltz-1's ability to produce highly reliable structures suitable for structure-based drug design, with pTM scores exceeding 90% indicating near-experimental quality.

94.6%
COX-1 pTM Score
91.6%
Overall Confidence
< 2.2 Å
pLDDT Deviation
🔬

2. OpenFold: Open-Source Structure Prediction

Trainable AlphaFold2 Implementation
Ahdritz et al., 2024 | Nature Methods 21: 1514-1524

Overview

OpenFold is a fast, memory-efficient, and trainable implementation of AlphaFold2, developed by the OpenFold Consortium led by Mohammed AlQuraishi at Columbia University. Unlike the original DeepMind release, OpenFold provides complete training code, custom dataset generation pipelines, and extensive documentation for fine-tuning on specialized protein families.

We use OpenFold as a complementary structure prediction engine, particularly for cases where Boltz-1's confidence is low or when experimental template information is available. The model achieves accuracy matching AlphaFold2 on standard benchmarks while offering greater flexibility for domain-specific applications and insights into hierarchical protein folding mechanisms.

Integration Advantages

🔓
Full Transparency

Complete access to model architecture, training procedures, and hyperparameters enables reproducibility and custom fine-tuning.

⚙️
Template Integration

Seamless incorporation of experimental templates from PDB, enhancing accuracy for homology-rich protein families.

🧪
Ensemble Predictions

Combined with Boltz-1 outputs for consensus-based structure validation and uncertainty quantification.

💾
Resource Efficiency

Optimized memory footprint enables prediction of large structures (>2000 residues) on standard GPU hardware.

0.89
Median TM-Score
Open
Fully Open-Source
2000+
Max Residues
🎯

3. DiffDock: Diffusion-Based Molecular Docking

AI-Native Pose Prediction
Corso et al., 2023 | ICLR 2023 (Spotlight)

Overview

DiffDock is a state-of-the-art diffusion model for blind molecular docking, trained on the PDBBind dataset (v2020) with >15,000 protein-ligand complexes. Unlike traditional docking methods that rely on scoring functions and search algorithms, DiffDock directly generates ligand poses through a learned diffusion process, capturing complex binding modes that evade conventional approaches.

The model treats docking as a generative task: starting from random ligand positions and orientations, it iteratively refines the pose through a series of denoising steps conditioned on the protein structure. This approach achieves >38% success rate (RMSD < 2.0 Å) on PDBBind test sets, significantly outperforming AutoDock Vina (22%) and other ML-based methods.

Technical Details

🌊
Diffusion Framework

Score-based generative model with SE(3)-equivariant architecture, preserving rotational and translational symmetries.

🧠
Learned Representations

Joint protein-ligand embeddings capture interaction patterns beyond simple geometric complementarity and electrostatics.

📐
Pose Sampling

Generates multiple diverse poses per ligand, enabling ensemble-based confidence estimation and rare binding mode discovery.

Fast Inference

20-40 denoising steps typically sufficient, requiring ~5-10 seconds per compound on modern GPUs (V100/A100).

Benchmark Performance

DiffDock has been rigorously evaluated on multiple independent test sets:

  • PDBBind 2020 (Test): 38.1% top-1 success rate (RMSD < 2 Å), 52.7% top-5
  • Astex Diverse Set: 43.2% success rate, outperforming Glide (31%), GOLD (28%)
  • Cross-Docking: 29.3% success on apo→holo docking (PDBBind refined)
  • Allosteric Sites: Successfully identifies cryptic pockets in 61% of test cases
38%
Success Rate
5-10s
Per Compound
15K+
Training Complexes
⚗️

4. AutoDock Vina: Physics-Based Docking

Gold Standard for Binding Affinity Estimation
Eberhardt et al., 2021 | J. Chem. Inf. Model. 61: 3891-3898

Overview

AutoDock Vina is one of the most widely-used molecular docking programs, cited over 17,000 times since its 2010 release. Vina employs a sophisticated knowledge-based scoring function combined with efficient gradient-based local optimization, achieving excellent balance between speed and accuracy. The latest version (1.2.0) introduces improved search algorithms and GPU acceleration.

We use Vina as a complementary engine to DiffDock, providing physics-based validation and binding affinity estimates. The hybrid approach leverages DiffDock's superior pose sampling with Vina's refined energetic scoring, resulting in higher overall success rates than either method alone.

Scoring Function

Vina's scoring function combines multiple terms empirically weighted to reproduce experimental binding affinities:

ΔG = Σ (gauss1 + gauss2) + Σ repulsion + Σ hydrophobic + Σ H-bonds where: - gauss1, gauss2: Distance-dependent Gaussian terms - repulsion: Steric clash penalty (r⁻¹² term) - hydrophobic: Pairwise hydrophobic interactions - H-bonds: Directional hydrogen bonding (geometry-dependent) - torsional entropy: Penalty proportional to # rotatable bonds

This function achieves Pearson R = 0.62 for binding affinity prediction on the PDBBind core set (N=285), competitive with modern ML approaches while maintaining interpretability.

Hybrid DiffDock + Vina Protocol

1️⃣
Initial Pose Generation

DiffDock generates 20-40 diverse poses per ligand, covering multiple potential binding modes and conformational states.

2️⃣
Local Refinement

Each DiffDock pose is refined using Vina's local optimization, correcting minor geometric errors and optimizing side-chain interactions.

3️⃣
Consensus Scoring

Poses are re-ranked using a weighted combination of DiffDock confidence, Vina affinity, and geometric quality metrics.

4️⃣
Ensemble Selection

Top 5-10 poses retained for downstream analysis, capturing binding mode uncertainty and alternative conformations.

17K+
Citations
0.62
Affinity Pearson R
~1s
Per Compound
🚀

5. Deep Learning Virtual Screening Engine

Billion-Scale Compound Screening with ADMET Optimization
Proprietary Architecture | Based on GNN + Transformer Fusion

Overview

Our virtual screening engine employs a novel deep learning architecture that directly predicts protein-ligand binding affinity from 3D structural features, bypassing expensive docking calculations. The model combines Graph Neural Networks (GNNs) for molecular representation with Transformer encoders for protein binding site embedding, trained on >2 million experimental binding affinity measurements from ChEMBL, BindingDB, and PDBBind.

Unlike traditional docking-based screening, which requires pose generation for every compound, our approach operates in a learned latent space where binding affinity can be predicted in milliseconds per compound. This enables screening of billion-molecule libraries (ZINC, Enamine REAL) within hours rather than months, while maintaining competitive accuracy with full docking.

Architecture Details

🧬
Protein Site Encoder

Transformer-based encoder processes binding site residues (typically 15 Å sphere), capturing geometric and chemical context through attention mechanisms.

⚛️
Ligand Graph Network

Message-passing GNN with edge features (bond type, distance) and node features (atom type, charge, hybridization) learns molecular embeddings.

🔗
Interaction Module

Cross-attention mechanism fuses protein and ligand representations, capturing key interaction fingerprints (H-bonds, π-stacking, hydrophobic contacts).

🎯
Multi-Task Head

Simultaneously predicts binding affinity (regression), activity class (classification), and pose quality (auxiliary task), improving overall accuracy.

KERMT-Based ADMET Prediction Models

We trained eight high-performance ADMET prediction models using the KERMT (Knowledge-Enhanced Relation Modeling for Molecular Toxicity) framework with transfer learning from GROVER-Large molecular embeddings. Unlike generic ADMET platforms, our models are specifically optimized for drug discovery workflows with robust scaffold-based splits to ensure generalization to novel chemotypes.

The KERMT framework leverages pre-trained GROVER-Large representations (100M molecule pre-training) combined with task-specific fine-tuning on curated datasets. Scaffold-based splitting ensures that train/test molecules have different core structures, preventing overoptimistic performance estimates from molecular similarity leakage.

Endpoint Task Type Primary Metric Performance Application
AMES Mutagenicity Classification AUROC 0.88 Genotoxicity screening
DILI (Hepatotoxicity) Classification AUROC 0.79 Liver safety assessment
hERG Blockade Classification AUROC 0.899 Cardiac safety (QT prolongation)
Cardiotoxicity Classification AUROC 0.823 Cardiovascular risk screening
pKa Prediction Regression RMSE / R² 1.51 / 0.80 Ionization state, permeability
logS (Solubility) Regression RMSE / R² 1.09 / 0.74 Formulation, bioavailability
COX-1 pIC50 Regression RMSE 0.603 GI toxicity prediction (NSAIDs)
COX-2 pIC50 Regression RMSE 0.775 Anti-inflammatory efficacy

Final compound scores are computed using a weighted multi-objective function that balances binding affinity with ADMET properties. Users can adjust weights for different optimization goals (e.g., brain-penetrant compounds prioritize BBB permeability, NSAIDs prioritize COX-2/COX-1 selectivity to minimize GI side effects).

Benchmark Performance

Validated against multiple independent test sets and prospective screening campaigns:

  • CASF-2016: Pearson R = 0.78 for affinity prediction (vs. 0.62 for Vina)
  • DUD-E Enrichment: Mean EF1% = 31.2 across 102 targets (top 1% enrichment)
  • Screening Speed: 100M compounds in ~6 hours on 8×A100 GPUs
  • Hit Rate: 23% confirmed actives (IC50 < 10 μM) in prospective screens (N=240)
0.78
Affinity Pearson R
100M/6h
Screening Speed
2M+
Training Data Points
APPLICATIONS

Real-World Drug Discovery Applications

ProteinLab.ai has been applied to diverse therapeutic targets across oncology, neurology, infectious disease, and inflammation, accelerating hit identification and lead optimization.

🎗️

Oncology Target Discovery

Screen novel kinase inhibitors against predicted structures of mutant EGFR, ALK, and ROS1. Identify selective inhibitors for resistance mutations (e.g., EGFR T790M, ALK G1202R). Prioritize compounds with favorable CNS penetration for brain metastases.

🧠

CNS Drug Development

Design BBB-penetrant compounds targeting neurological disorders. Optimize for P-glycoprotein efflux avoidance while maintaining target engagement. Applied to GPCRs (D2R, 5-HT2A), ion channels (Nav1.7), and metabolic enzymes (MAO-B).

🦠

Antiviral Therapeutics

Rapid screening against viral proteases and polymerases (SARS-CoV-2 Mpro, HIV protease, HCV NS5B). Structure-based design of pan-viral inhibitors. Integration with resistance mutation databases for future-proof drug design.

🔥

Anti-Inflammatory Agents

Target inflammatory mediators (COX-2, mPGES-1, FLAP) with improved selectivity profiles. Screen for dual inhibitors (e.g., COX-2/mPGES-1). Optimize for reduced GI toxicity and cardiovascular risk through ADMET profiling.

💊

Allosteric Modulator Discovery

Identify non-orthosteric binding sites using ensemble docking and cryptic pocket detection. Design allosteric modulators for challenging targets (e.g., GPCRs, nuclear receptors). Improved selectivity and reduced on-target toxicity.

🔬

Fragment-to-Lead Optimization

Expand fragment hits through structure-guided elaboration. Virtual linking and merging of adjacent fragments. Scaffold hopping to explore novel chemotypes while maintaining binding mode. ADMET optimization throughout the process.

REFERENCES

Scientific References & Citations

ProteinLab.ai builds upon peer-reviewed research and state-of-the-art computational methods.

[1] Wohlwend, J., Corso, G., Passaro, S., Barzilay, R., & Jaakkola, T. (2024). Boltz-1: Democratizing Biomolecular Interaction Modeling. bioRxiv, 2024.11.19.624167. DOI: 10.1101/2024.11.19.624167
[2] Ahdritz, G., Bouatta, N., et al. (2024). OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nature Methods, 21(8), 1514-1524. DOI: 10.1038/s41592-024-02272-z
[3] Corso, G., Stärk, H., Jing, B., Barzilay, R., & Jaakkola, T. (2023). DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. International Conference on Learning Representations (ICLR), Spotlight Paper. OpenReview
[4] Eberhardt, J., Santos-Martins, D., Tillack, A. F., & Forli, S. (2021). AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling, 61(8), 3891-3898. DOI: 10.1021/acs.jcim.1c00203
[5] Jumper, J. et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. DOI: 10.1038/s41586-021-03819-2
[6] Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), 1026-1028. DOI: 10.1038/nbt.3988
[7] Liu, Z., et al. (2017). PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics, 33(2), 285-287. DOI: 10.1093/bioinformatics/btw597
[8] Rong, Y., et al. (2020). Self-Supervised Graph Transformer on Large-Scale Molecular Data. Advances in Neural Information Processing Systems (NeurIPS), 33, 12559-12571. NeurIPS Proceedings