Drug Discovery: ALK Inhibitor Investment Analysis

Multi-agent drug target validation, compound profiling, and resistance pharmacogenomics using Lobster AI's drug discovery agents.

This case study walks through a complete drug discovery investment assessment for the ALK (Anaplastic Lymphoma Kinase) target in non-small cell lung cancer (NSCLC). Across three conversational turns, Lobster AI coordinates four specialized agents, queries external databases, and synthesizes findings that would normally require days of manual research.

Session context: Results generated February 2026 using lobster-ai 1.0.12 on AWS Bedrock (Claude Sonnet 4.5). External databases queried: Open Targets, ChEMBL, PubChem. Total cost: under $1.50 across 3 turns (435,195 tokens). Database content changes over time, so re-running these queries will likely return different results that reflect updated bioactivity records, disease associations, and compound registrations. This case study demonstrates an analytical workflow, not independently validated scientific findings.

Agents and Data Sources

This analysis uses the lobster-drug-discovery package, which provides four agents:

| Agent | Role |
| --- | --- |
| drug_discovery_expert | Target validation, druggability scoring, disease association analysis |
| cheminformatics_expert | Molecular property calculation, Lipinski profiling, ADMET prediction |
| clinical_dev_expert | Clinical trial landscape, safety profile, regulatory pathway assessment |
| pharmacogenomics_expert | Resistance mutation analysis, variant impact scoring, PGx strategy |

External APIs queried during the session: Open Targets (disease associations, druggability), ChEMBL (bioactivity, compound data), PubChem (molecular cross-validation). Local computation is handled by RDKit (molecular descriptors) and ESM2 (protein structure context).

The Research Question

Is ALK a viable drug discovery investment target for NSCLC, and what modality (small molecule, ADC, PROTAC) offers the best risk-adjusted return?

This question requires integrating target biology, clinical evidence, molecular chemistry, and pharmacogenomics -- domains that typically live in separate teams and toolsets. Lobster AI handles them in a single session.


Turn 1: Target Validation

The first query asks Lobster AI to assess ALK as a drug target with investment-grade rigor.

lobster query --session-id alk_hero \
  "Assess ALK as a drug discovery investment target for NSCLC. \
   I need: (1) Open Targets disease association analysis, \
   (2) druggability scoring with component breakdown, \
   (3) safety profile assessment, \
   (4) clinical tractability across modalities (small molecule, ADC, PROTAC). \
   Format for an investor presentation."

Disease Association

The drug_discovery_expert queried Open Targets and returned an ALK-NSCLC association score of 0.754/1.0 (high confidence), ranking NSCLC as the number one disease association out of 1,272 total associated diseases.

| Evidence Type | Score | Contribution | Key Insight |
| --- | --- | --- | --- |
| Known Drug | 0.98 | 24.5% | 6 FDA-approved ALK inhibitors |
| Genetic Association | 0.75 | 22.5% | EML4-ALK fusion in approximately 4% of NSCLC |
| Literature | 1.00 | 10.0% | Over 10,000 publications |
| Pathway | 0.93 | 14.0% | Clear mechanistic link |
| Expression | 0.70 | 14.0% | Restricted tissue expression |
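The Contribution column appears to follow a score-times-weight pattern. The sketch below back-calculates the implied weights and sums the contributions. The weights are inferred from the table and are not stated in the session output, so treat them as an assumption.

```python
# Evidence scores and contribution percentages transcribed from the table above.
evidence = {
    "Known Drug": (0.98, 24.5),
    "Genetic Association": (0.75, 22.5),
    "Literature": (1.00, 10.0),
    "Pathway": (0.93, 14.0),
    "Expression": (0.70, 14.0),
}

# ASSUMPTION: contribution = score * weight, so weight = contribution / score.
# These weights are inferred here; the source does not publish them.
inferred_weights = {
    name: round(contribution / 100 / score, 2)
    for name, (score, contribution) in evidence.items()
}

total_contribution = round(sum(c for _score, c in evidence.values()), 1)
total_weight = round(sum(inferred_weights.values()), 2)

for name, weight in inferred_weights.items():
    print(f"{name}: implied weight {weight:.2f}")
print(f"Sum of contributions: {total_contribution}%")
print(f"Sum of implied weights: {total_weight}")
```

Under that assumption the implied weights sum to 1.00 and the contributions sum to 85.0%, which is numerically close to the 0.850 composite reported under Druggability Score.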

ALK fusions occur in approximately 4% of NSCLC patients, which corresponds to roughly 9,000-11,000 US patients per year. FDA-approved companion diagnostics (IHC, FISH, NGS) are already in clinical use.

Druggability Score

The composite druggability score came back at 0.850/1.0, consistent with ALK's status as a validated oncology target with 6 approved drugs. The structural basis is strong: over 50 PDB co-crystal structures, an ATP-binding pocket volume of approximately 400 Å³, and a small gatekeeper residue (L1196) that permits bulky substituents.

| Metric | Typical Kinase | ALK | Assessment |
| --- | --- | --- | --- |
| Druggable Pocket | 0.6-0.8 | 0.85 | Above average |
| Time to First Approval | 10-15 years | 9 years | Faster |
| Phase 2 Success Rate | 30-40% | 60-70% | Higher |
| Approved Drugs | 1-3 | 6 | Exceptional |

Safety Profile

The safety assessment rated ALK as low risk. No ALK inhibitor carries a black box warning. Adverse events are manageable: hepatotoxicity at 5-8% Grade 3 or higher, GI toxicity at 40-60%, and vision disorders at 60-70% Grade 1-2. There is no dose-limiting cardiotoxicity.

| Organ | Expression | Safety Impact | Monitoring |
| --- | --- | --- | --- |
| Brain | HIGH | Neurocognitive effects possible | Neurocognitive assessments |
| GI Tract | MODERATE | Diarrhea, nausea | Symptom management |
| Liver | LOW | Minimal on-target toxicity | LFTs (ALT, AST) |
| Heart | LOW | Minimal cardiac risk | ECG monitoring |
| Kidney | LOW | No renal toxicity | Creatinine |

Clinical Tractability Across Modalities

The analysis assessed three modalities:

Small molecules scored 9.5/10 for tractability, but the market is saturated, with 6 FDA-approved drugs and $2.0-2.5B in annual sales. Investment is only justified with clear differentiation (G1202R activity, improved safety, or combination readiness).

Antibody-drug conjugates (ADCs) scored 5.5/10. EML4-ALK fusions are cytoplasmic proteins with truncated extracellular domains. Because antibodies cannot reach intracellular targets, ADCs require identifying an alternative surface antigen expressed on ALK-positive tumor cells. ADC potential is conditional on validating surface expression, internalization kinetics, and antigen density.

PROTAC degraders scored 7.5/10 and were flagged as the highest-priority modality. No clinical-stage ALK PROTACs exist, which leaves a first-mover advantage. Preclinical data show G1202R degradation at a DC50 of approximately 50 nM. Brain-penetrant designs are feasible. The estimated peak sales potential is $500M-1B, with $105-150M required to reach Phase 2 data.

The drug_discovery_expert and clinical_dev_expert agents handled target validation, safety, and modality assessment in a single coordinated pass.


Turn 2: Compound Profiling

The second query requests a head-to-head molecular comparison between the first-generation and best-in-class ALK inhibitors.

lobster query --session-id alk_hero \
  "Compare crizotinib vs alectinib molecular profiles. \
   I need: (1) ChEMBL bioactivity with IC50 values, \
   (2) RDKit molecular descriptors including TPSA and LogP, \
   (3) Lipinski rule-of-five assessment, \
   (4) PubChem cross-validation of key properties. \
   Highlight the molecular basis for alectinib's clinical superiority."

Molecular Property Comparison

The cheminformatics_expert pulled compound data from ChEMBL, computed descriptors with RDKit, and cross-validated against PubChem. The results:

| Property | Crizotinib (1st-gen) | Alectinib (2nd-gen) | Clinical Impact |
| --- | --- | --- | --- |
| ALK IC50 | 20 nM | 1.9 nM (about 10x more potent, per ChEMBL bioactivity data) | ORR 83% (alectinib) vs 76% (crizotinib) |
| TPSA | 78 Å² | 72 Å² | ~300-1000x higher CSF/plasma ratio |
| MET IC50 | 8 nM (dual-target activity) | >1000 nM (more than 125x weaker on MET than crizotinib) | Fewer off-target effects from improved selectivity |
| Lipinski Violations | 1 (LogP 5.04) | 0 (LogP 4.77) | Fully rule-of-five compliant |
| Rotatable Bonds | 5 | 3 | Pre-organized for binding |
| Resistance Coverage | L1196M escapes | Covers L1196M | 34.8-month PFS durability |

IC50 values represent the most potent biochemical kinase assay reported in ChEMBL for each compound. Cellular IC50 values are typically 5-50x higher depending on cell line and assay conditions. The relative comparison (approximately 10-fold difference) is consistent across published literature regardless of assay type.
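The fold differences quoted for potency and MET activity are simple ratios of the reported IC50 values. A minimal sketch of that arithmetic follows; the helper name `cellular_ic50_range` is illustrative, and the 5-50x cellular shift is the range quoted in the paragraph above, not a measured value.

```python
# IC50 values (nM) as reported in the comparison table.
CRIZOTINIB_ALK_IC50 = 20.0
ALECTINIB_ALK_IC50 = 1.9
CRIZOTINIB_MET_IC50 = 8.0
ALECTINIB_MET_IC50_LOWER_BOUND = 1000.0  # reported as ">1000 nM"

# Relative ALK potency: how many times lower alectinib's IC50 is.
alk_potency_fold = CRIZOTINIB_ALK_IC50 / ALECTINIB_ALK_IC50

# Cross-compound MET comparison: this is the ratio that yields the 125x figure.
met_cross_compound_fold = ALECTINIB_MET_IC50_LOWER_BOUND / CRIZOTINIB_MET_IC50

# Within-compound selectivity window for alectinib (MET IC50 / ALK IC50).
# It is a lower bound because the MET value is itself a lower bound.
alectinib_selectivity_lower_bound = ALECTINIB_MET_IC50_LOWER_BOUND / ALECTINIB_ALK_IC50


def cellular_ic50_range(biochemical_ic50_nm, low_shift=5.0, high_shift=50.0):
    """Apply the quoted 5-50x biochemical-to-cellular shift to an IC50 in nM."""
    return biochemical_ic50_nm * low_shift, biochemical_ic50_nm * high_shift


print(f"ALK potency fold: {alk_potency_fold:.1f}x")
print(f"MET cross-compound fold: {met_cross_compound_fold:.0f}x")
print(f"Alectinib MET/ALK window: >{alectinib_selectivity_lower_bound:.0f}x")
print("Alectinib cellular IC50 range (nM):", cellular_ic50_range(ALECTINIB_ALK_IC50))
```

Keeping the cross-compound ratio separate from the within-compound selectivity window makes it explicit which comparison a quoted fold value refers to.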

The TPSA / Blood-Brain Barrier Insight

A difference of just 5.6 Å² in TPSA separates a CNS-limited drug from a CNS-active one. Alectinib's TPSA of 72 Å² sits below the 75 Å² blood-brain barrier threshold used in this analysis, enabling a CSF/plasma ratio of approximately 0.75-0.86 compared to crizotinib's 0.0006-0.003 -- roughly a 300-1000-fold improvement, depending on study and measurement method. This translates to an 84% CNS response rate versus 50%, and a 3.4x longer CNS-specific PFS (25.4 vs 7.4 months).
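The fold improvement in CSF/plasma ratio depends on which ends of the two reported ranges are paired. The sketch below computes the bracket from the extreme pairings, along with the CNS-specific PFS ratio; the function name `fold_bracket` is illustrative.

```python
# CSF/plasma ratio ranges (low, high) as quoted in the text.
ALECTINIB_CSF_PLASMA = (0.75, 0.86)
CRIZOTINIB_CSF_PLASMA = (0.0006, 0.003)


def fold_bracket(numerator_range, denominator_range):
    """Return the smallest and largest ratio obtainable from two (low, high) ranges."""
    smallest = numerator_range[0] / denominator_range[1]
    largest = numerator_range[1] / denominator_range[0]
    return smallest, largest


min_fold, max_fold = fold_bracket(ALECTINIB_CSF_PLASMA, CRIZOTINIB_CSF_PLASMA)

# CNS-specific progression-free survival in months, as quoted in the text.
cns_pfs_ratio = 25.4 / 7.4

print(f"CSF/plasma fold improvement: {min_fold:.0f}x to {max_fold:.0f}x")
print(f"CNS PFS ratio: {cns_pfs_ratio:.1f}x")
```

The extreme pairings give a wider bracket than the quoted 300-1000-fold range, which is consistent with the caveat that the figure depends on study and measurement method.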

The rigid carbazole scaffold (3 rotatable bonds vs 5) pre-organizes alectinib for binding, evades P-glycoprotein efflux for CNS penetration, and accommodates resistance mutations for durability.

Cross-Validation

Molecular weights, TPSA values, and core drug-like properties were validated across ChEMBL, PubChem, and RDKit. TPSA values agreed to within 0.01 Å². LogP discrepancies between databases were attributed to different algorithms (ALogP vs XLogP) and flagged transparently.
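Because crizotinib's reported LogP of 5.04 sits just above the rule-of-five cutoff of 5, its single Lipinski violation is sensitive to which LogP algorithm is used. The sketch below checks only the LogP criterion, using the values from the property table; the other three Lipinski criteria are omitted because their inputs are not reported in this case study.

```python
LOGP_CUTOFF = 5.0  # Lipinski rule-of-five limit for calculated LogP

# LogP values as reported in the molecular property comparison.
REPORTED_LOGP = {"crizotinib": 5.04, "alectinib": 4.77}


def logp_violation(logp, cutoff=LOGP_CUTOFF):
    """True when the LogP criterion of the rule of five is violated."""
    return logp > cutoff


def margin_to_cutoff(logp, cutoff=LOGP_CUTOFF):
    """Signed distance from the cutoff; positive means above the limit."""
    return logp - cutoff


for compound, logp in REPORTED_LOGP.items():
    status = "violation" if logp_violation(logp) else "ok"
    print(f"{compound}: LogP {logp} -> {status} (margin {margin_to_cutoff(logp):+.2f})")
```

A margin of only +0.04 for crizotinib means an algorithm difference of that size is enough to flip the result, which is one reason to record the LogP method alongside the value.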

IC50 values are as reported by the ChEMBL bioactivity data queried during the session; the assay-dependence caveat noted under the property table applies here as well.


Turn 3: Resistance Pharmacogenomics

The third query addresses the critical question for next-generation drug design: which resistance mutations are tractable, and which represent fundamental limits for ATP-competitive inhibitors?

lobster query --session-id alk_hero \
  "Analyze ALK resistance mutations G1202R and L1196M. \
   I need: (1) structural mechanism of resistance for each, \
   (2) cross-generational inhibitor sensitivity, \
   (3) variant impact scoring, \
   (4) full mutation pattern analysis across all known resistance sites, \
   (5) PROTAC degrader design implications. \
   Explain why G1202R is fundamentally different from L1196M."

The "Door Frame" vs "Lock Cylinder" Paradigm

The pharmacogenomics_expert produced the key structural insight that differentiates these two classes of resistance mutation.

G1202R is a "door frame" mutation. It replaces glycine, whose side chain is a single hydrogen atom, with arginine, whose side chain has seven heavy atoms (a residue mass increase of about 99 Da). The substitution sits at the solvent front, introduces a positive charge, and globally deforms the ATP pocket entrance. Every inhibitor generation is affected:

| Inhibitor | Fold Resistance |
| --- | --- |
| Crizotinib | >50x |
| Alectinib / Brigatinib | 100-150x |
| Lorlatinib (best-in-class) | 15-20x |

L1196M is a "lock cylinder" mutation. It replaces leucine with methionine at the gatekeeper position -- a focal steric clash that only affects the extended binding mode used by first-generation compounds. Second-generation inhibitors use compact scaffolds that avoid the clash:

| Inhibitor | Fold Resistance |
| --- | --- |
| Crizotinib | ~20x |
| Alectinib / Brigatinib | 2.5-8x (manageable) |
| Lorlatinib | ~2x (essentially solved) |
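The distinction between the two mutations can be expressed as a rule over the fold-resistance tables: a mutation is pan-resistant when every inhibitor generation loses substantial potency, and generation-specific when only some do. The sketch below encodes that rule. The 10x cutoff is a hypothetical threshold chosen for illustration, not a value from the session.

```python
# Fold-resistance ranges (low, high) transcribed from the two tables above.
# ">50x" is stored as a lower bound of 50 with no upper bound (None).
FOLD_RESISTANCE = {
    "G1202R": {
        "crizotinib": (50.0, None),
        "alectinib/brigatinib": (100.0, 150.0),
        "lorlatinib": (15.0, 20.0),
    },
    "L1196M": {
        "crizotinib": (20.0, 20.0),
        "alectinib/brigatinib": (2.5, 8.0),
        "lorlatinib": (2.0, 2.0),
    },
}

# ASSUMPTION: a 10x potency loss counts as "resistant". Hypothetical cutoff.
RESISTANCE_CUTOFF = 10.0


def classify(mutation, cutoff=RESISTANCE_CUTOFF):
    """Label a mutation using the lower bound of each inhibitor's fold range."""
    lows = [low for low, _high in FOLD_RESISTANCE[mutation].values()]
    if all(low >= cutoff for low in lows):
        return "pan-resistant"
    if any(low >= cutoff for low in lows):
        return "generation-specific"
    return "sensitive"


for mutation in FOLD_RESISTANCE:
    print(mutation, "->", classify(mutation))
```

Using the lower bound of each range is the conservative choice here: a mutation is only called pan-resistant if even the most favorable reported value exceeds the cutoff.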

Variant Impact Scoring

All three scored mutations (G1202R, L1196M, C1156Y) received a functional impact score of 0.49 (moderate). However, the clinical significance varies dramatically. G1202R is pan-resistant and drives the need for lorlatinib. L1196M is generation-specific and already solved by second-generation compounds. C1156Y shows drug-specific resistance (ceritinib 250x, alectinib only 4-8x).

Mutation Pattern Analysis

The analysis mapped six known resistance mutations across five structural regions:

| Region | Mutations | Function | Resistance Mechanism |
| --- | --- | --- | --- |
| Gatekeeper | L1196M | Pocket access control | Steric clash (focal) |
| Solvent Front | G1202R, G1269A | Pocket entrance | Global deformation |
| Hinge Region | I1171T | ATP H-bonding | Binding disruption |
| Activation Loop | F1174L | Kinase activation | Gain-of-function |
| Alpha-C-Helix | C1156Y | Conformational switch | Scaffold-specific clash |

These were categorized into three resistance mechanism classes:

  1. Steric clash (druggable by redesign): L1196M, C1156Y, I1171T. Direct physical interference with drug binding. Already overcome by second-generation inhibitors.
  2. Pocket deformation (fundamental limit): G1202R, G1269A. Alters pocket shape and polarity globally. Cannot be solved by scaffold redesign alone.
  3. Kinase activation (gain-of-function): F1174L. Stabilizes the active conformation and increases catalytic activity. Degradation (removing the protein entirely) is superior to inhibition for this class.
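The region table and the three mechanism classes describe the same six mutations from two angles. A small consistency check, sketched below with illustrative names, confirms that every mutation in the region table is assigned to exactly one mechanism class.

```python
# Mutations grouped by structural region, transcribed from the table above.
REGION_TABLE = {
    "Gatekeeper": ["L1196M"],
    "Solvent Front": ["G1202R", "G1269A"],
    "Hinge Region": ["I1171T"],
    "Activation Loop": ["F1174L"],
    "Alpha-C-Helix": ["C1156Y"],
}

# Mutations grouped by resistance mechanism class, transcribed from the list above.
MECHANISM_CLASSES = {
    "steric clash": ["L1196M", "C1156Y", "I1171T"],
    "pocket deformation": ["G1202R", "G1269A"],
    "kinase activation": ["F1174L"],
}


def invert(groups):
    """Map each mutation to its group, rejecting duplicates across groups."""
    lookup = {}
    for group, mutations in groups.items():
        for mutation in mutations:
            if mutation in lookup:
                raise ValueError(f"{mutation} appears in more than one group")
            lookup[mutation] = group
    return lookup


region_of = invert(REGION_TABLE)
class_of = invert(MECHANISM_CLASSES)
unclassified = sorted(set(region_of) - set(class_of))

for mutation in sorted(region_of):
    print(f"{mutation}: {region_of[mutation]} / {class_of.get(mutation, 'unclassified')}")
```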

PROTAC Degrader Implications

The pharmacogenomics analysis directly informed the PROTAC design strategy from Turn 1:

  • G1202R (80% confidence of PROTAC sensitivity): PROTACs tolerate moderate binding affinity (Kd 100-500 nM) versus inhibitors requiring less than 10 nM. Preclinical compound 18c (lorlatinib-based warhead) degrades G1202R with DC50 of approximately 50 nM.
  • F1174L (90% confidence): Gain-of-function mutation makes degradation mechanistically superior to inhibition. Relevant for pediatric neuroblastoma (orphan indication).
  • L1196M, C1156Y, I1171T (60-75% confidence): Already overcome by approved drugs. PROTAC offers incremental benefit through more durable responses.

Total session cost across all three turns: under $1.50 (435,195 tokens).
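The two session figures imply an upper bound on the blended cost per token. The sketch below derives it; because the cost is reported only as "under $1.50", every derived number is a ceiling, not an exact price.

```python
TOTAL_TOKENS = 435_195      # tokens across all three turns, as reported
COST_CEILING_USD = 1.50     # reported as "under $1.50"
TURNS = 3

# Upper bound on the blended (input + output) price per million tokens.
max_usd_per_million_tokens = COST_CEILING_USD / TOTAL_TOKENS * 1_000_000

# Simple averages per turn; actual turns were not necessarily equal in size.
avg_tokens_per_turn = TOTAL_TOKENS / TURNS
max_avg_usd_per_turn = COST_CEILING_USD / TURNS

print(f"Blended cost ceiling: ${max_usd_per_million_tokens:.2f} per 1M tokens")
print(f"Average tokens per turn: {avg_tokens_per_turn:.0f}")
print(f"Average cost ceiling per turn: ${max_avg_usd_per_turn:.2f}")
```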


What This Demonstrates

This case study illustrates several capabilities that distinguish Lobster AI from manual research workflows, raw LLM queries, and single-tool approaches.

Multi-Agent Coordination

No single agent could produce this analysis. The drug_discovery_expert handled target validation and disease associations. The cheminformatics_expert computed molecular properties and cross-validated across databases. The clinical_dev_expert assessed safety and regulatory pathways. The pharmacogenomics_expert analyzed resistance mutations and their structural implications. The supervisor routed each sub-question to the appropriate specialist and synthesized the results.

Database Integration

The agents queried Open Targets, ChEMBL, and PubChem programmatically through validated API tools, not through LLM hallucination. Molecular descriptors were computed locally with RDKit. Cross-validation between databases was performed automatically, with discrepancies (such as LogP algorithm differences) flagged transparently.

Provenance and Reproducibility

Every tool call is logged with an AnalysisStep IR (intermediate representation) that captures the operation, parameters, data sources, and outputs. The session can be reproduced or extended with --session-id.
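To make the idea of a logged step concrete, the sketch below models a record with the four kinds of information named above. It is a hypothetical illustration only: the class name `AnalysisStepSketch`, its field names, and the operation name are assumptions, and the actual AnalysisStep IR schema is not shown in this case study.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AnalysisStepSketch:
    """Hypothetical stand-in for a provenance record; not the real IR schema."""
    operation: str        # what was done
    parameters: dict      # inputs that control the operation
    data_sources: list    # external or local sources consulted
    outputs: dict         # key results produced by the step


step = AnalysisStepSketch(
    operation="disease_association_lookup",  # hypothetical operation name
    parameters={"target": "ALK", "disease": "NSCLC"},
    data_sources=["Open Targets"],
    outputs={"association_score": 0.754},
)

# Serialize to JSON for logging, then restore to show the record round-trips.
record = json.dumps(asdict(step), sort_keys=True)
restored = AnalysisStepSketch(**json.loads(record))

print(record)
```

A record that round-trips through a plain serialization format is what allows a step to be replayed or audited later without access to the original session state.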

Comparison Table

Estimates based on this case study session. Human researcher timing assumes manual database queries without automation.

| Task | Human Researcher | Raw LLM | Lobster AI |
| --- | --- | --- | --- |
| Query 3 external databases | 30-45 min | Cannot access APIs | ~2 min |
| Cross-validate molecular properties | 15-20 min | Hallucinates values | Automatic (RDKit + PubChem) |
| Map 6 resistance mutations structurally | 1-2 hours | Approximate, no scoring | Scored with structural context |
| Synthesize modality recommendation | 2-4 hours | Generic, no real data | Data-driven with cost estimates |
| Total time for complete assessment | 1-2 days | Not reliable | ~6 min (3 turns) |
| Session cost | Researcher salary | API cost only | Under $1.50 total |

Limitations

This case study demonstrates the platform's analytical workflow, not a complete drug discovery assessment. Key limitations to note:

  • IC50 values are assay-dependent. ChEMBL aggregates biochemical and cellular assays. Values reported here are the most potent biochemical measurements; cellular IC50 values in patient-derived lines may differ substantially.
  • Druggability scoring is a platform composite. The 0.850 score combines genetic association, known drug evidence, literature support, expression specificity, and structural tractability from Open Targets data. It is not an externally benchmarked or independently validated metric.
  • PROTAC confidence estimates are model-derived. The percentage confidence values for PROTAC sensitivity are heuristic estimates based on structural and mechanistic reasoning, not experimentally validated predictions.
  • Variant impact scoring limitations. All three scored mutations (G1202R, L1196M, C1156Y) received identical functional impact scores (0.49) despite vastly different clinical significance. The scoring captures structural disruption but does not distinguish clinical tractability.
  • This analysis does not replace experimental IC50 validation, FEP+ binding free energy calculations, PROTAC degradation assays (Dmax, hook effect), clinical PK/PD modeling, or patent freedom-to-operate analysis.

Reproducibility

To reproduce this analysis, install the drug discovery package and run the three turns sequentially:

pip install 'lobster-ai[full]==1.0.12'
lobster query --session-id alk_study \
  "Assess ALK as a drug discovery investment target for NSCLC. \
   I need: Open Targets disease associations, druggability scoring, \
   safety profile, and clinical tractability across modalities."
lobster query --session-id alk_study \
  "Compare crizotinib vs alectinib molecular profiles with ChEMBL \
   bioactivity, RDKit descriptors, Lipinski assessment, and PubChem \
   cross-validation."
lobster query --session-id alk_study \
  "Analyze ALK resistance mutations G1202R and L1196M with structural \
   mechanisms, cross-generational sensitivity, variant scoring, and \
   PROTAC design implications."

Session continuity via --session-id ensures each turn builds on prior context. Results are stored in the .lobster_workspace/ directory and can be exported with /pipeline export.
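For scripted re-runs, the three turns can be driven from Python. The sketch below only assembles and runs the same `lobster query --session-id` invocations shown above. It assumes the `lobster` executable is on the PATH and accepts the prompt as a single positional argument, as in those commands; the function names are illustrative.

```python
import subprocess
from typing import List

# Prompts taken from the three reproducibility commands above.
PROMPTS = [
    "Assess ALK as a drug discovery investment target for NSCLC. "
    "I need: Open Targets disease associations, druggability scoring, "
    "safety profile, and clinical tractability across modalities.",
    "Compare crizotinib vs alectinib molecular profiles with ChEMBL "
    "bioactivity, RDKit descriptors, Lipinski assessment, and PubChem "
    "cross-validation.",
    "Analyze ALK resistance mutations G1202R and L1196M with structural "
    "mechanisms, cross-generational sensitivity, variant scoring, and "
    "PROTAC design implications.",
]


def build_turn_commands(session_id: str, prompts: List[str]) -> List[List[str]]:
    """Build one argv list per turn, all sharing the same session id."""
    return [["lobster", "query", "--session-id", session_id, prompt]
            for prompt in prompts]


def run_turns(session_id: str, prompts: List[str]) -> None:
    """Run the turns in order, stopping at the first non-zero exit code."""
    for argv in build_turn_commands(session_id, prompts):
        subprocess.run(argv, check=True)


commands = build_turn_commands("alk_study", PROMPTS)
```

Calling `run_turns("alk_study", PROMPTS)` would execute the session. Passing each command as an argument list avoids shell quoting issues with the multi-sentence prompts.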

