
Research Program

Reinforcement Learning

Open-source simulation platforms for RL in scientific applications. Every project ships code, data, and a paper.
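All of the environments below follow the standard Gymnasium reset/step interface. As a minimal sketch of that loop, the toy environment here is hypothetical (it is not one of the published packages) and reimplements only the interface shape in plain Python:

```python
class ToyDoseEnv:
    """Hypothetical toy environment following the Gymnasium reset/step
    signatures; NOT one of the published packages."""

    def __init__(self, target=0.5, horizon=20):
        self.target = target    # desired concentration (arbitrary units)
        self.horizon = horizon  # episode length in steps

    def reset(self):
        self.conc, self.t = 0.0, 0
        return self.conc, {}    # (observation, info), as in Gymnasium

    def step(self, action):
        # first-order washout plus the dose the agent chose this step
        self.conc = 0.9 * self.conc + action
        self.t += 1
        reward = -abs(self.conc - self.target)  # distance-to-target penalty
        truncated = self.t >= self.horizon      # time-limit ending
        return self.conc, reward, False, truncated, {}

env = ToyDoseEnv()
obs, _ = env.reset()
total, done = 0.0, False
while not done:
    # a fixed-dose "policy"; an RL agent would map obs -> action instead
    obs, reward, terminated, truncated, _ = env.step(0.05)
    total += reward
    done = terminated or truncated
```

A PPO baseline replaces the fixed dose with a learned mapping from observation to action; the loop structure is unchanged.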

Publications

14 publications in Reinforcement Learning

2026 · Reinforcement Learning · Wound Healing · Treatment Optimization · Biomedical Simulation · Open Source

WoundSim

Gymnasium-Compatible Reinforcement Learning Environments for Wound Healing Treatment Optimization

Hass Dhia, Smart Technology Investments Research Institute

Four Gymnasium-compatible RL environments for wound healing treatment optimization: Zlobina macrophage polarization (5 variables), simplified Xue-Friedman ischemic wound healing (6 variables), Flegg HBOT angiogenesis (4 variables), and extended diabetic wound model with glucose-insulin dynamics (7 variables). All ODE parameters sourced from peer-reviewed publications with explicit provenance. Includes random, clinical heuristic, and PPO baselines. Key finding: PPO achieves 11.9x improvement over random baseline on HBOT environment by exploiting the non-monotonic oxygen-angiogenesis relationship. 173 tests, MIT licensed.

2026 · Reinforcement Learning · Hemostasis · Anticoagulation · Pharmacogenomics · Open Source

HemoSim

Gymnasium Environments for Reinforcement Learning in Hemostasis and Anticoagulation Management

Hass Dhia, Smart Technology Investments Research Institute

Four Gymnasium-compatible RL environments for hemostasis and anticoagulation management: warfarin dose titration with CYP2C9/VKORC1 pharmacogenomics (Hamberg 2007), heparin infusion optimization with aPTT monitoring (Raschke 1993), direct oral anticoagulant selection for atrial fibrillation (RE-LY, ROCKET-AF, ARISTOTLE trials), and disseminated intravascular coagulation management with multi-component blood product therapy (ISTH DIC scoring). Implements an 8-state reduced coagulation cascade ODE model derived from Hockin et al. (2002), population PK/PD models with pharmacogenomic patient variability, and clinical protocol baselines (IWPC, Raschke nomogram). Key finding: PPO achieves 83.4% improvement over clinical baselines in DOAC management, where learned drug-dose optimization outperforms guideline-based selection. 142 tests, MIT licensed.

2026 · Reinforcement Learning · Antimicrobial Stewardship · Infectious Disease · Open Source

AntibioSim

Gymnasium Environments for Reinforcement Learning in Antimicrobial Stewardship

Hass Dhia, Smart Technology Investments Research Institute

Four Gymnasium-compatible RL environments for antimicrobial stewardship: antibiotic selection from a five-drug formulary, dose optimization for PK/PD target attainment, IV-to-oral therapy switching with escalation/de-escalation, and ward-level antibiotic policy for resistance control. Implements one-compartment pharmacokinetics (Drusano 2004), sigmoidal Emax pharmacodynamics (Regoes 2004), and susceptible-resistant bacterial population dynamics (Austin 1999). Key finding: PPO achieves 20.5x improvement over random baseline in the multi-patient ResistanceControl environment, suggesting RL-based stewardship is most impactful at the ward policy level where multi-patient resistance dynamics create coordination challenges. 170 tests, MIT licensed.
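The PK/PD machinery above can be sketched in a few lines: one-compartment first-order elimination driving a sigmoidal Emax kill term, in the spirit of the Drusano and Regoes models cited. All parameter values below are illustrative placeholders, not AntibioSim's calibrated values:

```python
import math

def emax_kill(conc, emax=4.0, ec50=1.0, hill=2.0):
    # Sigmoidal Emax pharmacodynamics: kill rate vs. drug concentration
    # (illustrative parameters, not the package's calibrated values)
    return emax * conc**hill / (ec50**hill + conc**hill)

def simulate(dose, interval=8.0, horizon=48.0, ke=0.3, growth=0.8, dt=0.01):
    """One-compartment PK: an IV bolus every `interval` hours with
    first-order elimination, driving log10 bacterial density."""
    conc, log_b = 0.0, 6.0           # drug concentration; log10 CFU/mL
    steps_per_dose = int(round(interval / dt))
    for i in range(int(round(horizon / dt))):
        if i % steps_per_dose == 0:
            conc += dose             # instantaneous bolus into the compartment
        conc -= ke * conc * dt       # first-order elimination
        # net growth minus concentration-dependent kill, floored at extinction
        log_b = max(log_b + (growth - emax_kill(conc)) * dt / math.log(10), 0.0)
    return log_b

low = simulate(dose=0.5)   # sub-therapeutic: net growth each interval
high = simulate(dose=4.0)  # higher exposure: kill dominates the interval
```

The dose-optimization environment's target-attainment objective amounts to choosing `dose` and `interval` so that kill exceeds growth over enough of each dosing cycle.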

2026 · Reinforcement Learning · Immunotherapy · Oncology · Open Source

ImmunoSim

Gymnasium Environments for Reinforcement Learning in Cancer Immunotherapy Optimization

Hass Dhia, Smart Technology Investments Research Institute

Four Gymnasium-compatible RL environments for cancer immunotherapy optimization: checkpoint inhibitor dosing (anti-PD-1), combination dual checkpoint blockade (anti-PD-1 + anti-CTLA-4), CAR-T cell infusion scheduling, and adaptive dosing with pseudo-progression detection. Implements Kuznetsov-Taylor (1994) tumor-immune ODEs, Nikolopoulou (2018/2021) checkpoint inhibitor pharmacodynamics, Barros CARTmath (2021) CAR-T compartmental model, and Shulgin (2020) immune toxicity curves. Key finding: reward landscape curvature, not state dimensionality, determines RL difficulty; asymmetric drug toxicity profiles create richer gradient signals. 175 tests, MIT licensed.

2026 · Reinforcement Learning · Sepsis · Critical Care · Open Source

SepsiSim

Gymnasium Environments for Reinforcement Learning in Sepsis Management

Hass Dhia, Smart Technology Investments Research Institute

Three Gymnasium-compatible RL environments for sepsis management: fluid resuscitation via bolus dosing, vasopressor titration for MAP maintenance, and combined multi-intervention management. Implements Reynolds et al. (2006) 4-ODE inflammation dynamics coupled with cardiovascular hemodynamics, lactate kinetics, and SOFA scoring. Three difficulty tiers per environment enable curriculum learning. PPO outperforms baselines on vasopressor titration, with a 10.6% improvement and the lowest variance. 136 tests, MIT licensed.

2026 · Reinforcement Learning · Nephrology · Hemodialysis · Open Source

NephroSim

Gymnasium Environments for Reinforcement Learning in Hemodialysis Optimization

Hass Dhia, Smart Technology Investments Research Institute

Four Gymnasium-compatible RL environments for hemodialysis optimization: urea clearance via two-compartment Gotch-Sargent kinetics, ultrafiltration control with baroreceptor reflex cardiovascular dynamics, phosphate management with binder dosing across weekly cycles, and a full multi-objective dialysis session. Features three difficulty tiers, clinical protocol heuristic baselines, and PPO agents that exceed random baselines by up to 3.13x. 158 tests, MIT licensed.

2026 · Reinforcement Learning · Anesthesiology · Drug Dosing · Open Source

AnestheSim

Gymnasium Environments for Reinforcement Learning in Automated Anesthesia Drug Dosing

Hass Dhia, Smart Technology Investments Research Institute

Three Gymnasium-compatible RL environments for automated anesthesia drug dosing: propofol infusion control via the Marsh three-compartment pharmacokinetic model with Hill pharmacodynamic BIS prediction, remifentanil effect-site concentration targeting via the Minto model, and combined propofol-remifentanil anesthesia management using the Greco synergistic interaction surface. Includes configurable difficulty tiers with patient variability and surgical stimulation events, heuristic TCI clinical baselines, PPO RL agents, and a benchmark suite across three difficulty levels. Key finding: pharmacokinetic timescale, not task complexity, is the primary determinant of RL sample efficiency in drug dosing control. 109 tests, MIT licensed.

2026 · Reinforcement Learning · Brain-Computer Interfaces · Neuroscience · Open Source

NeuroSim

A Gymnasium Platform for Reinforcement Learning in Brain-Computer Interfaces

Hass Dhia, Smart Technology Investments Research Institute

Gymnasium-compatible RL environment suite for brain-computer interfaces with three environments modeling motor imagery decoding, intracortical cursor control, and P300 speller navigation. Includes pluggable signal models (electrode drift, fatigue, co-adaptation, noise), a conditional VAE neural surrogate, CSP+LDA classical baseline, PPO RL baseline, and a five-tier benchmark suite. 158 tests, MIT licensed.

2026 · Reinforcement Learning · Medical Robotics · Simulation · Open Source

VascularSim

A Gymnasium Platform for Microrobot Navigation in Patient-Derived Vascular Networks

Hass Dhia, Smart Technology Investments Research Institute

Open-source simulation platform providing a complete stack for training RL agents to navigate blood vessel graphs: TubeTK data ingestion, three Gymnasium environments with physics-based observations (VascularNav, FlowAwareNav, MagneticNav), analytical hemodynamics and magnetic field models, a neural flow surrogate, PPO baseline agents, and a benchmark suite across five difficulty tiers. 139 tests, MIT licensed.

2026 · Reinforcement Learning · Drug Discovery · Peptide Engineering · Open Source

PeptideGym

Gymnasium-Compatible RL Environments for Therapeutic Peptide Design

Hass Dhia, Smart Technology Investments Research Institute

Three Gymnasium-compatible RL environments for therapeutic peptide design: antimicrobial peptides (AMP), cyclic peptides, and T-cell epitopes. Includes heuristic biophysical scoring models, PPO and random baseline agents, reward shaping analysis revealing mode collapse boundaries, and a benchmark suite. First systematic demonstration that per-step reward shaping magnitude determines whether RL agents learn meaningful peptide sequences or degenerate to single-residue exploitation. 125 tests, MIT licensed.

2026 · Reinforcement Learning · Radiation Therapy · Oncology · Open Source

OncoSim

Gymnasium Environments for Reinforcement Learning in Radiation Therapy Treatment Planning

Hass Dhia, Smart Technology Investments Research Institute

Three Gymnasium-compatible RL environments for radiation therapy treatment planning: beam angle optimization, dose fractionation scheduling, and adaptive replanning. Includes analytical pencil beam dose calculation, linear-quadratic cell survival, TCP/NTCP radiobiological models, configurable difficulty tiers, and baseline agents (random, heuristic, PPO). PPO achieves 11.7x improvement on beam selection and 15.4x on adaptive replanning over clinical heuristics. 141 tests, MIT licensed.

2026 · Reinforcement Learning · Glucose Management · Diabetes · Open Source

GlucoSim

Gymnasium Environments for Reinforcement Learning in Glucose Management

Hass Dhia, Smart Technology Investments Research Institute

Three Gymnasium-compatible RL environments for Type 1 diabetes glucose management: basal rate optimization, meal bolus dosing, and full closed-loop insulin delivery. Includes the Bergman minimal glucose-insulin model, Dalla Man gut absorption dynamics, a CGM sensor noise model, 30 virtual patients across three age groups, configurable difficulty tiers, heuristic clinical baselines, PPO RL agents, and a five-tier benchmark suite. Key finding: composite reward functions with safety constraints are necessary to differentiate learned policies from naive baselines in glucose management RL. 117 tests, MIT licensed.
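The Bergman minimal model named above is a three-state ODE (plasma glucose, remote insulin action, plasma insulin); a forward-Euler sketch with illustrative textbook-range parameters, not GlucoSim's virtual-patient values:

```python
def bergman_step(G, X, I, u, dt=1.0, p1=0.028, p2=0.025, p3=1.3e-5,
                 n=0.09, Gb=90.0, Ib=7.0):
    # Bergman minimal model, one forward-Euler minute (illustrative
    # parameters; G in mg/dL, I in mU/L, u is insulin input in mU/L/min)
    dG = -p1 * (G - Gb) - X * G      # glucose: decay to basal + insulin action
    dX = -p2 * X + p3 * (I - Ib)     # remote insulin action compartment
    dI = -n * (I - Ib) + u           # plasma insulin with infusion input u
    return G + dG * dt, X + dX * dt, I + dI * dt

def final_glucose(u, minutes=180, G0=250.0):
    G, X, I = G0, 0.0, 7.0           # start hyperglycemic at basal insulin
    for _ in range(minutes):
        G, X, I = bergman_step(G, X, I, u)
    return G

no_insulin = final_glucose(u=0.0)    # relaxes toward Gb via p1 alone
with_insulin = final_glucose(u=0.5)  # infusion builds insulin action X
```

The closed-loop environment effectively asks the agent to choose `u` each step from noisy CGM readings of `G`.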

2026 · Reinforcement Learning · Mechanical Ventilation · Critical Care · Open Source

VentiSim

Gymnasium Environments for Reinforcement Learning in Mechanical Ventilation

Hass Dhia, Smart Technology Investments Research Institute

Three Gymnasium-compatible RL environments for mechanical ventilation: tidal volume control via inspiratory pressure adjustment, PEEP optimization for oxygenation, and full ventilator parameter management for ARDS patients. Implements a single-compartment lung mechanics model coupled with a simplified gas exchange model, configurable difficulty tiers with patient variability and disease progression, heuristic clinical baselines, and PPO agents. Key finding: PPO improvement over baselines scales monotonically with action dimensionality, from 11.8% in 1D to 65.0% in 4D control. 230 tests, MIT licensed.
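A single-compartment lung admits a closed-form tidal volume under constant driving pressure, V(t) = C·P·(1 − e^(−t/RC)); a sketch with illustrative values, not VentiSim's:

```python
import math

def tidal_volume(p_drive, t_insp=1.0, compliance=0.05, resistance=10.0):
    # Single-compartment lung: R*dV/dt + V/C = Pdrive, from end-expiration.
    # Closed form: V(t) = C * Pdrive * (1 - exp(-t / (R*C))).
    # Illustrative values: C in L/cmH2O, R in cmH2O*s/L, pressure in cmH2O.
    tau = resistance * compliance          # time constant, seconds
    return compliance * p_drive * (1.0 - math.exp(-t_insp / tau))

vt_15 = tidal_volume(p_drive=15.0)   # 15 cmH2O driving pressure
vt_25 = tidal_volume(p_drive=25.0)   # higher pressure, larger tidal volume
```

The tidal-volume environment inverts this relationship under patient variability: the agent adjusts inspiratory pressure to hit a tidal-volume target when C and R are unknown and drifting.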

2026 · Reinforcement Learning · Cardiac Electrophysiology · Drug Dosing · Open Source

CardioSim

Gymnasium Environments for Reinforcement Learning in Cardiac Electrophysiology

Hass Dhia, Smart Technology Investments Research Institute

Three Gymnasium-compatible RL environments for cardiac electrophysiology: pacemaker rate optimization via the FitzHugh-Nagumo model with a cardiac conduction system simulator, antiarrhythmic drug dosing using the FitzHugh-Nagumo model with single-compartment PK/PD dynamics, and defibrillation timing via the Aliev-Panfilov model with probabilistic shock success. Includes configurable difficulty tiers, heuristic clinical baselines, and PPO agents. Environments span a difficulty spectrum from learnable (drug dosing, pacing) to open-challenge (defibrillation timing). 134 tests, MIT licensed.
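The FitzHugh-Nagumo excitability underlying the pacing and drug-dosing environments can be sketched directly: a brief supra-threshold stimulus elicits a full action potential, while a weak one decays back to rest. Parameters below are the textbook values, not necessarily CardioSim's:

```python
def fhn_trace(i_stim, steps=2000, dt=0.05, a=0.7, b=0.8, eps=0.08):
    # FitzHugh-Nagumo excitable cell, forward Euler (textbook parameters;
    # a sketch, not CardioSim's implementation)
    v, w = -1.2, -0.6          # near the resting fixed point
    peak = v
    for k in range(steps):
        i_ext = i_stim if k < 40 else 0.0    # brief stimulus pulse
        dv = v - v**3 / 3 - w + i_ext        # fast voltage-like variable
        dw = eps * (v + a - b * w)           # slow recovery variable
        v, w = v + dv * dt, w + dw * dt
        peak = max(peak, v)
    return peak

sub = fhn_trace(i_stim=0.1)    # sub-threshold: perturbation decays
supra = fhn_trace(i_stim=1.0)  # supra-threshold: full action potential
```

This all-or-nothing threshold response is what makes stimulus timing and amplitude a natural RL action space for pacing.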

14 Publications · 29 Domains · 14 Open-Source Repos · 14 Published Packages

Collaborate with us

We welcome research collaborations, dataset contributions, and open-source partnerships across any discipline. Reach out to discuss.