
ImmunoSim
Gymnasium Environments for Reinforcement Learning in Cancer Immunotherapy Optimization
Hass Dhia - Smart Technology Investments Research Institute
ImmunoSim provides four Gymnasium-compatible RL environments for cancer immunotherapy optimization: checkpoint inhibitor dosing (anti-PD-1), combination dual checkpoint blockade (anti-PD-1 + anti-CTLA-4), CAR-T cell infusion scheduling, and adaptive dosing with pseudo-progression detection. The environments implement the Kuznetsov-Taylor (1994) tumor-immune ODEs, the Nikolopoulou (2018/2021) checkpoint inhibitor pharmacodynamics, the Barros CARTmath (2021) CAR-T compartmental model, and the Shulgin (2020) immune toxicity curves. Key finding: reward landscape curvature, not state dimensionality, determines RL difficulty; asymmetric drug toxicity profiles create richer gradient signals. The package ships with 175 tests and is MIT licensed.
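
For readers unfamiliar with the tumor-immune ODEs named above, the following is a minimal sketch of the two-population Kuznetsov-Taylor system (effector cells E, tumor cells T) integrated with a fixed-step RK4 scheme. It is not ImmunoSim's code: the names `KTParams`, `kt_rhs`, `rk4_step`, and `simulate` are hypothetical, the integrator choice is an assumption, and the parameter values are the ones commonly quoted for the 1994 model and should be checked against the original paper before use.

```python
# Illustrative sketch only; not taken from the ImmunoSim source.
# Parameter values are those commonly quoted for the 1994 model (assumption).
from dataclasses import dataclass


@dataclass(frozen=True)
class KTParams:
    s: float = 1.3e4      # baseline effector-cell influx (cells/day)
    p: float = 0.1245     # max tumor-stimulated effector recruitment (1/day)
    g: float = 2.019e7    # half-saturation tumor burden (cells)
    m: float = 3.422e-10  # effector inactivation by tumor (1/(cell*day))
    d: float = 0.0412     # effector natural death rate (1/day)
    a: float = 0.18       # intrinsic tumor growth rate (1/day)
    b: float = 2.0e-9     # inverse tumor carrying capacity (1/cell)
    n: float = 1.101e-7   # tumor kill rate by effectors (1/(cell*day))


def kt_rhs(E, T, k):
    """Right-hand side (dE/dt, dT/dt) of the two-equation model."""
    dE = k.s + k.p * E * T / (k.g + T) - k.m * E * T - k.d * E
    dT = k.a * T * (1.0 - k.b * T) - k.n * E * T
    return dE, dT


def rk4_step(E, T, dt, k):
    """Advance (E, T) by one classical fourth-order Runge-Kutta step."""
    k1E, k1T = kt_rhs(E, T, k)
    k2E, k2T = kt_rhs(E + 0.5 * dt * k1E, T + 0.5 * dt * k1T, k)
    k3E, k3T = kt_rhs(E + 0.5 * dt * k2E, T + 0.5 * dt * k2T, k)
    k4E, k4T = kt_rhs(E + dt * k3E, T + dt * k3T, k)
    E_next = E + dt * (k1E + 2.0 * k2E + 2.0 * k3E + k4E) / 6.0
    T_next = T + dt * (k1T + 2.0 * k2T + 2.0 * k3T + k4T) / 6.0
    return E_next, T_next


def simulate(E0, T0, days, dt=0.01, k=KTParams()):
    """Integrate from (E0, T0) for `days` days and return the final state."""
    E, T = E0, T0
    for _ in range(int(round(days / dt))):
        E, T = rk4_step(E, T, dt, k)
    return E, T


if __name__ == "__main__":
    E, T = simulate(E0=3.0e5, T0=1.0e6, days=30.0)
    print(f"day 30: effector cells = {E:.3e}, tumor cells = {T:.3e}")
```

In an RL setting, a drug action would typically enter as an extra term in `kt_rhs` (for example, scaling the inactivation rate `m` or the kill rate `n`), and each environment step would advance the ODE state by a fixed time interval. How ImmunoSim couples dosing to these equations is not specified here.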










