
OpthaSim
Gymnasium-Compatible Reinforcement Learning Environments for Ophthalmic Treatment Optimization
Hass Dhia - Smart Technology Investments Research Institute
OpthaSim provides four Gymnasium-compatible RL environments for ophthalmic treatment optimization: (1) glaucoma intraocular pressure (IOP) management with Goldmann-equation dynamics and four medication classes; (2) anti-VEGF injection scheduling with one-compartment PK/PD and CST/VA response models; (3) selective laser trabeculoplasty parameter optimization with stochastic outcome modeling; and (4) diabetic retinopathy screening and treatment escalation with HbA1c-modulated Markov progression. Key finding: the advantage of PPO over clinical guideline heuristics correlates with temporal decision density. PPO achieves a mean reward of 33.57 on the daily-step GlaucomaIOP environment (vs. -12.75 for the guideline heuristic), but clinical guidelines edge out PPO on the sparse monthly LaserTrabeculoplasty decisions. The project has 238 tests and is MIT licensed.
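
To make the Goldmann-equation dynamics concrete, the sketch below shows how a daily-step pressure environment could be structured. It is a toy illustration, not OpthaSim's implementation: the class name `ToyGlaucomaEnv`, the parameter values, the single hypothetical drug action, and the reward are all assumptions. It mirrors the Gymnasium `reset`/`step` return convention without importing the library, and uses the Goldmann equation in its basic form, IOP = F / C + Pv.

```python
from dataclasses import dataclass


def goldmann_iop(flow: float, facility: float, evp: float) -> float:
    """Steady-state IOP in mmHg from the Goldmann equation: IOP = F / C + Pv."""
    if facility <= 0:
        raise ValueError("outflow facility must be positive")
    return flow / facility + evp


@dataclass
class ToyGlaucomaEnv:
    # All values are illustrative assumptions, not taken from OpthaSim.
    flow: float = 2.5         # aqueous production F, uL/min
    facility: float = 0.25    # outflow facility C, uL/min/mmHg
    evp: float = 9.0          # episcleral venous pressure Pv, mmHg
    target_iop: float = 15.0  # hypothetical target pressure, mmHg
    horizon: int = 30         # episode length in daily steps
    day: int = 0

    def reset(self):
        """Start a new episode; return (observation, info) as Gymnasium does."""
        self.day = 0
        return goldmann_iop(self.flow, self.facility, self.evp), {}

    def step(self, action: int):
        """Advance one day.

        action 0: no treatment.
        action 1: hypothetical drug that cuts aqueous production by 30%.
        Returns (observation, reward, terminated, truncated, info).
        """
        flow = self.flow * (0.7 if action == 1 else 1.0)
        iop = goldmann_iop(flow, self.facility, self.evp)
        reward = -abs(iop - self.target_iop)  # penalize distance from target
        self.day += 1
        truncated = self.day >= self.horizon  # time limit reached
        return iop, reward, False, truncated, {}


if __name__ == "__main__":
    env = ToyGlaucomaEnv()
    obs, _ = env.reset()
    print("untreated IOP:", obs)
    obs, reward, terminated, truncated, _ = env.step(1)
    print("treated IOP:", obs, "reward:", reward)
```

With these assumed values the untreated eye sits at about 19 mmHg and the treated eye at about 16 mmHg, so a per-day decision changes the reward on every step. That dense feedback is the kind of setting in which the abstract reports PPO outperforming the guideline heuristic.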














