MedGRPO

Last updated: 2026-05-03

Summary

MedGRPO is a reinforcement learning framework for heterogeneous medical video understanding. Its key claim is that standard RL can collapse when medical video datasets and tasks have incompatible reward scales, and that balanced multi-dataset learning requires reward normalization plus domain-specific caption evaluation.

Key Points

Practical Takeaways

Results To Remember

Model CVS acc STG mIoU TAG@0.3 DVC F1 VS LLM RC LLM
Qwen2.5-VL-7B SFT 0.894 0.177 0.142 0.165 3.596 2.757
Qwen2.5-VL-7B MedGRPO 0.896 0.202 0.216 0.214 4.184 3.442
MedGRPO without reward normalization 0.020 0.010 0.004 0.002 3.805 3.469

Related

Sources

Open Questions