Abstract: Although AI has been widely adopted and has profoundly transformed our lives, large AI models cannot be deployed directly on resource-constrained edge devices. To enhance ...
Abstract: This paper presents a first-order distributed algorithm for solving a convex semi-infinite program (SIP) over a time-varying network. In this setting, the objective function associated with ...
I'm currently training models using the CISPO method, with both dense models (Qwen2.5-7B) and MoE models (Qwen3-30B-A3B). During my experiments, I've encountered an issue where the gradient norm ...
.
├── ppo.py                # Core PPO implementation
├── demonstrations/       # Example implementations
│   ├── cartpole_demo.py
│   ├── lunar_lander_demo.py
│   └── README.md
├── requirements.txt      # Project dependencies
└── ...