Gradient Descent Algorithm Derivation

ERA: a QoE-Aware Collaborative Inference Algorithm for NOMA-based Edge Intelligence

Abstract: Although AI has been extensively adopted and has profoundly transformed our lives, it is not feasible to directly deploy large AI models on edge devices with limited resources. To enhance ...

IEEE

Distributed Alternating Gradient Descent for Convex Semi-infinite Programs Over A Network

Abstract: This paper presents a first-order distributed algorithm for solving a convex semi-infinite program (SIP) over a time-varying network. In this setting, the objective function associated with ...

GitHub

Gradient norm explosion after 40 steps using CISPO algorithm

I'm currently training models using the CISPO method, with both dense models (Qwen2.5-7B) and MoE models (Qwen3-30B-A3B). During my experiments, I've encountered an issue where the gradient norm ...

GitHub

Proximal Policy Optimization (PPO) Implementation

.
├── ppo.py                  # Core PPO implementation
├── demonstrations/         # Example implementations
│   ├── cartpole_demo.py
│   ├── lunar_lander_demo.py
│   └── README.md
├── requirements.txt        # Project dependencies
└── ...
