We released Safe RLHF: Safe Reinforcement Learning from Human Feedback.