RL for LLMs is basically supervised finetuning plus negative examples and a KL-divergence penalty, no cap: good outputs get their log-probability pushed up (like SFT), bad outputs get pushed down, and the KL term keeps the model from drifting too far from its reference.
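A minimal sketch of that framing — hypothetical function and parameter names, not any library's actual API. A positive reward weight makes the term look like plain SFT, a negative weight is the "negative example" part, and `beta` scales a simple per-token KL estimate against a frozen reference model:

```python
def rl_loss(policy_logprobs, ref_logprobs, reward, beta=0.1):
    """Toy RLHF-style objective on one sampled completion.

    reward > 0: maximize log-prob of the sample (SFT-like term).
    reward < 0: minimize log-prob (the 'negative example' term).
    beta * kl:  penalty for drifting away from the reference model.
    """
    # Reward-weighted negative log-likelihood over the sampled tokens.
    nll = -sum(reward * lp for lp in policy_logprobs)
    # Per-token KL estimate log p(x) - log q(x) for samples drawn from p.
    kl = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return nll + beta * kl
```

With `reward = 1.0` this reduces to KL-regularized likelihood maximization; flipping the sign of `reward` turns the same expression into a push-down on dispreferred samples.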
The choice between online and offline training shapes both model performance and system complexity, fr: online methods sample fresh completions from the current policy every step, while offline methods reuse a fixed dataset.
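A toy sketch of that distinction, with hypothetical stand-ins for generation and the training loop (assumed names, not a real API) — the online loop pays for fresh rollouts from an ever-changing policy, while the offline loop just resamples a static dataset:

```python
import random

def generate(policy_version):
    """Stand-in for sampling a completion from the current policy."""
    return f"sample-from-v{policy_version}"

def online_batches(steps):
    """Online: each update trains on fresh samples from the *current* policy."""
    version, batches = 0, []
    for _ in range(steps):
        batches.append(generate(version))  # fresh rollout every step
        version += 1                       # the policy changes after each update
    return batches

def offline_batches(steps, dataset, seed=0):
    """Offline: every update reuses the same fixed dataset (cheaper, but stale)."""
    rng = random.Random(seed)
    return [rng.choice(dataset) for _ in range(steps)]
```

The online loop's data distribution tracks the policy (better credit assignment, more infrastructure); the offline loop's data never moves (simpler, but increasingly off-policy).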
OpenAI's framing made RLHF look like it's only about safety, but the real tea is that RL is the foundation of genuinely useful LLMs, periodt.