Post-Training, RL, Experiments and Indic AI

125 likes · 2,818 views

Post-Training, RL, Experiments and Indic AI | Tokenbender

Tokenbender on:
Post Training
RL and Reasoning
Post AGI Landscape
Experiments
Indic AI Landscape
-------------------------------------------------------------
Tokenbender [Guest]: https://x.com/tokenbender
Himanshu [Host at GroundZero]: https://x.com/himanshustwts
GroundZero AI: https://x.com/groundzero_ai
--------------------------------------------------------------

To sponsor a future episode: https://buymeacoffee.com/himanshustwts

---------------------------------------------------------------

TIMESTAMPS

(00:00:00) - INTRO
(00:01:30) - Career Trajectory and Motivation
(00:10:00) - Non-CS Background and Building Intuitions
(00:14:00) - Journey with Codecherrypop and Small Models
(00:21:50) - Partner-In-Crime and Roleplay Series
(00:28:45) - Post-Training and how has it evolved?
(00:36:45) - Is pre-training actually dead?
(00:45:10) - RL over next-token-predictors?
(00:52:10) - Reliable agents, RL in training workload
(00:58:07) - Weak priors and Reward Sparsity
(01:01:20) - What's the new RL sauce?
(01:06:11) - RL from Zero Pre-train, Coherent text and Beyond
(01:12:01) - Intelligence isn't flat: Optimizing for one sharp spike?
(01:16:37) - Sampling and Creating Data for Models, New approaches?
(01:20:55) - Role of failures
(01:24:26) - Obsession over the next number
(01:27:05) - Shallow safety alignment
(01:31:58) - RL over Diffusion Models, 'aha' moments
(01:37:40) - 50x in productivity?
(01:40:18) - How do you build the mindset to keep experimenting?
(01:48:20) - Writing papers on AI research
(01:51:10) - How do you look at open-source models? What's next?
(01:53:45) - Finding or Creating synthetic datasets
(01:56:08) - TRIVIA
(02:07:06) - Indic AI Landscape, Challenges
(02:12:32) - ADVICE FOR STUDENTS
(02:16:50) - FINAL THOUGHTS FROM TOKENBENDER
