Why are some models that are exceptional on every benchmark a total flop in normal use? This is a question I was hinting at in my post on GPT-4o’s sycophancy, where I described it as “The Art of The Model”:
RLHF is where the art of the model is crafted and requires a qualitative eye, deep intuition, and bold stances to achieve the best outcomes.
In many ways, it takes restraint to land a great model. It takes saying no to researchers who want to include their complex methods that may degrade the overall experience (even if the evaluation scores are better). It takes saying yes to someone advocating for something that is harder to measure.
It seems that frontier labs walk a fine line between rapid progress and usability. Quoting the same article:
While pushing so hard to reach the frontier of models, it appears that the best models are also the ones that are closest to going too far.
Once labs are in sight of a true breakthrough model, new types of failure modes and oddities come into play. This phase won’t last forever, but seeing into it is a great opportunity to understand how the sausage is made and what trade-offs labs are making, explicitly or implicitly, when they release a model (or in their org chart).
This talk expands on the idea and goes into some of the central grey areas and difficulties in getting a good model out the door. It also serves as a great recap of a lot of my writing on Interconnects in 2025, so I wanted to share it along with a reading list where people can find more.
The talk took place at an AI Agents Summit local to me in Seattle, hosted by the folks at OpenPipe, whom I’ve crossed paths with many times in recent months. They’re trying to take RL tools similar to the ones I use for research and turn them into agents and products (they’re one of many companies doing so, of course).
See OpenPipe here (not sponsored, just friends): openpipe.ai/
Homepage (notes, links, transcript, watch without ads): www.interconnects.ai/p/crafting-a-good-reasoning-m…
Slides: docs.google.com/presentation/u/2/d/1_ByHKLt49h3Vuk…
Chapters:
00:00 Introduction & the state of reasoning
05:50 Hillclimbing imperfect evals
09:18 Technical bottlenecks
13:02 Sycophancy
18:08 The Goldilocks Zone
19:28 What comes next? (hint, planning)
26:40 Q&A
Reading list (roughly in order of the talk):
(June 12 2025) The rise of reasoning machines – www.interconnects.ai/p/the-rise-of-reasoning-machi…
(Feb 24 2025) Claude 3.7 Thonks and What’s Next for Inference-time Scaling – www.interconnects.ai/p/claude-3-7-thonks
(Apr 19 2025) OpenAI’s o3: Over-optimization is back and weirder than ever – www.interconnects.ai/p/openais-o3-over-optimizatio…
RLHF Book – Over Optimization (chapter 17) – rlhfbook.com/c/17-over-optimization.html
(Feb 28 2025) GPT-4.5: “Not a frontier model”? – www.interconnects.ai/p/gpt-45-not-a-frontier-model
(May 4 2025) Sycophancy and the art of the model – www.interconnects.ai/p/sycophancy-and-the-art-of-t…
(Apr 7 2025) Llama 4: Did Meta just push the panic button? – www.interconnects.ai/p/llama-4
RLHF Book – Preference Data (chapter 6) – rlhfbook.com/c/06-preference-data.html
(Jul 3 2024) Switched to Claude 3.5 – www.interconnects.ai/p/switched-to-claude-from-cha…
(Jun 4 2025) A taxonomy for next-generation reasoning models – www.interconnects.ai/p/next-gen-reasoners
(Jun 9 2025) What comes next with reinforcement learning – www.interconnects.ai/p/what-comes-next-with-reinfo…
Get Interconnects (www.interconnects.ai/)...
... on YouTube: / @interconnects
... on Twitter: x.com/interconnectsai
... on Linkedin: www.linkedin.com/company/interconnects-ai
... on Spotify: open.spotify.com/show/2UE6s7wZC4kiXYOnWRuxGv
... on Apple Podcasts: podcasts.apple.com/us/podcast/interconnects/id1719…