
Ptakha >> Sutskever

Ilya Sutskever just gave his first interview in two years. He co-founded OpenAI and is arguably one of the three people who invented modern AI. When someone like that breaks a two-year silence, it's worth paying attention.

1/ The scaling era is over

From 2020 to 2025, the formula was simple: more data + more compute = smarter model. It worked. Companies weren't doing research, they were scaling what already existed. But the internet is finite, and that's what the models trained on. From 2026 onward, you need genuinely new ideas. And there are far fewer new ideas than there are companies trying to build foundation models.

Why it matters: while Google, OpenAI, and Anthropic search for the next breakthrough, the window to build products on current models is wide open. Whoever is first to dramatically improve UX with LLMs wins.

2/ Benchmarks ≠ Economic Value

Models ace evals and fail basic real-world tasks. I see this constantly: an agent handles something a human couldn't, then (literally in the next message) loops and repeats itself three times. Why? Companies optimized for benchmarks. They built great test-takers that perform poorly on actual work.

Why it matters: don't trust benchmarks when choosing a model for production. Build your own evals for each specific task and compare models on those. Easier said than done. Every team building real things with LLMs has eaten a lot of shit with evals over the past year.
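
To make "build your own evals" concrete, here's a minimal sketch of what a task-specific eval loop might look like. Everything in it is an assumption for illustration: `run_evals`, `model_a`, `model_b`, and the support-ticket cases are made-up names, and the "models" are plain functions standing in for real API calls.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any callable from prompt to answer
# (hypothetical stand-in for a real API client).
Model = Callable[[str], str]
# An eval case pairs a prompt with a checker that decides pass/fail.
EvalCase = Tuple[str, Callable[[str], bool]]


def run_evals(models: Dict[str, Model], cases: List[EvalCase]) -> Dict[str, float]:
    """Return each model's pass rate over the task-specific cases."""
    if not cases:
        raise ValueError("need at least one eval case")
    scores = {}
    for name, model in models.items():
        passed = sum(1 for prompt, check in cases if check(model(prompt)))
        scores[name] = passed / len(cases)
    return scores


# Toy stand-ins for two candidate models (assumptions, not real models).
def model_a(prompt: str) -> str:
    return "REFUND" if "money back" in prompt else "OTHER"


def model_b(prompt: str) -> str:
    return "OTHER"


# Cases written for one narrow task: routing support tickets.
cases = [
    ("I want my money back", lambda out: out == "REFUND"),
    ("Where is my parcel?", lambda out: out == "OTHER"),
]

scores = run_evals({"model_a": model_a, "model_b": model_b}, cases)
print(scores)  # {'model_a': 1.0, 'model_b': 0.5}
```

The point isn't the harness, it's the cases: they come from your task, not from a public leaderboard, so the pass rates compare models on the work you actually need done.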

3/ Pre-training commoditizes, post-training differentiates

All foundation models train on roughly the same data. Base capabilities barely differ. The differences emerge at post-training: RLHF, domain-specific fine-tuning, and similar techniques.

Why it matters: don't try to build a better foundation model. Grab an open-source one; Qwen and DeepSeek are already at SOTA for most domains. Invest in post-training on your own data. Quality post-training requires real infrastructure (online RL environments, tooling), and that's where the actual investment should go.

4/ Vertical > Horizontal

Narrow AI companies are winning: legal, medical, support, finance. Harvey (legal AI) and PathAI (medical diagnostics) show it's already happening. The first team to assemble the full puzzle in a specific domain (unique data, domain evals, post-training infra) can monopolize that vertical. Beat ChatGPT in support. Or in legal docs. General-purpose models will survive, but business value will concentrate in verticals.

5/ The Generalization Gap

The most important thing he said: "These models somehow just generalize dramatically worse than people."

A kid hears a new word 2-3 times and uses it correctly. Models need thousands of examples. This isn't a scale problem. It's a fundamentally different mechanism, and we don't understand it.

Ilya's timeline to the next breakthrough: 5 to 20 years. A 4x spread. Make of that what you will.


P.S. I deliberately left out AGI and AI safety. Both are massively overhyped, nobody knows how either will unfold, and the discussion is more speculation than actionable insight.

P.P.S. Two interviews dropped a week apart. Ptakha on Dud': 6M views. Sutskever on Dwarkesh: 1M. Interesting world.

Stay tuned

🥷🥷🥷


More takes — @tldrdaniel