DeepSeek V3 0324 is the first open-source model to match SOTA coding performance
- Understands user intent better than before; I'd say it beats Claude 3.7 Sonnet, both base and thinking. Claude 3.5 Sonnet is still better at this (perhaps the best)
- Again, in raw code-generation quality, it is better than 3.7, on par with 3.5, and sometimes better.
- Great at reasoning, much better than any other non-reasoning model available right now.
- Better at instruction following than 3.7 Sonnet but below 3.5 Sonnet.
TIL Sonnet 3.7 is worse than 3.5. How come?
It’s not: https://llm-stats.com/models/compare/claude-3-7-sonnet-20250219-vs-claude-3-5-sonnet-20241022