Useless benchmark? Why did M3 beat Fable 5?
https://artificialanalysis.ai/evaluations/artificial-analysis-long-context-reasoning?models=gpt-oss-120b%2Cgpt-oss-20b%2Cllama-4-maverick%2Cclaude-fable-5%2Cclaude-sonnet-4-6-adaptive%2Cclaude-opus-4-8%2Cmistral-large-3%2Cfalcon-h1r-7b%2Cnova-2-0-pro-reasoning-medium%2Cminimax-m2-7%2Cminimax-m3%2Cnvidia-nemotron-3-nano-30b-a3b-reasoning%2Ckimi-k2-6%2Ckimi-k2-7-code%2Ckat-coder-pro-v1%2Ck2-think-v2%2Cmi-dm-k-2-5-pro-dec28%2Chyperclova-x-seed-think-32b%2Cglm-5-1%2Cglm-5-2%2Cqwen3-7-max%2Cqwen3-7-plus%2Cqwen3-6-35b-a3b%2Cgpt-5-mini%2Cgpt-5-2-codex%2Cgpt-5-1%2Cgpt-5-2%2Cgpt-5-1-non-reasoning%2Cgemini-3-pro%2Cgemini-3-flash-reasoning%2Cdeepseek-v3-2-reasoning%2Cgrok-4-1-fast-reasoning%2Ckimi-k2-5%2Ckimi-k2-thinking%2Cmimo-v2-flash-reasoning%2Cglm-5%2Cqwen3-235b-a22b-instruct-2507-reasoning%2Cqwen3-32b-instruct-reasoning#artificial-analysis-long-context-reasoning-benchmark-leaderboard-score