
Agent Platform Scorecard
Twelve dimensions. Seven providers. Re-graded as announcements land — with the market's reaction alongside.
Twelve dimensions, equal weight, 1–10, no hidden math. How the grades are set, how they move when announcements land, and why the market reaction is part of the story.
The Agent Platform Scorecard grades the major AI platform providers on a 1–10 scale across twelve dimensions. This page explains exactly how those numbers are produced, how they change, and what they are — and aren't.
Every provider is graded on the same twelve dimensions, grouped into four themes plus a north star:
The overall score is a simple equal-weight average of all twelve. There are no hidden weights and no secret multipliers. If you think a dimension should count for more, you have everything you need to compute your own weighting — the per-cell scores are all public.
These are editorial grades — a considered, opinionated read of the public record. They are not a benchmark, and they are not a measurement. A 9 means "clear leader, hard to beat right now." A 5 means "real but developing." A 2 means "essentially absent." Each cell carries a one-sentence rationale; hover any cell on the scorecard to read it.
We grade what has shipped and is verifiable, not roadmaps and not demos. A keynote slide is a reason to watch a dimension, not to move it. Capability is earned, not asserted — the same principle the rest of Myagi is built on.
When an announcement lands — a keynote, a model launch, a major feature — we do two things at once:
from → to delta and a plain-English read of what the announcement means from an agent and control-plane perspective.Because the change log is the single source of history, the scorecard can show you not just where a provider stands but how it got there — including which keynote moved which dimension.
When a public company makes an AI announcement, investors vote in real time. A keynote that moves AAPL, GOOGL, MSFT, or META is telling you something our editorial grade can't: whether the people with money on the line believe it matters. So during live events we capture the market reaction alongside the score delta — a second opinion, not a substitute for one. (OpenAI, Anthropic, and xAI are private; where a public proxy exists, we note it.)
Grades are opinions, and opinions can be wrong. When new information contradicts a score, we change it and date the change. The data behind this scorecard is versioned; nothing is quietly edited away.