Agent Platform Scorecard

Who is building the agent operating system?

Twelve dimensions across seven providers. Re-graded as announcements land, with the market's reaction shown alongside.

How We Score: The Agent Platform Scorecard Methodology

Twelve dimensions, equal weight, 1 to 10, no hidden math. How the grades are set, how they move when announcements land, and why the market reaction is part of the story.

Myagi · June 1, 2026 · 5 min read

The Agent Platform Scorecard grades the major AI platform providers on a 1 to 10 scale across twelve dimensions. This page explains exactly how those numbers are produced, how they change, and what they are and aren't.

The rubric

Every provider is graded on the same twelve dimensions, grouped into four themes plus a north star:

Intelligence: Assistant Intelligence, Agent Capability, Conversational UX
Orchestration: Cross-App Actions, Personal Context, Third-Party Integrations
Platform: Developer APIs, Model Strategy, Agent Platform Potential
Trust & Reach: On-Device AI, Enterprise Readiness
North Star: MyAGI Alignment (how close the provider is to a true agent operating system)

The overall score is a simple equal-weight average of all twelve dimensions. There are no hidden weights and no secret multipliers. If you think a dimension should count for more, you can compute your own weighting, because every per-cell score is public.
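
As a minimal sketch of that arithmetic, the snippet below averages twelve cell scores with equal weight and then recomputes the result with a custom weighting. The score values are made up for illustration and do not correspond to any provider's real grades, and the function names `overall` and `reweighted` are hypothetical.

```python
from statistics import fmean

# Illustrative values only; these are NOT real grades for any provider.
scores = {
    "Assistant Intelligence": 8,
    "Agent Capability": 7,
    "Conversational UX": 8,
    "Cross-App Actions": 5,
    "Personal Context": 6,
    "Third-Party Integrations": 6,
    "Developer APIs": 7,
    "Model Strategy": 8,
    "Agent Platform Potential": 7,
    "On-Device AI": 4,
    "Enterprise Readiness": 6,
    "MyAGI Alignment": 6,
}


def overall(cell_scores):
    """Equal-weight average of all dimensions."""
    return fmean(cell_scores.values())


def reweighted(cell_scores, weights):
    """Weighted average; dimensions missing from `weights` count as 1.0."""
    total_weight = sum(weights.get(dim, 1.0) for dim in cell_scores)
    weighted_sum = sum(s * weights.get(dim, 1.0) for dim, s in cell_scores.items())
    return weighted_sum / total_weight


print(round(overall(scores), 2))                                   # equal weight
print(round(reweighted(scores, {"Agent Capability": 2.0}), 2))    # Agent Capability counts double
```

Passing an empty weights mapping to `reweighted` reproduces the equal-weight result, which makes it easy to check a custom weighting against the published overall score.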

What the scores are

These are editorial grades: a considered, opinionated read of the public record. They are not a benchmark, and they are not a measurement. A 9 means "clear leader, hard to beat right now." A 5 means "real but developing." A 2 means "essentially absent." Each cell carries a one-sentence rationale; hover over any cell on the scorecard to read it.

We grade what has shipped and is verifiable, not roadmaps and not demos. A keynote slide is a reason to watch a dimension, not to move it. Capability is earned, not asserted, which is the same principle the rest of Myagi is built on.

How scores move

When an announcement lands, whether a keynote, a model launch, or a major feature, we do two things at once:

Re-grade the cell in the live data, updating the score with a fresh rationale and date.
Log the change as an entry in that event's live feed, recording the exact from -> to delta and a plain-English read of what the announcement means from an agent and control-plane perspective.

Because the change log is the single source of history, the scorecard can show you not just where a provider stands but how it got there, including which keynote moved which dimension.
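
One way to picture a change log that serves as the single source of history is an append-only list of from -> to entries that can be replayed over a baseline. The sketch below assumes a record shape and field names (`ScoreChange`, `changed_on`, `event`, and so on) that are hypothetical; the scorecard's actual data format is not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoreChange:
    """One hypothetical change-log entry for a single scorecard cell."""
    provider: str
    dimension: str
    changed_on: date
    from_score: int
    to_score: int
    event: str       # e.g. the keynote or launch that triggered the re-grade
    rationale: str   # plain-English read of what the announcement means

    @property
    def delta(self):
        return self.to_score - self.from_score


def current_scores(baseline, log):
    """Replay the log over baseline {(provider, dimension): score} in date order."""
    scores = dict(baseline)
    for change in sorted(log, key=lambda c: c.changed_on):
        key = (change.provider, change.dimension)
        # Each entry must start from the score the previous entry left behind.
        if scores.get(key) != change.from_score:
            raise ValueError(f"log entry does not match current score for {key}")
        scores[key] = change.to_score
    return scores


def history(log, provider, dimension):
    """All changes to one cell, oldest first: how the provider got where it is."""
    return [
        c for c in sorted(log, key=lambda c: c.changed_on)
        if c.provider == provider and c.dimension == dimension
    ]
```

Because every entry records both ends of the move, replaying the log both rebuilds the current grid and detects a history that has been edited inconsistently.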

Why the market is part of the story

When a public company makes an AI announcement, investors vote in real time. A keynote that moves AAPL, GOOGL, MSFT, or META is telling you something our editorial grade can't: whether the people with money on the line believe it matters. So during live events we capture the market reaction alongside the score delta, as a second opinion rather than a substitute for the grade. (OpenAI, Anthropic, and xAI are private; where a public proxy exists, we note it.)
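
As a rough illustration of pairing the two signals, the sketch below stores a score delta next to a percentage price move for the same event. How the market reaction is actually measured is not specified here, so the before/after price comparison, the `EventReaction` record, and all example values are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventReaction:
    """Hypothetical record pairing an editorial score delta with a price move."""
    provider: str
    ticker: Optional[str]            # None for private companies with no proxy
    score_delta: int                 # editorial from -> to change for the event
    price_before: Optional[float]    # assumed reference price before the event
    price_after: Optional[float]     # assumed price after the event

    @property
    def pct_move(self) -> Optional[float]:
        """Percentage price change, or None when no market data applies."""
        if self.ticker is None or self.price_before is None or self.price_after is None:
            return None
        return (self.price_after - self.price_before) / self.price_before * 100.0


# Made-up numbers: a public provider and a private one.
public = EventReaction("ProviderA", "TICK", score_delta=1, price_before=200.0, price_after=203.0)
private = EventReaction("ProviderB", None, score_delta=2, price_before=None, price_after=None)

for r in (public, private):
    move = "n/a" if r.pct_move is None else f"{r.pct_move:+.2f}%"
    print(f"{r.provider}: score {r.score_delta:+d}, market {move}")
```

Keeping the two fields separate reflects the point above: the price move sits beside the grade as a second reading and never feeds into the score itself.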

Corrections

Grades are opinions, and opinions can be wrong. When new information contradicts a score, we change it and date the change. The data behind this scorecard is versioned; nothing is quietly edited away.