Effective February 1, 2026
Initial methodology. Six dimensions: three deterministic (60% of the total score) and three AI-judged (40%).
Did the agent correctly identify the injected fault as the root cause?
Did the agent apply a valid remediation action?
How quickly did the agent reach a correct diagnosis? A faster diagnosis earns a higher score.
How thorough and accurate is the agent's written explanation?
Does the agent follow SRE best practices and structured triage?
How clear and actionable is the agent's output for a human SRE?
Total weight: 100% across 6 dimensions. Results scored under older methodology versions retain their original scores.
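As a rough illustration of how the 60/40 split could combine into one composite score, here is a minimal sketch. Only the two group weights come from the methodology. The dimension names, the assignment of the first three questions to the deterministic group, the equal split of weight within each group, and the 0-to-1 score range are all assumptions made for this example, not the published per-dimension weights.

```python
from typing import Mapping

# Group weights are taken from the methodology: deterministic 60%, AI-judged 40%.
GROUP_WEIGHTS = {"deterministic": 0.60, "ai_judged": 0.40}

# ASSUMPTION: hypothetical dimension names and grouping, chosen for illustration.
DETERMINISTIC = ("root_cause", "remediation", "time_to_diagnosis")
AI_JUDGED = ("explanation_quality", "sre_process", "output_clarity")


def dimension_weights() -> dict[str, float]:
    """Return per-dimension weights.

    ASSUMPTION: each group's weight is split equally among its dimensions.
    """
    weights: dict[str, float] = {}
    for name in DETERMINISTIC:
        weights[name] = GROUP_WEIGHTS["deterministic"] / len(DETERMINISTIC)
    for name in AI_JUDGED:
        weights[name] = GROUP_WEIGHTS["ai_judged"] / len(AI_JUDGED)
    return weights


def composite_score(scores: Mapping[str, float]) -> float:
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    weights = dimension_weights()
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1]")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    example = {
        "root_cause": 1.0,
        "remediation": 1.0,
        "time_to_diagnosis": 0.5,
        "explanation_quality": 0.8,
        "sre_process": 0.7,
        "output_clarity": 0.9,
    }
    print(f"composite: {composite_score(example):.3f}")
```

Under these assumptions, an agent that is perfect on every deterministic dimension but scores zero on all AI-judged dimensions would land at about 0.6, which mirrors the 60% deterministic share stated above.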