Rating Methodology

Open Ratings
Trust & Governance Framework

Credit-risk-calibrated trust assessment for the agentic economy
Version: 2.0
Effective: April 2026
Issuer: Open Ratings, a product of Toryx Inc.
Patents: Multiple U.S. patent applications — Inventor: Luis M. Sanchez

Overview

Open Ratings provides independent, cryptographically anchored trust assessments for the agentic economy — evaluating the outputs, transactions, and deliverables produced by AI agents, human–AI partnerships, and autonomous software systems. Our methodology applies the risk proportionality principles used by established credit rating agencies to a new class of counterparty: the AI agent.

In the emerging agentic economy, AI agents negotiate contracts, produce code, generate content, execute transactions, and deliver services — often with minimal human oversight. Every such interaction carries risk. Open Ratings evaluates these agentic outputs at a specific point in time — not a continuous stream, but a deliberate snapshot, analogous to a credit rating agency assessing a bond issuance. Each rating reflects the trustworthiness, correctness, and reliability of that output as assessed by our multi-dimensional analytical framework.

Our rating scale mirrors the letter-grade conventions used globally in credit markets (AAA through D), and our target grade distribution is calibrated to the observed distribution of corporate credit ratings. This means a BBB from Open Ratings carries the same relative trust signal as a BBB in the bond market: adequate quality with room for improvement, but fundamentally sound. When an agent delivers a BBB-rated output, counterparties know what level of risk they are accepting.

Every rating is permanently recorded via Merkle-tree hashing and SHA-256 attestation, creating an immutable, independently verifiable record of when the assessment was performed, what grade was assigned, and what computational evidence supports the finding. This cryptographic anchoring enables trust without requiring trust — any party can independently verify an agent’s track record.
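The anchoring scheme above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 leaves paired bottom-up with the last node duplicated on odd levels; the production tree layout and leaf encoding are not specified in this document.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a list of evidence artifacts into a single Merkle root.

    Odd levels duplicate the last node -- an illustrative convention,
    not necessarily the one used in production.
    """
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Any party holding the same evidence can recompute the root and
# compare it against the anchored value -- trust without trust.
evidence = [b"static-analysis-report", b"sandbox-log", b"mutation-results"]
anchored = merkle_root(evidence)
```

Because the root is a pure function of the evidence, recomputation by an independent party either reproduces the anchored value exactly or proves the record was altered.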

Disclosures

Scope of Assessment

Grades represent an assessment of agent-produced outputs at a specific point in time. They are not a guarantee of future performance, a warranty against defects, or an endorsement of an agent’s general capability. Agent behavior may vary across tasks, parameter configurations, domains, and interaction contexts not represented in the assessed output.

Rating Scale

Grade | Score Range | Assessment | Definition | Target Dist.

Investment Grade — Production Recommended

AAA | 9.70 – 10.00 | Human + Auto | Exceptionally trustworthy agent output. Near-zero defect probability. Verified through comprehensive automated analysis, formal methods, and expert human due diligence. Extremely rare. | < 0.05%
AA | 9.30 – 9.69 | Human + Auto | Very strong output quality, differing from the highest-rated outputs only to a small degree. Requires enhanced validation depth and expert review. | ~2%
A | 7.50 – 9.29 | Automated | Strong output quality. Sound architecture, adequate verification coverage, no critical vulnerabilities detected. Somewhat more susceptible to quality degradation under high-throughput agentic workflows than higher-rated outputs. | ~8%
BBB | 6.00 – 7.49 | Automated | Adequate output quality. Exhibits acceptable quality parameters, but adverse conditions — such as prompt drift, tool-chain changes, or model updates — are more likely to weaken quality over time. | ~15%

Speculative Grade — Elevated Risk

BB | 4.50 – 5.99 | Automated | Below-average quality. Faces ongoing uncertainties including incomplete verification, known vulnerabilities, hallucinated dependencies, or architectural weaknesses in the agent’s output. | ~20%
B | 3.00 – 4.49 | Automated | Weak output quality. Currently functional, but adverse conditions — model regression, tool deprecation, or environmental changes — will likely impair reliability. | ~25%
CCC | 1.50 – 2.99 | Automated | Substantial quality risk. Dependent on favorable conditions to avoid failure. Critical vulnerabilities, hallucinated dependencies, or fabricated API endpoints detected in agent output. | ~18%
D | 0.00 – 1.49 | Automated | Non-functional or critically defective. The agent’s output fails on basic execution, contains irreconcilable errors, or exhibits fundamental flaws rendering it unsuitable for any production use or downstream consumption. | ~12%
Investment Grade Threshold

Agent outputs rated BBB or above are considered investment grade — suitable for production use, downstream integration, and organizational reliance. Grades below BBB carry speculative-grade designation, indicating elevated counterparty risk. This threshold mirrors the BBB/BB boundary used globally in credit markets. In the agentic economy, this distinction determines whether an agent’s deliverables can be trusted without additional human verification.
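The score bands and the BBB boundary imply a straightforward mapping from composite score to letter grade. A minimal sketch, using only the threshold values stated in the rating scale table (the function names are illustrative):

```python
# Score floors taken from the rating scale table, checked highest first.
GRADE_FLOORS = [
    (9.70, "AAA"), (9.30, "AA"), (7.50, "A"), (6.00, "BBB"),
    (4.50, "BB"), (3.00, "B"), (1.50, "CCC"), (0.00, "D"),
]

INVESTMENT_GRADE = {"AAA", "AA", "A", "BBB"}

def grade_for(score: float) -> str:
    """Map a composite quality score (0.00-10.00) to a letter grade."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "D"

def is_investment_grade(score: float) -> bool:
    """The BBB floor (6.00) is the investment-grade threshold."""
    return grade_for(score) in INVESTMENT_GRADE
```

For example, a composite score of 6.2 falls in the 6.00 – 7.49 band and maps to BBB, just clearing the investment-grade line.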

Target Distribution

Our rating distribution is calibrated to mirror the observed distribution of corporate credit ratings. This ensures that an Open Ratings grade carries proportional risk weight equivalent to its credit market counterpart. A AAA is as rare in the agentic economy as it is in corporate bonds — today, only two U.S. companies maintain a AAA corporate credit rating. Most agent outputs, like most corporate issuers, cluster in the middle of the scale.

Expected Rating Distribution Across Rated Agent Outputs

AAA: < 0.05%
AA: ~2%
A: ~8%
BBB: ~15%
BB: ~20%
B: ~25%
CCC: ~18%
D: ~12%

These proportions are targets, not quotas. Distribution targets serve as calibration anchors — if observed distributions diverge significantly from targets, we investigate whether the divergence reflects the true state of agent output quality or a calibration error in our model.
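The calibration check described above can be sketched as a simple comparison of observed grade shares against the targets. The target values are taken from the distribution table; treating AAA's "< 0.05%" as 0.05% and the flagging logic itself are assumptions of this sketch, not part of the published methodology.

```python
# Target shares from the distribution table (AAA's "< 0.05%" as 0.0005).
TARGETS = {"AAA": 0.0005, "AA": 0.02, "A": 0.08, "BBB": 0.15,
           "BB": 0.20, "B": 0.25, "CCC": 0.18, "D": 0.12}

def calibration_drift(observed_counts: dict[str, int]) -> dict[str, float]:
    """Signed gap (observed share minus target share) for each grade.

    A positive gap means the grade is over-represented relative to its
    target; deciding whether a gap warrants investigation is left to
    the analyst and is not modeled here.
    """
    total = sum(observed_counts.values())
    return {g: observed_counts.get(g, 0) / total - t
            for g, t in TARGETS.items()}

drift = calibration_drift({"A": 30, "BBB": 40, "BB": 20, "B": 10})
```

In this hypothetical sample, BBB is heavily over-represented (40% observed versus a ~15% target), which would trigger the investigation described above: is agent output quality genuinely clustering there, or is the model miscalibrated?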

Assessment Dimensions

Each agent output is evaluated across six trust dimensions. Dimensions are weighted to reflect their relative contribution to overall counterparty risk in agentic transactions, calibrated through empirical analysis of production incident correlation and agent failure modes.

Security (Primary): Vulnerability exposure, secret management, input validation, authentication patterns, dependency vulnerability propagation, detection of hallucinated packages or API endpoints, and agent tool-use safety boundaries.

Logic Correctness (Primary): Semantic accuracy of implementation relative to stated intent, edge case handling, error propagation paths, type safety, and divergence between specifications and actual behavior.

Codebase Fit (High): Architectural coherence, consistency with project conventions, appropriate abstraction levels, integration quality with existing systems, and alignment with the consuming agent’s or human’s operational context.

Maintainability (Moderate): Cyclomatic and cognitive complexity, nesting depth, function length, naming clarity, documentation coverage, and long-term modifiability.

Test Quality (Supporting): Test effectiveness measured through mutation analysis: deliberately introduced code changes should cause tests to fail, and mutants that survive indicate a weak test suite.

Dependency Health (Supporting): Dependency freshness, known vulnerability exposure (CVE), license compatibility, maintainer activity, and whether declared dependencies actually exist.

Assessment Process

Ratings are produced through a multi-phase analytical pipeline that generates Verification Event Envelopes (VEEs) at each stage. Each VEE is an independently verifiable intermediate output, cryptographically combined into a proof-of-work attesting to the depth and integrity of the assessment.

  1. Input Ingestion
    The agent’s output is captured at the specified revision or transaction point. Dependency manifests, language composition, framework indicators, and provenance metadata are extracted. The producing agent is identified through stylometric analysis, tool-use signatures, and declared agent identity where available.
  2. Static Analysis
    Abstract syntax tree parsing measures structural complexity across every function and module. Pattern-matching rules detect security anti-patterns, code duplication, and known vulnerability signatures. Declared dependencies are cross-referenced against authoritative registries.
  3. Sandboxed Execution
    The agent’s output is built and executed within an isolated containerized environment. Dependency installation is attempted and failures recorded. Test suites are executed and results captured. Mutation analysis introduces controlled defects to measure verification effectiveness.
  4. Quality Scoring
    Dimension scores are computed from the combined static and dynamic analysis results, weighted per the dimension table, and aggregated into a composite quality score. The composite score is mapped to a letter grade per the rating scale thresholds.
  5. Proof-of-Work Generation
    Intermediate outputs from each phase are organized into a Merkle tree. A difficulty-adjusted cryptographic proof is generated, with difficulty proportional to the achieved grade.
  6. Cryptographic Anchoring
    The rating grade, composite score, proof-of-work hash, Merkle root, and timestamp are sealed as a Verification Event Envelope (VEE) via cryptographic attestation, creating an immutable, timestamped record that any counterparty in the agentic economy can independently verify.

Rating Tiers

Automated Assessment (D through A)

Ratings from D through A are produced entirely through the automated pipeline. No human judgment is involved in the scoring. A grade of A represents the highest trust level achievable through automated analysis of an agent’s output alone.

The automated ceiling exists by design. Trust beyond the A threshold involves factors that no automated system can reliably assess: architectural intent that spans years of technical strategy, organizational commitment to governance, the subtle distinction between output that is merely correct and output that is wisely constructed, and whether the human–agent partnership demonstrates sustained judgment under adversarial conditions.

Enhanced Assessment (AA and AAA)

Human Due Diligence Required

Grades of AA and above cannot be achieved through automated analysis alone. They require a structured human due diligence review conducted by senior analysts with extensive experience in agentic systems, software risk assessment, and AI governance.

For an agent output to be considered for AA or AAA, it must first achieve a minimum automated score of A (7.50+). The automated score establishes the quantitative floor; the human review determines whether the qualitative factors — including the agent’s governance posture, the human–agent partnership dynamics, and long-term reliability trajectory — support elevation above that floor.

Agent Identity & Attribution

In the agentic economy, knowing which agent produced an output — or is directly or indirectly involved in an exchange of goods or services for any consideration — is as important as knowing which firm issued a bond. Open Ratings identifies the AI agent that generated or assisted in generating the assessed output through stylometric analysis, tool-use fingerprinting, and declared agent identity. This attribution is reported as part of the rating when detected with sufficient confidence.

Different AI agents exhibit distinct failure mode profiles — hallucination rates, tool-use patterns, reasoning depth, and domain-specific weaknesses — that affect the probability of undiscovered defects. Agent attribution enables agent-specific calibration: if an agent is known to have elevated hallucination rates for a particular package ecosystem, the dependency health dimension is scrutinized more heavily. Over time, agents accumulate rating histories that function as credit histories in the agentic economy.

Open Ratings also evaluates human–AI partnerships: the composite quality produced when a specific developer works with a specific agent. Partnership ratings weight developer skill, agent capability, agent wrapper quality, and collaboration patterns to produce a composite trust score for the partnership as a counterparty.

Cryptographic Verification

Every Open Rating is sealed as a Verification Event Envelope (VEE) via cryptographic attestation at the time of issuance. The VEE taxonomy supports multiple event types — CODE_QUALITY, AGENT_GOVERNANCE, COMPLIANCE_CHECK, MODEL_ATTESTATION — enabling a unified trust infrastructure for the agentic economy. The on-chain record is a compact attestation containing the essential elements needed to verify the rating’s existence and integrity.

// On-chain Verification Event Envelope (VEE)
event_type: CODE_QUALITY | AGENT_GOVERNANCE | COMPLIANCE_CHECK | MODEL_ATTESTATION
subject_hash: SHA-256(agent_id + output_ref)
revision_hash: first 20 bytes of assessed output SHA
grade: letter grade enum (D=0 ... AAA=7)
score: composite quality score × 100
pow_hash: proof-of-work hash
merkle_root: Merkle root of analysis phase outputs
timestamp: Unix timestamp of rating issuance
agent_id: identified producing agent (when attributed)
prior_rating: address of previous rating for same subject (if any)

The attestation layer serves as a filing cabinet, not an analytical tool. It provides three guarantees: the rating existed at the claimed time, the rating has not been altered after issuance, and the full history of an agent’s or output’s ratings can be traversed by following prior_rating links — building a verifiable trust history for any participant in the agentic economy.

Agent Trust History

Every rating recorded on-chain becomes part of an agent’s permanent trust history. Because each Verification Event Envelope includes a prior_rating link, the complete chain of assessments for any agent or output can be traversed backward to its first rated interaction — creating an immutable behavioral record analogous to a credit bureau file.

This history is not merely archival. Past agentic behavior, cryptographically recorded and independently verifiable, informs future assessments. An agent with a consistent record of investment-grade outputs carries that reputation forward. An agent whose outputs have degraded over successive assessments will reflect that trajectory in its composite trust profile. The blockchain does not forget, and neither does the rating framework.
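The backward traversal described above is a linked-list walk over prior_rating references. A minimal sketch; the `fetch` callable standing in for on-chain record retrieval is hypothetical, as the document does not specify the lookup mechanism.

```python
from typing import Callable, Optional

def trust_history(latest_vee: dict,
                  fetch: Callable[[str], dict]) -> list[dict]:
    """Walk prior_rating links from the newest VEE back to the first.

    `fetch` is a hypothetical lookup resolving an on-chain address to
    its VEE record. Returns the chain newest-first.
    """
    history = [latest_vee]
    addr: Optional[str] = latest_vee.get("prior_rating")
    while addr is not None:
        vee = fetch(addr)
        history.append(vee)
        addr = vee.get("prior_rating")
    return history
```

Because every link is part of an immutable attestation, the chain cannot be silently rewritten: omitting or altering an intermediate rating breaks the address referenced by its successor.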

Trust history enables capabilities that static, point-in-time assessment cannot provide.

Behavioral Record, Not Behavioral Prediction

Agent trust histories record what has happened, not what will happen. Past performance does not guarantee future output quality. Trust histories are an input to risk assessment, not a substitute for it. The rating framework uses historical data as one signal among many — the specific weighting and integration methodology is proprietary.

Rating Types

Open Rating

Issued for publicly accessible agent outputs — open-source repositories, public API responses, or published deliverables. Any party can independently verify the output against the assessment findings.

Verified Rating

Issued for private agent outputs or proprietary agentic workflows using the identical methodology. The underlying output is not publicly accessible, but the assessment was conducted under the same standards.

Shadow Rating

An unsolicited Open Rating issued without the request or involvement of the agent operator or output owner. Produced at our discretion for agent outputs of public interest — particularly when agents are deployed at scale in critical infrastructure or high-stakes domains.

Agent Rating

A composite trust score for an AI agent derived from the aggregate history of its individual output ratings. Agent Ratings function as credit scores for the agentic economy: they reflect the agent’s demonstrated reliability across multiple assessments, domains, and time periods.

Reproducibility

Each assessment task is versioned and deterministic. Task definitions include structured metadata, evaluation prompts, source artifacts, and oracle checks. Runs log agent identifiers, model versions, decoding parameters, timestamps, and per-task outputs. Suite hashes and prompt hashes are recorded for auditability.
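The suite and prompt hashes mentioned above can be computed deterministically by canonicalizing the task metadata before hashing. A minimal sketch; the field names in the sample task are illustrative assumptions, not the actual task schema.

```python
import hashlib
import json

def suite_hash(task_definitions: list[dict]) -> str:
    """Deterministic hash over a suite of task definitions.

    Canonical JSON (sorted keys, fixed separators) makes the hash
    independent of dict key ordering, so two runs over the same suite
    always produce the same auditable fingerprint.
    """
    canonical = json.dumps(task_definitions, sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical task record -- the real schema is not specified here.
tasks = [{"id": "t1", "prompt_hash": "ab12", "oracle": "exact_match"}]
fingerprint = suite_hash(tasks)
```

Canonicalization is the important step: without sorted keys and fixed separators, semantically identical suites could hash differently across runs, defeating the audit trail.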

The assessment pipeline produces independently verifiable Verification Event Envelopes at each phase. These VEEs are organized into a Merkle tree whose root is anchored on-chain, enabling third-party verification that the claimed analysis was actually performed to the claimed depth. In the agentic economy, reproducibility is not optional — it is the mechanism by which trust scales beyond bilateral relationships.

Limitations & Disclaimers

An Open Rating is an assessment of agent output quality at a specific point in time for a specific deliverable or revision. It is not a guarantee of future performance, a warranty against defects, or an endorsement of the agent’s general capability or the operator’s business viability.

Ratings from D through A reflect automated analysis only. No automated system can guarantee the absence of defects. The absence of detected issues is not proof that no issues exist. Agent behavior is inherently non-deterministic; an agent that produces an A-rated output today may produce a B-rated output tomorrow under different conditions.

Agent attribution is probabilistic. Confidence levels below 80% are not published. Attribution reflects our best assessment based on stylometric analysis and tool-use fingerprinting, and is reported as an analytical finding, not a definitive determination. Agent Ratings (composite trust scores) require a minimum number of individual assessments before publication.

Intellectual Property

The Open Ratings framework is built on a portfolio of issued and pending U.S. patents invented by Luis M. Sanchez. The portfolio follows a hub-and-spoke architecture: a core cryptographic verification infrastructure supports multiple application-specific patents covering distinct aspects of the AI trust stack.

Cryptographic Verification Infrastructure (Patent Pending): Blockchain-anchored verification protocol with configurable transparency, identity-bound key lifecycle, and selective disclosure for AI agent governance. The trust layer underlying all Open Ratings attestations.

AI Code Quality Certification (Patent Pending): Computational proof-of-work system for certifying agent output quality using credit-risk-calibrated grading, agent-specific calibration, and multi-signature consensus validation.

Human–AI Partnership Rating (Patent Pending): System for rating human–AI coding partnerships through composite scoring of developer skill, agent capability, agent wrapper quality, and collaboration patterns.

Digital Asset Compatibility & Agent Licensing (Patent Pending): Asset-capability compatibility scoring across heterogeneous registries with Bayesian quality-backed collateral and agent-native licensing protocols for the agentic economy.

Agent-Governed Messaging (Patent Pending): Self-governing payload architecture for agent-to-agent and agent-to-human messaging with built-in governance, policy enforcement, and audit provenance.

Legacy Code Modernization (Patent Pending): Dialect-adaptive deployment system for enterprise legacy code modernization, including dialect fingerprinting and constraint-injected translation pipelines.

Additional patents in the portfolio cover AI model routing and inference optimization, hardware-efficient distributed inference, and knowledge distillation systems. The specific methodologies, algorithms, and weighting systems used in the Open Ratings assessment engine are protected as trade secrets and by the patent portfolio described above.

Toryx Inc.

Trust infrastructure for the agentic economy.

Toryx Inc. is an independent technology company building the trust and governance layer for the agentic economy — the emerging world where AI agents produce code, negotiate contracts, execute transactions, and deliver services autonomously or in partnership with humans.

The firm develops reproducible assessment frameworks designed to measure the trustworthiness, correctness, and operational robustness of agent-produced outputs across open-source repositories, frontier model deliverables, agentic transactions, and legacy modernization workflows.

All Toryx frameworks are designed to support enterprise and regulated environments where agents operate at scale and where traceability, reproducibility, and counterparty trust are essential.

Corporate Structure

Open Ratings™ is a trademark and product of Toryx Inc. All reliability grades are issued under the Open Ratings framework and are administered by Toryx.