Artax

Benchmark methodology

Benchmark definitions, without benchmark theater.

This page explains Artax benchmark methodology and the benchmark evidence bundles currently available. It does not publish superiority claims, competitive scorecards, or production-grade benchmark results.

Methodology only; No superiority claims; Registry-derived

Benchmark definitions are public, and evidence stays narrow on purpose.

This page shows the benchmark families and minimum metrics Artax uses, plus the narrow links between public trust metrics and those definitions. It also shows where direct benchmark evidence exists, where only supporting artifacts exist, and where a benchmark is definition-only. The goal is simple: make methodology visible, keep partial evidence visible, and avoid turning incomplete evidence into superiority claims.

Benchmark families

Benchmark dimensions Artax can measure and publish when evidence exists.

Minimum metrics

Definition-level benchmark metrics that exist before any benchmark result claims are allowed.

Evidence bundles

Current benchmark evidence bundles spanning implemented, supporting-only, and definition-only coverage.

Public linkages

Current public-safe metrics already tied back to explicit benchmark definitions.

Snapshot sections

Generated benchmark proxy sections derived from current evidence.

What this page can claim

Current honesty boundary

Published benchmark claims

Not allowed

Benchmark pipelines

Implemented

Narrow evidence bundles

1 implemented, 2 supporting-only, 0 definition-only.

Freshness warnings

3 benchmark evidence bundles need review.

Regression thresholds

5 of 5 checked-in regression rules are currently passing.

What is live

The public surface is a narrow evidence layer for the transfer completion path, supporting artifacts for broader benchmark families, and a generated benchmark snapshot. It is not a competitive scorecard or a superiority claim.

Generated artifacts

Artax publishes benchmark definitions, benchmark evidence, and a generated snapshot from the same inputs so people can inspect the basis for each claim.

Why this exists

Artax may publish narrow repo-scoped benchmark evidence bundles plus generated internal benchmark snapshots that summarize current evidence coverage and measurable proxy sections. These artifacts do not authorize competitor scorecards, benchmark wins, or production certification claims.

Current benchmark evidence bundles

These bundles show the evidence behind the benchmark claims Artax makes publicly. One currently has direct evidence, two have only supporting artifacts, and none is definition-only. This page keeps those levels separate instead of flattening them into one vague story.

benchmark_evidence.transfer_completion_repo_scope

Transfer completion evidence bundle

Status

Implemented Narrow Repo Evidence (review overdue)

Sample scope

Current transfer-first Artax rail only, using live devnet rehearsal plus delayed public-safe transfer submit success and verified submission-path coverage.

Benchmark families

No-SOL completion success (Definition Only)

Minimum metrics

Supported no-SOL completion success rate (Definition Only Not Collected As Benchmark, Ratio)

Public metrics

supported_flow_submit_success_rate: Delayed success rate for the current supported transfer-rail submissions within a reviewed public-safe publication window; public_verified_submission_path_coverage: Share of implemented submission-capable flows whose submission route has strict local and devnet verification evidence in the current repo-scoped release-evidence model.

Release evidence

evidence.devnet.live_verification: Live devnet verification workflow

Launch gates

gate.flow.spl_token_transfer_review_and_submit: Local And Devnet Verified Not Production; gate.flow.transfer_sponsorship_class_d_bounded_principal: Local And Devnet Verified Not Production

Exclusions and caveats

Simple swap review remains outside submit-success benchmark evidence because submit sponsorship is intentionally unsupported there. This bundle is narrower than production reliability, broader category leadership, or competitor scorecard proof.

Current blockers

Production compatibility certification evidence does not exist yet. The delayed public metric is threshold-gated and may be withheld for low-volume windows.

benchmark_evidence.review_submit_and_account_surface_repo_scope

Review, submit, and account-surface evidence bundle

Status

Supporting Evidence Only (review overdue)

Sample scope

Current hosted review, developers, and governed account surfaces only.

Benchmark families

User understanding quality (Definition Only); Integration effort (Definition Only); Compatibility breadth (Definition Only)

Minimum metrics

Average required user prompts by supported flow (Definition Only Not Collected As Benchmark, Count); Average builder integration time for documented starter paths (Definition Only Not Collected As Benchmark, Hours); Compatibility success rate by wallet and surface (Definition Only Not Collected As Benchmark, Ratio)

Public metrics

public_supported_flow_coverage: Current supported-flow coverage that is approved for public trust publication.

Release evidence

None

Launch gates

gate.surface.sdk_package: Implemented Not Production Certified; gate.flow.simple_swap_review: Local And Devnet Review Only Not Production

Exclusions and caveats

Compatibility matrix truth is not the same thing as compatibility benchmark evidence or certification. The current snapshot for this bundle is a certification-evidence coverage proxy, not a direct compatibility success-rate benchmark result. No recurring builder-integration-time or required-user-prompt benchmark job exists yet.

Current blockers

Compatibility certification evidence is still missing. Builder integration-time and user-prompt metrics are not collected as benchmark artifacts. Current proxy coverage does not replace direct compatibility success-rate benchmarking for these metrics.

benchmark_evidence.support_truth_and_operational_unification_repo_scope

Support-truth and operational evidence bundle

Status

Supporting Evidence Only (review overdue)

Sample scope

Repo and localhost operational-support evidence only.

Benchmark families

Support burden (Definition Only); Operational resilience (Definition Only); Denial precision (Definition Only)

Minimum metrics

Support cases per 1,000 sponsored transactions (Definition Only Not Collected As Benchmark, Count Per 1000); Replay or duplicate prevented events (Definition Only Not Collected As Benchmark, Count); Sponsorship denial rate by category (Definition Only Not Collected As Benchmark, Ratio)

Public metrics

None

Release evidence

evidence.repo.provenance_baseline_generation: Generated provenance baseline for build and rollout inputs; evidence.local.production_like_rehearsal: Local production-like Docker rehearsal

Launch gates

gate.release_target.render_staging_rollout: Staging Rollout Gated Not Production

Exclusions and caveats

There is no aggregated support-case benchmark dataset yet. Replay-prevention and denial-precision benchmark outputs are not published as recurring benchmark artifacts.

Current blockers

Support-burden and denial-precision benchmark metrics are not collected yet. Operational evidence exists, but it is still narrower than production-grade resilience benchmarking.
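The coverage and freshness summaries near the top of this page can be reproduced from the three bundle status labels listed above. The sketch below is only an illustration: the bundle identifiers and status labels are copied from this page, while the `BUNDLE_STATUS` mapping, the `split_status` helper, and the idea of parsing the "(review overdue)" suffix are assumptions, not the way Artax's registry actually stores freshness.

```python
from collections import Counter

# Status labels as shown on this page, keyed by bundle identifier.
BUNDLE_STATUS = {
    "benchmark_evidence.transfer_completion_repo_scope":
        "Implemented Narrow Repo Evidence (review overdue)",
    "benchmark_evidence.review_submit_and_account_surface_repo_scope":
        "Supporting Evidence Only (review overdue)",
    "benchmark_evidence.support_truth_and_operational_unification_repo_scope":
        "Supporting Evidence Only (review overdue)",
}

OVERDUE_SUFFIX = " (review overdue)"  # assumption: freshness is read from the label


def split_status(label):
    """Return (coverage status, review_overdue flag) for one status label."""
    if label.endswith(OVERDUE_SUFFIX):
        return label[: -len(OVERDUE_SUFFIX)], True
    return label, False


coverage = Counter()
needs_review = []
for bundle_id, label in BUNDLE_STATUS.items():
    status, overdue = split_status(label)
    coverage[status] += 1
    if overdue:
        needs_review.append(bundle_id)

print(dict(coverage))
print(len(needs_review), "bundles need review")
```

Keeping the coverage status and the freshness flag as separate values mirrors the page's intent: an implemented bundle can still be stale, and a stale bundle is not silently downgraded.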

Current benchmark snapshot

This snapshot condenses release checks, compatibility checks, and operations checks into measurable sections. It is a supporting reference, not a competitive scorecard. The same snapshot also feeds the regression alerts that pause stronger claims when the evidence slips.
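One way to picture how a snapshot could feed regression alerts is a small rule check like the one below. This is a hedged sketch: the `RegressionRule` record, the metric key names, and every minimum value are hypothetical, and the page does not publish the actual checked-in rules. Only the section identifiers and the current ratios (5/6, 4/4, 6/6) are taken from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionRule:
    rule_id: str    # hypothetical identifier
    section: str    # snapshot section the rule reads
    metric: str     # hypothetical metric key inside that section
    minimum: float  # hypothetical threshold


def evaluate(rules, snapshot):
    """Map each rule id to True (passing) or False (failing or missing)."""
    results = {}
    for rule in rules:
        value = snapshot.get(rule.section, {}).get(rule.metric)
        results[rule.rule_id] = value is not None and value >= rule.minimum
    return results


def stronger_claims_allowed(results):
    """Stronger claims stay paused unless every rule passes."""
    return all(results.values())


RULES = [
    RegressionRule("strict_coverage_floor", "transfer_completion_proxy",
                   "strict_verified_submission_coverage", 0.8),
    RegressionRule("certification_records_complete", "compatibility_breadth_proxy",
                   "profiles_with_certification_records", 1.0),
    RegressionRule("ops_jobs_implemented", "operational_resilience_proxy",
                   "required_ops_jobs_implemented", 1.0),
]

SNAPSHOT = {
    "transfer_completion_proxy": {"strict_verified_submission_coverage": 5 / 6},
    "compatibility_breadth_proxy": {"profiles_with_certification_records": 4 / 4},
    "operational_resilience_proxy": {"required_ops_jobs_implemented": 6 / 6},
}

results = evaluate(RULES, SNAPSHOT)
print(results, stronger_claims_allowed(results))
```

A missing section or metric counts as a failure in this sketch, so evidence that disappears pauses claims in the same way as evidence that degrades.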

transfer_completion_proxy

Transfer completion benchmark proxy

Collection status

Proxy Snapshot Generated

Evidence bundle

benchmark_evidence.transfer_completion_repo_scope

Summary

This proxy measures strict local-plus-devnet verification coverage for current submission-capable flows, which is narrower than a live no-SOL completion success-rate benchmark run.

Strict verified submission coverage

5/6. 83% of implemented submission-capable flows are strictly verified in both local and devnet evidence.

Strictly verified flows

SPL token transfer review and submit; Transfer sponsorship Class A simple transfer; Transfer sponsorship Class B recipient account setup; Transfer sponsorship Class C wrapped SOL setup; Transfer sponsorship Class D bounded wrapped SOL principal shortfall. Only flows with local=verified and devnet=verified are counted.

Current caveated remainder

Simple swap review. Flows with optional or caveated submission evidence stay outside the strict numerator.
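The strict numerator described above can be sketched in a few lines of Python. The flow names and the local=verified, devnet=verified rule come from this page. The `FlowEvidence` record, the `strict_coverage` helper, and the `caveated` status string used for Simple swap review are illustrative assumptions, not Artax's registry schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowEvidence:
    name: str
    local: str   # e.g. "verified"
    devnet: str  # e.g. "verified" or "caveated" (status strings are assumed)


def strict_coverage(flows):
    """Count flows verified in BOTH local and devnet evidence."""
    strict = [f for f in flows if f.local == "verified" and f.devnet == "verified"]
    return len(strict), len(flows)


FLOWS = [
    FlowEvidence("SPL token transfer review and submit", "verified", "verified"),
    FlowEvidence("Transfer sponsorship Class A simple transfer", "verified", "verified"),
    FlowEvidence("Transfer sponsorship Class B recipient account setup", "verified", "verified"),
    FlowEvidence("Transfer sponsorship Class C wrapped SOL setup", "verified", "verified"),
    FlowEvidence("Transfer sponsorship Class D bounded wrapped SOL principal shortfall",
                 "verified", "verified"),
    # Assumed statuses: the page only says this flow's submission evidence is
    # optional or caveated, so it stays outside the strict numerator.
    FlowEvidence("Simple swap review", "caveated", "caveated"),
]

numerator, denominator = strict_coverage(FLOWS)
percent = round(100 * numerator / denominator)
print(f"{numerator}/{denominator} ({percent}%)")  # prints "5/6 (83%)"
```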

Release evidence gate

Local production-like Docker rehearsal; Live devnet verification workflow. The current transfer-wedge proxy only counts flows tied back to both local and devnet release evidence.

Current caveats

This is a benchmark-family-aligned proxy, not the live delayed public-safe submit success-rate metric itself. It does not publish competitor comparisons, broad production reliability, or swap-submit success.

compatibility_breadth_proxy

Compatibility breadth benchmark proxy

Collection status

Proxy Snapshot Generated

Evidence bundle

benchmark_evidence.review_submit_and_account_surface_repo_scope

Summary

This proxy measures how much of the current compatibility profile catalog has explicit certification-evidence records and evidence beyond repo-local-only scope.

Profiles with certification records

4/4. This shows catalog coverage for certification records, not production certification success.

Profiles with devnet or staging evidence

3/4. These profiles have at least some tested-environment evidence beyond repo-local-only scope.

Production certifications granted

0/4. Current production certification remains explicitly withheld across the entire compatibility catalog.

Current caveats

This is not direct wallet-and-surface success-rate telemetry. It does not certify browser brands, wallets, or signing paths for production use.

operational_resilience_proxy

Operational resilience benchmark proxy

Collection status

Proxy Snapshot Generated

Evidence bundle

benchmark_evidence.support_truth_and_operational_unification_repo_scope

Summary

This proxy measures current recurring operator-job coverage and flow-gate readiness rather than a collected support-burden or resilience benchmark dataset.

Required ops jobs implemented

6/6. Implemented jobs are checked-in workflows or manual substitutes recorded in the operations-job inventory.

Required ops jobs still partial

0/6. None.

Required ops jobs still planned

0/6. No required cron-job records remain fully planned.

Production-ready supported-flow gates

0/0. Flow-gate readiness remains narrower than full operational resilience benchmarking or production support claims.

Current caveats

This is not a direct support-case or incident-rate dataset. It must not be read as a production SRE, support desk, or enterprise disaster-recovery metric.

Current public benchmark linkages

These are the current benchmark-like public metrics that already trace back to benchmark families and minimum metrics. Today that linkage is intentionally narrow: it supports trust-page methodology honesty without pretending there is a full benchmark program or competitive proof.

benchmark_linkage.supported_flow_submit_success_rate.transfer_completion

Public metrics

supported_flow_submit_success_rate: Delayed success rate for the current supported transfer-rail submissions within a reviewed public-safe publication window.

Benchmark families

No-SOL completion success (Definition Only)

Minimum metrics

Supported no-SOL completion success rate (Definition Only Not Collected As Benchmark, Ratio)

Linked claims

claim.public_supported_flow_submit_success_rate_metric_exists_for_transfer_scope: Artax now publishes a delayed public-safe supported-flow submit success metric for the current transfer submission rail when the reviewed public window has sufficient terminal submissions.

Notes

The current delayed transfer-rail submit success metric is a narrower public-safe publication aligned to the broader no-SOL completion success benchmark family. It is public-app-only, transfer-rail-only, delayed, rounded, and threshold-gated.
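To make "rounded and threshold-gated" concrete, here is a minimal sketch of how such a publication rule might look. The function name, the minimum of 50 terminal submissions, and the two-decimal rounding are all assumptions chosen for illustration; the page does not state the real threshold or precision, and the publication delay is not modeled here.

```python
from typing import Optional

MIN_TERMINAL_SUBMISSIONS = 50  # hypothetical threshold, not from the page


def publishable_success_rate(
    successes: int,
    terminal: int,
    min_terminal: int = MIN_TERMINAL_SUBMISSIONS,
    digits: int = 2,  # hypothetical rounding precision
) -> Optional[float]:
    """Return a rounded success rate, or None when the window is withheld."""
    if not 0 <= successes <= terminal:
        raise ValueError("successes must be between 0 and terminal")
    if terminal == 0 or terminal < min_terminal:
        return None  # low-volume window: withhold instead of publishing noise
    return round(successes / terminal, digits)


print(publishable_success_rate(3, 4))     # low-volume window, withheld
print(publishable_success_rate(97, 100))  # enough terminal submissions
```

Returning `None` rather than a number for a low-volume window keeps "withheld" distinct from "zero", which matches the page's statement that the metric may be withheld.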

Benchmark families and minimum metrics

benchmark_family.user_understanding_quality

User understanding quality

Measures whether Artax helps users understand reviewed intent, warnings, fee handling, and denial reasons clearly.

Status: Definition Only

Average required user prompts by supported flow

benchmark_metric.average_required_user_prompts_by_supported_flow

Status: Definition Only Not Collected As Benchmark. Unit: Count. Public benchmark publication: not allowed.
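Each metric entry in this section carries the same three fields: a status, a unit, and a publication flag. A registry record along these lines could represent them. This is a sketch under stated assumptions: the `MetricDefinition` class, its field names, the `DEFINITION_ONLY` constant, and the `may_publish_result` helper are hypothetical, and only the metric identifier, title, unit, and "not allowed" flag are taken from this page.

```python
from dataclasses import dataclass

# Assumed machine-readable form of "Definition Only Not Collected As Benchmark".
DEFINITION_ONLY = "definition_only_not_collected_as_benchmark"


@dataclass(frozen=True)
class MetricDefinition:
    metric_id: str
    title: str
    status: str
    unit: str
    public_publication_allowed: bool


def may_publish_result(metric: MetricDefinition) -> bool:
    """A result is publishable only if allowed AND actually collected."""
    return metric.public_publication_allowed and metric.status != DEFINITION_ONLY


prompts = MetricDefinition(
    metric_id="benchmark_metric.average_required_user_prompts_by_supported_flow",
    title="Average required user prompts by supported flow",
    status=DEFINITION_ONLY,
    unit="count",
    public_publication_allowed=False,
)

print(may_publish_result(prompts))  # prints "False"
```

Checking both the flag and the status means a definition-only metric cannot be published even if the flag is flipped by mistake.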

benchmark_family.no_sol_completion_success

No-SOL completion success

Measures how often supported no-SOL flows actually complete successfully under the current Artax boundaries.

Status: Definition Only

Supported no-SOL completion success rate

benchmark_metric.supported_no_sol_completion_success_rate

Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.

benchmark_family.latency

Latency

Measures review, quoting, and submit-path responsiveness for supported flows.

Status: Definition Only

Review latency

benchmark_metric.review_latency

Status: Definition Only Not Collected As Benchmark. Unit: Milliseconds. Public benchmark publication: not allowed.

Quote latency

benchmark_metric.quote_latency

Status: Definition Only Not Collected As Benchmark. Unit: Milliseconds. Public benchmark publication: not allowed.

Approval-to-submission latency

benchmark_metric.approval_to_submission_latency

Status: Definition Only Not Collected As Benchmark. Unit: Milliseconds. Public benchmark publication: not allowed.

benchmark_family.quote_accuracy

Quote accuracy

Measures how closely quoted sponsorship and recovery expectations match later outcomes.

Status: Definition Only

benchmark_family.settlement_variance

Settlement variance

Measures how much final fee collection and settlement diverge from the reviewed quote.

Status: Definition Only

Quote-to-final-collection variance rate

benchmark_metric.quote_to_final_collection_variance_rate

Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.

benchmark_family.denial_precision

Denial precision

Measures whether denials are correctly scoped, reason-coded, and avoid both loose approval and vague failure.

Status: Definition Only

Sponsorship denial rate by category

benchmark_metric.sponsorship_denial_rate_by_category

Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.

benchmark_family.compatibility_breadth

Compatibility breadth

Measures how much Artax support is explicitly certified across supported wallets, browsers, and surfaces.

Status: Definition Only

Compatibility success rate by wallet and surface

benchmark_metric.compatibility_success_rate_by_wallet_surface

Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.

benchmark_family.support_burden

Support burden

Measures the operator and support cost of running supported Artax flows and explaining denials or degraded modes.

Status: Definition Only

Support cases per 1,000 sponsored transactions

benchmark_metric.support_cases_per_1000_sponsored_transactions

Status: Definition Only Not Collected As Benchmark. Unit: Count Per 1000. Public benchmark publication: not allowed.

benchmark_family.integration_effort

Integration effort

Measures how hard it is for developers to adopt documented Artax integration paths safely.

Status: Definition Only

Average builder integration time for documented starter paths

benchmark_metric.average_builder_integration_time_documented_starters

Status: Definition Only Not Collected As Benchmark. Unit: Hours. Public benchmark publication: not allowed.

benchmark_family.operational_resilience

Operational resilience

Measures Artax's ability to keep its review and execution rail stable through deploys, dependency issues, and recovery events.

Status: Definition Only

Replay or duplicate prevented events

benchmark_metric.replay_duplicate_prevented_events

Status: Definition Only Not Collected As Benchmark. Unit: Count. Public benchmark publication: not allowed.

benchmark_family.pricing_clarity

Pricing clarity

Measures whether Artax pricing stays understandable, bounded, and aligned with disclosed fee handling and sponsorship class semantics.

Status: Definition Only

Average effective fee by transaction class and token tier

benchmark_metric.average_effective_fee_by_transaction_class_and_token_tier

Status: Definition Only Not Collected As Benchmark. Unit: Ratio And Atomic Amount. Public benchmark publication: not allowed.

benchmark_family.safety_quality

Safety quality

Measures the quality of safety warnings and the balance between false positives and false negatives where measurable.

Status: Definition Only

Percentage of flows that degrade safely versus fail ambiguously

benchmark_metric.degrade_safely_vs_ambiguous_failure_rate

Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.

Trust

See the public trust metrics that link back to benchmark doctrine where required.

Compatibility

Inspect the compatibility matrix and supporting certification notes without mistaking either for production approval.

Comparisons

Read the class-based comparison surface that uses this benchmark methodology without pretending benchmark wins exist.

Status

Read the bounded live-status disclosure without mistaking it for benchmark evidence.

Developers

See the integration and limited-access surfaces that benchmark claims must stay narrower than.

Quickstart

Start from the live setup path instead of inferring support from benchmark language.