Benchmark methodology
Benchmark definitions, without benchmark theater.
This page explains Artax benchmark methodology and the benchmark evidence bundles currently available. It does not publish superiority claims, competitive scorecards, or production-grade benchmark results.
Benchmark definitions are public, and evidence stays narrow on purpose.
This page shows the benchmark families and minimum metrics Artax uses, plus the narrow links between public trust metrics and those definitions. It also shows where direct benchmark evidence exists, where only supporting artifacts exist, and where a benchmark is definition-only. The goal is simple: make methodology visible, keep partial evidence visible, and avoid turning incomplete evidence into superiority claims.
Benchmark families
12
Benchmark dimensions Artax can measure and publish when evidence exists.
Minimum metrics
13
Definition-level benchmark metrics that exist before any benchmark result claims are allowed.
Evidence bundles
3
Current benchmark evidence bundles spanning implemented, supporting-only, and definition-only coverage.
Public linkages
1
Current public-safe metrics already tied back to explicit benchmark definitions.
Snapshot sections
3
Generated benchmark proxy sections derived from current evidence.
What this page can claim
Current honesty boundary
Published benchmark claims
Not allowed
Benchmark pipelines
Implemented
Narrow evidence bundles
1 implemented, 2 supporting-only, 0 definition-only.
Freshness warnings
3 benchmark evidence bundles need review.
Regression thresholds
5 of 5 checked-in regression rules are currently passing.
What is live
The public surface is a narrow evidence layer for the transfer completion path, supporting artifacts for broader benchmark families, and a generated benchmark snapshot. It is not a competitive scorecard or a superiority claim.
Generated artifacts
Artax publishes benchmark definitions, benchmark evidence, and a generated snapshot from the same inputs so people can inspect the basis for each claim.
Why this exists
Artax may publish narrow repo-scoped benchmark evidence bundles plus generated internal benchmark snapshots that summarize current evidence coverage and measurable proxy sections. These artifacts do not authorize competitor scorecards, benchmark wins, or production certification claims.
Current benchmark evidence bundles
These bundles show the evidence behind the benchmark claims Artax makes publicly. Currently one bundle has direct evidence, two have only supporting artifacts, and none is definition-only. This page keeps those levels separate instead of flattening them into one vague story.
benchmark_evidence.transfer_completion_repo_scope
Transfer completion evidence bundle
Status
Implemented Narrow Repo Evidence (review overdue)
Sample scope
Current transfer-first Artax rail only, using live devnet rehearsal plus delayed public-safe transfer submit success and verified submission-path coverage.
Benchmark families
No-SOL completion success (Definition Only)
Minimum metrics
Supported no-SOL completion success rate (Definition Only Not Collected As Benchmark, Ratio)
Public metrics
supported_flow_submit_success_rate: Delayed success rate for the current supported transfer-rail submissions within a reviewed public-safe publication window.; public_verified_submission_path_coverage: Current coverage ratio of implemented flows with a submission path whose submission route has strict local and devnet verification evidence in the current repo-scoped release-evidence model.
Release evidence
evidence.devnet.live_verification: Live devnet verification workflow
Launch gates
gate.flow.spl_token_transfer_review_and_submit: Local And Devnet Verified Not Production; gate.flow.transfer_sponsorship_class_d_bounded_principal: Local And Devnet Verified Not Production
Exclusions and caveats
Simple swap review remains outside submit-success benchmark evidence because submit sponsorship is intentionally unsupported there.; This bundle is narrower than production reliability, broader category leadership, or competitor scorecard proof.
Current blockers
Production compatibility certification evidence does not exist yet.; The delayed public metric is threshold-gated and may be withheld for low-volume windows.
benchmark_evidence.review_submit_and_account_surface_repo_scope
Review, submit, and account-surface evidence bundle
Status
Supporting Evidence Only (review overdue)
Sample scope
Current hosted review, developers, and governed account surfaces only.
Benchmark families
User understanding quality (Definition Only); Integration effort (Definition Only); Compatibility breadth (Definition Only)
Minimum metrics
Average required user prompts by supported flow (Definition Only Not Collected As Benchmark, Count); Average builder integration time for documented starter paths (Definition Only Not Collected As Benchmark, Hours); Compatibility success rate by wallet and surface (Definition Only Not Collected As Benchmark, Ratio)
Public metrics
public_supported_flow_coverage: Current supported-flow coverage that is approved for public trust publication.
Release evidence
None
Launch gates
gate.surface.sdk_package: Implemented Not Production Certified; gate.flow.simple_swap_review: Local And Devnet Review Only Not Production
Exclusions and caveats
Compatibility matrix truth is not the same thing as compatibility benchmark evidence or certification.; The current snapshot for this bundle is a certification-evidence coverage proxy, not a direct compatibility success-rate benchmark result.; No recurring builder-integration-time or required-user-prompt benchmark job exists yet.
Current blockers
Compatibility certification evidence is still missing.; Builder integration-time and user-prompt metrics are not collected as benchmark artifacts.; Current proxy coverage does not replace direct compatibility success-rate benchmarking for these metrics.
benchmark_evidence.support_truth_and_operational_unification_repo_scope
Support-truth and operational evidence bundle
Status
Supporting Evidence Only (review overdue)
Sample scope
Repo and localhost operational-support evidence only.
Benchmark families
Support burden (Definition Only); Operational resilience (Definition Only); Denial precision (Definition Only)
Minimum metrics
Support cases per 1,000 sponsored transactions (Definition Only Not Collected As Benchmark, Count Per 1000); Replay or duplicate prevented events (Definition Only Not Collected As Benchmark, Count); Sponsorship denial rate by category (Definition Only Not Collected As Benchmark, Ratio)
Public metrics
None
Release evidence
evidence.repo.provenance_baseline_generation: Generated provenance baseline for build and rollout inputs; evidence.local.production_like_rehearsal: Local production-like Docker rehearsal
Launch gates
gate.release_target.render_staging_rollout: Staging Rollout Gated Not Production
Exclusions and caveats
There is no aggregated support-case benchmark dataset yet.; Replay-prevention and denial-precision benchmark outputs are not published as recurring benchmark artifacts.
Current blockers
Support-burden and denial-precision benchmark metrics are not collected yet.; Operational evidence exists, but it is still narrower than production-grade resilience benchmarking.
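The status breakdown reported near the top of this page (1 implemented, 2 supporting-only, 0 definition-only, with 3 bundles needing review) can be reproduced from the three bundles listed above. The following is a minimal sketch in Python. The bundle ids and statuses are copied from this page, but the record layout and the `review_overdue` field are assumptions made for illustration, not the format Artax uses.

```python
from collections import Counter

# Bundle ids and statuses come from this page; the dict layout and the
# "review_overdue" flag are hypothetical.
BUNDLES = [
    {
        "id": "benchmark_evidence.transfer_completion_repo_scope",
        "status": "implemented",
        "review_overdue": True,
    },
    {
        "id": "benchmark_evidence.review_submit_and_account_surface_repo_scope",
        "status": "supporting_only",
        "review_overdue": True,
    },
    {
        "id": "benchmark_evidence.support_truth_and_operational_unification_repo_scope",
        "status": "supporting_only",
        "review_overdue": True,
    },
]


def summarize(bundles):
    """Count bundles per evidence level and how many are flagged for review."""
    counts = Counter(bundle["status"] for bundle in bundles)
    return {
        # Counter returns 0 for a status that never occurs.
        "implemented": counts["implemented"],
        "supporting_only": counts["supporting_only"],
        "definition_only": counts["definition_only"],
        "needs_review": sum(1 for bundle in bundles if bundle["review_overdue"]),
    }


summary = summarize(BUNDLES)
print(summary)
```

Keeping the three evidence levels as separate keys mirrors the page's intent of not flattening them into a single figure.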
Current benchmark snapshot
This snapshot condenses release checks, compatibility checks, and operations checks into measurable sections. It is a supporting reference, not a competitive scorecard. The same snapshot also feeds the regression alerts that pause stronger claims when the evidence slips.
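One way to picture a regression alert that pauses stronger claims is as a set of floors checked against snapshot values. The sketch below is illustrative only: the rule shape, the rule ids, the snapshot keys, and the floor values are all hypothetical, and the actual checked-in regression rules are not reproduced on this page. The snapshot values themselves reuse ratios reported in the sections that follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionRule:
    # Hypothetical rule shape: a snapshot value must stay at or above a floor.
    rule_id: str
    snapshot_key: str
    minimum: float


def evaluate_rules(snapshot: dict, rules: list) -> dict:
    """Return pass/fail per rule; a missing snapshot value counts as a failure."""
    results = {}
    for rule in rules:
        value = snapshot.get(rule.snapshot_key)
        results[rule.rule_id] = value is not None and value >= rule.minimum
    return results


def stronger_claims_paused(results: dict) -> bool:
    """Pause stronger claims as soon as any rule fails."""
    return not all(results.values())


# Ratios taken from the snapshot sections on this page; keys are assumed names.
snapshot = {
    "strict_verified_submission_coverage": 5 / 6,
    "profiles_with_certification_records": 4 / 4,
    "required_ops_jobs_implemented": 6 / 6,
}

# Illustrative floors only; not the real thresholds.
rules = [
    RegressionRule("rule.coverage_floor", "strict_verified_submission_coverage", 0.80),
    RegressionRule("rule.cert_records_floor", "profiles_with_certification_records", 1.0),
    RegressionRule("rule.ops_jobs_floor", "required_ops_jobs_implemented", 1.0),
]

results = evaluate_rules(snapshot, rules)
print(sum(results.values()), "of", len(results), "rules passing")
print("stronger claims paused:", stronger_claims_paused(results))
```

Treating a missing snapshot value as a failure is a deliberate choice in this sketch: absent evidence should not be able to keep a claim open.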
transfer_completion_proxy
Transfer completion benchmark proxy
Collection status
Proxy Snapshot Generated
Evidence bundle
benchmark_evidence.transfer_completion_repo_scope
Summary
This proxy measures strict local-plus-devnet verification coverage for current submission-capable flows, which is narrower than a live no-SOL completion success-rate benchmark run.
Strict verified submission coverage
5/6. 83% of implemented submission-capable flows are strictly verified in both local and devnet evidence.
Strictly verified flows
SPL token transfer review and submit; Transfer sponsorship Class A simple transfer; Transfer sponsorship Class B recipient account setup; Transfer sponsorship Class C wrapped SOL setup; Transfer sponsorship Class D bounded wrapped SOL principal shortfall. Only flows with local=verified and devnet=verified are counted.
Current caveated remainder
Simple swap review. Flows with optional or caveated submission evidence stay outside the strict numerator.
Release evidence gate
Local production-like Docker rehearsal; Live devnet verification workflow. The current transfer-wedge proxy only counts flows tied back to both local and devnet release evidence.
Current caveats
This is a benchmark-family-aligned proxy, not the live delayed public-safe submit success-rate metric itself.; It does not publish competitor comparisons, broad production reliability, or swap-submit success.
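The 5/6 figure above follows directly from the counting rule stated in this section: a flow enters the strict numerator only when both local and devnet evidence are verified. Here is a minimal Python sketch of that rule. The flow names come from this page. The dictionary layout is an assumption, and so is the `caveated` label used for Simple swap review, since this page does not give that flow's exact evidence values.

```python
# Flow names are taken from this page. The "caveated" label for Simple swap
# review is a placeholder; the page only says it stays outside the numerator.
FLOWS = {
    "SPL token transfer review and submit": {"local": "verified", "devnet": "verified"},
    "Transfer sponsorship Class A simple transfer": {"local": "verified", "devnet": "verified"},
    "Transfer sponsorship Class B recipient account setup": {"local": "verified", "devnet": "verified"},
    "Transfer sponsorship Class C wrapped SOL setup": {"local": "verified", "devnet": "verified"},
    "Transfer sponsorship Class D bounded wrapped SOL principal shortfall": {"local": "verified", "devnet": "verified"},
    "Simple swap review": {"local": "caveated", "devnet": "caveated"},
}


def strict_coverage(flows):
    """Split flows into strictly verified ones and the caveated remainder."""
    strict = [
        name
        for name, evidence in flows.items()
        if evidence["local"] == "verified" and evidence["devnet"] == "verified"
    ]
    remainder = [name for name in flows if name not in strict]
    return strict, remainder


strict, remainder = strict_coverage(FLOWS)
percent = round(100 * len(strict) / len(FLOWS))
print(f"{len(strict)}/{len(FLOWS)} ({percent}%)")  # prints "5/6 (83%)"
print("remainder:", remainder)
```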
compatibility_breadth_proxy
Compatibility breadth benchmark proxy
Collection status
Proxy Snapshot Generated
Evidence bundle
benchmark_evidence.review_submit_and_account_surface_repo_scope
Summary
This proxy measures how much of the current compatibility profile catalog has explicit certification-evidence records and evidence beyond repo-local-only scope.
Profiles with certification records
4/4. This shows catalog coverage for certification records, not production certification success.
Profiles with devnet or staging evidence
3/4. These profiles have at least some tested-environment evidence beyond repo-local-only scope.
Production certifications granted
0/4. Current production certification remains explicitly withheld across the entire compatibility catalog.
Current caveats
This is not direct wallet-and-surface success-rate telemetry.; It does not certify browser brands, wallets, or signing paths for production use.
operational_resilience_proxy
Operational resilience benchmark proxy
Collection status
Proxy Snapshot Generated
Evidence bundle
benchmark_evidence.support_truth_and_operational_unification_repo_scope
Summary
This proxy measures current recurring operator-job coverage and flow-gate readiness rather than a collected support-burden or resilience benchmark dataset.
Required ops jobs implemented
6/6. Implemented jobs are checked-in workflows or manual substitutes recorded in the operations-job inventory.
Required ops jobs still partial
0/6. None.
Required ops jobs still planned
0/6. No required cron-job records remain fully planned.
Production-ready supported-flow gates
0/0. Flow-gate readiness remains narrower than full operational resilience benchmarking or production support claims.
Current caveats
This is not a direct support-case or incident-rate dataset.; It must not be read as a production SRE, support desk, or enterprise disaster-recovery metric.
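The 0/0 gate count above has no meaningful percentage, which is easy to mishandle when ratios are rendered automatically. The sketch below shows one way to keep the "n/d" label while leaving the rate undefined for an empty denominator. How Artax actually renders this case is not stated on this page; the function name and return shape are assumptions.

```python
from typing import Optional, Tuple


def proxy_ratio(numerator: int, denominator: int) -> Tuple[str, Optional[float]]:
    """Return an "n/d" label plus a rate, or None when the denominator is zero."""
    if numerator < 0 or denominator < 0 or numerator > denominator:
        raise ValueError("expected 0 <= numerator <= denominator")
    label = f"{numerator}/{denominator}"
    if denominator == 0:
        # An empty population is reported as undefined, not as 0% or 100%.
        return label, None
    return label, numerator / denominator


# Counts reported in this snapshot section.
print(proxy_ratio(6, 6))  # required ops jobs implemented
print(proxy_ratio(0, 6))  # required ops jobs still partial
print(proxy_ratio(0, 0))  # production-ready supported-flow gates
```

Returning `None` for the empty case keeps a reader from mistaking "nothing to measure" for either full readiness or total failure.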
Current public benchmark linkages
These are the current benchmark-like public metrics that already trace back to benchmark families and minimum metrics. Today that linkage is intentionally narrow: it supports trust-page methodology honesty without pretending there is a full benchmark program or competitive proof.
benchmark_linkage.supported_flow_submit_success_rate.transfer_completion
Public metrics
supported_flow_submit_success_rate: Delayed success rate for the current supported transfer-rail submissions within a reviewed public-safe publication window.
Benchmark families
No-SOL completion success (Definition Only)
Minimum metrics
Supported no-SOL completion success rate (Definition Only Not Collected As Benchmark, Ratio)
Linked claims
claim.public_supported_flow_submit_success_rate_metric_exists_for_transfer_scope: Artax now publishes a delayed public-safe supported-flow submit success metric for the current transfer submission rail when the reviewed public window has sufficient terminal submissions.
Notes
The current delayed transfer-rail submit success metric is a narrower public-safe publication aligned to the broader no-SOL completion success benchmark family. It is public-app-only, transfer-rail-only, delayed, rounded, and threshold-gated.
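The "rounded, and threshold-gated" properties described above can be sketched as a function that withholds the metric for low-volume windows and rounds it otherwise. This is only an illustration: the function name, the minimum-volume value, and the rounding precision are hypothetical, and the publication delay is not modeled.

```python
from typing import Optional


def publishable_submit_success_rate(
    successes: int,
    terminal_submissions: int,
    min_terminal_submissions: int,
    decimals: int = 2,
) -> Optional[float]:
    """Return a rounded success rate, or None when the window is too small."""
    if successes < 0 or terminal_submissions < 0 or successes > terminal_submissions:
        raise ValueError("expected 0 <= successes <= terminal_submissions")
    if terminal_submissions == 0 or terminal_submissions < min_terminal_submissions:
        # Threshold gate: low-volume windows publish nothing.
        return None
    return round(successes / terminal_submissions, decimals)


MIN_TERMINAL = 50  # hypothetical threshold, not the real gate value

print(publishable_submit_success_rate(9, 10, MIN_TERMINAL))     # None (withheld)
print(publishable_submit_success_rate(188, 200, MIN_TERMINAL))  # 0.94
```

Withholding rather than publishing a noisy number matches the stated blocker that the delayed public metric may be withheld for low-volume windows.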
Benchmark families and minimum metrics
benchmark_family.user_understanding_quality
User understanding quality
Measures whether Artax helps users understand reviewed intent, warnings, fee handling, and denial reasons clearly.
Status: Definition Only
Average required user prompts by supported flow
benchmark_metric.average_required_user_prompts_by_supported_flow
Status: Definition Only Not Collected As Benchmark. Unit: Count. Public benchmark publication: not allowed.
benchmark_family.no_sol_completion_success
No-SOL completion success
Measures how often supported no-SOL flows actually complete successfully under the current Artax boundaries.
Status: Definition Only
Supported no-SOL completion success rate
benchmark_metric.supported_no_sol_completion_success_rate
Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.
benchmark_family.latency
Latency
Measures review, quoting, and submit-path responsiveness for supported flows.
Status: Definition Only
Review latency
benchmark_metric.review_latency
Status: Definition Only Not Collected As Benchmark. Unit: Milliseconds. Public benchmark publication: not allowed.
Quote latency
benchmark_metric.quote_latency
Status: Definition Only Not Collected As Benchmark. Unit: Milliseconds. Public benchmark publication: not allowed.
Approval-to-submission latency
benchmark_metric.approval_to_submission_latency
Status: Definition Only Not Collected As Benchmark. Unit: Milliseconds. Public benchmark publication: not allowed.
benchmark_family.quote_accuracy
Quote accuracy
Measures how closely quoted sponsorship and recovery expectations match later outcomes.
Status: Definition Only
benchmark_family.settlement_variance
Settlement variance
Measures how much final fee collection and settlement diverge from the reviewed quote.
Status: Definition Only
Quote-to-final-collection variance rate
benchmark_metric.quote_to_final_collection_variance_rate
Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.
benchmark_family.denial_precision
Denial precision
Measures whether denials are correctly scoped and reason-coded, and whether they avoid both loose approval and vague failure.
Status: Definition Only
Sponsorship denial rate by category
benchmark_metric.sponsorship_denial_rate_by_category
Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.
benchmark_family.compatibility_breadth
Compatibility breadth
Measures how much Artax support is explicitly certified across supported wallets, browsers, and surfaces.
Status: Definition Only
Compatibility success rate by wallet and surface
benchmark_metric.compatibility_success_rate_by_wallet_surface
Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.
benchmark_family.support_burden
Support burden
Measures the operator and support cost of running supported Artax flows and explaining denials or degraded modes.
Status: Definition Only
Support cases per 1,000 sponsored transactions
benchmark_metric.support_cases_per_1000_sponsored_transactions
Status: Definition Only Not Collected As Benchmark. Unit: Count Per 1000. Public benchmark publication: not allowed.
benchmark_family.integration_effort
Integration effort
Measures how hard it is for developers to adopt documented Artax integration paths safely.
Status: Definition Only
Average builder integration time for documented starter paths
benchmark_metric.average_builder_integration_time_documented_starters
Status: Definition Only Not Collected As Benchmark. Unit: Hours. Public benchmark publication: not allowed.
benchmark_family.operational_resilience
Operational resilience
Measures Artax's ability to keep its review and execution rail stable through deploys, dependency issues, and recovery events.
Status: Definition Only
Replay or duplicate prevented events
benchmark_metric.replay_duplicate_prevented_events
Status: Definition Only Not Collected As Benchmark. Unit: Count. Public benchmark publication: not allowed.
benchmark_family.pricing_clarity
Pricing clarity
Measures whether Artax pricing stays understandable, bounded, and aligned with disclosed fee handling and sponsorship class semantics.
Status: Definition Only
Average effective fee by transaction class and token tier
benchmark_metric.average_effective_fee_by_transaction_class_and_token_tier
Status: Definition Only Not Collected As Benchmark. Unit: Ratio And Atomic Amount. Public benchmark publication: not allowed.
benchmark_family.safety_quality
Safety quality
Measures the quality of safety warnings and the balance between false positives and false negatives where measurable.
Status: Definition Only
Percentage of flows that degrade safely versus fail ambiguously
benchmark_metric.degrade_safely_vs_ambiguous_failure_rate
Status: Definition Only Not Collected As Benchmark. Unit: Ratio. Public benchmark publication: not allowed.
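The family and metric identifiers listed above can be cross-checked against the counts reported at the top of this page (12 benchmark families, 13 minimum metrics). The Python sketch below does that. The identifiers are copied from this page, while the mapping structure and helper names are assumptions for illustration.

```python
# Identifiers are copied from this page; the mapping layout is hypothetical.
FAMILIES = {
    "benchmark_family.user_understanding_quality": [
        "benchmark_metric.average_required_user_prompts_by_supported_flow",
    ],
    "benchmark_family.no_sol_completion_success": [
        "benchmark_metric.supported_no_sol_completion_success_rate",
    ],
    "benchmark_family.latency": [
        "benchmark_metric.review_latency",
        "benchmark_metric.quote_latency",
        "benchmark_metric.approval_to_submission_latency",
    ],
    "benchmark_family.quote_accuracy": [],
    "benchmark_family.settlement_variance": [
        "benchmark_metric.quote_to_final_collection_variance_rate",
    ],
    "benchmark_family.denial_precision": [
        "benchmark_metric.sponsorship_denial_rate_by_category",
    ],
    "benchmark_family.compatibility_breadth": [
        "benchmark_metric.compatibility_success_rate_by_wallet_surface",
    ],
    "benchmark_family.support_burden": [
        "benchmark_metric.support_cases_per_1000_sponsored_transactions",
    ],
    "benchmark_family.integration_effort": [
        "benchmark_metric.average_builder_integration_time_documented_starters",
    ],
    "benchmark_family.operational_resilience": [
        "benchmark_metric.replay_duplicate_prevented_events",
    ],
    "benchmark_family.pricing_clarity": [
        "benchmark_metric.average_effective_fee_by_transaction_class_and_token_tier",
    ],
    "benchmark_family.safety_quality": [
        "benchmark_metric.degrade_safely_vs_ambiguous_failure_rate",
    ],
}


def metric_count(families):
    """Total number of minimum metrics across all families."""
    return sum(len(metrics) for metrics in families.values())


def families_without_metrics(families):
    """Families that are defined but list no minimum metric."""
    return [family for family, metrics in families.items() if not metrics]


print(len(FAMILIES), "families,", metric_count(FAMILIES), "minimum metrics")
print("no metric listed:", families_without_metrics(FAMILIES))
```

The check also surfaces that `benchmark_family.quote_accuracy` is the only family on this page with no minimum metric listed.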
Trust
See the public trust metrics that link back to benchmark doctrine where required.
Compatibility
Inspect the compatibility matrix and supporting certification notes without mistaking either for production approval.
Comparisons
Read the class-based comparison surface that uses this benchmark methodology without pretending benchmark wins exist.
Status
Read the bounded live-status disclosure without mistaking it for benchmark evidence.
Developers
See the integration and limited-access surfaces. Benchmark claims must stay narrower than what those surfaces support.
Quickstart
Start from the live setup path instead of inferring support from benchmark language.