Why Does My Transaction Fail? A First Look at Failed Transactions on the Solana Blockchain (2504.18055v1)
Abstract: Solana is an emerging blockchain platform, recognized for its high throughput and low transaction costs, positioning it as a preferred infrastructure for Decentralized Finance (DeFi), Non-Fungible Tokens (NFTs), and other Web 3.0 applications. In the Solana ecosystem, transaction initiators submit various instructions to interact with a diverse range of Solana smart contracts, among them decentralized exchanges (DEXs) that utilize automated market makers (AMMs), allowing users to trade cryptocurrencies directly on the blockchain without intermediaries. However, the same high throughput and low transaction costs have exposed Solana to bot spamming for financial exploitation, resulting in prevalent failed transactions and network congestion. Prior work on Solana has mainly focused on evaluating the performance of the blockchain, particularly scalability and transaction throughput, and on improving smart contract security, leaving a gap in understanding the characteristics and implications of failed transactions on Solana. To address this gap, we conducted a large-scale empirical study of failed transactions on Solana, using a curated dataset of over 1.5 billion failed transactions across more than 72 million blocks. Specifically, we first characterized the failed transactions in terms of their initiators, failure-triggering programs, and temporal patterns, and compared their block positions and transaction costs with those of successful transactions. We then categorized the failed transactions by the error messages in their error logs, and investigated how specific programs and transaction initiators are associated with these errors...
Knowledge Gaps, Limitations, and Open Questions
Below is a concise, actionable list of what remains missing, uncertain, or unexplored in the paper, intended to guide future research.
- Sampling design and representativeness: The one-day-per-week stratified sampling (53 days) may miss multi-day events and bursty phenomena (e.g., “memecoin mania”), limiting the robustness of temporal analyses and autocorrelation findings. A continuous, full-year dataset or event-aware sampling is needed.
 - RPC-source bias: All data were retrieved via a single Alchemy RPC endpoint; potential node-specific biases, missing logs, or rate-limiting effects were not assessed. Cross-provider, multi-node validation would strengthen reliability.
 - Exclusion of vote transactions: Although this exclusion is intentional, it leaves unexplored how consensus traffic interacts with (and potentially amplifies) non-vote failure dynamics and congestion.
 - Unknown initiator accounts dominate: Of 12,712,516 accounts, only 2,162,908 were labeled (803,136 bots; 1,359,772 humans), leaving ~10.55M accounts as “unknown.” Key conclusions about bot vs. human failure rates may not generalize. Methods to reduce “unknown” (e.g., semi-supervised learning, richer features) are needed.
 - Small ground truth and limited features for bot classification: The Random Forest model was trained on only 200 manually labeled accounts using coarse behavioral features (frequency, volume, intervals); a minimal pipeline of this kind is sketched after this list. Incorporating richer signals (e.g., wallet fingerprinting, program-interaction graphs, timing-jitter analysis, tip usage) and larger labeled sets could improve accuracy and reduce misclassification.
 - Attribution of responsibility to the outermost program: Errors were attributed to the outermost program in the call stack, which may misassign failures originating in nested calls (e.g., aggregators delegating to AMMs). A call-graph-aware attribution is needed to apportion blame across composing programs; see the log-attribution sketch after this list.
 - Error extraction coverage gaps: Approximately 365M failed transactions (~24% of failures) were excluded due to incomplete or missing log messages. Understanding why logs are missing (program logging practices, RPC limits, truncation) and recovering or triangulating error causes is a key gap.
 - Long-tail error types underexplored: Thematic analysis focused on the top 173 high-frequency messages; rare but critical errors (security-sensitive, protocol-level) may be overlooked. Systematic coverage of low-frequency errors is needed.
 - Limited decoding of numeric error codes: Many Solana programs emit "custom program error: 0xNN" without descriptive text. The paper did not systematically map these codes to program-specific error registries/IDLs; doing so (see the IDL-lookup sketch after this list) could sharpen error categorization and root-cause analysis.
 - Lack of instruction-level failure analysis: The paper does not identify which specific instructions within transactions fail most often (e.g., account initialization vs. swaps vs. transfers), hindering precise remediation targets.
 - No analysis of account-lock conflicts: Solana’s parallel execution relies on account read/write locks; failures due to contention (“account in use,” lock conflicts) are not quantified or linked to program types or times of day.
 - Priority-fee mechanics unexamined: The effects of compute unit (CU) price, priority fees, and Jito tips on success/failure, block position, and latency are discussed but not measured. Collecting per-transaction tip amounts and CU price settings (see the compute-budget sketch after this list) is essential to quantify scheduling outcomes.
 - Compute unit limit vs. usage: The paper analyzes CUs consumed but not the sender-specified CU limit per transaction (also recoverable with the compute-budget sketch below), leaving under-/over-provisioning and its relationship to failure rates unresolved.
 - Validator- and leader-level heterogeneity: Failure rates were not stratified by leader schedule, validator client/version, or region. Per-leader/validator analyses could reveal scheduling policies, tip adoption, and congestion hotspots.
 - Pre/post-upgrade causal effects: The observed failure-rate reduction after June 16, 2024 is attributed to upgrades, but no causal inference (e.g., interrupted time series, difference-in-differences) is performed. Formal evaluation of protocol changes is needed; an interrupted-time-series sketch follows this list.
 - Economic impact of failures: The aggregate SOL spent on failed transactions, user-level loss metrics, and ecosystem-level resource waste (e.g., total CUs consumed by failed transactions per block or program) are not quantified; a fee-burn aggregation sketch follows this list.
 - “Expected” vs. “problematic” failures: Protective failures (e.g., slippage checks) are conflated with undesirable failures. A taxonomy distinguishing intentional safeguards from misconfigurations, contention, or bugs would focus mitigation efforts.
 - Market-state linkage: Price/profit-not-met errors likely depend on volatility, liquidity, and pool states; the paper does not correlate error incidence with market conditions, oracle updates, or AMM liquidity dynamics.
 - Front-running/MEV characterization: Bots are said to trigger invalid status errors via front-running/manipulation, but MEV strategies, outcomes, and their share of failures are not measured. Instrumentation to detect and quantify MEV-related failures is an open need.
 - Program-version evolution: Differences between Jupiter V4 vs. V6 and high-failure unnamed programs are reported but not tied to code changes, deployment epochs, or configuration updates. Linking failures to program versions/commits would identify regressions.
 - Wallet/frontend effects: Human failures (e.g., out-of-funds) may be frontend-driven (poor fee estimation, slippage defaults). The paper does not stratify by wallet/application, leaving UI/UX improvement opportunities unquantified.
 - Geographic/temporal drivers of daily cycles: The 24-hour periodicity is documented but not explained (e.g., overlap with major market hours, bot scheduling, validator maintenance windows). Attribution of the cyclical drivers remains open; an autocorrelation check is sketched after this list.
 - Cross-chain comparison: The paper cites Ethereum’s lower failure rate but does not perform controlled cross-chain analyses (error types, economic conditions, fee markets, scheduler designs), limiting generalization of findings.
 - Reproducibility and data availability: The curated dataset is described but not publicly linked, and data collection/processing pipelines are not fully documented for replication. Publishing datasets, parsers, and classification code would enable external validation.
 - Mitigation evaluation: Recommendations are high-level; there is no empirical assessment of concrete mitigations (e.g., slippage advisories, dynamic CU estimation, program-level prechecks, anti-spam gating) on failure reduction. Experimental pilots or A/B tests are needed.
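
Sketch for the bot-classification item: a minimal Random Forest pipeline in scikit-learn, assuming a hypothetical `labeled_accounts.csv` whose column names are invented here. The paper reports only coarse behavioral features (frequency, volume, intervals), so treat this as a starting scaffold rather than the authors' implementation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical input: one row per initiator account with behavioral aggregates.
accounts = pd.read_csv("labeled_accounts.csv")
features = [
    "tx_per_day",         # transaction frequency
    "total_sol_volume",   # transaction volume
    "median_interval_s",  # median gap between consecutive transactions
    "interval_std_s",     # timing regularity (bots tend to be more regular)
]
X, y = accounts[features], accounts["is_bot"]  # 1 = bot, 0 = human

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
clf.fit(X, y)  # fit on all labels before scoring the ~10.55M unknown accounts
```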
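Sketch for the error-attribution item, assuming the Solana runtime's log conventions: each frame logs `Program <id> invoke [<depth>]`, and when a call fails, a `Program <id> failed: <error>` line is emitted for each frame as the error propagates outward, so the first `failed:` line belongs to the innermost failing program. The program ids below are shortened placeholders.

```python
import re
from typing import Optional

FAILED = re.compile(r"^Program (\w+) failed: (.+)$")

def innermost_failure(log_messages: list[str]) -> Optional[tuple[str, str]]:
    """Return (program_id, error) for the deepest failing frame, if any."""
    for line in log_messages:
        m = FAILED.match(line)
        if m:
            # Errors bubble up the call stack, so the first 'failed:' line
            # comes from the innermost failing program.
            return m.group(1), m.group(2)
    return None

# Abridged logs from a hypothetical aggregator -> AMM call failing inside the AMM:
logs = [
    "Program Aggr111 invoke [1]",
    "Program Amm111 invoke [2]",
    "Program Amm111 failed: custom program error: 0x1771",
    "Program Aggr111 failed: custom program error: 0x1771",
]
print(innermost_failure(logs))  # ('Amm111', 'custom program error: 0x1771')
```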
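Sketch for the numeric-error-code item: Anchor-style programs publish an IDL whose `errors` array maps integer codes to names and messages (user-defined codes conventionally start at 6000, i.e., 0x1770). The IDL path and the registry entry below are illustrative, not taken from any specific program.

```python
import json
import re

HEX_ERR = re.compile(r"custom program error: 0x([0-9a-fA-F]+)")

def load_error_registry(idl_path: str) -> dict[int, str]:
    """Build {code: 'Name: message'} from an Anchor IDL's `errors` array."""
    with open(idl_path) as f:
        idl = json.load(f)
    return {e["code"]: f"{e['name']}: {e.get('msg', '')}"
            for e in idl.get("errors", [])}

def decode(message: str, registry: dict[int, str]) -> str:
    m = HEX_ERR.search(message)
    if not m:
        return message  # not a numeric custom error
    code = int(m.group(1), 16)
    return registry.get(code, f"unmapped custom error {code} (0x{code:x})")

# Illustrative entry only; 0x1771 = 6001 in the Anchor user-error range.
registry = {6001: "SlippageToleranceExceeded: output amount below minimum"}
print(decode("custom program error: 0x1771", registry))
```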
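Sketch for the priority-fee and CU-limit items: sender-specified compute budgets travel as ComputeBudget program instructions, encoded as a one-byte tag followed by a little-endian value (tag 2 = SetComputeUnitLimit, u32; tag 3 = SetComputeUnitPrice, u64 micro-lamports per CU). The `instructions` input format here is a simplification of a decoded transaction.

```python
import struct

COMPUTE_BUDGET_ID = "ComputeBudget111111111111111111111111111111"

def compute_budget_settings(instructions):
    """instructions: [(program_id, raw_data_bytes), ...] from a decoded tx."""
    limit, price = None, None
    for program_id, data in instructions:
        if program_id != COMPUTE_BUDGET_ID or not data:
            continue
        if data[0] == 2:    # SetComputeUnitLimit(u32)
            (limit,) = struct.unpack_from("<I", data, 1)
        elif data[0] == 3:  # SetComputeUnitPrice(u64 micro-lamports per CU)
            (price,) = struct.unpack_from("<Q", data, 1)
    return limit, price

# Example: a tx requesting 200k CUs at 5,000 micro-lamports per CU.
ix = [(COMPUTE_BUDGET_ID, bytes([2]) + struct.pack("<I", 200_000)),
      (COMPUTE_BUDGET_ID, bytes([3]) + struct.pack("<Q", 5_000))]
print(compute_budget_settings(ix))  # (200000, 5000)
```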
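Sketch for the causal-inference item: a segmented regression (interrupted time series) fit with statsmodels around the June 16, 2024 cutoff discussed in the paper; the daily CSV and its columns are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

daily = pd.read_csv("daily_failure_rate.csv", parse_dates=["date"])
cutoff = pd.Timestamp("2024-06-16")  # upgrade date cited in the paper
daily["t"] = (daily["date"] - daily["date"].min()).dt.days   # baseline trend
daily["post"] = (daily["date"] >= cutoff).astype(int)        # level shift
daily["post_t"] = daily["post"] * (daily["date"] - cutoff).dt.days  # slope change

# failure_rate ~ pre-existing trend + post-upgrade level shift + slope change
model = smf.ols("failure_rate ~ t + post + post_t", data=daily).fit()
print(model.summary())
```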
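Sketch for the economic-impact item: aggregating the fee burn and compute waste of failed transactions from a flattened per-transaction table. The schema is hypothetical; `fee_lamports` and `err` mirror the `meta.fee` and `meta.err` fields returned by Solana's getBlock/getTransaction RPC calls.

```python
import pandas as pd

LAMPORTS_PER_SOL = 1_000_000_000

txs = pd.read_parquet("transactions.parquet")  # hypothetical flattened dataset
failed = txs[txs["err"].notna()]               # meta.err != null marks a failure

print(f"SOL spent on failed txs: {failed['fee_lamports'].sum() / LAMPORTS_PER_SOL:,.2f}")
print(f"CUs consumed by failed txs: {failed['compute_units_consumed'].sum():,}")

# Per-program breakdown to localize the resource drain:
print(failed.groupby("outer_program")["compute_units_consumed"]
            .sum().sort_values(ascending=False).head(10))
```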
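Sketch for the daily-cycles item: a quick autocorrelation check on an hourly failure-count series (hypothetical CSV); a pronounced peak at lag 24 would confirm the documented diurnal pattern before attempting attribution to market hours or bot schedules.

```python
import pandas as pd

hourly = pd.read_csv("hourly_failures.csv", parse_dates=["hour"])
series = hourly.set_index("hour")["failed_tx_count"]
for lag in (1, 6, 12, 24, 48):
    print(f"lag {lag:>2}h autocorrelation: {series.autocorr(lag=lag):+.3f}")
```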
 