Building an Arbitrage Bot: Finding Arbitrage Opportunities

In the fast-evolving world of decentralized finance (DeFi), automated trading strategies like arbitrage have become essential tools for maximizing returns. One of the most effective ways to capitalize on market inefficiencies is by building an arbitrage bot capable of identifying and exploiting price discrepancies across decentralized exchanges. This article walks you through the core process of detecting arbitrage opportunities between liquidity pools that trade the same token pairs—focusing on ETH-based pairs for simplicity and efficiency.

We’ll cover token pair selection, derive the mathematical model for optimal trade sizing, and implement a practical algorithm to surface profitable opportunities. By the end, you'll understand how to pre-select viable trading pairs and compute potential profits using real on-chain data.

Selecting Token Pairs for Arbitrage

Defining the Arbitrage Strategy Scope

Before scanning for opportunities, it's crucial to define the operational boundaries of your arbitrage bot. The safest and most straightforward strategy involves ETH-centric arbitrage, where both legs of a trade involve Ethereum (or its wrapped version, WETH). Since gas fees on Ethereum are paid in ETH, concluding trades with ETH ensures you maintain liquidity for transaction costs.

However, this widely adopted approach also means increased competition—popular ETH-based arbitrage routes are often saturated, reducing profitability over time. Still, for beginners, focusing on ETH pairs provides a stable foundation due to deeper liquidity and fewer risks associated with volatile or low-cap tokens.

For this implementation:

While future enhancements could include stablecoin inventory management or statistical arbitrage on illiquid "shitcoins," this guide sticks to atomic, risk-free arbitrage within well-established pools.

Filtering Eligible Token Pairs

To identify eligible pairs, we start by fetching all liquidity pools from major DEX factory contracts (e.g., Uniswap V2). Using event logs, we extract deployed pairs and filter those containing WETH (0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2).
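As a minimal sketch of the log-extraction step, the function below decodes one raw Uniswap V2 `PairCreated` log into a pool record. It assumes the log is a dict with hex-string `topics` and `data` fields, as a JSON-RPC `eth_getLogs` call returns them. The function name, the record keys, and the example addresses (other than WETH) are hypothetical and chosen only for illustration.

```python
def decode_pair_created(log):
    """Decode a raw PairCreated(token0, token1, pair, index) log into a dict.

    Assumes `log` has hex-string 'topics' and 'data' fields.
    """
    topics = log["topics"]
    data = log["data"][2:]  # drop the "0x" prefix

    return {
        # Indexed address arguments are left-padded to 32 bytes in the topics.
        "token0": "0x" + topics[1][-40:],
        "token1": "0x" + topics[2][-40:],
        # Non-indexed arguments are packed into `data` as 32-byte words.
        "pair": "0x" + data[0:64][-40:],
        "index": int(data[64:128], 16),
    }


# Synthetic log: a made-up token paired with WETH (lowercase, as in raw logs).
example_log = {
    "topics": [
        "0x" + "00" * 32,  # placeholder for the event signature hash
        "0x" + "0" * 24 + "11" * 20,
        "0x" + "0" * 24 + "c02aaa39b223fe8d0a0e5c4f27ead9083c756cc2",
    ],
    "data": "0x" + "0" * 24 + "22" * 20 + hex(7)[2:].rjust(64, "0"),
}

pool = decode_pair_created(example_log)
```

Raw logs typically carry lowercase addresses, so it is safer to normalize case on both sides before comparing a decoded token against the checksummed WETH constant.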

Next, we group pools by their token pair. Only pairs listed on two or more distinct pools are retained—since arbitrage requires at least two price sources to compare.

Here’s a simplified version of the filtering logic:

WETH = "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2"
pair_pool_dict = {}

# pairDataList: one dict per pool, with at least 'token0' and 'token1' addresses
for pool in pairDataList:
    token0, token1 = pool['token0'], pool['token1']
    if WETH not in (token0, token1):
        continue  # skip pools without a WETH leg
    # Sort the addresses so (A, B) and (B, A) map to the same key
    pair = tuple(sorted([token0, token1]))
    pair_pool_dict.setdefault(pair, []).append(pool)

# Keep only pairs with multiple pools
eligible_pairs = {k: v for k, v in pair_pool_dict.items() if len(v) >= 2}

At the time of analysis, this process yielded 3,081 eligible pools.

This volume of data is manageable—reserves for all pools can be fetched in under a second using public RPC endpoints.

Detecting Profitable Arbitrage Opportunities

Understanding Price Discrepancies

An arbitrage opportunity arises when two pools trading the same token pair display different prices. However, not every discrepancy is exploitable. Factors like pool reserves (liquidity depth) and transaction gas costs determine whether a trade will be profitable after fees.
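A rough way to screen two pools before doing any sizing is to compare their marginal prices. The sketch below assumes each pool is given as a hypothetical `(weth_reserve, token_reserve)` tuple and that both pools charge the same 0.3% fee. Under the constant product formula, an infinitesimally small round trip only breaks even once the price ratio exceeds 1 / (1 - fee)^2, so smaller gaps can be discarded immediately. The function names and reserve figures are illustrative.

```python
def spot_price(weth_reserve, token_reserve):
    """Marginal price of one token in WETH, ignoring fees and slippage."""
    return weth_reserve / token_reserve


def compare_pools(pool_a, pool_b, fee=0.003):
    """Compare two pools given as (weth_reserve, token_reserve) tuples.

    Returns (buy_pool, sell_pool, price_ratio, covers_fees).
    """
    price_a = spot_price(*pool_a)
    price_b = spot_price(*pool_b)
    # Buy the token where it is cheaper, sell it where it is more expensive.
    if price_a <= price_b:
        buy_pool, sell_pool = pool_a, pool_b
    else:
        buy_pool, sell_pool = pool_b, pool_a
    price_ratio = spot_price(*sell_pool) / spot_price(*buy_pool)
    # Each swap keeps (1 - fee) of its input, so the gap must beat both fees.
    covers_fees = price_ratio > 1 / (1 - fee) ** 2
    return buy_pool, sell_pool, price_ratio, covers_fees


# Hypothetical reserves: pool A prices the token at 0.0005 WETH, pool B higher.
pool_a = (100.0, 200_000.0)
pool_b = (105.0, 190_000.0)
buy_pool, sell_pool, price_ratio, covers_fees = compare_pools(pool_a, pool_b)
```

Passing this screen does not guarantee a profitable trade: it ignores gas and says nothing about how much can be traded before the gap closes.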

Our goal is to calculate the maximum net profit achievable from a two-swap sequence:

  1. Buy Token Y with ETH in Pool A (where Token Y is cheaper)
  2. Sell Token Y for ETH in Pool B (where Token Y is more expensive)

The challenge lies in determining the optimal input size—the amount of ETH that maximizes profit before gas expenses.

Mathematical Model for Optimal Trade Size

Automated market makers (AMMs) like Uniswap V2 use the constant product formula: x * y = k. When a swap occurs, reserves shift, altering the effective price. This non-linear behavior means larger trades erode potential gains due to slippage.

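Let:

  • x be the amount of WETH sent into the first swap
  • (a1, b1) be the WETH and token reserves of the pool we buy from
  • (a2, b2) be the WETH and token reserves of the pool we sell into
  • fee be the swap fee (0.3% on Uniswap V2)

The output of a single swap of x against an input-side reserve a and an output-side reserve b is given by: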

swap_output(x, a, b) = b * (1 - a / (a + x * (1 - fee)))

For two consecutive swaps (A → B), gross profit as a function of input x becomes:

profit(x) = swap_output(swap_output(x, a1, b1), b2, a2) - x
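The two formulas translate directly into Python. This is a minimal sketch that assumes both pools charge the same 0.3% fee and that reserves are passed as plain floats; the reserve figures are hypothetical and only serve to show how profit first rises and then falls as the input grows.

```python
FEE = 0.003  # swap fee, assumed identical in both pools


def swap_output(x, a, b, fee=FEE):
    """Amount received for input x, given input-side reserve a and output-side reserve b."""
    return b * (1 - a / (a + x * (1 - fee)))


def profit(x, a1, b1, a2, b2, fee=FEE):
    """Gross WETH profit of buying in pool 1 and selling in pool 2."""
    token_out = swap_output(x, a1, b1, fee)          # WETH -> token in pool 1
    weth_back = swap_output(token_out, b2, a2, fee)  # token -> WETH in pool 2
    return weth_back - x


# Hypothetical reserves: (WETH, token) for each pool.
a1, b1 = 100.0, 200_000.0
a2, b2 = 105.0, 190_000.0

for size in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"input {size:>5} WETH -> gross profit {profit(size, a1, b1, a2, b2):+.4f} WETH")
```

With these reserves the profit is positive for small inputs and turns negative for large ones, which is why the input size has to be optimized rather than simply maximized.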

Using calculus, we find the value of x that maximizes profit by solving d(profit)/dx = 0. The solution yields the optimal trade size:

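import math

def optimal_trade_size(reserves1, reserves2, fee=0.003):
    """Return the WETH input that maximizes gross profit when buying in pool 1 and selling in pool 2.

    Each reserves argument is a (weth_reserve, token_reserve) tuple.
    A result <= 0 means there is no profitable trade in this direction.
    """
    a1, b1 = reserves1
    a2, b2 = reserves2
    numerator = math.sqrt(a1 * b1 * a2 * b2 * (1 - fee)**4 * (b1 * (1 - fee) + b2)**2)
    numerator -= a1 * b2 * (1 - fee) * (b1 * (1 - fee) + b2)
    denominator = ((1 - fee) * (b1 * (1 - fee) + b2)) ** 2
    return numerator / denominator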

This formula allows us to precisely compute the best input amount for any given pool configuration.
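One way to gain confidence in the closed form is to check it numerically. Composing the two swaps gives an output of the form A·x / (B + C·x), with A = a2·b1·(1 - fee)^2, B = a1·b2 and C = (1 - fee)·(b2 + b1·(1 - fee)); setting the derivative of the profit to zero gives x = (sqrt(A·B) - B) / C. The sketch below uses that shorter, algebraically equivalent expression under a hypothetical name and confirms that nudging the input in either direction lowers the profit. The reserves are made-up values.

```python
import math


def swap_output(x, a, b, fee=0.003):
    """Amount received for input x, given input-side reserve a and output-side reserve b."""
    return b * (1 - a / (a + x * (1 - fee)))


def profit(x, reserves1, reserves2, fee=0.003):
    """Gross WETH profit of buying in pool 1 and selling in pool 2."""
    (a1, b1), (a2, b2) = reserves1, reserves2
    return swap_output(swap_output(x, a1, b1, fee), b2, a2, fee) - x


def optimal_trade_size_simplified(reserves1, reserves2, fee=0.003):
    """Closed-form optimum after cancelling common factors (hypothetical helper name)."""
    (a1, b1), (a2, b2) = reserves1, reserves2
    r = 1 - fee
    return (r * math.sqrt(a1 * b1 * a2 * b2) - a1 * b2) / (r * (b1 * r + b2))


# Hypothetical (WETH, token) reserves for the buy pool and the sell pool.
pool1 = (100.0, 200_000.0)
pool2 = (105.0, 190_000.0)

x_star = optimal_trade_size_simplified(pool1, pool2)
best = profit(x_star, pool1, pool2)

# Moving 1% away from the optimum in either direction should reduce profit.
lower = profit(x_star * 0.99, pool1, pool2)
higher = profit(x_star * 1.01, pool1, pool2)
print(x_star, best, lower, higher)
```

A non-positive result from the closed form signals that the price gap does not cover the two swap fees in that direction.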

Implementing the Arbitrage Scanner

With the mathematical foundation in place, we now scan all eligible pairs and pool combinations.

Step-by-Step Execution Flow

  1. Fetch reserves for all 3,081 eligible pools.
  2. For each token pair:

    • Reorder reserves so WETH is always first.
    • Compare every pool combination (A vs B).
    • Skip invalid cases (zero reserves or identical pools).
  3. Compute optimal input and gross profit using the derived formulas.
  4. Store all opportunities in a list.
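The steps above can be sketched as a single loop over in-memory reserves. This is an illustration under stated assumptions, not the article's exact scanner: the input layout (`{token: {pool_address: (weth_reserve, token_reserve)}}`), the function names, and the sample data are all hypothetical, and reserves are assumed to be already reordered so WETH comes first.

```python
import math
from itertools import permutations

FEE = 0.003  # swap fee, assumed identical in every pool


def swap_output(x, a, b, fee=FEE):
    """Amount received for input x, given input-side reserve a and output-side reserve b."""
    return b * (1 - a / (a + x * (1 - fee)))


def optimal_input(res_buy, res_sell, fee=FEE):
    """WETH input that maximizes gross profit (closed form, common factors cancelled)."""
    (a1, b1), (a2, b2) = res_buy, res_sell
    r = 1 - fee
    return (r * math.sqrt(a1 * b1 * a2 * b2) - a1 * b2) / (r * (b1 * r + b2))


def find_opportunities(reserves_by_token):
    """Scan every ordered pool pair per token and keep the profitable directions."""
    opportunities = []
    for token, pools in reserves_by_token.items():
        # permutations() yields both directions and never pairs a pool with itself.
        for (buy_addr, res_buy), (sell_addr, res_sell) in permutations(pools.items(), 2):
            if min(res_buy + res_sell) <= 0:
                continue  # skip pools with empty reserves
            x = optimal_input(res_buy, res_sell)
            if x <= 0:
                continue  # price gap does not cover fees in this direction
            token_out = swap_output(x, res_buy[0], res_buy[1])
            gross = swap_output(token_out, res_sell[1], res_sell[0]) - x
            opportunities.append({
                "token": token,
                "buy_pool": buy_addr,
                "sell_pool": sell_addr,
                "optimal_input": x,
                "profit": gross,
            })
    return opportunities


# Hypothetical data: one token listed on three pools, one of them empty.
sample = {
    "TKN": {
        "poolA": (100.0, 200_000.0),
        "poolB": (105.0, 190_000.0),
        "poolC": (0.0, 0.0),
    }
}
opportunities = find_opportunities(sample)
```

Iterating over ordered pairs keeps the direction check implicit: only the direction with a positive optimal input survives.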

After processing, we identified 1,791 potential arbitrage routes.

Estimating Net Profitability

Gross profit isn't enough: we must subtract gas costs. A basic estimate assumes a fixed 107,000 gas per arbitrage transaction, priced at the current network gas price:

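GAS_USED = 107000  # estimated gas for the two-swap transaction
gas_price = w3.eth.gas_price  # current gas price in wei; w3 is a connected web3.py instance
for opp in opportunities:
    # Convert the gas cost from wei to ETH before subtracting it
    opp['net_profit'] = opp['profit'] - (GAS_USED * gas_price / 1e18)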
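To make the scale of the gas deduction concrete, here is a worked example that needs no node connection. The 107,000 gas figure is the one used in the snippet above; the 30 gwei gas price and the 0.01 ETH gross profit are assumed values picked purely for illustration, and the helper names are hypothetical.

```python
GAS_USED = 107_000     # gas estimate for the two-swap transaction
WEI_PER_ETH = 10**18


def gas_cost_eth(gas_price_wei, gas_used=GAS_USED):
    """Transaction cost in ETH for a given gas price in wei."""
    return gas_used * gas_price_wei / WEI_PER_ETH


def net_profit(gross_profit_eth, gas_price_wei):
    """Gross profit minus the estimated gas cost, both in ETH."""
    return gross_profit_eth - gas_cost_eth(gas_price_wei)


example_gas_price = 30 * 10**9           # 30 gwei (assumed)
cost = gas_cost_eth(example_gas_price)   # 107,000 * 30 gwei = 0.00321 ETH
remaining = net_profit(0.01, example_gas_price)
```

At that assumed gas price, any opportunity with a gross profit below 0.00321 ETH loses money once the transaction is paid for.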

Sorting by net profit reveals only 57 initially positive opportunities. However, many involve toxic tokens—malicious ERC-20 contracts designed to trap traders by restricting sells or enabling rug pulls.

After manual filtering:

These values represent best-case scenarios; actual gas costs vary based on contract complexity and network congestion.

Frequently Asked Questions

How do I detect toxic tokens in liquidity pools?

Toxic tokens often manipulate balance tracking or restrict transfers. To detect them:

Why focus only on two-pool arbitrage?

Two-pool arbitrage is atomic and risk-free—it executes within one transaction. Multi-hop routes increase complexity, slippage risk, and failure probability. Starting simple ensures reliability before scaling.

Can I run this bot profitably on mainnet?

Possibly—but competition is fierce. Most low-hanging opportunities are claimed within milliseconds by specialized bots. To succeed, you need:

What’s next after finding an opportunity?

Next steps involve:

How accurate is the gas cost estimation?

The 107k gas estimate is a lower bound. Real-world usage may exceed this due to:

Should I include stablecoin pairs?

Yes—but with caution. Stablecoins like USDC/DAI often exhibit small but frequent mispricings. However:

They’re excellent for diversification once your core ETH strategy is stable.


By combining rigorous mathematical modeling with efficient data processing, this framework lays the groundwork for a functional DeFi arbitrage bot. While raw profitability may seem limited after filtering and gas costs, optimization through better infrastructure and expanded strategies can unlock significant gains. In the next article, we'll build the smart contract that executes these trades—turning theory into action.