Ethereum Smart Contract OPCODE Reverse Engineering: Theoretical Foundations

Understanding Ethereum smart contract bytecode at the OPCODE level is essential for security researchers, auditors, and developers working with on-chain code—especially when source code is unavailable. This article dives deep into the foundational concepts of Ethereum Virtual Machine (EVM) OPCODEs, storage structures, function dispatch mechanisms, and runtime behavior to equip you with the knowledge needed for effective reverse engineering.

Whether you're analyzing suspicious contracts or building your own disassembler, mastering these core principles enhances both accuracy and efficiency in low-level smart contract analysis.


Core Concepts of EVM OPCODEs

The EVM executes smart contracts using a stack-based architecture where each instruction—known as an OPCODE—is represented by a single byte ranging from 0x00 to 0xff. While not all 256 possible values are currently used, this design allows room for future expansion.

Each OPCODE performs specific operations related to the stack, memory, storage, or control flow. You can refer to publicly available EVM opcode lists for a complete reference, and detailed semantics are documented in the Solidity assembly guide.
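To make the one-byte encoding concrete, here is a minimal disassembler sketch. The opcode table is deliberately partial and the function name `disassemble` is only illustrative; the point it shows is that PUSH1 to PUSH32 carry immediate bytes that must be skipped rather than decoded as instructions.

```python
# Partial opcode table for illustration; unknown bytes are labelled, not guessed.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x14: "EQ", 0x15: "ISZERO", 0x20: "SHA3",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x39: "CODECOPY",
    0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFD: "REVERT",
}

def disassemble(code):
    """Turn raw bytecode into a list of mnemonic strings."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:            # PUSH1..PUSH32: 1..32 immediate bytes follow
            n = op - 0x5F
            out.append("PUSH%d 0x%s" % (n, code[pc + 1:pc + 1 + n].hex()))
            pc += 1 + n
        elif 0x80 <= op <= 0x8F:          # DUP1..DUP16
            out.append("DUP%d" % (op - 0x7F))
            pc += 1
        elif 0x90 <= op <= 0x9F:          # SWAP1..SWAP16
            out.append("SWAP%d" % (op - 0x8F))
            pc += 1
        else:
            out.append(OPCODES.get(op, "UNKNOWN_0x%02x" % op))
            pc += 1
    return out
```

For instance, `disassemble(bytes.fromhex("6080604052"))` yields `PUSH1 0x80`, `PUSH1 0x40`, `MSTORE`, the familiar prologue of Solidity-compiled contracts.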

Data Handling: Stack, Memory, and Storage

Unlike traditional computing environments, the EVM lacks registers and network I/O. Instead, it relies on three primary data areas:

Stack

The stack is a LIFO (last-in, first-out) structure with a maximum depth of 1024 items. Most arithmetic and logical operations consume arguments from the stack and push results back onto it.

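Push operations (PUSH1 to PUSH32) place immediate values onto the stack. For example, PUSH1 0x80 pushes the single byte 0x80 that follows the opcode, and PUSH32 pushes a full 32-byte word.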

All other instructions pull operands from the stack unless otherwise specified.
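The stack discipline described above can be sketched as a toy interpreter. This is a simplified model, not a real EVM: the tuple-based program format and the `run` function are assumptions made for illustration, and only PUSH, ADD, and SUB are handled.

```python
STACK_LIMIT = 1024      # maximum stack depth in the EVM
WORD = 2 ** 256         # all values are 256-bit words

def run(program):
    """Execute (mnemonic, operand) pairs and return the final stack (top is last)."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            if len(stack) >= STACK_LIMIT:
                raise OverflowError("stack limit of 1024 items exceeded")
            stack.append(arg % WORD)
        elif op == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)     # arithmetic wraps modulo 2**256
        elif op == "SUB":
            a, b = stack.pop(), stack.pop()  # a is the top of the stack
            stack.append((a - b) % WORD)
        else:
            raise ValueError("unsupported mnemonic: " + op)
    return stack
```

Note the operand order for SUB: the value on top of the stack is the minuend, so pushing 3 and then 10 before SUB leaves 7.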

Memory

Memory is volatile and used during execution for temporary data storage, such as function arguments or return values.

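Key instructions include:

MLOAD(offset) → Reads the 32-byte word at MEM[offset:offset+32]
MSTORE(offset, value) → Writes a 32-byte word to MEM[offset:offset+32]
MSTORE8(offset, value) → Writes a single byte to MEM[offset]

Because the EVM word size is 32 bytes (the same limit that caps PUSH32), memory is usually read and written in 32-byte words. One notable use case is hashing: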

SHA3(offset, size) → Computes keccak256 over memory region MEM[offset:offset+size]
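A rough model of this word-oriented memory is sketched below. The `Memory` class and its method names are hypothetical helpers, and gas accounting for memory expansion is left out; `region` returns the bytes that a SHA3(offset, size) instruction would hash.

```python
class Memory:
    """Zero-initialised byte array that grows in 32-byte words."""

    def __init__(self):
        self.data = bytearray()

    def _expand(self, end):
        if end > len(self.data):
            new_len = (end + 31) // 32 * 32          # round up to a whole word
            self.data.extend(bytes(new_len - len(self.data)))

    def mstore(self, offset, value):
        self._expand(offset + 32)
        self.data[offset:offset + 32] = value.to_bytes(32, "big")

    def mstore8(self, offset, value):
        self._expand(offset + 1)
        self.data[offset] = value & 0xFF

    def mload(self, offset):
        self._expand(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")

    def region(self, offset, size):
        """Bytes MEM[offset:offset+size], i.e. the input SHA3 would hash."""
        self._expand(offset + size)
        return bytes(self.data[offset:offset + size])
```

Storing 0x80 at offset 0x40, as the usual Solidity prologue does, grows memory to 96 bytes because the write touches the third 32-byte word.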

Persistent Storage

Storage is persistent and resides on-chain. It maps 256-bit keys to 256-bit values and persists across transactions.

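You interact with storage via:

SLOAD(key) → Reads the 256-bit value stored at key
SSTORE(key, value) → Writes a 256-bit value to key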

You can query storage directly using:

eth.getStorageAt(contractAddress, slot)

This makes all storage data publicly readable, even if variables are marked private. Never store sensitive information in blockchain storage.


Variable Storage Layout in Smart Contracts

Smart contract variables are stored differently based on their type and scope. Understanding these patterns is critical for decoding OPCODE logic.

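Global vs Local Variables

State (global) variables live in storage and persist between transactions. Local variables live on the stack or in memory and are discarded when execution ends.

🔍 Important: Declaring a variable as private does not hide its value. It only blocks access from other contracts and suppresses the automatic getter. The data remains readable via eth.getStorageAt.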

Storage Organization Models

1. Fixed-Length Types

Types like uint256, address, bytes32, etc., occupy full or partial storage slots (each 256 bits). They are stored sequentially unless packed.

Example:

uint a;      // slot 0
address b;   // slot 1
bytes32 c;   // slot 2

However, smaller types may share a slot:

address a;   // slot 0 (160 bits)
uint8 b;     // shares slot 0

To extract b (a occupies the low-order 160 bits, so b sits in the next 8 bits):
(SLOAD(0) >> 160) & 0xFF

This packing improves gas efficiency but complicates reverse engineering.
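The shift-and-mask extraction generalises to any packed slot. The sketch below assumes you already know (or have guessed) the field widths in declaration order; `unpack_slot` and the sample values are illustrative only.

```python
def unpack_slot(slot_value, layout):
    """Split a 256-bit slot into fields.

    layout is a list of (name, bit_width) in declaration order; the first
    declared item sits in the lowest-order bits of the slot.
    """
    fields, shift = {}, 0
    for name, bits in layout:
        fields[name] = (slot_value >> shift) & ((1 << bits) - 1)
        shift += bits
    return fields

# Example slot: an address in the low 160 bits and a uint8 directly above it.
addr = 0xd25ed029c093e56bc8911a07c46545000cbf37c6
raw = (0x2A << 160) | addr
fields = unpack_slot(raw, [("a", 160), ("b", 8)])
```

When reversing bytecode, the mask constants (for example 0xFF or twenty 0xFF bytes) that follow an SLOAD are usually the quickest hint for the widths to try.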

2. Mapping Types

Mappings use a hashing scheme to compute storage locations:

mapping(keyType => valueType) myMap;

Value stored at:
keccak256(key_encoded_left_padded_to_32_bytes ++ slot)

For example:

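# Hex digits only: the "0x" prefix must be dropped before padding
key = "d25ed029c093e56bc8911a07c46545000cbf37c6".rjust(64, '0')
slot = "00".rjust(64, '0')
# keccak256 here stands for any Keccak-256 routine that hashes raw bytes
location = keccak256(bytes.fromhex(key + slot))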

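Even though data is public, a mapping's keys cannot be enumerated from storage alone: you must know a key to compute its slot. This is obscurity rather than secrecy, since keys can often be recovered from transaction calldata or event logs.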
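The slot calculation for a mapping with a value-type key (address, uint, and so on) can be sketched as follows. Python's standard library has no Keccak-256 (`hashlib.sha3_256` is NIST SHA-3, which uses different padding and gives different digests), so the hash is passed in as a parameter; supplying it from a third-party package is an assumption of this sketch, and both function names are illustrative.

```python
def mapping_preimage(key, slot):
    """64-byte hash input: key and slot, each left-padded to 32 bytes."""
    return key.to_bytes(32, "big") + slot.to_bytes(32, "big")

def mapping_slot(key, slot, keccak256):
    """Storage slot of myMap[key]; keccak256 must map bytes -> 32-byte digest."""
    return int.from_bytes(keccak256(mapping_preimage(key, slot)), "big")

# Preimage for the address key used above, with the mapping declared at slot 0.
preimage = mapping_preimage(0xd25ed029c093e56bc8911a07c46545000cbf37c6, 0)
```

Keys of type string or bytes are handled differently (they are hashed without padding), so this helper should not be used for them.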

3. Dynamic Arrays and Strings

Dynamic arrays store their length in the assigned slot:

uint[] arr; // SLOAD(1) returns arr.length

Element at index i is located at (for element types that fill a whole slot, with slot padded to 32 bytes before hashing):
keccak256(slot) + i
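As a small sketch of that formula, assuming each element occupies one full slot and that a Keccak-256 function is supplied by the caller (the name `array_element_slot` is hypothetical):

```python
def array_element_slot(slot, index, keccak256):
    """Slot of arr[index] for a dynamic array declared at `slot`.

    keccak256 must map bytes -> 32-byte digest. Elements smaller than 32
    bytes are packed several per slot and are not covered here.
    """
    base = int.from_bytes(keccak256(slot.to_bytes(32, "big")), "big")
    return (base + index) % 2 ** 256      # slot arithmetic wraps at 2**256
```

The array length itself stays in the declared slot, so a bounds check in bytecode typically shows up as an SLOAD of that slot followed by a comparison against the index.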

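Special cases: string and bytes

Values of up to 31 bytes are stored inline in the variable's own slot: the data is left-aligned and the lowest-order byte holds length * 2.

Example:

string s = "hello";
// Stored as hex: '68656c6c6f' + padding + '0a' (5*2)

If longer than 31 bytes, the slot holds length * 2 + 1 instead, and the data itself starts at keccak256(slot) and continues in consecutive slots.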
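The two encodings for string and bytes can be told apart by the lowest bit of the slot value. The following sketch decodes a raw slot under that rule; `decode_string_slot` and its return convention are assumptions made for illustration.

```python
def decode_string_slot(slot_value):
    """Interpret the main slot of a `string` or `bytes` state variable.

    Returns ("inline", data) for values of up to 31 bytes, or
    ("external", length) when the data lives at keccak256(slot) onwards.
    """
    raw = slot_value.to_bytes(32, "big")
    if slot_value & 1 == 0:
        length = raw[31] // 2            # short form: lowest byte is length * 2
        return ("inline", raw[:length])
    length = (slot_value - 1) // 2       # long form: slot is length * 2 + 1
    return ("external", length)

# Slot contents for the "hello" example: data left-aligned, 0x0a in the last byte.
hello_slot = int.from_bytes(b"hello".ljust(31, b"\x00") + bytes([10]), "big")
```

For the long form, the caller would then read `ceil(length / 32)` consecutive slots starting at keccak256 of the variable's slot number.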

4. Structs

Structs follow sequential layout rules similar to fixed-length variables:

struct User {
    uint id;
    address addr;
}
User user;

Maps to two consecutive slots: id at slot X, addr at X+1.


Function Dispatch Mechanism

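The EVM itself has no notion of functions. Compiler-generated dispatcher code at the start of the runtime bytecode decides which function to execute based on the first four bytes of transaction calldata, known as the function selector.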

How Function Selectors Work

The selector is derived from:

first_4_bytes(keccak256("functionName(type1,type2,...)"))
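To compute selectors without external packages, the sketch below includes a compact pure-Python Keccak-256 modelled on the well-known compact reference implementation. It is slow and meant for illustration only; the function names are this sketch's own. Note that it uses the original Keccak padding byte 0x01, which is what Ethereum uses, not the 0x06 of standardized SHA-3.

```python
MASK64 = (1 << 64) - 1

def _rol64(a, n):
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64

def _keccak_f(lanes):
    """Keccak-f[1600] permutation on a 5x5 matrix of 64-bit lanes."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        cur = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, lanes[x][y] = lanes[x][y], _rol64(cur, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constants generated by an 8-bit LFSR
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def keccak256(data):
    rate = 136                                    # 1088-bit rate for 256-bit output
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01                     # Keccak domain/padding start
    padded[-1] ^= 0x80                            # padding end
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            chunk = padded[off + 8 * i:off + 8 * i + 8]
            lanes[i % 5][i // 5] ^= int.from_bytes(chunk, "little")
        lanes = _keccak_f(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))

def selector(signature):
    """First four bytes of keccak256 over the canonical signature string."""
    return "0x" + keccak256(signature.encode("ascii"))[:4].hex()
```

The signature must be in canonical form: no spaces, no parameter names, and full type names such as `uint256` rather than `uint`. For example, `selector("transfer(address,uint256)")` gives the familiar ERC-20 value 0xa9059cbb.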

Examples:

transfer(address,uint256) → 0xa9059cbb
balanceOf(address) → 0x70a08231

You can reverse-engineer unknown functions using known hash databases like those maintained by Trail of Bits.
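Before looking selectors up in such a database, you first have to pull them out of the bytecode. A simple heuristic, sketched below, is to collect every 4-byte constant that is pushed with PUSH4 and immediately compared with EQ, which is the shape a linear dispatcher typically takes. The function name is illustrative, and compilers may emit other dispatch shapes (for instance comparisons that split the selector range), so treat the result as a starting point.

```python
PUSH1, PUSH4, PUSH32, EQ = 0x60, 0x63, 0x7F, 0x14

def find_selectors(code):
    """Return hex strings of PUSH4 immediates that are directly followed by EQ."""
    found, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1                    # number of immediate bytes
            if op == PUSH4 and pc + 5 < len(code) and code[pc + 5] == EQ:
                found.append(code[pc + 1:pc + 5].hex())
            pc += 1 + n                           # skip immediates so data is not decoded as code
        else:
            pc += 1
    return found
```

Walking the code instruction by instruction, rather than searching for the byte 0x63, avoids false positives from constants that merely contain that byte.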

Internal vs External Function Calls

There are two ways to invoke functions:

Method          Behavior
eth.call()      Simulates execution locally; returns output without state changes
Transaction     Changes state (storage), consumes gas, and is recorded on-chain

Only payable functions accept Ether. Non-payable functions include a value check:

CALLVALUE
DUP1
ISZERO
PUSH2 jump_to_next
JUMPI
PUSH1 0x00
DUP1
REVERT

Public Variables as Functions

Public variables generate getter functions automatically:

address public owner;

Behaves as if the contract also declared:

function owner() public view returns (address) {
    return owner;
}

Thus, every public variable appears as a callable function in the dispatch table.

Private variables lack such getters but are still readable via direct storage access.


Decoding Function Parameters

Function parameters are extracted from calldata based on type:

Fixed-Length Parameters

Passed inline in calldata:

data = selector (4 bytes) ++ arg1 (32 bytes) ++ arg2 (32 bytes) ...

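Accessed via CALLDATALOAD(offset), which reads 32 bytes starting at offset. The first argument sits at offset 4, the second at offset 36, and so on.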
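Offline, the same layout can be taken apart with a few lines of code. This sketch splits raw calldata into the selector and a list of 32-byte words; `split_calldata` is a hypothetical helper, and it does not try to interpret the words as specific types.

```python
def split_calldata(calldata):
    """Return (selector_hex, [word0, word1, ...]) for fixed-length arguments."""
    selector, body = calldata[:4], calldata[4:]
    if len(body) % 32:
        raise ValueError("argument area is not a multiple of 32 bytes")
    words = [int.from_bytes(body[i:i + 32], "big") for i in range(0, len(body), 32)]
    return selector.hex(), words

# Example: a transfer(address,uint256) call sending 1000 units.
recipient = 0xd25ed029c093e56bc8911a07c46545000cbf37c6
example = (bytes.fromhex("a9059cbb")
           + recipient.to_bytes(32, "big")
           + (1000).to_bytes(32, "big"))
```

A word whose top 12 bytes are zero is a good candidate for an address, while small integers are more likely amounts, indexes, or offsets.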

Variable-Length Parameters (e.g., string, bytes[])

Passed using offset-based addressing:

selector ++ offset_a ++ offset_b ++ length_a ++ data_a ++ length_b ++ data_b

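Each dynamic argument has:

  1. An offset word in the head, counted from the start of the argument area (just after the selector)
  2. A length word at that offset
  3. The data itself, right-padded to a multiple of 32 bytes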

Use CALLDATACOPY(destMem, offset, length) to load them into memory.

Even without knowing the function name, you can infer parameter count and types by analyzing how calldata is accessed.
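The offset-based layout for dynamic arguments can be followed by hand, as in this sketch for `string` or `bytes` parameters. The helper names are illustrative, and nested dynamic types such as `bytes[]` add another level of offsets that is not handled here.

```python
def read_word(args, pos):
    """Read one big-endian 32-byte word from the argument area."""
    return int.from_bytes(args[pos:pos + 32], "big")

def decode_dynamic_bytes(calldata, arg_index):
    """Return the raw bytes of the dynamic argument at position arg_index."""
    args = calldata[4:]                        # offsets are relative to the argument area
    offset = read_word(args, 32 * arg_index)   # head slot holds the offset of the tail
    length = read_word(args, offset)           # tail begins with the byte length
    return args[offset + 32:offset + 32 + length]

def word(n):
    return n.to_bytes(32, "big")

# Hand-built calldata for a hypothetical f(string,string) called with "hi" and "evm".
sample = (bytes(4)
          + word(0x40) + word(0x80)            # offset_a, offset_b
          + word(2) + b"hi".ljust(32, b"\x00")
          + word(3) + b"evm".ljust(32, b"\x00"))
```

In bytecode, this pattern appears as a CALLDATALOAD of the head slot, an addition of 4, a second CALLDATALOAD for the length, and then a CALLDATACOPY of the data.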


Contract Deployment Process

Contracts are deployed via transactions with an empty to address and initialization bytecode in the input field.

How Contract Addresses Are Determined

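The address is computed from:

  1. The sender's address
  2. The sender's nonce

Formula:

address = keccak256(rlp([sender, nonce]))[12:]

For a 20-byte sender and a nonce between 1 and 0x7f, the RLP encoding is 0xd6 ++ 0x94 ++ sender ++ nonce. Larger nonces need more bytes, which changes the list prefix (e.g., 0xd7, 0xd8).

This makes CREATE addresses deterministic. CREATE2, introduced in the Constantinople upgrade, additionally supports salted deployments, where the address is keccak256(0xff ++ sender ++ salt ++ keccak256(init_code))[12:].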
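Where the 0xd6, 0xd7, and 0xd8 prefixes come from is easier to see when the RLP encoding of (sender, nonce) is built explicitly. The sketch below constructs the hash preimage for a nonce-based deployment; the function names are illustrative, and a Keccak-256 implementation is assumed to be supplied by the caller.

```python
def rlp_bytes(b):
    """RLP-encode a byte string of at most 55 bytes."""
    if len(b) == 1 and b[0] < 0x80:
        return b                                   # a single small byte encodes as itself
    if len(b) > 55:
        raise ValueError("long strings are outside the scope of this sketch")
    return bytes([0x80 + len(b)]) + b

def create_preimage(sender, nonce):
    """RLP list [sender, nonce]; the integer 0 encodes as the empty string."""
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")
    payload = rlp_bytes(sender) + rlp_bytes(nonce_bytes)
    return bytes([0xC0 + len(payload)]) + payload  # 0xd6 when the payload is 22 bytes

def create_address(sender, nonce, keccak256):
    """Last 20 bytes of the hash; keccak256 must map bytes -> 32-byte digest."""
    return keccak256(create_preimage(sender, nonce))[12:]

deployer = bytes.fromhex("d25ed029c093e56bc8911a07c46545000cbf37c6")
```

With a 20-byte sender, nonces up to 0x7f give a 22-byte payload (prefix 0xd6), one-byte nonces from 0x80 need an extra length byte (prefix 0xd7), and two-byte nonces give prefix 0xd8.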

Initialization Code vs Runtime Code

Deployment bytecode includes:

  1. Constructor logic (executed once)
  2. Copying of runtime code into memory via CODECOPY
  3. Returning runtime bytecode with RETURN

After deployment, only the returned runtime code persists on-chain.
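The copy-and-return steps can be illustrated with a hand-written loader. This is a minimal sketch with no constructor logic, not what a Solidity compiler emits; `wrap_runtime` is a hypothetical helper, and it is limited to runtime code shorter than 256 bytes so that every operand fits in a PUSH1.

```python
LOADER_LEN = 11   # size of the loader below, and therefore the offset of the runtime code

def wrap_runtime(runtime):
    """Prefix runtime bytecode with init code that copies and returns it."""
    if not 0 < len(runtime) < 256:
        raise ValueError("sketch only handles runtime code of 1..255 bytes")
    loader = bytes([
        0x60, len(runtime),   # PUSH1 size
        0x80,                 # DUP1              stack: size, size
        0x60, LOADER_LEN,     # PUSH1 11          offset of runtime code within the init code
        0x60, 0x00,           # PUSH1 0           destination offset in memory
        0x39,                 # CODECOPY          MEM[0:size] = CODE[11:11+size]
        0x60, 0x00,           # PUSH1 0           stack: size, 0
        0xF3,                 # RETURN            return MEM[0:size] as the runtime code
    ])
    return loader + runtime
```

When reversing real deployment bytecode, the same CODECOPY and RETURN pair marks the boundary: the CODECOPY source offset tells you where the runtime code begins inside the init code.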


Practical Applications and Reverse Engineering Tips

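Understanding OPCODE structure enables:

  1. Analyzing contracts whose source code is unavailable
  2. Security auditing of deployed bytecode
  3. Building your own disassembler or other analysis tooling

Key indicators for identifying variable types:

  1. A shift and mask after SLOAD suggests a packed fixed-length variable
  2. SHA3 over a 64-byte region (key ++ slot) before SLOAD or SSTORE suggests a mapping
  3. SHA3 over a single 32-byte slot, followed by an addition, suggests a dynamic array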

Use existing tools like IDA Pro with EVM plugins or open-source disassemblers to speed up analysis.


Frequently Asked Questions (FAQ)

Q: Can private variables in Solidity be truly hidden?

No. All storage is public on Ethereum. Marking a variable private only prevents external function access—it does not encrypt or hide the value. Anyone can read it via eth.getStorageAt.

Q: How do I find out what a function does without source code?

Analyze the OPCODE flow after matching the function selector. Look for patterns like storage writes (SSTORE), external calls (CALL), or arithmetic operations. Parameter handling reveals input types.

Q: Is it possible to decompile EVM bytecode into Solidity?

Partial decompilation is feasible. Tools can reconstruct high-level structures like loops and conditionals, but exact variable names and comments are lost. Accuracy depends on optimization levels and obfuscation.

Q: Why do some functions have no selectors?

Fallback and receive functions don’t have selectors. They execute when no matching function is found or when Ether is sent without calldata.

Q: What tools help with OPCODE analysis?

Popular options include IDA Pro with EVM plugins and open-source EVM disassemblers, as mentioned above.

Q: How can I detect if a contract uses dynamic arrays?

Look for usage of keccak256 on a storage slot value—this typically indicates element location calculation for dynamic arrays or mappings.


Conclusion

Mastering Ethereum OPCODEs unlocks deeper visibility into smart contract behavior, especially when source code is missing. By understanding storage layouts, function dispatch logic, parameter encoding, and deployment mechanics, you gain powerful capabilities in security auditing and reverse engineering.

As blockchain applications grow more complex, so too must our analytical tools—and foundational knowledge remains the strongest asset.

Stay tuned for the next part: Building an EVM Debugger for OPCODE Analysis.