Understanding Ethereum smart contract bytecode at the OPCODE level is essential for security researchers, auditors, and developers working with on-chain code—especially when source code is unavailable. This article dives deep into the foundational concepts of Ethereum Virtual Machine (EVM) OPCODEs, storage structures, function dispatch mechanisms, and runtime behavior to equip you with the knowledge needed for effective reverse engineering.
Whether you're analyzing suspicious contracts or building your own disassembler, mastering these core principles enhances both accuracy and efficiency in low-level smart contract analysis.
Core Concepts of EVM OPCODEs
The EVM executes smart contracts using a stack-based architecture where each instruction—known as an OPCODE—is represented by a single byte ranging from 0x00 to 0xff. While not all 256 possible values are currently used, this design allows room for future expansion.
Each OPCODE performs specific operations related to the stack, memory, storage, or control flow. You can refer to publicly available EVM opcode lists for a complete reference, and detailed semantics are documented in the Solidity assembly guide.
Data Handling: Stack, Memory, and Storage
Unlike traditional computing environments, the EVM lacks registers and network I/O. Instead, it relies on three primary data areas:
Stack
The stack is a LIFO (last-in, first-out) structure with a maximum depth of 1024 items. Most arithmetic and logical operations consume arguments from the stack and push results back onto it.
Push operations (PUSH1 to PUSH32) place immediate values onto the stack. For example:
- 0x6060 → PUSH1 0x60
- The range 0x60–0x7f corresponds to PUSH1 through PUSH32, which push 1 to 32 bytes of immediate data.
All other instructions pull operands from the stack unless otherwise specified.
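The PUSH encoding described above is the one thing a disassembler must get right, because PUSH is the only instruction followed by immediate bytes. Below is a minimal sketch of that idea; the `disassemble` function and the small `NAMES` table are illustrative assumptions, not a complete opcode list.

```python
# Partial opcode table for illustration only; a real tool needs the full list.
NAMES = {
    0x00: "STOP", 0x01: "ADD", 0x15: "ISZERO", 0x20: "SHA3",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0x80: "DUP1", 0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(bytecode_hex):
    """Return a list of (offset, mnemonic, immediate) tuples."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    result = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: the next n bytes are data, not instructions.
            n = op - 0x5F
            immediate = code[pc + 1:pc + 1 + n]
            result.append((pc, f"PUSH{n}", "0x" + immediate.hex()))
            pc += 1 + n
        else:
            result.append((pc, NAMES.get(op, f"UNKNOWN_0x{op:02x}"), None))
            pc += 1
    return result


for offset, name, immediate in disassemble("0x6060604052"):
    print(f"{offset:04x}  {name}" + (f" {immediate}" if immediate else ""))
```

Skipping the immediate bytes matters: if they were decoded as instructions, every later offset (including jump targets) would be misread.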
Memory
Memory is volatile and used during execution for temporary data storage, such as function arguments or return values.
Key instructions include:
- MSTORE(offset, value): Stores a 32-byte word at the given offset.
- MLOAD(offset): Loads a 32-byte word from memory.
- MSTORE8(offset, byte): Stores a single byte.
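To make the three memory instructions concrete, here is a small model of EVM memory as a zero-filled byte array that grows in 32-byte words. The `Memory` class and its method names are assumptions made for this sketch; gas accounting for memory expansion is left out.

```python
class Memory:
    """Byte-addressed scratch memory that grows in 32-byte words."""

    def __init__(self):
        self.data = bytearray()

    def _grow(self, end):
        # Expand to the next multiple of 32 bytes, zero-filled.
        if end > len(self.data):
            new_size = (end + 31) // 32 * 32
            self.data.extend(bytes(new_size - len(self.data)))

    def mstore(self, offset, value):
        # Write a full 32-byte big-endian word.
        self._grow(offset + 32)
        self.data[offset:offset + 32] = (value % (1 << 256)).to_bytes(32, "big")

    def mload(self, offset):
        # Read a full 32-byte big-endian word (reading also expands memory).
        self._grow(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")

    def mstore8(self, offset, byte):
        # Write only the lowest byte of the value.
        self._grow(offset + 1)
        self.data[offset] = byte & 0xFF


mem = Memory()
mem.mstore(0, 0x60)
print(hex(mem.mload(0)))
```

Note that offsets are byte offsets, so an unaligned MLOAD simply reads a 32-byte window that straddles two words.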
Because the EVM word size is 32 bytes (PUSH32 is the largest push), memory is usually read and written in 32-byte words. One notable use case is hashing:
SHA3(offset, size) → computes keccak256 over the memory region MEM[offset:offset+size]
Persistent Storage
Storage is persistent and resides on-chain. It maps 256-bit keys to 256-bit values and persists across transactions.
You interact with storage via:
- SSTORE(key, value)
- SLOAD(key)
You can query storage directly using:
eth.getStorageAt(contractAddress, slot)
This makes all storage data publicly readable, even if variables are marked private. Never store sensitive information in blockchain storage.
Variable Storage Layout in Smart Contracts
Smart contract variables are stored differently based on their type and scope. Understanding these patterns is critical for decoding OPCODE logic.
Global vs Local Variables
- Local variables exist only in memory or on the stack during execution.
- Global variables are persisted in storage, regardless of visibility (public or private).
🔍 Important: Declaring a variable as private does not hide its value; it only means no getter is generated and other contracts cannot read it directly. The data remains accessible via eth.getStorageAt.
Storage Organization Models
1. Fixed-Length Types
Types like uint256, address, bytes32, etc., occupy full or partial storage slots (each 256 bits). They are stored sequentially unless packed.
Example:
uint a; // slot 0
address b; // slot 1
bytes32 c; // slot 2
However, smaller types may share a slot:
address a; // slot 0 (160 bits)
uint8 b; // shares slot 0
To extract b: (SLOAD(0) >> 160) & 0xFF
This packing improves gas efficiency but complicates reverse engineering.
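The shift-and-mask extraction can be checked offline on a raw slot value. The sketch below assumes the two-variable layout from the example (an address in the low 160 bits, a uint8 directly above it); the function name and the sample value 42 are made up for illustration.

```python
def unpack_address_uint8(slot_value):
    """Split a 256-bit slot holding `address a; uint8 b;` into its two fields."""
    a = slot_value & ((1 << 160) - 1)   # low 160 bits: the address
    b = (slot_value >> 160) & 0xFF      # next 8 bits: the uint8
    return "0x" + format(a, "040x"), b


# Hypothetical slot contents: b = 42 packed above the address.
slot0 = (42 << 160) | 0xD25ED029C093E56BC8911A07C46545000CBF37C6
print(unpack_address_uint8(slot0))
```

In bytecode the same pattern shows up as an SLOAD followed by a division or shift and an AND with a short mask, which is a useful hint that a slot is packed.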
2. Mapping Types
Mappings use a hashing scheme to compute storage locations:
mapping(keyType => valueType) myMap;
Value stored at: keccak256(key_encoded_left_padded_to_32_bytes ++ slot)
For example:
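key = "d25ed029c093e56bc8911a07c46545000cbf37c6".rjust(64, '0')  # address key without the 0x prefix, left-padded to 32 bytes
slot = "00".rjust(64, '0')  # the mapping is declared at slot 0
location = keccak256(bytes.fromhex(key + slot))  # keccak256 is assumed to come from a hashing library
Even though the data is public, you must know the key to compute the location and retrieve a meaningful value, which offers only pseudo-secrecy through obscurity.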
3. Dynamic Arrays and Strings
Dynamic arrays store their length in the assigned slot:
uint[] arr; // SLOAD(1) returns arr.length
Element at index i is located at: keccak256(slot) + i
Special cases: string and bytes
- If length ≤ 31 bytes: stored in the same slot. The last (lowest-order) byte encodes (length * 2) | flag:
  - Flag = 1 → data stored off-slot
  - Flag = 0 → data inlined
Example:
string s = "hello";
// Stored as hex: '68656c6c6f' + padding + '0a' (5*2)
If longer than 31 bytes:
- Slot holds (length * 2) | 1
- Data stored starting at keccak256(slot)
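The two encodings can be told apart from the lowest bit of the slot alone. The following sketch decodes a raw 32-byte slot value for a string or bytes variable; `decode_string_slot` and its return format are assumptions made for this example.

```python
def decode_string_slot(slot_hex):
    """Interpret the main slot of a `string` or `bytes` state variable."""
    raw = bytes.fromhex(slot_hex)
    if len(raw) != 32:
        raise ValueError("expected exactly 32 bytes")
    word = int.from_bytes(raw, "big")
    if word & 1:
        # Flag set: the slot holds length * 2 + 1 and the data lives elsewhere.
        return {"inline": False, "length": (word - 1) // 2, "data": None}
    # Flag clear: the last byte holds length * 2, data is left-aligned in the slot.
    length = raw[31] // 2
    return {"inline": True, "length": length, "data": raw[:length]}


short = "68656c6c6f" + "00" * 26 + "0a"   # "hello", as in the example above
print(decode_string_slot(short))
```

For the off-slot case the function only reports the length; fetching the data would require reading consecutive slots starting at keccak256(slot).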
4. Structs
Structs follow sequential layout rules similar to fixed-length variables:
struct User {
uint id;
address addr;
}
User user;
Maps to two consecutive slots: id at slot X, addr at X+1.
Function Dispatch Mechanism
EVM determines which function to execute based on the first four bytes of transaction calldata—the function selector.
How Function Selectors Work
The selector is derived from:
first_4_bytes(keccak256("functionName(type1,type2,...)"))
Examples:
- test1() → 0x6b59084d
- test2(uint256) → 0xcaf44683
You can reverse-engineer unknown functions using known hash databases like those maintained by Trail of Bits.
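When no database has a match, candidate signatures can be hashed locally and compared against the selector found in the bytecode. Python's hashlib does not provide the original Keccak-256 (its sha3_256 uses different padding), so the sketch below carries a compact pure-Python Keccak; in practice a vetted library would be the safer choice. All function names here are assumptions for this example.

```python
MASK64 = (1 << 64) - 1


def _rol64(value, shift):
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    """Keccak-f[1600] permutation on 25 64-bit lanes (index = x + 5 * y)."""
    lfsr = 1
    for _ in range(24):
        # theta
        c = [lanes[x] ^ lanes[x + 5] ^ lanes[x + 10] ^ lanes[x + 15] ^ lanes[x + 20]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [lanes[i] ^ d[i % 5] for i in range(25)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x + 5 * y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            idx = x + 5 * y
            current, lanes[idx] = lanes[idx], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = lanes[5 * y:5 * y + 5]
            for x in range(5):
                lanes[5 * y + x] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by an LFSR)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data):
    """Original Keccak-256 (padding byte 0x01), as used by the EVM."""
    rate = 136
    padded = bytearray(data)
    padded.append(0x01)
    while len(padded) % rate:
        padded.append(0x00)
    padded[-1] |= 0x80
    lanes = [0] * 25
    for start in range(0, len(padded), rate):
        block = padded[start:start + rate]
        for i in range(rate // 8):
            lanes[i] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lane.to_bytes(8, "little") for lane in lanes[:4])


def selector(signature):
    """First four bytes of the hash of a canonical signature string."""
    return "0x" + keccak256(signature.encode("ascii"))[:4].hex()


print(selector("transfer(address,uint256)"))
```

The signature must be in canonical form: no spaces, no parameter names, and full type names such as uint256 rather than uint.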
Calls vs Transactions
There are two ways to invoke a contract function from outside the chain:
| Method | Behavior |
|---|---|
| eth.call() | Simulates execution locally; returns output without state changes |
| Transaction | Changes state (storage), consumes gas, recorded on-chain |
Only payable functions accept Ether. Non-payable functions include a value check:
CALLVALUE
DUP1
ISZERO
PUSH2 jump_to_next
JUMPI
PUSH1 0x00
DUP1
REVERT
Public Variables as Functions
Public variables generate getter functions automatically:
address public owner;
This behaves as if the contract also declared:
function owner() public view returns (address) {
return owner;
}
Thus, every public variable appears as a callable function in the dispatch table.
Private variables lack such getters but are still readable via direct storage access.
Decoding Function Parameters
Function parameters are extracted from calldata based on type:
Fixed-Length Parameters
Passed inline in calldata:
data = selector (4 bytes) ++ arg1 (32 bytes) ++ arg2 (32 bytes) ...
Accessed via CALLDATALOAD(offset).
Variable-Length Parameters (e.g., string, bytes[])
Passed using offset-based addressing:
selector ++ offset_a ++ offset_b ++ length_a ++ data_a ++ length_b ++ data_b
Each dynamic argument has:
- Offset pointing to its position in calldata
- Length field immediately followed by raw data
Use CALLDATACOPY(destMem, offset, length) to load them into memory.
Even without knowing the function name, you can infer parameter count and types by analyzing how calldata is accessed.
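The layout above can be walked by hand on raw calldata. This sketch splits off the selector, reads 32-byte head words, and follows an offset to a dynamic value; offsets are counted from the start of the argument area (just after the selector). The helper names and the `0xdeadbeef` selector are made up for illustration.

```python
def split_calldata(calldata_hex):
    """Return (selector, argument_bytes)."""
    if calldata_hex.startswith("0x"):
        calldata_hex = calldata_hex[2:]
    data = bytes.fromhex(calldata_hex)
    return data[:4], data[4:]


def read_word(args, index):
    """Read the index-th 32-byte head word as an unsigned integer."""
    return int.from_bytes(args[32 * index:32 * index + 32], "big")


def read_dynamic(args, head_index):
    """Follow the offset stored in a head word to a length-prefixed value."""
    offset = read_word(args, head_index)
    length = int.from_bytes(args[offset:offset + 32], "big")
    return args[offset + 32:offset + 32 + length]


# Hypothetical call f(uint256 7, string "hello") with a placeholder selector.
example = (
    "0xdeadbeef"
    + format(7, "064x")                 # arg 0: static uint256
    + format(0x40, "064x")              # arg 1: offset to the string
    + format(5, "064x")                 # string length
    + b"hello".hex().ljust(64, "0")     # string data, right-padded to 32 bytes
)
sel, args = split_calldata(example)
print(sel.hex(), read_word(args, 0), read_dynamic(args, 1))
```

A head word whose value is a small multiple of 32 and points inside the calldata is a strong hint that the parameter is dynamic rather than a plain integer.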
Contract Deployment Process
Contracts are deployed via transactions with an empty to address and initialization bytecode in the input field.
How Contract Addresses Are Determined
The address is computed from:
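- Deployer address (_origin)
- Nonce of the deploying account
Formula:
address(keccak256(0xd6, 0x94, _origin, nonce))
The bytes 0xd6 and 0x94 are the RLP length prefixes for the list and for the 20-byte address, and the address is the last 20 bytes of the hash. Larger nonces need more bytes, which changes the list prefix (e.g., 0xd7, 0xd8).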
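This makes the addresses of contracts created with CREATE deterministic. Since the Constantinople fork, CREATE2 (EIP-1014) additionally supports salted deployments, whose address depends on the deployer, a salt, and the hash of the initialization code rather than on the nonce.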
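The prefix bytes in the formula come from RLP-encoding the pair (deployer, nonce). The sketch below builds that hash input for short values so the prefixes can be inspected; it stops before the keccak256 step, and the function names are assumptions for this example.

```python
def rlp_encode_bytes(value):
    """RLP-encode a byte string shorter than 56 bytes."""
    if len(value) == 1 and value[0] < 0x80:
        return value                      # a single small byte encodes as itself
    if len(value) < 56:
        return bytes([0x80 + len(value)]) + value
    raise ValueError("long strings are not needed for this sketch")


def rlp_sender_nonce(sender_hex, nonce):
    """RLP-encode [sender, nonce], the preimage of a CREATE address."""
    if sender_hex.startswith("0x"):
        sender_hex = sender_hex[2:]
    sender = bytes.fromhex(sender_hex)
    # Integers are encoded big-endian without leading zeros; 0 is the empty string.
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")
    payload = rlp_encode_bytes(sender) + rlp_encode_bytes(nonce_bytes)
    return bytes([0xC0 + len(payload)]) + payload


deployer = "0xd25ed029c093e56bc8911a07c46545000cbf37c6"
for n in (0, 1, 128, 256):
    print(n, rlp_sender_nonce(deployer, n).hex())
```

The contract address is then the last 20 bytes of keccak256 over this encoding. Note the edge case that a nonce of 0 is encoded as the byte 0x80, not 0x00.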
Initialization Code vs Runtime Code
Deployment bytecode includes:
- Constructor logic (executed once)
- Copying of runtime code into memory via CODECOPY
- Returning runtime bytecode with RETURN
After deployment, only the returned runtime code persists on-chain.
Practical Applications and Reverse Engineering Tips
Understanding OPCODE structure enables:
- Auditing obfuscated or malicious contracts
- Recovering logic without source code
- Building custom analysis tools (disassemblers, decompilers)
Key indicators for identifying variable types:
- Use of AND 0xffffffff... (a 20-byte mask) → likely an address
- Truncation with & 0xff → small integer (uint8, etc.)
- Presence of SHA3 feeding the key of an SLOAD or SSTORE → mapping or dynamic array
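The mask-based indicators can be turned into a simple lookup. This is only a heuristic sketch: a 160-bit mask is reported as an address although uint160 produces the same mask, and `guess_type_from_mask` is a name invented for this example.

```python
ADDRESS_MASK = (1 << 160) - 1


def guess_type_from_mask(mask):
    """Guess a Solidity type from the constant used in an AND instruction."""
    if mask == ADDRESS_MASK:
        return "address"          # could also be uint160
    bits = mask.bit_length()
    # A run of ones covering a whole number of bytes suggests uintN.
    if 0 < bits <= 256 and bits % 8 == 0 and mask == (1 << bits) - 1:
        return f"uint{bits}"
    return "unknown"


for m in (0xFF, 0xFFFFFFFF, ADDRESS_MASK, 0xF0):
    print(hex(m), guess_type_from_mask(m))
```

Masks with ones in the high bytes instead (for example a bytes4 mask) are not covered here and would fall through to "unknown".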
Use existing tools like IDA Pro with EVM plugins or open-source disassemblers to speed up analysis.
Frequently Asked Questions (FAQ)
Q: Can private variables in Solidity be truly hidden?
No. All storage is public on Ethereum. Marking a variable private only prevents external function access—it does not encrypt or hide the value. Anyone can read it via eth.getStorageAt.
Q: How do I find out what a function does without source code?
Analyze the OPCODE flow after matching the function selector. Look for patterns like storage writes (SSTORE), external calls (CALL), or arithmetic operations. Parameter handling reveals input types.
Q: Is it possible to decompile EVM bytecode into Solidity?
Partial decompilation is feasible. Tools can reconstruct high-level structures like loops and conditionals, but exact variable names and comments are lost. Accuracy depends on optimization levels and obfuscation.
Q: Why do some functions have no selectors?
Fallback and receive functions don’t have selectors. They execute when no matching function is found or when Ether is sent without calldata.
Q: What tools help with OPCODE analysis?
Popular options include:
- IDA-EVM – Disassembler plugin for IDA Pro
- EthDO – Command-line disassembler
- Remix Debugger – For contracts with source code
Q: How can I detect if a contract uses dynamic arrays?
Look for usage of keccak256 on a storage slot value—this typically indicates element location calculation for dynamic arrays or mappings.
Conclusion
Mastering Ethereum OPCODEs unlocks deeper visibility into smart contract behavior, especially when source code is missing. By understanding storage layouts, function dispatch logic, parameter encoding, and deployment mechanics, you gain powerful capabilities in security auditing and reverse engineering.
As blockchain applications grow more complex, so too must our analytical tools—and foundational knowledge remains the strongest asset.
Stay tuned for the next part: Building an EVM Debugger for OPCODE Analysis.