Introduction

Revm is an Ethereum Virtual Machine (EVM) written in Rust that is focused on speed and simplicity. This documentation is very much a work in progress and a community effort. If you would like to contribute and improve these docs, please make a PR to the GitHub repo. Most importantly, revm is just the execution environment for Ethereum; there is no networking or consensus related work in this repository.

Crates

The project has four main crates that are used to build revm. These are:

revm: The main EVM library.
interpreter: Execution loop with instructions.
primitives: Primitive data types.
precompile: EVM precompiles.

Testing with the binaries

There are two binaries, both of which are used for testing. To install them, run cargo install --path bins/<binary-name>. The binaries are:

revme: A CLI binary used for running state test JSON files. Currently it is used to run the Ethereum tests to check whether revm is compliant. For example, if you have the Ethereum tests cloned into a directory called ethtests and the EIP tests in the following directories, you can run:

cargo run --profile ethtests -p revme -- \
    statetest \
    ../ethtests/GeneralStateTests/ \
    ../ethtests/LegacyTests/Constantinople/GeneralStateTests/ \
    bins/revme/tests/EIPTests/StateTests/stEIP5656-MCOPY/ \
    bins/revme/tests/EIPTests/StateTests/stEIP1153-transientStorage/

revm-test: Test binaries with contracts, used mostly to check performance.

If you are interested in contributing, be sure to run the statetests. It is recommended to read about the ethereum tests.

Rust Ethereum Virtual Machine (revm)

The revm crate implements the Ethereum Virtual Machine (EVM), including the call loop and host implementation, database handling, state journaling, and powerful logic handlers that can be overwritten. This crate pulls Primitives, Interpreter and Precompiles together to deliver the Rust EVM.

The starting point for reading the documentation is Evm, the main structure of the EVM. Then you can read about the EvmBuilder, which is used to create and modify the Evm. After that, you can read about the Handler, which is used to modify the logic of the Evm and ties into how Evm introspection can be done. Finally, you can read about the Inspector, a legacy interface for inspecting execution that is now repurposed as a handler register example.

Modules:

evm: This is the main module that executes EVM calls.
builder: This module builds the Evm and sets the database, handlers and other parameters. This is where we set handlers for a specific fork or external state for inspection.
db: This module includes structures and functions for database interaction. It is the glue between the EVM and the database. It transforms or aggregates the EVM changes.
inspector: This module introduces the Inspector trait and its implementations for observing the EVM execution. This was the main way to inspect EVM execution before the Builder and Handlers were introduced. It is still enabled through the Builder.
journaled_state: This module manages the state of the EVM and implements a journaling system to handle changes and reverts.

Re-exported Modules:

revm_precompile: Crate that provides precompiled contracts used in the EVM implementation.
revm_interpreter: Crate that provides execution engine for EVM opcodes.
revm_interpreter::primitives: This module from the revm_interpreter crate provides primitive types and other functionality used in the EVM implementation.

Re-exported Types:

Database, DatabaseCommit, InMemoryDB: These types from the db module are re-exported for handling the database operations.
Evm: The Evm struct from the evm module is re-exported, serving as the main interface to the EVM implementation.
EvmContext: The EvmContext struct from the context module is re-exported, providing data structures to encapsulate EVM execution data.
JournalEntry, JournaledState: These types from the journaled_state module are re-exported, providing the journaling system for the EVM state.
inspectors, Inspector: The Inspector trait and its implementations from the inspector module are re-exported for observing the EVM execution.

EVM

Evm is the primary structure that implements the Ethereum Virtual Machine (EVM), a stack-based virtual machine that executes Ethereum smart contracts.

What is inside

It consists of two main parts: the Context and the Handler. The Context represents the state that is needed for execution, and the Handler contains the list of functions that act as the logic.

The Context is additionally split between the EvmContext and the External context. EvmContext is internal and contains the Database, Environment, JournaledState and Precompiles. The External context is fully generic, without any trait constraints. Its purpose is to allow custom handlers to save state at runtime or to allow hooks to be added (for example, an external context can be an Inspector). More on its usage can be seen in EvmBuilder.

Evm implements the Host trait, which defines an interface for the interaction of the EVM Interpreter with its environment (or "host"), encompassing essential operations such as account and storage access, creating logs, and invoking sub calls and selfdestruct.

Data structures for the block and transaction can be found inside Environment. More information on journaled state can be found in the JournaledState documentation.

Runtime

Runtime consists of a list of functions from the Handler that are called in a predefined order. They are grouped by functionality into Verification, PreExecution, Execution, PostExecution and Instruction functions. Verification functions pre-verify the Environment data that has been set. Pre-execution and post-execution functions deduct gas costs from the caller, reimburse unspent gas, and reward the beneficiary. Execution functions handle the initial call and create, as well as sub calls. Instruction functions are part of the instruction table that is used inside the Interpreter to execute opcodes.

The Evm execution runs two loops:

Call loop

The first loop is the call loop, where everything starts. It creates call frames, handles sub calls, returns outputs, and calls the Interpreter loop to execute bytecode instructions. It is handled by ExecutionHandler.

The call loop implements a stack of Frames. It is responsible for handling sub calls and their return outputs. At the start, Evm creates a Frame containing an Interpreter and starts the loop.

The Interpreter returns the InterpreterAction which can be:

Return: This interpreter finished its run. Frame is popped from the stack and its return value is pushed to the parent Frame stack.
SubCall/SubCreate: A new Frame is created and pushed to the stack, and the loop continues.

When the stack is empty, the loop finishes.
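
The frame stack described above can be sketched in a few lines. This is a minimal model, not revm's implementation: the Action, Op and Frame types and the call_loop function are hypothetical names invented for illustration, and Frame::run stands in for the Interpreter loop.

```rust
// Hypothetical, simplified model of the call loop; not revm's real types.

/// A toy "bytecode": each op either spawns a child frame or ends the frame.
#[derive(Clone)]
enum Op {
    /// Ask the call loop to run a child frame with the given ops.
    Call(Vec<Op>),
    /// Finish, returning this value plus everything received from children.
    Ret(u64),
}

/// What a frame asks the call loop to do next.
enum Action {
    Return(u64),
    SubCall(Vec<Op>),
}

struct Frame {
    ops: Vec<Op>,
    pc: usize,
    /// Values returned by child frames (stands in for the parent's stack).
    received: Vec<u64>,
}

impl Frame {
    fn new(ops: Vec<Op>) -> Self {
        Frame { ops, pc: 0, received: Vec::new() }
    }

    /// Stand-in for the interpreter loop: run until an action is needed.
    fn run(&mut self) -> Action {
        match self.ops.get(self.pc).cloned() {
            Some(Op::Call(child)) => {
                self.pc += 1;
                Action::SubCall(child)
            }
            Some(Op::Ret(v)) => Action::Return(v + self.received.iter().sum::<u64>()),
            // Running off the end of the ops returns zero.
            None => Action::Return(0),
        }
    }
}

/// The call loop: drive a stack of frames until it is empty.
fn call_loop(root: Vec<Op>) -> u64 {
    let mut stack = vec![Frame::new(root)];
    loop {
        let action = stack.last_mut().expect("stack is non-empty").run();
        match action {
            Action::SubCall(ops) => stack.push(Frame::new(ops)),
            Action::Return(value) => {
                stack.pop();
                match stack.last_mut() {
                    // Hand the result to the parent frame and keep looping.
                    Some(parent) => parent.received.push(value),
                    // The first frame returned, so the stack is empty.
                    None => return value,
                }
            }
        }
    }
}
```

Keeping the frames on an explicit stack, rather than recursing, means nested sub calls do not grow the native call stack.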

Interpreter loop

The second loop is the Interpreter loop which is called by the call loop and loops over bytecode opcodes and executes instructions based on the InstructionTable. It is implemented in the Interpreter crate.
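
As a rough illustration of an opcode loop driven by a table, here is a minimal sketch. The opcode numbers for STOP (0x00), ADD (0x01) and PUSH1 (0x60) match the EVM, but the Interp struct, the u64 stack and the function names are assumptions made for this example and do not mirror the Interpreter crate.

```rust
// Hypothetical, simplified interpreter; only three opcodes are modeled.
struct Interp {
    code: Vec<u8>,
    pc: usize,
    stack: Vec<u64>,
    running: bool,
}

/// One entry of the instruction table.
type Instruction = fn(&mut Interp);

fn stop(i: &mut Interp) {
    i.running = false;
}

fn add(i: &mut Interp) {
    match (i.stack.pop(), i.stack.pop()) {
        (Some(a), Some(b)) => i.stack.push(a.wrapping_add(b)),
        // Stack underflow halts execution in this sketch.
        _ => i.running = false,
    }
}

fn push1(i: &mut Interp) {
    // The byte after the opcode is the immediate value to push.
    match i.code.get(i.pc) {
        Some(&byte) => {
            i.stack.push(byte as u64);
            i.pc += 1;
        }
        None => i.running = false,
    }
}

fn unknown(i: &mut Interp) {
    i.running = false;
}

/// Build a 256-entry table indexed by opcode.
fn make_table() -> [Instruction; 256] {
    let mut table = [unknown as Instruction; 256];
    table[0x00] = stop;
    table[0x01] = add;
    table[0x60] = push1;
    table
}

/// Loop over the bytecode and dispatch each opcode through the table.
fn run(code: &[u8]) -> Vec<u64> {
    let table = make_table();
    let mut interp = Interp { code: code.to_vec(), pc: 0, stack: Vec::new(), running: true };
    while interp.running {
        let opcode = match interp.code.get(interp.pc) {
            Some(&opcode) => opcode,
            None => break,
        };
        interp.pc += 1;
        table[opcode as usize](&mut interp);
    }
    interp.stack
}
```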

To dive deeper into the Evm logic check Handler documentation.

Functionalities

The function of Evm is to start execution; setting up what it is going to execute is done by EvmBuilder. The main functions of Evm are:

preverify: Only pre-verifies the transaction information.
transact_preverified: The next step after pre-verification; it executes the transaction.
transact: Pre-verifies and then executes the transaction.
builder and modify: Allow building or modifying the Evm; more on this can be found in the EvmBuilder documentation. builder is the main way of creating an Evm, and modify allows you to modify parts of it without dissolving the Evm.
into_context: Used when we want to get the Context from the Evm.

Evm Builder

The builder creates or modifies the EVM and applies different handlers. It allows setting the external context and registering custom handler logic.

The revm Evm consists of the Context and the Handler. The Context is additionally split between the EvmContext (which contains the generic Database) and the External context (generic, without trait constraints). Read evm for more information on the internals.

The Builder ties together the dependencies between the generic Database, the External context and the Spec. It allows handle registers to be added that implement logic on those generics. As they are interconnected, setting the Database or ExternalContext resets the handle registers, so builder stages are introduced to prevent this kind of misuse.

Simple example of using EvmBuilder:

  use crate::evm::Evm;

  // build Evm with default values.
  let mut evm = Evm::builder().build();
  let output = evm.transact();

Builder Stages

There are two builder stages that are used to mitigate potential misuse of the builder:

SetGenericStage: Initial stage that allows setting the database and external context.
HandlerStage: Allows setting the handler registers, but is explicit about setting a new generic type because doing so voids the handler registers.

Functions in one stage are just renamed functions from the other stage; the renaming makes the user more aware of what the underlying function does. For example, in SetGenericStage we have the with_db function while in HandlerStage we have reset_handler_with_db. Both of them set the database, but the latter also resets the handler. There are multiple functions that are common to both stages, such as build.

Builder naming conventions

In both stages we have:

build: Creates the Evm.
with_spec_id: Creates a new mainnet handler and reapplies all the handler registers.
modify_*: Functions used to modify the database, external context or Env.
clear_*: Functions that reset parts of the Environment to their default values.
append_handler_register_*: Functions used to push handler registers. This will transition the builder to the HandlerStage.

In SetGenericStage we have:

with_*: Functions used to set the generics.

In HandlerStage we have:

reset_handler_with_*: Functions used to change one of the generic types. This resets the handler registers and transitions the builder back to the SetGenericStage.
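
The two stages can be modeled with a type parameter on the builder, so that calling a method in the wrong stage fails to compile. The following is a minimal sketch of that idea: the stage and method names follow the text, while the Builder and Evm fields (a database name and a list of register names) are placeholders invented for illustration.

```rust
use std::marker::PhantomData;

// Hypothetical typestate builder; the fields are invented for illustration.
struct SetGenericStage;
struct HandlerStage;

struct Builder<Stage> {
    db: String,
    registers: Vec<&'static str>,
    _stage: PhantomData<Stage>,
}

struct Evm {
    db: String,
    registers: Vec<&'static str>,
}

impl Builder<SetGenericStage> {
    fn new() -> Self {
        Builder { db: "empty".to_string(), registers: Vec::new(), _stage: PhantomData }
    }

    /// Only available before any handler register was appended.
    fn with_db(self, db: &str) -> Builder<SetGenericStage> {
        Builder { db: db.to_string(), registers: self.registers, _stage: PhantomData }
    }
}

impl Builder<HandlerStage> {
    /// Changing the database voids the registers and returns to the first stage.
    fn reset_handler_with_db(self, db: &str) -> Builder<SetGenericStage> {
        Builder { db: db.to_string(), registers: Vec::new(), _stage: PhantomData }
    }
}

impl<Stage> Builder<Stage> {
    /// Available in both stages; always ends up in HandlerStage.
    fn append_handler_register(mut self, name: &'static str) -> Builder<HandlerStage> {
        self.registers.push(name);
        Builder { db: self.db, registers: self.registers, _stage: PhantomData }
    }

    /// Available in both stages.
    fn build(self) -> Evm {
        Evm { db: self.db, registers: self.registers }
    }
}
```

In this sketch, calling with_db after append_handler_register is a compile error because with_db only exists on Builder<SetGenericStage>; the explicit reset_handler_with_db has to be used instead.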

Creating and modification of Evm

Evm implements functions that allow using the EvmBuilder without even knowing that it exists. The most obvious one is Evm::builder() that creates a new builder with default values.

Additionally, a function that is very important is evm.modify() that allows modifying the Evm. It returns a builder, allowing users to modify the Evm.

Examples

The following example uses the builder to create an Evm with inspector:

  use crate::{
      db::EmptyDB, Context, EvmContext, inspector::inspector_handle_register, inspectors::NoOpInspector, Evm,
  };

  // Create the evm.
  let mut evm = Evm::builder()
      .with_db(EmptyDB::default())
      .with_external_context(NoOpInspector)
      // Register will modify Handler and call NoOpInspector.
      .append_handler_register(inspector_handle_register)
      // .with_db(..) does not compile as we already locked the builder generics,
      // alternative fn is reset_handler_with_db(..)
      .build();
  
  // Execute the evm.
  let output = evm.transact();
  
  // Extract evm context.
  let Context {
      external,
      evm: EvmContext { db, .. },
  } = evm.into_context();

The next example changes the spec id and environment of an already built evm.

  use crate::{Evm,SpecId::BERLIN};

  // Create default evm.
  let evm = Evm::builder().build();

  // Modify evm spec.
  let evm = evm.modify().with_spec_id(BERLIN).build();

  // Shortcut for above.
  let mut evm = evm.modify_spec_id(BERLIN);

  // Execute the evm.
  let output1 = evm.transact();

  // Example of modifying the tx env.
  let mut evm = evm.modify().modify_tx_env(|env| env.gas_price = 0.into()).build();

  // Execute the evm with modified tx env.
  let output2 = evm.transact();

Example of adding custom precompiles to Evm.

use super::SpecId;
use crate::{
    db::EmptyDB,
    inspector::inspector_handle_register,
    inspectors::NoOpInspector,
    primitives::{Address, Bytes, ContextStatefulPrecompile, ContextPrecompile, PrecompileResult},
    Context, Evm, EvmContext,
};
use std::sync::Arc;

struct CustomPrecompile;

impl ContextStatefulPrecompile<EvmContext<EmptyDB>, ()> for CustomPrecompile {
    fn call(
        &self,
        _input: &Bytes,
        _gas_price: u64,
        _context: &mut EvmContext<EmptyDB>,
        _extctx: &mut (),
    ) -> PrecompileResult {
        Ok((10, Bytes::new()))
    }
}
fn main() {
    let mut evm = Evm::builder()
        .with_empty_db()
        .with_spec_id(SpecId::HOMESTEAD)
        .append_handler_register(|handler| {
            let precompiles = handler.pre_execution.load_precompiles();
            handler.pre_execution.load_precompiles = Arc::new(move || {
                let mut precompiles = precompiles.clone();
                precompiles.extend([(
                    Address::ZERO,
                    ContextPrecompile::ContextStateful(Arc::new(CustomPrecompile)),
                )]);
                precompiles
            });
        })
        .build();

    evm.transact().unwrap();
}

Appending handler registers

Handler registers are simple functions that allow modifying the Handler logic by replacing the handler functions. They are used to add custom logic to the evm execution. Because they are free to modify the Handler in any way they want, there may be conflicts if several registers that override the same function are added.

The most common use case for adding new logic to the Handler is the Inspector, which is used to inspect the execution of the evm. An example of this can be found in the Inspector documentation.

Handler

This is the logic part of the Evm. It contains the Specification ID, the list of functions that implement the logic, and the list of registers that can change the behavior of the Handler when it is built.

Functions can be grouped in five categories and are marked in that way in the code:

Validation functions: ValidationHandler
Pre-execution functions: PreExecutionHandler
Execution functions: ExecutionHandler
Post-execution functions: PostExecutionHandler
Instruction table: InstructionTable

Handle Registers

A handle register is a simple function that is used to modify handler functions. A useful property is that a register can be written over a generic external type. For example, a register can be written over a trait, which allows hooks to be added for any type that implements that trait. That trait can be the GetInspector trait, so any implementation is able to register inspector-related functions. GetInspector is implemented on every Inspector, and it is used inside the EvmBuilder to change the behavior of the default mainnet Handler.

Handle registers are set in EvmBuilder. The order of the registers is important, as they are called in the order they are registered. It matters whether a register overrides the previous handle or just wraps it: overriding a handle can disrupt the logic of previously registered handles.

Registers are very powerful as they allow modification of any part of the Evm, and combined with the External context they become even more flexible. A simple example is registering new precompiles for the Evm.
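
The difference between wrapping and overriding can be shown with a toy handler. This is a sketch under stated assumptions: the Handler here has a single hypothetical reward function, and add_bonus, zero_reward and build are invented names, not part of revm.

```rust
use std::sync::Arc;

// Hypothetical, cut-down Handler with one replaceable function.
type RewardFn = Arc<dyn Fn(u64) -> u64>;

struct Handler {
    /// Computes a reward from the gas used.
    reward: RewardFn,
}

impl Handler {
    fn mainnet() -> Self {
        Handler { reward: Arc::new(|gas_used: u64| gas_used * 2) }
    }
}

/// A handle register is just a function that edits the handler.
type HandleRegister = fn(&mut Handler);

/// Wraps the previous function: the earlier logic still runs.
fn add_bonus(handler: &mut Handler) {
    let previous = handler.reward.clone();
    handler.reward = Arc::new(move |gas_used: u64| (*previous)(gas_used) + 1);
}

/// Overrides the function: whatever was registered before is dropped.
fn zero_reward(handler: &mut Handler) {
    handler.reward = Arc::new(|_gas_used: u64| 0);
}

/// Apply the registers in the order they were registered.
fn build(registers: &[HandleRegister]) -> Handler {
    let mut handler = Handler::mainnet();
    for register in registers {
        register(&mut handler);
    }
    handler
}
```

With these definitions, registering zero_reward after add_bonus discards the bonus, while registering add_bonus after zero_reward keeps both effects.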

ValidationHandler

Consists of functions that are used to validate transaction and block data. They are called before the execution of the transaction, to check whether the (Environment) data is valid. They are called in the following order:

validate_env: Verifies that all data is set in the Environment and that it is valid, for example that the transaction gas_limit is not greater than the block gas_limit.
validate_initial_tx_gas: Calculates the initial gas needed for the transaction to be executed and checks that it is not greater than the transaction gas_limit. Note that this does not touch the Database or state.
validate_tx_against_state: Loads the caller account and checks its information, including the nonce and whether there is enough balance to pay for the maximum gas spent plus the balance transferred.
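
The three checks can be sketched as plain functions. The types and error names below are hypothetical and much smaller than revm's. The initial gas formula assumes a plain call without an access list under Istanbul or later rules (21,000 base gas, 16 gas per non-zero calldata byte, 4 gas per zero byte).

```rust
// Hypothetical, simplified types; not revm's real Env or error enums.
#[derive(Debug, PartialEq)]
enum InvalidTx {
    GasLimitAboveBlock,
    InitialGasAboveLimit,
    NonceMismatch,
    InsufficientFunds,
}

struct Tx {
    gas_limit: u64,
    gas_price: u128,
    value: u128,
    nonce: u64,
    data: Vec<u8>,
}

struct Account {
    nonce: u64,
    balance: u128,
}

/// Step 1: check the environment data on its own.
fn validate_env(tx: &Tx, block_gas_limit: u64) -> Result<(), InvalidTx> {
    if tx.gas_limit > block_gas_limit {
        return Err(InvalidTx::GasLimitAboveBlock);
    }
    Ok(())
}

/// Step 2: gas charged before execution starts. No state access is needed.
fn validate_initial_tx_gas(tx: &Tx) -> Result<u64, InvalidTx> {
    let zero_bytes = tx.data.iter().filter(|b| **b == 0).count() as u64;
    let non_zero_bytes = tx.data.len() as u64 - zero_bytes;
    let initial_gas = 21_000 + 4 * zero_bytes + 16 * non_zero_bytes;
    if initial_gas > tx.gas_limit {
        return Err(InvalidTx::InitialGasAboveLimit);
    }
    Ok(initial_gas)
}

/// Step 3: check the caller account loaded from state.
fn validate_tx_against_state(tx: &Tx, caller: &Account) -> Result<(), InvalidTx> {
    if tx.nonce != caller.nonce {
        return Err(InvalidTx::NonceMismatch);
    }
    // The most the caller may have to pay: all gas at gas_price, plus the value.
    let max_cost = (tx.gas_limit as u128)
        .checked_mul(tx.gas_price)
        .and_then(|gas_cost| gas_cost.checked_add(tx.value))
        .ok_or(InvalidTx::InsufficientFunds)?;
    if max_cost > caller.balance {
        return Err(InvalidTx::InsufficientFunds);
    }
    Ok(())
}

/// Run the three checks in order and return the initial gas.
fn validate(tx: &Tx, block_gas_limit: u64, caller: &Account) -> Result<u64, InvalidTx> {
    validate_env(tx, block_gas_limit)?;
    let initial_gas = validate_initial_tx_gas(tx)?;
    validate_tx_against_state(tx, caller)?;
    Ok(initial_gas)
}
```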

PreExecutionHandler

Consists of functions that are called before execution. They are called in the following order:

load: Loads access list and beneficiary from Database. Cold load is done here.
load_precompiles: Retrieves the precompiles for the given spec ID. More info: precompile.
deduct_caller: Deducts from the caller's balance the maximum amount of gas that can be spent on the transaction. This loads the caller account from the Database.

ExecutionHandler

Consists of functions that handle the execution of the transaction and the stack of the call frames.

call: Called on every frame. It creates a new call frame or returns the frame result (the frame result is only returned when calling precompile). If FrameReturn is returned, then the next function that is called is insert_call_outcome.
call_return: Called after call frame returns from execution. It is used to calculate the gas that is returned from the frame and create the FrameResult that is used to apply the outcome to parent frame in insert_call_outcome.
insert_call_outcome: Inserts the call outcome to the parent frame. It is called on every frame that is created except the first one. For the first frame we use last_frame_return.
create: Creates a new create call frame, creates the new account and executes the bytecode that outputs the code of the new account.
create_return: This handler is called after every frame is executed (except the first). It will calculate the gas that is returned from the frame and apply the output to the parent frame.
insert_create_outcome: Inserts the outcome of a call into the virtual machine's state.
last_frame_return: This handler is called after last frame is returned. It is used to calculate the gas that is returned from the first frame and incorporate transaction gas limit (the first frame has limit gas_limit - initial_gas).

InstructionTable

This is a list of 256 function pointers that are used to execute instructions. There are two types: the first is a plain function, which is faster, and the second is a BoxedInstruction, which has a small performance penalty but can capture data. See the Interpreter documentation for more information.
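
The trade-off between the two kinds of entries can be illustrated as follows. This is only a sketch: the Stack alias, the type aliases and the counted helper are assumptions for this example and do not reproduce the Interpreter crate's definitions.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical signatures; the real instruction types take more arguments.
type Stack = Vec<u64>;

/// Plain function pointer: cheap to call, but it cannot carry any state.
type Instruction = fn(&mut Stack);

/// Boxed closure: one extra indirection, but it can capture data.
type BoxedInstruction = Box<dyn Fn(&mut Stack)>;

fn push_one(stack: &mut Stack) {
    stack.push(1);
}

/// Wrap a plain instruction in a boxed one that also counts invocations.
fn counted(inner: Instruction, counter: Rc<Cell<u64>>) -> BoxedInstruction {
    Box::new(move |stack: &mut Stack| {
        counter.set(counter.get() + 1);
        inner(stack);
    })
}
```

The counter is captured by the closure, which is something a bare fn pointer cannot do.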

PostExecutionHandler

Consists of functions that are called after the execution. They are called in the following order:

reimburse_caller: Reimburses the caller for gas that was not spent during the execution of the transaction, including any gas that needs to be refunded.
reward_beneficiary: Reward the beneficiary with the fee that was paid for the transaction.
output: Returns the state changes and the result of the execution.
end: Always called after transaction. End handler will not be called if validation fails.
clear: Clears journal state and error and it is always called for the cleanup.
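
The balance movements performed by deduct_caller, reimburse_caller and reward_beneficiary can be sketched with simple arithmetic. The Balances struct is hypothetical, and the sketch assumes London-style rules where the beneficiary receives only the part of the gas price above the base fee. It also assumes gas_used is already net of refunds, that gas_used does not exceed gas_limit, and that gas_price is at least base_fee.

```rust
// Hypothetical, simplified accounting; not revm's real handler signatures.
#[derive(Debug, PartialEq)]
struct Balances {
    caller: u128,
    beneficiary: u128,
}

/// Pre-execution: charge the caller for the full gas limit up front.
/// Returns None if the balance is too small (validation should prevent this).
fn deduct_caller(b: &mut Balances, gas_limit: u64, gas_price: u128) -> Option<()> {
    b.caller = b.caller.checked_sub(gas_limit as u128 * gas_price)?;
    Some(())
}

/// Post-execution: give back the gas that was paid for but not used.
fn reimburse_caller(b: &mut Balances, gas_limit: u64, gas_used: u64, gas_price: u128) {
    b.caller += gas_limit.saturating_sub(gas_used) as u128 * gas_price;
}

/// Post-execution: pay the beneficiary the price above the base fee.
fn reward_beneficiary(b: &mut Balances, gas_used: u64, gas_price: u128, base_fee: u128) {
    b.beneficiary += gas_used as u128 * gas_price.saturating_sub(base_fee);
}
```

For a gas limit of 50,000, a gas price of 10, a base fee of 7 and 21,000 gas used, the caller ends up paying 210,000 in total and the beneficiary receives 63,000.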

Inspectors

This module contains various inspectors that can be used to execute and monitor transactions on the Ethereum Virtual Machine (EVM) through the revm library.

Overview

There are several built-in inspectors in this module:

NoOpInspector: A basic inspector that does nothing, which can be used when you don't need to monitor transactions.
GasInspector: Monitors the gas usage of transactions.
CustomPrintTracer: Traces and prints custom messages during EVM execution. Available only when the std feature is enabled.
TracerEip3155: This is an inspector that conforms to the EIP-3155 standard for tracing Ethereum transactions. It's used to generate detailed trace data of transaction execution, which can be useful for debugging, analysis, or for building tools that need to understand the inner workings of Ethereum transactions. This is only available when both std and serde-json features are enabled.

Inspector trait

The Inspector trait defines methods that are called during various stages of EVM execution. You can implement this trait to create your own custom inspectors.

Each of these methods is called at different stages of the execution of a transaction. They can be used to monitor, debug, or modify the execution of the EVM.

For example, the step method is called on each step of the interpreter, and the log method is called when a log is emitted.

You can implement this trait for a custom database type DB that implements the Database trait.

Usage

To use an inspector, you need to implement the Inspector trait. For each method, you can decide what you want to do at each point in the EVM execution. For example, to capture all SELFDESTRUCT operations, implement the selfdestruct method.

All methods in the Inspector trait are optional to implement; if you do not need specific functionality, you can use the provided default implementations.
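
The pattern of optional hooks with default implementations can be sketched as below. The trait here is a reduced, hypothetical stand-in: revm's real Inspector has more methods and different signatures, and the Event enum and replay function exist only to drive the example.

```rust
// Hypothetical, reduced trait; addresses are plain u64 values here.
trait Inspector {
    // Every hook has an empty default body, so implementors override only
    // what they need.
    fn step(&mut self, _pc: usize, _opcode: u8) {}
    fn log(&mut self, _data: &[u8]) {}
    fn selfdestruct(&mut self, _contract: u64, _target: u64, _value: u128) {}
}

/// An inspector that relies on the defaults for everything.
struct NoOp;
impl Inspector for NoOp {}

/// Captures every SELFDESTRUCT and ignores all other events.
#[derive(Default)]
struct SelfdestructCollector {
    seen: Vec<(u64, u64, u128)>,
}

impl Inspector for SelfdestructCollector {
    fn selfdestruct(&mut self, contract: u64, target: u64, value: u128) {
        self.seen.push((contract, target, value));
    }
}

/// Stand-in for events produced while executing a transaction.
enum Event {
    Step { pc: usize, opcode: u8 },
    Log(Vec<u8>),
    Selfdestruct { contract: u64, target: u64, value: u128 },
}

/// Call the matching hook for each event, in order.
fn replay<I: Inspector>(inspector: &mut I, events: &[Event]) {
    for event in events {
        match event {
            Event::Step { pc, opcode } => inspector.step(*pc, *opcode),
            Event::Log(data) => inspector.log(data),
            Event::Selfdestruct { contract, target, value } => {
                inspector.selfdestruct(*contract, *target, *value)
            }
        }
    }
}
```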

State implementations

State inherits the Database trait and implements fetching of external state and storage, and various functionality on the output of the EVM execution. Most notably, it caches changes while executing multiple transactions.

Database Abstractions

You can implement the traits Database, DatabaseRef or Database + DatabaseCommit depending on the desired handling of the struct.

Database: Has mutable self in its functions. It is useful if you want to modify your cache or update some statistics on get calls. This trait enables preverify_transaction, transact_preverified, transact and inspect functions.
DatabaseRef: Takes a reference on the object. It is useful if you only have a reference on the state and don't want to update anything on it. It enables preverify_transaction, transact_preverified_ref, transact_ref and inspect_ref functions.
Database + DatabaseCommit: Allows directly committing changes of a transaction. It enables transact_commit and inspect_commit functions.
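
The difference between the three abstractions comes down to how self is borrowed. The sketch below uses reduced, hypothetical versions of the traits with a single lookup each, u64 addresses and a balance in place of full account information; the real traits have more methods and return results with an error type.

```rust
use std::collections::HashMap;

// Hypothetical, reduced traits for illustration only.
trait Database {
    /// Mutable self: the implementation may update caches or statistics.
    fn basic(&mut self, address: u64) -> Option<u128>;
}

trait DatabaseRef {
    /// Shared reference: the implementation cannot change anything.
    fn basic_ref(&self, address: u64) -> Option<u128>;
}

trait DatabaseCommit {
    /// Apply the changes produced by a transaction.
    fn commit(&mut self, changes: HashMap<u64, u128>);
}

/// A database that counts how often it is read.
#[derive(Default)]
struct CountingDb {
    balances: HashMap<u64, u128>,
    reads: u64,
}

impl Database for CountingDb {
    fn basic(&mut self, address: u64) -> Option<u128> {
        self.reads += 1;
        self.balances.get(&address).copied()
    }
}

impl DatabaseCommit for CountingDb {
    fn commit(&mut self, changes: HashMap<u64, u128>) {
        self.balances.extend(changes);
    }
}

/// A read-only view of some state.
struct Snapshot {
    balances: HashMap<u64, u128>,
}

impl DatabaseRef for Snapshot {
    fn basic_ref(&self, address: u64) -> Option<u128> {
        self.balances.get(&address).copied()
    }
}

/// Read a balance, add to it, and commit the change directly.
fn credit<DB: Database + DatabaseCommit>(db: &mut DB, address: u64, amount: u128) {
    let current = db.basic(address).unwrap_or(0);
    db.commit(HashMap::from([(address, current + amount)]));
}
```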

Journaled State

The journaled_state module of the revm crate provides a state management implementation for Ethereum-style accounts. It includes support for various actions such as self-destruction of accounts, initial account loading, account state modification, and logging. It also contains several important utility functions such as is_precompile.

This module is built around the JournaledState structure, which encapsulates the entire state of the blockchain. JournaledState uses an internal state representation (a HashMap) that tracks all accounts. Each account is represented by the Account structure, which includes fields like balance, nonce, and code hash. For state-changing operations, the module keeps track of all the changes within a "journal" for easy reversion and commitment to the database. This feature is particularly useful for handling reversion of state changes in case of transaction failures or other exceptions. The module interacts with a database through the Database trait, which abstracts the operations for fetching and storing data. This design allows for a pluggable backend where different implementations of the Database trait can be used to persist the state in various ways (for instance, in-memory or disk-based databases).

Data Structures

JournaledState: This structure represents the entire state of the blockchain, including accounts, their associated balances, nonces, and code hashes. It maintains a journal of all state changes that allows for easy reversion and commitment of changes to the database.
Account: This structure represents an individual account on the blockchain. It includes the account's balance, nonce, and code hash. It also includes a flag indicating if the account is self-destructed, and a map representing the account's storage.
JournalEntry: This structure represents an entry in the JournaledState's journal. Each entry describes an operation that changes the state, such as an account loading, an account destruction, or a storage change.
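
The journaling idea can be sketched with a vector of undo records and checkpoints expressed as journal lengths. This is a minimal model under stated assumptions: the Journaled struct, its two entry kinds and the u64 addresses are invented for illustration and are far simpler than JournaledState.

```rust
use std::collections::HashMap;

// Hypothetical, reduced journal: each entry stores the previous value.
enum JournalEntry {
    BalanceChanged { address: u64, previous: u128 },
    StorageChanged { address: u64, key: u64, previous: u64 },
}

#[derive(Default)]
struct Journaled {
    balances: HashMap<u64, u128>,
    storage: HashMap<(u64, u64), u64>,
    journal: Vec<JournalEntry>,
}

impl Journaled {
    /// A checkpoint is just the current length of the journal.
    fn checkpoint(&self) -> usize {
        self.journal.len()
    }

    fn balance(&self, address: u64) -> u128 {
        self.balances.get(&address).copied().unwrap_or(0)
    }

    fn sload(&self, address: u64, key: u64) -> u64 {
        self.storage.get(&(address, key)).copied().unwrap_or(0)
    }

    fn set_balance(&mut self, address: u64, value: u128) {
        // Missing entries are treated as zero in this sketch.
        let previous = self.balances.insert(address, value).unwrap_or(0);
        self.journal.push(JournalEntry::BalanceChanged { address, previous });
    }

    fn sstore(&mut self, address: u64, key: u64, value: u64) {
        let previous = self.storage.insert((address, key), value).unwrap_or(0);
        self.journal.push(JournalEntry::StorageChanged { address, key, previous });
    }

    /// Undo entries in reverse order until the checkpoint is reached.
    fn revert(&mut self, checkpoint: usize) {
        while self.journal.len() > checkpoint {
            match self.journal.pop() {
                Some(JournalEntry::BalanceChanged { address, previous }) => {
                    self.balances.insert(address, previous);
                }
                Some(JournalEntry::StorageChanged { address, key, previous }) => {
                    self.storage.insert((address, key), previous);
                }
                None => break,
            }
        }
    }
}
```

Undoing in reverse order matters when the same slot is written more than once after a checkpoint: the oldest recorded previous value is restored last.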

Methods

selfdestruct: This method marks an account as self-destructed and transfers its balance to a target account. If the target account does not exist, it's created. If the self-destructed account and the target are the same, the balance will be lost.
initial_account_load: This method loads an account's basic information from the database without loading the code. It also loads specified storage slots into memory.
load_account: This method loads an account's information into memory and returns whether the account was cold or warm accessed.
load_account_exist: This method checks whether an account exists or not. It returns whether the account was cold or warm accessed and whether it exists.
load_code: This method loads an account's code into memory from the database.
sload: This method loads a specified storage value of an account. It returns the value and whether the storage was cold loaded.
sstore: This method changes the value of a specified storage slot in an account and returns the original value, the present value, the new value, and whether the storage was cold loaded.
log: This method adds a log entry to the journal.
is_precompile: This method checks whether an address is a precompiled contract or not.

Relevant EIPs

The JournaledState module's operations are primarily designed to comply with the Ethereum standards defined in several Ethereum Improvement Proposals (EIPs). More specifically:

EIP-161: State Trie Clearing

EIP-161 aims to optimize Ethereum's state management by deleting empty accounts. The specification was proposed by Gavin Wood and was activated in the Spurious Dragon hardfork at block number 2,675,000 on the Ethereum mainnet. The EIP focuses on four main changes:

Account Creation: During the creation of an account (whether by transactions or the CREATE operation), the nonce of the new account is incremented by one before the execution of the initialization code. For most networks, the starting value is 1, but this may vary for test networks with non-zero default starting nonces.
CALL and SELFDESTRUCT charges: Prior to EIP-161, a gas charge of 25,000 was levied for CALL and SELFDESTRUCT operations if the destination account did not exist. With EIP-161, this charge is only applied if the operation transfers more than zero value and the destination account is dead (non-existent or empty).
Existence of Empty Accounts: An account cannot change its state from non-existent to existent-but-empty. If an operation resulted in an empty account, the account remains non-existent.
Removal of Empty Accounts: At the end of a transaction, any account that was involved in potentially state-changing operations and is now empty will be deleted.

Definitions:

empty: An account is considered "empty" if it has no code, and its nonce and balance are both zero.
dead: An account is considered "dead" if it is non-existent or empty.
touched: An account is considered "touched" when it is involved in any potentially state-changing operation.
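
The empty and dead definitions, together with the CALL and SELFDESTRUCT charge rule above, translate into small predicates. The AccountInfo struct and function names here are hypothetical; None stands for a non-existent account.

```rust
// Hypothetical, minimal account record for illustration.
#[derive(Clone, Copy, Default)]
struct AccountInfo {
    nonce: u64,
    balance: u128,
    has_code: bool,
}

/// Gas charged when value is sent to a dead destination.
const NEW_ACCOUNT_GAS: u64 = 25_000;

/// Empty: no code, zero nonce, zero balance.
fn is_empty(account: &AccountInfo) -> bool {
    !account.has_code && account.nonce == 0 && account.balance == 0
}

/// Dead: non-existent (None) or empty.
fn is_dead(account: Option<&AccountInfo>) -> bool {
    account.map_or(true, is_empty)
}

/// The charge applies only if value is transferred to a dead destination.
fn new_account_charge(destination: Option<&AccountInfo>, value: u128) -> u64 {
    if value > 0 && is_dead(destination) {
        NEW_ACCOUNT_GAS
    } else {
        0
    }
}
```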

These rules have an impact on how state is managed within the EIP-161 context, and this affects how the JournaledState module functions. For example, operations like initial_account_load and selfdestruct need to take into account whether an account is empty and/or dead.

Rationale

The rationale behind EIP-161 is to optimize the Ethereum state management by getting rid of unnecessary data. Prior to this change, it was possible for the state trie to become bloated with empty accounts. This bloating resulted in increased storage requirements and slower processing times for Ethereum nodes.

By removing these empty accounts, the size of the state trie can be reduced, leading to improvements in the performance of Ethereum nodes. Additionally, the changes regarding the gas costs for CALL and SELFDESTRUCT operations add a new level of nuance to the Ethereum gas model, further optimizing transaction processing.

EIP-161 has a significant impact on the state management of Ethereum, and thus is highly relevant to the JournaledState module of the revm crate. The operations defined in this module, such as loading accounts, self-destructing accounts, and changing storage, must all conform to the rules defined in EIP-161.

EIP-658: Embedding transaction status code in receipts

This EIP is particularly important because it introduced a way to unambiguously determine whether a transaction was successful or not. Before the introduction of EIP-658, it was impossible to determine with certainty if a transaction was successful simply based on its gas consumption. This was because with the introduction of the REVERT opcode in EIP-140, transactions could fail without consuming all gas.

EIP-658 replaced the intermediate state root field in the receipt with a status code that indicates whether the top-level call of the transaction succeeded or failed. The status code is 1 for success and 0 for failure.

This EIP affects the JournaledState module, as the result of executing transactions and their success or failure status directly influences the state of the blockchain. The execution of state-modifying methods like selfdestruct, sstore, and log can result in success or failure, and the status needs to be properly reflected in the transaction receipt.

Rationale

The main motivation behind EIP-658 was to provide an unambiguous way to determine the success or failure of a transaction. Before EIP-658, users had to rely on checking if a transaction had consumed all gas to guess if it had failed. However, this was not reliable because of the introduction of the REVERT opcode in EIP-140.

Moreover, although full nodes can replay transactions to get their return status, fast nodes can only do this for transactions after their pivot point, and light nodes cannot do it at all. This means that without EIP-658, it is impractical for a non-full node to reliably determine the status of a transaction.

EIP-658 addressed this problem by embedding the status code directly in the transaction receipt, making it easily accessible. This change was minimal and non-disruptive, while it significantly improved the clarity and usability of transaction receipts.

EIP-2929: Gas cost increases for state access opcodes

EIP-2929 proposes an increase in the gas costs for several opcodes when they're used for the first time in a transaction. The EIP was created to mitigate potential DDoS (Distributed Denial of Service) attacks by increasing the cost of potential attack vectors, and to make the stateless witness sizes in Ethereum more manageable.

EIP-2929 also introduces two sets, accessed_addresses and accessed_storage_keys, to track the addresses and storage slots that have been accessed within a transaction. This mitigates the additional gas cost for repeated operations on the same address or storage slot within a transaction, as any repeated operation on an already accessed address or storage slot will cost less gas.

In the context of this EIP, "cold" and "warm" (or "hot") refer to whether an address or storage slot has been accessed before during the execution of a transaction. If an address or storage slot is being accessed for the first time in a transaction, it is referred to as a "cold" access. If it has already been accessed within the same transaction, any subsequent access is referred to as "warm" or "hot".

Parameters: The EIP defines new parameters such as COLD_SLOAD_COST (2100 gas) for a "cold" storage read, COLD_ACCOUNT_ACCESS_COST (2600 gas) for a "cold" account access, and WARM_STORAGE_READ_COST (100 gas) for a "warm" storage read.
Storage read changes: For SLOAD operation, if the (address, storage_key) pair is not yet in accessed_storage_keys, COLD_SLOAD_COST gas is charged and the pair is added to accessed_storage_keys. If the pair is already in accessed_storage_keys, WARM_STORAGE_READ_COST gas is charged.
Account access changes: When an address is the target of certain opcodes (EXTCODESIZE, EXTCODECOPY, EXTCODEHASH, BALANCE, CALL, CALLCODE, DELEGATECALL, STATICCALL), if the target is not in accessed_addresses, COLD_ACCOUNT_ACCESS_COST gas is charged, and the address is added to accessed_addresses. Otherwise, WARM_STORAGE_READ_COST gas is charged.
SSTORE changes: For SSTORE operation, if the (address, storage_key) pair is not in accessed_storage_keys, an additional COLD_SLOAD_COST gas is charged, and the pair is added to accessed_storage_keys.
SELFDESTRUCT changes: If the recipient of SELFDESTRUCT is not in accessed_addresses, an additional COLD_ACCOUNT_ACCESS_COST is charged, and the recipient is added to the set.

This methodology allows Ethereum to maintain an internal record of accessed accounts and storage slots within a transaction, making it possible to charge lower gas fees for repeated operations, thereby reducing the cost for such operations.
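
The cold/warm bookkeeping described above can be sketched in a few lines of Rust. This is a minimal illustration and not revm's journaled-state code: the `AccessSets` struct, its method names, and the `Address`/`StorageKey` aliases are hypothetical, and only the three gas constants come from the EIP.

```rust
use std::collections::HashSet;

// Gas parameters as listed in EIP-2929.
const COLD_SLOAD_COST: u64 = 2100;
const COLD_ACCOUNT_ACCESS_COST: u64 = 2600;
const WARM_STORAGE_READ_COST: u64 = 100;

// Hypothetical aliases used only for this sketch.
type Address = [u8; 20];
type StorageKey = [u8; 32];

/// Per-transaction access sets (illustrative, not revm's real types).
#[derive(Default)]
struct AccessSets {
    accessed_addresses: HashSet<Address>,
    accessed_storage_keys: HashSet<(Address, StorageKey)>,
}

impl AccessSets {
    /// Gas for an SLOAD: cold on the first access, warm afterwards.
    fn sload_cost(&mut self, address: Address, key: StorageKey) -> u64 {
        // `insert` returns true when the pair was not yet present (cold access).
        if self.accessed_storage_keys.insert((address, key)) {
            COLD_SLOAD_COST
        } else {
            WARM_STORAGE_READ_COST
        }
    }

    /// Gas for touching an account with BALANCE, EXTCODESIZE, CALL, etc.
    fn account_access_cost(&mut self, address: Address) -> u64 {
        if self.accessed_addresses.insert(address) {
            COLD_ACCOUNT_ACCESS_COST
        } else {
            WARM_STORAGE_READ_COST
        }
    }
}
```

Because a `HashSet` insert reports whether the element was new, one call both records the access and tells the caller which price applies.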

Rationale

Security: Previously, these opcodes were underpriced, making them susceptible to DoS attacks where an attacker sends transactions that access or call a large number of accounts. By increasing the gas costs, the EIP intends to mitigate these potential security risks.
Improving stateless witness sizes: Stateless Ethereum clients don't maintain the complete state of the blockchain, but instead rely on block "witnesses" (a list of all the accounts, storage, and contract code accessed during transaction execution) to validate transactions. This EIP helps in reducing the size of these witnesses, thereby making stateless Ethereum more viable.

Interpreter

The interpreter crate is concerned with the execution of the EVM opcodes and serves as the event loop to step through the opcodes. The interpreter is concerned with attributes like gas, contracts, memory, stack, and returning execution results. It is structured as follows:

Modules:

gas: Handles gas mechanics in the EVM, such as calculating gas costs for operations.
host: Defines the EVM context Host trait.
inner_models: Contains inner data structures used in the EVM implementation.
instruction_result: Defines results of instruction execution.
instructions: Defines the EVM opcodes (i.e. instructions).

External Crates:

alloc: The alloc crate is used to provide the ability to allocate memory on the heap. It's a part of Rust's standard library that can be used in environments without a full host OS.
core: The core crate is the dependency-free foundation of the Rust standard library. It includes fundamental types, macros, and traits.

Re-exports:

Several types and functions are re-exported for easier access by users of this library, such as Gas, Host, InstructionResult, OpCode, Interpreter, Memory, Stack, and others. This allows users to import these items directly from the library root instead of from their individual modules.
revm_primitives: This crate is re-exported, providing primitive types or functionality used in the EVM implementation.

The `gas.rs` Module

The gas.rs module in this Rust EVM implementation manages the concept of "gas" within the Ethereum network. In Ethereum, "gas" signifies the computational effort needed to execute operations, whether a simple transfer of ether or the execution of a smart contract function. Each operation carries a gas cost, and transactions must specify the maximum amount of gas they are willing to consume.

Data Structures

Gas Struct

The Gas struct represents the gas state for a particular operation or transaction. Its fields are listed below.

Fields in Gas Struct
- limit: The maximum amount of gas allowed for the operation or transaction.
- all_used_gas: The total gas used, inclusive of memory expansion costs.
- used: The gas used, excluding memory expansion costs.
- memory: The gas used for memory expansion.
- refunded: The gas refunded. Certain operations in Ethereum allow for gas refunds, capped at half the gas used by a transaction before the London hard fork and at one fifth since EIP-3529.

Methods of the `Gas` Struct

The Gas struct also includes several methods to manage the gas state. Here's a brief summary of their functions:

new: Creates a new Gas instance with a specified gas limit and zero usage and refunds.
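limit, memory, refunded, spend, remaining: These getters return the corresponding field or a value derived from it; spend returns the total gas used and remaining returns the gas still available under the limit.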
erase_cost: Decreases the gas usage by a specified amount.
record_refund: Increases the refunded gas by a specified amount.
record_cost: Increases the used gas by a specified amount. It also checks for gas limit overflow. If the new total used gas would exceed the gas limit, it returns false and doesn't change the state.
record_memory: This method works similarly to record_cost, but specifically for memory expansion gas. It only updates the state if the new memory gas usage is greater than the current usage.
gas_refund: Increases the refunded gas by a specified amount.
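
As a rough sketch of how these fields and methods fit together, the following simplified struct mirrors the descriptions above. The integer types (in particular `i64` for refunds) and the exact method bodies are assumptions made for illustration, not a copy of the crate's source.

```rust
/// Simplified gas tracker using the field names listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Gas {
    limit: u64,
    all_used_gas: u64, // used + memory
    used: u64,
    memory: u64,
    refunded: i64, // assumed signed so refunds can also be taken back
}

impl Gas {
    fn new(limit: u64) -> Self {
        Self { limit, all_used_gas: 0, used: 0, memory: 0, refunded: 0 }
    }

    /// Gas still available under the limit.
    fn remaining(&self) -> u64 {
        self.limit - self.all_used_gas
    }

    /// Records a regular cost; returns false and leaves the state
    /// untouched if the limit would be exceeded.
    fn record_cost(&mut self, cost: u64) -> bool {
        let all_used_gas = self.all_used_gas.saturating_add(cost);
        if self.limit < all_used_gas {
            return false;
        }
        self.used += cost;
        self.all_used_gas = all_used_gas;
        true
    }

    /// Records the total memory expansion gas; only a larger value
    /// than the current one changes the state.
    fn record_memory(&mut self, gas_memory: u64) -> bool {
        if gas_memory > self.memory {
            let all_used_gas = self.used.saturating_add(gas_memory);
            if self.limit < all_used_gas {
                return false;
            }
            self.memory = gas_memory;
            self.all_used_gas = all_used_gas;
        }
        true
    }

    fn record_refund(&mut self, refund: i64) {
        self.refunded += refund;
    }
}
```

Keeping `all_used_gas` as a cached sum means the limit check in both recording methods is a single comparison.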

EVM Memory

EVM memory is local to the current Interpreter context, where a context is a call or create frame. Opcodes use it to store or format data that is more than 32 bytes long, for example to format call input, return output, or log data. Revm shares one memory buffer between all Interpreters, but each Interpreter loop only sees the part allocated to it.

Extending memory is paid for in gas. A memory of N words costs 3 gas per word plus the square of the number of words divided by 512 (3*N + N^2/512), and an expansion is charged the difference between the new and the old total. There is no hard limit on the size of the memory, but it is bounded in practice by the quadratic growth of the gas cost. For 30M gas there is a calculated max memory of 32MB (Blog post by ramco: Upper bound for transaction memory).
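
The formula can be written out directly. The function names below are hypothetical helpers for this sketch; the arithmetic follows the 3*N + N^2/512 rule quoted above, with the expansion charge taken as the difference between the new and the old total cost.

```rust
/// Total gas cost of a memory that is `words` 32-byte words long.
fn memory_gas(words: u64) -> u64 {
    3u64.saturating_mul(words)
        .saturating_add(words.saturating_mul(words) / 512)
}

/// Number of 32-byte words needed to cover `len` bytes, rounding up.
fn num_words(len: u64) -> u64 {
    len.saturating_add(31) / 32
}

/// Gas charged for growing memory from `old_len` to `new_len` bytes.
/// Shrinking never happens in the EVM, so the result saturates at zero.
fn expansion_cost(old_len: u64, new_len: u64) -> u64 {
    memory_gas(num_words(new_len)).saturating_sub(memory_gas(num_words(old_len)))
}
```

For small memories the linear term dominates (32 words cost 98 gas), while the quadratic term takes over as memory grows.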

Opcodes

Here is a list of all opcodes that read from or write to the memory. Any read from memory can still change the memory size by extending it with zeroes. Call opcodes are special: they read their input from memory before the call and also write their output to memory after the call (if the call succeeds and there is output to write).

These opcodes read from the memory:

RETURN
REVERT
LOG
KECCAK256
CREATE
CREATE2
CALL
CALLCODE
DELEGATECALL
STATICCALL

These opcodes change the memory:

EXTCODECOPY
MLOAD
MSTORE
MSTORE8
MCOPY
CODECOPY
CALLDATACOPY
RETURNDATACOPY
CALL
CALLCODE
DELEGATECALL
STATICCALL

The `host.rs` Module

The host.rs module in this Rust EVM implementation defines a crucial trait Host. The Host trait outlines an interface for the interaction of the EVM interpreter with its environment (or "host"), encompassing essential operations such as account and storage access, creating logs, and invoking transactions.

The Evm struct implements this Host trait.

Trait Methods

env: This method provides access to the EVM environment, including information about the current block and transaction.
load_account: Retrieves information about a given Ethereum account.
block_hash: Retrieves the block hash for a given block number.
balance, code, code_hash, sload: These methods retrieve specific information (balance, code, code hash, and specific storage value) for a given Ethereum account.
sstore: This method sets the value of a specific storage slot in a given Ethereum account.
log: Creates a log entry with the specified address, topics, and data. Log entries are used by smart contracts to emit events.
selfdestruct: Marks an Ethereum account to be self-destructed, transferring its funds to a target account.

The Host trait provides a standard interface that any host environment for the EVM must implement. This abstraction allows the EVM code to interact with the state of the Ethereum network in a generic way, thereby enhancing modularity and interoperability. Different implementations of the Host trait can be used to simulate different environments for testing or for connecting to different Ethereum-like networks.
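
To illustrate the idea of swapping host environments, here is a deliberately reduced sketch. `MiniHost` and `InMemoryHost` are hypothetical names, the signatures are simplified (plain integers instead of U256, no error handling), and only a few of the methods listed above are modeled.

```rust
use std::collections::HashMap;

type Address = [u8; 20];

/// A reduced, hypothetical host interface; not the crate's Host trait.
trait MiniHost {
    fn balance(&mut self, address: Address) -> u128;
    fn sload(&mut self, address: Address, index: u64) -> u64;
    fn sstore(&mut self, address: Address, index: u64, value: u64);
    fn log(&mut self, address: Address, topics: Vec<[u8; 32]>, data: Vec<u8>);
}

/// An in-memory host that could back a unit test.
#[derive(Default)]
struct InMemoryHost {
    balances: HashMap<Address, u128>,
    storage: HashMap<(Address, u64), u64>,
    logs: Vec<(Address, Vec<[u8; 32]>, Vec<u8>)>,
}

impl MiniHost for InMemoryHost {
    fn balance(&mut self, address: Address) -> u128 {
        // Unknown accounts have a zero balance.
        self.balances.get(&address).copied().unwrap_or(0)
    }

    fn sload(&mut self, address: Address, index: u64) -> u64 {
        // Unset storage slots read as zero.
        self.storage.get(&(address, index)).copied().unwrap_or(0)
    }

    fn sstore(&mut self, address: Address, index: u64, value: u64) {
        self.storage.insert((address, index), value);
    }

    fn log(&mut self, address: Address, topics: Vec<[u8; 32]>, data: Vec<u8>) {
        self.logs.push((address, topics, data));
    }
}
```

An interpreter written against the trait would not need to know whether it is talking to this test double or to a host backed by real chain state.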

The `inner_models.rs` Module in the Rust Ethereum Virtual Machine (EVM)

The inner_models.rs module within this Rust EVM implementation encompasses a collection of data structures used as internal models within the EVM. These models represent various aspects of EVM operations such as call and create inputs, call context, value transfers, and the result of self-destruction operations.

Data Structures

CallInputs Struct

The CallInputs struct is used to encapsulate the inputs to a smart contract call in the EVM. This struct includes the target contract address, the value to be transferred (if any), the input data, the gas limit for the call, the call context, and a boolean indicating if the call is a static call (a read-only operation).

CreateInputs Struct

The CreateInputs struct encapsulates the inputs for creating a new smart contract. This includes the address of the creator, the creation scheme, the value to be transferred, the initialization code for the new contract, and the gas limit for the creation operation.

CallScheme Enum

The CallScheme enum represents the type of call being made to a smart contract. The different types of calls (CALL, CALLCODE, DELEGATECALL, STATICCALL) represent different modes of interaction with a smart contract, each with its own semantics concerning the treatment of the message sender, value transfer, and the context in which the called code executes.

CallContext Struct

The CallContext struct encapsulates the context of a smart contract call. This includes the executing contract's address, the caller's address, the address from which the contract code was loaded, the apparent value of the call (for DELEGATECALL and CALLCODE), and the call scheme.

Transfer Struct

The Transfer struct represents a value transfer between two accounts.

SelfDestructResult Struct

Finally, the SelfDestructResult struct captures the result of a self-destruction operation on a contract.

In summary, the inner_models.rs module provides several crucial data structures that facilitate the representation and handling of various EVM operations and their associated data within this Rust EVM implementation.
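
The difference between the call schemes is easiest to see in how the context of the callee is derived from the context of the caller. The sketch below follows the standard EVM semantics for the four call opcodes; the field names are modeled on the description of CallContext above, and `child_context` is a hypothetical helper, not a function from the crate.

```rust
type Address = [u8; 20];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CallScheme {
    Call,
    CallCode,
    DelegateCall,
    StaticCall,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CallContext {
    address: Address,      // account whose storage and balance are used
    caller: Address,       // what CALLER returns inside the callee
    code_address: Address, // account the code was loaded from
    apparent_value: u128,  // what CALLVALUE returns inside the callee
    scheme: CallScheme,
}

/// Derives the callee's context from the parent frame (illustrative helper).
fn child_context(
    parent: &CallContext,
    target: Address,
    value: u128,
    scheme: CallScheme,
) -> CallContext {
    let (address, caller, apparent_value) = match scheme {
        // Plain call: the target runs in its own context.
        CallScheme::Call => (target, parent.address, value),
        // Static call: like CALL, but read-only and without value.
        CallScheme::StaticCall => (target, parent.address, 0),
        // CALLCODE: target code runs on the parent's storage.
        CallScheme::CallCode => (parent.address, parent.address, value),
        // DELEGATECALL: additionally keeps the parent's caller and value.
        CallScheme::DelegateCall => (parent.address, parent.caller, parent.apparent_value),
    };
    CallContext { address, caller, code_address: target, apparent_value, scheme }
}
```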

The `instruction_result.rs` Module

The instruction_result.rs module of this Rust EVM implementation includes the definitions of the InstructionResult and SuccessOrHalt enums, which represent the possible outcomes of EVM instruction execution, and functions to work with these types.

InstructionResult Enum

The InstructionResult enum categorizes the different types of results that can arise from executing an EVM instruction. This enumeration uses the #[repr(u8)] attribute, meaning its variants have an explicit storage representation of an 8-bit unsigned integer. The different instruction results represent outcomes such as successful continuation, stop, return, self-destruction, reversion, deep call, out of funds, out of gas, and various error conditions.

SuccessOrHalt Enum

The SuccessOrHalt enum represents the outcome of a transaction execution, distinguishing successful operations, reversion, halting conditions, fatal external errors, and internal continuation. It also provides several methods to check the kind of result and to extract the value of the successful evaluation or halt.

From<InstructionResult> for SuccessOrHalt Implementation

This implementation provides a way to convert an InstructionResult into a SuccessOrHalt. It maps each instruction result to the corresponding SuccessOrHalt variant.

Macros for returning instruction results

The module provides two macros, return_ok! and return_revert!, which simplify returning some common sets of instruction results.
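
A pattern macro of this kind can be sketched as follows. The enum is cut down to a handful of variants, and the exact grouping of variants inside each macro is an assumption made for illustration; the point is that a macro can expand to an or-pattern that is reused wherever a group of results has to be matched.

```rust
/// Reduced set of results, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum InstructionResult {
    Continue,
    Stop,
    Return,
    SelfDestruct,
    Revert,
    CallTooDeep,
    OutOfFund,
    OutOfGas,
}

// Expands to a pattern that matches every "successful" result.
macro_rules! return_ok {
    () => {
        InstructionResult::Continue
            | InstructionResult::Stop
            | InstructionResult::Return
            | InstructionResult::SelfDestruct
    };
}

// Expands to a pattern that matches every "revert-like" result.
macro_rules! return_revert {
    () => {
        InstructionResult::Revert
            | InstructionResult::CallTooDeep
            | InstructionResult::OutOfFund
    };
}

fn is_ok(result: InstructionResult) -> bool {
    matches!(result, return_ok!())
}

fn is_revert(result: InstructionResult) -> bool {
    matches!(result, return_revert!())
}
```

Anything that is neither ok nor a revert, such as `OutOfGas` here, falls through to the error handling of the caller.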

The `instruction.rs` Module in the Rust Ethereum Virtual Machine (EVM)

The instruction.rs module defines interpretation mappings for EVM bytecode. It provides the definition of the Instruction struct, as well as the Opcode enumeration and the step function, which runs a specific instruction.

`Opcode` Enum

The Opcode enum represents the opcodes that are available in the Ethereum Virtual Machine. Each variant corresponds to an operation that can be performed, such as addition, multiplication, subtraction, jumps, and memory operations.

`Instruction` Struct

The Instruction struct represents a single instruction in the EVM. It contains the opcode, which is the operation to be performed, and a list of bytes representing the operands for the instruction.

`step` Function

The step function interprets an instruction. It uses the opcode to determine what operation to perform and then performs the operation using the operands in the instruction.

Primitives

This crate is a core component of the revm system. It is designed to provide definitions for a range of types and structures commonly used throughout the application. It is set up to be compatible with environments that do not include Rust's standard library, as indicated by the no_std attribute.

Modules:

bytecode: This module provides functionality related to EVM bytecode.
constants: This module contains constant values used throughout the EVM implementation.
db: This module contains data structures and functions related to the EVM's database implementation.
env: This module contains types and functions related to the EVM's environment, including block headers, and environment values.
precompile: This module contains types related to Ethereum's precompiled contracts.
result: This module provides types for representing execution results and errors in the EVM.
specification: This module defines types related to Ethereum specifications (also known as hard forks).
state: This module provides types and functions for managing Ethereum state, including accounts and storage.
utilities: This module provides utility functions used in multiple places across the EVM implementation.
kzg: This module provides types and functions related to KZG commitments. It is employed most visibly in the Point Evaluation precompile.

External Crates:

alloc: The alloc crate provides types for heap allocation.
bitvec: The bitvec crate provides a data structure to handle sequences of bits.
bytes: The bytes crate provides utilities for working with bytes.
hex: The hex crate provides utilities for encoding and decoding hexadecimal.
hex_literal: The hex_literal crate provides a macro for including hexadecimal data directly in the source code.
hashbrown: The hashbrown crate provides high-performance hash map and hash set data structures.
ruint: The ruint crate provides types and functions for big unsigned integer arithmetic.
c-kzg: A minimal implementation of the Polynomial Commitments API for EIP-4844, written in C. (With rust bindings)

Re-exported Types:

Address: A type representing a 160-bit (or 20-byte) array, typically used for Ethereum addresses.
B256: A type representing a 256-bit (or 32-byte) array, typically used for Ethereum hashes or integers.
Bytes: A type representing a sequence of bytes.
U256: A 256-bit unsigned integer type from the ruint crate.
HashMap and HashSet: High-performance hash map and hash set data structures from the hashbrown crate.

Re-exported Modules: All types, constants, and functions from the bytecode, constants, env, precompile, result, specification, state, and utilities modules, as well as the KzgSettings, EnvKzgSettings, and trusted_setup_points types and methods, are re-exported, allowing users to import these items directly from the primitives crate.

Database

Responsible for database operations. This module is where the blockchain's state persistence is managed. The module defines three primary traits (Database, DatabaseCommit, and DatabaseRef), a structure RefDBWrapper, and their associated methods.

The Database trait defines an interface for mutable interaction with the database. It has a generic associated type Error to handle different kinds of errors that might occur during these interactions. It provides methods to retrieve basic account information (basic), retrieve account code by its hash (code_by_hash), retrieve the storage value of an address at a certain index (storage), and retrieve the block hash for a certain block number (block_hash).

The DatabaseCommit trait defines a single commit method for committing changes to the database. The changes are a map between Ethereum-like addresses (type Address) and accounts.

The DatabaseRef trait is similar to the Database trait but is designed for read-only or immutable interactions. It has the same Error associated type and the same set of methods as Database, but these methods take &self instead of &mut self, indicating that they do not mutate the database.

The RefDBWrapper structure is a wrapper around a reference to a DatabaseRef type. It implements the Database trait, essentially providing a way to treat a DatabaseRef as a Database by forwarding the Database methods to the corresponding DatabaseRef methods.
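
The relationship between the mutable trait, the read-only trait, and the wrapper can be sketched with a single method. This is a simplified illustration: only `basic` is modeled, `AccountInfo` is reduced to two fields, the wrapper is generic rather than built on a trait object, and `MemDb` is a hypothetical in-memory backend.

```rust
use std::collections::HashMap;
use std::convert::Infallible;

type Address = [u8; 20];

#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct AccountInfo {
    balance: u128,
    nonce: u64,
}

/// Mutable access: implementations may cache or load lazily.
trait Database {
    type Error;
    fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Self::Error>;
}

/// Read-only access: same method, but taking `&self`.
trait DatabaseRef {
    type Error;
    fn basic(&self, address: Address) -> Result<Option<AccountInfo>, Self::Error>;
}

/// Lets a `DatabaseRef` be used wherever a `Database` is expected.
struct RefDBWrapper<'a, T: DatabaseRef> {
    db: &'a T,
}

impl<'a, T: DatabaseRef> Database for RefDBWrapper<'a, T> {
    type Error = T::Error;

    fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Self::Error> {
        // Forward to the read-only implementation.
        DatabaseRef::basic(self.db, address)
    }
}

/// Hypothetical in-memory backend that never fails.
struct MemDb {
    accounts: HashMap<Address, AccountInfo>,
}

impl DatabaseRef for MemDb {
    type Error = Infallible;

    fn basic(&self, address: Address) -> Result<Option<AccountInfo>, Infallible> {
        Ok(self.accounts.get(&address).cloned())
    }
}
```

The wrapper only borrows the underlying database, so several wrappers can read from the same backend at once.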

Result

At the core of this module is the ExecutionResult enum, which describes the possible outcomes of an EVM execution: Success, Revert, and Halt. Success represents a successful transaction execution, and it holds important information such as the reason for success (an Eval enum), the gas used, the gas refunded, a vector of logs (Vec<Log>), and the output of the execution. This aligns with the stipulation in EIP-658 that introduces a status code in the receipt of a transaction, indicating whether the top-level call was successful or failed.

Revert represents a transaction that was reverted by the REVERT opcode without spending all of its gas. It stores the gas used and the output. Halt represents a transaction that was reverted for various reasons and consumed all its gas. It stores the reason for halting (a Halt enum) and the gas used.

The ExecutionResult enum provides several methods to extract important data from an execution result, such as is_success(), logs(), output(), into_output(), into_logs(), and gas_used(). These methods facilitate accessing key details of a transaction execution.

The EVMError and InvalidTransaction enums handle different kinds of errors that can occur in an EVM, including database errors, errors specific to the transaction itself, and errors that occur due to issues with gas, among others.

The Output enum handles different kinds of outputs of an EVM execution, including Call and Create. This is where the output data from a successful execution or a reverted transaction is stored.
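
A cut-down version of such a result type might look like the sketch below. The variants follow the description above, but the payload types are simplified stand-ins chosen for this example (raw bytes for logs and output, a string for the halt reason).

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum ExecutionResult {
    Success { gas_used: u64, gas_refunded: u64, logs: Vec<Vec<u8>>, output: Vec<u8> },
    Revert { gas_used: u64, output: Vec<u8> },
    Halt { reason: &'static str, gas_used: u64 },
}

impl ExecutionResult {
    fn is_success(&self) -> bool {
        matches!(self, Self::Success { .. })
    }

    /// Every variant carries the gas used, so one or-pattern covers them all.
    fn gas_used(&self) -> u64 {
        match self {
            Self::Success { gas_used, .. }
            | Self::Revert { gas_used, .. }
            | Self::Halt { gas_used, .. } => *gas_used,
        }
    }

    /// Output exists for successful and reverted executions, not for halts.
    fn output(&self) -> Option<&[u8]> {
        match self {
            Self::Success { output, .. } | Self::Revert { output, .. } => Some(output.as_slice()),
            Self::Halt { .. } => None,
        }
    }
}
```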

Environment

A significant module that manages the execution environment of the EVM. The module contains objects and methods associated with processing transactions and blocks within such a blockchain environment. It defines several structures: Env, BlockEnv, TxEnv, CfgEnv, TransactTo, and CreateScheme. These structures contain various fields representing the block data, transaction data, environmental configurations, transaction recipient details, and the method of contract creation respectively.

The Env structure, which encapsulates the environment of the EVM, contains methods for calculating effective gas prices and for validating block and transaction data. It also checks transactions against the current state of the associated account, which is necessary to validate the transaction's nonce and the account balance. Various Ethereum Improvement Proposals (EIPs) are also considered in these validations, such as EIP-1559 for the base fee, EIP-3607 for rejecting transactions from senders with deployed code, and EIP-3298 for disabling gas refunds. The code is structured to include optional features and to allow for changes in the EVM specifications.

Specifications

Holds data related to Ethereum's technical specifications, serving as a reference point for Ethereum's rules and procedures obtained from the Ethereum execution specifications. The module is primarily used to enumerate and handle Ethereum's network upgrades or "hard forks" within the Ethereum Virtual Machine (EVM). These hard forks are referred to as SpecId in the code, representing different phases of Ethereum's development.

The SpecId enum assigns a unique numerical value and a unique string identifier to each Ethereum hard fork. These upgrades range from the earliest ones such as FRONTIER and HOMESTEAD, through to the most recent ones, including LONDON, MERGE, SHANGHAI, and LATEST.

The code also includes conversion methods such as try_from_u8() and from(). The former attempts to create a SpecId from a given u8 integer, while the latter creates a SpecId based on a string representing the name of the hard fork.

The enabled() method in SpecId is used to check if one spec is enabled on another, considering the order in which the hard forks were enacted.

The Spec trait is used to abstract the process of checking whether a given spec is enabled. It only has one method, enabled(), and a constant SPEC_ID.

The module then defines various Spec structs, each representing a different hard fork. These structs implement the Spec trait and each struct's SPEC_ID corresponds to the correct SpecId variant.

This module provides the necessary framework to handle and interact with the different Ethereum hard forks within the EVM, making it possible to handle transactions and contracts differently depending on which hard fork rules apply. It also simplifies the process of adapting to future hard forks by creating a new SpecId and corresponding Spec struct.
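
The ordering-based check can be sketched with a handful of forks. The numeric discriminants, the accepted fork name strings, and the fallback to LATEST for unknown names are assumptions made for this example; only the overall shape (an ordered enum, a `Spec` trait with a `SPEC_ID` constant, and one struct per fork) comes from the description above.

```rust
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
enum SpecId {
    FRONTIER = 0,
    HOMESTEAD = 1,
    LONDON = 2,
    MERGE = 3,
    SHANGHAI = 4,
    LATEST = 5,
}

impl SpecId {
    fn try_from_u8(value: u8) -> Option<Self> {
        match value {
            0 => Some(Self::FRONTIER),
            1 => Some(Self::HOMESTEAD),
            2 => Some(Self::LONDON),
            3 => Some(Self::MERGE),
            4 => Some(Self::SHANGHAI),
            5 => Some(Self::LATEST),
            _ => None,
        }
    }

    /// A fork is enabled if it was activated at or before `our`.
    fn enabled(our: SpecId, other: SpecId) -> bool {
        our as u8 >= other as u8
    }
}

impl From<&str> for SpecId {
    fn from(name: &str) -> Self {
        match name {
            "Frontier" => Self::FRONTIER,
            "Homestead" => Self::HOMESTEAD,
            "London" => Self::LONDON,
            "Merge" => Self::MERGE,
            "Shanghai" => Self::SHANGHAI,
            _ => Self::LATEST, // assumed fallback for unknown names
        }
    }
}

trait Spec {
    const SPEC_ID: SpecId;

    fn enabled(spec_id: SpecId) -> bool {
        SpecId::enabled(Self::SPEC_ID, spec_id)
    }
}

struct LondonSpec;

impl Spec for LondonSpec {
    const SPEC_ID: SpecId = SpecId::LONDON;
}
```

Because `SPEC_ID` is an associated constant, code that is generic over `Spec` can have its fork checks resolved at compile time.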

Bytecode

This module defines structures and methods to manipulate Ethereum bytecode and manage its state. It's built around three main components: JumpTable, BytecodeState, and Bytecode.

The JumpTable structure stores a map of valid jump destinations within a given Ethereum bytecode sequence. It is essentially an Arc (an atomically reference-counted pointer) wrapping a BitVec (bit vector), which can be accessed and modified using the defined methods, such as as_slice(), from_slice(), and is_valid().

The BytecodeState is an enumeration, capturing the three possible states of the bytecode: Raw, Checked, and Analysed. In the Checked and Analysed states, additional data is provided, such as the length of the bytecode and, in the Analysed state, a JumpTable.

The Bytecode struct holds the actual bytecode, its hash, and its current state (BytecodeState). It provides several methods to interact with the bytecode, such as getting the length of the bytecode, checking if it's empty, retrieving its state, and converting the bytecode to a checked state. It also provides methods to create new instances of the Bytecode struct in different states.
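
The analysis that produces a jump table can be sketched as a single pass over the code. This example uses a plain `Vec<bool>` instead of a reference-counted bit vector to stay dependency-free, and the function names are hypothetical; the opcode values for JUMPDEST and PUSH1 to PUSH32 are the standard EVM ones.

```rust
const JUMPDEST: u8 = 0x5b;
const PUSH1: u8 = 0x60;
const PUSH32: u8 = 0x7f;

/// Marks every position that holds a real JUMPDEST opcode.
/// Bytes that are immediate data of a PUSH are skipped, so a 0x5b
/// inside push data is not treated as a valid destination.
fn analyze_jumpdests(code: &[u8]) -> Vec<bool> {
    let mut table = vec![false; code.len()];
    let mut i = 0;
    while i < code.len() {
        let op = code[i];
        if op == JUMPDEST {
            table[i] = true;
            i += 1;
        } else if (PUSH1..=PUSH32).contains(&op) {
            // Skip the opcode plus its 1..=32 immediate bytes.
            i += 1 + (op - PUSH1 + 1) as usize;
        } else {
            i += 1;
        }
    }
    table
}

/// Out-of-range program counters are never valid destinations.
fn is_valid(table: &[bool], pc: usize) -> bool {
    table.get(pc).copied().unwrap_or(false)
}
```

Running the analysis once and caching the table is what moves a bytecode from the Checked state to the Analysed state in the terms used above.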

Constants

Holds constant values used throughout the system. This module defines important constants that help limit and manage resources in the Ethereum Virtual Machine (EVM). The constants include STACK_LIMIT and CALL_STACK_LIMIT, which restrict the size of the interpreter stack and the EVM call stack, respectively. Both are set to 1024.

The module also defines MAX_CODE_SIZE, which is set according to EIP-170's specification. EIP-170 imposes a maximum limit on the contract code size to mitigate potential vulnerabilities and inefficiencies in Ethereum. Without this cap, the act of calling a contract can trigger costly operations that scale with the size of the contract's code. These operations include reading the code from disk, preprocessing the code for VM execution, and adding data to the block's proof-of-validity. By implementing MAX_CODE_SIZE (set to 0x6000, i.e. 24,576 bytes or 24 KiB), the EVM ensures that the cost of these operations remains manageable, even under high gas levels that could be encountered in the future. EIP-170's implementation thus offers crucial protection against potential DoS attacks and maintains efficiency, especially for future light clients verifying proofs of validity or invalidity.

Another constant defined here is MAX_INITCODE_SIZE, set in accordance with EIP-3860. EIP-3860 extends EIP-170 by introducing a maximum size limit for initialization code (initcode) and enforcing a gas charge for every 32-byte chunk of initcode, to account for the cost of jump destination analysis. Before EIP-3860, initcode analysis during contract creation wasn't metered, nor was there an upper limit for its size, resulting in potential inefficiencies and vulnerabilities. By setting MAX_INITCODE_SIZE to 2 * MAX_CODE_SIZE and introducing the said gas charge, EIP-3860 ensures that the cost of initcode analysis scales proportionately with its size. This constant, therefore, facilitates fair charging, simplifies EVM engines by setting explicit limits, and helps to create an extendable cost system for the future.
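
The two size limits and the per-word initcode charge can be put together in a short sketch. `MAX_CODE_SIZE` and `MAX_INITCODE_SIZE` follow the values given above; the charge of 2 gas per 32-byte word is the figure specified by EIP-3860, while the `INITCODE_WORD_COST` name and the `initcode_cost` helper are hypothetical.

```rust
/// EIP-170 contract code size limit (24,576 bytes).
const MAX_CODE_SIZE: usize = 0x6000;
/// EIP-3860 initcode size limit.
const MAX_INITCODE_SIZE: usize = 2 * MAX_CODE_SIZE;
/// Gas charged per 32-byte word of initcode under EIP-3860.
const INITCODE_WORD_COST: u64 = 2;

/// Returns the initcode charge, or `None` if the initcode is too large.
fn initcode_cost(len: usize) -> Option<u64> {
    if len > MAX_INITCODE_SIZE {
        return None;
    }
    // Round the length up to whole 32-byte words.
    let words = (len as u64 + 31) / 32;
    Some(INITCODE_WORD_COST * words)
}
```

With these numbers, the largest permitted initcode of 49,152 bytes costs 3,072 gas for the analysis charge alone.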

precompile

This module implements precompiled contracts in the EVM, adding a layer of pre-set functionalities. These are documented in more detail in the next section. The module defines the types and the enum that are used to handle precompiled contracts.

PrecompileResult: This is a type alias for a Result type. The Ok variant of this type contains a tuple (u64, Vec<u8>), where the u64 integer represents the gas used by the precompiled contract, and the Vec<u8> holds the output data. The Err variant contains a PrecompileError.

StandardPrecompileFn and CustomPrecompileFn: These are type aliases for function pointers. Both functions take a byte slice (the input) and a u64 (the gas limit) as arguments and return a PrecompileResult. The former refers to built-in precompiled contracts, while the latter refers to custom, user-defined contracts.

PrecompileError: This is an enumeration (enum) which describes the different types of errors that could occur while executing a precompiled contract. The listed variants suggest these errors are related to gas consumption, Blake2 hash function, modular exponentiation ("Modexp"), and Bn128, which is a specific elliptic curve used in cryptography.
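
To make the shape of these aliases concrete, the sketch below defines them locally and plugs in a function with the matching signature. The error enum is reduced to a single variant, and `identity_run` is written for this example using the well-known identity precompile pricing (15 gas base plus 3 gas per 32-byte word); it is not the crate's own implementation.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum PrecompileError {
    OutOfGas,
}

/// Ok carries (gas used, output bytes).
type PrecompileResult = Result<(u64, Vec<u8>), PrecompileError>;

/// A precompile is a plain function from (input, gas limit) to a result.
type StandardPrecompileFn = fn(&[u8], u64) -> PrecompileResult;

/// Identity precompile: returns its input unchanged.
fn identity_run(input: &[u8], gas_limit: u64) -> PrecompileResult {
    let words = (input.len() as u64 + 31) / 32;
    let gas_used = 15 + 3 * words;
    if gas_used > gas_limit {
        return Err(PrecompileError::OutOfGas);
    }
    Ok((gas_used, input.to_vec()))
}
```

Because the alias is a function pointer type, precompiles can be stored in a table keyed by address and dispatched without dynamic trait objects.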

State

Manages the EVM's state, including account balances, contract storage, and more.

This module models an Ethereum account and its state, which includes balance, nonce, code, storage, and status flags. The module also includes methods for interacting with the account's state.

The Account struct includes fields for info (of type AccountInfo), storage (a HashMap mapping a U256 value to a StorageSlot), and status (of type AccountStatus). AccountInfo represents the basic information about an Ethereum account, including its balance (balance), nonce (nonce), code (code), and a hash of its code (code_hash).

The AccountStatus is a set of bitflags, representing the state of the account. The flags include Loaded, Created, SelfDestructed, Touched, and LoadedAsNotExisting. The different methods provided within the Account struct allow for manipulating these statuses.

The StorageSlot struct represents a storage slot in the Ethereum Virtual Machine. It holds an original_value and a present_value and includes methods for creating a new slot and checking if the slot's value has been modified.

Two HashMap type aliases are created: State and Storage. State maps from an Address to an Account and Storage maps from a U256 key to a StorageSlot.

The module includes a series of methods implemented for Account to manipulate and query the account's status. These include methods like mark_selfdestruct, unmark_selfdestruct, is_selfdestructed, mark_touch, unmark_touch, is_touched, mark_created, is_newly_created, is_empty, and new_not_existing.
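
A reduced model of the slot and status handling might look like this. Values are `u64` instead of U256, the status flags are plain bit constants instead of a bitflags type, and the concrete bit positions are illustrative assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct StorageSlot {
    original_value: u64,
    present_value: u64,
}

impl StorageSlot {
    /// A freshly loaded slot starts with both values equal.
    fn new(original: u64) -> Self {
        Self { original_value: original, present_value: original }
    }

    fn is_changed(&self) -> bool {
        self.original_value != self.present_value
    }
}

// Illustrative flag bits; the real bit layout is not given in the text above.
const SELF_DESTRUCTED: u8 = 0b0000_0010;
const TOUCHED: u8 = 0b0000_0100;

#[derive(Debug, Default)]
struct Account {
    storage: HashMap<u64, StorageSlot>,
    status: u8,
}

impl Account {
    fn mark_touch(&mut self) {
        self.status |= TOUCHED;
    }

    fn unmark_touch(&mut self) {
        self.status &= !TOUCHED;
    }

    fn is_touched(&self) -> bool {
        (self.status & TOUCHED) != 0
    }

    fn mark_selfdestruct(&mut self) {
        self.status |= SELF_DESTRUCTED;
    }

    fn is_selfdestructed(&self) -> bool {
        (self.status & SELF_DESTRUCTED) != 0
    }
}
```

Keeping the original value next to the present value is what allows SSTORE gas and refund rules to distinguish a first write from a write that restores the slot.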

Utilities

This Rust module provides utility functions and constants for handling Keccak hashing (used in Ethereum) and creating Ethereum addresses via legacy and CREATE2 methods. It also includes serialization and deserialization methods for hexadecimal strings representing byte arrays.

The KECCAK_EMPTY constant represents the Keccak-256 hash of an empty input.

The keccak256 function takes a byte slice input and returns its Keccak-256 hash as a B256 value.

KZG

With the introduction of EIP-4844, blobs are used as a more efficient form of short-term storage. The validity of a blob stored in the consensus layer is verified using the Point Evaluation precompile, which checks that an evaluation of a committed polynomial at a given point is valid. On a larger scale, this implies that the data is available.

This module houses:

KzgSettings: Stores the setup and parameters needed for computing and verifying KZG proofs.

The KZG primitive provides a default KzgSettings obtained from the trusted setup ceremony. A custom KzgSettings can also be supplied if need be; this is available in env.cfg.

trusted_setup_points: This module contains functions and types used for parsing and utilizing the Trusted Setup for the KzgSettings.

Precompile

The precompile crate contains the implementation of the Ethereum precompiled contracts in the EVM. Precompiles are a shortcut to execute a function implemented by the EVM itself, rather than an actual contract. Precompiled contracts are essentially predefined smart contracts on Ethereum, residing at hardcoded addresses and used for computationally heavy operations that are cheaper when implemented this way. The precompiles implemented in REVM are: blake2, bn128 curve, identity, secp256k1, modexp, the sha256 and ripemd160 hash functions, and the EIP-4844 point evaluation.

Modules:

blake2: Implements the BLAKE2 compression function, as specified in EIP-152.
bn128: Implements precompiled contracts for addition, scalar multiplication, and optimal ate pairing check on the alt_bn128 elliptic curve.
hash: Implements the SHA256 and RIPEMD160 hash functions.
identity: Implements the Identity precompile, which returns the input data unchanged.
point_evaluation: Implements the point evaluation precompile for EIP-4844.
modexp: Implements the big integer modular exponentiation precompile.
secp256k1: Implements the ECDSA public key recovery precompile, based on secp256k1 curves.

Types and Constants:

Address: A type alias for an array of 20 bytes. This is typically used to represent Ethereum addresses.
B256: A type alias for an array of 32 bytes, typically used to represent 256-bit hashes or integer values in Ethereum.
PrecompileOutput: Represents the output of a precompiled contract execution, including the gas cost, output data, and any logs generated.
Log: Represents an Ethereum log, with an address, a list of topics, and associated data.
Precompiles: A collection of precompiled contracts available in a particular hard fork of Ethereum.
Precompile: Represents a precompiled contract, which can either be a standard Ethereum precompile, or a custom precompile.
PrecompileWithAddress: Associates a precompiled contract with its address.
SpecId: An enumeration representing different hard fork specifications in Ethereum, such as Homestead, Byzantium, Istanbul, Berlin, and Latest.

Functions:

calc_linear_cost_u32: A utility function to calculate the gas cost for certain precompiles based on their input length.
u64_to_b160: A utility function for converting a 64-bit unsigned integer into a 20-byte Ethereum address.
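
Both helpers are small enough to sketch in full. The parameter names and order are assumptions for this example, and the address is returned as a raw 20-byte array rather than the crate's Address type.

```rust
/// Linear pricing: a base cost plus a cost per 32-byte word of input,
/// with the input length rounded up to whole words.
fn calc_linear_cost_u32(len: usize, base: u64, word: u64) -> u64 {
    (len as u64 + 31) / 32 * word + base
}

/// Places the big-endian bytes of `x` in the low 8 bytes of a 20-byte
/// address, so 1 becomes 0x0000...0001, the first precompile address.
fn u64_to_b160(x: u64) -> [u8; 20] {
    let mut out = [0u8; 20];
    out[12..].copy_from_slice(&x.to_be_bytes());
    out
}
```

For example, with a base of 60 and a word cost of 12 (the SHA256 precompile pricing), a 33-byte input spans two words and costs 84 gas.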

External Crates:

alloc: The alloc crate provides types for heap allocation, and is used here for the Vec type.
core: The core crate provides fundamental Rust types, macros, and traits, and is used here for fmt::Result.

Re-exported Crates and Types:

revm_primitives: This crate is re-exported, indicating it provides some types used by the precompile crate.
primitives: Types from the primitives module of revm_primitives are re-exported, including Bytes, HashMap, and all types under precompile. The latter includes the PrecompileError type, which is aliased to Error.

Re-exported Functionality:

Precompiles provides a static method for each Ethereum hard fork specification (e.g., homestead, byzantium, istanbul, berlin, cancun, and latest), each returning a set of precompiles for that specification.
Precompiles also provides methods to retrieve the list of precompile addresses (addresses), to check if a given address is a precompile (contains), to get the precompile at a given address (get), to check if there are no precompiles (is_empty), and to get the number of precompiles (len).

blake2 hash

This module represents a Rust implementation of the Blake2b cryptographic hash function, a vital component of Ethereum's broader EIP-152 proposal. The primary purpose of this module is to integrate the Blake2b function into Ethereum's precompiled contract mechanism, providing a consistent and efficient way to perform the cryptographic hashing that underpins Ethereum's functionality.

EIP-152 introduced a new precompiled contract that implements the BLAKE2 cryptographic hashing algorithm's compression function. The purpose of this is to enhance the interoperability between Ethereum and Zcash, as well as to introduce more versatile cryptographic hash primitives to the Ethereum Virtual Machine (EVM).

BLAKE2 is not just a powerful cryptographic hash function and SHA3 contender, but it also allows for the efficient validation of the Equihash Proof of Work (PoW) used in Zcash. This could make a Bitcoin Relay-style Simplified Payment Verification (SPV) client feasible on Ethereum, as it enables the verification of Zcash block headers without excessive computational cost. BLAKE2b, a common 64-bit BLAKE2 variant, is highly optimized and performs faster than MD5 on modern processors.

The rationale behind incorporating Blake2b into Ethereum's suite of precompiled contracts is multifaceted:

Performance: The Blake2b hash function offers excellent performance, particularly when processing large inputs.
Security: Blake2b also provides a high degree of security, making it a suitable choice for cryptographic operations.
Interoperability: This function is widely used in various parts of the ecosystem, making it a prime candidate for inclusion in Ethereum's precompiled contracts.
Gas Cost: The gas cost per round (F_ROUND) is specified as 1. This number was decided considering the computational complexity and the necessity to keep the blockchain efficient and prevent spamming.

Core Components

Two primary constants provide the framework for the precompiled contract:

F_ROUND: u64: This is the cost of each round of computation in gas units. Currently set to 1.
INPUT_LENGTH: usize: This specifies the required length of the input data, 213 bytes in this case.

Precompile Function - run

The run function is the main entry point for the precompiled contract. It consumes an input byte slice and a gas limit, returning a PrecompileResult. This function handles input validation, gas cost computation, data manipulation, and the compression algorithm.

It checks for correct input length and reads the final block flag. It then calculates the gas cost based on the number of rounds to be executed. If the gas cost exceeds the provided gas limit, it immediately returns an error.

Once the validation and gas cost computation are complete, it parses the input into three components: state vector h, message block vector m, and offset counter t.

Following this, it calls the compress function from the algo module, passing in the parsed input data and the final block flag.

Finally, it constructs and returns the PrecompileResult containing the gas used and the output data.
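The validation and parsing steps above can be sketched as follows. This is a minimal illustration rather than the crate's code: `Blake2Input`, `Blake2Error`, `parse_blake2f`, and `read_u64_le` are hypothetical names, and the byte layout (a 4-byte big-endian round count, then h, m, and t as little-endian 64-bit words, then a one-byte final block flag) is taken from EIP-152. The call to the compression function is left out.

```rust
const F_ROUND: u64 = 1;
const INPUT_LENGTH: usize = 213;

#[derive(Debug, PartialEq)]
enum Blake2Error {
    WrongLength,
    WrongFinalFlag,
    OutOfGas,
}

/// Parsed precompile input (hypothetical container type).
#[derive(Debug, PartialEq)]
struct Blake2Input {
    rounds: u32,
    h: [u64; 8],
    m: [u64; 16],
    t: [u64; 2],
    last_block: bool,
}

/// Reads a little-endian u64 starting at `offset`.
fn read_u64_le(input: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&input[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

/// Validates the input, charges gas, and splits the bytes into h, m, t and the flag.
fn parse_blake2f(input: &[u8], gas_limit: u64) -> Result<(u64, Blake2Input), Blake2Error> {
    if input.len() != INPUT_LENGTH {
        return Err(Blake2Error::WrongLength);
    }
    // The final block flag is the last byte and must be exactly 0 or 1.
    let last_block = match input[212] {
        0 => false,
        1 => true,
        _ => return Err(Blake2Error::WrongFinalFlag),
    };
    // The round count is a big-endian u32; gas is charged per round.
    let rounds = u32::from_be_bytes([input[0], input[1], input[2], input[3]]);
    let gas_used = rounds as u64 * F_ROUND;
    if gas_used > gas_limit {
        return Err(Blake2Error::OutOfGas);
    }
    let mut h = [0u64; 8];
    for (i, word) in h.iter_mut().enumerate() {
        *word = read_u64_le(input, 4 + i * 8);
    }
    let mut m = [0u64; 16];
    for (i, word) in m.iter_mut().enumerate() {
        *word = read_u64_le(input, 68 + i * 8);
    }
    let t = [read_u64_le(input, 196), read_u64_le(input, 204)];
    Ok((gas_used, Blake2Input { rounds, h, m, t, last_block }))
}
```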

Algorithm Module - algo

The algo module encapsulates the technical implementation of the Blake2b hash function. It includes several key elements:

Constants:

SIGMA: This 2D array represents the message word selection permutation used in each round of the algorithm.
IV: These are the initialization vectors for the Blake2b algorithm.
The g Function: This is the core function within each round of the Blake2b algorithm. It manipulates the state vector and mixes in the message data.
The compress Function: This is the main function that executes the rounds of the g function, handles the last block flag, and updates the state vector with the output of each round.
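A compact sketch of these pieces is shown below, written from RFC 7693 and EIP-152 rather than copied from the crate. The function signatures are assumptions made for illustration; the crate's own `g` and `compress` may take their arguments differently.

```rust
/// Message word selection permutation, one row per round (RFC 7693).
const SIGMA: [[usize; 16]; 10] = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
];

/// BLAKE2b initialization vector.
const IV: [u64; 8] = [
    0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1,
    0x510e527fade682d1, 0x9b05688c2b3e6c1f, 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179,
];

/// Mixing function: folds message words `x` and `y` into four words of `v`.
fn g(v: &mut [u64; 16], a: usize, b: usize, c: usize, d: usize, x: u64, y: u64) {
    v[a] = v[a].wrapping_add(v[b]).wrapping_add(x);
    v[d] = (v[d] ^ v[a]).rotate_right(32);
    v[c] = v[c].wrapping_add(v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(24);
    v[a] = v[a].wrapping_add(v[b]).wrapping_add(y);
    v[d] = (v[d] ^ v[a]).rotate_right(16);
    v[c] = v[c].wrapping_add(v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(63);
}

/// Compression function F: runs `rounds` rounds and updates `h` in place.
fn compress(rounds: usize, h: &mut [u64; 8], m: [u64; 16], t: [u64; 2], last_block: bool) {
    // Working vector: current state in the low half, IV in the high half.
    let mut v = [0u64; 16];
    v[..8].copy_from_slice(&h[..]);
    v[8..].copy_from_slice(&IV);
    v[12] ^= t[0]; // low word of the offset counter
    v[13] ^= t[1]; // high word of the offset counter
    if last_block {
        v[14] = !v[14]; // invert all bits for the final block
    }
    for i in 0..rounds {
        let s = &SIGMA[i % 10];
        // Column step.
        g(&mut v, 0, 4, 8, 12, m[s[0]], m[s[1]]);
        g(&mut v, 1, 5, 9, 13, m[s[2]], m[s[3]]);
        g(&mut v, 2, 6, 10, 14, m[s[4]], m[s[5]]);
        g(&mut v, 3, 7, 11, 15, m[s[6]], m[s[7]]);
        // Diagonal step.
        g(&mut v, 0, 5, 10, 15, m[s[8]], m[s[9]]);
        g(&mut v, 1, 6, 11, 12, m[s[10]], m[s[11]]);
        g(&mut v, 2, 7, 8, 13, m[s[12]], m[s[13]]);
        g(&mut v, 3, 4, 9, 14, m[s[14]], m[s[15]]);
    }
    // Fold both halves of the working vector back into the state.
    for i in 0..8 {
        h[i] ^= v[i] ^ v[i + 8];
    }
}
```

Because the round count is a parameter rather than the fixed 12 of standard BLAKE2b, the same function serves reduced-round or extended-round variants, which is why the precompile charges gas per round.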

bn128 curve

EIP-197 proposed the addition of precompiled contracts for a pairing function on a specific pairing-friendly elliptic curve. This complements EIP-196 in enabling zkSNARKs verification within Ethereum smart contracts. zkSNARKs (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge) technology can enhance privacy for Ethereum users due to its Zero-Knowledge property. Moreover, it may offer a scalability solution because of its succinctness and efficient verifiability property.

Prior to this EIP, Ethereum's smart contract executions were fully transparent, limiting their use in cases involving private information, such as location, identity, or transaction history. While the Ethereum Virtual Machine (EVM) could theoretically use zkSNARKs, implementing them in EVM code was too costly to fit within the block gas limit. EIP-197 defines specific parameters for basic primitives that facilitate zkSNARKs. This allows for more efficient implementation, thereby reducing gas costs.

Notably, setting these parameters doesn't restrict zkSNARKs' use-cases but actually enables the integration of zkSNARK research advancements without requiring further hard forks. Pairing functions, which enable a limited form of multiplicatively homomorphic operations necessary for zkSNARKs, could then be executed within the block gas limit through this precompiled contract.

The code consists of three modules: add, mul, and pair. The add and mul modules implement elliptic curve point addition and scalar multiplication respectively on the bn128 curve, an elliptic curve utilized within Ethereum. Each module defines two versions of the contract, one for the Istanbul and another for the Byzantium Ethereum network upgrades.

The pair module performs the pairing check, which tests whether the product of the pairings of a list of point pairs equals the identity, an essential step in many zero-knowledge proof systems, including zkSNARKs. Again, two versions for Istanbul and Byzantium are defined. The run_add, run_mul, and run_pair functions hold the main implementations of the precompiled contracts: each accepts an input byte array, executes the appropriate elliptic curve operations, and returns the result as a byte array.

Gas costs are declared as constants at the start of each module. The bn library carries out the actual bn128 operations, and because the functions operate on byte arrays, much of the surrounding code is byte manipulation and conversion.
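To make the per-fork gas constants concrete, here is a small sketch of the two schedules and the pairing cost formula. The struct and function names are hypothetical; the numbers come from EIP-196 and EIP-197 (Byzantium) and EIP-1108 (Istanbul), and the 192-byte element size is one G1 point (64 bytes) plus one G2 point (128 bytes).

```rust
/// Gas schedule for the bn128 precompiles under one fork (hypothetical type).
struct Bn128Gas {
    add: u64,
    mul: u64,
    pair_base: u64,
    pair_per_point: u64,
}

/// Byzantium costs, as introduced with the precompiles.
const BYZANTIUM: Bn128Gas = Bn128Gas {
    add: 500,
    mul: 40_000,
    pair_base: 100_000,
    pair_per_point: 80_000,
};

/// Istanbul costs, reduced by EIP-1108.
const ISTANBUL: Bn128Gas = Bn128Gas {
    add: 150,
    mul: 6_000,
    pair_base: 45_000,
    pair_per_point: 34_000,
};

/// Byte length of one (G1, G2) input pair.
const PAIR_ELEMENT_LEN: usize = 192;

/// Pairing cost: a base fee plus a fee per input pair.
/// Returns `None` if the input is not a whole number of pairs.
fn pairing_gas(schedule: &Bn128Gas, input_len: usize) -> Option<u64> {
    if input_len % PAIR_ELEMENT_LEN != 0 {
        return None;
    }
    let pairs = (input_len / PAIR_ELEMENT_LEN) as u64;
    schedule
        .pair_per_point
        .checked_mul(pairs)?
        .checked_add(schedule.pair_base)
}
```

For example, a two-pair check (384 bytes of input) costs 45,000 + 2 × 34,000 = 113,000 gas under the Istanbul schedule.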

SHA256 and RIPEMD160

REVM includes precompiled contracts for SHA256 and RIPEMD160, cryptographic hash functions integral to data integrity and security. The addresses for these precompiled contracts are 0x0000000000000000000000000000000000000002 for SHA256 and 0x0000000000000000000000000000000000000003 for RIPEMD160.

Each function (sha256_run and ripemd160_run) accepts two arguments, the input data to be hashed and the gas_limit representing the maximum amount of computational work permissible for the function. They both calculate the gas cost of the operation based on the input data length. If the computed cost surpasses the gas_limit, an Error::OutOfGas is triggered.

The sha256_run function, corresponding to the SHA256 precompiled contract, computes the SHA256 hash of the input data. The ripemd160_run function computes the RIPEMD160 hash of the input and pads it to match Ethereum's 256-bit word size. These precompiled contracts offer a computationally efficient way for Ethereum contracts to perform necessary cryptographic operations.
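The gas calculation and the RIPEMD160 padding can be sketched without the hash functions themselves. The helper names below are hypothetical; the constants (60 + 12 per word for SHA256, 600 + 120 per word for RIPEMD160) are the values from the Ethereum Yellow Paper, and a word is 32 bytes, rounded up.

```rust
/// Linear cost: `base + per_word * ceil(len / 32)`.
fn linear_cost(len: usize, base: u64, per_word: u64) -> u64 {
    let words = (len as u64).saturating_add(31) / 32;
    per_word.saturating_mul(words).saturating_add(base)
}

/// Gas charged by the SHA256 precompile for `len` input bytes.
fn sha256_gas(len: usize) -> u64 {
    linear_cost(len, 60, 12)
}

/// Gas charged by the RIPEMD160 precompile for `len` input bytes.
fn ripemd160_gas(len: usize) -> u64 {
    linear_cost(len, 600, 120)
}

/// Left-pads a 20-byte RIPEMD160 digest with 12 zero bytes to fill a 32-byte word.
fn pad_ripemd160(digest: [u8; 20]) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[12..].copy_from_slice(&digest);
    out
}
```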

Identity function

This precompiled contract performs the identity function. In mathematics, an identity function is a function that always returns the same value as its argument. In this context, the contract takes the input data and returns it as is. This precompiled contract resides at the hardcoded Ethereum address 0x0000000000000000000000000000000000000004.

The identity_run function takes two arguments: input data, which it returns unaltered, and gas_limit which defines the maximum computational work the function is allowed to do. A linear gas cost calculation based on the size of the input data and two constants, IDENTITY_BASE (the base cost of the operation) and IDENTITY_PER_WORD (the cost per word), is performed. If the calculated gas cost exceeds the gas_limit, an Error::OutOfGas is returned.
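Since this precompile needs no cryptography, the whole function fits in a few lines. The sketch below assumes a simplified return type (a tuple of gas used and output bytes) and a local `PrecompileError` enum in place of the crate's own types; the constants 15 and 3 are the Yellow Paper values for the identity contract.

```rust
const IDENTITY_BASE: u64 = 15;
const IDENTITY_PER_WORD: u64 = 3;

/// Simplified stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum PrecompileError {
    OutOfGas,
}

/// Returns the gas used and a copy of the input, or `OutOfGas` if the limit is too low.
fn identity_run(input: &[u8], gas_limit: u64) -> Result<(u64, Vec<u8>), PrecompileError> {
    // Cost is linear in the number of 32-byte words, rounded up.
    let words = (input.len() as u64 + 31) / 32;
    let gas_used = IDENTITY_BASE + IDENTITY_PER_WORD * words;
    if gas_used > gas_limit {
        return Err(PrecompileError::OutOfGas);
    }
    Ok((gas_used, input.to_vec()))
}
```

A three-byte input occupies one word, so it costs 15 + 3 = 18 gas; a 33-byte input spills into a second word and costs 21.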

This identity function can be useful in various scenarios such as forwarding data or acting as a data validation check within a contract. Despite its simplicity, it contributes to the flexibility and broad utility of the Ethereum platform.

Modular Exponentiation

REVM also implements two versions of the modular exponentiation precompiled contract, corresponding to two Ethereum hard forks: Byzantium and Berlin. Both versions reside at address 0x0000000000000000000000000000000000000005, because the Berlin version replaced the Byzantium one at the network upgrade. This operation is used for cryptographic computations and is a crucial part of Ethereum's toolkit.

The byzantium_run and berlin_run functions each run the modular exponentiation using the run_inner function, but each uses a different gas calculation method: byzantium_gas_calc for Byzantium and berlin_gas_calc for Berlin. Which one applies depends on the hard fork specification in use. The run_inner function is the core function that reads the inputs and performs the modular exponentiation. If the calculated gas cost is higher than the gas limit, Error::OutOfGas is returned. If all computations are successful, the function returns the result of the operation and the gas cost.

The calculate_iteration_count function calculates the number of iterations required to compute the operation, based on the length and value of the exponent. The read_u64_with_overflow macro reads input data and checks for potential overflows.

The byzantium_gas_calc function calculates the gas cost for the modular exponential operation as defined in the Byzantium version of the Ethereum protocol. The berlin_gas_calc function calculates the gas cost according to the Berlin version, as defined in EIP-2565. These two versions have different formulas to calculate the gas cost of the operation, reflecting the evolution of the Ethereum network.
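The two pricing formulas can be compared side by side in a simplified sketch. The function names mirror those mentioned above, but the signatures are assumptions: instead of the raw exponent bytes, the iteration count here takes the exponent's byte length and the bit length of its first 32 bytes, and results are returned as u128 to sidestep overflow handling. The formulas themselves follow EIP-198 (Byzantium) and EIP-2565 (Berlin).

```rust
use std::cmp::max;

/// Iteration count from the exponent's byte length and the bit length of its
/// first 32 bytes (0 when those bytes are all zero). Never less than 1.
fn calculate_iteration_count(exp_len: u64, exp_head_bit_len: u64) -> u64 {
    let head = exp_head_bit_len.saturating_sub(1);
    let count = if exp_len <= 32 {
        head
    } else {
        (exp_len - 32).saturating_mul(8).saturating_add(head)
    };
    max(count, 1)
}

/// Byzantium pricing (EIP-198): piecewise complexity, divided by 20.
fn byzantium_gas_calc(base_len: u64, mod_len: u64, iterations: u64) -> u128 {
    let x = max(base_len, mod_len) as u128;
    let complexity = if x <= 64 {
        x * x
    } else if x <= 1024 {
        x * x / 4 + 96 * x - 3_072
    } else {
        x * x / 16 + 480 * x - 199_680
    };
    complexity.saturating_mul(iterations as u128) / 20
}

/// Berlin pricing (EIP-2565): squared 8-byte word count, divided by 3, minimum 200.
fn berlin_gas_calc(base_len: u64, mod_len: u64, iterations: u64) -> u128 {
    let words = (max(base_len, mod_len) as u128 + 7) / 8;
    max(200, (words * words).saturating_mul(iterations as u128) / 3)
}
```

For a 64-byte base and modulus with exponent 65537 (3 bytes, 17 significant bits, so 16 iterations), the Byzantium formula gives 4096 × 16 / 20 = 3276 gas, while the Berlin formula gives 64 × 16 / 3 = 341 gas.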

Secp256k1

This implements Ethereum's precompiled contract ECRECOVER, an elliptic curve digital signature algorithm (ECDSA) recovery function that recovers the Ethereum address (public key hash) associated with a given signature. The implementation comes in two versions; which one is compiled depends on whether the secp256k1 cryptographic library is enabled in the build configuration.

Both versions define a secp256k1 module that includes an ecrecover function. This function takes a digital signature and a message as input, both represented as byte arrays, and returns the recovered Ethereum address. It performs this operation by using the signature to recover the original public key used for signing, then hashing this public key with Keccak256, Ethereum's chosen hash function. The last 20 bytes of the hash form the address.

When secp256k1 is not enabled, the ecrecover function uses the k256 library to parse the signature, recover the public key, and perform the hashing. When secp256k1 is enabled, the function uses the secp256k1 library for these operations. Although both versions perform the same fundamental operation, they use different cryptographic libraries, which can offer different optimizations and security properties.

The ec_recover_run function is the primary entry point for this precompiled contract. It parses the input to extract the message and signature, checks if enough gas is provided for execution, and calls the appropriate ecrecover function. The result of the recovery operation is returned as a PrecompileResult, a type that represents the outcome of a precompiled contract execution in Ethereum.
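The input handling around the recovery call can be sketched without a cryptographic library. Everything named here is hypothetical and the layout is the standard ECRECOVER one: a 128-byte input holding the 32-byte message hash, a 32-byte word for v (which must be 27 or 28), and the 64-byte r and s values, with a fixed cost of 3000 gas. The public key recovery and Keccak256 hashing are deliberately left out.

```rust
/// Fixed gas cost of ECRECOVER.
const ECRECOVER_BASE: u64 = 3_000;

/// Fields extracted from the precompile input (hypothetical container type).
#[derive(Debug, PartialEq)]
struct EcrecoverInput {
    msg_hash: [u8; 32],
    recovery_id: u8,
    signature: [u8; 64],
}

/// Right-pads the input to 128 bytes and splits it into hash, recovery id and (r, s).
/// Returns `None` when the `v` word is not exactly 27 or 28.
fn parse_ecrecover_input(input: &[u8]) -> Option<EcrecoverInput> {
    let mut padded = [0u8; 128];
    let n = input.len().min(128);
    padded[..n].copy_from_slice(&input[..n]);

    // `v` is a 32-byte big-endian word: 31 zero bytes followed by 27 or 28.
    let v_is_valid = padded[32..63].iter().all(|&b| b == 0) && matches!(padded[63], 27 | 28);
    if !v_is_valid {
        return None;
    }
    let mut msg_hash = [0u8; 32];
    msg_hash.copy_from_slice(&padded[..32]);
    let mut signature = [0u8; 64];
    signature.copy_from_slice(&padded[64..]);
    Some(EcrecoverInput {
        msg_hash,
        recovery_id: padded[63] - 27,
        signature,
    })
}

/// Keeps the last 20 bytes of a 32-byte hash and zeroes the first 12,
/// which yields the address as a left-padded 32-byte word.
fn address_word_from_hash(mut hash: [u8; 32]) -> [u8; 32] {
    hash[..12].fill(0);
    hash
}
```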

Point Evaluation Precompile

This precompile was introduced in EIP-4844 and is used to verify KZG commitments of blobs. The precompile allows for efficient verification of commitments to blob transactions. Blob transactions contain a large amount of data that cannot be accessed by EVM execution, but that has a commitment which can be accessed and verified. The EIP is designed to be forward compatible with the Danksharding architecture while giving L2s access to cheaper L1 commitments. This precompiled contract resides at the hardcoded Ethereum address 0x000000000000000000000000000000000000000A.

A useful resource is the Python reference implementation of the precompile. The implementation in REVM uses the Ethereum Foundation's c-kzg-4844 library through its foreign function interface (FFI) bindings.
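As a rough guide to what the precompile receives, the sketch below splits the input into its fields. The type and function names are hypothetical; the 192-byte layout, the 0x01 version byte, and the 50,000 gas cost come from EIP-4844. The full check that the versioned hash equals the SHA256 of the commitment with its first byte replaced, and the KZG proof verification itself, are omitted because they need external libraries.

```rust
/// Fixed gas cost of the point evaluation precompile (EIP-4844).
const POINT_EVALUATION_GAS: u64 = 50_000;
/// Version byte that prefixes a KZG versioned hash.
const VERSIONED_HASH_VERSION_KZG: u8 = 0x01;

/// Borrowed views into the five input fields (hypothetical container type).
#[derive(Debug, PartialEq)]
struct PointEvaluationInput<'a> {
    versioned_hash: &'a [u8], // 32 bytes
    z: &'a [u8],              // 32 bytes, evaluation point
    y: &'a [u8],              // 32 bytes, claimed value
    commitment: &'a [u8],     // 48 bytes, KZG commitment
    proof: &'a [u8],          // 48 bytes, KZG proof
}

/// Splits the input into its fields after checking length and version byte.
/// This is only a partial validation: the hash and proof are not verified here.
fn split_point_evaluation_input<'a>(input: &'a [u8]) -> Option<PointEvaluationInput<'a>> {
    if input.len() != 192 || input[0] != VERSIONED_HASH_VERSION_KZG {
        return None;
    }
    Some(PointEvaluationInput {
        versioned_hash: &input[..32],
        z: &input[32..64],
        y: &input[64..96],
        commitment: &input[96..144],
        proof: &input[144..192],
    })
}
```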