Introduction
revm is an Ethereum Virtual Machine (EVM) written in Rust that is focused on speed and simplicity. This documentation is very much a work in progress and a community effort. If you would like to contribute and improve these docs, please make a PR to the GitHub repo. Most importantly, revm is just the execution environment for Ethereum; there is no networking or consensus related work in this repository.
Crates
The project has four main crates that are used to build revm. These are:
- revm: The main EVM library.
- interpreter: Execution loop with instructions.
- primitives: Primitive data types.
- precompile: EVM precompiles.
Testing with the binaries
There are two binaries, both of which are used for testing. To install them, run cargo install --path bins/<binary-name>. The binaries are:
- revme: A CLI binary used for running state test JSON. Currently it is used to run the Ethereum tests to check whether revm is compliant. For example, if you have the Ethereum tests cloned into a directory called ethtests and the EIP tests in the following directories, you can run:
cargo run --profile ethtests -p revme -- \
statetest \
../ethtests/GeneralStateTests/ \
../ethtests/LegacyTests/Constantinople/GeneralStateTests/ \
bins/revme/tests/EIPTests/StateTests/stEIP5656-MCOPY/ \
bins/revme/tests/EIPTests/StateTests/stEIP1153-transientStorage/
- revm-test: Test binaries with contracts, used mostly to check performance.
If you are interested in contributing, be sure to run the statetests. It is recommended to read about the ethereum tests.
Rust Ethereum Virtual Machine (revm)
The evm crate implements the Ethereum Virtual Machine (EVM), including the call loop and host implementation, database handling, state journaling, and powerful logic handlers that can be overwritten.
This crate pulls Primitives, Interpreter and Precompiles together to deliver the Rust EVM.
The starting point for reading the documentation is Evm, which is the main structure of the EVM.
Then, you can read about the EvmBuilder, which is used to create the Evm and modify it.
After that, you can read about the Handler, which is used to modify the logic of the Evm and ties in with how Evm introspection can be done.
Finally, you can read about the Inspector, a legacy interface for inspecting execution that is now repurposed as a handler register example.
Modules:
- evm: This is the main module that executes EVM calls.
- builder: This module builds the Evm and sets the database, handlers and other parameters. Here is where we set handlers for a specific fork or external state for inspection.
- db: This module includes structures and functions for database interaction. It is the glue between the EVM and the database. It transforms or aggregates the EVM changes.
- inspector: This module introduces the Inspector trait and its implementations for observing the EVM execution. This was the main way to inspect EVM execution before the Builder and Handlers were introduced. It is still enabled through the Builder.
- journaled_state: This module manages the state of the EVM and implements a journaling system to handle changes and reverts.
Re-exported Modules:
- revm_precompile: Crate that provides precompiled contracts used in the EVM implementation.
- revm_interpreter: Crate that provides the execution engine for EVM opcodes.
- revm_interpreter::primitives: This module from the revm_interpreter crate provides primitive types and other functionality used in the EVM implementation.
Re-exported Types:
- Database, DatabaseCommit, InMemoryDB: These types from the db module are re-exported for handling the database operations.
- EVM: The EVM struct from the evm module is re-exported, serving as the main interface to the EVM implementation.
- EvmContext: The EvmContext struct from the context module is re-exported, providing data structures to encapsulate EVM execution data.
- JournalEntry, JournaledState: These types from the journaled_state module are re-exported, providing the journaling system for the EVM state.
- inspectors, Inspector: The Inspector trait and its implementations from the inspector module are re-exported for observing the EVM execution.
EVM
Evm is the primary structure that implements the Ethereum Virtual Machine (EVM), a stack-based virtual machine that executes Ethereum smart contracts.
What is inside
It consists of two main parts: the Context and the Handler. Context represents the state that is needed for execution, and Handler contains a list of functions that act as the logic.
Context is additionally split between EvmContext and External context. EvmContext is internal and contains Database, Environment, JournaledState and Precompiles. The External context is fully generic without any trait constraints; its purpose is to allow custom handlers to save state at runtime or to allow hooks to be added (for example, an external context can be an Inspector). More on its usage can be found in EvmBuilder.
Evm implements the Host trait, which defines an interface for the interaction of the EVM Interpreter with its environment (or "host"), encompassing essential operations such as account and storage access, creating logs, and invoking sub calls and selfdestruct.
Data structures of block and transaction can be found inside Environment. More information on journaled state can be found in the JournaledState documentation.
Runtime
Runtime consists of a list of functions from Handler that are called in a predefined order.
They are grouped by functionality into Verification, PreExecution, Execution, PostExecution and Instruction functions.
Verification functions are related to the pre-verification of the set Environment data.
Pre-/Post-execution functions deduct from the caller, then reimburse the caller and reward the beneficiary.
Execution functions handle the initial call and create, as well as sub calls.
Instruction functions are part of the instruction table that is used inside Interpreter to execute opcodes.
The Evm execution runs two loops:
Call loop
The first loop is the call loop, where everything starts: it creates call frames, handles sub calls, returns outputs, and calls the Interpreter loop to execute bytecode instructions.
It is handled by ExecutionHandler.
The first loop implements a stack of Frames.
It is responsible for handling sub calls and their return outputs.
At the start, Evm creates a Frame containing an Interpreter and starts the loop.
The Interpreter returns the InterpreterAction, which can be:
- Return: This interpreter finished its run. Frame is popped from the stack and its return value is pushed to the parent Frame stack.
- SubCall/SubCreate: A new Frame needs to be created and pushed to the stack. A new Frame is created and pushed to the stack and the loop continues.
When the stack is empty, the loop finishes.
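The call loop described above can be sketched with a plain stack of frames. This is a minimal illustration, not revm's implementation: the Op, Action, Frame and call_loop names are hypothetical stand-ins, a frame's "bytecode" is a toy list of operations, and a child's return value is simply added to the parent's accumulator.

```rust
// Hypothetical, simplified stand-ins for revm's Frame and InterpreterAction.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// Add a constant to the frame's accumulator.
    Add(u64),
    /// Request a sub call that runs the given child script.
    Call(Vec<Op>),
}

#[derive(Debug, Clone, PartialEq)]
enum Action {
    /// The frame finished its run and produced a value.
    Return(u64),
    /// The frame needs a new child frame to be created.
    SubCall(Vec<Op>),
}

struct Frame {
    ops: Vec<Op>,
    pc: usize,
    acc: u64,
}

impl Frame {
    fn new(ops: Vec<Op>) -> Self {
        Frame { ops, pc: 0, acc: 0 }
    }

    /// Stand-in for the interpreter loop: runs until the frame either
    /// finishes or needs the call loop to create a child frame.
    fn run(&mut self) -> Action {
        while self.pc < self.ops.len() {
            let op = self.ops[self.pc].clone();
            self.pc += 1;
            match op {
                Op::Add(n) => self.acc += n,
                Op::Call(child) => return Action::SubCall(child),
            }
        }
        Action::Return(self.acc)
    }
}

/// Call loop: drives the stack of frames until it is empty.
fn call_loop(root: Vec<Op>) -> u64 {
    let mut stack = vec![Frame::new(root)];
    loop {
        let action = stack.last_mut().expect("stack is non-empty").run();
        match action {
            // A new frame is created and pushed; the loop continues.
            Action::SubCall(child) => stack.push(Frame::new(child)),
            // The frame is popped and its value is handed to the parent.
            Action::Return(value) => {
                stack.pop();
                match stack.last_mut() {
                    Some(parent) => parent.acc += value,
                    None => return value, // stack is empty: loop finishes
                }
            }
        }
    }
}
```

Keeping the frames in an explicit Vec instead of recursing means the depth of nested calls is bounded by heap memory rather than by the native call stack, which is one plausible reason for structuring execution as a loop.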
Interpreter loop
The second loop is the Interpreter loop, which is called by the call loop; it loops over bytecode opcodes and executes instructions based on the InstructionTable.
It is implemented in the Interpreter crate.
To dive deeper into the Evm logic, check the Handler documentation.
Functionalities
The function of Evm is to start execution, but setting up what it is going to execute is done by EvmBuilder.
The main functions of Evm are:
- preverify: only pre-verifies transaction information.
- transact_preverified: the next step after pre-verification; it executes the transaction.
- transact: both pre-verifies and executes the transaction.
- builder and modify functions: allow building or modifying the Evm; more on this can be found in the EvmBuilder documentation. builder is the main way of creating an Evm, and modify allows you to modify parts of it without dissolving the Evm.
- into_context: used when we want to get the Context from the Evm.
Evm Builder
The builder creates or modifies the EVM and applies different handlers. It allows setting the external context and registering custom handler logic.
The revm Evm consists of Context and Handler.
Context is additionally split between EvmContext (contains the generic Database) and External context (generic without constraints).
Read evm for more information on the internals.
The Builder ties together the dependencies between the generic Database, External context and Spec.
It allows handle registers to be added that implement logic on those generics.
As they are interconnected, setting Database or ExternalContext resets handle registers, so builder stages are introduced to mitigate such misuse.
A simple example of using EvmBuilder:
use crate::evm::Evm;
// Build Evm with default values.
let mut evm = Evm::builder().build();
let output = evm.transact();
Builder Stages
There are two builder stages that are used to mitigate potential misuse of the builder:
- SetGenericStage: Initial stage that allows setting the database and external context.
- HandlerStage: Allows setting the handler registers, but is explicit about setting a new generic type, as doing so will void the handler registers.
Functions in one stage are just renamed functions from the other stage; this is done so that the user is more aware of what the underlying function does.
For example, in SetGenericStage we have the with_db function, while in HandlerStage we have reset_handler_with_db. Both of them set the database, but the latter also resets the handler.
There are multiple functions that are common to both stages, such as build.
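The two stages can be modelled with the typestate pattern, where the stage is a type parameter and each stage exposes only the functions that are safe to call in it. The sketch below is an assumption about the general shape of such a builder and is not revm's EvmBuilder: the Builder type is hypothetical, registers are represented by plain names, and build returns a tuple instead of an Evm.

```rust
use std::marker::PhantomData;

// Marker types for the two stages.
struct SetGenericStage;
struct HandlerStage;

// Hypothetical builder: `DB` is the generic database type, `Stage` the
// current builder stage. Registers are just names in this sketch.
struct Builder<DB, Stage> {
    db: DB,
    registers: Vec<&'static str>,
    stage: PhantomData<Stage>,
}

impl Builder<(), SetGenericStage> {
    fn new() -> Self {
        Builder { db: (), registers: Vec::new(), stage: PhantomData }
    }
}

// Only available before any handler register was appended.
impl<DB> Builder<DB, SetGenericStage> {
    fn with_db<NewDb>(self, db: NewDb) -> Builder<NewDb, SetGenericStage> {
        Builder { db, registers: self.registers, stage: PhantomData }
    }
}

// Only available once registers exist; the name makes the reset explicit.
impl<DB> Builder<DB, HandlerStage> {
    fn reset_handler_with_db<NewDb>(self, db: NewDb) -> Builder<NewDb, SetGenericStage> {
        // Registers were written against the old generics, so they are dropped.
        Builder { db, registers: Vec::new(), stage: PhantomData }
    }
}

// Common to both stages.
impl<DB, Stage> Builder<DB, Stage> {
    fn append_handler_register(mut self, name: &'static str) -> Builder<DB, HandlerStage> {
        self.registers.push(name);
        Builder { db: self.db, registers: self.registers, stage: PhantomData }
    }

    fn build(self) -> (DB, Vec<&'static str>) {
        (self.db, self.registers)
    }
}
```

With this shape, calling with_db after append_handler_register is a compile error because the method does not exist on the HandlerStage type, so the only way to swap the database at that point is the explicitly named reset function.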
Builder naming conventions
In both stages we have:
- build: creates the Evm.
- spec_id: creates a new mainnet handler and reapplies all the handler registers.
- modify_* functions: used to modify the database, external context or Env.
- clear_* functions: allow setting default values for Environment.
- append_handler_register_* functions: used to push handler registers. This will transition the builder to the HandlerStage.
In SetGenericStage we have:
- with_*: found in SetGenericStage and used to set the generics.
In HandlerStage we have:
- reset_handler_with_*: used if we want to change some of the generic types; this will reset the handler registers. This will transition the builder to the SetGenericStage.
Creating and modification of Evm
Evm implements functions that allow using the EvmBuilder without even knowing that it exists.
The most obvious one is Evm::builder(), which creates a new builder with default values.
Additionally, a very important function is evm.modify(), which allows modifying the Evm.
It returns a builder, allowing users to modify the Evm.
Examples
The following example uses the builder to create an Evm with an inspector:
use crate::{
    db::EmptyDB, Context, EvmContext, inspector::inspector_handle_register, inspectors::NoOpInspector, Evm,
};

// Create the evm. It is bound as `mut` because `transact` is called on it below.
let mut evm = Evm::builder()
    .with_db(EmptyDB::default())
    .with_external_context(NoOpInspector)
    // Register will modify Handler and call NoOpInspector.
    .append_handler_register(inspector_handle_register)
    // .with_db(..) does not compile as we already locked the builder generics,
    // alternative fn is reset_handler_with_db(..)
    .build();

// Execute the evm.
let output = evm.transact();

// Extract evm context.
let Context {
    external,
    evm: EvmContext { db, .. },
} = evm.into_context();
The next example changes the spec id and environment of an already built evm.
use crate::{Evm,SpecId::BERLIN};
// Create default evm.
let evm = Evm::builder().build();
// Modify evm spec.
let evm = evm.modify().with_spec_id(BERLIN).build();
// Shortcut for above.
let mut evm = evm.modify_spec_id(BERLIN);
// Execute the evm.
let output1 = evm.transact();
// Example of modifying the tx env.
let mut evm = evm.modify().modify_tx_env(|env| env.gas_price = 0.into()).build();
// Execute the evm with modified tx env.
let output2 = evm.transact();
Example of adding custom precompiles to Evm.
use super::SpecId;
use crate::{
    db::EmptyDB,
    inspector::inspector_handle_register,
    inspectors::NoOpInspector,
    primitives::{Address, Bytes, ContextStatefulPrecompile, ContextPrecompile, PrecompileResult},
    Context, Evm, EvmContext,
};
use std::sync::Arc;

struct CustomPrecompile;

impl ContextStatefulPrecompile<EvmContext<EmptyDB>, ()> for CustomPrecompile {
    fn call(
        &self,
        _input: &Bytes,
        _gas_limit: u64,
        _context: &mut EvmContext<EmptyDB>,
        _extcontext: &mut (),
    ) -> PrecompileResult {
        Ok((10, Bytes::new()))
    }
}

fn main() {
    let mut evm = Evm::builder()
        .with_empty_db()
        .with_spec_id(SpecId::HOMESTEAD)
        .append_handler_register(|handler| {
            let precompiles = handler.pre_execution.load_precompiles();
            handler.pre_execution.load_precompiles = Arc::new(move || {
                let mut precompiles = precompiles.clone();
                precompiles.extend([(
                    Address::ZERO,
                    ContextPrecompile::ContextStateful(Arc::new(CustomPrecompile)),
                )]);
                precompiles
            });
        })
        .build();

    evm.transact().unwrap();
}
Appending handler registers
Handler registers are simple functions that allow modifying the Handler logic by replacing the handler functions.
They are used to add custom logic to the EVM execution, but because they are free to modify the Handler in any way they want, there may be conflicts if registers that override the same function are added.
The most common use case for adding new logic to Handler is the Inspector, which is used to inspect the execution of the EVM.
An example of this can be found in the Inspector documentation.
Handler
This is the logic part of the Evm. It contains the Specification ID, a list of functions that implement the logic, and a list of registers that can change the behavior of the Handler when it is built.
Functions can be grouped in five categories and are marked in that way in the code:
- Validation functions: ValidationHandler
- Pre-execution functions: PreExecutionHandler
- Execution functions: ExecutionHandler
- Post-execution functions: PostExecutionHandler
- Instruction table: InstructionTable
Handle Registers
A handle register is a simple function that is used to modify handler functions.
The useful thing about handle registers is that they can be written over a generic external type.
For example, this allows a register to be defined over a trait, which adds hooks to any type that implements the trait.
That trait can be the GetInspector trait, so any implementation is able to register inspector-related functions.
GetInspector is implemented on every Inspector, and it is used inside the EvmBuilder to change the behavior of the default mainnet Handler.
Handle registers are set in EvmBuilder.
The order of the registers is important, as they are called in the order they are registered.
It matters whether a register overrides the previous handle or just wraps it; overriding a handle can disrupt the logic of previously registered handles.
Registers are very powerful, as they allow modification of any part of the Evm, and in combination with the External context they become even more powerful.
A simple example is to register new pre-compiles for the Evm.
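The difference between a register that wraps the previous handle and one that overrides it can be shown with a toy handler that has a single replaceable function. Everything here is a hypothetical simplification (the Handler struct, the initial_gas hook, the toy gas formula and both registers); revm's Handler has many more functions and different signatures.

```rust
use std::sync::Arc;

// A replaceable handler function, stored behind an Arc like a shared hook.
type GasHook = Arc<dyn Fn(u64) -> u64>;

// Hypothetical handler with one function; the formula is a toy default.
struct Handler {
    initial_gas: GasHook,
}

impl Handler {
    fn mainnet() -> Self {
        Handler { initial_gas: Arc::new(|data_len: u64| 21_000 + 16 * data_len) }
    }
}

// A register is a function that mutates the handler.
type Register = Box<dyn Fn(&mut Handler)>;

/// Wraps the previous handle: keeps its logic and adds to the result.
fn wrapping_register() -> Register {
    Box::new(|handler: &mut Handler| {
        let previous = handler.initial_gas.clone();
        handler.initial_gas = Arc::new(move |len: u64| previous(len) + 100);
    })
}

/// Overrides the previous handle: whatever was registered before is lost.
fn overriding_register() -> Register {
    Box::new(|handler: &mut Handler| {
        handler.initial_gas = Arc::new(|_len: u64| 1);
    })
}

/// Applies the registers in registration order on top of the default handler.
fn build(registers: Vec<Register>) -> Handler {
    let mut handler = Handler::mainnet();
    for register in &registers {
        register(&mut handler);
    }
    handler
}
```

Registering the wrapper first and the override second discards the wrapper's extra logic, while the opposite order keeps both, which is why registration order matters.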
ValidationHandler
Consists of functions that are used to validate transaction and block data.
They are called before the execution of the transaction to check whether the Environment data is valid.
They are called in the following order:
- validate_env: Verifies that all data is set in Environment and that it is valid, for example that gas_limit is smaller than the block gas_limit.
- validate_initial_tx_gas: Calculates the initial gas needed for the transaction to be executed and checks that it is less than the transaction gas_limit. Note that this does not touch the Database or state.
- validate_tx_against_state: Loads the caller account and checks its information, including the nonce and whether there is enough balance to pay for the maximum gas spent and the balance transferred.
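The three validation steps can be sketched as three small functions called in order. This is a simplified illustration under stated assumptions: the Tx, Caller and InvalidTx types are hypothetical, only a plain call is modelled (no contract creation, access list or authorization list), and the initial gas uses the well-known post-Istanbul calldata pricing of 21000 base gas plus 4 gas per zero byte and 16 gas per non-zero byte.

```rust
// Hypothetical, simplified types; revm's Env and error types are richer.
struct Tx {
    gas_limit: u64,
    gas_price: u128,
    value: u128,
    nonce: u64,
    data: Vec<u8>,
}

struct Caller {
    balance: u128,
    nonce: u64,
}

#[derive(Debug, PartialEq)]
enum InvalidTx {
    GasLimitAboveBlock,
    InitialGasAboveLimit,
    NonceMismatch,
    InsufficientFunds,
}

/// Step 1: checks that only need the environment.
fn validate_env(tx: &Tx, block_gas_limit: u64) -> Result<(), InvalidTx> {
    if tx.gas_limit > block_gas_limit {
        return Err(InvalidTx::GasLimitAboveBlock);
    }
    Ok(())
}

/// Initial gas of a plain call: base cost plus calldata cost.
fn initial_tx_gas(data: &[u8]) -> u64 {
    let zero = data.iter().filter(|byte| **byte == 0).count() as u64;
    let non_zero = data.len() as u64 - zero;
    21_000 + 4 * zero + 16 * non_zero
}

/// Step 2: does not touch the database or state.
fn validate_initial_tx_gas(tx: &Tx) -> Result<u64, InvalidTx> {
    let gas = initial_tx_gas(&tx.data);
    if gas > tx.gas_limit {
        return Err(InvalidTx::InitialGasAboveLimit);
    }
    Ok(gas)
}

/// Step 3: needs the caller account loaded from state.
fn validate_tx_against_state(tx: &Tx, caller: &Caller) -> Result<(), InvalidTx> {
    if tx.nonce != caller.nonce {
        return Err(InvalidTx::NonceMismatch);
    }
    // Maximum gas spent plus the transferred value, guarding against overflow.
    let max_cost = u128::from(tx.gas_limit)
        .checked_mul(tx.gas_price)
        .and_then(|gas_cost| gas_cost.checked_add(tx.value));
    match max_cost {
        Some(cost) if cost <= caller.balance => Ok(()),
        _ => Err(InvalidTx::InsufficientFunds),
    }
}

/// Runs the three steps in order and returns the initial gas on success.
fn validate(tx: &Tx, block_gas_limit: u64, caller: &Caller) -> Result<u64, InvalidTx> {
    validate_env(tx, block_gas_limit)?;
    let initial_gas = validate_initial_tx_gas(tx)?;
    validate_tx_against_state(tx, caller)?;
    Ok(initial_gas)
}
```

Ordering the cheap, state-free checks first means an obviously invalid transaction is rejected before any account has to be loaded.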
PreExecutionHandler
Consists of functions that are called before execution. They are called in the following order:
- load: Loads the access list and beneficiary from Database. Cold load is done here.
- load_precompiles: Retrieves the precompiles for the given spec ID. More info: precompile.
- apply_eip7702_auth_list: Applies the EIP-7702 authorization list to the accounts. Returns the gas refund for already created accounts.
- deduct_caller: Deducts from the caller the maximum amount of gas that can be spent on the transaction. This loads the caller account from the Database.
ExecutionHandler
Consists of functions that handle the execution of the transaction and the stack of call frames.
- call: Called on every frame. It creates a new call frame or returns the frame result (the frame result is only returned when calling a precompile). If FrameReturn is returned, then the next function that is called is insert_call_outcome.
- call_return: Called after a call frame returns from execution. It is used to calculate the gas that is returned from the frame and to create the FrameResult that is used to apply the outcome to the parent frame in insert_call_outcome.
- insert_call_outcome: Inserts the call outcome into the parent frame. It is called on every frame that is created except the first one. For the first frame we use last_frame_return.
- create: Creates a new create call frame, creates the new account and executes the bytecode that outputs the code of the new account.
- create_return: This handler is called after every frame is executed (except the first). It calculates the gas that is returned from the frame and applies the output to the parent frame.
- insert_create_outcome: Inserts the outcome of a call into the virtual machine's state.
- last_frame_return: This handler is called after the last frame is returned. It is used to calculate the gas that is returned from the first frame and to incorporate the transaction gas limit (the first frame has the limit gas_limit - initial_gas).
InstructionTable
This is a list of 256 function pointers that are used to execute instructions.
There are two kinds: the first is a simple function, which is faster, and the second is a BoxedInstruction, which has a small performance penalty but allows data to be captured.
Look at the Interpreter documentation for more information.
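The trade-off between plain and boxed instructions can be sketched with a 256-entry table indexed by opcode. This is an illustrative model only: Machine, the helper functions and the 0xb0 "push one" opcode are hypothetical (0x00 STOP and 0x01 ADD are real opcode values), and revm's own instruction types take different arguments.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical minimal interpreter state.
struct Machine {
    stack: Vec<u64>,
    halted: bool,
}

/// Plain instruction: a function pointer, which cannot capture data.
type Instruction = fn(&mut Machine);
/// Boxed instruction: a closure behind dynamic dispatch, which can.
type BoxedInstruction = Box<dyn Fn(&mut Machine)>;

fn unknown(m: &mut Machine) {
    m.halted = true;
}

fn stop(m: &mut Machine) {
    m.halted = true;
}

fn add(m: &mut Machine) {
    let a = m.stack.pop().unwrap_or(0);
    let b = m.stack.pop().unwrap_or(0);
    m.stack.push(a.wrapping_add(b));
}

fn push_one(m: &mut Machine) {
    m.stack.push(1);
}

/// Table of 256 function pointers, one slot per opcode byte.
fn plain_table() -> [Instruction; 256] {
    let mut table = [unknown as Instruction; 256];
    table[0x00] = stop;
    table[0x01] = add;
    table[0xb0] = push_one; // hypothetical opcode, for illustration only
    table
}

/// Runs `code` with a boxed table in which ADD is wrapped by a closure that
/// captures a counter. Returns the top of the stack and the number of ADDs.
fn run_counting_adds(code: &[u8]) -> (Option<u64>, u32) {
    // Lift every plain instruction into a boxed one.
    let mut table: Vec<BoxedInstruction> = plain_table()
        .iter()
        .map(|&f| Box::new(move |m: &mut Machine| f(m)) as BoxedInstruction)
        .collect();

    let counter = Rc::new(Cell::new(0u32));
    let captured = Rc::clone(&counter);
    table[0x01] = Box::new(move |m: &mut Machine| {
        captured.set(captured.get() + 1);
        add(m);
    }) as BoxedInstruction;

    let mut machine = Machine { stack: Vec::new(), halted: false };
    for &opcode in code {
        if machine.halted {
            break;
        }
        (table[opcode as usize])(&mut machine);
    }
    (machine.stack.last().copied(), counter.get())
}
```

The plain table dispatches through a bare pointer, while the boxed table pays for an extra indirection on every instruction; in exchange, a closure such as the counting ADD above can carry its own state.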
PostExecutionHandler
Is a list of functions that are called after the execution. They are called in the following order:
- refund: Adds the EIP-7702 refund for already created accounts and calculates the final gas refund, which can be a maximum of 1/5 (1/2 before the London hardfork) of the spent gas.
- reimburse_caller: Reimburses the caller with the gas that was not spent during the execution of the transaction, or with the balance of gas that needs to be refunded.
- reward_beneficiary: Rewards the beneficiary with the fee that was paid for the transaction.
- output: Returns the state changes and the result of the execution.
- end: Called after the transaction. The end handler will not be called if validation fails.
- clear: Clears journal state and error; it is always called for cleanup.
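The refund cap mentioned for the refund step is a small piece of arithmetic. The helper below is a hypothetical sketch of that rule only (the function name and the boolean flag are assumptions): the accumulated refund is limited to the spent gas divided by 5 from London onwards, and by 2 before it.

```rust
/// Caps the accumulated gas refund at a fraction of the spent gas:
/// 1/5 from the London hardfork onwards, 1/2 before it.
fn capped_refund(spent: u64, refunded: u64, is_london: bool) -> u64 {
    let max_refund_quotient = if is_london { 5 } else { 2 };
    refunded.min(spent / max_refund_quotient)
}
```

For example, a transaction that spent 100000 gas and accumulated 40000 gas of refunds gets 20000 back under the London rule, but the full 40000 under the older rule, since that stays below the 50000 cap.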
Inspectors
This module contains various inspectors that can be used to execute and monitor transactions on the Ethereum Virtual Machine (EVM) through the revm library.
Overview
There are several built-in inspectors in this module:
- NoOpInspector: A basic inspector that does nothing, which can be used when you don't need to monitor transactions.
- GasInspector: Monitors the gas usage of transactions.
- CustomPrintTracer: Traces and prints custom messages during EVM execution. Available only when the std feature is enabled.
- TracerEip3155: This is an inspector that conforms to the EIP-3155 standard for tracing Ethereum transactions. It's used to generate detailed trace data of transaction execution, which can be useful for debugging, analysis, or for building tools that need to understand the inner workings of Ethereum transactions. This is only available when both the std and serde-json features are enabled.
Inspector trait
The Inspector trait defines methods that are called during various stages of EVM execution.
You can implement this trait to create your own custom inspectors.
Each of these methods is called at a different stage of the execution of a transaction. They can be used to monitor, debug, or modify the execution of the EVM.
For example, the step method is called on each step of the interpreter, and the log method is called when a log is emitted.
You can implement this trait for a custom database type DB that implements the Database trait.
Usage
To use an inspector, you need to implement the Inspector trait.
For each method, you can decide what you want to do at each point in the EVM execution.
For example, to capture all SELFDESTRUCT operations, implement the selfdestruct method.
All methods in the Inspector trait are optional to implement; if you do not need specific functionality, you can use the provided default implementations.
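The "implement only what you need" idea comes from default method bodies on the trait. The sketch below uses a deliberately simplified, hypothetical Inspector trait (addresses are u64, values are u128, and there is no interpreter or context argument), so it shows the pattern rather than revm's real method signatures. The opcode values 0xa0 (LOG0) and 0xff (SELFDESTRUCT) are the real ones.

```rust
/// Simplified stand-in for an inspector trait: every hook has an empty
/// default body, so implementors override only what they care about.
trait Inspector {
    fn step(&mut self, _opcode: u8) {}
    fn log(&mut self, _data: &[u8]) {}
    fn selfdestruct(&mut self, _contract: u64, _target: u64, _value: u128) {}
}

/// Inspector that relies entirely on the defaults.
struct NoOp;
impl Inspector for NoOp {}

/// Inspector that only records SELFDESTRUCT operations.
#[derive(Debug, Default)]
struct SelfdestructRecorder {
    seen: Vec<(u64, u64, u128)>,
}

impl Inspector for SelfdestructRecorder {
    fn selfdestruct(&mut self, contract: u64, target: u64, value: u128) {
        self.seen.push((contract, target, value));
    }
}

const LOG0: u8 = 0xa0;
const SELFDESTRUCT: u8 = 0xff;

/// Toy execution loop that calls the hooks; returns the number of steps run.
fn run_with_inspector<I: Inspector>(inspector: &mut I, contract: u64, code: &[u8]) -> usize {
    let mut steps = 0;
    for &opcode in code {
        inspector.step(opcode);
        steps += 1;
        match opcode {
            LOG0 => inspector.log(&[]),
            SELFDESTRUCT => {
                // Target and value are fixed to zero in this toy model.
                inspector.selfdestruct(contract, 0, 0);
                break; // execution of the frame ends here
            }
            _ => {}
        }
    }
    steps
}
```

Because the loop is generic over the inspector type, a no-op inspector compiles down to empty calls, so code that does not need inspection does not have to pay for dynamic dispatch.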
State implementations
State inherits the Database trait and implements fetching of external state and storage, as well as various functionality on the output of the EVM execution, most notably caching changes while executing multiple transactions.
Database Abstractions
You can implement the traits Database, DatabaseRef or Database + DatabaseCommit depending on the desired handling of the struct.
- Database: Has mutable self in its functions. It is useful if you want to modify your cache or update some statistics on get calls. This trait enables the preverify_transaction, transact_preverified, transact and inspect functions.
- DatabaseRef: Takes a reference to the object. It is useful if you only have a reference to the state and don't want to update anything on it. It enables the preverify_transaction, transact_preverified_ref, transact_ref and inspect_ref functions.
- Database + DatabaseCommit: Allows directly committing changes of a transaction. It enables the transact_commit and inspect_commit functions.
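How the trait bounds gate the available functions can be illustrated with heavily simplified versions of the three traits. The signatures below are assumptions made for this sketch (an account is reduced to a u128 balance, addresses are u64, and there is no error type), and the free functions transact and transact_commit only mimic the idea of the Evm methods with the same names.

```rust
use std::collections::HashMap;

// Simplified traits: an "account" is just a balance in this sketch.
trait Database {
    /// Takes `&mut self`, so the implementation may update caches or stats.
    fn basic(&mut self, address: u64) -> Option<u128>;
}

trait DatabaseRef {
    /// Takes `&self`, so it works with a shared reference to the state.
    fn basic_ref(&self, address: u64) -> Option<u128>;
}

trait DatabaseCommit {
    /// Applies the state changes produced by a transaction.
    fn commit(&mut self, changes: HashMap<u64, u128>);
}

#[derive(Default)]
struct CountingDb {
    balances: HashMap<u64, u128>,
    reads: u64,
}

impl Database for CountingDb {
    fn basic(&mut self, address: u64) -> Option<u128> {
        self.reads += 1; // statistics updated on every read
        self.balances.get(&address).copied()
    }
}

impl DatabaseRef for CountingDb {
    fn basic_ref(&self, address: u64) -> Option<u128> {
        self.balances.get(&address).copied()
    }
}

impl DatabaseCommit for CountingDb {
    fn commit(&mut self, changes: HashMap<u64, u128>) {
        self.balances.extend(changes);
    }
}

/// Needs only `Database`: computes the changes of a toy value transfer
/// without writing them back.
fn transact<DB: Database>(db: &mut DB, from: u64, to: u64, amount: u128) -> Option<HashMap<u64, u128>> {
    let from_balance = db.basic(from)?;
    let to_balance = db.basic(to).unwrap_or(0);
    let remaining = from_balance.checked_sub(amount)?;
    let mut changes = HashMap::new();
    changes.insert(from, remaining);
    *changes.entry(to).or_insert(to_balance) += amount;
    Some(changes)
}

/// Needs `Database + DatabaseCommit`: executes and commits in one call.
fn transact_commit<DB: Database + DatabaseCommit>(db: &mut DB, from: u64, to: u64, amount: u128) -> bool {
    match transact(db, from, to, amount) {
        Some(changes) => {
            db.commit(changes);
            true
        }
        None => false,
    }
}
```

A type that implements only Database can still be passed to transact, but the compiler rejects it for transact_commit, which mirrors how the extra trait enables the extra functions.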
Journaled State
The journaled_state module of the revm crate provides a state management implementation for Ethereum-style accounts. It includes support for various actions such as self-destruction of accounts, initial account loading, account state modification, and logging. It also contains several important utility functions such as is_precompile.
This module is built around the JournaledState structure, which encapsulates the entire state of the blockchain. JournaledState uses an internal state representation (a HashMap) that tracks all accounts. Each account is represented by the Account structure, which includes fields like balance, nonce, and code hash. For state-changing operations, the module keeps track of all the changes within a "journal" for easy reversion and commitment to the database. This feature is particularly useful for handling reversion of state changes in case of transaction failures or other exceptions. The module interacts with a database through the Database trait, which abstracts the operations for fetching and storing data. This design allows for a pluggable backend where different implementations of the Database trait can be used to persist the state in various ways (for instance, in-memory or disk-based databases).
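The journaling idea can be sketched as a log of "how to undo this" entries that is unwound back to a checkpoint. The types and method names below are a hypothetical miniature (balances and storage only, with missing entries treated as zero), not the actual JournaledState API.

```rust
use std::collections::HashMap;

/// One undo record per state-changing operation.
#[derive(Debug, Clone, PartialEq)]
enum JournalEntry {
    BalanceChanged { address: u64, old: u128 },
    StorageChanged { address: u64, key: u64, old: u64 },
}

#[derive(Debug, Default)]
struct JournaledState {
    balances: HashMap<u64, u128>,
    storage: HashMap<(u64, u64), u64>,
    journal: Vec<JournalEntry>,
}

impl JournaledState {
    /// A checkpoint is simply the current length of the journal.
    fn checkpoint(&self) -> usize {
        self.journal.len()
    }

    fn balance(&self, address: u64) -> u128 {
        self.balances.get(&address).copied().unwrap_or(0)
    }

    fn sload(&self, address: u64, key: u64) -> u64 {
        self.storage.get(&(address, key)).copied().unwrap_or(0)
    }

    fn set_balance(&mut self, address: u64, new: u128) {
        let old = self.balances.insert(address, new).unwrap_or(0);
        self.journal.push(JournalEntry::BalanceChanged { address, old });
    }

    fn sstore(&mut self, address: u64, key: u64, new: u64) {
        let old = self.storage.insert((address, key), new).unwrap_or(0);
        self.journal.push(JournalEntry::StorageChanged { address, key, old });
    }

    /// Undoes every change recorded after `checkpoint`, newest first.
    fn revert_to(&mut self, checkpoint: usize) {
        while self.journal.len() > checkpoint {
            match self.journal.pop() {
                Some(JournalEntry::BalanceChanged { address, old }) => {
                    self.balances.insert(address, old);
                }
                Some(JournalEntry::StorageChanged { address, key, old }) => {
                    self.storage.insert((address, key), old);
                }
                None => break,
            }
        }
    }
}
```

Unwinding newest-first matters when the same slot is written several times after the checkpoint: the last entry popped is the oldest one, so the value that was present at the checkpoint is restored.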
Data Structures
- JournaledState: This structure represents the entire state of the blockchain, including accounts, their associated balances, nonces, and code hashes. It maintains a journal of all state changes that allows for easy reversion and commitment of changes to the database.
- Account: This structure represents an individual account on the blockchain. It includes the account's balance, nonce, and code hash. It also includes a flag indicating if the account is self-destructed, and a map representing the account's storage.
- JournalEntry: This structure represents an entry in the JournaledState's journal. Each entry describes an operation that changes the state, such as an account loading, an account destruction, or a storage change.
Methods
- selfdestruct: This method marks an account as self-destructed and transfers its balance to a target account. If the target account does not exist, it's created. If the self-destructed account and the target are the same, the balance will be lost.
- initial_account_load: This method loads an account's basic information from the database without loading the code. It also loads specified storage slots into memory.
- load_account: This method loads an account's information into memory and returns whether the account was cold or warm accessed.
- load_account_exist: This method checks whether an account exists or not. It returns whether the account was cold or warm accessed and whether it exists.
- load_code: This method loads an account's code into memory from the database.
- sload: This method loads a specified storage value of an account. It returns the value and whether the storage was cold loaded.
- sstore: This method changes the value of a specified storage slot in an account and returns the original value, the present value, the new value, and whether the storage was cold loaded.
- log: This method adds a log entry to the journal.
- is_precompile: This method checks whether an address is a precompiled contract or not.
Relevant EIPs
The JournaledState module's operations are primarily designed to comply with the Ethereum standards defined in several Ethereum Improvement Proposals (EIPs). More specifically:
EIP-161: State Trie Clearing
EIP-161 aims to optimize Ethereum's state management by deleting empty accounts. The specification was proposed by Gavin Wood and was activated in the Spurious Dragon hardfork at block number 2,675,000 on the Ethereum mainnet. The EIP focuses on four main changes:
- Account Creation: During the creation of an account (whether by transactions or the CREATE operation), the nonce of the new account is incremented by one before the execution of the initialization code. For most networks, the starting value is 1, but this may vary for test networks with non-zero default starting nonces.
- CALL and SELFDESTRUCT charges: Prior to EIP-161, a gas charge of 25,000 was levied for CALL and SELFDESTRUCT operations if the destination account did not exist. With EIP-161, this charge is only applied if the operation transfers more than zero value and the destination account is dead (non-existent or empty).
- Existence of Empty Accounts: An account cannot change its state from non-existent to existent-but-empty. If an operation resulted in an empty account, the account remains non-existent.
- Removal of Empty Accounts: At the end of a transaction, any account that was involved in potentially state-changing operations and is now empty will be deleted.
Definitions:
- empty: An account is considered "empty" if it has no code, and its nonce and balance are both zero.
- dead: An account is considered "dead" if it is non-existent or empty.
- touched: An account is considered "touched" when it is involved in any potentially state-changing operation.
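The three definitions above translate almost directly into code. The following sketch assumes a hypothetical Account struct that stores code inline and uses u64 addresses; it shows the empty and dead predicates and the end-of-transaction removal of touched empty accounts.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical minimal account (code stored inline for simplicity).
#[derive(Debug, Clone, Default, PartialEq)]
struct Account {
    nonce: u64,
    balance: u128,
    code: Vec<u8>,
}

/// empty: no code, zero nonce and zero balance.
fn is_empty(account: &Account) -> bool {
    account.code.is_empty() && account.nonce == 0 && account.balance == 0
}

/// dead: non-existent or empty.
fn is_dead(account: Option<&Account>) -> bool {
    account.map_or(true, is_empty)
}

/// At the end of a transaction, deletes accounts that were touched and are
/// now empty. Untouched empty accounts are left alone.
fn remove_empty_touched(state: &mut HashMap<u64, Account>, touched: &HashSet<u64>) {
    state.retain(|address, account| !(touched.contains(address) && is_empty(account)));
}
```

Note that deletion is keyed on being touched, so an empty account that no operation in the transaction interacted with stays in the state.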
These rules have an impact on how state is managed within the EIP-161 context, and this affects how the JournaledState module functions.
For example, operations like initial_account_load and selfdestruct need to take into account whether an account is empty and/or dead.
Rationale
The rationale behind EIP-161 is to optimize the Ethereum state management by getting rid of unnecessary data. Prior to this change, it was possible for the state trie to become bloated with empty accounts. This bloating resulted in increased storage requirements and slower processing times for Ethereum nodes.
By removing these empty accounts, the size of the state trie can be reduced, leading to improvements in the performance of Ethereum nodes.
Additionally, the changes regarding the gas costs for CALL and SELFDESTRUCT operations add a new level of nuance to the Ethereum gas model, further optimizing transaction processing.
EIP-161 has a significant impact on the state management of Ethereum, and thus is highly relevant to the JournaledState module of the revm crate. The operations defined in this module, such as loading accounts, self-destructing accounts, and changing storage, must all conform to the rules defined in EIP-161.
EIP-658: Embedding transaction status code in receipts
This EIP is particularly important because it introduced a way to unambiguously determine whether a transaction was successful or not. Before the introduction of EIP-658, it was impossible to determine with certainty if a transaction was successful simply based on its gas consumption. This was because with the introduction of the REVERT opcode in EIP-140, transactions could fail without consuming all gas.
EIP-658 replaced the intermediate state root field in the receipt with a status code that indicates whether the top-level call of the transaction succeeded or failed. The status code is 1 for success and 0 for failure.
This EIP affects the JournaledState module, as the result of executing transactions and their success or failure status directly influences the state of the blockchain. The execution of state-modifying methods like selfdestruct, sstore, and log can result in success or failure, and the status needs to be properly reflected in the transaction receipt.
Rationale
The main motivation behind EIP-658 was to provide an unambiguous way to determine the success or failure of a transaction.
Before EIP-658, users had to rely on checking if a transaction had consumed all gas to guess if it had failed.
However, this was not reliable because of the introduction of the REVERT opcode in EIP-140.
Moreover, although full nodes can replay transactions to get their return status, fast nodes can only do this for transactions after their pivot point, and light nodes cannot do it at all. This means that without EIP-658, it is impractical for a non-full node to reliably determine the status of a transaction.
EIP-658 addressed this problem by embedding the status code directly in the transaction receipt, making it easily accessible. This change was minimal and non-disruptive, while it significantly improved the clarity and usability of transaction receipts.
EIP-2929: Gas cost increases for state access opcodes
EIP-2929 proposes an increase in the gas costs for several opcodes when they're used for the first time in a transaction. The EIP was created to mitigate potential DDoS (Distributed Denial of Service) attacks by increasing the cost of potential attack vectors, and to make the stateless witness sizes in Ethereum more manageable.
EIP-2929 also introduces two sets, accessed_addresses and accessed_storage_keys, to track the addresses and storage slots that have been accessed within a transaction. This mitigates the additional gas cost for repeated operations on the same address or storage slot within a transaction, as any repeated operation on an already accessed address or storage slot will cost less gas.
In the context of this EIP, "cold" and "warm" (or "hot") refer to whether an address or storage slot has been accessed before during the execution of a transaction. If an address or storage slot is being accessed for the first time in a transaction, it is referred to as a "cold" access. If it has already been accessed within the same transaction, any subsequent access is referred to as "warm" or "hot".
- Parameters: The EIP defines new parameters such as COLD_SLOAD_COST (2100 gas) for a "cold" storage read, COLD_ACCOUNT_ACCESS_COST (2600 gas) for a "cold" account access, and WARM_STORAGE_READ_COST (100 gas) for a "warm" storage read.
- Storage read changes: For the SLOAD operation, if the (address, storage_key) pair is not yet in accessed_storage_keys, COLD_SLOAD_COST gas is charged and the pair is added to accessed_storage_keys. If the pair is already in accessed_storage_keys, WARM_STORAGE_READ_COST gas is charged.
- Account access changes: When an address is the target of certain opcodes (EXTCODESIZE, EXTCODECOPY, EXTCODEHASH, BALANCE, CALL, CALLCODE, DELEGATECALL, STATICCALL), if the target is not in accessed_addresses, COLD_ACCOUNT_ACCESS_COST gas is charged, and the address is added to accessed_addresses. Otherwise, WARM_STORAGE_READ_COST gas is charged.
- SSTORE changes: For the SSTORE operation, if the (address, storage_key) pair is not in accessed_storage_keys, an additional COLD_SLOAD_COST gas is charged, and the pair is added to accessed_storage_keys.
- SELFDESTRUCT changes: If the recipient of SELFDESTRUCT is not in accessed_addresses, an additional COLD_ACCOUNT_ACCESS_COST is charged, and the recipient is added to the set.
This methodology allows Ethereum to maintain an internal record of accessed accounts and storage slots within a transaction, making it possible to charge lower gas fees for repeated operations, thereby reducing the cost for such operations.
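The cold and warm accounting for SLOAD and account access can be sketched with two hash sets. The constants are the values quoted above; the AccessTracker type and its method names are hypothetical, addresses and keys are shortened to u64, and the pre-warming of addresses at the start of a transaction is left out.

```rust
use std::collections::HashSet;

const COLD_SLOAD_COST: u64 = 2100;
const COLD_ACCOUNT_ACCESS_COST: u64 = 2600;
const WARM_STORAGE_READ_COST: u64 = 100;

/// Hypothetical per-transaction tracker for the two EIP-2929 sets.
#[derive(Debug, Default)]
struct AccessTracker {
    accessed_addresses: HashSet<u64>,
    accessed_storage_keys: HashSet<(u64, u64)>,
}

impl AccessTracker {
    /// Gas charged for an SLOAD of `key` at `address`.
    fn sload_cost(&mut self, address: u64, key: u64) -> u64 {
        // `insert` returns true only if the pair was not in the set yet.
        if self.accessed_storage_keys.insert((address, key)) {
            COLD_SLOAD_COST
        } else {
            WARM_STORAGE_READ_COST
        }
    }

    /// Gas charged when `address` is the target of an account-accessing opcode.
    fn account_access_cost(&mut self, address: u64) -> u64 {
        if self.accessed_addresses.insert(address) {
            COLD_ACCOUNT_ACCESS_COST
        } else {
            WARM_STORAGE_READ_COST
        }
    }
}
```

Checking membership and marking the item as accessed happen in the same insert call, so the first access is charged the cold price exactly once and every later access in the same transaction is warm.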
Rationale
-
Security: Previously, these opcodes were underpriced, making them susceptible to DoS attacks where an attacker sends transactions that access or call a large number of accounts. By increasing the gas costs, the EIP intends to mitigate these potential security risks.
-
Improving stateless witness sizes: Stateless Ethereum clients don't maintain the complete state of the blockchain, but instead rely on block "witnesses" (a list of all the accounts, storage, and contract code accessed during transaction execution) to validate transactions. This EIP helps in reducing the size of these witnesses, thereby making stateless Ethereum more viable.
Interpreter
The interpreter
crate is concerned with the execution of the EVM opcodes and serves as the event loop to step through the opcodes.
The interpreter is concerned with attributes like gas, contracts, memory, stack, and returning execution results.
It is structured as follows:
Modules:
- gas: Handles gas mechanics in the EVM, such as calculating gas costs for operations.
- host: Defines the EVM context Host trait.
- interpreter_action: Contains data structures used in the EVM implementation.
- instruction_result: Defines results of instruction execution.
- instructions: Defines the EVM opcodes (i.e. instructions).
External Crates:
- alloc: The alloc crate is used to provide the ability to allocate memory on the heap. It's a part of Rust's standard library that can be used in environments without a full host OS.
- core: The core crate is the dependency-free foundation of the Rust standard library. It includes fundamental types, macros, and traits.
Re-exports:
- Several types and functions are re-exported for easier access by users of this library, such as Gas, Host, InstructionResult, OpCode, Interpreter, Memory, Stack, and others. This allows users to import these items directly from the library root instead of from their individual modules.
- revm_primitives: This crate is re-exported, providing primitive types or functionality used in the EVM implementation.
The gas.rs
Module
The gas.rs
module in this Rust EVM implementation manages the concept of "gas" within the Ethereum network. In Ethereum, "gas" signifies the computational effort needed to execute operations, whether a simple transfer of ether or the execution of a smart contract function. Each operation carries a gas cost, and transactions must specify the maximum amount of gas they are willing to consume.
Data Structures
- Gas Struct: The Gas struct represents the gas state for a particular operation or transaction.
Fields in Gas Struct
- limit: The maximum amount of gas allowed for the operation or transaction.
- all_used_gas: The total gas used, inclusive of memory expansion costs.
- used: The gas used, excluding memory expansion costs.
- memory: The gas used for memory expansion.
- refunded: The gas refunded. Certain operations in Ethereum allow for gas refunds, capped at half the gas used by a transaction before the London hard fork and at one fifth of it from London onward (EIP-3529).
Methods of the Gas
Struct
The Gas
struct also includes several methods to manage the gas state. Here's a brief summary of their functions:
- new: Creates a new Gas instance with a specified gas limit and zero usage and refunds.
- limit, memory, refunded, spend, remaining: These getters return the current state of the corresponding field.
- erase_cost: Decreases the gas usage by a specified amount.
- record_refund: Increases the refunded gas by a specified amount.
- record_cost: Increases the used gas by a specified amount. It also checks for gas limit overflow. If the new total used gas would exceed the gas limit, it returns false and doesn't change the state.
- record_memory: This method works similarly to record_cost, but specifically for memory expansion gas. It only updates the state if the new memory gas usage is greater than the current usage.
- gas_refund: Increases the refunded gas by a specified amount.
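The accounting described above can be sketched as follows. This is a simplified reconstruction from the field and method descriptions in this section, not a copy of the crate's source; the field types (u64 and i64) are assumptions.

```rust
/// Simplified gas state; field names follow the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Gas {
    limit: u64,
    all_used_gas: u64,
    used: u64,
    memory: u64,
    refunded: i64,
}

impl Gas {
    /// New gas state with the given limit and nothing used or refunded.
    pub fn new(limit: u64) -> Self {
        Self { limit, all_used_gas: 0, used: 0, memory: 0, refunded: 0 }
    }

    /// Gas still available.
    pub fn remaining(&self) -> u64 {
        self.limit - self.all_used_gas
    }

    /// Total gas spent so far, including memory expansion.
    pub fn spend(&self) -> u64 {
        self.all_used_gas
    }

    /// Records an execution cost; returns false and leaves the state
    /// untouched if the limit would be exceeded.
    pub fn record_cost(&mut self, cost: u64) -> bool {
        let all_used_gas = self.all_used_gas.saturating_add(cost);
        if self.limit < all_used_gas {
            return false;
        }
        self.used += cost;
        self.all_used_gas = all_used_gas;
        true
    }

    /// Records the total memory gas; only a value larger than the
    /// current one changes the state.
    pub fn record_memory(&mut self, gas_memory: u64) -> bool {
        if gas_memory > self.memory {
            let all_used_gas = self.used.saturating_add(gas_memory);
            if self.limit < all_used_gas {
                return false;
            }
            self.memory = gas_memory;
            self.all_used_gas = all_used_gas;
        }
        true
    }

    /// Adds to the refund counter.
    pub fn record_refund(&mut self, refund: i64) {
        self.refunded += refund;
    }
}
```

Note that record_memory takes the total memory gas rather than a delta, which is why a smaller value is simply ignored.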
EVM Memory
Memory is localized to the current Interpreter context, where a context is a call or create frame. Opcodes use it to store or format data that is more than 32 bytes long, for example to format call input, return output, or hold log data. Revm shares one memory buffer between all the Interpreters, but each Interpreter loop only sees the part that is allocated to it.
Extending memory is paid for with gas. The total cost of a memory of N words is 3 gas per word plus the square of the number of words divided by 512 (3*N + N^2/512), and an expansion is charged the difference between the new and the old total. There is no limit on the size of the memory, but it is limited by the quadratic growth of the gas cost. For 30M gas there is a calculated max memory of 32MB (by Remco in 2022).
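The cost formula can be checked with a few lines of arithmetic. The function names below are chosen for this sketch and are not taken from the crate.

```rust
/// Number of 32-byte words needed to hold `len_bytes` bytes.
pub fn num_words(len_bytes: u64) -> u64 {
    (len_bytes + 31) / 32
}

/// Total gas cost of a memory that is `words` words long: 3*N + N^2/512.
pub fn memory_gas(words: u64) -> u64 {
    words
        .saturating_mul(3)
        .saturating_add(words.saturating_mul(words) / 512)
}

/// Gas charged for growing memory from `old_words` to `new_words`.
/// Shrinking never happens, so a smaller target costs nothing.
pub fn expansion_cost(old_words: u64, new_words: u64) -> u64 {
    memory_gas(new_words).saturating_sub(memory_gas(old_words))
}
```

For example, the first 32 words cost 98 gas in total, while growing from 32 to 1024 words costs a further 5022 gas, which shows the quadratic term taking over.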
Opcodes
Here is a list of all opcodes that read from or write to the memory. Every read from memory can still change the memory size by extending it with zeroes. Call opcodes are special: they read their input from memory before the call, and they also write their output to memory after the call (if the call is okay and there is an output to write).
These opcodes read from the memory:
- RETURN
- REVERT
- LOG
- KECCAK256
- CREATE
- CREATE2
- CALL
- CALLCODE
- DELEGATECALL
- STATICCALL
These opcodes change the memory:
- EXTCODECOPY
- MLOAD
- MSTORE
- MSTORE8
- MCOPY
- CODECOPY
- CALLDATACOPY
- RETURNDATACOPY
- CALL
- CALLCODE
- DELEGATECALL
- STATICCALL
The host.rs
Module
The host.rs
module in this Rust EVM implementation defines a crucial trait Host
. The Host
trait outlines an interface for the interaction of the EVM interpreter with its environment (or "host"), encompassing essential operations such as account and storage access, creating logs, and invoking transactions.
The Evm
struct implements this Host
trait.
Trait Methods
- env: This method provides access to the EVM environment, including information about the current block and transaction.
- load_account: Retrieves information about a given Ethereum account.
- block_hash: Retrieves the block hash for a given block number.
- balance, code, code_hash, sload: These methods retrieve specific information (balance, code, code hash, and specific storage value) for a given Ethereum account.
- sstore: This method sets the value of a specific storage slot in a given Ethereum account.
- log: Creates a log entry with the specified address, topics, and data. Log entries are used by smart contracts to emit events.
- selfdestruct: Marks an Ethereum account to be self-destructed, transferring its funds to a target account.
The Host
trait provides a standard interface that any host environment for the EVM must implement. This abstraction allows the EVM code to interact with the state of the Ethereum network in a generic way, thereby enhancing modularity and interoperability. Different implementations of the Host
trait can be used to simulate different environments for testing or for connecting to different Ethereum-like networks.
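To illustrate how such an abstraction enables alternative environments, here is a heavily reduced sketch. MiniHost and InMemoryHost are hypothetical names, the method signatures are simplified assumptions, and the real Host trait has more methods and richer return types.

```rust
use std::collections::HashMap;

// Stand-in types for this sketch.
pub type Address = [u8; 20];
pub type Word = [u8; 32];

/// Hypothetical, reduced version of a host interface.
pub trait MiniHost {
    fn balance(&mut self, address: Address) -> Option<u128>;
    fn sload(&mut self, address: Address, index: Word) -> Word;
    fn sstore(&mut self, address: Address, index: Word, value: Word);
    fn log(&mut self, address: Address, topics: Vec<Word>, data: Vec<u8>);
}

/// A test host that keeps all state in memory.
#[derive(Default)]
pub struct InMemoryHost {
    pub balances: HashMap<Address, u128>,
    pub storage: HashMap<(Address, Word), Word>,
    pub logs: Vec<(Address, Vec<Word>, Vec<u8>)>,
}

impl MiniHost for InMemoryHost {
    fn balance(&mut self, address: Address) -> Option<u128> {
        self.balances.get(&address).copied()
    }

    fn sload(&mut self, address: Address, index: Word) -> Word {
        // Slots that were never written read as zero.
        self.storage.get(&(address, index)).copied().unwrap_or([0u8; 32])
    }

    fn sstore(&mut self, address: Address, index: Word, value: Word) {
        self.storage.insert((address, index), value);
    }

    fn log(&mut self, address: Address, topics: Vec<Word>, data: Vec<u8>) {
        self.logs.push((address, topics, data));
    }
}

/// Code written against the trait works with any host implementation.
pub fn copy_slot<H: MiniHost>(host: &mut H, address: Address, from: Word, to: Word) {
    let value = host.sload(address, from);
    host.sstore(address, to, value);
}
```

Because copy_slot only depends on the trait, the same function could run against an in-memory host in tests and a database-backed host in production.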
The interpreter_action.rs
Module in the Rust Ethereum Virtual Machine (EVM)
The interpreter_action.rs
module within this Rust EVM implementation encompasses a collection of data structures used as internal models within the EVM. These models represent various aspects of EVM operations such as call and create inputs, call context, value transfers, and the result of self-destruction operations.
Data Structures
- CallInputs Struct: The CallInputs struct is used to encapsulate the inputs to a smart contract call in the EVM. This struct includes the target contract address, the value to be transferred (if any), the input data, the gas limit for the call, the call context, and a boolean indicating if the call is a static call (a read-only operation).
- CallScheme Enum: The CallScheme enum represents the type of call being made to a smart contract. The different types of calls (CALL, CALLCODE, DELEGATECALL, STATICCALL) represent different modes of interaction with a smart contract, each with its own semantics concerning the treatment of the message sender, value transfer, and the context in which the called code executes.
- CallValue Enum: The CallValue enum represents a value transfer between two accounts.
- CallOutcome Struct: Represents the outcome of a call operation in a virtual machine. This struct encapsulates the result of executing an instruction by an interpreter, including the result itself, gas usage information, and the memory offset where output data is stored.
- CreateInputs Struct: The CreateInputs struct encapsulates the inputs for creating a new smart contract. This includes the address of the creator, the creation scheme, the value to be transferred, the initialization code for the new contract, and the gas limit for the creation operation.
- CreateOutcome Struct: Represents the outcome of a create operation in an interpreter. This struct holds the result of the operation along with an optional address. It provides methods to determine the next action based on the result of the operation.
- EOFCreateInput Struct: Inputs for EOF create call.
- EOFCreateOutcome Struct: Represents the outcome of a create operation in an interpreter.
In summary, the interpreter_action.rs
module provides several crucial data structures that facilitate the representation and handling of various EVM operations and their associated data within this Rust EVM implementation.
The instruction_result.rs
Module
The instruction_result.rs
module of this Rust EVM implementation includes the definitions of the InstructionResult
and SuccessOrHalt
enums, which represent the possible outcomes of EVM instruction execution, and functions to work with these types.
- InstructionResult Enum: The InstructionResult enum categorizes the different types of results that can arise from executing an EVM instruction. This enumeration uses the #[repr(u8)] attribute, meaning its variants have an explicit storage representation of an 8-bit unsigned integer. The different instruction results represent outcomes such as successful continuation, stop, return, self-destruction, reversion, deep call, out of funds, out of gas, and various error conditions.
- SuccessOrHalt Enum: The SuccessOrHalt enum represents the outcome of a transaction execution, distinguishing successful operations, reversion, halting conditions, fatal external errors, and internal continuation. It also provides several methods to check the kind of result and to extract the value of the successful evaluation or halt.
- From<InstructionResult> for SuccessOrHalt Implementation: This implementation provides a way to convert an InstructionResult into a SuccessOrHalt. It maps each instruction result to the corresponding SuccessOrHalt variant.
- Macros for returning instruction results: The module provides two macros, return_ok! and return_revert!, which simplify returning some common sets of instruction results.
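The relationship between the two enums can be sketched with a reduced set of variants. The variant lists, the discriminant values, and the SuccessReason and HaltReason helper enums below are illustrative assumptions; the real enums contain many more cases.

```rust
/// Reduced instruction result with an explicit u8 representation.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstructionResult {
    Continue = 0x00,
    Stop,
    Return,
    SelfDestruct,
    Revert = 0x10,
    OutOfGas = 0x50,
    StackUnderflow,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SuccessReason {
    Stop,
    Return,
    SelfDestruct,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HaltReason {
    OutOfGas,
    StackUnderflow,
}

/// Coarser classification of an instruction result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SuccessOrHalt {
    Success(SuccessReason),
    Revert,
    Halt(HaltReason),
    InternalContinue,
}

impl From<InstructionResult> for SuccessOrHalt {
    fn from(result: InstructionResult) -> Self {
        match result {
            InstructionResult::Continue => Self::InternalContinue,
            InstructionResult::Stop => Self::Success(SuccessReason::Stop),
            InstructionResult::Return => Self::Success(SuccessReason::Return),
            InstructionResult::SelfDestruct => Self::Success(SuccessReason::SelfDestruct),
            InstructionResult::Revert => Self::Revert,
            InstructionResult::OutOfGas => Self::Halt(HaltReason::OutOfGas),
            InstructionResult::StackUnderflow => Self::Halt(HaltReason::StackUnderflow),
        }
    }
}

impl SuccessOrHalt {
    /// True for any successful outcome.
    pub fn is_success(&self) -> bool {
        matches!(self, Self::Success(_))
    }
}
```

Grouping discriminants into ranges (success values first, then revert, then errors) is one way to make range-based macros such as return_ok! cheap to evaluate.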
The instruction.rs
Module in the Rust Ethereum Virtual Machine (EVM)
The instruction.rs
module defines interpretation mappings for EVM bytecode. It provides the definition of the Instruction
struct, as well as the Opcode
enumeration and the step
function, which runs a specific instruction.
Opcode
Enum
The Opcode
enum represents the opcodes that are available in the Ethereum Virtual Machine. Each variant corresponds to an operation that can be performed, such as addition, multiplication, subtraction, jumps, and memory operations.
Instruction
Struct
The Instruction
struct represents a single instruction in the EVM. It contains the opcode, which is the operation to be performed, and a list of bytes representing the operands for the instruction.
step
Function
The step
function interprets an instruction. It uses the opcode to determine what operation to perform and then performs the operation using the operands in the instruction.
Primitives
This crate is a core component of the revm system.
It is designed to provide definitions for a range of types and structures commonly used throughout the application.
It is set up to be compatible with environments that do not include Rust's standard library, as indicated by the no_std
attribute.
Modules:
- bytecode: This module provides functionality related to EVM bytecode.
- constants: This module contains constant values used throughout the EVM implementation.
- db: This module contains data structures and functions related to the EVM's database implementation.
- env: This module contains types and functions related to the EVM's environment, including block headers, and environment values.
- precompile: This module contains types related to Ethereum's precompiled contracts.
- result: This module provides types for representing execution results and errors in the EVM.
- specification: This module defines types related to Ethereum specifications (also known as hard forks).
- state: This module provides types and functions for managing Ethereum state, including accounts and storage.
- utilities: This module provides utility functions used in multiple places across the EVM implementation.
- kzg: This module provides types and functions related to KZG commitments. It is used by the Point Evaluation Precompile.
External Crates:
- alloc: The alloc crate provides types for heap allocation.
- bitvec: The bitvec crate provides a data structure to handle sequences of bits.
- bytes: The bytes crate provides utilities for working with bytes.
- hex: The hex crate provides utilities for encoding and decoding hexadecimal.
- hex_literal: The hex_literal crate provides a macro for including hexadecimal data directly in the source code.
- hashbrown: The hashbrown crate provides high-performance hash map and hash set data structures.
- ruint: The ruint crate provides types and functions for big unsigned integer arithmetic.
- c-kzg: A minimal implementation of the Polynomial Commitments API for EIP-4844, written in C (with Rust bindings).
Re-exported Types:
- Address: A type representing a 160-bit (or 20-byte) array, typically used for Ethereum addresses.
- B256: A type representing a 256-bit (or 32-byte) array, typically used for Ethereum hashes or integers.
- Bytes: A type representing a sequence of bytes.
- U256: A 256-bit unsigned integer type from the ruint crate.
- HashMap and HashSet: High-performance hash map and hash set data structures from the hashbrown crate.
Re-exported Modules:
All types, constants, and functions from the bytecode, constants, env, precompile, result, specification, state, and utilities modules, together with the KzgSettings and EnvKzgSettings types and the trusted_setup_points module, are re-exported, allowing users to import these items directly from the primitives crate.
Database
Responsible for database operations. This module is where the blockchain's state persistence is managed.
The module defines three primary traits (Database
, DatabaseCommit
, and DatabaseRef
), a structure RefDBWrapper
, and their associated methods.
The Database
trait defines an interface for mutable interaction with the database. It has a generic associated type Error
to handle different kinds of errors that might occur during these interactions. It provides methods to retrieve basic account information (basic
), retrieve account code by its hash (code_by_hash
), retrieve the storage value of an address at a certain index (storage
), and retrieve the block hash for a certain block number (block_hash
).
The DatabaseCommit
trait defines a single commit
method for committing changes to the database. The changes are a map between Ethereum-like addresses (type Address
) and accounts.
The DatabaseRef
trait is similar to the Database
trait but is designed for read-only or immutable interactions. It has the same Error
associated type and the same set of methods as Database
, but these methods take &self
instead of &mut self
, indicating that they do not mutate the database.
The RefDBWrapper
structure is a wrapper around a reference to a DatabaseRef
type. It implements the Database
trait, essentially providing a way to treat a DatabaseRef
as a Database
by forwarding the Database
methods to the corresponding DatabaseRef
methods.
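The forwarding idea behind the wrapper can be sketched with reduced traits. This is an illustration under assumptions: only two methods are shown, the value types are simplified to u64 and byte arrays, the _ref suffix on the read-only methods and the ConstantDb backing store are choices made for this sketch, and the wrapper is spelled RefDbWrapper here.

```rust
use std::convert::Infallible;

pub type Address = [u8; 20];

/// Reduced mutable database interface.
pub trait Database {
    type Error;
    fn storage(&mut self, address: Address, index: u64) -> Result<u64, Self::Error>;
    fn block_hash(&mut self, number: u64) -> Result<[u8; 32], Self::Error>;
}

/// Reduced read-only counterpart; methods take `&self`.
pub trait DatabaseRef {
    type Error;
    fn storage_ref(&self, address: Address, index: u64) -> Result<u64, Self::Error>;
    fn block_hash_ref(&self, number: u64) -> Result<[u8; 32], Self::Error>;
}

/// Wraps a reference to a read-only database so it can be used
/// wherever a `Database` is expected.
pub struct RefDbWrapper<'a, E> {
    pub db: &'a dyn DatabaseRef<Error = E>,
}

impl<'a, E> Database for RefDbWrapper<'a, E> {
    type Error = E;

    fn storage(&mut self, address: Address, index: u64) -> Result<u64, E> {
        // Forward to the immutable method; nothing is mutated.
        self.db.storage_ref(address, index)
    }

    fn block_hash(&mut self, number: u64) -> Result<[u8; 32], E> {
        self.db.block_hash_ref(number)
    }
}

/// Hypothetical backing store that returns the same value for every slot.
pub struct ConstantDb {
    pub slot_value: u64,
}

impl DatabaseRef for ConstantDb {
    type Error = Infallible;

    fn storage_ref(&self, _address: Address, _index: u64) -> Result<u64, Infallible> {
        Ok(self.slot_value)
    }

    fn block_hash_ref(&self, _number: u64) -> Result<[u8; 32], Infallible> {
        Ok([0u8; 32])
    }
}
```

The wrapper holds only a shared reference, so several wrappers can read from the same backing store at once.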
Result
At the core of this module is the ExecutionResult
enum, which describes the possible outcomes of an EVM execution: Success
, Revert
, and Halt
. Success
represents a successful transaction execution, and it holds important information such as the reason for success
(an Eval enum), the gas used, the gas refunded, a vector of logs (Vec<Log>
), and the output of the execution. This aligns with the stipulation in EIP-658 that introduces a status code in the receipt of a transaction, indicating whether the top-level call was successful or failed.
Revert
represents a transaction that was reverted by the REVERT
opcode without spending all of its gas. It stores the gas used and the output. Halt
represents a transaction that was reverted for various reasons and consumed all its gas. It stores the reason for halting (a Halt
enum) and the gas used.
The ExecutionResult
enum provides several methods to extract important data from an execution result, such as is_success()
, logs()
, output()
, into_output()
, into_logs()
, and gas_used()
. These methods facilitate accessing key details of a transaction execution.
The EVMError
and InvalidTransaction
enums handle different kinds of errors that can occur in an EVM, including database errors, errors specific to the transaction itself, and errors that occur due to issues with gas, among others.
The Output
enum handles different kinds of outputs of an EVM execution, including Call
and Create
. This is where the output data from a successful execution or a reverted transaction is stored.
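A reduced model of these types shows how the accessor methods behave across the three outcomes. The field sets are simplified assumptions: logs are plain byte vectors and the halt reason is a string, whereas the real types are richer.

```rust
/// Output of an execution: returned data for a call, or the deployed
/// code plus an optional address for a create.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Output {
    Call(Vec<u8>),
    Create(Vec<u8>, Option<[u8; 20]>),
}

impl Output {
    /// The raw output bytes, regardless of the kind of output.
    pub fn data(&self) -> &[u8] {
        match self {
            Output::Call(data) => data,
            Output::Create(data, _) => data,
        }
    }
}

/// Reduced execution result with the three outcomes described above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecutionResult {
    Success { gas_used: u64, gas_refunded: u64, logs: Vec<Vec<u8>>, output: Output },
    Revert { gas_used: u64, output: Vec<u8> },
    Halt { reason: &'static str, gas_used: u64 },
}

impl ExecutionResult {
    pub fn is_success(&self) -> bool {
        matches!(self, Self::Success { .. })
    }

    /// Gas used is available for every outcome.
    pub fn gas_used(&self) -> u64 {
        match self {
            Self::Success { gas_used, .. }
            | Self::Revert { gas_used, .. }
            | Self::Halt { gas_used, .. } => *gas_used,
        }
    }

    /// Output bytes exist for success and revert, but not for halt.
    pub fn output(&self) -> Option<&[u8]> {
        match self {
            Self::Success { output, .. } => Some(output.data()),
            Self::Revert { output, .. } => Some(output.as_slice()),
            Self::Halt { .. } => None,
        }
    }
}
```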
Environment
A significant module that manages the execution environment of the EVM. The module contains objects and methods associated with processing transactions and blocks within such a blockchain environment. It defines several structures: Env
, BlockEnv
, TxEnv
, CfgEnv
, and CreateScheme
. These structures contain various fields representing the block data, transaction data, environmental configurations, transaction recipient details, and the method of contract creation respectively.
The Env
structure, which encapsulates the environment of the EVM, contains methods for calculating effective gas prices and for validating block and transaction data. It also checks transactions against the current state of the associated account, which is necessary to validate the transaction's nonce and the account balance. Various Ethereum Improvement Proposals (EIPs) are also considered in these validations, such as EIP-1559 for the base fee, EIP-3607 for rejecting transactions from senders with deployed code, and EIP-3298 for disabling gas refunds. The code is structured to include optional features and to allow for changes in the EVM specifications.
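As an example of one of these calculations, the effective gas price under EIP-1559 is the smaller of the fee cap and the base fee plus the priority fee. The free function below is a sketch with assumed parameter names and u128 values; in the crate this logic lives on Env and reads the values from TxEnv and BlockEnv.

```rust
/// Effective gas price of a transaction.
///
/// `gas_price` is the legacy gas price, or the fee cap for an
/// EIP-1559 transaction; `gas_priority_fee` is only set for the latter.
pub fn effective_gas_price(gas_price: u128, gas_priority_fee: Option<u128>, basefee: u128) -> u128 {
    match gas_priority_fee {
        // Legacy transactions pay exactly their gas price.
        None => gas_price,
        // EIP-1559 transactions pay base fee plus tip, capped by the fee cap.
        Some(priority_fee) => core::cmp::min(gas_price, basefee.saturating_add(priority_fee)),
    }
}
```

With a fee cap of 100, a tip of 2, and a base fee of 50, the sender pays 52 per gas; if the tip were 60, the fee cap of 100 would apply instead.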
Specifications
Holds data related to Ethereum's technical specifications, serving as a reference point for Ethereum's rules and procedures obtained from the Ethereum execution specifications. The module is primarily used to enumerate and handle Ethereum's network upgrades or "hard forks" within the Ethereum Virtual Machine (EVM). These hard forks are referred to as SpecId
in the code, representing different phases of Ethereum's development.
The SpecId
enum assigns a unique numerical value and a unique string identifier to each Ethereum hard fork. These upgrades range from the earliest ones such as FRONTIER
and HOMESTEAD
, through to the most recent ones, including LONDON
, MERGE
, SHANGHAI
, and LATEST
.
The code also includes conversion methods such as try_from_u8()
and from()
. The former attempts to create a SpecId
from a given u8 integer, while the latter creates a SpecId
based on a string representing the name of the hard fork.
The enabled()
method in SpecId
is used to check whether one spec is enabled in another, that is, whether the first hard fork was enacted at or before the second.
The Spec
trait is used to abstract the process of checking whether a given spec is enabled. It only has one method, enabled()
, and a constant SPEC_ID
.
The module then defines various Spec
structs, each representing a different hard fork. These structs implement the Spec
trait and each struct's SPEC_ID
corresponds to the correct SpecId
variant.
This module provides the necessary framework to handle and interact with the different Ethereum hard forks within the EVM, making it possible to handle transactions and contracts differently depending on which hard fork rules apply. It also simplifies the process of adapting to future hard forks by creating a new SpecId
and corresponding Spec
struct.
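The ordering-based check can be sketched with a handful of forks. The subset of variants, their numeric values, and the argument order of enabled are assumptions made for this illustration.

```rust
/// A few hard forks in activation order; values are illustrative.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SpecId {
    Frontier = 0,
    Homestead = 1,
    London = 2,
    Merge = 3,
    Shanghai = 4,
}

impl SpecId {
    /// Converts a raw value back into a spec, if it is known.
    pub fn try_from_u8(value: u8) -> Option<Self> {
        match value {
            0 => Some(Self::Frontier),
            1 => Some(Self::Homestead),
            2 => Some(Self::London),
            3 => Some(Self::Merge),
            4 => Some(Self::Shanghai),
            _ => None,
        }
    }

    /// True if `other` is active when running under `our` rules,
    /// meaning `other` was enacted at or before `our`.
    pub const fn enabled(our: SpecId, other: SpecId) -> bool {
        our as u8 >= other as u8
    }
}

/// Compile-time handle for a hard fork.
pub trait Spec {
    const SPEC_ID: SpecId;

    fn enabled(spec_id: SpecId) -> bool {
        SpecId::enabled(Self::SPEC_ID, spec_id)
    }
}

pub struct LondonSpec;

impl Spec for LondonSpec {
    const SPEC_ID: SpecId = SpecId::London;
}
```

Because SPEC_ID is an associated constant, a check such as LondonSpec::enabled(SpecId::Homestead) can be resolved by the compiler when code is generic over a Spec type.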
Bytecode
This module defines structures and methods to manipulate Ethereum bytecode and manage its state. It's built around three main components: JumpTable
, BytecodeState
, and Bytecode
.
The JumpTable
structure stores a map of valid jump
destinations within a given Ethereum bytecode sequence. It is essentially an Arc
(an atomically reference-counted pointer) wrapping a BitVec
(bit vector), which can be accessed and modified using the defined methods, such as as_slice()
, from_slice()
, and is_valid()
.
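How such a table of valid jump destinations can be built is sketched below. To stay dependency-free, the sketch uses a Vec<bool> as a stand-in for the Arc-wrapped BitVec, and the function names are chosen for this example.

```rust
const JUMPDEST: u8 = 0x5b;
const PUSH1: u8 = 0x60;
const PUSH32: u8 = 0x7f;

/// Marks every position that holds a JUMPDEST opcode.
/// Bytes that are immediate data of a PUSH are skipped, so a 0x5b
/// inside push data is not treated as a valid destination.
pub fn analyze_jumpdests(code: &[u8]) -> Vec<bool> {
    let mut table = vec![false; code.len()];
    let mut pc = 0;
    while pc < code.len() {
        let op = code[pc];
        if op == JUMPDEST {
            table[pc] = true;
            pc += 1;
        } else if (PUSH1..=PUSH32).contains(&op) {
            // PUSHn carries n bytes of immediate data after the opcode.
            pc += 1 + (op - PUSH1 + 1) as usize;
        } else {
            pc += 1;
        }
    }
    table
}

/// True if `pc` is inside the code and points at a JUMPDEST.
pub fn is_valid(table: &[bool], pc: usize) -> bool {
    table.get(pc).copied().unwrap_or(false)
}
```

For the bytecode 0x60 0x5b 0x5b 0x00 (PUSH1 0x5b, JUMPDEST, STOP), only position 2 is a valid destination, because the 0x5b at position 1 is push data.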
The BytecodeState
is an enumeration, capturing the three possible states of the bytecode: Raw
, Checked
, and Analysed
. In the Checked
and Analysed
states, additional data is provided, such as the length of the bytecode and, in the Analysed
state, a JumpTable
.
The Bytecode
struct holds the actual bytecode, its hash, and its current state (BytecodeState
). It provides several methods to interact with the bytecode, such as getting the length of the bytecode, checking if it's empty, retrieving its state, and converting the bytecode to a checked state. It also provides methods to create new instances of the Bytecode
struct in different states.
Constants
Holds constant values used throughout the system. This module defines important constants that help limit and manage resources in the Ethereum Virtual Machine (EVM). The constants include STACK_LIMIT
and CALL_STACK_LIMIT
, which restrict the size of the interpreter stack and the EVM call stack, respectively. Both are set to 1024.
The module also defines MAX_CODE_SIZE
, which is set according to EIP-170's specification. EIP-170 imposes a maximum limit on the contract code size to mitigate potential vulnerabilities and inefficiencies in Ethereum. Without this cap, the act of calling a contract can trigger costly operations that scale with the size of the contract's code. These operations include reading the code from disk, preprocessing the code for VM execution, and adding data to the block's proof-of-validity. By implementing MAX_CODE_SIZE
(set to 0x6000
or 24,576 bytes, roughly 24 KiB), the EVM ensures that the cost of these operations remains manageable, even under high gas levels that could be encountered in the future. EIP-170's implementation thus offers crucial protection against potential DoS attacks and maintains efficiency, especially for future light clients verifying proofs of validity or invalidity.
Another constant defined here is MAX_INITCODE_SIZE
, set in accordance with EIP-3860. EIP-3860 extends EIP-170 by introducing a maximum size limit for initialization code (initcode) and enforcing a gas charge for every 32-byte chunk of initcode, to account for the cost of jump destination analysis. Before EIP-3860, initcode analysis during contract creation wasn't metered, nor was there an upper limit for its size, resulting in potential inefficiencies and vulnerabilities. By setting MAX_INITCODE_SIZE
to 2 * MAX_CODE_SIZE
and introducing the said gas charge, EIP-3860 ensures that the cost of initcode analysis scales proportionately with its size. This constant, therefore, facilitates fair charging, simplifies EVM engines by setting explicit limits, and helps to create an extendable cost system for the future.
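The limits and the per-chunk charge can be written out as follows. The constant values follow the text and the two EIPs; the integer types, the INITCODE_WORD_COST name, and the initcode_cost helper are assumptions of this sketch.

```rust
/// Maximum number of items on the interpreter stack.
pub const STACK_LIMIT: usize = 1024;
/// Maximum depth of the EVM call stack.
pub const CALL_STACK_LIMIT: u64 = 1024;
/// EIP-170: maximum size of deployed contract code (24,576 bytes).
pub const MAX_CODE_SIZE: usize = 0x6000;
/// EIP-3860: maximum size of initcode.
pub const MAX_INITCODE_SIZE: usize = 2 * MAX_CODE_SIZE;
/// EIP-3860: gas charged per 32-byte chunk of initcode.
pub const INITCODE_WORD_COST: u64 = 2;

/// Gas charged for initcode of `len` bytes, or `None` if the initcode
/// exceeds the size limit and the creation must be rejected.
pub fn initcode_cost(len: usize) -> Option<u64> {
    if len > MAX_INITCODE_SIZE {
        return None;
    }
    // Round up to whole 32-byte chunks.
    let words = (len as u64 + 31) / 32;
    Some(INITCODE_WORD_COST * words)
}
```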
precompile
This module implements precompiled contracts in the EVM, adding a layer of pre-set functionalities. These are documented in more detail in the next section. The module defines the types and the enum that are used to handle precompiled contracts.
PrecompileResult
: This is a type alias for a Result
type. The Ok
variant of this type contains a tuple (u64
, Vec<u8>
), where the u64
integer likely represents gas used by the precompiled contract, and the Vec<u8>
holds the output data. The Err variant contains a PrecompileError.
StandardPrecompileFn
and CustomPrecompileFn
: These are type aliases for function pointers. Both functions take a byte slice and a u64
(probably the available gas) as arguments and return a PrecompileResult
. The naming suggests that the former refers to built-in precompiled contracts, while the latter may refer to custom, user-defined contracts.
PrecompileError
: This is an enumeration (enum) which describes the different types of errors that could occur while executing a precompiled contract. The listed variants suggest these errors are related to gas consumption, Blake2
hash function, modular exponentiation ("Modexp
"), and Bn128
, which is a specific elliptic curve used in cryptography.
State
Manages the EVM's state, including account balances, contract storage, and more.
This module models an Ethereum account and its state, which includes balance, nonce, code, storage, and status flags. The module also includes methods for interacting with the account's state.
The Account
struct includes fields for info (of type AccountInfo
), storage (a HashMap
mapping a U256
value to a StorageSlot
), and status (of type AccountStatus
). AccountInfo
represents the basic information about an Ethereum account, including its balance (balance
), nonce (nonce
), code (code
), and a hash of its code (code_hash
).
The AccountStatus
is a set of bitflags, representing the state of the account. The flags include Loaded
, Created
, SelfDestructed
, Touched
, and LoadedAsNotExisting
. The different methods provided within the Account
struct allow for manipulating these statuses.
The StorageSlot
struct represents a storage slot in the Ethereum Virtual Machine. It holds an original_value
and a present_value
and includes methods for creating a new slot and checking if the slot's value has been modified.
Two HashMap
type aliases are created: State
and Storage
. State
maps from an Address
address to an Account
and Storage
maps from a U256
key to a StorageSlot
.
The module includes a series of methods implemented for Account
to manipulate and query the account's status. These include methods like mark_selfdestruct
, unmark_selfdestruct
, is_selfdestructed
, mark_touch
, unmark_touch
, is_touched
, mark_created
, is_newly_created
, is_empty
, and new_not_existing
.
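The slot and status handling can be sketched with plain integers. This is a simplified model: slot values are u64 instead of U256, the status flags are hand-written bit constants with assumed values rather than a bitflags type, and only a few of the listed methods are shown.

```rust
use std::collections::HashMap;

/// Storage slot that remembers its value at the start of the transaction.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct StorageSlot {
    pub original_value: u64,
    pub present_value: u64,
}

impl StorageSlot {
    /// New, unchanged slot.
    pub fn new(value: u64) -> Self {
        Self { original_value: value, present_value: value }
    }

    /// True if the slot was modified since it was loaded.
    pub fn is_changed(&self) -> bool {
        self.original_value != self.present_value
    }
}

// Status bits; the concrete values are assumptions of this sketch.
pub const CREATED: u8 = 0b0000_0001;
pub const SELF_DESTRUCTED: u8 = 0b0000_0010;
pub const TOUCHED: u8 = 0b0000_0100;

/// Reduced account: storage plus a status bit set.
#[derive(Debug, Default)]
pub struct Account {
    pub storage: HashMap<u64, StorageSlot>,
    pub status: u8,
}

impl Account {
    pub fn mark_touch(&mut self) {
        self.status |= TOUCHED;
    }

    pub fn unmark_touch(&mut self) {
        self.status &= !TOUCHED;
    }

    pub fn is_touched(&self) -> bool {
        self.status & TOUCHED != 0
    }

    pub fn mark_selfdestruct(&mut self) {
        self.status |= SELF_DESTRUCTED;
    }

    pub fn is_selfdestructed(&self) -> bool {
        self.status & SELF_DESTRUCTED != 0
    }
}
```

Keeping original_value next to present_value is what allows net gas metering for SSTORE to compare the current write against the value at the start of the transaction.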
Utilities
This Rust module provides utility functions and constants for handling Keccak hashing (used in Ethereum) and creating Ethereum addresses via legacy and CREATE2
methods. It also includes serialization and deserialization methods for hexadecimal strings representing byte arrays.
The KECCAK_EMPTY
constant represents the Keccak-256 hash of an empty input.
The keccak256
function takes a byte slice input and returns its Keccak-256 hash as a B256
value.
KZG
With the introduction of EIP-4844, blobs provide more efficient short-term storage. The validity of a blob stored in the consensus layer is verified using the Point Evaluation precompile, which checks that an evaluation of a committed polynomial at a given point is valid; at a much larger scale, this implies that the data is available.
This module houses:
- KzgSettings: Stores the setup and parameters needed for computing and verifying KZG proofs. The KZG primitive provides a default KzgSettings obtained from the trusted setup ceremony. A custom KzgSettings can also be used if need be; this is available in env.cfg.
- trusted_setup_points: This module contains functions and types used for parsing and utilizing the Trusted Setup for the KzgSettings.
Precompile
The precompile
crate contains the implementation of the Ethereum precompiled contracts in the EVM.
Precompiles are a shortcut to execute a function implemented by the EVM itself, rather than an actual contract.
Precompiled contracts are essentially predefined smart contracts on Ethereum, residing at hardcoded addresses and used for computationally heavy operations that are cheaper when implemented this way.
REVM implements the following precompiles: blake2, the bn128 curve operations, identity, secp256k1, modexp, the sha256 and ripemd160 hash functions, and the EIP-4844 point evaluation.
Modules:
- blake2: Implements the BLAKE2 compression function, as specified in EIP-152.
- bn128: Implements precompiled contracts for addition, scalar multiplication, and optimal ate pairing check on the alt_bn128 elliptic curve.
- hash: Implements the SHA256 and RIPEMD160 hash functions.
- identity: Implements the Identity precompile, which returns the input data unchanged.
- point_evaluation: Implements the point evaluation precompile for EIP-4844.
- modexp: Implements the big integer modular exponentiation precompile.
- secp256k1: Implements the ECDSA public key recovery precompile, based on secp256k1 curves.
Types and Constants:
- Address: A type alias for an array of 20 bytes. This is typically used to represent Ethereum addresses.
- B256: A type alias for an array of 32 bytes, typically used to represent 256-bit hashes or integer values in Ethereum.
- PrecompileOutput: Represents the output of a precompiled contract execution, including the gas cost, output data, and any logs generated.
- Log: Represents an Ethereum log, with an address, a list of topics, and associated data.
- Precompiles: A collection of precompiled contracts available in a particular hard fork of Ethereum.
- Precompile: Represents a precompiled contract, which can either be a standard Ethereum precompile, or a custom precompile.
- PrecompileWithAddress: Associates a precompiled contract with its address.
- SpecId: An enumeration representing different hard fork specifications in Ethereum, such as Homestead, Byzantium, Istanbul, Berlin, and Latest.
Functions:
- calc_linear_cost_u32: A utility function to calculate the gas cost for certain precompiles based on their input length.
- u64_to_b160: A utility function for converting a 64-bit unsigned integer into a 20-byte Ethereum address.
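Both helpers are small enough to sketch in full. The bodies below are written from the descriptions above and the signatures are assumptions; identity_run is a hypothetical example that applies the linear cost with the identity precompile's parameters (15 gas base plus 3 gas per 32-byte word) and uses a string as its error type for brevity.

```rust
/// Linear gas cost: a base fee plus a fee per 32-byte word of input,
/// with the input length rounded up to whole words.
pub fn calc_linear_cost_u32(len: usize, base: u64, word: u64) -> u64 {
    (len as u64 + 31) / 32 * word + base
}

/// Places a u64 in the low-order (last) 8 bytes of a 20-byte address,
/// big-endian, so that 4 becomes 0x0000...0004.
pub fn u64_to_b160(x: u64) -> [u8; 20] {
    let mut out = [0u8; 20];
    out[12..].copy_from_slice(&x.to_be_bytes());
    out
}

/// Hypothetical identity precompile: returns the gas used and a copy
/// of the input, or an error if the gas limit is too low.
pub fn identity_run(input: &[u8], gas_limit: u64) -> Result<(u64, Vec<u8>), &'static str> {
    let gas_used = calc_linear_cost_u32(input.len(), 15, 3);
    if gas_used > gas_limit {
        return Err("out of gas");
    }
    Ok((gas_used, input.to_vec()))
}
```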
External Crates:
- alloc: The alloc crate provides types for heap allocation, and is used here for the Vec type.
- core: The core crate provides fundamental Rust types, macros, and traits, and is used here for fmt::Result.
Re-exported Crates and Types:
- revm_primitives: This crate is re-exported, indicating it provides some types used by the precompile crate.
- primitives: Types from the primitives module of revm_primitives are re-exported, including Bytes, HashMap, and all types under precompile. The latter includes the PrecompileError type, which is aliased to Error.
Re-exported Functionality:
- Precompiles provides a static method for each Ethereum hard fork specification (e.g., homestead, byzantium, istanbul, berlin, cancun, and latest), each returning a set of precompiles for that specification.
- Precompiles also provides methods to retrieve the list of precompile addresses (addresses), to check if a given address is a precompile (contains), to get the precompile at a given address (get), to check if there are no precompiles (is_empty), and to get the number of precompiles (len).
blake2 hash
This module represents a Rust implementation of the Blake2b
cryptographic hash function, a vital component of Ethereum's broader EIP-152 proposal. The primary purpose of this module is to integrate the Blake2b
function into Ethereum's precompiled contract mechanism, providing a consistent and efficient way to perform the cryptographic hashing that underpins Ethereum's functionality.
EIP-152 introduced a new precompiled contract that implements the BLAKE2
cryptographic hashing algorithm's compression function. The purpose of this is to enhance the interoperability between Ethereum and Zcash, as well as to introduce more versatile cryptographic hash primitives to the Ethereum Virtual Machine (EVM).
BLAKE2, the successor to the SHA-3 finalist BLAKE, is not just a powerful cryptographic hash function; it also allows for the efficient validation of the Equihash Proof of Work (PoW) used in Zcash. This could make a Bitcoin Relay-style Simplified Payment Verification (SPV) client feasible on Ethereum, as it enables the verification of Zcash block headers without excessive computational cost. BLAKE2b, a common 64-bit BLAKE2 variant, is highly optimized and performs faster than MD5 on modern processors.
The rationale behind incorporating Blake2b into Ethereum's suite of precompiled contracts is multifaceted:
- Performance: The Blake2b hash function offers excellent performance, particularly when processing large inputs.
- Security: Blake2b also provides a high degree of security, making it a suitable choice for cryptographic operations.
- Interoperability: This function is widely used in various parts of the ecosystem, making it a prime candidate for inclusion in Ethereum's precompiled contracts.
- Gas Cost: The gas cost per round (F_ROUND) is specified as 1. This number was decided considering the computational complexity and the necessity to keep the blockchain efficient and prevent spamming.
Core Components
Two primary constants provide the framework for the precompiled contract:
- F_ROUND: u64: This is the cost of each round of computation in gas units. Currently set to 1.
- INPUT_LENGTH: usize: This specifies the required length of the input data, 213 bytes in this case.
Precompile Function - run
The run function is the main entry point for the precompiled contract. It consumes an input byte slice and a gas limit, returning a PrecompileResult. This function handles input validation, gas cost computation, data manipulation, and the compression algorithm.
It checks for correct input length and reads the final block flag. It then calculates the gas cost based on the number of rounds to be executed. If the gas cost exceeds the provided gas limit, it immediately returns an error.
Once the validation and gas cost computation are complete, it parses the input into three components: state vector h, message block vector m, and offset counter t.
Following this, it calls the compress function from the algo module, passing in the parsed input data and the final block flag.
Finally, it constructs and returns the PrecompileResult containing the gas used and the output data.
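The validation and parsing steps can be sketched as below, assuming the input layout given in EIP-152: a 4-byte big-endian round count, then h (64 bytes), m (128 bytes) and t (16 bytes) as little-endian 64-bit words, then a 1-byte final block flag. ParsedInput, parse_input and the error strings are hypothetical names for this sketch, and the call to compress is left out.

```rust
const F_ROUND: u64 = 1;
const INPUT_LENGTH: usize = 213;

/// Decoded precompile input (hypothetical struct for this sketch).
#[derive(Debug, PartialEq)]
struct ParsedInput {
    rounds: u32,
    h: [u64; 8],
    m: [u64; 16],
    t: [u64; 2],
    final_block: bool,
    gas_used: u64,
}

/// Reads one little-endian u64 from exactly 8 bytes.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}

fn parse_input(input: &[u8], gas_limit: u64) -> Result<ParsedInput, &'static str> {
    // 1. The input must be exactly 213 bytes long.
    if input.len() != INPUT_LENGTH {
        return Err("wrong input length");
    }
    // 2. The last byte is the final block flag and must be 0 or 1.
    let final_block = match input[212] {
        0 => false,
        1 => true,
        _ => return Err("wrong final indicator flag"),
    };
    // 3. Gas is charged per round; the round count is big-endian.
    let rounds = u32::from_be_bytes([input[0], input[1], input[2], input[3]]);
    let gas_used = rounds as u64 * F_ROUND;
    if gas_used > gas_limit {
        return Err("out of gas");
    }
    // 4. Decode state vector h, message block m and offset counter t.
    let mut h = [0u64; 8];
    let mut m = [0u64; 16];
    for (i, word) in h.iter_mut().enumerate() {
        *word = read_u64_le(&input[4 + i * 8..12 + i * 8]);
    }
    for (i, word) in m.iter_mut().enumerate() {
        *word = read_u64_le(&input[68 + i * 8..76 + i * 8]);
    }
    let t = [read_u64_le(&input[196..204]), read_u64_le(&input[204..212])];
    Ok(ParsedInput { rounds, h, m, t, final_block, gas_used })
}
```

Checking the gas before decoding the larger fields keeps the failure path cheap, which matches the order described above.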
Algorithm Module - algo
The algo module encapsulates the technical implementation of the Blake2b compression function. It includes several key elements:
- Constants:
  - SIGMA: This 2D array represents the message word selection permutation used in each round of the algorithm.
  - IV: This is the initialization vector of the Blake2b algorithm, made up of eight 64-bit words.
- The g Function: This is the core mixing function within each round of the Blake2b algorithm. It manipulates the state vector and mixes in the message data.
- The compress Function: This is the main function that executes the rounds of the g function, handles the last block flag, and updates the state vector with the output of each round.
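For reference, the pieces named above fit together roughly as follows. This is a standalone sketch written from the BLAKE2b definition in RFC 7693, not a copy of the crate's code; the function signatures are assumptions.

```rust
/// Message word selection permutation, one row per round (reused modulo 10).
const SIGMA: [[usize; 16]; 10] = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
];

/// Initialization vector (eight 64-bit words).
const IV: [u64; 8] = [
    0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1,
    0x510e527fade682d1, 0x9b05688c2b3e6c1f, 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179,
];

/// Mixing function: stirs two message words x and y into four state words.
fn g(v: &mut [u64; 16], a: usize, b: usize, c: usize, d: usize, x: u64, y: u64) {
    v[a] = v[a].wrapping_add(v[b]).wrapping_add(x);
    v[d] = (v[d] ^ v[a]).rotate_right(32);
    v[c] = v[c].wrapping_add(v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(24);
    v[a] = v[a].wrapping_add(v[b]).wrapping_add(y);
    v[d] = (v[d] ^ v[a]).rotate_right(16);
    v[c] = v[c].wrapping_add(v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(63);
}

/// Compression function F: updates h in place after `rounds` rounds.
fn compress(rounds: usize, h: &mut [u64; 8], m: [u64; 16], t: [u64; 2], f: bool) {
    // Working vector: current state followed by the IV.
    let mut v = [0u64; 16];
    v[..8].copy_from_slice(&h[..]);
    v[8..].copy_from_slice(&IV);
    v[12] ^= t[0]; // low word of the offset counter
    v[13] ^= t[1]; // high word of the offset counter
    if f {
        v[14] = !v[14]; // last block: invert all bits
    }
    for i in 0..rounds {
        let s = &SIGMA[i % 10];
        // Column step.
        g(&mut v, 0, 4, 8, 12, m[s[0]], m[s[1]]);
        g(&mut v, 1, 5, 9, 13, m[s[2]], m[s[3]]);
        g(&mut v, 2, 6, 10, 14, m[s[4]], m[s[5]]);
        g(&mut v, 3, 7, 11, 15, m[s[6]], m[s[7]]);
        // Diagonal step.
        g(&mut v, 0, 5, 10, 15, m[s[8]], m[s[9]]);
        g(&mut v, 1, 6, 11, 12, m[s[10]], m[s[11]]);
        g(&mut v, 2, 7, 8, 13, m[s[12]], m[s[13]]);
        g(&mut v, 3, 4, 9, 14, m[s[14]], m[s[15]]);
    }
    // Fold both halves of the working vector back into the state.
    for i in 0..8 {
        h[i] ^= v[i] ^ v[i + 8];
    }
}
```

Running twelve rounds on the single padded block "abc", with h initialised to IV and h[0] XORed with the parameter word 0x01010040, yields the BLAKE2b-512 digest of "abc", whose first bytes are ba 80 a5 3f.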
bn128 curve
EIP-197 proposed the addition of precompiled contracts for a pairing function on a specific pairing-friendly elliptic curve. This complements EIP-196 in enabling zkSNARKs verification within Ethereum smart contracts. zkSNARKs (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge) technology can enhance privacy for Ethereum users due to its Zero-Knowledge property. Moreover, it may offer a scalability solution because of its succinctness and efficient verifiability property.
Prior to this EIP, Ethereum's smart contract executions were fully transparent, limiting their use in cases involving private information, such as location, identity, or transaction history. While the Ethereum Virtual Machine (EVM) could theoretically run zkSNARK verification, doing so was too costly at the time to fit within the block gas limit. EIP-197 defines specific parameters for basic primitives that facilitate zkSNARKs. This allows for more efficient implementation, thereby reducing gas costs.
Notably, setting these parameters doesn't restrict zkSNARKs' use-cases but actually enables the integration of zkSNARK research advancements without requiring further hard forks. Pairing functions, which enable a limited form of multiplicatively homomorphic operations necessary for zkSNARKs, could then be executed within the block gas limit through this precompiled contract.
The code consists of three modules: add, mul, and pair. The add and mul modules implement elliptic curve point addition and scalar multiplication respectively on the bn128 curve, an elliptic curve utilized within Ethereum. Each module defines two versions of the contract, one for the Istanbul and another for the Byzantium Ethereum network upgrades. The two versions differ in gas cost, because Istanbul adopted the reduced prices of EIP-1108.
The pair module conducts the pairing check, which tests whether the product of the pairings of a list of point pairs (one point from each of the two curve groups) equals the identity. This check is an essential part of many zero-knowledge proof systems, including zkSNARKs. Again, two versions for Istanbul and Byzantium are defined. The run_add, run_mul, and run_pair functions embody the main implementations of the precompiled contracts, with each function accepting an input byte array, executing the appropriate elliptic curve operations, and outputting the results as a byte array.
The code declares the gas cost of each operation as constants at the start of each module and checks that cost against the gas limit before running the operation. It employs the bn library to carry out the actual bn128 operations. As the functions operate on byte arrays, the code features significant byte manipulation and conversion.
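As an illustration of how the per-fork gas constants are applied, here is a minimal sketch of the pairing cost check. The numbers are the ones published in EIP-197 (Byzantium) and EIP-1108 (Istanbul); PairGas, pair_gas_cost and the error strings are hypothetical names, and the curve arithmetic itself is omitted.

```rust
/// One pairing input element: a 64-byte G1 point followed by a 128-byte G2 point.
const PAIR_ELEMENT_LEN: usize = 192;

/// Gas parameters of one fork (hypothetical struct for this sketch).
struct PairGas {
    base: u64,
    per_point: u64,
}

const BYZANTIUM_PAIR: PairGas = PairGas { base: 100_000, per_point: 80_000 };
const ISTANBUL_PAIR: PairGas = PairGas { base: 45_000, per_point: 34_000 };

/// Returns the gas charged for a pairing check over `input_len` bytes.
fn pair_gas_cost(input_len: usize, gas: &PairGas, gas_limit: u64) -> Result<u64, &'static str> {
    // Cost grows linearly with the number of (G1, G2) pairs.
    let pairs = (input_len / PAIR_ELEMENT_LEN) as u64;
    let cost = pairs * gas.per_point + gas.base;
    if cost > gas_limit {
        return Err("out of gas");
    }
    // The input must consist of whole 192-byte elements.
    if input_len % PAIR_ELEMENT_LEN != 0 {
        return Err("bn128 pair length error");
    }
    Ok(cost)
}
```

For two pairs (384 bytes) this gives 113,000 gas under the Istanbul constants and 260,000 under the Byzantium ones.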
SHA256 and RIPEMD160
REVM includes precompiled contracts for SHA256 and RIPEMD160, cryptographic hashing functions integral for data integrity and security. The addresses for these precompiled contracts are 0x0000000000000000000000000000000000000002 for SHA256 and 0x0000000000000000000000000000000000000003 for RIPEMD160.
Each function (sha256_run and ripemd160_run) accepts two arguments: the input data to be hashed and the gas_limit, which represents the maximum amount of computational work permissible for the function. They both calculate the gas cost of the operation based on the input data length. If the computed cost surpasses the gas_limit, an Error::OutOfGas is triggered.
The sha256_run function, corresponding to the SHA256 precompiled contract, computes the SHA256 hash of the input data. The ripemd160_run function computes the RIPEMD160 hash of the input and left-pads the 20-byte digest with zeros to 32 bytes, matching Ethereum's 256-bit word size. These precompiled contracts offer a computationally efficient way for Ethereum contracts to perform necessary cryptographic operations.
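The length-based gas calculation and the RIPEMD160 padding can be sketched as below. The constants are the protocol's published prices (60 + 12 per word for SHA256, 600 + 120 per word for RIPEMD160); the function names are hypothetical, and the hashing itself is omitted because it would need an external crate.

```rust
/// Linear cost: a base fee plus a fee per started 32-byte word of input.
fn linear_cost(len: usize, base: u64, per_word: u64) -> u64 {
    (len as u64 + 31) / 32 * per_word + base
}

/// Gas charged by the SHA256 precompile for `len` input bytes.
fn sha256_gas(len: usize) -> u64 {
    linear_cost(len, 60, 12)
}

/// Gas charged by the RIPEMD160 precompile for `len` input bytes.
fn ripemd160_gas(len: usize) -> u64 {
    linear_cost(len, 600, 120)
}

/// Left-pads a 20-byte RIPEMD160 digest with zeros to one 32-byte EVM word.
fn pad_ripemd160(digest: [u8; 20]) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[12..].copy_from_slice(&digest);
    word
}
```

Because the word count is rounded up, a 33-byte input is charged for two words.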
Identity function
This precompiled contract performs the identity function. In mathematics, an identity function is a function that always returns the same value as its argument. In this context, the contract takes the input data and returns it as is. This precompiled contract resides at the hardcoded Ethereum address 0x0000000000000000000000000000000000000004.
The identity_run function takes two arguments: input data, which it returns unaltered, and gas_limit, which defines the maximum computational work the function is allowed to do. It performs a linear gas cost calculation based on the size of the input data and two constants, IDENTITY_BASE (the base cost of the operation) and IDENTITY_PER_WORD (the cost per word). If the calculated gas cost exceeds the gas_limit, an Error::OutOfGas is returned.
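A minimal sketch of this function follows, assuming the protocol's identity prices of 15 gas base and 3 gas per 32-byte word. The return type is simplified to a (gas used, output) tuple and the Error enum is a stand-in for the crate's error type.

```rust
const IDENTITY_BASE: u64 = 15;
const IDENTITY_PER_WORD: u64 = 3;

/// Stand-in for the precompile error type (only the variant used here).
#[derive(Debug, PartialEq)]
enum Error {
    OutOfGas,
}

/// Returns the input unchanged together with the gas it cost.
fn identity_run(input: &[u8], gas_limit: u64) -> Result<(u64, Vec<u8>), Error> {
    // Number of started 32-byte words in the input.
    let words = (input.len() as u64 + 31) / 32;
    let gas_used = IDENTITY_BASE + IDENTITY_PER_WORD * words;
    if gas_used > gas_limit {
        return Err(Error::OutOfGas);
    }
    Ok((gas_used, input.to_vec()))
}
```

A 3-byte input costs 18 gas, so a call with gas_limit 17 fails while one with 18 succeeds.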
This identity function can be useful in various scenarios such as forwarding data or acting as a data validation check within a contract. Despite its simplicity, it contributes to the flexibility and broad utility of the Ethereum platform.
Modular Exponentiation
REVM also implements two versions of the modular exponentiation precompiled contract, each corresponding to a different Ethereum hard fork: Byzantium and Berlin. Both versions reside at address 0x0000000000000000000000000000000000000005, because the Berlin version replaced the Byzantium one at the same address. This operation is used for cryptographic computations and is a crucial part of Ethereum's toolkit.
The byzantium_run and berlin_run functions each run the modular exponential operation using the run_inner function, but each uses a different gas calculation method: byzantium_gas_calc for Byzantium and berlin_gas_calc for Berlin. Which one applies depends on the hard fork that is active on the network. The run_inner function is the core function that reads the inputs and performs the modular exponential operation. If the calculated gas cost is higher than the gas limit, an Error::OutOfGas is returned. If all computations are successful, the function returns the result of the operation and the gas cost.
The calculate_iteration_count function calculates the number of iterations required to compute the operation, based on the length and value of the exponent. The read_u64_with_overflow macro reads input data and checks for potential overflows.
The byzantium_gas_calc function calculates the gas cost for the modular exponential operation as defined in the Byzantium version of the Ethereum protocol (EIP-198). The berlin_gas_calc function calculates the gas cost according to the Berlin version, as defined in EIP-2565. The two versions use different formulas, reflecting the evolution of the Ethereum network.
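The two pricing formulas can be compared with the sketch below, written from the formulas in EIP-198 and EIP-2565 rather than from the crate's source. The function names and the exp_head_bits parameter (the bit length of the first 32 bytes of the exponent, 0 if they are all zero) are assumptions made to keep the example free of big-integer types.

```rust
/// Iteration count shared by both formulas (never less than 1).
fn iteration_count(exp_len: u64, exp_head_bits: u64) -> u64 {
    let head = exp_head_bits.saturating_sub(1);
    let count = if exp_len <= 32 {
        head
    } else {
        // 8 iterations per exponent byte beyond the first 32.
        (exp_len - 32).saturating_mul(8).saturating_add(head)
    };
    count.max(1)
}

/// Converts to u64, saturating instead of wrapping.
fn clamp_u64(value: u128) -> u64 {
    if value > u64::MAX as u128 { u64::MAX } else { value as u64 }
}

/// Byzantium pricing (EIP-198): piecewise quadratic complexity, divisor 20.
fn byzantium_gas(base_len: u64, exp_len: u64, mod_len: u64, exp_head_bits: u64) -> u64 {
    let x = base_len.max(mod_len) as u128;
    let complexity = if x <= 64 {
        x * x
    } else if x <= 1024 {
        x * x / 4 + 96 * x - 3072
    } else {
        x * x / 16 + 480 * x - 199_680
    };
    let iterations = iteration_count(exp_len, exp_head_bits) as u128;
    clamp_u64(complexity.saturating_mul(iterations) / 20)
}

/// Berlin pricing (EIP-2565): squared count of 8-byte words, divisor 3, floor of 200.
fn berlin_gas(base_len: u64, exp_len: u64, mod_len: u64, exp_head_bits: u64) -> u64 {
    let words = (base_len.max(mod_len) as u128 + 7) / 8;
    let complexity = words * words;
    let iterations = iteration_count(exp_len, exp_head_bits) as u128;
    clamp_u64(complexity.saturating_mul(iterations) / 3).max(200)
}
```

For 64-byte operands and the exponent 0x10001 (bit length 17), the Byzantium formula charges 3276 gas while the Berlin formula charges 341.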
Secp256k1
This implements Ethereum's precompiled contract ECRECOVER, an elliptic curve digital signature algorithm (ECDSA) recovery function that recovers the Ethereum address (public key hash) associated with a given signature. The implementation comes in two versions, selected at build time depending on whether the secp256k1 cryptographic library is enabled.
Both versions define a secp256k1 module that includes an ecrecover function. This function takes a digital signature and a message as input, both represented as byte arrays, and returns the recovered Ethereum address. It performs this operation by using the signature to recover the original public key used for signing, then hashing this public key with Keccak256, Ethereum's chosen hash function. The last 20 bytes of the hash form the address.
When secp256k1 is not enabled, the ecrecover function uses the k256 library to parse the signature, recover the public key, and perform the hashing. When secp256k1 is enabled, the function uses the secp256k1 library for these operations. Although both versions perform the same fundamental operation, they use different cryptographic libraries, which can offer different optimizations and security properties.
The ec_recover_run function is the primary entry point for this precompiled contract. It parses the input to extract the message and signature, checks if enough gas is provided for execution, and calls the appropriate ecrecover function. The result of the recovery operation is returned as a PrecompileResult, a type that represents the outcome of a precompiled contract execution in Ethereum.
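The parsing and gas check can be sketched as follows, assuming the standard ECRECOVER call layout of four 32-byte words (message hash, v, r, s) and the fixed price of 3000 gas. RecoverRequest, parse_ecrecover_input and address_word are hypothetical names; the actual key recovery and Keccak256 hashing are left out because they need an external library.

```rust
const ECRECOVER_BASE: u64 = 3_000;

/// Decoded call data, ready to hand to a recovery routine (hypothetical struct).
#[derive(Debug, PartialEq)]
struct RecoverRequest {
    msg: [u8; 32],
    recid: u8,
    sig: [u8; 64],
}

/// Returns Ok(None) when the input is well-priced but not a valid request,
/// in which case the precompile would produce empty output.
fn parse_ecrecover_input(input: &[u8], gas_limit: u64) -> Result<Option<RecoverRequest>, &'static str> {
    if ECRECOVER_BASE > gas_limit {
        return Err("out of gas");
    }
    // Short input is right-padded with zeros to 128 bytes; extra bytes are ignored.
    let mut padded = [0u8; 128];
    let n = input.len().min(128);
    padded[..n].copy_from_slice(&input[..n]);
    // v is a 32-byte big-endian word and must be exactly 27 or 28.
    let v_high_is_zero = padded[32..63].iter().all(|&b| b == 0);
    if !v_high_is_zero || !(padded[63] == 27 || padded[63] == 28) {
        return Ok(None);
    }
    let mut msg = [0u8; 32];
    msg.copy_from_slice(&padded[..32]);
    let mut sig = [0u8; 64];
    sig.copy_from_slice(&padded[64..128]); // r followed by s
    Ok(Some(RecoverRequest { msg, recid: padded[63] - 27, sig }))
}

/// Turns a 32-byte hash of the public key into the 32-byte output word:
/// the first 12 bytes are zeroed, leaving the 20-byte address right-aligned.
fn address_word(pubkey_hash: [u8; 32]) -> [u8; 32] {
    let mut out = pubkey_hash;
    for byte in out[..12].iter_mut() {
        *byte = 0;
    }
    out
}
```

In a full implementation, the RecoverRequest would be passed to whichever ecrecover variant is compiled in, and its hash result would go through a step like address_word before being returned.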
Point Evaluation Precompile
This precompile was introduced in EIP-4844 and is used to verify KZG commitments of blobs. The precompile allows for efficient verification of commitments to blob transactions. Blob transactions carry a large amount of data that cannot be accessed by EVM execution, but whose commitment can be accessed and verified. The EIP is designed to be forward compatible with the Danksharding architecture while giving L2s access to cheaper L1 commitments. This precompiled contract resides at the hardcoded Ethereum address 0x000000000000000000000000000000000000000A.
A useful resource is the Python reference implementation for the precompile, which can be found here. The implementation in REVM uses c-kzg-4844, via its foreign function interface bindings, from the Ethereum Foundation.
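To make the input format concrete, the sketch below splits the call data according to the layout in EIP-4844: versioned hash (32 bytes), evaluation point z (32), claimed value y (32), commitment (48) and proof (48), 192 bytes in total, for a fixed price of 50,000 gas. PointEvalInput and split_input are hypothetical names. The sketch checks only the version byte; comparing the versioned hash with the SHA-256 of the commitment and verifying the KZG proof are omitted because they need external libraries.

```rust
const GAS_COST: u64 = 50_000;
const INPUT_LEN: usize = 192;
const VERSIONED_HASH_VERSION_KZG: u8 = 0x01;

/// Borrowed view of the five input fields (hypothetical struct).
#[derive(Debug, PartialEq)]
struct PointEvalInput<'a> {
    versioned_hash: &'a [u8],
    z: &'a [u8],
    y: &'a [u8],
    commitment: &'a [u8],
    proof: &'a [u8],
}

fn split_input<'a>(input: &'a [u8], gas_limit: u64) -> Result<PointEvalInput<'a>, &'static str> {
    if gas_limit < GAS_COST {
        return Err("out of gas");
    }
    if input.len() != INPUT_LEN {
        return Err("wrong input length");
    }
    // Only the version byte is checked here; a full implementation also
    // compares the remaining 31 bytes with the SHA-256 of the commitment.
    if input[0] != VERSIONED_HASH_VERSION_KZG {
        return Err("wrong versioned hash version");
    }
    Ok(PointEvalInput {
        versioned_hash: &input[..32],
        z: &input[32..64],
        y: &input[64..96],
        commitment: &input[96..144],
        proof: &input[144..192],
    })
}
```

The commitment, z, y and proof slices are what a KZG library's proof verification routine would then receive.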