
Centaur: An EVM-Based Blockchain Vulnerability Analysis Framework


This repository hosts a study of two EVM-based blockchains, namely Ethereum (ETH) and Binance Smart Chain (BSC). It explores the existence of vulnerabilities in smart contracts deployed on these two chains, using automated analysis tools for EVM bytecode. Our codebase artifact is encapsulated in the Centaur framework. The framework also extends SmartBugs for analysing the dataset of smart contract bytecodes with multiple analysis tools, and it is easily extendable to support other EVM chains.

Table of Contents

  1. Prerequisites
  2. Installation
  3. Step-by-Step Analysis Procedure
    1. Database Creation
    2. Data Collection
  4. Running the SmartBugs Framework
  5. Parsing the Analysis Tools Results
  6. Centaur Usage
  7. Analysis Tools
  8. Vulnerability Categorisation
  9. Using the SQLite3 Database
  10. Experiment Setup
  11. License

Prerequisites

Before you begin, ensure you have met the following requirements:

  • You have installed all the required Python and Shell dependencies with:
    pip install -r requirements.txt and
    apt-get install -y cowsay figlet
  • You are using Python >= 3.8 and Golang == 1.17
  • You have created an account on Etherscan and BscScan and generated an API key on both
  • You have installed Docker and docker-compose
  • You are using a UNIX-like OS
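
The list above can be sanity-checked with a short shell loop (a convenience sketch, not part of the repository):

```shell
# Convenience sketch: report which of Centaur's prerequisite tools are on PATH.
found=""
missing=""
for tool in python3 go docker docker-compose figlet cowsay; do
    if command -v "$tool" >/dev/null 2>&1; then
        found="$found $tool"
    else
        missing="$missing $tool"
    fi
done
echo "found:  $found"
echo "missing:$missing"
```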

Installation

Note: We recommend using Centaur via its Docker image (Option 2) as it encapsulates all the required dependencies and allows running the framework without needing to install anything on your system.

Option 1: Once all the above prerequisites are met, you can clone this repository with:

git clone https://github.com/mchara01/centaur.git

Option 2: Use our Docker image

docker pull mchara01/centaur:1.0

Once installed, the Centaur CLI framework will be available for use.

Step-by-Step Analysis Procedure

The following sections constitute the steps required to replicate our process of analysing the EVM bytecode of smart contracts.

Database Creation

  • Prepare the local MariaDB database, which runs in a Docker container:
    • Create the files db_password.txt and db_root_password.txt containing the passwords for a normal user and root respectively.
    • Then, start the container using:
      docker-compose -f build/database/docker-compose.yaml up -d
    • Afterwards, create the two tables in the database where the collected data will be inserted:
      docker exec -it <CONTAINER_ID> mysql -u root -p'<ROOT_PASSWORD>' -P 3306 -h 127.0.0.1 < scripts/database/schema.sql
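
As a sketch, the two password files from the first step might be created like this (the exact location is an assumption; place them wherever the compose file in build/database expects to read them):

```shell
# Hypothetical helper: create the two password files the MariaDB container
# reads. Replace the placeholder passwords with your own.
umask 077   # keep the secrets readable only by the current user
printf '%s\n' 'choose-a-user-password' > db_password.txt
printf '%s\n' 'choose-a-root-password' > db_root_password.txt
```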

Data Collection

  • Perform random sampling on the blocks of the desired EVM chain (ETH, BNB). Block numbers generated are stored in a file for the crawler to read from. Sampling size and output location are passed as arguments:
    python scripts/utils/blockNumberGenerator.py --size 1000 --chain eth --output blockNumbersEth.txt
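
    A minimal sketch of what such a sampler does: draw `size` distinct block numbers below the chain's current height. The heights and function name here are illustrative placeholders, not live values or the script's real internals:

    ```python
    import random

    # Illustrative chain heights -- NOT live values; the real script would
    # query the chain (or take a constant) for the current block height.
    ILLUSTRATIVE_HEIGHT = {"eth": 15_000_000, "bsc": 19_000_000}

    def sample_blocks(chain, size, seed=None):
        """Return `size` distinct, sorted block numbers below the chain height."""
        rng = random.Random(seed)
        return sorted(rng.sample(range(ILLUSTRATIVE_HEIGHT[chain]), size))

    blocks = sample_blocks("eth", size=1000, seed=42)
    # blockNumberGenerator.py would write these, one per line, to --output
    print(len(blocks))
    ```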

  • Run the blockchain crawling script that connects to the Ethereum and BSC archive nodes (their IP and ports are declared as constants in the scripts) and extracts the contract addresses and bytecodes from the transactions of the blocks provided. Client (eth, bsc), input file and whether to use the tracer or not are provided as arguments:
    go mod tidy
    go run go-src/*.go --client eth --input data/block_samples/<latest_date>/blockNumbersEth.txt --tracer
    To check only the connection to the archive node and the local database execute:
    go run go-src/*.go --client eth --check

  • Crawl Etherscan or BscScan to gather any other missing data for given smart contract addresses. An API key must be provided for this script to work:
    python scripts/crawl/mainCrawl.py --chain eth --apikey <ENTER_API_KEY_HERE> --output data/logs/results_eth.json --invalid data/logs/exceptions_eth.json
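
    The kind of request such a crawler issues might look like the following sketch. The getsourcecode action comes from the public Etherscan/BscScan HTTP API; the function name and the zero address are purely illustrative:

    ```python
    from urllib.parse import urlencode

    # Public API hosts for the two explorers.
    API_HOST = {
        "eth": "https://api.etherscan.io/api",
        "bsc": "https://api.bscscan.com/api",
    }

    def contract_query_url(chain, address, api_key):
        """Build an explorer API URL asking for a contract's metadata."""
        params = {
            "module": "contract",
            "action": "getsourcecode",
            "address": address,
            "apikey": api_key,
        }
        return API_HOST[chain] + "?" + urlencode(params)

    url = contract_query_url("eth", "0x" + "0" * 40, "<ENTER_API_KEY_HERE>")
    print(url)
    ```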

  • At this point, the database is populated with all the required data. If you wish to back up the database (mysqldump needs to be installed first), execute:
    bash scripts/database/backup/db_backup.sh
    If you need to restore the backup, use:
    bash scripts/database/backup/db_restore.sh
    Before using these two scripts, make sure you first change the DB_BACKUP_PATH variable to match the locations on your local file system.

  • Extract the bytecodes from the database and write them to files on the file system. A contract's bytecode is selected only if the contract satisfies at least one of the following conditions:

    • a balance > 0 or
    • number of transactions > 0 or
    • number of token transfers > 0

    Execute the script that does this with:
    python scripts/utils/bytecodeToFileCreator.py --chain eth
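
    The selection rule above amounts to a single SQL filter. Here is a self-contained sketch with a hypothetical table layout (the real schema lives in scripts/database/schema.sql):

    ```python
    import sqlite3

    # Hypothetical table/column names -- check scripts/database/schema.sql
    # for the real schema.
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE contract (
                       address TEXT, bytecode TEXT, balance INTEGER,
                       nr_transactions INTEGER, nr_token_transfers INTEGER)""")
    con.executemany("INSERT INTO contract VALUES (?,?,?,?,?)", [
        ("0xaa", "6001600101", 5, 0, 0),   # kept: balance > 0
        ("0xbb", "6002600202", 0, 3, 0),   # kept: transactions > 0
        ("0xcc", "6003600303", 0, 0, 0),   # filtered out: fails all conditions
    ])
    rows = con.execute("""SELECT address, bytecode FROM contract
                          WHERE balance > 0
                             OR nr_transactions > 0
                             OR nr_token_transfers > 0""").fetchall()
    for address, bytecode in rows:
        print(address)   # bytecodeToFileCreator.py would write a file per contract here
    ```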

Running the SmartBugs Framework

After completing the above steps successfully, everything is ready to run the SmartBugs framework and execute the EVM bytecode analysis tools on the bytecodes we have written to the local file system. We can do this using:

python smartbugs_bytecode/smartBugs.py --tool all --dataset eth_bc --bytecode

Please check the official repository of SmartBugs for more details on how to run the framework.

Note: Bear in mind that SmartBugs will execute 9 tools on every single contract from the corpus of contracts you provide to it. Thus, this particular step may take a significant amount of time to complete (in our case it took approximately three days for 334 contracts). We recommend using a tool such as tmux, which keeps a session alive for long periods of time even when you log out of the machine running the framework.

Parsing the Analysis Tools Results

Once SmartBugs has finished, a result.json is created for every contract in the smartbugs_bytecode/results/<TOOL_NAME> directory. To parse these results, we use the parser.py file in the scripts directory, which is the main entry point for executing the per-tool result parsers that reside in the scripts/result_parsing directory. To parse a tool's results, use:
python3 scripts/parser.py -t <TOOL_OF_CHOICE> -d <RESULT_DIRECTORY>
You can replace the <TOOL_OF_CHOICE> placeholder with all if you want to parse the results of every tool and print them on the screen. The time taken to process all contracts by every tool can be found on the last line of results/logs/SmartBugs_<DATE>.log
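
As an illustration of what such a parser does, here is a sketch over a made-up result.json shape; the real per-tool output formats differ, which is exactly why each tool has its own parser in scripts/result_parsing:

```python
import json

# Made-up result.json shape -- actual SmartBugs output varies per tool.
raw = json.loads("""{
  "contract": "0xabc.hex",
  "analysis": [
    {"vulnerability": "Reentrancy", "line": 12},
    {"vulnerability": "Integer Overflow", "line": 30}
  ]
}""")

def findings(result):
    """Extract the list of reported vulnerability names from one result."""
    return [f["vulnerability"] for f in result.get("analysis", [])]

print(findings(raw))
```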

Centaur Usage

In an attempt to make the above step-by-step procedure easier, we created the Centaur framework, which executes all the above steps automatically, printing relevant messages along the way. The easiest way to run Centaur is with Docker. To do this, we must first make sure we have the respective image, either by pulling it (see Option 2) or by building it with:

docker build --no-cache -t centaur:1.0 -f Dockerfile .

Then, we can run the Centaur script with:

docker run centaur ./run_main.sh <API_KEY>

Before running the above command, make sure you have added the desired values for the constants in the CONSTANTS DECLARATION section of the config file, as this file is sourced into the main script.
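
The constants section might look roughly like this (a config-fragment sketch; the variable names are illustrative, so check the repository's config file for the real ones):

```shell
# --- CONSTANTS DECLARATION ---------------------------------------------
# Illustrative names only -- the actual config file defines the real set.
CHAIN="eth"                    # which chain to analyse (eth or bsc)
SAMPLE_SIZE=1000               # number of blocks to sample
ARCHIVE_NODE_IP="127.0.0.1"    # archive node the Go crawler connects to
ARCHIVE_NODE_PORT=8545
```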

Analysis Tools

We have gathered information about many smart contract security analysis tools, but only a subset of these could be included in our study, as we required the tools to fulfil certain criteria. More specifically, we wanted tools that work on EVM bytecode (not source code only), are open source, and can execute without the need for human interaction (e.g. no GUI). The tools that pass these requirements, along with links to their open-source repositories and papers, are the following:

  1. Conkas (paper: link)
  2. HoneyBadger (paper: link)
  3. MadMax (paper: link)
  4. Maian (paper: link)
  5. Mythril (paper: link)
  6. Osiris (paper: link)
  7. Oyente (paper: link)
  8. Securify (paper: link)
  9. Vandal (paper: link)

Vulnerability Categorisation

For categorising the vulnerabilities found by the smart contract analysis tools, we extended the DASP10 taxonomy, replacing category Unknown Unknowns (10), which includes any vulnerability that does not fall into any other category, with each of these uncategorised vulnerabilities as its own category. Category Short Address Attack (9) is not discovered by any of the tools that were used in this study.
To enrich the vulnerability categorisation, we have also used the Smart Contract Weakness Classification (SWC Registry) to map found vulnerabilities to an SWC id, which can help users learn more (e.g. description, remediation, code examples) about a specific vulnerability.
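
A small illustrative slice of such a mapping in Python: the SWC ids below are real registry entries and the category numbers follow DASP10, but the exact mapping table used in the codebase may differ:

```python
# Illustrative DASP10 -> SWC mapping slice; the study's full table may differ.
SWC_MAP = {
    "Reentrancy":                {"dasp": 1, "swc": "SWC-107"},
    "Access Control":            {"dasp": 2, "swc": "SWC-105"},
    "Arithmetic":                {"dasp": 3, "swc": "SWC-101"},
    "Unchecked Low Level Calls": {"dasp": 4, "swc": "SWC-104"},
    "Denial of Service":         {"dasp": 5, "swc": "SWC-113"},
}

def swc_id(vulnerability):
    """Return the SWC id for a vulnerability category, or None if unmapped."""
    entry = SWC_MAP.get(vulnerability)
    return entry["swc"] if entry else None

print(swc_id("Reentrancy"))
```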

Using the SQLite3 Database

For convenience, we migrated the data to an SQLite db file located in the database directory. The schema of the database can be seen in the schema.pdf file. You can open the database from the command line with:

sqlite3 analysis.db

We also created a script that automates the process of creating the SQLite db file from the SmartBugs results. The file is located at scripts/database/create_db.py. To create the db file, follow the steps below:

rm -rf database/analysis.db
sqlite3 -init database/schema.sql database/analysis.db .quit
rm -rf csvs
python scripts/database/create_db.py csvs \
        smartbugs_bytecode/results \
        build/database/03_Jul_2022/sqlite/run1.sqlite3 \
        build/database/02_Aug_2022/sqlite/run2.sqlite3
sqlite3 database/analysis.db < csvs/populate.sql

Experiment Setup

This artefact has been tested on a 64-bit Ubuntu 20.04 machine and an Apple M1 Mac mini running macOS 12.3.1, both with 8 cores and 16 GB of RAM. However, our Docker image can be used on any machine that has Docker installed.

License

This project is licensed under the terms of the MIT license, which can be found in the LICENSE file. This license applies to the whole codebase except for the SmartBugs framework and the .hex and .sol files found in the data directory, which are publicly available and retain their original licenses.