Skip to content

The multi-tab website fingerprinting attack library contains real-world multi-tab Tor traffic and code for multi-tab website fingerprinting attacks.

Notifications You must be signed in to change notification settings

Xinhao-Deng/Multitab-WF-Datasets

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Robust Multi-tab Website Fingerprinting Attacks in the Wild



This repository contains the source code and datasets for our paper "Robust Multi-tab Website Fingerprinting Attacks in the Wild" (Published in IEEE S&P 2023).

If you want to cite the library, you can use our paper.

@INPROCEEDINGS {multitab-wf-datasets,
author = {X. Deng and Q. Yin and Z. Liu and X. Zhao and Q. Li and M. Xu and K. Xu and J. Wu},
booktitle = {2023 IEEE Symposium on Security and Privacy (SP)},
title = {Robust Multi-tab Website Fingerprinting Attacks in the Wild},
year = {2023},
}

Prerequisites

We prototype attacks using Pytorch 2.0.1 and Python 3.8. For convenience, we recommend running the following command.

conda create --name <env> --file requirements.txt

Datasets

We collect our Tor browsing datasets under the real multi-tab scenario. You can download the dataset via the link.

You can load the dataset using numpy.

import numpy as np

inpath = "example.npz"
data = np.load(inpath)
dir_array = data["direction"]  # Sequence of packet direction
time_array = data["time"] # Sequence of packet timestamps
label = data["label"]  # labels

Note that we improved the quality of the datasets after publication. Specifically, we preserved screenshots of the websites after loading by using xvfbwrapper during traffic collection. We built an image classification model based on ResNet, capable of filtering out the website traffic that failed to load using the screenshots.

Usage

Prepare Data

  • Download datasets and place it in the folder ./datasets

  • Divide the dataset into training, validation, and test sets. For example, for the 2-tab dataset collected in the closed-world, you can execute the following command.

python scripts/dataset_split.py -i datasets/closed_2tab.npz -o datasets/processed/closed_2tab

Training

We take the training of ARES on a 2-tab dataset in the closed-world as an example.

python train.py -d closed-2tab -g 0 -l ARES

Training separate models for each website is costly. We use torch.nn.MultiLabelSoftMarginLoss to achieve a similar effect. This loss function creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy. In our study, compared to the original ARES, using this approximate calculation results in a 1-2% performance loss.

Specifically, you can use TensorBoard to visualize the training process.

tensorboard --logdir=runs



Note that, benefiting from the Transformer architecture, ARES's performance gradually improves with an increase in epochs, even experiencing slight improvements beyond 500 epochs.

Evaluation

We take the evaluation of ARES on a 2-tab dataset in the closed-world as an example.

python eval.py -d closed_2tab -g 0 -m ARES

You can directly download the trained ARES parameter file (with the random seed set to 1018) link.

Contact

If you have any questions or suggestions, feel free to contact:

About

The multi-tab website fingerprinting attack library contains real-world multi-tab Tor traffic and code for multi-tab website fingerprinting attacks.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages