Network Traffic Identification with Convolutional Neural Networks

akshitvjain/deeplearning-network-traffic

deeplearning-network-traffic

  • Network Traffic Identification with Convolutional Neural Networks - This project implements a new payload-based method that identifies network protocols/services using a convolutional neural network.
  • The paper was published by IEEE and presented at the 4th International Conference on Big Data Intelligence and Computing. The publication is available at https://ieeexplore.ieee.org/document/8512009

Network Traffic Dataset

For this study, network traffic was collected during the national CPTC held at RIT in November 2017. From the collected traffic, 34,929 TCP flows were extracted, covering 24 unique protocol labels with a fairly unbalanced distribution. The dataset was built by extracting the payload bytes of each TCP flow; the protocol/service label of each flow was assigned by nDPI, a deep packet inspection tool. The following table shows the first few rows: column 0 holds the service label and columns 1 to 1024 hold the payload bytes.

   0              1    2    3    4    5    6    7    8    9  ...  1015  1016  1017  1018  1019  1020  1021  1022  1023  1024
0  Google        22    3    1    2    0    1    0    1  252  ...   113   118   108   144    87    17    63    67   134   114
1  SSL           22    3    3    0   57    2    0    0   53  ...   140   123    32    18   193    74   221   192    98    78
2  LDAP          48  132    0    0    4  249    2    2    3  ...   161   230   107    18   191    84   166    85   176   245
3  LDAP          48  132    0    0    5    8    2    2    3  ...   168    49   160    26    52   181    64   181   202   160
4  MS_OneDrive   72   84   84   80   47   49   46   49   32  ...    46   105   112   118    54   116   101   115   116    99

5 rows × 1025 columns
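To make the row layout concrete, here is a minimal sketch of how one flow's payload could be turned into a row of this shape (one label followed by 1,024 byte values). The function name `payload_to_row` is hypothetical, and the truncate-then-zero-pad policy is an assumption for illustration; the source does not say how short or long payloads were handled.

```python
PAYLOAD_LEN = 1024  # matches the 1,024 byte columns in the table above


def payload_to_row(label, payload, length=PAYLOAD_LEN):
    """Return [label, b1, ..., b_length] for one flow.

    Truncating long payloads and zero-padding short ones are
    assumptions made for this sketch, not the project's documented policy.
    """
    clipped = payload[:length]
    padded = clipped + bytes(length - len(clipped))  # bytes(n) is n zero bytes
    return [label] + list(padded)


row = payload_to_row("MS_OneDrive", b"HTTP/1.1 ")
print(len(row))
print(row[:10])
```

The example payload was chosen because the first nine byte values of the MS_OneDrive row in the table (72 84 84 80 47 49 46 49 32) are the ASCII codes for `HTTP/1.1 `, so the printed prefix should line up with that row.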

The bar chart shows the most frequent protocols/services and their frequency distribution.

[Figure: bar chart of protocol/service frequencies]
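A label distribution like this can be tallied with a few lines of Python. The sketch below also derives inverse-frequency class weights, which is one common way to compensate for an unbalanced distribution; whether the project used class weighting is not stated, so treat that part as an assumption. The helper name `class_weights` is hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {lab: n_samples / (n_classes * c) for lab, c in counts.items()}


# Labels of the five example rows from the table, not the full dataset.
labels = ["Google", "SSL", "LDAP", "LDAP", "MS_OneDrive"]
print(Counter(labels).most_common())
print(class_weights(labels))
```

Rarer classes receive larger weights, so a loss function that accepts per-class weights would penalize mistakes on them more heavily.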

Data pipeline for Network Traffic Identification

The payload data must pass through multiple phases before it can be used to train a deep learning model.

[Figure: data pipeline diagram]
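The individual phases are not listed in the text, so the following is only a sketch of preprocessing steps that are typical before feeding byte payloads to a CNN: scaling byte values to [0, 1], reshaping the 1,024 values into a 32 x 32 grid, and encoding labels as integers. All three function names are hypothetical, and the 2D reshape is an assumption (a 1D convolution over the raw sequence would be an equally plausible design).

```python
import math


def scale_bytes(payload_bytes):
    """Map byte values 0..255 to floats in [0, 1]."""
    return [b / 255.0 for b in payload_bytes]


def to_grid(values):
    """Reshape a flat list whose length is a perfect square into a square grid."""
    side = math.isqrt(len(values))
    if side * side != len(values):
        raise ValueError("length must be a perfect square")
    return [values[r * side:(r + 1) * side] for r in range(side)]


def encode_labels(labels):
    """Map each distinct label to an integer id, assigned in sorted order."""
    index = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    return [index[lab] for lab in labels], index


# A made-up payload: three non-zero bytes followed by zero padding.
payload = [22, 3, 1] + [0] * 1021
grid = to_grid(scale_bytes(payload))
ids, index = encode_labels(["Google", "SSL", "LDAP", "LDAP"])
print(len(grid), len(grid[0]))
print(ids, index)
```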

Results

The table below shows the aggregated performance metrics for the different optimizers used to train the CNN model.

[Table: aggregated performance metrics for each optimizer]
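The text does not say which metrics were aggregated or how. As an illustration only, the sketch below computes accuracy together with macro-averaged precision, recall, and F1, a common choice for unbalanced multi-class problems because every class counts equally regardless of its size. The function name `macro_scores` and the toy predictions are hypothetical and are not the project's results.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration; one SSL flow is misclassified as LDAP.
y_true = ["SSL", "SSL", "LDAP", "Google"]
y_pred = ["SSL", "LDAP", "LDAP", "Google"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Running the same function on the predictions of models trained with different optimizers would give one row of metrics per optimizer.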