ybubnov/deep-lookup

Deep Lookup - Deep Learning for Domain Name System

Installation

Installation Using PyPI

pip install deeplookup

Using DeepLookup

DeepLookup provides a Resolver class that inherits from dns.resolver.Resolver:

from deeplookup import Resolver


resolver = Resolver()

for ip in resolver.resolve("google.com", "A"):
    print(f"ip: {ip.to_text()}")

The code above verifies the queried name with a neural network trained to detect malicious queries (DGA-generated names and DNS tunnels). For this example, the output looks like the following:

ip: 142.250.184.206

When the queried name is generated by a domain generation algorithm (DGA), the resolver raises dns.resolver.NXDOMAIN without contacting a remote name server:

for ip in resolver.resolve("mjewnjixnjaa.com", "A"):
    print(f"ip: {ip.to_text()}")

The example above raises dns.resolver.NXDOMAIN with the following message:

dns.resolver.NXDOMAIN: The DNS query name does not exist: mjewnjixnjaa.com.
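
Because rejected names surface as an exception, callers usually want to catch it instead of letting the lookup crash. The following is a minimal sketch of that pattern, not part of DeepLookup itself: resolve_many is a hypothetical helper, and the exception type is passed in as an argument so the helper depends only on the standard library.

```python
from typing import Dict, Iterable, List, Optional, Type


def resolve_many(
    resolver,
    names: Iterable[str],
    blocked_error: Type[Exception],
    rdtype: str = "A",
) -> Dict[str, Optional[List[str]]]:
    """Resolve each name; map names rejected with `blocked_error` to None.

    `resolve_many` is a hypothetical helper, not a DeepLookup API.
    """
    results: Dict[str, Optional[List[str]]] = {}
    for name in names:
        try:
            # Same call as in the examples above: resolver.resolve(name, rdtype).
            answer = resolver.resolve(name, rdtype)
            results[name] = [record.to_text() for record in answer]
        except blocked_error:
            # With dns.resolver.NXDOMAIN this covers names rejected by the
            # model as well as names that genuinely do not exist.
            results[name] = None
    return results
```

With DeepLookup installed, this would be called as resolve_many(Resolver(), ["google.com", "mjewnjixnjaa.com"], dns.resolver.NXDOMAIN) after importing dns.resolver. The exception alone does not tell a caller whether the name was rejected by the model or is simply unregistered.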

Training

The model is trained with a TFX pipeline: the training dataset is loaded, split into training and evaluation subsets, and then used to fit the neural network.
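
To illustrate the split step, here is a small standard-library sketch of one common way to make a train/evaluation split reproducible: hash each record into a fixed number of buckets. The hash function, the 2:1 ratio, and the assign_split name are assumptions made for illustration; they are not taken from the DeepLookup pipeline.

```python
import hashlib


def assign_split(record: str, train_buckets: int = 2, eval_buckets: int = 1) -> str:
    """Deterministically assign a record to "train" or "eval" by hashing it.

    Hypothetical helper: the bucket counts (2:1) are an illustrative assumption.
    """
    digest = hashlib.sha256(record.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % (train_buckets + eval_buckets)
    return "train" if bucket < train_buckets else "eval"


if __name__ == "__main__":
    for domain in ["google.com", "mjewnjixnjaa.com"]:
        print(domain, "->", assign_split(domain))
```

Hashing keeps the assignment stable across runs, so a given record always lands in the same subset no matter how the input files are ordered.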

To trigger the training pipeline, run the following command:

python -m deeplookup.pipeline.gta1

This command creates a folder called "tfx", where all artifacts are persisted. See the tfx/pipelines/gta1/serving_model/gta1/* folder to access the model in HDF5 format.
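
After a training run, it can be convenient to check what was written to the serving folder. The sketch below only lists files under the path mentioned above; list_serving_artifacts is a hypothetical helper, and it assumes the command was run from the directory that contains the "tfx" folder.

```python
from pathlib import Path
from typing import List


def list_serving_artifacts(root: str = "tfx") -> List[Path]:
    """List files under <root>/pipelines/gta1/serving_model/gta1, sorted by path.

    Hypothetical helper; returns an empty list if the pipeline has not run yet.
    """
    serving_dir = Path(root) / "pipelines" / "gta1" / "serving_model" / "gta1"
    if not serving_dir.is_dir():
        return []
    return sorted(path for path in serving_dir.rglob("*") if path.is_file())


if __name__ == "__main__":
    for artifact in list_serving_artifacts():
        print(artifact)
```

An empty result means either that the pipeline has not completed or that the script was started from a different working directory.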

Publications

  1. Bubnov Y., Ivanov N. (2020) Text analysis of DNS queries for data exfiltration protection of computer networks, Informatics, 3, 78-86.
  2. Bubnov Y., Ivanov N. (2020) Hidden Markov model for malicious hosts detection in a computer network, Journal of BSU. Mathematics and Informatics, 3, 73-79.
  3. Bubnov Y., Ivanov N. (2021) DGA domain detection and botnet prevention using Q-learning for POMDP, Doklady BGUIR, 2, 91-99.

Datasets

The most robust dataset, DGTA-BENCH, is available through the TensorFlow Datasets API and can be used to train other neural network architectures:

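# Imported for its side effect: presumably this registers "gta1" with TFDS.
import deeplookup.datasets as dlds  # noqa: F401
import tensorflow_datasets as tfds

# Without `split`, tfds.load returns a dict of splits, which has no .take().
# The "train" split name is an assumption.
ds = tfds.load("gta1", split="train", shuffle_files=True)

for example in ds.take(1):
    domain, label = example["domain"], example["class"]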

  1. Bubnov Y. (2019) DNS Tunneling Queries for Binary Classification, Mendeley Data, v1.
  2. Zago M., Perez M.G., Perez G.M. (2020) UMUDGA - University of Murcia Domain Generation Algorithm Dataset, Mendeley Data, v1.
  3. Bubnov Y. (2021) DGTA-BENCH - Domain Generation and Tunneling Algorithms for Benchmark, Mendeley Data, v1.