
sebasmos/MetaDengue


MetaDengue

MetaDengue is a unified dataset format that combines satellite imagery with socioeconomic and environmental metadata.

DengueSet organizes satellite imagery and metadata into a unified, standard dataset that is scalable and reproducible for any geographical area. For this project, we extracted satellite imagery of the municipalities of Colombia from 2016 to 2018, indexed by epiweek, using satellite extractor, an approach built on SentinelHub. We pair each image with socioeconomic and environmental metadata stored in a JSON file, one per image, so that the data is temporally and spatially aligned.

Find the full open-source datasets on Hugging Face: Link

Updates

  • build_dataset_adapted.py is designed for version 2.0 of the dataset, in which image names contain the numeric code of each municipality instead of the prefix "image".

Proposed Structure

The following is the structure that we propose for DengueSet. There are two main folders. The first, images/, contains one subfolder per municipality, holding all of the temporal satellite imagery for that municipality. The second, annotations/, mirrors this structure, with each subfolder holding the corresponding metadata as JSON files.

DATASET/
	images/
		5001/
			images_01_01_2016.tiff
			images_01_07_2016.tiff
			.
			.
		5002/
		.
		.
		500N/
	annotations/
		5001/
			images_01_01_2016.json
			images_01_07_2016.json
			.
			.
		5002/
		.
		.
		500N/
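
Because the two folders mirror each other, an image and its annotation can be matched purely by path. The following is a minimal sketch of that pairing, not code from this repository: the function name pair_samples is hypothetical, and it assumes that every image uses the .tiff extension and that its annotation shares the same file stem.

```python
from pathlib import Path


def pair_samples(root):
    """Yield (image_path, annotation_path) for every image with a matching JSON.

    Assumes the layout DATASET/images/<code>/<name>.tiff and
    DATASET/annotations/<code>/<name>.json described above.
    """
    root = Path(root)
    for image_path in sorted((root / "images").glob("*/*.tiff")):
        municipality = image_path.parent.name
        annotation_path = root / "annotations" / municipality / f"{image_path.stem}.json"
        # Images without an annotation file are skipped rather than raising an error.
        if annotation_path.is_file():
            yield image_path, annotation_path
```

Calling list(pair_samples("DATASET")) returns the aligned pairs in sorted order. Images that lack an annotation are silently skipped, which may or may not be the behavior you want when validating a new dataset.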

Create metadata dataset.

To create a customized dataset, update config.py with the corresponding metadata and adapt build_dataset.py as required. For the DengueSet case, run build_dataset.py. Afterwards, make sure the images/ folder is stored at the same level of the tree as the annotations/ folder.

Metadata organization:

{
      "image_path": "DATASET/images/23001/image_2016-01-03.tiff",
      "municipality_code": 23001,
      "epiweek": 201601,
      "dynamic": {
            "cases": {
                  "dengue_cases": 31,
                  "binary_classification": 1,
                  "multiclass_classification": 0
            },
            "environmental_data":{
                  "temperature": [
                        30.703164269355987
                  ],
                  "precipitation": [
                        0.1678038044754522
                  ]
            }
      },
      "static": {
            "environmental_data": {
                  "elevation": 36.0
            },
            "socio_economic_demographic_data": {
                  "socio_economic_data":{
                        "Secondary/HigherEducation(%)": 59.93,
                        "Employedpopulation(%)": 36.46,
                        "Unemployedpopulation(%)": 4.61,
                        "Peopledoinghousework(%)": 17.02,
                        "Householdswithoutwateraccess(%)": 10.07,
                        "Householdswithoutinternetaccess(%)": 52.84,
                        "Buildingstratification1(%)": 60.2975,
                        "Buildingstratification2(%)": 14.4266,
                        "Buildingstratification3(%)": 6.6937,
                        "Buildingstratification4(%)": 2.3756,
                        "Buildingstratification5(%)": 0.8777,
                        "Buildingstratification6(%)": 0.7190000000000001,
                        "NumberofhospitalsperKm2": 0.087552,
                        "NumberofhousesperKm2": 37.051894
                  },
                  "socio_demographic_data":{
                        "Age0-4(%)": 7.76,
                        "Age5-14(%)": 26.31,
                        "Age>30(%)": 48.69,
                        "Men(%)": 48.51,
                        "Women(%)": 51.49,
                        "Population per year": 471724,
                        "AfrocolombianPopulation(%)": 1.7,
                        "IndianPopulation(%)": 0.71,
                        "Peoplewhocannotreadorwrite(%)": 5.93,
                        "PeoplewithDisabilities(%)": 2.69
                  }
            }
      }
}
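
For tabular models it can be convenient to flatten this nested record into a single row. The sketch below is one way to do that and is not part of the repository: the helper names flatten and split_epiweek are hypothetical, single-element lists such as "temperature" are unwrapped to scalars, and the epiweek is assumed to be encoded as year followed by a two-digit week (201601 read as week 1 of 2016).

```python
import json


def split_epiweek(epiweek):
    """Split an integer such as 201601 into (year, week), assuming YYYYWW encoding."""
    return divmod(epiweek, 100)


def flatten(node, prefix=""):
    """Flatten nested dictionaries into one level with dot-separated keys."""
    flat = {}
    for key, value in node.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list) and len(value) == 1:
            # Unwrap single-element lists such as "temperature": [30.7].
            flat[name] = value[0]
        else:
            flat[name] = value
    return flat


# Excerpt of the record shown above, used only to illustrate the call.
sample = json.loads(
    '{"epiweek": 201601, "dynamic": {"cases": {"dengue_cases": 31},'
    ' "environmental_data": {"temperature": [30.703164269355987]}}}'
)
row = flatten(sample)
year, week = split_epiweek(row["epiweek"])
```

Here row contains the keys "epiweek", "dynamic.cases.dengue_cases" and "dynamic.environmental_data.temperature", and (year, week) is (2016, 1).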

Download DengueSet Augmented data with aligned metadata [Download]:

A Google Cloud Platform bucket stores the data for the top 10 cities. The data was extracted using recursive artifact removal, cloud removal based on LeastCC, and nearest interpolation for spatial resolution (implemented [here]). The following augmentations were applied to the RGB channels, leaving the other satellite channels unchanged:

  1. Contrast limited adaptive histogram equalization (CLAHE), using clip_limit=6.0 and tile_grid_size=(16, 16)
  2. RGBShift (applied to 30 pixels per channel with 100% probability)
  3. RandomBrightnessContrast (applied with a probability of 50%)
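
The idea of augmenting only the RGB bands can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the pipeline used to build the bucket: augment_rgb_only and shift_channels are hypothetical names, the image is assumed to be an (H, W, C) array with 8-bit values, the RGB band positions default to (0, 1, 2) and must be adjusted to the actual band order, and the fixed per-channel shift is only a simple stand-in for the transforms listed above.

```python
import numpy as np


def augment_rgb_only(image, rgb_transform, rgb_indices=(0, 1, 2)):
    """Apply rgb_transform to the RGB bands of an (H, W, C) array only.

    rgb_indices is an assumption about where the RGB bands sit in the
    channel axis; all other bands are copied through unchanged.
    """
    idx = list(rgb_indices)
    out = image.copy()
    out[..., idx] = rgb_transform(image[..., idx])
    return out


def shift_channels(rgb, shift=(30, 30, 30)):
    """Stand-in transform: add a fixed offset per channel, clipped to 0..255."""
    shifted = rgb.astype(np.int32) + np.asarray(shift, dtype=np.int32)
    return np.clip(shifted, 0, 255).astype(rgb.dtype)
```

Any callable that maps an (H, W, 3) array to an array of the same shape can be passed as rgb_transform, so a library-based augmentation pipeline could be plugged in at that point.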

Dataloader demos

Define custom dataloaders in dataloders/

PyTorch implementation [Notebook]

  1. Custom dataloader to load all folders within the DATASET/images folder. [Here]
  2. Custom dataloader to load filtered folders within the DATASET/images folder. [Here]

TensorFlow implementation [Notebook]

  1. Custom dataloader to load all folders within the DATASET/images folder. [Here]
  2. Custom dataloader to load filtered folders within the DATASET/images folder. [Here]
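
The difference between loading all folders and loading filtered folders can be sketched without any deep learning framework. The class below is a hypothetical illustration, not the dataloader from the linked notebooks: the name FolderIndex and the municipalities parameter are assumptions, and it returns the image path rather than decoded pixels so that the TIFF reader remains your choice. It exposes __len__ and __getitem__, the same protocol that PyTorch map-style datasets use.

```python
import json
from pathlib import Path


class FolderIndex:
    """Index over DATASET/, optionally restricted to a set of municipality codes."""

    def __init__(self, root, municipalities=None):
        root = Path(root)
        # None means "load every folder"; otherwise keep only the listed codes.
        keep = None if municipalities is None else {str(code) for code in municipalities}
        self.samples = []
        for folder in sorted((root / "images").iterdir()):
            if not folder.is_dir() or (keep is not None and folder.name not in keep):
                continue
            for image_path in sorted(folder.glob("*.tiff")):
                annotation = root / "annotations" / folder.name / f"{image_path.stem}.json"
                self.samples.append((image_path, annotation))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image_path, annotation = self.samples[index]
        with open(annotation) as handle:
            metadata = json.load(handle)
        return image_path, metadata
```

FolderIndex("DATASET") indexes every municipality, while FolderIndex("DATASET", municipalities=[5001]) indexes only the 5001/ folder.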
