TRI-AMDD/PolyGen

De novo design of polymer electrolytes with high conductivity using generative AI

(Figure: an example of a generated polymer electrolyte)

Installation

minGPT

Python version: 3.8

Install the required packages: minGPT, rdkit, deepchem, and transformers:

pip install rdkit deepchem transformers

cd minGPT/model
pip install -e .

diffusion1D

Python version: 3.8

Install the required packages: denoising_diffusion_pytorch, rdkit, deepchem, and transformers:

pip install rdkit deepchem transformers

cd diffusion1D/model
pip install -e .

diffusionLM

Python version: 3.8

Install the required packages: diffusionLM, a customized transformers, and other dependencies:

pip install mpi4py
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -e diffusionLM/improved-diffusion/ 
pip install -e diffusionLM/transformers/
pip install spacy==3.2.6
pip install datasets==2.0.0 
pip install huggingface_hub==0.16.4
pip install wandb deepchem torchsummary
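
After installation, an optional sanity check (not part of the repository's instructions) confirms that the pinned PyTorch build was installed with CUDA support:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"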

Dataset

minGPT & diffusion1D

Prepare the training data as a .csv file with two tab-separated ("\t") columns; an illustrative example follows the list.

  • 1st column: "mol_smiles" (the SMILES string of the monomer)
  • 2nd column: "conductivity" ("1" for high conductivity, "0" for low conductivity)
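
For illustration only, the file might look like the rows below. The SMILES strings are placeholders, and the [Cu]/[Au] end-point markers follow the monomer convention used in this project's data (an assumption; check the demo notebooks for the exact format). Columns are separated by a tab:

mol_smiles	conductivity
[Cu]CCO[Au]	1
[Cu]CCCCO[Au]	0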

diffusionLM

  • The datasets are stored in .json format; see diffusionLM/datasets for examples.

Training, generation and evaluation pipeline

  • data preprocessing (data_config)
  • build the model (model_config)
  • train the model (train_config)
  • generate candidates (generate_config)
  • evaluate the candidates (six metrics: validity, novelty, uniqueness, synthesizability, similarity, and diversity; an illustrative snippet follows this list)
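
As a rough illustration of the evaluation step (a minimal sketch, not the repository's implementation), three of the six metrics can be computed from a batch of generated SMILES with rdkit; the sample lists here are placeholders:

from rdkit import Chem

generated = ["CCO", "CCO", "c1ccccc1O", "not_a_smiles"]  # placeholder generated samples
training_smiles = {"CCO"}                                # placeholder training set

# Validity: fraction of generated strings RDKit can parse into molecules.
valid = [s for s in generated if Chem.MolFromSmiles(s) is not None]
validity = len(valid) / len(generated)

# Uniqueness: fraction of valid samples that are distinct.
uniqueness = len(set(valid)) / len(valid)

# Novelty: fraction of distinct valid samples absent from the training set.
novelty = len(set(valid) - training_smiles) / len(set(valid))

print(f"validity={validity:.2f} uniqueness={uniqueness:.2f} novelty={novelty:.2f}")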

Demo

The demos are provided in minGPT_pipeline.ipynb, diffusion1D_pipeline.ipynb, and diffusionLM_pipeline.ipynb.

minGPT & diffusion1D

  • For minGPT_pipeline.ipynb and diffusion1D_pipeline.ipynb, all steps of the pipeline can be executed directly in the notebook.

diffusionLM

  • For diffusionLM_pipeline.ipynb, the notebook generates the bash scripts for training and generation. The scripts are stored under diffusionLM/improved-diffusion.

To run the training:

cd diffusionLM/improved-diffusion
bash train_conditional.sh    # or: bash train_unconditional.sh

The model checkpoints will be stored in diffusionLM/improved-diffusion/diffusion_models.

To run the generation:

cd diffusionLM/improved-diffusion
bash generate_conditional.sh    # or: bash generate_unconditional.sh

The generated output will be stored in diffusionLM/improved-diffusion/generation_outputs

Pretrained models

minGPT

The checkpoints of the pretrained model at different epochs can be obtained here: https://drive.google.com/drive/folders/1M1VjgUnFDospbmVSnr17JdUcUa-_4O79?usp=sharing. Please put the checkpoint files under minGPT/ckpts/.

diffusion1D

The checkpoints of the pretrained model at different epochs can be obtained here: https://drive.google.com/drive/folders/1kFnKtnmuQLTNDZ7BJG2ZhoJKGWoXlI--?usp=sharing. Please put the checkpoint files under diffusion1D/ckpts/.

diffusionLM

The checkpoints of the pretrained model at different epochs can be obtained here: https://drive.google.com/drive/folders/1ndLNhRZu8TL2Ni7VL8Q9GRAeX9fFVOq0?usp=sharing. Please put the whole checkpoint folder and its files under diffusionLM/improved-diffusion/diffusion_models/.
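
If you prefer the command line, the gdown package (an extra tool, not a stated requirement of this repository) can download a Drive folder directly, e.g. for the diffusionLM checkpoints:

pip install gdown
gdown --folder "https://drive.google.com/drive/folders/1ndLNhRZu8TL2Ni7VL8Q9GRAeX9fFVOq0" -O diffusionLM/improved-diffusion/diffusion_models/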

Reference

This code references the following GitHub repositories:

https://github.com/karpathy/minGPT
https://github.com/lucidrains/denoising-diffusion-pytorch
https://github.com/XiangLi1999/Diffusion-LM

Citation

If you use PolyGen, please cite the following:

@article{lei2023self,
  title={A Self-Improvable Polymer Discovery Framework Based on Conditional Generative Model},
  author={Lei, Xiangyun and Ye, Weike and Yang, Zhenze and Schweigert, Daniel and Kwon, Ha-Kyung and Khajeh, Arash},
  journal={arXiv preprint arXiv:2312.04013},
  year={2023}
}

@article{yang2023novo,
  title={De novo design of polymer electrolytes with high conductivity using GPT-based and diffusion-based generative models},
  author={Yang, Zhenze and Ye, Weike and Lei, Xiangyun and Schweigert, Daniel and Kwon, Ha-Kyung and Khajeh, Arash},
  journal={arXiv preprint arXiv:2312.06470},
  year={2023}
}