Sparse representation for machine learning the properties of defects in 2D materials

Quickstart

Open in Constructor Research Platform (a cloud service for scientific computations)

Summary

In the paper, we propose a sparse representation as a way to reduce the computational cost and improve the accuracy of machine learning the properties of defects in 2D materials. The code in this project implements the method and a rigorous comparison of its performance against a set of baselines.

Two-dimensional materials offer a promising platform for the next generation of (opto-)electronic devices and other high-technology applications. One of the most exciting characteristics of 2D crystals is the ability to tune their properties via the controllable introduction of defects. However, the search space for such structures is enormous, and ab initio computations are prohibitively expensive. We propose a machine learning approach for rapid estimation of the properties of a 2D material given its lattice structure and defect configuration. The method represents the configuration of a 2D material with defects in a way that allows a neural network to train quickly and accurately. We compare our methodology with state-of-the-art approaches and demonstrate at least a 3.7-fold reduction in energy prediction error. Our approach is also an order of magnitude more resource-efficient than its contenders, in both training and inference.

The main idea of our method is to use a point cloud of defects as the input to the predictive model, as opposed to the usual point cloud of atoms or an expert-crafted feature vector.
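The idea of keeping only the defect sites can be illustrated with a minimal sketch. It is not the repository's implementation: the `DefectPoint` class, the `to_defect_cloud` function, the `"X"` vacancy label, and the toy coordinates are all hypothetical, and the sketch assumes the pristine and defective structures list the same lattice sites in the same order.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

VACANCY = "X"  # hypothetical label for an empty site


@dataclass(frozen=True)
class DefectPoint:
    position: Tuple[float, float, float]  # coordinates of the lattice site
    pristine_species: str  # element on this site in the defect-free lattice
    defect_species: str  # element after the defect is introduced, or VACANCY


def to_defect_cloud(
    positions: Sequence[Tuple[float, float, float]],
    pristine: Sequence[str],
    defective: Sequence[str],
) -> List[DefectPoint]:
    """Keep only the sites whose occupant differs from the pristine lattice."""
    if not (len(positions) == len(pristine) == len(defective)):
        raise ValueError("all inputs must describe the same lattice sites")
    return [
        DefectPoint(tuple(pos), old, new)
        for pos, old, new in zip(positions, pristine, defective)
        if old != new
    ]


# Toy six-site cell with made-up coordinates (not taken from 2DMD).
positions = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 1.5), (1.0, 0.0, -1.5),
    (3.0, 0.0, 0.0), (4.0, 0.0, 1.5), (4.0, 0.0, -1.5),
]
pristine = ["Mo", "S", "S", "Mo", "S", "S"]
defective = ["Mo", VACANCY, "S", "W", "S", "S"]  # one S vacancy, one Mo -> W substitution

cloud = to_defect_cloud(positions, pristine, defective)
for point in cloud:
    print(point)
```

In this toy case the six-site structure reduces to a two-point cloud: the vacancy and the substitution. Unchanged sites carry no information beyond what the known pristine lattice already provides, so they are dropped.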

We compare our approach to state-of-the-art generic structure-property prediction algorithms: GemNet, SchNet, MEGNet, and matminer+CatBoost.

As the dataset, we use 2DMD. It consists of the most popular 2D materials: MoS2, WSe2, h-BN, GaSe, InSe, and black phosphorus (BP), with point defect densities in the range of 2.5% to 12.5%. We use DFT to relax the structures and compute the defect formation energy and the HOMO-LUMO gap. The ML algorithms predict those quantities, taking the unrelaxed structures as input.
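As a rough illustration of what this density range means for input size, the sketch below computes how many times fewer points a defect cloud contains than the pristine lattice has sites. It assumes that defect density is the fraction of lattice sites hosting a point defect. The 200-site supercell and the `size_reduction` helper are hypothetical, chosen only so that the percentages map to whole numbers of defects.

```python
def size_reduction(n_sites: int, n_defects: int) -> float:
    """Ratio of pristine lattice sites to defect sites."""
    if n_defects <= 0 or n_defects > n_sites:
        raise ValueError("n_defects must be between 1 and n_sites")
    return n_sites / n_defects


N_SITES = 200  # hypothetical supercell size, not a 2DMD value

low_density = size_reduction(N_SITES, 5)    # 5 / 200 = 2.5% defect density
high_density = size_reduction(N_SITES, 25)  # 25 / 200 = 12.5% defect density

print(low_density, high_density)  # prints "40.0 8.0"
```

Under this assumption, the defect point cloud has 8 to 40 times fewer points than the lattice has sites. This is a ratio of node counts only, not a measurement of training or inference cost.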

Using the pre-trained models

Library

Use the library https://github.com/HSE-LAMBDA/MEGNetSparse/

This repository

1. Clone the repository.
2. Set up the environment.
3. Download the weights and data:

dvc pull datasets/checkpoints/combined_mixed_all_train/formation_energy_per_site/megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45/0.pth.dvc datasets/checkpoints/combined_mixed_all_train/homo_lumo_gap_min/megnet_pytorch/sparse/05-12-2022_19-50-53/831cc496/0.pth.dvc csv-cif-low-density-8x8 csv-cif-no-spin-500-data csv-cif-spin-500-data train-only-split

The data are not needed for predictions; they are only used to generate new structures in the example notebook.

4. Open the notebook. It contains the prediction code, along with the generation of new structures with defects and example processing of user-uploaded data.

Citation

Please cite the following two papers if you use the code or the data:

Kazeev, N., Al-Maeeni, A.R., Romanov, I. et al. Sparse representation for machine learning the properties of defects in 2D materials. npj Comput Mater 9, 113 (2023). https://doi.org/10.1038/s41524-023-01062-z

Huang, P., Lukin, R., Faleev, M. et al. Unveiling the complex structure-property correlation of defects in 2D materials based on high throughput datasets. npj 2D Mater Appl 7, 6 (2023). https://doi.org/10.1038/s41699-023-00369-1

Internal links

The overall design is documented in an obsolete flowchart
Some design decisions are outlined in an obsolete RFC
Project log is in Notion
Paper in Overleaf

