SynthGPT

This repository contains the data and code for "Large Language Models for Inorganic Synthesis Predictions" by Seongmin Kim, Yousung Jung, and Joshua Schrier.

Organization

Input data and predefined training, cross-validation, and train/test splits are in the data_MP and data folders for the synthesizability and precursor selection tasks, respectively.

Results are in the results_MP and results folders for the synthesizability and precursor selection tasks, respectively. They are stored as JSON to facilitate interpretation.
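Since the results are JSON, they can be read with any standard JSON library. The following is a minimal Python sketch, not part of the repository: the helper name load_results is hypothetical, and it assumes only that the result files end in .json and sit directly inside the folder. It makes no assumption about the structure inside each file.

```python
import json
from pathlib import Path


def load_results(folder):
    """Return {file stem: parsed JSON} for every .json file directly inside folder."""
    results = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.stem] = json.load(handle)
    return results
```

For example, load_results("results_MP") would return one entry per JSON file in that folder. If the files turn out to be nested in subfolders, Path.rglob can be swapped in for Path.glob.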

Prompts for the LLM are in the prompts folder as plain text files; they can also be found in the online Supporting Information file.

Source code is in the src folder; some haphazard tests are included in tests.

Instructions

Run the numbered scripts in the top-level directory in order. Mathematica code (.wls) uses Mathematica 14.0 and no other libraries. Python code (.py) uses Python 3.8.13 and requires the following libraries: NumPy (version 1.22.3), PyTorch (version 1.11.0), and pymatgen (version 2022.9.21).
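Before running the Python scripts, it may help to confirm that the local environment matches the versions listed above. The sketch below is an illustration rather than part of the repository: the function names are hypothetical, and it assumes the libraries were installed under their usual PyPI distribution names (numpy, torch, pymatgen).

```python
import sys
from importlib import metadata

# Versions stated in this README; the keys are assumed PyPI distribution names.
PINNED = {"numpy": "1.22.3", "torch": "1.11.0", "pymatgen": "2022.9.21"}
PINNED_PYTHON = (3, 8, 13)


def find_mismatches(pinned):
    """Return {name: installed version or None} for packages that differ from the pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


def python_matches(expected=PINNED_PYTHON):
    """True when the running interpreter is exactly the expected release."""
    return tuple(sys.version_info[:3]) == tuple(expected)
```

Calling find_mismatches(PINNED) returns an empty dict when every pinned library is installed at the stated version. The comparison is an exact string match, so a PyTorch build with a local suffix (such as a CUDA tag appended to the version) would be reported as a mismatch even though it may work.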

The directory is organized around the order in which we performed the work, dividing the work into discrete tasks:

Precursor selection (scripts 00_Data_Curation.py - 07_Estimate_Perfect_Elemwise.py)
Synthesizability prediction (08_Data_Preparation_Synthesizability.wls - 11_Score_GPT_Outputs_Synthesizability.wls)
Evaluation of precursor rescoring results with GPT-4 (12a_SetupData_Combined.wls and 12b_Evaluate_Combined.wls), and of removing recommendations that do not consist of only allowed precursors (13_Precursor_Compliance.wls and 14_Evaluate_Combination_Retaining_Only_Allowed_Precursors.wls)
Evaluation of the effects of prompt modification on synthesizability prediction. Each modification is evaluated on only the first 5000 test items. The modifications are: adding specialization ("You are an expert oxide inorganic chemist...", 15a_Prompt_Modification_Oxide.wls), removing specialization ("You are a magician...", 15b_Prompt_Modification_Magician.wls), and alternate ways of expressing the positive-unlabelled training task ("...items labeled "U" could be positive or negative (i.e., synthesizable or unsynthesizable)", 15c_Prompt_Modification_Labeling.wls).
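
The compliance step in the list above (keeping only recommendations made up entirely of allowed precursors) amounts to a subset test. The repository implements it in Wolfram Language in 13_Precursor_Compliance.wls; the Python sketch below only illustrates the idea. The function name and the example formulas are hypothetical and are not taken from the data.

```python
def only_allowed(recommendations, allowed):
    """Keep recommendations whose precursors are all in the allowed set.

    recommendations: list of precursor collections (each one candidate recipe).
    allowed: iterable of permitted precursor formulas.
    """
    allowed_set = set(allowed)
    return [rec for rec in recommendations if set(rec) <= allowed_set]


# Hypothetical data, for illustration only.
allowed = {"BaCO3", "TiO2", "SrCO3"}
recommendations = [
    ["BaCO3", "TiO2"],  # every precursor is allowed, so this is kept
    ["BaO", "TiO2"],    # BaO is not in the allowed set, so this is dropped
]
kept = only_allowed(recommendations, allowed)
```

This sketch compares formula strings exactly, so two spellings of the same compound would be treated as different precursors. Whether and how the actual script normalizes formulas is not covered here.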

Yes, this is different from the order in the paper. "Life can only be understood backwards; but it must be lived forwards." --Søren Kierkegaard

Cite

A preprint is available on ChemRxiv as doi:10.26434/chemrxiv-2024-9bmfj

