MoleculeTransformers/smiles-featurizers

SMILES Featurizers

Extract molecular SMILES embeddings from language models pre-trained with various objectives and architectures.

Getting Started

Colab Example Notebook


Open In Colab

Install using Pip

pip install smiles-featurizers==1.0.8

Model List

Our released models are listed below. You can load them with the smiles-featurizers package or with HuggingFace's Transformers.

| Model | Type |
| --- | --- |
| UdS-LSV/smole-bert | Bert |
| UdS-LSV/smole-bert-mtr | Bert |
| UdS-LSV/smole-bart | Bart |
| UdS-LSV/muv2x-simcse-smole-bert | Simcse |
| UdS-LSV/siamese-smole-bert-muv-1x | SentenceTransformer |
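
The model list notes that the checkpoints can also be loaded with HuggingFace's Transformers directly. The following is a minimal sketch of that route, assuming the Hub repository ships tokenizer files that `AutoTokenizer` can read. The mean pooling over non-padding tokens and the helper names `mean_pool` and `embed_smiles` are illustrative assumptions; they are not necessarily how `BertFeaturizer` pools its output, so the vectors may differ from those returned by `featurizer.embed`.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padding positions."""
    # (batch, seq_len) -> (batch, seq_len, 1), cast to the hidden-state dtype
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for an all-padding row
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed_smiles(smiles, model_name="UdS-LSV/smole-bert"):
    """Hypothetical helper: embed a list of SMILES strings with plain Transformers."""
    # Imported here so mean_pool can be used without transformers installed
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(smiles, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    # Assumption: mean pooling; the package may pool differently
    return mean_pool(output.last_hidden_state, batch["attention_mask"])


# Usage (downloads the checkpoint from the Hub on first call):
# embeddings = embed_smiles(["CCC(C)(C)Br"])
```

Keeping the pooling step in its own function makes it easy to swap in another strategy, such as taking the first token's vector, if that matches the downstream task better.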

Use SMILES Featurizers

Bert Featurizer

from smiles_featurizers import BertFeaturizer
import torch

# Use the GPU when one is available
use_gpu = torch.cuda.is_available()

featurizer = BertFeaturizer("UdS-LSV/smole-bert", use_gpu=use_gpu)
embeddings = featurizer.embed(["CCC(C)(C)Br"])

Bart (Encoder) Featurizer

from smiles_featurizers import BartFeaturizer

featurizer = BartFeaturizer("UdS-LSV/smole-bart")
embeddings = featurizer.embed(["CCC(C)(C)Br"], embedder="encoder")

Bart (Decoder) Featurizer

from smiles_featurizers import BartFeaturizer

featurizer = BartFeaturizer("UdS-LSV/smole-bart")
embeddings = featurizer.embed(["CCC(C)(C)Br"], embedder="decoder")

SimCSE Featurizer

from smiles_featurizers import SimcseFeaturizer
import torch

# Use the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"

featurizer = SimcseFeaturizer("UdS-LSV/muv2x-simcse-smole-bert", device=device)
embeddings = featurizer.embed(["CCC(C)(C)Br"])

SentenceTransformer Featurizer

from smiles_featurizers import SentenceTransformersFeaturizer
import torch

# Use the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"

featurizer = SentenceTransformersFeaturizer("UdS-LSV/siamese-smole-bert-muv-1x", device=device)
embeddings = featurizer.embed(["CCC(C)(C)Br"])
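
The featurizer classes above take their device setting in different ways: `BertFeaturizer` is given `use_gpu`, the SimCSE and SentenceTransformer featurizers are given `device`, and the Bart examples pass neither. The sketch below shows one way to pick the class and device argument from a model name. The mapping and the helpers `device_kwargs` and `load_featurizer` are hypothetical and not part of the package; pairing `UdS-LSV/smole-bert-mtr` with `BertFeaturizer` is an assumption based on its listed type.

```python
import importlib

# Model names and types as given in the model list; class names as imported
# in the examples above. The pairing itself is an assumption of this sketch.
FEATURIZER_CLASS_BY_MODEL = {
    "UdS-LSV/smole-bert": "BertFeaturizer",
    "UdS-LSV/smole-bert-mtr": "BertFeaturizer",  # assumed from its "Bert" type
    "UdS-LSV/smole-bart": "BartFeaturizer",
    "UdS-LSV/muv2x-simcse-smole-bert": "SimcseFeaturizer",
    "UdS-LSV/siamese-smole-bert-muv-1x": "SentenceTransformersFeaturizer",
}


def device_kwargs(class_name, gpu_available):
    """Build the device argument in the form each example above uses."""
    if class_name == "BertFeaturizer":
        return {"use_gpu": gpu_available}
    if class_name in ("SimcseFeaturizer", "SentenceTransformersFeaturizer"):
        return {"device": "cuda" if gpu_available else "cpu"}
    # The Bart examples pass no device argument, so none is added here
    return {}


def load_featurizer(model_name, gpu_available=False):
    """Hypothetical helper: instantiate the featurizer matching a listed model."""
    class_name = FEATURIZER_CLASS_BY_MODEL[model_name]
    # Imported lazily so the mapping can be inspected without the package
    module = importlib.import_module("smiles_featurizers")
    featurizer_cls = getattr(module, class_name)
    return featurizer_cls(model_name, **device_kwargs(class_name, gpu_available))


# Usage (requires smiles-featurizers to be installed):
# featurizer = load_featurizer("UdS-LSV/muv2x-simcse-smole-bert")
# embeddings = featurizer.embed(["CCC(C)(C)Br"])
```

For the Bart model, `embed` still needs the `embedder="encoder"` or `embedder="decoder"` argument shown in the Bart examples.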