
Atomic encodings #861 (Open)

KnathanM wants to merge 1 commit into main
Conversation

@KnathanM (Contributor) commented May 9, 2024

Description

#722 asked about an interface for atomic encodings. This is simple to implement and may not be strictly necessary, since users can do it themselves, as @davidegraff showed in a comment. But in v2.2 we plan to add support for atom and bond targets, which would likely benefit from a model-level function for getting the atom encodings. Given how simple it is, we could add it now.

Example

atom_encodings = []
for batch in dataloader:
    bmg, V_d, *_ = batch
    atom_encodings.extend(model.atomic_encodings(bmg, V_d))

Each element of atom_encodings is a tensor of the atomic fingerprints for a single molecule.
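For example, for a 12-atom molecule with a message-passing hidden size of 300 (both numbers illustrative):

atom_encodings[0].shape  # torch.Size([12, 300]): one row per atom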

Questions

What is a good name for this function? I brainstormed `atom_encodings`, `atomic_encoding`, and `atoms_fingerprint`.

I still have a couple of things to do before merging this, if the reviews are positive.

Checklist

  • Add unit tests
  • Add example in notebooks

@@ -129,6 +129,10 @@ def encoding(
        """Calculate the :attr:`i`-th hidden representation"""
        return self.predictor.encode(self.fingerprint(bmg, V_d, X_d), i)

    def atomic_encodings(self, bmg: BatchMolGraph, V_d: Tensor | None = None) -> tuple[Tensor, ...]:
        """Return the atom-level encodings, one (n_atoms_i, d) tensor per molecule."""
        H_v = self.message_passing(bmg, V_d)
        return H_v.split(torch.bincount(bmg.batch).tolist())
Suggested change:

-        return H_v.split(torch.bincount(bmg.batch).tolist())
+        sizes = torch.bincount(bmg.batch, minlength=len(bmg)).tolist()
+        return H_v.split(sizes)
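As a minimal illustration of the edge case the minlength argument guards against (the numbers here are made up): if the trailing molecule(s) in a batch contribute zero atoms, bincount returns too few sizes and split() no longer yields one chunk per molecule.

import torch

batch = torch.tensor([0, 0, 1, 1, 1])        # atom-to-molecule index; suppose the batch holds 3 molecules
torch.bincount(batch).tolist()               # [2, 3]    -> split() would yield only 2 chunks
torch.bincount(batch, minlength=3).tolist()  # [2, 3, 0] -> one chunk per molecule, the last one empty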

    def atomic_encodings(
        self, bmgs: Iterable[BatchMolGraph], V_ds: Iterable[Tensor | None]
    ) -> list[tuple[Tensor, ...]]:
        H_vs: list[Tensor] = self.message_passing(bmgs, V_ds)
        return [H_v.split(torch.bincount(bmg.batch).tolist()) for H_v, bmg in zip(H_vs, bmgs)]
Same as above: the bincount call here needs minlength as well.
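Spelled out, the analogous suggested change here would be (a sketch building on the corrected comprehension above):

-        return [H_v.split(torch.bincount(bmg.batch).tolist()) for H_v, bmg in zip(H_vs, bmgs)]
+        return [H_v.split(torch.bincount(bmg.batch, minlength=len(bmg)).tolist()) for H_v, bmg in zip(H_vs, bmgs)]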

@davidegraff (Contributor) commented:
FWIW I prefer `atom_encodings`. I prefer to avoid the term "fingerprint" as much as possible, as it isn't a defined term in the CS community, but chemists occasionally like to use it, which is why it's been kept around. You'll also want to add a unit test or two based on the output shape. There are two strategies here:

  1. keep the code as-is and unit test that the function, given an input batch of $b$ molecules, produces a list of $b$ tensors in which the $i$-th tensor has shape $(n_{a,i}, d)$, where $n_{a,i}$ is the number of atoms in molecule $i$
  2. refactor this function to call some other utility function `split_into_tensors()` (or some other descriptive name) that handles the operation of taking a tensor of shape $(n, d)$ and a batch index $\mathbf i \in [0 .. b]^{n}$ and splitting it into a list of $b$ tensors of shape $(n_{a,i}, d)$, where "..."

The advantage of (2) is that it's simpler and more isolated: the unit test only needs to cover the `split_into_tensors()` function (testing `atom_encodings()` would now be an "integration" test). In comparison, (1) can fail for a number of reasons, only one of which is the fault of the `atom_encodings()` function (because it relies on featurization and message passing working correctly). The fix for this is to "mock" the output of `message_passing` so it's correct by construction. I'm ambivalent between the two approaches: mocking (1) and refactoring (2).
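To make (2) concrete, here is a sketch of what the utility and its unit test might look like (the name split_into_tensors and its signature are hypothetical, not code from this PR):

import torch
from torch import Tensor

def split_into_tensors(H: Tensor, batch: Tensor, n: int) -> tuple[Tensor, ...]:
    """Split an (N, d) tensor into n per-molecule chunks using a batch index of length N."""
    sizes = torch.bincount(batch, minlength=n).tolist()
    return H.split(sizes)

def test_split_into_tensors():
    H = torch.randn(5, 4)                  # 5 "atoms" with hidden size 4
    batch = torch.tensor([0, 0, 1, 1, 1])  # molecule index of each atom
    Hs = split_into_tensors(H, batch, n=2)

    assert len(Hs) == 2
    assert Hs[0].shape == (2, 4) and Hs[1].shape == (3, 4)
    assert torch.equal(torch.cat(Hs), H)   # the split is lossless

With this factoring, atom_encodings() reduces to a single call to the utility, and only split_into_tensors() needs a direct unit test.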

@kevingreenman kevingreenman added this to the v2.0.1 milestone May 18, 2024