Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replica state indices array contains integers with large magnitude #687

Open
zhang-ivy opened this issue Apr 19, 2023 · 0 comments
Open

Comments

@zhang-ivy
Copy link
Contributor

I'm observing an issue with the replica state indices saved in the .nc file of one of my RBD:ACE2 complex phase repex simulations. For one particular iteration, the replica state indices (which should be between 0 and 35 because there are 36 states in the repex simulation) are very large. I wonder if this is caused by overflow?

I'm using openmm 8, openmmtools 0.21.5, and python 3.10 to run this. I've attached a dump of my env. I can't attach the .nc file because its too big (even when compressed), but can supply the filepath on lilac (@ijpulidos )

import numpy as np
from openmmtools.multistate import MultiStateReporter, MultiStateSamplerAnalyzer

max_n_iterations = 20000
path = "2_complex.nc"
reporter = MultiStateReporter(path, open_mode='r')
analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=max_n_iterations)
equilibration_data = analyzer._get_equilibration_data()

yields this error:

IndexError                                Traceback (most recent call last)
Cell In[6], line 6
      4 reporter = MultiStateReporter(path, open_mode='r')
      5 analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=max_n_iterations)
----> 6 equilibration_data = analyzer._get_equilibration_data()
      7 # print(f"n_equilibration_iterations: {n_equilibration_iterations}")
      8 # print(f"subsample rate: {subsample_rate}")

File ~/miniconda3/envs/perses-paper7/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py:2046, in MultiStateSamplerAnalyzer._get_equilibration_data(self, energies, neighborhoods, replica_state_indices)
   2043     n_effective_max = (self.max_n_iterations - n_equilibration + 1) / g_t
   2045 else:
-> 2046     u_n = self.get_effective_energy_timeseries(energies=energies, neighborhoods=neighborhoods, replica_state_indices=replica_state_indices)
   2048     # For SAMS, if there is a second-stage start time, use only the asymptotically optimal data
   2049     t0 = self._n_equilibration_iterations if self._n_equilibration_iterations is not None else 1 # if self._n_equilibration_iterations was not specified, discard minimization frame

File ~/miniconda3/envs/perses-paper7/lib/python3.10/site-packages/openmmtools/multistate/multistateanalyzer.py:1458, in MultiStateSamplerAnalyzer.get_effective_energy_timeseries(self, energies, neighborhoods, replica_state_indices)
   1455 for iteration in range(n_iterations):
   1456     # Slice the current sampled states by those replicas.
   1457     states_slice = replica_state_indices[:, iteration]
-> 1458     u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])
   1460     # Correct for potentially-changing log weights
   1461     if has_log_weights:

IndexError: index 1699962488 is out of bounds for axis 1 with size 36

The error is occurring in get_effective_energy_timeseries(). When I run the code in that function:

# create effective energy timeseries (copied from MultiStateSamplerAnalyzer.get_effective_energy_timeseries())
energies, _, neighborhoods, replica_state_indices = analyzer._read_energies(truncate_max_n_iterations=True)
n_replicas, n_states, n_iterations = energies.shape

u_n = np.zeros([n_iterations], np.float64)
# Slice of all replicas, have to use this as : is too greedy
replicas_slice = range(n_replicas)
for iteration in range(n_iterations):
    # Slice the current sampled states by those replicas.
    states_slice = replica_state_indices[:, iteration]
    u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])

I find that at iteration 19310, I get the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[7], line 12
     10 # Slice the current sampled states by those replicas.
     11 states_slice = replica_state_indices[:, iteration]
---> 12 u_n[iteration] = np.sum(energies[replicas_slice, states_slice, iteration])

IndexError: index 1699962488 is out of bounds for axis 1 with size 36

When I examine replica_state_indices[:, 19310]:

masked_array(data=[ 1699962488,  -675782556,    25841912,   867372552,
                     520093744,  2018181943,  1684362078,   907524192,
                     134318661,   808579126,   622460928,  1584973570,
                    1617192275,  1274706784, -1912077942,     3158450,
                      39983616,  1398700273,  1616929893, -1974732165,
                   -1307703295,       12337,  -318615014,  1699962488,
                    2069913700,    25840634,   833687048,   436207664,
                    2028864086,  1684362078,   -92577696,   134318667,
                     808563470,  1511653376,  1584983810,  1617192275],
             mask=False,
       fill_value=999999)

The values in this array should be between 0 and 35 (because there are 36 states in this simulation), but are instead very large in magnitude.

The replica_state_indices for all other iterations look fine. For example, here is `replica_state_indices[:, 19311]:

masked_array(data=[14,  4, 31,  2, 10, 21, 19, 20,  7, 28, 22, 26, 13,  3,
                   24, 16, 27, 33, 25, 23, 30, 18, 11, 35, 32, 15, 17, 29,
                   34,  0,  6, 12,  5,  9,  1,  8],
             mask=False,
       fill_value=999999)

perses-paper7-explicit.txt.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant