MSEED: accumulating record-level micro gaps/overlaps over long time spans #3447

Open
1 task done
raciner opened this issue May 16, 2024 · 7 comments
Labels
bug-unconfirmed (reported bug that still needs to be confirmed), .io.mseed

Comments

raciner commented May 16, 2024

Avoid duplicates

  • I searched existing issues

Bug Summary

The file linked here contains data from 2024-05-14T23:52:07 to 2024-05-16T00:04:33. However, when using

import obspy

st = obspy.core.read('file.mseed')
print(st)

this is shown:

XX.XDR01.02.VKD | 2024-05-14T23:52:07.000000Z - 2024-05-15T23:57:37.000000Z | 0.1 Hz, 8674 samples

The problem is that the actual sample interval is slightly longer than the nominal 10.0 s because of how the datalogger behaves. The mismatch at each record boundary is within ObsPy's default tolerance, so no gap is declared, but the difference between the time ObsPy assigns and the actual timestamp grows to about seven minutes over the day (compare the end times above). The start time in each individual miniSEED record header is correct. Other software, such as qmerge, handles this differently. Wouldn't it make more sense to accumulate the difference between the time given in each record header and the time ObsPy computes from the first header's timestamp plus delta times the number of samples seen so far, and to introduce a break once that accumulated difference exceeds the tolerance, instead of checking each record boundary individually without accumulating?
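
The size of the drift can be checked from the numbers quoted above. This is only a back-of-the-envelope sketch: it assumes that 2024-05-16T00:04:33 is the time of the last sample, which the report does not state explicitly, and the variable names are made up for illustration.

```python
from datetime import datetime

# Values quoted in the report above.
first_sample = datetime(2024, 5, 14, 23, 52, 7)
reported_end = datetime(2024, 5, 15, 23, 57, 37)  # end time printed by ObsPy
actual_end = datetime(2024, 5, 16, 0, 4, 33)      # assumed time of the last sample
npts = 8674
nominal_delta = 10.0  # seconds, from the 0.1 Hz header value

# n samples span n - 1 intervals; this reproduces the end time ObsPy prints.
nominal_span = (npts - 1) * nominal_delta
assert (reported_end - first_sample).total_seconds() == nominal_span

# Accumulated difference between the assigned and the actual end time.
drift = (actual_end - reported_end).total_seconds()

# Average sample interval the datalogger actually produced.
effective_delta = (actual_end - first_sample).total_seconds() / (npts - 1)

print(f"drift: {drift:.0f} s, effective delta: {effective_delta:.4f} s")
```

Under that assumption the drift is 416 s (just under seven minutes), which corresponds to an average sample interval of roughly 10.048 s instead of 10 s.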

Code to Reproduce

No response

Error Traceback

No response

ObsPy Version?

1.4.1

Operating System?

No response

Python Version?

3.10.12

Installation Method?

pip

raciner added the bug-unconfirmed label May 16, 2024
Member

megies commented May 16, 2024

I'm a little bit confused. From what I can see the sampling interval is stated in the file as exactly 10s but the individual records' start times don't line up, i.e. there are gaps and overlaps on a subsample scale between every record? That correct or am I missing something?

We merge records that have gaps/overlaps on a subsample scale; otherwise you would end up with 420 individual Trace objects.

Am I correct that you want obspy to assume the file is without gaps and compute an odd sampling rate and treat it as a gapless trace?

I'm pretty sure that would mean touching our C code wrapper for libmseed around here, maybe passing in a flag from Python to change behavior in the wrapper. I'm no C wizard, so definitely somebody else would need to do that.

Member

megies commented May 16, 2024

> The time record for each individual miniseed record is correct.

I am doubtful of that statement. Note how the last digit of the seconds in the individual records' start times jumps around "7", yet the fractional part is always ".000000"?!

msi version: 0.9.6
msi version: 0.9.6
XX_XDR01_01_VKD, 000001, D
             start time: 2024,135,23:52:07.000000
      number of samples: 14
     sample rate factor: -10  (0.1 samples per second)
 sample rate multiplier: -1
   number of blockettes: 1
        time correction: 0
            data offset: 56
 first blockette offset: 48
         BLOCKETTE 1000: (Data Only SEED)
              next blockette: 0
                    encoding: IEEE double precision float (val:5)
                  byte order: Big endian (val:1)
               record length: 512 (val:9)
XX_XDR01_01_VKD, 000001, D
             start time: 2024,135,23:54:28.000000
      number of samples: 3
     sample rate factor: -10  (0.1 samples per second)
 sample rate multiplier: -1
   number of blockettes: 1
        time correction: 0
            data offset: 56
 first blockette offset: 48
         BLOCKETTE 1000: (Data Only SEED)
              next blockette: 0
                    encoding: IEEE double precision float (val:5)
                  byte order: Big endian (val:1)
               record length: 512 (val:9)
XX_XDR01_01_VKD, 000001, D
             start time: 2024,135,23:54:56.000000
      number of samples: 3
     sample rate factor: -10  (0.1 samples per second)
 sample rate multiplier: -1
   number of blockettes: 1
        time correction: 0
            data offset: 56
 first blockette offset: 48
         BLOCKETTE 1000: (Data Only SEED)
              next blockette: 0
                    encoding: IEEE double precision float (val:5)
                  byte order: Big endian (val:1)
               record length: 512 (val:9)
XX_XDR01_01_VKD, 000001, D
             start time: 2024,135,23:55:27.000000
      number of samples: 21
     sample rate factor: -10  (0.1 samples per second)
 sample rate multiplier: -1
   number of blockettes: 1
        time correction: 0
            data offset: 56
 first blockette offset: 48
         BLOCKETTE 1000: (Data Only SEED)
              next blockette: 0
                    encoding: IEEE double precision float (val:5)
                  byte order: Big endian (val:1)
               record length: 512 (val:9)
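
The mismatch at each record boundary can be worked out from the four headers above. The following is a small sketch using only the start times and sample counts shown in the msi output; the function name `boundary_mismatches` is hypothetical and not part of ObsPy or libmseed.

```python
from datetime import datetime, timedelta

DELTA = 10.0  # seconds, nominal sample interval from the headers

# (start time, number of samples) copied from the msi output above;
# day 135 of 2024 is May 14.
records = [
    (datetime(2024, 5, 14, 23, 52, 7), 14),
    (datetime(2024, 5, 14, 23, 54, 28), 3),
    (datetime(2024, 5, 14, 23, 54, 56), 3),
    (datetime(2024, 5, 14, 23, 55, 27), 21),
]


def boundary_mismatches(records, delta):
    """Seconds between each record's actual start and the start predicted
    from the previous record (positive = gap, negative = overlap)."""
    out = []
    for (start, npts), (next_start, _) in zip(records, records[1:]):
        expected = start + timedelta(seconds=npts * delta)
        out.append((next_start - expected).total_seconds())
    return out


mismatches = boundary_mismatches(records, DELTA)
print(mismatches)  # [1.0, -2.0, 1.0]
```

Each boundary is off by one or two seconds in either direction, well below the 10 s sample interval, which is consistent with the timestamps having been rounded to whole seconds.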

Member

megies commented May 16, 2024

> accumulate the time difference between the time indicated in the header and the time that obspy computes based on the time stamp of the first header + delta*number of seen data points and introduce a break once this gets bigger than the tolerance instead of not accumulating it and checking it for each boundary between records individually?

could make sense as a switch, but this would mean adding a lot of logic to our libmseed wrapper code and somebody else would have to do it.

megies changed the title from "Miniseed file not fully read" to "MSEED: accumulating record-level micro gaps/overlaps over long time spans" May 16, 2024
Member

megies commented May 16, 2024

Author

raciner commented May 16, 2024

The basic issue is that the datalogger is supposed to make one measurement every 10 s. However, due to the poor quality of its implementation, this is not stable, so there is a slow drift. From one miniseed record to the next, the mismatch is always within the tolerance, but over a whole day it adds up to about seven minutes, which means that the timestamps are completely wrong after reading the file and writing it back to disk. A less surprising behaviour would be to declare a gap or overlap whenever the accumulated time difference is larger than the tolerance.
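
The proposed behaviour could look roughly like the sketch below. It is plain Python working on (start time in seconds, number of samples) pairs, not ObsPy or libmseed code; the function name `split_on_accumulated_drift`, the synthetic records and the tolerance of half a sample interval are all assumptions made for illustration.

```python
def split_on_accumulated_drift(records, delta, tolerance):
    """Group (start_seconds, npts) records into segments.

    A new segment starts whenever a record's start time differs by more than
    `tolerance` from the time predicted from the segment's first record plus
    `delta` times the number of samples seen since then.
    """
    segments = []
    current = []
    ref_start = None  # start time of the first record in the current segment
    seen = 0          # samples accumulated in the current segment
    for start, npts in records:
        if current:
            predicted = ref_start + seen * delta
            if abs(start - predicted) > tolerance:
                segments.append(current)
                current = []
        if not current:
            # Begin a new segment and reset the reference.
            ref_start = start
            seen = 0
        current.append((start, npts))
        seen += npts
    if current:
        segments.append(current)
    return segments


DELTA = 10.0
TOLERANCE = DELTA / 2  # assumed tolerance, chosen for illustration only

# Synthetic data: 10 samples per record, but each record starts 1 s late.
records = [(101.0 * k, 10) for k in range(14)]
segments = split_on_accumulated_drift(records, DELTA, TOLERANCE)
print([len(s) for s in segments])  # [6, 6, 2]
```

In this synthetic case every individual boundary is only 1 s off, so a per-boundary check never triggers, while the accumulated check introduces a break as soon as the drift exceeds 5 s.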

Member

megies commented May 16, 2024

> A less surprising behaviour would be to declare a gap or overlap whenever the accumulated time difference is larger than the tolerance

I agree that makes sense, I might not be able to work on it anytime soon though, trying to finish up some loose ends and more of those appear than get closed anyway..

I'd still say though, that the initial problem comes from a quite bad file and not us and it's an edge case.

Member

megies commented May 16, 2024

What you can do in the meantime is this:

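from obspy import read
from obspy.io.mseed.util import get_start_and_end_time

path = '/tmp/proof.mseed'
st = read(path)
# the workaround only makes sense if everything was merged into one trace
assert len(st) == 1
tr = st[0]
# start time of the first record and end time of the last record
start, end = get_start_and_end_time(path)
# n samples span n - 1 sampling intervals
tr.stats.sampling_rate = (len(tr) - 1) / (end - start)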

This isn't safe, though: it relies heavily on a lot of assumptions (ordering of records, no multiplexing, constant record size, ...; see the docs for that helper routine).
Doing this safely in Python would mean sacrificing some read speed, because every record header would have to be inspected.
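
For reference, looking at every record header from Python might be sketched as follows. This is an untested outline, not a recommended implementation: it assumes a single-channel file with records in time order and no trailing bytes after the last record, and it uses `get_record_information` from `obspy.io.mseed.util` with its `offset` argument. The helper names `iter_record_headers` and `mean_sample_interval` are made up here.

```python
import os


def iter_record_headers(path):
    """Yield (starttime, npts) for every record in a MiniSEED file."""
    # Imported lazily so the pure helper below works without ObsPy installed.
    from obspy.io.mseed.util import get_record_information

    size = os.path.getsize(path)
    offset = 0
    while offset < size:
        # Assumption: every offset reached this way is the start of a
        # valid data record.
        info = get_record_information(path, offset=offset)
        yield info["starttime"], info["npts"]
        offset += info["record_length"]


def mean_sample_interval(headers):
    """Average sample spacing implied by the first and last record start times.

    Needs at least two records; start times may be UTCDateTime objects or
    plain seconds, as long as subtracting them gives seconds.
    """
    headers = list(headers)
    first_start = headers[0][0]
    last_start = headers[-1][0]
    samples_before_last = sum(npts for _, npts in headers[:-1])
    return (last_start - first_start) / samples_before_last


# Pure-Python check with synthetic headers: 10 samples per record,
# each record starting 101 s after the previous one.
example = [(0.0, 10), (101.0, 10), (202.0, 10)]
print(mean_sample_interval(example))
```

With a real file, `mean_sample_interval(iter_record_headers(path))` would give an interval whose inverse could be used as the sampling rate, subject to the same caveats as the workaround above.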

2 participants