Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed signature support to handle Delta-persistent objects #271

Merged
merged 15 commits into from Apr 27, 2024

Commits on Feb 23, 2024

  1. Added debug statements and test code to try to find the bug

    The "logic error in PersistentRegistry" warning gets printed in Cascade
    whenever an RPC update changes only a non-signed field, and not the
    signed field. This seems to indicate that the sign() method is being
    called on a version that does not exist in the signed field's log, and
    is trying to add a signature to this nonexistent version. I'm trying to
    reproduce this bug in Derecho's signed_log_test so I can fix it here.
    etremel committed Feb 23, 2024
    Configuration menu
    Copy the full SHA
    52c9226 View commit details
    Browse the repository at this point in the history

Commits on Feb 27, 2024

  1. Further testing and logging changes to find the bug

    It turns out the error only shows up when the signed persistent field is
    also a DeltaSupport field, meaning it's possible for it to create a
    version with no data (because the delta is empty). Regular non-Delta
    signed fields always create a version with their current state even when
    a non-signed field originates the makeVersion call, so mixing regular
    signed and non-signed fields doesn't create a log with gaps.
    etremel committed Feb 27, 2024
    Configuration menu
    Copy the full SHA
    be3b985 View commit details
    Browse the repository at this point in the history

Commits on Mar 6, 2024

  1. Fixed sign and handle_verify_request to skip empty versions

    As identified by the new signed_log_test, Delta-supporting signed
    fields create a problem for the sign() method because they can end up
    saving no data to the log when makeVersion is called. In particular, if
    makeVersion is called when only non-signed fields were updated, the
    non-signed fields will get a log entry at that version, but the
    Delta-suporting signed field will just advance its current version
    without creating a log entry (since there is no Delta data). Then when
    PersistentRegistry::sign() is called, getMinimumVersionAfter() will
    return the new version that is in the non-signed log as
    cur_nonempty_version, but updateSignature won't sign any data for this
    version because it doesn't exist in the signed field's log. This results
    in the "logic error" message because no data was signed for
    cur_nonempty_version.
    
    The most direct solution is to simply write a new version of
    getMinimumVersionAfter() that's just for sign(), called
    getNextSignedVersion(), which only calls getNextVersionOf() on fields
    that have signatures enabled. The getNextVersionOf() method already
    correctly skips ahead to the next version with nonempty log data for
    signed fields, and if no non-signed fields get to return their versions,
    the overall minimum must be a version that actually has data in the log
    of a signed field.
    
    Fixing the same problem in handle_verify_request also involved fixing
    a longstanding performance flaw. Instead of doing a public-key
    verification on the signature posted to the SST by another replica, we
    should just compare that signature to the signature in the local
    replica's log on the same version - they should be bytewise identical
    since both replicas have the same signing key. With this change made, we
    should also ensure that we only do the equality check if both replicas
    actually have a signature for the requested version; a version that is
    in the persistent log but not in any signed fields doesn't need to be
    verified.
    
    Finally, this commit removes noexcept(true) from the PersistLog
    constructor because the from_pem_private method (called to determine the
    signature length) can throw an exception if the PEM file is not found.
    etremel committed Mar 6, 2024
    Configuration menu
    Copy the full SHA
    8159dc7 View commit details
    Browse the repository at this point in the history

Commits on Mar 11, 2024

  1. Rewrote Replicated<T>::persist to separate out a "sign" method

    Now that the signed_num can be different from the persisted_num
    (smaller, if the current persisted_num has no signed data), it no longer
    makes sense to use the single return value from Replicated::persist as
    the new version that is "signed and persisted" - a single persist
    request may result in a different persisted_num and signed_num.
    PersistenceManager was the only part of the code calling the
    Replicated::persist method, so I refactored it into separate "persist"
    and "sign" methods that PersistenceManager calls in a loop, replacing
    the sign-and-persist loop that used to be within Replicated::persist.
    This allows PersistenceManger to learn both the latest persisted version
    and the latest signed version, and update persisted_num and signed_num
    separately.
    
    While doing this, I re-arranged replicated.hpp and added some comments
    to more clearly document that some of the "public" methods are in fact
    only used by internal Derecho components and should not be relied on as
    a stable API.
    
    Also updated signed_log_test to correctly detect the end of the test,
    now that the last persisted version might not equal the last signed
    version - if the last update to MixedFieldObject is to the unsigned
    field, the version resulting from that update will never be signed.
    etremel committed Mar 11, 2024
    Configuration menu
    Copy the full SHA
    0e0a26e View commit details
    Browse the repository at this point in the history
  2. Minor fix to Derecho header includes

    I noticed that a few internal files are still including other Derecho
    headers with the "system" (angle-brackets) syntax instead of the "local"
    (quotes) syntax. Library files should always include other library files
    with the "local" syntax. Putting this in a separate commit since it's
    unrelated to my other work on sign() and persist().
    etremel committed Mar 11, 2024
    Configuration menu
    Copy the full SHA
    4b7f649 View commit details
    Browse the repository at this point in the history

Commits on Mar 28, 2024

  1. Changed getMinimumLatestVersion to getCurrentVersion

    Following a discussion with Weijia where we confirmed that the
    signatures system was the only part of Derecho that used
    getMinimumLatestVersion, I decided to change this method to return the
    current "in-memory" version of the Persistent fields as reported by the
    currMetaHeader version, rather than the latest version of the Persistent
    fields that corresponds to a log entry. I renamed the method to
    getCurrentVersion to reflect this new meaning, and added a
    getCurrentVersion method to the PersistLog API that would get the
    meta-header version. The original getLatestVersion still exists and has
    the same meaning, since the latest version in a log entry is still a
    piece of information someone might want to know.
    etremel committed Mar 28, 2024
    Configuration menu
    Copy the full SHA
    0cb6ff6 View commit details
    Browse the repository at this point in the history

Commits on Mar 29, 2024

  1. Fixed bug in handling signature requests for a version already signed

    Since the sign() method, like the persist() method, can end up signing a
    later version than the one requested (due to batching), it's possible
    for a later persistence request to call sign() only to find that an
    earlier persistence request already signed the latest version - in other
    words, there's nothing for PersistentRegistry::sign() to do because the
    latest version is already signed. In this case,
    PersistentRegistry::sign() would return lastSignedVersion without
    placing any data into the signature_buffer, because the main while loop
    would do nothing. This would cause PersistenceManager to get an empty
    buffer as the "latest signature" and then place that empty buffer in the
    SST's signature field.
    
    I added a special case to PersistentRegistry::sign() to make it return
    the cached latest signature if the method is a no-op. I also changed
    handle_persist_request to check if signed_version has actually advanced
    before updating the SST, to avoid redundantly re-copying the latest
    version's signature to the SST if it was already placed there by a
    previous persistence request.
    
    Also changed handle_verify_request to check the local node's version of
    signed_num in the SST to determine if there's a local signature to check
    against when verifying another node's signature. This is more reliable
    than trying to retrieve the local signature and testing for failure,
    because get_signature can "succeed" but return a signature of all zeroes
    if the desired log entry has been created but not yet signed.
    etremel committed Mar 29, 2024
    Configuration menu
    Copy the full SHA
    75c4dc9 View commit details
    Browse the repository at this point in the history
  2. Removed extra-verbose debugging statements

    Now that the signatures are verifying correctly, I'll stop dumping the
    entire hex value of each signature into the log. Even at the trace level
    that's a bit much for anything other than finding this particular bug.
    etremel committed Mar 29, 2024
    Configuration menu
    Copy the full SHA
    d18a271 View commit details
    Browse the repository at this point in the history

Commits on Apr 2, 2024

  1. Removed unused version parameter from sign()

    This seems to work and produces no warnings, which means we no longer
    need the "version" parameter from PersistenceManager as even a "hint" to
    the Replicated::sign() method - it can just use getCurrentVersion().
    etremel committed Apr 2, 2024
    Configuration menu
    Copy the full SHA
    fb1c954 View commit details
    Browse the repository at this point in the history

Commits on Apr 3, 2024

  1. Rewrote persist() to persist an exact version or the latest version

    The "version" argument to the various persist() methods actually didn't
    do anything because FilePersistLog's persist() ignored the argument and
    always persisted the latest version. After discussion with Weijia, I
    rewrote FilePersistLog::persist() to actually use the argument to limit
    the range of data it persists, but also to make the argument optional
    and revert to the old behavior if it is not provided. This means callers
    that want the batching behavior don't have to guess at the correct
    "latest" version to provide as an argument (they can just provide
    std::nullopt), and callers that want only a certain version persisted
    will not end up persisting a later version than they wanted.
    
    Most of the "code changes" in FilePersistLog::persist() come from me
    manually inlining the NEXT_LOG_ENTRY, NEXT_LOG_ENTRY_PERS, and NEXT_DATA
    macros so that I could figure out what they were doing. The substantial
    change is to replace all references to m_currMetaHeader in those macros
    with shadow_header, which is a modified copy of m_currMetaHeader. If the
    latest_version argument is provided, shadow_header is modified to
    "point" to an earlier log entry, namely the one corresponding to
    latest_version. If latest_version is not provided, shadow_header equals
    m_currMetaHeader and the function should behave exactly as before.
    Either way, shadow_header becomes the next m_persMetaHeader after the
    data is flushed, so the m_persMetaHeader accurately reflects which log
    entries were persisted.
    etremel committed Apr 3, 2024
    Configuration menu
    Copy the full SHA
    d9058fb View commit details
    Browse the repository at this point in the history
  2. Separated verification requests into their own thread

    Interleaving persistence and verification requests in a single queue
    processed by a single thread makes it harder for the thread to skip
    ahead and discard obsolete persistence requests when the persisted
    number advances due to batching. There's also no good reason for these
    requests to be handled by the same thread, since they are independent of
    one another (and verification has very little to do with perisistence).
    etremel committed Apr 3, 2024
    Configuration menu
    Copy the full SHA
    51a068b View commit details
    Browse the repository at this point in the history

Commits on Apr 16, 2024

  1. Added a last_verified_version analogous to last_persisted_version

    The handle_verify_request method should also return early if the
    requested version has already been verified, like
    handle_persist_request.
    etremel committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    a92542c View commit details
    Browse the repository at this point in the history

Commits on Apr 17, 2024

  1. Configuration menu
    Copy the full SHA
    d357408 View commit details
    Browse the repository at this point in the history
  2. Fixed an overlooked instance of persist() with a version

    When I refactored persist() to have an optional version parameter
    (instead of a required one that was ignored), I changed the instances
    where it was called with "the latest version" to call it with nullopt
    instead, but I missed this one.
    etremel committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    cb95b7a View commit details
    Browse the repository at this point in the history

Commits on Apr 25, 2024

  1. Made persistent's _impl files include their headers

    Technically these includes are redundant since the bottom of the header
    file includes the _impl file, but this makes VSCode's parser happy and
    we do the same thing in group_impl.hpp.
    etremel committed Apr 25, 2024
    Configuration menu
    Copy the full SHA
    0590177 View commit details
    Browse the repository at this point in the history