Derive the ChaCha20 key from the long-term key and the data hash #157

Open
DemiMarie opened this issue Apr 28, 2023 · 6 comments
Labels
help wanted, research

Comments

@DemiMarie

Currently, Wyng uses XChaCha20 with a nonce derived from the system time, a counter, and random data. I am not aware of any way to exploit this, but it is more complex than I would like. However, Wyng archives are content-addressed, meaning that the hash of an object being retrieved is always known at the time of retrieval. This allows a much simpler, deterministic solution: generate the ChaCha20 key and initial nonce from that hash and the long-term key using a pseudorandom function (PRF).

There are many good PRFs that could be used, but I recommend using BLAKE2b with the long-term key (after key stretching, etc.) as the key, the hash of the plaintext to be encrypted (concatenated with some randomness) as the data, and something like "Wyng Backup Key Derivation" as the personalization string. This produces a 512-bit secret, which is more than enough for a 256-bit ChaCha20 key and a 96-bit ChaCha20 nonce. This provides strong robustness: if the random part of the KDF input repeats, the adversary learns whether or not the same message was encrypted more than once, and if the random part does not repeat, the adversary learns nothing at all.
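
A minimal sketch of the derivation described above, using Python's standard `hashlib.blake2b`. The function name `derive_key_and_nonce`, the 16-byte random value, and the shortened personalization string are assumptions made for illustration. BLAKE2b limits the personalization field to 16 bytes, so the full phrase suggested above would have to be abbreviated. This is not Wyng's actual code.

```python
import hashlib
import os

# Hypothetical personalization string; BLAKE2b allows at most 16 bytes here.
PERSON = b"WyngBackupKDF"

def derive_key_and_nonce(long_term_key: bytes, content_hash: bytes, rnd: bytes):
    """Derive a per-object ChaCha20 key and nonce from one keyed BLAKE2b call."""
    # long_term_key is assumed to be already stretched and at most 64 bytes long.
    # content_hash and rnd are fixed-length, so plain concatenation is unambiguous.
    secret = hashlib.blake2b(
        content_hash + rnd,
        digest_size=64,      # 512-bit output
        key=long_term_key,
        person=PERSON,
    ).digest()
    key = secret[:32]        # 256-bit ChaCha20 key
    nonce = secret[32:44]    # 96-bit ChaCha20 nonce
    return key, nonce

long_term_key = os.urandom(32)
content_hash = hashlib.blake2b(b"chunk data", digest_size=32).digest()
rnd = os.urandom(16)
key, nonce = derive_key_and_nonce(long_term_key, content_hash, rnd)
```

The remaining 20 bytes of the 64-byte output are simply discarded in this sketch.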

@marmarek

> it is more complex than I would like

Based just on reading your description above, it sounds like your proposed solution is significantly more complex than the thing you consider too complex. And sounds dangerously close to rolling your own crypto, which definitely would be a bad idea.

@tasket
Owner

tasket commented Apr 28, 2023

The initial premise is tempting as a way to save CPU cycles: Wyng already treats its internal hash lists as each object's identity, so why not incorporate them in nonces as well? I will say, it's not as complicated as managing a key database.

OTOH, libsodium's recommendation for nonce-reuse resistance is simpler in that it only takes rnd || plaintext as input to the hash... there is no hash-of-a-hash occurring (the effects of which, if any, are unknown to me).

We could go with the libsodium recommendation, which hopefully won't be as slow as AES-256-SIV, and be confident about safety. The Wyng format already makes the presence of repeat chunks noticeable when deduplicating, so we're not really losing confidentiality.

@DemiMarie
Author

> OTOH, libsodium's recommendation for nonce-reuse resistance is simpler in that it only takes rnd || message as input to the hash... there is no hash-of-a-hash occurring (the effects of which, if any, are unknown to me).

Using hash(rnd || hash(message)) instead of hash(rnd || message) has no impact on security so long as hash is collision resistant, which is a safe assumption in this case. However, hash(rnd || hash(message)) has the major advantage that hash(message) is already known when it comes time for decryption. This means that one only needs to hash the message once (instead of twice) and that one only needs to store rnd (as opposed to hash(rnd || hash(message))).
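
The difference between the two constructions can be sketched as follows. This is an illustration under assumptions, not Wyng's or libsodium's code: the helper names, the 24-byte (XChaCha20-sized) nonce, and the choice of keyed BLAKE2b as the hash are all picked for the example.

```python
import hashlib
import os

def nonce_from_message(hash_key: bytes, rnd: bytes, message: bytes) -> bytes:
    # hash(rnd || message): needs the plaintext, so a reader cannot
    # recompute it before decrypting and the nonce itself must be stored.
    return hashlib.blake2b(rnd + message, digest_size=24, key=hash_key).digest()

def nonce_from_hash(hash_key: bytes, rnd: bytes, msg_hash: bytes) -> bytes:
    # hash(rnd || hash(message)): needs only values known before decryption.
    return hashlib.blake2b(rnd + msg_hash, digest_size=24, key=hash_key).digest()

hash_key = os.urandom(32)
message = b"chunk data"

# Writer: the content hash is computed once and recorded in the manifest anyway.
msg_hash = hashlib.blake2b(message, digest_size=32).digest()
rnd = os.urandom(16)
writer_nonce = nonce_from_hash(hash_key, rnd, msg_hash)

# Reader: has msg_hash from the manifest and rnd stored with the chunk,
# so it can rebuild the nonce without seeing the plaintext.
reader_nonce = nonce_from_hash(hash_key, rnd, msg_hash)
```

In the second variant only `rnd` has to be stored alongside the ciphertext, which is the storage saving described above.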

@tasket
Owner

tasket commented Apr 29, 2023

I'm currently trying out the libsodium method in Wyng. There's an opportunity for improving performance — see #159

My non-cryptographer understanding says you're probably right about the security of using existing BLAKE2 hashes with a stretched key. Although it would be better to have some published & reviewed literature to back it up, I'd like to consider this proposal an alternative possibility.

@DemiMarie
Author

> I'm currently trying out the libsodium method in Wyng. There's an opportunity for improving performance: see #159
>
> My non-cryptographer understanding says you're probably right about the security of using existing BLAKE2 hashes with a stretched key. Although it would be better to have some published & reviewed literature to back it up, I'd like to consider this proposal an alternative possibility.

I recommend using the keyed hash not only for the nonce but also for the key itself. This is even stronger than using the keyed hash for the nonce alone, as the amount of unique data fed to ChaCha20 is now 96 + 256 = 352 bits. The cryptographic justification for this is that XChaCha20 does something very similar internally: it uses 128 bits of the nonce and the 256-bit key to generate a temporary 256-bit key and uses this temporary key for the actual encryption. XChaCha20 uses HChaCha20 for the key derivation, but there is nothing special about HChaCha20 and any keyed pseudorandom function could be used. libsodium’s own key derivation function (crypto_kdf_derive_from_key()) is just a wrapper around BLAKE2b, so using BLAKE2b as a KDF is safe.
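
For reference, a sketch of a BLAKE2b-based subkey derivation in the style of `crypto_kdf_derive_from_key()`. The parameter layout used here (empty message, subkey id as a little-endian salt, 8-byte context as personalization) is an assumption about libsodium's construction and has not been checked byte-for-byte against it. The function name `derive_subkey` and the context string `WyngKeys` are hypothetical.

```python
import hashlib
import os

def derive_subkey(master_key: bytes, subkey_id: int, context: bytes,
                  length: int = 32) -> bytes:
    """Keyed BLAKE2b used as a KDF: one master key, many independent subkeys."""
    if len(context) != 8:
        raise ValueError("context must be exactly 8 bytes")
    return hashlib.blake2b(
        b"",                                   # no message; inputs go in the parameters
        digest_size=length,
        key=master_key,
        salt=subkey_id.to_bytes(8, "little"),  # shorter salts are zero-padded by hashlib
        person=context,
    ).digest()

master_key = os.urandom(32)
data_key = derive_subkey(master_key, 1, b"WyngKeys")
meta_key = derive_subkey(master_key, 2, b"WyngKeys")
```

Different subkey ids give unrelated-looking keys from the same master key, which is the property the proposal relies on when it derives a fresh ChaCha20 key per object.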

@tasket added the help wanted and research labels on May 12, 2023
@tasket
Owner

tasket commented May 12, 2023

As I mentioned in #159...

The manifest hashes are themselves encrypted, albeit under a separate key from the data; that could mean having to encrypt the metadata in a special way.

If I understand this correctly, a technical hurdle is created because we don't have a key to directly decrypt (meta)data. At the very least, the archive.ini root would need special treatment so we could get the hashes needed to start decrypting anything else.

That's why I prefer methods that focus on nonces and why Hk(rnd||pthash) appeals to me (though I would even add a derived, stretched key to fit BLAKE2b's key space for this specific purpose).
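
One way to read `Hk(rnd||pthash)` with a derived, stretched key is sketched below. Everything specific here is an assumption made for illustration: PBKDF2-HMAC-SHA256 as the stretching function, the iteration count, the salt handling, and the function names. None of it is taken from Wyng's code.

```python
import hashlib
import os

def stretch_nonce_key(passphrase: bytes, salt: bytes) -> bytes:
    # 64 bytes is the maximum BLAKE2b key length, so this fills the key space.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000, dklen=64)

def chunk_nonce(nonce_key: bytes, rnd: bytes, pthash: bytes) -> bytes:
    # H_k(rnd || pthash) with a 192-bit output, the XChaCha20 nonce size.
    return hashlib.blake2b(rnd + pthash, digest_size=24, key=nonce_key).digest()

salt = os.urandom(16)
nonce_key = stretch_nonce_key(b"example passphrase", salt)

pthash = hashlib.blake2b(b"chunk data", digest_size=32).digest()
rnd = os.urandom(16)
nonce = chunk_nonce(nonce_key, rnd, pthash)
```

Because only the nonce is derived this way, the data and metadata keys stay as they are, and `archive.ini` can still be decrypted without knowing any hashes first.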
