Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal] Round-trip PackagePart.CompressionOption, add CompressionLevel.Fast #102127

Open
edwardneal opened this issue May 12, 2024 · 6 comments
Labels
area-System.IO.Compression untriaged New issue has not been triaged by the area owner
Milestone

Comments

@edwardneal
Copy link
Contributor

This issue straddles System.IO.Compression and System.IO.Packaging, and is a behavioural regression between .NET and the .NET Framework.

We can use System.IO.Packaging to create a new Package and add PackageParts to it, specifying a CompressionOption on these PackageParts. When we write the Package to a ZIP file via ZipArchive, the new PackagePart is turned into a ZipArchiveEntry and the PackagePart's CompressionOption is mapped to a CompressionLevel. ZipArchiveEntry then passes that through to create the correct deflation stream and to set bits 1 & 2 in the file header's "general purpose bit flags". This behaviour works well.

The regression here comes when we load a Package from an existing stream and start to inspect each PackagePart's CompressionOption. In .NET Framework, Package calls GetCompressionOptionFromZipFileInfo, which maps bits 1 & 2 to a CompressionOption. Similarly, in .NET Package calls GetCompressionOptionFromZipFileInfo. This always returns CompressionOption.Normal, because ZipArchiveEntry doesn't expose its CompressionLevel publicly.

When the issue's closed, I'd like to be able to persist a Package containing a PackagePart with a given CompressionOption, then be able to read that CompressionOption back upon reading it.

There are three parts to this issue:

  1. An API request: I'd like ZipArchiveEntry to provide a read-only CompressionLevel property, and for the CompressionLevel enumeration to include a new member named Fast;
  2. Confirmation on the mapping from System.IO.Compression's CompressionLevel to System.IO.Packaging's CompressionOption;
  3. Discussion on the proposed new CompressionLevel member, and one CompressionOption/CompressionLevel mapping which is ambiguous without it.

API request

This request is primarily for a read-only CompressionLevel property on ZipArchiveEntry. I've chosen not to make it read-write - PackagePart.CompressionOption is read-only, so it's not needed for this specific use case. It might also imply that a ZIP file would be flagged for decompression and recompression (or simply decompressed) by setting it: I'm not sure how ZipArchiveEntry would handle a situation where the entry's header specifies one compression level, but the entry data is compressed with another; there'd definitely be a problem if its compression level was changed to NoCompression!

I've also rolled in a new enumeration member on CompressionLevel to address an ambiguous CompressionOption/CompressionLevel mapping.

namespace System.IO.Compression;

public class ZipArchiveEntry
{
+   public CompressionLevel CompressionLevel => _compressionLevel;
}

public enum CompressionLevel
{
+   Fast = 4
}

CompressionLevel -> CompressionOption mapping

This is different between .NET and .NET Framework, largely because .NET Framework has an extra member in its DeflateOptionEnum which doesn't neatly map to the current .NET equivalent of CompressionLevel at the moment.

Currently, the mapping from CompressionOption to CompressionLevel when writing a PackagePart is:

Specified CompressionOption .NET CompressionLevel .NET Standard CompressionLevel .NET Framework DeflateOptionEnum
NotCompressed NoCompression NoCompression None
Normal Optimal Optimal Normal
Maximum SmallestSize Optimal Maximum
Fast Fastest Fastest Fast
SuperFast Fastest Fastest SuperFast

If the new CompressionLevel enumeration member is approved, I'd change CompressionOption.Fast to map to CompressionLevel.Fast.

.NET Standard has a different mapping for CompressionOption.Maximum to avoid introducing a breaking change for .NET Framework applications referencing System.IO.Packaging.

The mapping from CompressionLevel to CompressionOption when reading a Package needs to roughly correlate with the table above so that values roundtrip properly.

Read CompressionLevel Resultant CompressionOption
Optimal Normal
Fast (if approved) Fast
Fastest SuperFast
NoCompression NotCompressed
SmallestSize Maximum

To continue to avoid a breaking change for .NET Framework applications, .NET Standard would continue to return Optimal in all cases.

New CompressionLevel member, otherwise-ambiguous CompressionOption/CompressionLevel mapping

At present, creating a PackagePart with a CompressionOption of Fast or SuperFast will result in a ZipArchiveEntry with a CompressionLevel of Fastest, setting general purpose bits 1 & 2. This ambiguity means that somebody could create a PackagePart with a CompressionOption of Fast, and it would subsequently be read back with a CompressionOption of Fastest.

This ambiguity arises because the CompressionLevel enumeration doesn't have a member named Fast (or something similarly named, halfway between Optimal and Fastest.) This isn't ideal, so my API request asks for this new member.

Adding a new CompressionLevel also requires selecting the correct ZLib compression level, and while I'd guess that 4 would be a good compromise between Optimal (6) and Fastest (1), I don't have anything quantitative to back that. I'm also not sure what impact the pending switch of ZLib implementation would have on this.

cc @dotnet/area-system-io-compression @carlossanlop

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 12, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

@ericstj
Copy link
Member

ericstj commented May 21, 2024

Why does this matter if the value was completely ignored by the underlying implementation? Seems like a lot of effort just to flow an enum to pass a test. What's is the material side effect of not having this? What's broken?

@edwardneal
Copy link
Contributor Author

It's definitely a corner case, but it's not solely to pass a test. This is the API proposal needed to fix two bugs, both of which are regressions between .NET Framework and .NET:

  1. When creating a package part, .NET Framework maps the part's CompressionOption to a CompressionLevel according to the OPC specification. .NET Core does not, because it doesn't have an appropriate CompressionLevel enumeration value to use. This is why I've asked to add CompressionLevel.Fast.
  2. Although both .NET and .NET Framework support retrieving the package part's CompressionOption, only .NET Framework actually returns the real value. I'd describe this as a bug - .NET's PackagePart.CompressionOption returns a misleading value to clients. I've asked for a public ZipArchiveEntry.CompressionLevel property so that I can fix this.

@ericstj
Copy link
Member

ericstj commented May 22, 2024

only .NET Framework actually returns the real value

But it's not actually a real value. It's just a persisted value. The implementation never honored it.

@ericstj
Copy link
Member

ericstj commented May 22, 2024

cc @carlossanlop since he was helping with #98278

At the end of the day I'm not opposed to making things look more compatible - just making sure that we spend our energy in the places that matter most. @edwardneal if you feel this is important and a good use of time we can help you get a proposal reviewed.

@carlossanlop carlossanlop added this to the Future milestone May 22, 2024
@edwardneal
Copy link
Contributor Author

Thanks @ericstj. It's not preventing me from porting of anything from .NET Framework to .NET - although the constant value returned by PackagePart.CompressionOption is a bug, it's one which has been around since the original code was ported in 2015. I think it'll be slightly more noticeable now that the CompressionOption we specify when adding a PackagePart is mapped through to the correct bits in the ZIP entry.

If time's particularly short with .NET 9's release schedule then it can probably wait for .NET 10, or any "washup" API reviews. Given the choice of time from someone in the Packaging area, I'd personally prioritise #97898 over this - it fixes a bug which completely prevents .NET applications from reading valid XPS packages, and which seems to be blocking a real-world application.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO.Compression untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

3 participants