Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed invalid extraction of image dimensions for HEIF files w/ multiple images #445

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rjean-gilles
Copy link

As described in issue #373, when an HEIF file contains multiple images, metadata-extractor will just return the size of the first encountered image, which is often not what we want.
In particular for HEIF pictures taken with an iPhone, the library will report a size of 512x512 no matter the actual size. The ideal solution would be (as described in #373) to have the library report the sizes for each included image, but this would certainly require a major overhaul.
Short of doing that, we would at least like to have the correct size, that is the one of the "main image". If we just assume that the main image is the biggest one, then it is easy to do, and this is exactly what this pull request is about.
This might not cover all the actual use cases, but it is already much better than the current behaviour. In particular, I checked that it works correctly with my sample HEIF pictures taken with a recent iPhone.
This PR also adds a corresponding test, along with some test files taken from http://nokiatech.github.io/heif/examples.html.
The tests testDerivedImageGrid and testDerivedImageOverlay failed without the change, and now succeed (while not taken from an iPhone, they exhibit the same image size detection issue).

@payton
Copy link
Collaborator

payton commented Nov 8, 2019

I won't have a chance to check until next week as I am heading out of town, but great work nonetheless! Super exciting to see HEIF getting some support.

I think there may be an ownership issue with including the Nokia photos. I would request that those be removed and we only add photos taken by contributors (and explicitly given permission by the contributor). Unless of course Nokia gives free use of those photos to everyone. I believe @drewnoakes and I had a conversation about this at some point and concluded not to use the Nokia photos.

@rjean-gilles
Copy link
Author

I was actually kind of expecting your concerns regarding the nokia test images. In fact I hesitated including them, but the thing is it's hard to come by with a better set of test images, as they cover many different use cases. I would not expect Nokia to take any offence as they are test samples specifically used to demonstrate the HEIF file format, getting an explicit waiver is another story so you're probably right to not want to take any chance. Unfortunately removing them means a weaker coverage, as I can personnaly only produce test pictures taken from an iPhone. I guess that's better than nothing. I'll contact them and ask for permission, you never know. If I get no answer, I'll remove them and just put two iPhone HEIF files.

@drewnoakes
Copy link
Owner

@rjean-gilles thanks for this PR. As @payton mentions, we need to be careful about image permissioning. Further, we do not want to be adding image data to this repo, as it will quickly get out of hand and make the repo enormous. There are a few small data files used in some unit tests from the early days of the project, but nowadays we mostly use the separate image library for regression tests. In practice it's more thorough than unit tests and catches issues that would otherwise be missed. So would you be able to remove those test files and any unit tests that rely on them from this PR please?

// While it is not guaranteed to be correct, this is a pretty safe bet and seems to be validated
// by actual images found in the wild (additional images in practice are often thumbnails or overlays,
// that are going to be smaller that the main image).
// Note that here we also assume that if width is bigger than the preexisting value, so is height.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually looking at this further I think there is a different problem here. We have other image formats that can contain multiple images (e.g. GIF, JPEG, TIFF). In such cases, we produce a directory per image. These kinds of assumptions don't tend to work out well for all users. Instead it's best to reflect the actual data in the directories we produce. Fixing this is probably a larger change, but it would be correct for all users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree that there should be multiple directories, but I didn't say anything because I assumed that there was some characteristic of the HEIF files that meant that this didn't make sense. If this is a simple "multi image container" situation, multiple directories should be generated. Developers would then make their own logic for picking which one they are interested in, if only one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, as I hinted in my summary the ideal behaviour would be for the library to report all the included images obviously, which (as I said) will probably require a major overhaul if I'm not mistaken. Which means it might not come soon. In the meantime, the library as it is basically pretends that there is only one image in the HEIC file when there might actually be more. The only change that this PR does is to acknowledge that limitation and make it somewhat useful by at least making sure that we report information about the "main" image.

So we're just going from a situation where we just cannot use the library at all (as is) if we ever want to handle HEIC files coming from an iPhone (and let's be frank, that's the main reason today to bother supporting it), to a behaviour that while imperfect (because the lib pretends that there is only one image) will report the correct dimensions for all those iPhone HEIC photos. This is basically a worthwile compromise as I see it (and a temporary one at that, until the correct solution is implemented).

As for the test images, I have tried to contact Nokia about the permission to use their sample images and got no reply. That was more than a week ago, so I was about to replace them with some iPhone images that I would take myself, but sure, I can remove all that and keep the tests to myself, no problem.

Would you then consider approving the PR?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to validate the assumption that the correct change here is a major overhaul.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.
Looking forward to your conclusion.

@rjean-gilles
Copy link
Author

Hello, and happy new year.
Any news on whether the correct change here really is a major overhaul?
If it is, do you plan to merge this PR?
And if it is not, do you have any estimate on when the correct solution might be released?

@drewnoakes
Copy link
Owner

@rjean-gilles I haven't had a chance to look at this in a lot of detail but I'm inclined to hold off on merging this in favour of the 'correct' approach, as this workaround only works for dimensions, meaning the returned directory includes a mixture of metadata from different images which is bound to result in further error reports.

If someone wants to pick up the task of returning a directory-per-image it'd be appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants