imread on S3 stream fails with PIL.UnidentifiedImageError #1022

Open
smidm opened this issue Jul 24, 2023 · 2 comments

smidm commented Jul 24, 2023

Opening an image from an S3 stream fails:

import boto3
import imageio.v3 as iio

s3 = boto3.resource('s3')
file_stream = s3.Bucket("somebucket").Object("sample.jpg").get()['Body']
iio.imread(file_stream)
Traceback (most recent call last):
  File "/home/matej/local/conda/envs/pose_service/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-14-221e4ecde8ec>", line 3, in <module>
    iio.imread(file_stream)
  File "/home/matej/local/conda/envs/pose_service/lib/python3.9/site-packages/imageio/v3.py", line 53, in imread
    with imopen(uri, "r", **plugin_kwargs) as img_file:
  File "/home/matej/local/conda/envs/pose_service/lib/python3.9/site-packages/imageio/core/imopen.py", line 237, in imopen
    plugin_instance = config.plugin_class(request, **kwargs)
  File "/home/matej/local/conda/envs/pose_service/lib/python3.9/site-packages/imageio/plugins/pillow.py", line 95, in __init__
    self._image = Image.open(self._request.get_file())
  File "/home/matej/local/conda/envs/pose_service/lib/python3.9/site-packages/PIL/Image.py", line 3186, in open
    :exception FileNotFoundError: If the file cannot be found.
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f7cd5f31d60>

Opening the same stream directly with PIL works as expected:

import PIL.Image
img = PIL.Image.open(file_stream)

The reason is that the stream, a botocore.response.StreamingBody, is not seekable, and the imageio pillow plugin opens the file twice: once for identification and again for the actual read. On the second open, pillow tries seek(0), which silently fails. It then reads from the already consumed stream, ends up with an empty BytesIO buffer, and fails later when it tries to identify the image.

The pillow documentation states:

The file object must implement file.read, file.seek, and file.tell methods, and be opened in binary mode. The file object will also seek to zero before reading.

It works without seeking (at least with JPEGs), but only once.
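The "only once" behaviour can be reproduced without S3. The sketch below uses a hypothetical stand-in class, `OneShotStream`, which is not part of boto3 or imageio. It assumes only that the real stream behaves like a forward-only reader with no way to rewind.

```python
import io


class OneShotStream:
    """Hypothetical stand-in for a non-seekable network stream.

    Data can be read forward once; there is no seek() or tell().
    """

    def __init__(self, payload: bytes):
        self._buffer = io.BytesIO(payload)

    def read(self, size: int = -1) -> bytes:
        # Each call consumes bytes; nothing ever rewinds the position.
        return self._buffer.read(size)


payload = b"\xff\xd8\xff\xe0 fake JPEG bytes"
stream = OneShotStream(payload)

first_pass = stream.read()   # consumes the whole stream
second_pass = stream.read()  # nothing is left to read

print(len(first_pass), len(second_pass))
```

The second read returns an empty bytes object, which matches the empty buffer described above: any consumer that opens the stream twice sees no data the second time.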

smidm commented Jul 26, 2023

A related problem occurs when calling immeta with the pyav plugin on a video in an S3 stream:

    iio.immeta(
  File ".../lib/python3.8/site-packages/imageio/v3.py", line 254, in immeta
    metadata = img_file.metadata(**call_kwargs)
  File ".../lib/python3.8/site-packages/imageio/plugins/pyav.py", line 743, in metadata
    "video_format": self._video_stream.codec_context.pix_fmt,
  File "av/video/codeccontext.pyx", line 103, in av.video.codeccontext.VideoCodecContext.pix_fmt.__get__
AttributeError: 'NoneType' object has no attribute 'name'

smidm commented Jul 26, 2023

A workaround is to download the S3 object to a temporary file. On many modern Linux systems /tmp is a tmpfs mount held in memory, so there shouldn't be much overhead in that case.

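import tempfile

import boto3
import imageio.v3 as iio

s3 = boto3.client('s3')
# s3_bucket and s3_key identify the object to read and must be defined by the caller
with tempfile.TemporaryFile(mode='w+b') as f:
    # Download the whole object into a seekable temporary file
    s3.download_fileobj(s3_bucket, str(s3_key), f)
    # Rewind so that imageio starts reading at the first byte
    f.seek(0)
    img = iio.imread(f)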
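When the object comfortably fits in memory, an alternative is to copy the stream into an io.BytesIO buffer, which supports read, seek and tell. This is a minimal sketch rather than anything imageio or boto3 provides: the helper name `buffer_stream` and its `chunk_size` parameter are made up for illustration, and the only assumption about the input is that it has a `read(n)` method.

```python
import io
import shutil


def buffer_stream(stream, chunk_size: int = 1024 * 1024) -> io.BytesIO:
    """Copy a forward-only stream into a seekable in-memory buffer."""
    buffer = io.BytesIO()
    # copyfileobj only calls stream.read(n), so no seek support is required.
    shutil.copyfileobj(stream, buffer, chunk_size)
    buffer.seek(0)  # rewind so the consumer starts at the first byte
    return buffer
```

With the setup from the original report, the intended call would be `iio.imread(buffer_stream(file_stream))`. This has not been verified against a real bucket here. Unlike the temporary-file approach, it always holds the full object in RAM.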
