Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive use of smart_str() #718

Open
jnm opened this issue May 25, 2021 · 0 comments
Open

Excessive use of smart_str() #718

jnm opened this issue May 25, 2021 · 0 comments

Comments

@jnm
Copy link
Member

jnm commented May 25, 2021

I realize that dealing with Python 2-to-3 unicode issues was painful, but we ought to know whether we're dealing with a bytes or a str based on the context and avoid using smart_str() as a cure-all. Instead, let's use encode('utf-8') or decode('utf-8')—or nothing at all—where possible.

Consider this example:

with open(path, 'rb') as f:
data = json.loads(smart_str(f.read()))
request = self.factory.post('/submission', data, format='json')
response = self.view(request)
self.assertEqual(response.status_code, 401)
request = self.factory.post('/submission', data, format='json')
auth = DigestAuth('bob', 'bobbob')
request.META.update(auth(request.META, response))
response = self.view(request)
rendered_response = response.render()
self.assertTrue('error' in rendered_response.data)
self.assertTrue(smart_str(rendered_response.data['error']).
startswith("b'Received empty submission"))

  1. We can avoid smart_str(f.read()) by opening the file with the default mode of r instead of rb (line 167)
  2. The content of a Django HttpResponse is always a bytes, but in this case (line 178), the data attribute of a Django REST Framework Response is "The unrendered, serialized data of the response." Let's look to see how this is generated, especially since we're ending up with "b'Received empty submission", which looks like str() or repr() was called on a bytes instead of decode()-ing it.
  3. The create() method of the XFormSubmissionApi viewset bypasses the serializer and goes directly to error_response() when something bad happens:
    def error_response(self, error, is_json_request, request):
    if not error:
    error_msg = _("Unable to create submission.")
    status_code = status.HTTP_400_BAD_REQUEST
    elif isinstance(error, str):
    error_msg = error
    status_code = status.HTTP_400_BAD_REQUEST
    elif not is_json_request:
    return error
    else:
    error_msg = xml_error_re.search(smart_str(error.content)).groups()[0]
    status_code = error.status_code
    return Response({'error': smart_str(error_msg)},
    headers=self.get_openrosa_headers(request),
    status=status_code)

    Here we have two more calls to smart_str().
    • It seems like error could be a variety of things, but if we're assuming it has a content attribute (line 214), then it's probably an instance of HttpResponse or its subclass—in this case, it is a OpenRosaResponseBadRequest. Since HttpResponse.content is always a bytes, we can unambiguously call decode('utf-8') on it instead of smart_str().
    • On line 217, we actually know that error_msg is either _("Unable to create submission."), an instance of str, or the first matching group from xml_error_re.search(). Since xml_error_re is a string pattern, the argument to xml_error_re.search() cannot be anything but a str, and everything it returns will be a str as well. Demonstration:
      >>> re.compile('fun').search(b'funny')
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: cannot use a string pattern on a bytes-like object
      
      In all cases, error_msg is a str, and smart_str() is unnecessary.
  4. This still doesn't explain where the weird "b'Received empty submission" comes from. For that, we have to turn to OpenRosaResponse:
    class OpenRosaResponse(BaseOpenRosaResponse):
    status_code = 201
    def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # wrap content around xml
    self.content = '''<?xml version='1.0' encoding='UTF-8' ?>
    <OpenRosaResponse xmlns="http://openrosa.org/http/response">
    <message nature="">%s</message>
    </OpenRosaResponse>''' % self.content

    This is quite naughty, because HttpResponse.content should always be a bytes. The superclass does its part to ensure that, but then we run the bytes that it gave us through string interpolation with %s, which "converts any Python object using str()". Demonstration:
    >>> '''wow %s''' % b'yeah'
    "wow b'yeah'"
    
    What's the best way to proceed here? We could self.content.decode('utf-8'), do the string interpolation, and then re-encode the whole result, but that's yucky. This seems like a job for plain ol' concatenation: after all, bytes are just "immutable sequences of single bytes". Something like this would be fine:
    self.content = (
        b"<?xml version='1.0' encoding='UTF-8' ?>\n"
        b'<OpenRosaResponse xmlns="http://openrosa.org/http/response">\n'
        b'        <message nature="">'
    ) + self.content + (
        b'</message>\n'
        b'</OpenRosaResponse>'
    )
jnm added a commit that referenced this issue May 25, 2021
…and prefer explicit encoding/decoding over `smart_str()`. See #718
jnm added a commit that referenced this issue May 25, 2021
…and prefer explicit encoding/decoding over `smart_str()`. See #718
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant