Excessive use of `smart_str()` #718

jnm · 2021-05-25T05:11:36Z

I realize that dealing with Python 2-to-3 unicode issues was painful, but we ought to know whether we're dealing with a bytes or a str based on the context and avoid using smart_str() as a cure-all. Instead, let's use encode('utf-8') or decode('utf-8')—or nothing at all—where possible.

Consider this example:

kobocat/onadata/apps/api/tests/viewsets/test_xform_submission_api.py

Lines 166 to 179 in 67cdfa5

    
    with open(path, 'rb') as f:
        data = json.loads(smart_str(f.read()))
        request = self.factory.post('/submission', data, format='json')
        response = self.view(request)
        self.assertEqual(response.status_code, 401)
        request = self.factory.post('/submission', data, format='json')
        auth = DigestAuth('bob', 'bobbob')
        request.META.update(auth(request.META, response))
        response = self.view(request)
        rendered_response = response.render()
        self.assertTrue('error' in rendered_response.data)
        self.assertTrue(smart_str(rendered_response.data['error']).
                         startswith("b'Received empty submission"))

We can avoid smart_str(f.read()) by opening the file with the default mode of r instead of rb (line 167), so that f.read() returns a str in the first place.
The content of a Django HttpResponse is always a bytes, but in this case (line 178), the data attribute of a Django REST Framework Response is "The unrendered, serialized data of the response." Let's look to see how this is generated, especially since we're ending up with "b'Received empty submission", which looks like str() or repr() was called on a bytes instead of decode()-ing it.
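
A minimal sketch of both points, using a temporary file in place of the test fixture (the payload here is made up for illustration). Opening in text mode yields a str directly, and since Python 3.6 json.loads() also accepts bytes, so even the rb variant would not need smart_str(). The last lines show what str() does to a bytes, which is the pattern that would produce a string like "b'Received empty submission".

```python
import json
import os
import tempfile

# Hypothetical stand-in for the fixture file used by the test.
payload = {'submission': {'name': 'example'}}
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False,
                                 encoding='utf-8') as tmp:
    json.dump(payload, tmp)
    path = tmp.name

try:
    # Text mode: the file object yields str, so nothing needs converting.
    with open(path, encoding='utf-8') as f:
        data_from_text = json.load(f)

    # Binary mode also works, because json.loads() accepts bytes
    # (Python 3.6 and later).
    with open(path, 'rb') as f:
        data_from_bytes = json.loads(f.read())
finally:
    os.remove(path)

# str() on a bytes gives its repr, not its decoded text.
message = b'Received empty submission'
as_repr = str(message)
as_text = message.decode('utf-8')
```

With explicit decoding, the test could assert on 'Received empty submission' rather than on the repr of a bytes.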

The create() method of the XFormSubmissionApi viewset bypasses the serializer and goes directly to error_response() when something bad happens:

kobocat/onadata/apps/api/viewsets/xform_submission_api.py

Lines 204 to 219 in 67cdfa5

    
    def error_response(self, error, is_json_request, request):
        if not error:
            error_msg = _("Unable to create submission.")
            status_code = status.HTTP_400_BAD_REQUEST
        elif isinstance(error, str):
            error_msg = error
            status_code = status.HTTP_400_BAD_REQUEST
        elif not is_json_request:
            return error
        else:
            error_msg = xml_error_re.search(smart_str(error.content)).groups()[0]
            status_code = error.status_code
        return Response({'error': smart_str(error_msg)},
                        headers=self.get_openrosa_headers(request),
                        status=status_code)

Here we have two more calls to smart_str().

It seems like error could be a variety of things, but if we're assuming it has a content attribute (line 214), then it's probably an instance of HttpResponse or its subclass; in this case, it is an OpenRosaResponseBadRequest. Since HttpResponse.content is always a bytes, we can unambiguously call decode('utf-8') on it instead of smart_str().
On line 217, we actually know that error_msg is either _("Unable to create submission."), an instance of str, or the first matching group from xml_error_re.search(). Since xml_error_re is a string pattern, the argument to xml_error_re.search() cannot be anything but a str, and everything it returns will be a str as well. Demonstration:
```
>>> import re
>>> re.compile('fun').search(b'funny')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot use a string pattern on a bytes-like object
```
In all cases, error_msg is a str, and smart_str() is unnecessary.
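
As a rough sketch of how the message-building branches could look without smart_str(): the pattern assigned to xml_error_re below is an assumption (the real one is not shown in this issue), FakeBadRequest is a hypothetical stand-in for an HttpResponse subclass, and the `not is_json_request` branch and the DRF Response are left out to keep the example self-contained.

```python
import re

# Assumed pattern: the real xml_error_re is not shown in this issue.
xml_error_re = re.compile(r'<message[^>]*>(.*)</message>', re.DOTALL)


class FakeBadRequest:
    """Hypothetical stand-in for an HttpResponse subclass (content is bytes)."""
    status_code = 400
    content = (
        b"<?xml version='1.0' encoding='UTF-8' ?>\n"
        b'<OpenRosaResponse xmlns="http://openrosa.org/http/response">\n'
        b'        <message nature="">Received empty submission</message>\n'
        b'</OpenRosaResponse>'
    )


def extract_error(error):
    """Return (error_msg, status_code); error_msg is a str on every path."""
    if not error:
        return 'Unable to create submission.', 400
    if isinstance(error, str):
        return error, 400
    # content is bytes, so decode it explicitly before applying a str pattern.
    text = error.content.decode('utf-8')
    return xml_error_re.search(text).groups()[0], error.status_code
```

Every path returns a str, so the dictionary passed to Response can take error_msg as it is.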

This still doesn't explain where the weird "b'Received empty submission" comes from. For that, we have to turn to OpenRosaResponse:

kobocat/onadata/libs/utils/logger_tools.py

Lines 555 to 564 in 67cdfa5

    
    class OpenRosaResponse(BaseOpenRosaResponse):
        status_code = 201

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # wrap content around xml
            self.content = '''<?xml version='1.0' encoding='UTF-8' ?>
    <OpenRosaResponse xmlns="http://openrosa.org/http/response">
            <message nature="">%s</message>
    </OpenRosaResponse>''' % self.content

This is quite naughty, because HttpResponse.content should always be a bytes. The superclass does its part to ensure that, but then we run the bytes that it gave us through string interpolation with %s, which "converts any Python object using str()". Demonstration:

    >>> '''wow %s''' % b'yeah'
    "wow b'yeah'"

What's the best way to proceed here? We could self.content.decode('utf-8'), do the string interpolation, and then re-encode the whole result, but that's yucky. This seems like a job for plain ol' concatenation: after all, bytes are just "immutable sequences of single bytes". Something like this would be fine:

    self.content = (
        b"<?xml version='1.0' encoding='UTF-8' ?>\n"
        b'<OpenRosaResponse xmlns="http://openrosa.org/http/response">\n'
        b'        <message nature="">'
    ) + self.content + (
        b'</message>\n'
        b'</OpenRosaResponse>'
    )
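
To compare the options side by side, here is a small standalone sketch. The function names are made up for illustration. wrap_percent is an alternative not proposed above: on Python 3.5 and later, a bytes template also supports %-formatting, so the original layout could be kept by prefixing the literal with b.

```python
def wrap_buggy(content: bytes) -> str:
    # %s in a str template calls str() on the bytes, embedding its repr.
    return '<message nature="">%s</message>' % content


def wrap_concat(content: bytes) -> bytes:
    # Concatenation as proposed above: everything stays bytes.
    return (
        b"<?xml version='1.0' encoding='UTF-8' ?>\n"
        b'<OpenRosaResponse xmlns="http://openrosa.org/http/response">\n'
        b'        <message nature="">'
    ) + content + (
        b'</message>\n'
        b'</OpenRosaResponse>'
    )


def wrap_percent(content: bytes) -> bytes:
    # Alternative: bytes templates support %-formatting on Python 3.5+.
    return b'''<?xml version='1.0' encoding='UTF-8' ?>
<OpenRosaResponse xmlns="http://openrosa.org/http/response">
        <message nature="">%s</message>
</OpenRosaResponse>''' % content


body = b'Received empty submission'
```

Both wrap_concat(body) and wrap_percent(body) return bytes with the message embedded verbatim, whereas wrap_buggy(body) embeds the repr, quotes and b prefix included.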

jnm added a commit that referenced this issue May 25, 2021

Resolve encoding-related errors

77ab828

…and prefer explicit encoding/decoding over `smart_str()`. See #718

jnm added a commit that referenced this issue May 25, 2021

Resolve encoding-related errors

95dc2cb

…and prefer explicit encoding/decoding over `smart_str()`. See #718

jnm mentioned this issue May 25, 2021

Apply pyxform 1.5.1 upgrade to beta branch #719

Merged

jnm mentioned this issue Jun 15, 2021

Incorporate previously reviewed changes (for 2.021.24) #734

Merged
