strip unicode whitespace #4

Open
gaborbernat opened this issue Jul 18, 2018 · 5 comments

Comments

@gaborbernat

At the moment the parser uses the built-in string strip function (https://github.com/RonnyPfannschmidt/iniconfig/blob/master/iniconfig.py#L134). It is increasingly common for people to inadvertently introduce Unicode whitespace into their configuration, and the string strip function does not remove it. When we detect such characters we should at least warn (especially on the left-hand side), or fail. Leaving such characters in place is never the desired effect: it effectively generates a new key that looks identical to the intended one, leaving users confused about why their config does not work.

Note that the py package vendors this package, and through it pytest and tox are affected too (https://github.com/pytest-dev/py/blob/master/py/_vendored_packages/iniconfig.py).

Issues generated by this omission:

Detecting unicode spaces: https://stackoverflow.com/questions/8921365/in-python-how-to-list-all-characters-matched-by-posix-extended-regex-space/37903375#37903375
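
To make the failure mode concrete, here is a minimal Python 3 sketch. It assumes that a byte string is a fair stand-in for the native strings the parser handles on Python 2; the `config` dict and the `py36` value are purely illustrative and not taken from iniconfig.

```python
# bytes.strip() only knows ASCII whitespace, so the two UTF-8 bytes of a
# NO-BREAK SPACE (U+00A0) survive and become part of the key.
raw_key = "envlist\u00a0".encode("utf-8")
stripped_bytes = raw_key.strip()
assert stripped_bytes == b"envlist\xc2\xa0"

# Once decoded to text, str.strip() treats U+00A0 as whitespace.
stripped_text = raw_key.decode("utf-8").strip()
assert stripped_text == "envlist"

# Hypothetical lookup table: the visually identical key is a different key.
config = {stripped_bytes: "py36"}
assert b"envlist" not in config
```

The lookup for the plain `envlist` key misses even though both keys render the same in most editors.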

@RonnyPfannschmidt
Member

As is, iniconfig is not unicode aware, so this one is really tricky to fix.
It's definitely not on my own roadmap.

@gaborbernat
Author

gaborbernat commented Jul 18, 2018

@RonnyPfannschmidt what about doing strip as (in case the content read from the file is of type unicode):

>>> from unicodedata import name
>>> import sys
>>> import re
>>> spaces = u''.join(re.findall(r'\s', u''.join(unichr(c) for c in xrange(sys.maxunicode+1)), re.UNICODE))
>>> 'envlist '.strip(spaces)
u'envlist'
>>> name(u'\xA0')
'NO-BREAK SPACE'
>>> u'envlist\xA0'.strip(spaces)
u'envlist'

@gaborbernat
Author

An alternative solution would be to discourage such configs by warning when such lines are detected.
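
A rough sketch of what such a warning could look like on Python 3. The function names `find_unicode_whitespace` and `check_line` are hypothetical and not part of iniconfig, and the message format is only a suggestion.

```python
import unicodedata
import warnings


def find_unicode_whitespace(line):
    """Return (index, name) pairs for every non-ASCII whitespace character."""
    return [
        # Some whitespace code points have no Unicode name; fall back to U+XXXX.
        (index, unicodedata.name(ch, "U+%04X" % ord(ch)))
        for index, ch in enumerate(line)
        if ord(ch) > 127 and ch.isspace()
    ]


def check_line(lineno, line):
    """Warn once per suspicious character and return what was found."""
    hits = find_unicode_whitespace(line)
    for index, name in hits:
        warnings.warn(
            "line %d, column %d: unexpected %s" % (lineno, index + 1, name)
        )
    return hits
```

Calling `check_line(3, u"envlist\u00a0= py36")` would warn about a NO-BREAK SPACE at column 8. Turning the warning into a hard failure is then a matter of raising instead of warning.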

@RonnyPfannschmidt
Member

I am fine with any consistent solution - I just don't have the time/motivation to do the complete dance of ensuring it's consistent

right now iniconfig is not unicode aware and works in terms of native strings and ascii whitespace

note that your example generates the spaces in a pretty expensive manner
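
On the cost point, a cheaper variant is possible on Python 3, sketched here under the assumption that the content has already been decoded to `str`. The name `UNICODE_WHITESPACE` is made up for illustration. The table is built once with `str.isspace()` instead of running a regex over a joined string of every code point, and for plain stripping no table is needed at all, because `str.strip()` without arguments already removes Unicode whitespace.

```python
import sys

# Built once at import time; str.isspace() decides membership per character.
UNICODE_WHITESPACE = "".join(
    ch for ch in map(chr, range(sys.maxunicode + 1)) if ch.isspace()
)

# With an explicit table, mirroring the snippet above.
explicit = "envlist\u00a0".strip(UNICODE_WHITESPACE)

# Without a table: the no-argument form handles U+00A0 on its own.
implicit = "envlist\u00a0".strip()

# Caveat: ZERO WIDTH SPACE (U+200B) is not classified as whitespace,
# so neither form removes it.
zero_width = "envlist\u200b".strip()
```

Invisible characters that are not whitespace, such as U+200B, would still need the separate detection step discussed earlier.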

Jehops pushed a commit to Jehops/freebsd-ports-legacy that referenced this issue May 18, 2019
This package supports Python 3.x, so allow it accordingly. It is
required for an upcoming www/py-autobahn update (Python 2/3 compatible).

During QA, a UnicodeDecodeError was observed running tests under Python 3:

File "/usr/local/lib/python3.6/site-packages/py/_vendored_packages/iniconfig.py", line 82, in _parse
  for lineno, line in enumerate(line_iter):
File "/usr/local/lib/python3.6/encodings/ascii.py", line 26, in decode
  return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 165: ordinal not in range(128)

pytest uses the py package, which vendors the iniconfig package, which
isn't unicode aware [1][2][3]. Patch out unicode characters from setup.cfg
accordingly until it's resolved.

While investigating the cause of the above issue, a fix for setup.cfg's
encoding was identified, which removes the need to set the locale via
USE_LOCALE so remove it accordingly.

While I'm here:

  - Pet portlint, spurious space at end of line in pkg-descr
  - Add LICENSE_FILE/TEST_DEPENDS/test target/NO_ARCH

[1] pytest-dev/pytest#3799
[2] pytest-dev/iniconfig#5
[3] pytest-dev/iniconfig#4

portlint: OK (looks fine.)
porttest: OK (poudriere: 12amd64{py36,py27})
maketest: 215 passed, 1 skipped in 3.29 seconds (Python 2.7)
maketest: 209 passed, 7 skipped in 3.07 seconds (Python 3.6)

Approved by:	portmgr (blanket: ports/framework compliance)
MFH:		2019Q2


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@501964 35697150-7ecd-e111-bb59-0022644237b5
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue May 18, 2019
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue May 18, 2019
swills pushed a commit to swills/freebsd-ports that referenced this issue May 22, 2019
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue May 24, 2019
Approved by:	portmgr (blanket: ports/framework compliance)

Approved by:	ports-secteam (joneum, blanket: ports/framework compliance)
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Apr 1, 2021
Approved by:	portmgr (blanket: ports/framework compliance)

Approved by:	ports-secteam (joneum, blanket: ports/framework compliance)
@RonnyPfannschmidt
Member

@gaborbernat with the new release unicode support is in but unicode whitespace is not yet considered

svmhdvn pushed a commit to svmhdvn/freebsd-ports that referenced this issue Jan 10, 2024