On 15 December 2016 at 22:39, Toshio Kuratomi <a.badger(a)gmail.com> wrote:
* I'm not 100% certain that LC_CTYPE is the best thing to check.
People will set LC_CTYPE in conjunction with LC_COLLATE to C to get a
predictable sort order. (CTYPE is needed because bytes can be
interpreted as different characters and those differences can affect
sort order.) Changing this will mean that their command line sort
order (ls, sort, etc) could differ from python's sort order. I
haven't thought this fully through or a better way to check for our
actual meaning, though, and perhaps python already uses LC_CTYPE in
ways that would differ from other unix tools?
Yep, LC_CTYPE is the source of all the pain, as it's what controls the
encoding used for everything that CPython needs to decode before it
gets its own codec machinery bootstrapped. Victor Stinner made a
couple of attempts at overriding it later in the interpreter
bootstrapping process (e.g. based on environment variables) after the
codec system was fully up and running, and the problem you end up with is twofold:
1. Within CPython it's easy to lose track of how you decoded system
provided text like sys.argv, sys.warnoptions, sys._xoptions,
sys.executable, os.environ, etc, so "fixing" incorrectly decoded values
later is fragile
2. Even if you *do* get all the details right within CPython, you may
be in trouble again as soon as you call out to a third party C/C++
library, especially GUI toolkits like Tcl/Tk, Qt or Gtk that have a
lot of locale dependent behaviour
His conclusion was that letting the locale-as-seen-by-CPython diverge
from that seen by the rest of the process simply doesn't work, and I
don't have any reason to second guess that conclusion.
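To make the LC_CTYPE dependency concrete, here is a minimal sketch of how a Python process can report the encodings it derived from its environment. The helper names describe_text_encodings and is_ascii_codeset are made up for illustration; the locale, codecs and sys calls are standard library. It only inspects the current process, so it says nothing about what a linked C/C++ library would see.

```python
import codecs
import locale
import sys


def describe_text_encodings():
    """Report the encodings this process derived from its environment."""
    # Adopt the locale requested by LC_ALL / LC_CTYPE / LANG, much as a C
    # program calling setlocale(LC_CTYPE, "") would.
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unknown locale name; the process stays in its current locale
    # nl_langinfo is POSIX-only, so fall back on other platforms.
    if hasattr(locale, "nl_langinfo"):
        codeset = locale.nl_langinfo(locale.CODESET)
    else:
        codeset = locale.getpreferredencoding(False)
    return {
        "locale codeset": codeset,
        "filesystem encoding": sys.getfilesystemencoding(),
        "stdout encoding": getattr(sys.stdout, "encoding", None),
    }


def is_ascii_codeset(name):
    """Return True if a codeset name is just another spelling of ASCII."""
    try:
        return codecs.lookup(name).name == "ascii"
    except LookupError:
        return False  # not a codec Python knows about


if __name__ == "__main__":
    for label, value in describe_text_encodings().items():
        print(f"{label}: {value}")
```

Running this under different LC_ALL or LANG settings shows which of the values follow the locale on a given Python version; the codeset names themselves vary by platform, which is why the check goes through codecs.lookup rather than comparing strings.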
However, I'll also note that any tooling written in Rust or Go already
makes the "UTF-8 everywhere" assumption at the level of the language
design, so the proposed change would just move tools written in Python
3 into the same category as those written in those languages (unless
you set PYTHONALLOWCLOCALE to request the old behaviour).
* Thinking about whether this belongs in the library or the
interpreter some more I'm seeing some hefty cons in both directions.
As already noted, the con for doing it in the interpreter is that we
get out of sync with other things linking to libpython, therefore
making debugging harder.
Note that CPython already offers a range of "preconfiguration" APIs
that allow applications embedding the runtime to override otherwise
environment based configuration settings. In particular, one such API
was added specifically so Blender could just tell the Python 3 runtime
"configure the standard streams like *this*", rather than having to
persuade CPython to guess the right answer.
So the fact embedded runtimes can give you different results from what
you get at the command prompt isn't a *new* problem.
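As a rough illustration of the "environment based configuration" side of this, the sketch below launches a child interpreter with the PYTHONIOENCODING environment variable set and reports which codec the child picked for sys.stdout. This is not the embedding API discussed above, only the environment-variable counterpart that such an API would override; the helper name child_stdout_encoding is hypothetical.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter and return the codec it chose for sys.stdout."""
    # Copy the current environment and add the override for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    reported = result.stdout.decode("ascii").strip()
    # Normalise aliases such as "latin-1" to the canonical codec name.
    return codecs.lookup(reported).name


if __name__ == "__main__":
    for requested in ("utf-8", "latin-1"):
        print(requested, "->", child_stdout_encoding(requested))
```

The same interpreter binary gives different stream encodings depending on how it was configured, which is the point being made: divergence between an embedded runtime and the command prompt comes from configuration, not from anything new in this proposal.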
[Copying-and-pasting some of your comments from the other subthread to
consolidate the two discussions]
I'd almost say that internalizing the click behaviour could be the
correct design here. Having the library check that it has a locale with
non-ascii capabilities and fail if it doesn't would be helpful. That
would quickly point to differences in behaviours running under a
mod_wsgi vs /usr/bin/python, for instance, prompting the user to fix
the mod_wsgi deployment in advance.
While I don't like the idea of locale *coercion* inside the library,
I'd be fine with emitting a proper Python level warning inside
Py_Initialize after we get the warnings machinery up and running.
OTOH, users don't run into the
problem all the time (it depends on the data being processed and how
it is handled) so it seems heavy handed to do it this way
I think erroring out would be unduly harsh, but a warning seems
reasonable given the availability of C.UTF-8.
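For a sense of what "warn rather than error" could look like, here is a Python-level sketch. It is not the proposed Py_Initialize change itself, just an approximation: the function name warn_if_ascii_locale, the warning category and the message text are all assumptions made for illustration.

```python
import codecs
import warnings


def warn_if_ascii_locale(codeset):
    """Warn (rather than fail) when the locale's codeset is ASCII-only.

    Returns True if a warning was issued.
    """
    try:
        ascii_only = codecs.lookup(codeset).name == "ascii"
    except LookupError:
        ascii_only = False  # unknown codeset: nothing sensible to say
    if ascii_only:
        warnings.warn(
            f"locale encoding {codeset!r} is ASCII-only; consider a UTF-8 "
            "locale such as C.UTF-8",
            RuntimeWarning,
            stacklevel=2,
        )
    return ascii_only
```

A caller would pass in whatever the platform reports, for example locale.nl_langinfo(locale.CODESET) on POSIX systems. Because it is a regular warning, users who really do want an ASCII locale can silence it through the usual warnings filters instead of being blocked outright.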
(by the same argument I'd have to say that click is doing it wrong to
force users to address ascii-only locales...)
click is younger than Python 3, so Armin did make some initial
attempts to get it working in the C locale on both 2 & 3. However, he
eventually gave the latter up as unsupportable and the error makes it
clear that "I don't need to support ASCII based locales on Python 3"
is a key constraint in deciding whether or not to adopt click.
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia