These are chat archives for cherrypy/cherrypy

31st
Jul 2017
Jason R. Coombs
@jaraco
Jul 31 2017 16:32
@webknjaz You were asking privately about encodings in cherrypy. Better to have non-private discussions in open channels.
You suggested we should move to UTF-8 by default.
As that’s the standard in stdlib and other places.
And the instinct there is right. The concern I have is that for some things, the RFCs specify ISO-8859-1 as the default or presumed encoding.
Sviatoslav Sydorenko
@webknjaz
Jul 31 2017 16:33
sure, I just wanted you to get an email notification about the message. I often don't get notified about messages in public channels
I've done some testing while working on https://github.com/cherrypy/cheroot/pull/39/files
Jason R. Coombs
@jaraco
Jul 31 2017 16:34
Oh. I don’t get e-mail notifications about Gitter. I just have to notice the mention in my taskbar.
Sviatoslav Sydorenko
@webknjaz
Jul 31 2017 16:34
I see
Jason R. Coombs
@jaraco
Jul 31 2017 16:34
And that’s whether the mention is here or privately.
Sviatoslav Sydorenko
@webknjaz
Jul 31 2017 16:35
alright, got it
Jason R. Coombs
@jaraco
Jul 31 2017 16:36
But as to the issue at hand, I suspect the code conflates encoding in several domains (URLs, file system, transfer encodings, etc)… and probably over-conservatively used ISO-8859-1 as a default, following the cue of the RFCs.
So I don’t think it’ll be easy to simply switch default encodings throughout without creating a massive disruption and incompatibility.
But the way you can tell is: can you describe, in a specific, actionable way, what steps are required to maintain compatibility
(for a user upgrading to this new version)?
If you can do that, and we’re not violating any RFCs or broadly-accepted conventions, then I’m enthusiastically +1.
Sviatoslav Sydorenko
@webknjaz
Jul 31 2017 16:39
It's just a guess that might help us in the future.
I've got a specific bug now.

After adding 'привіт' (URL-encoded, of course) to the list of test URIs at https://github.com/cherrypy/cheroot/pull/39/files#diff-a64ed79ed6fea3f8134ace5167a81c9fR46 it returns 404. I added the corresponding method, inspected the WSGI app in test.helper, and it turned out that the PATH_INFO WSGI environ variable is populated incorrectly (encoded with UTF-8, decoded using Latin-1).

so simply changing the decoding line to bton(req.path, 'utf-8') fixed everything

I just didn't want it to be a hack applied in one place, making it hard to maintain
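
The mismatch described above can be reproduced outside cheroot. The following is a minimal sketch using only the standard library; the variable names and the one-entry route table are made up for illustration and are not cheroot or CherryPy code.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical route table standing in for the test app's handlers.
routes = {'/привіт': 'hello handler'}

# What a client sends: the path percent-encoded from its UTF-8 bytes.
request_target = quote('/привіт')

# What the server holds after percent-decoding: raw bytes.
raw_path = unquote_to_bytes(request_target)

# Decoding those bytes as ISO-8859-1 never fails, but yields mojibake.
path_latin1 = raw_path.decode('iso-8859-1')
# Decoding them as UTF-8 recovers the original text.
path_utf8 = raw_path.decode('utf-8')

# The Latin-1 variant misses the route (a 404 in a real app).
found_latin1 = routes.get(path_latin1)
found_utf8 = routes.get(path_utf8)
```

Here found_latin1 is None while found_utf8 is the handler, which matches the 404 symptom when the path is decoded with the wrong codec.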

Jason R. Coombs
@jaraco
Jul 31 2017 16:41
hmm.
Well, w.r.t. WSGI, there are several “standards”, some of which will work well with Unicode and others which may not, so it depends on which Gateway is in use (IIUC).
Sviatoslav Sydorenko
@webknjaz
Jul 31 2017 16:45
It's in Gateway 1.0
And yes, it follows the rules: PEP 3333 states that values must be native strings (so the bton call is correct, but the encoding used by default isn't)
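
For context on the "native strings" rule: on Python 3, PEP 3333 requires environ values to be str objects whose code points all fit in ISO-8859-1, so a Latin-1 decoded PATH_INFO can be reversed without loss. A minimal sketch of that round trip follows; the environ dict is built by hand for illustration and is not produced by cheroot.

```python
# Raw path bytes as they would arrive on the wire (UTF-8 encoded text).
raw_path = '/привіт'.encode('utf-8')

# Hand-built environ (an assumption for this sketch): bytes decoded as Latin-1.
environ = {'PATH_INFO': raw_path.decode('iso-8859-1')}

# Every code point is below 256, so re-encoding as Latin-1 gives back
# the exact original bytes, which can then be decoded as UTF-8.
recovered = environ['PATH_INFO'].encode('iso-8859-1').decode('utf-8')
```

This shows only that the Latin-1 form loses no information; it does not settle which layer should produce the UTF-8 text.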
Jason R. Coombs
@jaraco
Jul 31 2017 16:49
It sounds like you’re on the right track. I’d just be hesitant to change the default encoding throughout without understanding and communicating the implications. Therefore, your best bet is to focus on this particular case (simply changing the one decoding line). If you have more time, you could do the same call by call, inspecting bton (and similar) calls and adjusting them to use UTF-8 until that’s the default throughout. I don’t think it’s safe to do it any other way.
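
The call-by-call approach might look roughly like the sketch below. The bton defined here is a hypothetical stand-in written for this example (assuming a bytes-to-native-string helper with an ISO-8859-1 default, Python 3 only); it is not copied from cheroot or CherryPy.

```python
def bton(b, encoding='ISO-8859-1'):
    """Hypothetical stand-in: decode bytes to a native (Python 3) string."""
    return b.decode(encoding)

# Unchanged call site: still relies on the old default.
header_value = bton(b'caf\xe9')

# Migrated call site: the encoding is stated explicitly, so only this
# one call changes behaviour.
path = bton('/привіт'.encode('utf-8'), 'utf-8')
```

Keeping the default untouched means every other caller behaves as before, and each migrated call can be reviewed on its own.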
Sviatoslav Sydorenko
@webknjaz
Jul 31 2017 17:00
okay, I'll stick to this one call then