On Thu, 2007-12-20 at 04:22 +0800, Abel Cheung wrote:
Hi,
Hi,
My reply follows below, inline...
So is mine.
On Dec 17, 2007 7:22 AM, Behdad Esfahbod behdad@behdad.org wrote: [..........tons of quasi-maths ...........]
Secondly, you said that "contextual font selection" is a "cool" feature. I am wondering which languages benefit from this feature? (I believe there are some, but I just want to know.)
Pretty much every non-Latin script. In some situations even the Latin script.
Take the Unicode character U+002E FULL STOP, aka ASCII period. It is used in more than just Latin: in Arabic for example, in Hebrew, possibly in Indic and many other scripts. If it were not grouped with neighboring characters for font selection purposes, all those people would get their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while the periods at the end of sentences were assigned a different (default Latin, for example) font.
The same happens for Latin under a document tagged as non-Latin. It's not a luxury thing. It's just how things are supposed to work.
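The grouping described above can be sketched in a few lines of Python. This is only a toy model, not Pango's actual code: the code point table is a tiny hypothetical stand-in for the Unicode Script property, and the rule "inherit from the preceding character, otherwise from the following one" is an assumption made for illustration.

```python
# Toy model of script itemization: characters whose script is "Common"
# (periods, digits, spaces, brackets) take the script of a neighbour.
# SCRIPT_RANGES is a deliberately tiny, hypothetical table.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x4E00, 0x9FFF, "Han"),
]


def raw_script(ch):
    """Return the script of a single character, or "Common" if unknown."""
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Common"


def resolve_scripts(text):
    """Assign every character a script, resolving Common from neighbours."""
    scripts = [raw_script(ch) for ch in text]
    # Forward pass: a Common character inherits from what precedes it.
    for i in range(1, len(scripts)):
        if scripts[i] == "Common":
            scripts[i] = scripts[i - 1]
    # Backward pass: leading Common characters inherit from what follows.
    for i in range(len(scripts) - 2, -1, -1):
        if scripts[i] == "Common":
            scripts[i] = scripts[i + 1]
    return scripts


def script_runs(text):
    """Group the text into (script, substring) runs for font selection."""
    runs = []
    for ch, script in zip(text, resolve_scripts(text)):
        if runs and runs[-1][0] == script:
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]
```

With this model, an Arabic word followed by a period comes out as a single Arabic run, so the period is drawn from the same font as the word instead of from a default Latin font.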
That means font change depending on context is actually preferred in some fonts or some languages, is it? If that's true, then this would be a per-language preference: some want it, some don't.
So does pango support toggling this behavior yet? (I guess not?)
What do you exactly mean by "this behavior"? Which behavior? Show me the source code line. I'm getting tired of all the hand waving.
The main font issue though, is that Chinese (Simplified, Traditional), Korean, and Japanese share some Unicode code points, but they require slightly different renderings. Now if you don't tell Pango which version is preferred, how can it know which font to choose? It explicitly doesn't prefer any one over the others to avoid cultural problems.
The symptoms of this problem are "multiple fonts used in the same line". Solution is: Either run under a CJK locale, or give hints to Pango about your preferred CJK locale using the env var PANGO_LANGUAGE.
Note that theoretically Pango can do text analysis to come up with a best guess, but doing that would then introduce another bug with symptoms "changes font when typing a few characters on the same line".
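As a minimal sketch of the PANGO_LANGUAGE hint mentioned above, the following Python helper builds an environment for a child process with the variable set. The helper name and the application in the usage comment are hypothetical; only the PANGO_LANGUAGE variable itself comes from the discussion.

```python
import os


def env_with_pango_language(language, base_env=None):
    """Return a copy of the environment with PANGO_LANGUAGE set.

    `language` is the preferred language tag, e.g. "zh_CN" as in the
    message above. `base_env` defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PANGO_LANGUAGE"] = language
    return env


# Usage (the application name is only an example):
#   import subprocess
#   subprocess.run(["gedit"], env=env_with_pango_language("zh_CN"))
```

Setting the variable only for the launched process keeps the rest of the session's locale untouched, which is the point of using a hint instead of switching to a CJK locale.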
Let me set the record straight here. Most people seeing this problem are not exactly complaining about the font changing, but about the font changing TO SOME BAD LATIN GLYPH THEY DON'T LIKE. It is understood that font changing is almost unavoidable, since typing just a few characters may not provide enough information about what kind of font should be picked, and typing more gives more info. So far, is it determined per sentence, or per what?
Believe me, I know that. And I understand it if you don't WRITE IN CAPS too. Does it help if I say THEN GO REMOVE THE CRAPPY FONT?
[...]
Sadly this way absolutely won't satisfy everybody -- one party only. And in particular, the font picked is determined per glyph, causing a sentence to be rendered in a mix of multiple CJK fonts as described.
This is totally wrong. Pango first tags each piece of text with a language, then asks fontconfig to sort fonts for that language, then uses the sorted list to assign a font to each character. That is, if you mark your text zh_CN (by running under that locale, by setting PANGO_LANGUAGE to it, or by otherwise marking it), have a suitable font for that language, and, if you also have crappy fonts for it, have fontconfig configured to prefer the good one, then Pango chooses the right font. All the "bugs" you show me are in the steps mentioned above other than the one Pango itself performs.
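The three steps just described (tag with a language, sort fonts for that language, assign a font per character from the sorted list) can be modelled roughly as follows. This is a simplified sketch: the Font class, the font names and the sorting rule are all hypothetical, and the real fontconfig matching is considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Font:
    name: str
    languages: frozenset  # languages the font claims to support
    coverage: frozenset   # characters the font has glyphs for


def sort_fonts_for_language(fonts, language):
    """Stand-in for the fontconfig step: fonts supporting `language` first.

    sorted() is stable, so the configured order is kept within each group.
    """
    return sorted(fonts, key=lambda f: language not in f.languages)


def assign_fonts(text, language, fonts):
    """Give each character the first font in the sorted list that covers it."""
    ordered = sort_fonts_for_language(fonts, language)
    result = []
    for ch in text:
        chosen = next((f.name for f in ordered if ch in f.coverage), None)
        result.append((ch, chosen))
    return result


# Hypothetical installed fonts, in configured preference order.
FONTS = [
    Font("LatinSans", frozenset({"en"}), frozenset("abc.1")),
    Font("SongCN", frozenset({"zh_CN"}), frozenset("\u4e2d\u6587abc.1")),
    Font("MingTW", frozenset({"zh_TW"}), frozenset("\u4e2d\u6587")),
]
```

In this model the same two-character string gets one font when tagged zh_CN and two different fonts when tagged en or zh_TW, purely because of the language tag and the font list, which is the division of responsibility the paragraph above argues for.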
What if the font determination is not chopped glyph by glyph, but also determined heuristically with context?
Pango already does that. That's exactly what you call "contextual" something above and condemn.
If my guess is correct this would work in most cases, even among language variants (think zh_CN and zh_TW).
No. You need to go back and read and understand my "tons of quasi-maths".
Another symptom, "digits change font after typing character" is in fact a very cool Pango feature, just badmouthed by the above problem. Fix the problem.
When a solution is not universal enough to be accepted by everybody, and causes more trouble than it's worth for specific people, it will be badmouthed no matter what. Or not? I don't know the rule here.
You officially don't know what you are talking about.
behdad
Abel
As you see from the bug lists, this problem has existed for many years, and I am pretty sure that it will come back again and again as long as the expected rendering is not achieved. If the current Pango formatting logic is not sufficient to handle the CJK preferences described above, I think refining the logic to take them into consideration is better than sticking with a fixed but incomplete logic.
I consider patches improving Pango's font selection algorithm, but none that I've seen so far has been an improvement (from my point of view). If it has words like CJK or "special case", I'm most probably not interested. Of the bugs you listed, only the one I opened myself is valid IMO. The rest are just left open because no matter how many times I close them, they will be reopened... Oh well.
Please let me know your thoughts and reasoning on whether this is feasible or not and, if yes, where to start.
Does the above make sense? I understand that it's easier to apply a two-line patch to Pango than to do any of the things I listed above, but that just doesn't fit in the design, and it introduces other problems you don't see right now.
thank you for paying attention to this issue.
Qianqian
Regards,
behdad
===============================================================
Bug 321113 - Wrong glyph subsituation algorithm for digital characters and punctuations http://bugzilla.gnome.org/show_bug.cgi?id=321113
Bug 345072 - changes font when typing different scripts on the same line http://bugzilla.gnome.org/show_bug.cgi?id=345072
Bug 345386 - Language and direction propagation in and between PangoLayouts http://bugzilla.gnome.org/show_bug.cgi?id=345386 (opened by yourself) https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679
Bug 481210 - [All lang] [firefox] - Face of the number is changing when enter number + Char, in any Locale http://bugzilla.gnome.org/show_bug.cgi?id=481210
Bug 481188 - ascii text space too narrow for Chinese encodings http://bugzilla.gnome.org/show_bug.cgi?id=481188
Bugzilla Bug 129541: changes font when typing different scripts on the same line https://bugzilla.redhat.com/show_bug.cgi?id=129541
Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango https://bugzilla.redhat.com/show_bug.cgi?id=131218
Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox give bad eol rendering and cursor placement https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens Petersen)
https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link)
Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is changing when enter number + Char, in any Locale https://bugzilla.redhat.com/show_bug.cgi?id=228804
Bugzilla Bug 221361: [pango] ascii text space and punctuation is narrow for CJK https://bugzilla.redhat.com/show_bug.cgi?id=221361
Bug 379125 - chinese punctuations after english letters are wrongly displayed https://bugzilla.mozilla.org/show_bug.cgi?id=379125 https://bugzilla.mozilla.org/attachment.cgi?id=263185
===============================================================
-- behdad http://behdad.org/
...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. -- Matt Welsh
gtk-i18n-list mailing list gtk-i18n-list@gnome.org http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
Hi Behdad,
I don't think the tone of your reply is going to be helpful in any way toward a solution to this problem.
I hope you understand that people raise these issues for the good of Pango. They want to make it more powerful and logical under all possible circumstances. Second, there must be a reason for this issue being raised again and again over the past many years. I think insufficient explanations and poor guidance for users toward a good solution play a role here (I am sorry to say that your "proof" in the last email still did not help because it was not what I was asking for).
Reading the replies of the past few days, I came to realize that the key to the problem is to set up a "correct" fallback path for the untagged (or COMMON) text. Obviously, you are reluctant to explicitly tag it as LATIN in Pango. You may be right if differentiating COMMON from LATIN is practically necessary (I mean "practically", not semantically as in the Unicode standard). You have your rationale here.
Unfortunately, the current fallback mechanism eventually assigns the current locale info to this untagged text. And it turns out that for some users (if not all), particularly for CJK users (where the practical differences between Latin and Common are not significant), it creates unpleasant formatting results due to the mixing of fonts.
So, it seems obvious that additional info is needed to steer the fallback of this untagged text toward the preferred settings. This info can be introduced by the patched fontconfig, using a block-preference font list, or by using the current keyboard layout as suggested by Sergey and Chris. Maybe a third way is to create an LC variable, say LC_COMMON, independent of LC_ALL/LANG, to take care of the formatting of untagged text. I actually feel that this is probably more suitable than the other two approaches, because this is a locale-based preference, not a font or keyboard preference (this is just my first thought on it; I may be wrong).
In any case, I "think" I understand your argument, although there are still details that need to be verified. But I think it will be useful if we focus on clarifying a solution rather than arguing about who is right and who is wrong.
Qianqian
On Thu, 2007-12-20 at 15:30 -0500, Qianqian Fang wrote:
Hi Behdad,
I don't think the tone of your reply is going to be helpful in any way toward a solution to this problem.
I try to respond decently. However, I can't help it when someone does not spend the same effort that I put into my replies, has not read my previous mails in the thread, and uses caps...
Writing these replies takes time. No abundance of it here. And I get frustrated by saying the same thing again and again.
I hope you understand that people raise these issues for the good of Pango.
I don't agree. I'm very clearly saying: Pango doesn't want to fix this issue. Fix it somewhere else if you want the issue fixed.
They want to make it more powerful and logical under all possible circumstances.
I want to keep Pango as clean in design as it is. That means, no "if (CJK)". Pango is a true international text layout system. It's quite different from a MS Windows "Chinese Edition" or Adobe Photoshop "Asian Edition", etc. It is supposed to be able to render all languages and scripts, in the same process, in the same document.
Second, there must be a reason for this issue being raised again and again over the past many years.
Compared to other scripts:
- No one has "fixed" it properly so far, so it keeps coming up.
- Chinese people are a great majority. The only comparable majorities are: Latin/Cyrillic script users, Arabic script users, and Indic script users. Latin and Arabic pretty much work. Indic has lots of issues, and they come up again and again, more than CJK, believe me.
There was a time when Arabic was a disaster too. And people complained about it, a lot. But it's fixed now, because there were people who fixed it all. Not by attacking the maintainer BTW. Not by taking it personally.
I think insufficient explanations and poor guidance for users toward a good solution play a role here (I am sorry to say that your "proof" in the last email still did not help because it was not what I was asking for).
I knew it wouldn't help, because it was an obvious fact for everyone thinking about it without prejudice. You asked that I say exactly why it's impossible, and I did. Now either read and try to understand that, or take my word when I say it's impossible.
Reading the replies of the past few days, I came to realize that the key to the problem is to set up a "correct" fallback path for the untagged (or COMMON) text. Obviously, you are reluctant to explicitly tag it as LATIN in Pango. You may be right if differentiating COMMON from LATIN is practically necessary (I mean "practically", not semantically as in the Unicode standard). You have your rationale here.
If I hardcoded them to LATIN, I'm sure *you* would come back and complain about that too, when you saw in a monospace piece of text that you've got crisp bitmap glyphs for the Chinese characters but a wider, fuzzy glyph for your '['.
Unfortunately, the current fallback mechanism eventually assigns the current locale info to this untagged text.
No. Not current locale. Adjacent scripts. If there's none, then current locale.
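The order of precedence stated here (adjacent scripts first, current locale only when there is none) can be written down as a small decision function. This is an illustrative sketch only: the locale-to-script table is hypothetical, and preferring the preceding neighbour over the following one is an assumption, not something stated in this thread.

```python
# Hypothetical mapping from a locale's language part to its main script.
LOCALE_SCRIPT = {
    "en_US": "Latin",
    "zh_CN": "Han",
    "zh_TW": "Han",
    "ar_EG": "Arabic",
    "he_IL": "Hebrew",
}


def resolve_common(prev_script, next_script, locale):
    """Pick a script for a run of untagged (Common) characters.

    Adjacent scripts win; the locale is consulted only when neither
    neighbour carries a real script. Pass None for a missing neighbour.
    """
    for neighbour in (prev_script, next_script):
        if neighbour is not None and neighbour != "Common":
            return neighbour
    # Strip an encoding suffix such as ".UTF-8" before the lookup.
    language = locale.split(".")[0]
    return LOCALE_SCRIPT.get(language, "Common")
```

Under this rule a digit typed on an empty line follows the locale, and the same digit follows the neighbouring script as soon as a letter appears next to it, which matches the "digits change font after typing a character" symptom discussed earlier in the thread.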
And it turns out that for some users (if not all), particularly for CJK users (where the practical differences between Latin and Common are not significant), it creates unpleasant formatting results due to the mixing of fonts.
Read above. It's going to create an unpleasant result in one case or the other. There's no magic bullet here.
So, it seems obvious that additional info is needed to steer the fallback of this untagged text toward the preferred settings. This info can be introduced by the patched fontconfig, using a block-preference font list, or by using the current keyboard layout as suggested by Sergey and Chris. Maybe a third way is to create an LC variable, say LC_COMMON, independent of LC_ALL/LANG, to take care of the formatting of untagged text. I actually feel that this is probably more suitable than the other two approaches, because this is a locale-based preference, not a font or keyboard preference (this is just my first thought on it; I may be wrong).
I don't agree completely, but do note that none of the above involves Pango (at least initially).
In any case, I "think" I understand your argument, although there are still details that need to be verified. But I think it will be useful if we focus on clarifying a solution rather than arguing about who is right and who is wrong.
Except that I'm not interested in fixing it if it doesn't involve Pango.
Regards,