On Fri, 2007-12-21 at 01:24 +0900, mpsuzuki@hiroshima-u.ac.jp wrote:
Sorry, I forgot to attach the picture; here it is.
On Fri, 21 Dec 2007 01:17:08 +0900 mpsuzuki@hiroshima-u.ac.jp wrote:
On Thu, 20 Dec 2007 10:08:05 -0500 Behdad Esfahbod behdad@behdad.org wrote:
On Thu, 2007-12-20 at 23:04 +0900, mpsuzuki@hiroshima-u.ac.jp wrote:
On Thu, 20 Dec 2007 07:48:50 -0500 Behdad Esfahbod behdad@behdad.org wrote:
Setting the locale is actually enough. If that's not desired, $PANGO_LANGUAGE can be set as a fallback. So far it seems like most of the issues happen because either the users are not setting the locale correctly or are using crappy fonts. How do I don't care enough about those cases I'm not surprised.
Excuse me, is PANGO_LANGUAGE the way to change the Pango behaviour that Qianqian & Abel asked to have fixed?
It's a way to tell Pango which of the CJK languages to prefer. Its main use is when running under a non-CJK locale (en_US, for example) and the text doesn't have language tags. It solves most of the "multiple fonts used in the same line" issues with CJK characters.
Excuse me again, please give me more detail. I attached a picture to describe the behaviour I want to fix.
Thanks for raising a concrete issue.
Pictures (1), (2), (3) are screenshots taken under an English locale.
If I execute gedit as
    $ env LANG=C PANGO_LANGUAGE=en gedit
the font does not change while I type "[" then "a".
Pictures (1'), (2'), (3'), (4') are screenshots taken under a Japanese locale.
If I execute gedit as
    $ env LANG=ja_JP.euc-jp PANGO_LANGUAGE=ja gedit
As long as your LANG and PANGO_LANGUAGE are the same, you don't need both. PANGO_LANGUAGE is mostly useful when you set LANG to en. That's not relevant to your issue here though.
and I type "[" then "a" then "あ", the font used to display "[" changes dynamically, as in (2'), (3'), (4'), while I type. This dynamic font switching shifts the baseline up and down, which looks like a strange zig-zag. I could not stop the switching by setting either PANGO_LANGUAGE=en or PANGO_LANGUAGE=ja. How can I stop it?
I tell you what's happening, you tell me what Pango is doing wrong and how you think it can be fixed:
- In image 2', you are running under Japanese locale, you type a COMMON character ('[') only, Pango assumes you are going to type Japanese text, your preferred Japanese font has a glyph for '[', so Pango uses it, hoping that it will use the same font when you enter Japanese text.
- In image 3', you entered a Latin letter, not Japanese (an unexpected event given that you run under Japanese locale), so Pango now associates the bracket to the Latin text, because, well, that's the only non-COMMON script there. You sure have a bracket and Latin text in it. So it renders the bracket using the same font that it uses for the Latin text.
- In image 4', you add a Japanese character. No surprises here: you have two fonts, the line takes the height of the taller font. So the Latin text is shifted down a bit.
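The three steps above can be modelled with a small Python sketch. It is a deliberately simplified illustration, not Pango's actual code: the script classifier only knows a few ranges, and the names `script_of` and `resolve_scripts` are made up for this example. The assumption is that a COMMON character takes the script of the text around it, and only falls back to the locale's script when there is no other text on the line.

```python
# Simplified model of the behaviour described above; NOT Pango's real algorithm.

def script_of(ch):
    """Very coarse script classifier (assumption: only these ranges matter here)."""
    cp = ord(ch)
    if ("a" <= ch <= "z") or ("A" <= ch <= "Z"):
        return "Latin"
    if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF:
        return "Japanese"  # Hiragana, Katakana and CJK ideographs lumped together
    return "Common"        # brackets, digits, punctuation, spaces


def resolve_scripts(text, locale_script):
    """Assign a script to every character, resolving Common characters.

    A Common character inherits the script of the character before it.
    Leading Common characters inherit from the first real script in the text.
    If the text has no real script at all, the locale's script is used.
    """
    scripts = [script_of(ch) for ch in text]
    first_real = next((s for s in scripts if s != "Common"), locale_script)
    resolved = []
    previous = first_real
    for s in scripts:
        if s == "Common":
            s = previous
        resolved.append(s)
        previous = s
    return resolved


# Typing "[", then "a", then "あ" under a Japanese locale (images 2', 3', 4').
for typed in ["[", "[a", "[aあ"]:
    print(typed, resolve_scripts(typed, locale_script="Japanese"))
```

In this model the bracket is "Japanese" while it stands alone and becomes "Latin" as soon as "a" is typed, which is the font switch seen between images 2' and 3'.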
So, the issue comes down to the fact that:
- It's unexpected to enter Latin under Japanese locale.
- You have a COMMON character at the beginning of the line.
- Your Japanese and Latin fonts have different heights.
And this case is rare enough that I normally don't consider it an issue at all. But apparently multiplying that by 1 billion makes it quite visible!
One suggestion might be that Pango should reserve a minimum line height that is enough to fit the default Japanese font, because it's running under a Japanese locale after all. That would fix the jump from 3' to 4', but it would make English-only paragraphs look very ugly and badly spaced vertically, so that's not an option either.
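To make the line-height trade-off concrete, here is a rough Python sketch. The metrics are invented numbers and `FontMetrics` / `line_extents` are hypothetical names, not Pango API; the only assumption taken from the discussion is that a line is as tall as the tallest font used on it.

```python
from dataclasses import dataclass


@dataclass
class FontMetrics:
    ascent: float   # distance from baseline to top of the font
    descent: float  # distance from baseline to bottom of the font


def line_extents(fonts_used, reserved=None):
    """Return (ascent, descent) of a line: the maximum over all fonts on it.

    If `reserved` is given, its metrics are always included, which models
    reserving room for the locale's default font on every line.
    """
    fonts = list(fonts_used)
    if reserved is not None:
        fonts.append(reserved)
    ascent = max(f.ascent for f in fonts)
    descent = max(f.descent for f in fonts)
    return ascent, descent


# Hypothetical metrics, chosen only to show the effect.
latin = FontMetrics(ascent=10.0, descent=3.0)
japanese = FontMetrics(ascent=12.0, descent=4.0)

print(line_extents([latin]))                     # Latin only (image 3')
print(line_extents([latin, japanese]))           # Japanese added (image 4')
print(line_extents([latin], reserved=japanese))  # Latin only, room reserved
```

With room reserved, the Latin-only line already has the extents of the mixed line, so nothing jumps when Japanese is added; the cost is that every Latin-only line is taller than its own font needs.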
The jump from 2' to 3' can't be fixed. I already proved that. If one fixes it, it would introduce the bug that '[' followed by a Japanese character will choose separate fonts for those chars, OR that the font used for '[' will change when you type a Japanese char. It's as simple as this: Pango can't know what you are going to type next. It can just guess, and it's guessing pretty well. It's just not reading your mind yet :).
I have two suggestions for what you can do that may achieve better results for you.
- Run under LANG=en_US LC_MESSAGES=ja_JP
- Choose a non-generic font family in gedit. That is, something other than Sans, Sans-serif, and Monospace.
Regards, mpsuzuki
Regards,
Le jeudi 20 décembre 2007 à 17:25 -0500, Behdad Esfahbod a écrit :
I have two suggestions for what you can do that may achieve better results for you.
- Run under LANG=en_US LC_MESSAGES=ja_JP
- Choose a non-generic font family in gedit. That is, something other than Sans, Sans-serif, and Monospace.
3. Have an IM/layout switcher that explicitly declares to apps and Pango the language which is going to be typed.
On Thu, 2007-12-20 at 23:37 +0100, Nicolas Mailhot wrote:
Le jeudi 20 décembre 2007 à 17:25 -0500, Behdad Esfahbod a écrit :
I have two suggestions for what you can do that may achieve better results for you.
- Run under LANG=en_US LC_MESSAGES=ja_JP
- Choose a non-generic font family in gedit. That is, something other than Sans, Sans-serif, and Monospace.
- Have an IM/layout switcher that explicitly declares to apps and Pango the language which is going to be typed.
That may help when typing, but has the following problems:
- Fonts change when you switch language.
- To make it meaningful, your editor should store the language at the time of typing as a tag. Or it will lose it and void the advantage.
- Doesn't help when copy/pasting or opening a document.
What will be helpful is, if pango could query your session and see that you have American English and Chinese Chinese IM/layouts set, so automatically set PANGO_LANGUAGE to en_US:zh_CN. That is, respect your set languages, but not necessarily follow the currently-selected one.
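A sketch of how a colon-separated preference list such as en_US:zh_CN could be consulted, written in Python. The `LANGUAGE_SCRIPTS` table and the `preferred_language` helper are hypothetical and hand-written for this example; they only illustrate the idea of "take the first listed language that uses the script", not Pango's internal data or code.

```python
# Assumption: a tiny hand-written table of which scripts each language uses.
LANGUAGE_SCRIPTS = {
    "en": {"Latin"},
    "ja": {"Han", "Hiragana", "Katakana"},
    "zh": {"Han"},
    "fa": {"Arabic"},
    "ar": {"Arabic"},
}


def preferred_language(script, language_list):
    """Return the first tag in a colon-separated list whose language uses `script`.

    Returns None when no listed language uses the script, in which case the
    caller would have to fall back to some other default.
    """
    for tag in language_list.split(":"):
        if not tag:
            continue
        # Reduce "zh_CN" or "zh-CN" to the bare language code "zh".
        base = tag.replace("-", "_").split("_")[0].lower()
        if script in LANGUAGE_SCRIPTS.get(base, set()):
            return tag
    return None


print(preferred_language("Han", "en_US:zh_CN"))
print(preferred_language("Latin", "en_US:zh_CN"))
print(preferred_language("Arabic", "en_US:fa_IR:ar_EG"))
```

The order of the list expresses preference among languages that share a script, while the currently selected keyboard or IM plays no part in the lookup.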
Le jeudi 20 décembre 2007 à 17:54 -0500, Behdad Esfahbod a écrit :
That may help when typing, but has the following problems:
- Fonts change when you switch language.
Fonts will change anyway (indeed, the alpha and omega of current complaints is that they change but people disagree with the heuristics), and it's better to leave users in control when we cannot guess properly in a large number of cases.
- To make it meaningful, your editor should store the language at the time of typing as a tag. Or it will lose it and void the advantage.
Understood. It won't happen overnight. Nevertheless, it can still happen faster than finding the perfect crystal ball.
- Doesn't help when copy/pasting or opening a document.
Cut & paste can probably be solved with a "tagged text" media type. Opening a document will never work for document types that do not store language info. But if the problem can be reduced to this perimeter we'll have made a huge leap forward.
What will be helpful is, if pango could query your session and see that you have American English and Chinese Chinese IM/layouts set, so automatically set PANGO_LANGUAGE to en_US:zh_CN. That is, respect your set languages, but not necessarily follow the currently-selected one.
To me that is an if(CJK) solution. That is to say, it sort of solves the problem of one group of users without being generalisable to other groups of users. It assumes you can deduce language from configured IMs, when those can overlap, when many languages can be and commonly are typed through IMs primarily designed for another language, etc.
The breakage when the wrong language is detected is far more widespread than just Chinese, even if the effects are often more subtle. You need good language detection to autoselect the right spellchecker, to tell the office suite what language should tag a run of text, to select the right locl font alternative, etc., including when users type several languages sharing the same Unicode blocks. You'll never autodetect those through locales, IMs, or codepoints used. German people write English. Balkan people write Russian. They still use their primary IM for this since it gives them access to the codepoints needed without needing to learn another layout.
On Fri, 2007-12-21 at 09:50 +0100, Nicolas Mailhot wrote:
Le jeudi 20 décembre 2007 à 17:54 -0500, Behdad Esfahbod a écrit :
That may help when typing, but has the following problems:
- Fonts change when you switch language.
Fonts will change anyway (indeed, the alpha and omega of current complaints is that they change but people disagree with the heuristics), and it's better to leave users in control when we cannot guess properly in a large number of cases.
Guess I wasn't clear. Fonts change on the current line as you switch keyboard/IM. So, you rotate through Chinese and English locales and fonts keep changing each time you switch. Unless you add markup to keep state...
- To make it meaningful, your editor should store the language at the time of typing as a tag. Or it will lose it and void the advantage.
Understood. It won't happen overnight. Nevertheless, it can still happen faster than finding the perfect crystal ball.
- Doesn't help when copy/pasting or opening a document.
Cut & paste can probably be solved with a "tagged text" media type. Opening a document will never work for document types that do not store language info. But if the problem can be reduced to this perimeter we'll have made a huge leap forward.
What will be helpful is, if pango could query your session and see that you have American English and Chinese Chinese IM/layouts set, so automatically set PANGO_LANGUAGE to en_US:zh_CN. That is, respect your set languages, but not necessarily follow the currently-selected one.
To me that is an if(CJK) solution. That is to say, it sort of solves the problem of one group of users without being generalisable to other groups of users. It assumes you can deduce language from configured IMs, when those can overlap, when many languages can be and commonly are typed through IMs primarily designed for another language, etc.
Again, guess I wasn't clear. I'm saying that if and when the desktop has the language information (as opposed to just keyboard layout information), Pango should use the list of all languages in that way. This helps for example preferring Persian fonts over Arabic fonts.
The breakage when the wrong language is detected is far more widespread than just Chinese, even if the effects are often more subtle. You need good language detection to autoselect the right spellchecker, to tell the office suite what language should tag a run of text, to select the right locl font alternative, etc., including when users type several languages sharing the same Unicode blocks. You'll never autodetect those through locales, IMs, or codepoints used. German people write English. Balkan people write Russian. They still use their primary IM for this since it gives them access to the codepoints needed without needing to learn another layout.