On Tue, December 4, 2007 08:35, Behdad Esfahbod wrote:
> On Mon, 2007-12-03 at 21:58 -0500, Qianqian Fang wrote:
Hi,
I've let Behdad answer so far because he's the most qualified on the pango front, but I've wanted to reaffirm some points for a few days, so I'll do it now:
Your core problem, as I wrote in one of my first mails, is that your font provides bad glyphs for Unicode blocks you don't really want to touch, and that you're changing locales you shouldn't change. So the easiest and fastest solution for you has always been to
> - Remove Latin and ASCII digits from your font. Why is it there if
>   it's not desired?
You have the chance to package a free/open-libre font. That is something that couldn't be done for most fonts, but you can do it, so don't hesitate.
> Nicolas suggested that fontconfig adds support for conditional blacklisting of individual blocks/glyphs in a font. That would help too, but it's not in fontconfig yet.
Unfortunately many fonts are not so open and users still depend on them. So some sort of fontconfig blacklisting support is needed to support those fonts and users. From these exchanges, it seems Chinese users are most affected by this problem.
Since you have contacts in the Chinese fonts community, do consider reviving the patches posted on the fontconfig list in the past, or writing others. Have Chinese users indicate their support for them on the fontconfig list. It's not a short-term fix, but it's the right long-term fix, and if you don't push it this year you'll hit the same problem again and again until someone does this work.
Last time the problem was discussed on the fontconfig lists, almost no one stepped in to write that they needed this change. So the fontconfig developers decided it was a lot of work with no real need, and passed.
The moral of this story is: your problems won't be fixed if you only focus on workarounds (as you're doing now) and let others with no core interest at stake drive changes. I know that culturally Chinese people tend to avoid open disagreement, but if you need fontconfig to change, silently hoping for the fontconfig maintainers to realise this won't work.
Similarly, if you need good Chinese rendering in non-Chinese locales, chinifying en_US is not the solution. We've not heard from Japanese users yet, but I'm sure they would strongly object to Chinese-oriented defaults. That means you need to push for apps that do not do it yet to pass language info for properly tagged text to pango (like firefox does), and push for some sort of input language notification system.
You can of course pass and hope others will do it, but in the meantime you'll have to accept that any workaround that affects users in other locales won't be accepted in the distro. And since getting proper localised input working is the only way to get your stuff working without side-effects for those other users, that means Chinese users won't have optimal defaults in the meantime.
>> Back to the original topic of this thread, what do you think of the fontconfig file in my last email?
The version posted on http://www.redhat.com/archives/fedora-fonts-list/2007-November/msg00088.html
looks mostly fine, except I'm not sure the DejaVu LGC Sans Mono in monospace is needed and you rely on a high priority (61) to stomp on other CJK fonts (and probably others). IMHO this needs to be approved by Jens and the language teams affected.
For the version on http://www.redhat.com/archives/fedora-fonts-list/2007-December/msg00002.html
I'm not sure what the selectfont is there for. Likewise, you have all sorts of stuff in monospace that assumes specific Latin defaults out of your control. It will probably work most of the time, but removing the Latin glyphs from your fonts would solve this in a more robust way.
Regards,
The selectfont block originated from Debian-based distributions, where bitmap fonts are disabled by default. This block enables only this font without turning on the global switch for all bitmap fonts. In Fedora, however, bitmap fonts are allowed by default, so this block can be safely removed.
Removing the Latin part might be a solution, but it works by sacrificing the integrity of the font to accommodate the insufficiency of fontconfig (I've never seen any Chinese font without Latin glyphs). In the long run, I don't think this will help either. To my understanding, the purpose of fontconfig is to provide a mechanism for font selection that is non-invasive to the font pool; therefore, substituting and combining fonts based on the preferences of a particular language SHOULD and COULD be done at this level.
Another reason is that about 1/4 of users like to use the bitmap Latin in wqy-bitmap-fonts as their default desktop font. I can show you dozens of links to prove this if you can find someone who can read Chinese.
I've tested the latter file (http://www.redhat.com/archives/fedora-fonts-list/2007-December/msg00002.html ) under various locales; it works almost perfectly and I did not see any side effects. I am wondering if Jens would like to test it and let me know what he thinks of this file?
Qianqian
Hi Nicolas,
I agree with you on the long-term solution to the problem. Here I just want to describe my observations of Chinese users and my opinions on the work-around.
Unfortunately, the Chinese user community is quite weak at communicating with upstream, mainly for language reasons. More than half of the users do not like to use English to discuss their problems; the vast majority of feedback and problem-solving happens on various Chinese Linux forums, BBSes (bulletin board systems) and instant messaging. Even among those who are able to describe their problems clearly in English, only a small fraction go through all the cultural training and practice to become contributors.
Bad luck in propagating patches back upstream is another reason that discourages Chinese users from getting involved. Chinese is one of the most complicated scripts, and it is always challenging to get what people expect without altering the default Latin handling; therefore, upstream developers are very cautious about any change related to Chinese (or CJK). This also negatively impacts the situation.
As a result, Chinese users HAD to find work-arounds to meet their day-to-day needs. You may be surprised that almost all Chinese Linux forums have a board called "Font Beautification"; it sounds ridiculous, but it is true. People used to spend days or weeks trying to fix their Chinese font settings for all applications. That is also my motivation for creating the Wen Quan Yi project: just trying to save people's time and make Linux easier to use for Chinese users.
I can do my best to help push the fontconfig scheme that you mentioned, but I would not be surprised if it is still not implemented after years. But there is an immediate need to use Linux in a Chinese-friendly way, and a good work-around can really build up the expanding user community and likely a developer group, and that could make life easier in the future. That's my rationale for pushing a reasonable fontconfig file for my font.
Qianqian
On Tuesday, 4 December 2007 at 11:40 -0500, Qianqian Fang wrote:
> Unfortunately, the Chinese user community is quite weak at communicating with upstream, mainly for language reasons. More than half of the users do not like to use English to discuss their problems; the vast majority of feedback and problem-solving happens on various Chinese Linux forums, BBSes (bulletin board systems) and instant messaging. Even among those who are able to describe their problems clearly in English, only a small fraction go through all the cultural training and practice to become contributors.
This small fraction needs to organise itself, identify problems in Chinese font/text support and possible fixes, and relay them to upstream projects. For example, every year freedesktop.org organises a text summit where pretty much every project that counts in FLOSS text rendering is represented:
http://unifont.org/TextLayout2007/
It would be extremely helpful if the Chinese FLOSS user community sent someone to next year's summit to list the main problems affecting Chinese users, and what the Chinese community feels needs to be done in projects like fontconfig or pango to fix them.
For the rest of the year, having clearly identified Chinese contacts whom people can ask questions (like "if I do this, will it break Chinese apps?") would probably help too.
> Bad luck in propagating patches back upstream is another reason that discourages Chinese users from getting involved. Chinese is one of the most complicated scripts, and it is always challenging to get what people expect without altering the default Latin handling; therefore, upstream developers are very cautious about any change related to Chinese (or CJK).
CJK is difficult, sure, but do not underestimate the part lack of communication plays. Any maintainer will be ultra-cautious about making CJK changes when he knows that, if he makes a mistake, users are likely not to report back but to suffer silently for years while cursing his name.
> This also negatively impacts the situation.
As a counter-example, you may have noticed there's been a lot of Greek-related activity on the list lately. It's not because Greek is easier or more interesting than other languages, but because the Greek community managed to organise itself. As a result they're getting good support from every distribution, Fedora included.
> As a result, Chinese users HAD to find work-arounds to meet their day-to-day needs. You may be surprised that almost all Chinese Linux forums have a board called "Font Beautification"; it sounds ridiculous, but it is true. People used to spend days or weeks trying to fix their Chinese font settings for all applications.
I'm not surprised at all; this is typical workaround culture.
> That is also my motivation for creating the Wen Quan Yi project: just trying to save people's time and make Linux easier to use for Chinese users.
> I can do my best to help push the fontconfig scheme that you mentioned, but I would not be surprised if it is still not implemented after years.
I'll be honest: even if someone actively pushes fontconfig changes, it may take a year for them to be integrated, and yet more time for the changes to percolate into distributions. Getting fontconfig to change is not easy. However, if you don't try, I'm pretty sure nothing will have changed in 5 years.
And you probably need fixes at other levels too. Just as the Wen Quan Yi project is part of the solution but not the whole solution, fixing fontconfig will probably not be sufficient. For example, even if fontconfig selects the perfect Chinese font when told to render Chinese, apps still need to detect that they are rendering Chinese, which is not possible based only on Unicode code points or the session locale (though for this particular problem you are better off than Latin languages, since you only share codepoints with Japanese).
> But there is an immediate need to use Linux in a Chinese-friendly way, and a good work-around can really build up the expanding user community and likely a developer group, and that could make life easier in the future. That's my rationale for pushing a reasonable fontconfig file for my font.
The limit of work-arounds, as you've found out, is that they get removed as soon as someone else complains about them. No one at Fedora really wants to choose between Latin, Chinese or Japanese users, so if two user communities conflict, the one stepping on the other loses.
By selecting a fontconfig priority of 61 you pretty much removed everyone relying on fonts with a less than 61 priority from the picture. That leaves people relying on fonts with a more-than-61 priority to complain. I suspect some of them, most likely Japanese users, can still be negatively affected by your changes, but I'm no Japanese speaker so that's up to Jens to confirm (or refute). And even if Jens gives the green light, there is still the possibility of later complaints causing your changes to be removed.
Lastly, for Fedora ≥ 9 we'll probably use DejaVu full, not DejaVu LGC, as default, so you may want to adapt your fontconfig file accordingly in Fedora-devel. (You'll note DejaVu full as default got blocked for several releases due to the same kinds of conflicts you're encountering, and is only being pushed now that we're more confident in its non-LGC parts. We try to be fair to everyone.)
Regards,
Nicolas Mailhot wrote:
> This small fraction needs to organise itself, identify problems in Chinese font/text support and possible fixes, and relay them to upstream projects. For example, every year freedesktop.org organises a text summit where pretty much every project that counts in FLOSS text rendering is represented:
> http://unifont.org/TextLayout2007/
> It would be extremely helpful if the Chinese FLOSS user community sent someone to next year's summit to list the main problems affecting Chinese users, and what the Chinese community feels needs to be done in projects like fontconfig or pango to fix them.
> For the rest of the year, having clearly identified Chinese contacts whom people can ask questions (like "if I do this, will it break Chinese apps?") would probably help too.
I fully agree with you that a representative group is needed to facilitate communication and speak for Chinese users on all text layout issues. I will contact the relevant people I know, including Arne Gojge, the maintainer of the Uni-fonts project, some Redhat developers in Beijing and the Debian/Ubuntu Chinese group. I am not sure my energy allows me to change anything beyond taking care of my own project, but I will make sure that your suggestions are passed around among those who are interested.
> By selecting a fontconfig priority of 61 you pretty much removed everyone relying on fonts with a less than 61 priority from the picture. That leaves people relying on fonts with a more-than-61 priority to complain. I suspect some of them, most likely Japanese users, can still be negatively affected by your changes, but I'm no Japanese speaker so that's up to Jens to confirm (or refute). And even if Jens gives the green light, there is still the possibility of later complaints causing your changes to be removed.
I am not sure whether you noticed, but it has a language-matching block before the strong binding to DejaVu Mono:
<test compare="contains" name="lang"> <string>zh</string> </test>
It also matches WenQuanYi Bitmap Song in family. IMO, this strong binding will only happen when the user is under a zh locale and has the WQY font installed. So I do not think it will mess up Japanese fonts, as it will not match the lang tag. I tested it under the ja locale, and everything seems normal (the mono font was not affected); a screenshot is attached.
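A rule of the kind described here might look like the following sketch. It is not the attached file: only the `lang` test is quoted from this mail, while the family test, the `prepend` mode and the exact font names are assumptions pieced together from the rest of the thread.

```xml
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <!-- Illustrative only: fires when the requested pattern asks for a zh
       language AND already lists WenQuanYi Bitmap Song as a family. -->
  <match target="pattern">
    <test compare="contains" name="lang">
      <string>zh</string>
    </test>
    <test qual="any" name="family">
      <string>WenQuanYi Bitmap Song</string>
    </test>
    <!-- Assumed edit: put the Latin monospace font in front with a strong
         binding so it wins for the glyphs it covers. -->
    <edit name="family" mode="prepend" binding="strong">
      <string>DejaVu LGC Sans Mono</string>
    </edit>
  </match>
</fontconfig>
```

Because both tests must succeed, a pattern carrying only `ja` in its lang element would not trigger the edit, which is the behaviour claimed above for Japanese locales.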
Hi Jens, I am not sure whether my previous reply was CCed to you, but I would like to know your opinion and test results on this new fontconfig file (attached). Thank you!
> Lastly, for Fedora ≥ 9 we'll probably use DejaVu full, not DejaVu LGC, as default, so you may want to adapt your fontconfig file accordingly in Fedora-devel. (You'll note DejaVu full as default got blocked for several releases due to the same kinds of conflicts you're encountering, and is only being pushed now that we're more confident in its non-LGC parts. We try to be fair to everyone.)
Thank you for reminding me of this; I will make the corresponding adjustment if the config file gets approved.
Again, I really appreciate your careful thoughts on these issues. All the requests are quite reasonable. I will tailor my config file as best I can to avoid future complications.
Regards,
On Tue, 2007-12-04 at 11:40 -0500, Qianqian Fang wrote:
> Hi Nicolas,
> I agree with you on the long-term solution to the problem. Here I just want to describe my observations of Chinese users and my opinions on the work-around.
> Unfortunately, the Chinese user community is quite weak at communicating with upstream, mainly for language reasons. More than half of the users do not like to use English to discuss their problems; the vast majority of feedback and problem-solving happens on various Chinese Linux forums, BBSes (bulletin board systems) and instant messaging. Even among those who are able to describe their problems clearly in English, only a small fraction go through all the cultural training and practice to become contributors.
> Bad luck in propagating patches back upstream is another reason that discourages Chinese users from getting involved. Chinese is one of the most complicated scripts, and it is always challenging to get what people expect without altering the default Latin handling; therefore, upstream developers are very cautious about any change related to Chinese (or CJK). This also negatively impacts the situation.
Hi Qianqian,
[/me tries to write a motivational mail]
It's easy to assume that one's problems are harder than others'. In this case, Chinese for example is a far easier script to support than Middle-Eastern scripts and definitely far easier than Indic scripts. Or in Iran, my native country, less than half of Iranians know enough English to be able to communicate at all, let alone preferring it...
When I started working on Persian support in software back in 1999, it was a disaster. IE5 had just come out and had support for Unicode, but had a serious bug with the letter Persian Yeh that made it almost unusable for Persian. The community started using Arabic Yeh instead, and many individuals and companies produced fonts that had the shape of Persian Yeh in their Arabic Yeh glyph position. That's not the only problem that needed to be worked around.
In the meantime, some of us started the FarsiWeb Project to systematically work on properly fixing Persian support in software. We soon got attracted to Free Software, as there was not much we could do about proprietary software other than reporting the bug (that particular IE bug took more than 4 years to fix...). Persian support in Free Software was even worse. Both KDE and GNOME had just added support for Arabic, but no Persian-specific feature was working. And there were no suitable fonts. No keyboard layout either. No translations whatsoever. Lots and lots of bugs in right-to-left UIs. The list goes on and on...
While trying to learn the culture of upstream in FarsiWeb, we learned about similar projects in other countries that shared a bunch of those problems with us, namely Arabeyes from all over the Arab countries and Ivrix from Israel. We worked on a lot of projects and patches together, with the main goal of *fixing upstream*. To make this mail short, fast forward a few years: I now maintain Pango and HarfBuzz, co-maintain cairo, hack on Gtk+, Fontconfig, and Mozilla/Firefox, and the Linux desktop has the best Persian support among all modern operating systems. We've come a long way, and there's still a lot left to go...
Sorry if this was too personal and historical; I thought it might resonate with your feelings.
Regards,
Hi Behdad,
Thank you for sharing your experience and your path to improving local language support on FLOSS OSs. I do appreciate the diversity and complexity of text layout problems for most existing languages. And nothing would have happened if no one had endured the pain and devoted their efforts to improving the situation.
As I said in my first email, I am new to package maintenance and to communicating with upstream developers. I might have underestimated the problem and fired in the wrong direction. If that is the case, please forgive me.
Going back to the digit font change issue we discussed earlier: I spent some time in the past few days trying to get a clearer picture of it. I dug out some bug reports from various bugzillas (Mozilla, Red Hat, GNOME) and gathered a list of similar reports (see the bottom of this email). These reports were filed by Simplified and Traditional Chinese users and Japanese users (I believe Korean users experienced the same problem). So one thing that can be said from this list is that "contextual font selection" does seem to bother CJK users in text formatting.
I understand that "contextual shaping" is one of the techniques for rendering complex scripts. I am not sure how tight the connection is between "contextual shaping" and "contextual format propagation", but one thing that I think may shed some light on the complaints of CJK users is that Chinese (and maybe Japanese) scripts are not context-sensitive. Chinese characters are relatively independent and self-consistent in shape (this is not true for Chinese calligraphy, where strokes may connect between characters depending on layout direction, but current OSs and font technologies are not ready to handle this, IMO). The only complexity may come from the fact that printed Hanzi are mostly equal-width, and punctuation among Hanzi is expected to match the width of the surrounding Hanzi. As full-width punctuation is encoded separately by Unicode, and input methods provide contextual punctuation support, this seems to be handled very well. So, in short, for Chinese text layout, users generally do not expect to see context-based changes, either in encoding/glyph or in font face (this may not include some extreme cases).
Now back to pango: from what I read in the bug reports, pango uses PANGO_SCRIPT_COMMON to represent language-independent symbols. I have no complaint about that. It is a good classification based on the semantics of the symbols. What I, and most CJK users, are not satisfied with is the context-sensitivity of those common scripts when formatting text under CJK locales. I know that you have advocated sticking with the "face" meaning of SCRIPT_COMMON, which is supposed to be rendered by local languages. But IMO, the face meaning is misleading here. From a Chinese user's perspective, the difference between SCRIPT_COMMON and Latin is negligible compared with its difference from CJK characters. Therefore, using CJK fonts to render SCRIPT_COMMON is quite odd. Using Latin fonts for COMMON is most preferred; even specifying no face (i.e. using the system fall-back) is better than assigning Chinese fonts to these scripts, because most Chinese fonts, even the commercial ones, have low-quality Latin/common glyphs.
As you can see from the bug list, this problem has existed for many years, and I am pretty sure that it will come back again and again as long as the expected rendering is not achieved. If the current pango formatting logic is not sufficient to handle the CJK preferences described above, I think refining the logic to take them into consideration is better than sticking with a fixed but incomplete logic.
Please let me know your thoughts and reasoning on whether this is feasible and, if so, where to start.
Thank you for paying attention to this issue.
Qianqian
===============================================================
Bug 321113 - Wrong glyph subsituation algorithm for digital characters and punctuations
  http://bugzilla.gnome.org/show_bug.cgi?id=321113
Bug 345072 - changes font when typing different scripts on the same line
  http://bugzilla.gnome.org/show_bug.cgi?id=345072
Bug 345386 - Language and direction propagation in and between PangoLayouts
  http://bugzilla.gnome.org/show_bug.cgi?id=345386 (opened by yourself)
  https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679
Bug 481210 - [All lang] [firefox] - Face of the number is changing when enter number + Char, in any Locale
  http://bugzilla.gnome.org/show_bug.cgi?id=481210
Bug 481188 - ascii text space too narrow for Chinese encodings
  http://bugzilla.gnome.org/show_bug.cgi?id=481188
Bugzilla Bug 129541: changes font when typing different scripts on the same line
  https://bugzilla.redhat.com/show_bug.cgi?id=129541
Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango
  https://bugzilla.redhat.com/show_bug.cgi?id=131218
Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox give bad eol rendering and cursor placement
  https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens Petersen)
  https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link)
Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is changing when enter number + Char, in any Locale
  https://bugzilla.redhat.com/show_bug.cgi?id=228804
Bugzilla Bug 221361: [pango] ascii text space and punctuation is narrow for CJK
  https://bugzilla.redhat.com/show_bug.cgi?id=221361
Bug 379125 - chinese punctuations after english letters are wrongly displayed
  https://bugzilla.mozilla.org/show_bug.cgi?id=379125
  https://bugzilla.mozilla.org/attachment.cgi?id=263185
===============================================================
On Dec 7, 2007 2:41 AM, Behdad Esfahbod <behdad@behdad.org> wrote:
> Hi Qianqian,
> [/me tries to write a motivational mail]
> It's easy to assume that one's problems are harder than others'. In this case, Chinese for example is a far easier script to support than Middle-Eastern scripts and definitely far easier than Indic scripts. Or in Iran, my native country, less than half of Iranians know enough English to be able to communicate at all, let alone preferring it...
> When I started working on Persian support in software back in 1999, it was a disaster. IE5 had just come out and had support for Unicode, but had a serious bug with the letter Persian Yeh that made it almost unusable for Persian. The community started using Arabic Yeh instead, and many individuals and companies produced fonts that had the shape of Persian Yeh in their Arabic Yeh glyph position. That's not the only problem that needed to be worked around.
> In the meantime, some of us started the FarsiWeb Project to systematically work on properly fixing Persian support in software. We soon got attracted to Free Software, as there was not much we could do about proprietary software other than reporting the bug (that particular IE bug took more than 4 years to fix...). Persian support in Free Software was even worse. Both KDE and GNOME had just added support for Arabic, but no Persian-specific feature was working. And there were no suitable fonts. No keyboard layout either. No translations whatsoever. Lots and lots of bugs in right-to-left UIs. The list goes on and on...
> While trying to learn the culture of upstream in FarsiWeb, we learned about similar projects in other countries that shared a bunch of those problems with us, namely Arabeyes from all over the Arab countries and Ivrix from Israel. We worked on a lot of projects and patches together, with the main goal of *fixing upstream*. To make this mail short, fast forward a few years: I now maintain Pango and HarfBuzz, co-maintain cairo, hack on Gtk+, Fontconfig, and Mozilla/Firefox, and the Linux desktop has the best Persian support among all modern operating systems. We've come a long way, and there's still a lot left to go...
> Sorry if this was too personal and historical; I thought it might resonate with your feelings.
> Regards,
> -- behdad http://behdad.org/
> ...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. -- Matt Welsh
Hi Qianqian,
[CC'ing to gtk-i18n-list, so hopefully this is the last time I have to repeat this.]
On Mon, 2007-12-10 at 18:01 -0500, Qianqian Fang wrote:
> Going back to the digit font change issue we discussed earlier: I spent some time in the past few days trying to get a clearer picture of it. I dug out some bug reports from various bugzillas (Mozilla, Red Hat, GNOME) and gathered a list of similar reports (see the bottom of this email). These reports were filed by Simplified and Traditional Chinese users and Japanese users (I believe Korean users experienced the same problem). So one thing that can be said from this list is that "contextual font selection" does seem to bother CJK users in text formatting.
Yes, you have identified the problem very accurately.
> I understand that "contextual shaping" is one of the techniques for rendering complex scripts. I am not sure how tight the connection is between "contextual shaping" and "contextual format propagation", but one thing that I think may shed some light on the complaints of CJK users is that Chinese (and maybe Japanese) scripts are not context-sensitive. Chinese characters are relatively independent and self-consistent in shape (this is not true for Chinese calligraphy, where strokes may connect between characters depending on layout direction, but current OSs and font technologies are not ready to handle this, IMO). The only complexity may come from the fact that printed Hanzi are mostly equal-width, and punctuation among Hanzi is expected to match the width of the surrounding Hanzi. As full-width punctuation is encoded separately by Unicode, and input methods provide contextual punctuation support, this seems to be handled very well. So, in short, for Chinese text layout, users generally do not expect to see context-based changes, either in encoding/glyph or in font face (this may not include some extreme cases).
And Pango supports those all perfectly fine. Even vertical writing using the correct substituted punctuation glyphs. See:
http://www.pango.org/ScriptGallery
The main font issue though, is that Chinese (Simplified, Traditional), Korean, and Japanese share some Unicode code points, but they require slightly different renderings. Now if you don't tell Pango which version is preferred, how can it know which font to choose? It explicitly doesn't prefer any one over the others to avoid cultural problems.
The symptoms of this problem are "multiple fonts used in the same line". Solution is: Either run under a CJK locale, or give hints to Pango about your preferred CJK locale using the env var PANGO_LANGUAGE.
Note that theoretically Pango can do text analysis to come up with a best guess, but doing that would then introduce another bug with symptoms "changes font when typing a few characters on the same line".
> Now back to pango: from what I read in the bug reports, pango uses PANGO_SCRIPT_COMMON to represent language-independent symbols. I have no complaint about that. It is a good classification based on the semantics of the symbols.
Good. Let me also note that there's no way to change that. It's hardcoded in the Unicode standard.
What I, and most CJK users, are not satisfied with is the context-sensitivity of those common scripts when formatting text under CJK locales. I know that you have advocated sticking with the "face" meaning of SCRIPT_COMMON, which is supposed to be rendered by the local language. But IMO, the face meaning is misleading here. From a Chinese user's perspective, the difference between SCRIPT_COMMON and Latin is negligible,
Lemme correct you here, "From a Chinese user perspective, the ASCII digits are considered Latin". There's sure a lot more than ASCII digits to SCRIPT_COMMON. Helps to be precise.
compared with its difference from CJK characters. Therefore, using CJK fonts to render SCRIPT_COMMON is quite odd. Using Latin fonts for COMMON is most preferred; even specifying no face (i.e. using the system fallback) is better than assigning Chinese fonts to these scripts, because most Chinese fonts, even the commercial ones, have low-quality Latin/common glyphs.
And this problem has a name: "crappy glyphs and multiple scripts in a font". Tell me about it...
I already pointed out a few solutions to it previously:
- Rip the crap out and everyone will feel better.
- Use TrueType containers (even for bitmap-only fonts) and put each script's glyphs into its own face, with all faces having the same name and put into the same TrueType Collection file.
- Finish the patch for fontconfig to allow configuration to disable certain Unicode codepoints per font. Then write such configuration for the crappy glyphs.
Pick whichever you prefer and just do it.
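To make the third option concrete, here is a toy model of what "disable certain Unicode codepoints per font" would do to font selection. It is not fontconfig code and uses no fontconfig API (the feature discussed here is an unfinished patch); the font names, coverage sets and the `pick_font` helper are all hypothetical.

```python
def effective_coverage(coverage, blocked):
    """Code points a font is still allowed to serve after blacklisting."""
    return coverage - blocked

def pick_font(char, fonts, blacklist):
    """Return the first font, in preference order, that may render `char`."""
    cp = ord(char)
    for name, coverage in fonts:
        if cp in effective_coverage(coverage, blacklist.get(name, set())):
            return name
    return None

ascii_printable = set(range(0x20, 0x7F))
# Hypothetical CJK font that also ships (poor) ASCII glyphs.
toy_cjk = set(range(0x4E00, 0xA000)) | ascii_printable
toy_latin = set(ascii_printable)
fonts = [("ToyCJK", toy_cjk), ("ToyLatin", toy_latin)]

# Per-font blacklist: hide the ASCII glyphs of the CJK font.
blacklist = {"ToyCJK": ascii_printable}

print(pick_font("1", fonts, {}))          # ToyCJK
print(pick_font("1", fonts, blacklist))   # ToyLatin
print(pick_font("中", fonts, blacklist))  # ToyCJK
```

The font file itself is never modified, which is why this route also works for fonts that cannot legally be edited.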
Another symptom, "digits change font after typing character" is in fact a very cool Pango feature, just badmouthed by the above problem. Fix the problem.
As you can see from the bug lists, this problem has existed for many years, and I am pretty sure that it will come back again and again as long as the expected rendering is not achieved. If the current Pango formatting logic is not sufficient to handle the CJK preferences described above, I think refining the logic to take them into consideration is better than sticking with a fixed but incomplete logic.
I consider patches improving Pango's font selection algorithm, but none that I've seen so far has been an improvement (from my point of view). If it has words like CJK or "special case", I'm most probably not interested. Of the bugs you listed, only the one I opened myself is valid IMO. The rest are just left open because no matter how many times I close them, they will be reopened... Oh well.
Please let me know your thoughts and reasoning on whether this is feasible or not and, if yes, where to start.
Does the above make sense? I understand that it's easier to apply a two-line patch to Pango instead of doing one of the things I listed above, but that just doesn't fit in the design, and it introduces other problems you don't see right now.
thank you for paying attention to this issue.
Qianqian
Regards,
behdad
===============================================================
Bug 321113 - Wrong glyph subsituation algorithm for digital characters and punctuations http://bugzilla.gnome.org/show_bug.cgi?id=321113
Bug 345072 - changes font when typing different scripts on the same line http://bugzilla.gnome.org/show_bug.cgi?id=345072
Bug 345386 - Language and direction propagation in and between PangoLayouts http://bugzilla.gnome.org/show_bug.cgi?id=345386 (opened by yourself) https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679
Bug 481210 - [All lang] [firefox] - Face of the number is changing when enter number + Char, in any Locale http://bugzilla.gnome.org/show_bug.cgi?id=481210
Bug 481188 - ascii text space too narrow for Chinese encodings http://bugzilla.gnome.org/show_bug.cgi?id=481188
Bugzilla Bug 129541: changes font when typing different scripts on the same line https://bugzilla.redhat.com/show_bug.cgi?id=129541
Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango https://bugzilla.redhat.com/show_bug.cgi?id=131218
Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox give bad eol rendering and cursor placement https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens Petersen)
https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link)
Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is changing when enter number + Char, in any Locale https://bugzilla.redhat.com/show_bug.cgi?id=228804
Bugzilla Bug 221361: [pango] ascii text space and punctuation is narrow for CJK https://bugzilla.redhat.com/show_bug.cgi?id=221361
Bug 379125 - chinese punctuations after english letters are wrongly displayed https://bugzilla.mozilla.org/show_bug.cgi?id=379125 https://bugzilla.mozilla.org/attachment.cgi?id=263185
===============================================================
hi Behdad
I would have agreed with you if you had clearly told me why this change SHOULD be made in the fonts, or in the font selection, and not in the layout engine. Your previous replies, both to the bug reports and to my email, simply refused to make this change by saying it is "technically impossible", but you did not tell me what model that statement is based on. If you can give me a diagram or document illustrating that this is not the business of the layout engine, I will not insist on continuing this discussion.
Secondly, you said that "contextual font selection" is a "cool" feature. I am wondering which languages benefit from this feature? (I believe there are some, but I just want to know.) As I said in the previous email, this creates more trouble than benefit for CJK languages. In particular, it ruins text alignment in monospace environments (see attachment). I doubt anyone who sees it would say "cool"; rather, they would feel annoyed.
In addition, you seem to underestimate the difficulty of ripping out part of a CJK font. This is not possible for commercial fonts. Even if it is doable for open fonts (of which there are very few), the incompatibility of the resulting fonts will make them totally unusable on most platforms.
I want to add that on Windows, CJK users have never had such a problem: all known CJK fonts have their Latin glyphs (some are crappy), but the text rendering is "normal" (nothing like in the attachment). How does Windows structure the style propagation for COMMON characters?
Qianqian
Behdad Esfahbod wrote:
Hi Qianqian,
[CC'ing to gtk-i18n-list, so hopefully this is the last time I have to repeat this.]
On Mon, 2007-12-10 at 18:01 -0500, Qianqian Fang wrote:
Going back to the digit font change issue we discussed earlier: I spent some time in the past few days trying to get a clearer picture of this. I dug out some bug reports from various bugzillas (Mozilla, Red Hat, GNOME) and gathered a list of similar reports (see the bottom of the email). These reports were filed by Simplified and Traditional Chinese users and Japanese users (I believe Korean users experience the same problem). So, one thing that can be said from this list is that "contextual font selection" does seem to be bothering CJK users in text formatting.
Yes, you have identified the problem very accurately.
On Thu, 2007-12-13 at 12:13 -0500, Qianqian Fang wrote:
hi Behdad
Hi,
I would have agreed with you if you had clearly told me why this change SHOULD be made in the fonts, or in the font selection, and not in the layout engine. Your previous replies, both to the bug reports and to my email, simply refused to make this change by saying it is "technically impossible", but you did not tell me what model that statement is based on. If you can give me a diagram or document illustrating that this is not the business of the layout engine, I will not insist on continuing this discussion.
You've kept saying it should be different for CJK and I've always asked you to describe how exactly it should behave to no avail.
Here is the set of assumptions that best describes the problem:
A1. The layout engine is not provided any hints whatsoever on which of the CJK languages to prefer.
A2. Any font available on the system is suitable for (aka "supports") at most one CJK language, not more.
A3. For every CJK language, there exists a positive number of characters solely used in that CJK language and not any other one.
A4. There exists a positive number of Unicode characters that are used in more than one CJK language.
That's enough to prove that you can't fix both of these bugs at the same time:
B1. "multiple CJK fonts on the same line"
B2. "font face changes when more text is typed"
This is what we will prove: "for any layout engine with font fallback support [1], there exists some CJK text that when typed on a line by the user, either results in more than one CJK font being used, or a font change for the already typed text happens", where font fallback support means that a character is assigned a font that is known to *support* that character, if any such font is available on the system. We prove by constructing such a piece of text. Here's a sketch:
- Pick a Unicode character that is used in more than one CJK language. This is possible because of A4. Call it c[0].
- Let the layout engine choose a font to render this character. Let f[0] be the font used to render it.
- Find the CJK language l[0] that font f[0] supports. By A2 we know that there can't be more than one such language. If no such language exists, the layout engine suffers from the bug "no CJK font is chosen". Abort.
- Let l[1] be any CJK language other than l[0].
- Choose c[1] to be any CJK character used in language l[1] and l[1] only. That's possible because of A3.
- Pass the text c[0]c[1] to the layout engine, and let f'[0] and f[1] be the fonts chosen to render characters c[0] and c[1] respectively.
- Observe that:
* if f'[0] == f[0]: We know f[0] supports l[0], and that l[0] != l[1]. By A2, it follows that f[0] does not support l[1], so f[0] cannot be chosen for c[1] and as a result, f'[0] != f[1], that is, multiple fonts are chosen to render the text.
* if f'[0] != f[0]: Typing character c[1] on the line containing text c[0] caused the chosen font for c[0] to change.
End of proof ∎
[1] I'm tempted to say deterministic Turing machine here, but I pass :)
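The construction above can be run mechanically against any layout engine. Below is a sketch in Python over a deliberately tiny toy model: two hypothetical fonts (`ToyJA`, `ToyZH`), one shared character and one exclusive character per language, chosen only for illustration. Both "engines" are assumptions of this sketch and are not Pango's algorithm; the point is only that `construct_counterexample` follows the proof steps and always lands in B1 or B2.

```python
# A2: every font supports exactly one CJK language.
FONT_LANGUAGE = {"ToyJA": "ja", "ToyZH": "zh"}
# A4: a character used in more than one CJK language.
SHARED = "中"
# A3: one character used in a single language only (illustrative picks).
EXCLUSIVE = {"ja": "の", "zh": "们"}

def supports(font, char):
    """A font covers the shared character and its own language's character."""
    return char == SHARED or EXCLUSIVE[FONT_LANGUAGE[font]] == char

def first_supporting(char):
    """Font fallback: first installed font that supports the character."""
    return next(f for f in FONT_LANGUAGE if supports(f, char))

def per_char_engine(text):
    """No context: every character is resolved on its own."""
    return [first_supporting(c) for c in text]

def context_engine(text):
    """Context-aware: guess one language from the whole line, prefer its font."""
    guess = next((l for l, c in EXCLUSIVE.items() if c in text), "ja")
    preferred = next(f for f, l in FONT_LANGUAGE.items() if l == guess)
    return [preferred if supports(preferred, c) else first_supporting(c)
            for c in text]

def construct_counterexample(engine):
    """Follow the proof: return (text, bug) for the given engine."""
    c0 = SHARED                                   # exists by A4
    f0 = engine(c0)[0]
    l0 = FONT_LANGUAGE[f0]                        # unique by A2
    l1 = next(l for l in EXCLUSIVE if l != l0)
    c1 = EXCLUSIVE[l1]                            # exists by A3
    f0_new, f1 = engine(c0 + c1)
    if f0_new == f0:
        # f0 cannot render c1, so a second font must have been used.
        assert f1 != f0_new
        return c0 + c1, "B1: multiple CJK fonts on the same line"
    return c0 + c1, "B2: font face changes when more text is typed"

print(construct_counterexample(per_char_engine))
print(construct_counterexample(context_engine))
```

The context-free engine keeps the font of already typed text stable and therefore mixes fonts; the context-aware engine keeps the line in one font and therefore changes the font of text that was already on screen.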
Similar proofs can be constructed for other CJK "bugs" (those involving Latin text, ASCII digits, etc), but I've already exceeded my time limit for this message.
Secondly, you said that "contextual font selection" is a "cool" feature. I am wondering which languages benefit from this feature? (I believe there are some, but I just want to know.)
Pretty much every non-Latin script. In some situations even the Latin script.
Take the Unicode character U+002E FULL STOP, aka the ASCII period. It is used in more than just Latin: in Arabic, for example, in Hebrew, possibly in Indic and many other scripts. If it were not grouped with neighboring characters for font selection purposes, all those people would get their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while the periods at the end of sentences were assigned a different (for example, the default Latin) font.
The same happens for Latin under a document tagged as non-Latin. It's not a luxury thing. It's just how things are supposed to work.
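A simplified sketch of that grouping, under stated assumptions: the script classifier below is a toy that only knows a few Unicode block ranges (Python's standard library has no Script property lookup), and the rule "a Common character inherits the script of the preceding character, or of the first real script on the line" is a simplification and not Pango's actual code.

```python
from itertools import groupby

def base_script(ch):
    """Toy classifier by code point range; everything else is 'Common'."""
    cp = ord(ch)
    if 0x0590 <= cp <= 0x05FF:
        return "Hebrew"
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if 0x4E00 <= cp <= 0x9FFF:
        return "Han"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Common"

def resolve_scripts(text):
    """Give each Common character the script of its neighbourhood."""
    scripts = [base_script(c) for c in text]
    last = None
    for i, s in enumerate(scripts):
        if s == "Common":
            if last is not None:
                scripts[i] = last      # inherit from the preceding run
        else:
            last = s
    # Leading Common characters inherit the first real script, if any.
    first = next((s for s in scripts if s != "Common"), "Common")
    return [first if s == "Common" else s for s in scripts]

def runs(text):
    """Split text into (script, substring) runs for font selection."""
    tagged = zip(resolve_scripts(text), text)
    return [(script, "".join(ch for _, ch in group))
            for script, group in groupby(tagged, key=lambda pair: pair[0])]

# Arabic word followed by U+002E: one run, so one font for word and period.
print(runs("\u0633\u0644\u0627\u0645."))
# Latin sentence followed by a Hebrew one: the two periods end up in
# different runs, each with the script of the text it terminates.
print(runs("abc. \u05e9\u05dc\u05d5\u05dd."))
```

The same rule is what makes ASCII digits follow the surrounding CJK text, which is the behaviour being debated in this thread.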
As I said in the previous email, this creates more trouble than benefit for CJK languages. In particular, it ruins text alignment in monospace environments (see attachment). I doubt anyone who sees it would say "cool"; rather, they would feel annoyed.
That's not true. If you have Chinese text and Latin text in the same line, and your Latin and Chinese monospace fonts have different widths, you are screwed no matter what.
There are situations that that particular bug you are referencing here can be improved, and that's why I filed bug 345386, but you already knew that.
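The width argument is plain arithmetic, sketched below. The advance widths (7, 14 and 16 pixels) are made-up numbers for illustration; the only assumption is the usual terminal convention that a wide East Asian character occupies two columns.

```python
import unicodedata

def is_wide(ch):
    """True for East Asian Wide or Fullwidth characters."""
    return unicodedata.east_asian_width(ch) in ("W", "F")

def columns(text):
    """Column count when a wide character takes two cells."""
    return sum(2 if is_wide(c) else 1 for c in text)

def pixel_width(text, latin_advance, cjk_advance):
    """Rendered width when Latin and CJK glyphs come from different fonts."""
    return sum(cjk_advance if is_wide(c) else latin_advance for c in text)

line_a = "abcd"   # four narrow columns
line_b = "中文"   # two wide characters, also four columns
print(columns(line_a), columns(line_b))

# CJK advance exactly twice the Latin advance: the lines end at the same x.
print(pixel_width(line_a, 7, 14), pixel_width(line_b, 7, 14))
# Any other ratio: same column count, different pixel width.
print(pixel_width(line_a, 7, 16), pixel_width(line_b, 7, 16))
```

Columns line up only when the two fonts happen to have a 2:1 advance ratio, regardless of which font the digits or punctuation are taken from.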
In addition, you seem to underestimate the difficulty of ripping out part of a CJK font. This is not possible for commercial fonts. Even if it is doable for open fonts (of which there are very few), the incompatibility of the resulting fonts will make them totally unusable on most platforms.
I've put three different ways in front of you. The fontconfig one is not hard at all for anyone willing to put their fingers where their mouth is. You on the other hand, seem to ignore the impossibility (not difficulty) of what you are asking for.
I want to add that on Windows, CJK users have never had such a problem: all known CJK fonts have their Latin glyphs (some are crappy), but the text rendering is "normal" (nothing like in the attachment). How does Windows structure the style propagation for COMMON characters?
Windows does no font fallback. You choose which font to use. But you want your Latin characters in a different font than your Chinese characters AND you want to keep the crappy glyphs. They don't mix.
Qianqian
behdad
On Sunday 16 December 2007 at 18:22 -0500, Behdad Esfahbod wrote:
On Thu, 2007-12-13 at 12:13 -0500, Qianqian Fang wrote:
Secondly, you said that "contextual font selection" is a "cool" feature. I am wondering which languages benefit from this feature? (I believe there are some, but I just want to know.)
Pretty much every non-Latin script. In some situations even the Latin script.
Take the Unicode character U+002E FULL STOP, aka the ASCII period. It is used in more than just Latin: in Arabic, for example, in Hebrew, possibly in Indic and many other scripts. If it were not grouped with neighboring characters for font selection purposes, all those people would get their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while the periods at the end of sentences were assigned a different (for example, the default Latin) font.
The same happens for Latin under a document tagged as non-Latin. It's not a luxury thing. It's just how things are supposed to work.
To be honest, this was mostly solved on the Latin side by creating pan-European+ LGC fonts that avoid triggering substitutions altogether.
Creating coherent pan-Unicode fonts would solve it for other locales, but that's a huge piece of work and some bits, like OpenType BASE support, are not there yet on the FLOSS side.
As I said in the previous email, this creates more trouble than benefit for CJK languages. In particular, it ruins text alignment in monospace environments (see attachment). I doubt anyone who sees it would say "cool"; rather, they would feel annoyed.
That's not true. If you have Chinese text and Latin text in the same line, and your Latin and Chinese monospace fonts have different widths, you are screwed no matter what.
That means that for monospace, separate fonts with different metrics are a dead end, right? :p
I wonder if something semi-monospaced, like using twice the base size for complex scripts, would be worth it or would just break apps horribly.
In addition, you seem to underestimate the difficulty of ripping out part of a CJK font. This is not possible for commercial fonts. Even if it is doable for open fonts (of which there are very few), the incompatibility of the resulting fonts will make them totally unusable on most platforms.
I've put three different ways in front of you.
Easy one: removing Latin from the FLOSS font. But that wouldn't solve the problem for the proprietary fonts people use in the wild.
Complete one: enhancing fontconfig to blacklist parts of fonts.
I don't see much point in the TTC solution, except as a workaround for the lack of OpenType BASE support.
The fontconfig one is not hard at all for anyone willing to put their fingers where their mouth is. You on the other hand, seem to ignore the impossibility (not difficulty) of what you are asking for.
I want to add that on Windows, CJK users have never had such a problem: all known CJK fonts have their Latin glyphs (some are crappy), but the text rendering is "normal" (nothing like in the attachment). How does Windows structure the style propagation for COMMON characters?
Windows does no font fallback.
Windows, however, has an input chooser that explicitly specifies the language in use instead of just a keyboard layout switcher, and I suspect some Windows apps do use it to select the right font.
Unfortunately it seems Sergey Udaltsov was discouraged by lack of positive feedback and stopped pushing something like http://fedoraproject.org/wiki/SIGs/Fonts/Dev/LanguageAwarenessProblem
Qianqian: you need to realise that the low-hanging fruit was harvested long ago. There is no easy solution left that has not been rejected for one reason or another. That's why you're hitting a wall (and exasperating Behdad). The bits needed to support CJK and complex scripts well are well known, but they're non-trivial, so they need some concerted effort by the affected communities.
Regards,
On Mon, 2007-12-17 at 11:46 +0100, Nicolas Mailhot wrote:
I don't see much point in the TTC solution, except as a workaround for the lack of OpenType BASE support.
You are prolly right. Families with the same name have almost all the problems of a pan-Unicode font when installed.