Please do not reply directly to this email. All additional comments should be made in the comments box of this bug.
Summary: [ta_IN] Tamil collation rules are not working in other locales
https://bugzilla.redhat.com/show_bug.cgi?id=514110
Summary: [ta_IN] Tamil collation rules are not working in other locales
Product: Fedora
Version: rawhide
Platform: All
OS/Version: Linux
Status: NEW
Severity: medium
Priority: low
Component: glibc
AssignedTo: schwab@redhat.com
ReportedBy: psatpute@redhat.com
QAContact: extras-qa@fedoraproject.org
CC: jakub@redhat.com, santhosh.thottingal@gmail.com, fedora-i18n-bugs@redhat.com, schwab@redhat.com
Estimated Hours: 0.0
Classification: Fedora
Target Release: ---
Description of problem: The ta_IN collation rules do not take effect when another locale, e.g. en_US.UTF-8, is selected.
Version-Release number of selected component (if applicable): glibc-common-2.10.90-7.1
How reproducible: every time
Steps to Reproduce: 1. Select the en_US locale. 2. Try to sort Tamil characters.
Actual results: Tamil characters are not sorted according to the Tamil collation rules.
Expected results: They should be sorted as per the collation rules.
Additional info: This happens because the ta_IN collation rules are written in the ta_IN locale file, so they are not available to other locales.
We should move this collation table to iso14651_t1_common (which is common to most locales), so that it is available to other locales as well and Tamil sorting works whichever locale is selected.
--- Comment #1 from Pravin Satpute psatpute@redhat.com 2009-08-21 06:26:29 EDT --- Some discussion has already happened by mail. Thanks to Santhosh for preparing the patch to move the ta_IN collation rules to the iso14651_t1_common file.
I corrected the syntax errors and tested it with some test data.
Please test it with some more Tamil data so we can make corrections before moving it to glibc.
Steps for testing:
1) Download the Tamil_Collation folder from http://pravins.fedorapeople.org/Tamil_Collation
2) cd Tamil_Collation
3) Append some more entries to the test file
4) LOCPATH=$PWD LC_ALL=ta_TT sort ../test > test_result
5) gedit test_result
Please attach the test file as well as the result in Bugzilla.
If it is correct, I will proceed with the patch.
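For anyone who prefers to check the result programmatically, here is a minimal Python sketch of step 4. It is not part of the patch. The helper name `sort_lines` is hypothetical, the locale name `ta_TT` is the one used in step 4, and the sketch assumes LOCPATH is already exported in the environment before Python starts so that the C library can find the test locale.

```python
import locale


def sort_lines(lines, locale_name):
    """Sort lines with the collation rules of locale_name.

    Returns None when the locale cannot be loaded, for example when
    the test locale is not found via LOCPATH.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares
        # according to the active LC_COLLATE rules.
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

Calling `sort_lines(open("../test", encoding="utf-8").read().splitlines(), "ta_TT")` should then give the same order as the `sort` command in step 4, provided the locale loads.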
Pravin Satpute psatpute@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
AssignedTo |schwab@redhat.com           |psatpute@redhat.com
--- Comment #2 from Pravin Satpute psatpute@redhat.com 2009-10-24 02:10:27 EDT --- It would be nice if a Tamil speaker could test this patch once, so we can proceed.
It has been in Bugzilla for two months now.
Pravin Satpute psatpute@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Flag       |                            |needinfo?(ifelix@redhat.com)
Pravin Satpute psatpute@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Version    |12                          |rawhide
Pravin Satpute psatpute@redhat.com changed:
What       |Removed                      |Added
----------------------------------------------------------------------------
Flag       |needinfo?(ifelix@redhat.com) |
--- Comment #4 from Pravin Satpute psatpute@redhat.com 2009-12-28 06:11:23 EDT --- Created an attachment (id=380616) --> (https://bugzilla.redhat.com/attachment.cgi?id=380616) sorting i am getting with my present patch
Had a discussion with Felix; we need an order like க் க கா கி.
Working on it.
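A minimal Python sketch (not taken from the patch) of why this order needs explicit collation rules: sorting by raw code points puts the pulli form last, because the pulli (U+0BCD) has a higher code point than the vowel signs. The variable names are illustrative only.

```python
KA = "\u0b95"                 # க
KA_PULLI = "\u0b95\u0bcd"     # க் (consonant + pulli)
KA_AA = "\u0b95\u0bbe"        # கா
KA_I = "\u0b95\u0bbf"         # கி

# Order requested in this comment: க் க கா கி
wanted = [KA_PULLI, KA, KA_AA, KA_I]

# Plain code point order, i.e. what sorting gives without collation rules.
by_code_point = sorted(wanted)

print(by_code_point)  # ['க', 'கா', 'கி', 'க்']
```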
Pravin Satpute psatpute@redhat.com changed:
What                           |Removed |Added
----------------------------------------------------------------------------
Attachment #380616 is obsolete |0       |1
--- Comment #5 from Pravin Satpute psatpute@redhat.com 2010-01-20 06:16:48 EST --- Created an attachment (id=385655) --> (https://bugzilla.redhat.com/attachment.cgi?id=385655) This is present sorting order i am getting with my patch
Please check whether this is the proper order. If yes, I will prepare the patch for glibc; otherwise we can correct it.
--- Comment #6 from Felix ifelix@redhat.com 2010-01-20 06:50:06 EST --- This is the correct order. You can go ahead.
Pravin Satpute psatpute@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Status     |NEW                         |ASSIGNED
--- Comment #7 from Pravin Satpute psatpute@redhat.com 2010-01-20 07:21:32 EST --- Created an attachment (id=385662) --> (https://bugzilla.redhat.com/attachment.cgi?id=385662) patch for enhancing tamil collation
Thanks, Felix, for the confirmation. The above patch moves the Tamil collation rules from ta_IN to the iso_common file.
It also has some enhancements compared to the previous collation rules:
* pure consonants are handled properly
* the collation is now in the iso_common file, so Tamil characters and menus are sorted in any locale
--- Comment #8 from Ulrich Drepper drepper@redhat.com 2010-02-03 06:32:30 EST --- Applied upstream.
--- Comment #9 from K. Sethu skhome@gmail.com 2010-02-04 13:07:34 EST --- Pravin
Excuse me for participating rather late. My apologies.
In the attachment to your Comment #5 above, I see the following:
[..] ஃ க் க கடந்த கா காலையில் கி கீ கு கூ கெ கே கேஎஸ் கேஎஸ்ரவிக்குமார் கை கொ கோ கௌ க்க க்கௗ க்ங ங் ங [..]
To be correct, க்க, க்கௗ and க்ங should fall between க் and க, not between கௌ and ங் as shown above.
For example, if we sort { அ, அக், அகம், அக்கம், அக்கு, அகால } in ascending order, the correct result is { அ, அக், அக்கம், அக்கு, அகம், அகால }.
(அக் is of course a meaningless word, but that has no bearing on the sort order.)
The sort order in your attachment suggests the result would instead be { அ, அக், அகம், அகால, அக்கம், அக்கு }, which is wrong.
Compare with English alphabets in ascending order:
a,b,c,d,e,f,.....x,y,z
Now ask where aa, ab, ac, ad, ..., az should be. Obviously they are between a and b, as in a, aa, ab, ac, ad, ..., ax, ay, az, b, c, d, e, ... Right?
Similarly, க் followed by any letter should be somewhere between க் and க. That is, if க் is followed by a character, the sort order is as follows:
க் < க்அ < க்ஆ < க்இ < க்ஈ < ..........க்ஒ < க்ஓ < க்ஔ < க்ஃ < க்க் < க்க < க்கா < க்கி ............< க்கொ < க்கோ < க்கௌ < க்ங் < க்ங < க்ஙா ........
(Wherever I indicate ....... they are to mean "and so on")
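The rule described above can be sketched in Python as a letter-by-letter sort key. This is only an illustration under stated assumptions, not the glibc rule format and not the applied patch: the name `collation_key` is hypothetical, the consonant order (traditional Tamil order followed by the Grantha letters) is an assumption, and only a single comparison level is modelled.

```python
import unicodedata

VOWELS = "\u0b85\u0b86\u0b87\u0b88\u0b89\u0b8a\u0b8e\u0b8f\u0b90\u0b92\u0b93\u0b94"
AYTHAM = "\u0b83"
# Assumed order: க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன, then ஜ ஷ ஸ ஹ
CONSONANTS = ("\u0b95\u0b99\u0b9a\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa\u0bae"
              "\u0baf\u0bb0\u0bb2\u0bb5\u0bb4\u0bb3\u0bb1\u0ba9"
              "\u0b9c\u0bb7\u0bb8\u0bb9")
PULLI = "\u0bcd"
VOWEL_SIGNS = "\u0bbe\u0bbf\u0bc0\u0bc1\u0bc2\u0bc6\u0bc7\u0bc8\u0bca\u0bcb\u0bcc"


def collation_key(word):
    """Return one weight tuple per Tamil letter of word.

    Weights: vowels < aytham < consonants; within one consonant,
    consonant+pulli < bare consonant < consonant+vowel sign.
    """
    text = unicodedata.normalize("NFC", word)
    key = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in VOWELS:
            key.append((0, VOWELS.index(ch), 0))
        elif ch == AYTHAM:
            key.append((1, 0, 0))
        elif ch in CONSONANTS:
            if nxt == PULLI:
                rank = 0                      # pure consonant sorts first
                i += 1
            elif nxt and nxt in VOWEL_SIGNS:
                rank = 2 + VOWEL_SIGNS.index(nxt)
                i += 1
            else:
                rank = 1                      # bare consonant (inherent a)
            key.append((2, CONSONANTS.index(ch), rank))
        else:
            key.append((3, ord(ch), 0))       # anything else: code point order
        i += 1
    return key


words = ["அ", "அக்", "அகம்", "அக்கம்", "அக்கு", "அகால"]
print(sorted(words, key=collation_key))
# ['அ', 'அக்', 'அக்கம்', 'அக்கு', 'அகம்', 'அகால']
```

Because each letter contributes one weight and the weights are compared position by position, அக்கம் sorts before அகம்: the second letter க் weighs less than க.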
When sorting English words, if two words have the same first letter they are placed next to each other, and their relative order is determined by the second letter of each; if those are also the same, then by the third letter, and so on ad nauseam.
Essentially, we first sort on the first letter, then wherever necessary on the second letter, and so on, but each of those sorts uses the same order. So the ascending order (the weights) of the characters is a single one-dimensional sequence, applied repeatedly, position by position, wherever needed when collating words.
The principle is same for Tamil too but there are some differences. In case of English in Unicode each character is represented by a single code point.
But in the case of Tamil, I presume, the following differences make the collation algorithm somewhat more complex to conceptualise:
i) Canonical decompositions: for each of ஔ (0B94), ொ (0BCA), ோ (0BCB) and ௌ (0BCC), the NFC and the NFD forms should compare as equal.
ii) Grantha ligatures SHRII, KSHA as well as KSHA+Virama: none of these is represented by a single code point; each is a sequence of 3 or 4 code points.
iii) The 11 vowel signs from ா (0BBE) to ௌ (0BCC) can be placed as singletons in ascending order (with the canonical decompositions of the last three, 0BCA, 0BCB and 0BCC, made equal to their composed forms). The Pulli (i.e., Virama, 0BCD), however, has to be placed together with each base consonant, immediately preceding the corresponding base consonant: க் (0B95 0BCD) precedes க (0B95), and similarly for every base consonant.
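Point i) can be checked with Python's standard `unicodedata` module. The sketch below is only an illustration; the names `CANONICAL_PAIRS` and `canonically_equal` are hypothetical, and the decompositions listed are the ones defined in the Unicode character database for these four code points.

```python
import unicodedata

# Composed code point -> canonical decomposition
CANONICAL_PAIRS = {
    "\u0b94": "\u0b92\u0bd7",  # ஔ = ஒ + AU length mark
    "\u0bca": "\u0bc6\u0bbe",  # ொ = ெ + ா
    "\u0bcb": "\u0bc7\u0bbe",  # ோ = ே + ா
    "\u0bcc": "\u0bc6\u0bd7",  # ௌ = ெ + AU length mark
}


def canonically_equal(a, b):
    """True when a and b are the same text after NFD normalisation."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


for composed, decomposed in CANONICAL_PAIRS.items():
    # The raw strings differ, but a collation that honours canonical
    # equivalence has to give both forms the same weight.
    print(composed != decomposed, canonically_equal(composed, decomposed))
```

Normalising the input before computing sort keys is one way to obtain that equality; whether the glibc tables achieve it through normalisation or through duplicate entries is not stated in this thread.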
I hope my observations so far are clear and correct; if not, I would like to know where I am wrong. I will write more after seeing responses.
K. Sethu (11:30 PM , Colombo-Sri Lanka)
--- Comment #10 from Pravin Satpute psatpute@redhat.com 2010-02-05 04:09:05 EST --- No problem, these things happen. It would be nice if you could attach the standard sorting list as per your expectation, along with some reference for it.
Maybe we can fix that in the next cycle.
Pravin Satpute psatpute@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Flag       |                            |needinfo?(skhome@gmail.com)
--- Comment #12 from Pravin Satpute psatpute@redhat.com 2010-07-22 06:57:17 EDT --- Can you attach the sort order you are expecting? My understanding is that you want the same sort order we presently follow for Malayalam,
i.e. cons + virama should be sorted before cons:
cons+virama
cons
Please check the present Malayalam sorting as well.
Parag pnemade@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
CC         |                            |pnemade@redhat.com
Version    |13                          |rawhide
Pravin Satpute psatpute@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Flag       |                            |needinfo?(santhosh.thottingal@gmail.com)
--- Comment #13 from Pravin Satpute psatpute@redhat.com 2011-02-24 05:31:22 EST --- santhosh,
Can you provide a patch for this? It looks the same as what you did for Malayalam. Let me know.
Santhosh Thottingal santhosh.thottingal@gmail.com changed:
What       |Removed                                  |Added
----------------------------------------------------------------------------
Flag       |needinfo?(santhosh.thottingal@gmail.com) |
--- Comment #14 from Santhosh Thottingal santhosh.thottingal@gmail.com 2011-02-26 07:46:49 EST --- Pravin, I think we need a consensus from the Tamil speakers here. I have explained my patch here: http://thottingal.in/blog/2011/02/26/tamil-collation-in-glibc/ If we get a clear definition, I would be happy to prepare the patch.
--- Comment #15 from Pravin Satpute psatpute@redhat.com 2011-02-28 01:59:21 EST --- Thanks Santhosh!! Yeah, getting some more comments from the community is better before preparing a patch.
K. Sethu, you there?
K. Sethu skhome@gmail.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Flag       |needinfo?(skhome@gmail.com) |
--- Comment #16 from K. Sethu skhome@gmail.com 2011-02-28 04:26:32 EST --- Pravin
Yes I am alive ;>)
Just posted some comments on standards to Santhosh's blog (awaiting his moderation), but I will soon write my opinions on the issues arising from the rules designed to align with the phonetic (or is it phonemic?) decomposition of consonant+vowel syllables.
Till then Regards
K. Sethu
Fedora Admin XMLRPC Client fedora-admin-xmlrpc@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
AssignedTo |psatpute@redhat.com         |law@redhat.com
--- Comment #17 from Fedora Admin XMLRPC Client fedora-admin-xmlrpc@redhat.com 2011-11-14 14:14:45 EST --- This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
--- Comment #18 from Fedora Admin XMLRPC Client fedora-admin-xmlrpc@redhat.com 2011-11-14 23:46:52 EST --- This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Andreas Schwab schwab@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
AssignedTo |law@redhat.com              |schwab@redhat.com
Matt Newsome mnewsome@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
AssignedTo |schwab@redhat.com           |law@redhat.com
Jeff Law law@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Status     |ASSIGNED                    |NEW
--- Comment #19 from Pravin Satpute psatpute@redhat.com 2012-01-31 03:34:20 EST --- Hi K. Sethu,
Any update on this? Sorting is a bit tricky, so getting consensus from everyone might be difficult. At least we should stick to one standard.
Jeff Law law@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Flag       |                            |needinfo?
--- Comment #20 from Jeff Law law@redhat.com 2012-02-10 01:11:41 EST --- Without some kind of agreement on the sorting rules we can't move this issue forward. I'm marking this as NEEDINFO until we have that information.
Jeff Law law@redhat.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Summary    |[ta_IN] Tamil collation rules are not working in other locales |[RFE] [ta_IN] Tamil collation rules are not working in other locales
Flag       |needinfo?                   |
Jeff Law law@redhat.com changed:
What        |Removed                    |Added
----------------------------------------------------------------------------
Status      |NEW                        |CLOSED
Resolution  |---                        |INSUFFICIENT_DATA
Last Closed |                           |2012-10-17 13:17:26
--- Comment #21 from Jeff Law law@redhat.com --- No response from K. Sethu in over a year. Closing with insufficient data. This can be reopened and addressed in the upstream sources if K. Sethu or someone else can provide us with the right rules for collation in this locale.
i18n-bugs@lists.fedoraproject.org