[ibus-typing-booster] miketmp: Use normalization form NFD internally for Korean as well (f07da6a)
by mfabian@fedoraproject.org
Repository : http://git.fedorahosted.org/git/?p=ibus-typing-booster.git
On branch : miketmp
>---------------------------------------------------------------
commit f07da6a1fc34f4fe2af67d49ee3e7d3032eb7d5e
Author: Mike FABIAN <mfabian(a)redhat.com>
Date: Sun Sep 1 17:54:42 2013 +0200
Use normalization form NFD internally for Korean as well
Use the normalization forms as follows:
NFD: for internal matching in the SQL database and grepping
in the hunspell dictionary for all languages, now including
Korean
NFC: for display in the preëdit and the lookup table, for calculating
the caret position in the preëdit, and for committing.
Also for input into pyhunspell.
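As a minimal sketch of this split, using only Python's standard `unicodedata` module: the helper names `internal_key` and `display_form` are hypothetical and do not exist in ibus-typing-booster. The sketch shows why matching uses NFD (both spellings of "ë" compare equal) while the caret position is counted on the NFC form (where "ë" is one code point).

```python
import unicodedata

def internal_key(text):
    """Hypothetical helper: form used for matching in the database and dictionary."""
    return unicodedata.normalize('NFD', text)

def display_form(text):
    """Hypothetical helper: form used for the preëdit, lookup table and commit."""
    return unicodedata.normalize('NFC', text)

typed = 'pree\u0308dit'        # 'e' followed by U+0308 COMBINING DIAERESIS
precomposed = 'pre\u00ebdit'   # U+00EB LATIN SMALL LETTER E WITH DIAERESIS

# Both spellings become identical once normalized to NFD ...
print(internal_key(typed) == internal_key(precomposed))  # True
# ... and counting code points on the NFC form gives the caret position
# a user expects, because 'ë' is a single code point there.
print(len(typed), len(display_form(typed)))              # 8 7
```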
Until now I used NFKD internally for Korean in the database, but that
makes it difficult to use the same database for all languages.
The only reason I used NFKD for Korean is that the two Korean input
methods we have in m17n produce Jamo from the Hangul Compatibility
Jamo block, not from the Hangul Jamo block.
For example, transliterating 'h' using /usr/share/m17n/ko-romaja.mim
gives
U+314E ㅎ
not
U+1112 ᄒ
https://en.wikipedia.org/wiki/List_of_hangul_jamo
To match correctly in the hunspell dictionary
/usr/share/myspell/ko_KR.dic, we need the latter (U+1112 from the
Hangul Jamo area).
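The mismatch can be reproduced with Python's `unicodedata` module alone. This is a minimal sketch: `dict_word` is a hypothetical dictionary entry (the syllable 하, U+D558), assumed to be normalized to NFD before matching as described for the internal form; the real lookup code in hunspell_suggest.py may differ. NFD has no canonical decomposition for U+314E, so the compatibility Jamo survives normalization unchanged and a prefix match against the Hangul Jamo area fails.

```python
import unicodedata

# Hypothetical dictionary entry '하' in NFD: the syllable decomposes into
# U+1112 HANGUL CHOSEONG HIEUH + U+1161 HANGUL JUNGSEONG A
dict_word = unicodedata.normalize('NFD', '\ud558')

# What ko-romaja.mim produces for 'h': U+314E from the compatibility block
typed = '\u314e'

# U+314E only has a compatibility decomposition, so NFD leaves it alone
print(unicodedata.normalize('NFD', typed) == typed)               # True
print(dict_word.startswith(unicodedata.normalize('NFD', typed)))  # False
```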
But instead of converting to NFKD when writing to our database and
when matching in the hunspell dictionary, we can just as well convert
the transliteration output immediately to NFKD. Converting the
transliteration output of /usr/share/m17n/ko-romaja.mim and
/usr/share/m17n/ko-han2.mim to NFKD converts U+314E ㅎ to U+1112 ᄒ
etc.
Converting these Jamo from the Hangul Jamo area (like U+1112 ᄒ)
further to any of the normalization forms NFC, NFKC, NFD, or NFKD
does not change them. That is, when the conversion to NFKD is done
right after transliteration with one of these two Korean input
methods, matching works as well, and we no longer need a different
normalization form for Korean in our database.
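The approach can be sketched as a standalone function that mirrors the hunk added to hunspell_table.py. The function name `transliteration_to_internal` and the constant `KOREAN_COMPAT_IMES` are hypothetical; only the IME names 'ko-romaja' and 'ko-han2' come from this commit. The input U+314E U+314F is simply an example string in the Hangul Compatibility Jamo block, not a claim about what a particular key sequence produces.

```python
import unicodedata

# IMEs whose output uses the Hangul Compatibility Jamo block (names from the commit)
KOREAN_COMPAT_IMES = ('ko-romaja', 'ko-han2')

def transliteration_to_internal(transliterated, ime):
    """Hypothetical helper: apply NFKD right after transliteration for the
    Korean IMEs, then the ordinary internal NFD form used for all languages."""
    if ime in KOREAN_COMPAT_IMES:
        # U+314E -> U+1112, U+314F -> U+1161, and so on
        transliterated = unicodedata.normalize('NFKD', transliterated)
    return unicodedata.normalize('NFD', transliterated)

internal = transliteration_to_internal('\u314e\u314f', 'ko-romaja')  # ㅎㅏ
print([hex(ord(c)) for c in internal])  # ['0x1112', '0x1161']

# A single letter from the Hangul Jamo area is unchanged by all four forms
for form in ('NFC', 'NFKC', 'NFD', 'NFKD'):
    assert unicodedata.normalize(form, '\u1112') == '\u1112'

# NFC for display still composes the Jamo sequence into the syllable '하'
print(unicodedata.normalize('NFC', internal) == '\ud558')  # True
```

Note that the "does not change them" property holds for individual Jamo: NFC still composes a leading-consonant/vowel sequence into a precomposed syllable, which is exactly what the display side wants.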
>---------------------------------------------------------------
ibus-typing-booster/engine/hunspell_suggest.py | 2 --
ibus-typing-booster/engine/hunspell_table.py | 6 ++++++
ibus-typing-booster/engine/tabsqlitedb.py | 2 --
3 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/ibus-typing-booster/engine/hunspell_suggest.py b/ibus-typing-booster/engine/hunspell_suggest.py
index 9e9e1ed..c1556fa 100755
--- a/ibus-typing-booster/engine/hunspell_suggest.py
+++ b/ibus-typing-booster/engine/hunspell_suggest.py
@@ -41,8 +41,6 @@ class Hunspell:
def __init__(self,lang='en',loc='/usr/share/myspell/',dict_name='en_US.dic',aff_name='en_US.aff'):
self.language=lang
self.normalization_form_internal = 'NFD'
- if self.language.startswith('ko'):
- self.normalization_form_internal = 'NFKD'
self.loc = loc
self.dict_name = dict_name
self.aff_name = aff_name
diff --git a/ibus-typing-booster/engine/hunspell_table.py b/ibus-typing-booster/engine/hunspell_table.py
index 43e71f9..9e49350 100644
--- a/ibus-typing-booster/engine/hunspell_table.py
+++ b/ibus-typing-booster/engine/hunspell_table.py
@@ -173,6 +173,9 @@ class editor(object):
if self.trans_m17n_mode:
self._transliterated_string = self.trans.transliterate(
self._typed_string)[0].decode('UTF-8')
+ if self._current_ime in ['ko-romaja', 'ko-han2']:
+ self._transliterated_string = unicodedata.normalize(
+ 'NFKD', self._transliterated_string)
else:
self._transliterated_string = self._typed_string
@@ -266,6 +269,9 @@ class editor(object):
self._typed_string[:self._typed_string_cursor])[0].decode('UTF-8')
else:
transliterated_string_up_to_cursor = self._typed_string[:self._typed_string_cursor]
+ if self._current_ime in ['ko-romaja', 'ko-han2']:
+ transliterated_string_up_to_cursor = unicodedata.normalize(
+ 'NFKD', transliterated_string_up_to_cursor)
transliterated_string_up_to_cursor = unicodedata.normalize(
'NFC', transliterated_string_up_to_cursor)
return len(transliterated_string_up_to_cursor)
diff --git a/ibus-typing-booster/engine/tabsqlitedb.py b/ibus-typing-booster/engine/tabsqlitedb.py
index 39d2acd..7c42ad8 100755
--- a/ibus-typing-booster/engine/tabsqlitedb.py
+++ b/ibus-typing-booster/engine/tabsqlitedb.py
@@ -90,8 +90,6 @@ class tabsqlitedb:
self.ime_properties = ImeProperties(self._conf_file_path+filename)
self._language = self.ime_properties.get('language')
self._normalization_form_internal = 'NFD'
- if self._language.startswith('ko'):
- self._normalization_form_internal = 'NFKD'
self._m17ndb = 'm17n'
self._m17n_mim_name = ""