Martin Langhoff wrote:
2008/10/2 Sean Flanigan sflaniga@redhat.com:
I have a simple Ant task which can generate pseudo-translations like the one above from a gettext POT file,
I am after a few sets of "latin-lookalike" character tables I can use. Have you (or anyone) got pointers to good tables?
Well, I've made up a couple of simple ones (also attached as UTF-8):

ASCII: "abcdefghijklmnopqrstuvwxyz"
BMP only: "åЬçđéϝցⱨîﺩⱪŀოňøÞᕴяšŧմⱱשẋŷż"
BMP+SMP: "åЬçđ𝖾ϝցⱨî𝚓ⱪŀოňøÞᕴяšŧմⱱשẋŷż"
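For anyone who wants to try the tables without the Ant task, here is a minimal Python sketch of the substitution, using the BMP-only table. The names `ASCII`, `LOOKALIKE`, `TABLE` and `pseudo_translate` are made up for this example, and it is not how the Ant task itself works: it only maps lowercase a-z, leaves everything else alone, and knows nothing about format placeholders such as %s.

```python
# Lookalike table from this thread: position i replaces the i-th ASCII letter.
ASCII = "abcdefghijklmnopqrstuvwxyz"
LOOKALIKE = "åЬçđéϝցⱨîﺩⱪŀოňøÞᕴяšŧմⱱשẋŷż"

# str.translate() wants a mapping from code point to replacement string.
TABLE = {ord(plain): fancy for plain, fancy in zip(ASCII, LOOKALIKE)}


def pseudo_translate(msgid):
    """Replace lowercase ASCII letters with lookalikes; leave the rest as-is."""
    return msgid.translate(TABLE)


if __name__ == "__main__":
    print(pseudo_translate("Save changes"))
```

Swapping in the BMP+SMP string for `LOOKALIKE` should be enough to exercise code paths that mishandle characters outside the BMP.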
You could also try googling for "LATIN SMALL LETTER {A,B,C,...} WITH", which should turn up all sorts of modified Latin characters, such as LATIN SMALL LETTER V WITH RIGHT HOOK.
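The same name search can be done locally against the Unicode character database that ships with Python. This is only a sketch: `latin_variants` is a name invented for the example, and the results depend on which Unicode version your Python build includes.

```python
import sys
import unicodedata


def latin_variants(letter):
    """List (code point, character, name) for 'LATIN SMALL LETTER <X> WITH ...'."""
    prefix = f"LATIN SMALL LETTER {letter.upper()} WITH "
    found = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed or unassigned code points yield the default "".
        name = unicodedata.name(chr(cp), "")
        if name.startswith(prefix):
            found.append((cp, chr(cp), name))
    return found


if __name__ == "__main__":
    for cp, ch, name in latin_variants("v"):
        print(f"U+{cp:04X} {ch} {name}")
```

Picking one entry per letter from the output is a quick way to build another lookalike table.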
Another option is the Wikipedia Unicode pages: http://en.wikipedia.org/wiki/List_of_Unicode_characters has several sections for extended Latin scripts, and the Unicode mapping tables at the bottom are handy if you want to go directly to a certain Unicode range (e.g. to get away from the BMP).
The simple example phrase you provided hit a bug in Moodle (a PHP webapp) straight away. I think a few webapps have trouble with that funny 'e' (U+1D5BE). Interestingly, the problem is also present in Jira (a Java-based webapp). Might be an iconv issue.
I chose that 'e' specifically because it wasn't part of the BMP, but apparently the mathematical alphanumeric symbols are a bit of a special case - I'm not sure if systems are expected to provide font substitution for them.
Zimbra (written in Java) had trouble with the 'e' too: it just removed it entirely. I think a lot of programs have trouble with characters outside the BMP, which don't fit into a single 16-bit code unit. My text editors and Thunderbird can show the 'e' character, but the cursor handling is all wrong on those lines.
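To make the 16-bit point concrete, here is a small Python sketch showing how U+1D5BE is stored. The helper names (`utf16_units`, `surrogate_pair`, `non_bmp`) are invented for the example. In UTF-16 the character takes two code units (a surrogate pair), so software that treats each 16-bit unit as one character can easily drop it or misplace the cursor. Whether that is what actually goes wrong in Moodle, Jira or Zimbra is only a guess.

```python
def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def surrogate_pair(ch):
    """Return the (high, low) UTF-16 surrogates for a non-BMP character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError("character is inside the BMP")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def non_bmp(text):
    """Characters in text that need more than one UTF-16 code unit."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    funny_e = "\U0001D5BE"
    # One code point, two UTF-16 code units, four UTF-8 bytes.
    print(len(funny_e), utf16_units(funny_e), len(funny_e.encode("utf-8")))
    print([hex(unit) for unit in surrogate_pair(funny_e)])  # ['0xd835', '0xddbe']
```

Running `non_bmp()` over a pseudo-translated string is a cheap way to check whether a test phrase really contains characters from beyond the BMP.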