On Tue, Feb 26, 2008 at 08:05:01PM -0800, Peter Gordon wrote:
> In some languages, though, the diacritic differentiates the character
> from the "plain" form. For example, a Spanish package name for a similar
> studying software could be "¡Estudiará!" (third-person indicative future
> tense; literally, "You will study!"). However, we would need to be
> careful here because, without that accent, this changes the conjugation
> to "estudiara," which is the first- and third-person imperfect
> subjunctive (which really makes no sense on its own, since the
> subjunctive tense is meant to be used in a subjective or predictive
> clause of a sentence, such as referring to one's wants and desires for
> the future).
If Wikipedia is right in
http://en.wikipedia.org/wiki/Transliteration
what you are trying to do is not transliteration (in the narrow sense),
but transcription. Transliterated words are not necessarily pronounced
the same as the original. Transliteration is only an automatic mapping
between characters, irrespective of the correctness of the result in
the original language (for transliteration to 7-bit ASCII, the mapping
is generally based on the visual similarity of the characters or on how
they sound in English).
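
As a rough illustration of such an automatic mapping, here is a minimal
Python sketch. The table and the function name transliterate are made up
for this example; they are not taken from texinfo or any other tool.

```python
# Hypothetical character table, for illustration only.
TABLE = {
    "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
    "ñ": "n", "ü": "u", "ß": "ss",
    "¡": "", "¿": "",
}

def transliterate(name):
    """Map each character on its own, ignoring what the word means."""
    return "".join(TABLE.get(ch, ch) for ch in name)

print(transliterate("¡Estudiará!"))  # prints: Estudiara!
```

The result is plain ASCII, but it now coincides with the imperfect
subjunctive form; the mapping neither knows nor cares.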
I think that we should not permit letters outside 7-bit ASCII in package
names, but the choice of transliteration or transcription scheme should
be left to the packager.
We can still provide aids for packagers who have no idea how to
transliterate an upstream name. For example, in texi2html/texinfo we
transliterate non-ASCII characters in file names. There are tables in
texinfo for some characters, and in texi2html I use Text::Unidecode to
cover the rest. That way the file names are 7-bit ASCII and are unlikely
to be problematic on any platform (though the portability issue is more
severe there than in Fedora, since we want these file names to be usable
everywhere). This scheme seems to work for a lot of characters, though
better schemes could be devised if needed. But I don't think that is
needed; the choice should instead be left to the packager.
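
To give an idea of what such an aid could look like, here is a small
Python sketch of the "table first, generic fallback second" approach.
It is not the texi2html code: the table, the name ascii_name and the
"_" placeholder are assumptions made for this example, and the fallback
uses Unicode decomposition from the standard unicodedata module as a
much weaker stand-in for Text::Unidecode.

```python
import unicodedata

# Hypothetical hand-written table, consulted before the generic fallback.
TABLE = {"ß": "ss", "æ": "ae", "ø": "o", "¡": "", "¿": ""}

def ascii_name(name, unknown="_"):
    """Return a 7-bit ASCII suggestion for name (a sketch, not a standard)."""
    out = []
    for ch in name:
        if ord(ch) < 128:
            out.append(ch)                 # already ASCII, keep as is
        elif ch in TABLE:
            out.append(TABLE[ch])          # explicit table entry wins
        else:
            # Fallback: decompose the character and keep its ASCII part,
            # which drops combining accents such as the one in "á".
            decomposed = unicodedata.normalize("NFKD", ch)
            kept = "".join(c for c in decomposed if ord(c) < 128)
            out.append(kept or unknown)    # nothing usable: placeholder
    return "".join(out)

print(ascii_name("¡Estudiará!"))
print(ascii_name("Straße"))
```

The output is only a suggestion; the packager would remain free to pick
another spelling.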
--
Pat