Legacy document translations
by Shaun McCance
Hi all,
I've been working on getting translations from Zanata and merging them
into DocBook. There are two big issues, and I'd like to propose an
alternative for legacy documents. Here are the issues:
* Pulling from Zanata is slow. It's basically just a bunch of HTTP
calls, at least one per language per XML file. I don't see any way to
use etags or similar to avoid redownloading the same content. I had a
brief chat with bex on IRC about possibly having a git mirror of the PO
files. That would be faster.
* Merging requires Publican, because the merge code lives there. There
are two possible ways around this:
1) We pull Publican's Translate.pm into a standalone module and have a
tool ("publican-po"?) that just does PO extraction and merging exactly
the way Publican does. We'd have to maintain this, but it would be a
lower maintenance burden than all of Publican.
2) We merge with itstool instead. itstool's PO files don't exactly
match Publican's. So a 100% translated document might drop to 90% or
so. I could probably write custom ITS rules that would make it match
better. I don't know if I could get it to match 100%.
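To illustrate why a fully translated document can lose coverage when the extraction tool changes, here is a minimal sketch. It assumes translations are looked up by exact source string (msgid), as in PO files. The function name and the sample strings are made up for illustration and are not taken from Publican or itstool.

```python
def surviving_coverage(old_translations, new_msgids):
    """Return the fraction of newly extracted msgids that already have a translation.

    old_translations: dict mapping msgid -> msgstr from the old extractor's PO file.
    new_msgids: iterable of msgids produced by the new extractor.
    """
    new_msgids = list(new_msgids)
    if not new_msgids:
        return 1.0
    # A msgid only counts if it matches exactly and has a non-empty msgstr.
    matched = sum(1 for msgid in new_msgids if old_translations.get(msgid))
    return matched / len(new_msgids)


# Hypothetical data: translations keyed by the old extractor's msgids.
old = {
    "Install the <package>foo</package> package.": "Installez le paquet <package>foo</package>.",
    "Restart the service.": "Redémarrez le service.",
}
# The new extractor yields one string with slightly different whitespace.
new = [
    "Install the <package>foo</package> package.",
    "Restart the service. ",  # trailing space: no longer an exact match
]

coverage = surviving_coverage(old, new)
print(f"{coverage:.0%} of strings still translated")
```

Any difference in segmentation or whitespace between the two extractors shows up here as an untranslated string, which is the kind of drop described above.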
So, an alternative: For any documents that are no longer edited in any
way, we could do a one time merge of all translations and just put it
in git on that branch. That way there's no downloading (aside from the
git clone we do anyway), no merging, and no maintaining a legacy merge
tool going forward.
The downside is that we'd be putting a lot more content in git, which
could slow down git clones. Alternatively, we could put them all in a
separate repo. For example, all release-notes translations could go
into a new repo called release-notes-translations.
Thoughts?
--
Shaun
7 years, 5 months
Re: Legacy document translations
by Pete Travis
On Oct 14, 2016 10:00, "Petr Bokoc" <pbokoc(a)redhat.com> wrote:
>
> On 10/13/2016 11:00 PM, Pete Travis wrote:
>>
>> On Oct 13, 2016 07:27, "Brian Exelbierd" <bex(a)pobox.com> wrote:
>> >
>> > On Wed, Oct 12, 2016, at 08:09 AM, Jean-Baptiste Holcroft wrote:
>> > > In case it helps:
>> > > * Fedora Websites does not save translations in a git repository.
>> > > Robyduck wrote an explanation here:
>> > > https://pagure.io/fedora-websites/pull-request/36
>> >
>> > As Robyduck and shaunm have pointed out, pulling from Zanata is slow.
>> > We really need to address this with the Zanata team. Either they
>> > realize it is a problem and are trying to fix it or they have envisioned
>> > a different workflow and we should consider that.
>> >
>> > This also goes back to the issue of trying to figure out when we should
>> > publish a translation. Do we publish partial translations? In other
>> > words, every time English is updated, we publish updated, now by
>> > definition incomplete, translations as well. Or, do we only publish
>> > versions which are 100% translated and approved after being signaled by
>> > the translation team?
>> >
>> > The answer to the above questions will drive whether we really want/need
>> > to cache PO files or not.
>> >
>> > regards,
>> >
>> > bex
>>
>> IMO it would be impractical to require 100% translation, or 100% review.
>>
>> Many documents seem to peak at 60 or 70% translation. For a book
>> project that probably means, e.g., ~100% for release N-2 with no
>> retranslation on content updates. For an article collection, it would
>> mean no translated articles until all articles in a repo are translated -
>> hopefully, that's an unattainable moving target.
>>
>> I believe most teams did not opt into review, and suspect we'll need
>> to investigate whether we can effectively choose only reviewed POs for
>> some languages.
>>
>> Given these concerns it might be best to publish whatever translations
>> are available when we update the source content, plus periodic rebuilds
>> that only serve to update translated content.
>>
>> -- Pete
>>
>
>
> Partial translation should stop being an issue when (if) we switch from
> large books into series of short, mostly self-contained articles. Before
> that happens, though, +1 from me for publishing partially translated
> content. Maybe with some kind of threshold, something like "only publish
> books with more than 50% strings translated" or something similar.
>
Well, it becomes a different question at that point. With books we want to
publish translations of a project when N% of resources in a project have
been translated. With articles we want to publish each resource with over
N% translated strings. I need to review the API functionality again... but
at this stage getting the POTs generated and uploaded should be dealt with
first.
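To make the two publishing rules concrete, here is a minimal sketch. It assumes we already have per-resource translation statistics from somewhere. The function names, the data shapes, and the 50% threshold are illustrative assumptions, not anything the Zanata API is known to provide.

```python
def book_is_publishable(resource_done, threshold):
    """Books: publish the whole project once enough of its resources are translated.

    resource_done: list of booleans, one per resource (for example, per chapter).
    threshold: required fraction of translated resources, between 0 and 1.
    """
    if not resource_done:
        return False
    return sum(resource_done) / len(resource_done) >= threshold


def publishable_articles(string_stats, threshold):
    """Articles: publish each resource whose own translated-string ratio is over the threshold.

    string_stats: dict mapping resource name -> (translated, total) string counts.
    """
    return [
        name
        for name, (translated, total) in string_stats.items()
        if total > 0 and translated / total > threshold
    ]


# Hypothetical statistics for one language.
print(book_is_publishable([True, True, False, False], 0.5))
print(publishable_articles({"install": (8, 10), "upgrade": (5, 10)}, 0.5))
```

The book rule yields a single yes/no decision for the whole project, while the article rule yields a list of resources to publish individually.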
-- Pete
Re: Legacy document translations
by Pete Travis
On Oct 11, 2016 10:23, "Shaun McCance" <shaunm(a)gnome.org> wrote:
>
> Hi all,
>
> I've been working on getting translations from Zanata and merging them
> into DocBook. There are two big issues, and I'd like to propose an
> alternative for legacy documents. Here are the issues:
>
> * Pulling from Zanata is slow. It's basically just a bunch of HTTP
> calls, at least one per language per XML file. I don't see any way to
> use etags or similar to avoid redownloading the same content. I had a
> brief chat with bex on IRC about possibly having a git mirror of the PO
> files. That would be faster.
>
Could we solve this by periodically pulling POs into the release branches
of our docs? I'm picturing a nightly script that checks out, e.g., the f25
branch, pulls a language, does some tests, commits if the tests pass, and
moves to the next language. I read the suggestion as using a separate git
repo, which seems unnecessarily complex.
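A rough sketch of that nightly loop follows. The four callables (`pull`, `run_tests`, `commit`, `discard`) are hypothetical placeholders for whatever Zanata client and git commands would actually be used. Checking out the release branch is assumed to happen before the loop starts.

```python
def nightly_sync(languages, pull, run_tests, commit, discard):
    """Pull, test, and commit one language at a time.

    Returns a (committed, skipped) pair of language lists.
    """
    committed, skipped = [], []
    for lang in languages:
        pull(lang)                # fetch the latest POs for this language
        if run_tests(lang):       # e.g. check that the POs still parse and build
            commit(lang)
            committed.append(lang)
        else:
            discard(lang)         # drop the pulled files, keep the last good commit
            skipped.append(lang)
    return committed, skipped


# Dry run with stand-in callables that only record what would happen.
log = []
committed, skipped = nightly_sync(
    ["de", "fr", "ja"],
    pull=lambda lang: log.append(("pull", lang)),
    run_tests=lambda lang: lang != "fr",  # pretend the French POs fail the tests
    commit=lambda lang: log.append(("commit", lang)),
    discard=lambda lang: log.append(("discard", lang)),
)
print(committed, skipped)
```

Passing the steps in as callables keeps the loop independent of any particular client tool, so the same logic can be dry-run without network access.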
> * Merging requires Publican, because the merge code lives there. There
> are two possible ways around this:
>
> 1) We pull Publican's Translate.pm into a standalone module and have a
> tool ("publican-po"?) that just does PO extraction and merging exactly
> the way Publican does. We'd have to maintain this, but it would be a
> lower maintenance burden than all of Publican.
>
> 2) We merge with itstool instead. itstool's PO files don't exactly
> match Publican's. So a 100% translated document might drop to 90% or
> so. I could probably write custom ITS rules that would make it match
> better. I don't know if I could get it to match 100%.
>
Can you elaborate on what itstool does not do? Entities? I like the idea
of using an established tool vs. a partial fork; perhaps a little additional
processing will get us there.
> So, an alternative: For any documents that are no longer edited in any
> way, we could do a one time merge of all translations and just put it
> in git on that branch. That way there's no downloading (aside from the
> git clone we do anyway), no merging, and no maintaining a legacy merge
> tool going forward.
>
> The downside is that we'd be putting a lot more content in git, which
> could slow down git clones. Alternatively, we could put them all in a
> separate repo. For example, all release-notes translations could go
> into a new repo called release-notes-translations.
>
> Thoughts?
>
> --
> Shaun
>
OK, you did get to the separate repo question. Time spent fetching remote
refs seems to be the only downside to continuing our POs-in-release-branch
SOP. I don't see enough need for speed in the process to warrant the
increased procedural and architectural complexity. IMO publishing the
source lang and translated langs asynchronously would be fine.
That said, I have not personally done a multi-language build with pintail,
so there may well be something I'm missing.
-- Pete
Hello docs group member
by Luc Vu Nang
Hi team,
I have just found this group and hope I can contribute something. It's
great to get to know the group members. Let me stay in the group for some
time so I can learn how to contribute to the project.
Thanks & regards,
Vu Nang Luc
Fedora 25 Beta Release Readiness Meeting, Thursday, October 6th @ 19:00 UTC
by Jan Kurik
Join us on irc.freenode.net in #fedora-meeting-2 for the Fedora 25
Beta Release Readiness Meeting.
The meeting is going to be held on Thursday, October 6th, 2016 at
19:00 UTC. Please check the [FedoCal] link for your time zone.
We will meet to make sure we are coordinated and ready for the Beta
release of Fedora 25. Please note that this meeting is going to be
held even if the release is delayed at the Go/No-Go meeting on the
same day two hours earlier.
You may receive this message several times. This is on purpose, to
open this meeting to the teams and to raise awareness, so hopefully
more team representatives will come to this meeting. This meeting
works best when we have representatives from all of the teams.
[FedoCal] https://apps.fedoraproject.org/calendar/meeting/4782/
More information available at:
https://fedoraproject.org/wiki/Release_Readiness_Meetings
Thank you for your support. Regards, Jan
--
Jan Kuřík
Platform & Fedora Program Manager
Red Hat Czech s.r.o., Purkynova 99/71, 612 45 Brno, Czech Republic