Friday, December 07, 2007

Translating Wikipedia articles (2)

As I said yesterday, I am coming back to this topic today.

Apertium is already used in several projects, one of which is the Occitan Wikipedia. For those who are not familiar with wikis: they let you compare the version before proofreading with the proofread version, and that is something you will see by clicking here.

What you see on the left-hand side is the text as it was after the machine translation, and on the right-hand side the proofread version of the text. The changes are highlighted in green on the left and in blue on the right. There are even some parts of the text that were not changed at all.
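For those who would like to try such a comparison themselves, it can be approximated in a few lines of Python. This is only a minimal sketch using the standard `difflib` module, not the way the wiki software computes its diffs; the function names `word_diff` and `unchanged_ratio` and the sample sentences are made up for illustration.

```python
import difflib


def word_diff(machine, proofread):
    """Return a word-level diff as (tag, machine_words, proofread_words) tuples.

    tag is one of 'equal', 'replace', 'insert' or 'delete'.
    """
    a = machine.split()
    b = proofread.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def unchanged_ratio(machine, proofread):
    """Share of machine-translated words that survived proofreading untouched."""
    a = machine.split()
    b = proofread.split()
    if not a:
        return 0.0
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(a)


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from the Occitan Wikipedia.
    raw = "the cat sit on mat"
    edited = "the cat sits on the mat"
    for tag, old, new in word_diff(raw, edited):
        print(tag, old, new)
    print(unchanged_ratio(raw, edited))
```

The ratio gives a rough idea of how much post-editing a text needed: the closer it is to 1, the less the proofreader had to change.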

The work on the glossary and the grammar rules (I am deliberately not using the specialist terminology here, to keep things understandable for everyone) has been going on for approximately one year now.

At a certain stage, the problems arise from missing vocabulary rather than from the rules. Of course, these translations will probably never be 100% perfect, but the quality depends very much on us and on how much terminology we add and classify.
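To get a feeling for how much vocabulary is still missing, one could check a text against the current wordlist. The following is a rough sketch, assuming the wordlist is available as a plain list of word forms; the function `missing_words` and the sample data are hypothetical, and Apertium's own dictionaries are more structured than a flat list.

```python
import re
from collections import Counter


def missing_words(text, wordlist):
    """Count the words in text that do not appear in wordlist (case-insensitive)."""
    known = {word.lower() for word in wordlist}
    # [^\W\d_]+ matches runs of letters, including accented ones.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(token for token in tokens if token not in known)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample_wordlist = ["la", "casa", "es"]
    sample_text = "La casa es grande. La casa es bonita."
    for word, count in missing_words(sample_text, sample_wordlist).most_common():
        print(word, count)
```

Sorting the missing words by frequency shows which entries would be the most useful to add first.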

Compared with the above result, what you would see for Spanish-Catalan is much better, since that pair has been under development for years.

You can find further reading about co-operation between Wikipedia and Apertium on the Apertium Wiki.

The language pairs currently available are:

  • Spanish←→Catalan
  • Spanish←→Galician
  • Spanish←→Portuguese (pt and pt_BR)
  • Catalan←→English
  • Catalan←→French
  • Catalan←→Occitan (oc and oc@aran)
  • Romanian→Spanish

Many other language pairs are under development. Of course, you may start on any language combination that you are comfortable with. Please keep in mind: the more similar two languages are, the easier it is to program the rules and the sooner the translation engine will produce good translations.

If you want to start working on wordlists, please write to me at s.cretella (at) voxhumanitatis.org and tell me which language pair you are interested in. You can also reach me on Skype at: sabinecretella

I will upload a wordlist to Google Docs and give you access. Please let me know if you have difficulty working online (for example, if you use a dial-up connection).

The Apertium Chat is on Freenode.

One more thing: I have just received the criticism that machine translation would flatten the language. Well, any translated text, in particular when it comes to literary translation, is post-edited by a second person. A translation is never published directly, since during translation (and you can be the best translator in the world) there are always some bits and pieces that sound a little strange or that do not really carry the scene over into the other culture. Please allow me to introduce the concept of cultural localization here. The term was coined by Dr. Martin Benjamin, who is a member of the advisory board of Vox Humanitatis, and it will be explained in a future post. The concept immediately became part of the scope of the association.

And since I am adding notes here: please remember that the Fundraiser of the Wikimedia Foundation is still running and that you can help by donating and telling others that the fundraiser is on. For more information and to donate please click here.

4 comments:

Andri said...

that's a pretty interesting subject, I guess it could be really useful for minority languages like mine, Friulian. The only problem is that this seems quite a specialist project. I mean, I had a look at the Apertium wiki, they explain pretty well what you should do to start, but it seems to me that you still need a technical background in linguistics, or at least you should be part of a team where somebody has it. It would be easier for people like me to just have an editor where you translate a text into your language and it automatically does the job and creates the Apertium files, learning from your choices and so on. I know that it is a big task, I don't even know if it is feasible, but it would for sure help a lot. I hope we can have such software in the future, of course open source so everybody can help
bye

SabineWanner said...

Aindrias, it is not as difficult as it seems. I just started to create a first wordlist for Neapolitan, and Bèrto is starting to go a similar way for Piedmontese. Considering that Friulian is one of the first languages to be taught at school again, I am sure that in this way you can help your language a lot. If you are able to edit a spreadsheet like the one I have here for Neapolitan, then you can do it.

Besides that, we would be really interested in getting the right connections to people who want to improve the situation of the Friulian language, for the future Friulian chapter of Vox Humanitatis. It would be great if you could contact me by e-mail at: s.cretella (at) voxhumanitatis.org. I would then explain things in more detail.

Every single step in the right direction, even the smallest one, helps.

Thank you!

Andri said...

well, if it's as easy as editing a spreadsheet like that I guess I can do it, did you start from an example or something?
for vox humanitatis, if I understand correctly it's a framework for supporting language development. I'm interested in it, even if I think you'd better create a community around your language first and then you can produce something useful. By the way, we already have a Firefox localization. I'm quite busy for the next month with studies, but I'll contact you via mail to know more
For teaching in schools, it's just a beginning, and a low profile one. But I guess you can't have everything in a moment, especially in Italy, and if you have to start from a base

SabineWanner said...

Well, yes, you can, and it will help to get more content more easily. Please note that the Apertium people will help you on the technical side. I will put you in touch with them when you write to me. As for Vox Humanitatis, there will be one chapter for each language, and if there is already a not-for-profit organisation we would in any case prefer to co-operate instead of creating the same kind of organisation again. So I believe we will have a really nice exchange of information. I will not be online much during the next two weeks, but e-mail will hopefully be checked on a daily basis.
