Saturday, August 12, 2006

OmegaT, WiktionaryZ, Betawiki ... some questions that need an answer ...

In the Wiktionary IRC channel, Connel asked the following questions: "... considers omegat.org. Is the intent for it to just auto-upload stuff to WZ? to/from ZW? Or betawiki, or both betawiki and WZ? Or is betawiki just for WikiMedia total localization?"

That is a lot ... so let me go step by step.

The intent of OmegaT is not to auto-upload stuff to WiktionaryZ or download it from there. Nor is it only there for Betawiki and WiktionaryZ, even if it will probably be used for both sooner or later. OmegaT is a CAT tool (computer-assisted translation tool) that helps translators do their work.

What does this mean? Imagine that for all of your translations you use a tool that creates a translation memory: a file containing the translations you did, segmented into sentences, with each source sentence paired with its target sentence. Then you do further translations and let the CAT tool access these already translated files. If your new translation is on a subject you have already translated, chances are high that most of the terminology you need is already in there, and you can even see in which context it was used. So with OmegaT you search your project and the available translation memories to see if and how a term was already translated. This can help a lot.
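To make the idea a bit more concrete, here is a minimal sketch in Python of a translation memory and a term search over it. It is not how OmegaT stores or searches its data; the function names (`split_sentences`, `build_memory`, `concordance`) and the naive sentence splitting are assumptions made only for illustration.

```python
import re

def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_memory(source_text, target_text):
    """Pair source and target sentences by position.

    Assumption for this sketch: both texts have the same number of sentences.
    """
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of sentences")
    return list(zip(src, tgt))

def concordance(memory, term):
    """Return every (source, target) pair whose source sentence contains the term."""
    needle = term.lower()
    return [(s, t) for s, t in memory if needle in s.lower()]

memory = build_memory(
    "Press the red button. The pump starts. Close the valve.",
    "Premere il pulsante rosso. La pompa si avvia. Chiudere la valvola.",
)
hits = concordance(memory, "pump")
for source, target in hits:
    print(source, "=>", target)
```

Searching for "pump" returns the sentence pair in which the term was used, so you see both the earlier translation and its context.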

Now consider a manual - of a machine, a computer, whatever. These manuals need updates once a new version of that machine or computer is produced. Normally companies then just update the description, and parts of it remain the same as before (simply because the functionality of these parts is still the same). When you then translate, you will find these unchanged parts in your translation memory and, depending on how you set your options, OmegaT either proposes the 100% match or overwrites the translation part of your project with the already existing translation. In this way you can save loads of time.
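The matching step can be sketched roughly like this. It is only an illustration under stated assumptions: the similarity score comes from Python's standard `difflib`, not from OmegaT's own matching, and the `propose` function, its `auto_insert` option and the 0.7 threshold are hypothetical names and values.

```python
from difflib import SequenceMatcher

def propose(memory, segment, auto_insert=False, threshold=0.7):
    """Look up a segment in a {source: target} translation memory.

    Returns (status, translation, score), where status is:
      "inserted" - 100% match, written into the project automatically
      "proposed" - 100% match, only shown to the translator
      "fuzzy"    - similar segment found, score >= threshold
      "new"      - nothing useful in the memory
    """
    if segment in memory:
        status = "inserted" if auto_insert else "proposed"
        return status, memory[segment], 1.0
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_score, best_target = score, target
    if best_score >= threshold:
        return "fuzzy", best_target, best_score
    return "new", None, best_score

memory = {
    "Close the valve.": "Chiudere la valvola.",
    "Press the red button.": "Premere il pulsante rosso.",
}
print(propose(memory, "Close the valve."))                    # unchanged sentence
print(propose(memory, "Close the valve.", auto_insert=True))  # overwrite option on
print(propose(memory, "Press the green button."))             # slightly changed sentence
```

An unchanged sentence from the old manual is found as an exact match, while a slightly changed one only comes back as a similar segment that the translator still has to adapt.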

With the right parser, the MediaWiki UI could also be translated in this way. Now, we will always have people who translate things manually online and who will not use a CAT tool. This means that OmegaT should be able to access the single pages containing the messages on Betawiki: you translate them on your computer and store them to the page in the correct language version. This is feasible.

Another use will be the creation of content for small Wikipedias. Once we get our wiki read/wiki write option within OmegaT, it is possible to start a translation of an article, let's say from the English Wikipedia, and translate it to any language, let's say for the Neapolitan Wikipedia. This means you tell OmegaT which page to get on en.wikipedia and which page to write on nap.wikipedia. The same is valid for any African language. The advantage of this is: if there is no online connection, people can work offline on translations.

The translation memories resulting from these translations should be stored somewhere (WiktionaryZ is already enabled to upload translation memories) so that others can access and use them to work faster and with higher quality in their own translations. Another aspect of doing things this way: proofreading a translation is easier, since you see the source text above the translation for each sentence. This eases the job a lot and the quality of the translated article rises.

Now to WiktionaryZ and OmegaT: for now OmegaT has quite a simple glossary function - you create a tab-separated text file and put it into your glossary directory. While you translate, OmegaT shows you the translation proposals for the words that are present both in the current sentence and in the glossary. Now imagine what that means if you connect the glossary function to WiktionaryZ: the whole repository of data at your fingertips. Of course, considering the mass of data that is online in WiktionaryZ, it becomes very important to attribute domains to terminology. Often a word can be translated in 20 ways or even more into another language ... well, if you are doing a translation about medical equipment, it does not make sense to get proposals from another domain, let's say machinery - the possibilities from other domains should only be proposed (showing that other domain) when there is no entry for medical equipment.
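Here is a rough sketch of how such a domain-aware glossary lookup could behave. Please read it as an assumption, not as OmegaT's or WiktionaryZ's actual behaviour: the third tab-separated column is used here as a domain label purely for illustration, and `load_glossary` and `proposals` are made-up names.

```python
GLOSSARY = (
    "pump\tpompa\tmachinery\n"
    "pump\tpompa di infusione\tmedical equipment\n"
    "valve\tvalvola\tmachinery\n"
)

def load_glossary(tsv_text):
    """Parse tab-separated lines: source term, target term, optional domain."""
    entries = []
    for line in tsv_text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip empty or incomplete lines
        domain = fields[2] if len(fields) > 2 else ""
        entries.append((fields[0], fields[1], domain))
    return entries

def proposals(entries, sentence, domain):
    """Return {term: [(translation, domain), ...]} for terms found in the sentence.

    Entries from the requested domain win; entries from other domains are
    only returned (with their domain shown) when the requested one has none.
    Matching is a naive substring test, so "pump" would also match "pumpkin".
    """
    lowered = sentence.lower()
    found = {}
    for source, target, entry_domain in entries:
        if source.lower() in lowered:
            found.setdefault(source, []).append((target, entry_domain))
    result = {}
    for source, candidates in found.items():
        in_domain = [c for c in candidates if c[1] == domain]
        result[source] = in_domain if in_domain else candidates
    return result

entries = load_glossary(GLOSSARY)
result = proposals(entries, "Check the pump and the valve.", "medical equipment")
for term, candidates in result.items():
    print(term, candidates)
```

For "pump" only the medical equipment entry is proposed, while "valve" has no medical entry and therefore falls back to the machinery one, with that domain shown next to it.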

At this stage we don't have this domain structure for terminology on WiktionaryZ and therefore the data, once we have loads of it online, cannot be used - it would just create a huge mess and would be very time consuming. So one of the things we really need asap is a domain structure that we can connect the single terms to - the sooner we have it the better .... otherwise we will have loads of double and triple work, or WiktionaryZ could become completely useless within OmegaT and as such would not be of any advantage for translators. Not even for scientists, really ... imagine a biologist who searches for terminology and gets whatever result ... also those of machinery or whatever other domain.

Back to the use within OmegaT:

The next step is then: what if the searched term is not in WiktionaryZ? I already noticed this during my last translation - for now it is too time consuming to add terms to WiktionaryZ and also to Wiktionary while you are translating - but it would make so much sense. So what is planned in the reference implementation for a translation glossary is that, when working with OmegaT, you get the possibility to add such a term directly from there. You simply tell OmegaT to add it to WiktionaryZ with your user ID, and you can attribute all the necessary domains etc. without problems as well as tag the term as "definition needs to be added". In that way WiktionaryZ will get quite a bunch of very specific terminology over time.
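Just to show what such a term submission would need to carry, here is a small sketch. Everything in it is hypothetical: the `TermSubmission` record, the `queue_term` helper and the local queue are made up for illustration, and no real WiktionaryZ or OmegaT interface is shown or implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TermSubmission:
    """One term a translator wants to add while translating (hypothetical record)."""
    source_term: str
    target_term: str
    source_lang: str
    target_lang: str
    user_id: str
    domains: List[str] = field(default_factory=list)
    definition: Optional[str] = None
    tags: List[str] = field(default_factory=list)

def queue_term(queue, source_term, target_term, source_lang, target_lang,
               user_id, domains, definition=None):
    """Append a submission to a local queue; tag it when the definition is missing."""
    tags = [] if definition else ["definition needs to be added"]
    entry = TermSubmission(source_term, target_term, source_lang, target_lang,
                           user_id, list(domains), definition, tags)
    queue.append(entry)
    return entry

queue = []
entry = queue_term(queue, "infusion pump", "pompa di infusione", "en", "it",
                   "ExampleUser", ["medical equipment"])
print(entry.tags)
```

The point is only that the term, its translation, the user ID, the domains and the "definition needs to be added" tag travel together, so that someone can complete the entry later.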

Another use is OmegaT for language lessons - Connel, from en.wiktionary, thought about it and he is right: OmegaT could be used for language learning as well ... what if we have a huge sentence repository and people start to translate texts to study that language? They do not need a paper dictionary - OmegaT would help them to see the use of a word in various sentences, and they would get the terminology proposals like the translators do. When they are back at school or university (or maybe online with a language teacher), they can understand their errors and update WiktionaryZ and the online sentence repository.

For exams teachers would have a mass of proposals and they could determine which glossary group shall be included in the exams ... that is to be thought about ... it was not considered up to now even if there are already thoughts on how to use WiktionaryZ for language learning.

Did I miss something? Hmmm ... not sure. Well if you have questions: just ask :-)
