Bina's blog: August 2006

Thursday, August 31, 2006

Creating contents for many Wikipedias

The basis for this is a project about mass content creation on meta and Wikidata. Mass content creation is an idea of user Millosh and yes, he is right about that - it is how I already did certain stuff for the Neapolitan wikipedia.

There is so much easy-to-create content out there that Wikipedias could share, and even if we do not have Wikidata implemented in the wikipedias we can use the data in databases to create stubs with mail merge (in OpenOffice.org or Word) and upload them with the bot (see my other post of today).

This means: if we now start to add all names of:
continents
countries
cities
rivers
mountains
monuments
places
yes, even streets, because there are some that have translations
lakes
seas
animals
plants
names of people (also these are translated)
etc.

And then we start to translate them. At the same time people take care of adding statistical data to a table that is exactly about this (if we cannot do this with a separate wikidata installation ... anyway we do not need relational information for now ... just information).

How many articles (stubs) can be created in this way and how many people can work on it?

We also should not forget about film and book titles, the Greek and Roman gods (I suppose other parts of the world will have other material on such things).

It is really a huge project, but it is feasible ... there are many of us who have similar goals.

Where to start: well we need the infoboxes translated into as many languages as possible - and we then need the place names etc. translated. This must be combined with a datasheet.

Example:
Castua: http://it.wikipedia.org/wiki/Castua
We have the box on the right side with all the statistical/basic information - all of that can be translated into many languages. The first sentence in the stub will simply be the definition from WiktionaryZ.
So most of it, like stato (state), regione (region) etc., can be translated within WZ, and in that way we can populate the templates used on all wikis. As for the "not seen" part of the template, I would use either a lingua franca (English) or simply the same names that are visible.
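To make the idea a bit more concrete, here is a minimal sketch in Python of how one data record could be turned into a template call for several languages. Everything in it is an assumption for illustration: the template name "Geobox", the parameter names (kept in English as the lingua franca for the part nobody sees) and the small translation table, which only stands in for what WiktionaryZ would deliver.

```python
# Hypothetical translation table; in the plan above these translations
# would come from WiktionaryZ rather than from a hard-coded dictionary.
TRANSLATIONS = {
    "Croatia": {"it": "Croazia", "de": "Kroatien"},
}


def render_infobox(template, fields, lang):
    """Build a template call with English parameter names and localized values.

    Values without a known translation are passed through unchanged.
    """
    lines = ["{{" + template]
    for param, value in fields.items():
        localized = TRANSLATIONS.get(value, {}).get(lang, value)
        lines.append(f"| {param} = {localized}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    record = {"name": "Castua", "state": "Croatia"}
    for lang in ("it", "de"):
        print(render_infobox("Geobox", record, lang))
```

The same record is reused for every language; only the lookup in the translation table changes, which is the part that would have to be done once and then be available to all wikis.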

There's not much to it - we need the lists to start off with. If we use pagefromfile.py to upload the ready pages, existing ones with the same name will be skipped and written to a logfile - these are then the only ones someone has to look after manually.

If sooner or later we get a pure wikidata application that takes the translations from WZ and combines them with the rest of the data: that would be great ... since then we would not need to correct the entries by hand whenever there are corrections.

Using the Geoboxes we already have a good way to compare lists ... but does it make sense to do it that way right now? Or does it make sense to prepare now all possible translations to be ready once we can have wikidata for geographical entries?

Hmmm ... I was interrupted quite often while writing this blog ... and I don't have the time to re-read now. So sorry if things seem to be a bit mixed up.

Tuesday, August 29, 2006

Adding contents to wikipedia using a bot

Well, this question comes up over and over again and I would like to describe here how to do this - and this is valid for Wikipedia and Wiktionary.

Now I did this quite often on the Italian wiktionary and on the Neapolitan wikipedia (and some other projects).

For the upload I use the pywikipediabot - and in particular pagefromfile.py. This bot was mainly created to upload pages to Wiktionary, but then it turned out to be a great tool for wikipedia as well.

You need a .txt file saved in UTF-8 encoding. The bot takes the first text on the page between ''' and ''' as the page name and will of course create that page. If the page already exists it will be skipped.

Now the question I got is what a typical entry would look like. Here is an example:

{{-start-}}
'''Rome''' is the capital of Italy.
{{-stop-}}

This means the bot would create the page Rome and add the contents "Rome is the capital of Italy." to the page.

If the first text between ''' and ''' is not the page name, you can use a workaround with a comment:

{{-start-}}
<!-- '''Template:Rome''' -->
'''Statistical data''' about Rome: ....
{{-stop-}}

In this case the template Rome is created, containing the statistical data.

Just add everything you want to see on the wikipage you want to create between start and stop.

Now one thing you are probably wondering about is how to do this for a huge number of cities or other data. Well: use mail merge in OpenOffice.org Writer or Microsoft Office and create the layout for a typical template page, then enter the fields of the database you have and simply have it merge. Copy and paste the whole contents of the resulting file into a .txt file (Editor) and save it with UTF-8 encoding. You can try to create the UTF-8 encoded text file directly with Word or OpenOffice.org as well, but we noted that on some systems this creates problems. So just try it out.

Then copy the file into your pywikipediabot folder and run the bot on it.

To have the bot run I use the following command for the file nap.txt:
pagefromfile.py -start:{{-start-}} -end:{{-stop-}} -file:nap.txt -utf

Of course first you must login using login.py.
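If you prefer a script over mail merge, the same upload file can be produced with a few lines of code. This is only a minimal sketch in Python under my own assumptions: the records and the stub sentence are made up for illustration, and the file name nap.txt is simply the one used in the command above. The {{-start-}} and {{-stop-}} markers and the page name between ''' and ''' follow the format described in this post.

```python
START = "{{-start-}}"
STOP = "{{-stop-}}"

# Hypothetical records; in practice they would come from a database
# or a spreadsheet export.
RECORDS = [
    {"name": "Rome", "description": "the capital of Italy"},
    {"name": "Naples", "description": "a city in Italy"},
]


def make_entry(name, description):
    """Return one page block; the text between ''' and ''' is the page name."""
    return f"{START}\n'''{name}''' is {description}.\n{STOP}\n"


def build_upload_text(records):
    """Concatenate the page blocks for all records."""
    return "".join(make_entry(r["name"], r["description"]) for r in records)


def write_upload_file(path, records):
    """Write the upload file in UTF-8, as the bot expects."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_upload_text(records))


if __name__ == "__main__":
    print(build_upload_text(RECORDS))
```

Calling write_upload_file("nap.txt", RECORDS) would then give a file that can be passed to the bot with the command shown above.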

I hope this helps those who want to know how to do things. If you have further questions: well, just ask :-) I'll answer asap.

Tuesday, August 22, 2006

Lost in translation ????? (Episode 1)

This is called lost in translation, because sometimes we have to translate really funny (???) stuff and I would in some way like to collect these examples of source texts:

When MMS cannot be sent out due to setting of GPRS not complete, system will pop up a window to notify User that GPRS are not complete.
The E-mail may contain virus or other elements that will be harmful to your PC or cellular phone, if you do not certain the sender’s identity, please do not open the accessories.
If have input the Name, press OK key.
It is mainly used to provide digital mobile phone and other wireless terminal devices with wireless communication and information service for.

Saturday, August 19, 2006

What is a TMX file?

This is a question I received quite frequently during the last days and therefore I believe it makes sense to write a blog about it.

TMX stands for Translation Memory eXchange. It is a standard format used by many CAT-Tools (CAT = Computer Assisted Translation). CAT-Tools are mainly used by translators, but lately, while I was talking with Connel and other members in the Wiktionary chat, he suggested them for language study - and yes, it makes sense to use them there as well. TMX files would then be even more relevant. Students translate texts of different levels and, after having the translations corrected by the teacher or professor, exchange them with others. When searching for a word in a specific sentence they can do a concordance search in the Translation Memory and see how that specific word was used in other sentences.

As for translators, Translation Memories are helpful in two ways: one, for concordance search, and two, for repetitive texts and updates of manuals they already translated before. Imagine you translate the manual of a TV set; then, one year later, a new model of that TV set is produced and you get the follow-up translation. By using your translation memory of the year before you will find many sentences that are already there - maybe they need to be adapted a bit to make reading more fluent, sometimes you do not even need to do that (well, you have to check, of course). This helps to assure quality.
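For those who want to see what is inside such a file: a TMX file is XML in which every translation unit (tu) holds one variant (tuv) per language, each with its segment (seg). The following is a minimal sketch in Python that builds a tiny TMX structure and runs a simple concordance search on it. The header values (tool name, version and so on) and the example sentences are made up for illustration, and the search is a plain substring match, not what a real CAT-Tool does.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serializes as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="it"):
    """Build a minimal TMX tree from (source, target) sentence pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


def concordance(tmx, word, src_lang="en"):
    """Return (source, target) pairs whose source segment contains the word."""
    hits = []
    for tu in tmx.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        source = segs.get(src_lang, "")
        if word.lower() in source.lower():
            others = [text for lang, text in segs.items() if lang != src_lang]
            hits.append((source, others[0] if others else ""))
    return hits


if __name__ == "__main__":
    memory = build_tmx([
        ("Switch on the TV set.", "Accendere il televisore."),
        ("Press the OK key.", "Premere il tasto OK."),
    ])
    print(ET.tostring(memory, encoding="unicode"))
    print(concordance(memory, "key"))
```

The concordance function shows the point made above: you ask for a word and get back the whole sentences in which it was already translated, together with their translations.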

Tags: TMX, CAT, Computer Assisted Translation, translation, translator, Translation Memory, Translation Memory eXchange

Thursday, August 17, 2006

Articles ...

Quite a bunch of news this time, copied from words & more - there you can also access the links to read the complete articles.

Thanks to this system, the doctors of this hospital can communicate in real time with patients who speak English, Arabic, Chelja, German and French. - The simultaneous telephone translation service, launched by the Carlos Haya hospital complex in Málaga at the end of 2004, has already handled 585 translations in English, Arabic, Chelja (a language spoken mainly in North Africa), German and French.
Madrid's Policía Municipal 'speaks' languages - Since this summer the local officers have had a pioneering translation system to assist foreign tourists in their own language.
Church: when Matteo Ricci translated Confucius into Latin - The story of the Jesuit who wanted to globalize cultures: in 1594 he completed what was, for that time, a true cultural feat
AAA Translation Selects Shafer Communications as its Public Relations Agency of Record - As AAA Translation expands to meet growing needs of the global marketplace, Shafer Communications will create comprehensive public relations program to support client's growth.
Road sign leaves Welsh-speakers bewildered - Welsh-speaking cyclists have been left baffled - and possibly concerned for their health - after a bizarre translation mix-up.
Dollar Rent A Car Launches License Translation Service in Japan - Dollar Rent A Car, a subsidiary of Dollar Thrifty Automotive Group, Inc. (NYSE: DTG) today announced a new driver's license translation service for its Japanese customers traveling to the United States.
Language Weaver to Demonstrate Integration of Automated Translation into Homeland Security Support Applications at Intelink Conference - Language Weaver, a leading software company developing enterprise software for the automated translation of human languages, today announced it will demonstrate multiple applications where automated language translation has been integrated with communications programs that help the homeland security and U.S. defense efforts.
Ukrainian PM confirms his stance on Russian language issue - Ukraine's PM Viktor Yanukovich told journalists in Sochi today, that the Russian language would be granted status of the second national language in Ukraine as soon as the coalition secures a majority in the Supreme Rada.
LanguageScape.com Helps Companies and Individuals Bridge Language Barriers and Expand Their Global Reach - BOSTON, Aug. 16 /PRNewswire/ -- EditAvenue Incorporated today announced the launch of http://www.LanguageScape.com, an online marketplace for translation services, to help both companies and individuals translate documents into any language.
Verbalplanet.com Launches the World’s First Global Online Language Tuition Marketplace - United Kingdom (PRWEB) August 16, 2006 -- Verbalplanet.com is a global marketplace for online language tuition services, enabling language tutors to sell their services online and interact with language learners across the globe.
Sorenson opens new sign-language interpreting centers across U.S. - Sorenson Communications has opened 17 new video relay service interpreting centers for deaf and hard-of-hearing individuals throughout the United States, the company said Tuesday.
Monterey's Language Line shrinks Q2 loss to $2.9M - Language Line Holdings Inc. on Monday reported a second quarter loss of $2.9 million, about 12 percent lower than its loss of $3.3 million in the year-ago period.
Language no issue in Chinese venues - Much has changed in China since President Richard M. Nixon's historic visit in 1972. As you walk down a street and see McDonald's, KFC, Starbucks and Victoria's Secret, you may think you are in New York, Los Angeles or Chicago rather than Beijing, Shanghai or Guangzhou.
Doctors Look To VoIP To Bridge Language Barriers - A creative use of voice and video over IP is helping three California hospitals overcome increasingly common language barriers between doctors and patients.
Views sought on boosting Gaelic - Scotland's first ever National Plan for Gaelic has gone out for public consultation.
Wiradjuri Language resource launch - Parkes Shire library has acquired a number of books and CDs which form a Wiradjuri Language resource. The collection consists of a Wiradjuri Dictionary and kits on learning Wiradjuri and Wiradjuri language songs for children of all ages.
More efficiency through clear and friendly language - The "Handbuch Bürgerkommunikation" (handbook of citizen communication) owes its existence to the project "Verständliche Verwaltung" (understandable administration), which was taken on by the city administration of Arnsberg.
Duden editors give in to Google - Language versus trademark protection: the Duden editorial team has changed the definition of the verb "googeln".
Speech recognition: ever more precise, ever more efficient - (pd) Computer programs that convert speech into text keep getting better. Above all hospitals, doctors and lawyers use software solutions for digital speech recognition.
Foreign language skills widen your circle of friends - Mastering foreign languages is helpful - mixed-breed dog Alex explains why...
A study shows the linguistic trends for the coming year - The slogans of German advertising are becoming shorter, simpler, more German-language and more imperative. Media observers from Slogans.de and Trendbüro Hamburg compare the features word choice, part of speech, number of words, word frequency, word usage, sentence structure, sentence type, punctuation and language used.
Sixfold search for Firefox - The "Feuerfuchs" (fire fox) is increasingly regarded as a cult browser. The popularity of the free program also shows in the high number of plugins that are by now available on the web. The Definero toolbar provides six useful search routines at once.
When translation mistakes bladders for cyclists - Because of a translation error, cyclists approaching a very busy roundabout in Wales are warned by a sign of a "bladder irritation" instead of a notice advising them to get off their bikes.
Translators Selling on eBay - Lately I’ve seen a number of translators selling their services on eBay (Germany). Personally I think that eBay is not the best place to sell ...

Tuesday, August 15, 2006

What to do with all those links ....?

That was the question when I received a link yesterday ... I frequently get interesting links and would like to put them somewhere - these are not always news - it can be a funny website, an e-book, simply an interesting website, a word on WiktionaryZ, an article on Wikipedia - there are so many possibilities ... you might have the same problem ... well: for that purpose I opened the section Linksoup on words & more. There I simply paste the links with two or three words that should give an idea about the link. You can do the same btw - but please log in - in that way it is easier to understand who does what and if an IP is a spam IP or not. In a second stage, when the software on words & more is upgraded, I will install Semantic MediaWiki and tag them. From that moment on the "soup" will be easier to search. At a certain stage, when the links become too many, a different scheme will be needed ... but for now ... I feel it is a good solution to make sure links do not get lost.

Monday, August 14, 2006

Articles ...

Links to articles added on wordsandmore.org

Everything "in Butter" (just fine) or is the "blaue Wunder" (nasty surprise) coming after all? - "No, there you are on a "Holzweg" (the wrong track), Lisa!" ... The Holzweg (logging path) from which the idiom derives is found in the Middle Ages.
Was war. Was wird. (What was. What will be.) - The nice thing about journalism is that there is always something new to learn, always new words to be spelled out. Just take the "Degetoisierung" of the German TV audience, which follows on the heels of the "Musikantenstadlerisierung".
Welcome to linguafranca.com - According to Kaled Fattal: “People say the Net works, but it only works for those communities whose native language is Latin-based.

MediaWiki software and "small" wikiprojects

This is a post I am sending right now to the wikitech-l ... I am posting it here as well, because many people will not read it there - only a few are subscribed. There is a discussion about wysiwyg and wiki software - now the discussion went into the direction of specific needs for specific languages and/or keyboards. Since I myself face some of these issues daily it only makes sense to talk about it.

Well: let's make a practical point: people on the nap.wikipedia are driven away because they have to use workarounds for '' - that is ' in whatever combination - we now use the numeric code for ' to have a unique way to identify words and word combinations if some day we should need to use replace.py. Many now create non-standard articles using the accent of à or á, that is ` or ´, to create articles ... inserting spaces where they are not needed etc. (all sorts of strange solutions to avoid seeing everything in italics afterwards) before the &# thingie we needed to use to get things right. It is an annoyance to have to use workarounds continuously - now I am quite "wikiphile" I'd say, being able to install my own wikis + extensions and create sometimes quite complicated templates ... imagine how someone feels who has no clue at all about wikis - someone who would like to start a first article and then, clicking on save, gets weird stuff and therefore does not come back to edit again, because editing is "too difficult". I also have plenty of colleagues who maybe would like to help, but simply don't want to lose the time to learn wiki, because donating a translation is already a lot considering how much they would earn if they did translation jobs instead.

Next thingie - the | sign - you don't have it on the German keyboard - and it gets even worse if you have a laptop keyboard like I have - there is no way to reproduce it easily with alt+ like I could do it on a normal keyboard.

How do I work on the nap.wikipedia? Well, there are two ways: I first write in OOo or Word, then I substitute all ' with the numeric code using search and replace, and I avoid creating wiki-links, because I am simply very annoyed (better I remain with nice words...) at having to copy and paste it here and there. Or I write the article on the wiki and then substitute the parts needed - both ways require loads more time.

At least the {{ [[ are on the keyboard - so normal wiki-links are not much of an issue - for Wiktionary that works fine - but not for wikipedia where you have declined forms of a word.
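The search and replace step can of course be scripted as well. A minimal sketch in Python, under the assumption that the numeric code meant here is &#39; (the decimal character reference for the apostrophe) and that the text does not yet contain any wiki markup such as '' for italics, since that markup would be replaced too. The sample string is made up.

```python
APOSTROPHE_CODE = "&#39;"  # assumption: decimal character reference for '


def protect_apostrophes(text):
    """Replace every apostrophe in plain text with its numeric code.

    Meant for text written outside the wiki (no '' or ''' markup yet),
    so that apostrophes next to each other are not read as italics.
    """
    return text.replace("'", APOSTROPHE_CODE)


if __name__ == "__main__":
    # Hypothetical sample with two apostrophes next to each other.
    print(protect_apostrophes("c''o"))
```

Wiki markup for bold or italics would then have to be added after this step, not before.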

I find wikis great - but they are not suitable for the biggest part of potential contributors - maybe for 5% of them. See: I already said this last time I wrote about these issues here ... if MediaWiki was not developed by English speaking people, but maybe by people speaking some kind of "strange" language - if it was not developed for the English speaking market only - it would be different, more conscious about such issues, and we would have more contributors in the regional editions. No, please don't object - when people do things, most of them only think about the English wikipedia - the other projects might exist, but are considered to be some kind of fun-stuff ... instead many of the small wikipedias are very serious projects - much more serious than any of you can imagine, and they face many more problems than you could ever imagine. A good software approach would help many of us to grow a better community and to create more articles instead of having to re-read and adapt every article to our standards (' is some kind of standard for nap now). And people who already have difficulties in writing local languages - yes, we have a literacy rate of approx. 2 to 4% - would not have to concentrate on multiple issues at a time; they could concentrate on the text they are writing ... that would be great ... that would be really a step ahead ... that would mean "think about the users" not "for the majority it's fine ... so who cares ..."

I am sorry that I have to write this ... it should not be necessary.

I'll post this mail also on my blog in order to have it accessible to more people.

Thank you for taking the time to read ... well I hope one day I will be able to say "thank you for caring about the small communities".

Best, Sabine

Saturday, August 12, 2006

OmegaT, WiktionaryZ, Betawiki ... some questions that need an answer ...

In the Wiktionary IRC the following questions were asked by Connel: "... considers omegat.org. Is the intent for it to just auto-upload stuff to WZ? to/from ZW? Or betawiki, or both betawiki and WZ? Or is betawiki just for WikiMedia total localization?"

That is a lot ... so let me go step by step.

The intent of OmegaT is not to auto-upload stuff to WiktionaryZ or download it from there. Nor is it only there for Betawiki and WiktionaryZ, even if it will probably be used for both sooner or later. OmegaT is a CAT-Tool that helps translators to do their work.

What does this mean: imagine you use for all of your translations a tool that creates a Translation Memory, a file containing the translations you did segmented into sentences, combining source and target sentence. Then you do further translations and let the CAT-Tool access these already translated files. Now if your translation is of a subject you already translated chances are high that most terminology needed is already in there and you can even see in which context it was used. So with OmegaT you do a search on your project and the available translation memories to see if and how a term was already translated. This can help a lot.

Now consider a manual - of a machine, a computer, whatever. These manuals need updates once a new version of that machine or computer is produced. Normally companies then also just update the description, and parts of it remain the same as before (simply because the functionality of these parts is still the same). When you then translate you will find these unchanged parts in your translation memory and, depending on how you set your options, OmegaT proposes the 100% match or overwrites the translation part of your project with the already existing translations. In this way you can save loads of time.

With the right parser the MediaWiki UI could also be translated in this way. Now we will always have people who translate things manually online and who will not use a CAT. This means that OmegaT should be able to access the single pages containing the messages on Betawiki; you translate them on your computer and store them to the page in the correct language version. This is feasible.
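To illustrate what a 100% match and a partial match mean, here is a minimal sketch in Python. It is not how OmegaT calculates its matches: the similarity score comes from Python's difflib and only stands in for the idea, and the threshold of 0.7 as well as the example sentences are my own assumptions.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.7):
    """Return (score, source, target) for the most similar stored sentence.

    memory maps already translated source sentences to their translations.
    A score of 1.0 is a 100% match; None is returned when nothing in the
    memory reaches the threshold.
    """
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None


if __name__ == "__main__":
    memory = {
        "Press the OK key.": "Premere il tasto OK.",
        "Switch on the TV set.": "Accendere il televisore.",
    }
    print(best_match("Press the OK key.", memory))      # unchanged sentence
    print(best_match("Press the MENU key.", memory))    # similar sentence
    print(best_match("Completely unrelated sentence about rivers", memory))
```

An unchanged sentence from last year's manual comes back with a score of 1.0 and can be reused after a check; a slightly changed one comes back as a proposal that needs to be adapted.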

Another use will be: creation of contents for small wikipedias. Once we get our wiki read/wiki write option within OmegaT it is possible to start a translation of an article, let's say from the English wikipedia, and translate it to any language, let's say the Neapolitan wikipedia. This means you tell OmegaT which page to get on en.wikipedia and which page to write on nap.wikipedia. The same is valid for any African language. The advantage of this is: if there is no online-connection people can work offline on translations.

The translation memories from these translations should be stored somewhere (WiktionaryZ is already enabled to upload translation memories) in order to allow others to access and use them, so that their own translations get faster and better. Another aspect of doing things this way is: the proofreading of a translation is easier since you see the source text above the translation for each sentence. This eases the job a lot and the quality of the translated article rises.

Now to WiktionaryZ and OmegaT: OmegaT for now has quite a simple glossary function - you create a tab separated text file and put it into your glossary directory. While you translate OmegaT shows you the translation proposals for the words that are present in that sentence and in the glossary. Now imagine what that means if you connect the glossary function to WiktionaryZ: the whole repository of data at your fingertips - of course: considering the mass of data that is online in WiktionaryZ it becomes very important to attribute domains to terminology. Often a word can be translated in 20 ways or even more into another language ... well, it does not make sense if you are doing a translation about medical equipment that you get proposals from another domain, let's say machinery - the possibilities from other domains should only be proposed (showing that other domain) when there is no entry for medical equipment.

At this stage we don't have this domain structure for terminology on WiktionaryZ and therefore the data, once we have loads of it online, cannot be used - it would just create a huge mess and would be very time consuming. So one of the things we really need asap is a domain structure that we can connect the single terms to - the sooner we have it the better .... otherwise we will have loads of double and triple work, or WiktionaryZ could become completely useless for the use within OmegaT and as such it would not be of any advantage for translators. Not even for scientists really ... imagine a biologist searching for terminology and getting whatever result ... also those of machinery or whatever other domain.
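What such a domain filter could look like can be sketched in a few lines of Python. Please take it as an illustration only: the glossary is a tab separated text as described above, but putting the domain into a third column is my own assumption, and the entries are made up. Proposals from the domain of the current translation come first; entries from other domains are only shown (together with their domain) when the wanted domain has nothing.

```python
import csv
import io

# Hypothetical tab separated glossary: source term, target term, domain.
GLOSSARY_TSV = (
    "valve\tvalvola\tmachinery\n"
    "valve\tvalvola cardiaca\tmedical equipment\n"
    "pump\tpompa\tmachinery\n"
)


def load_glossary(tsv_text):
    """Read the glossary into {source term: [(target term, domain), ...]}."""
    glossary = {}
    for source, target, domain in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        glossary.setdefault(source.lower(), []).append((target, domain))
    return glossary


def propose(sentence, glossary, domain):
    """Return proposals for the glossary terms that occur in the sentence.

    Entries of the wanted domain win; entries from other domains are used
    only when the wanted domain has no entry for that term.
    """
    words = {word.strip(".,;:!?").lower() for word in sentence.split()}
    proposals = {}
    for term, entries in glossary.items():
        if term in words:
            in_domain = [entry for entry in entries if entry[1] == domain]
            proposals[term] = in_domain or entries
    return proposals


if __name__ == "__main__":
    glossary = load_glossary(GLOSSARY_TSV)
    print(propose("Check the valve.", glossary, "medical equipment"))
    print(propose("Start the pump.", glossary, "medical equipment"))
```

Without the domain column the first call would return both translations of "valve", and with a few thousand terms from many domains that is exactly the mess described above.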

Back to the use within OmegaT:

The next step is then: what if the searched term is not in WiktionaryZ ... I already noted that during my last translation - for now it is too time consuming to add terms to WiktionaryZ and also Wiktionary when you wish to do that while you are translating - but: it would make so much sense. So what is planned in the reference implementation for a translation glossary is that when working with OmegaT you get the possibility to add such a term directly from there. You simply tell OmegaT to add it to WiktionaryZ with your user ID and you can attribute all the necessary domains etc. without problems as well as tag the term as "definition needs to be added". What happens in that way is that WiktionaryZ will get quite a bunch of very specific terminology over time.

Another use is OmegaT for language lessons - Connel, from en.wiktionary, thought about it and he is right: OmegaT could be used for language learning as well ... what if we have a huge sentence repository and people start to translate texts to study that language - they do not need a paper dictionary - OmegaT would help them to see the use of a word in various sentences and they would get the terminology proposals like the translators. Once back at school or university (or maybe also online with a language teacher) they can understand their errors, update WiktionaryZ and the online sentence repository.

For exams teachers would have a mass of proposals and they could determine which glossary group shall be included in the exams ... that is to be thought about ... it was not considered up to now even if there are already thoughts on how to use WiktionaryZ for language learning.

Did I miss something? Hmmm ... not sure. Well if you have questions: just ask :-)

Friday, August 11, 2006

Piedmontese - Venetian - Ukrainian in WiktionaryZ now

Yesterday evening three more languages were added to WiktionaryZ. Now it is also possible to add terminology in Piedmontese, Venetian and Ukrainian.

I hope people who read this will pass on the message, also to the relevant beer parlours.

Have fun! :-)

Thursday, August 10, 2006

Creating Open Contents against payment

It was approximately a year ago that the first translated article on Wikipedia was paid for. The idea then was to create a translation service that works on that basis - I also put up a basic website that was never finished (because the time was not ripe). Even then there were voices against such a way of earning money, but since it was only an article about a city things calmed down.

Some days ago there was a report about mywikibiz.com around and people reacted quite irritated ... a person creating articles for companies on Wikipedia and even being paid for it? Many would say: that is impossible ... where is NPOV going ... well: this user already added some articles and they were not deleted, because they were OK. Now knowing he does it against payment, does that make the article any worse or better? No, the content remains the same. The difference is: this person made a job out of his hobby and it seems he is good at it.

So where's the real problem? That he earns money because he needs to live from something? Well ... anyone of us does that ... in many different ways ...

The real problem is that people are not used yet to this thought ... someone earning money by working on Open Content ... but don't we have software developers that are paid for developing Open Source software and we happily use it because it does not cost anything? Well do we really expect that people maybe work the whole day on free projects and live from nothing? Or should every one of us really use only the free time to do this?

The thing is: nobody will ever be able to stop such an initiative. If you really want to get your article there, you get it there. Isn't it better to know who wrote it and why it was written? Isn't it better to co-operate and make sure things go the right way? I would not be surprised if there are already many people being paid to add certain kinds of articles and we just don't know about it. Now that Greg's work is publicly known (no, I don't know him - I only wrote him an e-mail telling him about wikitranslations) people react scandalised ... they refuse to understand: creating Open Content against payment will be a job of the future ...

Now some of you are worrying about NPOV on Wikipedia - why? There are all the other editors who will, like always, check the article and edit it if necessary. Once an article is published under GFDL on Wikipedia it can be edited and changed.

Do also consider one thing: it is not said that it is positive to have an article on Wikipedia about a person or a business. All facts that are known can be added. It could well be that companies will then read about the problems they may have had some years ago and that by now everybody forgot - by having an encyclopaedic NPOV article all notes on history, whether positive or negative, can and will be mentioned.

Consider also that not all companies can be included in Wikipedia. There are guidelines to follow. A small company next door is normally not to be included in Wikipedia. Most companies do not meet the Wikipedia guidelines for the inclusion of companies. Therefore, some time ago, Yellowikis was created. There you will have space for any kind of business to be inserted - it is a GFDL directory that anyone can edit. And it is getting more and more known. Once it is on a good level, being present on Yellowikis, which is known as a business directory anybody can edit, means just as much as being present on Wikipedia - the only difference will be: companies that wrote history due to their inventions or due to their international high level presence, like Siemens, Ferrari, Nokia just to name some, will have entries in Wikipedia and Yellowikis.

This blog was written from scratch with various interruptions - it may well be that I am going to add or change some parts.

African languages - how are they connected to what I do?

Well I was just writing my first message to the AfrophoneWikis discussion group - and of course people in that group will wonder why I joined it ... well: I will tell you and them on my blog, because maybe you have or know people who have similar goals - and if so: please contact us.

Well as you can see from the various blogs I am involved in languages ... WiktionaryZ ... Wikipedia ... and other projects. I very much care about regional languages and how to make their life easier, make them known, connect people etc.

In WiktionaryZ we will have many languages where actually there is no Wikipedia and for many it will be the first repository on the Internet. In Africa there are many languages that need attention, otherwise these languages together with the culture of the poeple who speaks them would die. According to UNESCO each week one language dies.

Another thing is: our small Wikipedias - be it Venetian, Piemontese, Sicilian, Lombard, Neapolitan, Akan, Ripuarian, Asturian, Maltese, Samogitian etc. (just to name some without giving any preference) - all face very similar problems. There are not as many speakers available as for English, German, French, Italian and the other big ones, so often only a handful of people work on them. There are ways to co-operate and make contents available for all of us. This is why I am in the African Wiki group and why I want to communicate with people: to give more value to all our contributions, to create projects that help all Wikipedias, including the big ones, to have better data available and reach higher quality, and to do certain tasks only once and have the results available for all.

If we start to talk and get such things on the way then one of our goals is partly reached ... why partly? Well: there is loads of contents to be added to these projects and that will take time.

On which kind of data could wikipedias co-operate:

Geographical data
Basic data of species
Basic data of people
Basic data of events
... and much, much more ...
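To make the idea of "doing certain tasks only once" a bit more concrete, here is a minimal sketch in Python. It is not a tool we already have: the place name, the numbers and the sentence templates are all made-up placeholders, and the function names are hypothetical. It only shows how facts entered once could be merged with translated names and templates to give one stub per language.

```python
from string import Template

# Language-neutral facts, entered once and shared by every wiki.
# All values are made-up placeholders, not real statistics.
shared_facts = {"population": 1000, "area_km2": 25}

# Per-language pieces: translated names plus a stub sentence template.
# The wording of each template is illustrative only.
translations = {
    "en": {
        "name": "Exampletown",
        "country": "Italy",
        "template": Template(
            "$name is a town in $country with $population inhabitants "
            "and an area of $area_km2 km²."
        ),
    },
    "it": {
        "name": "Exampletown",
        "country": "Italia",
        "template": Template(
            "$name è un comune in $country con $population abitanti "
            "e una superficie di $area_km2 km²."
        ),
    },
    "de": {
        "name": "Exampletown",
        "country": "Italien",
        "template": Template(
            "$name ist eine Gemeinde in $country mit $population Einwohnern "
            "und einer Fläche von $area_km2 km²."
        ),
    },
}


def make_stub(lang, facts, translations):
    """Fill the template of one language with its names and the shared facts."""
    entry = translations[lang]
    return entry["template"].substitute(
        name=entry["name"], country=entry["country"], **facts
    )


def make_all_stubs(facts, translations):
    """Return a dict mapping each language code to its generated stub text."""
    return {lang: make_stub(lang, facts, translations) for lang in translations}
```

Calling `make_all_stubs(shared_facts, translations)` gives one stub sentence per language. If a number changes, it is corrected in one place only, and all languages get the update the next time the stubs are generated.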

Finding an extraordinary blog ...

Today Martin Benjamin sent a mail to the newly founded group for Wikipedias in African languages. Well, yes, Ethan Zuckerman writes mainly about African languages, but the points he makes are valid for all small Wikipedias around. I would very much like to see a co-operation start. We can get things on the way ... many small drops of water form an ocean ... let the small Wikipedias become our ocean. Well, read his blog post about "Your language or mine" and you will understand.

OmegaT 1.6 RC 10 comes complete with Java (testo anche in italiano)

The other evening Henry Pijffers created the packages for Windows and Linux that can be used "out of the box" without the need to install Java.

Just download the Windows or Linux bundles by clicking on the link.

It is a huge step forward for all those users who do not want to bother with having Java installed, or who have trouble checking which Java version they have and updating it if necessary.

If you have questions or need help, please contact me through my talk page, write to the OmegaT user group or just come into the OmegaT IRC-Channel.

And now: have fun with OmegaT :-)

*****

L'altra sera Henry Pijffers ha creato i pacchetti per Windows e Linux che possono essere utilizzati "out of the box" senza dover installare Java.

Puoi scaricare il pacchetto Windows o Linux cliccando sul link.

È un grande passo avanti per tutti quelli che non hanno voglia di occuparsi dell'installazione o dell'attualizzazione di Java o che hanno problemi di farlo.

Se avete domande, per piacere contattatemi tramite la mia pagina di discussione, scrivete al gruppo di utenti OmegaT (anche in italiano) o venite nella chat di OmegaT.

E ora: buon divertimento con OmegaT :-)

Monday, August 07, 2006

Ubuntu ... yes it works :-)

Well, yesterday was another Ubuntu day ... consider that I did not know how to install software on Linux. I have some knowledge of DOS, but that is different, even if it helps you imagine where to look and more or less what needs to be done.

Well: my problem was and still is my router - it will be replaced asap. As for the rest, things work smoothly. Anyway, before changing to Ubuntu you should try out the live CD - if that one works, you can expect the rest to work as well.

Ubuntu has a great Italian community - you can find them on IRC, on their discussion list and in their forum. They really helped me a lot, and I would like to say a particular thank-you to MartinderKiller (no, he's not German, he's Italian) and Jacopo, who patiently took me through the hurdles.

Well, and then: special thanks to Celestianpower (see the link to his blog on the right side). While we were talking he gave me a link on how to easily install Skype ... well, it was not that easy for me (due to the router problems), but since he had given me the link I decided to go ahead yesterday - and as always: when there is a problem I normally cannot stop until it is solved.

So if you ask me whether it makes sense to switch: yes, in particular because you can run Linux and Windows side by side on your computer, so you have plenty of time to learn all you need. There's no reason to worry, since at the beginning you will still mainly use Windows until you are accustomed to Ubuntu (which in the end is very similar to Windows - you only have more "direct communication" with your computer).

Friday, August 04, 2006

Collected articles in several languages (DE, EN, FR, IT) on words & more

On words & more I collect all sorts of language and translation related articles. Today I added a particularly large number of articles, and that is why I am copying that part down here. Please go to http://wordsandmore.org for the working links to the complete articles that interest you. It would simply take too much time to recreate them here in the blog as well.

There you also find the link to the archive. I hope you enjoy :-)

Pekín busca acabar con el 'Chinglish' para las Olimpiadas (Beijing seeks to end 'Chinglish' for the Olympics) - BEIJING (Reuters) - Beijing authorities hope to eradicate "Chinglish" from the Chinese capital's bilingual signs in time for the 2008 Olympic Games, state media reported on Friday.
Livedictionary traduce "in diretta" pagine Web (Livedictionary translates web pages "live") - Eloquents presents a new version of Livedictionary, a dictionary and vocabulary tool for Safari that translates and explains, live, any term found on a web page.
Lionbridge profit soars on Bowne acquisition - Lionbridge Technologies Inc., which provides translation services for companies selling software and other products overseas, said net income for the second quarter jumped year over year from $1.38 million to $3 million, helped largely by its acquisition of Bowne Global Solutions.
Association for Machine Translation in the Americas Opens Its Conference Doors to Public to Showcase the Wonders of Automated Translation - STROUDSBURG, Pa.--(BUSINESS WIRE)--Aug. 1, 2006--At its seventh biennial conference, AMTA 2006 to be held at the Marriott in Cambridge, Massachusetts, The Association for Machine Translation in the Americas will open its doors to the public for a free showcase of applications on Thursday, Aug. 10 from noon to 4:00 pm.
Koreans go to other Asian countries for language training - JUST when the Philippines has been recognized as one of the leading English training centers in Asia, Koreans nowadays are eyeing other Asian countries to train them on other international languages beside English.
Nigeria: Don Decries Apathy to Local Language Studies - Prof of Yoruba Language, Olanrewaju Folorunso, has expressed concern over the apathy of students to the study of the country's indigenous languages.
Language Weaver Expands Its Reach in Educational Market with Sale of Automated Translation Software to Educational Testing Service - Non-profit educational advancement company uses translation to simplify instruction for English language learners.
Three nations to promote Malay language - Jakarta: Malaysia, Indonesia and Brunei will intensify collaboration in promoting the Malay language internationally.
IM language is not spoiling English, say Canadian researchers - TORONTO: Are you one of those conformists who believe the IM culture is spoiling Queen's English?
Watch your language - TODAY is the start of National Language Month, whose theme is "national languages." As everybody knows, there are eight other Philippine languages (they used to be called "dialects") and our Constitution, original language English, mandates two official languages, English and Filipino (which used to be called Tagalog). Is it any wonder that we do not seem to understand one another?
Instant Messaging verdirbt die Sprache nicht (Instant messaging does not spoil language) - The abbreviations that young people use in messages on mobile phones or PCs have little influence on their language.
Von seriös bis locker: der Schreibstil in E-Mails (From formal to casual: writing style in e-mails) - In e-mails, too, a writing style appropriate to the recipient should be maintained. "E-mails are still a written form of communication and not a spoken one like chat, for example," says linguist Annette Trabold of the Institut für Deutsche Sprache in Mannheim.
In welcher Sprache? (In which language?) - In some Zurich kindergartens, Standard German is becoming the standard language. That applies even during breaks.
Man spricht kein Deutsch in Brüssel (No German spoken in Brussels) - The German language has a weak position in the capital of Europe, including at the Swiss mission to the EU. This is supposed to change under the German EU presidency in the first half of 2007.
Wikimania 2006 hits Cambridge - In just five years the humble wiki, a Web page that can be added to, excised from, and otherwise edited by pretty much anyone with an Internet connection, has fundamentally changed the way humans learn and communicate.
Zwiebelfisch: Als ich noch der Klasse Sprecher war (Zwiebelfisch: When I was still the class's representative) - Why is the sting of a bee not called a "Bienestich"? The German language always keeps a few letters at hand to fill the joints between words. Some, however, do without these linking elements ("Fugenzeichen") and prefer "Fuge-Zeichen".
Sieg des Deppenapostrophs (Victory of the "idiot's apostrophe") - Somehow everything used to be better: many people wrote "Ulli's Imbiss", and some knew that it really ought to be "Ullis Imbiss". But to the annoyance of language purists, the new Duden allows both forms.
Sprachen lernen soll Spaß machen (Learning languages should be fun) - New software on the market with new learning methods - the example of Czech - The good old vocabulary notebook and the mindless cramming of foreign grammar are things of the past.
TU Chemnitz verbessert sprechendes Online-Wörterbuch (TU Chemnitz improves talking online dictionary) - Chemnitz (dpa) - The Technische Universität Chemnitz has upgraded its free German-English online dictionary.
woerterbuch.info mit 950.000 Übersetzungen und Synonymen (woerterbuch.info with 950,000 translations and synonyms) - Hamburg (pts/01.08.2006/10:00) - The free online dictionary http://www.woerterbuch.info has passed the mark of 950,000 German-English translations and synonyms.
Wie viel Englisch verträgt eine Pressemitteilung? (How much English can a press release take?) - That is what we wanted to know from German journalists who cover IT and technology topics.
Ein Schatzhaus aus Wörtern (A treasure house of words) - Philologists have been working on a Latin lexicon for 112 years - Words have a life too. And their biographer is called Hugo Beikircher. On his desk in the Residenz stand cardboard boxes, plain and grey.
Tragbares Sprachgenie (Portable language genius) - With a footprint of about 15 x 8 centimetres and a weight of 312 grams, the "Partner EGm800" language computer from Ectaco fits into any travel bag.
Systran: en hausse malgré la chute des bénéfices. (Systran: up despite the fall in profits) - (Cercle Finance) - Systran, a group specialising in machine translation software, saw its profitability deteriorate sharply in the first half of 2006 because of the heavy investments made ahead of the release of the upcoming version 6 of its software.
Opposition à la traduction "au plus bas prix" (Opposition to "lowest price" translation) - Rigid application of the lowest-price procurement policy will help to "commoditise" translation within the federal government. That, at least, is what the Conseil des traducteurs, terminologues et interprètes du Canada (CTTIC) fears; it is worried about the future of freelancers and small translation firms.
