Posts

Showing posts from August, 2006

Creating contents for many Wikipedias

The basis for this is a project about mass content creation on Meta and Wikidata. Mass content creation is an idea of user Millosh and yes, he is right about it - it is how I already did certain things for the Neapolitan Wikipedia.

There is so much easy-to-create content out there that Wikipedias could readily share. Even if Wikidata is not implemented in the Wikipedias, we can use the data in databases to create stubs with mail merge (in OpenOffice.org or Word) and upload them with the bot (see my other post of today).
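The mail merge step does not have to happen in OpenOffice.org or Word. Here is a minimal sketch of the same idea in Python, assuming the data is available as a small CSV table. The column names (`name`, `kind`, `country`), the template sentence and the sample rows are invented for illustration and are not taken from any real database.

```python
import csv
import io

# Hypothetical stub template; the bold title uses wiki markup (''' ... ''').
TEMPLATE = "'''{name}''' is a {kind} in {country}."

def make_stubs(table_text, template=TEMPLATE):
    """Fill the template once per table row, like a mail merge."""
    reader = csv.DictReader(io.StringIO(table_text))
    return [template.format(**row) for row in reader]

# Illustrative table; a real one would come from a database export.
SAMPLE = "name,kind,country\nRome,city,Italy\nTiber,river,Italy\n"

stubs = make_stubs(SAMPLE)
for stub in stubs:
    print(stub)
```

A translated template with the same placeholders would give the same stubs in another language, which is the point of sharing the data.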

This means: if we now start to add all names of:
continents
countries
cities
rivers
mountains
monuments
places
yes, even streets, because some of them have translations
lakes
seas
animals
plants
names of people (these are translated as well)
etc.

And then we start to translate them. At the same time people take care of adding statistical data to a table that is exactly about this (if we cannot do this with a separate Wikidata installation ... anyway we do not need relational…

Adding contents to wikipedia using a bot

Well, this question comes up over and over again and I would like to describe here how to do this - and this is valid for Wikipedia and Wiktionary.

I have done this quite often on the Italian Wiktionary and on the Neapolitan Wikipedia (and some other projects).

For the upload I use the pywikipediabot - and in particular pagefromfile.py. This bot was mainly created to upload pages to Wiktionary, but then it turned out to be a great tool for wikipedia as well.

You need a .txt file saved in UTF-8 encoding. The bot takes the first word on the page between ''' and ''' as the page name and will of course create that page. If the page already exists it will be skipped.

Now the question I got is what a typical entry would look like. Here is an example:

{{-start-}}
'''Rome''' is the capital of Italy.
{{-stop-}}

This means the bot would create the page Rome and add the contents "Rome is the capital of Italy." to the page.
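If you have many entries, you can let a small script write the upload file. The following is only a sketch of the file format described above, not the code of pagefromfile.py itself: the function names are my own, and the title detection just mimics the "first text between ''' and '''" rule.

```python
import re

START, STOP = "{{-start-}}", "{{-stop-}}"

def build_upload_text(pages):
    """Wrap each wikitext entry in the start/stop markers."""
    return "\n".join(f"{START}\n{text}\n{STOP}" for text in pages) + "\n"

def split_entries(upload_text):
    """Return the wikitext found between each start/stop pair."""
    pattern = re.escape(START) + r"\n(.*?)\n" + re.escape(STOP)
    return re.findall(pattern, upload_text, flags=re.DOTALL)

def page_title(entry_text):
    """Return the first text between ''' and ''', or None if there is none."""
    match = re.search(r"'''(.+?)'''", entry_text)
    return match.group(1) if match else None

def save(path, upload_text):
    """Write the file in UTF-8, as the bot expects."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(upload_text)

text = build_upload_text(["'''Rome''' is the capital of Italy."])
titles = [page_title(entry) for entry in split_entries(text)]
print(titles)
```

Checking the titles before the upload is a cheap way to catch entries where the bold page name is missing.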

If the first word between ''…

Lost in translation ????? (Episode 1)

This is called lost in translation, because sometimes we have to translate really funny (???) stuff and I would in some way like to collect these examples of source texts:
When MMS cannot be sent out due to setting of GPRS not complete, system will pop up a window to notify User that GPRS are not complete.
The E-mail may contain virus or other elements that will be harmful to your PC or cellular phone, if you do not certain the sender’s identity, please do not open the accessories.
If have input the Name, press OK key.
It is mainly used to provide digital mobile phone and other wireless terminal devices with wireless communication and information service for.

What is a TMX file?

I have received this question quite frequently over the last few days, so I believe it makes sense to write a blog post about it.

TMX stands for Translation Memory eXchange. It is a standard format used by many CAT-Tools (CAT = Computer Assisted Translation). CAT-Tools are mainly used by translators, but recently, while I was talking with Connel and other members in the Wiktionary chat, he suggested them for language study - and yes, it makes sense to use them there as well. TMX files would then be even more relevant. Students translate texts of different levels and, after having the translations corrected by the teacher or professor, they exchange them with others. When searching for a word in a specific sentence they can do a concordance search in the Translation Memory and see how that specific word was used in other sentences.
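To make this more concrete, here is a minimal sketch of a concordance search over a TMX file in Python. The sample TMX is hand-written for this post (the header values are placeholders), and the function names are my own; a CAT-Tool does much more than this, for example fuzzy matching.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample: each <tu> (translation unit) holds one <tuv> per language.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Rome is the capital of Italy.</seg></tuv>
      <tuv xml:lang="it"><seg>Roma è la capitale d'Italia.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>The river flows through the capital.</seg></tuv>
      <tuv xml:lang="it"><seg>Il fiume attraversa la capitale.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

def load_units(tmx_text):
    """Return one dict per translation unit, mapping language code to segment."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units

def concordance(units, word, lang):
    """Return every unit whose segment in `lang` contains `word` (case-insensitive)."""
    needle = word.lower()
    return [unit for unit in units if needle in unit.get(lang, "").lower()]

units = load_units(SAMPLE_TMX)
hits = concordance(units, "capital", "en")
for hit in hits:
    print(hit["en"], "|", hit["it"])
```

A student searching for "capital" would see both sentences together with their translations, which is exactly the use described above.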

For translators, Translation Memories are helpful in two ways: first for concordance searches, and second for repetitive texts and updates of manuals they already trans…

Articles ...

Quite a bunch of news this time, copied from words & more - there you can also access the links to read the complete articles.
Los médicos de este centro hospitalario pueden comunicarse, gracias a este sistema, en tiempo real con pacientes que hablen inglés, árabe, chelja, alemán y francés. - The simultaneous telephone translation service, launched by the Carlos Haya hospital complex in Málaga at the end of 2004, has already handled 585 translations in English, Arabic, Chelja (a language spoken mainly in North Africa), German and French.
La Policía Municipal de Madrid 'habla' idiomas - Since this summer the local officers have had a pioneering translation system for assisting foreign tourists in their own language.
Chiesa, quando Matteo Ricci tradusse Confucio in latino - The story of the Jesuit who wanted to globalize cultures: in 1594 he completed what was, for the time, a true cultural feat
AAA Translation Sele…

What to do with all those links ....?

That was the question when I received a link yesterday ... I frequently get interesting links and would like to put them somewhere - these are not always news - it can be a funny website, an e-book, simply an interesting website, a word on WiktionaryZ, an article on Wikipedia - there are so many possibilities ... you might have the same problem ... well: for that purpose I opened the section Linksoup on words & more. There I simply paste the links with two or three words that should give an idea of what the link is about. You can do the same, by the way - but please log in - that way it is easier to understand who does what and whether an IP is a spam IP or not. In a second stage, when the software on words & more is upgraded, I will install Semantic MediaWiki and tag them. From that moment on the "soup" will be easier to search. At a certain stage, when there are too many links, a different scheme will be needed ... but for now ... I feel it is a good solution to make sure links do not ge…

Articles ...

Links to articles added on wordsandmore.org

Alles in Butter oder kommt doch das blaue Wunder? - "No, you are on the wrong track (auf einem Holzweg) there, Lisa!" ... The Holzweg from which the expression derives is found in the Middle Ages.
Was war. Was wird. - The nice thing about journalism is that there is always something new to learn, that new words always have to be spelled out. Take just the "Degetoisierung" of the German television audience, which follows hot on the heels of the Musikantenstadlerisierung.
Welcome to linguafranca.com - According to Kaled Fattal: “People say the Net works, but it only works for those communities whose native language is Latin-based.

MediaWiki software and "small" wikiprojects

This is a post I am sending right now to wikitech-l ... I am posting it here as well, because many people will not read it there - only a few are subscribed. There is a discussion about WYSIWYG and wiki software - the discussion has now moved toward specific needs for specific languages and/or keyboards. Since I face some of these issues daily myself, it only makes sense to talk about it.

Well: let's make a practical point: people on the nap.wikipedia are driven away because they have to use workarounds for '' - that is ' in whatever combination - we now use '' for '' to have a unique way to identify words and word combinations if some day we should need to use replace.py. Many now create non-standard articles using the accent of à or á, that is ` or ´, to create articles ... inserting spaces where they are not needed etc. (all sorts of strange solutions to avoid seeing everything in italics afterwards) before the &# thingie we needed to use …

OmegaT, WiktionaryZ, Betawiki ... some questions that need an answer ...

In the Wiktionary IRC channel Connel asked the following questions: "... considers omegat.org. Is the intent for it to just auto-upload stuff to WZ? to/from ZW? Or betawiki, or both betawiki and WZ? Or is betawiki just for WikiMedia total localization?"

That is a lot ... so let me go step by step.

The intent of OmegaT is not to auto-upload stuff to WiktionaryZ or download it from there. Nor is it only there for Betawiki and WiktionaryZ, even if it will probably be used for both sooner or later. OmegaT is a CAT-Tool that helps translators to do their work.

What does this mean? Imagine that for all of your translations you use a tool that creates a Translation Memory: a file containing the translations you did, segmented into sentences, pairing each source sentence with its target sentence. Then you do further translations and let the CAT-Tool access these already translated files. Now if your translation is on a subject you have already translated, chances are high that most terminology needed is already in …

Piedmontese - Venetian - Ukrainian in WiktionaryZ now

Yesterday evening three more languages were added to WiktionaryZ. Now it is also possible to add terminology in Piedmontese, Venetian and Ukrainian.

I hope people who read this will pass on the message, also to the relevant beer parlours.

Have fun! :-)

Creating Open Contents against payment

Approximately a year ago the first translated article that was paid for appeared on Wikipedia. The idea then was to create a translation service that works on that basis - I also put up a basic website that was never finished (because the time was not ripe). Even then there were voices against such a way of earning money, but since it was only an article about a city things calmed down.

Some days ago a report about mywikibiz.com was going around and people reacted with quite some irritation ... a person creating articles for companies on Wikipedia and even being paid for it? Many would say: that is impossible ... where is NPOV going ... well: this user already added some articles and they were not deleted, because they were OK. Now that we know he does it for payment, does that make the articles any worse or better? No, the content remains the same. The difference is: this person made a job out of his hobby, and it seems he is good at it.

So where's the real problem? That he earns money …

African languages - how are they connected to what I do?

Well I was just writing my first message to the AphrophoneWikis discussion group - and of course people in that group will wonder why I joined it ... well: I will tell you and them on my blog, because maybe you have or know people who have similar goals - and if so: please contact us.

Well as you can see from the various blogs I am involved in languages ... WiktionaryZ ... Wikipedia ... and other projects. I very much care about regional languages and how to make their life easier, make them known, connect people etc.

In WiktionaryZ we will have many languages for which there is actually no Wikipedia, and for many it will be the first repository on the Internet. In Africa there are many languages that need attention, otherwise these languages, together with the culture of the people who speak them, will die. According to UNESCO each week one language dies.

Another thing is: our small Wikipedias - be it Venetian, Piedmontese, Sicilian, Lombard, Neapolitan, Akan, Ripuarian, Asturian, Maltes…

Finding an extraordinary blog ...

Today Martin Benjamin sent a mail to the newly founded group for Wikipedias in African languages. Well, yes, Ethan Zuckerman is mainly about African languages, but the points he makes are valid for all small Wikipedias. I would very much like to see a co-operation start. We can get things under way ... many small drops of water form an ocean ... let the small Wikipedias become our ocean. Well, read his blog post about Your language or mine and you will understand.

OmegaT 1.6 RC 10 comes complete with Java (testo anche in italiano)

The other evening Henry Pijffers created the packages for Windows and Linux that can be used "out of the box" without the need to install Java.

Just download the Windows or Linux bundles by clicking on the link.

It is a huge step forward for all those users who do not want to deal with having Java installed, or who have trouble checking which Java version they have and updating it if necessary.

If you have questions or need help, please contact me through my talk page, write to the OmegaT user group or just come into the OmegaT IRC-Channel.

And now: have fun with OmegaT :-)

*****

L'altra sera Henry Pijffers ha creato i pacchetti per Windows e Linux che possono essere utilizzati "out of the box" senza dover installare Java.

Puoi scaricare il pacchetto Windows o Linux cliccando sul link.

È un grande passo avanti per tutti quelli che non hanno voglia di occuparsi dell'installazione o dell'attualizzazione di Java o che hanno problemi di farlo.

Se avete domande,…

Ubuntu ... yes it works :-)

Well, yesterday was another Ubuntu day ... consider that I did not know how to install software on Linux - well I have some knowledge of DOS, but that is different even if you can imagine where to look and what more or less needs to be done.

Well: my problem was and still is my router - it will be replaced asap. As for the rest, things work smoothly - anyway, before changing to Ubuntu you should try out the live CD - if that one works you can expect the rest to work as well.

Ubuntu has a great Italian community - you can find them on IRC, on their discussion list and in their forum. They really helped me a lot and I would like to give particular thanks to MartinderKiller (no, he's not German, he's Italian) and Jacopo, who patiently took me over the hurdles.

Well and then: special thanks to Celestianpower (see the link to his blog on the right side) - talking with him he gave me a link on how to easily install Skype ... well it was not that easy for me (due to the router problems), but s…

Collected articles in several languages (DE, EN, FR, IT) on words & more

On words & more I collect all sorts of language and translation related articles. Today I added a particularly large number of articles and that is why I am copying that part down here. Please go to http://wordsandmore.org for the working links to read the complete articles that interest you. It would simply take too much time to recreate them in the blog as well.

There you also find the link to the archive. I hope you enjoy :-)

Pekín busca acabar con el 'Chinglish' para las Olimpiadas - BEIJING (Reuters) - Beijing authorities hope to eradicate "Chinglish" from the Chinese capital's bilingual signs by the 2008 Olympic Games, state media reported on Friday.
Livedictionary traduce "in diretta" pagine Web - Eloquents presents a new version of Livedictionary, a dictionary and vocabulary for Safari that translates and explains, live, every term on a web page.
Lionbridge profit soars on Bowne acquisition - Lionbridg…