Showing posts with label Wikipedia. Show all posts

Thursday, May 04, 2017

Wikis of the Wikimedia foundation - Wikipedia and Commons in particular

Well, by chance I was on one of the many Wikipedias today where I have not edited for years. Last year I got a message there claiming a copyright violation. The funny part is that in 2006 I wrote the original text myself and then brought it to Wikipedia to donate my work. Afterwards my text was picked up by various other sites, and I really did not care whether they mentioned me as a source or not, because it was fairly short and meant as information for people. Now I get a copyright infringement claim and am supposed to put the situation right. The same happened to quite a bunch of my pictures on Commons, which were readily accepted at the time I uploaded them. Now I am supposed to edit all of them (quite a number) so that they are not deleted. By all means: delete them, I am not going to take the time to provide additional proof. But this makes one thing very clear: time invested in open projects might not be time invested well ... Actually, right now I would not even have the time. If my stuff was good back then, it should still be good now.

Sunday, March 16, 2008

Ladin Wikipedia on the incubator

A week ago I received the following message (originally in German, translated here):

Hello Sabine,

since you work with "less resourced languages", I wanted to ask whether you might know a few people from South Tyrol, Trentino, the province of Belluno or elsewhere who would like to write articles in Ladin. The Wikipedia has been sitting in the Incubator for some time, but nobody wants to hatch this egg.

Since the Ladins are organized in many associations, I have been thinking of writing to some of them soon, for example the Ladin fire brigades and the Grupa per la defendura di uciei (bird protection association). If you could also do a bit of advertising, I would be grateful :)

Best regards
Andi

Andi is asking me whether I can help recruit people to work on the Ladin Wikipedia, which is still in the Incubator. Well, on the one hand I am part of the LangCom and I am not sure whether I should do this ... but on the other hand I am the CCO of Vox Humanitatis, which actively promotes less resourced languages and the related projects. And Wikipedias in these languages are projects to be promoted and helped. Therefore I would like to ask anyone who is reading this to help connect me/us with people who speak and write Ladin, so that I can connect them with the right people on my side. I am also going to write to a number of associations that may be able to help. (E-Mail: s.cretella [at] voxhumanitatis.org).

Please note that we will actively support your requests for help, but please bear in mind that we only have a certain amount of time. Whatever needs to be done will get done, even if it often takes a few days before we can actually act. Our to-do list is quite long :-)

Thanks for helping to spread the word!


Friday, December 07, 2007

Translating Wikipedia articles (2)

As I said yesterday, I am coming back to this topic today.

Apertium is already used in some projects, one of which is the Occitan Wikipedia. For those who are not familiar with wikis: there you can compare the version before proofreading with the proofread version, and you can see such a comparison by clicking here.

What you see on the left hand side is the text as it was after the machine translation and on the right hand side the proofread version of the text. The changes are highlighted in green on the left and in blue on the right hand side. There are even some parts of the text that were not changed at all.
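For readers who would like to reproduce such a comparison outside of the wiki, here is a minimal sketch using Python's standard `difflib` module. It is only an illustration and not the way MediaWiki computes its diffs; the function name `count_corrections` and the two sample sentences are made up for this example.

```python
import difflib

def count_corrections(machine: str, proofread: str) -> int:
    """Count the changed spans between two texts, compared word by word."""
    a = machine.split()
    b = proofread.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    # Every opcode that is not "equal" is one replaced, inserted or deleted span.
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")

# Hypothetical sample texts, not taken from the Occitan Wikipedia.
machine_output = "the cat sit on the mat"
proofread_text = "the cat sits on the mat"
corrections = count_corrections(machine_output, proofread_text)
print(corrections)
```

A count like this is only a rough indicator: one changed span may hide a trivial typo or a completely rewritten sentence.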

The work on the glossary and the grammar rules (well I am not using the specific terminology here to make things understandable for all) has been going on for approximately one year now.

At a certain stage the problems arise from missing vocabulary and not so much from the rules. Of course these translations will probably never be 100% perfect, but the quality depends very much on us and on our adding terminology and classifying it.

Compared with the above result, what you would see for Spanish-Catalan is much better, since that pair has been under development for years.

You can find further reading about co-operation between Wikipedia and Apertium on the Apertium Wiki.

Language pairs that are right now available are:

  • Spanish←→Catalan
  • Spanish←→Galician
  • Spanish←→Portuguese (pt and pt_BR)
  • Catalan←→English
  • Catalan←→French
  • Catalan←→Occitan (oc and oc@aran)
  • Romanian→Spanish

Many other language pairs are under development. Of course, you may start on any language combination that is comfortable for you. Please keep in mind: the more similar two languages are, the easier it is to program the rules and the sooner the translation engine will produce good translations.

If you want to start working on wordlists, please write to me at: s.cretella (at) voxhumanitatis.org and tell me which language pair you are interested in. You can also reach me on Skype at: sabinecretella

I will upload a wordlist to Google Docs and give you access. Please let me know if you have difficulty working online (for example if you work with a dial-in connection).

The Apertium Chat is on Freenode.

One more thing: I just received the criticism that machine translation would flatten the language. Well, any translated text, in particular when it comes to literary translation, is post-edited by a second person. A translation is never published directly, since during translation - and you can be the best translator in the world - there are always some bits and pieces that sound a little strange or that do not really carry the scene over into the other culture. And please allow me to introduce here the concept of cultural localization, which was coined by Dr. Martin Benjamin, who is on the advisory board of Vox Humanitatis, and which will be explained in a future post. The concept of cultural localization immediately became part of the scope of the association.

And since I am adding notes here: please remember that the Fundraiser of the Wikimedia Foundation is still running and that you can help by donating and telling others that the fundraiser is on. For more information and to donate please click here.

Thursday, December 06, 2007

Translating Wikipedia articles ...

... into less resourced languages. Well, the time has come to start thinking about how to create content faster for the many small Wikipedias. As you all know, we often have just a handful of people creating, translating and then adapting articles. Well ... by combining various Open Source and Open Content projects we can now go a step further in the direction of fast content creation, but that does not mean uploading stubs. This is a completely different way of doing things.

Apertium is a machine translation tool that works really well with similar languages. Approx. a year ago I had a translation from Spanish to Catalan done by Apertium through the online interface (http://xixona.dlsi.ua.es/apertium/) and asked some people from the Catalan Wikipedia to have a look at it. They told me that of course it was not perfect, but that it would be easy to proofread and much faster than actually translating it. In March I made a similar test during a master's course in translation studies in Pisa. I asked one of the students, who was bilingual in Spanish and Catalan, to have a look at the machine translation of a general text. The grammar was almost perfect and so was the terminology. There were just 5 corrections in a bit more than half a page (A4).

Now what does this mean for us? If we have a bilingual wordlist for two similar languages under a free license, we can pass it on to the Apertium people. From there we are a step closer to getting machine translation for that specific language combination on its way.

One note in between for the Apertium people who might read this: please don't mind me not using specific terminology to describe what needs to be done. It could become too techy.

So the next step is to identify what kind of word each term is and how it needs to be handled. For example, a verb needs to be declared as such and then given a tag that indicates which conjugation scheme applies. This needs doing for all word types, that is verbs, nouns, adjectives etc. After that, grammar rules need to be considered. Step by step the level of correctness will improve, and the time invested in completing the wordlists (which will be available as Google Docs spreadsheets) and adding all the additional information will help to save a lot of time later. That is: it will take longer now, but once the engine has "learnt" how to deal with the terminology and grammar for a specific language combination, creating content will become much faster. This will help the small projects because the few editors can concentrate on proofreading and adapting, and it will result in faster growth of content of quite high quality.
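To make the idea of a tagged wordlist a bit more concrete, here is a minimal sketch in Python. Please note that this is not Apertium's actual dictionary format (Apertium keeps its dictionaries in XML files); the class `Entry`, the field names, the paradigm labels and the three Spanish-Catalan sample words are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

# Word types accepted in this sketch; a real list would be longer.
ALLOWED_WORD_TYPES = {"verb", "noun", "adj"}

@dataclass(frozen=True)
class Entry:
    source: str    # word in the source language
    target: str    # word in the target language
    pos: str       # word type: "verb", "noun", "adj"
    paradigm: str  # hypothetical tag naming the inflection scheme to apply

def find_incomplete(entries):
    """Return the entries whose word type or paradigm tag is still missing."""
    return [
        e for e in entries
        if e.pos not in ALLOWED_WORD_TYPES or not e.paradigm
    ]

# Hypothetical Spanish-Catalan rows, as they might come from a spreadsheet.
wordlist = [
    Entry("cantar", "cantar", "verb", "regular-ar"),
    Entry("casa", "casa", "noun", "feminine-plural-s"),
    Entry("grande", "gran", "adj", ""),  # paradigm not yet filled in
]

incomplete = find_incomplete(wordlist)
for entry in incomplete:
    print(f"still to classify: {entry.source} -> {entry.target}")
```

A check like this could tell volunteers which rows of a wordlist still need attention before the list is passed on.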

This project, which is going to take care of less resourced languages, will be one of the first led by Vox Humanitatis. Should you be interested in helping with the wordlists, please let us know which language combination you would like to work on (for now starting from English, and step by step from other languages, since most of the terminology is there in English). We will get you access to the online document. If you need to work offline, please let us know. You can contact me by e-mail: s.cretella (at) voxhumanitatis.org

I just received a list of the supported language combinations as well as an example for Catalan-Occitan and some notes on evaluating machine translation in co-operation with a Wikipedia community. This means I have quite a lot more to tell you. I'll post that info tomorrow, otherwise this blog post would become too long.

Please also note that the documents will be released under a CC-BY license and can therefore be integrated into any Wiktionary.

Friday, August 24, 2007

Less resourced languages meet ... getting some projects on the way

Berto from the Piedmontese Wikipedia, who also runs i-iter.org, which deals with less resourced languages, stayed here in Maiori for some days, so we had plenty of time to talk and consider many strategies for protecting less resourced languages and the very specific cultures of the various regions of the world. Well, there is still much to be worked out, but one thing became clear: we are going to work together much more closely than before, and we will find a structure for making the most of the efforts of so many people who share the same goals.

So yes, the first Piedmontese-Neapolitan meet-up produced some first results.

This is just a note to let you know: something is going on ... so stay tuned for more news :-)

Saturday, August 18, 2007

A game in Piemontese with an article in Neapolitan

Approximately a month ago I wrote an article about Freecol, the game Berto localized into Piemontese, and something incredible is happening ... I mean, why should people who speak and read Neapolitan find an article about a game in Piemontese sooooo interesting? Well, they do .... to date the article has over 1550 reads (you can see that below the article) and gets approx. 50 more reads a day.

We don't have a clue how often our small Wikipedias are read, but considering these figures: there are many readers for our languages, even if for now they are not writers.

What it also tells me: it is indeed time to move on to software localization for our languages ... in particular games, browsers, stuff you use often ... I am wondering how the handful of Wikipedia editors that we are will be able to deal with that ... we desperately need more people writing ... it doesn't matter too much whether everything is written correctly, what matters is that people start.

Berto: thanks for the huge amount of work you did. It enters an incredible market niche that is unfortunately always underestimated.

People find their languages fun - and they do want to read about such stuff - and thanks to a game two cultures meet ... exchange ... will co-operate ... that is something that I find incredibly exciting.

Friday, July 20, 2007

Articles with stable version for nap.wikipedia

Well, again this theme is on my mind ... stable versions ... On the Neapolitan Wikipedia we have a very particular situation: only very few people are really able to write well. The others write "as they speak", often applying Italian spelling rules to Neapolitan, which of course does not work well. And if they are Neapolitans who grew up in the States, perhaps of the second or third generation, you may even find some very particular words coined in the States (and well yes, that is still Neapolitan, just a different dialect of it). Even though many of them are native speakers, we have problems when it comes to the written language. Yes, there are rules, but Neapolitan is not taught at school, nor can you easily find courses where you can learn the language. When I write, I too always have to ask Carmine to proofread, and well, for me it is not even my mother tongue ... Some people are afraid to write because they know they cannot write correctly, but they might well start to contribute if they knew that spelling errors are not soooo big an issue in the end - this is something that can be sorted out.

What would be really helpful is a "stable version" function so that people can find proofread examples they can rely on. This will help them to contribute to nap.wikipedia. For us it is not so much about having long pages with loads of content - for us it is still about "getting people to write" in their native language.

Templates, the problem with the double quote and everything else that is too wiki-specific are still too much for many ... I am also considering not adding templates for cities etc. for the moment (for new pages) - they are confusing to people. I notice that more and more when I talk with people who could become capable authors on nap.wikipedia. Most of those who write Neapolitan properly are of my generation or older, and only a few of them can really dig into wiki syntax etc.

This means: we have to give them a really low entry level ... and we have to reassure people who edit: there is a stable version that you cannot harm, you can write, even if not 100% correctly ... and you can then learn from the stable versions and from the corrections made to your writing.


It is indeed a long process to get through all these problems ... and for now I don't see an end to it.

Monday, March 26, 2007

Pisa ... again, but different this time


On the way back home on the motorway I decided to finally get used to my PDA and start writing my blog on it. Up to now I had always been in Pisa for conferences or meetings, but not this time. Yesterday meant "back into the classroom" for me, to teach at the SSML. It was a strange feeling to have the register with the students' names in hand and to check who was present and who was not. It was the first time in 12 years, and I noticed that I had been missing it for quite some time. Showing students new things, explaining etc., yes, it is definitely fun. Of course it was quite different from years ago: I did not give German language lessons, but explained how to use OmegaT and how to use Wikipedia and its sister projects to search for terminology. We looked at OmegaWiki and how to add a definition or translation. We also took the time to talk about things like networking on portals like LinkedIn. Last but not least we had a look at IRC as a means of communication, since in future chat will be used more and more by translators. In the end I asked, as I always do, what I could do better. It seems all were happy with the new things they had learnt. To sum up I would say: it was a great experience for all of us.
