Thursday, August 31, 2006

Creating content for many Wikipedias

This builds on a project about mass content creation on Meta and Wikidata. Mass content creation is an idea of user Millosh, and yes, he is right about it: it is how I already did certain things for the Neapolitan Wikipedia.

There is so much easy-to-create content out there that Wikipedias could share, and even if Wikidata is not implemented in the Wikipedias, we can use the data in databases to create stubs with mail merge (in OpenOffice.org or Word) and upload them with the bot (see my other post of today).
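To make the mail merge idea a bit more concrete, here is a minimal sketch in Python of the same thing: one row of a table goes in, one stub comes out. The stub wording, the column names and the sample rows are all placeholders I made up for illustration; a real run would use a proper list and a sentence pattern agreed on for each language.

```python
import csv
import io
from string import Template

# Hypothetical stub pattern; the wording and field names are illustrative only.
STUB = Template("'''$name''' is a $kind in $country.")

# Placeholder rows standing in for a shared table of names and basic facts.
SAMPLE_CSV = """name,kind,country
Exampletown,city,Exampleland
Sample River,river,Exampleland
"""


def make_stubs(csv_text, template=STUB):
    """Return a list of (title, text) pairs, one per row of the table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row is a mapping of column name to value, which Template can fill in.
    return [(row["name"], template.substitute(row)) for row in reader]


if __name__ == "__main__":
    for title, text in make_stubs(SAMPLE_CSV):
        print(title, "->", text)
```

Swapping the pattern for another language is then just a matter of passing a different Template; the table itself stays the same.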

This means: if we now start to add all names of:
continents
countries
cities
rivers
mountains
monuments
places
yes, even streets, because some of them have translations
lakes
seas
animals
plants
names of people (these are translated as well)
etc.

And then we start to translate them. At the same time, people take care of adding statistical data to a table that is exactly about this (if we cannot do this with a separate Wikidata installation ... anyway, we do not need relational information for now ... just information).

How many articles (stubs) can be created in this way and how many people can work on it?

We also should not forget about film and book titles, or the Greek and Roman gods (I suppose other parts of the world will have other material on such things).

It is really a huge project, but it is feasible ... there are many of us who have similar goals.

Where to start: well, we need the infoboxes translated into as many languages as possible, and we then need the place names etc. translated. This must be combined with a data sheet.

Example:
Castua: http://it.wikipedia.org/wiki/Castua
We have the box on the right side with all the statistical/basic information, and all of that can be translated into many languages. The first sentence in the stub will simply be the definition from WiktionaryZ.
So most of it, like stato (state), regione (region) etc., can be translated within WZ, and in that way we can populate the templates used on all wikis. As for the part of the template that is not seen, I would use either a lingua franca (English) or simply the same names that are visible.
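A rough sketch of what I mean, assuming the unseen parameter names stay in English and only the visible labels change per language. The template name "ExampleInfobox", the "_label" parameters and the sample values are hypothetical, and the small label table only stands in for what WZ would supply.

```python
# Visible labels per language. "stato" and "regione" come from the Italian box;
# the other entries stand in for translations that WZ would provide.
LABELS = {
    "it": {"state": "stato", "region": "regione"},
    "en": {"state": "state", "region": "region"},
    "de": {"state": "Staat", "region": "Region"},
}


def render_infobox(lang, values, template_name="ExampleInfobox"):
    """Build template wikitext: English parameter names, translated labels."""
    lines = ["{{" + template_name]
    for key, value in values.items():
        # The parameter name (unseen part) is the same on every wiki.
        lines.append(f"| {key} = {value}")
        # The label (visible part) is looked up for the target language.
        lines.append(f"| {key}_label = {LABELS[lang][key]}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values, not real data about any place.
    print(render_infobox("de", {"state": "Exampleland", "region": "Example Region"}))
```

The point of keeping the parameter names fixed is that one data sheet can feed every language version; only the label table grows.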

There's not much to it: we need the lists to start off with. If we use pagefromfile.py to upload the finished pages, existing pages with the same name will be skipped and written to a log file; these are then the only ones someone has to look after manually.
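For illustration, this is roughly how the finished stubs could be written into one upload file. As far as I know, pagefromfile.py by default looks for pages between {{-start-}} and {{-stop-}} and takes the title from the first bold text, but please check the script's own help before relying on that. The helper names here are my own, and split_existing is only a local preview against a list of titles you already have; the actual skipping and logging is done by the bot.

```python
# Default page markers as I understand them; treat these as an assumption.
START, STOP = "{{-start-}}", "{{-stop-}}"


def to_upload_file(pages):
    """Join (title, text) pairs into one text block for the upload script.

    Each text is expected to begin with the title in bold ('''Title'''),
    since that is where the title is read from.
    """
    blocks = [f"{START}\n{text}\n{STOP}" for _title, text in pages]
    return "\n".join(blocks) + "\n"


def split_existing(pages, existing_titles):
    """Preview which pages are new and which would need manual attention."""
    new, skipped = [], []
    for title, text in pages:
        (skipped if title in existing_titles else new).append((title, text))
    return new, skipped


if __name__ == "__main__":
    # Placeholder pages, not real articles.
    pages = [
        ("Exampletown", "'''Exampletown''' is a city in Exampleland."),
        ("Sample River", "'''Sample River''' is a river in Exampleland."),
    ]
    new, skipped = split_existing(pages, existing_titles={"Sample River"})
    print(to_upload_file(new))
    print("To check by hand:", [title for title, _ in skipped])
```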

If sooner or later we get a pure Wikidata application that takes the translations from WZ and combines them with the rest of the data, that would be great, since we would then no longer need to fix the entries on every wiki whenever something is corrected.

Using the Geoboxes we already have a good way to compare lists ... but does it make sense to do it that way right now? Or does it make sense to prepare all possible translations now, to be ready once we can have Wikidata for geographical entries?

Hmmm ... I was interrupted quite often while writing this blog ... and I don't have the time to re-read now. So sorry if things seem to be a bit mixed up.
