Google Translate Blog: Translating Wikipedia

Translating Wikipedia

Wednesday, July 14, 2010


Number of non-stub Wikipedia articles by Internet users, normalized (English = 1)

We’ve also found that there are many Internet users who have used our tools to translate more than 100 million words of Wikipedia content into various languages worldwide. If you do speak another language we hope you’ll join us in bringing Wikipedia content to other languages and cultures with Translator Toolkit.

We presented these results last Saturday, July 10, at Wikimania 2010 in Gdańsk, Poland. We look forward to continuing to support the creation of the world’s largest encyclopedia and we can’t wait to work with Wikipedians and volunteers to create more content worldwide.

Posted by Michael Galvez, Product Manager

28 comments:

Ragib Hasan, July 14, 2010 at 4:05 PM
This comment has been removed by the author.
Anonymous, July 14, 2010 at 9:57 PM
I am not as qualified to speak as Ragib Hasan (a Bangla Wikipedia administrator), who is a proud son of our nation, but I can at least echo his opinion. Natural language processing has not yet advanced far enough to be called natural; that much we can conclude from general experience.

But of course, we should appreciate the research Google is doing. By studying the mistakes, we will be able to see how to overcome them.

I have a suggestion about localizing Google's services. So far, I guess, they are being translated behind the scenes by Google's employees. Facebook, by contrast, uses a community-assisted approach to translate and localize its strings in real time. Could you adopt the same idea as Facebook, or even go beyond it and involve us in reviewing your translations?

As a computer engineering graduate, I also think that community participation in reviewing the quality of the localization, and of the output generated by GTT (Google Translator Toolkit), could improve both, since natural language processing learns more and more from training.

Bangla, the language of about 300 million people all over the world, probably ranks fifth among the world's languages by number of speakers, and it is an Indic language too. We have found that Google's Bangla localization of Blogger, Gmail, and many other products really lacks the quality needed for mainstream use. We would like to be involved in reviewing the process; as a token of appreciation, we could contribute to improving its quality.

I have personally participated in Facebook's Bangla localization and am currently the top translator in the bn_IN locale.

I would ask Google to allow more direct participation and review by the people who will use its services. I know Google can do this, just as Facebook did.

Thanks a lot.
अनुनाद सिंह, July 14, 2010 at 11:26 PM
I am a regular contributor to the Hindi Wikipedia. It is well known that machine translation has not yet reached a level where it resembles natural language. But everybody will agree that it has great potential.

I have studied Google-translated Hindi Wikipedia articles. They can be said to be more than 50% 'natural' to comprehend.

I have also been regularly using translations of technical articles from the German, Polish, French, and Italian Wikipedias. I must say that I almost always understand the information in the translations.

I thank Google for their efforts in this direction.
Thalyd, July 15, 2010 at 3:41 AM
What about Spanish? I found zero mentions of Spanish in this article... Looks like Google doesn't like Spain at all :(

Signed, a sad Spanish fan of Google.
அ. இரவிசங்கர் | A. Ravishankar, July 15, 2010 at 3:50 AM
A Review on Google Translation project in Tamil Wikipedia
Manish, July 15, 2010 at 5:43 AM
It's funny that Google's translation of Anunad Singh's profile renders his name as "Resonance Singh" while leaving the other names on the page unmolested. It certainly lacks some rudimentary semantics.
Drew, July 15, 2010 at 7:58 AM
This graph is either very misleading or just wrong... It seems to imply that German has 2.8 times as many articles as English on Wikipedia, and that Japanese, Russian, and French all have more than English. The Wikipedia homepage shows that this is untrue... Am I missing something here? What does this graph mean?
Dan Burton, July 15, 2010 at 8:13 AM
The graph says "number of *non-stub* Wikipedia articles *by internet users*". Not quite sure what the last bit means.
Mani, July 15, 2010 at 11:23 PM
This initiative by Google has immense potential, given the importance of their native language to communities across the world.
Of course, there are quality issues that need to be taken care of, and the solution is active participation by those who care about the language. Given that it is a Google initiative, it is not surprising that there are so many skeptics. If governments and student communities can partner, the reach could be much greater. After all, neither Wikipedia nor Google is the sole owner of cyberspace.
Anonymous, July 16, 2010 at 12:56 PM
The number of non-stub Wikipedia articles is divided by the number of Internet users who speak the given language.

This means that there are more German non-stub articles per German-speaking Internet user than there are English non-stub articles per English-speaking user. Because there are so many more English speakers than German speakers, German comes out at 2.8 even though English has more articles in absolute terms.

The measure is a good way to compare how much the speakers of each language participate, relative to their number.
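Written out, the metric described above is R(lang) = (A_lang / U_lang) / (A_en / U_en), where A is the number of non-stub articles and U the number of Internet users for a language, so English is 1 by construction. A minimal Python sketch; the function name `normalized_article_ratio` and the toy figures are made up for illustration and are not real Wikipedia or Internet-user statistics:

```python
def normalized_article_ratio(articles: dict[str, int], users: dict[str, int],
                             base: str = "en") -> dict[str, float]:
    """Non-stub articles per Internet user, divided by the same rate for `base`."""
    base_rate = articles[base] / users[base]
    # Each language's articles-per-user rate, relative to the base language's rate.
    return {lang: (articles[lang] / users[lang]) / base_rate for lang in articles}


# Toy numbers only: "xx" has fewer articles than English in absolute terms,
# but far fewer users, so its normalized ratio comes out above 1.
toy_articles = {"en": 500, "xx": 200}
toy_users = {"en": 1000, "xx": 200}
print(normalized_article_ratio(toy_articles, toy_users))  # {'en': 1.0, 'xx': 2.0}
```

Read this way, a value of 2.8 for German says German has 2.8 times as many non-stub articles per Internet user as English, not 2.8 times as many articles overall.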
அ. இரவிசங்கர் | A. Ravishankar, July 16, 2010 at 1:31 PM
Buzina,

Thanks for the explanation. If they had used the word "per" instead of "by" it would have been clear.

Mani,

There is a reason why Wikipedia has a minimum quality at least: the community, principles, systems and procedures followed with the right spirit. No combination of corporations, governments and others can achieve this without adhering to the same spirit.

Please also see

What happened on the Google Challenge @ the Swahili Wikipedia
வன்பாக்கம் விஜயராகவன், July 26, 2010 at 6:51 AM
What A. Ravishankar says, i.e. "There is a reason why Wikipedia has a minimum quality at least: the community, principles, systems and procedures followed with the right spirit.", is a big joke.

Tamil Wikipedia is under the control of a cabal, of which A. Ravishankar is a member, that imposes a "Pure Tamil" linguistic ideology. In pursuit of this ideology, this cabal has systematically abused every principle of Wikipedia. This is a serious matter. For example, the Tamil Wikipedia cabal has even abused the commonly accepted Standard Tamil keyboard, saying that some letters are not to be preferred.

Be that as it may, the ire of A. Ravishankar and the rest of the Tamil Wiki cabal against the Tamil Google translators stems from the fact that the translators don't adhere to the "Pure Tamil" ideology; hence their diatribe against Google's translations. The Tamil Google translators are doing an excellent job. Just because they don't adhere to this cabal's prejudices, A. Ravishankar and co. are blaming them.

"The community, principles, systems and procedures followed with the right spirit" that A. Ravishankar talks about are this Pure Tamil ideology's community and principles. I hope the Google translators reject A. Ravishankar's views with the contempt they deserve. The translators are doing excellent work.

Vijayaraghavan
Unknown, July 27, 2010 at 2:10 PM
Hi, I have noticed that translations from English to Hebrew have gotten much better lately.
I would like to request a feature:
the ability to add glossary terms from within the working window and have the translation updated with the new information. This should work for multi-word phrases as well as single words.
Multi-word phrases should take precedence, and replacement should follow the longest matching phrase
(like fitting together puzzle pieces).
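The precedence rule requested above (multi-word entries win, longest phrase first) is essentially greedy longest-match substitution. A minimal Python sketch, assuming whitespace-tokenized text and a plain dictionary glossary; `apply_glossary` and the bracketed placeholder targets are hypothetical illustrations, not part of Translator Toolkit:

```python
def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Replace glossary terms in `text`, preferring the longest phrase at each position."""
    words = text.split()
    # Key each entry by its lowercased word sequence for case-insensitive matching.
    entries = {tuple(src.lower().split()): dst for src, dst in glossary.items()}
    longest = max((len(key) for key in entries), default=0)
    out: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then progressively shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in entries:
                out.append(entries[key])
                i += n
                break
        else:
            # No glossary phrase starts here: keep the original word.
            out.append(words[i])
            i += 1
    return " ".join(out)


glossary = {"machine": "[M]", "translation": "[T]", "machine translation": "[MT]"}
print(apply_glossary("machine translation beats a machine", glossary))  # [MT] beats a [M]
```

Because matching is greedy from left to right, "machine translation" is consumed as one unit before "machine" or "translation" alone is considered. Punctuation attached to a word (e.g. "translation,") would block a match in this simplified version; a real tool would need proper tokenization.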
Sundar, July 28, 2010 at 2:56 AM
Vijayaraghavan,

Your empty assertions notwithstanding, we have been very transparent and have had full support for all decisions taken regarding the Google translation project. Besides, we have been extremely patient and have been collaborating with their team to try to make this a win-win for both sides. Without an inkling of the numerous conversations we have had with the Google team in total good faith, don't try to settle your own personal scores here.

- Sundar
Bala, August 5, 2010 at 12:48 PM
I am new to the Tamil Wikipedia but an experienced editor on the English Wikipedia. I took the time to review a Tamil Wikipedia article translated using the toolkit. The translation quality is so bad that in many places the meaning conveyed is the exact opposite of the English article's.

Take for example http://en.wikipedia.org/wiki/Tinto_Brass

Its translation is at

http://ta.wikipedia.org/wiki/%E0%AE%9F%E0%AE%BF%E0%AE%A9%E0%AF%8D%E0%AE%9F%E0%AF%8B_%E0%AE%AA%E0%AE%BF%E0%AE%B0%E0%AE%BE%E0%AE%B8%E0%AF%8D

There is a blooper in the very first line of the translation:

"Giovanni Brass... better known as Tinto Brass" has been translated into Tamil as "Tinto Brass... better known as Giovanni Brass".

Further into the article, "post-production" has been translated as "pre-production".

These are only two of the obvious errors; I have listed others on the Tamil article's discussion page. And this is one of the articles that were supposedly "corrected" by Google's hired translators.

This is not a one-off example. Most of the Google-translated articles have similar problems. Reading them makes the following obvious:

1) Google Translator Toolkit for Tamil is an inferior product and is in no shape even for a beta release.

2) Google's quality control is poor (or non-existent). It outsources the translation work to companies like Desi-Crew and does not care what the translators actually produce.

As I go through ta.wiki's archives, it is apparent that all these problems have been pointed out to Google repeatedly, and yet the promises to "correct mistakes" still produce crappy output like the Tinto Brass article. I cannot help admiring the patience of the Tamil Wikipedians who work with Google. If Google (or anyone else) tried this on en.wiki, they would have been templated and banned, and the articles stubbed without hesitation.
Unknown, September 9, 2010 at 6:39 AM
Is there a way to show multiple possible translations? On www.slovnik.cz, for example, about 5-10 translations are offered for each word. This is probably the only feature missing from Google Translate.
Thanks :)
Anonymous, September 15, 2010 at 10:21 AM
Why don't you mention Spanish as one of the hardcore languages? I think our community has thousands of people.
IlyasHG, October 3, 2010 at 4:24 AM
I like it)))) Google RuLeZ!!!
Lizardo, October 21, 2010 at 9:08 AM
Good idea. I had been thinking that Google Translate and Wikipedia should be important to each other.

This should evolve into automated translation of all Wikipedia entries, so that each entry is substantively similar in all languages.

Presently, the English version of an article may be quite different from the Chinese or Spanish one, or the article may be absent entirely. It isn't reasonable to depend on volunteers to duplicate the work in other languages; this needs to be done automatically.

The best evolution would be to keep the original material in a universal key that can be read and written in any language the viewer prefers.