Google Translate Blog
The official source for news on Google's translation technologies
Google Translate welcomes you to the Indic web
Tuesday, June 21, 2011
(Cross-posted from the Official Google Blog and on the Research Blog)
Beginning today, you can explore the linguistic diversity of the Indian sub-continent with Google Translate, which now supports five new experimental alpha languages: Bengali, Gujarati, Kannada, Tamil and Telugu. In India and Bangladesh alone, more than 500 million people speak these five languages. Since 2009, we’ve launched a total of 11 alpha languages, bringing the current number of languages supported by Google Translate to 63.
Indic languages differ from English in many ways, presenting several exciting challenges when developing their respective translation systems. Indian languages often use the Subject Object Verb (SOV) ordering to form sentences, unlike English, which uses Subject Verb Object (SVO) ordering. This difference in sentence structure makes it harder to produce fluent translations; the more words that need to be reordered, the more chance there is to make mistakes when moving them. Tamil, Telugu and Kannada are also highly agglutinative, meaning a single word often includes affixes that represent additional meaning, like tense or number. Fortunately, our research to improve Japanese (an SOV language) translation helped us with the word order challenge, while our work translating languages like German, Turkish and Russian provided insight into the agglutination problem.
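To make the word order point concrete, here is a minimal sketch that moves the verb of an English clause to the end, as an SOV language would, and counts how many word positions change. The function names and the sample clause are hypothetical and chosen only for illustration; this is not how Google Translate reorders words.

```python
def svo_to_sov(subject, verb, obj):
    """Reorder a clause from Subject-Verb-Object to Subject-Object-Verb."""
    return subject + obj + verb


def moved_positions(before, after):
    """Count positions whose word differs between two equally long clauses."""
    return sum(1 for a, b in zip(before, after) if a != b)


# Hypothetical example clause, already split into its three parts.
subject = ["the", "student"]
verb = ["reads"]
obj = ["a", "long", "book"]

svo = subject + verb + obj
sov = svo_to_sov(subject, verb, obj)

print(" ".join(svo))              # the student reads a long book
print(" ".join(sov))              # the student a long book reads
print(moved_positions(svo, sov))  # 4
```

In this toy clause the verb and every word of the object end up in a new position, so a longer object means more words to move and more opportunities for a mistake.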
You can expect translations for these new alpha languages to be less fluent and include many more untranslated words than some of our more mature languages, like Spanish or Chinese, which have much more of the web content that powers our statistical machine translation approach. Despite these challenges, we release alpha languages when we believe that they help people better access the multilingual web. If you notice incorrect or missing translations for any of our languages, please correct us; we enjoy learning from our mistakes and your feedback helps us graduate new languages from alpha status. If you’re a translator, you’ll also be able to take advantage of our machine translated output when using the Google Translator Toolkit.
Since these languages each have their own unique scripts, we’ve enabled a transliterated input method for those of you without Indian language keyboards. For example, if you type in the word “nandri,” it will generate the Tamil word நன்றி (see what it means). To see all these beautiful scripts in action, you’ll need to install fonts* for each language.
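As a rough illustration of the idea behind transliterated input, the sketch below maps Latin letter groups to Tamil characters with a greedy longest-match lookup. The three-entry table is a hypothetical one written just for the “nandri” example; a real input method needs a far larger table and has to deal with ambiguous spellings, and nothing here reflects Google's actual implementation.

```python
# Hypothetical mapping covering only the "nandri" example.
TABLE = {
    "na": "\u0ba8",         # Tamil letter NA
    "n": "\u0ba9\u0bcd",    # Tamil letter NNNA followed by the virama (pulli)
    "dri": "\u0bb1\u0bbf",  # Tamil letter RRA followed by the vowel sign I
}


def transliterate(text, table=TABLE):
    """Replace the longest matching Latin chunk at each position."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No table entry starts here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("nandri"))
```

Trying the longest chunk first matters here: at the start of the word, “na” has to win over the shorter “n”, which is only used once “na” no longer matches.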
We hope that the launch of these new alpha languages will help you better understand the Indic web and encourage the publication of new content in Indic languages, taking us five alpha steps closer to a web without language barriers.
*Download the fonts for each language: Tamil, Telugu, Bengali, Gujarati and Kannada.
Posted by Ashish Venugopal, Research Scientist