Google Translate Blog
The official source for news on Google's translation technologies
How Google Translate squeezes deep learning onto a phone
Wednesday, July 29, 2015
Today we announced that the Google Translate app now does real-time visual translation of 20 more languages. So the next time you’re in Prague and can’t read a menu, we’ve got your back. But how are we able to recognize these new languages?
In short: deep neural nets. When the Word Lens team joined Google, we were excited for the opportunity to work with some of the leading researchers in deep learning. Neural nets have gotten a lot of attention in the last few years because they’ve set all kinds of records in image recognition. Five years ago, if you gave a computer an image of a cat or a dog, it had trouble telling which was which. Thanks to convolutional neural networks, not only can computers tell the difference between cats and dogs, they can even recognize different breeds of dogs. Yes, they’re good for more than just trippy art. If you're translating a foreign menu or sign with the latest version of Google's Translate app, you're now using a deep neural net. And the amazing part is that it all works on your phone, without an Internet connection. Here’s how.
Step by step
First, when a camera image comes in, the Google Translate app has to find the letters in the picture. It needs to weed out background objects like trees or cars and pick up on the words we want translated. It looks for blobs of similarly colored pixels that sit near other, similar blobs. Those blobs are possibly letters, and if they’re near each other, they form a continuous line we should read.
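The grouping heuristic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the app's detector: the image is a list of rows of RGB tuples, "similar color" means no channel differs by more than a hypothetical tolerance `tol`, blobs whose bounding box covers a large share of the frame are assumed to be background, and the names `find_letter_candidates` and `group_into_lines` are made up for this example.

```python
from collections import deque

def similar(a, b, tol):
    """Two RGB pixels count as 'similar' if no channel differs by more than tol."""
    return max(abs(p - q) for p, q in zip(a, b)) <= tol

def find_letter_candidates(img, tol=30, max_frac=0.5, min_pixels=2):
    """Hypothetical helper: flood-fill blobs of similarly colored pixels and
    return bounding boxes (x0, y0, x1, y1) of blobs that could be letters."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and similar(img[cy][cx], img[ny][nx], tol)):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
            area = (x1 - x0 + 1) * (y1 - y0 + 1)
            # Assumption: drop tiny specks and huge regions such as the background.
            if len(pixels) >= min_pixels and area < max_frac * w * h:
                boxes.append((x0, y0, x1, y1))
    return boxes

def group_into_lines(boxes, max_gap):
    """Hypothetical helper: chain boxes left to right into text lines when they
    overlap vertically and their facing edges are at most max_gap pixels apart."""
    lines = []
    for box in sorted(boxes):
        x0, y0, x1, y1 = box
        for line in lines:
            lx0, ly0, lx1, ly1 = line[-1]
            if y0 <= ly1 and ly0 <= y1 and 0 <= x0 - lx1 <= max_gap:
                line.append(box)
                break
        else:
            lines.append([box])
    return lines
```

A real detector would also have to cope with light-on-dark text, anti-aliased edges, and sensor noise, all of which this sketch ignores.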
Second, Translate has to recognize what each letter actually is. This is where deep learning comes in. We use a convolutional neural network, training it on letters and non-letters so it can learn what different letters look like.
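To make the recognition step concrete, here is a sketch of the forward pass of a tiny convolutional classifier in plain Python. The architecture (one convolution layer, ReLU, 2x2 max pooling, one fully connected layer, and a softmax over letter classes plus a "not a letter" class) is an assumption chosen for brevity; the post does not describe the network's layers, and the weights produced by the hypothetical `random_params` helper are random placeholders rather than trained values.

```python
import math
import random

def conv2d(img, kernels, biases):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries) of a
    grayscale image with several kernels; returns one feature map per kernel."""
    h, w = len(img), len(img[0])
    maps = []
    for k, b in zip(kernels, biases):
        kh, kw = len(k), len(k[0])
        maps.append([[b + sum(k[i][j] * img[y + i][x + j]
                              for i in range(kh) for j in range(kw))
                      for x in range(w - kw + 1)]
                     for y in range(h - kh + 1)])
    return maps

def relu(maps):
    return [[[max(0.0, v) for v in row] for row in m] for m in maps]

def max_pool2(maps):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    return [[[max(m[y][x], m[y][x + 1], m[y + 1][x], m[y + 1][x + 1])
              for x in range(0, len(m[0]) - 1, 2)]
             for y in range(0, len(m) - 1, 2)]
            for m in maps]

def dense(vec, weights, biases):
    return [b + sum(w * v for w, v in zip(row, vec)) for row, b in zip(weights, biases)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def classify(img, params):
    """Forward pass: conv -> ReLU -> pool -> flatten -> dense -> softmax."""
    feats = max_pool2(relu(conv2d(img, params["kernels"], params["conv_b"])))
    flat = [v for m in feats for row in m for v in row]
    return softmax(dense(flat, params["fc_w"], params["fc_b"]))

def random_params(size, n_kernels, k, n_classes, seed=0):
    """Hypothetical untrained weights for a size x size input; a real model
    would learn these from labeled letter and non-letter images."""
    rng = random.Random(seed)
    pooled = (size - k + 1) // 2
    n_in = n_kernels * pooled * pooled
    return {
        "kernels": [[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
                    for _ in range(n_kernels)],
        "conv_b": [0.0] * n_kernels,
        "fc_w": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_classes)],
        "fc_b": [0.0] * n_classes,
    }
```

Training (fitting the weights to labeled letters and non-letters by backpropagation) is omitted, and production code would use an optimized numeric library rather than nested Python lists.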
But interestingly, if we train just on very “clean”-looking letters, we risk not understanding what real-life letters look like. Letters out in the real world are marred by reflections, dirt, smudges, and all kinds of weirdness. So we built our letter generator to create all kinds of fake “dirt” to convincingly mimic the noisiness of the real world—fake reflections, fake smudges, fake weirdness all around.
Why not just train on real-life photos of letters? Well, it’s tough to find enough examples in all the languages we need, and it’s harder to maintain the fine control over what examples we use when we’re aiming to train a really efficient, compact neural network. So it’s more effective to simulate the dirt.
Some of the “dirty” letters we use for training. Dirt, highlights, and rotation, but not too much because we don’t want to confuse our neural net.
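A synthetic "dirty letter" generator along these lines might look like the following sketch. It assumes a grayscale bitmap where 0.0 is paper and 1.0 is ink, and applies a bounded random rotation, soft Gaussian smudges, a linear glare that washes out one side, and per-pixel noise. The function names and every parameter value (`max_rotation`, `smudges`, `noise`) are hypothetical; the post does not publish the generator's actual distortions or settings.

```python
import math
import random

def rotate(bitmap, degrees):
    """Rotate about the center with nearest-neighbor sampling; pixels that map
    outside the source are filled with paper (0.0)."""
    h, w = len(bitmap), len(bitmap[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands at (x, y).
            sx = c * (x - cx) + s * (y - cy) + cx
            sy = -s * (x - cx) + c * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = bitmap[iy][ix]
    return out

def make_dirty(bitmap, rng, max_rotation=8.0, smudges=2, noise=0.05):
    """Hypothetical generator: bounded rotation, smudges, glare, and noise."""
    h, w = len(bitmap), len(bitmap[0])
    # Small rotation only; too much would waste the network's capacity.
    img = rotate(bitmap, rng.uniform(-max_rotation, max_rotation))
    # Fake smudges: soft dark Gaussian blobs at random positions.
    for _ in range(smudges):
        my, mx = rng.uniform(0, h - 1), rng.uniform(0, w - 1)
        r, strength = rng.uniform(1, max(h, w) / 4), rng.uniform(0.1, 0.4)
        for y in range(h):
            for x in range(w):
                d2 = (y - my) ** 2 + (x - mx) ** 2
                img[y][x] += strength * math.exp(-d2 / (2 * r * r))
    # Fake reflection: a linear highlight that washes out the right side.
    glare = rng.uniform(0.0, 0.5)
    for y in range(h):
        for x in range(w):
            img[y][x] -= glare * x / max(1, w - 1)
    # Per-pixel noise, then clamp back to the valid [0, 1] range.
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, noise))) for v in row] for row in img]
```

Keeping `max_rotation` small reflects the caption's point: enough distortion to look realistic, but not so much that the network spends its capacity on it.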
The third step is to take those recognized letters and look them up in a dictionary to get translations. Since every previous step could have failed in some way, the dictionary lookup needs to be approximate. That way, if we read an ‘S’ as a ‘5’, we’ll still be able to match the misread ‘5uper’ to the word ‘super’.
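One common way to make a lookup tolerant of recognition errors is a weighted edit (Levenshtein) distance in which substituting visually confusable characters, such as `5` for `s`, costs less than an arbitrary substitution. The post does not say which fuzzy-matching method Translate uses, so treat this as an illustrative substitute: the confusion pairs, the 0.3 cost, the acceptance threshold, and the function names are all assumptions.

```python
# Hypothetical table of character pairs that recognition commonly mixes up.
CONFUSABLE = {frozenset(pair) for pair in
              [("s", "5"), ("o", "0"), ("l", "1"), ("i", "1"), ("b", "8"), ("z", "2")]}

def sub_cost(a, b):
    """Substitution cost: free if equal, cheap if visually confusable."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in CONFUSABLE:
        return 0.3
    return 1.0

def ocr_distance(read, word):
    """Weighted edit distance between the text we read and a dictionary word."""
    read, word = read.lower(), word.lower()
    prev = [float(j) for j in range(len(word) + 1)]
    for i in range(1, len(read) + 1):
        cur = [float(i)] + [0.0] * len(word)
        for j in range(1, len(word) + 1):
            cur[j] = min(prev[j] + 1.0,        # extra character was read
                         cur[j - 1] + 1.0,     # a character was missed
                         prev[j - 1] + sub_cost(read[i - 1], word[j - 1]))
        prev = cur
    return prev[-1]

def lookup(read, dictionary, max_cost=1.0):
    """Return the closest dictionary word, or None if nothing is close enough."""
    best = min(dictionary, key=lambda w: ocr_distance(read, w))
    return best if ocr_distance(read, best) <= max_cost else None
```

For example, `lookup("5uper", ["super", "supper", "upper"])` picks `"super"` at a cost of 0.3. The linear scan keeps the sketch short; an on-device dictionary would more likely use an index such as a trie to avoid comparing against every word.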
Finally, we render the translation on top of the original words in the same style as the original. We can do this because we’ve already found and read the letters in the image, so we know exactly where they are. We can look at the colors surrounding the letters and use that to erase the original letters. And then we can draw the translation on top using the original foreground color.
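The erase-and-redraw step can be sketched as follows, assuming the letters' bounding box is already known from the detection step. The background color is estimated as the per-channel median of the pixels just outside the box, the foreground (ink) color as the median of pixels inside the box that differ clearly from that background, and the box is then filled with the background color. The tolerance, the median heuristic, and the helper names are assumptions; `draw_mask` is a hypothetical stand-in for a real text renderer that would rasterize the translated words.

```python
from statistics import median

def border_pixels(img, box):
    """Pixels in the one-pixel ring just outside box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    h, w = len(img), len(img[0])
    ring = [(x, y) for x in range(x0 - 1, x1 + 2) for y in (y0 - 1, y1 + 1)]
    ring += [(x, y) for y in range(y0, y1 + 1) for x in (x0 - 1, x1 + 1)]
    return [img[y][x] for x, y in ring if 0 <= x < w and 0 <= y < h]

def channel_median(pixels):
    return tuple(int(median([p[c] for p in pixels])) for c in range(3))

def erase_and_get_colors(img, box, tol=60):
    """Estimate background and ink colors, then paint the box with background."""
    bg = channel_median(border_pixels(img, box))
    x0, y0, x1, y1 = box
    ink = [img[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)
           if max(abs(a - b) for a, b in zip(img[y][x], bg)) > tol]
    fg = channel_median(ink) if ink else bg
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            img[y][x] = bg
    return bg, fg

def draw_mask(img, mask, origin, color):
    """Hypothetical stand-in for text rendering: paint True cells of a glyph mask."""
    ox, oy = origin
    for dy, row in enumerate(mask):
        for dx, on in enumerate(row):
            if on:
                img[oy + dy][ox + dx] = color
```

Using medians rather than means keeps a few stray ink or glare pixels from skewing the estimated colors.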
Crunching it down for mobile
Now, if we could do this visual translation in our data centers, it wouldn’t be too hard. But a lot of our users, especially those getting online for the very first time, have slow or intermittent network connections and smartphones starved for computing power. These low-end phones can be about 50 times slower than a good laptop, and a good laptop is already much slower than the data centers that typically run our image recognition systems. So how do we get visual translation on these phones, with no connection to the cloud, translating in real time as the camera moves around?
We needed to develop a very small neural net and put severe limits on how much we tried to teach it: in essence, put an upper bound on the density of information it handles. The challenge here was in creating the most effective training data. Since we’re generating our own training data, we put a lot of effort into including just the right data and nothing more. For instance, we want to be able to recognize a letter with a small amount of rotation, but not too much. If we overdo the rotation, the neural network will use too much of its information density on unimportant things.
So we put effort into making tools that would give us a fast iteration time and good visualizations. Within a few minutes, we can change the algorithms for generating training data, generate it, retrain, and visualize. From there we can look at what kind of letters are failing and why. At one point, we were warping our training data too much, and ‘$’ started to be recognized as ‘S’. We were able to quickly identify that and adjust the warping parameters to fix the problem. It was like trying to paint a picture of letters that you’d see in real life with all their imperfections painted just perfectly.
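The inspection step of that loop ("what kind of letters are failing and why") can start as simply as tallying (true label, predicted label) pairs from a validation run. The sketch below is a hypothetical helper, not the team's tooling; a spike in `('$', 'S')` pairs is the kind of signal that would point to over-aggressive warping in the data generator.

```python
from collections import Counter

def top_confusions(pairs, n=3):
    """Most frequent (true, predicted) mistakes from a validation run."""
    errors = Counter((t, p) for t, p in pairs if t != p)
    return errors.most_common(n)

def per_class_error(pairs):
    """Fraction of examples of each true class that were misrecognized."""
    totals, wrong = Counter(), Counter()
    for t, p in pairs:
        totals[t] += 1
        if t != p:
            wrong[t] += 1
    return {c: wrong[c] / totals[c] for c in totals}
```

Pairing these numbers with a visualization of the failing samples makes it quick to tell whether a problem comes from the network or from the data generator.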
To achieve real-time performance, we also heavily optimized and hand-tuned the math operations. That meant using the mobile processor’s SIMD instructions and tuning things like matrix multiplies to fit processing into all levels of cache memory.
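The cache-fitting idea is usually called loop tiling (cache blocking): a matrix multiply is split into small blocks so that each block of the inputs is reused many times while it still sits in cache. Pure Python cannot demonstrate SIMD or cache effects, so this sketch only shows the loop structure; the function name and the tile size of 32 are arbitrary assumptions, and real mobile code would be written in C or C++ with the processor's SIMD instructions (for example, NEON on ARM chips) in the innermost loop.

```python
def matmul_tiled(a, b, tile=32):
    """C = A x B computed block by block; in a compiled language each block of
    A, B and C would be reused while it is still resident in cache."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                for i in range(i0, min(i0 + tile, n)):
                    ci, ai = c[i], a[i]
                    for p in range(p0, min(p0 + tile, k)):
                        aip, bp = ai[p], b[p]
                        # Innermost loop walks a contiguous row of B and C:
                        # the part a SIMD unit would process several lanes at a time.
                        for j in range(j0, min(j0 + tile, m)):
                            ci[j] += aip * bp[j]
    return c
```

The result matches a naive triple loop; only the order of the multiply-adds changes, which is what lets a compiled version keep its working set in the faster cache levels.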
In the end, we were able to get our networks to give us significantly better results while running about as fast as our old system—great for translating what you see around you on the fly. Sometimes new technology can seem very abstract, and it's not always obvious what the applications for things like convolutional neural nets could be. We think breaking down language barriers is one great use.
Posted by Otavio Good, Software Engineer, Google Translate
(Cross-posted on the Google Research Blog)
See the world in your language with Google Translate
Wednesday, July 29, 2015
The Google Translate app already lets you instantly visually translate printed text in seven languages. Just open the app, tap the camera, and point it at the text you need to translate: a street sign, an ingredient list, an instruction manual, or the dials on a washing machine. You'll see the text transform live on your screen into the other language. No Internet connection or cell phone data needed.
Today, we’re updating the Google Translate app again—expanding instant visual translation to 20 more languages (for a total of 27!), and making real-time voice translations a lot faster and smoother—so even more people can experience the world in their language.
Instantly translate printed text in 27 languages
We started out with seven languages (English, French, German, Italian, Portuguese, Russian and Spanish), and today we're adding 20 more. You can now translate to and from English and Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Filipino, Finnish, Hungarian, Indonesian, Lithuanian, Norwegian, Polish, Romanian, Slovak, Swedish, Turkish and Ukrainian. You can also do one-way translations from English to Hindi and Thai. (Or, try snapping a pic of the text you’d like translated: we have a total of 37 languages in camera mode.)
To try out the new languages, go to the Google Translate app, set “English” along with the language you’d like to translate, and click the camera button; you'll be prompted to download a small (~2 MB) language pack for each.
And how exactly did we get so many new languages running on a device with no data connection? It’s all about convolutional neural networks (whew). Geek out on that over on our Research blog.
Have a natural, smoother conversation—even with a slower mobile network
In many emerging markets, slow mobile networks can make it challenging to access online tools, so if you live in an area with unreliable mobile networks, our other update today is for you. In addition to instant visual translation, we’ve also improved our voice conversation mode (enabling real-time translation of conversations across 32 languages), so it’s even faster and more natural on slow networks.
These updates are coming to both Android and iOS, rolling out over the next few days.
Translate Community helps us get better every day
On top of today’s updates, we’re also continuously working to improve the quality of the translations themselves and to add new languages. A year ago this week, we launched Translate Community, a place for multilingual people from anywhere in the world to provide and correct translations. Thanks to the millions of language lovers who have already pitched in (more than 100 million words so far!), we've been updating our translations for over 90 language pairs, and we plan to update many more as our community grows.
We’ve still got lots of work to do: more than half of the content on the Internet is in English, but only around 20% of the world’s population speaks English. Today’s updates knock down a few more language barriers, helping you communicate better and get the information you need.
Posted by Barak Turovsky, Product Lead, Google Translate
(Cross-posted on the Official Google Blog)