Rick Strahl's Web Log

Wind, waves, code and everything in between...

Translating with Google Translate without API and C# Code


Some time back I created a database-driven ASP.NET Resource Provider along with some tools that make it easy to edit ASP.NET resources interactively in a Web application. One of the small helper features of the interactive resource admin tool is the ability to do simple translations using both Google Translate and Babelfish.

Here's what this looks like in the resource administration form:

[Screenshot: LocalizationAdmin resource administration form]

When a resource is displayed, the user can click a Translate button, which shows the current resource text and lets you set the source and target languages. The Go button fires the translation for both Google and Babelfish and displays the results. Pressing Use then changes the language of the resource to the target language and sets the resource value to the newly translated value. It's a nice way to get a quick translation going.

Ch… Ch… Changes

Originally, both implementations basically did some screen scraping of the interactive Web sites and retrieved translated text out of the result HTML. Screen scraping is always kind of an iffy proposition as content can be changed easily, but surprisingly that code worked for many years without fail. Recently, however, Google changed their input pages to use AJAX callbacks and the page updates no longer worked the same way. End result: The Google translate code was broken.

Now, Google does have an official API that you can access, but the API is being deprecated and you actually need to have an API key. Since I have public samples that people can download, the API key is an issue if I want people to have the samples work out of the box - the only way I could even do this is by sharing my API key (not allowed).

However, after a bit of spelunking and playing around with the public site, I found that Google's interactive translate page actually makes callbacks using plain public access without an API key. By intercepting some of those AJAX calls and calling them directly from code, I was able to get translation back up and working with minimal fuss by parsing out the JSON these AJAX calls return. I don't think this particular

Warning: This is hacky code, but after a fair bit of testing I found this to work very well with all sorts of languages and accented and escaped text etc. as long as you stick to small blocks of translated text. I thought I'd share it in case anybody else had been relying on a screen scraping mechanism like I did and needed a non-API based replacement.

Here's the code:

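// Requires: System.Net, System.Text, System.Text.RegularExpressions,
//           System.Web (HttpUtility), System.Web.Script.Serialization (JavaScriptSerializer)

/// <summary>
/// Translates a string into another language using Google's translate API JSON calls.
/// <seealso>Class TranslationServices</seealso>
/// </summary>
/// <param name="text">Text to translate. Should be a single word or sentence.</param>
/// <param name="fromCulture">
/// Two letter culture (en of en-us, fr of fr-ca, de of de-ch)
/// </param>
/// <param name="toCulture">
/// Two letter culture (as for fromCulture)
/// </param>
/// <returns>The translated text, or null on failure (see ErrorMessage)</returns>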
public string TranslateGoogle(string text, string fromCulture, string toCulture)
{
    fromCulture = fromCulture.ToLower();
    toCulture = toCulture.ToLower();

    // normalize the culture in case something like en-us was passed 
    // retrieve only en since Google doesn't support sub-locales
    string[] tokens = fromCulture.Split('-');
    if (tokens.Length > 1)
        fromCulture = tokens[0];
    
    // normalize ToCulture
    tokens = toCulture.Split('-');
    if (tokens.Length > 1)
        toCulture = tokens[0];
    
    string url = string.Format(@"http://translate.google.com/translate_a/t?client=j&text={0}&hl=en&sl={1}&tl={2}",                                     
                               HttpUtility.UrlEncode(text),fromCulture,toCulture);

    // Retrieve Translation with HTTP GET call
    string html = null;
    try
    {
        // WebClient is IDisposable, so release it once the request is done
        using (WebClient web = new WebClient())
        {
            // MUST add a known browser user agent or else response encoding doesn't return UTF-8 (WTF Google?)
            web.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0");
            web.Headers.Add(HttpRequestHeader.AcceptCharset, "UTF-8");

            // Make sure the response is decoded as UTF-8
            web.Encoding = Encoding.UTF8;
            html = web.DownloadString(url);
        }
    }
    catch (Exception ex)
    {
        this.ErrorMessage = Westwind.Globalization.Resources.Resources.ConnectionFailed + ": " +
                            ex.GetBaseException().Message;
        return null;
    }

    // Extract the "trans":"..." value (surrounding quotes included) from the JSON string
    string result = Regex.Match(html, "trans\":(\".*?\"),\"", RegexOptions.IgnoreCase).Groups[1].Value;            

    if (string.IsNullOrEmpty(result))
    {
        this.ErrorMessage = Westwind.Globalization.Resources.Resources.InvalidSearchResult;
        return null;
    }

    //return WebUtils.DecodeJsString(result);

    // Result is a JavaScript string so we need to deserialize it properly
    JavaScriptSerializer ser = new JavaScriptSerializer();
    return ser.Deserialize(result, typeof(string)) as string;            
}

Using the code is straightforward enough - simply provide a string to translate and a pair of two letter source and target languages:

string result = service.TranslateGoogle("Life is great and one is spoiled when it goes on and on and on", "en", "de");
TestContext.WriteLine(result);

How it works

The code to translate is fairly straightforward. It basically uses the URL I snagged from the Google Translate Web page, slightly changed to return a JSON result (&client=j) instead of the funky nested PHP style JSON array that the default returns.

The JSON result returned looks like this:

{"sentences":[{"trans":"Das Leben ist großartig und man wird verwöhnt, wenn es weiter und weiter und weiter geht","orig":"Life is great and one is spoiled when it goes on and on and on","translit":"","src_translit":""}],"src":"en","server_time":24}

I use WebClient to make an HTTP GET call to retrieve the JSON data and strip out part of the full JSON response that contains the actual translated text. Since this is a JSON response I need to deserialize the JSON string in case it's encoded (for upper/lower ASCII chars or quotes etc.).
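To see the extraction step in isolation, here's a minimal sketch that runs the same regular expression against a saved response and then decodes the captured string literal. It assumes a newer .NET runtime where System.Text.Json is available (the code above uses JavaScriptSerializer for the same job), and the TransExtractor class and ExtractTranslation method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class TransExtractor
{
    // Pulls the first "trans" value out of the raw response text and decodes
    // any JSON escapes (\uXXXX, \" and so on). Returns null if nothing matches.
    public static string ExtractTranslation(string json)
    {
        Match match = Regex.Match(json, "trans\":(\".*?\"),\"", RegexOptions.IgnoreCase);
        if (!match.Success)
            return null;

        // The captured group still has its surrounding quotes, so it is a
        // complete JSON string literal that a JSON parser can decode.
        // Assumption: System.Text.Json stands in for JavaScriptSerializer here.
        return JsonSerializer.Deserialize<string>(match.Groups[1].Value);
    }
}
```

Feeding it a response where the translation is escaped, such as "trans":"gro\u00dfartig", should give back the plain text großartig, which is the reason for the extra deserialization step.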

Couple of odd things to note in this code:

First note that a valid user agent string must be passed (or at least one starting with a common browser identification - I use Mozilla/5.0). Without this Google doesn't encode the result with UTF-8, but instead uses an ISO encoding that .NET can't easily decode. Google seems to ignore the character set header and use the user agent instead which is - odd to say the least.

The other is that the call returns a full JSON response. Rather than use the full response and decode it into a custom type that matches Google's result object, I just strip out the translated text. Yeah I know that's hacky, but it avoids an extra type and running the whole response through the JavaScript deserializer. My internal version uses a small DecodeJsString() method to decode JavaScript strings without the overhead of a full JSON parser.
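For comparison, here's a rough sketch of the less hacky route: mapping the response onto a small custom type. The type and member names (TranslateResponse, TranslateSentence, TranslateResponseParser) are made up for this example and only cover the fields visible in the sample JSON above. It also assumes System.Text.Json from newer .NET versions rather than the JavaScriptSerializer used in the post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical types shaped after the sample response shown earlier.
public class TranslateSentence
{
    [JsonPropertyName("trans")]
    public string Trans { get; set; }

    [JsonPropertyName("orig")]
    public string Orig { get; set; }
}

public class TranslateResponse
{
    [JsonPropertyName("sentences")]
    public List<TranslateSentence> Sentences { get; set; }

    [JsonPropertyName("src")]
    public string Src { get; set; }
}

public static class TranslateResponseParser
{
    // Returns the translated text of the first sentence, or null if the
    // response holds no sentences. Properties not declared on the types
    // above (translit, server_time) are ignored by the serializer.
    public static string GetTranslation(string json)
    {
        TranslateResponse response = JsonSerializer.Deserialize<TranslateResponse>(json);
        if (response == null || response.Sentences == null || response.Sentences.Count == 0)
            return null;

        return response.Sentences[0].Trans;
    }
}
```

The trade-off is the one described above: two extra types and a full parse, in exchange for not depending on a regular expression matching the raw text.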

It's obviously not rocket science but as mentioned above what's nice about it is that it works without a Google API key. I can't vouch for how many translations you can do before there are cut offs, but in my limited testing running a few stress tests on a Web server under load I didn't run into any problems.

Limitations

There are some restrictions with this: It only works on single words or single sentences - multiple sentences (delimited by .) are cut off at the ".". There is also a length limitation, which appears to kick in at around 220 characters or so. While that may not sound like much, for typical word or phrase translations this is plenty of length.
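One possible way to live with both limits is to break longer text into sentence-sized pieces and translate each piece separately. The sketch below is not part of the toolkit: TranslationChunker and SplitForTranslation are hypothetical names, and the default cap of 200 characters is just a guess that stays under the roughly 220 characters observed above.

```csharp
using System;
using System.Collections.Generic;

public static class TranslationChunker
{
    // Splits text into sentence-sized chunks no longer than maxLength.
    // A sentence ends at '.', '!' or '?' followed by whitespace or end of text.
    public static List<string> SplitForTranslation(string text, int maxLength = 200)
    {
        if (maxLength < 1)
            throw new ArgumentOutOfRangeException("maxLength");

        var chunks = new List<string>();
        if (string.IsNullOrWhiteSpace(text))
            return chunks;

        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            bool atTextEnd = i == text.Length - 1;
            bool isTerminator = text[i] == '.' || text[i] == '!' || text[i] == '?';
            bool atSentenceEnd = isTerminator && (atTextEnd || char.IsWhiteSpace(text[i + 1]));

            if (atSentenceEnd || atTextEnd)
            {
                AddWithLengthCap(chunks, text.Substring(start, i - start + 1).Trim(), maxLength);
                start = i + 1;
            }
        }
        return chunks;
    }

    // Adds a sentence, breaking it at the last space before the cap if it is too long.
    private static void AddWithLengthCap(List<string> chunks, string sentence, int maxLength)
    {
        while (sentence.Length > maxLength)
        {
            int cut = sentence.LastIndexOf(' ', maxLength - 1);
            if (cut <= 0)
                cut = maxLength; // no space to break on, so cut hard at the cap

            chunks.Add(sentence.Substring(0, cut).Trim());
            sentence = sentence.Substring(cut).Trim();
        }

        if (sentence.Length > 0)
            chunks.Add(sentence);
    }
}
```

Each chunk could then be passed to TranslateGoogle() in turn and the results joined with spaces. Keep in mind that translating sentence by sentence loses context between sentences, and that abbreviations like "e.g." will be treated as sentence ends by this simple splitter.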

Use with a grain of salt - Google seems to be trying to limit their exposure to usage of the Translate APIs so this code might break in the future, but for now at least it works.

FWIW, I also found that Google's translation is not as good as Babelfish, especially for contextual content like sentences. Google is faster, but Babelfish tends to give better translations. This is why in my translation tool I show both Google and Babelfish values retrieved. You can check out the code for this in the West Wind Web Toolkit's TranslationService.cs file which contains both the Google and Babelfish translation code pieces. Ironically the Babelfish code has been working forever using screen scraping and continues to work just fine today. I think it's a good idea to have multiple translation providers in case one is down or changes its format, hence the dual display in my translation form above.
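The multiple provider idea can be sketched roughly like this: call every provider, keep whatever comes back, and skip the ones that fail. This is only an illustration and not the toolkit's actual code. TranslationFanOut and TranslateWithAll are hypothetical names, and each provider is assumed to be a delegate with the same (text, fromCulture, toCulture) shape as TranslateGoogle() that returns null on failure.

```csharp
using System;
using System.Collections.Generic;

public static class TranslationFanOut
{
    // Runs the text through every provider and returns the results keyed by
    // provider name. Providers that throw or return nothing are left out.
    public static Dictionary<string, string> TranslateWithAll(
        IDictionary<string, Func<string, string, string, string>> providers,
        string text, string fromCulture, string toCulture)
    {
        var results = new Dictionary<string, string>();

        foreach (KeyValuePair<string, Func<string, string, string, string>> provider in providers)
        {
            try
            {
                string translated = provider.Value(text, fromCulture, toCulture);
                if (!string.IsNullOrEmpty(translated))
                    results[provider.Key] = translated;
            }
            catch (Exception)
            {
                // A broken or unreachable provider should not take the others
                // down with it, so it is simply skipped here.
            }
        }

        return results;
    }
}
```

With a dictionary holding one entry per provider, the form can display whichever results came back, and one provider changing its format just means one fewer suggestion rather than a broken dialog.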

I hope this has been helpful to some of you - I've actually had many small uses for this code in a number of applications and it's sweet to have a simple routine that performs these operations for me easily.

Posted in CSharp  HTTP  


Feedback for this Post

 
# re: Translating with Google Translate without API and C# Code
by Knagis August 07, 2011 @ 12:33am
And then there is Bing Translator for which you can use the web service API and that translates up to ~10kb in one chunk.
# re: Translating with Google Translate without API and C# Code
by Rick Strahl August 07, 2011 @ 4:48am
@Knagis, yup - but you do need an API key for that, which was one of the reasons I use the two above. Might be able to rig something up with Bing Translator as well though.
# re: Translating with Google Translate without API and C# Code
by Bruno Alexandre August 08, 2011 @ 6:43am
Keep in mind that there was an immense fuss about Google shutting down its language APIs. I have no idea where that stands at this point as I wasn't following up, but just as a warning.

more reading:

http://blog.gts-translation.com/2011/05/30/why-larry-page-killed-google-translate-api-and-other-assorted-thoughts/
# re: Translating with Google Translate without API and C# Code
by Rick Strahl August 08, 2011 @ 4:46pm
@Bruno - yes, but this call uses the same code Google uses on their public Web site, so hopefully this won't be affected, since obviously they use this themselves. It doesn't appear they are rubber stamping those requests any special way, though they might shut this down in the future. For now it works and without an API key.
# re: Translating with Google Translate without API and C# Code
by Robert McKee March 30, 2012 @ 1:30pm
This would be a better way of getting the Neutral Culture than taking everything before the first dash:
            // normalize the culture in case something like en-us was passed 
            // retrieve only en since Google doesn't support sub-locales
            // (keep the neutral cultures around: the zh check below needs them)
            var fromNeutral = GetNeutralCulture(fromCulture);
            var toNeutral = GetNeutralCulture(toCulture);
            fromCulture = fromNeutral.TwoLetterISOLanguageName;
            toCulture = toNeutral.TwoLetterISOLanguageName;
 
            // Override since Google doesn't understand zh-Hans/zh-Hant
            // (on Windows, ThreeLetterWindowsLanguageName is "CHT" for Traditional Chinese)
            if (fromCulture == "zh")
            {
                fromCulture = fromNeutral.ThreeLetterWindowsLanguageName == "CHT" ? "zh-TW" : "zh-CN";
            }
 
            if (toCulture == "zh")
            {
                toCulture = toNeutral.ThreeLetterWindowsLanguageName == "CHT" ? "zh-TW" : "zh-CN";
            }
 
        public System.Globalization.CultureInfo GetNeutralCulture(string culture)
        {
            return GetNeutralCulture(System.Globalization.CultureInfo.CreateSpecificCulture(culture));
        }
 
        public System.Globalization.CultureInfo GetNeutralCulture(System.Globalization.CultureInfo ci)
        {
            System.Globalization.CultureInfo ci2 = ci;
            while (!ci2.IsNeutralCulture && ci2.Parent.Name != "")
                ci2 = ci2.Parent;
            return ci2;
        }
# re: Translating with Google Translate without API and C# Code
by Robert McKee April 02, 2012 @ 10:21am
Here is a better Babelfish translator that works with Chinese (zh):

        public string TranslateBabelFish(string Text, string FromCulture, string ToCulture)
        {
            // Keep the neutral cultures around: the zh check below needs them
            var fromNeutral = GetNeutralCulture(FromCulture);
            var toNeutral = GetNeutralCulture(ToCulture);
            FromCulture = fromNeutral.TwoLetterISOLanguageName;
            ToCulture = toNeutral.TwoLetterISOLanguageName;
 
            // Override since Yahoo doesn't understand zh-Hans/zh-Hant
            // (on Windows, ThreeLetterWindowsLanguageName is "CHT" for Traditional Chinese)
            if (FromCulture == "zh" && fromNeutral.ThreeLetterWindowsLanguageName == "CHT")
            {
                FromCulture = "zt";
            }
 
            if (ToCulture == "zh" && toNeutral.ThreeLetterWindowsLanguageName == "CHT")
            {
                ToCulture = "zt";
            }
            string LangPair = FromCulture + "_" + ToCulture;
 
            string url = string.Format(@"http://babelfish.yahoo.com/translate_txt?ei=UTF-8&doit=done&fr=bf-home&intl=1&tt=urltext&trtext={0}&lp={1}&btnTrTxt=Translate",
                                       HttpUtility.UrlEncode(Text), LangPair);
 
            // Retrieve Translation with HTTP GET call
            string Html = null;
            try
            {
                WebClient web = new WebClient();
 
                // MUST add the following browser user agent or else yahoo doesn't respond correctly (WTF Yahoo?)
                web.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)");
 
                // Make sure we have response encoding to UTF-8
                web.Encoding = Encoding.UTF8;
                Html = web.DownloadString(url);
            }
            catch (Exception ex)
            {
                ErrorMessage = Resources.Resources.ConnectionFailed + ": " +
                                    ex.GetBaseException().Message;
                return null;
            }
 
            // <div id="result"><div style="padding:0.6em;">Hallo</div></div>
            string Result = StringUtils.ExtractString(Html, "<div id=\"result\">", "</div>");
            if (Result == "")
            {
                ErrorMessage = "Invalid search result. Couldn't find marker.";
                return null;
            }
            Result = Result.Substring(Result.LastIndexOf(">") + 1);
 
            return HttpUtility.HtmlDecode(Result);
        }
 


West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2014