Friday, October 16, 2009

Delphi: Using Google for online translating

Google provides a Translation API that looks very promising and useful, so I was enthusiastic when we decided to use it in our Localizer project.
Unfortunately, I can’t say that the API is well documented. And of course there is no tutorial that describes how to use it from Delphi 2009, which is what we need.
I spent some time searching the web and found a few articles on the subject, but none of them gave me a working solution.
All the articles I found suggest using the http://google.com/translate URL to access the translation service. That is not quite correct. First, this URL serves end-user requests made from a browser; its parameters are not documented and can change at any time. Second, the response is a regular web page with a lot of unnecessary tags, text and so on, so extracting the result from it is a headache. And, like the URL, the response layout can (I’d even say definitely will) change in the future.

Google describes the Translation API and gives another way to solve the task. The correct URL is http://ajax.googleapis.com/ajax/services/language/translate. In this case the response is a JSON-encoded result with embedded status codes.
All we need to do is build a URL with all the necessary CGI arguments, send an HTTP referer header that accurately identifies our application (a requirement of the Google terms of use), and process the JSON-encoded response.
So far so good. Let’s write a Delphi function that translates an input string. We will use the Indy TIdHTTP component to send the HTTP requests.
As I found out after some investigation, the argument part of the URL has to be converted to UTF-8 and then URL-encoded. As Google says, “the value of a CGI argument must be properly escaped (e.g., via the functional equivalent of Javascript's encodeURIComponent() method)”. I tried several standard and third-party URL-encoding functions, but none of them does it the way Google expects. The main problem is that all the functions I tried encode the source string char by char, while Google expects the string to be encoded byte by byte. So I had to do it myself.
function URLEncode(const S: RawByteString): RawByteString;
  const
    // Characters left unescaped, matching JavaScript's encodeURIComponent()
    NoConversion = ['A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '!', '~', '*', '''', '(', ')'];
  var
    i, idx, len: Integer;

  function DigitToHex(Digit: Integer): AnsiChar;
  begin
    case Digit of
      0..9: Result := AnsiChar(Chr(Digit + Ord('0')));
      10..15: Result := AnsiChar(Chr(Digit - 10 + Ord('A')));
    else
      Result := '0';
    end;
  end; // DigitToHex

begin
  len := 0;
  for i := 1 to Length(S) do
    if S[i] in NoConversion then
      len := len + 1
    else
      len := len + 3;
  SetLength(Result, len);
  idx := 1;
  for i := 1 to Length(S) do
    if S[i] in NoConversion then
    begin
      Result[idx] := S[i];
      idx := idx + 1;
    end
    else
    begin
      Result[idx] := '%';
      Result[idx + 1] := DigitToHex(Ord(S[i]) div 16);
      Result[idx + 2] := DigitToHex(Ord(S[i]) mod 16);
      idx := idx + 3;
    end;
end; // URLEncode
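
To make the byte-by-byte point concrete, here is a small sketch (not part of the function above) that prints the percent-escaped UTF-8 bytes of a single Cyrillic letter. It assumes Delphi 2009 or later, where string is a UnicodeString. The name PercentEscapeUtf8 is made up for this illustration, and unlike URLEncode it escapes every byte, including plain ASCII ones.

```pascal
program Utf8BytesDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper for illustration only: converts the text to UTF-8
// and writes every resulting byte as %XX (even plain ASCII bytes).
function PercentEscapeUtf8(const Value: string): string;
var
  bytes: RawByteString;
  i: Integer;
begin
  bytes := UTF8Encode(Value);
  Result := '';
  for i := 1 to Length(bytes) do
    Result := Result + '%' + IntToHex(Ord(bytes[i]), 2);
end;

begin
  // U+041F (Cyrillic capital letter PE) is one character but two UTF-8 bytes
  Writeln(PercentEscapeUtf8(#$041F)); // prints %D0%9F
end.
```

A char-by-char encoder sees a single code unit with the value $041F here, which does not fit into one %XX escape. Working on the UTF-8 bytes avoids that problem.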

The next question is how to extract the translation from the response we get. In our case the response format is a simple JSON object similar to the snippet shown below:
{
  "responseData" : {
    "translatedText" : the-translated-text
  },
  "responseDetails" : null | string-on-error,
  "responseStatus" : 200 | error-code
}

The best way is to use a library that works with JSON structures. For example, you can download and use the uJson unit.
For demonstration purposes it is enough to process the response as a regular string. We need to extract the response status (200 = OK), the translated text, and the error string if the status is not 200.
// Requires IdHTTP (Indy) in the uses clause.
// source - the string to be translated
// langpair - the source and target languages in the form 'from|to',
//     e.g. 'en|ru'. The list of available languages and their abbreviations
//     can be found in the Translation API description
// resultString - the translation
// result - the error message, if any. An empty result means that
//     the function has been executed successfully
function googleTranslate(source : string; langpair : string; var resultString : string) : string;
var
  url, s, status : String;
  utfs : UTF8String;
  http : TidHttp;

begin
  result := '';

  http := TidHttp.Create;

  try
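    // Convert to UTF-8 first, then percent-encode the resulting bytes
    utfs := UTF8Encode(source);
    utfs := URLEncode(utfs);
    // langpair is encoded as well, so that '|' is sent as %7C
    url := 'http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=' + String(utfs) + '&langpair=' + String(URLEncode(UTF8Encode(langpair)));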

    http.Request.Referer := 'http://oursite.com';
    http.Request.UserAgent := 'Our Application';
    s := http.Get(url);

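    // The fixed offsets below assume the response layout this code was
    // written against, e.g. ..."responseStatus": 200} with one space
    // after the colon. A different layout will break the extraction.
    status := Copy(s, pos('"responseStatus":', s)+18, length(s));
    status := Copy(status, 1, pos('}', status)-1);

    if (status = '200') then begin //status is OK
      // +18 skips the key, the colon and the opening quote of the value
      s := Copy(s, pos('"translatedText":', s)+18, length(s));
      resultString := Copy(s, 1, pos('"}, "responseDetails"', s)-1);
    end
    else begin //an error occurred
      // +20 skips the key, the colon, a space and the opening quote
      s := Copy(s, pos('"responseDetails":', s)+20, length(s));
      resultString := '';
      result := Copy(s, 1, pos('", "responseStatus"', s)-1);
    end;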

  finally
    http.Free;
  end;
end;
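
The fixed offsets in the function above break as soon as the spacing of the response changes. If you do not want to pull in a JSON library, a slightly more tolerant lookup can be sketched as follows. ExtractJsonValue is a hypothetical helper of my own, not part of Indy or uJson, and the sample response text (including its spacing) is an assumption. It does not decode JSON escapes such as \" or \uXXXX, and it can be fooled if the key text also appears inside a value, so treat it as a sketch and not a parser.

```pascal
program JsonValueDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: returns the raw text of the value that follows
// "Key": in a JSON string, or '' if the key is not found.
function ExtractJsonValue(const Json, Key: string): string;
var
  p, start: Integer;
begin
  Result := '';
  p := Pos('"' + Key + '"', Json);
  if p = 0 then
    Exit;
  p := p + Length(Key) + 2; // first character after the closing quote of the key
  // skip the colon and any spaces around it
  while (p <= Length(Json)) and ((Json[p] = ' ') or (Json[p] = ':')) do
    Inc(p);
  if p > Length(Json) then
    Exit;
  if Json[p] = '"' then
  begin
    // quoted string: read up to the next unescaped quote
    Inc(p);
    start := p;
    while (p <= Length(Json)) and (Json[p] <> '"') do
    begin
      if Json[p] = '\' then
        Inc(p); // step over the escaped character
      Inc(p);
    end;
  end
  else
  begin
    // bare token (number or null): read up to the next delimiter
    start := p;
    while (p <= Length(Json)) and (Json[p] <> ',') and
          (Json[p] <> '}') and (Json[p] <> ' ') do
      Inc(p);
  end;
  Result := Copy(Json, start, p - start);
end;

const
  // Sample response text; the exact spacing is an assumption
  Sample = '{"responseData": {"translatedText":"Hi"}, ' +
           '"responseDetails": null, "responseStatus": 200}';

begin
  Writeln(ExtractJsonValue(Sample, 'responseStatus')); // prints 200
  Writeln(ExtractJsonValue(Sample, 'translatedText')); // prints Hi
end.
```

With a helper like this, googleTranslate could compare ExtractJsonValue(s, 'responseStatus') with '200' and then read either translatedText or responseDetails, without counting characters by hand.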

Finally, we can try to translate something. Say we want to translate “Hello world!” from English to Ukrainian.
var
  res, strValue : string;
…
res := googleTranslate('Hello world!', 'en|uk', strValue);
if (res = '') then //translation is OK
  ShowMessage('Translation: ' + strValue)
else //error
  ShowMessage('Error: ' + res);