Friday 5 March 2010

Translation Tools

Translations are a main ingredient of cross-language information retrieval (CLIR) applications. As I am currently dealing with a multi-language setup, I needed to look at ways to translate text documents.

Google's translation service seems to serve research projects well; at least it was mentioned repeatedly during the CLEF 2009 workshop in Corfu. An alternative is Microsoft's translation web service. Both can be accessed via a RESTful web service API.

With a few lines of code I managed to include both interfaces in a Java project. I started a batch translation of documents and can summarize a few observations about the two:

  • Speed: Google seems to respond slightly quicker, while Microsoft's translation takes longer. I did not track the response times precisely, but there is definitely a difference.
  • Text length: Google translates text up to a length of 5000 characters. Everything beyond that results in an error message. Microsoft also restricts the length of the text. I looked briefly at the API reference, but did not find a description of where the limit is. However, when trying to translate longer texts I got error messages.
  • Response format: Google delivers the translations in JSON format. Microsoft returns plain text. For both I am not entirely sure whether there are options for configuring the result format.
  • Registration: Microsoft requires an application ID with each request. The ID can be generated online. Google encourages the use of an application key, but does not require it.
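
Given the length limits, longer documents have to be cut into pieces before they are sent to either service. A minimal sketch of how this could be done in Java is shown here. The class name TextChunker and the method split are made up for illustration, and the sketch assumes that the limit is counted in Java chars, which may not match how the services count characters.

```java
import java.util.ArrayList;
import java.util.List;

class TextChunker {

    /**
     * Splits text into pieces of at most maxLen chars.
     * Cuts after whitespace where possible so that words stay intact;
     * falls back to a hard cut if a window contains no whitespace.
     */
    static List<String> split(String text, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxLen, text.length());
            if (end < text.length()) {
                // Walk backwards to the last whitespace inside the window.
                int cut = end;
                while (cut > start && !Character.isWhitespace(text.charAt(cut - 1))) {
                    cut--;
                }
                if (cut > start) {
                    end = cut; // cut right after the whitespace
                }
                // Otherwise: no whitespace found, keep the hard cut at maxLen.
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }
}
```

Calling TextChunker.split(document, 5000) would give pieces that stay within the limit observed for Google. Concatenating the pieces gives back the original text. Translating sentences that were cut in the middle may of course hurt quality, so splitting at sentence boundaries would be a sensible refinement.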

Summary: So far I have neither compared the quality of the translations nor looked into all the possibilities of the APIs. However, Google's faster response and more clearly stated limits on text length seem to favour this solution.
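
To turn the impression about speed into numbers, the response times could be measured along these lines. This is only a sketch: ResponseTimer and meanMillis are hypothetical names, and the Runnable stands in for whatever code actually sends one translation request.

```java
class ResponseTimer {

    /** Runs the given call `runs` times and returns the mean wall-clock time in milliseconds. */
    static double meanMillis(Runnable call, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            call.run(); // e.g. one translation request
            totalNanos += System.nanoTime() - t0;
        }
        return totalNanos / 1_000_000.0 / runs;
    }
}
```

Running this with the same set of texts against both services would give comparable averages, although network conditions will add noise to any single measurement.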

Links:

Google Language API
Microsoft Translate API
