

Google Translate on Top?

A 2011 accuracy study on 51 languages found Google Translate performed strongly for European languages, but less so for Asian languages. A reevaluation in 2019 using the same text and metric showed a 34% improvement. Google Translate is now considered one of the best MT engines in terms of reliability and accuracy, particularly for high-resource languages. Intento (2022) ranked Google Translate first above 18 other engines for almost all language pairs. In DeepL’s own evaluation (2020), its MT tool consistently took first place for all language pairs, with Google Translate in second. It should be noted that DeepL outranking Google Translate is generally specific to European languages and to natural-sounding output, since DeepL is typically better at preserving context and handling colloquialisms and slang. While remaining wary of bias, DeepL’s evaluation generally matches other independent analyses.

Questions have been raised about the use of Google Translate in specific settings. The current consensus is not to use it in medical, police, or creative translation contexts. The US Department of Health and Human Services has proposed a new rule outlining when and how MT may be used in healthcare situations. A 2018 study established 92% accuracy for English to Spanish and 81% for English to Chinese. A study on using Google Translate in the ER corroborates findings that its output for medical communications is imperfect, with significant variation in accuracy between languages. A 2019 analysis of translations from English to seven target languages found accuracy ranged from 94% for Spanish to 55% for Armenian. Two court cases have demonstrated why Google Translate should not be employed by police; one judge described it as having “an alarming capacity for miscommunication and error.” In a 2022 study, reviewers preferred human translations over MT translations 85% of the time. Similarly, Slator’s 2022 SaaS Localization Report found that UI-related content where “context is critical” is usually handled by professionals.

Some researchers deem a text accurate if over 50% of the meanings are precise; others are stricter, and texts are ‘failed’ if any ambiguity is present. Many automated evaluation metrics have been used for measuring MT output quality. While not perfect, the most commonly used is Bilingual Evaluation Understudy (BLEU). Others include NIST, Word Error Rate, and METEOR. Facebook patented its own alternative in 2019, and Meta proposed XSTS (a cross-lingual variant of Semantic Textual Similarity) in 2022.
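
To make two of these metrics concrete, here is a minimal sketch of scoring a single machine translation against a human reference: BLEU via the sacrebleu library, and Word Error Rate computed from scratch as a word-level edit distance. The example sentences are invented, and sacrebleu is simply one common implementation choice, not the tooling used by any of the studies above.

```python
# Minimal sketch: scoring one MT hypothesis against a human reference.
# BLEU uses the sacrebleu library (pip install sacrebleu); WER is the
# word-level Levenshtein distance divided by the reference length.
# The sentences below are invented examples, not data from any study.
import sacrebleu

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit distance between word sequences / ref length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

hypothesis = "the cat sat down on the mat"
reference = "the cat is sitting on the mat"

# sacrebleu expects a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu([hypothesis], [[reference]])
print(f"BLEU: {bleu.score:.1f}")                  # 0-100; higher is better
print(f"WER:  {wer(reference, hypothesis):.2f}")  # 0.0 is a perfect match
```

Note that BLEU is meaningful mainly at the corpus level; single-sentence scores like the one above are noisy, which is part of the motivation for the alternative metrics mentioned earlier.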
