When mobile apps need to ship in multiple languages, the developer often hires a contractor or external service to translate all the strings. The people doing these translations are usually unfamiliar with the technical details of localization, which makes it easy for them to introduce bugs when a string contains a variable. Even if the app only ships in one language, it’s still possible to write bugs by making subtle mistakes. In order to ship a bug-free app, the developer needs some way of ensuring the format strings in every translation are correct even if they don’t speak the language.
At Asana, where we ship in 13 languages, we developed Locheck to automatically verify that every string in our .strings, .stringsdict, and strings.xml files uses consistent arguments and types, and to report errors to our CI pipelines. In this post, I’ll cover some challenges with localization and show how Locheck makes sure we don’t ship with bugs.
Locheck compares the language you develop in to the languages you translate to, and makes sure the argument types match in all of them. It can catch problems such as:
A string appears in one localization but not another
An argument is used in a localization but does not appear in the base localization
An argument has different types in different localizations or different plural variants
The translation has misspelled a named variable
For .strings and strings.xml files, this is relatively simple given a fancy enough regular expression and knowledge of the syntax. Locheck parses a string like "added %d tasks to %3$s" into a list of Swift structs:
(We are very fortunate that iOS and Android use a close enough format string syntax.)
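To make the parsing step concrete, here is a minimal sketch in Python rather than Swift. It is not Locheck's actual implementation: the names FormatArgument and parse_arguments are hypothetical, the regular expression ignores flags, widths, and precisions, and it assumes implicit arguments are numbered in the order they appear.

```python
import re
from dataclasses import dataclass
from typing import List

# Simplified pattern: "%%" (a literal percent sign), or "%" followed by an
# optional explicit position ("3$"), an optional length modifier, and a
# conversion character. Real format strings allow more than this.
SPECIFIER = re.compile(
    r"%(?:%|(?:(\d+)\$)?(?:hh|h|ll|l|q|z|L)?([@diouxXeEfFgGcCsSpaA]))"
)


@dataclass(frozen=True)
class FormatArgument:  # hypothetical name, for illustration only
    position: int      # 1-based index of the argument this specifier consumes
    specifier: str     # conversion character, e.g. "d", "s", or "@"
    is_explicit: bool  # True if the position was written out, as in "%3$s"


def parse_arguments(fmt: str) -> List[FormatArgument]:
    """Return one FormatArgument per specifier found in fmt."""
    arguments = []
    implicit_count = 0
    for match in SPECIFIER.finditer(fmt):
        position, conversion = match.groups()
        if conversion is None:
            continue  # matched "%%", which consumes no argument
        if position is not None:
            arguments.append(FormatArgument(int(position), conversion, True))
        else:
            # Assumption: implicit specifiers count up from 1 in order.
            implicit_count += 1
            arguments.append(FormatArgument(implicit_count, conversion, False))
    return arguments


parsed = parse_arguments("added %d tasks to %3$s")
```

Under these assumptions, parsed holds two entries: an implicit %d at position 1 and an explicit %s at position 3.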
Locheck builds one of these lists for each string, then compares the lists for the same string key across translations, logging a warning or error if they differ. Some mismatches can cause crashes, for example if your German translation uses %s where the base string uses %d.
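The comparison step can be sketched the same way. This is an illustration under stated assumptions, not Locheck's code or its real error messages: argument_types and compare are hypothetical helpers, the strings are made up, and each localization is modeled as a plain dictionary from string key to format string.

```python
import re
from typing import Dict, List

# Same simplified specifier pattern as a real checker might start from:
# "%%", or "%" + optional "N$" position + optional length + conversion.
SPEC = re.compile(
    r"%(?:%|(?:(\d+)\$)?(?:hh|h|ll|l|q|z|L)?([@diouxXeEfFgGcCsSpaA]))"
)


def argument_types(fmt: str) -> Dict[int, str]:
    """Map each 1-based argument position to its conversion character."""
    types = {}
    implicit_count = 0
    for match in SPEC.finditer(fmt):
        position, conversion = match.groups()
        if conversion is None:
            continue  # literal "%%"
        if position is None:
            implicit_count += 1
            position = implicit_count
        types[int(position)] = conversion
    return types


def compare(base: Dict[str, str], translation: Dict[str, str],
            language: str) -> List[str]:
    """Return human-readable problems found in one translation."""
    problems = []
    for key, base_fmt in base.items():
        if key not in translation:
            problems.append(f"{language}: '{key}' is missing")
            continue
        base_types = argument_types(base_fmt)
        for pos, spec in sorted(argument_types(translation[key]).items()):
            if pos not in base_types:
                problems.append(
                    f"{language}: '{key}' uses argument {pos}, "
                    "which the base string does not have")
            elif base_types[pos] != spec:
                problems.append(
                    f"{language}: '{key}' argument {pos} is %{spec} "
                    f"but the base string uses %{base_types[pos]}")
    for key in translation:
        if key not in base:
            problems.append(f"{language}: '{key}' is not in the base language")
    return problems


# Made-up example data: the German string uses %s where the base uses %d.
base = {"tasks_added": "added %d tasks to %s", "greeting": "Hello"}
german = {"tasks_added": "%s Aufgaben zu %s hinzugefügt"}
problems = compare(base, german, "de")
```

Here problems contains one type mismatch for tasks_added and one missing-key report for greeting. A real tool would also distinguish warnings from errors, which this sketch does not attempt.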
.stringsdict files are much more complicated. Here’s a shorthand version of the plural rule I showed earlier:
The %#@tasks@ substring means “recurse into tasks.” You can even nest these rules:
(There is a simpler way to define this rule, but sometimes nesting is really necessary.)
These rules form a grammar, defining a set of possible strings. The rules are traversed before the format string is applied. That means in order to really be sure the arguments are correct, every permutation needs to be checked. Here are all the permutations of the .stringsdict entry example above:
Given the permutations above, look at how the arguments differ in each permutation. Without explicit positions, the second permutation might mistakenly use the value for tasks in front of milestones, and try to use a number for the string argument at the end. If we add explicit positions, these problems disappear:
Locheck knows how to expand these rules and can log intelligent errors to help you find problems.
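As a rough illustration of what expanding the rules involves, the sketch below enumerates every permutation of a simplified rule set. It is an assumption-laden model, not Locheck's algorithm: the rules are a plain dictionary instead of a real .stringsdict plist, the example strings are invented, and expand is a hypothetical helper.

```python
import re
from typing import Dict, List

# Matches a variable reference such as "%#@tasks@".
VARIABLE = re.compile(r"%#@(\w+)@")


def expand(template: str, rules: Dict[str, Dict[str, str]]) -> List[str]:
    """Return every string the template can produce.

    rules maps a variable name to its plural variants, e.g.
    {"tasks": {"one": "%d task", "other": "%d tasks"}}.
    Variants may themselves contain "%#@name@" references, so nested
    rules are handled by the recursion. Cyclic rules are not detected.
    """
    match = VARIABLE.search(template)
    if match is None:
        return [template]  # nothing left to substitute
    results = []
    for variant in rules[match.group(1)].values():
        substituted = template[:match.start()] + variant + template[match.end():]
        results.extend(expand(substituted, rules))
    return results


# Invented example rules with two independent plural variables.
rules = {
    "tasks": {"one": "%d task", "other": "%d tasks"},
    "milestones": {"one": "%d milestone", "other": "%d milestones"},
}
permutations = expand("Added %#@tasks@ and %#@milestones@ to %s", rules)
```

With two variables of two variants each, permutations has four entries. Each one can then be parsed and checked like an ordinary format string, which is why a mistake in a single variant is enough to cause a problem.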
Imagine we’re making a task list app with an activity feed.
On Android, there is built-in support for a plurals element in strings.xml for this:
On iOS, we would add an entry to our Localizable.stringsdict file:
Then in our code, we’d access the string:
And we’d get back whichever variant matched the value of numTasks we put in.
Or would we? No, we would not!
If we pass a value of 1 for numTasks, the app will actually crash, because after the system substitutes our string value, we’re really doing this:
This kind of mistake is extremely easy to make if you’re not used to thinking about these details, for example if your job is to translate text between different languages rather than write code all day, or if you’re translating to a language like Japanese where the order is often different. Even if you have a developer review every string, it can be very tricky to spot these issues. And as code and teams scale together, tricky-to-spot bugs become guaranteed-to-ship-to-production bugs.
The right thing to do is to add explicit positions to non-consecutive arguments. Instead of writing %s for our third argument, we should write %3$s, which makes it always use the third argument.
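The difference between implicit and explicit positions is easier to see by simulating which argument each specifier consumes. The following is a small Python model of that binding, not the platform formatter itself: bind is a hypothetical helper, the strings are invented, and only %d, %s, and %@ are recognized.

```python
import re
from typing import List, Sequence, Tuple

# "%" + optional explicit position ("2$") + one of a few conversions.
SPEC = re.compile(r"%(?:(\d+)\$)?([@ds])")


def bind(fmt: str, args: Sequence[object]) -> List[Tuple[str, object]]:
    """Pair each specifier in fmt with the argument it would consume."""
    bound = []
    implicit_index = 0
    for match in SPEC.finditer(fmt):
        position, conversion = match.groups()
        if position is not None:
            index = int(position) - 1  # explicit positions are 1-based
        else:
            index = implicit_index     # implicit: take the next argument
            implicit_index += 1
        bound.append(("%" + conversion, args[index]))
    return bound


args = (1, "Launch")  # (numTasks, project name) in this made-up example

# Plural variant that spells out the number and drops %d:
implicit = bind("A task was added to %s", args)
# Same variant with an explicit position on the string argument:
explicit = bind("A task was added to %2$s", args)
```

In this model, implicit pairs %s with the integer 1, which is the kind of mismatch that crashes a C-style formatter, while explicit pairs %s with "Launch" as intended.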
Best practice would be to use explicit positions 100% of the time, but it can be prohibitively time-consuming to retroactively add explicit positions if your source of truth is an online service like Transifex, which is true for us at Asana. And you might still get errors if the people doing the translations aren’t perfect at understanding format strings.
Locheck will catch this type of problem automatically, so it’s safe to use implicit positions. There might still be translation errors where two strings are incorrectly swapped and their format specifiers still match, but at least the app won’t crash.
You can install Locheck using Mint or Make:
Locheck emits Xcode-style errors to stderr, as well as a human-readable summary to stdout after all files are examined. It works well as an Xcode Run Script build phase, continuous integration step, or precommit script. Here’s some example output from our demo files:
While we’ve run Locheck on our own code and a few open source apps, it’s still early. If you do decide to try it out, please leave feedback as a GitHub issue. Enjoy your new localization-bug-free life!