Recently, I had to build a Java application that reads a series of data and puts “human readable” labels on it. Some of the texts the application has to generate use ordinals, like “This is the _first_ report…”, and they have to be displayed in three different languages. Instead of a solution that does not scale (if/else, switch), I decided to generate the messages from the numeric data using something more flexible. A web search turned up ICU4J (from the International Components for Unicode project) as a common solution, so I decided to give it a go.
The technical documentation is quite complete, but I found the samples and usability of the library a bit lacking, especially in the localization aspect, so I decided to show my results and contribute a bit back.
If you want to convert numbers into their ordinal text counterparts in Java, the first step is to include the ICU4J library in your project, for example as a dependency in your build tool.
Once that is done, there are several ways to display a message with ICU4J, but when you are dealing with ordinals you have to become familiar with the RuleBasedNumberFormat class. A RuleBasedNumberFormat takes a locale and a format specification (duration, numbering, ordinal or spellout) and formats numbers using a named rule. For example, 1 in English as a spelled out ordinal is “first”, as a spelled out cardinal is “one”, and as a duration “with words” is “1 second”. In our case, we want the spelled out ordinal (the “_%spellout-ordinal_” rule), so we can test it like this:
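A minimal sketch of that test, assuming ICU4J is on the classpath. The class and method names (`EnglishOrdinalDemo`, `englishOrdinal`) are made up for this illustration; only `RuleBasedNumberFormat` and its `SPELLOUT` constant come from the library.

```java
import com.ibm.icu.text.RuleBasedNumberFormat;
import java.util.Locale;

class EnglishOrdinalDemo {

    // Spells out a number as an English ordinal using the "%spellout-ordinal" rule.
    // (Hypothetical helper, not part of ICU4J.)
    static String englishOrdinal(long number) {
        RuleBasedNumberFormat formatter =
                new RuleBasedNumberFormat(Locale.ENGLISH, RuleBasedNumberFormat.SPELLOUT);
        return formatter.format(number, "%spellout-ordinal");
    }

    public static void main(String[] args) {
        System.out.println(englishOrdinal(1)); // first
        System.out.println(englishOrdinal(2)); // second
        System.out.println(englishOrdinal(3)); // third
    }
}
```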
So far, so good, but how about doing the same with other languages, like Spanish? Well, if we try, we’ll get an IllegalArgumentException at runtime, because the Spanish locale does not define the same rules as the English one.
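The failure can be reproduced with a small sketch like the one below. The helper `formatOrExplain` is a hypothetical name used only here; it catches the `IllegalArgumentException` that ICU4J throws for an unknown rule name and turns it into a readable message.

```java
import com.ibm.icu.text.RuleBasedNumberFormat;
import java.util.Locale;

class MissingRuleDemo {

    // Formats the number with the named rule, or reports that the locale lacks it.
    // (Hypothetical helper, not part of ICU4J.)
    static String formatOrExplain(Locale locale, String ruleName, long number) {
        RuleBasedNumberFormat formatter =
                new RuleBasedNumberFormat(locale, RuleBasedNumberFormat.SPELLOUT);
        try {
            return formatter.format(number, ruleName);
        } catch (IllegalArgumentException e) {
            return "unsupported rule for " + locale.toLanguageTag() + ": " + ruleName;
        }
    }

    public static void main(String[] args) {
        Locale spanish = Locale.forLanguageTag("es");
        // Works for English, fails for Spanish with the same rule name.
        System.out.println(formatOrExplain(Locale.ENGLISH, "%spellout-ordinal", 1));
        System.out.println(formatOrExplain(spanish, "%spellout-ordinal", 1));
    }
}
```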
So, how do you find out which rules are available for a given locale? Well, you can open the jar files and browse some binary files, as I did first :O, or you can use the API and find them the easy way, with this code.
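One way to do this, sketched here, is `RuleBasedNumberFormat.getRuleSetNames()`, which returns the public rule names of a formatter. The wrapper `publicRuleNames` is a hypothetical helper, and the exact list printed depends on the ICU4J version in use.

```java
import com.ibm.icu.text.RuleBasedNumberFormat;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class RuleListingDemo {

    // Returns the public spellout rule names that ICU4J ships for the locale.
    // (Hypothetical helper, not part of ICU4J.)
    static List<String> publicRuleNames(Locale locale) {
        RuleBasedNumberFormat formatter =
                new RuleBasedNumberFormat(locale, RuleBasedNumberFormat.SPELLOUT);
        return Arrays.asList(formatter.getRuleSetNames());
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.ENGLISH, Locale.forLanguageTag("es")}) {
            System.out.println(locale.toLanguageTag() + ":");
            for (String name : publicRuleNames(locale)) {
                System.out.println("  " + name);
            }
        }
    }
}
```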
After executing this code, we can see the problem: Spanish has grammatical gender, so we cannot use the same rule for masculine and feminine nouns. On top of that, the masculine form differs depending on whether it is used as a noun or as an adjective. In our sample sentence, “This is the _first_ report…”, the Spanish noun for “report” is masculine and the ordinal is used as an adjective, so we’ll need the rule “%spellout-ordinal-masculine-adjective”. Let’s try again:
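A sketch of the second attempt, this time passing the Spanish rule name explicitly. `spanishOrdinal` is a hypothetical helper name; the rule names are the ones mentioned in the text.

```java
import com.ibm.icu.text.RuleBasedNumberFormat;
import java.util.Locale;

class SpanishOrdinalDemo {

    // Spells out a number as a Spanish ordinal using the given rule name.
    // (Hypothetical helper, not part of ICU4J.)
    static String spanishOrdinal(long number, String ruleName) {
        RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(
                Locale.forLanguageTag("es"), RuleBasedNumberFormat.SPELLOUT);
        return formatter.format(number, ruleName);
    }

    public static void main(String[] args) {
        // Adjective form, as in "el primer informe".
        System.out.println(spanishOrdinal(1, "%spellout-ordinal-masculine-adjective")); // primer
        // Feminine form, for comparison.
        System.out.println(spanishOrdinal(1, "%spellout-ordinal-feminine")); // primera
    }
}
```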
Great, that works. But the code is kind of clumsy, and we would have to change the rule depending on the context… there must be an easier way, right? Well, I was about to extend the MessageFormat class to handle this use case when I realized ICU4J already does that, and their class is named… MessageFormat, oh yeah.
In this case, when we create the MessageFormat instance, we have to choose which locale will be used along with it, and then we can provide the rule as part of the patterns for the parameters. For example, we can create two different MessageFormat instances with the messages in English and Spanish, and tell them, in the patterns, to use the appropriate rule in each case. After that, a call to MessageFormat.format, passing the number as a parameter, will return the correct text.
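As a rough sketch of that idea, the rule name can go in the style part of a `{0,spellout,...}` argument of ICU4J's `com.ibm.icu.text.MessageFormat`. The `render` helper and the sample sentences are assumptions made for this example.

```java
import com.ibm.icu.text.MessageFormat;
import java.util.Locale;

class OrdinalMessageDemo {

    // Builds a MessageFormat for the locale and fills argument {0} with the number.
    // (Hypothetical helper, not part of ICU4J.)
    static String render(String pattern, Locale locale, long number) {
        MessageFormat message = new MessageFormat(pattern, locale);
        return message.format(new Object[] {number});
    }

    public static void main(String[] args) {
        String english = "This is the {0,spellout,%spellout-ordinal} report";
        String spanish =
                "Este es el {0,spellout,%spellout-ordinal-masculine-adjective} informe";

        System.out.println(render(english, Locale.ENGLISH, 1));
        // This is the first report
        System.out.println(render(spanish, Locale.forLanguageTag("es"), 1));
        // Este es el primer informe
    }
}
```

The calling code no longer needs to know which rule applies; that knowledge lives in the pattern, next to the translated sentence.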
Let’s test if we can adapt the messages easily. In some contexts, we could translate the English word “report” as the Spanish word “memoria” instead of “informe”. “Memoria” is feminine, so we just have to change the rule specified in the pattern and…
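A short sketch of that change: only the pattern text is different, with the feminine rule and a feminine article. The class name and the sentence are illustrative assumptions.

```java
import com.ibm.icu.text.MessageFormat;
import java.util.Locale;

class FeminineOrdinalMessageDemo {

    // Renders the feminine variant of the sample sentence for the given number.
    // (Hypothetical helper, not part of ICU4J.)
    static String renderMemoria(long number) {
        MessageFormat message = new MessageFormat(
                "Esta es la {0,spellout,%spellout-ordinal-feminine} memoria",
                Locale.forLanguageTag("es"));
        return message.format(new Object[] {number});
    }

    public static void main(String[] args) {
        System.out.println(renderMemoria(1)); // Esta es la primera memoria
    }
}
```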
With that logic in place, it is fairly easy to create files that contain the MessageFormat text for each locale and from that, render the messages correctly.
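One possible shape for this, sketched with `java.util.Properties`: each locale gets its own properties content holding the pattern under a shared key. The key `report.title`, the inline strings standing in for real files, and the class itself are all assumptions made for the example.

```java
import com.ibm.icu.text.MessageFormat;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

class LocalizedReportMessages {

    // Stand-ins for per-locale files such as messages_en.properties (hypothetical names).
    static final String ENGLISH_FILE =
            "report.title=This is the {0,spellout,%spellout-ordinal} report\n";
    static final String SPANISH_FILE =
            "report.title=Este es el {0,spellout,%spellout-ordinal-masculine-adjective} informe\n";

    // Loads the pattern stored under the key and renders it for the locale.
    static String render(String fileContent, Locale locale, String key, long number) {
        Properties patterns = new Properties();
        try {
            patterns.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        MessageFormat message = new MessageFormat(patterns.getProperty(key), locale);
        return message.format(new Object[] {number});
    }

    public static void main(String[] args) {
        System.out.println(render(ENGLISH_FILE, Locale.ENGLISH, "report.title", 2));
        System.out.println(render(SPANISH_FILE, Locale.forLanguageTag("es"), "report.title", 1));
    }
}
```

In a real project the strings would come from files on disk or from a `ResourceBundle`; the rendering step stays the same.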
There are plenty of other things you can do with the library, like creating your own definitions if your locale is not supported by default etc., so keep playing with the library if you are interested.
PS: Photo by Vladislav Klapin on Unsplash