Regarding International Text

Today I’d like to talk about handling Japanese text. Not about localization, but about the Strings themselves. I know a lot of indies care about their localizations and it shows. But there are some things regarding Japanese that are often lacking a bit of polish, though not through the fault of the developer. I certainly don’t expect someone to have knowledge of every language.

The main place where there is a noticeable lack of polish is searching. In Japanese, there are traditionally three "alphabets": hiragana, katakana, and kanji. And with the prevalence of English and other Western languages, we can add the Latin alphabet to that list. Unfortunately, all of these overlap, meaning that I can generally write the same thing in each alphabet (though honestly it’s normal for one to be more prevalent). You might be thinking that this is not really a big deal, but when searching it really gets in the way.

To search for the text "Hello World" an English user would probably follow one of two paths:

H -> e -> l -> l -> o -> …
h -> e -> l -> l -> o -> …

I think many people can see one problem with the text matching right away: uppercase and lowercase. Because uppercase and lowercase letters are distinct characters in Unicode, we cannot get a perfect match without normalizing case. Another thing we might expect is that the search would update as the user types, narrowing down results. Japanese text has the same two issues, but they are much more difficult to handle.
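For the English case, the normalization and the search-as-you-type behavior can be sketched in a few lines. This is only a minimal illustration: the names normalizedForSearch and matches, and the sample titles, are made up for this example.

```swift
/// Hypothetical helper: folds case so "Hello" and "hello" compare equal.
func normalizedForSearch(_ text: String) -> String {
    return text.lowercased()
}

/// Hypothetical helper: returns every item that starts with the query,
/// ignoring case. Calling it after each keystroke narrows the results.
func matches(_ items: [String], query: String) -> [String] {
    let key = normalizedForSearch(query)
    return items.filter { normalizedForSearch($0).hasPrefix(key) }
}

let titles = ["Hello World", "hello there", "Help", "Goodbye"]

// The user types "H", then "He", then "Hell".
let afterH = matches(titles, query: "H")       // ["Hello World", "hello there", "Help"]
let afterHe = matches(titles, query: "He")     // ["Hello World", "hello there", "Help"]
let afterHell = matches(titles, query: "Hell") // ["Hello World", "hello there"]

// Without normalization, a lowercase query misses the capitalized titles.
let rawHits = titles.filter { $0.hasPrefix("hell") } // ["hello there"]
```

Both sides of the comparison go through the same normalization, which is the important part; the exact folding used (here a plain lowercased()) is a design choice.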

To search for the Japanese text "分かります" (meaning "I understand") a Japanese user could follow a large number of paths. The most likely is something like:

わ -> か -> り -> ま -> す -> <Convert Key to 分かります>

In this case, the user has entered the string in hiragana, the default output for Japanese keyboards. It is not until they finish the entire string and choose to convert it that the kanji can be inserted (this works the same for hiragana->katakana). Now imagine the user’s search experience. For the English user, the search results update with every keystroke and narrow down as they type. For the Japanese user, the results won’t even contain the desired result until they have finished typing and converted the text!
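To make the problem concrete, here is a small sketch of what a naive prefix search sees at each step of that input sequence. The entries array is invented sample data, and the search is a plain hasPrefix filter rather than any particular app's implementation.

```swift
// Invented sample data: what the app stores and displays.
let entries = ["分かります", "分ける", "若い"]

// What the search field contains after each keystroke, before conversion.
let typedSoFar = ["わ", "わか", "わかり", "わかりま", "わかります"]

// Number of matching entries after each keystroke.
let hitsPerKeystroke = typedSoFar.map { typed in
    entries.filter { $0.hasPrefix(typed) }.count
}

// Only once the hiragana has been converted to kanji does the entry appear.
let hitsAfterConversion = entries.filter { $0.hasPrefix("分かります") }
```

Every count in hitsPerKeystroke is zero, because none of the stored strings begin with hiragana; the single match only shows up in hitsAfterConversion.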

Unfortunately, there is no default String.toHiragana like there is uppercased() in the standard library. But hope is not lost; CFStringTransform(_:_:_:_:) gets us there using the kCFStringTransformHiraganaKatakana option. It’s certainly not Swifty, but it gets the job done. Now we can normalize our Japanese search data (to a point)!
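As a rough sketch of how that call can be wrapped, assuming an Apple platform where Foundation exposes CFStringTransform: the toHiragana and toKatakana names below are hypothetical helpers written for this example, not standard library API and not necessarily how any existing package does it.

```swift
import Foundation

extension String {
    /// Hypothetical helper: converts hiragana to katakana.
    /// Characters outside hiragana (kanji, Latin) are left as they are.
    func toKatakana() -> String {
        let buffer = NSMutableString(string: self)
        // reverse == false runs the transform in the hiragana -> katakana direction.
        CFStringTransform(buffer, nil, kCFStringTransformHiraganaKatakana, false)
        return String(buffer)
    }

    /// Hypothetical helper: converts katakana to hiragana.
    func toHiragana() -> String {
        let buffer = NSMutableString(string: self)
        // reverse == true runs the same transform in the katakana -> hiragana direction.
        CFStringTransform(buffer, nil, kCFStringTransformHiraganaKatakana, true)
        return String(buffer)
    }
}

let katakana = "わかります".toKatakana() // "ワカリマス"
let hiragana = "ワカリマス".toHiragana() // "わかります"

// Normalizing both sides to hiragana lets a katakana query match hiragana data.
let stored = "わかります"
let query = "ワカリ"
let isMatch = stored.toHiragana().hasPrefix(query.toHiragana()) // true
```

Note that this only bridges hiragana and katakana. Kanji pass through unchanged, so matching "わかります" against "分かります" still needs something extra, such as a stored reading for each entry, which is the "to a point" caveat above.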

(Un?)Fortunately I ran into this issue again recently, so I decided to make some Swifty extensions and bundle them into an SPM repo! Full disclosure, I based a lot of the work off a prior ObjC project that has saved me in the past.

I don’t expect everyone to immediately fix this issue; after all, Japanese is just one of many languages. But next time you work on your localizations, remember that alphabets are not as easy as ABC (literally).


*If you have any questions, comments, or corrections let me know on Twitter