<<

i18n

By Paul Hastings (Bio)
gist

What is Globalization?

The process of making an application ready for global usage is globalization, or G11N (for the 11 letters between the "g" and the "n" in globalization). Globalization consists of two steps: internationalization, or I18N (for the 18 letters between the "i" and "n" in internationalization), and localization or L10N (for the 10 letters between "l" and "n" in localization—if you're sensing a pattern here, yes there is, people working in this field are particularly fond of numeronyms). The atomic units for globalization are locales. Locales are the most important piece of G11N.

Locales

Locales are languages and calendars; date, number, and currency formatting; spelling; writing system direction; etc., that are specific to a geographic region. For instance, the English (color) and date formats (month/day/year) used in Brooklyn are not exactly the same as the English (colour) and date formats (day/month/year) used in Perth.

Since version 7, ColdFusion locales are based on core Java locales, which means the locale information concerning formatting, calendars, etc. are an integral part of the underlying Java platform and vetted by the Unicode Consortium—you do not have to research these yourself. Locale resources are accessed via locale identifiers in the form of language_Country, such as en_US for the English language used in the United States, or fr_CA for the French language as used in Canada. Take note of the case, as in all things related to core Java, it matters.

Internationalization (I18N)

The process of I18N is the first step in making your application globalized. It consists of stripping away any hard-coded text and date, time and number formatting that tie it to a specific locale and replacing these with variables or functions that can dynamically return locale appropriate content or formatting. This allows for the application to be adapted to various locales without major engineering changes. Considering the amount of work this can involve, it is perhaps best done from the outset rather after the fact (though its quite common for applications to be undergo G11N well after development has completed).

Localization (L10N)

Once an application has undergone I18N, the next step is making it ready for use in a given locale. This is done by translating all of the application's text into the various languages that your application will support as well as using appropriate locale-based functions to format dates, numbers, currency, etc. These function are usually prefixed by "LS" such as LSdatetimeFormat. You can find a reference for all ColdFusion functions and tags related to globalization here: http://adobe.ly/UrOesD

Note that during the translation process, locales need to be considered as well. It helps fine tune the content to a specific region.

Simple I18N Example

Lets say you have an original web application dealing with appointments and had something like the following block of code:

You have the following appointments for #dateFormat(now())#:
<ul>
	<cfoutput query="todaysAppointments">
		<li>#appointmentWith#: #timeFormat(appointment)#</li>
	</cfoutput>
</ul>

After I18N that would look like the following:

<cfprocessingdirective pageencoding="UTF-8">
<cfset setLocale( session.locale) />
#appointmentsRB[session.locale].appointmentsText #LsdateFormat(now(),"FULL")#:
<ul>
	<cfoutput query="todaysAppointments">
		<li>#appointmentWith#: #LstimeFormat(appointment,"MEDIUM")#</li>
	</cfoutput>
</ul>

where

  • cfprocessingdirective pageencoding="UTF-8" identifies the character encoding used on this page, UTF-8.
  • session.locale is a session variable holding the user's preferred locale, usually set at login or application initialization.
  • setLocale(session.locale) tells ColdFusion what locale this page will be using.
  • appointmentsRB is a resource bundle (see below) as a ColdFusion structure holding the application's translated text.
  • appointmentsText is the structure value holding the translated text for "You have the following appointments for".

Some Things to Consider when Building a Globalized Application

Locales

Since your entire globalized application flows from a user's locale it is critical that this be captured and stored for use throughout the application. The easiest way of capturing a user's locale is simply to ask for it. For example, if your application requires registration, ask for it then. As a tip, it's usually a good idea to display locale choices in that language: French in French, Thai in Thai, etc.

If asking isn't possible, a stealthier approach is to examine the user's browser-provided data such as http_accept_language which is accessible via the CGI scope. You can also supplement this by examining the user's IP address and looking up their country from one of the many online geolocation services. Note that it's a good practice, however, to also provide a way for a user to manually change their locale.

It's important that the user's locale be persisted and used throughout the application. A session-scoped variable is often the best choice for this. The relevant ColdFusion function for handling locales are:

  • setLocale: Tells ColdFusion to use the supplied locale identifier for the current page.
  • getLocale: Returns the locale ColdFusion is currently using.

Character Encoding/Writing Systems

A character encoding is a map for each character in a language to a numeric code that can be represented in a computer. For a variety of reasons, there are often more than one encoding for a language. Japanese, for example, is represented by Shift-JIS, EUC-JP, and ISO-2022-JP encodings. Since each encoding is basically the same set of numbers, data and application content couldn't be dynamic. For instance, in a web application, each web page had to be encoded in that language's code page--developers therefore had to develop and maintain one page per encoding the application needed to support. Data for each encoded web page had to be stored in its own column in a table or its own table in a database. Prior to the creation of Unicode in 1987, developers always had to climb this encoding Tower of Babel, making globalized applications extremely costly and very limited in scope. Unicode allows for a single, standard encoding scheme for much of the world's languages. Since all modern databases now support it, it's the logical choice for character encoding for globalized applications. In short, just use Unicode.

A language's writing system dictates the way a language is written. There are basically three flavors:

  • Left-to-right (LTR): used in Western languages such as English, French, German, etc.
  • Right-to-left (RTL): used in Middle Eastern languages such as Hebrew and Arabic.
  • Vertical: used in some Asian languages (traditional Japanese uses tategaki , top-to-bottom, right to left format as does traditional Chinese from which it was derived—modern forms of Japanese and Chinese follow a LTR format, some print formats still use a vertical layout).

Writing systems have an impact on an application's layout and display. It's especially important for RTL languages. Not only is the text RTL, but the whole application layout needs to be RTL—visual attention is no longer in the upper-left corner but instead in the upper-right.

Resource Bundles

As mentioned above, one common way to handle localized text at run time is via resource bundles. All static text in the application is replaced by variables that hold that specific text translated into the different languages that need to be supported. A resource bundle can be a simple ColdFusion structure:

<cfscript>
  	appointments={};
    	appointments.en_US.greeting="Hello"; (American English)
    	appointments.fr_FR.greeting="Bonjour"; (French)
    	appointments.de_DE.greeting="Hallo"; (German)
    	appointments.sv_SE.greeting="Hallå" (Swedish)
    	appointments.th_TH.greeting="สวัสดี"; (Thai)
    	appointments.ja_JP.greeting="もしもし"; (Japanese)
</cfscript>

where the various locales in this structure could be accessed at run time by supplying a locale key (en_US, fr_FR, etc.). This approach, however, is very cumbersome to maintain, especially as the application grows in complexity or the number of locales it supports. Another, perhaps better approach, is to use files or a database to hold the translated text, but this requires advanced methodologies beyond the scope of this course and will be addressed in week 2.

Dates, Times, Calendars and Time Zones

Date formats are particularly vexing to many developers. For instance, a date string of 1/2/2012 in en_US locale is January 2, 2012, while in en_AU it's 1 February 2012; that’s quite a difference if you're making an appointment. To ensure that date and time formats are correct for a user's locale, use the following functions:

  • LStimeFormat
  • LSdateFormat

to format dates and times.

Note that it's usually a good idea to format dates and times using one of the regular masks, FULL, LONG, MEDIUM, or SHORT to ensure your application isn't breaking some cultural rule as well as to make sure the dates and times the application displays to the user can also be parsed back to a ColdFusion date-time object via LSParsedatetime function.

ColdFusion dates are based on the Gregorian calendar which will suffice for many locales. While this is an advanced topic, it's important to note that there are several other calendars in common use globally:

  • Islamic calendar
  • Buddhist calendar
  • Chinese calendar
  • Indian calendar
  • Persian calendar

Handling non-Gregorian calendars will be covered in week 2 of this course.

Time zones are another issue to many developers, especially if users are in one time zone while the ColdFusion server is in another. Again, this is an advanced topic but developers should take note of the following points:

  • ColdFusion considers all date-times to be in the server's time zone; ColdFusion doesn't care about your intentions, just the server's time zone.
  • If ColdFusion handles any date-times, these will be converted to the server's time zone.
  • The simplest option for handling time zone issues is not to store date-time objects but instead store epoch offsets such as Java's (milliseconds since 1-jan-1970) .
  • Basic timezone handling is done via the getTimeZoneInfo, which returns a structure holding the server's time zone information.

Numbers/Currencies

Similar to date and time formats, it's important to display numbers and currencies in the locale’s correct format. For example, 123456789.10 in en_US locale would be understandably displayed as 123,456,789 to Americans. The same number would be displayed as 123 456 789 in fr_FR locale and unfamiliar to most Americans. The relevant ColdFusion functions for formatting numbers and currencies are:

  • LSnumberFormat
  • LScurrencyFormat

Note that the masks used in these function will map the dollar sign ($), decimal (.) and comma (,) to the appropriate locale symbols. Also note that the LScurrencyFormat function will not convert between currencies; it's not a function for handling exchange rates.

Databases

As mentioned above, modern databases are all Unicode-capable and shouldn't be an issue when considering which one to deploy. It's only important that any database-specific setup, data types, collations, etc. be followed to ensure your application's database is able to fully use Unicode and perform sorting in a locale specific way.