Setting Locale

http://perlgeek.de/en/article/set-up-a-clean-utf8-environment

Choosing an encoding

In the end the used character encoding doesn’t matter much, as long as it’s a Unicode encoding, i.e. one which can be used to encode all Unicode characters.

UTF-8 is usually a good choice because it efficiently encodes ASCII data too, and the character data I typically deal with still has a high percentage of ASCII chars. It is also used in many places, and thus one can often avoid conversions.
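As a quick illustration of the efficiency point, the following shell sketch counts how many bytes the same ASCII string takes in UTF-8, UTF-16 and UTF-32. It assumes iconv and wc are available (both are on a standard Debian system) and that the snippet itself is saved as UTF-8; the sample strings are arbitrary.

```shell
# Byte counts for the same text in different encodings.
# Assumes iconv and wc are installed and this snippet is saved as UTF-8.
ascii='Hello, world'
printf '%s' "$ascii" | wc -c                                # 12 bytes in UTF-8
printf '%s' "$ascii" | iconv -f UTF-8 -t UTF-16LE | wc -c   # 24 bytes in UTF-16
printf '%s' "$ascii" | iconv -f UTF-8 -t UTF-32LE | wc -c   # 48 bytes in UTF-32
# Non-ASCII characters cost more: ü and ß take 2 bytes each in UTF-8.
printf '%s' 'Grüße' | wc -c                                 # 7 bytes for 5 characters
```

For mostly-ASCII text, UTF-8 stays at one byte per character, while the fixed-width encodings double or quadruple the size.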

Whatever you do, choose one encoding and stick to it across your whole system. On Linux that means text files, file names, locales and all text-based applications (mutt, slrn, vim, irssi, …).

For the rest of this article I assume UTF-8, but it should work very similarly for other character encodings.

Locales: installing

Check that you have the locales package installed. On Debian you can do that with:

$ dpkg -l locales
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name           Version        Description
+++-==============-==============-============================================
ii  locales        2.7-18         GNU C Library: National Language (locale) da

The last line is the important one: if it starts with ii, the package is installed and everything is fine. If not, install it. As root, type:

$ aptitude install locales

If you get a dialog asking for details, read on to the next section.
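If you want to script this check rather than read the dpkg -l table by eye, dpkg-query can print just the status field of a package, which reads "install ok installed" for an installed one. The function name locales_installed below is a hypothetical helper for illustration, not part of any package.

```shell
# Succeeds if the locales package is installed. dpkg-query -W prints the
# package's status field; "ok installed" marks a fully installed package.
locales_installed() {
    dpkg-query -W -f='${Status}\n' locales 2>/dev/null | grep -q 'ok installed'
}

if locales_installed; then
    echo "locales is installed"
else
    echo "locales is missing; as root, run: aptitude install locales"
fi
```

Matching on "ok installed" rather than the whole line also accepts packages on hold, whose status begins with "hold" instead of "install".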

Locales: generation

Make sure that a UTF-8 locale is generated on your system. As root, type:

$ dpkg-reconfigure locales

You’ll see a long list of locales, which you can navigate with the up/down arrow keys. Pressing the space bar toggles the locale under the cursor. Make sure to select at least one UTF-8 locale; en_US.UTF-8, for example, is usually very well supported. (The first part of the locale name stands for the language, the second for the country or dialect, and the third for the character encoding.)

In the next step you can make one of the previously selected locales the system default. Picking a UTF-8 locale as the default is usually a good idea, though it might change how some programs behave, so it shouldn’t be done on servers hosting sensitive applications.
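The dialog is the usual route, but roughly the same result can be reached without it, which helps when setting up machines from a script. On Debian, the available locales are listed (mostly commented out) in /etc/locale.gen, locale-gen compiles the uncommented ones, and update-locale writes the system default to /etc/default/locale. The sketch below assumes that file layout and GNU sed; enable_locale is a hypothetical helper name, and the root-only commands are shown as comments rather than run.

```shell
# enable_locale FILE ENTRY (hypothetical helper): uncomment ENTRY, for example
# "en_US.UTF-8 UTF-8", in a locale.gen-style FILE. Assumes GNU sed for -i.
enable_locale() {
    local file=$1 entry=$2 pattern
    # Escape the dots so they match literally in the sed regex.
    pattern=$(printf '%s' "$entry" | sed 's/\./\\./g')
    # Turn "# en_US.UTF-8 UTF-8" into "en_US.UTF-8 UTF-8"; other lines are untouched.
    sed -i "s/^# *\($pattern\)\$/\1/" "$file"
}

# Usage, as root on Debian (not run here):
#   enable_locale /etc/locale.gen 'en_US.UTF-8 UTF-8'
#   locale-gen                        # compile the enabled locales
#   update-locale LANG=en_US.UTF-8    # system default, stored in /etc/default/locale
```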

Locales: configuration

If you chose a default locale in the previous step, log out completely and then log in again. In any case you can configure your per-user environment with environment variables.

The following variables can affect programs: LANG, LANGUAGE, LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, LC_PAPER, LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, LC_IDENTIFICATION.
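These variables are not all equal: LC_ALL, when set to a non-empty value, overrides every LC_* variable, each LC_* variable overrides LANG for its own category, and LANGUAGE is only consulted by GNU gettext when choosing message translations. The following shell sketch makes the precedence visible with locale charmap, which prints the character encoding of the effective LC_CTYPE setting. It assumes glibc’s locale program and uses the C locale because that one always exists.

```shell
# The C locale always exists, so both lines print ANSI_X3.4-1968 (glibc's name
# for ASCII), whether or not en_US.UTF-8 has been generated yet.
# An empty LC_ALL counts as unset, so LC_CTYPE decides over LANG here.
LC_ALL= LANG=en_US.UTF-8 LC_CTYPE=C locale charmap
# A non-empty LC_ALL overrides LC_CTYPE.
LC_ALL=C LC_CTYPE=en_US.UTF-8 locale charmap
```

This is also why a stray LC_ALL in the environment can silently override carefully chosen LC_* settings.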

Most of the time it works to set all of these to the same value. Instead of setting each LC_ variable separately, you can set LC_ALL. If you use bash as your shell, you can put these lines in your ~/.bashrc and ~/.profile files:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
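If en_US.UTF-8 has not actually been generated, exporting it makes bash print setlocale warnings and programs fall back to the C locale. A slightly more defensive variant of these lines, sketched here under the assumption of glibc’s locale -a output format, only exports the settings when the locale exists. has_locale is a hypothetical helper name.

```shell
# has_locale NAME (hypothetical helper): succeed if NAME appears in `locale -a`.
# glibc lists UTF-8 locales as e.g. "en_US.utf8", so compare
# case-insensitively and ignore dashes.
has_locale() {
    local wanted
    wanted=$(printf '%s' "$1" | tr -d '-' | tr '[:upper:]' '[:lower:]')
    locale -a 2>/dev/null | tr -d '-' | tr '[:upper:]' '[:lower:]' | grep -qxF "$wanted"
}

# Only export the UTF-8 settings if the locale has been generated.
if has_locale en_US.UTF-8; then
    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
    export LANGUAGE=en_US.UTF-8
fi
```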

To make these changes active in the current shell, source ~/.bashrc:

$ source ~/.bashrc

All newly started interactive bash processes will respect these settings.

A Warning about Non-Interactive Processes

There are certain processes that don’t get those environment variables, typically because they are started by some sort of daemon in the background.

Those include processes started from cron, at, init scripts, or spawned indirectly from init scripts, for example by a web server.

You might need to take additional steps to ensure that those programs get the proper environment variables.
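One way to handle this, sketched here rather than prescribed, is a tiny wrapper script that sets the variables itself and then runs the real command. The name with-utf8 and the paths mentioned below are hypothetical.

```shell
#!/bin/sh
# with-utf8 (hypothetical name): run a command with the UTF-8 locale set, for jobs
# started by cron, at or init scripts that never read ~/.bashrc or ~/.profile.
export LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LANGUAGE=en_US.UTF-8
# Replace this shell with the wrapped command; with no arguments this is a no-op.
exec "$@"
```

Saved as, say, /usr/local/bin/with-utf8 and made executable, a crontab line could then read 0 3 * * * /usr/local/bin/with-utf8 /usr/local/bin/nightly-job, where the job path is only a placeholder. For cron specifically, Debian’s cron also accepts variable assignments such as LANG=en_US.UTF-8 on their own lines at the top of a crontab.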

Locales: check

Run the locale program. The output should be similar to this:

LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

If not, you’ve made a mistake in one of the previous steps and need to recheck what you did.
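For scripts, a more targeted check than reading the whole locale output is locale charmap, which prints the character encoding of the current LC_CTYPE setting: UTF-8 if everything is set up, ANSI_X3.4-1968 for the plain C locale on glibc. is_utf8_locale is a hypothetical helper name.

```shell
# Succeed if the current LC_CTYPE setting uses the UTF-8 character map.
is_utf8_locale() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

if is_utf8_locale; then
    echo "OK: LC_CTYPE uses UTF-8"
else
    echo "Not UTF-8 (charmap: $(locale charmap 2>/dev/null)); recheck the previous steps" >&2
fi
```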
