Sean Crist's Homepage

Language/Region codes: a list of likely combinations



Short summary

Suppose you have a field containing language-region codes, such as en-us or ja-jp, and you want to validate this field. Here is what you need to know:

While there are ISO standards for language codes and for region codes, no standard defines which language-region combinations are valid. Any combination of a valid language code and a valid region code is legal. Combinations such as mn-ie (Mongolian as spoken in Ireland) or cs-jm (Czech as spoken in Jamaica) are unlikely, but they are just as legal as en-us or ja-jp.
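To make the distinction concrete, here is a minimal Python sketch of a purely structural check. It assumes the lowercase, hyphenated two-letter form used on this page; the name is_well_formed is hypothetical. It tests only the shape of the code and does not confirm that either part is an assigned ISO code, so it accepts unlikely combinations such as mn-ie.

```python
import re

# Two lowercase ASCII letters, a hyphen, two lowercase ASCII letters.
# This checks shape only; it does not look either part up in an ISO list.
_SHAPE = re.compile(r"[a-z]{2}-[a-z]{2}")


def is_well_formed(code: str) -> bool:
    """Return True if code has the shape of a language-region pair."""
    return _SHAPE.fullmatch(code) is not None
```

A check like this catches stray values such as free text or empty strings, but it cannot tell en-us apart from mn-ie.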

However, you might want to check your field against a reasonable list of likely combinations. This page provides a free list of this type, carefully compiled by a professional linguist who works in natural language processing. In an industry context, this list will probably suffice for most practical validation purposes.
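As a rough illustration, checking a field against the list can be as simple as a set lookup. The sketch below is in Python and uses a small excerpt of the list; the names LIKELY_CODES and is_likely are hypothetical, and the normalization of case and underscores is an assumption about what the input data might look like.

```python
# A small excerpt of the list on this page; a real check would load every row.
LIKELY_CODES = {"en-us", "en-gb", "ja-jp", "de-de", "fr-fr", "ar-sa"}


def is_likely(code: str) -> bool:
    """Return True if code appears in the set of likely combinations."""
    # Assumption: inputs may vary in case or use an underscore separator
    # (for example en_US), so normalize before the lookup.
    normalized = code.strip().lower().replace("_", "-")
    return normalized in LIKELY_CODES
```

With this excerpt, is_likely("en-US") and is_likely("ja_JP") are true, while is_likely("mn-ie") is false.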



Details

This list grew out of a project at work in which I had to draw together linguistic data from many different sources. I observed that the language/region codes in such data are not always correct. I needed an automated means to detect likely errors and flag them.

For example, one source of data included the code ar-ar. The language code ar means Arabic, but the region code ar means Argentina. I am not aware of any significant community of Arabic speakers in Argentina, and feel safe in considering this to be an error. Whoever created the data may have meant "Arabic as spoken in (Saudi) Arabia", or perhaps followed the model of codes such as de-de or fr-fr where the language and region codes happen to be the same.
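A case like ar-ar could be flagged automatically along the following lines. This is only a sketch in Python, not the tooling I actually used: the function names are hypothetical, the set is a small excerpt of the list, and offering likely codes with the same language part as candidate corrections is an added convenience for illustration.

```python
# A small excerpt of the list on this page, for illustration only.
LIKELY_CODES = {"ar-sa", "ar-eg", "de-de", "fr-fr", "es-ar"}


def suggest(code, likely=LIKELY_CODES):
    """Return the likely codes that share the language part of code."""
    language = code.split("-", 1)[0]
    return sorted(c for c in likely if c.split("-", 1)[0] == language)


def flag_unlikely(codes, likely=LIKELY_CODES):
    """Map each code that is not in likely to its candidate corrections."""
    return {c: suggest(c, likely) for c in codes if c not in likely}


flagged = flag_unlikely(["de-de", "ar-ar", "fr-fr"])
# flagged == {"ar-ar": ["ar-eg", "ar-sa"]}
```

A flagged code is not necessarily wrong; it is a candidate for human review.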

I looked around for a suitable list to use for validation, and was somewhat surprised not to find one. I could see from questions on sites such as Stack Overflow that others had need for such a resource. So I decided to put together such a list myself, not just for my own immediate purposes but to solve the problem generally. My aim is to include most combinations which have some likelihood of appearing in a major international commercial software product.

There can obviously be no definitive list of "likely" combinations. I think that the list below is a well-informed one, but it necessarily involved some individual judgment.


How the list was prepared

Here is how I created the list:

There are some ISO 639-1 codes which don't appear in the list below. For example, ISO 639-1 has codes for Latin, Esperanto, Manx, Fulah, and Nauru. I didn't include these languages in the list, for the simple reason that they didn't appear in any of the vendor sources that I consulted listing languages supported in actual software products. The point is not to cover every language, but rather to cover languages likely to be encountered in the context of commercial software.

The ISO codes are up-to-date as of May 2018. Be aware that the ISO does occasionally make revisions to the country codes, reflecting changing political situations.

To the best of my knowledge, the information in the list is accurate as of May 2018. However, I make no guarantees regarding the list. I also don’t guarantee that I will keep the list continuously updated to reflect changes in the ISO standards. Reports of mistakes are of course welcome.


Intellectual property considerations

Under U.S. copyright law, facts cannot be copyrighted. A collection of facts can sometimes be copyrighted if the author exercised substantial creativity in the selection or arrangement of facts. In the unlikely event that I created any copyrightable interest in the list, I hereby assign the list to the public domain.

Also, I confirmed that my employer is making no intellectual property claim regarding the list under the terms of my employment contract. So, you can use the list as you please.




The list

The same list is also available as a tab-separated text file.
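A tab-separated copy of the list can be read with Python's standard csv module. The sketch below assumes three columns (Code, Language, Region) with a header row, which matches the table on this page but is an assumption about the file itself; load_codes is a hypothetical name, and the sample rows stand in for the real file.

```python
import csv
import io


def load_codes(lines):
    """Read tab-separated rows (Code, Language, Region) into a dict."""
    table = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip the header row and any line without exactly three fields.
        if len(row) != 3 or row[0] == "Code":
            continue
        code, language, region = (field.strip() for field in row)
        table[code] = (language, region)
    return table


# Stand-in for the real file: a header plus two rows from the list.
sample = io.StringIO(
    "Code\tLanguage\tRegion\n"
    "el-gr\tGreek, Modern\tGreece\n"
    "sm-as\tSamoan\tAmerican Samoa\n"
)
codes = load_codes(sample)
```

Splitting on tabs keeps multi-word names such as "Greek, Modern" and "American Samoa" intact. To read a saved copy, pass an open file instead of the sample, for example open("codes.tsv", newline="", encoding="utf-8"), where the file name is hypothetical.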

| Code | Language | Region |
|---|---|---|
| af-za | Afrikaans | South Africa |
| am-et | Amharic | Ethiopia |
| ar-ae | Arabic | United Arab Emirates |
| ar-bh | Arabic | Bahrain |
| ar-dz | Arabic | Algeria |
| ar-eg | Arabic | Egypt |
| ar-iq | Arabic | Iraq |
| ar-jo | Arabic | Jordan |
| ar-kw | Arabic | Kuwait |
| ar-lb | Arabic | Lebanon |
| ar-ly | Arabic | Libya |
| ar-ma | Arabic | Morocco |
| ar-om | Arabic | Oman |
| ar-qa | Arabic | Qatar |
| ar-sa | Arabic | Saudi Arabia |
| ar-sy | Arabic | Syria |
| ar-tn | Arabic | Tunisia |
| ar-ye | Arabic | Yemen |
| as-in | Assamese | India |
| az-az | Azerbaijani | Azerbaijan |
| ba-ru | Bashkir | Russia |
| be-by | Belarusian | Belarus |
| bg-bg | Bulgarian | Bulgaria |
| bn-bd | Bengali | Bangladesh |
| bn-in | Bengali | India |
| bo-cn | Tibetan | China |
| br-fr | Breton | France |
| bs-ba | Bosnian | Bosnia and Herzegovina |
| ca-es | Catalan | Spain |
| co-fr | Corsican | France |
| cs-cz | Czech | Czechia |
| cy-gb | Welsh | United Kingdom |
| da-dk | Danish | Denmark |
| de-at | German | Austria |
| de-ch | German | Switzerland |
| de-de | German | Germany |
| de-li | German | Liechtenstein |
| de-lu | German | Luxembourg |
| dv-mv | Divehi | Maldives |
| el-gr | Greek, Modern | Greece |
| en-au | English | Australia |
| en-bz | English | Belize |
| en-ca | English | Canada |
| en-gb | English | United Kingdom |
| en-ie | English | Republic of Ireland |
| en-in | English | India |
| en-jm | English | Jamaica |
| en-my | English | Malaysia |
| en-nz | English | New Zealand |
| en-ph | English | Philippines |
| en-sg | English | Singapore |
| en-tt | English | Trinidad and Tobago |
| en-us | English | United States of America |
| en-za | English | South Africa |
| en-zw | English | Zimbabwe |
| es-ar | Spanish | Argentina |
| es-bo | Spanish | Bolivia |
| es-cl | Spanish | Chile |
| es-co | Spanish | Colombia |
| es-cr | Spanish | Costa Rica |
| es-do | Spanish | Dominican Republic |
| es-ec | Spanish | Ecuador |
| es-es | Spanish | Spain |
| es-gt | Spanish | Guatemala |
| es-hn | Spanish | Honduras |
| es-mx | Spanish | Mexico |
| es-ni | Spanish | Nicaragua |
| es-pa | Spanish | Panama |
| es-pe | Spanish | Peru |
| es-pr | Spanish | Puerto Rico |
| es-py | Spanish | Paraguay |
| es-sv | Spanish | El Salvador |
| es-us | Spanish | United States of America |
| es-uy | Spanish | Uruguay |
| es-ve | Spanish | Venezuela |
| et-ee | Estonian | Estonia |
| eu-es | Basque | Spain |
| fa-ir | Persian | Iran |
| fi-fi | Finnish | Finland |
| fo-fo | Faroese | Faroe Islands |
| fr-be | French | Belgium |
| fr-ca | French | Canada |
| fr-ch | French | Switzerland |
| fr-fr | French | France |
| fr-lu | French | Luxembourg |
| fr-mc | French | Monaco |
| fy-nl | Western Frisian | Netherlands |
| ga-ie | Irish | Republic of Ireland |
| gd-gb | Gaelic | United Kingdom |
| gl-es | Galician | Spain |
| gu-in | Gujarati | India |
| ha-ng | Hausa | Nigeria |
| he-il | Hebrew | Israel |
| hi-in | Hindi | India |
| hr-ba | Croatian | Bosnia and Herzegovina |
| hr-hr | Croatian | Croatia |
| ht-ht | Haitian | Haiti |
| hu-hu | Hungarian | Hungary |
| hy-am | Armenian | Armenia |
| id-id | Indonesian | Indonesia |
| ig-ng | Igbo | Nigeria |
| ii-cn | Sichuan Yi | China |
| is-is | Icelandic | Iceland |
| it-ch | Italian | Switzerland |
| it-it | Italian | Italy |
| iu-ca | Inuktitut | Canada |
| ja-jp | Japanese | Japan |
| jv-id | Javanese | Indonesia |
| ka-ge | Georgian | Georgia |
| kk-kz | Kazakh | Kazakhstan |
| kl-gl | Kalaallisut | Greenland |
| km-kh | Central Khmer | Cambodia |
| kn-in | Kannada | India |
| ko-kr | Korean | South Korea |
| ky-kg | Kirghiz | Kyrgyzstan |
| ky-kz | Kirghiz | Kazakhstan |
| lb-lu | Luxembourgish | Luxembourg |
| lo-la | Lao | Laos |
| lt-lt | Lithuanian | Lithuania |
| lv-lv | Latvian | Latvia |
| mg-mg | Malagasy | Madagascar |
| mi-nz | Maori | New Zealand |
| mk-mk | Macedonian | Republic of Macedonia |
| ml-in | Malayalam | India |
| mn-cn | Mongolian | China |
| mn-mn | Mongolian | Mongolia |
| mr-in | Marathi | India |
| ms-bn | Malay | Brunei |
| ms-my | Malay | Malaysia |
| mt-mt | Maltese | Malta |
| my-mm | Burmese | Myanmar |
| nb-no | Bokmål, Norwegian | Norway |
| ne-np | Nepali | Nepal |
| nl-be | Dutch | Belgium |
| nl-nl | Dutch | Netherlands |
| nn-no | Norwegian Nynorsk | Norway |
| no-no | Norwegian | Norway |
| ny-mw | Chichewa | Malawi |
| oc-fr | Occitan | France |
| or-in | Oriya | India |
| pa-in | Panjabi | India |
| pl-pl | Polish | Poland |
| ps-af | Pushto | Afghanistan |
| pt-br | Portuguese | Brazil |
| pt-pt | Portuguese | Portugal |
| qu-bo | Quechua | Bolivia |
| qu-ec | Quechua | Ecuador |
| qu-pe | Quechua | Peru |
| rm-ch | Romansh | Switzerland |
| ro-ro | Romanian | Romania |
| ru-ru | Russian | Russia |
| rw-rw | Kinyarwanda | Rwanda |
| sa-in | Sanskrit | India |
| sd-in | Sindhi | India |
| sd-pk | Sindhi | Pakistan |
| se-fi | Northern Sami | Finland |
| se-no | Northern Sami | Norway |
| se-se | Northern Sami | Sweden |
| si-lk | Sinhala | Sri Lanka |
| sk-sk | Slovak | Slovakia |
| sl-si | Slovenian | Slovenia |
| sm-as | Samoan | American Samoa |
| sm-ws | Samoan | Samoa |
| sn-zw | Shona | Zimbabwe |
| so-so | Somali | Somalia |
| sq-al | Albanian | Albania |
| sr-ba | Serbian | Bosnia and Herzegovina |
| sr-me | Serbian | Montenegro |
| sr-rs | Serbian | Serbia |
| su-id | Sundanese | Indonesia |
| sv-fi | Swedish | Finland |
| sv-se | Swedish | Sweden |
| sw-ke | Swahili | Kenya |
| ta-in | Tamil | India |
| te-in | Telugu | India |
| tg-tj | Tajik | Tajikistan |
| th-th | Thai | Thailand |
| ti-er | Tigrinya | Eritrea |
| ti-et | Tigrinya | Ethiopia |
| tk-tm | Turkmen | Turkmenistan |
| tl-ph | Tagalog | Philippines |
| tn-za | Tswana | South Africa |
| tr-tr | Turkish | Turkey |
| ts-za | Tsonga | South Africa |
| tt-ru | Tatar | Russia |
| ug-cn | Uighur | China |
| uk-ua | Ukrainian | Ukraine |
| ur-pk | Urdu | Pakistan |
| uz-uz | Uzbek | Uzbekistan |
| vi-vn | Vietnamese | Viet Nam |
| wo-sn | Wolof | Senegal |
| xh-za | Xhosa | South Africa |
| yo-ng | Yoruba | Nigeria |
| zh-cn | Chinese | China |
| zh-hk | Chinese | Hong Kong |
| zh-mo | Chinese | Macau |
| zh-sg | Chinese | Singapore |
| zh-tw | Chinese | Taiwan |
| zu-za | Zulu | South Africa |

Last updated 13 August 2019