Sean Crist's Homepage: Language/Region Codes
Suppose you have a field containing language-region codes, such as en-us or ja-jp, and you want to validate this field. You can do the following:
While there are ISO standards for language codes and for region codes, there are no defined standards for language-region combinations. Any combination of language + region is legal. mn-ie (Mongolian as spoken in Ireland) or cs-jm (Czech as spoken in Jamaica) are unlikely, but just as legal as en-us or ja-jp.
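Since any language + region pair is legal, the only check that applies to every code is a check of its shape. The sketch below assumes the lowercase, hyphenated form used on this page (real data may contain variants such as en-US or en_US, which would need normalizing first). The name `split_code` is hypothetical, and the function does not test membership in the ISO code tables.

```python
import re

# Shape check only: two lowercase letters, a hyphen, two lowercase letters.
# Assumption: codes follow the lowercase "ll-rr" form shown on this page.
CODE_SHAPE = re.compile(r"[a-z]{2}-[a-z]{2}")

def split_code(code):
    """Return (language, region) if the code has the expected shape, else None."""
    if not CODE_SHAPE.fullmatch(code):
        return None
    language, region = code.split("-")
    return language, region

# An unlikely combination still passes, because it is well-formed.
print(split_code("mn-ie"))
print(split_code("english"))
```

A full legality check would also look up each half in the ISO 639-1 and ISO 3166-1 tables, which are not reproduced here.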
However, you might want to check your field against a reasonable list of likely combinations. This page provides a free list of this type, carefully compiled by a professional linguist who works in natural language processing. In an industry context, this list will probably suffice for most practical validation purposes.
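A check against the list can be as simple as a set lookup. The following is a minimal sketch: `LIKELY_CODES` holds only a handful of codes copied from the list on this page, and the name `is_likely` and the decision to trim whitespace and lowercase the input are assumptions made for illustration.

```python
# A small excerpt of the list on this page; a real check would load every row.
LIKELY_CODES = {"en-us", "en-gb", "ja-jp", "de-de", "fr-fr", "ar-sa"}

def is_likely(code):
    """Return True if the code, trimmed and lowercased, is on the list."""
    return code.strip().lower() in LIKELY_CODES

for candidate in ["en-us", " JA-JP ", "mn-ie"]:
    status = "ok" if is_likely(candidate) else "flag for review"
    print(repr(candidate), status)
```

A code that fails this check is not necessarily wrong; it is only unusual enough to deserve a second look.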
This list grew out of a project at work in which I had to draw together linguistic data from many different sources. I observed that language/region codes are not always correct. I needed an automated means to detect likely errors and flag them.
For example, one source of data included the code ar-ar. The language code ar means Arabic, but the region code ar means Argentina. I am not aware of any significant community of Arabic speakers in Argentina, and feel safe in considering this to be an error. Whoever created the data may have meant "Arabic as spoken in (Saudi) Arabia", or perhaps followed the model of codes such as de-de or fr-fr where the language and region codes happen to be the same.
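When a code such as ar-ar is flagged, it can help to show the reviewer which codes with the same language are on the list. The sketch below is one possible way to do that, not a description of the tool I actually used; the excerpt in `LIKELY_CODES` and the function name `review` are hypothetical.

```python
from collections import defaultdict

# A small excerpt of the list on this page, for illustration only.
LIKELY_CODES = ["ar-ae", "ar-eg", "ar-sa", "de-de", "es-ar", "fr-fr"]

# Index the listed regions by language code.
regions_by_language = defaultdict(list)
for code in LIKELY_CODES:
    language, region = code.split("-")
    regions_by_language[language].append(region)

def review(code):
    """Return None if the code is listed; otherwise return the listed
    codes that share its language (possibly an empty list)."""
    if code in LIKELY_CODES:
        return None
    language = code.split("-")[0]
    return [f"{language}-{r}" for r in regions_by_language.get(language, [])]

print(review("ar-ar"))  # listed Arabic codes a reviewer might have meant
print(review("es-ar"))  # None: Spanish as spoken in Argentina is on the list
```

The suggestions are only candidates; deciding which region the data's author intended still takes human judgment.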
I looked around for a suitable list to use for validation purposes, but was somewhat surprised that I could not find one. I could see from questions on sites such as Stack Overflow that others had need for such a resource. So, I decided to put together such a list myself—not just for my own immediate purposes, but rather to solve the problem generally. My aim is to include most combinations which have some likelihood of appearing in a major international commercial software product.
There can obviously be no definitive list of "likely" combinations. I think that the list below is a well-informed one, but it necessarily involved some individual judgment.
Here is how I created the list:
There are some ISO 639-1 codes which don't appear in the list below. For example, ISO 639-1 has codes for Latin, Esperanto, Manx, Fulah, and Nauru. I didn't include these languages in the list, for the simple reason that they didn't appear in any of the vendor sources that I consulted listing languages supported in actual software products. The point is not to cover every language, but rather to cover languages likely to be encountered in the context of commercial software.
The ISO codes are up-to-date as of May 2018. Be aware that the ISO does occasionally make revisions to the country codes, reflecting changing political situations.
To the best of my knowledge, the information in the list is accurate as of May 2018. However, I make no guarantees regarding the list. I also don’t guarantee that I will keep the list continuously updated to reflect changes in the ISO standards. Reports of mistakes are of course welcome.
Under U.S. copyright law, facts cannot be copyrighted. A collection of facts can sometimes be copyrighted if the author exercised substantial creativity in the selection or arrangement of facts. In the unlikely event that I created any copyrightable interest in the list, I hereby assign the list to the public domain.
Also, I confirmed that my employer is making no intellectual property claim regarding the list under the terms of my employment contract. So, you can use the list as you please.
Click here for the same list as a tab-separated text file.
Code    Language            Region
af-za   Afrikaans           South Africa
am-et   Amharic             Ethiopia
ar-ae   Arabic              United Arab Emirates
ar-bh   Arabic              Bahrain
ar-dz   Arabic              Algeria
ar-eg   Arabic              Egypt
ar-iq   Arabic              Iraq
ar-jo   Arabic              Jordan
ar-kw   Arabic              Kuwait
ar-lb   Arabic              Lebanon
ar-ly   Arabic              Libya
ar-ma   Arabic              Morocco
ar-om   Arabic              Oman
ar-qa   Arabic              Qatar
ar-sa   Arabic              Saudi Arabia
ar-sy   Arabic              Syria
ar-tn   Arabic              Tunisia
ar-ye   Arabic              Yemen
as-in   Assamese            India
az-az   Azerbaijani         Azerbaijan
ba-ru   Bashkir             Russia
be-by   Belarusian          Belarus
bg-bg   Bulgarian           Bulgaria
bn-bd   Bengali             Bangladesh
bn-in   Bengali             India
bo-cn   Tibetan             China
br-fr   Breton              France
bs-ba   Bosnian             Bosnia and Herzegovina
ca-es   Catalan             Spain
co-fr   Corsican            France
cs-cz   Czech               Czechia
cy-gb   Welsh               United Kingdom
da-dk   Danish              Denmark
de-at   German              Austria
de-ch   German              Switzerland
de-de   German              Germany
de-li   German              Liechtenstein
de-lu   German              Luxembourg
dv-mv   Divehi              Maldives
el-gr   Greek, Modern       Greece
en-au   English             Australia
en-bz   English             Belize
en-ca   English             Canada
en-gb   English             United Kingdom
en-ie   English             Republic of Ireland
en-in   English             India
en-jm   English             Jamaica
en-my   English             Malaysia
en-nz   English             New Zealand
en-ph   English             Philippines
en-sg   English             Singapore
en-tt   English             Trinidad and Tobago
en-us   English             United States of America
en-za   English             South Africa
en-zw   English             Zimbabwe
es-ar   Spanish             Argentina
es-bo   Spanish             Bolivia
es-cl   Spanish             Chile
es-co   Spanish             Colombia
es-cr   Spanish             Costa Rica
es-do   Spanish             Dominican Republic
es-ec   Spanish             Ecuador
es-es   Spanish             Spain
es-gt   Spanish             Guatemala
es-hn   Spanish             Honduras
es-mx   Spanish             Mexico
es-ni   Spanish             Nicaragua
es-pa   Spanish             Panama
es-pe   Spanish             Peru
es-pr   Spanish             Puerto Rico
es-py   Spanish             Paraguay
es-sv   Spanish             El Salvador
es-us   Spanish             United States of America
es-uy   Spanish             Uruguay
es-ve   Spanish             Venezuela
et-ee   Estonian            Estonia
eu-es   Basque              Spain
fa-ir   Persian             Iran
fi-fi   Finnish             Finland
fo-fo   Faroese             Faroe Islands
fr-be   French              Belgium
fr-ca   French              Canada
fr-ch   French              Switzerland
fr-fr   French              France
fr-lu   French              Luxembourg
fr-mc   French              Monaco
fy-nl   Western Frisian     Netherlands
ga-ie   Irish               Republic of Ireland
gd-gb   Gaelic              United Kingdom
gl-es   Galician            Spain
gu-in   Gujarati            India
ha-ng   Hausa               Nigeria
he-il   Hebrew              Israel
hi-in   Hindi               India
hr-ba   Croatian            Bosnia and Herzegovina
hr-hr   Croatian            Croatia
ht-ht   Haitian             Haiti
hu-hu   Hungarian           Hungary
hy-am   Armenian            Armenia
id-id   Indonesian          Indonesia
ig-ng   Igbo                Nigeria
ii-cn   Sichuan Yi          China
is-is   Icelandic           Iceland
it-ch   Italian             Switzerland
it-it   Italian             Italy
iu-ca   Inuktitut           Canada
ja-jp   Japanese            Japan
jv-id   Javanese            Indonesia
ka-ge   Georgian            Georgia
kk-kz   Kazakh              Kazakhstan
kl-gl   Kalaallisut         Greenland
km-kh   Central Khmer       Cambodia
kn-in   Kannada             India
ko-kr   Korean              South Korea
ky-kg   Kirghiz             Kyrgyzstan
ky-kz   Kirghiz             Kazakhstan
lb-lu   Luxembourgish       Luxembourg
lo-la   Lao                 Laos
lt-lt   Lithuanian          Lithuania
lv-lv   Latvian             Latvia
mg-mg   Malagasy            Madagascar
mi-nz   Maori               New Zealand
mk-mk   Macedonian          Republic of Macedonia
ml-in   Malayalam           India
mn-cn   Mongolian           China
mn-mn   Mongolian           Mongolia
mr-in   Marathi             India
ms-bn   Malay               Brunei
ms-my   Malay               Malaysia
mt-mt   Maltese             Malta
my-mm   Burmese             Myanmar
nb-no   Bokmål, Norwegian   Norway
ne-np   Nepali              Nepal
nl-be   Dutch               Belgium
nl-nl   Dutch               Netherlands
nn-no   Norwegian Nynorsk   Norway
no-no   Norwegian           Norway
ny-mw   Chichewa            Malawi
oc-fr   Occitan             France
or-in   Oriya               India
pa-in   Panjabi             India
pl-pl   Polish              Poland
ps-af   Pushto              Afghanistan
pt-br   Portuguese          Brazil
pt-pt   Portuguese          Portugal
qu-bo   Quechua             Bolivia
qu-ec   Quechua             Ecuador
qu-pe   Quechua             Peru
rm-ch   Romansh             Switzerland
ro-ro   Romanian            Romania
ru-ru   Russian             Russia
rw-rw   Kinyarwanda         Rwanda
sa-in   Sanskrit            India
sd-in   Sindhi              India
sd-pk   Sindhi              Pakistan
se-fi   Northern Sami       Finland
se-no   Northern Sami       Norway
se-se   Northern Sami       Sweden
si-lk   Sinhala             Sri Lanka
sk-sk   Slovak              Slovakia
sl-si   Slovenian           Slovenia
sm-as   Samoan              American Samoa
sm-ws   Samoan              Samoa
sn-zw   Shona               Zimbabwe
so-so   Somali              Somalia
sq-al   Albanian            Albania
sr-ba   Serbian             Bosnia and Herzegovina
sr-me   Serbian             Montenegro
sr-rs   Serbian             Serbia
su-id   Sundanese           Indonesia
sv-fi   Swedish             Finland
sv-se   Swedish             Sweden
sw-ke   Swahili             Kenya
ta-in   Tamil               India
te-in   Telugu              India
tg-tj   Tajik               Tajikistan
th-th   Thai                Thailand
ti-er   Tigrinya            Eritrea
ti-et   Tigrinya            Ethiopia
tk-tm   Turkmen             Turkmenistan
tl-ph   Tagalog             Philippines
tn-za   Tswana              South Africa
tr-tr   Turkish             Turkey
ts-za   Tsonga              South Africa
tt-ru   Tatar               Russia
ug-cn   Uighur              China
uk-ua   Ukrainian           Ukraine
ur-pk   Urdu                Pakistan
uz-uz   Uzbek               Uzbekistan
vi-vn   Vietnamese          Viet Nam
wo-sn   Wolof               Senegal
xh-za   Xhosa               South Africa
yo-ng   Yoruba              Nigeria
zh-cn   Chinese             China
zh-hk   Chinese             Hong Kong
zh-mo   Chinese             Macau
zh-sg   Chinese             Singapore
zh-tw   Chinese             Taiwan
zu-za   Zulu                South Africa
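The tab-separated version of the list can be read with a standard CSV parser. The sketch below assumes that the file begins with a header row naming the columns Code, Language, and Region, as in the table above; if the actual file has no header row, the column names would have to be supplied by hand. The function name `load_codes` is hypothetical, and an in-memory sample stands in for the real file.

```python
import csv
import io

def load_codes(handle):
    """Read tab-separated rows into a dict: code -> (language, region).

    Assumption: the first row is a header with the columns
    Code, Language, and Region.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Code"]: (row["Language"], row["Region"]) for row in reader}

# In-memory sample with two rows copied from the list.
sample = io.StringIO(
    "Code\tLanguage\tRegion\n"
    "af-za\tAfrikaans\tSouth Africa\n"
    "el-gr\tGreek, Modern\tGreece\n"
)
codes = load_codes(sample)
print(codes["el-gr"])
```

Splitting on tabs rather than spaces matters here, because several language and region names contain spaces and commas.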
Last updated 13 August 2019