Wikipedia:Language recognition chart

This article describes a variety of simple clues one can use to determine what language a document is written in with high accuracy.

Characters

You can recognize text in a foreign language by looking for characters specific to that language. This can be more accurate than language recognition software that pays little attention to individual characters.
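
As a rough illustration of the character-based approach, the sketch below guesses the dominant script of a text. Python's standard library does not expose the Unicode script property, so the hypothetical helper `dominant_script` uses the first word of each character's Unicode name as a proxy; that shortcut is an assumption of this sketch, not a standard method.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Return the most frequent script label among the letters of text.

    The label is the first word of the Unicode character name
    (e.g. 'GREEK', 'CYRILLIC', 'LATIN'), used here as a rough proxy.
    Returns None when the text contains no letters.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

print(dominant_script("Ελληνικά"))      # GREEK
print(dominant_script("Русский язык"))  # CYRILLIC
```

The result only narrows the choice down to a script; telling apart the languages that share a script still needs the finer clues listed on this page.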

ABCDEFGHIJKLMNOPQRSTUVWXYZ (Latin alphabet)
- and no other - English language, Zulu language, Japanese language in Romaji (see below), Indonesian language, Hawaiian language, Swahili language, Afrikaans language
- ÆØÅæøå - Danish language, Norwegian language
- ÅÄÖåäö - Swedish language
- ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö - Icelandic language
- ÄÖäö - Finnish language
- ÄÖÕŠŽäöõšž - Estonian language
- àéëï - Dutch language
- ĉĈĝĜĥĤĵĴŝŜŭŬ - Esperanto
- àâçéèêîïôœùû - French language
- ÄÖÜäöüß - German language
- àèìòù - Italian language
- ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü Brazilian and k, w and y not in native words) - Portuguese language
- áéíñÑóúü ¡¿ - Spanish language
- ÀÇÉÈÍÓÒÚàçéèíóòú· - Catalan language
- ÁÉÍÓÖŐÚÜŰáéíóöőúüű - Hungarian language
- ĂÎÂŞŢăîâşţ - Romanian language
- çÇğıİöÖşŞüÜ - Turkish language
- ą, ć, ę, ł, ń, ó, ś, ź, ż - Polish language
- ČŠŽ
  - and no other - Slovenian language
  - ĆĐ - Bosnian language, Croatian language
  - ÁĎÉĚŇÓŘŤÚŮÝáďéěňóřťúůý - Czech language
  - ÁÄĎÉÍĽĹŇÓÔŔŤÚÝáäďéíľĺňóôŕťúý - Slovak language
  - ĀĒĢĪĶĻŅŌŖŪāēģīķļņōŗū - Latvian language
  - ĄĘĖĮŲŪąęėįųū - Lithuanian language
- ả ạ ấ ầ ẩ ẫ ậ ắ ằ ẳ ẵ ặ đ ₫ ẻ ẹ ế ề ể ễ ệ ỉ ĩ ị ỏ ọ ổ ỗ ộ ơ ớ ờ ở ỡ ợ ủ ụ ư ứ ừ ử ữ ự ỷ ỹ ỵ – can only be Vietnamese
БДЖИЛПУЦЧШ (Cyrillic alphabet)
- ЙЩЬЮЯ
  - ҐЄІЇ - Ukrainian language
  - Ъ - Bulgarian language
    - ЁЭЫ - Russian language
      - Ў, І instead of И - Belarusian language
- ЉЊЏ (Vuk Karadžić's reform)
  - ЋЂ - Serbian language
  - ЃЌЅ - Macedonian language
- ЅЋѸѲѠЩЪЬҌЮЯѦѪѮѰѴ - Old Church Slavonic
- In Transnistria, Romanian is written in Cyrillic characters
ΔΘΛΨΩαβγδεζηθικλμνξπρςστυφχψω (Greek Alphabet)
- Greek language
אבגדהוזחטיכלמנסעפצקרשת (Hebrew alphabet)
- and maybe some odd dots and lines - Hebrew language
- Yiddish
- Ladino
الصفحة الرئيسية - Arabic alphabet
- Arabic, Persian, Malay (Jawi), Kurdish, Panjabi, Pashto, Sindhi, Urdu, others
日本語勉強 - East Asian Languages
- and no other - Chinese language
- with あいうえお Hiragana and/or アイウエオ Katakana - Japanese language
- with characters like 위키백과에 - Korean language
- Vietnamese uses Latin alphabet – see above
ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ etc. -- ㄓㄨㄧㄋㄈㄨㄏㄠ (Zhuyin)
- ㄪㄫㄬ -- not Mandarin
Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ - Armenian language
ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ - Georgian alphabet
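
The Latin-alphabet rows of the chart can be used as a simple filter. The sketch below copies five rows from the chart and keeps the languages whose row contains every non-ASCII character found in the text. The names `CHART_ROWS` and `candidate_languages` are hypothetical, and the approach assumes precomposed (NFC) characters and no non-ASCII punctuation other than what the rows list.

```python
import unicodedata

# A few rows copied from the chart above; extend with further rows as needed.
CHART_ROWS = {
    "Swedish": "ÅÄÖåäö",
    "Finnish": "ÄÖäö",
    "German": "ÄÖÜäöüß",
    "Spanish": "áéíñÑóúü¡¿",
    "Hungarian": "ÁÉÍÓÖŐÚÜŰáéíóöőúüű",
}

def candidate_languages(text):
    """List the languages whose chart row covers every non-ASCII character in text."""
    text = unicodedata.normalize("NFC", text)
    special = {ch for ch in text if ord(ch) > 127 and not ch.isspace()}
    if not special:
        return []  # plain ASCII text: these rows give no evidence
    return [lang for lang, row in CHART_ROWS.items() if special <= set(row)]

print(candidate_languages("Smörgåsbord"))  # ['Swedish']
print(candidate_languages("schön"))        # ['Swedish', 'Finnish', 'German', 'Hungarian']
```

A short text often leaves several candidates, as the second call shows, so this filter works best together with the word and pattern clues.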

You can also recognise languages (especially those written in the Latin alphabet) by looking for common words and letter patterns, for example:

Latin alphabet (possibly extended)

Romance languages

Lots of Latin roots.

English

words: an, in, on, the, that

French

words: de, la, le, du, des, il, et;
words ending in -x, especially -aux or -eux;
many apostrophised contractions, e.g. words beginning with l' or d'
accented letters: à â ç è é ê î ô û, rarely ë ï, but never á í ì ó ò ú, and ù only in the word où

Spanish

characters: ¿ ¡ (inverted question and exclamation marks), ñ
word endings: -o, -a, -ción, -miento, -dad
angle quotation marks: « » (though "curly-Q" quotation marks are also used)

Italian

Almost every word ends in a vowel. Exceptions include non, il, per, con.
Grave accent (e.g., on à) almost always occurs in the last letter of words.
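
The vowel-ending clue can be measured directly. The sketch below computes the share of words that end in a vowel; the helper name `vowel_final_ratio` and the exact vowel set are assumptions made for illustration, and no particular threshold is implied.

```python
import re

# Plain and accented vowels that can end an Italian word (assumed set).
VOWELS = set("aeiouàèéìòù")

def vowel_final_ratio(text):
    """Return the fraction of words in text whose last letter is a vowel."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return sum(w[-1] in VOWELS for w in words) / len(words)

print(vowel_final_ratio("La casa è molto bella"))  # 1.0
print(vowel_final_ratio("The quick brown fox"))    # 0.25
```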

Catalan

character combination "l·l"
word endings: -o, -a, -es, -ció, -tat

Romanian

characters: ă â î ş ţ
words: şi, de, la, a, ai, ale, alor, cu
word endings: -a, -ă, -u, -ul, -ţie (or -ţiune), -ment, -tate
Note that Romanian is sometimes written online with no diacritics, making it harder to identify

Portuguese

Common one-letter words: a, à, e, é, o
Common two-letter words: ao, as, às, da, de, do, os, um
Common three-letter words: aos, das, dos, ele, ela, não, por, que, uma, uns
Common endings: -ção, -ções, -dade
Most singular words end in vowels. Other singular words end in l, m, r, z
Plural words end in s

Walloon

Characters: å, é, è, ê, î, ô, û
Common digraphs and trigraphs: ai, ae, én, -jh-, tch, oe, -nn-, -nnm-, xh, ou
Common one-letter words: a, å, e, i, t', l', s', k'
Common two-letter words: al, ås, li, el, vs, ki, si, pô, pa, po, ni, èn, dj'
Common three-letter words: dji, nos, vos, les, ses, nén, rén, bén, pol, tel, mel
Common endings: -aedje, -mint, -xhmint, -ès, -ea, -ou, -owe, -yî, -åcion
Apostrophes are followed by a space (preferably a non-breaking one), e.g. l'_ome instead of l'ome.

Germanic languages

Dutch

letter sequences "ij", "aa";
words: het, op, een, voor (and compounds of voor).

German

umlauts (ä, ö, ü), eszett (ß)
common words: der, die, das, er, sie, es, ist, und, oder, aber
common endings: -en, -er, -ern, -st
long compound words
many capitalized words in the middle of sentences
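
The common-word lists given above for English, French, Dutch and German can be combined into a small scorer. This is a minimal sketch: the names `COMMON_WORDS`, `score_by_common_words` and `guess_language` are hypothetical, and counting raw hits without any weighting is a simplifying assumption.

```python
import re

# Word lists taken from the English, French, Dutch and German entries above.
COMMON_WORDS = {
    "English": {"an", "in", "on", "the", "that"},
    "French": {"de", "la", "le", "du", "des", "il", "et"},
    "Dutch": {"het", "op", "een", "voor"},
    "German": {"der", "die", "das", "er", "sie", "es", "ist", "und", "oder", "aber"},
}

def score_by_common_words(text):
    """Count, per language, how many words of text appear in its list."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {lang: sum(w in vocab for w in words)
            for lang, vocab in COMMON_WORDS.items()}

def guess_language(text):
    """Return the language with the most hits, or None if nothing matched."""
    scores = score_by_common_words(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_language("Der Hund ist groß und das Haus ist klein"))  # German
print(guess_language("Het boek ligt op een tafel voor het raam"))  # Dutch
```

Short lists like these overlap between languages (for example "de" and "la" also occur in Spanish and Esperanto), so a tie or a narrow win should be treated as inconclusive.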

Slavic languages

Polish

unusual consonant clusters "rz", "szcz", "prz", "trz";
words "i", "w";
word "się".

Czech

Visual abundance of letters "ž,š,ů,ě,ř";
words "je", "v".

Japanese in Romaji

words: "desu", "masu", "aru", "suru", esp. at end of sentences;
letters: nearly 50% vowels (a e i o u);
letters: no consonants, except "n" and "h", at end of words
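
The vowel-share clue is easy to check numerically. The sketch below computes the fraction of letters that are vowels; `vowel_fraction` is a hypothetical helper, and the sample sentence is only an illustration.

```python
def vowel_fraction(text):
    """Return the fraction of ASCII letters in text that are a, e, i, o or u."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        return 0.0
    return sum(ch in "aeiou" for ch in letters) / len(letters)

print(vowel_fraction("watashi wa gakusei desu"))  # 0.5
```

A value near 0.5 is only a hint, since other languages listed on this page (Hawaiian, for instance) are also vowel-rich; the word clues "desu" and "masu" are more specific.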

Hungarian

letters Ő, Ű, ő and ű unique to Hungarian
words "a", "az", "ez", "egy", "és", "van"

Finnish

diacritics used: only ä and ö, but never õ
common words: sinä, on
common endings: -nen, -ka/-kä, -in
common letter combinations: yö, ei, äi
unusually high degree of letter duplication, both vowels and consonants

Estonian

similar to Finnish, except:
diacritics used: ä, ö, ü, õ, š, ž
words end in consonants more frequently than in Finnish

Latvian

uses diacritics: ā, ē, ī, ū, č, š, ž, ģ, ķ, ļ, ņ
does not have letters: q, w, y, x
very rare doubling of vowels
many words ending with the letter s
a period (.) after year numbers, for instance, 2004. gads
common words: "ir", "bija", "tika"

Vietnamese

Roman characters with many diacritical marks on vowels. See above.
Almost all written words are quite short (one syllable).
Words beginning with "ng"

VIQR

The following characters (often in combination) after vowels: ^ ( + ' ` ? ~ .
DD, Dd, or dd
The following character before punctuation: \

VNI

The digits 1-8 after vowels
The digit 9 after a D or d
The following character before numbers: \
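
The two digit rules for VNI translate directly into regular expressions. In the sketch below, `vni_marker_count` is a hypothetical helper; treating y as a vowel and counting a run of digits after one vowel as a single marker are assumptions of this example.

```python
import re

# One or more digits 1-8 directly after a vowel (y included by assumption).
VNI_VOWEL_DIGITS = re.compile(r"[aeiouy][1-8]+", re.IGNORECASE)
# The digit 9 directly after D or d.
VNI_D_NINE = re.compile(r"d9", re.IGNORECASE)

def vni_marker_count(text):
    """Count vowel+digit runs and d9 pairs, the two VNI clues listed above."""
    return len(VNI_VOWEL_DIGITS.findall(text)) + len(VNI_D_NINE.findall(text))

print(vni_marker_count("Tie61ng Vie65t"))  # 2
print(vni_marker_count("hello world"))     # 0
```

Ordinary text rarely puts digits directly after vowels inside words, so even a few hits in a short passage are a fairly strong signal.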

Telex

The following characters after vowels: s f r x j
The following vowels, doubled up: a e o
The letter "w" after the following characters: a o u
DD, Dd, or dd

Minnan in Pe̍h-oē-jī

Many hyphenated words.
Roman characters with many diacritical marks on vowels. Unlike Vietnamese each character has at most one such mark.
Unusual combining characters, namely · (middle dot, always after "o") and | (vertical bar). - (macron) is also common.

Turkish

Turkish Alphabet

Lowercase: a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z

Uppercase: A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z

Common Words

bu -- This

şu -- That

fakat -- But

Misc.

Look for word endings. Tense changes in Turkish verbs are created by adding suffixes to the end of the verb. Plurals are formed by adding -lar or -ler.
- Common Tense Changes: mış muş sun
- Possessivity: im un ın in dur tır
- Example: Yapmıştır ([He] did it; "Yap" is the base verb meaning "to do", "mış" changes verb to past tense, "tır" adds possessivity, stating who did it.)
- Example: Oyunlar (Games; "Oyun" is a noun meaning "game", "lar" makes it plural.)
- Example: Meyveler (Fruits; "Meyve" is a noun meaning "fruit", "ler" makes it plural.)
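
The two plural examples above follow Turkish vowel harmony, a rule this page does not spell out: nouns whose last vowel is a back vowel (a, ı, o, u) take -lar, and nouns whose last vowel is a front vowel (e, i, ö, ü) take -ler. A minimal sketch, with `pluralize` as a hypothetical helper:

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")

def pluralize(noun):
    """Append -lar or -ler according to the last vowel of noun."""
    # Python's str.lower() does not apply Turkish casing rules,
    # so dotless I and dotted İ are mapped by hand first.
    lowered = noun.replace("I", "ı").replace("İ", "i").lower()
    for ch in reversed(lowered):
        if ch in BACK_VOWELS:
            return noun + "lar"
        if ch in FRONT_VOWELS:
            return noun + "ler"
    raise ValueError("no vowel found in %r" % noun)

print(pluralize("Oyun"))   # Oyunlar
print(pluralize("Meyve"))  # Meyveler
```

For recognition purposes, a text where -lar follows back-vowel stems and -ler follows front-vowel stems consistently is a good sign of Turkish.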

Chinese Mandarin

Pinyin

See Pinyin;
You may notice numbers after words; they represent tones.

Greek

Modern Greek is written in the Greek alphabet in monotonic, polytonic or atonic form, following either Demotic (Triantafilidis) grammar or Katharevousa grammar. Some people write in Greeklish (Greek in Latin script), which may be visual-based, orthographic, phonetic or simply mixed ("messed-up"). The only official forms of the Greek language are the monotonic and the polytonic.

Normal Modern Greek (Greek Monotonic)

words "και", "είναι";
Each multi-syllable word has one accent/tone mark (oxia): ά έ ή ί ό ύ ώ
The only other diacritic ever used is the trema: ϊ/ΐ, ϋ/ΰ, etc.

Ancient or pre-1980s Greek (Greek Polytonic)

This is Katharevousa or some mixed form of Demotiki (Triantafilidis' grammar) and Katharevousa;
You will notice several accents/tones. Examples: ~ ` and oxia (looks like 'ί);
You may also notice these: ΐ, ΰ, ϊ, ϋ, etc.

Greek Atonic

Was common in some Greek media (television);
You will see Greek characters without accents/tones;
words: "και, ειναι, αυτο".
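
The three written forms described above (monotonic, polytonic, atonic) can be told apart by the diacritics present. The sketch below decomposes the text with Unicode normalization and inspects the combining marks; `greek_accent_system` is a hypothetical helper, and it assumes the input is already known to be Greek.

```python
import unicodedata

# Marks allowed in monotonic text: acute (tonos/oxia) and diaeresis (trema).
MONOTONIC_MARKS = {"\u0301", "\u0308"}

def greek_accent_system(text):
    """Classify Greek text as 'atonic', 'monotonic' or 'polytonic'."""
    decomposed = unicodedata.normalize("NFD", text)
    marks = {ch for ch in decomposed if unicodedata.combining(ch)}
    if not marks:
        return "atonic"
    if marks <= MONOTONIC_MARKS:
        return "monotonic"
    return "polytonic"  # grave, circumflex, breathings, iota subscript, ...

print(greek_accent_system("και είναι"))       # monotonic
print(greek_accent_system("και ειναι αυτο"))  # atonic
```

A polytonic text that happens to contain only acute accents would be reported as monotonic, so the result is more reliable on longer passages.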

Greek in Greeklish

Automated conversion software for Greeklish->Greek conversion exists. If you notice a Greeklish text it may be useful for the Greek el.wikipedia (after conversion).
Keep in mind: in Greeklish more than one character may be used for one letter (for example, th for theta).

Orthographic Greeklish

words "kai", "einai".

Phonetic Greeklish

words "ke", "ine";
omega appears as o;
ei, oi appear as i;
ai appears as e.

Visual-based Greeklish

omega (Ω or ω) may appear as W or w;
epsilon (E) may appear as "3";
alpha (A) may appear as "4";
theta (Θ) may appear as "8";
upsilon (Y) may appear as "\|/";
More than one character may be used for one letter.

Messed-up (Mixed) Greeklish

words "kai", "eine";
combines principles of phonetic, visual-based and orthographic Greeklish according to the writer's idiosyncrasy;
The most commonly used form of Greeklish.

Armenian language

Armenian can be recognised by its unique 38-letter alphabet:

Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ

Georgian language

Georgian can be recognised by its unique alphabet.

ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ

Add your language here

If your language has distinguishable properties and is not listed here, then you may include information about it here to help people recognise articles written in that language.

Artificial languages

Esperanto

words: de, la, al, kaj
six unique letters: ĉ Ĉ ĝ Ĝ ĥ Ĥ ĵ Ĵ ŝ Ŝ ŭ Ŭ
words ending in -o, -a, -oj, -aj, -as

Klingon

When written in the Latin alphabet Klingon has the unusual property of a distinction in case; "q" and "Q" are different letters. This causes a large number of words that look quite strange to people who aren't used to it, for example: "yIDoghQo'", "tlhIngan Hol".
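
The case distinction shows up as capital letters inside words, which a regular expression can spot. In the sketch below, `has_internal_capital` is a hypothetical helper; a lowercase letter directly followed by an uppercase one is used as the signal, which is an assumption of this example and also matches camel-case brand names and identifiers.

```python
import re

# A lowercase ASCII letter immediately followed by an uppercase one.
INTERNAL_CAPITAL = re.compile(r"[a-z][A-Z]")

def has_internal_capital(word):
    """Return True if word contains a capital letter right after a lowercase one."""
    return bool(INTERNAL_CAPITAL.search(word))

print(has_internal_capital("yIDoghQo'"))  # True
print(has_internal_capital("Hello"))      # False
```

A single hit means little; a passage where a large share of the words match is a much stronger hint.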

Lojban

starts with "ni'o" or ".i" (or "i");
has many words like "ko'a" "pi'o" etc;
all lowercase;
usually no punctuation except for dots.