Classifier Labels

Like every character recognition system, KADMOS recognizes character shapes at the lowest level. The character shape q, for example, can be a digit 9 or a lowercase q, depending on its position in the text line. Likewise, the character shape g can be a digit 9 or a lowercase g, again depending on its position in the text line. Conversely, q and g are an example of two different character shapes that can both represent the digit 9.

For every recognized character shape, KADMOS returns two Unicode characters, called the basic label. Some character shapes have several basic labels at the same time. For example, the character shape q has the basic labels q1 and 91, and the character shape g has the basic labels 9_ and g1 (all basic labels that share the same shape are listed in the Alc file under [equivalence] moma=...).

Applications are usually not interested in character shapes but in the meaning of the characters. For this purpose, KADMOS introduces the concept of group labels (all group labels are listed in the Alc file under [equivalence] rename=...).

For instance, the basic labels 91 and 9_ together form the group label 9_. The KADMOS integrator can work either with basic labels or with group labels; the default is to work with group labels (see also OPTIONS_BASICLABELS).
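The relationship between shapes, basic labels and group labels can be illustrated with a small Python sketch built only from the q/g examples above. The dictionaries, the function names and the `use_basic_labels` switch are hypothetical: the switch merely mimics the choice described for OPTIONS_BASICLABELS, the code does not parse a real Alc file, and it is not the KADMOS API.

```python
# Sketch only: data copied from the q/g examples in the text,
# not loaded from a real Alc file; all names are hypothetical.

# Character shape -> basic labels (cf. [equivalence] moma=... in the Alc file)
SHAPE_TO_BASIC = {
    "q": ["q1", "91"],  # shape q: lowercase q or digit 9
    "g": ["9_", "g1"],  # shape g: digit 9 or lowercase g
}

# Basic label -> group label (cf. [equivalence] rename=...).
# Only the digit-9 example from the text is included; in this sketch
# every other basic label is assumed to map to itself.
BASIC_TO_GROUP = {
    "91": "9_",
    "9_": "9_",
}


def to_group(basic: str) -> str:
    """Map a basic label to its group label (identity if unlisted)."""
    return BASIC_TO_GROUP.get(basic, basic)


def labels_for_shape(shape: str, use_basic_labels: bool = False) -> list[str]:
    """Return candidate labels for a recognized shape.

    use_basic_labels is a hypothetical switch mimicking OPTIONS_BASICLABELS;
    group labels are returned by default.
    """
    basic = SHAPE_TO_BASIC.get(shape, [])
    if use_basic_labels:
        return list(basic)
    groups: list[str] = []
    for label in basic:
        group = to_group(label)
        if group not in groups:  # keep first-seen order, drop duplicates
            groups.append(group)
    return groups


def shapes_for_group(group: str) -> list[str]:
    """Return every shape with at least one basic label in the given group."""
    return [shape for shape, basic in SHAPE_TO_BASIC.items()
            if any(to_group(b) == group for b in basic)]


print(labels_for_shape("q"))                         # ['q1', '9_']
print(labels_for_shape("q", use_basic_labels=True))  # ['q1', '91']
print(shapes_for_group("9_"))                        # ['q', 'g']
```

With group labels, both shapes report 9_ for the digit, which is what an application usually needs; switching to basic labels exposes 91 versus 9_ and thus which shape was actually seen.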

Labels

KADMOS returns recognition results as two Unicode characters. The first character (first label) is usually the recognized character.
The second character (second label) gives a more detailed identification; by default it is a blank for machine print (typescript) and _ for handwriting.
In addition, there are special cases, particularly among the basic labels, that are marked by a special second character.
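A minimal Python sketch of how an application might split such a result and recognize the two default second labels, assuming the result is available as a plain two-character string. `Label`, `split_label` and `classify_second` are hypothetical helpers, not KADMOS functions; only the blank and _ defaults come from the text.

```python
from typing import NamedTuple


class Label(NamedTuple):
    first: str   # usually the recognized character itself
    second: str  # more detailed identification


def split_label(result: str) -> Label:
    """Split a two-character result into first and second label.

    Assumes a plain two-character string; how a real KADMOS result
    is delivered is not covered by this sketch.
    """
    if len(result) != 2:
        raise ValueError(f"expected two characters, got {result!r}")
    return Label(result[0], result[1])


def classify_second(label: Label) -> str:
    """Distinguish the two default second labels from special cases."""
    if label.second == " ":
        return "machine print (default)"
    if label.second == "_":
        return "handwriting (default)"
    return "special case"


for result in ["A ", "A_", "91"]:
    label = split_label(result)
    print(repr(result), label.first, classify_second(label))
```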

Overview of the second labels (🖉 = handwriting, ⌨ = machine print):

🖉 = _ 1 3 5 7 9 C S U V W X Y Z ( ; ) * / = ? { } A
⌨ =   2 4 6 8   c s u v w x y z | : , . ' - ! [ ] ^
Ligatures: 🖉 = L, ⌨ = l
Ligatures, Fraktur: p
Ligatures, Sütterlin: P
Greek: 🖉 = G, ⌨ = g
Cyrillic: 🖉 = K, ⌨ = k
Hebrew: 🖉 = H, ⌨ = h
Fraktur: ⌨ = f
Arabic / Persian / Farsi: 🖉 = M, I; ⌨ = m, i
Tamil: 🖉 = Z, ⌨ = z
Thai: 🖉 = T, ⌨ = t
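The overview can be turned into a lookup that reports what a second label may mean. The Python sketch below transcribes the two rows and the script list above; the blank machine-print default is represented as a space. Because some characters occur in more than one list (Z and z are both general second labels and Tamil labels), the hypothetical `describe_second_label` returns every listed meaning; how KADMOS itself resolves such overlaps is not described in this section.

```python
# Transcribed from the overview of second labels; a sketch, not the KADMOS API.
HANDWRITING = set("_13579CSUVWXYZ(;)*/=?{}A")
MACHINE_PRINT = set(" 2468csuvwxyz|:,.'-![]^")  # " " is the blank default

SCRIPTS = {
    "L": "ligature (handwriting)",
    "l": "ligature (machine print)",
    "p": "ligature (Fraktur)",
    "P": "ligature (Sütterlin)",
    "G": "Greek (handwriting)",
    "g": "Greek (machine print)",
    "K": "Cyrillic (handwriting)",
    "k": "Cyrillic (machine print)",
    "H": "Hebrew (handwriting)",
    "h": "Hebrew (machine print)",
    "f": "Fraktur (machine print)",
    "M": "Arabic/Persian (handwriting)",
    "I": "Arabic/Persian (handwriting)",
    "m": "Arabic/Persian (machine print)",
    "i": "Arabic/Persian (machine print)",
    "Z": "Tamil (handwriting)",
    "z": "Tamil (machine print)",
    "T": "Thai (handwriting)",
    "t": "Thai (machine print)",
}


def describe_second_label(second: str) -> list[str]:
    """Return every meaning listed for a second label (empty if unlisted)."""
    meanings = []
    if second in HANDWRITING:
        meanings.append("handwriting")
    if second in MACHINE_PRINT:
        meanings.append("machine print")
    if second in SCRIPTS:
        meanings.append(SCRIPTS[second])
    return meanings


print(describe_second_label("Z"))  # ['handwriting', 'Tamil (handwriting)']
```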

Some second labels are tied directly to a particular appearance of a character or to an accented character. In each entry below, the first character is the handwriting label (🖉) and the second is the corresponding machine-print label (⌨), followed by a colon and the meaning.

(|: Characters written as a single stroke, e.g. capital I or lowercase l without serifs.
;:: Letters with diaeresis, e.g. ä ö Ü.
),: Letters with grave, e.g. À Ò Ù.
*.: Letters with ring or dot above, e.g. Å Ċ; superscript digits, e.g. ² ³, and the degree sign.
/': Letters with acute, e.g. Á É Ó.
=-: Letters with macron, e.g. Ā Ū; the Latin letter eth and the division sign.
?!: Letters with double acute, e.g. Ő Ű.
{[: Letters with hook, e.g. Ơ Ư, as well as diacritical signs, ligatures and syllables.
}]: Diacritical signs, ligatures and syllables.
A^: Letters with circumflex, e.g. Â Ĉ Ĝ.
Cc: Letters with ogonek or cedilla, e.g. Ą Ç Ę.
Ss: Letters with tilde, e.g. Ñ Ũ.
Uu: Letters with breve, e.g. Ă Ğ.
Ww: Letters with a stroke, e.g. Đ đ Ƶ ƶ, and Thai ligatures, e.g. ปั.
Yy: Letters like Ae, Oe.
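Because each entry pairs a handwriting label with a machine-print label, the list can be turned into a two-way lookup, for example to find the machine-print equivalent of a handwriting second label. The Python sketch below does this; `ACCENT_PAIRS` and `counterpart` are hypothetical names, and the meanings are shortened from the list above.

```python
from typing import Optional

# (handwriting label, machine-print label, meaning), transcribed from the list above.
ACCENT_PAIRS = [
    ("(", "|", "character written as a single stroke"),
    (";", ":", "diaeresis"),
    (")", ",", "grave"),
    ("*", ".", "ring or dot above; superscript digits; degree sign"),
    ("/", "'", "acute"),
    ("=", "-", "macron; Latin letter eth; division sign"),
    ("?", "!", "double acute"),
    ("{", "[", "hook; diacritical signs, ligatures and syllables"),
    ("}", "]", "diacritical signs, ligatures and syllables"),
    ("A", "^", "circumflex"),
    ("C", "c", "ogonek or cedilla"),
    ("S", "s", "tilde"),
    ("U", "u", "breve"),
    ("W", "w", "stroke; Thai ligatures"),
    ("Y", "y", "letters like Ae, Oe"),
]

# Lookup tables in both directions; the two key sets do not overlap.
TO_MACHINE = {hw: (mp, meaning) for hw, mp, meaning in ACCENT_PAIRS}
TO_HANDWRITING = {mp: (hw, meaning) for hw, mp, meaning in ACCENT_PAIRS}


def counterpart(second: str) -> Optional[tuple[str, str]]:
    """Return (label in the other writing style, meaning), or None if unlisted."""
    if second in TO_MACHINE:
        return TO_MACHINE[second]
    return TO_HANDWRITING.get(second)


print(counterpart(";"))  # (':', 'diaeresis')
```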

a: Latin / Norm OCR-A
b: Latin / Norm OCR-B
c: Latin / Norm CMC7
d: Latin / Norm Semi
e: Latin / Norm E13B
f: Latin / Norm F7B

Corners and boxes

+ Corner above
- Corner below
d Boxes

A list of all characters and their labels is available on our website.