For every classifier file, the KADMOS developer kit contains a related Alc file. This is a text file that lists
all labels of the classifier. The labels are divided into groups that correspond to the ALC constants from kadmos.h
(ALC_LCALPHA, ALC_UCALPHA, ALC_NUMERIC, ALC_SPECIAL, ALC_GESTURE). For example, the file ttfus.alc contains the
following lines under [numeric]:
[numeric]
font=0123456789
size=0:1
font+=0w121|32 02
size+=0:1 0.3:1
This means that under ALC_NUMERIC the classifier contains the labels '0', '1', '2', ... '9', '0w', '12', '1|', '32',
and '02'. All of these characters are expected to have the same height within their text line as an uppercase 'A'
(size=0.0:1.0); only the character 02 is smaller. It is the character o, which is often used as a zero. If the second
byte of the labels normally is not a blank (' '), an additional line is filled in. For example, the file hand.alc
contains an additional line ext=_ under [numeric]. Below the lines font and size (resp. font+ and size+), additional
lines xminmax and yminmax (resp. xminmax+ and yminmax+) can sometimes be found. They specify the minimum and maximum
extent of the characters in the x- and y-direction, measured in pixels. For example:
[numeric]
font=0123456789
size=0.0:1.0
xminmax=10:20
yminmax=15:30
The characters listed under font must be at least 10 pixels and at most 20 pixels wide (xminmax).
Their height must be at least 15 pixels and at most 30 pixels (yminmax). If one of these
conditions is not satisfied, the character is rejected. The Alc file also shows which labels
are treated as characters of equal shape. These character classes can be found under [equivalence]
in the line moma=....
Among others, the Alc file of the classifier ttfus.rec contains the following moma equivalences:
, ' - _ . '2· * *2*4 / '|,|1|I|\ l|| 0 02O o ° 12l2
For moma-equivalent character classes, alternatives are not excluded when the parameter OPTIONS_EXCLUDE
is set under option in a call to re?_do(). Otherwise, only a random (and probably false) result
would be generated.
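The xminmax and yminmax rule described above can be illustrated with a small sketch. It is not part of the KADMOS API: the names PixelLimits, parse_minmax, and is_reject are hypothetical, and the sketch assumes that both limits are inclusive, which is how "at least" and "maximum" read in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelLimits:
    """Pixel limits taken from the xminmax and yminmax lines (hypothetical type)."""
    x_min: int
    x_max: int
    y_min: int
    y_max: int


def parse_minmax(value: str) -> tuple[int, int]:
    """Split a 'min:max' value such as '10:20' into two integers."""
    low, high = value.split(":")
    return int(low), int(high)


def is_reject(width: int, height: int, limits: PixelLimits) -> bool:
    """Return True if the character box violates one of the limits.

    Assumption: both bounds are inclusive.
    """
    width_ok = limits.x_min <= width <= limits.x_max
    height_ok = limits.y_min <= height <= limits.y_max
    return not (width_ok and height_ok)


# Values from the [numeric] example: xminmax=10:20, yminmax=15:30
x_min, x_max = parse_minmax("10:20")
y_min, y_max = parse_minmax("15:30")
limits = PixelLimits(x_min, x_max, y_min, y_max)

print(is_reject(12, 20, limits))  # inside both ranges
print(is_reject(25, 20, limits))  # too wide
print(is_reject(12, 10, limits))  # too low
```

With the example limits, only the first box passes; the other two violate one condition each and would therefore become rejects.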
[comment] comment section of the Alc file
ttfus.alc Generated by RecMaker from ttf.rec and ttfus.al0
Generation Time ##-Feb-200# 11:50
crc=0x7ab8

[general]
representation=CODE_ISO_8859_1

[lcalpha] lower case section
font=acemnorsuvwxz bdfhklt gpqy i j ss
size=0.3:1 0:1 0.3:1.4 0.1:1 0.1:1.4 0:1.1
font+=a2 g2 i| j| l2l|
size+=0.3:1 0.1:1 0.1:1 0.1:1,4 0:1

font= characters of the section with 1 byte label.
size= size (height) of the characters top:bottom, related to top and bottom of an uppercase A as 0:1.
font+= characters of the section with 2 byte label.
size+= see size=

[ucalpha] upper case section
font=ABCDEFGHIJKLMNOPRSTUVWXYZ Q
size=0:1 0:1.1
font+=I|
size+=0:1

font= characters of the section with 1 byte label.
size= size (height) of the characters top:bottom, related to top and bottom of an uppercase A as 0:1.
font+= characters of the section with 2 byte label.
size+= see size=

[numeric] numbers section
font=012345689
size=0:1
font+=0w121|32 02
size+=-0:1 0.3:1

font= characters of the section with 1 byte label.
size= size (height) of the characters top:bottom, related to top and bottom of an uppercase A as 0:1.
font+= characters of the section with 2 byte label.
size+= see size=

[special] special characters section
font=!#%&()/?[\]{|}£¥ "' $§ *+ , -· . : ...
size=0:1 0:0.2 -0.1:1.1 0.3:0.7 0.8:1.2 0.5:0.6 0.7:1 0.4:1
font+=*4e$ '2'| *2 ,|
size+=0:1 0:0.2 0:0.4 0.8:1.2

font= characters of the section with 1 byte label.
size= size (height) of the characters top:bottom, related to top and bottom of an uppercase A as 0:1.
font+= characters of the section with 2 byte label.
size+= see size=

[reject] reject section
labels=#X#x0<0[0]1<1[1]2<2[2]3<3[3]4<4[4]5<5[5]6<6[6]7<7[7]8<8[8]9<9[9]

labels= character forms that shall be rejected

[slant] section of characters with a slant, with 2 byte label
base=64
label=' , F J P d f j p r y / L Q b h k q \
slant=10 20 -10 -5 -20
korr='|,|/ 1|I|\ l||

base= supposed character height in pixels.
label= characters of the section with 2 byte label.
slant= slant of the characters listed under label. Negative numbers mean slant to the left, positive numbers mean slant to the right.
korr= characters for which the slant is used for discrimination.

[width] width description of the characters
base=16 supposed character height 16 pixels
blank=4 width of a standard blank
prop=11 normal character width with proportional spacing
prop3=! " ' '2'|, ,|1|. ; I|i|l|| .
prop4=( ) - [ ] j|{ }
prop5=* *2+ . = °
prop6=/ j l
prop7=1 12< > \ _ i ¢
prop8=? @ I J f l2s ~ §
prop9=# 020w3 5 7 a2c e o r t u z £ ä ö
prop10=$ *40 2 324 6 8 9 E F L a b d h k n v x
prop12=M U V X m y ¥ Ä Ö Ü
prop13=w
prop14=W
equi=9 normal character width with equidistant spacing
equi3=! " ' '2'|( ) , ,|- 1|: ; I|[ ] i|j|l|{ | } ·
equi6=* *2+ . / 1 12< = > \ _ i j l ¢ °
equi12=% & A B C D G H K M N O P Q R S T U V W X Y Z e$g g2m p q w y ¥ Ä Ö Ü ss

prop?= width of the characters in pixels, related to the line height given in base, in the case of proportional spacing.
equi?= width of the characters in pixels, related to the line height given in base, in the case of equidistant spacing.

[equivalence] section of equally shaped and equally named characters
moma=, ' - _ . '2· * *2*4 / '|,|1|I|\ l|| 0 02O o ° 12l2
moma=C c i|j| Ö ö P p S s U u Ü ü V v W w X x Y y
moma=Z z
rename=* * *2*4 , , ,| 0 0 0w02 1 1 121| 3 3 32 I I I| a a a2 g g g2
rename=i i i| j j j| l l l2l| ' ' '2'|

moma= Groups of characters with equal shape. Groups are separated by blanks, but note that the labels themselves can have a blank as second byte.
rename= The first 2 byte label in every block is the label for the group formed by the rest of the block (basic labels). All other labels in the block are labels of characters. For example, the label 9_ is the label of the group of handwritten nines. The basic label 9_ means a character with a bend, the 91 means a character with a stroke. Either group labels or basic labels can be used.

[words] Section of additional characters in words of numbers, lowercase alpha, or uppercase alpha. This is necessary, for instance, to get the result 02 rather than OZ in a string "02.04.99".
lcalpha+=, . / : ; ' '|,|\ |
ucalpha+=, . / : ; ' '|,|\ |
numeric+=, . / : ; ' '|,|\ |

[fontgroup]
machine =[ 24aw|]
latin =[ 24aw|]
ocra =[a]
other_norm=! " # $ % & ' '2'|( ) * *2*4+ , ,|- . .2/ 0 020w1 ...

[fontgroup] This section provides information about handprint, machine print, OCRA, or other. It is used to enlarge the rec_value for handprint characters among machine print, and in similar cases. The blank between the square brackets means that all 2 byte labels with a blank in the second place also describe machine print characters.

[segmentation]
tryless="#%49HKLMNPTUVWYbdhkmnpquw„”
trymore=!'()*,./1:;<>CIJV[\]cijlnrv{|}

This section lists characters that might be a plausible result of a wrong segmentation. For example, a letter m might just as well be recognized as the letters r and n, and vice versa. In these cases the engine therefore always tries an alternative segmentation.
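
The relation between the font=/font+= lines and the size=/size+= lines can be made concrete with a short sketch. It is not taken from the KADMOS developer kit: the function names are hypothetical, and the sketch assumes that each blank-separated group of labels is paired with the size value at the same position, which matches the [numeric] example where only 02 gets the smaller size 0.3:1. It also assumes one byte per character (as in CODE_ISO_8859_1) and does not handle 2 byte labels whose second byte is a blank, because those cannot be told apart from group separators by splitting on blanks.

```python
def split_labels(group: str, label_bytes: int) -> list[str]:
    """Cut one group of labels into labels of label_bytes characters each."""
    if len(group) % label_bytes != 0:
        raise ValueError(f"group {group!r} is not a multiple of {label_bytes} bytes")
    return [group[i:i + label_bytes] for i in range(0, len(group), label_bytes)]


def parse_size(value: str) -> tuple[float, float]:
    """Parse a 'top:bottom' value, relative to an uppercase A as 0:1."""
    top, bottom = value.split(":")
    return float(top), float(bottom)


def label_sizes(font: str, size: str, label_bytes: int) -> dict[str, tuple[float, float]]:
    """Map every label to the size of its group.

    Assumption: the n-th group in font belongs to the n-th value in size.
    Limitation: labels containing a blank are not supported by this sketch.
    """
    groups = font.split()
    sizes = size.split()
    if len(groups) != len(sizes):
        raise ValueError("expected one size value per group of labels")
    result: dict[str, tuple[float, float]] = {}
    for group, size_value in zip(groups, sizes):
        for label in split_labels(group, label_bytes):
            result[label] = parse_size(size_value)
    return result


# font= and size= hold 1 byte labels, font+= and size+= hold 2 byte labels.
one_byte = label_sizes("0123456789", "0:1", label_bytes=1)
two_byte = label_sizes("0w121|32 02", "0:1 0.3:1", label_bytes=2)

for label, (top, bottom) in two_byte.items():
    print(label, top, bottom)
```

For the 2 byte example, the labels 0w, 12, 1|, and 32 receive the size 0:1, and 02 receives 0.3:1.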