New feature extraction with improved recognition accuracy. It combines the angle-intersect-analysis with the proven 'mesh'-method (former AEG engine). Despite the much higher computational burden it is only 10% to 30% slower than KADMOS 3.5.
Additional fast and small classifiers (Jumbos_s_*.rec, Ttf_s_*.rec, Hand_s_*.rec, ...) for time- or memory-critical recognition tasks (pen computing). Speed and accuracy of these classifiers are comparable to KADMOS 3.5. Further acceleration and classifier reduction are in progress.
Additional labels (Cyrillic, Arabic-Indian numbers). Additional special characters (^ ¡ ¢ ¥ § © ® ¿) for hand and machine print.
The hand print character E_ has been split into three form classes:
- E_ (angular shape)
- E1 (like C with a stroke)
- E7 (like a mirrored 3)
The hand print character b_ has been split into two form classes:
- b_ (straight, simple shape)
- b3 (looped form)
Additional reject labels for critical applications
(#-#<#=#>#?#S#[#\#]#y#~
).
A description of the related character shapes is provided on our website.
KADMOS's own heap allocation method (parameter GENERAL_HEAP_ALLOC) has proven superior. This method is now permanently built in, and the parameter GENERAL_HEAP_ALLOC has been removed.
FunctionAddress()
for Java and C# programmers. Version 4.0e
GetPointer()
for Java and C# programmers. Version 4.0e
re_ArrayToString(), re_StringToArray()
converts data 🗏
re_DisplayErrorText_bas()
returns the error texts stored in re_ErrorText_bas. 🗏
re_GetErrorText_bas()
fills the data structure of errtext. 🗏
re_SetErrorText_bas()
writes back values. 🗏
rec_get_features()
provides access to internal KADMOS features. 🗏
array_type
ARRAY_TYPE_BYTE or ARRAY_TYPE_WCHAR 🗏
imgtype
Specification of image data given in a structure ReImage under data. 🗏
options OPTIONS_...
Working with basic and group labels 🗏 🗏
prep PREP_...
🗏
prep PREP_...
🗏 since version 4.0i
GetRelGraph()
reads data from one structure into another structure. 🗏
GetRelGrid()
reads data from one structure into another structure. 🗏
GetRelGridParm()
reads data from one structure into another structure. 🗏
GetRelResult(), GetRepResult()
reads data from one structure into another structure. 🗏
SetRelResult(), SetRepResult()
copies a data structure into a field of the structure RelResult or RepResult. 🗏
re_cloneimage()
creates a second identical ReImage image. 🗏
re_getpixel()
returns the color of a given pixel. 🗏
re_image2clipboard()
puts an image in ReImage format on the clipboard. 🗏
re_SetErrorText()
writes back values. 🗏
re_writeimage()
writes a ReImage image into a BITMAP file. 🗏
re_writeimagefile()
writes an image into the file given by a file_handle. 🗏
rec_collect_kernel(), rel_collect_kernel()
writes the images submitted for recognition into the file. 🗏
rec_filetitle(), rel_filetitle(), rep_filetitle()
returns the name of the loaded classifier. 🗏
rec_init(), rel_init(), rep_init()
loads classifiers. 🗏
rel_recset()
transfers individual characters from the data structure RecData to the data structure RelData. 🗏
rel_textline(), rep_textline(), repr_textline()
fills a string with the recognition results of one line at a time. 🗏
options OPTIONS_FASTCHECK → OPTIONS_FAST
🗏 🗏 since version 4.0g
All hand print classifiers have been improved by integrating new samples. The hand print number '4_' has been given an additional basic class '41'. This class contains those numbers 4 that are written like a lightning bolt. An analogous extension has been made for the hand print characters 'S_' and 's_' with the new basic classes 'S5' and 's5'.
The code pages in kadmos.h are enumerated differently. Three new code pages have been added:
- CODE_ASCII, containing 7-bit coding only (thus all substitute representations).
- CODE_PAGE_1255 and CODE_ISO_8859_8 in preparation of a Hebrew classifier.
In addition, the special handling of a detected code page 1252 (redirection to ISO-8859-1) has been cancelled.
To prepare multi-byte representations of characters in Unicode, the size of 'rec_char' for result strings has been enlarged from 4 bytes to 8 bytes (REC_CHAR_SIZE). Accordingly, TEXT_FORMAT_KADMOS_2BYTE has been renamed to TEXT_FORMAT_KADMOS_MULTIBYTE.
reparm is now defined as a pointer and allocated accordingly by re?_init(). Its size is returned in the new item 'labels_size'. 'labels' is freed by re?_end().
REC_LSIZE has been deleted. Because parameters can no longer simply be copied (this would overwrite the pointer 'labels'), a related copy function has been provided:
KADMOS_ERROR KADMOS_API re_copyparm (ReParm *source, ReParm *destination);
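The reason a plain structure assignment is no longer safe can be illustrated with a small stand-alone sketch. It does not use kadmos.h: the structure DemoParm and the function demo_copyparm() are hypothetical stand-ins, assumed only for illustration, and say nothing about how re_copyparm() is actually implemented. The point is that a structure which owns a heap buffer needs a deep copy, otherwise both copies share (and later free) the same pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parameter structure that owns a heap buffer.
   This is NOT the real ReParm layout from kadmos.h. */
typedef struct {
    int    general;       /* some plain-value setting */
    char  *labels;        /* heap-allocated label buffer */
    size_t labels_size;   /* size of that buffer in bytes */
} DemoParm;

/* Deep copy: plain members are copied by value, the label buffer is
   duplicated so that source and destination never share one pointer.
   destination->labels must be NULL or a buffer owned by destination.
   Returns 0 on success, -1 if memory could not be allocated. */
int demo_copyparm(const DemoParm *source, DemoParm *destination)
{
    char *copy = NULL;

    assert(source != NULL && destination != NULL);

    if (source->labels != NULL && source->labels_size > 0) {
        copy = malloc(source->labels_size);
        if (copy == NULL)
            return -1;
        memcpy(copy, source->labels, source->labels_size);
    }

    free(destination->labels);      /* release the old buffer, if any */
    *destination = *source;         /* copy the plain members */
    destination->labels = copy;     /* then install the private copy */
    if (copy == NULL)
        destination->labels_size = 0;
    return 0;
}
```

After such a copy, each structure can be released independently; with a plain assignment the second release would free the same buffer twice.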
An improved Java sample has been provided for the KADMOS DLL version.
For specifying labels, the representation can now always be used as well. Since 4.1b.
Together with the program chopper.exe, the programs Alcstrip.exe and Alcfill.exe are now shipped as well. Since 4.1b.
re_copyparm()
copies the parameters of the structure ReParm into a target structure. 🗏
rel_word_value()
supports the connection of dictionaries. 🗏
rel_lineshadow()
Information about the hand / machine classification of the given image. 🗏
TEXT_FORMAT_KADMOS_2BYTE → TEXT_FORMAT_KADMOS_MULTIBYTE
Output of multi-byte identifiers. 🗏 since 4.1a
New classifiers for Fraktur (Old German) are provided. Because ligatures are an important part of Fraktur, a corresponding concept has been developed and implemented. For general machine print two ligatures have been added, "fi" and "fl". The related labels are "<w" (fi) and ">w" (fl). These labels are returned by rec_do(). To convert them into regular text, we have provided a new function code_expand_lig(). The functions re?_textline() call code_expand_lig() internally; in this case no explicit expansion needs to be performed.
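The idea of expanding ligature labels can be sketched without the KADMOS library. The helpers demo_expand_lig() and demo_labels_to_text() below are hypothetical and are not the real code_expand_lig(), whose signature is not given here. The sketch also assumes, for illustration only, that the first character of a non-ligature label is the character itself.

```c
#include <stddef.h>
#include <string.h>

/* Returns the text for a ligature label, or NULL if the label is not one
   of the two ligature labels named above. Hypothetical helper. */
const char *demo_expand_lig(const char *label)
{
    if (strcmp(label, "<w") == 0) return "fi";
    if (strcmp(label, ">w") == 0) return "fl";
    return NULL;
}

/* Builds plain text from an array of two-character labels.
   Assumption of this sketch: for non-ligature labels the first character
   is the character itself. Returns 0 on success, -1 if out is too small. */
int demo_labels_to_text(const char *const *labels, size_t count,
                        char *out, size_t out_size)
{
    size_t used = 0;

    for (size_t i = 0; i < count; i++) {
        const char *lig = demo_expand_lig(labels[i]);
        char single[2] = { labels[i][0], '\0' };
        const char *piece = lig ? lig : single;
        size_t len = strlen(piece);

        if (used + len + 1 > out_size)   /* keep room for the terminator */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    if (out_size == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

Called with the labels "<w", "n_", "e_", the sketch produces the text "fine". When one of the re?_textline() functions is used, this step is not needed, as stated above.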
In addition, all machine- and hand-print classifiers (jumbo*.rec, ttf*.rec, hand*.rec) were extended with the new labels „ and ” (0x84 and 0x94 in code page 1252).
In the classifier mark.rec the label ".d" (unmarked box with digit inside) has been renamed to "@d". The label ".d" now denotes the dot in the font SEMI (classifiers norm*.rec).
New strategies for classifier computations have been developed. All classifiers have now been provided with the corresponding algorithms.
Dictionaries can now be connected. For this purpose a new include file respell.h has been provided, as well as the libraries respell_*.lib. With the KADMOS DLL version, the new dictionary functions are included in kadmos.dll.
Respell is based on the open-source implementation Ispell. This way the full universe of ready-made Ispell dictionaries can be used directly with KADMOS. The same holds true for the Ispell tools to create customized dictionaries (http://www.gnu.org/software/ispell/ispell.html).
As the use of dictionaries can be configured with parameters, extended functions for setting, writing and reading parameters have been provided: rel_config2(), rep_config2(), re_writeparm2() and re_readparm2(). The old functions rel_config(), rep_config(), re_writeparm() and re_readparm() can be used as before.
With the universal dictionary support, the use of trigrams no longer makes sense and has been removed completely. All related items and parameters have been deleted.
The new dictionary functions enable a better selection among the alternative results. Therefore the maximum number of segmentation alternatives could be enlarged from SEG_alt=4 (old) to SEG_alt=8 (new). The two new macros TYPO_4_SEGALTERNATIV and TYPO_8_SEGALTERNATIV ensure compatibility with older KADMOS versions. When TYPO_4_SEGALTERNATIV is set, at most 4 segmentation alternatives are computed. The default value is TYPO_8_SEGALTERNATIV with a maximum of 8 segmentation alternatives.
The data structure RelSpot and all related items have been discontinued.
A new parameter deskew_min controls de-skewing of images into horizontal position before recognition.
The blank detection for numbers and amounts has been improved.
ReSpellParm
🗏
RecData
🗏
RelResult
🗏
RelSpot
code_expand_lig()
provides a string for a ligature identifier 🗏
re_cloneimage()
creates a second identical ReImage image. 🗏
re_imagehandle2image()
fills the data structure given under image. 🗏
re_rotateimage()
generates a second rotated ReImage image. 🗏
respell_do()
Verification or improvement of results by comparison with a dictionary. 🗏
respell_end()
deactivates the loaded dictionary. 🗏
respell_filetitle()
returns the filename of the loaded dictionary. 🗏
respell_init()
loads the dictionary specified under dictionary. 🗏
respell_lookup()
Search for the given word in the activated dictionary. 🗏
re_readparm2()
reads from an INI file. 🗏
re_writeparm2()
writes parameters to an INI file. 🗏
deskew_min
Threshold value from which de-skewing takes place. 🗏
ispell_config
Parameters for the configuration of Ispell can be passed here. 🗏
ispell_maxlen
Limitation of the length of the words to be processed by Ispell. 🗏
PREP_GRAYTOBIN_UNIFORM
Thresholding is not done adaptively, but with a threshold for the whole picture. 🗏
reject_char
Character to be used in the case of a rejection. 🗏 🗏
reject_level
Confidence value up to which a result character is to be output. 🗏 🗏 🗏 🗏 🗏
reject_limit
determines the confidence value from which alternatives are returned internally. 🗏
rel_alternative_maximum
returns the maximum number of word alternatives. 🗏
rel_codepage
specifies the code page for the recognition results. 🗏
rel_graph_in
An input graph passed to respell_do(). 🗏
rel_graph_in_len
Number of elements of rel_graph_in. 🗏
rel_graph_out
The result graph is registered here. 🗏
rel_graph_out_len
Length (number of nodes) of the result graph returned by respell_do(). 🗏
rel_graph_out_maxlen
Maximum length (number of nodes) of the result graph. 🗏
rel_result_in
Field rel_result passed to respell_do(). 🗏
rel_result_in_len
Number of elements of rel_result_in. 🗏
rel_result_out
The corresponding results RelResult are entered here. 🗏
rel_result_out_len
The number of entered field elements of the returned result. 🗏
rel_result_out_maxlen
Maximum length (number of results) of the result field rel_result_out. 🗏
respell_config
Configuration of respell 🚧 🗏
result_text
Returned result text. 🗏
result_text_len
Length (number of characters) of the result text result_text delivered by respell_do(). 🗏
result_text_maxlen
Maximum number of characters in result_text. 🗏
text_format
Output format of the text line. 🗏
The Greek, Cyrillic and Fraktur classifiers have been extended by the Latin character set.
The classifier Numplus.rec has been extended by the Arabic and Farsi numbers (formerly called Arabic Indian numbers).
To control recognition of such mixed cases (Latin+Arabic for instance), a new parameter font has been introduced. The macros from ALC_HAND to ALC_MACHINE have been removed and replaced by the macros FONT_HAND, FONT_MACHINE, FONT_LATIN to FONT_THAI, and FONT_OCRA to FONT_BRAILLE as values for the new parameter font. This means that the old parameter alc has been split into the two parameters alc and font, and that the new fonts FONT_LATIN to FONT_THAI have been introduced.
The name ARABIC_INDIAN has been changed to the more usual name FARSI.
The classifier numplusa.rec has been renamed to numpluseu.rec and numplusa.rec has been renamed to numplusus.rec.
For Unicode support, new functions with wide character file name parameters have been introduced for all KADMOS functions with file name parameters. These new functions are rec_winit(), rel_winit(), rep_winit(), respell_winit(), re_wreadparm2(), re_wwriteparm2(), re_wopenimagefile(), re_wreadimage(), re_wwriteimage() and re_collect_winit().
The functions rec_filetitle(), rel_filetitle(), rep_filetitle() and respell_filetitle() return the file name as wide characters (wchar_t) when the initialisation was made with one of the functions re*_winit().
To support programs under Unicode, the KADMOS error handling has also been extended to handle Unicode. Besides the structure re_ErrorText a new structure re_wErrorText has been introduced, which handles error texts as Unicode strings. The related new functions are re_wGetErrorText(), re_wSetErrorText(), re_wDisplayErrorText(), re_wSetErrorHandler() and re_wGetErrorHandler(). Accordingly, there is an additional definition of re_wErrorHandler.
Under Windows, when drive and directory are not specified, the functions re_?readparm?() and re_?writeparm?() no longer read from or write into the Windows directory. Instead they work in the current working directory (as they already do under Linux). So, when functions like GetPrivateProfileString() or WritePrivateProfileString() are used simultaneously, the name of the parameter file first has to be completed by a call to _fullpath().
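A minimal sketch of this completion step follows. The wrapper demo_full_path() is a hypothetical helper, not part of KADMOS. On Windows it calls _fullpath() from the C runtime; the branch for other systems is an assumption added only so that the sketch can be compiled and tried elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <stdlib.h>     /* _fullpath */
#else
#include <unistd.h>     /* getcwd */
#endif

/* Completes a bare file name to an absolute path based on the current
   working directory. Returns 0 on success, -1 on failure (for example
   when the buffer is too small). Hypothetical helper for illustration. */
int demo_full_path(const char *name, char *out, size_t out_size)
{
    assert(name != NULL && out != NULL);
#ifdef _WIN32
    return _fullpath(out, name, out_size) != NULL ? 0 : -1;
#else
    char cwd[4096];
    int n;

    if (name[0] == '/') {
        /* already absolute: copy unchanged */
        n = snprintf(out, out_size, "%s", name);
    } else {
        if (getcwd(cwd, sizeof cwd) == NULL)
            return -1;
        n = snprintf(out, out_size, "%s/%s", cwd, name);
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
#endif
}
```

The completed name can then be handed both to the KADMOS parameter functions and to GetPrivateProfileString() or WritePrivateProfileString(), so that all of them address the same file.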
The data structure ReSpellData has been extended by the new elements rel_repeat, rep_repeat and repr_rect. rel_repeat can be set to the address of the initialized data structure RelData which was used for recognition of the text line before the call to respell_do(). In this case respell_do() tries a new recognition with slightly different parameters for all words that were not found in the dictionary. The same holds for rep_repeat when rep_do() was called before the calls to respell_do(). In this case repr additionally has to be set, before every respell_do() call, to the address of the RepResult data structure which holds the recognition result of the related text line.
Generally the performance of ReSpell has been considerably improved. Beyond that, a new interface has been provided to use private or OEM spellcheckers directly with KADMOS. For this purpose the data structure ReSpellData has been extended by the new elements oem_spell_lookup, oem_codepage, oem_wordchars and oem_reject_char. How to do this is demonstrated in the new sample program OEMSpellDemo.
The new program HashMaker provides the possibility to extract word lists and affix files from iSpell dictionaries, to extend word lists, to combine word lists, and to integrate them into iSpell dictionaries. It is now quite easy to create iSpell dictionaries from private word lists, given as simple text files.
Since version 4.3d a source file strxxx_s.c is provided for downward compatibility, which emulates the new functions used as well as ftol2().
Since version 4.3d the subdirectory dotnet has been renamed to cs. kadmosdotnet (.cpp, .h, .dll, *helper.h) has been replaced with kadmos_cs7.* and kadmoshelper_cs7.h respectively.
Since version 4.3d kadmos.bas and kadmosdemo.frm have been replaced with *_vb7.*.
For iSpell dictionaries the option ALLOW_COMPOUND_WORDS is now set by default, so the initialization parameter ISPELL_ALLOW_COMPOUND_WORDS could be deleted. With the new parameter RESPELL_ALLOW_COMPOUND_WORDS the setting can be changed in the data structure ReSpellParm under respell_config (since version 4.3f).
re_collect_winit()
Initialization of a file for data collection. 🗏
re_wDisplayErrorText()
shows the content of re_ErrorText and re_wErrorText on the screen. 🗏
re_wErrorText
error handling 🗏
re_wGetErrorHandler()
returns the address of a private error function. 🗏
re_wGetErrorText()
retrieves error messages. 🗏
re_wopenimagefile()
opens an image file. 🗏
re_wreadimagefile()
reads an image into a structure. 🗏
re_wreadparm2()
reads from an INI file. 🗏
re_wSetErrorHandler()
activates the application's own error handling. 🗏
re_wSetErrorText()
writes back values. 🗏
re_writeimagefile()
writes an image into the file given by a file_handle. 🗏
re_wwriteparm2()
writes parameters to an INI file. 🗏
rec_winit()
loads classifiers. 🗏
rel_winit()
loads classifiers. 🗏
rep_winit()
loads classifiers. 🗏
respell_codepage()
internal code page of the connected dictionary. 🗏
respell_freeimages()
Freeing memory of internally collected images. 🗏 since 4.3b
respell_reject_char()
🗏
respell_winit()
loads the dictionary specified under dictionary. 🗏
respell_wordchars()
returns the character set of the dictionary. 🗏
RecData
🗏 Version 4.3d
ReInit
Parameters for the initialization of a classifier. 🗏 🗏 🗏 Version 4.3d
RelData
🗏 Version 4.3d
ReParm
🗏 Version 4.3d
RepData
🗏 Version 4.3d
ReSpellData
🗏
GENERAL_RESPELLCALL
Position query 🗏 Version 4.3b
GENERAL_RECALL
Mask for RECCALL to GENERAL_RESPELLCALL 🗏 Version 4.3b
font
includes the given font. 🗏
font_HAND, font_MACHINE, font_HM, font_LATIN, font_FRAKTUR, font_GREEK, font_CYRILLIC, font_ARABIC, font_FARSI, font_HEBREW, font_THAI, font_LANGUAGE, font_OCRA, font_OCRB, font_F7B, font_SEMI, font_CMC7, font_E13B, font_LCD, font_BRAILLE, font_NORM, font_LN, font_ALL
oem_spell_lookup
Name of the private dictionary used. 🗏
oem_codepage
The code page used for the private dictionary. 🗏
oem_reject_char
The rejection character needed for the private spellchecker. 🗏
oem_wordchars
All characters of all words of the private dictionary used. 🗏
alc
is a parameter for character set selection. 🗏
FunctionAddress()
for Java and C# programmers. Version 4.3d
GetPointer()
for Java and C# programmers. Version 4.3d
hookparm
Connection of a separate function 🗏 Version 4.3d
hWND_line_message
when set, rep_do() sends a message to the specified window. 🗏 Version 4.3d
hWND_rec_finished
when set, rec_do() sends a message to the specified window. 🗏 Version 4.3d
hWND_rel_finished
when set, rel_do() sends a message to the specified window. 🗏 Version 4.3d
hWND_rep_finished
when set, rep_do() sends a message to the specified window. 🗏 Version 4.3d
interna
do not change 🗏 🗏 Version 4.3d
spell_maxlen
has been renamed to respell_maxlen. Limitation of the length of Ispell words. 🗏
ALC_HAND, ALC_OCRA, ALC_MACHINE, ALC_OCRB, ALC_E13B, ALC_F7B, ALC_CMC7, ALC_LCD
RESPELL_FAST_LOOKUP, RESPELL_NORMAL_LOOKUP, RESPELL_EXTENSIVE_LOOKUP
ispell_minlen
since version 4.3d
A large number of new character samples, especially accented characters, were collected and used for a new computation of the classifiers Jumbo*.rec, Hand*.rec, Ttf*.rec and Fraktur*.rec.
The classifiers Ttf*.rec now include the characters « and » and the ligature ffl.
The classifiers Fraktur*.rec were extended by the ligatures sch and ffl.
In preparation of a Thai classifier, the Thai digits were implemented for hand print and machine print.
The letters D and M were removed from the classifiers Numplus*.rec, in hand print and in machine print.
Recognition has been considerably accelerated on CPUs that support SSE3 or SSE4.1.
The other KADMOS interfaces have not been changed from version 4.3.
re_ClearThreadData()
Release thread information 🗏
re_readparm3(), re_wreadparm3()
🗏
re_writeparm3(), re_wwriteparm3()
🗏
rec_accent()
Return the elements of an accented character. 🗏
rel_config3()
allows the classifier configuration. 🗏
rep_config3()
allows the classifier configuration. 🗏
RecGraph
🗏 since version 4.4o
RecResult, RelResult
🗏 since version 4.4o
ReParm
🗏 since version 4.4p
RecResult, RelResult
🗏 since version 4.4p
ReSpellParm
🗏 since version 4.4q
POS_LINECHECK
since version 4.4p
ALC_ACCENT
accent designation 🗏
ALC_ALL
All characters 🗏
GENERAL_FEATURES_ONLY
A subsequent rep_do() call then terminates after the feature building. 🗏
RESULT_FLAG_ACCENT_TOP
Set when the accent is over the affected character. 🗏
RESULT_FLAG_ACCENT_BOTTOM
Set when the accent is under the affected character. 🗏
RESULT_FLAG_ACCENT
RESULT_FLAG_ACCENT_TOP | RESULT_FLAG_ACCENT_BOTTOM 🗏
RESULT_FLAG_ACCENT_START
First element of an accented character. 🗏
RESULT_FLAG_ACCENT_MEMBER
Element of an accented character. 🗏
TEXT_FORMAT_RELRESULT_INDICES
Returns indices as a field of short values, terminated by -1. 🗏