Adding Ispell support to UdmSearch
==================================

Since version 3.0, UdmSearch no longer stores ispell files in the SQL
database as the 2.x versions did; it loads the files from disk instead.

When UdmSearch is used with ispell support, all words are normalized by
both indexer and search.cgi. This makes it possible to find the same word
with different endings. For example, if the words "testing" or "tests" are
found in a document, the word "test" is stored by indexer instead.
search.cgi will also look for the word "test" if "testing" or "tests" is
given in the search query. Note that this scheme loses the ability to
search for exact word forms, but it usually reduces the size of the
database and makes searching faster.

Only suffixes are supported for now. Prefixes usually change the meaning
of a word: for example, somebody searching for the word "tested" hardly
wants "untested" to be found.

To make UdmSearch support ispell you must specify the Affix and Spell
commands in both the indexer.conf and search.htm files. Note that
search-time ispell support is not yet implemented in the PHP and Perl
frontends and works in the search.cgi frontend only.

Note that ispell commands MUST be given after the LocalCharset definition
in both search.htm and indexer.conf.

The format of the commands:

    Affix <lang> <file name>
    Spell <lang> <file name>

The first parameter of both commands is a two-letter language
abbreviation. The second one is a file name. File names are relative to
the UdmSearch /etc directory. Absolute paths can also be specified.

Note that several languages can be loaded at the same time. For example,

    Affix en en.aff
    Spell en en.dict
    Affix de de.aff
    Spell de de.dict

will load ispell support for both English and German.
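
The ordering and parameter rules above lend themselves to a quick sanity
check before running indexer. The following Python sketch is not part of
UdmSearch; check_ispell_commands is a hypothetical helper. It assumes one
command per line, case-sensitive command names, and "#" as a comment
character. The LocalCharset value in the sample is only a placeholder.

```python
def check_ispell_commands(config_text):
    """Return (loaded, problems) for the Affix/Spell lines in config_text.

    Hypothetical helper, not a UdmSearch API. Assumes one command per
    line and that '#' starts a comment.
    """
    loaded = []          # (command, lang, filename) tuples in file order
    problems = []        # human-readable complaints
    charset_seen = False

    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        command = parts[0]

        if command == "LocalCharset":
            charset_seen = True
        elif command in ("Affix", "Spell"):
            if len(parts) != 3:
                problems.append(
                    f"line {lineno}: {command} needs a language and a file name")
                continue
            _, lang, filename = parts
            if len(lang) != 2:
                problems.append(
                    f"line {lineno}: language '{lang}' is not two letters")
            if not charset_seen:
                problems.append(
                    f"line {lineno}: {command} appears before LocalCharset")
            loaded.append((command, lang, filename))

    return loaded, problems


# Placeholder charset value; the Affix/Spell lines mirror the example above.
sample = """LocalCharset iso-8859-1
Affix en en.aff
Spell en en.dict
Affix de de.aff
Spell de de.dict
"""

loaded, problems = check_ispell_commands(sample)
for problem in problems:
    print(problem)
print(len(loaded), "ispell commands found")
```

A misspelled command name is not reported as an error by this sketch; it
is skipped like any other unrelated command, so the count of commands
found is worth comparing against what you expect.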

An ispell affix file contains word-formation rules and has the following
format:

    flag V:
        E           >       -E,IVE          # As in create > creative
        [^E]        >       IVE             # As in prevent > preventive

    flag *N:
        E           >       -E,ION          # As in create > creation
        Y           >       -Y,ICATION      # As in multiply > multiplication
        [^EY]       >       EN              # As in fall > fallen

An ispell dictionary file contains the words themselves and has a format
like this:

    wop/S
    word/DGJMS
    wordage/S
    wordbook
    wordily
    wordless/P

Note that if you add ispell support to an already existing database,
reindexing is required. Otherwise non-normalized words will not be found
at all.


Checking a site for correct spelling
====================================

You may change the word weight factor depending on whether a word is
found in the ispell dictionaries. Two indexer.conf commands are available
for this (both default to 1):

    IspellCorrectFactor    1
    IspellIncorrectFactor  1

Setting IspellCorrectFactor to 0 prevents indexer from storing correctly
spelled words in the database. Only incorrect words are stored in this
case. You can then easily find the incorrect words and the corresponding
URLs where they occur. If no ispell files are used, all words are
considered "incorrect".

Your site may well contain some rare words that are not in the ispell
dictionaries. You can list such words in a plain text file of this format
(one word per line):

    rare.dict:
    ----------
    webmaster
    intranet
    .......
    www
    http
    ---------

You may also use ispell flags in this file if you know how to :-) This
saves you from writing the same word with different endings in the rare
words file, for example "webmaster" and "webmasters". You can pick a word
with the same inflection rules from an existing ispell dictionary and
simply copy its flags.
For example, the English dictionary has this line:

    postmaster/MS

So "webmaster" with the MS flags will probably be OK:

    webmaster/MS

Then copy this file to the /etc directory of UdmSearch and add it with a
Spell command, for example:

    Spell en rare.dict

During the next reindexing the new words will be treated as correctly
spelled, and only the truly misspelled words will remain.
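
The flag-copying step described above (finding a dictionary word that
inflects like the rare word) can be roughly automated. The Python sketch
below is only a heuristic and not part of UdmSearch or ispell:
suggest_flags and common_suffix_len are hypothetical helpers that pick the
flagged dictionary entry sharing the longest ending with the new word. A
shared ending does not guarantee the same inflection rules, so the result
should still be checked by hand against the affix file.

```python
def common_suffix_len(a, b):
    """Number of trailing characters that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suggest_flags(new_word, dictionary_lines):
    """Return (model_word, flags) for the flagged entry whose ending
    matches new_word best, or None if no flagged entry shares an ending."""
    best = None
    best_len = 0
    for line in dictionary_lines:
        entry = line.strip()
        if not entry:
            continue
        # Dictionary lines look like "word/FLAGS" or just "word".
        word, _, flags = entry.partition("/")
        shared = common_suffix_len(new_word.lower(), word.lower())
        if flags and shared > best_len:
            best, best_len = (word, flags), shared
    return best


# Entries taken from the examples in this document.
dictionary = ["postmaster/MS", "word/DGJMS", "wordless/P", "wordbook"]

model, flags = suggest_flags("webmaster", dictionary)
print(f"webmaster/{flags}")   # flags copied from "postmaster"
```

With the sample entries the sketch picks "postmaster" as the model word,
which matches the manual choice made above.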