Huntoken 1.5.1,  2003-2004 (c) Németh László

Raw text tokenisation for Hungarian (and, to a limited extent, English)

License: CC-LGPL

Compile
-------

make
make install

(Requires
- a Unix environment (shell, Unix tools),
- the Flex lexical analyzer generator,
- the M4 macro processor.)

Usage
-----

(Requires
- a Unix shell, or Cygwin on Windows,
- sed.)

huntoken xml_output

Options
-------

-h, --help: help
-r: only sentence boundary detection
-x: processing without hun_abbrev filter
-b: break long sentences (needed for tokenising sentences longer than 4000 characters)
-n: output without XML header and footer
-e: tokenize English (use English abbreviations)
-v, --version: version
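
The -b option matters only for very long sentences, so it can help to check
the input before choosing options. The following is a minimal sketch, not
part of huntoken: long_line_flag is a hypothetical helper that prints "-b"
when any input line is longer than the limit. Line length is only a rough
proxy for sentence length, since a sentence may span several lines.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with huntoken).
# Prints "-b" if any line of the given file exceeds the character limit,
# and prints nothing otherwise.
#   $1: file to inspect
#   $2: character limit (optional, defaults to 4000)
long_line_flag() {
    awk -v limit="${2:-4000}" '
        length($0) > limit { found = 1; exit }   # stop at the first long line
        END { if (found) print "-b" }            # END still runs after exit
    ' "$1"
}
```

Assuming huntoken is run as a filter that reads raw text on standard input
and writes XML to standard output (an assumption, as are the file names),
the helper could be used as:

huntoken $(long_line_flag input.txt) <input.txt >output.xml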


Filters
-------

See the flex sources and the huntoken shell program.

László Németh
nemeth@gyorsposta.hu