The following language resources are published on the Hunglish CDROM.
szoszablya-web2-2.0-freq.bz2
This is a large word-frequency list for Hungarian, compiled from a gigaword web
corpus; our LREC 2004 paper describes how the corpus was collected. The four
columns correspond to four levels of corpus cleaning, from the lowest (only
duplicates removed) to the highest (all pages in which fewer than 96% of the
words are known removed).
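The Python sketch below shows one way to read the compressed list and pick out the counts for a single cleaning level. The line layout is not documented here, so it assumes that each line holds the word followed by four whitespace-separated counts, lowest cleaning level first. It also assumes Latin-2 (`iso-8859-2`) encoding. Both assumptions are flagged in the code. The names `parse_freq_line` and `load_level` are hypothetical helpers, not part of the distribution.

```python
import bz2
from typing import Dict, List, Tuple

# ASSUMPTION: each line is "<word> <c0> <c1> <c2> <c3>", whitespace-separated,
# where c0 is the count at the lowest cleaning level (only duplicates removed)
# and c3 the count at the highest (pages with < 96% known words removed).
NUM_LEVELS = 4


def parse_freq_line(line: str) -> Tuple[str, List[int]]:
    """Split one line into the word and its per-level counts."""
    parts = line.strip().rsplit(None, NUM_LEVELS)
    if len(parts) != NUM_LEVELS + 1:
        raise ValueError(f"unexpected line format: {line!r}")
    word, counts = parts[0], [int(p) for p in parts[1:]]
    return word, counts


def load_level(path: str, level: int, encoding: str = "iso-8859-2") -> Dict[str, int]:
    """Return a word -> count mapping for one cleaning level (0 = lowest, 3 = highest).

    ASSUMPTION: the file is Latin-2 encoded; pass e.g. encoding="utf-8"
    if the decoded Hungarian words look wrong.
    """
    if not 0 <= level < NUM_LEVELS:
        raise ValueError("level must be between 0 (lowest cleaning) and 3 (highest)")
    freqs: Dict[str, int] = {}
    # bz2.open in text mode decompresses and decodes line by line.
    with bz2.open(path, "rt", encoding=encoding) as fh:
        for line in fh:
            if line.strip():
                word, counts = parse_freq_line(line)
                freqs[word] = counts[level]
    return freqs
```

For example, `load_level("szoszablya-web2-2.0-freq.bz2", 3)` would return the counts from the most heavily cleaned corpus. The file is streamed, so the whole corpus is never decompressed into memory at once.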
vonyo.stemmed
This is a stemmed version of the well-known machine-readable English-Hungarian
dictionary created by Attila Vonyó.
data
Ispell-style aff and dic files for English and Hungarian stemming. Our method
of extending ispell to stemming and morphological analysis is described in
papers at SALTMIL 2004 and the ACL 2005 Software Workshop.
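In ispell-style files, each stem in the dic file carries flags, and each flag names suffix rules in the aff file. A rule has a condition on the end of the stem, characters to strip, and a suffix to add. Stemming runs these rules in reverse. It removes a suffix that some rule could have added, restores what the rule stripped, and checks that the candidate stem is listed with that rule's flag. The Python toy below is only a generic illustration of this affix stripping, not the extended method described in the papers. It also does not parse real aff/dic files: the rule set and dictionary are hypothetical English examples hard-coded for brevity.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class SuffixRule:
    flag: str       # flag a dic stem must carry for this rule to apply
    strip: str      # characters removed from the stem end before adding
    add: str        # suffix appended to form the inflected word
    condition: str  # regex the stem's ending must match


# HYPOTHETICAL toy data, not taken from the CD-ROM's aff/dic files.
RULES: List[SuffixRule] = [
    SuffixRule("D", "", "ed", r"[^ey]$"),       # walk -> walked
    SuffixRule("D", "y", "ied", r"[^aeiou]y$"),  # carry -> carried
    SuffixRule("D", "", "d", r"e$"),             # bake -> baked
]
DICTIONARY: Dict[str, Set[str]] = {"walk": {"D"}, "carry": {"D"}, "bake": {"D"}}


def stems(word: str) -> List[str]:
    """Return every dictionary stem from which a suffix rule could produce word."""
    found: List[str] = []
    if word in DICTIONARY:
        found.append(word)  # the word is itself a stem
    for rule in RULES:
        if not word.endswith(rule.add):
            continue
        # Undo the rule: drop the added suffix, restore the stripped characters.
        candidate = word[: len(word) - len(rule.add)] + rule.strip
        flags = DICTIONARY.get(candidate)
        if flags and rule.flag in flags and re.search(rule.condition, candidate):
            found.append(candidate)
    return found
```

Here `stems("carried")` yields `["carry"]`. By contrast, `stems("walkd")` yields nothing, because the only rule that adds a bare `d` requires the stem to end in `e`.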
magyarispell
Hungarian resources for the Hunmorph morphological analyzer.