The following language resources are published on the Hunglish CDROM.

szoszablya-web2-2.0-freq.bz2

This is a large frequency count of Hungarian, based on a gigaword web corpus; see our LREC 2004 paper for a description of how it was collected. The four columns correspond to different levels of corpus cleaning, from lowest (only duplicates eliminated) to highest (all pages with fewer than 96% known words eliminated).
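As a rough illustration of how such a file might be consumed, here is a minimal Python sketch. It rests on assumptions that the description above does not confirm: that each line holds a word followed by four whitespace-separated integer counts (ordered from least to most cleaned), and that the text is encoded as ISO-8859-2. The names FreqRow, parse_line and read_freq are hypothetical helpers introduced only for this example; check a few lines of the actual file before relying on it.

```python
import bz2
from typing import Iterator, NamedTuple, Tuple


class FreqRow(NamedTuple):
    word: str
    # Four counts, assumed to be ordered from least cleaned to most cleaned.
    counts: Tuple[int, ...]


def parse_line(line: str) -> FreqRow:
    """Parse one line of the frequency list.

    ASSUMPTION: the line is a word followed by four integer counts,
    separated by whitespace. The real layout may differ.
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected a word and four counts, got: {line!r}")
    word, *raw_counts = fields
    return FreqRow(word, tuple(int(c) for c in raw_counts))


def read_freq(path: str, encoding: str = "iso-8859-2") -> Iterator[FreqRow]:
    """Stream rows from the bz2-compressed file without unpacking it to disk.

    ASSUMPTION: the encoding default is a guess; pass another one if needed.
    """
    with bz2.open(path, "rt", encoding=encoding) as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_line(line)
```

Under these assumptions, the count from the most heavily cleaned version of the corpus would be row.counts[-1] for each row yielded by read_freq("szoszablya-web2-2.0-freq.bz2").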

vonyo.stemmed

This is a stemmed version of the well-known machine-readable English-Hungarian dictionary created by Attila Vonyó.

data

ispell-style aff and dic files for English and Hungarian stemming. Our method of extending ispell to stemming and morphological analysis tasks is described in a SALTMIL 2004 paper and an ACL 2005 Software Workshop paper.

magyarispell

Hungarian resources for the Hunmorph morphological analyzer.