The following form allows queries to be made of the Sydney Morning Herald Word Database. The database includes word frequency and density figures for the 1994 editions of the Sydney Morning Herald. The corpus contains 23440636 words in 38526 articles. It has been filtered so that only items that occur in at least two separate articles remain. This removes many, but not all, of the typographical errors. After filtering, 97031 items remain.
The database can be queried either by supplying a specific word to look up or by providing bounds on the word frequency and density. In the second case, any of the fields may be left blank to indicate no constraint. The database is quite large and queries can take some time to complete, so please be patient.
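The bounds query can be pictured with a minimal sketch in Python. This is not the code behind the form: the function names, the record layout, and the choice of inclusive bounds are all assumptions made for illustration. A blank field is modelled as None, meaning no constraint.

```python
def within_bounds(value, lower=None, upper=None):
    """Return True if value satisfies the optional bounds.

    None stands for a blank field (no constraint).
    Inclusive bounds are an assumption, not a documented behaviour.
    """
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def query_by_bounds(records, min_frequency=None, max_frequency=None,
                    min_density=None, max_density=None):
    """Select records whose frequency and density both fall within the bounds.

    Each record is a hypothetical dict with "word", "frequency"
    and "density" keys.
    """
    return [
        r for r in records
        if within_bounds(r["frequency"], min_frequency, max_frequency)
        and within_bounds(r["density"], min_density, max_density)
    ]
```

Calling query_by_bounds(records, min_density=2.0) with the other arguments omitted corresponds to filling in only the lower density field on the form.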
The lines returned will contain:
Word
Number of occurrences of word
Number of contexts/articles word occurred in
Last context word occurred in
Word Density (Number of Occurrences / Number of Contexts)
Word Frequency (Probability of occurrence - multiply by 1000000 to get occurrences per million)
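The two derived figures can be worked out from the raw counts as in the short Python sketch below. The function names are hypothetical, and taking the frequency to be the number of occurrences divided by the total corpus size is an assumption based on the description of it as a probability of occurrence. The corpus size is the figure quoted at the top of this page.

```python
CORPUS_WORDS = 23440636  # total words in the corpus, as stated above


def word_density(occurrences, contexts):
    """Number of occurrences divided by number of contexts (articles)."""
    return occurrences / contexts


def word_frequency(occurrences, corpus_words=CORPUS_WORDS):
    """Probability of occurrence.

    Assumed here to be occurrences divided by total corpus words.
    """
    return occurrences / corpus_words


def per_million(frequency):
    """Convert a probability of occurrence to occurrences per million words."""
    return frequency * 1000000
```

For example, a word seen 30 times across 10 articles would have a density of 3.0, and per_million(word_frequency(30)) would give its rate per million words under the assumption above.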
Simon Dennis
Last Edited 12 May 1995.