© 2010 Mike Scott
The point of it...
This is a tool to help find out which characters are most frequent in a text or a
set of texts. The purpose could be to see the frequency ranking itself (e.g. in normal
English text the letter E, followed by T, will be most frequent, as shown below), or it could be to check
whether your text collection contains any oddities, such as accented characters or curly
apostrophes you weren't expecting.
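
To make the idea concrete outside the tool, here is a minimal Python sketch of a character count. It is not how the program itself works; the function name profile_characters and the sample string are invented for illustration, and whitespace is ignored by assumption.

```python
from collections import Counter

def profile_characters(text, ignore_whitespace=True):
    """Count how often each character occurs in text (hypothetical helper)."""
    counts = Counter(text)
    if ignore_whitespace:
        # Drop spaces, tabs and line breaks so they do not dominate the list.
        for ch in list(counts):
            if ch.isspace():
                del counts[ch]
    return counts

# Invented sample containing a curly apostrophe and an accented letter.
sample = "it\u2019s a caf\u00e9"
profile = profile_characters(sample)

# List characters from most to least frequent, with their code points,
# so that oddities such as U+2019 stand out.
for ch, n in profile.most_common():
    print(f"U+{ord(ch):04X} {ch!r} {n}")
```

Printing the code point next to each character makes an unexpected curly apostrophe or accented letter easy to spot even when it looks similar to a plain one on screen.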
The first 32 codes (0 to 31) used in computer storage of text are "control characters" such as tabs, line-feeds
and carriage-returns. A plain .txt version of a text should only contain letters, numbers,
punctuation and tabs, line-feeds and carriage-returns -- if there are other symbols you do not
recognise, you may have a .txt file which is really an old WordPerfect or Word .doc in disguise.
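
The check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the name unexpected_control_codes and the sample bytes are invented, and the sketch looks at raw bytes, so it suits single-byte or UTF-8 text (a file saved as UTF-16 would legitimately contain zero bytes and would be flagged).

```python
# Control codes a plain text file is expected to contain:
# 9 = tab, 10 = line-feed, 13 = carriage-return.
ALLOWED_CONTROLS = {9, 10, 13}

def unexpected_control_codes(data: bytes):
    """Return {code: count} for control codes 0-31 other than the allowed ones."""
    found = {}
    for byte in data:
        if byte < 32 and byte not in ALLOWED_CONTROLS:
            found[byte] = found.get(byte, 0) + 1
    return found

# Invented samples: one clean, one containing codes 0 and 1.
clean = b"Plain text\twith a tab\r\n"
suspect = b"Some text\x00\x01more"

print(unexpected_control_codes(clean))
print(unexpected_control_codes(suspect))
```

To try it on a real file, read it in binary mode, e.g. open(name, "rb").read(), and pass the result to the function. An empty result suggests plain text; anything else is worth a closer look.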
It also lets you discover the most used characters across languages, as in this screenshot:
For further details see http://www.lexically.net/downloads/corpus_linguistics/1984_characters.xls
How to do it
Choose one or more texts or a folder. You can type in a complete filename (including drive and
folder), and can use wildcards such as *.txt, or you can browse to find your text or folder.
If you want to study one text only, just choose one text, but you may also choose a whole folder, or a
folder together with everything beneath it by using the "sub-folders too" option.
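
For readers who want to reproduce this kind of selection in a script, the following Python sketch shows the general idea of a wildcard plus an optional sub-folder search. It is an assumption-laden illustration, not the program's own mechanism: the function name choose_texts and its subfolders_too parameter are invented here to mirror the option described above.

```python
from pathlib import Path

def choose_texts(folder, pattern="*.txt", subfolders_too=False):
    """Return the files in folder that match a wildcard pattern (hypothetical helper).

    With subfolders_too=True the search also descends into every sub-folder.
    """
    root = Path(folder)
    # rglob searches recursively; glob looks only in the folder itself.
    matches = root.rglob(pattern) if subfolders_too else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())
```

For example, choose_texts("my_texts", "*.txt", subfolders_too=True) would list every .txt file under a folder called my_texts (an invented name), while leaving subfolders_too at its default restricts the list to the top level.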