The Adelaide Text Analysis Tool explained
Making a corpus
The best way to use the concordancing process is first to gather a collection of
articles which are relevant to the kind of writing you want to investigate. If you want to
examine research articles in a particular discipline, for example, a useful corpus would
consist of published articles from that discipline. This would allow you to search for
language features that are commonly used in this kind of writing.
The size of a corpus depends on the searches you intend to carry out. There are
drawbacks to having too little text in your corpus, as you may not find enough examples
of little-used terms and expressions. Conversely, a corpus which is too large can return
too many examples, especially of common words, to evaluate easily.
In trials of concordancing software during the development of this package, it was
found that about 20 published journal articles, totalling around 100,000 words, made a
suitable corpus for examining the terms and language features used in writing in
particular disciplines of science.
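To compare a corpus with the figures above (about 20 articles, around 100,000 words), the files and words in a folder can be counted with a short script. This is a rough sketch: it assumes the corpus is a folder of UTF-8 .txt files and treats any whitespace-separated token as a word, so its totals may differ slightly from those of a word processor. The function names are hypothetical.

```python
import tempfile
from pathlib import Path

def corpus_word_counts(folder):
    """Map each .txt file name in folder to its whitespace-separated word count."""
    counts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts[path.name] = len(text.split())
    return counts

def summarise(counts):
    """Return (number of files, total number of words)."""
    return len(counts), sum(counts.values())

# Demo on a throwaway folder so the sketch runs anywhere; pass the path of
# your own corpus folder to corpus_word_counts instead.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "article1.txt").write_text("Cells were grown in culture.", encoding="utf-8")
    Path(tmp, "article2.txt").write_text("The results were consistent.", encoding="utf-8")
    files, words = summarise(corpus_word_counts(tmp))
    print(f"{files} files, {words} words")
```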
The following steps will help you to construct your own corpus quickly and easily:
1. Make sure the documents you want to use are written in current English, with
   standard usage of prepositions, articles, verb tenses and other grammatical
   features. One way to meet this requirement is to select articles for which at
   least some of the authors are likely to be “native speakers” of English, and
   to ensure that the articles are from a reputable source – check author and
   publisher information, as well as the text itself, for guidance on this.
2. Obtain electronic copies of the articles and keep only the running text
   (sentences and paragraphs; no page numbers, headers, footers, tables or
   figures). Save each article as a text file (.txt format). Your sources may be
   web pages, PDF documents or word processor files. See the following section,
   Preparing text for concordancing, for more details on converting text from
   these sources.
3. Save all the .txt files in a single folder on your computer.
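Part of the clean-up described in the steps above can be automated. The sketch below removes lines that consist only of a page number (for example "12" or "Page 3") from text that has already been extracted. It is an assumption-laden illustration: the pattern and the function name strip_page_numbers are hypothetical, headers, footers, tables and figures still need to be removed by hand, and a line of body text that happens to contain nothing but a number would also be dropped.

```python
import re

# A line holding only digits, optionally preceded by the word "page".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)

def strip_page_numbers(text):
    """Return text with page-number-only lines removed."""
    kept = [line for line in text.splitlines() if not PAGE_NUMBER.match(line)]
    return "\n".join(kept)

# Hypothetical extract with two stray page numbers.
raw = "Samples were collected weekly.\n12\nIn 2019 we repeated the survey.\nPage 13"
print(strip_page_numbers(raw))
```

It is worth comparing the cleaned file with the original article before adding it to the corpus folder.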
Preparing text for concordancing
If you receive text in the form of a Microsoft Word document, simply open the file and
save it as a ‘text only’ file. This will give it a ‘.txt’ extension. It should be saved in a
folder that you intend to use as your corpus.
If you are copying text from web pages, select all the text, then copy and paste it
immediately into a word processor document, which can then be saved as a ‘text only’
file in the same way. If you are using a hard copy, you need to scan the document and
save it as a ‘text only’ document.
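For readers comfortable with scripting, a saved web page can also be reduced to plain text programmatically instead of by copy and paste. The sketch below uses Python's standard html.parser module; the class name TextExtractor, the helper html_to_text and the choice of tags treated as paragraph breaks are assumptions made for this example. Menus, adverts and other page furniture are not filtered out, so the output still needs checking by hand.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}  # contents are never visible text
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

class TextExtractor(HTMLParser):
    """Collect visible text, starting a new line at block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)

# Hypothetical page fragment.
page = "<p>First para.</p><script>x=1</script><p>Second <b>para</b>.</p>"
print(html_to_text(page))
```

The resulting string can be written to a .txt file in the corpus folder.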