© 2010 Mike Scott
Note: this procedure does not affect underscores within existing <> markup.
swap tag and word except for
... converts text like
It<PP> is<VBZ> easy<JJ>
into
<PP>It <VBZ>is <JJ>easy
or vice versa; in other words, it swaps the order of tags and words. The procedure effects a swap at
each space in the non-tagged text sequence.
Fill in the box to the right with any tags which should not be included in the swap, using commas
to separate them, for example sentence and paragraph tags such as <s>,</s>,<p>,</p>
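As a rough illustration only (this is not WordSmith's own code), the word-then-tag to tag-then-word direction of the swap can be sketched in Python with a regular expression. The function name swap_tags and the exclude parameter are hypothetical, and the sketch assumes each word carries at most one tag to be moved.

```python
import re

# Hypothetical sketch: a run of non-space, non-tag characters followed
# directly by one <...> tag is treated as a word plus its tag.
WORD_THEN_TAG = re.compile(r"([^\s<>]+)(<[^<>]+>)")

def swap_tags(text, exclude=()):
    """Move each tag in front of its word, skipping tags listed in exclude."""
    excluded = set(exclude)

    def repl(match):
        word, tag = match.group(1), match.group(2)
        if tag in excluded:
            return match.group(0)  # leave excluded tags where they are
        return tag + word

    return WORD_THEN_TAG.sub(repl, text)

print(swap_tags("It<PP> is<VBZ> easy<JJ>"))
print(swap_tags("<s>Hello</s>", exclude=["<s>", "</s>"]))
```

The second call shows why the exclusion list matters: without it, the closing </s> would be treated as the tag of "Hello" and moved in front of it.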
removing all tags
... would convert The<DT><the> TreeTagger<NP><TreeTagger> is<VBZ>... into The
TreeTagger is. It can plough through a copy of the whole BNC, for example, and make it
readable. If you have specified a header string, it will also cut the header up to that point. When it
finds a <, it looks for the next > within the selected span.
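The tag-removal behaviour can be approximated as follows. This is a minimal sketch, not WordSmith's implementation: the name strip_tags is hypothetical, and it assumes the span is a maximum number of characters to search after a < for the closing >. If no > is found within that distance, the < is kept as ordinary text. Header cutting is not covered here.

```python
def strip_tags(text, span=1000):
    """Remove <...> tags, looking at most `span` characters ahead for '>'."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "<":
            # Search only within the span for the closing bracket.
            close = text.find(">", i + 1, i + 1 + span)
            if close != -1:
                i = close + 1  # skip the whole tag
                continue
        out.append(text[i])
        i += 1
    return "".join(out)

print(strip_tags("The<DT><the> TreeTagger<NP><TreeTagger> is<VBZ>"))
```

Limiting the search keeps a stray < (for example in "a < b") from swallowing everything up to some distant >.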
lemmatised using ...
... converts each file using a lemma file. Where your source text has "she was tired" and
your lemma file has BE -> AM, WAS, WERE, IS, ARE, then you will get "she be tired" in
your converted text file. Where your source text has "Was she tired?" you'll get "Be she
tired?".
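The lemma substitution can be sketched in Python as below. This is an illustration under stated assumptions, not WordSmith's own routine: the names load_lemmas and lemmatise are hypothetical, each lemma line is assumed to have the form HEADWORD -> FORM, FORM, ..., and a replaced word is assumed to keep the initial capital of the word it replaces.

```python
import re

def load_lemmas(lines):
    """Build a lookup from each variant form to its headword (lower case)."""
    table = {}
    for line in lines:
        if "->" not in line:
            continue  # ignore lines that are not lemma entries
        head, forms = line.split("->", 1)
        for form in forms.split(","):
            form = form.strip().lower()
            if form:
                table[form] = head.strip().lower()
    return table

def lemmatise(text, table):
    """Replace every listed variant form with its headword."""
    def repl(match):
        word = match.group(0)
        lemma = table.get(word.lower())
        if lemma is None:
            return word
        # Assumption: keep an initial capital if the original had one.
        return lemma.capitalize() if word[0].isupper() else lemma

    return re.sub(r"[A-Za-z']+", repl, text)

table = load_lemmas(["BE -> AM, WAS, WERE, IS, ARE"])
print(lemmatise("she was tired", table))
```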
... replaces every end-of-line line-break with a space. Preserves any true paragraph breaks, which
you must ensure are defined (default = <Enter><Enter>, in other words two line-breaks one
after the other with no words between them).
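A minimal Python sketch of the same idea follows. The name join_lines is hypothetical, and the sketch assumes the paragraph break is two consecutive newline characters, matching the default described above. It splits the text at paragraph breaks first, so that only the single line-breaks inside each paragraph are turned into spaces.

```python
def join_lines(text, paragraph_break="\n\n"):
    """Replace single line-breaks with spaces, keeping paragraph breaks."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    paragraphs = text.split(paragraph_break)
    joined = [p.replace("\n", " ") for p in paragraphs]
    return paragraph_break.join(joined)

print(join_lines("one\ntwo\n\nthree\nfour"))
```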
... allows you to encrypt your text files. You supply your own password. When WordSmith
processes your text files, e.g. when running a concordance, it will restore the text as needed but
otherwise the text will be unintelligible. Encrypted files get the file extension .WSencrypted. For
example, if your original is wonderful.txt the copy will be wonderful.WSencrypted.
Requires the safer copy to button above to be selected.