free!) parser is James Clark’s nsgmls, and this produces a much simpler output
format, called ESIS, which can be parsed quite straightforwardly (one also has the
beneﬁt of an SGML parse against the DTD). Two good public domain packages
use this method:
• David Megginson’s sgmlspm, written in Perl 5.
Both of these allow the user to write ‘handlers’ for every SGML element, with
plenty of access to attributes, entities, and information about the context within the
If these packages don’t meet your needs for an average SGML typesetting job, you
need the big commercial stuff.
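To give a flavour of why ESIS is straightforward to parse, and of what a per-element ‘handler’ looks like, here is a minimal sketch in Python. It is not the code of either package mentioned above. It assumes the usual nsgmls line codes (`A` for an attribute of the next element, `(` and `)` for element start and end, `-` for data) and handles only the `\n`, `\\` and `\|` escapes; the element names, the handler table and the sample input are invented for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, str, Dict[str, str]]


def unescape(data: str) -> str:
    """Undo the simple nsgmls escapes: \\n is a record end, \\\\ a backslash.

    The \\| markers around SDATA are dropped; octal escapes are left as-is.
    """
    out: List[str] = []
    i = 0
    while i < len(data):
        if data[i] == "\\" and i + 1 < len(data):
            nxt = data[i + 1]
            if nxt == "n":
                out.append("\n")
                i += 2
                continue
            if nxt == "\\":
                out.append("\\")
                i += 2
                continue
            if nxt == "|":
                i += 2
                continue
        out.append(data[i])
        i += 1
    return "".join(out)


def esis_events(lines: Iterable[str]) -> Iterator[Event]:
    """Turn ESIS lines into (kind, value, attributes) events."""
    attrs: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # "Aname type value": attributes precede the element they belong to
            parts = rest.split(" ", 2)
            attrs[parts[0]] = unescape(parts[2]) if len(parts) > 2 else ""
        elif code == "(":
            yield ("start", rest, attrs)
            attrs = {}
        elif code == ")":
            yield ("end", rest, {})
        elif code == "-":
            yield ("text", unescape(rest), {})
        # every other line code is ignored in this sketch


def to_latex(lines: Iterable[str],
             handlers: Dict[Tuple[str, str], Callable[[Dict[str, str]], str]]) -> str:
    """Call the handler (if any) for each element event; pass text through."""
    out: List[str] = []
    for kind, value, attrs in esis_events(lines):
        if kind == "text":
            out.append(value)
        elif (kind, value) in handlers:
            out.append(handlers[(kind, value)](attrs))
    return "".join(out)


# Hypothetical handlers for a hypothetical DTD
HANDLERS = {
    ("start", "SECTION"): lambda a: "\\section{%s}\n" % a.get("TITLE", ""),
    ("start", "EM"): lambda a: "\\emph{",
    ("end", "EM"): lambda a: "}",
    ("end", "P"): lambda a: "\n\n",
}

SAMPLE = [
    "ATITLE CDATA Overview",
    "(SECTION",
    "(P",
    "-Plain and ",
    "(EM",
    "-emphasised",
    ")EM",
    "- text.",
    ")P",
    ")SECTION",
    "C",
]

print(to_latex(SAMPLE, HANDLERS))
```

A real converter would also need to escape TeX’s special characters in the data and to deal with entities; the sketch leaves both out.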
Since HTML is simply an example of SGML, we do not need a specific system for
HTML. However, Nathan Torkington developed html2latex from the HTML parser in
NCSA’s Xmosaic package. The program takes an HTML ﬁle and generates a LaTeX
ﬁle from it. The conversion code is subject to NCSA restrictions, but the whole source
is available on CTAN.
Michel Goossens and Janne Saarela published a very useful summary of SGML,
and of public domain tools for writing and manipulating it, in TUGboat 16(2).
101 Conversion from (La)TeX to HTML
TeX and LaTeX are well suited to producing electronically publishable documents.
However, it is important to realize the difference between page layout and functional
markup. TeX is capable of extremely detailed page layout; HTML is not, because
HTML is a functional markup language not a page layout language. HTML’s exact
rendering is not speciﬁed by the document that is published but is, to some degree, left
to the discretion of the browser. If you require your readers to see an exact replication
of what your document looks like to you, then you cannot use HTML and you must use
some other publishing format such as PDF. That is true for any HTML authoring tool.
TeX’s excellent mathematical capabilities remain a challenge in the business of
conversion to HTML. There are only two generally reliable techniques for generating
mathematics on the web: creating bitmaps of bits of typesetting that can’t be translated,
and using symbols and table constructs. Neither technique is entirely satisfactory.
Bitmaps lead to a profusion of tiny ﬁles, are slow to load, and are inaccessible to those
with visual disabilities. The symbol fonts offer poor coverage of mathematics, and their
use requires configuration of the browser. The future of mathematical browsing may be
brighter: see “future Web technologies”.
For today, possible packages are:
LaTeX2HTML renders mathematics (and other “difficult” things) using bitmaps. The original version was written
by Nikos Drakos for Unix systems, but the package now sports an illustrious list of
co-authors and is also available for Windows systems. Michel Goossens and Janne
Saarela published a detailed discussion of LaTeX2HTML, and how to tailor it, in
A mailing list for users may be found via
TtH a compiled program that supports either LaTeX or Plain TeX, and uses the
font/table technique for representing mathematics. It is written by Ian Hutchinson,
using flex. The distribution consists of a single C source (or a compiled executable),
which is easy to install and very fast-running.
tex4ht works from a DVI file; it uses bitmaps for mathematics, but can also use other technologies where
appropriate. Written by Eitan Gurari, it parses the DVI file generated when you run
(La)TeX over your ﬁle with tex4ht’s macros included. As a result, it’s pretty robust
against the macros you include in your document, and it’s also pretty fast.
access to a LaTeX document, as well as the ability to generate multiple output
formats (e.g. HTML, DocBook, tBook, etc.).
; it uses bitmaps for equations.
for equations (indeed its entire approach is very similar to TtH). It is written in
Objective CAML by Luc Maranget. Hevea isn’t archived on CTAN; details (including
download points) are available via
An interesting set of samples, including conversion of the same text by the four free programs
listed above, is available at
; a linked page gives lists of pros and cons, by way of comparison.
The World Wide Web Consortium maintains a list of “ﬁlters” to HTML, with
sections on (La)TeX and BibTeX — see
102 Other conversions to and from (La)TeX
macros, plus most eqn and some tbl preprocessor
commands. Anything fancier needs to be done by hand. Two style files are provided.
There is also a man page (which converts very well to LaTeX... ). Tr2latex is an
enhanced version of the earlier troff-to-latex (which is no longer available).
for Unix systems.
There is also a converter to LaTeX by Erwin Wechtl, called rtf2latex. The latest
converter, by Ujwal Sathyam and Scott Prahl, is rtf2latex2e which seems rather
good, though development of it seems to have stalled.
Translation to RTF may be done (for a somewhat constrained set of LaTeX documents)
by TeX2RTF, which can produce ordinary RTF, Windows Help RTF (as
well as HTML, see “conversion to HTML”). TeX2RTF is supported on various Unix
platforms and under Windows 3.1
Microsoft Word A rudimentary (free) program for converting MS-Word to LaTeX is
wd2latex, which runs on MS-DOS; it probably processes the output of an archaic
version of MS-Word (the program itself was archived in 1991).
For conversion in the other direction, the current preferred free-software method is:
• Convert LaTeX to OpenOffice format, using the tex4ht command oolatex;
• open the result in OpenOffice and ‘save as’ a MS-Word document.
(Note that OpenOffice itself is not on CTAN; see
though most Linux systems offer it as a ready-to-install bundle.)
tex4ht can also generate OpenOffice ODT format, which may be used as an intermediate
for producing Word format files.
Word2TeX and TeX2Word are shareware translators from Chikrii Softlab; positive
users’ reports have been noted (but not recently).
If cost is a constraint, the best bet is probably to use an intermediate format such
as RTF or HTML. Word outputs and reads both, so in principle this route may be
You can also use PDF as an intermediate format: Acrobat Reader for Windows
(version 5.0 and later) will output rather feeble RTF that Word can read.
environment; it comes
ﬁle which deﬁnes some Excel macros to produce output in a new format.
sources are distributed with a VAX executable.
and refer/tib formats. The collection includes a shell script converter from BibTeX
to refer format as well. The collection is not maintained.
document to a TeX-compatible disk file. It was written by Peter Flynn at University
College, Cork, Republic of Ireland.
Wilfried Hennings’ FAQ, which deals specifically with conversions between TeX-based
formats and word processor formats, offers much detail as well as tables that
allow quick comparison of features.
A group at Ohio State University (USA) is working on a common document format
based on SGML, with the ambition that any format could be translated to or from
this one. FrameMaker provides “import ﬁlters” to aid translation from alien formats
(presumably including TeX) to FrameMaker’s own.
refer and tib tools
Word processor FAQ (source):
103 Using TeX to read SGML or XML directly
ConTeXt (mark IV) can process some *ML, to produce typeset output directly. Details
of what can (and can not) be done are discussed in “The ConTeXt WIKI”. ConTeXt
is probably the system of choice for (La)TeX users who also need to work in XML
(and friends). (Note that ConTeXt mark IV requires LuaTeX, and should therefore be
regarded as experimental, though many people do use it successfully.)
Older systems also manage, using no more than (La)TeX macro programming, to
process XML and the like. David Carlisle’s xmltex is the prime example; it offers a
solution for typesetting XML files, and is still in active (though not very widespread) use.
One use of a TeX that can typeset XML ﬁles is as a backend processor for XSL
formatting objects, serialized as XML. Sebastian Rahtz’s PassiveTeX uses xmltex to
achieve this end.
However, modern usage would proceed via XSL or XSLT2 to produce a formattable
104 Retrieving (La)TeX from DVI, etc.
The job just can’t be done automatically: DVI, PostScript and PDF are “ﬁnal” formats,
supposedly not susceptible to further editing — information about where things came
from has been discarded. So if you’ve lost your (La)TeX source (or never had the source
of a document you need to work on) you’ve a serious job on your hands. In many
circumstances, the best strategy is to retype the whole document, but this strategy is
to be tempered by consideration of the size of the document and the potential typists’
If automatic assistance is necessary, it’s unlikely that any more than text retrieval
is going to be possible; the (La)TeX markup that creates the typographic effects of the
document needs to be recreated by editing.
If the file you have is in DVI format, many of the techniques for converting (La)TeX
are likely to be problems finding included material (such as included PostScript figures,
that don’t appear in the DVI file itself), and mathematics is unlikely to convert easily.
To retrieve text from PostScript files, the ps2ascii tool (part of the ghostscript
distribution) is available. One could try applying this tool to PostScript derived from a
PDF file using pdf2ps (also from the ghostscript distribution), or Acrobat Reader itself;
an alternative is pdftotext, which is distributed with xpdf.
Another avenue available to those with a PDF ﬁle they want to process is offered
by Adobe Acrobat (version 5 or later): you can tag the PDF file into a structured
document, output thence to well-formed XHTML, and import the results into Microsoft
Word (2000 or later). From there, one may convert to (La)TeX using one of the
techniques discussed in “converting to and from (La)TeX”.
The result will typically (at best) be poorly marked-up. Problems may also arise
from the oddity of typical TeX font encodings (notably those of the maths fonts), which
Acrobat doesn’t know how to map to its standard Unicode representation.
105 Translating LaTeX to Plain TeX
Unfortunately, no “general”, simple, automatic process is likely to succeed at this task.
See “How does LaTeX relate to Plain TeX” for further details.
Obviously, trivial documents will translate in a trivial way. Documents that use even
relatively simple things, such as labels and references, are likely to cause trouble (Plain
TeX doesn’t support labels). While graphics are in principle covered, by the Plain TeX
Translating a document designed to work with LaTeX into one that will work with
Plain TeX is likely to amount to carefully including (or otherwise re-implementing) all
those parts of LaTeX, beyond the provisions of Plain TeX, which the document uses.
Some of this work has (in a sense) been done, in the port of the LaTeX graphics
package to Plain TeX. However, while graphics is available, other complicated packages
(notably hyperref) are not. The aspiring translator may find the Eplain system a useful
source of code. (In fact, a light-weight system such as Eplain might reasonably be
adopted as an alternative target of translation, though it undoubtedly gives the user more
than the “bare minimum” that Plain TeX is designed to offer.)
J Installing (La)TeX ﬁles
106 Installing things on a (La)TeX system
Installing (or replacing) things on your (La)TeX system has the potential to be rather
complicated; the following questions attempt to provide a step-by-step approach, starting
from the point where you’ve decided what it is that you want to install:
• You must find the file you need;
• It may be necessary to generate some documentation to read;
• You need to decide where to install the files;
• You must now install the files; and finally
• You may need to tidy up after the installation.
107 Finding packages to install
How did you learn about the package?
If the information came from these FAQs, you should already have a link to the
ﬁle (there are lists of packages at the end of each answer). Click on one of the links
associated with the package, and you can get the package (which may be one ﬁle or
If you heard about the ﬁle somewhere else, it’s possible that the source told you
where to look; if not, try the CTAN searching facilities, such as
. That (rather simple) search engine can return data from a search of the
CTAN catalogue (which covers most useful packages), or data from a search of the
names of ﬁles on the archive.
Packages come in a variety of different styles of distribution; the very simplest will
actually offer just
; in this case, just download the file and get on with
documented source file
; thus you should search just for foo;
be visible anywhere on the archive or in the catalogue.
Since most packages are distributed in this way, they usually occupy
their own directory on the archive. Even if that directory contains other packages, you
should download everything in the directory: as often as not, packages grouped in this
way depend on each other, so that you really need the other ones.
Having acquired the package distribution, “unpacking LaTeX packages” outlines
your next step.
108 Unpacking LaTeX packages
As discussed elsewhere, the ‘ordinary’ way to distribute a LaTeX package is as a pair
. If you’ve acquired such a pair, you simply
with LaTeX, and the files will appear, ready for installation.
Other sorts of provision should ordinarily be accompanied by a
you what to do; we list a few example conﬁgurations.
Sometimes, a directory comes with a bunch of
files, but fewer (often only one)
files (LaTeX itself comes looking like this). If there is more than one
and in the absence of any instruction in the
file, simply process the
If you’re missing the
altogether, you need to play around until something
works. Some
files are “self-extracting”: they do without an
once you’ve processed the
has automagically appeared.
Various other oddities may appear, but the archivists aim to have
ﬁle in every
package, which should document anything out of the ordinary with the distribution.
109 Generating package documentation
We are faced with a range of “normal” provision, as well as several oddities. One should
note that documentation of many packages is available on CTAN, without the need of
any further effort by the user; such documentation can usually be browsed in situ.
However, if you find a package that does not offer documentation on the archive, or
if you need the documentation in some other format than the archive offers, you can
usually generate the documentation yourself from what you download from the archive.
The standard mechanism, for LaTeX packages, is simply to run LaTeX on the
ﬁle, as you would any ordinary LaTeX ﬁle (i.e., repeatedly until the
warnings go away).
A variant is that the unpacking process provides a file
; if such a
thing appears, process it in preference to the
(it seems that when the
documented LaTeX source mechanism was ﬁrst discussed, the
suggested, but it’s not widely used nowadays).
Sometimes, the LaTeX run will complain that it can’t ﬁnd
line index) and/or
(the list of change records, not, as you might imagine, a
glossary). Both types of file are processed with special makeindex style files; appropriate commands are:
makeindex -s gind package
makeindex -s gglo -o package.gls package.glo
This author finds that the second (the change record) is generally of limited utility when
reading package documentation; it is, however, valuable if you’re part of the package
development team. If you don’t feel you need it, just leave out that step.
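The whole sequence described above might be assembled as in the following sketch. The two makeindex invocations are the ones quoted in the text; the assumption that the documented source is called package.dtx, and that a single further LaTeX run is enough to pick up the index and change record, is ours. The function only builds the command lines, so that they can be inspected before anything is run.

```python
from typing import List


def doc_build_commands(package: str, with_changes: bool = True) -> List[List[str]]:
    """Return the commands for typesetting a package's documentation.

    Assumes the documented source is <package>.dtx (an assumption, not
    something every distribution guarantees).
    """
    dtx = package + ".dtx"
    commands = [
        ["latex", dtx],                        # first run writes the raw index files
        ["makeindex", "-s", "gind", package],  # code line index
    ]
    if with_changes:
        # change record; may be left out, as the text remarks
        commands.append(
            ["makeindex", "-s", "gglo", "-o", package + ".gls", package + ".glo"]
        )
    commands.append(["latex", dtx])            # rerun to include the sorted lists
    return commands


for command in doc_build_commands("foo"):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run` in turn; in practice LaTeX may need running more than once at the end, until the warnings go away.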
Another common (and reasonable) trick performed by package authors is to provide
; if the file
doesn’t help, simply look around for such alternatives. The ﬁles are treated in the same
way as any “ordinary” LaTeX ﬁle.
110 Installing files “where (La)TeX can find them”
In the past, package documentation used always to tell you to put your ﬁles “where
LaTeX can ﬁnd them”; this was always unhelpful — if you knew where that was, you
didn’t need telling, but if you didn’t know, you were completely stuck.
It was from this issue that the whole idea of the TDS sprang; “where to put” questions
now come down to “where’s the TDS tree?”.
We therefore answer the question by considering:
• what tree to use, and
Once we know the answer to both questions, and we’ve created any directories that
are needed, we simply copy files to their rightful location.
This has done what the old requirement speciﬁed: LaTeX (or whatever) can (in
principle) ﬁnd the ﬁles. However, in order that the software will ﬁnd the ﬁles, we need
to update an index ﬁle.
On a MiKTeX system, open the window
, and click on
. The job may also be done
in a command window, using the command:
The MiKTeX documentation gives further details about
On a TeX Live-based system (or its predecessor teTeX), use the command
(or if that’s not available,
; they ought to be different names for the same
Having done all this, the new package will be available for use.
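As an illustration of what “their rightful location” means in a TDS tree, the sketch below maps a few common file types to the directories the TDS conventionally assigns to them. The table is deliberately incomplete (fonts, for instance, need supplier and typeface directories and are left out), and the tree root and package name in the example are hypothetical.

```python
from pathlib import PurePosixPath

# Conventional TDS areas for a LaTeX package's files (partial, illustrative)
TDS_AREAS = {
    ".sty": "tex/latex",
    ".cls": "tex/latex",
    ".fd": "tex/latex",
    ".bst": "bibtex/bst",
    ".dtx": "source/latex",
    ".ins": "source/latex",
    ".pdf": "doc/latex",
}


def tds_destination(tree: str, package: str, filename: str) -> PurePosixPath:
    """Return where `filename`, belonging to `package`, would go under `tree`."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in TDS_AREAS:
        raise ValueError("no TDS area known to this sketch for " + filename)
    return PurePosixPath(tree) / TDS_AREAS[suffix] / package / filename


# Hypothetical local tree and package
print(tds_destination("/usr/local/share/texmf", "foo", "foo.sty"))
```

After copying files to the directories so computed, the index file still needs updating, as described above.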
111 Which tree to use
In almost all cases, new material that you install should go into the “local” tree of your
(La)TeX installation. (A discussion of reasons not to use the local tree appears below.)
On a Unix(-alike) system, using TeX Live or teTeX, the root directory will be named
ask such a system where it believes a local tree should be:
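One way of putting that question to a TeX Live or teTeX system from a script is sketched here. It assumes that the kpsewhich program is on the search path and that the local tree is held in the TEXMFLOCAL variable, which is the usual arrangement but is not guaranteed on every installation; the function returns None rather than guessing when it cannot find out.

```python
import shutil
import subprocess
from typing import Optional


def local_texmf_tree() -> Optional[str]:
    """Ask kpsewhich for the value of TEXMFLOCAL; None if it cannot be found."""
    if shutil.which("kpsewhich") is None:
        return None  # no Kpathsea tools on this system
    result = subprocess.run(
        ["kpsewhich", "-var-value", "TEXMFLOCAL"],
        capture_output=True,
        text=True,
        check=False,
    )
    value = result.stdout.strip()
    return value or None


print(local_texmf_tree())
```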