CHAPTER 3. XML REPRESENTATION OF REGULATIONS
the regulations. If regulations were not annotated with references in a standardized format
in advance, natural language reference tracking capabilities would be needed. As the
number and variety of regulations increase, extracting the references becomes even more
complicated, since different regulations have slightly different referencing styles.
Adding reference data at the time an XML regulation is created reduces the complexity
of developing other processing systems that need to use the references.
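To make the idea of a standardized reference format concrete, the following minimal sketch expands one range-style reference, “subparts G through I of this part”, into complete identifiers of the form 40.cfr.265.G, the form used in this section's own example. The function name `expand_subpart_range`, the pattern `SUBPART_RANGE`, and the `title` and `part` parameters are hypothetical names introduced only for illustration. The pattern covers a single referencing style and single-letter subparts, so it should be read as a sketch under those assumptions, not as the parser used in this work.

```python
import re

# Hypothetical pattern: matches only the style "subparts X through Y of this part",
# where X and Y are single capital letters. Other referencing styles are not handled.
SUBPART_RANGE = re.compile(
    r"subparts\s+([A-Z])\s+through\s+([A-Z])\s+of\s+this\s+part"
)


def expand_subpart_range(text, title, part):
    """Return complete reference identifiers for a subpart range found in text.

    title and part describe the regulation currently being read, since
    "this part" can only be resolved from that context.
    """
    match = SUBPART_RANGE.search(text)
    if match is None:
        return []
    first, last = match.group(1), match.group(2)
    if first > last:
        # A reversed range is treated as unparseable in this sketch.
        return []
    return [
        f"{title}.cfr.{part}.{chr(code)}"
        for code in range(ord(first), ord(last) + 1)
    ]


refs = expand_subpart_range(
    "the requirements in subparts G through I of this part", title=40, part=265
)
print(refs)  # ['40.cfr.265.G', '40.cfr.265.H', '40.cfr.265.I']
```

Even this narrow case needs context from outside the sentence (the current title and part), which suggests why a general natural language reference tracker is harder to build than annotating references when the XML is created.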
Regulation references range from relatively straightforward to
complex. An example of a straightforward casual English reference is the text “as stated
in 40 CFR section 262.14(a)(2).” An example of a more complex reference is the text
“the requirements in subparts G through I of this part” (where the current part is part
265). This latter example can be converted manually into the following list of complete
references: 40.cfr.265.G, 40.cfr.265.H, and 40.cfr.265.I. However, given the large
volume of federal and state environmental regulations, such manual translation of
references is too time-consuming to be practical for existing regulations. The same
problem of dealing with a huge number of natural language references has been faced by
at least one other researcher, Justin Needle, when he was working with JUSTIS, a legal
research data provider.³⁴ In an article on the automatic linking of legal citations, Justin
Needle writes [66]:
“The conventional method of creating hypertext links between documents
involves manually editing each document and inserting fixed links at the database
production stage. Unfortunately, there is a major problem. The JUSTIS databases
contain millions of citations which, in order to achieve the required functionality,
need to be converted into millions of corresponding hypertext links. The manual
creation of links on this scale is not really an option since link creation is a
laborious process, requiring the services of skilled, and expensive, editors. Even if
an editor is able to identify and process ten links per hour, which is optimistic,
then the human effort required will be approximately a hundred thousand hours
³⁴ JUSTIS is available at the web address http://www.justis.com.