Information Will Be Free. On the other hand, a great deal of today's "popular" scientific
literature, promulgated by working scientists themselves, argues that electronic archiving
is very cheap indeed. Proponents of this optimistic line of argument reason that colleges,
universities, research laboratories, and the like already support the most costly piece of
the action: that electronic infrastructure comprises computers, internal networks, and fast
links to the external world, and institutions are obligated in any case to maintain their
investments aggressively and to update them frequently. That being the case, the
reasoning is that willing authors can put high quality material "out there," leaving it for
search engines and harvesters to find. In such arguments, the value-adding services
heretofore provided by editors, reviewers, publishers, and libraries are doomed to
obsolescence and are withering away even as this report is being written.
Our guess is that the "truth" will be found to lie in between those two polarities, but of
course that guess is a little glib and perhaps even more unfounded than the above
arguments.
Even though during the planning year we were unable to make economic issues a topic of
focused inquiry, we have begun to develop specific and detailed costs for building the
YEA for e-journals in preparation for the next granting phase, and those calculations are
starting to provide us with a sense of scale for such an operation. In addition, throughout
the year, team members articulated certain general views about the economics of e-
journal archives, which we share here below.
Five Cost Life-Cycle Stages of an e-Journal Archive
The task of archiving electronic journals may be divided into five parts: the difficult part
(infrastructure development and startup), the easier part (maintenance), the sometimes
tricky part (collaborations and standards), the messy part (comprehensiveness), and the
part where it becomes difficult again (new technologies, migration).
1. The difficult part (development and startup). Initial electronic archiving efforts
involve such activities as establishing the data architecture, verifying a prototype,
validating the assumptions, and testing the adequacy of the degree of detail of
realization. The magnitude and complexity of the issues and the detail involved in
e-journal archiving are considerable. That said, it does not lie beyond the scope of
human imagination, and the big lesson we have learned in this planning year is
that it is indeed possible to get one's arms around the problem, and that several
different projects have discovered more or less the same thing in the same time
period. In fact, Yale Library is already involved in other types of archiving
projects related to several other digital initiatives. The greatest difficulties do not
lie in having to invent a new technology, nor do they lie in coping with immense
magnitudes. Rather, they reside in resolving a large, but not unimaginably large,
set of problems in an adequate degree of detail to cope with a broad range of
possibilities.
2. The easier part (ongoing maintenance and problem resolution). Where we are
encouraged is in believing that once the first structure-building steps have been
taken, the active operationalization and maintenance of an e-journal archiving
project, in partnership with one or more well-resourced and cooperative
publishers, can become relatively straightforward, particularly as standards
develop to which all parties can adhere. There will be costs, but after start-up
many of these will be increasingly marginal costs to the act of publishing the
electronic journal in the first place. For new data being created going forward,
attaching appropriate metadata and conforming to agreed standards will require
up-front investment of time and attention, especially retrofitting the first years of
journals to standards newly enacted, but once that is done, the ongoing tasks will
become more transparent. In theory, the hosting of the archive could be part and
parcel of the operational side of the publishing, and the servers and staff involved
in that case would most likely be the same people involved in the actual
publication. Alternatively, as we imagine it, the long-term archiving piece of
business will be taken aboard by existing centers distributed among hosting
universities with similar synergies of costs.
3. The tricky part (collaboration and standards). Because different people and
organizations in different settings have been working on electronic preservation
issues for the last few years, there may already be appreciable numbers of similar
but nonidentical sets of solutions coming to life. Working around the world to
build sufficient communities of interest and standards to allow genuinely
interoperable archives and real standards will take a great deal of "social work."
Every archive will continue to devote some percentage of its operation to external
collaborations driven by the desire to optimize functional interoperability.
4. The messy part (comprehensiveness). There will be a fair number of journals
that either choose not to cooperate or are financially or organizationally ill-
equipped to cooperate in a venture of the scope imagined. It will be in the interest
of the library and user communities generally to identify those under-resourced or
recalcitrant organizations and find the means — financial, organizational, political
— to bring as many of them aboard as possible. It may prove to be the case that
90 percent of formal publishers' journals can be brought aboard for a modest
price, and the other 10 percent may require as much money or more to come in
line with the broader community.
5. The part where it becomes difficult — and probably very expensive — again
(migration). The solutions we now envision will sustain themselves only as long
as the current technical framework holds. When the next technological or
conceptual revolution gives people powers of presentation they now lack and that
do not allow themselves to be represented by the technical solutions we now
envision, then we will require the next revolution in archiving. The good news at
that point is that some well-made and well-observed standards and practices today
should be able to be carried forward as a subset of whatever superset of practices
need to be devised in the future. Elsevier Science has a foretaste of this in its
current, very costly migration to XML.
Needless to say, the above overview is somewhat simplified. For example, in our
planning year, we were surprised to find just how few of the 1,100+ Elsevier e-journal
titles carried complex information objects, compared to what we expected to find.
Complex media, data sets, and other electronic-only features exist that have yet to find
their place as regular or dominant players in e-journals, and creating ways to deal with
these types of digital information — let alone standard ways — will be costly, as are all
initial structural activities (see #1 above).
Cost-Effective Collaboration and Organization for e-Archiving
That said, it appears that willing collaborators have yet a little time both to address and to
solve the hefty problems of presenting and archiving complex digital information objects.
To archive a single e-journal or small set of journals is to do relatively little. But to
develop standards that will serve e-preservation well — let alone to facilitate access to
the most simple of e-archives that begin to bloom like a hundred flowers — all the
players will need to work together. We imagine an aggregation of archiving efforts,
whether in physical co-location or at least virtual association and coordination.
But how might such archival universes be organized?
• Archives could be subject-based, arranged by discipline and subdiscipline. Such
an arrangement would allow some specialization of features, easier cross-journal
searching, and creation of a community of stakeholders.
• Archives could be format-based. This arrangement would probably overlap with
subject-based arrangement in many fields, would be easier to operate and manage,
but would sacrifice at least some functionality for users — an important
consideration, given that archival retrieval is likely to occur in ways that put at
least some demand on users to navigate unfamiliar interfaces.
• Archives could be publisher-based. Such an arrangement would offer real
conveniences at the very outset, but would need close examination to assure that
standards and interoperability are maintained against the natural interest of a
given rights holder to cling to prerogatives and privileges.
• Archives could be nationally-based. Australia, Japan, Canada, Sweden, and other
nations could reasonably feel that they have a mission to preserve their own
scientific and cultural products and not to depend on others.
• Archives could be organized entrepreneurially by hosts. This is probably the
weakest model, inasmuch as it would create the least coherence for users and
searching.
Each of these alternate universes has its own gravitational force and all will probably
come into existence in one form or another. Such multiplicity creates potentially severe
problems of scalability and cost. One remedy could be for official archives to operate as
service providers feeding other archives. Hence, a publisher's agreed archive could feed
some of its journals to one subject-based archive and others to national archives.
One way to begin to anticipate and plan for this likely multiplicity would be to create a
consortium now of interested parties to address the difficult issues such as redundancy,
certification, economic models, collection of fees, standards, and so on. No one
organization can solve these problems alone, but coordination among problem-solvers
now and soon will be very cost-effective in the long run. In OCLC's proposal to create a
digital preservation cooperative,[7] and, on a larger scale, in the Library of Congress's
recent National Digital Information Infrastructure and Preservation Program,[8] we may be
seeing the emergence of such movements. It may be possible to turn the Mellon planning
projects into such an overarching group of groups.
Who Will Pay and How Will They Pay?
No preservation ambitions will be realized without a sustainable economic model. As we
have noted above, the costs of archiving are much in dispute and our study will examine
those costs in great detail in the next phase. For now, it would appear that the initial costs
are high, although manageable, and the ongoing costs, at least for standard publishers'
journals, could be relatively predictable and eventually stable over time.
If that is true, then various models for paying for the archiving process suggest
themselves. This is an area about which there has been much soft discourse but in which
there has been little experience, save perhaps for JSTOR whose staff have given the topic
a great deal of thought.
Up-front payment. The most dramatic and simple way to finance the e-journal archives
would be the "lifetime annuity model": that is, users (presumably institutional entities,
such as libraries, professional societies, governments, or cultural institutions, but some
speak of enhanced "page charges" from authors or other variants on current practices) pay
for a defined quantum of storage and with that one-time payment comes an eternity of
preservation. The up-front payment would be invested partly in ongoing archival
development and partly in an "endowment" or rainy day fund. The risk in this case is that
inadequate funding may lead to future difficulties of operation.
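The arithmetic behind the "lifetime annuity model" can be sketched roughly as follows. This is an illustration only, not a costing from the project: it assumes a constant annual preservation cost and a constant real rate of return on the invested payment, both of which are hypothetical inputs, and the function names are invented for the example. Under those assumptions the one-time payment must be large enough that its investment income alone covers the annual cost; a smaller payment is eventually exhausted, which is the funding risk noted above.

```python
def required_upfront_payment(annual_cost: float, real_return: float) -> float:
    """One-time payment whose investment income covers annual_cost indefinitely.

    Assumes a simple perpetuity: payment * real_return == annual_cost.
    Both inputs are hypothetical; real costs and returns vary over time.
    """
    if real_return <= 0:
        raise ValueError("real_return must be positive for a perpetuity")
    return annual_cost / real_return


def years_until_depleted(payment, annual_cost, real_return, horizon=200):
    """Simulate the fund year by year.

    Returns the first year in which the fund cannot cover the annual cost,
    or None if it survives the whole horizon.
    """
    balance = payment
    for year in range(1, horizon + 1):
        # The fund earns its return, then pays that year's preservation cost.
        balance = balance * (1 + real_return) - annual_cost
        if balance < 0:
            return year
    return None


if __name__ == "__main__":
    cost, rate = 100.0, 0.04  # illustrative figures only
    full = required_upfront_payment(cost, rate)
    print("payment needed:", full)
    print("fully funded, depleted in year:", years_until_depleted(full, cost, rate))
    print("underfunded, depleted in year:", years_until_depleted(1000.0, cost, rate))
```

The sketch leaves out the portion of the payment that the text says would go to ongoing archival development, as well as any growth in costs, so a real model would need a larger payment than the perpetuity figure suggests.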
Ongoing archival fees. An "insurance premium" on the other hand could give an
ongoing supply of money, adjustable as costs change, and modest at all stages. This
reduces the risk to the provider but increases the uncertainty for the beneficiary. The
ongoing fee could be a visible part of a subscription fee or a fee for services charged by
the archive.
The traditional library model. The library (or museum or archive) picks up the tab and
is funded by third-party sources.
Fee for services operation. The archive provides certain services (special metadata,
support for specialized archives) in return for payments.
Hybrid. If no single arrangement seems sufficient — as it likely will not — then a hybrid
system likely will emerge, perhaps with one set of stakeholders sharing the up-front costs
while another enters into agreement to provide ongoing funding for maintenance and
potential access.
Much more could be said on the topic of who pays but at the moment most of it would be
speculation. The choice of models will influence development of methods for paying fees
and the agents who will collect those fees. Before making specific recommendations it
will be important for our project to develop a much more specific sense of real costs of
the e-archive. We imagine that we might want to develop both cost and charging models
in conjunction with other libraries, i.e., prospective users of the archive. In Yale's case the
collaborative effort might happen with our local electronic resource licensing consortium
NERL.
Contract between the Publisher and the Archive
Publishers and librarians have reluctantly grown accustomed to having licenses that
articulate the terms and conditions under which digital publications may be used. These
licenses are necessary because in their absence the uses to which digital files could be put
would be limited by restrictions (and ambiguities) on reproduction and related uses that
are intrinsic within copyright law. Licenses clarify ambiguities and often remove, or at
least significantly reduce, limitations while also acknowledging certain restrictions on
unlimited access or use.
A licensing agreement between a digital information provider and an archival repository
presents several unique challenges not generally faced in the standard licensing
agreement context between an information provider and an end-user. Discussed below
are several of the issues that must be addressed in any final agreement:
Issues
1. Term and termination. The intended agreement is perpetual in nature, even though
"forever" is, in fact, a relative rather than an absolute term. One has to think in
funereal terms of "perpetual care" and of the minimum length of time required to
make an archiving agreement reasonable as to expectations and investments.
Some issues that need to be addressed are appropriate length of any such
agreement, as well as provisions for termination of the agreement and/or "handing
off" the archive to a third party. Underlying the concerns of term and termination is
the need to ensure that the parties' investments in the archive are sufficiently
protected as well as that the materials are sufficiently maintained and supported.
2. Sharing responsibility between the archive and the digital information provider.
There are elements of a service level agreement that must be incorporated into the
license because the rights and responsibilities are different in an archival
agreement than in a normal license. That is, an archive is not the same as a
traditional end-user; in many ways the archive is stepping into the shoes of the
digital information provider in order (eventually) to provide access to end-users.
The rights and responsibilities of the archive will no doubt vary depending on
when the material will become accessible and on whether there are any
differentiations between the level and timing of access by end-users. This issue
will have an impact on the level of technical and informational support each party
is required to provide to end-users and to each other, as well as responsibility for
content — including the right to withdraw or change information in the archive —
and responsibilities concerning protecting against the unauthorized use of the
material.
3. Level and timing of access. While all licenses describe who the authorized users
are, the parties to an archival agreement must try to anticipate and articulate the
circumstances (i.e., "trigger events") under which the contents of the archive can
be made available to readers, possibly without restriction. When the information
will be transmitted to the archive and, more importantly, how that information is
made available to end-users are also critical questions. Several models have been
discussed and this may be an issue best addressed in detailed appendices
reflecting particular concerns related to individual publications.
4. Costs and fees. The financial terms of the agreement are much different from
those of a conventional publisher-user license. Though it is difficult to conceive
of one standard or agreed financial model, it is clear that an archival agreement
will have a different set of financial considerations from a "normal" license.
Arrangements must be made for the recovery of costs for services to end-users, as
well as any sharing of costs between the archive and the digital information
provider. These costs may include transmission costs, the development of archive
and end-user access software, and hardware and other costs involved in
preserving and maintaining the data.
5. Submission of the materials to the archive. The issues of format of the deposited
work ("submission") take on new considerations as there is a need for more
information than typically comes with an online or even locally-held database.
Describing the means for initial and subsequent transfers of digital information to
the archive requires a balance between providing sufficient detail to ensure all
technical requirements for receiving and understanding the material are met,
while at the same time providing sufficient flexibility for differing technologies
used in storing and accessing the materials throughout the life of the contract. One
means of dealing with the submission issues is to provide in the agreement
general language concerning the transmission of the materials, with reference to
appendices that can contain precise protocols for different materials in different
time periods. If detailed appendices are the preferred method for dealing with
submission matters, mechanisms must be developed for modifying the specifics
during the life of the agreement without triggering a formal renegotiation of the
entire contract.
6. Integrity of the archive. The integrity and comprehensiveness of the archive must
be considered. The contract must address the question: "If the publisher
'withdraws' a publication, is it also withdrawn from the archive?"
Progress Made
YEA and Elsevier Science have come to basic agreement on what they would be
comfortable with as a model license. In some areas alternatives are clearly available and
other archival agencies working with other publishers will choose different alternatives.
Reaching a general agreement was, however, surprisingly easy as the agreement flowed
naturally out of the year-long discussions on what we were trying to accomplish. The
current draft license is not supplied in this document because it has a number of
"unpolished" areas and some unresolved details, but it could be submitted and discussed
upon request.
The team made certain choices with regard to the contractual issues noted above:
1. Term. The team opted for an initial ten-year term with subsequent ten-year
renewals. This provides the library with sufficient assurance that its investments
will be protected and assures the publisher that there is a long-term commitment.
The team also recognized that circumstances can change and has attempted to
provide for what we hope will be an orderly transfer to another archival
repository.
2. Rights and responsibilities. The agreement includes statements of rights and
responsibilities that are quite different from a traditional digital license. The
publisher agrees, among other things, to conform to submission standards. The
library agrees, among other things, to receive, maintain, and migrate the files over
time.
3. Trigger events. Discussions of "trigger events" provided some of the most
interesting, if also frustrating, aspects of the year. In the end, the only trigger
event that all completely agreed upon was that condition under which the digital
materials being archived were no longer commercially available either from the
original publisher or someone who had acquired them as assets for further
utilization. Given that it is quite hard to imagine a circumstance in which journal
files of this magnitude would be judged to have no commercial value and would
not be commercially offered, does it make sense to maintain such an archive at
all? Will money be invested year after year as a precaution or protection against
an event that will never occur? Though the team agreed it is necessary to proceed
with long-term electronic archival agreements, clearly serious issues are at stake.
The team also identified a second side to the trigger question: if the archive were
not going to be exposed to wide use by readers, how could the archival agent
"exercise" it in order to assure its technical viability? This topic is discussed more
fully in the "Trigger Events" section of the report. Briefly here, the team was
concerned that a totally dark archive might become technically unusable over
time and wanted to provide agreed upon applications that would make the archive
at least "dim" and subject to some level of use, e.g., available to local authorized
users. The second, perhaps more important, notion was that there would be
archival uses that could be distinguishable from normal journal use. The team
tried to identify such uses but so far has not received the feedback from the
history of science community (for example) that we would have wished.
Therefore, "archival uses" remain more theory than reality, but at the same time
they represent a topic we are committed to exploring in the next phase of work.
An alternative would be to have the archive serve as a service provider to former
subscribers, but this changes the nature of the archive to being a "normal" host
which could be a questionable consideration. These issues are not currently
reflected in the draft license.
4. Financial terms. These were viewed as neutral at this time, i.e., no money would change
hands. In our current thinking, the publisher provides the files without charge and
the archival agency accepts the perpetual archiving responsibility without
financing from the publisher. Obviously, one could argue that the publisher
should be financing some part of this activity. However, in the longer term it is
probably more realistic to develop alternative financing arrangements that are
independent of the publisher.
5. Technical provisions. Early on, the team agreed on the OAIS model for
submission and subsequent activities. The license reflects this in terms of the need
to define metadata provided by the publisher. The specific metadata elements
have not yet been finalized, however. This is also relevant in defining what use
can be made by the archive of the metadata. Publishers such as Elsevier that have
secondary publishing businesses want to be sure that those businesses are not
compromised by an archive distributing abstracts for free, for example. The
model license does not yet reflect this point but it is recognized as an issue.
6. Withdrawal of content. The current draft license provides for appropriate notices
when an item is withdrawn by the publisher. The team has discussed and will
likely incorporate into the license the notion that the archive will "sequester"
rather than remove a withdrawn item.
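The difference between sequestering and removing a withdrawn item (point 6 above) can be sketched as a small data structure. This is a hypothetical illustration, not part of the draft license or of any YEA system: the class, field, and method names are invented for the example. The idea is that a withdrawal records the publisher's notice and hides the item from readers, while the archive continues to hold the content.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class ArchivedItem:
    identifier: str
    content: bytes
    sequestered: bool = False
    withdrawal_notice: Optional[str] = None
    withdrawn_on: Optional[date] = None


class Archive:
    """Toy archive that sequesters withdrawn items instead of deleting them."""

    def __init__(self) -> None:
        self._items: Dict[str, ArchivedItem] = {}

    def deposit(self, identifier: str, content: bytes) -> None:
        self._items[identifier] = ArchivedItem(identifier, content)

    def withdraw(self, identifier: str, notice: str, when: date) -> None:
        # Record the publisher's notice and hide the item; keep the content.
        item = self._items[identifier]
        item.sequestered = True
        item.withdrawal_notice = notice
        item.withdrawn_on = when

    def holds(self, identifier: str) -> bool:
        # True as long as the archive still preserves the item.
        return identifier in self._items

    def reader_view(self, identifier: str) -> Optional[bytes]:
        # Readers see nothing for unknown or sequestered items.
        item = self._items.get(identifier)
        if item is None or item.sequestered:
            return None
        return item.content


if __name__ == "__main__":
    archive = Archive()
    archive.deposit("article-1", b"full text")
    archive.withdraw("article-1", "withdrawn by publisher", date(2002, 1, 15))
    print(archive.holds("article-1"))        # item is still preserved
    print(archive.reader_view("article-1"))  # but hidden from readers
```

Keeping the notice and date alongside the item means the archive can document why an item is no longer visible, which bears on the integrity question raised under "Integrity of the archive."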
The model license is still evolving and not yet ready for signature. However, there are no
identified points of contention — only points for further reflection and agreement on
wording. All the participants were very much pleased with the team's ability to come to
early understandings of licensing issues and to resolve some of these at the planning
stage. This success arises out of close working relationships and communications over
about a year-and-a-half of cooperative effort.
Archival Uses of Electronic Scientific Journals
As part of its work, the Yale-Elsevier team began to investigate whether and how the uses
of an archive of electronic journals would differ significantly from those of the active
product distributed by the publisher. This investigation was launched to help determine
what needed to be preserved and maintained in the archive; to inform the design of a
discovery, navigation, and presentation mechanism for materials in the archive; and to
determine the circumstances under which materials in the archive could be made
available for research use without compromising the publisher's commercial interests.
The group reviewed traditional archival theory and practice and began preliminary
consultations with historians of science and scholarly communication to understand past
and contemporary uses of scientific journal literature. A number of issues became
particularly significant in the group's discussions: the selection of documentation of long-
term significance, the importance of topological and structural relationships within the
content, and the importance of the archive as a guarantor of authenticity.
Selection and Appraisal
The first area in which there might be useful approaches is that of archival appraisal, i.e.,
the selection of those materials worth the resources needed for their long-term
preservation and ongoing access. Archival appraisal considers the continuing need of the
creating entity for documentation in order to carry out its mission and functions and to
maintain its legal and administrative accountability, as well as other potential uses for the
materials. These other uses generally fall into the category of support for historical
research, although there may be others such as establishing and proving the existence of
personal rights which may also be secondary to the original purpose of the documentation
in question.
Archivists also consider the context of the documentation as well as its content in
determining long-term significance. In some cases, the significance of the documentation
lies in the particular content that is recorded; the presentation of that content is not critical
to its usefulness or interpretation. The content of the documentation can be extracted, put
into other applications, and made to serve useful purposes even as it is divorced from its
original recording technology and form. In other cases, however, the role of
documentation as evidence requires that the original form of the document and
information about the circumstances under which it was created and used also be
preserved in order to establish and maintain its authenticity and usefulness.
With these selection approaches in mind, a number of issues arose in the e-journal
archiving context and in the work of the team. The first question was whether it was
sufficient for the archive to preserve and provide access to "just" the content of the
published material — primarily text and figures — in a standard format, which might or
might not be the format in which the publisher distributed the content. Preserving only
the content, insofar as that is possible, forgoes the preservation of any functionality that
controlled and facilitated the discovery, navigation, and presentation of the materials on
the assumption that functionality was of little or no long-term research interest. The
decision to preserve content only would eliminate the need to deal with changing display
formats, search mechanisms and indices, and linking capabilities.