Long-term archiving. It is the responsibility of libraries, archives and
museums to provide stewardship that endures over hardware and software
upgrades, organizational changes, and generations.
WHY IS ONLINE SCIENTIFIC DISCOURSE AT RISK?
Scholarly discourse and interaction among scientists and the public are rapidly
changing. The ephemeral nature of this online discussion leaves it at substantial risk
of being lost. Science blogging has become a major mode of scientific discourse.
The last ten years have seen significant growth in large science-focused blogging
communities and platforms. In this space, sites like ScienceBlogs, PLOSBlogs, and
Scientific American’s Blog Network are playing an important role in science
communication and may be prime targets for partnerships with digital preservation
organizations and other stakeholders. At the same time, many scientists are running
their own individual blogs, either through generic blogging platforms like
WordPress.com and Google’s Blogger service, or through their own content
management systems. These individual blogs present more complicated issues for
selection and preservation.
A range of other novel online modes of publication have emerged, and are
continuing to emerge, which require attention. Various projects for sharing
preprints of articles, like SSRN, RePEc, and arXiv.org, are already developing new
However, new models of publications, like the video
Journal of Visualized Experiments (JoVE), and science podcasts present non-textual
information. These digital objects present particular risks for loss because they are
not published through traditional library acquisition channels.
Citizen Science initiatives are engaging members of the public to participate in
data collection and interpretation. Much of the work of citizen science is evident in
the collected data and reported in scholarly literature. However, a considerable
amount of important work occurs in online forums and discussion spaces. That
information will likely be an important set of source material for understanding the
role that these systems have played in the history of science. For example, much of
the work involved in the discovery of a new kind of galaxy in the Galaxy Zoo
project’s web forums.
Much of the content that participants in the preserving online science summit thought
most valuable is also most at risk of loss because it does not clearly fall into the
existing collecting practices of libraries, museums and archives. Discussion forums
and a range of rather ephemeral websites offer considerable value as historical
records. As noted in Bora Zivkovic’s essay on science blogging, an outage on a
popular science blogging network last year underscores just how easy it would be
for a single point of failure to result in the loss of content documenting changes in
science communication, and a diverse collection of responses and reactions to

For example, see arXiv.org’s digital preservation plans with Cornell University’s Library.
Cardamone, C., Schawinski, K., Sarzi, M., Bamford, S. P., Bennert, N., Urry, C. M., Lintott, C., et al. (2009).
Galaxy Zoo Green Peas: discovery of a class of compact extremely star‐forming galaxies. Monthly
Notices of the Royal Astronomical Society, 399(3), 1191-1205. doi:10.1111/j.1365-2966.2009.15383.x
WHY IS ONLINE SCIENTIFIC DISCOURSE VALUABLE?
Below are three kinds of value the participants identified in this content. The list is
not meant to be exhaustive, but rather to serve as a starting point for explaining why this
web content is important.
The Record of Scientific Knowledge, Discovery, and Innovation:
Much of the history of science, technology, medicine, and mathematics is built from
primary records of scientific publication and unpublished materials of scientists.
Traditionally, material has been preserved through a combination of collecting the
personal papers of scientists and their published work in books and journal articles.
With the emergence of practices like open notebook science, science blogging, and
science discussion forums, a considerable amount of this content is being produced
and presented on the web. If we do not act to collect this contemporary material,
we may end up with more complete records of scientists’ unpublished notes and
personal communication from previous eras than we do from our own.
Relatedly, the emergence of citizen science projects has resulted in some discoveries
and advances in science happening on the open web. For example, the discovery
of the green pea galaxies occurred entirely on the discussion forums that
accompany the Galaxy Zoo website. The forums, where these kinds of discussions
occur, document the process and contributions of individuals in scientific discoveries.
Changes in Scientific and Scholarly Communication:
Aside from documenting the record of science and discovery, the new media of
blogs, websites, and forums are themselves documentation of significant changes
occurring in scholarly communication. Much as work on the history of the book
documents an array of changes in culture, the history of online communication media
is itself of considerable value in understanding science and scholarship in
contemporary society. In this respect, these sites are going to be of interest as
valuable primary sources in the history of technology, communications, and media.
Public Understanding and Perception of Science and Science Policy:
Conversations and reactions to science from members of the general public
represent one of the most exciting prospects for historians of the future to
understand science in our times. In particular, various controversies around topics
like evolution, vaccines, and climate change have stirred up an enormous amount of
online discussion. Records of these discussions will be invaluable for historians and
policy analysts for understanding and exploring public reactions and perspectives
on science. Furthermore, various pop-cultural developments that touch on science
topics (for example, videogames like Spore) are similarly likely to generate
substantive online discussion and offer potentially unique perspectives on science in
USE AND REUSABILITY IN COLLECTION DEVELOPMENT
Because the purpose of preservation is reuse, participants urged that data be as
well documented and standardized as possible. What those terms mean depends
very much upon the data and potential uses. Raw data, for example, should be in
standard formats to ease processing for pattern recognition, mining, simulation,
longitudinal studies, and so forth.
Participants also suggested collecting some samples of records of online scientific
discourse just in case: specifically, gathering data at scale and keeping them at
relatively low levels of curation to reduce the costs of cataloging
and description. This is recommended for data that seem relevant but may have no
short-term demand. For example, embracing an all-hands-on-deck approach to
documenting significant events, such as tsunamis, earthquakes, and hurricanes could
include lots of data in an archive for later analysis. It would be impossible to
predict exactly what future researchers will want access to. The Blue Ribbon Task
Force on Sustainable Digital Preservation and Access recommended the capture of
such data at a very low level of curation so that they may be discovered and
processed in the future if deemed desirable.
Simultaneously, there was consensus around the need to collect small, highly curated
topical collections of web content focused on ensuring long-term access to small
representative sets of material in which scientists and historians of science see long-
term value. The idea here would be to ensure high levels of quality assurance for
collected content and a strong curatorial role in organizing and arranging
collections as a point of entry into the much broader swath of content.
CALLS TO ACTION
As a result of the discussion at the summit, and the following essays, we suggest four
calls to action for cultural heritage organizations.
Call for Engaging, Assisting, and Supporting Content Creators:
The scientists and science communicators who participated in the summit were eager
to learn more about how they could help to manage and steward their content.
Eventually, the personal documents of scientists often make up special collections at
libraries and archives. There is considerable value in the cultural heritage
community creating guidance materials for managing personal digital information.
Specifically, reaching out to scientists and science communicators to help them
better steward their own content can help creators self-archive. The Library of
Congress provides personal archiving guidance to the general public that can be
customized and redistributed to a specific audience.
Sustainable Economics for a Digital Planet: Ensuring Long-term Access to Digital Information, p. 68
Call for Developing Relationships with Online Science Communities:
The organizations or communities that host or contribute to online science projects or
discourse must care for their assets in the near term. Cultural heritage institutions
have the mission and expertise to serve as long-term stewards. Relationships at the
institutional level can be built to give guidance on preservation practices during the
life of a project and advise on future curatorial homes for data when
organizational affiliations change.
Call for Targeted Web Archive Collections:
To meet the challenge of stewarding this content, we suggest cultural heritage
organizations begin to develop focused web archive collections related to their
particular institutional goals and needs. For example, an organization might build a
focused special collection on open notebook science, a collection focused on
controversies around vaccines, or a collection documenting the web presence of its own
scientists and science centers. Based on their own particular focuses, cultural heritage
organizations are uniquely positioned to identify and collect around particular themes
and topics that can collectively serve as part of a distributed national and international
online science collection. The case study of the U.S. National Library of Medicine’s
Health and Medicine Blogs collection provided in this report can serve as an exemplar.
An appendix also includes examples of the different kinds of special collections that
different cultural heritage organizations could develop.
Call for Outreach to Historians and Other Researchers:
Stewardship organizations must establish a user community which values the content
they are preserving. There is not yet substantive interest from historians of science
and other researchers in online scientific discourse. While researchers and scholars
of literature and the arts have been engaged in helping develop practices around
the collection and preservation of born-digital artwork and literature, there has not
been a similar reaction in the history of science community. Archivists, librarians, and
curators ought to reach out to historians of science and make them aware of the
born-digital primary resources that can be collected. Simply put, without
intervention, much of this online discourse is likely to disappear before historians of
science take an interest. Engaging professional organizations and associations for
these researchers will be a critical component in developing sound collection
approaches and policies.
The Historical Value of
Ephemeral Discussion of
FRED GIBBS, ASSISTANT PROFESSOR OF HISTORY AT
GEORGE MASON UNIVERSITY AND DIRECTOR OF
DIGITAL SCHOLARSHIP AT THE CENTER FOR HISTORY
AND NEW MEDIA
As librarians, curators, and archivists think more about archiving online science
content for future use, they are challenged to strike a practical balance between
the wealth of savable data on one hand, and the work required to make it into a
meaningful and accessible collection on the other. After all, content needs to be not
only gathered and stored, but also made useful and visible, a process that takes
substantial human work, even if heavy automation can aid in the process. This
challenge is often framed in terms of properly identifying what to collect, or
perhaps as a challenge in filtering the great mass of content from which one must
Needless to say, selection processes remain important. Even if one believes that
storage space is cheap, and simple file formats are likely to be available many
decades from now (as many already have been), content needs not only to be
collected and stored, but also to be made visible. The work of collecting,
organizing, as well as making visible and available is simply impossible given the
magnitude of digital material and increasingly limited resources to conduct these
This essay argues, from the point of view of a historian of science (and to some
extent of a digital historian), that librarians, curators, and archivists must address
the difficult value question of what content to save with three important but often
neglected considerations in mind: the varied audience for science content (e.g.,
scientists versus historians); the importance of collecting science content that departs
from what might be considered good or mainstream science; and the changing
nature of archival use.
Science at Risk summit participants agreed that it is helpful to think of three stages
of archival life: creation, near-term, and long-term. This tripartite scheme nicely
encompasses the varied challenges of: 1) collecting from diverse sources that
employ diverse technologies; 2) making such content available for
immediate research needs; and 3) preserving it for posterity and future reference.
In addition to this scheme, we also must consider the different audiences that will
benefit at those various stages. In the near term, other scientists and perhaps policy
makers will likely be the primary audience—and thus dictate near-term strategies
both in terms of what to collect and how it should be made visible and available. In
the long term, however, historians—especially historians of science—will benefit
most. Collection development should be made with both audiences in mind. While
there is substantial overlap in the kinds of materials that each group will be
interested in, there are significant differences that must factor into collection
The disciplinarily diverse audience and presenters attending the Science at Risk
summit showed how many participants are actively creating and curating online
science content according to their varied needs and interests. Summit presenters
associated with science blogging or citizen science projects, for example,
demonstrated their distinct interest in preserving discussions about current science
issues—whether from professional scientists or science enthusiasts—with their
content ranging widely across natural philosophical discussions, methodological
questions, historical essays, or arguments about what species of bird appears in a
particular photo. Open notebook enthusiasts demonstrated their interest in
preserving a narrow but deep view of science in action. There is no doubt that all
of these constitute sources worth saving. Such sources will be of use to scientists
(or civic scientists) struggling with similar problems; parts will be useful for historians
who want deeper insight into the messy processes of science that do not emerge
from official and polished publications.
Yet for these generators of online science content—as seemed true for many
participants at the summit—the emphasis of what was at risk leaned heavily
toward what the creators and managers of these resources, as well as those tasked
with archiving such sources, considered to be good science. There is no question that,
when considering the near-term use of scientists or future historical uses to learn
about mainstream science, archives of content from publications like science blogs
and open notebooks will prove to be fantastic and largely unprecedented
Longer-term archival materials, however, are useful to a rather different audience
that does not share the same agenda as many creators of online science content.
From a historian’s perspective, it would be deeply problematic for future research
if content selectors focused on preserving a narrow—and to some extent
arbitrary—selection of content that a particular set of insiders thought was good.
Of course it is true that historians' ability to understand and interpret the past will
continue to be mediated by the stewards of our cultural artifacts: librarians,
curators and archivists who, laboring under various practical constraints, must often
save what is or will be of obvious value. This value is often determined by the
context in which it is collected. Science content, then, is likely to be collected
because it reflects upon the activities of a recognized scientific community, and is
said to constitute good science.