of this sound file. Both versions are stored and at a later date this can be reassessed. The artist is informed
and consulted when reformatting of their work is necessary.
Metadata: The metadata schema for the collection was built locally by the project team and focuses on describing the relationships between artworks, defining the particular type of work, and how each work relates to other entities within the database. This is incorporated into the context of the OHRM interface. There is also technical documentation about formats and the conversions carried out on the files. No metadata manual or dictionary has been formalised to date, but standards have been developed. The National Library has funded the conversion of data to its metadata system.
Data Access, Authentication, Authorisation and Security: This is a web-based, open access collection freely available to the general public. Contributions to the collection are encouraged from all sound designers of public space in Australia.
PHASE TWO CONSULTATION
This project expressed interest in further consultation to identify and review sustainability issues around the long-term preservation of the collection and its records. This activity has commenced and will continue as part of the services provided by Information Services personnel.
The Digital Repository Coordinator contacted the NLA to commence processing for regular archiving of the Australian Sound Design Project website. The publisher’s copyright and disclaimer statement was provided, and the site is now being archived by PANDORA; the archived copy will be updated at the end of August 2006.
The Kidneyome project
The Melbourne Kidneyome project forms part of the Physiome Project, a worldwide collaboration of loosely connected research teams in New Zealand, Australia, France, the US, the UK and Denmark. The Physiome Commission of the International Union of Physiological Sciences (IUPS) provides leadership to the Physiome Project through its satellite and central meetings and through the University of Auckland's IUPS Physiome website.
The focus of this audit was Melbourne's role in the “eResearch Grid Environment of Distributed Kidney Models and Resources” project. This project aims to establish an interactive web interface, at the international level, to a collection of distributed legacy models covering all levels of kidney physiology, providing for each curated model: documentation, physiological context, easily interpreted output, a statement of model limitations, interactive exploration, and user customisation of selected parameter values. Interaction with the resource will be through a 3D virtual kidney graphical user interface (GUI).
Project team members consulted were:
Professor Peter Harris, Faculty IT Unit, Faculty of Medicine, Dentistry and Health Sciences
Dr Andrew Lonie, Department of Information Systems, Faculty of Science
The international collaboration partners are:
Dr Raj Buyya, Department of Computer Science and Software Engineering, University of Melbourne
Dr S. Randall Thomas, Informatiques, Biologie Intégrative et Systèmes Complexes, Université d'Évry Val d'Essonne, France
Department of Mathematics, Duke University, North Carolina, US
Department of Physiology and Biophysics, SUNY Health Sciences Center, Stony Brook, New York, US
Bioengineering Institute, Auckland University, New Zealand
Cornell University Medical College, Department of Medicine, New York, US
Institute of Medical Physiology, University of Copenhagen, Denmark
VPAC and APAC
(Professor Peter Harris – Department of Physiology, University of Melbourne; Dr Andrew Lonie – Department of Information Systems, University of Melbourne. Information taken from project proposal: D3 Outline of proposed initiative, provided by project team.)
Funding sources are external grants via the collaboration.
Data Management Processes
Data Acquisition: The data are currently acquired from within the collaboration. Data generated by synchrotron experiments or CT scanners are integrated with simulated models developed by the collaboration and re-used by research partners. Acquisition occurs via the Grid interface (using Globus). This grid infrastructure (illustrated below) is being developed as part of the collaboration, and identifies how data will be stored, distributed and accessed.
[Figure: Grid architecture for coupling distributed kidney models, showing mirrored servers with metadata at Evry, France. © 2006 Kidneyome project.]
IP/Copyright of Data and scholarly output: The ownership of the data and its IP remains a matter of discussion within the international collaboration, particularly as it relates to the larger Physiome Project. (The diagram above was taken from the project proposal provided by the project team.)
Locally processed data are Melbourne IP as such, but the new models are made available to the collaboration. It is expected that this contribution will be acknowledged and that the IP will remain with the creator while being accessible to other researchers. Raw data are accessed across the collaboration.
Data Quantities: The amount of data currently held is variable across the collaboration.
The Melbourne project maintains a small amount of data that is mostly derived/post-processed anatomical
data (<1MB – text files) but it is projected that this will increase as the project progresses over the next 2-3
years to around 100GB.
It is the raw data that are important to keep long term, and these are currently maintained by other members of the collaboration. These data are produced from ‘one-off’, never-to-be-repeated experiments in the synchrotron, and a single experiment produces a large volume of data. It is estimated that these holdings will be in the vicinity of 500GB by the project's mid-to-late stages.
Data storage and Backup: This is a collection of national and international significance (for researchers, teaching and practitioners in clinical practice) and will continue to grow as the Kidneyome and Physiome projects continue to plot and model all body systems. The Quantitative Kidney Database (QKDB), which will be mirrored in Australia (as per the diagram above), is public access, with approximately 1,000 entries in the database; this number is expected to grow very quickly. The database also contains references and comments relating to scholarly works, accessible via interrogation of the system. The QKDB is hosted and maintained offshore, based at LaMI, Evry University, France.
Current needs for this project are small, but these will grow rapidly as the data continue to grow. Current resources are not sufficient for the needs of the project and are in effect the researchers’ departmental allocations. Data will need to be housed elsewhere, and some of the data could be maintained offline. A data centre/storage facility, either within the institution or off site, would meet the projected needs of the project.
Locally produced data are maintained on the Faculty servers, including a backup facility. Despite a fundamental confidence that the data are well managed by these departmental processes, there are no specific records of the processes themselves. General information provided included:
Data are maintained on the Departmental server (Information Systems, Science). It is assumed that this server is under a standard backup protocol, but the researcher was not aware of the specifics of this process. There is also an expectation that data are backed up on tape, but how readily accessible these data would be if needed is unclear.
Researchers maintain a single CD backup, created at the time the file is originally created. There is no maintenance or checking of these CDs.
No information is available regarding offsite backup of the derived data.
Raw data are mostly generated elsewhere in the collaboration and are therefore available via the Grid network. This would not be the case for raw data generated and maintained by the Melbourne project.
It is unclear what impact a system breakdown would have on project workflow, or what the delays and turnaround would be in the case of disaster recovery.
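The unchecked CD backups noted above illustrate a recoverable risk: a periodic fixity check against a stored manifest of checksums would reveal silent corruption before the data are needed. The sketch below is illustrative only (the function and manifest shape are not part of the project's tooling, just one minimal way such a check could be implemented):

```python
# Minimal fixity-check sketch (illustrative, not project tooling):
# record SHA-256 digests of data files in a manifest, then re-check later.
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the names of files whose current digest differs from the manifest."""
    return [name for name, digest in manifest.items()
            if sha256_of(root / name) != digest]
```

Run against backup media on a schedule, a non-empty result from `verify` flags files to restore from another copy before the original is lost.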
Data formats: Open standard formats, including tools and applications produced locally by the collaboration, are used in this project. Most of the data are in simple binary formats such as MicroCT output. CellML (XML-based) is used to store and exchange computer-based mathematical models. A variety of open source freeware is used for analysis/post-processing of data, e.g. when rendering data into a 3D image for presentation.
Search accessed at: http://www.lami.univ-evry.fr/~srthomas/qkdb/query/query_form.php
Le Laboratoire de Méthodes Informatiques (LaMI) – Data Processing Research Centre: http://www.lami.univ-
Metadata: The current project is looking at the development of standardised schemas for classifying information (essentially metadata schemas) for this community. VPAC is working with the project team on the ontology generation and taxonomy side of this. It is clear from the project's goals that the taxonomy around the kidney models themselves will focus on:
A statement of model limitations
Interactive exploration capabilities
Customisation capabilities for selected parameter values.
The global community has metadata schemas around some of the formats used to store and transfer data. CellML includes mathematics and metadata by leveraging existing languages, including MathML and RDF. FieldML (XML-based) is also used. The international Physiome collaboration is currently working on the ontology for the project, looking at the different perspectives of the hierarchy, including those of the National Library of Medicine (NIH).
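Because CellML is plain XML, even generic tooling can inspect a model's structure. As a hedged illustration (the model fragment below is invented for the example; real kidney models are far larger and also carry MathML equations and RDF metadata), the Python standard library is enough to list a model's components and variables:

```python
# Illustrative sketch: read component and variable names from a
# CellML-like XML fragment. The fragment is invented for this example;
# the namespace URI follows the CellML 1.0 convention.
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.0#"

fragment = """\
<model name="nephron_transport" xmlns="http://www.cellml.org/cellml/1.0#">
  <component name="proximal_tubule">
    <variable name="Na_flux" units="flux_units"/>
    <variable name="time" units="second"/>
  </component>
</model>
"""


def list_variables(xml_text: str) -> dict[str, list[str]]:
    """Map each component name to its declared variable names."""
    root = ET.fromstring(xml_text)
    return {
        comp.get("name"): [v.get("name")
                           for v in comp.findall(f"{{{CELLML_NS}}}variable")]
        for comp in root.findall(f"{{{CELLML_NS}}}component")
    }
```

This generic readability is one reason XML-based formats such as CellML and FieldML suit long-term exchange across a distributed collaboration.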
Data Access, Authentication, Authorisation and Security: Distributed data are accessed via the Grid, a closed environment available only to researchers within the collaboration. Authentication is via Globus. The aim is to make the data, particularly the modelling data, widely available for research and for practitioners in the clinical context.
The QKDB is open access, with any user able to submit a query to interrogate the database. Submitting data into the QKDB requires the user to be an acknowledged scientist working in kidney research or a related field. An authentication process occurs at the website: http://www.lami.univ- and is managed in France.
Query form is located at: http://www.lami.univ-evry.fr/~srthomas/qkdb/query/query_form.php
3.2 Researcher Capabilities and Expertise
The project identified a number of capabilities across the projects audited.
Structured, collaborative data acquisition processes requiring integration of diverse data –
AUSTEHC, PARADISEC, MMIM, ICCR-Education
Grid technologies – Experimental Particle Physics, Dr Raj Buyya (Department of Computer
Science and Software Engineering)
Large dataset management - Experimental Particle Physics, Astrophysics
Digitisation, archiving and preservation of print and multimedia data - PARADISEC and
HDMS – Cultural collection management tool, locally developed and supported by AUSTEHC
OHRM – Web publishing tool for cultural collections, locally developed and supported by AUSTEHC
Distributed virtual database/repository framework – MMIM
Database management – HILDA, MMIM
Video Analysis Research – particularly with StudioCode software - ICCR-Education
It became apparent during the project that there is limited opportunity to access information about the research expertise that exists across the university. Much of this exchange of information occurs by accident, often at social gatherings or via loose networks among colleagues.
3.3 Sustainability considerations
Each project has independently chosen its data and metadata formats and handling procedures. This has resulted in a variety of commercial and open source formats and software being used and, consequently, few options for sharing expertise at the technical level.
There is frequent loss, or threatened loss, of technical expertise due to a project-based funding model which does not include sustainability considerations.
An unmet need for access to expertise, information and/or technology solutions was raised by six of the eleven groups.
IP/Copyright of the raw data: ownership varied across the groups. Ownership of data carries with it the onus to maintain the data.
Six groups stated it belonged to the researcher/contributor of the data/item.
One group stated it belonged to the global collaboration.
One group stated it belonged to the instrument facility (observatory).
One group stated it belonged to the research project partners.
One group stated it belonged jointly to the researcher and the participant/patient.
One group stated it belonged to the Australian government.
There are a variety of locally produced, national and international standards in use.
The quality of metadata across groups was variable.
Biomedical groups, in particular, have underdeveloped metadata schemas/ontologies for their data; these are mostly under development in collaboration with international bodies.
Data are accessed in a number of ways across the groups audited. Most have or are developing distributed
models of data presentation and storage to improve access. One group has data security requirements that
prohibit electronic transfer of data.
Research data versions are not stored by all communities. The onus remains with researchers accessing data for scholarly work to maintain their own copy of the data version that has been used for their work.
Authentication and Authorisation:
Groups have varying methods for authorising users to access data, ranging from nothing at all (public anonymous access) to requiring a legally binding contract regarding data access, use and storage.
Three groups have their data accessible via a publicly available website.
Two groups provided some access to their data via a public website but required authentication to
access the data itself.
Six groups had closed collections available only to project partners or researchers on
Data Storage Issues
Storage needs varied across groups. The current and projected quantities of data varied widely and the
projected storage requirements over the next ten years for these groups will be in the vicinity of 600+TB
with the two physics groups requiring the bulk of this resource. All groups are currently doing some data management; however, documentation of how the data is or should be managed throughout its life cycle tends to be poorly assembled. Eight groups identified the need for well managed data storage facilities, particularly for their offsite backup needs and for long-term preservation. Disaster recovery planning is mostly ad hoc, suggesting a reliance on the faith that backed-up data will be accessible and that project workflow will not be greatly affected should disaster strike.
Three projects have their data managed offsite by an external store, two by APAC and one
Five projects use Faculty servers, and
Five projects manage their own server for storage (some in addition to using Faculty resources).
Ten of the eleven groups stated the desire to store and preserve some of their data indefinitely. Eight of
these projects do not have specific strategies in place for this preservation.
Sustainability Risk Factors
Backup and disaster recovery protocols are not well documented. Four projects appear to be taking some level of risk with their current practices for some aspect of dataset management, e.g. no managed offsite backup.
Four projects identified a concerning lack of financial sustainability under short-term project funding.
4. Discussion and recommendations
In addition to the specific findings for each group audited, the project findings also provide information
about more general sustainability of data management practices. Meeting the needs of the researchers interviewed will take resources, and at present much is left to the academic department, often leading either to no or limited action or to 'reinventing the wheel', resulting in a less than efficient institutional response to eResearcher needs.
These findings point to a number of issues that can help to inform an e-research strategy for the university.
Eight recommendations have been formulated for consideration by key stakeholders.
4.1 The importance of an institution-wide strategy for eResearch.
The findings from this project reinforce the work of Professor Geoff Taylor, Ms Linda O’Brien and the
eResearch Advisory Group, identifying the need for an institution-wide strategy to progress and manage
eResearch engagement and support. In particular, the findings demonstrate that when it comes to digital
data management, there is variable capability among our research communities to comply with the
University’s Policy on the Management of Research Data and Records and the (consultation draft) Australian Code for the Responsible Conduct of Research. Data management, including access, discovery and storage, must be a fundamental component of such an institution-wide strategy. A broad eResearch strategy can also position the University to meet the challenges of the Research Quality and Research Accessibility Frameworks.
Recommendations three to eight below provide some of the essentials for such strategic planning. The Research and Research Training Committee (R&RT) would provide the governance for enabling its implementation.
Recommendation 1 – That the University develops a strategy that broadly addresses the policy, infrastructure, support and training needs of eResearch.
Recommendation 2 – That the University’s R&RT Committee consider forming a subcommittee to provide governance for enabling eResearch at the university. This committee should have broad representation and include Information Services and
4.2 A lack of information policies and guidelines
There is a lack of best-practice guidelines and policy statements available to support researchers with their data management decision making. The lack of shared language and terminology around many aspects of data and its management suggests that all policies and guidelines should include clear definitions of the concepts and terms used.
Areas of need include:
Implementation of research record keeping principles and requirements.
Data management for short term sustainability and long term preservation.
Metadata standards, principles and systems:
o Across the discipline divide.
o For raw and processed research data.
o For web presentations.
o For other scholarly works.
Authentication and authorisation standards and systems for access and storage of scholarly IP.
Recommendation 3 – That Information Services initiate a consultative process for the development of appropriate guidelines and, where relevant, policy statements, to support researchers with the management of their research data and records.
4.3 Absence of a coordinated data management infrastructure for research.
The findings suggest a need for centrally supported flexible data management, authentication and access
systems. Groups audited were found to be managing their own data and developing their own access and
presentation systems. The need was also identified by several groups for managed data storage facilities.
Groups are supporting a variety of software. Group needs around authentication and access differed, requiring a variety of public, local, national and international collaborator access. The need for data management capabilities that are internationally interoperable, allowing local storage and collections to federate internationally, was highlighted. There will also be a need to promote among the University
research community our capacity for digital data management. This emphasis on developing and
marketing ‘platforms for collaboration’ through ICT within and across institutions is a key aspect of the
National Collaborative Research Infrastructure Strategy (see capability area 16).
4.3.1 A case for centrally supported data management, authentication and access systems
A centrally managed data storage and access facility would provide a secure, backed up and sustainable
repository for data. This would allow the groups to concentrate on research rather than technology and
allow for data to be preserved beyond the life of the particular project. It would also encourage more
standardisation, as groups would find it easier to choose software and standards similar to those of other groups (group wisdom), leading to a consolidation of expertise. A centrally supported system could also act as a
base or starting point for those research groups with no existing data infrastructure, reducing the need for
group level development efforts and leveraging central support and expertise. The institution (campus) has
been identified as “a logical nexus for the development of cyberinfrastructure … and that it is worth
considering a holistic view that would promote larger, sharable, campus systems”.
4.3.2 The need for flexible infrastructure
Centrally supported infrastructure must accommodate the realities of the global collaborations of many of
our eResearch communities. An authentication and access capability must allow for public, local, national
and international collaborator access. Ideally, data management capabilities would be internationally
interoperable and allow local storage and collections to federate internationally. It is recognised that it may
not be possible for all research groups to take advantage of a centrally supported system as research
domains and collaborations may dictate standards and software usage.
Central services need to concentrate on providing the base technical support of systems and facilities, allowing users the freedom to use these however they wish. Ideally, a central service would act as a utility, with no interest in what the user does with the service.
Recommendation 4 – To review ICT infrastructure for research, paying urgent attention
to data management infrastructure.
Workshop funded by the National Science Foundation (NSF-US) to consider effective approaches for campus research
cyberinfrastructure. Workshop report: http://middleware.internet2.edu/crcc/docs/internet2-crcc-report-200607.html
4.4 Capabilities needed by eResearchers
The audit identified expertise used in the conduct of eResearch across a variety of disciplines. The
findings show that an eResearch consultation service needs to include at a minimum, information and
access to expertise in:
Middleware development, management and support
o Data management systems
o Grid and other distributed systems
o Authentication and Authorisation management
XML advice and expertise
Metadata advice: metadata systems, schema and taxonomy development
Curation and Preservation advice and support for raw data and scholarly output
o Business case development advice and support
o Discipline based advice and support around sustainable data format selection
o Obsolescence planning – knowing what to keep and why and what to delete
Recommendation 5 – To establish a structured consultation process for eResearch support
4.5 Difficulty accessing information about eResearch activity and capability
This project has identified problems with access to information about eResearch activity, capability and support, with much information exchange occurring fortuitously. It is recommended that an information exchange strategy be established to increase the dissemination of information about support for eResearchers. A springboard to this process could be the delivery of an E-Research Expo in December 2006 to showcase university-wide activity in eResearch.
Recommendation 6 – To establish an Information Exchange Strategy around eResearch
As part of the information exchange strategy, a registry of research capability across the university would facilitate the dissemination of this information. The feasibility of linking such a registry to the Themis Research Management System should also be established, minimising the need for duplicate data entry by our researchers.
Recommendation 7 – To establish a Registry of eResearch expertise
4.6 Implications for education and training
The skill set for researchers is evolving, and some consideration should be given to identifying which of the skills associated with eResearch might be considered part of the essential generic skill set for trainee researchers, which might be discipline-specific, and which might remain in the domain of expert service providers.
It is considered that much of the expertise listed in 4.4 is not fundamental to all research disciplines and is therefore inappropriate for broad, in-depth education and research training. However, as research practices are rapidly adopting information and communications technology (ICT), researchers should be made aware of the services and expertise available to them locally, nationally and globally. An awareness and basic understanding of research data policies, responsibilities, collections, curation, preservation, copyright/IP, metadata and standards must be included in researcher and postgraduate induction programs and reinforced throughout candidature. An essential part of such a training program would include information about the terminology and underlying principles for managing data.