Think carefully about how best to structure files in folders, so that files and versions are easy to locate and organise. When working in collaboration, the need for an orderly structure is even greater.
Consider the best hierarchy for files, deciding whether a deep or shallow hierarchy is preferable. Files can be organised in folders according to type of data (databases, text, images, models, sound), research activity (interviews, surveys, focus groups), or material (data, documentation, publications).
QUALITY ASSURANCE
Quality control of data is an integral part of all research
and takes place at various stages: during data collection,
data entry or digitisation, and data checking. It is
important to assign clear roles and responsibilities for data
quality assurance at all stages of research and to develop
suitable procedures before data gathering starts.
During data collection, researchers must ensure that the
data recorded reflect the actual facts, responses,
observations and events.
Quality control measures during data collection may
include:
• calibration of instruments to check the precision, bias
and/or scale of measurement
• taking multiple measurements, observations or samples
• checking the truth of the record with an expert
• using standardised methods and protocols for capturing
observations, alongside recording forms with clear
instructions
• using computer-assisted interview software to standardise interviews, verify response consistency, route and customise questions so that only appropriate questions are asked, confirm responses against previous answers where appropriate, and detect inadmissible responses (a minimal sketch of such checks follows this list)
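For illustration, the sketch below shows the kind of consistency, routing and admissibility checks such software can apply; the field names, code list and interview year are assumptions made for this example, not features of any particular package.

```python
# Hypothetical sketch of checks applied during computer-assisted interviewing;
# variable names and rules are illustrative only.

ADMISSIBLE_EMPLOYMENT = {"employed", "unemployed", "retired", "student"}

def check_response(record, interview_year=2011):
    """Return a list of problems found in a single interview record."""
    problems = []

    # Inadmissible response: value outside the allowed code list.
    if record.get("employment_status") not in ADMISSIBLE_EMPLOYMENT:
        problems.append("employment_status not in admissible code list")

    # Consistency with a previous answer: reported age vs. year of birth.
    age = record.get("age")
    birth_year = record.get("birth_year")
    if age is not None and birth_year is not None:
        if abs((interview_year - birth_year) - age) > 1:
            problems.append("age inconsistent with year of birth")

    # Routing: an employer should only be recorded for employed respondents.
    if record.get("employment_status") != "employed" and record.get("employer"):
        problems.append("employer recorded but respondent is not employed")

    return problems

print(check_response({"employment_status": "retried",  # typo, so inadmissible
                      "age": 44, "birth_year": 1950}))
```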
The quality of the data collection methods used strongly influences data quality, and documenting in detail how data are collected provides evidence of that quality.
When data are digitised, transcribed, entered in a database or spreadsheet, or coded, quality is ensured and errors are avoided by using standardised and consistent procedures with clear instructions. These may include:
• setting up validation rules or input masks in data entry software (see the sketch after this list)
• using data entry screens
• using controlled vocabularies, code lists and choice lists
to minimise manual data entry
• detailed labelling of variable and record names to avoid
confusion
• designing a purpose-built database structure to
organise data and data files
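As an illustration of these entry-time checks, the following sketch combines a controlled code list, a range-based validation rule and an input mask; the variable names, codes and postcode pattern are assumptions made for this example.

```python
# Illustrative data-entry validation rules of the kind that can be configured
# in data entry software; code list, range and mask are assumptions.
import re

SEX_CODES = {"1": "male", "2": "female", "9": "not stated"}          # controlled code list
POSTCODE_MASK = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")    # simple UK-style input mask

def validate_entry(sex_code, height_cm, postcode):
    """Return a list of validation errors for one keyed record."""
    errors = []
    if sex_code not in SEX_CODES:                        # code list check
        errors.append(f"sex code '{sex_code}' not in code list")
    if not (50 <= height_cm <= 250):                     # range (validation) rule
        errors.append(f"height {height_cm} cm out of range 50-250")
    if not POSTCODE_MASK.match(postcode.upper()):        # input mask check
        errors.append(f"postcode '{postcode}' does not match input mask")
    return errors

print(validate_entry("3", 999, "XX99"))   # all three checks fail for this record
```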
During data checking, data are edited, cleaned, verified,
cross-checked and validated.
Checking typically involves both automated and manual
procedures. These may include:
• double-checking coding of observations or responses
and out-of-range values
• checking data completeness
• verifying random samples of the digital data against the
original data
• double entry of data
• statistical analyses such as frequencies, means, ranges or clustering to detect errors and anomalous values (see the sketch after this list)
• peer review
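The sketch below illustrates two of these automated checks, comparing double-entered values and screening for out-of-range values; the case identifiers, measurements and plausible range are assumptions made for the example.

```python
# Sketch of automated data-checking steps: double-entry comparison and a
# simple range screen. Values are hypothetical heights in cm, keyed twice.
entry_1 = {"case01": 172, "case02": 168, "case03": 181, "case04": 17}
entry_2 = {"case01": 172, "case02": 186, "case03": 181, "case04": 170}

# Double entry: any case where the two keyed values differ needs review.
mismatches = {k: (entry_1[k], entry_2[k])
              for k in entry_1 if entry_1[k] != entry_2[k]}
print("double-entry mismatches:", mismatches)

# Range check: report the observed range and flag implausible values.
values = list(entry_1.values())
print("observed range:", min(values), "-", max(values))
expected_low, expected_high = 100, 220   # plausible height range in cm (assumption)
out_of_range = [k for k, v in entry_1.items()
                if not (expected_low <= v <= expected_high)]
print("out-of-range values:", out_of_range)
```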
Researchers can add significant value to their data by
including additional variables or parameters that widen
the possible applications. Including standard parameters
or generic derived variables in data files may substantially
increase the potential re-use value of data and provide
new avenues for research. For example, geo-referencing
data may allow other researchers to add value to data
more easily and apply the data in geographical
information systems. Equally, sharing field notes from an
interviewing project can help enrich the research context.
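As a simplified illustration of adding such a derived geographic variable, the sketch below links respondents' postcodes to a rural/urban classification so the classification travels with the data; the lookup values here are invented, whereas a real study would draw them from a product such as the National Statistics Postcode Directory.

```python
# Sketch: deriving a rural/urban variable from postcodes. The lookup table
# and respondent records are illustrative assumptions.
rural_urban_lookup = {
    "YO62 4LB": "rural village",
    "LS1 4AP":  "urban >10k",
}

respondents = [
    {"id": "r001", "postcode": "YO62 4LB"},
    {"id": "r002", "postcode": "LS1 4AP"},
]

for r in respondents:
    # Add the derived variable; unmatched postcodes are marked explicitly.
    r["rural_urban"] = rural_urban_lookup.get(r["postcode"], "not matched")

print(respondents)
```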
VERSION CONTROL AND AUTHENTICITY
A version is a file that is closely related to another file in terms of its content. It is important to ensure that different versions of files, related files held in different locations, and information that is cross-referenced between files are all subject to version control. After some time has elapsed, it can be difficult to locate the correct version or to know how versions differ.23
A suitable version control strategy depends on whether files are used by a single user or by multiple users, in one or in multiple locations, and whether versions across users or locations need to be synchronised.
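One possible supporting measure, sketched below, is to keep a simple version log that records a checksum for each saved version, so that copies held in different locations can be compared and their authenticity verified; the file names and log format are illustrative assumptions rather than a prescribed method.

```python
# Sketch: record a checksum per file version so copies can be compared later.
import hashlib
import pathlib

def file_checksum(path):
    """MD5 checksum of a file, used to tell identical copies from changed ones."""
    return hashlib.md5(pathlib.Path(path).read_bytes()).hexdigest()

def log_version(path, version, logfile="version_log.txt"):
    """Append the file name, version label and checksum to a plain-text version log."""
    with open(logfile, "a") as log:
        log.write(f"{path}\t{version}\t{file_checksum(path)}\n")

# Example usage (hypothetical file name):
# log_version("interview_transcripts_v02.docx", "v02")
```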
CASE STUDY: ADDING VALUE TO DATA
The Commission for Rural Communities (CRC) often uses existing survey data to undertake rural and urban analysis of national-scale data when examining policies related to deprivation.
To undertake this type of spatial analysis, original postcodes need to be accessed and retrospectively recoded according to the type of rural or urban settlement they fall into. This can be done using products such as the National Statistics Postcode Directory, which contains a classification of rural and urban settlements in England.
Applying these geographical markers to datasets can be a long and sometimes unfruitful process: the CRC may work through it only to find that the data do not have a representative rural sample frame.
If rural and urban settlement markers, such as the Rural/Urban Definition for England and Wales, were included in datasets, this would be of great benefit to those undertaking rural and urban analysis.22