Storage

I have terabytes of videotaped interviews from a European project, dozens of pseudonymised transcripts and informed consent forms. European partners need access to the files for data analysis. What's the best storage strategy for me?

Type of data

  • The data which were collected are personal data.
  • Extra security measures to protect it should be in place (see 'Security').

Storage needs

  • High storage capacity for videos required.
  • Remote access to videos and transcripts required.
  • Researchers need to work on the same files simultaneously.

Storage solution

  • Data are transmitted only in encrypted form (see 'Security').
  • Data for remote access is stored in cloud storage in Europe (see 'Storage').
  • Master copies of videos and transcripts are encrypted and backed up in the cloud and on portable hard disk and flash drives (see 'Security').
  • Backups locked away in different, secure locations (see 'Backup').
  • Consent forms and encryption keys are stored in a secure safe.

When choosing a suitable storage solution to fit your project's needs, a lot of questions need answering. For example:

  • How much storage space do I need?
  • Who needs access?
  • What precautions should I take to protect my data against loss?
  • Which storage solutions are suitable for personal data?

Determining your storage needs and selecting solutions accordingly is an important part of data management planning. The 'Adapt your DMP' section covers the questions that need answering in more detail.
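The question of how much storage space is needed can be approached with simple arithmetic. The following is a minimal sketch, not a definitive method: the function name, the bitrate, the number of copies and the 20% headroom are all illustrative assumptions and should be replaced with the figures of your own project.

```python
def estimate_storage_gb(hours_of_video, video_bitrate_mbps, copies=3, overhead=0.2):
    """Rough storage estimate in gigabytes (1 GB = 1000 MB).

    hours_of_video     -- total recording time in hours
    video_bitrate_mbps -- average video bitrate in megabits per second (assumed)
    copies             -- master copy plus backups (assumed: 3 in total)
    overhead           -- headroom for transcripts, working files, etc. (assumed: 20%)
    """
    seconds = hours_of_video * 3600
    # megabits -> megabytes (divide by 8) -> gigabytes (divide by 1000)
    one_copy_gb = seconds * video_bitrate_mbps / 8 / 1000
    return one_copy_gb * copies * (1 + overhead)


if __name__ == "__main__":
    # Hypothetical project: 200 hours of interviews recorded at 25 Mbit/s.
    total = estimate_storage_gb(hours_of_video=200, video_bitrate_mbps=25)
    print(f"Estimated storage: {total:.0f} GB")
```

With these assumed figures, a single copy takes 2,250 GB and the total comes to roughly 8 TB, which shows how quickly backups multiply the space required.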

Storage solutions overview

The following overview compares different storage solutions. Which one is suitable depends on factors such as data sensitivity, ease of access, file size and overall data volume. For each solution, the advantages and disadvantages are listed together with the precautions you should take when working with (sensitive) personal data. Each solution closes with recommendations on what to look out for if you decide to use it.

Portable devices

Laptops, tablets, external hard drives, flash drives and Compact Discs

Advantages

  • Allow easy transport of data and files without transmitting them over the Internet. This can be especially helpful when working in the field.
  • Low-cost solution.

Disadvantages/Risks

  • Easily lost, damaged, or stolen and may, therefore, pose an unnecessary security risk.
  • Not robust for long-term storage or master copies of your data and files.
  • Possible quality control issues due to version confusion.

Precautions for (sensitive) personal data

  • Use in combination with encryption and strong password protection.

Recommendations

  • Do: use for temporary, short-term storage for non-sensitive data, e.g. in the field or to transport data and files when online transmission is not possible.
  • Do: use in combination with encryption and strong password protection, especially if working with sensitive information (see 'Security').
  • Do: conduct regular checks to ensure your device is working and that files are accessible.
  • Don’t: use for long-term storage or master copies of your data and files.
  • The General Data Protection Regulation (GDPR) restricts the storage of personal data outside the EU. It is generally only permitted if:

    • Participants consent to the data being stored in another country (this needs to be real consent, i.e. a true choice); or
    • Adequate and equivalent levels of data protection are in place (e.g. under an adequacy decision of the European Commission; the EU-US Privacy Shield agreement formerly served this purpose but was invalidated by the Court of Justice of the European Union in 2020).

    However, researchers should assess whether they really need to store the data abroad. If data do need to be stored outside the EU, information sheets and consent forms should clearly state this and explain why it is necessary (see 'Informed consent').

    Further guidance on sharing data outside the European Economic Area (EEA) is available from the Information Commissioner's Office (ICO).

Cloud storage options

E.g. Google Drive, OneDrive, Dropbox, a university’s ownCloud, Open Science Framework and Tresorit

Advantages

  • Automatic backups.
  • Often automatic version control.

Disadvantages/Risks

  • Not all cloud services are secure. May not be suitable for sensitive data containing personal information about EU citizens.
  • Insufficient control over where the data is stored and how often it is backed up.
  • Free services by commercial providers (e.g. Google Drive, Dropbox) may claim the right to use content you manage and share for their own purposes.
  • Data can be lost if your account is suspended or accidentally deleted, or if the provider goes out of business.

Precautions for (sensitive) personal data

  • Encrypt all (sensitive) personal data before uploading it to the cloud. This is particularly important to avoid conflict with European data protection regulations if you do not know in which countries servers used for storage and backup are located (see 'Security' for more information on encryption; also see 'Protecting data').

Recommendations

  • Do: use cloud services for granting shared, remote and easy access to data and other files to all involved in the project.
  • Do: read the terms of service. Pay particular attention to the rights to use your content that are granted to the service provider.
  • Do: opt for European, national, or institutional cloud services which store data in Europe if possible.
    • B2drop (EUdat, n.d.) is an example of a European cloud storage solution.
    • SWITCHdrive (SWITCH, 2017) is a Swiss solution.
    • DataverseNL (Data Archiving and Networked Services, 2017) is an example of a service for Dutch researchers that allows the storage and sharing of data both during and after the research period.
  • Don't: make this your only storage and backup solution.
  • Don't: use for unencrypted (sensitive) personal data.

Local storage

Desktop computers and personal laptops

Advantages

  • Full control over files.
  • May be easier to protect against unauthorised access.

Disadvantages/Risks

  • If data and files are stored on only one device, they are vulnerable to loss, e.g. if the device has a malfunction, is stolen or files are overwritten/erased due to human error.
  • Only the person who has access to the computer can access the data and files.

Precautions for (sensitive) personal data

  • Protect the computer with a password and consider encrypting the hard drive.

Recommendations

Using desktop computers and personal laptops as the primary way of storing and accessing data and files is only suitable for projects involving very few people (ideally: only yourself) and where data and files will not have to be moved back and forth between personal computers frequently.

If you plan to work on the data on different (local) workstations, e.g. with your laptop at home and the desktop in the office:

  • Do: make sure that you always work on the most current version of your files, for example with the help of versioning software or version control guidelines (see 'Data authenticity, versions and editions').
  • Do: make sure that the most current version is always backed up (see 'Backup').
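When the same file exists on two workstations, a quick comparison can help to spot diverging copies before you start working. The sketch below is only an illustration: the function name `newer_copy` is hypothetical, and using the modification time to decide which copy is "more current" is an assumption that fails if clocks differ or files were copied without preserving timestamps. It does not replace versioning software or version control guidelines.

```python
import filecmp
import os


def newer_copy(path_a, path_b):
    """Compare two copies of a file.

    Returns None if the contents are byte-for-byte identical,
    otherwise the path with the more recent modification time
    (path_a if both modification times are equal).
    """
    # shallow=False forces a comparison of the actual file contents
    if filecmp.cmp(path_a, path_b, shallow=False):
        return None
    if os.path.getmtime(path_a) >= os.path.getmtime(path_b):
        return path_a
    return path_b
```

A call such as `newer_copy("laptop/transcript_01.txt", "office/transcript_01.txt")` (both paths are placeholders) returns `None` when the copies match, which is the state you want to see before editing.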

Networked drives

Shared drives on university servers or NAS servers (Network Attached Storage)

Advantages

  • Data and files are centrally stored.
  • Shared access, remote access for everyone involved in the project possible.
  • Backups can be centrally managed and automated.

Disadvantages/Risks

  • Higher security precautions are required to prevent unauthorised access and the accidental deletion or manipulation of data and files.
  • Access for external project partners can be difficult or impossible.
  • Higher cost.

Precautions for (sensitive) personal data

  • Use in combination with a suitable security strategy to protect data against unauthorised access.

Recommendations

  • Do: use for distributed collaborative projects involving many people who need access to data and files.
  • Do: use in combination with a suitable security strategy to protect data and files against unauthorised access (see 'Security').
  • Do: use in combination with strict versioning rules (see 'Data authenticity, versions and editions').
  • Do: think about long-term archival solutions for data that is complete and has been analysed. Valuable storage space might be released in this way.
  • Do: work with rights and permissions to ensure that not everyone has access to everything if this isn’t required (e.g. access to master files more restricted than access to working files).
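As an illustration of restricting access to master files, the following minimal sketch removes write permission from all files in a folder so that they cannot be overwritten by accident. It is an assumption-laden example rather than a recommended procedure: on shared university drives, rights are usually managed by the IT department through user groups, and the function name `make_read_only` and any folder name passed to it are hypothetical.

```python
import stat
from pathlib import Path


def make_read_only(directory):
    """Remove write permission from every file under `directory`.

    Returns the list of files whose permissions were changed.
    """
    changed = []
    for path in Path(directory).rglob("*"):
        if path.is_file():
            mode = path.stat().st_mode
            # Clear the write bits for owner, group and others
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            changed.append(path)
    return changed
```

Working copies would stay writable, while a folder holding the master files could be passed to this function once the files are final.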

Types of storage media

In addition to finding a storage solution that best suits the requirements of your project, you may be required to decide which media types to use for storage and backup of your data and documentation. This is of particular importance if backup and storage are not taken care of by the IT department of your university or research institute.

Example: CD, DVD

Advantages

  • Portability
  • Low cost

Disadvantages

  • Easily damaged, especially when handled poorly or stored under poor conditions
  • Easily lost
  • Frequent read/write errors
  • Not durable
  • Relatively small capacity

Example: Hard Disk Drive (HDD)

Advantages

  • Lower cost compared with built-in Flash drives (Solid State Disks)
  • High storage capacity

Disadvantages

  • Subject to physical degradation
  • Easily damaged (e.g. by magnetic fields or by physical impact)

Example: USB drive, SD card

Advantages

  • Portability
  • Low cost
  • Robustness
  • Relative longevity

Disadvantages

  • Easily lost
  • Relatively small storage capacity
  • Data hard to recover if the carrier is damaged

Example: Solid State Drive (SSD)

Advantages

  • Robustness
  • Relative longevity

Disadvantages

  • Data hard to recover if the drive fails
  • Higher cost compared with magnetic Hard Disk Drives (HDD)
  • Smaller capacity compared with HDD

Tips for your storage strategy

The UK Data Service (2017b) recommends the following for any storage strategy:

How to ... check the integrity of your files

We recommend that you frequently check the integrity of your files. This can be done with checksum tools such as MD5summer (n.d.) or Checksum Checker (2014). Such tools compute a 'digital fingerprint' (a checksum, i.e. a short string of letters and digits) from the bit values (the ones and the zeros) of a file. If the fingerprint of a given file changes, the file has been altered in some way, whether intentionally or unintentionally.

Follow the steps in the video (UK Data Service, 2016b) to perform a checksum check for your own files.
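For those who prefer a script to a graphical tool, the same kind of check can be sketched with Python's standard `hashlib` module. This is a minimal example under stated assumptions: the function names `file_checksum` and `verify` are illustrative, and SHA-256 is used as the default algorithm here as a choice of this sketch, whereas the tools named above may use MD5 or other algorithms.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the checksum of a file as a hexadecimal string.

    The file is read in chunks so that large video files
    do not have to fit into memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_checksum, algorithm="sha256"):
    """Return True if the file still matches a previously recorded checksum."""
    return file_checksum(path, algorithm) == expected_checksum.lower()
```

In practice you would record the checksum of each master file once, store it separately from the file, and later call `verify` with the recorded value. A result of `False` indicates that the file content has changed since the checksum was recorded.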