Designing a data file structure
At an early stage of your research, you are faced with the question of what form your data files should take. Your initial decisions about the structure of your data files should be considered thoroughly.
The structure of a data file strongly affects how the file can be processed and analysed. Once the structure has been filled with data, any changes to it are usually laborious and time-consuming.
File structure choice
Data files may have different internal structures and a research study may encompass several different data files in different relations to one another. The structure of the data file is also determined by the formatting of its content (e.g., types and organisation of variables). It provides information on relationships among different elements and parts of its content. An important part of the metadata is often embedded into the data file (e.g., in the form of variable names and variable and value labels, different kinds of notes and content of supplementary variables). So, the structure of your data also contributes to the clarity of your data documentation.
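As a minimal sketch of metadata embedded in a data file, the Python snippet below keeps variable labels and value labels alongside the data rows and uses them to produce a readable view. The variable names, labels, codes and the `describe` helper are all hypothetical and chosen only for illustration; statistical packages store this information in their own formats.

```python
# Hypothetical survey extract: all names, labels and codes are invented.
variable_labels = {
    "resp_id": "Respondent identifier",
    "q1_trust": "Trust in national parliament",
}

# Value labels document what each numeric code means, including missing codes.
value_labels = {
    "q1_trust": {1: "No trust at all", 2: "Some trust", 3: "Complete trust", -9: "No answer"},
}

rows = [
    {"resp_id": 1, "q1_trust": 3},
    {"resp_id": 2, "q1_trust": -9},
]


def describe(row):
    """Return the row keyed by variable label, with codes replaced by value labels where defined."""
    out = {}
    for name, value in row.items():
        labels = value_labels.get(name, {})
        out[variable_labels[name]] = labels.get(value, value)
    return out


described = [describe(r) for r in rows]
```

Keeping the labels in the same structure as the data means that anyone opening the file can interpret a code such as `-9` without consulting a separate document.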
The choice of file structure often depends on the requirements of the software you are using and on the intended analysis. At the same time, your decisions about structure may determine the possibilities for future data processing, the choice of software and the ways the data can be analysed.
When deciding on data file structure, consider the following:
- Units of analysis, possible analytical objectives and methods of analysis to be used;
- Links:
  - between different content items and parts of your data file;
  - to sources of your data;
  - to any other relevant external data and information and their structure;
- Possibilities of building connections to other existing or future data files (future additions of new data or creation of cumulative data files);
- Possible strategies for version control (see 'Data authenticity and version control');
- Possible technical limitations, e.g. operability in relation to the size of the data file (large and complicated structures may put high demands on both data management and computing capacities, and some software programs limit the number of variables and cases they can manage);
- Software you are going to use (also consider flexibility here, since your data may later be subject to secondary analysis in other software).
Designing qualitative data files
Qualitative data files emerge from many different types of research material. Such data files may be texts (transcribed interviews or focus group sessions, various types of written texts, such as newspaper and magazine material, diaries etc.), photographs, audio files (e.g. recordings of speech) or video files. Unlike quantitative data, qualitative data are usually not presented in the form of variables, numbers or data matrices. Like quantitative data, however, they must be organised and stored in a precise manner so that they are easy to manage and ready for use.
Usually, individual data collection events will be structured into individual files, e.g. one interview transcript, one image or one audio recording each forms a single file. These single files are then organised into folders of similar files. Sometimes, qualitative information may also be organised into matrix structures, e.g. textual extracts from newspaper articles or diaries may be placed into a rectangular matrix, whereby further metadata and coding can be added alongside each entry.
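The matrix structure mentioned above can be sketched as follows: each row holds one textual extract, with metadata and codes in neighbouring columns. The column names, extracts and codes are hypothetical, and the semicolon-separated `codes` column is just one possible convention, not a prescribed one.

```python
import csv
import io

# Hypothetical extracts; sources, dates, text and codes are invented for illustration.
FIELDS = ["extract_id", "source", "date", "text", "codes"]
extracts = [
    {"extract_id": "E001", "source": "Newspaper A", "date": "2020-03-01",
     "text": "Local schools closed this week.", "codes": "education;closure"},
    {"extract_id": "E002", "source": "Diary 07", "date": "2020-03-02",
     "text": "Stayed home with the children.", "codes": "family;closure"},
]


def to_csv(rows):
    """Serialise the matrix to CSV text: one header line, then one line per extract."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def with_code(rows, code):
    """Return the extracts that carry the given code."""
    return [r for r in rows if code in r["codes"].split(";")]


matrix_csv = to_csv(extracts)
closure_ids = [r["extract_id"] for r in with_code(extracts, "closure")]
```

Storing the matrix as plain CSV keeps it readable in spreadsheet software as well as in qualitative analysis tools that can import tabular data.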
Designing a qualitative data structure comes down to:
- Thinking of ways to categorise data (see 'Qualitative coding');
- Developing a file naming strategy (see 'File naming and folder structure');
- Designing a comprehensive folder structure (see 'File naming and folder structure').
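To illustrate the last two points, here is a small sketch of a naming convention combined with a folder layout. The pattern used (project, type of material, zero-padded participant number, ISO date, version) and the `transcript_path` helper are assumptions made for this example only; the section on file naming and folder structure discusses the actual recommendations.

```python
from pathlib import PurePosixPath


def transcript_path(project, kind, participant, date, version):
    """Build a relative path of the form project/kind/project_kind_P007_2021-05-04_v02.txt."""
    # Zero-padded numbers and ISO dates make alphabetical order match logical order.
    name = f"{project}_{kind}_P{participant:03d}_{date}_v{version:02d}.txt"
    return PurePosixPath(project) / kind / name


path = transcript_path("housing", "interview", 7, "2021-05-04", 2)
```

Generating names from their components, rather than typing them by hand, is one way to keep a convention consistent across a large number of files.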
Designing quantitative data files
In quantitative research, the content of the data often results from numerical coding in standardised questionnaires (see 'Quantitative coding'). In addition, full-text answers or textual codes can be recorded into specific types of variables in quantitative data files. Quantitative researchers may also store other material, e.g. administrative data, data from social media or various texts. However, in this chapter, when we speak about quantitative data, we usually mean survey data.
Three types of file structure are commonplace in quantitative social science: flat, hierarchical and relational. They are described below, together with two examples that clarify the concepts.
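As a rough illustration of how the three structures differ, the sketch below represents the same small, invented household survey in each form. It reflects a common general understanding of the terms (a single rectangular table, nested levels, and separate tables linked by a key) rather than a definition taken from this guide; all identifiers and values are hypothetical.

```python
# Hypothetical household survey; all identifiers and values are invented.

# Flat: one rectangular table with one row per unit of analysis (here the household).
# Attributes of household members are spread across repeated columns.
flat = [
    {"hh_id": 1, "region": "North", "age_p1": 34, "age_p2": 36},
    {"hh_id": 2, "region": "South", "age_p1": 58, "age_p2": None},
]

# Hierarchical: lower-level units (persons) are nested inside higher-level units.
hierarchical = [
    {"hh_id": 1, "region": "North",
     "persons": [{"person_no": 1, "age": 34}, {"person_no": 2, "age": 36}]},
    {"hh_id": 2, "region": "South",
     "persons": [{"person_no": 1, "age": 58}]},
]

# Relational: separate tables linked through a shared key (hh_id).
households = [{"hh_id": 1, "region": "North"}, {"hh_id": 2, "region": "South"}]
persons = [
    {"hh_id": 1, "person_no": 1, "age": 34},
    {"hh_id": 1, "person_no": 2, "age": 36},
    {"hh_id": 2, "person_no": 1, "age": 58},
]


def join_persons_to_households(person_rows, household_rows):
    """Attach household attributes to each person row via the shared hh_id key."""
    by_id = {h["hh_id"]: h for h in household_rows}
    return [{**by_id[p["hh_id"]], **p} for p in person_rows]


person_level = join_persons_to_households(persons, households)
```

The flat form needs empty cells for smaller households, while the relational form stores each fact once and rebuilds a person-level table only when an analysis requires it.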
Dive in deeper?
We have a subtopic prepared for you on organising variables. Here you will find tips on how to build the internal structure of quantitative data files by organising, naming and labelling variables.
Alternatively, you can proceed to the section on designing file names and folder structures.