The structure of a data file is supported by the way its variables are organised. Variable names and labels contribute to structuring the data file: they allow part of the documentation to be integrated into the file itself and help researchers orient themselves in the data sets. At the same time, variable names should be short and respect the usual requirements of standard software, because they are used as calling codes in software operations.
The position of variables in the data file, their names and labelling should reflect the following:
Data files also include supplementary variables which facilitate orientation and management, ensure integrity, or are necessary for some analyses. As a rule, you should include a unique identifier (or set of identifiers) for cases (individual respondents) in the file. A unique identifier is an identification code for the case. Identifiers are usually numbers, for example 0001, 0002, 0003, etc. To facilitate orientation, the identifier is usually placed at the very beginning of the file.
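As a minimal sketch of the identifier rule above, the following Python snippet generates zero-padded sequential identifiers, places the identifier first in each record, and checks that no identifier occurs more than once. The field name `case_id`, the width of four digits, and the helper names are assumptions made for illustration only.

```python
from collections import Counter


def make_case_ids(n_cases, width=4):
    """Return zero-padded sequential identifiers such as '0001', '0002'."""
    return [str(i).zfill(width) for i in range(1, n_cases + 1)]


def find_duplicate_ids(records, id_field="case_id"):
    """Return a sorted list of identifiers that occur more than once."""
    counts = Counter(rec[id_field] for rec in records)
    return sorted(cid for cid, n in counts.items() if n > 1)


ids = make_case_ids(3)

# The identifier is the first key, so it is also the first column on export.
records = [{"case_id": cid, "age": age} for cid, age in zip(ids, [34, 51, 27])]

duplicates = find_duplicate_ids(records)
if duplicates:
    raise ValueError(f"Duplicate case identifiers: {duplicates}")
```

Storing the identifier as a string keeps the leading zeros, which would be lost if the value were stored as an integer.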
Other variables may help to distinguish between different sources of information, methods of observation, temporal or other links. Yet others may provide information about the organisation of data collection such as interviewer ID or interviewing date, or distinguish cases which belong to various groups.
For analysis, it is essential to be able to distinguish data that result from sampling strategies that deliberately overrepresent certain groups, from different waves of research, etc., especially if the groups of cases these variables define are to be analysed in different ways.
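The role of such distinguishing variables can be sketched as follows. The example assumes two hypothetical supplementary variables, `wave` and `oversample`, and uses them to split the cases into groups that can then be analysed separately. The variable names, codes, and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cases: 'wave' marks the research wave and 'oversample'
# marks cases drawn from a deliberately overrepresented group (1 = yes).
cases = [
    {"case_id": "0001", "wave": 1, "oversample": 0, "score": 4},
    {"case_id": "0002", "wave": 1, "oversample": 1, "score": 2},
    {"case_id": "0003", "wave": 2, "oversample": 0, "score": 5},
    {"case_id": "0004", "wave": 2, "oversample": 0, "score": 3},
]


def group_by(cases, *fields):
    """Group cases by the values of the given supplementary variables."""
    groups = defaultdict(list)
    for case in cases:
        key = tuple(case[field] for field in fields)
        groups[key].append(case)
    return dict(groups)


groups = group_by(cases, "wave", "oversample")

# Each (wave, oversample) combination can now be analysed on its own.
for key in sorted(groups):
    print(key, [case["case_id"] for case in groups[key]])
```

Without these variables in the file, the oversampled case in the first wave could not be separated from the rest of the sample.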
For each variable in the data file, you should set the variable width, i.e. the number of characters or, for numeric variables, the length of the integer and fractional parts of the number. The set number of characters or digits for each variable is reserved for every case, even if the value is left blank.
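To illustrate what reserving a fixed width means, the sketch below writes records in a fixed-width layout. The layout (variable names, widths, and number of decimals) is an assumption chosen for the example, not a recommended standard.

```python
def format_fixed(value, width, decimals=None):
    """Right-align a value in exactly `width` characters.

    A missing value (None) still occupies the full reserved width.
    """
    if value is None:
        return " " * width
    if decimals is None:
        text = str(value)
    else:
        text = f"{value:.{decimals}f}"
    if len(text) > width:
        raise ValueError(f"{text!r} does not fit into {width} characters")
    return text.rjust(width)


# Hypothetical layout: (variable name, total width, decimals or None).
# For 'income', 8 characters cover 5 integer digits, the point, 2 decimals.
layout = [("case_id", 4, None), ("age", 3, None), ("income", 8, 2)]


def format_record(record, layout):
    """Concatenate all variables of one case into a fixed-width line."""
    return "".join(
        format_fixed(record.get(name), width, decimals)
        for name, width, decimals in layout
    )


row = format_record({"case_id": "0001", "age": 34, "income": 1520.5}, layout)
blank = format_record({"case_id": "0002", "age": None, "income": None}, layout)
```

Both lines have the same length, because the positions reserved for `age` and `income` are filled with blanks when the values are missing.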
The tabs below give basic rules for variable naming and present an example.