Weights of survey data

When conducting a survey, having a representative sample of the population is of paramount importance. But in practice, you are prone to oversample some kinds of people and undersample others. Weighting is a statistical technique to compensate for this type of 'sampling bias'. A weight is assigned to:

  • Reflect the data item's relative importance based on the objective of the data collection;
  • Take into account the characteristics of sampling design;
  • Reduce bias arising from nonresponse when the characteristics of the respondents differ from those not responding;
  • Correct identifiable deviations from population characteristics.

Each individual case in the file is assigned a coefficient, its individual weight, by which the case is multiplied so that the sample attains the desired characteristics.
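As a minimal sketch of what multiplying a case by its weight means in practice, the following Python snippet computes a weighted frequency distribution: each case contributes its weight to the count instead of 1. The data, the answer categories and the function name `weighted_shares` are invented for illustration.

```python
from collections import defaultdict


def weighted_shares(answers, weights):
    """Return the weighted share of each answer category.

    Each case counts `weight` times instead of once.
    """
    totals = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += weight
    grand_total = sum(totals.values())
    return {answer: total / grand_total for answer, total in totals.items()}


# Hypothetical data: four respondents; the last one belongs to an
# undersampled group and therefore carries a larger weight.
answers = ["daily", "daily", "never", "never"]
weights = [0.5, 0.5, 1.0, 2.0]

unweighted = weighted_shares(answers, [1.0] * len(answers))
weighted = weighted_shares(answers, weights)
```

With these made-up numbers the share of "never" rises from one half to three quarters once the weights are applied, because the cases that stand for more people count for more.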

Different types of weights and their different purposes

Several types of weights have different purposes and a different impact on data analysis.

Whether or not to use weights is not a straightforward question. For particular methods of analysis (e.g., estimating associations or regressions), using weights may be counterproductive. There are also general theoretical and methodological concerns that discourage some researchers from using weights. However, different types of weights are useful for different purposes, and in some situations you must take an appropriate weight into account in your analysis (see the types of weighting below).

In all cases, if there are any weights in your data file, the rationale and calculation of the weights must be detailed in the data documentation.

  • Design weights compensate for unequal probabilities of being sampled. These probabilities are normally not equal when complex sampling procedures combining multiple methods (stratification, cluster sampling) in several stages are implemented. For example, individuals may be the units of analysis while households are sampled in the first stage. If one person is then selected within each sampled household, a respondent's probability of being selected depends on the number of household members: the larger the household, the smaller each member's chance of selection.

    To correct for these differences in sampling probabilities, we compute design weights. A design weight is equal to the inverse of the unit's probability of inclusion in the sample. The sum of all design weights should be equal to the total number of units in the population.

  • During the implementation of a survey, we normally fail to obtain a response from some of the sampled respondents because of:

    • Their refusal;
    • Our failure to contact them;
    • Other administrative reasons.

    Response rates differ between population groups, and these differences can be compensated for by non-response weighting.

  • The way certain characteristics such as sex, age and education are distributed in your sample may differ from the way they are distributed in the actual population. For example, your sample may consist of 66 percent men when men make up only 48 percent of the population. Post-stratification weighting adjusts the sample so that its distribution of such known characteristics matches that of the population. It is called a post-stratification weight because it can only be computed after you have collected all of your data. The term "stratification" refers to the known strata of the population (such as age groups or sexes).

  • Different groups may be represented in the data file in proportions that differ from their real proportions. Such discrepancies are normally compensated for by weighting. For example, international data files combine data from various countries. Samples of similar size are usually drawn in each of these countries, although their total populations differ radically in size. If we want to analyse data about a large population, such as that of Europe, we have to adjust the proportions in which the individual European countries are represented.

  • A data file may include several types of weights for different purposes. Where an analysis calls for more than one of them, they are combined, typically by multiplication, into a final weight.

Source: Data files from the ESS, round 8, Czech Republic (European Social Survey, 2016).

Variable name: netusoft
Question: How often the respondent uses the internet

In the first column, no weight was applied.
In the second column, the design weight (DWEIGHT) adjusts for different selection probabilities.
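The weight types described in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed procedure: it assumes a simplified two-stage design in which every household has the same selection probability and one member is drawn at random per household. All function names and all figures other than the 66 and 48 percent from the text are invented.

```python
def design_weight(household_prob, household_size):
    """Inverse of the inclusion probability, assuming one member is
    drawn at random from each sampled household."""
    inclusion_prob = household_prob / household_size
    return 1.0 / inclusion_prob


def nonresponse_weight(response_rate):
    """Inverse of the response rate observed in the respondent's group."""
    return 1.0 / response_rate


def post_stratification_weights(sample_shares, population_shares):
    """Ratio of population share to sample share for each stratum."""
    return {group: population_shares[group] / sample_shares[group]
            for group in sample_shares}


def population_size_weight(population_size, sample_size):
    """Number of people in the population that one respondent stands for."""
    return population_size / sample_size


# Hypothetical design: each household is sampled with probability 0.01.
w_single = design_weight(0.01, household_size=1)
w_four = design_weight(0.01, household_size=4)

# The example from the text: 66 percent men in the sample, 48 percent
# in the population (so 34 and 52 percent women).
ps = post_stratification_weights(
    {"men": 0.66, "women": 0.34},
    {"men": 0.48, "women": 0.52},
)

# Simplified combined weight for a man from a four-person household.
combined = w_four * ps["men"]
```

Here a member of a four-person household receives four times the design weight of a person living alone, men are weighted down by a factor of about 0.73 and women up by about 1.53. Multiplying the components is a simplification; in practice post-stratification weights are often computed on data that are already design-weighted.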

An example: Using weights in European Social Survey data

The following table illustrates the use of weights in data from the European Social Survey (ESS) (n.d.). Three different weights are available in the ESS Source Main Questionnaire data file (see European Social Survey, 2014):

  1. The design weight takes into consideration the different probabilities of being sampled given the sampling methods implemented in individual countries;
  2. The post-stratification weight corrects for the differences of the sample from selected population characteristics caused by other sampling and non-sampling errors;
  3. The population size weight corrects for the fact that the individual countries’ sample sizes are very similar while there are large variations in the size of their actual populations.

Different types of data analysis require different weights or combinations of weights. When analysing data from one country alone, or comparing data from two or more countries, only the design weight or the post-stratification weight needs to be applied. When pooling data from different countries into a single analysis, the design or post-stratification weight should be applied in combination with the population size weight.

Source: European Social Survey, 2014.
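The rule for choosing ESS weights can be sketched as a small Python helper. The sketch assumes the weight variables are named `dweight`, `pspwght` and `pweight`, as in ESS data files; the function names, the rows and the `value` variable are made up for illustration and are not real ESS data.

```python
def analysis_weight(row, pooled_countries, use_post_stratification=True):
    """Pick the weight for one respondent.

    Single-country or country-comparison analyses use the design or
    post-stratification weight alone; pooled analyses multiply it by
    the population size weight.
    """
    base = row["pspwght"] if use_post_stratification else row["dweight"]
    if pooled_countries:
        return base * row["pweight"]
    return base


def weighted_mean(rows, variable, pooled_countries):
    """Weighted mean of `variable` over all rows."""
    total = 0.0
    weight_sum = 0.0
    for row in rows:
        w = analysis_weight(row, pooled_countries)
        total += w * row[variable]
        weight_sum += w
    return total / weight_sum


# Hypothetical rows: equal sample sizes in a small country "A"
# and a large country "B".
rows = [
    {"cntry": "A", "value": 2.0, "dweight": 1.0, "pspwght": 1.0, "pweight": 0.5},
    {"cntry": "A", "value": 4.0, "dweight": 1.0, "pspwght": 1.0, "pweight": 0.5},
    {"cntry": "B", "value": 6.0, "dweight": 1.0, "pspwght": 1.0, "pweight": 1.5},
    {"cntry": "B", "value": 8.0, "dweight": 1.0, "pspwght": 1.0, "pweight": 1.5},
]

mean_without_pweight = weighted_mean(rows, "value", pooled_countries=False)
pooled_mean = weighted_mean(rows, "value", pooled_countries=True)
```

With these invented numbers the pooled mean moves from 5.0 to 6.0 once the population size weight is applied, because respondents from the larger country count for more.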