About NIDS FAQs

1. Which version of the data should I be using? 

The latest NIDS Panel Dataset release package is 2016. This consists of Wave 1 Version 7.0.0, Wave 2 Version 4.0.0, Wave 3 Version 3.0.0, Wave 4 Version 2.0.0 and Wave 5 Version 1.0.0.

Please note: When using NIDS, only use the most recent version of each wave as indicated above. Any panel analysis must merge files from the same release package (currently 2016), because prior waves have been retrospectively cleaned.
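
As a rough illustration of the version rule above, the following Python sketch records the 2016 release package and checks a set of wave versions against it. The dictionary layout, the function name is_current_release and the older version number "6.0.0" are assumptions made for this example; they are not part of the NIDS data or documentation.

```python
# Versions that make up the 2016 release package, as listed in this FAQ.
RELEASE_2016 = {1: "7.0.0", 2: "4.0.0", 3: "3.0.0", 4: "2.0.0", 5: "1.0.0"}


def is_current_release(versions_in_use):
    """Return True if every wave in use matches the 2016 release package.

    `versions_in_use` is a hypothetical mapping {wave number: version string}
    that the analyst fills in by hand from the files they downloaded.
    """
    return all(RELEASE_2016.get(wave) == version
               for wave, version in versions_in_use.items())


# A hypothetical older Wave 1 file mixed with a current Wave 2 file.
mixed = {1: "6.0.0", 2: "4.0.0"}
print(is_current_release(mixed))
```

A check of this kind only compares version labels; it does not inspect the data files themselves.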

2. How was the original NIDS sample selected? 

In 2008, 10 367 dwellings were selected from 400 Primary Sampling Units across the country. Of those dwellings, 491 were found to be multi-household dwellings. Of the 10 858 eligible households, 7 296 were successfully interviewed. Within the successfully interviewed households, 31 144 individuals were identified. Of these, 2 918 were identified as non-resident members and were thus excluded from the study, leading to the final count of 28 226 Continuing Sample Members (CSMs). A resident member was defined as an individual who usually resided at the dwelling four nights a week. Please see FAQ 10 for information regarding top-up samples.
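
The counts above can be checked with a few lines of arithmetic. The Python sketch below uses only the figures quoted in this answer; the variable names are illustrative, and the ratio of interviewed to eligible households is a simple proportion rather than an officially published response rate.

```python
# Figures quoted in the answer above (Wave 1, 2008).
eligible_households = 10_858
interviewed_households = 7_296
individuals_identified = 31_144
non_resident_members = 2_918

# Share of eligible households that were successfully interviewed.
household_share = interviewed_households / eligible_households

# Continuing Sample Members = identified individuals minus non-residents.
continuing_sample_members = individuals_identified - non_resident_members

print(f"Interviewed share of eligible households: {household_share:.1%}")
print(f"Continuing Sample Members: {continuing_sample_members}")
```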

3. Who is a sample member and what is sample status? 

Sample members can either be Continuing Sample Members (CSMs) or Temporary Sample Members (TSMs). CSMs are interviewed in every Wave of NIDS whereas TSMs are interviewed only in the Wave(s) that they are co-resident with a CSM.

4. How is an individual uniquely identified? 

Each individual is given a unique code, referred to as “pid”, which identifies them across waves.

5. What is the definition of a household? 

A household is a construct which can be thought of as a “roof” or compound/homestead/stand where individuals are either members, residents or both. Household members are defined as those who lived in the household for at least 15 days during the last 12 months OR arrived at the household in the last 15 days and for whom the household is now their usual residence. Resident members are defined as those living in the household for more than four nights a week. In addition, members and residents must share food from a common source with other household members.

6. How is the data structured and how are respondents identified? 

Every resident individual (CSM or TSM) is allocated an individual identifier (pid). Individual interview records are created for all resident household members. The dataset in which an individual interview record is found depends on the respondent’s age at interview and the type of interview conducted. Deceased CSMs do not have individual interview records, as no interview was conducted; a record of all deceased individuals is contained in the “Link File”.
Each individual questionnaire maps uniquely to a household questionnaire and household roster file using the household identifier (w`x'_hhid). Individual identifiers on their own merge non-uniquely to the household roster file, because an individual appears on every roster on which they are considered a household member. The household roster file for each household includes the details of all household members, even if they are not all resident at that household.
If a respondent moved outside the borders of South Africa to a private dwelling, they are assigned their own household identifier, which links to a household questionnaire record in the household roster and individual questionnaire files. If the household refused to participate or there was some other type of non-response (e.g. the household could not be located), the individual questionnaires still appear in the data files, but the outcome indicates household-level non-response.
For more information on the structure of the NIDS data, see Section 3.4 of the NIDS User Manual.
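
The difference between a unique merge on the household identifier and a non-unique merge on pid can be sketched in Python with toy records. This is a minimal illustration under stated assumptions: all values are invented, the variable example_hh_var is hypothetical, and the real files contain many more variables.

```python
from collections import defaultdict

# Toy Wave 1 records; every value and "example_hh_var" are invented.
individuals_w1 = [
    {"pid": 301, "w1_hhid": 10},
    {"pid": 302, "w1_hhid": 10},
    {"pid": 303, "w1_hhid": 11},
]
households_w1 = [
    {"w1_hhid": 10, "example_hh_var": "a"},
    {"w1_hhid": 11, "example_hh_var": "b"},
]
# Roster rows: pid 301 is listed as a member of two households.
roster_w1 = [
    {"pid": 301, "w1_hhid": 10},
    {"pid": 301, "w1_hhid": 11},
    {"pid": 302, "w1_hhid": 10},
    {"pid": 303, "w1_hhid": 11},
]

# The household file has one row per household, so w1_hhid is a unique key.
household_by_id = {row["w1_hhid"]: row for row in households_w1}

# Many-to-one merge: attach the household record to each individual record.
merged = [{**person, **household_by_id[person["w1_hhid"]]}
          for person in individuals_w1]

# Grouping roster rows by pid alone is not unique: one pid, several rosters.
rosters_by_pid = defaultdict(list)
for row in roster_w1:
    rosters_by_pid[row["pid"]].append(row["w1_hhid"])

print(merged[0])
print(dict(rosters_by_pid))
```

In this toy data, pid 301 maps to two roster households, which is why a merge to the roster file generally needs the household identifier as well as pid.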

7. Does NIDS follow households? 

NIDS does not follow households; it is a survey of individuals, or more specifically Continuing Sample Members (CSMs), i.e. all persons who were resident in participating households in Wave 1. For this reason, individuals can be identified across waves by their unique identifier pid, while households are not identifiable across waves except insofar as they are made up of the same individuals across waves.

8. Why do family/household ID numbers vary from wave to wave? 

A different household identifier is assigned in each wave because NIDS is a panel of individuals, and the household identifier is simply a tool to connect each individual to their household within a wave. Households are not identifiable across waves except insofar as they are made up of the same individuals across waves.

9. How do I match … 

… respondents across waves?
Individuals can be identified across waves by their unique identifier pid. Section 3.7 of the NIDS User Manual explains the process for matching respondents across waves through merging. Two ways of doing this are noted (one of which first merges to the link file), each with its own benefits depending on the analysis to be undertaken.
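
The idea of matching on pid can be illustrated with a small Python sketch. It is not a reproduction of either method described in the User Manual; the records and the variables w1_example and w2_example are invented for the illustration.

```python
# One record per pid per wave; all values are invented.
wave1 = [{"pid": 301, "w1_example": 5}, {"pid": 302, "w1_example": 7}]
wave2 = [{"pid": 301, "w2_example": 6}, {"pid": 304, "w2_example": 9}]

w1_by_pid = {row["pid"]: row for row in wave1}
w2_by_pid = {row["pid"]: row for row in wave2}

# Inner join on pid: respondents with a record in both waves.
in_both = sorted(w1_by_pid.keys() & w2_by_pid.keys())
panel = [{**w1_by_pid[pid], **w2_by_pid[pid]} for pid in in_both]

# Respondents with a Wave 1 record but no Wave 2 record.
only_wave1 = sorted(w1_by_pid.keys() - w2_by_pid.keys())

print(panel)
print(only_wave1)
```

Whether to keep only respondents present in every wave or to retain unmatched records depends on the analysis.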

… respondents within households?
Households are identifiable within a wave by their unique identifier w`x'_hhid, which acts simply as a tool to connect each individual to their household in that wave. Note that households are not identifiable across waves except insofar as they are made up of the same individuals across waves.

… children to their parents?
In the indderived dataset, there are variables indicating the pid of the mother (w`x'_best_mthpid) and of the father (w`x'_best_fthpid).
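
A minimal Python sketch of how the mother identifier could be used to group children under their mother is given here. The pids are invented, and representing an unidentified parent as None is an assumption of this sketch rather than a description of how the data are coded.

```python
from collections import defaultdict

# Toy indderived rows for Wave 1; pids are invented. An unidentified parent
# is represented by None here, which is an assumption of this sketch.
indderived_w1 = [
    {"pid": 401, "w1_best_mthpid": None, "w1_best_fthpid": None},
    {"pid": 402, "w1_best_mthpid": 401, "w1_best_fthpid": None},
    {"pid": 403, "w1_best_mthpid": 401, "w1_best_fthpid": 404},
]

# Group child pids under the pid of their mother.
children_of_mother = defaultdict(list)
for row in indderived_w1:
    mother = row["w1_best_mthpid"]
    if mother is not None:
        children_of_mother[mother].append(row["pid"])

print(dict(children_of_mother))
```

The same grouping could be applied to the father identifier by swapping the variable name.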

… spouses to one another?
Partnerships can be identified through the w`x'_r_parhpid variable in the roster dataset.

10. Have any new samples been added since the beginning of the survey?

Yes. NIDS achieved low response rates in predominantly white and Indian areas at baseline. The sample was further reduced between Waves 1 and 4 because of high attrition rates in these groups. In Wave 5 (2017), a sample top-up was undertaken with the aim of increasing the number of white, Indian and high-income respondents. To identify individuals who were added in the 2017 top-up, the variable w5_Y_sample (where Y denotes the relevant data file indicator) was created in all the Wave 5 data files (this variable is simply called sample in the Wave 5 Link File and was also included in the Link Files of Waves 2 – 4). This variable identifies the sample from which households and individual respondents originated. It takes the value 1 for “2008 sample” and 2 for “2017 sample”.
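
As an illustration of how the sample variable might be used, the Python sketch below separates top-up respondents from the original sample in a toy version of the Wave 5 Link File. The pids are invented; only the variable name sample and its two values come from the description above.

```python
from collections import Counter

# Value labels as described in this FAQ.
SAMPLE_LABELS = {1: "2008 sample", 2: "2017 sample"}

# Toy Wave 5 Link File rows; pids are invented.
link_w5 = [
    {"pid": 501, "sample": 1},
    {"pid": 502, "sample": 1},
    {"pid": 503, "sample": 2},
]

# Respondents who entered through the 2017 top-up.
top_up_pids = [row["pid"] for row in link_w5 if row["sample"] == 2]

# Number of respondents from each sample.
counts = Counter(SAMPLE_LABELS[row["sample"]] for row in link_w5)

print(top_up_pids)
print(dict(counts))
```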

11. What is a household head? 

In the NIDS data, the household head is self-defined by the household and is used simply as a construct to determine individuals’ relationships to one another. No guidance is given that the household head must be the eldest, the highest earner or of a specific gender.

12. How can I identify split offs from the main family? 

NIDS is a panel of individuals, so there is no concept of a main family. One can, however, identify people who live together through the link file, as they have the same hhid within a wave. By comparing with the hhids in previous waves, you can determine which respondents were previously co-resident.
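
One possible way to make this comparison is sketched below in Python. The function name pairs_no_longer_coresident and the toy rows are hypothetical, and the sketch assumes every respondent has a household identifier in both waves, which will not hold for everyone in the real link file.

```python
from itertools import combinations

# Toy link-file rows; pids and hhids are invented, and every respondent is
# assumed to have a household identifier in both waves.
link_rows = [
    {"pid": 1, "w1_hhid": 10, "w2_hhid": 20},
    {"pid": 2, "w1_hhid": 10, "w2_hhid": 20},
    {"pid": 3, "w1_hhid": 10, "w2_hhid": 21},
]


def pairs_no_longer_coresident(rows, earlier, later):
    """Return pid pairs that shared a household in `earlier` but not in `later`."""
    return [(a["pid"], b["pid"])
            for a, b in combinations(rows, 2)
            if a[earlier] == b[earlier] and a[later] != b[later]]


print(pairs_no_longer_coresident(link_rows, "w1_hhid", "w2_hhid"))
```

Comparing every pair of rows is quadratic in the number of respondents, so for the full link file it would be more practical to group by the earlier household identifier first.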

13. How do I identify imputed values? 

Imputed variables and other information not asked directly of the respondent are presented in the indderived or hhderived datasets depending on whether they are individual or household level data.

14. What are derived variables? 

Derived variables are variables that were not asked directly of the respondent but were calculated or imputed from other available information. For example, aggregate income and expenditure variables were constructed.

15. How can I determine which variables are comparable across waves? 

Variables are named consistently across waves for ease of reference. Where questions are the same across waves the core of the variable name will be the same. If the question is slightly different a different name will be given. Each variable, except unique identifiers, is prefixed with the appropriate Wave identifier, e.g. w1_ or w2_. More information on the construction of variable names can be found in the NIDS User Manual (downloadable from both the NIDS and DataFirst websites).
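
The naming convention can be used to list candidate variables for comparison across waves. The Python sketch below strips the wave prefix and intersects the remaining core names; the function names and the variables example_q1 and example_q2 are made up for illustration. A shared core name suggests the same question, but the questionnaires and User Manual remain the authority.

```python
import re

# A wave prefix is "w", a wave number and an underscore, e.g. "w1_" or "w2_".
WAVE_PREFIX = re.compile(r"^w(\d+)_(.+)$")


def split_name(name):
    """Split a variable name into (wave number, core name).

    Names without a wave prefix, such as pid, return (None, name).
    """
    match = WAVE_PREFIX.match(name)
    if match is None:
        return None, name
    return int(match.group(1)), match.group(2)


def shared_cores(names_a, names_b):
    """Core names that carry a wave prefix and occur in both lists."""
    def cores(names):
        return {core for wave, core in map(split_name, names) if wave is not None}
    return cores(names_a) & cores(names_b)


# "example_q1" and "example_q2" are made-up variable names.
wave1_names = ["pid", "w1_hhid", "w1_example_q1"]
wave2_names = ["pid", "w2_hhid", "w2_example_q2"]
print(shared_cores(wave1_names, wave2_names))
```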

16. What does the value of “.” mean? 

The dot denotes a system-missing value. This means that the respondent was not asked the question and the variable does not apply to this record.

17. How are missing values treated in the data? 

Where data were supposed to be collected for a specific variable but were not, the missing value is always coded -3. The only exception is date variables, where a missing day or month is coded 33 and a missing year is coded 3333. Missing responses are also labelled in the data.
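
Before computing statistics, these codes need to be set aside so that they are not treated as real values. The Python sketch below handles only the codes named in this answer; the function names are hypothetical, and any other special codes in the data would need their own handling.

```python
MISSING = -3            # data that should have been collected but was not
MISSING_DAY_MONTH = 33  # missing day or month in a date variable
MISSING_YEAR = 3333     # missing year in a date variable


def clean_value(value):
    """Return None for the -3 missing code, otherwise the value unchanged."""
    return None if value == MISSING else value


def clean_date(day, month, year):
    """Return (day, month, year) with the date-specific missing codes as None."""
    return (
        None if day == MISSING_DAY_MONTH else day,
        None if month == MISSING_DAY_MONTH else month,
        None if year == MISSING_YEAR else year,
    )


print(clean_value(-3), clean_value(1200))
print(clean_date(33, 6, 1985))
```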

18. Can the data be used for comparing multiple periods of time? 

Yes. Individuals can be identified across waves by their unique “pid” codes, and households can be uniquely identified within each wave by their household identifier codes “w`x'_hhid”, where `x' is the wave indicator.

19. Can the data be used for regional analysis? 

The NIDS sample is designed to be nationally representative rather than provincially representative. Consequently we do not recommend using NIDS for calculating provincial totals.

20. How are students living away treated in the data? 

For non-residents at the time of interview, such as students living away or any person living in an institution, a proxy questionnaire was completed in their last known household, although they are not, strictly speaking, household members. This is the same methodology as was followed in Wave 1 and allows information to be collected for household members who are out of scope (or residing outside of the sampling frame).

21. Why are there no learner-teacher ratio variables in the Public Admin Data? 

The learner-teacher ratio variables were not included to protect the anonymity of our respondents. These variables are available for all waves in the Secure Admin Datasets.

22. Why were there changes to household income in Wave 3 between Version 1.0 and Version 1.1? 

The way that household agricultural income is calculated changed between Version 1.0 and Version 1.1. Initially, income from subsistence agriculture was calculated from the Adult questionnaire, and individual-level agricultural income was aggregated up to the household level. In Version 1.1, income from subsistence agriculture was calculated as the value of all crops and/or animals harvested or consumed by the household as per the Household questionnaire. See the Program Library do-files for more details.

23. Expenditure

In Wave 1 and Wave 2, we asked for both the 'one shot' food expenditure amount and the details of all food expenditure items. From Wave 3 onwards, we asked for detailed food expenditure only if:

  1. The household didn't answer the 'one shot' food question, or the 'one shot' amount was suspicious in that it was less than 5% or more than 80% of total household income.
  2. Both the 'one shot' and the bracketed questions were non-response.
  3. The household received food as payment, ate from its own stock or grew the food itself.

Because of this new rule in Wave 3 and Wave 4, the number of missing observations will be the same for each food item in cases where the 'one shot' variable is reported. See NIDS User Manual for more details.
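
A literal reading of the three conditions above can be written as a small Python function. This is a sketch under stated assumptions, not the fieldwork logic itself: the function name and arguments are hypothetical, a missing 'one shot' answer is represented by None, and total household income is assumed to be a known positive number.

```python
def detailed_food_asked(one_shot, bracket_answered, total_income, food_in_kind):
    """Return True if the detailed food items would be asked (Wave 3 onwards).

    one_shot         -- 'one shot' food amount, or None if not answered
    bracket_answered -- True if the bracketed follow-up was answered
    total_income     -- total household income (assumed known and positive)
    food_in_kind     -- True if food was received as payment, eaten from
                        own stock or grown by the household
    """
    one_shot_missing = one_shot is None
    # Condition 1: 'one shot' missing, or below 5% / above 80% of income.
    suspicious = (not one_shot_missing
                  and (one_shot < 0.05 * total_income
                       or one_shot > 0.80 * total_income))
    # Condition 2: both the 'one shot' and the bracketed question unanswered.
    both_non_response = one_shot_missing and not bracket_answered
    # Condition 3 is passed in directly as food_in_kind.
    return one_shot_missing or suspicious or both_non_response or food_in_kind


print(detailed_food_asked(1000, True, 5000, False))
print(detailed_food_asked(100, True, 5000, False))
```

Read this way, condition 2 is already covered by the first part of condition 1; it is kept separate here only to mirror the list.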

24. Outliers

When creating the derived variables for income, expenditure, and wealth, outlier values which are found in the data are not removed or altered in any way. They are instead left for the users to decide whether to include or exclude them in their analysis.

25. Panel Weights (Pweights)

Panel weights are assigned to all successfully interviewed CSMs, with the exception of CSM babies (children born to CSM mothers) who refused to be interviewed the first time they were visited. These CSM babies do not receive a pweight in the wave in which they are first visited and refuse to take part, nor in any subsequent wave.