Survey ID Number
IND_2011_HLIST_1pc_v02
Title
Sample Microdata of House Listing of India 2011 - 1 percent sample dataset
Sampling Procedure
The Sample Microdata of Census of India (SMCI) has been prepared by the ORGI from data collected during the Population Enumeration, conducted throughout India from 9 to 28 February 2011. The SMCI files are text (ASCII) files containing records of the individual members of households, for either the national sample or a state sample. The state-level sample was prepared by pooling district-level samples drawn independently. As in the CIHMS, where some districts have a 5 percent sample and others a 10 percent sample depending on district size, the SMCI sample is a single-stage (uni-stage) stratified sample, and its size equals the CIHMS sample size multiplied by the average household size. Each individual record contains a field giving the weight of that record; applying these weights to the records expands the sample to the relevant population total.
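Because every individual record carries its own weight, population totals and proportions are estimated by summing weights rather than by counting records. The sketch below assumes a hypothetical record type with `state_code`, `sex` and `weight` fields and uses made-up toy records; the real field names and layout are defined in the SMCI record description, not here.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    # Hypothetical fields; the actual SMCI record layout is documented separately.
    state_code: str
    sex: str
    weight: float  # expansion weight carried on each individual record


def weighted_total(records, predicate=lambda r: True):
    """Estimate a population total by summing the weights of matching records."""
    return sum(r.weight for r in records if predicate(r))


def weighted_share(records, predicate):
    """Estimate a population proportion as a ratio of two weighted totals."""
    denom = weighted_total(records)
    return weighted_total(records, predicate) / denom if denom else 0.0


# Toy records only; the weights are illustrative.
sample = [
    PersonRecord("09", "M", 20.0),
    PersonRecord("09", "F", 20.0),
    PersonRecord("09", "F", 10.0),
]
print(weighted_total(sample))                          # 50.0
print(weighted_share(sample, lambda r: r.sex == "F"))  # 0.6
```

Each record stands for `weight` persons in the population, so a record drawn at a 1-in-10 rate counts for 10.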
According to Census 2011, there are 26 districts with a population of less than 1,00,000, with an average population of 61,851 (12,940 households), and 614 districts with an average population of 19,68,380 (4,11,795 households). Instead of estimating the sample size separately for each district, if a 5 percent sample is taken in each district with a population of more than 1,00,000, the average 5 percent sample comes to about 20,000 houses/households. For districts with a population of less than 1,00,000, where the average number of households is 12,940, a 10 percent level of significance may be adopted, and the sample size may then be close to 5,000 households.
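The figures above can be cross-checked with a few lines of arithmetic. This sketch only re-derives quantities from the averages quoted in the text: the implied average household size in each group of districts, and the size of a 5 percent sample of an average large district.

```python
# District averages quoted from Census 2011 in the text above.
SMALL_POP, SMALL_HH = 61_851, 12_940        # 26 districts under 1,00,000 population
LARGE_POP, LARGE_HH = 1_968_380, 411_795    # remaining 614 districts

# Implied average household size (persons per household) in each group.
print(round(SMALL_POP / SMALL_HH, 2))  # 4.78
print(round(LARGE_POP / LARGE_HH, 2))  # 4.78

# Households in a 5 percent sample of an average large district.
print(round(0.05 * LARGE_HH))  # 20590
```

The result of about 20,590 households is the "close to 20,000" figure, and the matching household size of about 4.78 in both groups is the multiplier that links a household sample to a person-level sample.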
It was therefore proposed to adopt a 5 percent sample of households in districts with a population of more than 1,00,000, satisfying the condition of a 5 percent level of error, and a sample of 5,000 households in districts with a population of less than 1,00,000, satisfying the condition of a 10 percent level of error.
To improve sampling efficiency, the household records within each district were first stratified into homogeneous groups, using place of enumeration, size of village/town and geography as stratification variables. In all, six strata were formed in each district. Within each of the six strata, the household records were further sorted by all the geographic codes: Tahsil/Taluk/P.S./Development Block/Circle/Mandal, Town/Village, Ward (urban areas only), Enumeration Block Number, and Household Number. This sorted frame provides a high level of implicit stratification. Finally, a systematic random sample was drawn from this sorted frame within each district, with a sampling interval of 20 (5 percent sample) or 10 (10 percent sample), as the case may be. A detailed description of the variables included for the sampled households and of the SMCI recodes is an important input for data users and is given in Annex 2 of Chapter 3.
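The district-level selection described above (sort by stratum and geographic codes, then take every k-th record after a random start) can be sketched as follows. The `Household` fields, the helper names, and the rule mapping district population to an interval of 20 or 10 (taken here as the 1,00,000 population threshold used earlier) are assumptions for illustration, not the ORGI's actual processing code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Household:
    # Hypothetical frame record; field names are illustrative only.
    stratum: int        # one of the six strata (1..6)
    tahsil: str         # Tahsil/Taluk/P.S./Development Block/Circle/Mandal
    town_village: str
    ward: str           # empty string for rural areas
    eb_number: int      # Enumeration Block Number
    hh_number: int      # Household Number


def sort_key(h: Household) -> tuple:
    # Stratum first, then the geographic codes in the order listed in the text.
    return (h.stratum, h.tahsil, h.town_village, h.ward, h.eb_number, h.hh_number)


def systematic_sample(frame: list, interval: int, rng: random.Random) -> list:
    """Every `interval`-th record after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return frame[start::interval]


def sample_district(households: list, district_pop: int, rng: random.Random) -> list:
    # Assumed rule: interval 20 (5 percent) above 1,00,000 population, else 10 (10 percent).
    interval = 20 if district_pop > 100_000 else 10
    frame = sorted(households, key=sort_key)
    return systematic_sample(frame, interval, rng)


# Toy frame: 6 strata x 4 enumeration blocks x 50 households = 1,200 records.
hh = [Household(s, "T1", "V1", "", eb, n)
      for s in range(1, 7) for eb in range(1, 5) for n in range(1, 51)]
rng = random.Random(2011)
picked = sample_district(hh, district_pop=250_000, rng=rng)
print(len(picked))  # 60
```

Because selection runs down a frame that is already ordered by stratum and geography, a fixed interval spreads the sample proportionally across strata and areas; that is the sense in which the sort provides implicit stratification.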
The national SMCI is an aggregate of 20 percent subsamples of the 5 percent samples and 10 percent subsamples of the 10 percent samples selected at district level for the state sample, using the same implicit stratification; this effectively gives a 1 percent sample of all households at the national level. For example, in districts with a 5 percent sample of households, a random systematic subsample of 1 in 5 households was selected from the 5 percent sample, keeping the original order of selection based on the stratification criteria. For the smaller districts, a random systematic subsample of 1 in 10 households was selected from the 10 percent sample. The national SMCI supports reliable estimates at the state level only; accordingly, the identity of the districts has been concealed by replacing their location codes with 99 in the 1 percent national SMCI file.
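The second-stage subsampling and the masking of district codes might look like the sketch below. Records are modelled as plain dictionaries with a hypothetical `district_code` key, and the function name `national_subsample` is illustrative only. Slicing the already ordered district sample keeps the original selection order, as the text requires, and exact fractions confirm that both routes give 1 percent overall.

```python
import random
from fractions import Fraction

# District-level sampling fraction -> 1-in-k subsampling interval for the national file.
SUBSAMPLE_INTERVAL = {Fraction(1, 20): 5, Fraction(1, 10): 10}


def national_subsample(district_sample: list, state_fraction: Fraction,
                       rng: random.Random) -> list:
    """Systematic 1-in-k subsample of one district's state-level sample."""
    k = SUBSAMPLE_INTERVAL[state_fraction]
    start = rng.randrange(k)
    picked = district_sample[start::k]  # slicing keeps the original selection order
    # Conceal the district identity in the national file.
    return [{**rec, "district_code": "99"} for rec in picked]


# Both routes give the same overall fraction of 1 percent.
for frac, k in SUBSAMPLE_INTERVAL.items():
    print(frac / k)  # 1/100

# Toy district sample of 100 records (hypothetical fields).
district = [{"district_code": "05", "hh_id": i} for i in range(100)]
national = national_subsample(district, Fraction(1, 20), random.Random(1))
print(len(national))  # 20
```

The masking step returns new dictionaries rather than editing the input, so the state-level records keep their real district codes.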