Appendix 1- Lessons Learnt from MHIS-1
Challenges encountered during the preparation of MHIS-1 beneficiary household list:
The following are the challenges faced, along with the solutions adopted, during the process of creating
the beneficiary household list for MHIS-1 implementation.
Village Mapping: Mapping the villages in Electoral/MNREGA/BPL to the villages in Census is the
toughest of the challenges. This mapping is needed to get the Census Village Code for a village in
Electoral/MNREGA/BPL. The villages in Electoral/MNREGA/BPL sometimes do not match with
those in Census because of the following reasons:
Mismatch in spelling
Village exists in Electoral but not in Census list for various reasons such as: (1) The
village is too small or it is just a settlement to be considered for assigning a village code
and (2) Electoral & Census lists were prepared at different times and by different parties
Matching two villages with slight mismatch in spellings poses a risk of mapping the village to a
wrong village especially when both the village names are genuine. Here there is a strong need for the
data analytics agency to work closely with the GoM & the District Officers and get the village
mapping validated on the ground.
In MHIS-1, villages that exist in Electoral but not in Census were merged with the nearby main village.
The other option is to generate a unique village code for such unmapped villages and include them as
stand-alone villages. Based on the lessons learnt from MHIS-1, the recommended approach is to
generate a unique village code instead of merging with the nearby main village.
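The mapping approach described above can be sketched as follows. This is a minimal illustration, not the method used in MHIS-1: it accepts only exact name matches automatically, queues near matches for validation on the ground, and assigns a new code to villages with no plausible Census match. The function name, the 0.85 cutoff, the "NEW-0001" code format and the sample names are all assumptions made for illustration.

```python
from difflib import get_close_matches


def map_villages(electoral_names, census_codes, cutoff=0.85):
    """Map Electoral village names to Census village codes.

    census_codes: dict of Census village name -> Census village code.
    Returns (mapped, to_validate, new_codes).
    """
    # Normalise Census names once so that case and stray spaces do not matter.
    census_by_key = {name.strip().lower(): name for name in census_codes}
    mapped, to_validate, new_codes = {}, {}, {}
    for name in electoral_names:
        key = name.strip().lower()
        if key in census_by_key:
            # Exact match: safe to map automatically.
            mapped[name] = census_codes[census_by_key[key]]
            continue
        candidates = get_close_matches(key, list(census_by_key), n=3, cutoff=cutoff)
        if candidates:
            # Near match: both names may be genuine, so leave the decision
            # to manual validation instead of guessing.
            to_validate[name] = [census_by_key[c] for c in candidates]
        else:
            # No plausible Census match: assign a hypothetical new code.
            new_codes[name] = f"NEW-{len(new_codes) + 1:04d}"
    return mapped, to_validate, new_codes


# Illustrative, made-up data.
census = {"Alphapur": "C001", "Betagaon": "C002"}
mapped, to_validate, new_codes = map_villages(
    ["alphapur", "Betagoan", "Hill Settlement"], census
)
```

Keeping near matches in a separate validation queue reflects the risk noted above: a close spelling is not proof that two names refer to the same village.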
Missing Block Code Information: Block Code is one of the columns required in the final
beneficiary list. Neither Electoral list nor Census village list has the Block Code information. FY11
RSBY pre-enrolment and post-enrolment data is the source for Block Codes. The
Electoral/MNREGA/BPL village is mapped with Census village list to get the Census Village Code.
The village is in turn mapped with FY11 RSBY pre-enrolment & post-enrolment data to get the
Block Code. The missing block code issue arises when a village exists in Electoral/MNREGA/BPL
and also in Census but not in FY11 RSBY enrolment data. The villages with missing block codes
cannot be made part of any particular block and this would in turn stop that village from participating
in the enrolment process.
The issue of missing block codes can be resolved by adding the village in question to the same block
as that of its nearby village. The GIS maps, along with the “main village” tagging in the Electoral list,
can be used to identify the nearby main village. The other option is to manually map the village to the
correct Block by working closely with the GoM and the District Officers. Only the first option was used
in MHIS-1 list preparation. If time permits, we recommend using the second option in addition
to the first to get a more accurate Block mapping.
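The first option can be sketched as a simple fallback lookup. This is an illustrative sketch under assumptions: the field names `name` and `main_village`, the function name and the sample block code are hypothetical, and the main-village tag is assumed to have already been derived from the Electoral list and GIS maps.

```python
def fill_missing_blocks(villages, block_by_village):
    """Assign a Block Code to each village, inheriting from its main village.

    villages: list of dicts with 'name' and 'main_village' (hypothetical fields).
    block_by_village: dict of village name -> Block Code from enrolment data.
    Returns (assigned, unresolved).
    """
    assigned, unresolved = {}, []
    for village in villages:
        block = block_by_village.get(village["name"])
        if block is None:
            # Fall back to the block of the tagged nearby main village.
            block = block_by_village.get(village.get("main_village"))
        if block is None:
            # Still unknown: leave for manual mapping with District Officers.
            unresolved.append(village["name"])
        else:
            assigned[village["name"]] = block
    return assigned, unresolved


# Illustrative, made-up data.
assigned, unresolved = fill_missing_blocks(
    [
        {"name": "Main", "main_village": "Main"},
        {"name": "Hamlet", "main_village": "Main"},
        {"name": "Isolated", "main_village": None},
    ],
    {"Main": "B01"},
)
```

Villages left in `unresolved` are the natural candidates for the second, manual option.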
Duplicate Village Names: Within a district, there are a few villages that are distinct on the ground but
share the same name. Since the village name (not the village code) is the unique identifier of a
village in the Electoral list, the duplicate villages need to be dealt with carefully. If this issue is not
addressed, the final list will have the residents of the duplicate villages falling under the same
household depending on their House No.
The duplicate villages are renamed as “Village A” and “Village B” (and sometimes “Village C”
where there are more than two duplicates) before applying any subsequent algorithms on the data.
Flagging MNREGA & BPL Households: As there is no common unique identifier (UID) between the
Electoral list (the destination list) and the MNREGA & BPL lists (the source lists), there is no 100%
accurate method for identifying MNREGA & BPL beneficiary households in the final list. The
identification has to rely on a matching algorithm that matches the person names between the two lists.
In MHIS-1, a text-matching algorithm was used to match the person names between the
Electoral list & the MNREGA/BPL lists. The Levenshtein Distance algorithm performed better than the
other available matching algorithms, viz. Jaccard, JaroWinkler & Longest Common Subsequence.
However, even with Levenshtein Distance only ~60% of the names are matched with accuracy of
95% and above.
Because only ~60% of the names could be matched, the number of households identified as
MNREGA/BPL in the final beneficiary list is not within 10% of the numbers reported in the
socioeconomic census.
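For illustration, the name matching can be sketched with a plain Levenshtein Distance implementation. The sketch makes several assumptions: the "accuracy" figure is interpreted as a normalised similarity score (1 minus the distance divided by the longer name's length), the 0.95 threshold mirrors the 95% quoted above, and the function names and sample names are made up.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalised similarity in [0, 1]; 1.0 means identical after clean-up."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(name, candidates, threshold=0.95):
    """Return (candidate, score) for the best match, or (None, score) if below threshold."""
    scored = [(similarity(name, candidate), candidate) for candidate in candidates]
    score, match = max(scored, default=(0.0, None))
    return (match, score) if score >= threshold else (None, score)


# A single differing letter in a ten-letter name already scores only 0.9.
match, score = best_match("Anita Devi", ["Anita Debi", "Sunita Devi"])
```

In practice the candidates would be restricted to names from the same village, which keeps the comparison count manageable and reduces false matches.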
Issues identified during the field validation exercise of MHIS-1 beneficiary household list:
The enrolment team witnessed several issues during the enrolment process; the key issues identified
across districts are as follows:
Some families have migrated out of the original village to a different location.
Solution: The family can still enrol but can do so only from the original village
The identified head of household is either not alive or different from the actual head of household.
Solution: Put a maximum age limit (e.g., 50) for identifying the head of household
One family is split between two different households. This is by design rather than an issue as we are
splitting the larger families into smaller ones.
Solution: The family can enrol under either one or more of the identified households
Footnote: Levenshtein Distance is a string metric for measuring the difference between two sequences.
Issues faced during the enrolment process of MHIS-1:
The enrolment team witnessed the following issues across the districts during the enrolment process.
These issues are in addition to the ones raised during the field validation exercise:
Villages did not appear in the correct Block, and sometimes they were shown in a different District.
The population on the ground was typically different from what was mentioned in the list. For example,
as per the beneficiary list with SNA, the population of one particular village was 20 people;
however, when the team went for enrolment it was found that the village had 50 people. Further, the
headman of the village refused to allow the enrolment team to initiate enrolment activities unless the
entire population was enrolled.
The head of the family mentioned in the list was different from the actual head of family. It was also
noticed at some places that the head of the family was no longer alive, which made it difficult to
enrol such families.
The households are defined as per the House Number. However, more than one family sometimes lives
under the same House Number, e.g. a tenant and an owner staying in the same house. Under these
circumstances, enrolment of both the families was difficult.
Appendix 2- Recommended Process to Build MHIS-2 Beneficiary List
The MHIS beneficiary household list is prepared using the Electoral list as the base (starting) list. The
supporting lists are the BPL list, the MGNREGA list, and other minor lists that help categorize the
individuals into Construction Worker, Domestic Worker, Street Vendor etc. The output of the exercise is
a single list with the following attributes:
Grouped into beneficiary family Households with the Head of household identified
Each beneficiary family household classified as APL/BPL, MNREGA, Construction Worker etc.
Proper coding of relationship between village, block, panchayat, district & state with all required
RSBY fields available. The final list is validated as per RSBY software & format guidelines
Detailed illustrative steps:
Step 1: Collect & collate the necessary lists (Electoral, MNREGA & BPL)
Obtain soft copies of electoral rolls data, MNREGA & BPL data. Convert them from PDF format to
Excel if necessary. The figures below illustrate a sample Electoral, BPL & MNREGA data structure.
Converting PDF Electoral rolls to MS Excel format:
Figure: Sample Electoral roll in PDF format
Figure: Sample Electoral list after it is converted to Excel format
Converting PDF BPL data to MS Excel format
Figure: Sample BPL list in PDF format
Figure: Sample BPL list after it is converted to Excel format
Sample MNREGA data
Step 2: Data Cleansing, De-duping and Correcting Anomalies
Data cleaning & de-duping
The Electoral List contains supplementary details on ‘deletions’ and ‘corrections’ in addition to the
regular list of electors. Deletions need to be removed from the list and Corrections need to be de-duped.
There is a pattern in which these two supplementary lists appear in the electoral rolls. The deletions
have a prefix ‘S’ in the serial number, and the corrections have a prefix ‘#’ in the serial number. The
data cleaning and de-duping activity can utilize this pattern to ensure the electoral list is cleaned and
de-duped before any further processing.
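The prefix pattern described above lends itself to a short clean-up routine. The sketch below rests on assumptions: the field name `serial` is hypothetical, serial numbers are taken to be numeric once the prefix is stripped, and a ‘#’ entry is assumed to supersede the regular entry carrying the same number.

```python
def clean_electoral_rows(rows):
    """Drop deletion entries and keep one row per elector.

    rows: list of dicts with a 'serial' string (hypothetical field name).
    """
    regular, corrections = {}, {}
    for row in rows:
        serial = row["serial"].strip()
        if serial.startswith("S"):
            continue  # deletion entry: remove from the list
        if serial.startswith("#"):
            number = serial[1:].strip()
            # Store the corrected row under its plain serial number.
            corrections[number] = {**row, "serial": number}
        else:
            regular[serial] = row
    # Assumption: a correction replaces the regular entry with the same number.
    regular.update(corrections)
    return [regular[number] for number in sorted(regular, key=int)]


# Illustrative, made-up data.
cleaned = clean_electoral_rows(
    [
        {"serial": "1", "name": "A"},
        {"serial": "2", "name": "B"},
        {"serial": "S3", "name": "C"},
        {"serial": "#2", "name": "B corrected"},
    ]
)
```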
Correcting gender anomalies
The Electoral list contains a few cases where the Gender does not match the Relationship Code
mentioned. For instance, a person has Gender “M” but is related as “Mother” to another member of the
family. Such anomalies are corrected by assuming that one of the two pieces of information is true and
correcting the other. Gender is assumed to be true and the Relationship Code is corrected accordingly to
match the Gender.
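A minimal sketch of this correction is shown below. The relationship labels and their pairing are assumptions for illustration only; the actual Relationship Codes in the Electoral list may differ.

```python
# Hypothetical relationship labels and the gender each one implies.
EXPECTED_GENDER = {"Father": "M", "Husband": "M", "Mother": "F", "Wife": "F"}
# Opposite-gender counterpart used when the label contradicts the Gender.
COUNTERPART = {
    "Father": "Mother",
    "Mother": "Father",
    "Husband": "Wife",
    "Wife": "Husband",
}


def correct_relationship(gender, relationship):
    """Treat Gender as true and return a Relationship label consistent with it."""
    expected = EXPECTED_GENDER.get(relationship)
    if expected is None or expected == gender:
        return relationship  # gender-neutral label or already consistent
    return COUNTERPART[relationship]


corrected = correct_relationship("M", "Mother")
```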
Duplicate village names
Within a district, there can be some villages that are distinct on the ground but share the same name.
Since the village name (not the village code) is the unique identifier of a village in the Electoral list,
the duplicate villages need to be dealt with carefully. If this issue is not addressed, the final list will
have the residents of the duplicate villages falling under the same household depending on their House
No.
The duplicate villages are renamed as “Village A” and “Village B” (and sometimes “Village C” where
there are more than two duplicates) before applying any subsequent algorithms on the data.
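The renaming can be sketched as below. Because the name alone cannot tell the duplicates apart, the sketch assumes each record carries some other locality key (here a hypothetical `part_no` field) and that the set of genuinely duplicated names has already been confirmed, for example with the District Officers. Function and field names are illustrative.

```python
from collections import defaultdict
from string import ascii_uppercase


def rename_duplicate_villages(records, duplicate_names):
    """Suffix confirmed duplicate village names with A, B, C, ...

    records: list of dicts with 'village' and 'part_no' (hypothetical fields).
    duplicate_names: set of village names confirmed to refer to several villages.
    """
    # Collect the distinct locality keys seen for each duplicated name,
    # in order of first appearance.
    keys_by_village = defaultdict(list)
    for record in records:
        name, key = record["village"], record["part_no"]
        if name in duplicate_names and key not in keys_by_village[name]:
            keys_by_village[name].append(key)

    renamed = []
    for record in records:
        name = record["village"]
        if name in duplicate_names:
            letter = ascii_uppercase[keys_by_village[name].index(record["part_no"])]
            name = f"{name} {letter}"
        renamed.append({**record, "village": name})
    return renamed


# Illustrative, made-up data.
renamed = rename_duplicate_villages(
    [
        {"village": "X", "part_no": 1},
        {"village": "X", "part_no": 2},
        {"village": "Y", "part_no": 3},
        {"village": "X", "part_no": 1},
    ],
    duplicate_names={"X"},
)
```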
Step 3: Assigning villages to the correct Blocks
Block Code is one of the columns required in the final beneficiary list. However, the Electoral List does
not contain the Block information. Each village in the Electoral List needs to be assigned to its
respective Block.
Option 1: One of the sources of Block Codes is MHIS-1 post-enrolment data. However, this approach is
prone to error, as the villages in the Electoral list need to be matched to those in the post-enrolment
data to obtain their Block Code. This may pose a larger challenge of matching the villages.
Option 2: Manually map the village to the correct Block by working closely with the GoM and the
District Officers.
A combination of Option 1 and Option 2 can be adopted to accomplish this step.
Step 4: Grouping into Households and Identifying the Head of Household
The households are created based on the “House Number” present in the Electoral List. Once the
households are created, the head is identified based on age, gender and relationship code. The detailed
flowchart is as follows:
Figure: Algorithm for Grouping into households & identifying the head of household
Note: The MNREGA list already contains the head of household, so the head of household identified in
this step may later get corrected during MNREGA flagging. If there is a MNREGA member in the
household, he/she takes preference over a non-MNREGA member in representing the head of household
for the household.
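The grouping step can be illustrated with a simplified sketch. It does not reproduce the flowchart: the selection rule used here (the oldest member at or below a maximum age, falling back to the oldest member overall) is an assumption based only on the age-limit suggestion in Appendix 1, and gender and relationship code are deliberately left out. Field and function names are hypothetical.

```python
from collections import defaultdict


def build_households(electors, max_head_age=50):
    """Group electors by (village, house number) and pick a head per household.

    electors: list of dicts with 'name', 'age', 'village', 'house_no'
    (hypothetical field names).
    """
    households = defaultdict(list)
    for person in electors:
        households[(person["village"], person["house_no"])].append(person)

    result = {}
    for key, members in households.items():
        # Prefer members within the age limit; fall back to everyone
        # if nobody qualifies.
        eligible = [m for m in members if m["age"] <= max_head_age] or members
        head = max(eligible, key=lambda m: m["age"])
        result[key] = {
            "head": head["name"],
            "members": [m["name"] for m in members],
        }
    return result


# Illustrative, made-up data.
households = build_households(
    [
        {"name": "P", "age": 72, "village": "V A", "house_no": "12"},
        {"name": "Q", "age": 45, "village": "V A", "house_no": "12"},
        {"name": "R", "age": 20, "village": "V A", "house_no": "12"},
        {"name": "S", "age": 80, "village": "V A", "house_no": "13"},
    ]
)
```

Grouping on the village name as well as the House Number is what makes the earlier duplicate-village renaming matter: without it, residents of two different villages with the same name and House No. would land in one household.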
Step 5: Flagging MNREGA & BPL
The names of persons in the Electoral list are matched with those in the MNREGA and BPL lists at a
village level and flagged as MNREGA/BPL accordingly. Matching is done using a suitable text-matching
algorithm.
Figure: Flow chart for merging BPL/MNREGA into beneficiary household list
Step 6: Preparing the final beneficiary list in RSBY/MoLE format
The final Household lists (aggregated as one per district) are needed in RSBY format (also called MoLE
format). The field enrolment software accepts only RSBY formats. Please refer to
“Manual StandAlone.docx”, available on the RSBY website, for comprehensive documentation of the
format.
Adding dummy spouse & dummy family members:
For households that don't have a living spouse (of the head of household), a dummy spouse should be
added. The gender of the dummy spouse should be opposite to that of the living head of household.
Also, if the household member count is less than five, dummy members should be added to make up the
count to five. The dummy members can be of any gender and age.
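The padding rules above can be sketched as follows. The field names (`role`, `gender`, `dummy`), the placeholder names and the function name are assumptions for illustration; the actual RSBY format defines its own fields. Dummy members are given an arbitrary gender here, since the text allows any.

```python
def pad_household(members, target_size=5):
    """Add a dummy spouse if missing, then dummy members up to target_size.

    members: list of dicts with 'name', 'gender' and 'role', where role is
    'Head', 'Spouse' or 'Other' (hypothetical field names and values).
    """
    padded = list(members)  # do not modify the caller's list
    head = next(m for m in padded if m["role"] == "Head")
    if not any(m["role"] == "Spouse" for m in padded):
        # The dummy spouse takes the gender opposite to the head's.
        spouse_gender = "F" if head["gender"] == "M" else "M"
        padded.append(
            {"name": "DUMMY SPOUSE", "gender": spouse_gender,
             "role": "Spouse", "dummy": True}
        )
    while len(padded) < target_size:
        # Gender of dummy members is arbitrary.
        padded.append(
            {"name": f"DUMMY MEMBER {len(padded) + 1}", "gender": "M",
             "role": "Other", "dummy": True}
        )
    return padded


# A single-member household grows to five: head, dummy spouse, three dummies.
original = [{"name": "H", "gender": "F", "role": "Head"}]
padded = pad_household(original)
```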
Finally, the beneficiary household list in RSBY format (MS Access database) is validated using the
BDCS software available on RSBY website. BDCS stands for Beneficiary Data Checking Software. The
BDCS software checks for any technical errors in the data and provides an error report towards the end of
the checking process. The acceptable error limit is 0.01%.