Protocol 2.4: Evaluate Evidence Reliability

The evidence to be used in the IPC consists of available data, and the final classification is obtained based on a comprehensive, integrated analysis of the whole body of available evidence. Hence, all evidence needs to be evaluated for its reliability, including evidence from quantitative methods, such as surveys, and from qualitative methods, such as focus group discussions. Evidence to be assessed includes all evidence on contributing factors (e.g. satellite images, price trends, food production, rainfall estimations and employment levels) and on outcomes, such as food consumption and livelihood change (Box 16). 

The IPC Reliability Score Table (Table 10) presents the general criteria for assessing reliability scores as well as the more specific guidance for assessing the soundness of method and time relevance for all food security evidence as follows:

  • Part A presents the combinations of method (M) and time relevance (T) that underpin the different reliability scores. Evidence is only fully reliable if the method used is robust and the evidence depicts current conditions. If evidence is yielded through a reasonable but less rigorous method, such as evidence with limited representativeness, or if evidence needs to be extrapolated to the current analysis period because it was collected in past seasons or years, the evidence can score at most R1. Evidence that is limited in either soundness of method or time relevance scores R1+, while evidence that is limited in both scores R1-. Reasonable evidence that scores less than R1 (such as field trip reports and local knowledge) is referred to as R0 and may still be used in the IPC to support the analysis. However, it should be carefully reviewed and cannot be counted towards achieving minimum evidence needs, except in areas with limited humanitarian access for collecting evidence, provided the data adhere to specific parameters included later in IPC Manual Version 3.0. The IPC also draws on historical data and other evidence, such as contextual conditions, to support analysis of current or projected evidence. Both quantitative and qualitative methods can potentially yield R2 evidence.
  • Part B presents the general working definition of ‘good’ and ‘limited’ for soundness of M and T as well as specific guidance for assessment of reliability of evidence on indicators included in the IPC Acute Food Insecurity Reference Table. 
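As a sketch, the Part A score combinations can be expressed as a small decision rule. The function name and string inputs below are illustrative assumptions, not part of any IPC tooling; reasonable evidence weaker than "limited" scores R0 and is handled outside this rule:

```python
def reliability_score(method: str, time: str) -> str:
    """Indicative reliability score from soundness of method (M) and
    time relevance (T), per Table 10 Part A.
    Inputs are 'good' or 'limited'; anything weaker falls to R0 and
    should be screened out before calling this function."""
    good_m = method == "good"
    good_t = time == "good"
    if good_m and good_t:
        return "R2"    # robust method depicting current conditions
    if good_m or good_t:
        return "R1+"   # limited in either M or T
    return "R1-"       # limited in both M and T
```

For example, a well-designed survey extrapolated from a past season would score `reliability_score("good", "limited")`, i.e. R1+.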

Note: The recommended instructions on soundness of methods and time relevance, including estimated sample sizes and clusters, have been calculated for IPC reliability purposes only. They are not intended to constitute a best practice for the design of any method, including surveys involving primary data collection in the areas of analysis. The IPC acknowledges that evidence that scores less than R2 may not provide accurate estimates of the conditions, and thus the IPC requires various pieces of evidence to be analysed and converged to provide an overall classification when R1 evidence is being used. The IPC acknowledges that the soundness of methods, including surveys, is also driven by factors other than sample design, such as measurement error, selection bias, field practices and analytical skills. Although important, the IPC cannot identify globally comparable parameters for these, and analysts are urged to assess the soundness of all methods beyond the issues identified in this table.

Considerations:

  • General criteria for assessment of evidence reliability are equally applicable to all evidence, including qualitative and quantitative data informing indicators in the IPC Reference Tables (i.e. direct evidence, such as the FCS and the HEA) and those informing other indicators not included in the IPC Reference Tables (i.e. indirect evidence, such as market prices, rainfall estimates and production figures). Although all evidence used for IPC classifications is to be assigned a reliability score, the IPC provides specific guidance only for indicators included in the IPC Reference Tables. Analysts are encouraged to use the general criteria to support evaluation of evidence on other indicators not included in the IPC Reference Tables.
  • Nutrition evidence should be evaluated as per the Criteria for assessment of Reliability Scores included in the IPC Acute Malnutrition protocols.
  • Surveys refer to studies of a geographical area or household group to gather data on food security outcomes and/or contributing factors, and are carried out by polling a random section of the population or through universal census.
    • The sample size for surveys with a cluster sampling design will generally depend on the following parameters: P: expected prevalence; D: desired precision; d: design effect; Z: desired confidence level of estimations; and, only for populations smaller than 10,000, the population size. The sample-size formula n = Z² × P × (1 − P) × d / D² applies to both simple random and cluster sampling. However, in simple random sampling the design effect (d) is 1, whereas the d of cluster sampling will vary between surveys, often ranging between 1.5 and 2.5. To support the evaluation of the validity of the method of surveys, the IPC refers to the Sphere and Standardized Monitoring and Assessment of Relief and Transitions (SMART) survey guidance of 25 clusters as a “good” sample size. While 25 clusters can generally be applied globally, since the large size allows for assessment of most conditions, an acceptable minimum sample size cannot be globally defined, since it will depend on the actual P (expected prevalence), d (design effect) and D (desired precision). Nevertheless, assuming general parameters of P: 20% (following the IPC’s 20% rule for area classification), D: 8.5%, d: 1.5 and Z: 1.65 (90% desired confidence level of estimates), the IPC has identified 5 clusters and 90 observations as the minimally acceptable sample size, labelled as “limited”. Although analysts may use the minimum sample size of 5 clusters and 90 observations to support evidence reliability assessment, IPC analysts should revise the minimum sample size based on real parameters as much as possible, although the desired precision (D) cannot be greater than 8.5%.
    • The validity of the surveys is also driven by factors other than sample design, such as measurement error, selection bias, field practices and analytical skills. Although important, the IPC cannot identify globally comparable parameters for these factors, and analysts are urged to assess the soundness of the survey methods. 
    • Surveys with a good method can only come from a census or a probabilistic randomized assessment with selection based on an adequate sample frame. A good method needs to adhere to the optimal sample size (see bullet above), have low measurement error and selection bias, and be collected with adequate field practices and analytical skills. 
    • Surveys with a limited method can be: (i) a probabilistic assessment; (ii) a non-probabilistic assessment for various purposes; or (iii) re-analysed survey data collected with a good method valid at a higher administrative unit. Surveys with limited representativeness should still meet minimum sample size requirements for an 8.5% precision, have low measurement error and selection bias, and be carried out with adequate field practices and analytical skills. Given that estimates from surveys with smaller sample sizes are likely to generate large confidence intervals, field data collectors are urged to conduct surveys representative of the unit of analysis. The IPC also calls for care when disaggregated evidence is used, as the information generated can be misleading, especially if selection bias and heterogeneity are large. As much as possible, best-practice estimates should be provided with confidence intervals to support responsible use of this evidence.
  • Computer-assisted telephone interviewing is conducted remotely by trained specialized operators who work from a call centre and interview randomly selected respondents. Computer-assisted telephone interviewing can be used either as a survey or as a monitoring system. In principle, the same sample size that would be applicable to face-to-face surveys and monitoring systems should be applied to computer-assisted telephone interviewing assessments. However, a 1.5x increase should be applied where selection bias needs to be corrected, to account for the increased design effect. In order to be accepted for IPC classification, computer-assisted telephone interviewing questionnaire modules also need to be tested and approved, considering the challenges imposed on operators by not being in direct physical presence with the respondents. Especially in areas where there is bias associated with phone ownership, it is best to use both computer-assisted telephone interviewing and face-to-face interviews with a 10% sample overlap to check for mode biases between the two approaches and produce reliable estimates of variance. Unless computer-assisted telephone interviewing is used within a dual-mode (computer-assisted telephone interviewing and face-to-face) survey, or the phone numbers come from a previous cluster-sample survey, computer-assisted telephone interviewing follows a simple stratified random sample design, and therefore does not require cluster selection and other requisites of cluster surveys.
  • Full Household Economy Analysis (HEA) refers to estimations of livelihood and survival outcomes performed by a trained professional using either the Livelihood Impact Analysis Spreadsheet or the dashboard. The full analysis and assumptions need to be well documented and available for review by the IPC Technical Working Group and the potential IPC Quality Review. Full baselines are based on approximately 50 focus group and key informant interviews, and should be relevant at the time of the analysis considering the stability of the situation: not older than ten years in stable situations, and not older than five years in unstable situations. Analysis needs to be supported by at least four pieces of R2 evidence on contributing factors. The HEA needs to adhere to the best-practice checklist.
  • Rapid Household Economy Analysis (HEA) refers to estimations of outcomes performed by a trained professional using a less complete analysis system, such as the scenario-building tool or the dashboard. Both rapid baselines and rapid profiles belong to this category, although there are differences between the two: rapid baselines are based on approximately 30 focus group and key informant interviews and use the dashboard for detailed estimates, whereas rapid profiles are based on 8–10 focus group and key informant interviews, and use the scenario-building tool for rough estimations of outcomes. Analysis and assumptions need to be well documented and made available for review by the IPC Technical Working Group and the potential IPC Quality Review. Reference values can be obtained from rapid baselines or rapid profiles provided that they quantify sources of food and income for subjects being classified. Rapid baselines and detailed profiles should be relevant at the time of the analysis considering the stability of the situation: not older than ten years in stable situations, and not older than five years in unstable situations. Analysis needs to be supported by at least four pieces of R2 evidence on contributing factors. The HEA needs to adhere to the best-practice checklist. The “zone summaries” or equivalents, which are also based on the concepts of HEA but which do not provide detailed information on food and income sources, score less than R1.
  • Monitoring systems include estimates collected routinely, usually in purposively selected community-based sites, with prevalence statistics typically derived through pooled analysis for surveillance and monitoring purposes. Observations may be selected randomly or purposively for various reasons.
  • Evidence collected during the season of analysis refers to food security data collected during the period of time defined as the current analysis period, considering seasonal changes in food consumption and livelihood change outcomes within years. The season of analysis is often defined in relation to peaks in food production, usually because of harvests and animal production. In rural settings that are highly dependent on non-irrigated local food production, food consumption seasons are most likely linked to rainfall patterns. If an area of analysis does not have significant seasonal changes within years, the entire year can be treated as one “season”. Acute Food Insecurity and Acute Malnutrition seasons may or may not be aligned, depending on interactions between the different drivers of Acute Malnutrition and food consumption.
  • Estimates from an R1 representative survey from a similar area can be used to support the classification only if the area being classified is relatively small (e.g. camps, villages, admin. level 4) and when evidence on the same indicator is not available for the area of interest through another method. An analysis of the similarity of food insecurity between areas, based on evidence on contributing factors and outcomes, needs to be presented to demonstrate comparability of areas. Evidence from similar nearby areas needs to be supported by at least two pieces of reliable evidence on contributing factors to food insecurity to allow analysts to confirm the likely outcomes for the area of analysis.
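The survey sample-size reasoning above can be made concrete with a short calculation. This is a sketch assuming the standard prevalence-estimation formula n = Z² × P × (1 − P) × d / D²; the function name is illustrative, and the 1.5x adjustment follows the bullet on computer-assisted telephone interviewing:

```python
import math

def sample_size(P: float, D: float, deff: float, Z: float) -> float:
    """Required observations for estimating a prevalence P with desired
    precision D, design effect deff and z-score Z:
        n = Z^2 * P * (1 - P) * deff / D^2
    For populations under 10,000 a finite population correction would
    further reduce n (omitted here)."""
    return Z**2 * P * (1 - P) * deff / D**2

# IPC "limited" parameters: P = 20%, D = 8.5%, deff = 1.5, Z = 1.65 (90% CI)
n_limited = round(sample_size(P=0.20, D=0.085, deff=1.5, Z=1.65))
print(n_limited)   # ~90 observations, matching the figure in the text

# Telephone (CATI) data collection: apply the 1.5x increase noted above
n_cati = math.ceil(1.5 * n_limited)
print(n_cati)      # 135
```

With simple random sampling, setting `deff=1.0` in the same formula shows why cluster designs need proportionally larger samples for the same precision.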