Methodologies for data collection and analysis for monitoring and evaluation

The quality and utility of data derived from either monitoring or evaluation in an IOM intervention depends on the data collection planning, design, implementation, management and analysis stages of these respective processes. Understanding each stage, and the linkages between them, is important for collecting relevant, high-quality data that can inform evidence-based decision-making and learning. The following chapter will look at methodologies for planning, designing and using various data collection tools for both monitoring and evaluation (M&E) purposes. This chapter also focuses on managing and analysing the collected data and, finally, on how to present findings.

An overview of chapter 4

A common understanding and application of these methodologies strengthens IOM M&E products and facilitates the comparison of results and their aggregation.

While obtaining data is required for both monitoring and evaluation, it is important to note that the methodologies may vary according to the respective information needs.1 These needs subsequently shape the purpose of the data collection, which is also guided by the availability of data, local context, resources and time, as well as other variables.

The scope of this chapter is limited to concepts that will enable users to acquire a broad understanding of methodologies for collecting and analysing M&E data, and links to additional resources are available at the end of each section.

After reading this chapter, M&E practitioners should have an understanding of M&E methodologies, specifically of how to select, design and implement methods relevant to their work, and the background needed to make informed choices.

Professional standards and ethical guidelines

During the different stages of monitoring or evaluation, including for the collection and use of data, M&E practitioners are required to adopt ethical behaviours that prevent them from being influenced by internal and external pressures that may try to change the findings before they are released or to use them in an inappropriate way.2

Ethical behaviour

Ethics are a set of values and beliefs that are based on a person’s view of what is right, wrong, good and bad, and that influence the decisions people make. They can be dictated by the organization, by the laws of the country in which M&E practitioners work and by what people consider to be ethical in that context.

M&E practitioners must also act in accordance with the following sources:

IOM resource


United Nations Evaluation Group (UNEG)

Evaluation and politics

The gathered data provide an important source of information to decision makers about the intervention being monitored and/or evaluated. While positive evaluations can help secure more funds, expand a pilot project or enhance reputations, the identification of serious problems can lead to difficult situations where the credibility of the work done is at stake. Understanding and managing political situations and influence is crucial for maintaining the integrity of monitoring and evaluation work, and well-defined, robust methodologies for data collection and analysis play a critical role in this regard.4

Ethical guidelines and principles

When planning, designing, implementing, managing and reporting on M&E activities, M&E practitioners should ensure that their actions are informed by ethical guidelines, particularly those outlined below:

Some of the common ethical principles presented in the above documents “should be applied in full respect of human rights, data protection and confidentiality, gender considerations, ethnicity, age, sexual orientation, language, disability, and other considerations when designing and implementing the evaluation”.5 They can be summarized as follows:

Monitoring and evaluation ethical principles

Adhering to common ethical principles also contributes to guaranteeing that the information gathered is accurate, relevant, timely and used in a responsible manner (see chapter 2, as well as Annex 2.1. Ethical monitoring and/or evaluation checklist).

IOM resource

Other resources

Buchanan-Smith, M., J. Cosgrave and A. Warner

Morris, M. and R. Cohn

Thomson, S., A. Ansoms and J. Murison (eds.)

Rigorous planning and designing for data collection can improve the quality of the approach and methods of data collection and, therefore, the quality of collected data. It is imperative to identify the approach intended for use to monitor or evaluate an intervention, and then to establish a data collection plan. Selecting an appropriate approach will also allow a relevant assessment of the monitoring or evaluation questions guiding any review, taking into account the specific context, existing constraints, access, timing, budget and availability of data.

INFORMATION - IOM migration data governance

What is it?

Data governance represents the framework used by IOM to manage the organizational structures, policies, fundamentals and quality standards that ensure accurate and risk-free migration data and information. It establishes standards, accountability and responsibilities, and ensures that the use of migration data and information is of maximum value to IOM, while managing the cost and quality of handling the information. Data governance enforces the consistent, integrated and disciplined use of migration data by IOM.

How is it relevant to IOM’s work?

Data governance allows IOM to view data as an asset in every IOM intervention and, most importantly, it is the foundation upon which all IOM initiatives can rest. It is important to keep in mind the migration data life cycle throughout the whole project cycle. This includes the planning and designing, capturing and developing, organizing, storing and protecting, using, monitoring and reviewing, and eventually improving, the data or disposing of it.

Key concepts to look out for:

IOM resources

Planning for data collection

When planning for data collection, four basic considerations help ensure that the data to be collected and analysed are valid and reliable: the purpose of the data collection, the methodology, the resources available and the timing. A qualitative, quantitative or mixed-methods approach to data collection can be considered in that respect.

Figure 4.2. Key considerations when planning for data collection

Some questions to ask:

Several aspects need to be considered, such as identifying the source of the data and the frequency of data collection, knowing how data will be measured, by whom and by how many people, and selecting the appropriate methodology in order to design the right data collection tool(s).

Some questions to ask:

Resources enable the implementation of methodological choices.

Some questions to ask:

Timing may influence the availability of resources, as well as the relevance of data (avoiding outdated data).

Some questions to ask:

Identifying the purpose of data collection

Identifying the purpose of data collection helps address different information needs; the information needs of monitoring may also differ from those of evaluation.

Data collection for monitoring, which occurs during implementation, feeds implementation-related information needs, using data collection tools that are designed to collect data for measuring progress towards results against pre-set indicators. Data collected for evaluation serves the purpose of assessing the intervention’s results and the changes it may have brought about on a broader level, using data collection tools designed to answer evaluation questions included in the evaluation terms of reference (ToR), matrix or inception report (see also chapter 5, Planning for evaluation).

The process of planning and designing the respective tools for M&E data collection may be similar, as data collected for monitoring can also be used for evaluation, feeding the information needs of either. Identifying whether data collection is for monitoring or evaluation purposes is a first step in planning, which will then influence the choice of an appropriate methodology and tools for data collection and analysis. The following figures show how questions can determine what type of data to collect for monitoring and for evaluation, respectively.

Figure 4.3. Monitoring and vertical logic

Figure 4.4. Evaluation and vertical logic

Source: Adapted from IFRC, 2011.

International Federation of Red Cross and Red Crescent Societies (IFRC)

Sources of data

The IOM Project Handbook defines data sources as identifying where and how the information will be gathered for the purpose of measuring the specified indicators.6

In general, there are two sources of data that can be drawn upon for monitoring and/or evaluation purposes:


INFORMATION - Availability and quality of secondary data

It is important to assess the availability and quality of secondary data, as this enables M&E practitioners to target efforts towards the collection of additional data. For instance, it is important to ascertain whether baseline data (such as census data) are available, and, if so, to determine its quality. Where this is not the case or where the quality of the data is poor, M&E practitioners are required to plan for the collection of baseline data.

Desk review

When choosing sources of data, it is helpful to start with a desk review to better assess what type of data to use. For monitoring, this corresponds to the information included under the column “Data source and collection method” of the IOM Results Matrix and Results Monitoring Framework (see chapter 3). For evaluation, the type of data will be clarified in the evaluation ToR , inception report and/or evaluation matrix and can also include data derived from monitoring.

A desk review usually focuses on analysing existing relevant primary and secondary data sources and can be either structured or unstructured. Structured desk reviews use a formal structure for document analysis, whereas unstructured reviews are background reading. For detailed guidance on conducting a desk review, see Annex 4.2. How to conduct a desk review.

Type of measurement

When planning for data collection and analysis, knowing the type of measurement, that is, how data will be measured, may influence the decision to choose the appropriate methodology. This is of particular importance to inform the design of data collection tools such as surveys.

Measures of indicators identified in a Results Matrix or Evaluation Matrix can include categorical (qualitative) and/or numerical (quantitative) variables. A variable is any characteristic or attribute that differs among and can be measured for each unit in a sample or population (see section on “Sampling”).

When designing indicators, the most important tasks are to logically link these to the intervention results and determine how the indicators will measure these results.

What is the purpose of measuring?

How will you go about measuring it?

Measurement quality

Any measure that is intended to be used should be relevant, credible, valid, reliable and cost-effective. The quality of indicators is determined by four main factors:

(a) Quality of the logical link between the indicator and what is being measured (such as the objective, outcome, output and/or impact of an intervention)

(b) Quality of the measurement

(c) Quality of implementation

(d) Quality of recognizing the measurement results and their interpretation

Table 4.1 provides a checklist for ensuring good quality measures.

Table 4.1. Checklist for measuring quality
Criteria | Reflection checklist
Relevancy | Does it measure what really matters as opposed to what is easiest to measure?
Credibility | Will it provide credible information about the actual situation?
Validity | Does the content of the measure look as if it measures what it is supposed to measure? Will the measure adequately capture what you intend to measure?
Reliability | If data on the measure are collected in the same way from the same source using the same decision rules every time, will the same results be obtained?
Cost-effectiveness | What is the cost associated with collecting and analysing the data? Is the measure cost-effective?

IOM resources

Other resources

Organisation for Economic Co-operation and Development (OECD)

Levels of measurement

The values that a variable takes form a measurement scale, which is used to categorize and/or quantify indicators. They can be nominal, ordinal, interval or ratio scales. The levels of measurement used will determine the kind of data analysis techniques that can or cannot be used.

Nominal scales

Nominal scales consist of assigning unranked categories that represent quality rather than quantity. Any values assigned to categories are purely descriptive (they have no inherent numerical value in terms of magnitude). The measurement from a nominal scale can help determine whether the units under observation are different, but cannot identify the direction or size of this difference. A nominal scale is used for classification/grouping purposes.
EXAMPLE - Question

(Select one option)

(a) Host communities

(b) Collective settlement/centre

(c) Transitional centre

(e) Others (specify):_____________________

Ordinal scales

Ordinal scales are an ordered form of measurement, consisting of ranked categories. However, the differences between the categories are not meaningful. Each value on the ordinal scale has a unique meaning, and it has an ordered relationship to every other value on the scale. The measurement from an ordinal scale can help determine whether the units under observation are different from each other and the direction of this difference. An ordinal scale is used for comparison/sorting purposes.

EXAMPLE - Question

How often do you interact with local people?

(b) A few times per week (4)

(c) A few times per month (3)

(d) A few times per year (2)

Interval scales

Interval scales consist of numerical data that have no true zero point, with the differences between each interval being the same regardless of where it is located on the scale. The measurement from an interval scale can help determine both the size and the direction of the difference between units. However, since there is no true zero point, it is not possible to make statements about how many times higher one score is than another (for example, a rating of 8 on the scale below is not two times a rating of 4). Thus, an interval scale is used to assess the degree of difference between values.

EXAMPLE - Question

Compared to your financial situation before leaving, how would you rate your current financial situation?


Ratio scales

Ratio scales consist of numerical data with a true zero point that is meaningful (that is, zero means something does not exist), and there are no negative numbers on this scale. Like interval scales, ratio scales determine both the absolute size (that is, distance from the true zero point) and the direction of the difference between units. This measurement also makes it possible to describe the difference between units in terms of ratios, which is not possible with interval scales. Thus, a ratio scale is used to assess the absolute amount of a variable and compare measurements in terms of a ratio.

EXAMPLE - Question

What was your income last month? _________________

The most important task of any indicator is to ensure the best possible allocation of the characteristics being measured to the measurement scale. This segregation of the characteristics “and their measurable statistical dispersion (variance) on the scale are the main insights gained because of the indicator (the variables)”.10

When planning for data collection and thinking of the type of data that will be collected, it is important to assess the target audience from which the data will be collected. A crucial consideration that may influence decision-making is to determine the sample size and sampling strategy to select a representative sample of respondents, as this has budgetary implications.

While at times it may be feasible to include the entire population in the data collection process, at other times this may be neither necessary nor feasible due to time, resource and context-specific constraints, so a sample is selected.

INFORMATION

A population, commonly denoted by the letter N, comprises the members of a specified group. For example, in order to learn about the average age of internally displaced persons (IDPs) living in an IDP camp in city X, all IDPs living in that IDP camp would be the population.

Because available resources may not allow for the gathering of information from all IDPs living in the IDP camp in city X, a sample of this population will need to be selected. This is commonly denoted by the lowercase letter n. A sample refers to a set of observations drawn from a population; it is a part of the population that is used to make inferences about, and is representative of, the whole population.

Illustration of population (N) versus sample (n)

Sampling is the process of selecting units from a population (that is, a sample) to describe or make inferences about that population (that is, estimate what the population is like based on the sample results).

Sampling applies to both qualitative and quantitative monitoring/evaluation methods. Whereas random sampling (also referred to as probability sampling) is often applied when primarily quantitative data collection tools are used for monitoring/evaluation purposes, non-random sampling (also referred to as non-probability or purposeful sampling) tends to be applied to monitoring/evaluation work that relies largely upon qualitative data.11

Properly selecting a sample, ideally at random, can reduce the chances of introducing bias in the data, thereby enhancing the extent to which the gathered data reflects the status quo of an intervention. Bias is any process at any stage in the design, planning, implementation, analysis and reporting of data that produces results or conclusions that differ systematically from the truth.12 For more information on the types of bias, see Annex 4.3. Types of bias.

Country Y has a site hosting 1,536 IDPs; this is the entire population (N).

IOM is implementing several activities, alongside other humanitarian actors, to address the needs of the IDPs sheltering at this site. You are interested in monitoring/evaluating these activities. In particular, you are trying to capture the views of an average person benefiting from this intervention.

Due to time and budget constraints, it is impossible to survey every IDP benefiting from IOM services. Therefore, you pick a sample (n) that represents the overall view of the 1,536 IDPs benefiting from the intervention. Given the available resources, the representative sample for the target population in this case was chosen to be 300.

Figure 4.6. Illustration of example

Random sampling

Random sampling is an approach used when a large number of respondents is required and the sample results are to be generalized to an entire target population. In other words, to ensure that the sample really represents the larger target population, and does not merely reflect the views of a very small group within it, representative individuals are chosen at random. Random sampling is an effective method to avoid sampling bias.

True random sampling requires a sampling frame, which is a list of the whole target population from which the sample can be selected. This is often difficult to apply. As a result, other random sampling techniques exist that do not require a full sampling frame (systematic, stratified and clustered random sampling).

Simple random sampling

Simple random sampling selects units directly from the sampling frame entirely at random, so that every unit has an equal chance of being selected.

Stratified random sampling

Stratified random sampling divides the population into subgroups (strata), such as by sex or location, and draws a simple random sample within each stratum, ensuring that each subgroup is represented in the sample.

Cluster random sampling

Cluster random sampling divides the population into many clusters (such as neighbourhoods in a city) and then takes a simple random sample of the clusters; the units in each selected cluster constitute the sample. It is used when both the target population and the desired sample size are large. It is easy and convenient, and allows a random sample to be selected when the target population sampling frames are very localized.

Multistage random sampling

Multistage random sampling combines two or more of the random sampling techniques sequentially (such as starting with a cluster random sample, followed by a simple random sample or a stratified random sample). It is used when a full sampling frame does not exist or is inappropriate, and allows a random sample to be selected when the target population lists are very localized.

Non-random/Purposeful sampling

Non-random sampling is used where:

Non-random/purposeful sampling is appropriate when there is a small “n” study, the research is exploratory, qualitative methods are used, access is difficult or the population is highly dispersed. For further information as to when it is appropriate to use non-random sampling, see Patton (2015) and Daniel (2012). The chosen sampling technique will depend on the information needs, the methodology (quantitative or qualitative) and the data collection tools that will be required.

Note: While the table shows the most common types of non-random/purposeful sampling, further types of non-random/purposeful sampling can be found in Patton, 2015.

Limitations of non-random/purposeful sampling

There are several limitations when using non-random/purposeful samples, especially convenience and snowball samples. First, generalizations to the entire target population cannot be made. Second, statistical tests for making inferences cannot be applied to quantitative data. Finally, non-random samples can be subject to various biases that are reduced when the sample is selected at random. If using a non-random sample, M&E practitioners should ask the following: “Is there something about this particular sample that might be different from the population as a whole?” If the answer is affirmative, the sample may lack representation from some groups in the population. Presenting demographic characteristics of the sample can provide insight as to how representative it is of the target population from which the sample was drawn.

Regardless of which sampling approach and technique you decide to use, it is important that you are clear about your sample selection criteria, procedures and limitations.

Resources for random sampling and non-random/purposeful sampling are provided in Annex 4.4. Applying types of sampling.

Determining sample size

The size of the sample will be determined by what will be measured, for what purpose and how it will be measured. The size of the sample will also need to ensure, with the maximum level of confidence possible, that an observed change or difference between groups is the result of the intervention, rather than a product of chance. However, this may not always be the case for non-random/purposeful sampling.

Determining sample size: Random sampling

When a large number of respondents is required, the appropriate sample size is decided by considering the confidence level and the sampling error.

How confident should the person collecting data be in the sample results and their accuracy in reflecting the entire population?

Generally, the confidence level is set at 95 per cent, that is, there is a 5 per cent chance that the results will not accurately reflect the entire population.

In other words, if a survey is conducted and it is repeated multiple times, the results would match those from the actual population 95 per cent of the time.

Increasing the confidence level requires increasing the sample size.

It is important to determine how precise estimates should be for the purpose of data collection. This is the sampling error or margin of error.

The sampling error or margin of error is the estimate of error that arises when data is gathered on a sample rather than the entire population.

A sampling error or margin of error occurs when a sample is selected that does not represent the entire population.

EXAMPLE - Confidence level and sampling error

IOM is currently implementing a livelihoods project in region M of country Y. A poll is taken in region M, which reveals that 62 per cent of the people are satisfied with the activities organized through the livelihoods project and 38 per cent of those surveyed are not satisfied with the assistance received.

The M&E officer responsible for data collection in this case has decided that the sampling error for the poll is +/- 3 percentage points. This means that if everyone in region M were surveyed, between 59 (62 - 3) and 65 (62 + 3) per cent would be satisfied, and between 35 (38 - 3) and 41 (38 + 3) per cent would not be satisfied with the assistance received, at the 95 per cent confidence level. The plus or minus 3 percentage points is called the confidence interval, which is the range within which the true population value lies with a given probability (that is, the 95% confidence level). The width of the confidence interval tells us how certain or uncertain we are about the true figure in the population. When the confidence interval and confidence level are put together, a spread of percentages results.

RESOURCES - Online sample size calculator

A number of tools are available online to help calculate the sample size needed for a given confidence level and margin of error. One useful tool is the Survey System Sample Size Calculator as well as the Population Proportion – Sample Size Calculator.

EXAMPLE- How to calculate the sample size using an online calculator

At the IDP site in country Y, there are 1,536 IDPs. You would like to make sure that the sample you select is adequate. You decide that having 95 per cent confidence in the sample results, with a margin of error of 5 per cent, is acceptable. The accuracy and precision required for the population of interest tell you that you need a sample size of 307 IDPs to be able to generalize to the entire population of IDPs at the site.


Determining sample size: Non-random/purposeful sampling

For non-random/purposeful sampling, an indication of whether an adequate sample has been reached is data saturation, the point at which collecting further data yields no new information. Once this point is reached, no more data needs to be collected. However, as there is little guidance on how many interviews are needed to reach saturation, this point can sometimes be difficult to identify.

The following questions can help determine how many people to include in the sample while achieving both data saturation and credibility:

Methods, approaches and tools for monitoring and evaluation

Once data collection has been planned and data sources and sampling have been established, it is time to focus on approaches and methods for designing the data collection tools. The indicators in the Results Matrix, as well as the evaluation criteria and related questions, will determine the approach and tools that will be used to collect the necessary data for monitoring progress/evaluating the intervention.

Time and budget constraints, as well as ethical or logistical challenges, will inform the data collection approach and tools used. The use of multiple tools for gathering information, also known as the triangulation of sources, can increase the accuracy of the information collected about the intervention. For instance, if the intervention is managed remotely due to lack of access to the field and relies upon data collection teams, triangulating the information remotely is a crucial quality check mechanism.

While triangulation is ideal, it can also be very expensive. In general, M&E practitioners use a combination of surveys, interviews, focus groups and/or observations. Studies that use only one tool are more vulnerable to biases linked to that particular method.

Methods for and approaches to data collection are systematic procedures and useful to support the process of designing data collection tools. Generally, a mixture of qualitative and quantitative methods and approaches to data collection are used for M&E. Although there are multiple definitions for these concepts, quantitative methods and approaches can be viewed as being based on numerical data that can be analysed using statistics. They focus on pinpointing what, where, when, how often and how long something occurs and can provide objective, hard facts, but cannot explain why something occurs. Qualitative methods and approaches for data collection are based on data that are descriptive in nature, rather than data that can be measured or counted. Qualitative research methods can use descriptive words that can be examined for patterns or meaning and, therefore, focus on why or how something occurs.

The following provides an overview of when a quantitative and/or qualitative approach, and corresponding tools for collecting monitoring and/or evaluation data should be used:

The following graphic provides an overview of data collection methods for both monitoring and evaluation, grouped into words and pictures. These include focus group discussions; strengths, weaknesses, opportunities and threats (SWOT) analysis; dreams realized or visioning (DR/V); drama and role plays; and photos and videos.

Surveys

Surveys are a common technique for collecting data. Surveys can collect focused, targeted information about a sample taken from the target population for a project, programme or policy, especially data about perceptions, opinions and ideas. While surveys can also be used to measure intended behaviour, there is always room for interpretation, and any data gathered may be less “factual”, as what people say they (intend to) do may not reflect what they actually do in reality.

Generally, a survey is conducted with a relatively large sample that is randomly selected so that the results reflect the larger target population (see section on Sampling). The format of the survey can be structured or semi-structured, depending on the purpose of the data collection (see Table 4.8) and be implemented on a one-time basis (cross-sectional) or over a period of time (longitudinal).

Cross-sectional surveys are used to gather information on the target population at a single point in time, such as at the end of a project. This survey format can be used to determine the relationship between two factors, for example, the impact of a livelihoods project on the respondent’s level of knowledge for establishing an income-generating activity.

Longitudinal surveys gather data over a period of time, allowing for an analysis of changes in the target population over time, as well as the relationship between factors over time. There are different types of longitudinal surveys, such as panel and cohort studies.13

Structured survey. Purpose: Aggregate and make comparisons between groups, and/or across time, on issues about which there is already a thorough understanding.

Semi-structured survey. Purpose: Acquire an in-depth understanding of the issues that are being monitored and/or evaluated.

For more information about the different types, design and implementation of longitudinal surveys, see the following:

IOM resource

Other resources

Lugtig, P. and P.A. Smith

Morra-Imas, L.G. and R.C. Rist

(Kindly note that this can further be adapted as needed.)

Surveys can be administered in different ways, such as in-person interviews, phone interviews or as paper or online questionnaires that require participants to write their answers.

For more information on how to design and implement a survey, see Annex 4.5. Survey design and implementation and Annex 4.6. Survey example.

Interviews

Interviews are a qualitative research technique used to shed light on subjectively lived experiences of, and viewpoints from, the respondents’ perspective on a given issue, or sets of issues, that are being monitored or evaluated for a given intervention. Interviews provide opportunities for mutual discovery, understanding, reflection and explanation. Interviews are of three types: (a) structured; (b) semi-structured; and (c) unstructured. Table 4.9 provides an overview of each interview approach, when to use it and some examples.

When there is already a thorough understanding about one or more complex issues being monitored/evaluated.

When comparable data is desired/needed.

INFORMATION - Formulating interview questions

Good-quality interview questions should have the following characteristics:

To know more about interviews, examples of interview structure and probing, see Annex 4.7. Interview structure and questions (examples provided throughout the annex) and Annex 4.8. Interview example.

Focus group discussions

A focus group is another qualitative research technique in the form of a planned group discussion among a limited number of people, with a moderator and, if possible, note-takers, as well as observers if observations are also being used.15 The purpose of a focus group is to obtain diverse ideas and perceptions on a topic of interest in a relaxed, permissive environment that allows the expression of different points of view, with no pressure for consensus. Focus groups are also used to acquire an in-depth understanding of a topic or issue, which is generally not possible using a survey. For instance, a survey can tell you that 63 per cent of the population prefers activity Y, but a focus group can reveal the reasons behind this preference. Focus groups can also help check for social desirability bias, which is the tendency among survey respondents to answer what they think the enumerator wants to hear, rather than giving their actual opinions. For example, during the focus group discussion, one may discover that the actual preference of the participants is activity Z, not activity Y as per their responses to the survey. However, focus groups provide less of an opportunity to generate detailed individual accounts on the topic or issue being explored; if this type of data is required, interviews should be used instead. If someone answers too often, it is important to identify whether this behaviour intimidates other participants and to moderate the discussion by inviting others to contribute. It is also important to understand who that person is, for instance, a political leader trying to impose answers on the group.


Case study

A case study is a qualitative data collection method that is used to examine real-life situations and determine whether the findings of the case can illustrate aspects of the intervention being monitored and/or evaluated. It is a comprehensive examination of cases to obtain in-depth information, with the goal of understanding the operational dynamics, activities, outputs, outcomes and interactions of an intervention.

Case studies involve a detailed contextual analysis of a limited number of events or conditions and their relationships. They provide the basis for the application of ideas and the extension of methods. Data collected using a case study can help in understanding a complex issue or object and add strength to what is already known.

A case study is useful to explore the factors that contribute to outputs and outcomes. However, this method of data collection may require considerable time and resources, and information obtained from case studies can be complex to analyse and extrapolate.

For further information on case studies and how to conduct them, please see the following:

Observation

Observation is a research technique that M&E practitioners can use to better understand participants’ behaviour and the physical setting in which a project, programme or policy is being implemented. To observe means to watch individuals and their environments and notice their behaviours and interactions by using all five senses: seeing, touching, tasting, hearing and smelling.

Observations can be used for the following:

Observations can be conducted in a structured, semi-structured or unstructured approach.

Looking for a specific behaviour, object or event

Looking for a specific behaviour, object or event, how they appear or are done, and what other specific issues may exist

Collect information about the extent to which particular behaviours or events occur, with information about the frequency, intensity and duration of the behaviours

For more information, tips on and examples of observations, as well as planning and conducting observations, see Annex 4.11. Examples of observations and planning and conducting observations.

Additional methods for data collection for monitoring and evaluation

Brainstorming is used to gain many ideas quickly from a group without delving into a deeper and more detailed discussion. It encourages critical and creative thinking, rather than simply generating a list of options, answers or interests. From an M&E perspective, this method is often a first step in a discussion and is followed by other methods.

Drama and role plays are used to encourage groups of people to enact scenes from their lives concerning perceptions, issues and problems that have emerged relating to a project intervention, which can then be discussed. Drama can also help a group to identify what indicators would be useful for monitoring or evaluation and identify changes emerging from a project intervention.

INFORMATION - Methods for impact evaluations

Impact evaluations aim to identify a proper counterfactual and determine whether impact can be confidently attributed to an intervention.18 Specifically, this may be done by assessing the situation of the beneficiaries “before and after” and “with or without” the intervention. By comparing the before-and-after and/or with-or-without scenarios, differences or changes observed can be attributed to the intervention, with some reservations, as attribution is not always straightforward and may be more complex to assess than by answering the above scenarios.

A common first step in impact evaluation is to determine the sample size and sampling strategy to select a representative sample from both the treatment group (participating in the intervention) and comparison group (not participating in the intervention). The calculation of a robust and representative sample depends on various factors.

While there is a range of impact evaluation designs, there is also a range of methods that are applicable within these designs.19 To answer the specific evaluation questions, methods are flexible and can be used in different combinations within impact evaluation designs. Experimental, quasi-experimental and non-experimental are three types of impact evaluation design.

Experimental methods

Experimental methods, also called randomized control trials, use randomization techniques at the outset of the intervention to sample both intervention and comparison groups.20 While there are different methods to randomize a population, a general requirement is that the two groups remain as similar as possible in terms of socioeconomic characteristics and that their size should be broadly equivalent. Ensuring these makes them comparable and maximizes the statistical degree of precision of the impact on the target group.21

Given the rigorous approach to selecting treatment and control groups, as well as the frequency of primary data collection for generating the required data sets, experimental methods are considered the most robust for assessing and attributing impact to an intervention. However, they have cost and time implications, and might raise ethical considerations (given the purposive exclusion of a group of people from project benefits) that need to be dealt with upfront. Methods of fairly selecting participants include using a lottery, phasing in an intervention and rotating participants through the intervention to ensure that everyone benefits.

Quasi-experimental methods

Quasi-experimental designs identify a comparison group that is as similar as possible to the intervention group in terms of pre-intervention characteristics; with the key difference that quasi-experimental design lacks random assignment.22 The main quasi-experimental approaches are pre-post, simple difference, double difference (difference-in-differences), multivariate regression, propensity score matching and regression discontinuity design (see Table 4.10 for definitions).23

Non-experimental methods

In non-experimental methods used in ex-post impact evaluations, the participants and the comparison groups are not selected randomly prior to the intervention; rather, the comparison group is reconstructed ex post, that is, at the time of the evaluation. To determine ex-post changes that may have occurred as a result of the intervention, impact evaluations using non-experimental methods conduct at least two complementary analyses: “before and after” and “with or without”.

Non-experimental methods are often considered if the decision to do an impact evaluation is taken after the intervention has taken place.24

A variety of methods are used in non-experimental designs to ensure that the participant and comparison groups are as similar as possible and to minimize selection bias. These can include (propensity) score matching, regression discontinuity design, difference-in-differences and instrumental variables.25 Descriptions of the different techniques are found in the following table.

Table 4.11. Quasi-experimental and non-experimental methods

Pre-post
Description: Measures how programme participants improved (or changed) over time.
Comparison group: Programme participants themselves, before participating in the programme.
Required assumptions: The programme was the only factor influencing any changes in the measured outcome over time.
Required data: Before and after data for programme participants.

Simple difference
Description: Measures the difference between programme participants and non-participants after the programme is completed.
Comparison group: Individuals who did not participate in the programme (for any reason), but for whom data were collected after the programme.
Required assumptions: Non-participants are identical to participants except for programme participation, and were equally likely to enter the programme before it started.
Required data: “After” data of the before-and-after scenario for programme participants and non-participants.

Difference-in-differences
Description: Measures the improvement (change) over time of programme participants relative to the improvement (change) of non-participants.
Comparison group: Individuals who did not participate in the programme (for any reason), but for whom data were collected both before and after the programme.
Required assumptions: If the programme did not exist, the two groups would have had identical trajectories over this period.
Required data: Before and after data for both participants and non-participants.

Multivariate regression
Description: Individuals who received treatment are compared with those who did not, and other factors that might explain differences in the outcomes are “controlled” for. In this case, the data comprise not just indicators of outcomes, but other “explanatory” variables as well.
Comparison group: Individuals who did not participate in the programme (for any reason), but for whom data were collected both before and after the programme.
Required assumptions: The factors that were excluded (because they are unobservable and/or have not been measured) do not bias results, because they are either uncorrelated with the outcome or do not differ between participants and non-participants.
Required data: Outcomes, as well as “control variables”, for both participants and non-participants.

Statistical matching
Description: Individuals in the control group are compared to similar individuals in the experimental group.
Comparison group: Exact matching: for each participant, at least one non-participant who is identical on selected characteristics. Propensity score matching: non-participants who have a mix of characteristics that predict they would be as likely to participate as participants.
Required assumptions: The factors that were excluded (because they are unobservable and/or have not been measured) do not bias results, because they are either uncorrelated with the outcome or do not differ between participants and non-participants.
Required data: Outcomes, as well as “variables for matching”, for both participants and non-participants.

Regression discontinuity design
Description: Individuals are ranked based on specific, measurable criteria. There is some cut-off that determines whether an individual is eligible to participate. Participants are then compared to non-participants, and the eligibility criterion is controlled for.
Comparison group: Individuals who are close to the cut-off but fall on the “wrong” side of it, and therefore do not get the programme.
Required assumptions: After controlling for the criteria (and other measures of choice), the remaining differences between individuals directly below and directly above the cut-off score are not statistically significant and will not bias the results. A necessary but not sufficient requirement for this to hold is that the cut-off criteria are strictly adhered to.
Required data: Outcomes, as well as measures on criteria (and any other controls).

Instrumental variables
Description: Participation can be predicted by an incidental (almost random) factor, or “instrumental” variable, that is uncorrelated with the outcome, other than the fact that it predicts participation (and participation affects the outcome).
Comparison group: Individuals who, because of this close-to-random factor, are predicted not to participate and (possibly as a result) did not participate.
Required assumptions: If it were not for the instrumental variable’s ability to predict participation, this “instrument” would otherwise have no effect on, or be uncorrelated with, the outcome.
Required data: Outcomes, the “instrument” and other control variables.

Randomized evaluation
Description: Experimental method for measuring a causal relationship between two variables. Participants are randomly assigned to the treatment and control groups.
Comparison group: Participants randomly assigned to the control group.
Required assumptions: Randomization “worked”, that is, the two groups are statistically identical (on observed and unobserved factors).
Required data: Outcome data for control and experimental groups; control variables can help absorb variance and improve “power”.

Gertler, P.J., S. Martinez, P. Premand, L.B. Rawlings and C.M.J. Vermeersch

International Fund for Agricultural Development (IFAD)

Leeuw, F. and J. Vaessen

United Nations Evaluation Group (UNEG)

US Department of Health and Human Services, Centers for Disease Control and Prevention (CDC)

White, H. and D. Raitzer

White, H. and S. Sabarwal

White, H., S. Sabarwal and T. de Hoop

Collecting and managing data

Data collection

Once the M&E design has been identified and the method(s) and tools have been developed, data collection can start. It is also recommended to organize training for the data collection team(s) on the methodology. The training should cover in detail each data collection tool that will be used and include practical exercises on how to implement them.

Developing a data collection guide with clear instructions for the enumerators provides a useful reference tool, both during the training and afterwards, for the actual data collection; see the excerpt from a survey included in a data collection guide in the example below. Taking these steps will help ensure that the collected data are accurate, with a minimum amount of error. In certain cases, conducting a full training is not feasible due to time and resource constraints, in which case a data collection guide becomes an especially important reference.

EXAMPLE - Excerpt from a data collection guide

Section 1: Economic situation

This section looks at the economic/financial situation of the respondent.

Objective: To find out whether or not the respondent has a regular supply of money. Possible sources of income include employment, small business and participation in a credit and savings group.

Instructions: First read out the question and response options and then circle the respondent's answer (yes or no).

a) (If # 1 YES) What has been your average monthly income over the past six months?

Objective: To find out how regularly the respondent is receiving financial support from a third party which can be a person or an organization.

Instructions: First read out the question and response options and then circle the respondent's answer (one of the four options listed beside the question).

Each data collection team should have a supervisor who can oversee the data collection and check for any errors. During the data collection, it is imperative that the supervisor of the data collection team regularly checks for the following:

Doing these checks will help reduce the amount of error in the data collected.

Data entry

The collected data then need to be transferred onto a computer application, such as Microsoft Word or Excel. Having the data in an electronic format will facilitate data clean-up and data analysis. For quantitative data, the first step in data entry is to create the data file(s) in a way that allows a smooth transfer between a spreadsheet and a statistical programme package, such as SPSS or Stata, for conducting statistical analyses.

Table 4.12. Wide format data file example
ID | Age | Income 2015 | Income 2016 | Income 2017
067 | 43 | 30 000 | 30 000 | 32 000
135 | 37 | 28 000 | 31 000 | 30 000
Table 4.13. Long format data file example
ID | Age | Income | Year
067 | 43 | 30 000 | 2015
067 | 43 | 30 000 | 2016
067 | 43 | 32 000 | 2017
135 | 37 | 28 000 | 2015
135 | 37 | 31 000 | 2016
135 | 37 | 30 000 | 2017

For qualitative data, the first step in the data entry process is transferring all the interview, focus group and observation notes to a Word document for conducting content analysis using qualitative data programme packages, such as NVivo or MAXQDA.

Another component of data entry is assigning each subject (or unit of analysis) a unique identifier (ID) (for example: 01, 02, 03 and so on), unless this is done directly during the data collection process. To do this, a separate file should be created that matches the identifying information for each subject (unit of analysis) with their unique ID. Assigning a unique identifier to each respondent helps ensure that the data cannot be traced back to them if the data set is disclosed to other parties.

Data clean-up

Once the data has been transferred from the medium used to record the information to a computer application (Word or Excel), it needs to be screened for errors. Following this, any errors need to be diagnosed and treated.

Data errors can occur at different stages of the design, implementation and analysis of data (see Figure 4.8):

Figure 4.8. Sources of error

Once the suspect data has been identified, the next step is to review all the respondent’s answers to determine if the data makes sense given the context in which it was collected. Following this review, there are several possible diagnoses for each suspect data point identified:

Once the problematic observations have been identified, these need to be treated before the data can be analysed. The following are some of the key approaches to dealing with data errors:

Missing data

Missing values require attention because they cannot simply be ignored. The first step is to decide which blank cells need to be filled with zeros (because they represent a negative observation; for example, “no”, “not present” or “option not taken”) and which to leave blank (if using blanks to indicate missing or not applicable). Blank cells can also be replaced with missing value codes; for example, 96 (I don’t know), 97 (refused to answer), 98 (skip question/not applicable) and 99 (blank/missing).

If the proportion of missing or incomplete cases is substantial for a category of cases, this will be a major M&E concern. Once a set of data is known to be missing, it is important to determine whether the missing data are random or whether they vary in a systematic fashion, as well as the extent of the problem. Random missing values may occur because the subject inadvertently did not answer some questions. The assessment may be overly complex and/or long, or the enumerator may be tired and/or not paying attention, thereby missing the question. Random missing values may also occur through data entry mistakes. If there are only a small number of missing values in the data set (typically, less than 5%), they are extremely likely to be random. Non-random missing values may occur because the key informant purposefully did not answer some questions (a confusing or sensitive question, or no appropriate choices such as “no opinion” or “not applicable”).

The default option for handling missing data is filtering and excluding them from the analysis:

(a) Listwise/casewise deletion: Cases that have missing values on the variable(s) under analysis are excluded. If only analysing one variable, then listwise deletion is simply analysing the existing data. If analysing multiple variables, then listwise deletion removes cases if there is a missing value on any of the variables. The disadvantage is a loss of data, because all data from cases who may have answered some of the questions, but not others (such as the missing data), are removed.

(b) Pairwise deletion: All available data are included. Unlike listwise deletion, which removes cases (subjects) that have missing values on any of the variables under analysis, pairwise deletion only removes the specific missing values from the analysis (not the entire case). In other words, all available data are included. When conducting a correlation on multiple variables, this technique allows the bivariate correlation to be computed between all available data points, ignoring missing values only where they exist on some variables. In this case, pairwise deletion will result in different sample sizes for each correlation. Pairwise deletion is useful when the sample size is small or missing values are large, because few values exist to begin with and listwise deletion would discard even more.

Note: Deletion means exclusion within a statistical procedure, not deletion (of variables or cases) from the data set.

(c) Deletion of all cases with missing values: Only those cases with complete data are retained. This approach reduces the sample size of the data, resulting in a loss of power and increased error in estimation (wider confidence intervals). While this may not be a problem for large data sets, it is a big disadvantage for small ones. Results may also be biased if subjects with missing values are different from the subjects without missing values (that is, non-random) resulting in a non-representative sample.

(d) Imputation (replace the missing values): All cases are preserved by replacing the missing data with a probable value based on other available information (such as the mean or median of the observations for the variable for which the value is missing). Once all missing values have been imputed, the data set can be analysed using standard techniques for complete data. More sophisticated imputation methods exist, involving equations that attempt to predict the values of the missing data based on a number of variables for which data are available. Each imputation method can result in biased estimates. Detailing the technicalities, appropriateness and validity of each technique goes beyond the scope of this document. Ultimately, choosing the right technique depends on the following: (i) how much data are missing (and why); (ii) the patterns, randomness and distribution of missing values; and (iii) the effects of the missing data and how the data will be used in the analysis. It is strongly recommended to refer to a statistician if M&E practitioners are faced with a small data set with large quantities of missing values.

In practice, for M&E purposes with few statistical resources, creating a copy of the variable and replacing missing values with the mean or median may often be enough and preferable to losing cases through deletion methods.