
BLOG – 63: Sample Size for Extension Research – Part 1. Quantitative Studies

Non-availability of sound guidelines for sample size estimation is the primary factor affecting the quality of extension research in the country. In this blog, Dr P Sethuraman Sivakumar presents guidelines for choosing an adequate sample for extension research.

CONTEXT 

Sample size is of primary importance for any applied scientific research as it directly influences the validity and generalizability of the research findings. In extension science, empirical research is expected to yield sound extension tools and techniques to help the field functionaries effectively implement extension programmes. However, empirical extension research is often conducted with small samples confined to a specific geographical or demographic population (Sivakumar and Sulaiman, 2015). Social science studies conducted with inadequate sample sizes are vulnerable to inconsistencies. Such studies are likely to produce contradictory findings when conducted on the same research problem in an identical population (Johnson and Lauren, 2013). Though many factors are responsible for small-sample extension research, the non-availability of sound guidelines for sample size estimation is the primary factor affecting the quality of extension research in the country. The purpose of this blog is to describe the sample size estimation process and provide guidelines for choosing an adequate sample for both quantitative and qualitative studies in extension research.

SAMPLING STRATEGY 

The sampling strategy is the plan devised by the researcher to ensure that the sample chosen for the research work represents the selected population. Choosing an appropriate sampling strategy is a key aspect of the research design. Robinson (2014) proposed a four-point sampling process for systematically selecting adequate samples for obtaining quality results.

  1. Define a sample universe: Establish a sample universe, specifically by way of a set of inclusion and/or exclusion criteria. Inclusion criteria specify the attribute(s) that respondents must possess to qualify for the study, and exclusion criteria stipulate attributes that disqualify a case from the study. For example, in a research investigation focusing on the “Information source utilisation of Ber growers”, the inclusion criterion is “Ber grower (current/past, specified in years)”, while the exclusion criterion is “growers of other crops”. During the selection, the homogeneity of the sample, i.e., demographic (e.g., youth), geographical (e.g., Maharashtra or Tamil Nadu), physical (e.g., female workers), psychological (e.g., progressive farmers) and life history (e.g., migrant workers), should be considered.
  2. Deciding on sample size: The size of a sample used for quantitative or qualitative extension research is influenced by both theoretical and practical considerations. The theoretical considerations for quantitative studies include the nature of the problem, the population size and the type of analytical strategies used, while qualitative investigations focus on the saturation and redundancy of the data collection methods (Robinson, 2014). The practical aspects include time and resource availability, researcher capability and the purpose of the research work (e.g., for dissertations or sponsored research).
  3. Selecting a sample strategy: The popular sampling methods in quantitative research are probabilistic and non-probabilistic sampling, while qualitative research uses random/convenience sampling and purposive sampling strategies. After deciding on the sampling strategy, the number of respondents required for each sample category (e.g., strata) is decided from the overall sample.
  4. Sourcing sample: When the sample universe, size and strategy are decided, the researcher needs to recruit the participants from the real world. Voluntary participation, recruiting students from subject pools, advertising in social and print media to recruit community members, and online surveys with jackpot provisions are a few ways of recruiting participants for research work. In this phase, the researcher should follow ethical guidelines (if suggested by the ethics committee) in advertising, selecting and handling participants, keeping research data confidential, and compensating participants for their time and effort. However, extension research in India is largely conducted without following the ethical practices suggested by various “Human Subject Research” regulatory agencies. Ignorance of, and non-compliance with, international ethical guidelines pose serious problems when the research outcomes are submitted to peer-reviewed international journals.

SAMPLE SIZE ESTIMATION FOR QUANTITATIVE EXTENSION RESEARCH 

In quantitative extension research, the samples are drawn through either probabilistic or non-probabilistic sampling techniques, and stratified random sampling is widely used by researchers. Though the sampling methods specify a few guidelines on the number of samples to be selected, the sample size depends on various other factors such as the type of study, the nature and size of the population, and the choice of statistical analytical methods for the study. Other factors which help in deciding the sample size include the following:

  • confidence level at which the results are interpreted,
  • acceptable levels of sampling errors and precision of the results expected,
  • effect sizes required,
  • variance and standard deviations of the primary variables reported by past studies.

In the case of self-report methods, the expected response rate also influences the sample size, since poor response rates are likely to reduce the achieved sample below the number required and affect the validity of the research.
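One common way of allowing for non-response is to inflate the number of people contacted by dividing the required sample size by the expected response rate. The sketch below illustrates this; the function name and the 70% response rate are assumptions made for illustration and are not taken from the blog.

```python
import math


def adjust_for_response_rate(required_n: int, expected_response_rate: float) -> int:
    """Return how many people to contact so that roughly required_n respond.

    expected_response_rate is a proportion between 0 and 1 (assumed to be
    known from past surveys or a pilot study).
    """
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in the interval (0, 1]")
    # Round up, because a fraction of a respondent cannot be contacted.
    return math.ceil(required_n / expected_response_rate)


# Hypothetical case: 99 completed responses are needed and about 70% of the
# people approached are expected to respond.
contacts_needed = adjust_for_response_rate(99, 0.70)
print(contacts_needed)
```

The adjustment protects only the sample size; it does not remove the bias that arises when non-respondents differ systematically from respondents.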

The following are the factors to be considered while selecting the sample size for a quantitative study:

1. Type of research investigation and test population:

The type of research investigation, whether descriptive and observational or experimental, determines the number of samples required for the work. Descriptive studies employ minimal statistical estimation procedures like proportions and Chi-square tests, and their sample size estimation procedures are described in the following sections. For experimental studies involving human subjects (e.g., knowledge gain from a multimedia instruction), the sample size depends on the design: replication, randomisation and stratification. The test population size also plays a crucial role in sample size estimation, and quantitative methods often require samples representing a maximum of 5% of the total population (Henry, 1990). The study population size can also be derived from past studies and secondary data sources (e.g., agricultural census). If the population size is unknown, the sample size can be estimated using the modified procedures described in Box 1.

Table 1. Necessary Sample Size to Detect a Given Effect Size for Simple Linear Regression, ANOVA (t-test), and χ2 Analyses (α = 0.05 and β = 0.20).

Multiple regression                                     ANOVA and t test
Correlation coefficient (r)   Reqd. sample size (N)     Eta (ƞ)   Reqd. sample size (N)
0.10                          782                       0.10      396
0.15                          346                       0.15      176
0.20                          193                       0.20      99
0.25                          123                       0.25      64
0.30                          84                        0.30      44
0.35                          61                        0.35      33
0.40                          46                        0.40      25
0.45                          36                        0.45      20
0.50                          29                        0.50      16
0.55                          23                        0.55      14
0.60                          19                        0.60      11
0.65                          16                        0.65      10
0.70                          13                        0.70      9
0.75                          11                        0.75      8

(Source: Gatsonis and Sampson, 1989)

2. Primary variable(s) of measurement:

A research investigation may use a variety of dependent and independent variables. For estimating the sample size, the researcher should decide on the primary variables (the dependent variable and a few significant independent variables) to be included in the study. After deciding on the primary variables, the sample sizes are estimated separately for each primary variable or combination using the formulae given in Box 1. For example, if a researcher wishes to conduct a study on the factors influencing adoption of IPM for the tomato crop, he/she should review past studies to identify the primary independent variables that influenced adoption (e.g., gender, educational status, scientific orientation, etc.). Using the estimates for those variables (e.g., the correlation coefficient of educational qualification with adoption), the researcher can decide on the sample size using Table 1. After estimating the sample size for all primary independent variables individually, the researcher must choose the largest estimated sample size for the investigation.
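The "largest estimate wins" rule described above can be sketched as a lookup against the correlation columns of Table 1. The correlation values below are hypothetical, and rounding each reported correlation down to the nearest tabulated value is an assumption made here so that the estimate errs towards a larger sample.

```python
# Required N by correlation coefficient r, copied from Table 1.
TABLE1_N_BY_R = {
    0.10: 782, 0.15: 346, 0.20: 193, 0.25: 123, 0.30: 84, 0.35: 61, 0.40: 46,
    0.45: 36, 0.50: 29, 0.55: 23, 0.60: 19, 0.65: 16, 0.70: 13, 0.75: 11,
}


def table1_sample_size(r: float) -> int:
    """Return the Table 1 sample size for the tabulated r at or just below |r|."""
    strength = abs(r)
    # Small tolerance so that a value such as 0.45 matches its own table row.
    eligible = [key for key in TABLE1_N_BY_R if key <= strength + 1e-9]
    if not eligible:
        raise ValueError("Table 1 does not cover correlations below 0.10")
    return TABLE1_N_BY_R[max(eligible)]


# Hypothetical correlations with adoption, as might be collected from past studies.
reported_r = {
    "educational status": 0.32,
    "scientific orientation": 0.45,
    "gender": 0.21,
}

per_variable = {name: table1_sample_size(r) for name, r in reported_r.items()}
required_n = max(per_variable.values())  # the largest estimate is used for the study
print(per_variable, required_n)
```

The weakest reported relationship drives the final figure, which is why a thorough review of past studies matters before fixing the sample size.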

3. Acceptable Margin of Error – confidence intervals and confidence levels:

The margin of error is the error the researcher is willing to accept in the study. The margin of error depends on the confidence interval, which is a range of values within which a population parameter is expected to fall with a stated probability. In any empirical research, we select samples to estimate a few numerical values for describing or analysing certain attributes of the population. The confidence intervals provide a range of values which represent a population parameter (e.g., adoption level of a crop variety or animal breed in the full population of farmers in the real world) and tell us that these values hold with a given probability level (e.g., 90%, 95% or 99%).

These probability levels are called confidence levels. In any descriptive or analytical study, the confidence intervals are presented along with the mean and standard deviation of a specific attribute or variable. The confidence interval provides a range of values around the mean (both above and below it), and the half-width of this range is the margin of error. It is necessary to decide on the allowable margin of error prior to the survey in order to calculate the appropriate sample size. It is decided by scanning through past research studies on the same topic and identifying the reported mean values of the primary variables. For example, if a researcher wishes to conduct a study on “Effectiveness of the training programme” with “Knowledge gain” as the primary variable, he/she should find the mean knowledge gain values reported in past studies and decide on the value to be used for sample size estimation. In social research, a maximum of 5 percentage points around the mean is used as the margin of error (Krejcie and Morgan, 1970).
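To make the relationship between the confidence level, the confidence interval and the margin of error concrete, the following sketch computes all three for a set of knowledge gain scores. The scores are invented for illustration, and the normal (z) approximation is an assumption; for small samples a t-based interval would usually be preferred.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, margin_of_error, (lower, upper)) using a normal approximation."""
    n = len(values)
    sample_mean = mean(values)
    sample_sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean, margin, (sample_mean - margin, sample_mean + margin)


# Hypothetical knowledge gain scores from a small group of trainees.
scores = [10, 12, 14, 16, 18]
m, margin, interval = mean_confidence_interval(scores, confidence=0.95)
print(f"mean = {m:.2f}, margin of error = {margin:.2f}, interval = {interval}")
```

Raising the confidence level from 95% to 99% widens the interval for the same data, while a larger sample narrows it.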

The confidence level is the complement of the alpha error in hypothesis testing (confidence level = 1 − α). The alpha (α) or type I error is the false-positive error of rejecting a null hypothesis that is actually true in the population, while the beta (β) or type II error is the false-negative error of failing to reject a false null hypothesis. Statistical power is the probability of correctly rejecting a false null hypothesis and is represented as 1 − β. During sample size estimation, we try to reduce the alpha error by selecting a low significance level for the test, either 0.05 (95% confidence) or 0.01 (99% confidence). While an alpha level of 0.05 (5% probability of error) is acceptable for most social research, 0.01 (1% probability of error) is preferred when critical decisions are taken using the research results. As indicated in the previous paragraph, confidence intervals are always expressed with a specific confidence level (alpha error). The β error is not as serious as the α error, but it is of particular concern when interpreting the results of a negative study, i.e., one without statistical significance (either there is no real effect, or there is a small effect that the test is unable to detect). Statistical power for any sample size estimation is conventionally set at 0.80, i.e., β = 0.20.
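The way α and β enter a sample size calculation can be illustrated with the widely used Fisher z approximation for detecting a correlation coefficient. This is a sketch under that assumption, not the method used to build Table 1, so its results come close to the tabulated values without matching them exactly.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect a correlation r with a two-sided test.

    Uses the Fisher z approximation: N = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    fisher_z = math.atanh(abs(r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


for r in (0.10, 0.30, 0.50):
    print(r, n_for_correlation(r), n_for_correlation(r, alpha=0.01))
```

Tightening α from 0.05 to 0.01, or raising the power above 0.80, increases the required sample for the same correlation.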

4. Effect size:

The effect size represents the size of the association between variables, or the difference between treatments, that the researcher expects to be present in the population. If the researcher expects the study to detect even a small association or difference between variables with precision, then he/she may need a larger sample size. For example, the knowledge gain from a multimedia extension module can be detected precisely when the researcher tests the module with a large sample. In descriptive studies, the association or difference between the variables is reflected by the width of the confidence interval calculated in the estimation. The effect sizes can be estimated from the values of association or effect reported in previous studies using Cohen’s d, odds ratio, correlation coefficient and eta square methods. In general, an effect size (Cohen’s d) of 0.2 to 0.3 is considered “small”, around 0.5 “medium”, and 0.8 and above “large” (Cohen, 1988). As a rule of thumb, associations or differences between variables reported in past studies with a “small” effect require a large sample size in further studies. Various online effect size calculators are available on the Psychometrica website (http://www.psychometrica.de/effect_size.html).
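As an illustration of how an effect size reported in a past study translates into a sample size, the sketch below computes Cohen's d from two group summaries and then applies the common normal-approximation formula for comparing two means. The group statistics are hypothetical, and the cut-off points used for labelling (0.2, 0.5, 0.8) are one conventional reading of Cohen (1988).

```python
import math
from statistics import NormalDist


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def label_effect(d):
    """Label |d| with conventional cut-offs (an assumption, see lead-in)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate respondents per group: n = 2 * (z_alpha + z_beta)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Hypothetical past study: knowledge scores of trained vs. untrained farmers.
d = cohens_d(mean1=15, sd1=4, n1=30, mean2=12, sd2=4, n2=30)
print(d, label_effect(d), n_per_group(d))
```

Comparing `n_per_group(0.2)` with `n_per_group(0.8)` shows the rule of thumb in numbers: the smaller the expected effect, the larger the sample needed to detect it.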

5. Variance or Standard Deviation:

When the variables analysed in the study are of a quantitative nature, their variability (variance or standard deviation) is considered for sample size estimation. Variance is a measure of the spread of the numbers or observations in a data set and is the square of the standard deviation. It measures how far, on average, each observation in the data set lies from the mean.

Cochran (1977) listed ways of estimating population variances or standard deviations for sample size estimation: (1) select the sample in two steps, i.e., select a first sample, estimate the variance through a pilot study, and use the estimated value to determine the sample size for the main study; (2) use data from previous studies of the same or a similar population; or (3) estimate or guess the structure of the population, assisted by some logical mathematical results. If the researcher finds it difficult to obtain variance values from previous studies, he/she can use an arbitrary value of 50% (Krejcie and Morgan, 1970).

In the case of descriptive studies involving proportions, the researcher must specify the response distribution (labelled as p in the sample size formula), i.e., the expected proportion of the population that has the attribute the researcher is estimating from the survey. This proportion can be obtained from past studies, a pilot study or other secondary sources. For example, if a researcher wishes to assess gender differences in the effectiveness of training on vegetable cultivation, he/she should review past studies to find the relevant values (e.g., the percentage of females who are satisfied with the training). If this proportion is unknown, it should be arbitrarily set to 50% for use in Equation 1a. In the case of descriptive studies involving means, the response distribution is replaced by the variance or standard deviation (s² in Equation 1b).
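The sketch below shows how p, the standard deviation, the margin of error and the confidence level combine in the standard formulae attributed to Cochran (1977), with an optional finite population correction. It is offered as an illustration under the assumption that these standard forms apply; the notation may differ from Equations 1a and 1b in Box 1, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def _finite_population_correction(n0, population):
    """Shrink n0 when the population size is known; otherwise return n0 unchanged."""
    if population is None:
        return n0
    return n0 / (1 + (n0 - 1) / population)


def sample_size_proportion(p=0.5, margin=0.05, confidence=0.95, population=None):
    """n0 = z^2 * p * (1 - p) / e^2, where p = 0.5 is the default when p is unknown."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(_finite_population_correction(n0, population))


def sample_size_mean(sd, margin, confidence=0.95, population=None):
    """n0 = z^2 * s^2 / e^2, with s and e in the units of the measured variable."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * sd ** 2 / margin ** 2
    return math.ceil(_finite_population_correction(n0, population))


# Unknown proportion (p = 0.5), 5 percentage point margin, 95% confidence level.
print(sample_size_proportion())
# The same survey when the population is known to be 1,000 farmers.
print(sample_size_proportion(population=1000))
# Hypothetical mean knowledge score with standard deviation 10 and margin of 2 points.
print(sample_size_mean(sd=10, margin=2))
```

Setting p to 0.5 gives the largest value of p(1 − p), which is why it serves as a safe default when no prior estimate is available.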

CALCULATION OF SAMPLE SIZE

The sample size estimation follows the various aspects discussed in the previous section. Considering the complexity of sample size estimation, a simple way of deriving the sample size is presented here, based on the nature of the research investigation (pre-testing phase, descriptive, and analytical or hypothesis testing) and the type of statistical tests planned for the study.

  • Pre-Testing (of Research Instrument): The pre-testing of the research instruments is a key phase of any social research study. The main purpose of the pre-test is to verify that the target audience understands the questions, that the proposed response options are used as intended by the researcher, and that the respondents are able to answer meaningfully (Perneger et al., 2015). Identification of problems in the instrument (e.g., an unclear question, unfamiliar word, ambiguous syntax, missing time-frame or lack of an appropriate answer) leads to a modification of the instrument. The sample size for the pre-test in extension research is often decided based on a few flexible criteria, without following any rigorous procedure. Past studies indicate that a sample of at least 30 respondents is needed to achieve reasonable statistical power to detect problems in the instrument (Perneger et al., 2015).
  • Descriptive studies: Descriptive studies are conducted to explore and describe a test population or its attributes in a systematic way. These studies are designed to estimate population parameters from a sample and do not involve testing hypotheses. The data generated through these studies are described by presenting frequencies, proportions and means. The sample size estimation procedures for descriptive studies proposed by Rodríguez del Águila and González-Ramírez (2014) are described in Box 1.

P. Sethuraman Sivakumar is Senior Scientist, Central Tuber Crops Research Institute (ICAR-CTCRI) Sreekariyam, Thiruvananthapuram – 695017, Kerala, India. Email: sethu_73@fulbrightmail.org


5 Comments


  • I congratulate Dr Sethuraman for his blog, which is of high standard and of academic brilliance. The determination of sample size in research is of critical concern, which is well covered in the blog. To be frank, I have not fully understood some of the concepts (sourcing sample) and beg to differ with some of the ideas given (I am of the opinion that sampling methods mostly determine the sample size). I also doubt how sample size can be determined based on the choice of statistical analytical procedure. I feel researchers decide which analysis has to be done after collection and tabulation of data. Illustrative examples would have enhanced the readability of the blog. My views are as below. Sample size problems vary widely in their complexity. The most important aspect is that it is context dependent. Sample size is one aspect of study design. To help determine sample size, many questions have to be answered. Some of the questions are: What are the objectives? What is the response variable, and how is it measured? What is the estimated non-response rate? What are the sources of variation? Can normally distributed data be ensured? Some other related points to be considered are: the desired width of a confidence interval; the prior distribution in a Bayesian context; the precision of estimation desired; a pilot study; the relevance of historical data; and how dependable the effect size measures described by Cohen are.

  • Sir, thank you for your critical and constructive comments. I agree with you that sample size determination depends on several factors like sampling method, nature and size of the population and choice of statistical analytical methods. I believe that sampling methods like probabilistic or non-probabilistic sampling explain the ways of eliminating BIAS (systematic error) and SAMPLING ERROR (random error) in selecting the samples, but not the adequacy of the sample. Sampling adequacy, or an adequate sample, is a statistical sample size that is large enough to provide the required precision of the survey or test results by minimizing the effect of chance. In scientific research, we try to explain a phenomenon by describing it and explaining the relationships between its components and with other phenomena, with the objective of enhancing our ability to replicate it in a similar context. Statistics is the major tool which helps us to perform this description and explanation in an objective way. In this work, sample size estimation is viewed in terms of the accuracy of the end results with respect to the population parameter, which is an essential part of generalizing research results. Our extension professionals view statistics as a SUFFIX component which emerges after collection of data. As I said earlier, statistics is a major tool which helps to describe and explain a phenomenon. In any scientific research, this description or explanation is a research purpose which is expressed in objectives (e.g., to assess, explain, describe, predict, etc.). It is essential for the researcher to know how to systematically assess or describe the phenomenon through appropriate analytical strategies (statistical methods). In the research instrument development phase, we deal with reliability and validity, which are essentially analytical indicators (e.g., correlation, regression, factor analysis). In scientific research, we deal with statistics at every phase: problem identification (e.g., research prioritization using statistics), objective formulation (knowledge of appropriate statistics for analysis to describe, explain, predict), sampling (sample size using statistics), research instrument development (reliability and validity using statistics), data coding (missing data, outliers, etc.) and data analysis.

  • I congratulate Dr. Sethuraman Sivakumar for coming out with a very useful blog on sample size in social research studies. Many a time, flaws in the sample size and sampling plan restrict the use of appropriate statistical tools, and we fail to come out with quality research articles. Consequently, we cannot submit such (poor quality) articles to good journals and look for mediocre journals instead. It is high time that extension researchers focus on quality research. In this context, this blog is very comprehensive, with good examples, and certainly helps all those who are interested in good quality research.

  • Sample size selection is always a problem for extension researchers, especially for PG and PhD scholars. Many are following rules of thumb from unknown sources. Thanks for providing the guidelines. Also, it would be good to have a list of statistical tools and techniques which could be used for small samples.

  • This blog is an attempt to put relevant guidelines in one place to help budding scientists and students in selecting adequate samples. Sample size is one of the primary factors which determine generalisability. Many studies have demonstrated that small-sample research works produce inconsistencies when replicated in similar systems. In general, small samples are advisable only when the research is conducted in a distinct and unique system (e.g., geographically inaccessible or highly disaster-prone areas) and the relationships studied have large effect sizes.