Why I Studied This Issue

I did an apprentice project studying why students withdrew from their online courses. For this project, I obtained a dataset with 35 variables indicating various withdrawal reasons. I wanted to use factor analysis to reduce the 35 variables to a few categories of withdrawal reasons. However, I had only 47 cases in the dataset. Many people suggested that the number of cases was too small for a factor analysis. But I really did not want to waste the time and energy I had spent and just throw the dataset away. Yes! I want to "explain the most with the least" (Henson & Roberts, 2006, p. 393). (wink)

Thus, I decided to find out the minimum sample size (i.e., the minimum number of cases; some researchers call them subjects) needed to perform a factor analysis. Here is the related information I found.

The General Recommendations

There are two categories of general recommendations on minimum sample size in factor analysis. One category says that the absolute number of cases (N) is important, while the other says that the subject-to-variable ratio (p) is important. Arrindell and van der Ende (1985), Velicer and Fava (1998), and MacCallum, Widaman, Zhang, and Hong (1999) have reviewed many of these recommendations.

of sample size

of subjects-to-variables (STV) ratio

Statistical Research Findings on Minimum Sample Size

Little statistical research in the fields of education and behavioral science has shed light on the issue of establishing a minimum desirable sample size (MacCallum, Widaman, Zhang, & Hong, 1999). The studies that do exist used either artificial or empirical data to investigate the minimum sample size or STV ratio required to recover the population factor structure. In this section, I will summarize the minimum sample sizes and STV ratios that these studies examined.
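To make "recovering the population factor structure" more concrete, here is a minimal simulation sketch. It is not the procedure of any of the cited studies. I assume a hypothetical one-factor population with equal loadings, extract the first principal component from each random sample (a simplification of a real factor extraction), and measure agreement with the population loadings using Tucker's congruence coefficient. The function name `simulate_congruence`, the loading values (.8 and .4), and the sample sizes are all made up for illustration.

```python
import numpy as np

def simulate_congruence(n_cases, pop_loadings, rng):
    """Draw one sample from a one-factor population and return Tucker's
    congruence coefficient between population and sample loadings."""
    pop_loadings = np.asarray(pop_loadings, dtype=float)
    # Common factor scores, one per case.
    factor = rng.normal(size=(n_cases, 1))
    # Unique (error) part scaled so each variable has unit variance.
    unique_sd = np.sqrt(1.0 - pop_loadings ** 2)
    noise = rng.normal(size=(n_cases, pop_loadings.size)) * unique_sd
    data = factor * pop_loadings + noise
    # First principal component of the sample correlation matrix.
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    sample_loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    # Absolute value, because the sign of an eigenvector is arbitrary.
    num = abs(pop_loadings @ sample_loadings)
    den = np.linalg.norm(pop_loadings) * np.linalg.norm(sample_loadings)
    return num / den

rng = np.random.default_rng(42)
high = np.full(8, 0.8)  # hypothetical high-communality population
low = np.full(8, 0.4)   # hypothetical low-communality population

for n in (30, 50, 100, 400):
    for label, loadings in (("high", high), ("low", low)):
        runs = [simulate_congruence(n, loadings, rng) for _ in range(200)]
        print(f"N={n:4d}  communality={label:4s}  mean congruence={np.mean(runs):.3f}")
```

A congruence close to 1 means the sample loadings reproduce the population pattern well. Comparing the "high" and "low" rows at the same N gives a rough feel for how communality and sample size interact in this toy setting.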

The Minimum Sample Size or STV Ratio Used in Practical Studies

Factors Related to Sample Size

Research has demonstrated that the general rules of thumb for minimum sample size are neither valid nor useful (MacCallum, Widaman, Zhang, & Hong, 1999; Preacher & MacCallum, 2002). It is hard, and overly simplistic, to say whether the absolute sample size or the STV ratio is what matters in factor analysis. The minimum level of N (sample size) depends on other aspects of the design, such as:


Finally, with principal component analysis, I got 4 factors with 32 variables, representing an STV ratio of 1.47:1 (47/32). The overall KMO is .616. The communalities range from .62 to .879, with a mean of .770 and a standard deviation of .074. There is no cross-loading among the 4 factors. Two of the 4 factors each have 5 loaded variables, one has 4 loaded variables, and one has 3 loaded variables. The variable-to-factor ratio is 8 (32/4). I think this can be considered a moderate to high degree of overdetermination.
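For readers who want to compute the same kinds of diagnostics, here is a minimal sketch in Python with NumPy. It does not use my withdrawal dataset. The data are synthetic (47 cases, 32 variables, 4 hypothetical factors), and the helper names `pca_communalities` and `overall_kmo` are my own for illustration. The communalities come from a principal component extraction of the correlation matrix, and the overall KMO uses the usual ratio of squared correlations to squared correlations plus squared partial correlations.

```python
import numpy as np

def pca_communalities(data, n_factors):
    """Communalities from the first n_factors principal components
    of the correlation matrix (unrotated loadings)."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_factors]
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return (loadings ** 2).sum(axis=1)

def overall_kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    q2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + q2)

# Synthetic stand-in data (NOT the real dataset): each variable loads .8
# on exactly one of 4 factors, so there is no cross-loading by design.
rng = np.random.default_rng(0)
n_cases, n_variables, n_factors = 47, 32, 4
pattern = np.zeros((n_variables, n_factors))
for j in range(n_variables):
    pattern[j, j % n_factors] = 0.8
scores = rng.normal(size=(n_cases, n_factors))
data = scores @ pattern.T + 0.6 * rng.normal(size=(n_cases, n_variables))

communalities = pca_communalities(data, n_factors)
kmo = overall_kmo(data)

print(f"STV ratio: {n_cases / n_variables:.2f}:1")
print(f"Variable-to-factor ratio: {n_variables / n_factors:.0f}")
print(f"Overall KMO: {kmo:.3f}")
print(f"Communalities: min={communalities.min():.3f}, "
      f"max={communalities.max():.3f}, mean={communalities.mean():.3f}, "
      f"sd={communalities.std(ddof=1):.3f}")
```

The numbers printed for the synthetic data will not match mine. The point is only to show where each figure comes from. A statistics package may report slightly different communalities depending on its extraction and rotation settings.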

"As long as communalities are high, the number of expected factors is relatively small, and model error is low (a condition which often goes hand-in-hand with high communalities), researchers and reviewers should not be overly concerned about small sample sizes." (Preacher & MacCallum, 2002, p. 160)

"Strong data" in factor analysis means uniformly high communalities without cross loadings, plus several variables loading strongly on each factor (Costello & Osborne, 2005, p. 4).