Why I Studied This Issue?
=20
I did an apprentice project studying the reasons why students withdrew from their online courses. In this project, I got a dataset that had 35 variables indicating various withdrawal reasons. I wanted to use factor analysis to reduce the 35 variables to a few categories of withdrawal reasons. However, I had only 47 cases in the dataset. Many people suggested that the number of cases was too small for performing a factor analysis. But I really did not want to waste the time and energy I had spent and just throw away this dataset. Yes! I wanted to "explain the most with the least" (Henson & Roberts, 2006, p. 393).
=20
Thus, I decided to find out what the minimum sample size (i.e., the minimum number of cases, which some researchers call subjects) is for performing factor analysis. Here is the related information I found.
=20
The General Recommendations
=20
There are two categories of general recommendations in terms of minimum sample size in factor analysis. One category says that the absolute number of cases (N) is important, while the other says that the subject-to-variable ratio (p) is important. Arrindell and van der Ende (1985), Velicer and Fava (1998), and MacCallum, Widaman, Zhang and Hong (1999) have reviewed many of these recommendations.
=20
of sample size
=20
=20
 Rule of 100. Gorsuch (1983) and Kline (1979, p. 40) recommended at least 100 cases (MacCallum, Widaman, Zhang & Hong, 1999). No sample should be smaller than 100 even if the number of variables is less than 20 (Gorsuch, 1974, p. 333; in Arrindell & van der Ende, 1985, p. 166).

 Hatcher (1994) recommended that the number of subjects should be the larger of 5 times the number of variables, or 100. Even more subjects are needed when communalities are low and/or few variables load on each factor (in David Garson, 2008).

 Rule of 150. Hutcheson and Sofroniou (1999) recommend at least 150 to 300 cases, more toward the 150 end when there are a few highly correlated variables, as would be the case when collapsing highly multicollinear variables (in David Garson, 2008).

 Rule of 200. Guilford (1954, p. 533) suggested that N should be at least 200 cases (in MacCallum, Widaman, Zhang & Hong, 1999, p. 84; in Arrindell & van der Ende, 1985, p. 166).

 Rule of 250. Cattell (1978) claimed the minimum desirable N to be 250 (in MacCallum, Widaman, Zhang & Hong, 1999, p. 84).

 Rule of 300. There should be at least 300 cases (Norušis, 2005, p. 400, in David Garson, 2008).

 Significance rule. Lawley and Maxwell (1971) suggested 51 more cases than the number of variables, to support chi-square testing (in David Garson, 2008).

 Rule of 500. Comrey and Lee (1992) thought that 100 = poor, 200 = fair, 300 = good, 500 = very good, and 1,000 or more = excellent. They urged researchers to obtain samples of 500 or more observations whenever possible (in MacCallum, Widaman, Zhang & Hong, 1999, p. 84).
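
Three of the rules above depend on the number of variables or give a graded scale, so they are easy to turn into small functions. The following is only a sketch: the function names are my own, and how to label a sample that falls between two of Comrey and Lee's anchors (or below 100) is an assumption, not something the sources state.

```python
def hatcher_minimum(n_variables: int) -> int:
    """Hatcher (1994): the larger of 5 times the number of variables, or 100."""
    return max(5 * n_variables, 100)


def lawley_maxwell_minimum(n_variables: int) -> int:
    """Lawley and Maxwell (1971): 51 more cases than the number of variables."""
    return n_variables + 51


def comrey_lee_rating(n_cases: int) -> str:
    """Comrey and Lee (1992): 100 poor, 200 fair, 300 good, 500 very good, 1,000 excellent.

    Assumption: a sample between two anchors gets the label of the lower
    anchor, and anything under 100 is reported as "below poor".
    """
    anchors = [(1000, "excellent"), (500, "very good"), (300, "good"),
               (200, "fair"), (100, "poor")]
    for threshold, label in anchors:
        if n_cases >= threshold:
            return label
    return "below poor"


if __name__ == "__main__":
    n_cases, n_variables = 47, 35  # the data set described at the top of this page
    print(hatcher_minimum(n_variables))         # 175
    print(lawley_maxwell_minimum(n_variables))  # 86
    print(comrey_lee_rating(n_cases))           # below poor
```

With 35 variables, Hatcher's rule asks for 175 cases and the significance rule for 86, both well above the 47 cases I have.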
=20
=20
of subjects-to-variables (STV) ratio
=20
=20
 A ratio of 20:1 (Hair, Anderson, Tatham, & Black, 1995, in Hogarty, Hines, Kromrey, Ferron, & Mumford, 2005).

 Rule of 10. There should be at least 10 cases for each item in the instrument being used (David Garson, 2008; Everitt, 1975, and Nunnally, 1978, p. 276, in Arrindell & van der Ende, 1985, p. 166; Kunce, Cook, & Miller, 1975, and Marascuilo & Levin, 1983, in Velicer & Fava, 1998, p. 232).

 Rule of 5. The subjects-to-variables ratio should be no lower than 5 (Bryant and Yarnold, 1995, in David Garson, 2008; Gorsuch, 1983, in MacCallum, Widaman, Zhang & Hong, 1999; Everitt, 1975, and Gorsuch, 1974, in Arrindell & van der Ende, 1985, p. 166).

 A ratio of 3(:1) to 6(:1) of STV is acceptable if the lower limit of the variables-to-factors ratio is 3 to 6. But the absolute minimum sample size should not be less than 250 (Cattell, 1978, p. 508, in Arrindell & van der Ende, 1985, p. 166).

 Ratio of 2. "[T]here should be at least twice as many subjects as variables in factor-analytic investigations. This means that in any large study on this account alone, one should have to use more than the minimum 100 subjects" (Kline, 1979, p. 40).
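
To see how a given data set fares against the ratio rules, the STV ratio can be computed and compared with each threshold. This is a minimal sketch with names I made up (`STV_RULES`, `rules_met`); Cattell's rule is left out because it also depends on the variables-to-factors ratio and on an absolute minimum of 250.

```python
# Minimum subjects-to-variables (STV) ratios taken from the list above.
STV_RULES = {
    "Hair et al. (1995), 20:1": 20,
    "Rule of 10": 10,
    "Rule of 5": 5,
    "Kline (1979), ratio of 2": 2,
}


def stv_ratio(n_cases: int, n_variables: int) -> float:
    """Number of cases per variable."""
    return n_cases / n_variables


def cases_required(n_variables: int) -> dict:
    """Cases each rule would require for this many variables."""
    return {name: ratio * n_variables for name, ratio in STV_RULES.items()}


def rules_met(n_cases: int, n_variables: int) -> list:
    """Names of the rules whose minimum ratio is reached."""
    ratio = stv_ratio(n_cases, n_variables)
    return [name for name, minimum in STV_RULES.items() if ratio >= minimum]


if __name__ == "__main__":
    print(round(stv_ratio(47, 35), 2))  # 1.34
    print(rules_met(47, 35))            # []
    print(cases_required(35))
```

My 47 cases on 35 variables give a ratio of about 1.34, which satisfies none of these rules.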
=20
=20
Statistical Research Findings on Minimum Sample Size

Little statistical research in the fields of Education and Behavioural Science has shed light on the issue of establishing a minimum desirable sample size (MacCallum, Widaman, Zhang & Hong, 1999). The studies that exist used either artificial or empirical data to investigate the minimum sample size or STV ratio required to recover the population factor structure. In this section, I summarize the minimum sample sizes and STV ratios that these studies examined.
=20
=20
 Barrett and Kline (1981, in MacCallum, Widaman, Zhang & Hong, 1999) used two large empirical data sets to investigate this issue. They drew sub-samples of various sizes from the original full samples and performed a factor analysis on each sub-sample to compare its results with those of the full sample. They obtained good recovery:

 from a sub-sample of N = 48 [1] for one data set that has 16 variables, which represents an STV ratio of 3.0;

 and from a sub-sample of N = 112 for another data set that has 90 variables, which represents an STV ratio of 1.2.

[1] This number was reported as 50 "to be the minimum to yield a clear, recognizable factor pattern" (p. 167) in Arrindell and van der Ende's paper (1985).
=20
=20
=20
=20
=20
=20
 Arrindell and van der Ende (1985) used two large empirical data sets, with 1104 and 960 cases respectively, to examine the minimum sample sizes and STV ratios that can produce a stable factor structure. By drawing sub-samples from the two large data sets, the authors found that:

 for the first data set, which had 76 variables, the minimum STV ratio (p) required to produce a clear, recognizable factor solution was 1.3 and the corresponding sample size (N) was 100;

 for the second data set, which had 20 variables, the minimum STV ratio (p) was 3.9 and the corresponding sample size (N) was 78.
=20
=20
=20
 MacCallum, Widaman, Zhang & Hong (1999) conducted a Monte Carlo study on sample size effects. They obtained an excellent recovery (100% convergence) of the population factor structure with a sample size (N) of 60 and 20 variables. However, this result was obtained only when the level of communality (over .7 on average) and the degree of overdetermination (3 loaded factors) were both high (Table 1 on page 93).
=20
=20
=20
 Preacher & MacCallum (2002) conducted a Monte Carlo study. Their conclusion is:

 "N had by far the largest effect on factor recovery, which exhibited a sharp drop-off below Ns of 20 or so" (p. 157).
=20
=20
=20
The Minimum Sample Size or STV Ratio Used in Practical Studies
=20
=20
 Henson and Roberts (2006) reported a review of 60 exploratory factor analyses in four journals: Educational and Psychological Measurement, Journal of Educational Psychology, Personality and Individual Differences, and Psychological Assessment.

 Minimum sample size reported: 42.

 Minimum STV ratio reported: 3.25:1; 11.86% of the reviewed studies used a ratio less than 5:1.
=20
=20
=20
=20
 Fabrigar, Wegener, MacCallum, and Strahan (1999) reported a review of articles that used EFA in two journals: Journal of Personality and Social Psychology (JPSP) and Journal of Applied Psychology (JAP).

 Sample size: 30 (18.9%) articles in JPSP and 8 (13.8%) in JAP had samples of 100 or less.

 Ratio of variables to factors: 55 (24.6%) papers in JPSP and 20 (34.4%) in JAP had a ratio of 4:1 or less.
=20
=20
=20
=20
 Costello and Osborne (2005) surveyed two years' worth of PsycINFO articles that reported principal components or exploratory factor analysis.
=20

STV ratio          % of studies   Cumulative %

2:1 or less        14.7%          14.7%
> 2:1, ≤ 5:1       25.8%          40.5%
> 5:1, ≤ 10:1      22.7%          63.2%
> 10:1, ≤ 20:1     15.4%          78.6%
> 20:1, ≤ 100:1    18.4%          97.0%
> 100:1            3.0%           100.0%
=20
=20
=20
=20
 Ford, MacCallum, and Tait (1986) examined articles published in Journal of Applied Psychology, Personnel Psychology, and Organizational Behavior and Human Performance during the period from 1974 to 1984.

 STV ratio: 27.3% of the studies had a ratio of less than 5:1, and 56% had a ratio of less than 10:1.
=20
=20
=20
=20
Research has demonstrated that the general rules of thumb for minimum sample size are neither valid nor useful (MacCallum, Widaman, Zhang, & Hong, 1999; Preacher & MacCallum, 2002). It is hard, and too simplistic, to say whether absolute sample size or the STV ratio is what matters in factor analysis. The minimum level of N (sample size) depends on other aspects of the design, such as:
=20
=20
 Communality of the variables

 The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator (Garson, 2008).

 If communalities are high, recovery of population factors in sample data is normally very good, almost regardless of sample size, level of overdetermination, or the presence of model error (MacCallum, Widaman, Preacher, and Hong, 2001, p. 636).

 MacCallum, Widaman, Zhang, and Hong (1999) suggested that communalities should all be greater than .6, or that the mean level of communality should be at least .7 (p. 96).

 Item communalities are considered "high" if they are all .8 or greater, but this is unlikely to occur in real data (Costello & Osborne, 2005, p. 4).
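
As a rough illustration of how these thresholds can be checked, the sketch below computes each variable's communality as the sum of its squared loadings across factors. That formula assumes orthogonal (uncorrelated) factors; the function names and the example loadings are invented for illustration and are not from my data set.

```python
def communalities(loadings):
    """Communality of each variable: sum of squared loadings across factors.

    `loadings` is a list of rows, one row per variable, one column per factor.
    Assumes orthogonal factors.
    """
    return [sum(value * value for value in row) for row in loadings]


def meets_maccallum_criteria(loadings, each_above=0.6, mean_at_least=0.7):
    """True if all communalities exceed .6 OR their mean is at least .7
    (the two alternatives suggested by MacCallum et al., 1999)."""
    h2 = communalities(loadings)
    all_high = all(value > each_above for value in h2)
    mean_high = sum(h2) / len(h2) >= mean_at_least
    return all_high or mean_high


if __name__ == "__main__":
    example = [  # hypothetical loadings: 3 variables on 2 factors
        [0.8, 0.1],
        [0.7, 0.3],
        [0.2, 0.9],
    ]
    print([round(value, 2) for value in communalities(example)])  # [0.65, 0.58, 0.85]
    print(meets_maccallum_criteria(example))                      # False
```

In the example, the second variable's communality (.58) is under .6 and the mean (about .69) is under .7, so neither criterion is met.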
=20
=20
=20
=20
 Degree of overdetermination of the factor (or number of factors/number of variables)

 Overdetermination is the factor-to-variable ratio (Preacher & MacCallum, 2002).

 Six or seven indicators per factor and a rather small number of factors is considered high overdetermination of factors if many or all communalities are under .50 (MacCallum, Widaman, Zhang, & Hong, 1999).

 "A minimum of 3 variables per factor is critical. This confirms the theoretical results of T. W. Anderson and Rubin (1956; also see McDonald & Krane, 1977, 1979, and Rindskopf, 1984)" (Velicer & Fava, 1998, p. 243).

 At least four measured variables for each common factor and perhaps as many as six (Fabrigar, Wegener, MacCallum, & Strahan, 1999, p. 282).

 A factor with fewer than three items is generally weak and unstable (Costello & Osborne, 2005, p. 5).
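
One simple way to check the "at least three variables per factor" guideline is to count, for each factor, the variables whose largest absolute loading falls on it and reaches some cut-off. The sketch below does that; the cut-off of .60, the function names, and the example loadings are assumptions for illustration only.

```python
def variables_per_factor(loadings, cutoff=0.60):
    """Count the variables assigned to each factor.

    A variable is assigned to the factor on which its absolute loading is
    largest, but only if that loading reaches `cutoff` (an assumed value).
    """
    n_factors = len(loadings[0])
    counts = [0] * n_factors
    for row in loadings:
        best = max(range(n_factors), key=lambda k: abs(row[k]))
        if abs(row[best]) >= cutoff:
            counts[best] += 1
    return counts


def weak_factors(loadings, cutoff=0.60, minimum=3):
    """Indices of factors with fewer than `minimum` assigned variables."""
    counts = variables_per_factor(loadings, cutoff)
    return [k for k, count in enumerate(counts) if count < minimum]


if __name__ == "__main__":
    example = [  # hypothetical loadings: 5 variables on 2 factors
        [0.80, 0.10],
        [0.70, 0.20],
        [0.65, 0.10],
        [0.10, 0.90],
        [0.20, 0.30],  # loads on neither factor at the .60 cut-off
    ]
    print(variables_per_factor(example))  # [3, 1]
    print(weak_factors(example))          # [1]
```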
=20
=20
=20
=20
 Size of loading

 Item loading magnitude accounted for significant unique variance in the expected direction in all but one case, and in most cases was the strongest unique predictor of congruence between sample and population (Osborne & Costello, 2004).

 The sample-to-population pattern fit was very good for the high (.80) loading condition, moderate for the middle (.60) loading condition, and very poor for the low (.40) loading condition (Velicer & Fava, 1998).

 5 or more strongly loading items (.50 or better) are desirable and indicate a solid factor (Costello & Osborne, 2005, p. 5).

 "If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used. Similarly, a pattern composed of many variables per component (10 to 12) but low loadings (= .40) should be an accurate solution at all but the lowest sample sizes (N < 150). If a solution possesses components with only a few variables per component and low component loadings, the pattern should not be interpreted unless a sample size of 300 or more observations has been used" (Guadagnoli & Velicer, 1988, p. 274).
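
The Guadagnoli and Velicer guideline reads like a decision rule, so here is one possible encoding of it for a single component. This is my own reading, not their procedure: the function name is hypothetical, "low loadings" is taken to mean at least .40 in absolute value, and a component with between 5 and 9 such variables (a case the quotation does not cover) falls through to the strictest requirement of 300 cases.

```python
def interpretable(component_loadings, n_cases):
    """Rough encoding of the Guadagnoli & Velicer (1988) guideline.

    `component_loadings` holds the loadings of the variables on ONE component.
    """
    magnitudes = [abs(value) for value in component_loadings]
    strong = sum(1 for m in magnitudes if m > 0.60)
    salient = sum(1 for m in magnitudes if m >= 0.40)

    if strong >= 4:
        return True              # four or more loadings above .60: any N
    if salient >= 10:
        return n_cases >= 150    # many variables, low loadings: N of 150 or more
    return n_cases >= 300        # few variables, low loadings (assumed fallback)


if __name__ == "__main__":
    print(interpretable([0.70, 0.65, 0.62, 0.61], n_cases=47))  # True
    print(interpretable([0.40] * 10, n_cases=100))              # False
    print(interpretable([0.45, 0.50, 0.40], n_cases=300))       # True
```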
=20
=20
=20
=20
 Model fit (f)

 It is defined in terms of the population root mean squared residual (RMSR) (Preacher & MacCallum, 2002).

 RMSR = .00, .03, and .06 correspond to perfect, good, and fair model fit in the population, respectively (Preacher & MacCallum, 2002).

 Lack of fit of the model in the population will not, on the average, influence recovery of population factors in analysis of sample data, regardless of degree of model error and regardless of sample size (MacCallum, Widaman, Preacher, & Hong, 2001, p. 611).

 Model fit has little effect on factor recovery. It is probably very rare in practice to find factor models exhibiting simultaneously high communalities and poor fit (Preacher & MacCallum, 2002, p. 157).

 The differences between (extraction) methods with respect to ability to reproduce the population pattern were generally minor (Velicer & Fava, 1998, p. 243).
=20
=20
=20
Conclusion
=20
=20
 The general rules of thumb for minimum sample size are neither valid nor useful.

 What I did with the data I have:

 1. Repeat the method Garson (http://www2.chass.ncsu.edu/garson/pa765/factor.htm#kmo) proposed until the overall KMO is over .60.

 2. Check the communality of each variable. Drop the variable that has the smallest communality, and repeat until the communalities of all variables are above .60.

 3. Check the mean of all communalities to ensure that it is over .70. If not, repeat step 2.

 4. Use the Kaiser criterion (dropping all components with eigenvalues under 1.0) and the scree plot to determine the number of factors.

 5. Set the loading size cut-off value at .60, and drop the factors that have fewer than 3 variables.
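
For readers who want to see what the KMO step involves, here is a minimal pure-Python sketch of the Kaiser-Meyer-Olkin statistic computed from a correlation matrix, plus a pruning loop. I am assuming that the method on Garson's page amounts to dropping the variable with the lowest individual KMO until the overall value rises above .60; the page itself should be checked for the details. All function names are my own, and I actually ran the analysis in a statistics package, not with this code.

```python
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _share(r2, p2):
    """r2 / (r2 + p2), or 0.0 when both sums are zero."""
    return r2 / (r2 + p2) if (r2 + p2) > 0 else 0.0


def kmo(corr):
    """Return (overall KMO, per-variable KMO list) for a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    r2 = [0.0] * n  # squared correlations, summed per variable
    p2 = [0.0] * n  # squared partial correlations, summed per variable
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                r2[i] += corr[i][j] ** 2
                p2[i] += partial ** 2
    per_variable = [_share(r2[i], p2[i]) for i in range(n)]
    return _share(sum(r2), sum(p2)), per_variable


def prune_until_kmo(corr, names, target=0.60):
    """Drop the variable with the lowest individual KMO until overall KMO > target."""
    corr = [list(row) for row in corr]
    names = list(names)
    while len(names) > 2:
        overall, per_variable = kmo(corr)
        if overall > target:
            break
        worst = per_variable.index(min(per_variable))
        names.pop(worst)
        corr = [[v for j, v in enumerate(row) if j != worst]
                for i, row in enumerate(corr) if i != worst]
    return names, corr
```

Called as `prune_until_kmo(corr, names)`, it returns the variable names that survive and the reduced correlation matrix. The loop stops at two variables because the statistic is no longer informative at that point.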
=20
=20
=20
Finally, with principal component analysis, I got 4 factors with 32 variables, representing an STV ratio of 1.47:1 (47/32). The overall KMO is .616, the minimum communality is .62, the maximum communality is .879, and the mean communality is .770 with a standard deviation of .074. There is no cross-loading among the 4 factors. Two of the 4 factors each have 5 loaded variables, one has 4 loaded variables, and one has 3 loaded variables. The variable-to-factor ratio is 8 (32/4). I think this can be considered a moderate to high degree of overdetermination.
=20
"As long as communalities are high, the number of expected factors is relatively small, and model error is low (a condition which often goes hand-in-hand with high communalities), researchers and reviewers should not be overly concerned about small sample sizes." (Preacher & MacCallum, 2002, p. 160)
=20
=20
=20
=20
"Strong data" in factor analysis means uniformly high communalities without cross-loadings, plus several variables loading strongly on each factor. (Costello and Osborne, 2005, p. 4)
=20
=20
=20
References

=20
 Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. In J. Neyman (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (pp. 111-150). Berkeley: University of California Press.

 Arrindell, W. A., & van der Ende, J. (1985). An empirical test of the utility of the observations-to-variables ratio in factor and components analysis. Applied Psychological Measurement, 9, 165-178.

 Barrett, P. T., & Kline, P. (1981). The observation to variable ratio in factor analysis. Personality Study and Group Behavior, 1, 23-33.

 Bryant, F. B., & Yarnold, P. R. (1995). Principal components analysis and exploratory and confirmatory factor analysis. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding multivariate statistics (pp. 99-136). Washington, DC: American Psychological Association.

 Cattell, R. B. (1978). The Scientific Use of Factor Analysis. New York: Plenum.

 Comrey, A. L., & Lee, H. B. (1992). A First Course in Factor Analysis. Hillsdale, NJ: Erlbaum.

 Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment Research & Evaluation, 10(7). Retrieved July 3, 2008 from http://pareonline.net/pdf/v10n7a.pdf.

 Everitt, B. S. (1975). Multivariate analysis: The need for data, and other problems. British Journal of Psychiatry, 126, 237-240.

 Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.

 Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39, 291-314.

 Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.

 Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103, 265-275.

 Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

 Hair, J. F. J., Anderson, R. E., Tatham, R. L., & Black, W. C. (1995). Multivariate data analysis (4th ed.). Saddle River, NJ: Prentice Hall.

 Hatcher, L. (1994). A Step-by-Step Approach to Using the SAS® System for Factor Analysis and Structural Equation Modeling. Cary, NC: SAS Institute, Inc.

 Hogarty, K. Y., Hines, C. V., Kromrey, J. D., Ferron, J. M., & Mumford, K. R. (2005). The quality of factor solutions in exploratory factor analysis: The influence of sample size, communality, and overdetermination. Educational and Psychological Measurement, 65, 202-226.

 Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393-416.

 Hutcheson, G., & Sofroniou, N. (1999). The multivariate social scientist: Introductory statistics using generalized linear models. Thousand Oaks, CA: Sage Publications.

 Kline, P. (1979). Psychometrics and psychology. London: Academic Press.

 Kunce, J. T., Cook, W. D., & Miller, D. E. (1975). Random variables and correlational overkill. Educational and Psychological Measurement, 35, 529-534.

 Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method. London: Butterworth and Co.

 McDonald, R. P., & Krane, W. R. (1977). A note on local identifiability and degrees of freedom in the asymptotic likelihood ratio test. British Journal of Mathematical and Statistical Psychology, 30, 198-203.

 McDonald, R. P., & Krane, W. R. (1979). A Monte Carlo study of local identifiability and degrees of freedom in the asymptotic likelihood ratio test. British Journal of Mathematical and Statistical Psychology, 32, 121-132.

 Marascuilo, L. A., & Levin, J. R. (1983). Multivariate statistics in the social sciences. Monterey, CA: Brooks/Cole.

 MacCallum, R. C., Widaman, K. F., Preacher, K. J., & Hong, S. (2001). Sample size in factor analysis: The role of model error. Multivariate Behavioral Research, 36, 611-637.

 MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4, 84-99.

 Norušis, M. J. (2005). SPSS 13.0 Statistical Procedures Companion. Chicago: SPSS, Inc.

 Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

 Osborne, J. W., & Costello, A. B. (2004). Sample size and subject to item ratio in principal components analysis. Practical Assessment, Research & Evaluation, 9(11). Retrieved July 1, 2008 from http://PAREonline.net/getvn.asp?v=9&n=11.

 Preacher, K. J., & MacCallum, R. C. (2002). Exploratory Factor Analysis in Behavior Genetics Research: Factor Recovery with Small Sample Sizes. Behavior Genetics, 32, 153-161.

 Rindskopf, D. (1984). Structural equation models: Empirical identification, Heywood cases, and related problems. Sociological Methods and Research, 13, 109-119.

 Velicer, W. F., & Fava, J. L. (1998). Effects of variable and subject sampling on factor pattern recovery. Psychological Methods, 3, 231-251.