
# The Minimum Sample Size in Factor Analysis

### Why I Studied This Issue

I did an apprentice project studying the reasons why students withdrew from their online courses. The dataset for this project had 35 variables indicating various withdrawal reasons, and I wanted to use factor analysis to reduce them to a few categories of withdrawal reasons. However, the dataset had only 47 cases. Many people suggested that this was too few cases for a factor analysis, but I did not want to throw away the dataset and waste the time and energy I had spent on it. I wanted to "explain the most with the least" (Henson & Roberts, 2006, p. 393).

Thus, I decided to find out the minimum sample size (i.e., the minimum number of cases, which some researchers call subjects) for performing factor analysis. Here is the related information I found.

### The General Recommendations

There are two categories of general recommendations for the minimum sample size in factor analysis. One category says that the absolute number of cases (N) is important, while the other says that the subject-to-variable ratio (p) is important. Arrindell and van der Ende (1985), Velicer and Fava (1998), and MacCallum, Widaman, Zhang, and Hong (1999) have reviewed many of these recommendations.

#### Recommendations on sample size

• Rule of 100. Gorsuch (1983) and Kline (1979, p. 40) recommended at least 100 cases (MacCallum, Widaman, Zhang, & Hong, 1999). No sample should be smaller than 100 even if the number of variables is less than 20 (Gorsuch, 1974, p. 333, in Arrindell & van der Ende, 1985, p. 166).
• Hatcher (1994) recommended that the number of subjects should be the larger of 5 times the number of variables, or 100. Even more subjects are needed when communalities are low and/or few variables load on each factor (in David Garson, 2008).
• Rule of 150. Hutcheson and Sofroniou (1999) recommend at least 150 to 300 cases, more toward the 150 end when there are a few highly correlated variables, as would be the case when collapsing highly multicollinear variables (in David Garson, 2008).
• Rule of 200. Guilford (1954, p. 533) suggested that N should be at least 200 cases (in MacCallum, Widaman, Zhang, & Hong, 1999, p. 84; in Arrindell & van der Ende, 1985, p. 166).
• Rule of 250. Cattell (1978) claimed the minimum desirable N to be 250 (in MacCallum, Widaman, Zhang, & Hong, 1999, p. 84).
• Rule of 300. There should be at least 300 cases (Norušis, 2005, p. 400, in David Garson, 2008).
• Significance rule. Lawley and Maxwell (1971) suggested 51 more cases than the number of variables, to support chi-square testing (in David Garson, 2008).
• Rule of 500. Comrey and Lee (1992) thought that 100 = poor, 200 = fair, 300 = good, 500 = very good, and 1,000 or more = excellent. They urged researchers to obtain samples of 500 or more observations whenever possible (in MacCallum, Widaman, Zhang, & Hong, 1999, p. 84).
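
To see what these rules imply for a concrete study, the following is a minimal Python sketch that tabulates the minimum N each rule demands for a given number of variables. The function names and dictionary keys are hypothetical, the Rule of 150 is represented only by its lower end, and treating Comrey and Lee's labels as thresholds (with anything under 100 reported as "below poor") is an assumption made for illustration.

```python
def minimum_n_by_rule(n_variables):
    """Minimum N demanded by each absolute-size rule listed above."""
    return {
        "Rule of 100": 100,
        "Hatcher (1994)": max(5 * n_variables, 100),
        "Rule of 150 (lower end)": 150,
        "Rule of 200": 200,
        "Rule of 250": 250,
        "Rule of 300": 300,
        "Significance rule": n_variables + 51,
    }


def comrey_lee_rating(n_cases):
    """Comrey and Lee's (1992) verbal label, read as a threshold scale."""
    scale = [(1000, "excellent"), (500, "very good"), (300, "good"),
             (200, "fair"), (100, "poor")]
    for threshold, label in scale:
        if n_cases >= threshold:
            return label
    return "below poor"  # assumption: the source gives no label under 100


# The dataset described in this article: 47 cases, 35 variables.
n_cases, n_variables = 47, 35
for rule, minimum in minimum_n_by_rule(n_variables).items():
    verdict = "met" if n_cases >= minimum else "not met"
    print(f"{rule}: needs {minimum}, {verdict}")
print("Comrey and Lee rating:", comrey_lee_rating(n_cases))
```

With 47 cases and 35 variables, none of the absolute-size rules is met; the least demanding one is the significance rule, which asks for 86 cases.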

#### Recommendations on subjects-to-variables (STV) ratio

• A ratio of 20:1. Hair, Anderson, Tatham, and Black (1995, in Hogarty, Hines, Kromrey, Ferron, & Mumford, 2005).
• Rule of 10. There should be at least 10 cases for each item in the instrument being used (David Garson, 2008; Everitt, 1975, and Nunnally, 1978, p. 276, in Arrindell & van der Ende, 1985, p. 166; Kunce, Cook, & Miller, 1975, and Marascuilo & Levin, 1983, in Velicer & Fava, 1998, p. 232).
• Rule of 5. The subjects-to-variables ratio should be no lower than 5 (Bryant and Yarnold, 1995, in David Garson, 2008; Gorsuch, 1983, in MacCallum, Widaman, Zhang, & Hong, 1999; Everitt, 1975, and Gorsuch, 1974, in Arrindell & van der Ende, 1985, p. 166).
• An STV ratio of 3:1 to 6:1 is acceptable if the lower limit of the variables-to-factors ratio is 3 to 6. But the absolute minimum sample size should not be less than 250 (Cattell, 1978, p. 508, in Arrindell & van der Ende, 1985, p. 166).
• Ratio of 2. "[T]here should be at least twice as many subjects as variables in factor-analytic investigations. This means that in any large study on this account alone, one should have to use more than the minimum 100 subjects" (Kline, 1979, p. 40).
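
The ratio rules can be checked the same way. This is a rough Python sketch, not a definitive implementation: the names `STV_MINIMUMS` and `stv_report` are hypothetical, and Cattell's conditional 3:1 to 6:1 range is left out because it also depends on the variables-to-factors ratio and a floor of 250 cases.

```python
# Minimum subjects-to-variables ratios taken from the list above.
STV_MINIMUMS = {
    "Ratio of 20:1 (Hair et al., 1995)": 20,
    "Rule of 10": 10,
    "Rule of 5": 5,
    "Ratio of 2 (Kline, 1979)": 2,
}


def stv_report(n_cases, n_variables):
    """Return the STV ratio and, per rule, whether it is met."""
    ratio = n_cases / n_variables
    report = {}
    for rule, minimum in STV_MINIMUMS.items():
        report[rule] = {
            "met": ratio >= minimum,
            # cases that would be needed to satisfy the rule
            "cases_needed": minimum * n_variables,
        }
    return ratio, report


# The dataset described in this article: 47 cases, 35 variables.
ratio, report = stv_report(47, 35)
print(f"STV ratio: {ratio:.2f}:1")
for rule, result in report.items():
    print(f"{rule}: met={result['met']}, cases needed={result['cases_needed']}")
```

For 47 cases and 35 variables the ratio is about 1.34:1, so even the most lenient rule (a ratio of 2) would require 70 cases.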

### The Minimum Sample Size or STV Ratio Used in Practical Studies

• Henson and Roberts (2006) reported a review of 60 exploratory factor analyses in four journals: Educational and Psychological Measurement, Journal of Educational Psychology, Personality and Individual Differences, and Psychological Assessment.
  • Minimum sample size reported: 42.
  • Minimum STV ratio reported: 3.25:1; 11.86% of reviewed studies used a ratio less than 5:1.
• Fabrigar, Wegener, MacCallum, and Strahan (1999) reported a review of articles that used EFA in two journals: Journal of Personality and Social Psychology (JPSP) and Journal of Applied Psychology (JAP).
  • Sample size: 30 (18.9%) articles in JPSP and 8 (13.8%) in JAP had samples of 100 or fewer.
  • Ratio of variables to factors: 55 (24.6%) papers in JPSP and 20 (34.4%) in JAP had ratios of 4:1 or less.
• Costello and Osborne (2005) surveyed two years of PsycINFO articles that reported principal components or exploratory factor analysis.

| STV ratio | % of studies | Cumulative % |
|---|---|---|
| 2:1 or less | 14.7% | 14.7% |
| > 2:1, ≤ 5:1 | 25.8% | 40.5% |
| > 5:1, ≤ 10:1 | 22.7% | 63.2% |
| > 10:1, ≤ 20:1 | 15.4% | 78.6% |
| > 20:1, ≤ 100:1 | 18.4% | 97.0% |
| > 100:1 | 3.0% | 100.0% |

• Ford, MacCallum, and Tait (1986) examined articles published in Journal of Applied Psychology, Personnel Psychology, and Organizational Behavior and Human Performance during the period 1974-1984.
  • STV ratio: 27.3% of the studies were less than 5:1, and 56% were less than 10:1.

### Factors Related to Sample Size

Research has demonstrated that the general rules of thumb for minimum sample size are neither valid nor useful (MacCallum, Widaman, Zhang, & Hong, 1999; Preacher & MacCallum, 2002). It is too simplistic to say that either the absolute sample size or the STV ratio is what matters in factor analysis. The minimum level of N (sample size) depends on other aspects of the design, such as:

• Communality of the variables
  • The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator (Garson, 2008).
  • If communalities are high, recovery of population factors in sample data is normally very good, almost regardless of sample size, level of overdetermination, or the presence of model error (MacCallum, Widaman, Preacher, & Hong, 2001, p. 636).
  • MacCallum, Widaman, Zhang, and Hong (1999) suggested that communalities should all be greater than .6, or that the mean level of communality should be at least .7 (p. 96).
  • Item communalities are considered "high" if they are all .8 or greater, but this is unlikely to occur in real data (Costello & Osborne, 2005, p. 4).
• Degree of overdetermination of the factors (or number of factors/number of variables)
  • Overdetermination is the factor-to-variable ratio (Preacher & MacCallum, 2002).
  • Six or seven indicators per factor and a rather small number of factors are considered high overdetermination of factors if many or all communalities are under .50 (MacCallum, Widaman, Zhang, & Hong, 1999).
  • A minimum of 3 variables per factor is critical. This confirms the theoretical results of T. W. Anderson and Rubin (1956; also see McDonald & Krane, 1977, 1979, and Rindskopf, 1984) (Velicer & Fava, 1998, p. 243).
  • At least four measured variables for each common factor and perhaps as many as six (Fabrigar, Wegener, MacCallum, & Strahan, 1999, p. 282).
  • A factor with fewer than three items is generally weak and unstable (Costello & Osborne, 2005, p. 5).
• Size of item loadings
  • Item loading magnitude accounted for significant unique variance in the expected direction in all but one case, and in most cases was the strongest unique predictor of congruence between sample and population (Osborne & Costello, 2004).
  • The sample-to-population pattern fit was very good for the high (.80) loading condition, moderate for the middle (.60) loading condition, and very poor for the low (.40) loading condition (Velicer & Fava, 1998).
  • 5 or more strongly loading items (.50 or better) are desirable and indicate a solid factor (Costello & Osborne, 2005, p. 5).
  • If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used. Similarly, a pattern composed of many variables per component (10 to 12) but low loadings (= .40) should be an accurate solution at all but the lowest sample sizes (N < 150). If a solution possesses components with only a few variables per component and low component loadings, the pattern should not be interpreted unless a sample size of 300 or more observations has been used (Guadagnoli & Velicer, 1988, p. 274).
• Model fit (f)
  • It is defined in terms of the population root mean squared residual (RMSR) (Preacher & MacCallum, 2002).
  • RMSR = .00, .03, and .06 correspond respectively to perfect, good, and fair model fit in the population (Preacher & MacCallum, 2002).
  • Lack of fit of the model in the population will not, on the average, influence recovery of population factors in analysis of sample data, regardless of degree of model error and regardless of sample size (MacCallum, Widaman, Preacher, & Hong, 2001, p. 611).
  • Model fit has little effect on factor recovery. It is probably very rare in practice to find factor models exhibiting simultaneously high communalities and poor fit (Preacher & MacCallum, 2002, p. 157).
  • The differences between (extraction) methods with respect to ability to reproduce the population pattern were generally minor (Velicer & Fava, 1998, p. 243).
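
Several of the quantities in the list above can be read directly off a loading matrix. The sketch below is only an illustration under stated assumptions: it assumes orthogonal factors (so that a variable's communality is the sum of its squared loadings), the function name `summarize_loadings` and the example matrix are hypothetical, and the `strong` cut-off of .60 follows the Guadagnoli and Velicer (1988) guideline quoted above.

```python
import numpy as np


def summarize_loadings(loadings, strong=0.60):
    """Summarise a (variables x factors) loading matrix.

    Assumes orthogonal factors, so a variable's communality is the
    sum of its squared loadings across factors.
    """
    loadings = np.asarray(loadings, dtype=float)
    communalities = np.sum(loadings ** 2, axis=1)
    n_variables, n_factors = loadings.shape
    return {
        "communalities": communalities,
        "min_communality": float(communalities.min()),
        "mean_communality": float(communalities.mean()),
        # overdetermination expressed as variables per factor
        "variables_per_factor": n_variables / n_factors,
        # number of variables loading at or above the cut-off on each factor
        "strong_loadings_per_factor": np.sum(np.abs(loadings) >= strong, axis=0),
        # MacCallum et al. (1999): all communalities > .6, or mean >= .7
        "meets_maccallum_1999": bool(
            np.all(communalities > 0.6) or communalities.mean() >= 0.7
        ),
    }


# Hypothetical loading matrix: 6 variables, 2 factors.
example = [
    [0.80, 0.10],
    [0.75, 0.05],
    [0.70, 0.20],
    [0.10, 0.85],
    [0.15, 0.80],
    [0.05, 0.70],
]
summary = summarize_loadings(example)
for key, value in summary.items():
    print(key, value)
```

In this made-up example each factor has three loadings above .60, but several communalities fall below .6, so the MacCallum et al. (1999) suggestion is not met.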

### Conclusion

• The general rules of thumb for minimum sample size are neither valid nor useful.
• What I did with the data I have:
  1. Repeat the method Garson (http://www2.chass.ncsu.edu/garson/pa765/factor.htm#kmo) proposed until the overall KMO is over .60.
  2. Check the communality of each variable. Drop the variable that has the smallest communality, and repeat until the communalities of all variables are above .60.
  3. Check the mean value of all communalities to ensure that it is over .70. If not, repeat step 2.
  4. Use the Kaiser criterion (dropping all components with eigenvalues under 1.0) and the scree plot to determine the number of factors.
  5. Set the loading cut-off value to .60, and drop the factors that have fewer than 3 variables.
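
The communality-pruning part of this procedure can be sketched in Python with NumPy. This is an illustration under several assumptions, not the analysis actually run for this article: the data are synthetic, the function names are hypothetical, components are extracted from the correlation matrix and retained by the Kaiser criterion, and only the overall KMO is computed, because the details of Garson's method are not given here.

```python
import numpy as np


def overall_kmo(corr):
    """Overall Kaiser-Meyer-Olkin (KMO) measure of a correlation matrix."""
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)             # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)    # mask for off-diagonal cells
    r2 = np.sum(corr[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def pca_communalities(corr):
    """Communalities from the principal components with eigenvalue > 1."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    keep = eigvals > 1.0                        # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return np.sum(loadings ** 2, axis=1), int(keep.sum())


def prune_by_communality(data, names, cutoff=0.60):
    """Drop the lowest-communality variable until all reach the cutoff."""
    data = np.asarray(data, dtype=float)
    names = list(names)
    while True:
        corr = np.corrcoef(data, rowvar=False)
        comm, n_factors = pca_communalities(corr)
        worst = int(np.argmin(comm))
        # stop when every variable passes, or when only two variables remain
        if comm[worst] >= cutoff or data.shape[1] <= 2:
            return names, comm, n_factors, overall_kmo(corr)
        data = np.delete(data, worst, axis=1)
        del names[worst]


# Synthetic stand-in for a real dataset: 47 cases, 8 variables, 2 factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(47, 2))
weights = rng.uniform(0.5, 1.0, size=(2, 8))
data = factors @ weights + rng.normal(scale=0.5, size=(47, 8))
names = [f"reason_{i + 1}" for i in range(8)]

kept, comm, n_factors, kmo = prune_by_communality(data, names)
print("variables kept:", len(kept))
print("components retained:", n_factors)
print("mean communality:", round(float(comm.mean()), 3))
print("overall KMO:", round(float(kmo), 3))
```

The mean communality printed at the end corresponds to the check in step 3; if it falls short of .70, a stricter `cutoff` could be passed to repeat the pruning.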

Finally, with principal component analysis, I got 4 factors with 32 variables, representing an STV ratio of 1.47:1 (47/32). The overall KMO is .616, the minimum communality is .62, the maximum communality is .879, and the mean communality is .770 with a standard deviation of .074. There is no cross-loading among the 4 factors. Two of the 4 factors each have 5 loaded variables, one has 4 loaded variables, and one has 3 loaded variables. The variable-to-factor ratio is 8 (32/4). I think this can be considered a moderate to high degree of overdetermination.

"As long as communalities are high, the number of expected factors is relatively small, and model error is low (a condition which often goes hand-in-hand with high communalities), researchers and reviewers should not be overly concerned about small sample sizes." (Preacher & MacCallum, 2002, p. 160)

"Strong data" in factor analysis means uniformly high communalities without cross loadings, plus several variables loading strongly on each factor. (Costello and Osborne, 2005, p. 4)

### References

• Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. In J. Neyman (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (pp. 111-150). Berkeley: University of California Press.
• Arrindell, W. A., & van der Ende, J. (1985). An empirical test of the utility of the observations-to-variables ratio in factor and components analysis. Applied Psychological Measurement, 9, 165-178.
• Barrett, P. T., & Kline, P. (1981). The observation to variable ratio in factor analysis. Personality Study and Group Behavior, 1, 23-33.
• Bryant, F. B., & Yarnold, P. R. (1995). Principal components analysis and exploratory and confirmatory factor analysis. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding multivariate statistics (pp. 99-136). Washington, DC: American Psychological Association.
• Cattell, R. B. (1978). The Scientific Use of Factor Analysis. New York: Plenum.
• Comrey, A. L., & Lee, H. B. (1992). A First Course in Factor Analysis. Hillsdale, NJ: Erlbaum.
• Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7). Retrieved July 3, 2008 from http://pareonline.net/pdf/v10n7a.pdf.
• Everitt, B. S. (1975). Multivariate analysis: The need for data, and other problems. British Journal of Psychiatry, 126, 237-240.
• Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
• Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39, 291-314.
• Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
• Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103, 265-275.
• Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.
• Hair, J. F. J., Anderson, R. E., Tatham, R. L., & Black, W. C. (1995). Multivariate data analysis (4th ed.). Saddle River, NJ: Prentice Hall.
• Hatcher, L. (1994). A Step-by-Step Approach to Using the SAS® System for Factor Analysis and Structural Equation Modeling. Cary, NC: SAS Institute, Inc.
• Hogarty, K. Y., Hines, C. V., Kromrey, J. D., Ferron, J. M., & Mumford, K. R. (2005). The quality of factor solutions in exploratory factor analysis: The influence of sample size, communality, and overdetermination. Educational and Psychological Measurement, 65, 202-226.
• Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393-416.
• Hutcheson, G., & Sofroniou, N. (1999). The multivariate social scientist: Introductory statistics using generalized linear models. Thousand Oaks, CA: Sage Publications.
• Kline, P. (1979). Psychometrics and psychology. London: Academic Press.
• Kunce, J. T., Cook, W. D., & Miller, D. E. (1975). Random variables and correlational overkill. Educational and Psychological Measurement, 35, 529-534.
• Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method. London: Butterworth and Co.
• McDonald, R. P., & Krane, W. R. (1977). A note on local identifiability and degrees of freedom in the asymptotic likelihood ratio test. British Journal of Mathematical and Statistical Psychology, 30, 198-203.
• McDonald, R. P., & Krane, W. R. (1979). A Monte Carlo study of local identifiability and degrees of freedom in the asymptotic likelihood ratio test. British Journal of Mathematical and Statistical Psychology, 32, 121-132.
• Marascuilo, L. A., & Levin, J. R. (1983). Multivariate statistics in the social sciences. Monterey, CA: Brooks/Cole.
• MacCallum, R. C., Widaman, K. F., Preacher, K. J., & Hong, S. (2001). Sample size in factor analysis: The role of model error. Multivariate Behavioral Research, 36, 611-637.
• MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4, 84-99.
• Norušis, M. J. (2005). SPSS 13.0 Statistical Procedures Companion. Chicago: SPSS, Inc.
• Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
• Osborne, J. W., & Costello, A. B. (2004). Sample size and subject to item ratio in principal components analysis. Practical Assessment, Research & Evaluation, 9(11). Retrieved July 1, 2008 from http://PAREonline.net/getvn.asp?v=9&n=11.
• Preacher, K. J., & MacCallum, R. C. (2002). Exploratory factor analysis in behavior genetics research: Factor recovery with small sample sizes. Behavior Genetics, 32, 153-161.
• Rindskopf, D. (1984). Structural equation models: Empirical identification, Heywood cases, and related problems. Sociological Methods and Research, 13, 109-119.
• Velicer, W. F., & Fava, J. L. (1998). Effects of variable and subject sampling on factor pattern recovery. Psychological Methods, 3, 231-251.