Structural Validity and Measurement Invariance of the Short Version of the Big Five Inventory (BFI-10) in Selected Countries
To cite this article: Renier Steyn & Takawira Munyaradzi Ndofirepi (2022) Structural validity
and measurement invariance of the short version of the Big Five Inventory (BFI-10) in selected
countries, Cogent Psychology, 9:1, 2095035, DOI: 10.1080/23311908.2022.2095035
© 2022 The Author(s). This open access article is distributed under a Creative Commons
Attribution (CC-BY) 4.0 license.
The five major dimensions of the B5P model are extraversion, agreeableness, openness, conscientiousness, and neuroticism (Costa & McCrae, 1985). First, Costa and McCrae (1985) define extraversion as a personality trait comprising energy, talkativeness, and assertiveness. Second, agreeableness is explained as the degree of friendliness, cooperation, and compassion. Third, openness entails being perceptive and imaginative, in addition to possessing a diverse range of interests. Fourth, conscientiousness refers to the human characteristics of orderliness and thoroughness. Finally, neuroticism refers to an individual’s emotional stability and susceptibility to negative emotions.
Research interest in the B5P has coincided with the proliferation of a variety of measurement
instruments claiming to assess the big five traits. Subjectively, the measures can be classified as
either long or short versions. Best known are Costa and McCrae’s (1992) 240-item Revised NEO Personality Inventory and 60-item NEO Five-Factor Inventory, as well as the 44-item Big Five Inventory. Taylor and De Bruin (2006) developed a South African Big Five measure, the Basic Traits Inventory (BTI), with 193 items. Short versions include Donnellan et al.’s (2006) 20-item International Personality Item Pool–Five Factor Model (IPIP–FFM) and Gerlitz and Schupp’s (2005) 15-item Big Five Personality Inventory (BFI-S). Rammstedt and John’s (2007) 10-item Big Five Personality Inventory (BFI-10) was used in the World Values Survey (sixth wave).
While scholars of personality have agreed that personality traits generally fall within the five categories proposed by the B5P (John, 2021), a methodological concern is whether respondents react similarly to B5P items (Hahn et al., 2012). Within the South African context, Abrahams and Mauer (1999) and McDonald (2011) demonstrated that individuals from different cultural backgrounds may differ in their interpretations of the terminology used to categorise the big five personality traits. Consistent with this, Grobler and De Beer (2015, p. 50) observe that when participants from diverse cultural backgrounds are included in studies, the likelihood of “measurement bias from item interpretation differences is high, and empirical investigation of the items is important.”
This study examined the psychometric properties of the BFI-10, a brief instrument for measuring the B5P factors, using data from the World Values Survey (WVS) for four culturally diverse countries, namely Germany, the Netherlands, Rwanda, and South Africa. The first two countries can be classified as western, educated, industrialised, rich, and democratic (WEIRD), and the second two as non-WEIRD. The goal was to determine whether a five-factor structure similar to that proposed in the B5P is suited to application in both WEIRD and non-WEIRD
countries. Based on data from the sixth wave of the WVS, Ludeke and Larsen (2017) as well as
Simha and Parboteeah (2020) have raised concerns about the structural equivalence of the BFI-10
questionnaire. Ludeke and Larsen (2017) reported that when the questionnaire was used, indicators from the same scale tended to correlate negatively. Additional testing is necessary, as
structural equivalence, which exists when a factor model is applicable across groups, is
a necessary condition for accurate statistical analyses across cultural groups, and this requirement
must be objectively established (Fontaine et al., 2008).
Although Ludeke and Larsen (2017) found significant item-correlation issues with the Big Five measures in the WVS data, their analysis did not consider the patterns of loadings and factor structures that emerge from the data, which would indicate which aspects of the theorised model apply in different countries. We therefore revisited the data for four countries with varying degrees of WEIRD-ness to analyse their factor structures and patterns of factor loadings. This study is expected to contribute to a better understanding of the Big Five model’s applicability in various countries. Considering this, the objectives of the study are as follows.
1.2. Sub-objectives
- To determine whether a five-factor structure similar to that proposed in the B5P is suited for application in both WEIRD and non-WEIRD countries.
- To assess whether the short version of the Big Five Inventory is measurement invariant in both WEIRD and non-WEIRD countries.
2. Literature review
Personality tests, as well as career aptitude and competence assessments, are useful for ascertaining
people’s strengths and weaknesses. The results of such tests can be used as a basis for potentially
life-altering career decisions (Cascio & Aguinis, 2011). However, concerns have been raised about the direct transfer of assessment tools from developed economies to disadvantaged and culturally distinct contexts without considering the implications for instrument validity in the contrasting contexts (Allik et al., 2017; Laajaj et al., 2019; Meiring et al., 2005). Inadequate consideration of the fact that different cultural groups may ascribe different meanings to items in a rating instrument can result in inaccurate judgements about individuals’ personalities (Fontaine et al., 2008). In South Africa, the Employment Equity Act (No. 55 of 1998) offers a measure of protection in stipulating that “psychometric testing and other similar assessments of an employee are prohibited unless the test or assessment being used has been scientifically shown to be valid and reliable”.
The B5P assessment tools are examples of instruments to which measurement invariance testing is applicable, and a number of studies have tested the measurement invariance of the B5P scales (Chiorri et al., 2016; Laverdière et al., 2013; Schmitt et al., 2011). Measurement invariance is relevant to the B5P scales because they employ multiple, conceptually related clusters of items to assess the five personality characteristics. When measurement invariance holds, respondents from diverse groups assign the same meaning to each of the five personality clusters, allowing for the comparability of research results. Measurement invariance is typically considered at several hierarchical levels, described below.
● Conceptual invariance implies that the domain or trait should make sense in all the groups to be compared (Berry et al., 2011). When a measured construct is specific to a particular context, it would thus be impossible to find a comparable operational pattern of relationships with other constructs across the groups (Fontaine et al., 2008). Although conceptual invariance is based mainly on theoretical arguments, and although no statistical tests directly test conceptual equivalence, Berry et al. (2011) state that evidence of configural invariance supports claims regarding conceptual equivalence.
● Configural (configurational) invariance is the fundamental type of invariance and is examined first, before any other types of invariance are considered. Configural invariance (pattern invariance) is confirmed when the number of factors and their loading patterns are consistent across groups (Bialosiewicz et al., 2013). However, under configural invariance the strength of the factor loadings may vary across population groups, and this does not guarantee structural equivalence across respondent groups in a multigroup study; additional tests are required to confirm group comparability.
● Metric invariance (also known as weak invariance) is a type of measurement invariance that is at a higher level than configural invariance. Unlike configural invariance, which requires only a similar number of factors and an identical pattern of factor loadings to confirm measurement invariance, metric invariance requires that the strength/size of the factor loadings be equal across population groups (Davidov et al., 2014). Where there is no metric invariance, any comparison of constructs across groups should be performed with caution, as the constructs themselves are not identical (Marsh et al., 2012).
● Scalar invariance is the most robust and desired level of measurement invariance, according to many authors. Apart from the conditions of configural and metric invariance, the intercepts of the scale items must be equivalent across respondent groups (Melipillán & Hu, 2020), and only then will scalar invariance be indicated. When this level of invariance is achieved, meaningful comparisons of constructs across groups of respondents are possible (Li et al., 2018; Wang et al., 2018).
● The final level of measurement invariance, known as strict invariance, is concerned with the equivalence of residual error between groups (Bialosiewicz et al., 2013). Unlike the previously discussed levels of measurement invariance, strict invariance has two sublevels. The first is the invariance of factor variances, where the error variances of the factors are equal across groups. The second is the invariance of the error terms of an indicator variable, indicating that the unique errors of the indicator variable are equal across groups. Testing for strict invariance therefore in essence determines whether residual error is comparable across administrations (Bialosiewicz et al., 2013).
Confirmatory factor analysis indicators are used to determine an acceptable level of model fit and, ultimately, measurement invariance. Among the most commonly used goodness-of-fit indices are the comparative fit index (CFI), the Tucker–Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardised root mean square residual (SRMR), as well as the chi-square statistic (Bibi et al., 2020; Browne & Cudeck, 1993; Kim, 2017; Kong, 2017; Sun, 2005). For each of these indicators, Hu and Bentler (1999) propose the following cut-off criteria: CFI and TLI values of 0.9 or greater; RMSEA and SRMR values of less than 0.08; and a chi-square statistic that is not statistically significant. The last criterion is seldom met in practice. Lavaan (R; R Core Team, 2020) can be used to test measurement invariance up to the level of strict invariance; both Svetina et al. (2020) and Steyn and De Bruin (2019) used Lavaan (R) to analyse their data and applied the abovementioned guidelines to interpret the findings of their respective studies. Once measurement invariance is proven, group comparisons can be made to ascertain whether different categories of respondents interpret the multiple scale items measuring a particular construct similarly (Rhudy et al., 2020).
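To make this procedure concrete, the sketch below shows how such a sequence of tests could be set up in lavaan. It is a minimal illustration under stated assumptions rather than the authors’ actual syntax: the ten BFI-10 items are assumed to sit in a data frame named wvs under hypothetical column names (E1, E2r, A1r, A2, and so on, with reversed items marked by r), with a country variable identifying the groups.

# Minimal sketch of multi-group invariance testing with lavaan (R Core Team, 2020).
# The data frame 'wvs', its item names and the 'country' variable are hypothetical.
library(lavaan)

bfi10_model <- '
  extraversion      =~ E1 + E2r
  agreeableness     =~ A1r + A2
  conscientiousness =~ C1 + C2r
  neuroticism       =~ N1 + N2r
  openness          =~ O1 + O2r
'

# Configural model: same structure in every country, all parameters free
fit_configural <- cfa(bfi10_model, data = wvs, group = "country")

# Metric, scalar, and strict models add equality constraints step by step
fit_metric <- cfa(bfi10_model, data = wvs, group = "country",
                  group.equal = "loadings")
fit_scalar <- cfa(bfi10_model, data = wvs, group = "country",
                  group.equal = c("loadings", "intercepts"))
fit_strict <- cfa(bfi10_model, data = wvs, group = "country",
                  group.equal = c("loadings", "intercepts", "residuals"))

# Fit indices judged against the Hu and Bentler (1999) cut-offs
fitMeasures(fit_configural, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))

# Nested-model comparison; higher levels are only interpreted if the lower level holds
lavTestLRT(fit_configural, fit_metric, fit_scalar, fit_strict)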
Although short questionnaires are convenient in psychological studies, reservations about them, particularly the brief B5P scales, have been highlighted. For instance, it is claimed that lengthier B5P tests are more reflective of broad personality constructs and have better measuring capacity than shorter ones (Langford, 2003). According to Gosling, Rentfrow, and Swann Jr. (2003), abridged versions of the B5P show poorer psychometric qualities than the regular multi-item measures. Similar instrument validity concerns were reported by Laajaj et al. (2019), whose research found that a 15-item instrument could not accurately assess the target personality traits and had low validity.
Several studies have found Rammstedt and John’s (2007) 10-item BFI-10 scale to be a reliable measure of extraversion, agreeableness, openness, conscientiousness, and neuroticism in individuals (Balgiu, 2018; Guido et al., 2015). Other studies based on the WVS sixth wave (from 2010 to 2014), however, reveal the shortcomings of the BFI-10 scale’s psychometric qualities when measuring the Big Five personality traits in cross-national scenarios (Ludeke & Larsen, 2017; Simha & Parboteeah, 2020). The conclusions regarding the challenges of the BFI-10 instrument were substantiated by findings from Chapman and Elliot’s (2019) study based on General Social Survey data, which revealed odd results that did not replicate those of the original Big Five instruments. Given the foregoing, the B5P model’s universal application becomes problematic. Furthermore, the 10-item measure’s persistent use in cross-cultural research without a consensus on its generalisability seems contentious. As a result, the literature supports the necessity for additional research into the structural validity of the WVS’s 10-item measure.
4. Method
In this section the design, sampling, research instrument used, procedure, statistical analyses and
ethical concerns are discussed.
4.1. Design
This study is based on the analysis of cross-sectional data on the B5P model collected during the
World Values Survey (sixth wave) in Germany, the Netherlands, Rwanda, and South Africa. The
data was quantitative in nature and collected by means of Rammstedt and John’s (2007) 10-item
Big Five Personality Inventory (BFI-10). The purpose of the analysis was either to confirm or refute
the structural validity of the BFI-10 instrument using data from selected WEIRD and non-WEIRD
nations. We acknowledge that the study’s concentration on a small number of nations and a single
dataset limits the universal applicability of its findings.
4.2. Sampling
The WVS website provides a detailed description of how individuals were sampled across countries
(Inglehart et al., 2014), with every effort being made to sample individuals randomly per country.
The data from four countries, namely Germany, the Netherlands, Rwanda, and South Africa, was
used in the study. The selection of countries for analysis was by no means arbitrary. As the focus of
the study was on the B5P construct, and as this measure was only used in the sixth wave, the
countries were selected from those which completed the questionnaires during that period. We
chose only two Western European countries to represent WEIRD since their WVS data was found
by Ludeke and Larsen (2017) to reflect the Big Five Model’s five-factor structure. The inclusion of
Rwanda and South Africa in the non-WEIRD category was motivated by the desire to include
contrasting comparisons. The demographics of Rwandan respondents were substantially skewed
toward non-WEIRD, but those of South African respondents were mixed, allowing for a good
comparison of countries with diverse characteristics.
Items were answered on a five-point scale ranging from 1 = strongly disagree to 5 = strongly agree. The table below summarises the questionnaire items and how they were classified in the survey.
It can be observed from Table 1 that two items each, one positively worded and the other
negatively worded, represent each of the five traits of the B5P-model.
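As a side note on how such item pairs are commonly handled, BFI-10 trait scores are usually obtained by reverse-coding the negatively worded item on the 1–5 scale and averaging it with its positively worded counterpart. The sketch below is a generic illustration with hypothetical column names, not the scoring syntax used in the WVS.

# Generic sketch of BFI-10 trait scoring; the data frame 'wvs' and its column names are hypothetical.
reverse <- function(x) 6 - x  # flips a 1-5 agreement rating (1 <-> 5, 2 <-> 4)

wvs$extraversion      <- rowMeans(cbind(wvs$E_pos, reverse(wvs$E_neg)))
wvs$agreeableness     <- rowMeans(cbind(wvs$A_pos, reverse(wvs$A_neg)))
wvs$conscientiousness <- rowMeans(cbind(wvs$C_pos, reverse(wvs$C_neg)))
wvs$neuroticism       <- rowMeans(cbind(wvs$N_pos, reverse(wvs$N_neg)))
wvs$openness          <- rowMeans(cbind(wvs$O_pos, reverse(wvs$O_neg)))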
Literature on the reliability and validity of the BFI-10 paints a mixed picture. Studies by Rammstedt and John (2007), Carciofo et al. (2016), Rammstedt and Krebs (2007), and Erdle and Rushton (2011) yielded respectable reliability coefficients for each of the five constructs in the five-factor model. Also, studies by Rammstedt and John (2007), Balgiu (2018), and Guido et al. (2015) yielded data which supported a five-factor model as theorised in the B5P model. Notwithstanding these findings, later research by Ludeke and Larsen (2017) disconfirmed the positive reliability findings and the validity of the five-factor hypothesis.
4.4. Procedure
The primary aim of the study was to test the structural validity of the B5P-model across countries,
some of which were WEIRD, and others non-WEIRD. As stated, data from the WVS was used,
specifically the SPSS file available on the WVS website (https://www.worldvaluessurvey.org/wvs.jsp).
As the aim of the study was to assess the fit of the B5P-model with data from four countries,
exploratory factor analyses (EFA) were first performed to visually assess whether the data fitted the B5P-model in each country. First, following Kaiser’s rule regarding eigenvalues, the number of components naturally emerging from the data was established. Then, using the SPSS software (IBM Corp, 2020), the data was “forced” into a five-factor solution, again one country at a time. As a final preparation before entering into multi-group confirmatory factor analyses (MGCFA), confirmatory factor analyses (CFA) were performed per country. Given the results of these analyses, MGCFA were envisaged as a means to test for the levels of measurement invariance.
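The per-country EFA step could be reproduced along the following lines in R instead of SPSS. This is a rough sketch under stated assumptions: the ten items for one country are held in a hypothetical data frame items_country, and a varimax-rotated principal components solution is assumed, since the rotation used is not reported here.

# Sketch of the per-country EFA step using the psych package (not the original SPSS syntax).
# 'items_country' is a hypothetical data frame holding the ten BFI-10 items for one country.
library(psych)

KMO(items_country)                                              # Kaiser-Meyer-Olkin sampling adequacy
cortest.bartlett(cor(items_country), n = nrow(items_country))   # Bartlett's test of sphericity

eigenvalues <- eigen(cor(items_country))$values
n_kaiser <- sum(eigenvalues > 1)                                # Kaiser's rule: eigenvalues greater than one

pca_kaiser <- principal(items_country, nfactors = n_kaiser, rotate = "varimax")
pca_forced <- principal(items_country, nfactors = 5, rotate = "varimax")  # "forced" five-component solution
print(pca_forced$loadings, cutoff = 0.3)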
5. Results
Data for the sixth wave of the WVS was collected in Germany (2013), the Netherlands (2012),
Rwanda (2012), and South Africa (2013). In all, 8724 responses were collected: 2010 from
Germany, 1739 from the Netherlands, 1527 from Rwanda, and 3448 from South Africa.
From Table 3 it can be observed that men are marginally underrepresented, but this applies across all the groups. Noteworthy is the difference in average age between the European and the African samples, namely about 15 years, which is roughly the same as the standard deviation within the groups. This equates to a large effect size and a practically significant difference in age.
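As a rough check, the standardised mean difference (Cohen's d) implied by these figures (an age gap of about 15 years against a within-group standard deviation of roughly the same size) is approximately

d = \frac{\bar{x}_{\text{European}} - \bar{x}_{\text{African}}}{SD} \approx \frac{15}{15} = 1.0,

which is well above the conventional threshold of 0.8 for a large effect.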
The data captured in Table 3 affirms that the European countries can be classified as WEIRD and
the African countries as non-WEIRD. Stated differently, the European countries scored higher on all
the WVS’s proxies of WEIRD than the African countries did.
What can be observed in Table 4 is that the a priori model yielded the same number of factors and patterns of loadings as the theorised model. The loadings also fitted the expected positive–negative alternation, given the reverse wording of every other item. However, for component five (A(R)), the loading weight for that scale item was lower than theoretically expected. The model based on eigenvalues yielded four factors, where the first factor was more complex, but with the remaining three following the B5P conceptualisation.
Table 4. Germany: EFA with components (a priori) and component (by eigenvalues greater
than 1)
G Five components (a priori) Component with eigenvalues >1
1 2 3 4 5 1 2 3 4 -
E(R) −.769 .091 −.042 .041 .157 −.789 −.035 .039 −.017 -
A −.003 −.085 .008 −.09 .878 −.123 .342 .416 .215 -
C(R) −.09 −.014 −.004 .872 .025 −.167 .131 −.794 .100 -
N −.073 −.713 .12 .233 .33 −.100 .801 −.088 .164 -
O(R) .026 .165 −.815 .177 .126 −.032 −.081 −.126 −.709 -
E .752 −.128 .185 −.083 .187 .728 .17 .132 .233 -
A(R) .731 .176 −.011 .097 −.047 .707 −.164 −.125 .031 -
C −.15 −.042 .138 −.603 .444 −.158 .098 .731 .166 -
N(R) −.08 .853 .028 .164 .086 −.153 −.757 −.116 .137 -
O .221 .107 .748 .093 .199 .186 −.038 −.007 .794 -
Note: G = Germany, E = Extraversion, A = Agreeableness, C = Conscientiousness, N = Neuroticism, O = Openness,
R = Reverse coding. Also, in bold in the table are the two highest loadings per component, and “shadowed” are the
a priori loadings, as per the B5P model, where the highest loading was used as a marker item.
Table 5. Netherlands: EFA with components (a priori) and component (by eigenvalues greater
than 1)
N Five components (a priori) Component with eigenvalues >1
1 2 3 4 5 1 2 3 4 5
E(R) .109 .117 −.042 −.774 .296 .109 .117 −.042 −.774 .296
A −.120 −.090 −.022 .008 .785 −.120 −.090 −.022 .008 .785
C(R) .138 −.059 .825 −.134 .138 .138 −.059 .825 −.134 .138
N −.781 .065 .045 .136 .297 −.781 .065 .045 .136 .297
O(R) .050 .846 .028 .017 .074 .050 .846 .028 .017 .074
E −.018 −.049 −.152 .736 .327 −.018 −.049 −.152 .736 .327
A(R) .461 .241 .345 .297 .152 .461 .241 .345 .297 .152
C .090 .080 −.673 −.043 .407 .090 .080 −.673 −.043 .407
N(R) .853 −.050 .062 −.055 .047 .853 −.050 .062 −.055 .047
O .113 −.701 .140 .172 .200 .113 −.701 .14 .172 .200
Note: N = Netherlands, E = Extraversion, A = Agreeableness, C = Conscientiousness, N = Neuroticism, O = Openness,
R = Reverse coding. Also, in bold in the table are the two highest loadings per component, and “shadowed” are the
a priori loadings, as per the B5P model, where the highest loading was used as a marker item.
Table 6. Rwanda: EFA with components (a priori) and component (by eigenvalues greater
than 1)
R Five components (a priori) Component with eigenvalues >1
1 2 3 4 5 1 2 3 4 -
E(R) .261 .819 .132 .054 −.100 .323 .251 .753 .001 -
A −.058 .869 .039 .009 .218 −.036 .038 .883 .129 -
C(R) −.097 .119 .834 .015 .135 −.237 .718 .174 .097
N .187 .211 .034 .272 .808 .047 −.169 .352 .683 -
O(R) .010 −.008 .100 .927 .098 .071 .259 −.132 .825 -
E .462 .202 −.376 .482 .207 .523 −.273 .148 .526 -
A(R) .421 .322 .320 .326 −.534 .528 .593 .136 −.008 -
C .824 .107 .008 .072 −.062 .816 .083 .073 .054 -
N(R) .315 .067 .665 .075 −.277 .272 .732 .008 −.071 -
O .843 .032 .093 .006 .119 .770 .075 .058 .102 -
Note: R = Rwanda, E = Extraversion, A = Agreeableness, C = Conscientiousness, N = Neuroticism, O = Openness,
R = Reverse coding. Also, in bold in the table are the two highest loadings per component, and “shadowed” are the
a priori loadings, as per the B5P model, where the highest loading was used as a marker item.
The data for the Netherlands was analysed next. The KMO measure of sampling adequacy was .526 for the Netherlands sample, and Bartlett’s test of sphericity (BTS) yielded an approximate chi-square of 1510.445 (df = 45), which was statistically significant (p < .001) (N = 1902). When Kaiser’s criterion of retaining factors with eigenvalues greater than one was applied, five factors were retained, accounting for 66.859 per cent of the variance. These findings are summarised in Table 5.

Table 5 shows that, as was the case with the German sample, the a priori model had five factors with patterns of loadings reflecting those theorised in the B5P model. However, for component five (A(R)), the loading weight of the scale item was again lower than expected. This is the same item for which the loading was not satisfactory in the German sample. Worse, in the case of the Netherlands the loading of this item did not have the opposite sign to that of the other marker variable, as the reverse wording would lead one to expect. When the model was based on eigenvalues, five factors were found, which implies that the “forced” and the eigenvalue models were identical, both providing support for the B5P conceptualisation.
Table 7. South Africa: EFA with components (a priori) and component (by eigenvalues greater
than 1)
S Five components (a priori) Component with eigenvalues >1
1 2 3 4 5 1 2 - - -
E(R) .183 .142 .127 .892 .033 .632 .128 - - -
A −.081 .306 .28 .683 .237 .778 −.024 - - -
C(R) .846 −.111 .145 .018 .052 −.062 .824 - - -
N .104 .28 .846 .159 .048 .683 .188 - - -
O(R) .196 .13 .723 .221 .326 .639 .34 - - -
E .096 .261 .25 .133 .825 .581 .364 - - -
A(R) .627 .135 .044 .114 .564 .256 .764 - - -
C −.004 .836 .189 .189 .171 .733 .073 - - -
N(R) .699 .479 .123 .123 .063 .345 .693 - - -
O .164 .626 .295 .271 .245 .715 .253 - - -
Note: S = South Africa, E = Extraversion, A = Agreeableness, C = Conscientiousness, N = Neuroticism, O = Openness,
R = Reverse coding. Also, in bold in the table are the two highest loadings per component, and “shadowed” are the
a priori loadings, as per the B5P model, where the highest loading was used as a marker item.
The data for Rwanda was analysed next. The KMO measure of sampling adequacy was .611 for the Rwanda sample, and the BTS approximate chi-square was 1693.435 (df = 45), which was statistically significant (p < .001) (N = 1527). When Kaiser’s criterion of retaining factors with eigenvalues greater than one was applied, four factors were retained, accounting for 61.395 per cent of the variance. The total variance explained was 82.646 per cent when “forcing” the data into the five-factor solution. Table 6 summarises these findings.

What can be observed is that the number of factors and the patterns of loadings on the components based on eigenvalues greater than one do not match those theorised in the B5P model. The indicator items loaded haphazardly across the factors, such that no definite personality construct in line with the B5P model could be identified. Also, when the data was “forced” into the five-factor solution, the patterns so pronounced in the WEIRD countries were absent.
Lastly, the data for South Africa was analysed. The KMO measure of sampling adequacy for the South African sample was .872, and the BTS approximate chi-square was 1057.903 (df = 45), which was statistically significant (p < .001) (N = 3531). When Kaiser’s criterion of retaining factors with eigenvalues greater than one was applied, two factors were retained, accounting for 55.644 per cent of the variance. When the data was “forced” into the five-factor solution, the explained variance was 76.606 per cent. These findings are shown in Table 7.
Only two components were derived based on the strategy of selecting eigenvalues greater than
one. As shown in Table 7, the patterns of the item loadings do not reflect any particular personality
construct as proposed in the five-factor model. Also, when considering the forced five-factor
solution, none of the patterns typical of B5P, as observed in the samples from Germany and the
Netherlands, were observed.
The results presented in Tables 4 to 7 show compelling evidence that the BFI-10 functions at different levels of effectiveness: in the WEIRD countries it follows the B5P conceptualisation, while this does not occur in the non-WEIRD countries.
The aforementioned conclusion is based on exploratory factor analyses (EFA), and the interpretation involved some subjectivity. To quantify these tentative conclusions, confirmatory factor analysis (CFA) was used to evaluate structural validity in a thorough and comprehensive manner, again per country. Five factors were postulated, with two items loading on each factor, as explained in Table 1. Hu and Bentler’s (1999) cut-off criteria for fit indices, as presented above, were applied. The test results for each of the four countries are presented in Table 8.

According to the cut-off criteria, the findings revealed a poor fit across all countries, including those which had comparatively better patterns of loadings in the EFA. These results were surprising, to say the least, and are considered in the discussion below.
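For completeness, the per-country CFA reported in Table 8 can be approximated by fitting the same two-indicator, five-factor model separately to each country and reading the indices against the Hu and Bentler (1999) cut-offs. The brief sketch below reuses the hypothetical objects from the earlier lavaan example.

# Sketch: single-group CFA of the five-factor, two-item model, fitted per country.
# 'wvs' and 'bfi10_model' are the hypothetical objects defined in the earlier sketch.
library(lavaan)

fits <- lapply(split(wvs, wvs$country), function(d) cfa(bfi10_model, data = d))
sapply(fits, fitMeasures, fit.measures = c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))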
6. Discussion
The overall aim of the study was to establish the structural validity and measurement invariance of the BFI-10 instrument in WEIRD and non-WEIRD countries. The novelty of the study lies in its attempt to compare and contrast the psychometric properties of the BFI-10 instrument in culturally different environments.
The literature review focused on the B5P conceptualisation, as well as the concept of measurement invariance and the need to test for it. It was revealed that although the B5P conceptualisation is well accepted, it is not without critique, particularly in terms of its use as a universal theory of personality. With regard to measurement invariance, the concept, as well as how it should be assessed, was considered.
The data revealed that the countries included in the study were clearly differentiable based on the WEIRD concept, with the two Western European countries being classified as WEIRD, and the two from sub-Saharan Africa as non-WEIRD. Another characteristic which differentiated the countries was mean age, which was substantially higher in the WEIRD countries. It may well be asked whether the acronym WEIRDO, in which O stands for old, might perhaps be applicable. The implications of including older respondents from the WEIRD group in the analyses of personality can only be speculated on. Lang et al. (2011) hypothesised that the mental strain associated with personality studies could preclude elderly respondents from providing valid self-report responses, thereby decreasing the likelihood of deriving a compact five-factor model. They did note, however, that more educated elderly people are likely to cope better with mental strain than less educated elderly people, and thus provide more consistent item responses during surveys. Given that the B5P conceptualisation is based largely on trait theory, the fact that data was extracted from a more mature sample may be irrelevant.
The results of the EFA suggest a partially valid model in the WEIRD contexts and an invalid one in the non-WEIRD contexts (see Tables 4 to 7). The results reveal that, at a configural level of measurement invariance, WEIRD countries met the criteria to a large degree, but that the non-WEIRD data did not support the proposed theoretical structure at all.
Even though the outcome of the EFA for the WEIRD countries suggested a factor structure almost equivalent to the theorised model, it still fell short of the ideal. When a more comprehensive statistic was used to test for configural fit, the model fit indicators (CFI and RMSEA) reflected a poor model fit for all four countries. These results were surprising, as the EFA results were quite satisfactory for Germany and the Netherlands. Given the CFA results, it should be stated that the BFI-10 failed at the most basic form of measurement invariance, that is, configural invariance.
All further tests of measurement invariance were abandoned, as testing for measurement invariance is a sequential process, in which testing for higher levels of invariance is done only once certain milestones have been achieved (Berry et al., 2011). Thus, no tests were performed for metric, scalar, or strict invariance.
Other studies, notably that of Ludeke and Larsen (2017), have shown the measurement limitations of the BFI-10 instrument. However, Ludeke and Larsen (2017) found it to be a reliable and valid tool in Germany and the Netherlands (the WEIRD countries). Previous studies within the WVS domain (Balgiu, 2018; Rammstedt & John, 2007) found satisfactory levels of structural validity and measurement invariance. Several studies outside the WVS domain have found the BFI-10 to be a reliable, valid, convenient and useful tool for gauging self-reported personality traits (Erdle & Rushton, 2011; Guido et al., 2015; Rammstedt & Krebs, 2007).
These results affirm the problematic nature of the BFI-10 tool, and this may be the reason why
its use in the WVS has since been discontinued, and it does not appear in the WVS seventh wave
questionnaire. Notwithstanding its limitations identified in the WVS sixth wave data, the BFI-10
remains a useful instrument for researchers, perhaps particularly so in WEIRD environments. Thus, it is recommended that the psychometric properties of even well-established personality rating tools be examined, particularly if they are used in environments foreign to those in which they were developed. There is also evidence that disparities in schooling affect structural validity even within a country (Lang et al., 2001; Rammstedt et al., 2010). As a result, the findings may be attributed less to the WEIRD vs. non-WEIRD distinction and more to educational differences. Future studies should probe further the possible influence of variations in respondents’ levels of education and other demographic variables on the structural validity of the BFI-10 instrument.
7. Conclusion
The article discusses the importance of assessing the psychometric properties of well-established
personality rating instruments when they are used across diverse cultural groups. Preliminary
results (using EFA) indicate that the BFI-10 instrument has configural structural validity and measurement invariance in WEIRD countries, but not in non-WEIRD countries. Further investigations (using CFA) revealed that the BFI-10 was not structurally valid even in the WEIRD countries. If nothing else, this indicates the importance of using multiple statistical techniques to gain information on a specific question. The results suggest that practitioners and researchers should adopt a cautious approach when applying ostensibly globally accepted tools in contexts for which they were not designed, and that, particularly as research findings concerning the BFI-10 are so contradictory, further research on this instrument be conducted.
Lastly, it seems that the WVS data supported the WEIRD categorisation of countries, even
though the results did not fit the theorised factor structure perfectly. It was noted that
respondents in these countries were also older than those in non-WEIRD countries, opening up the
opportunity for further studies on whether an O (for old) can be added to the acronym.
Fontaine, J. R., Poortinga, Y. H., Delbeke, L., & Schwartz, S. H. (2008). Structural equivalence of the values domain across cultures: Distinguishing sampling fluctuations from meaningful variation. Journal of Cross-Cultural Psychology, 39(4), 345–365. https://doi.org/10.1177/0022022108318112
Gerlitz, J. Y., & Schupp, J. (2005). Zur Erhebung der Big-Five-basierten Persoenlichkeitsmerkmale im SOEP. DIW.
Gosling, S. D., Rentfrow, P. J., & Swann, W. B. (2003). A very brief measure of the Big-Five personality domains. Journal of Research in Personality, 37(6), 504–528. https://doi.org/10.1016/S0092-6566(03)00046-1
Grobler, S., & De Beer, M. (2015). Psychometric evaluation of the basic traits inventory in the multilingual South African environment. Journal of Psychology in Africa, 25(1), 50–55. https://doi.org/10.1080/14330237.2014.997033
Guido, G., Peluso, A. M., Capestro, M., & Miglietta, M. (2015). An Italian version of the 10-item Big Five Inventory: An application to hedonic and utilitarian shopping values. Personality and Individual Differences, 76(1), 135–140. https://doi.org/10.1016/j.paid.2014.11.053
Hahn, E., Gottschling, J., & Spinath, F. M. (2012). Short measurements of personality – Validity and reliability of the GSOEP Big Five Inventory (BFI-S). Journal of Research in Personality, 46(3), 355–359. https://doi.org/10.1016/j.jrp.2012.03.008
Hofstede, G., & McCrae, R. R. (2004). Personality and culture revisited: Linking traits and dimensions of culture. Cross-Cultural Research, 38(1), 52–88. https://doi.org/10.1177/1069397103259443
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Hughes, B. T., Costello, C. K., Pearman, J., Razavi, P., Bedford-Petersen, C., Ludwig, R. M., & Srivastava, S. (2021). The Big Five across socioeconomic status: Measurement invariance, relationships, and age trends. Collabra: Psychology, 7(1). University of Chicago Press. https://psyarxiv.com/wkhfx/download?format=pdf
IBM Corp. (2020). IBM SPSS Statistics for Windows, Version 27.0.
Inglehart, R., Haerpfer, C., Moreno, A., Welzel, C., Kizilova, K., Diez-Medrano, J., Lagos, M., Norris, P., Ponarin, E., & Puranen, B. (Eds.). (2014). World values survey: Round six – country-pooled datafile version. JD Systems Institute. www.worldvaluessurvey.org/WVSDocumentationWV6.jsp
Jak, S., Oort, F. J., & Dolan, C. V. (2014). Measurement bias in multilevel data. Structural Equation Modeling: A Multidisciplinary Journal, 21(1), 31–39. https://doi.org/10.1080/10705511.2014.856694
John, O. P. (2021). History, measurement, and conceptual elaboration of the Big Five trait taxonomy: The paradigm matures. In O. P. John & R. W. Robins (Eds.), Handbook of personality: Theory and research (pp. 35–82). Guilford Press.
Kim, S. (2017). Developing an item pool and testing measurement invariance for measuring public service motivation in Korea. International Review of Public Administration, 22(3), 231–244. https://doi.org/10.1080/12294659.2017.1327113
Kong, F. (2017). The validity of the Wong and Law Emotional Intelligence Scale in a Chinese sample: Tests of measurement invariance and latent mean differences across gender and age. Personality and Individual Differences, 116(1), 29–31. https://doi.org/10.1016/j.paid.2017.04.025
Laajaj, R., Macours, K., Hernandez, D. A. P., Arias, O., Gosling, S. D., Potter, J., Rubio-Codina, M., & Vakis, R. (2019). Challenges to capture the Big Five personality traits in non-WEIRD populations. Science Advances, 5(7), eaaw5226. https://doi.org/10.1126/sciadv.aaw5226
Lang, F. R., Lüdtke, O., & Asendorpf, J. B. (2001). Testgüte und psychometrische Äquivalenz der deutschen Version des Big Five Inventory (BFI) bei jungen, mittelalten und alten Erwachsenen. Diagnostica, 47(3), 111–121. https://doi.org/10.1026//0012-1924.47.3.111
Lang, F. R., John, D., Lüdtke, O., Schupp, J., & Wagner, G. G. (2011). Short assessment of the Big Five: Robust across survey methods except telephone interviewing. Behavior Research Methods, 43(2), 548–567. https://doi.org/10.3758/s13428-011-0066-z
Laverdière, O., Morin, A. J., & St-Hilaire, F. (2013). Factor structure and measurement invariance of a short measure of the Big Five personality traits. Personality and Individual Differences, 55(7), 739–743. https://doi.org/10.1016/j.paid.2013.06.008
Li, M., Wang, M., Shou, Y., Zhong, C., Ren, F., Zhang, X., & Yang, W. (2018). Psychometric properties and measurement invariance of the Brief Symptom Inventory-18 among Chinese insurance employees. Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.00519
Ludeke, S. G., & Larsen, E. G. (2017). Problems with the Big Five assessment in the World Values Survey. Personality and Individual Differences, 112(1), 103–105. https://doi.org/10.1016/j.paid.2017.02.042
Marsh, H. W., Nagengast, B., & Morin, A. J. S. (2012). Measurement invariance of Big-Five factors over the life span: ESEM tests of gender, age, plasticity, maturity, and La Dolce Vita effects. Developmental Psychology, 49(6), 1194–1218. Advance online publication. https://doi.org/10.1037/a0026913
McDonald, E. (2011). Comparing a native English-speaking group’s and non-native English-speaking group’s understanding of the vocabulary used in the 16PF5 [Unpublished master’s thesis]. University of South Africa.
Meiring, D., Van de Vijver, A. J. R., Rothmann, S., & Barrick, M. R. (2005). Construct, item and method bias of cognitive and personality tests in South Africa. SA Journal of Industrial Psychology, 31(1), 1–8. https://doi.org/10.4102/sajip.v31i1.182
Melipillán, E. R., & Hu, M. (2020). Measurement invariance across groups. SAGE.
Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: Replicated factor structure in peer nomination personality ratings. The Journal of Abnormal and Social Psychology, 66(6), 574–686. https://doi.org/10.1037/h0040291
Nye, C. D., Roberts, B. W., Saucier, G., & Zhou, X. (2008). Testing the measurement equivalence of personality adjective items across cultures. Journal of Research in Personality, 42(6), 1524–1536. https://doi.org/10.1016/j.jrp.2008.07.004
Patel, J. S., Oh, Y., Rand, K. L., Wu, W., Cyders, M. A., Kroenke, K., & Stewart, J. C. (2019). Measurement invariance of the Patient Health Questionnaire-9 (PHQ-9) depression screener in US adults across sex, race/ethnicity, and education level: NHANES 2005–2016. Depression and Anxiety, 36(9), 813–823. https://doi.org/10.1002/da.22940
Pletzer, J. L., Bentvelzen, M., Oostrom, J. K., & De Vries, R. E. (2019). A meta-analysis of the relations between personality and workplace deviance: Big Five versus HEXACO. Journal of Vocational Behavior, 112(1), 369–383. https://doi.org/10.1016/j.jvb.2019.04.004
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Rammstedt, B., & Krebs, D. (2007). Does response scale format affect the answering of personality scales? Assessing the Big Five dimensions of personality with different response scales in a dependent sample. European Journal of Psychological Assessment, 23(1), 32–38. https://doi.org/10.1027/1015-5759.23.1.32
Rammstedt, B., & John, O. P. (2007). Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. Journal of Research in Personality, 41(1), 203–212. https://doi.org/10.1016/j.jrp.2006.02.001
Rammstedt, B., Goldberg, L. R., & Borg, I. (2010). The measurement equivalence of Big-Five factor markers for persons with different levels of education. Journal of Research in Personality, 44(1), 53–61. https://doi.org/10.1016/j.jrp.2009.10.005
Rhudy, J. L., Arnau, R. C., Huber, F. A., Lannon, E. W., Kuhn, B. L., Palit, S., Payne, M. F., Sturycz, C. A., Hellman, N., Guereca, Y. M., Toledo, T. A., & Shadlow, J. O. (2020). Examining configural, metric, and scalar invariance of the Pain Catastrophizing Scale in Native American and non-Hispanic White adults in the Oklahoma Study of Native American Pain Risk (OK-SNAP). Journal of Pain Research, 13(1), 961. https://doi.org/10.2147/JPR.S242126
Sass, D. A. (2011). Testing measurement invariance and comparing latent factor means within a confirmatory factor analysis framework. Journal of Psychoeducational Assessment, 29(4), 347–363. https://doi.org/10.1177/0734282911406661
Saucier, G., Thalmayer, A. G., Payne, D. L., Carlson, R., Sanogo, L., Ole-Kotikash, L., Church, A. T., Katigbak, M. S., Somer, O., Szarota, P., Szirmák, Z., & Zhou, X. (2014). A basic bivariate structure of personality attributes evident across nine languages. Journal of Personality, 82(1), 1–14. https://doi.org/10.1111/jopy.12028
Schmitt, N., Golubovich, J., & Leong, F. T. (2011). Impact of measurement invariance on construct correlations, mean differences, and relations with external correlates: An illustrative example using Big Five and RIASEC measures. Assessment, 18(4), 412–427. https://doi.org/10.1177/1073191110373223
Selig, J. P., Card, N. A., & Little, T. D. (2008). Latent variable structural equation modelling in cross-cultural research: Multi-group and multi-level approaches. In F. J. R. Van de Vijver, D. A. Van Hemert, & Y. H. Poortinga (Eds.), Multilevel analysis of individuals and cultures (pp. 93–119). Lawrence Erlbaum.
Simha, A., & Parboteeah, K. P. (2020). The big 5 personality traits and willingness to justify unethical behavior: A cross-national examination. Journal of Business Ethics, 167(3), 451–471. https://doi.org/10.1007/s10551-019-04142-7
Soto, C. J., & John, O. P. (2017). Short and extra-short forms of the Big Five Inventory–2: The BFI-2-S and BFI-2-XS. Journal of Research in Personality, 68, 69–81. https://doi.org/10.1016/j.jrp.2017.02.004
Spurk, D., Abele, A. E., & Volmer, J. (2015). The career satisfaction scale in context: A test for measurement invariance across four occupational groups. Journal of Career Assessment, 23(2), 191–209. https://doi.org/10.1177/1069072714535019
Steyn, R., & De Bruin, G. (2019). The structural validity of the innovative work behaviour questionnaire: Comparing competing factorial models. The Southern African Journal of Entrepreneurship and Small Business Management, 11(1), 1–11. https://doi.org/10.4102/sajesbm.v11i1.291
Sun, J. (2005). Assessing goodness of fit in confirmatory factor analysis. Measurement and Evaluation in Counseling and Development, 37(4), 240–256. https://doi.org/10.1080/07481756.2005.11909764
Sun, J., Kaufman, S. B., & Smillie, L. D. (2018). Unique associations between Big Five personality aspects and multiple dimensions of well-being. Journal of Personality, 86(2), 158–172. https://doi.org/10.1111/jopy.12301
Svetina, D., Rutkowski, L., & Rutkowski, D. (2020). Multiple-group invariance with categorical outcomes using updated guidelines: An illustration using Mplus and the lavaan/semTools packages. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 111–130. https://doi.org/10.1080/10705511.2019.1602776
Taylor, N., & De Bruin, G. P. (2006). Basic traits inventory. Jopie van Rooyen.
Thalmayer, A. G., & Saucier, G. (2014). The questionnaire big six in 26 nations: Developing cross-culturally applicable big six, big five and big two inventories. European Journal of Personality, 28(5), 482–496. https://doi.org/10.1002/per.1969
Thalmayer, A. G., Saucier, G., Ole-Kotikash, L., & Payne, D. (2020). Personality structure in East and West Africa: Lexical studies of personality in Maa and Supyire-Senufo. Journal of Personality and Social Psychology, 119(5), 1132. https://doi.org/10.1037/pspp0000264
Trapmann, S., Hell, B., Hirn, J. O. W., & Schuler, H. (2007). Meta-analysis of the relationship between the Big Five and academic success at university. Zeitschrift für Psychologie/Journal of Psychology, 215(2), 132–151. https://doi.org/10.1027/0044-3409.215.2.132
Tupes, E. C., & Christal, R. E. (1992). Recurrent personality factors based on trait ratings. Journal of Personality, 60(2), 225–251. (Original work published 1961). https://doi.org/10.1111/j.1467-6494.1992.tb00973.x
Van de Vijver, F. J., & Poortinga, Y. H. (2002). Structural equivalence in multilevel research. Journal of Cross-Cultural Psychology, 33(2), 141–156. https://doi.org/10.1177/0022022102033002002
Van de Vijver, F., & Tanzer, N. K. (2004). Bias and equivalence in cross-cultural assessment: An overview. European Review of Applied Psychology, 54(2), 119–135. https://doi.org/10.1016/j.erap.2003.12.004
Vedel, A. (2016). Big Five personality group differences across academic majors: A systematic review. Personality and Individual Differences, 92(1), 1–10. https://doi.org/10.1016/j.paid.2015.12.011
Wang, S., Chen, C. C., Dai, C. L., & Richardson, G. B. (2018). A call for, and beginner’s guide to, measurement invariance testing in evolutionary psychology. Evolutionary Psychological Science, 4(2), 166–178. https://doi.org/10.1007/s40806-017-0125-5
Wu, A. D., Li, Z., & Zumbo, B. D. (2007). Decoding the meaning of factorial invariance and updating the practice of multi-group confirmatory factor analysis: A demonstration with TIMSS data. Practical Assessment, Research, and Evaluation, 12(3), 1–26. https://doi.org/10.7275/mhqa-cd89