Structural Equivalence and Differential Item Functioning

Bias and Equivalence 1 RUNNING HEAD: Social Axioms Survey

Structural Equivalence and Differential Item Functioning in the Social Axioms Survey

Fons J. R. Van de Vijver Tilburg University, Tilburg, the Netherlands and North-West University, Potchefstroom, South Africa Velichko H. Valchev and Irina Suanet Tilburg University, Tilburg, the Netherlands

Abstract

The present chapter focuses on the assessment of bias and equivalence of the Social Axioms Survey in a 41-country data set analyzed at the individual level. Two main issues are examined. The first, structural equivalence, addresses the question to what extent the constructs underlying the Social Axioms Survey are universal across the 41 countries. The second, differential item functioning, deals with the question of whether there are particular items or countries that are problematic. Exploratory factor analyses (testing structural equivalence) and analyses of variance (testing item bias) were carried out. The equivalence of the scales was adequate, but neither the exploratory factor analysis nor the analyses of variance provided indisputable support for the equivalence of any scale. The results led to three main conclusions: (1) social axioms show important similarities across cultures; (2) numerical comparisons of scores obtained in different countries must be treated with caution; (3) the observed bias was due to both item and country characteristics. Several items showed secondary (i.e., deviant) loadings in the global factorial solution. Level of economic development and religion (main religious denomination of a country) were associated with bias. In the discussion of our findings, a balanced treatment is recommended to account for both instrument and country characteristics that cause bias.

KEY WORDS: Social Axioms Survey, Structural Equivalence, Differential Item Functioning

Structural Equivalence and Differential Item Functioning in the Social Axioms Survey

Theoretical and Methodological Framework

Social axioms deal with our implicit theories about how the world works; they are general beliefs about people and the social world. The initiative for a large-scale project to examine social axioms in a large number of cultures was taken by Leung, Bond, and their colleagues (Leung & Bond, 2004; Leung et al., 2002). They suggested that social axioms are universal. On the basis of a literature review, interviews, and content analysis of various sources, they developed the Social Axioms Survey, a scale of 60 items with the following five factors at the individual level (Leung & Bond, 2004):

1. Social Cynicism. This factor relates to a negative view of human nature, a mistrust of social institutions, and a biased view against some groups of people.
2. Reward for Application expresses the belief that effort, knowledge, and careful planning will provide positive results.
3. Social Complexity represents the belief that there are no general rigid rules, but multiple ways of achieving a given outcome.
4. Fate Control refers to the belief that life events are predetermined, but at the same time there are ways for people to influence those events.
5. Religiosity refers to beliefs in the existence of supernatural forces and in the beneficial functions of religious belief (Leung & Bond, 2004).

Leung and Bond (2004) found that these five factors were reasonably well replicated across 40 cultures. They used a meta-analytic procedure to identify factors, which they found to show a stable factor structure across the cultures. Leung and Bond focused on identifying similarity across cultures: Are there social axioms that are shared across cultures? Our approach differs significantly from theirs and is aimed at establishing equivalence. We ask not whether there are commonalities in social axioms across cultures, but which social axioms have the same meaning across cultures and which items are adequate indicators of these social axioms. So, our approach starts from the findings of Leung and Bond’s similarity approach (which has indicated the presence of culturally shared social axioms) and addresses more specific questions about this similarity.

Using a 41-country data set of the Social Axioms Survey, we examined bias and equivalence of the items and factors in individual-level analyses. (The term “country” is used in this chapter for convenience. Problems with the term in the current context should be acknowledged: most samples were recruited from only one cultural group in a country, and not all regions where participants were recruited are independent nation states.) Our main question was to what extent concepts and scores measured by the Social Axioms Survey can be compared across the 41 countries. More specifically, we address three issues:

(1) Structural equivalence: To what extent are the constructs underlying the Social Axioms Survey universal across the 41 countries?
(2) Differential item functioning: Is it possible to identify anomalous items that threaten the equivalence of the instrument?
(3) Interpretation of the bias: Is it possible to identify country-level characteristics that are related to bias? We draw here on work by Georgas, Berry, and Van de Vijver (2004), who found that psychological characteristics of countries are related to their levels of economic development and the main religious denomination. We examine whether the latter two country-level variables are also associated with bias.

Theoretical Framework

The social axioms project starts from the premise that adult persons in all cultures have acquired stable expectations about contingencies and causal agencies operative in the world during their lives. Cultures do not socialize identical beliefs about these contingencies. For example, cultures can be expected to differ in the belief about the role a supreme being has in people’s day-to-day affairs. So, the five types of beliefs are assumed to provide the universal structure of social axioms, but any two countries may have similar or dissimilar positions on each of the five.

Bond et al. (2004a) hypothesized that these five dimensions of social axioms can be combined with values to predict social behavior. The revised Schwartz Value Survey (1992, 1994) was administered together with the Social Axioms Survey (Leung et al., 2002) to a sample of 180 undergraduate students in Hong Kong in order to predict three classes of behavioral tendencies: styles of conflict resolution, ways of coping, and vocational interests. The regression analysis demonstrated that although social cynicism was moderately related to the value of self-enhancement, the empirical overlap was small. Social complexity correlated positively with the value dimension of self-transcendence, and fate control with conservation. Religiosity related positively to the values of conservation and self-transcendence and negatively to self-enhancement. It was found that social axioms enhanced the predictive power of values in explaining the measured social behaviors.

Methodological Framework

The exploration of bias and equivalence is guided by the framework developed by Van de Vijver and Leung (1997). These authors made a distinction between three sources of bias: constructs, method-related aspects (such as confounding sample differences and response styles), and items (item bias or differential item functioning). Construct bias occurs when the construct measured is not identical across groups. Method bias refers to all sources of assessment problems emanating from an instrument or its administration, such as varying concerns about social desirability. Finally, item bias (or differential item functioning) refers to anomalies at item level, such as poor translation of words or differences in connotations in multilingual studies. An item is biased if persons from different groups with the same level of the underlying trait (commonly operationalized as the total score on the scale or instrument measuring that trait) do not have the same expected score on that item.

Because of data constraints, the present chapter focuses on construct and item bias. An analysis of method bias typically requires additional data (such as social desirability questionnaire data administered to the same participants), whereas the assessment of construct and item bias can be based entirely on data obtained to date with the Social Axioms Survey.

The Present Study

We set out to examine the structural equivalence and item bias of the social axioms instrument as applied in 41 countries. If neither construct bias (structural equivalence) nor item bias (differential item functioning) plays a substantial role, our analysis provides important evidence for the universality of social axioms and their measure as developed by Leung and Bond (2004). However, if we find that either or both types of bias play a substantial role, it is important to describe that bias in a detailed manner and to identify its possible sources. The bias research tradition has been much stronger in identifying bias than in identifying its precursors. By a careful analysis of the content of any bias and of country characteristics that might be associated with that bias, we expect to be able to delineate aspects of social axioms that are unique to cultures, that are shared by some cultures, and that are shared by all cultures.

Method

Sample

The data from Leung and Bond (2004) were used for this study. The sample consisted of convenience student samples from 41 countries (with their respective sample sizes): Belgium (284), Brazil (200), Canada (146), China (160), Czech Republic (100), Estonia (124), Finland (100), France (120), Georgia (118), Germany (272), Greece (136), Hong Kong (162), Hungary (258), India (710), Indonesia (178), Iran (84), Israel (150), Italy (138), Japan (180), Korea (222), Latvia (142), Lebanon (110), Malaysia (324), the Netherlands (252), New Zealand (200), Nigeria (94), Norway (104), Pakistan (142), Peru (122), Philippines (172), Portugal (304), Romania (128), Russia (116), Singapore (138), Spain (104), Taiwan (246), Thailand (90), Turkey (198), United Kingdom (80), United States (682), and Venezuela (64). The total sample size was 7654.

Instrument

All students filled out the Social Axioms Survey. The questionnaire of 60 items has five subscales: Social Cynicism (18 items), Reward for Application (14 items), Social Complexity (12 items), Fate Control (8 items), and Religiosity (8 items). Each item consists of a statement and participants indicate their level of (dis)agreement with the statement on a five-point scale. Psychometric properties of the instrument were adequate (Leung & Bond, 2004).

Statistical Analysis

Structural equivalence. The statistical analysis consisted of several procedures addressing structural equivalence and item bias. We first reviewed the global factorial solution of the combined data set and addressed the structural equivalence of the factor loadings across the 41 countries. Structural equivalence addresses the question to what extent the constructs that are supposed to be assessed by the Social Axioms Survey are comparable across all countries. Structural equivalence is usually addressed by means of confirmatory or exploratory factor analyses. Our preference for exploratory factor analysis is based on previous experience with large-scale studies (e.g., Georgas, Berry, Van de Vijver, Kagitcibasi, & Poortinga, 2006) in which we found that the standard procedures for establishing equivalence with structural equation modeling cannot be employed in a straightforward manner in studies involving many countries. A major problem is the interpretation of fit statistics when separate items are analyzed. Fit statistics sensitive to sample size, such as the global chi-square test, can be expected to invariably yield a poor fit in the current data set, even when models are tested that impose no or few equality constraints across cultures.

The examination of structural equivalence comprised two different tests. In the first procedure, a pooled solution is computed, which is a factor analysis of the covariance matrices averaged across all countries (and weighted by the sample size of each country). This pooled solution is our best estimate of the global solution. The factor loadings of this solution are then compared to the loadings in the factor solutions of the separate countries. This comparison determines to what extent each country is similar to the global solution.
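The pooling step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `pooled_covariance`, the toy data, and the use of simple n/N weights are hypothetical; the chapter states only that the country covariance matrices were averaged and weighted by sample size.

```python
import numpy as np

def pooled_covariance(samples):
    """Sample-size-weighted average of per-country covariance matrices.

    `samples` maps a country label to an (n_respondents, n_items) array.
    The n/N weighting is an assumption made for this sketch.
    """
    total_n = sum(data.shape[0] for data in samples.values())
    n_items = next(iter(samples.values())).shape[1]
    pooled = np.zeros((n_items, n_items))
    for data in samples.values():
        # Covariances are computed within each country, so differences in
        # country means do not enter the pooled matrix.
        cov = np.cov(data, rowvar=False)
        pooled += (data.shape[0] / total_n) * cov
    return pooled

# Hypothetical toy data: two "countries", three items each.
rng = np.random.default_rng(0)
toy = {
    "A": rng.normal(size=(100, 3)),
    "B": rng.normal(loc=2.0, size=(300, 3)),
}
pooled = pooled_covariance(toy)
```

The pooled matrix would then be factor analyzed to obtain the global loadings; that step is omitted here because the extraction and rotation settings are not specified in this section.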

We use Tucker’s (1951) phi as a measure of factorial similarity; the statistic does not measure absolute identity across groups, but allows for differences in eigenvalues (reliabilities) of factors obtained in different groups. Tucker’s phi measures similarity of loadings in two groups. Different rules of thumb have been proposed as to the minimum value of phi needed for establishing similarity of factors. Ten Berge (1986) has proposed .85 as a lower bound, whereas Van de Vijver and Leung (1997) proposed .90. Chan, Ho, Leung, Cha, and Yung (1999) developed a bootstrap procedure to determine the lower bound. This approach was not adopted here because of the large number of comparisons involved.

The second procedure amounts to a comparison of the factor solutions of all countries in a pairwise manner. For each factor the matrix with pairwise agreement indices of the factors across the countries is subjected to a weighted multidimensional scaling procedure to examine to what extent the sample of 41 countries has subsets of countries showing high agreement indices within a subset and low agreement indices across subsets.

The two procedures address the same problem, but their strengths are not the same. The first procedure, in which each country is compared to the global average, works better when deviations from the global solution are few and not systematic. The second procedure works better when the total sample of countries consists of clusters (e.g., all individualistic countries show the same factor loadings and all collectivistic countries show the same factor loadings, but the loadings of the two clusters are not identical). Both procedures were employed here because we did not have prior expectations about the size or nature of the (dis)similarities of the factors in the various countries.

Differential item functioning (item bias). In the factor analytic procedures, we made a distinction between two kinds of comparisons: the first compared a country with the pooled solution, whereas the second made pairwise comparisons of all countries. An analogous distinction (but here applied in reverse order) was used in the analysis of item bias: first we compared all 41 countries with each other, and next we compared each country with the combined data of all remaining countries.

We employed a procedure to identify item bias in Likert data that is described by Van de Vijver and Leung (1997). The procedure assumes that the scales scrutinized for bias are unidimensional. For each scale a separate set of item bias analyses was conducted. We split the total sample into three subgroups of roughly equal size (with low, medium, and high total scale scores) in order to have adequate numbers in all groups to allow for stable estimates. A split in score levels was made for each scale separately. The core of the item bias analysis is an analysis of variance in which country and total score level are the independent variables and item score is the dependent variable. Country is an independent variable with 41 levels in the first item bias analysis (in which all countries are compared) and 2 levels in the second analysis (in which a country is compared to the combination of all other countries). A main effect of country points to the presence of uniform bias (Mellenbergh, 1982), indicating that an item with this bias is consistently more attractive (or less attractive) for persons from at least one country in comparison to those from the other countries. A significant country-by-score-level component points to the presence of nonuniform bias; such an effect indicates that country differences in scores depend on score level. An item with this bias is better at discriminating in one country than in another.

The large sample size and the large number of countries in the analysis of variance of the present study complicate the analyses of item bias. The use of conventional significance tests in such large samples is likely to provide an overestimation of the number of biased items. Small and psychologically trivial differences may be flagged as uniform or nonuniform bias. The problem is compounded by the need to merge large groups of persons into three score levels. These problems were circumvented by examining effect sizes rather than significances. An item is considered to be biased if the sum of the effect sizes (partial eta squared) of the uniform and nonuniform bias is .06 or higher (which constitutes a moderate effect; Cohen, 1988).

Analyses of bias and equivalence are usually employed to identify anomalies in measurement instruments that prevent full-fledged, cross-cultural comparisons of scores. However, the procedures can also be used to identify countries that show deviances in response patterns. We are interested in both features; it could well be that threats to equivalence come from specific items and from specific countries. The analyses that are reported below examine both item fit and country fit.

Results

The results are split into four parts: The first discusses the pooled solution; the results of the structural equivalence analyses are presented in the second part; item bias analyses are discussed in the third part; the fourth part integrates the results from the analyses of structural equivalence and item bias.

Factor Analysis of the Combined Data Set: Pooled Solution

In accordance with the existing literature on the factorial structure of social axioms at the individual level, we extracted five factors; they accounted for 24.33% of the variance. The factor loadings of the pooled data set are presented in Table 1. Most of the items had their highest loading on the intended factor.

We focus here on the items that showed minor or major deviations from the expected structure. An item is (somewhat arbitrarily) taken to be deviant if it shows a loading (in absolute value) of at least .30 on a nontarget factor or a loading on any factor that is larger (in absolute value) than its loading on the target factor. One item of the Social Cynicism scale is deviant, viz., item 16: “Humility is dishonesty”. The item had a higher loading on both Social Complexity and Fate Control than on Social Cynicism. These multiple loadings suggest that the item can be interpreted in different ways. The secondary loading of the item may be due to the ambiguous nature of the word “humility” which could refer to a value and a virtue that is a stable characteristic of personality, but also to context-specific negotiation tactics. The positive loading of .27 on Fate Control could indicate that some participants see humility as a tool to get ahead in life (various other items of Fate Control refer to ways individuals can influence their fate), and the negative loading of -.22 on Social Complexity may indicate that humility is viewed by some participants as a successful ingratiation tactic (given its negative sign).

The fourth item of the Reward for Application scale, viz., “Mutual tolerance can lead to satisfactory human relationships”, had a secondary loading of .30 on the Social Complexity factor. The loading is not surprising, as many items of the latter factor refer to the mutual coordination of behavior among society members. In other words, both the coordination and reward aspects of the item are considered relevant by the participants. The same combination seemed to affect the fifth item of the Social Complexity scale, viz., “To deal with things in a flexible way leads to success”, which had a secondary loading of .31 on the Reward for Application factor. The last item of the Social Complexity scale, viz., “One’s appearance does not reflect one’s character”, did not show a strong loading on any factor. The last item of the Fate Control scale, viz., “A person’s talents are inborn”, showed the highest (though still modest) loading of .20 on Social Cynicism. It is noteworthy that the latter two items, which deal exclusively with the role of physiological factors, do not show a clear factor pattern. It could well be that in the groups of students of the present study these physiological features do not belong to the realm of social axioms.

The last factor, Religiosity, showed an interesting pattern. The main finding of our analysis was that all relevant items showed high loadings on that factor; yet, five more items of other factors showed salient loadings: Reward for Application, item 9: “The just will eventually defeat the wicked;” Reward for Application, item 11: “Good deeds will be rewarded, and bad deeds will be punished;” Social Complexity, item 6: “To experience various life styles is a way to enjoy life” (negative loading); Social Complexity, item 11: “There are phenomena in the world that cannot be explained by science;” and Fate Control, item 5: “All things in the universe have been determined.” All these items refer to issues that are associated with theology and relinquishing power to a supreme being.

Leung and Bond (2004) reported the same five factors, but their factor loadings are slightly different from ours. There are four reasons for these small differences. Firstly, we analyzed the full set of 60 items, while Leung and Bond report the loadings of 39 pancultural items, with many problematic items excluded. Secondly, we analyzed data from 41 countries while Leung and Bond analyzed data from 40 countries that have relatively larger sample sizes. Thirdly, we employed a factor analytic procedure while Leung and Bond employed a meta-analytic procedure. Finally, we weighted the data by sample size, while Leung and Bond used unweighted procedures.

It can be concluded that our results largely replicate those of Leung and Bond (2004), and most items showed their strongest loadings on the intended factor. Some items contained words or expressions which were interpreted in more than one way by participants, and the Religiosity factor was not just about religion and a supreme being but also about other religion-related convictions, such as the belief in a just world and the belief that certain things in life cannot be explained by science.

Structural Equivalence Analysis

Comparison of pooled factors with country factors. The first analysis of structural equivalence addressed the similarity of the five factors in the pooled data set to the factor solutions in each of the 41 countries. The results are presented in Table 2. The first striking observation is that with the exception of the U.S.A., there is no country with a value of phi, averaged across the five factors, that is larger than .90. An application of Ten Berge’s lower limit of .85 increases the group to ten nondeveloping countries (Hong Kong, Brazil, Germany, India, the Netherlands, Canada, Hungary, Taiwan, and the U.S.A.). However, most countries do not meet the criterion of .85. A closer inspection showed that the lowest values (i.e., average values lower than .70) were obtained for Thailand (.56), Nigeria (.63), Pakistan (.63), Georgia (.67), Indonesia (.67), and Iran (.68). Although one can only speculate about the reasons for these low values, it is noteworthy that these countries are culturally dissimilar from the mainly Western countries, which show higher agreement statistics. Moreover, very few comparative studies have been conducted in these countries, with the implication that there are no reference data available that could possibly account for the low values.
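For reference, Tucker's phi between two loading vectors x and y is the sum of the products x_i * y_i divided by the square root of (sum of x_i squared) times (sum of y_i squared). The following Python sketch illustrates the coefficient and the two rules of thumb used in the text. The function names and toy loadings are hypothetical, and the sketch assumes that the two solutions have already been brought into a comparable orientation, a step not shown here.

```python
import numpy as np

def tucker_phi(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)))

def classify(phi, lenient=0.85, strict=0.90):
    """Label a phi value against the .85 and .90 rules of thumb."""
    if phi >= strict:
        return "meets .90"
    if phi >= lenient:
        return "meets .85 only"
    return "below .85"

# Hypothetical loadings of four items on one factor.
pooled_loadings = np.array([0.60, 0.50, 0.40, 0.30])

# Proportional loadings yield phi = 1: the coefficient ignores overall
# size differences, which is why it tolerates unequal eigenvalues.
phi_proportional = tucker_phi(pooled_loadings, 0.5 * pooled_loadings)

# A country whose last two items load quite differently.
phi_deviant = tucker_phi(pooled_loadings, [0.60, 0.50, -0.10, 0.05])
label = classify(phi_deviant)
```

Averaging such coefficients over the five factors gives one summary value per country of the kind reported in Table 2.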

A second observation from Table 2 involves the low global average (across all factors and countries) of .77; this low value points to equivalence issues in the data set. Particularly the last two factors (Fate Control and Religiosity) revealed low averages (of .73 and .64, respectively). We cannot rule out the possibility that these low values are inherent to the lower cross-cultural validity of these constructs as compared to the first three factors; yet, the more obvious explanation may be the number of markers for the factors. The last two scales were the shortest (with 8 items each). As a consequence, the eigenvalues of these factors are lower and the factor structure is less well defined, as is often found in the literature.

It can be concluded that overall, the item-level analyses do not unequivocally support the structural equivalence of the factors in the comparison of each country with the pooled factor solution. In particular Fate Control and Religiosity do not meet the requirements of structural equivalence in most countries. Although not further documented here, we tried to eliminate some items of scales that seemed to contribute most to the misfit. The approach of selective item elimination did not lead to acceptable values of phi for all countries.

Thirty-nine-item version. At this point we addressed the question whether our findings would replicate for the subset of 39 pancultural items proposed by Leung and Bond (2004). We constructed a pooled factorial solution extracting the same five factors on the basis of the 39 items and compared the solutions of the 41 countries to the pooled one. Tucker’s phi values, averaged across countries, were: .89 (Social Cynicism), .86 (Reward for Application), .85 (Social Complexity), and .72 (both Fate Control and Religiosity); the global average across factors and countries was .81. Thus, there was some visible improvement in overall structural equivalence, but especially Fate Control and Religiosity (whose item content has hardly changed in the 39-item version) remained problematic. In sum, it seems safe to conclude that the reasons for the observed misfit do not reside in an identifiable subset of items.

Pairwise comparison of country factors. The next analysis computed factor solutions for each of the 41 countries. The analysis produced a country-by-country agreement matrix for each factor. These five matrices yielded average values of .72, .76, .69, .65, and .62, respectively (see Table 3 for the average agreement indices); as could be expected, these values are somewhat lower than the values found for the pooled solution (of .83, .85, .83, .73, and .64, respectively). We were, however, interested in the patterning of the agreement indices across countries. Therefore, we conducted a weighted multidimensional scaling analysis (also known as INDSCAL) on the country-by-country agreement matrices; the five factors were used as the replications. Solutions of 2 up to 6 dimensions were examined. The stress values of these solutions were .41, .32, .27, .23, and .19, respectively. The squared multiple correlations were .20, .25, .27, .29, and .29. It was decided to extract three dimensions. The coordinates of the countries in this solution are given in Table 4. A visual inspection of the coordinates does not suggest a clear interpretation of the dimensions.

We computed correlations of the coordinates with two kinds of country-level variables, namely the Gross Domestic Product (GDP, corrected for differences in purchasing power parity) in 2004 and the main religious denomination of a country (cf. Georgas et al., 2004). In the case of the latter variable, we assigned the countries according to the self-declared denomination of the largest part of the country’s population as reflected in the most recent available census. We distinguished between Roman Catholic countries (Belgium, Brazil, Canada, France, Hungary, Italy, the Netherlands, Peru, the Philippines, Portugal, Spain, and Venezuela), Protestant countries (Finland, Germany, Latvia, New Zealand, Norway, United Kingdom, and the United States), Eastern Orthodox countries (Georgia, Greece, Romania, and Russia), Muslim countries (Indonesia, Iran, Lebanon, Malaysia, Nigeria, Pakistan, and Turkey), Buddhist countries (Hong Kong, Japan, Singapore, Taiwan, and Thailand), and atheist countries (China, Czech Republic, Estonia, and South Korea). India (Hinduism) and Israel (Judaism) were not included in any cluster because their main religion is unique for that country in the current data set, which prevents the disentangling of country- and religion-specific effects. We constructed dummy variables for each religious denomination; so, we used denomination as a dichotomous variable for the computation of correlations. The atheist countries tended to have a higher score on the first dimension of the weighted multidimensional scaling solution, r(41) = .33, p < .05, and Gross Domestic Product was negatively related to the second dimension, r(41) = -.50, p < .05. No other correlation differed significantly from zero. It can be concluded that there is no simple patterning in the pairwise agreement matrices of the factors across the various countries.

Differential Item Functioning

As a first step to assessing item bias, we conducted analyses of variance with country and sum-scale-score per factor as the independent variables and item score as the dependent variable. The bias indicators (combined effect sizes for main effect of country and the interaction between country and sum score) are presented in the first column of Table 5. Out of the 60 items, 49 (82%) were biased with an effect size (partial η²) of at least .06. For the remaining 11 items, the effects were significant as well, but effect sizes were lower (one was at .03, one at .04 and all the rest at .05).
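The bias indicator used here (partial η² for the country main effect plus partial η² for the country-by-score-level interaction, compared against .06) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' procedure: the helper names, the toy data, the tertile split by quantiles, and the model-comparison sums of squares (country adjusted for score level, interaction adjusted for both main effects) are all choices made for this sketch; the chapter does not say which type of sums of squares was used.

```python
import numpy as np

def dummies(codes):
    """Indicator columns for an integer-coded factor, first level dropped."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, 1:]).astype(float)

def sse(design, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), design])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def item_bias_effects(item, country, scale_total):
    """Return partial eta squared for uniform and nonuniform bias."""
    # Three roughly equal-sized score levels: low (0), medium (1), high (2).
    cuts = np.quantile(scale_total, [1 / 3, 2 / 3])
    level = np.digitize(scale_total, cuts)
    C, L = dummies(country), dummies(level)
    inter = np.column_stack([C[:, [i]] * L for i in range(C.shape[1])])
    sse_level = sse(L, item)
    sse_main = sse(np.column_stack([C, L]), item)
    sse_full = sse(np.column_stack([C, L, inter]), item)
    ss_country = sse_level - sse_main   # uniform bias
    ss_inter = sse_main - sse_full      # nonuniform bias
    eta_uniform = ss_country / (ss_country + sse_full)
    eta_nonuniform = ss_inter / (ss_inter + sse_full)
    return eta_uniform, eta_nonuniform

# Hypothetical toy data: three "countries"; country 2 shows uniform bias.
rng = np.random.default_rng(1)
n = 600
country = rng.integers(0, 3, size=n)
trait = rng.normal(size=n)
scale_total = trait + rng.normal(scale=0.3, size=n)
item = trait + 1.5 * (country == 2) + rng.normal(scale=0.5, size=n)

eta_u, eta_nu = item_bias_effects(item, country, scale_total)
biased = (eta_u + eta_nu) >= 0.06
```

In the toy data the constant shift for one country shows up almost entirely in the main-effect term, which corresponds to uniform bias; a shift that grew or shrank with score level would instead raise the interaction term.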

Four items had large country effects (above .14): “Humility is dishonesty” (Social Cynicism), “Individual effort makes little difference in the outcome” and “One’s appearance does not reflect one’s character” (Social Complexity), and “Good luck follows if one survives a disaster” (Fate Control).

To explore the sources of the observed bias, we conducted a series of analyses of variance in which we compared each of the 41 countries with the rest. Conceptually, this analysis is comparable to the exploratory factor analysis of the pooled data, in which we also compared each country with the global average of the countries. We constructed dummy variables for each country and tested the effects of these variables on item score, again splitting up the sample according to sum score level. The outcomes of this series of analyses can be described as a large matrix containing the effect sizes of each of the 41 country-rest pairs on each of the 60 items, adding up to 2,460 entries. It is remarkable that there were only 11 instances of low and two of medium effects (presented in the second column of Table 5) in such a large data matrix. As to the two medium effects, Indian respondents tended to score higher than the rest on “Humility is dishonesty,” irrespective of mean Social Cynicism, and Malaysian responses on “Individual effort makes little difference in the outcome” were similarly shifted upwards irrespective of mean Social Complexity. The scarcity of strong effects indicated that the bias observed in the first analysis was not driven by any single country standing out from the rest.

Different methods could be deployed to explore the country patterns of bias. One would be to identify, in the analysis that included all countries, the most strongly biased countries for each item. Then we could form, per item, groups of countries according to the direction of item bias, and compare these groups in an analysis of variance. A second possibility would be to group the countries based on known country-level variables such as GDP and religion, applying the same grouping to all items. We chose the latter approach, because it offered two important and related advantages: firstly, by dealing with a common grouping for all items, we would be able to identify global patterns of bias that would be more generalizable than any single item-country combination; secondly, groupings based on country-level variables are the most theoretically relevant and would allow us to get a grip on the sources of bias at the country level, which is a central question of our study.

GDP. We formed six groups of countries according to their 2004 GDP level (in US dollars): up to 3,800 (Georgia, India, Indonesia, Nigeria, and Pakistan), 4,700-6,000 (China, Lebanon, Peru, the Philippines, and Venezuela), 7,500-10,400 (Brazil, Iran, Malaysia, Romania, Russia, Thailand, and Turkey), 11,900-25,000 (Czech Republic, Estonia, Greece, Hungary, South Korea, Latvia, New Zealand, Portugal, and Spain), 26,500-32,200 (Belgium, Canada, Finland, France, Germany, Hong Kong, Israel, Italy, Japan, the Netherlands, Singapore, Taiwan, and the United Kingdom), and over 39,000 (Norway and the United States). We conducted analyses of variance in the same way as in the previous analyses, this time using GDP level (6 levels) instead of country as independent variable. The results are displayed in the third column of Table 5. The effect sizes of three items were at least .06: “Individual effort makes little difference in the outcome” (Social Complexity), “Good luck follows if one survives a disaster” (Fate Control), and “There is a supreme being controlling the universe” (Religiosity). Overall, higher GDP levels seemed to go with lower scores on all three items (see Figure 1).
This trend was most obvious for the first item, whereas in the second one, it was mainly the lowest GDP group that stood out, and in the third, the trend could be represented as approaching a U-curve. The countries in the highest GDP group (the United States and Norway) scored higher on this Religiosity item and closer to the intermediate GDP group than the two GDP groups below them. We assessed the association of GDP level with mean scores on these three items by means of rank order correlations between GDP level and mean item score. For all three items, correlations were substantial, with Spearman’s rho values of -.89, -.60, and -.77, respectively, although only the first was significant.

Religion. The same classification into six religions was used as in the structural equivalence analysis; India and Israel were again excluded. We conducted analyses of variance in the same way as in the previous item bias analyses. The outcomes are presented in the last column of Table 5. There were five items with medium effect sizes (see Figure 2 for the specific trends per item). Firstly, Catholic and Protestant countries generally scored lower than the rest on “Females need a better appearance than males”, given the same level of Social Cynicism. Secondly, “Individual effort makes little difference in the outcome” received low scores in Catholic and Protestant countries and high scores in Muslim countries across levels of Social Complexity. Thirdly, Muslim and, to a lesser extent, Buddhist countries scored consistently higher than the rest on “All things in the universe have been determined”, irrespective of mean Fate Control. Fourthly, “There is a supreme being controlling the universe” received high scores in Muslim countries and low scores in atheist and Buddhist countries, with the others falling in between, across all levels of Religiosity. Finally, the statement, “Belief in a religion makes people good citizens”, was endorsed more in Eastern Orthodox countries and less in Protestant and Catholic countries as compared to Buddhist and atheist countries.
Muslim countries cut across the lines of the rest on this item: they had the lowest scores in the low Religiosity level group, and the highest in the high one. In sum, it seemed that, particularly for the above five cases, religion had a consistent, rather strong effect on the functioning of the items, regardless of their underlying dimensions.

39-item version. Finally, we were interested in the question to what extent our findings of item bias also hold for the subset of 39 pancultural items proposed by Leung and Bond (2004). Out of the 49 biased items we had identified in the analysis by country, 29 are retained in the 39-item version. One of them was the item, “Good luck follows if one survives a disaster” (Fate Control), which had an effect size of .18. Out of the six items biased by GDP and/or religion, three are retained in this pancultural set. To explore the reduction of bias in the 39 items more directly, we compared the means of their item bias indices to those of the remaining 21 items. We conducted such comparisons per scale (except for Religiosity, where only one item had been removed in the 39-item version) and for the Social Axioms Survey as a whole; we included in the comparison the item bias estimates by country, GDP, and religion (from Table 5). The mean bias estimates were lower in the 39 items; we tested the significance of these differences with t tests. The significance values are presented in Table 6. Only five differences out of 15 comparisons were significant at .05. These findings suggest that item bias is reduced in the 39 items, but not eliminated. The specific influences of GDP and religion seemed to remain clearly recognizable even after the removal of the 21 items.

In summary, there were indications of widespread bias of the Social Axioms Survey items by country. For the most part, this bias could not be attributed to single countries but rather to groups of countries. Religion and economic development (as expressed by GDP) were found to influence bias. Of the medium-high average item bias effect size of .08 at the level of the Social Axioms Survey as a whole, 26% could be explained by GDP and 29% by religious denomination (cf. bottom row of Table 5).

Synthesis

Various analyses were conducted to identify items and countries with deviant score patterns. The main conclusion of the analyses is that support for the equivalence of the scales across the 41 countries is moderate. Equivalence is not unequivocally supported for any scale. Yet, equivalence is not particularly poor for any scale either. The question was then addressed whether the bias and equivalence analyses yielded similar results. We explored the association between differential item functioning and the structural equivalence analyses by computing the correlation between the item bias effect sizes of country, on the one hand, and the root mean squared differences between the country loadings of each item and the loadings in the pooled solution, averaged across the five factors and across countries, on the other hand. The latter measures constitute aggregate factor-discrepancy terms of the items. As item bias indicators, we used both the combined effect sizes of country (first column of Table 5) and the effect sizes for the main effect of country (not presented here). The correlation was r(60) = .24, ns, for the combined estimates and r(60) = .25, p = .05, for the main effect sizes. This analysis suggested that there is some association between structural inequivalence and item bias across items, but it is rather weak; there did not seem to be any sets of items clearly bringing about inequivalence and differential item functioning in tandem.

A recurrent problem in analyses of differential item functioning is the elusive nature of item bias. It is often not easy to identify the commonalities of biased items (e.g., Holland & Wainer, 1993). We tried to unpackage the meaning of item bias by using Gross Domestic Product and the main religious denomination of a country as moderators of item bias. We found that about one quarter of the item bias could be accounted for by each of these two factors; these factors were fairly independent.

Discussion

Social axioms provide an interesting way to study the social reality of individuals from different countries. The axioms are clearly different from norms, values, and attitudes, and their relevance for understanding social behavior has been demonstrated in within-culture research (Bond et al., 2004a; Chen, Fok, Bond, & Matsumoto, 2006; Leung & Bond, 2004; Liem, Hidayat, & Soemarno, this volume), as well as cross-cultural research (Fu et al., 2004). The current chapter addressed the question to what extent social axioms, as measured by the questionnaire developed by the initiators of the project (Leung and Bond), are comparable across cultures. Leung and Bond (2004) provided support for the structural equivalence of the factors underlying the questionnaire. The five scales (social cynicism, reward for application, social complexity, fate control, and religiosity) were found to be reasonably equivalent across cultures. The present chapter examined this topic in more detail by applying different statistical techniques. We were interested in the question to what extent the 41-country data set can be seen as homogeneous with regard to the psychological meaning of the scales. Two methods were employed to examine the equivalence. The first is exploratory factor analysis. Equivalence is supported if factors found in different countries are similarly constituted. The second is item bias analysis. Equivalence is supported here if persons from different countries with the same total scores on a scale have on average the same score on each item of the scale.
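The first criterion above, that factors found in different countries are similarly constituted, is commonly quantified with a congruence coefficient. The sketch below computes Tucker's phi between two loading vectors. It is offered as an illustration only: whether the agreement values in Table 2 were computed exactly this way is an assumption here, the function name is hypothetical, and the country loadings are made up. In practice, a country solution is usually rotated toward the pooled solution before such a coefficient is computed.

```python
import math


def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must cover the same items")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = math.sqrt(
        sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b)
    )
    return cross / norm


# Pooled Religiosity loadings of four items taken from Table 1, compared
# with hypothetical (made-up) loadings for a single country.
pooled = [0.66, 0.55, 0.72, 0.67]
country = [0.60, 0.40, 0.75, 0.70]
print(round(tucker_phi(pooled, country), 3))
```

The coefficient is insensitive to a uniform rescaling of the loadings, so it captures whether the same items define a factor rather than whether the loadings have the same size.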

The support for the equivalence of the scales was moderate; neither the exploratory factor analysis nor the item bias analysis provided indisputable support for the equivalence of any scale. We found that the equivalence of the scales was not bad, but it was not very good either. A more detailed analysis of the sources of the lack of equivalence showed problems with different items in different countries. It is not clear to what extent sample particulars, language issues, specific cultural issues, or procedures of administering the questionnaire and recording the responses might have played a role in the incomplete equivalence. It is clear, however, that the removal of some countries or some items would not resolve the equivalence issues we observed. Probably the most important problem at the item level was the presence of secondary loadings, which means that these items measure not only their intended constructs, but also tap other social axioms. The best example is the item, “Humility is dishonesty,” which was apparently interpreted in different ways by the participants. Various items showed secondary loadings on the Religiosity scale. Our analysis suggests that religiosity is an important domain of social axioms that might cover a broader array of beliefs than in its initial conceptualization. Besides the belief in a supreme being and the reference to beneficial functions of religious belief in people’s lives, religiosity also seems to involve other theological issues, such as the belief in a just world and beliefs about non-material causes of events in the everyday world.

In our view, three main conclusions can be derived from our study. Firstly, social axioms show important similarities across cultures. Although our equivalence analyses indicated that there were various minor sources of discrepancies across cultures, we did not find any consistent evidence for the presence of different ways of construing the social world in different cultures.
Secondly, numerical comparisons of scores obtained in different countries have to be treated with caution. There is a fair chance that, due to the incomplete equivalence, the validity of these comparisons is challenged and that non-intended factors, such as other social axioms or other cross-cultural differences in norms, values, or attitudes, contaminate scores of all scales. Thirdly, the observed bias may be a consequence of both item and country characteristics; level of economic development as well as religious denomination were predictors of bias at the item level.

It is common in the literature to focus on instrument characteristics in the analysis of bias and equivalence. However, our analyses clearly indicated that a more balanced treatment is needed to account for both instrument and country characteristics that lead to bias. For example, we found that there are certain problematic items that should be removed if the aim of the study is to maximize equivalence. Nevertheless, the removal of some countries may also lead to higher levels of equivalence. The psychometric techniques for identifying bias and determining equivalence lack this balanced perspective. The one-sidedness of bias analysis may not be important if the cross-cultural study involves a small number of countries. However, if the number of countries is large, as in the present study, it is counterproductive not to look for country as a source of bias and to exclude the possibility that a different model would hold in some countries. Leung and Bond (2004) have encouraged researchers in any country or given subset of countries to conduct their own analyses of the axioms matrix in order to identify the relevant factor structures.

The question can be asked whether our finding that so many items are biased invalidates previous publications. It may seem impossible to combine our bias analyses with the findings that many individual and country-level characteristics show meaningful associations with social axioms. In our view, these two findings are compatible. Item bias points to the presence of sources of variation at item level other than the target construct measured by the scale (e.g., an item may measure acquiescence in addition to social cynicism). However, the presence of such bias does not imply that country-level differences in scores are entirely accounted for by acquiescence but rather that these differences are influenced, for example, both by social cynicism and acquiescence. Correlations with country characteristics are affected by item bias, but as long as the item bias is not substantial, country differences in item means may still largely reflect social cynicism.

There is an increasing number of large-scale studies in which structural equivalence and/or item bias have been determined; examples are studies of leadership and management styles (House, Hanges, Javidan, Dorfman, & Gupta, 2003), values (Fontaine, 1999), personality (Barrett, Petrides, Eysenck, & Eysenck, 1998; McCrae et al., 2005), and family roles and functions (Georgas et al., 2006). Based on the experiences obtained in these projects, we see both strengths and weaknesses in statistical procedures for identifying bias and their usage. The main strength of the procedures is their statistical rigor. It goes without saying that support for the absence of bias is required before any cross-cultural score comparison can be made. Another advantage of these techniques is that they identify and quantify problematic features of the data set. An examination of the equivalence makes clear which items or factors show bias.

We also see two weaknesses in these procedures. The first issue is what could be called a power problem. Some statistical techniques seem to be overpowered for their purpose. A straightforward application of significance criteria in the current study would have led to the conclusion that all items were biased. However, an inspection of the effect sizes indicated that many items showed small amounts of bias. The same problem of finding significant, though psychologically trivial, cross-cultural differences in large data sets is also often troubling for multigroup structural equation modeling. The second weakness involves the elusiveness of the sources and nature of bias and inequivalence. It is a recurrent problem in bias and equivalence analysis that the commonality underlying problematic items or factors is difficult to find. We observed the same problem in the present study. Different procedures have been proposed in the literature to deal with this problem: Scheuneman (1987), Kok, Mellenbergh, and Van der Flier (1985), and more recently Schilt-Van Mol and Vallen (2006) tried to manipulate item contents so as to decrease the bias in cognitive items. The present study adopted a different approach and attempted to unpackage bias by examining the role of specific country-level variables, such as level of economic development and religious denomination, as sources of bias. The reduction of bias after the introduction of these country-level variables provides an estimate of their relative contribution to the bias.

What are the implications of our study for the analysis of large-scale cross-cultural projects? The first is that there is a need for examining cross-cultural equivalence. These examinations will give insight into the etic and emic aspects of the constructs measured. Equivalence is not an intrinsic feature of a data set but a characteristic of a cross-cultural comparison. As a consequence, it is important to address the issue in each and every comparison. The second implication is that bias and valid score differences can occur at the same time. If there is some item bias or if some items have higher loadings in some countries, observed country differences in mean scores are confounded by this bias. This does not imply that country-level differences are entirely due to bias but that they are a mixture of valid and artifactual differences.

The present chapter described and applied a bias detection procedure that can be useful in various large-scale cross-cultural studies. There appears to be some reluctance in reports of such studies to address bias, possibly because of the idea that bias analyses belittle the value of a study. This reluctance is based on a one-sided interpretation of bias. In our view, it is unrealistic to expect that studies involving dozens of countries do not show any bias. It is unlikely that in the long lists of items we like to present to our participants, all words and concepts convey the same denotative and connotative meaning across all cultures. The absence of any bias analysis does not mean that there is no bias in a data set, whereas an analysis of the bias gives us clues as to the extent of bias. Therefore, it is a sign of strength and not of weakness to examine the presence of bias in cross-cultural data sets. Moreover, bias analyses can reveal interesting cross-cultural differences. Finding that a concept has a different meaning in a culture can be an interesting starting point for further study.

Bias and substantive analyses are often seen as aiming at incompatible goals, with bias analysis aiming at the removal of “wrong” items and the substantive analyses aiming at exploring “correct” items. We think that both kinds of analyses are highly compatible and that bias and substantive analyses explore or test cross-cultural similarities and differences. Many hypotheses about cross-cultural differences that we test in substantive analyses can be couched in terms of bias. For example, item bias analyses address the question of item score differences holding overall scores on the instrument constant. However, “holding all other variables constant” is also a kind of reasoning that we often employ in testing substantive hypotheses, such as in regression analyses in which we study the direct relation between variables controlling for all other variables. In conclusion, the analysis of bias and substantive analyses should be thought of as much more compatible and complementary than is usually assumed.

The present chapter indicates that there are important cross-cultural commonalities in the structure of social axioms. However, our study also shows that measurement issues are still not completely resolved, and it may well be the case that there are also sources of culture-specific features in social axioms. The finding of Bond and colleagues (2004b) that social axioms are more strongly correlated at the country level than at the individual level also indicates that both the conceptualization and measurement of social axioms need further study to better understand the individual and cross-cultural differences we observe.

References

Barrett, P. T., Petrides, K. V., Eysenck, S. B. G., & Eysenck, H. J. (1998). The Eysenck Personality Questionnaire: An examination of the factorial similarity of P, E, N, and L across 34 countries. Personality and Individual Differences, 25, 805-819.
Bond, M. H., Leung, K., Au, A., Tong, K.-K., & Chemonges-Nielson, Z. (2004a). Combining social axioms with values in predicting social behaviors. European Journal of Personality, 18, 177-191.
Bond, M. H., Leung, K., Au, A., Tong, K.-K., Reimel de Carrasquel, S., Murakami, F., Yamaguchi, S., Bierbrauer, G., Singelis, T. M., Broer, M., Boen, F., Lambert, S. M., Ferreira, M. C., Noels, K. A., Van Bavel, J., Safdar, S., Zhang, J., Chen, L., Solcova, I., Stetovska, I., Niit, T., Niit, K., Hurme, H., Böling, M., Franchi, V., Magradze, G., Javakhishvili, N., Boehnke, K., Klinger, E., Huang, X., Fülop, M., Berkics, M., Panagiotopoulou, P., Sriram, S., Chaudhary, N., Ghosh, A., Vohra, N., Iqbal, D. F., Kurman, J., Comunian, A. L., Son, K. A., Austers, I., Harb, C., Odusanya, J. O. T., Ahmed, Z. A., Ismail, R., Van de Vijver, F. J. R., Ward, C., Mogaji, A., Sam, D. L., Khan, M. J. Z., Cabanillas, W. E., SyCip, L., Neto, F., Cabecinhas, R., Xavier, P., Dinca, M., Lebedeva, N., Viskochil, A., Ponomareva, O., Burgess, S. M., Oceja, L. J., Campo, S., Hwang, K., D'Souza, J. B., Ataca, B., Furnham, A., & Lewis, J. R. (2004b). Culture-level dimensions of social axioms and their correlates across 41 cultures. Journal of Cross-Cultural Psychology, 35, 548-570.
Chan, W., Ho, R. M., Leung, K., Chan, D. K.-S., & Yung, Y.-F. (1999). An alternative method for evaluating congruence coefficients with Procrustes rotation: A bootstrap procedure. Psychological Methods, 4, 378-402.

Chen, S. X., Fok, H. K., Bond, M. H., & Matsumoto, D. (2006). Personality and beliefs about the world revisited: Expanding the nomological network of social axioms. Personality and Individual Differences, 41, 201-211.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Fontaine, J. (1999). Culturele vertekening in Schwartz’ waardeninstrument [Cultural bias in Schwartz’s values instrument]. Unpublished doctoral dissertation, Katholieke Universiteit Leuven, Leuven, Belgium.
Georgas, J., Berry, J. W., Van de Vijver, F. J. R., Kagitcibasi, C., & Poortinga, Y. H. (Eds.). (2006). Families across cultures: A 30-nation psychological study. Cambridge, UK: Cambridge University Press.
Georgas, J., Van de Vijver, F. J. R., & Berry, J. W. (2004). The ecocultural framework, ecosocial indices and psychological variables in cross-cultural research. Journal of Cross-Cultural Psychology, 35, 74-96.
Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
House, R. J., Hanges, P. J., Javidan, M., Dorfman, P., & Gupta, V. (Eds.). (2003). GLOBE, cultures, leadership, and organizations: GLOBE study of 62 societies. Newbury Park, CA: Sage.
Kok, F. G., Mellenbergh, G. J., & Van der Flier, H. (1985). Detecting experimentally induced item bias using the iterative logit method. Journal of Educational Measurement, 22, 295-303.
Leung, K., & Bond, M. H. (2004). Social axioms: A model for social beliefs in multicultural perspective. Advances in Experimental Social Psychology, 36, 119-197.

Leung, K., Bond, M. H., De Carrasquel, S. R., Muñoz, C., Hernández, M., Murakami, F., Yamaguchi, S., Bierbrauer, G., & Singelis, T. M. (2002). Social axioms: The search for universal dimensions of general beliefs about how the world functions. Journal of Cross-Cultural Psychology, 33, 286-302.
McCrae, R. R., Terracciano, A., and 79 Members of the Personality Profiles of Cultures Project (2005). Personality profiles of cultures: Aggregate personality traits. Journal of Personality and Social Psychology, 89, 407-425.
Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105-118.
Scheuneman, J. (1987). An experimental, exploratory study of causes of bias in test items. Journal of Educational Measurement, 24, 97-118.
Schilt-Van Mol, T. M. M. L., & Vallen, T. (2006). Avoiding unintentionally difficult test items for immigrant minority students. Differentiation without DIF. LAUD Papers. Series A: General and Theoretical Papers, 674, 1-17.
Schwartz, S. H. (1992). The universal content and structure of values: Theoretical advances and empirical tests in 20 countries. Advances in Experimental Social Psychology, 25, 1-65.
Schwartz, S. H. (1994). Beyond individualism and collectivism: New cultural dimensions of values. In U. Kim, H. C. Triandis, C. Kagitcibasi, S. C. Choi, & G. Yoon (Eds.), Individualism and collectivism: Theory, method and applications (pp. 85-119). Thousand Oaks, CA: Sage.
Ten Berge, J. M. F. (1986). Rotation to perfect congruence and the cross-validation of component weights across populations. Multivariate Behavioral Research, 21, 41-64.

Tucker, L. R. (1951). A method for synthesis of factor analysis studies (Personnel Research Section Report No. 984). Washington, DC: Department of the Army.
Van de Vijver, F., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Newbury Park, CA: Sage.

Table 1
Loadings of the Five-Factor Pooled Solution

Factor columns: SC = Social cynicism; RA = Reward for application; SX = Social complexity; FC = Fate control; RE = Religiosity.

  SC    RA    SX    FC    RE  Scale and item content

Social cynicism
 .55  -.02   .04   .02   .01  Kind-hearted people usually suffer losses.
 .58   .05   .10  -.10   .05  Powerful people tend to exploit others.
 .56   .06   .06  -.12   .08  Power and status make people arrogant.
 .52   .02   .08   .02   .00  Kind-hearted people are easily bullied.
 .43  -.09  -.08   .05  -.04  It is rare to see a happy ending in real life.
 .42  -.10  -.16   .13  -.01  To care about societal affairs only brings trouble for yourself.
 .26   .16  -.10   .06   .06  Harsh laws can make people obey.
 .26  -.01  -.22   .22  -.07  Significant achievement requires one to show no concern for the means needed for that achievement.
 .41  -.07   .19  -.02   .02  The various social institutions in society are biased towards the rich.
 .41   .06   .11   .05   .00  People deeply in love are usually blind.
 .45   .03  -.04   .13  -.11  Old people are usually stubborn and biased.
 .30   .06   .10   .21  -.11  It is easier to succeed if one knows how to take short-cuts.
 .45   .10  -.09  -.01   .05  People will stop working hard after they secure a comfortable life.
 .28   .09  -.01   .19  -.01  Females need a better appearance than males.
 .27  -.03  -.20   .20  -.13  Old people are a heavy burden on society.
 .19   .03  -.22   .27  -.17  Humility is dishonesty.
 .35   .05  -.26   .16   .00  Young people are impulsive and unreliable.
 .34   .05   .18   .04  -.03  Most people hope to be repaid after they help others.

Reward for application
-.06   .58   .07   .01   .04  One will succeed if he/she really tries.
-.01   .56  -.01  -.03   .07  Hard working people will achieve more in the end.
 .04   .35   .02   .13   .06  Failure is the beginning of success.
 .02   .27   .30  -.05   .07  Mutual tolerance can lead to satisfactory human relationships.
 .12   .43  -.07   .03   .03  One who does not know how to plan his or her future will eventually fail.
-.06   .49   .20  -.02  -.02  Adversity can be overcome by effort.
 .03   .31   .18  -.05   .12  A modest person can make a good impression on people.
-.10   .47   .01   .04   .03  Every problem has a solution.
-.08   .40  -.07   .09   .37  The just will eventually defeat the wicked.
 .14   .38   .11   .00  -.10  Competition brings about progress.
-.04   .39  -.07   .14   .38  Good deeds will be rewarded, and bad deeds will be punished.
 .05   .35   .15  -.03   .07  Caution helps avoid mistakes.
 .05   .44   .10  -.05  -.01  Knowledge is necessary for success.
 .09   .29  -.11   .16   .00  Social justice can be maintained if everyone cares about politics.

Social complexity
 .12   .07   .46  -.02  -.07  Human behavior changes with the social context.
 .03   .20   .44   .02  -.05  One has to deal with matters according to the specific circumstances.
 .09   .04   .53   .01  -.04  People may have opposite behaviors on different occasions.
-.21   .11   .32  -.16  -.04  Individual effort makes little difference in the outcome.*
 .05   .31   .26   .10   .00  To deal with things in a flexible way leads to success.
 .12   .10   .20   .19  -.20  To experience various life styles is a way to enjoy life.
 .09  -.06   .47   .02   .00  One’s behaviors may be contrary to his or her true feelings.
 .04   .27   .29   .10  -.02  To plan for possible mistakes will result in fewer obstacles.
-.19  -.02   .39  -.24  -.03  There is usually only one way to solve a problem.*
-.07   .12   .36   .08   .03  Current losses are not necessarily bad for one’s long-term future.
 .05  -.03   .35   .02   .39  There are phenomena in the world that cannot be explained by science.
 .03   .05   .06  -.10   .00  One’s appearance does not reflect one’s character.

Fate control
 .17  -.14  -.03   .46   .23  Fate determines one’s successes and failures.
 .15  -.08   .08   .54   .07  Individual characteristics, such as appearance and birthday, affect one’s fate.
 .00   .20   .07   .47   .02  There are certain ways to help us improve our luck and avoid unlucky things.
 .02   .02   .10   .56   .06  There are many ways for people to predict what will happen in the future.
 .14  -.04  -.05   .33   .45  All things in the universe have been determined.
 .05   .12  -.07   .47  -.03  Most disasters can be predicted.
 .20   .04   .02   .11   .14  A person’s talents are inborn.
 .08   .21  -.07   .41   .09  Good luck follows if one survives a disaster.

Religiosity
 .03   .00   .11   .12   .66  There is a supreme being controlling the universe.
 .11   .23  -.16   .11   .55  Belief in a religion makes people good citizens.
-.08  -.20   .16   .24   .30  Ghosts or spirits are people’s fantasy.*
 .04   .12  -.01   .04   .72  Belief in a religion helps one understand the meaning of life.
 .02   .11  -.01   .04   .67  Religious faith contributes to good mental health.
 .17   .21  -.12  -.01   .44  Religious people are more likely to maintain moral standards.
-.20  -.01  -.09  -.18   .54  Religion makes people escape from reality.*
-.23  -.06   .05  -.13   .52  Religious beliefs lead to unscientific thinking.*

*Reversed item.
Note. Loadings of items on their target factors are printed in bold. Salient secondary loadings (i.e., absolute loadings above .30 on non-target factors and absolute loadings that are larger than the loading of the item on the target factor) are bold, italicized.

Table 2
Correspondence of the Factors of the Pooled Solution with the Factor Solutions in the 41 Countries

Country          Social       Reward for   Social       Fate         Religiosity  Average
                 cynicism     application  complexity   control
Belgium          .87          .88          .82          .87          .49          .79
Brazil           .92          .89          .92          .79          .80          .86
Canada           .94          .91          .90          .83          .76          .87
China            .76          .89          .89          .65          .42          .72
Czech            .83          .82          .80          .74          .50          .74
Estonia          .84          .81          .81          .78          .49          .75
Finland          .88          .87          .74          .69          .63          .76
France           .91          .87          .68          .77          .60          .77
Georgia          .80          .61          .71          .64          .59          .67
Germany          .89          .91          .94          .74          .81          .86
Greece           .92          .86          .83          .73          .66          .80
Hong Kong        .90          .94          .87          .80          .76          .85
Hungary          .93          .91          .94          .83          .72          .87
India            .91          .91          .90          .77          .82          .86
Indonesia        .61          .79          .79          .50          .67          .67
Iran             .80          .87          .82          .48          .44          .68
Israel           .78          .92          .79          .81          .55          .77
Italy            .85          .92          .88          .76          .70          .82
Japan            .65          .89          .86          .76          .57          .75
Korea            .94          .92          .88          .81          .67          .84
Latvia           .89          .80          .86          .82          .57          .79
Lebanon          .91          .90          .81          .82          .78          .84
Malaysia         .77          .91          .80          .52          .57          .72
Netherlands      .88          .89          .92          .88          .71          .86
New Zealand      .87          .87          .87          .86          .59          .81
Nigeria          .65          .78          .56          .57          .60          .63
Norway           .84          .85          .85          .77          .69          .80
Pakistan         .73          .78          .82          .48          .35          .63
Peru             .67          .77          .65          .62          .77          .70
Philippines      .91          .88          .89          .68          .55          .78
Portugal         .88          .87          .90          .84          .66          .83
Romania          .82          .90          .88          .73          .86          .84
Russia           .77          .70          .72          .73          .64          .71
Singapore        .90          .89          .88          .77          .76          .84
Spain            .83          .82          .83          .48          .61          .71
Taiwan           .87          .92          .94          .90          .81          .89
Thailand         .46          .62          .76          .47          .51          .56
Turkey           .93          .90          .86          .80          .31          .76
United Kingdom   .85          .82          .75          .57          .57          .71
United States    .98          .96          .95          .92          .87          .94
Venezuela        .51          .78          .80          .76          .68          .71
Average          .83          .85          .83          .73          .64          .77

Table 3
Average Agreement Index of a Country across All Pairwise Country Comparisons per Factor

Country         Social       Reward for   Social       Fate         Religiosity  Average
                cynicism     application  complexity   control
Belgium         .79          .77          .62          .67          .62          .69
Brazil          .78          .79          .77          .64          .61          .72
Canada          .79          .78          .77          .68          .70          .75
China           .70          .77          .70          .60          .53          .66
Czech           .68          .73          .70          .61          .51          .65
Estonia         .71          .72          .72          .64          .62          .68
Finland         .76          .76          .61          .64          .58          .67
France          .78          .74          .60          .59          .64          .67
Georgia         .68          .63          .63          .57          .58          .62
Germany         .77          .79          .72          .71          .71          .74
Greece          .78          .83          .67          .69          .73          .74
Hong Kong       .74          .82          .70          .68          .60          .71
Hungary         .79          .78          .76          .66          .63          .72
India           .77          .79          .77          .62          .68          .73
Indonesia       .62          .73          .63          .73          .53          .65
Iran            .72          .76          .69          .52          .50          .64
Israel          .71          .80          .67          .73          .62          .71
Italy           .70          .80          .81          .64          .62          .71
Japan           .60          .82          .71          .67          .66          .69
Korea           .78          .77          .71          .69          .57          .70
Latvia          .75          .68          .71          .71          .52          .67
Lebanon         .79          .83          .64          .69          .68          .73
Malaysia        .65          .74          .69          .53          .64          .65
Netherlands     .76          .79          .75          .68          .68          .73
New Zealand     .71          .74          .70          .73          .65          .71
Nigeria         .59          .67          .48          .68          .67          .62
Norway          .72          .82          .71          .65          .62          .70
Pakistan        .62          .72          .65          .68          .51          .64
Peru            .63          .78          .61          .54          .69          .65
Philippines     .74          .79          .67          .71          .68          .72
Portugal        .81          .80          .69          .71          .66          .73
Romania         .72          .76          .76          .65          .63          .71
Russia          .65          .71          .58          .63          .59          .63
Singapore       .77          .77          .71          .68          .65          .71
Spain           .71          .70          .67          .55          .56          .64
Taiwan          .74          .77          .78          .73          .65          .73
Thailand        .44          .70          .64          .45          .50          .55
Turkey          .79          .83          .73          .63          .55          .71
United Kingdom  .70          .78          .60          .59          .62          .66
United States   .82          .80          .77          .76          .70          .77
Venezuela       .57          .70          .66          .70          .54          .63
Average         .72          .76          .69          .65          .62          .69

Table 4
Country Coordinates in the Three-Dimensional Weighted Multidimensional Scaling Solution

Country         Dimension I   Dimension II  Dimension III
Belgium          1.09         -0.78         -0.60
Brazil          -1.04         -0.55          0.87
Canada          -0.92         -0.75          0.45
China            1.62          0.05         -0.66
Czech            1.77          0.18          0.71
Estonia          1.51         -0.09         -0.61
Finland          0.45         -1.57         -0.65
France           1.06          1.30         -0.48
Georgia          0.71          1.75          0.64
Germany          0.77          0.62         -0.69
Greece          -0.19         -1.28         -0.50
Hong Kong       -1.13         -0.28          0.96
Hungary          0.84         -0.72         -0.71
India           -1.23          0.55         -0.23
Indonesia       -0.44          1.68          0.93
Iran            -1.58          0.98         -0.47
Israel           0.16         -1.24         -0.86
Italy            0.80          0.51          1.21
Japan           -0.21         -0.83          1.76
Korea           -0.92          0.90          0.67
Latvia           1.53          0.44          0.64
Lebanon         -0.73         -1.18         -0.53
Malaysia        -1.44          0.58         -1.01
Netherlands      0.86         -0.83         -0.68
New Zealand     -0.55         -1.08          0.79
Nigeria         -0.26          1.69         -1.36
Norway          -0.68         -1.29          0.76
Pakistan         0.26          1.67          1.17
Peru            -1.72          0.26         -0.99
Philippines     -0.34          1.07         -0.87
Portugal         0.51         -0.99         -0.63
Romania         -1.36          0.29          0.20
Russia           1.01         -1.58          0.74
Singapore       -1.09         -0.67          0.90
Spain            1.44         -0.42          1.13
Taiwan          -0.44         -0.94          0.78
Thailand        -0.24         -0.05         -3.66
Turkey           1.48          0.37          0.57
United Kingdom   0.11          1.70         -0.65
United States   -0.52         -0.67         -0.44
Venezuela       -0.93          1.20          1.40

Table 5
Overall Bias (Expressed as the Sum of Uniform and Nonuniform Bias Effect Sizes) for the 60 Items of the Social Axioms Survey by Country (Comparing All), Country-Rest Pair, GDP Level, and Religion

Columns: Items per Scale; Partial η2 for Country, Country-Rest, GDP Level, and Religion

Social Cynicism
Kind-hearted people usually suffer losses.
Powerful people tend to exploit others.
Power and status make people arrogant.
Kind-hearted people are easily bullied.
It is rare to see a happy ending in real life.
To care about societal affairs only brings trouble for yourself.
Harsh laws can make people obey.
Significant achievement requires one to show no concern for the means needed for that achievement.
The various social institutions in society are biased towards the rich.
People deeply in love are usually blind.
Old people are usually stubborn and biased.
It is easier to succeed if one knows how to take short-cuts.
People will stop working hard after they secure a comfortable life.
Females need a better appearance than males.
Old people are a heavy burden on society.
Humility is dishonesty.
Young people are impulsive and unreliable.
Most people hope to be repaid after they help others.
Mean for Scale

.02 .01 .01 .03 .03 .02

.03 .01 .01 .03 .01 .01

.07 .08

.01 .02

.00 .03

.06

.01

.02

.09 .05 .12

.02 .01 .04

.02 .02 .03

.07

.02

.01

.11 .08 .16 .06 .07

.02 .02 .04 .02 .01

.06 .02 .03 .02 .03

.02

.02

.03 .07 .09 .11

.01 .02 .02 .04

.01 .02 .03 .01

.10

.03

.05

.06 .08

.01 .03

.00 .02

.06 .08 .08 .09

.02 .03 .01 .04

.01 .01 .02 .03

.10 .06 .07 .10 .09 .07

.08

Malaysia .03

Hungary .02

Pakistan .03 India .06

.00

Reward for Application
One will succeed if he/she really tries.
Hard working people will achieve more in the end.
Failure is the beginning of success.
Mutual tolerance can lead to satisfactory human relationships.
One who does not know how to plan his or her future will eventually fail.
Adversity can be overcome by effort.
A modest person can make a good impression on people.
Every problem has a solution.
The just will eventually defeat the wicked.
Competition brings about progress.
Good deeds will be rewarded, and bad deeds will be punished.
Caution helps avoid mistakes.
Knowledge is necessary for success.
Social justice can be maintained if everyone cares about politics.
Mean for Scale

.07 .06 .09 .08

.00

.01 .01 .02

.02 .01 .02

.02

.02

Social Complexity
Human behavior changes with the social context.
One has to deal with matters according to the specific circumstances.
People may have opposite behaviors on different occasions.
Individual effort makes little difference in the outcome.*

.05 .05

.01 .01

.01 .00

.06

.01

.02

.07

.07

To deal with things in a flexible way leads to success.
To experience various life styles is a way to enjoy life.
One’s behaviors may be contrary to his or her true feelings.
To plan for possible mistakes will result in fewer obstacles.
There is usually only one way to solve a problem.*
Current losses are not necessarily bad for one’s long-term future.
There are phenomena in the world that cannot be explained by science.
One’s appearance does not reflect one’s character.

.09

.02

.03

.09

.01

.04

.07

.02

.02

.09

.04

.04

.04 .01

.03 .02

.01

.01

.04

.04

Mean for Scale

.09

.02

.03

.18

.08 .10

India .03, Malaysia .06

Portugal .02

.05 .15

Korea .02, New Zealand .03 .00

Fate Control
Fate determines one’s successes and failures.
Individual characteristics, such as appearance and birthday, affect one’s fate.
There are certain ways to help us improve our luck and avoid unlucky things.
There are many ways for people to predict what will happen in the future.
All things in the universe have been determined.

.05 .07

.01 .04

.01 .04

.12

.01

.04

.05

.01

.01

.05

.08

Most disasters can be predicted.
A person’s talents are inborn.
Good luck follows if one survives a disaster.
Mean for Scale

.05 .09 .18 .09

.01 .02 .07 .03

.02 .03 .05 .03

.06 .05 .01 .01

.06 .06 .02 .02

.07 .06

.02 .01

.02 .01

.05 .05 .07

.01 .01 .02

.01 .02 .03

.13

Indonesia .02, Malaysia .02

India .05 .00

Religiosity
There is a supreme being controlling the universe.
Belief in a religion makes people good citizens.
Ghosts or spirits are people’s fantasy.*
Belief in a religion helps one understand the meaning of life.
Religious faith contributes to good mental health.
Religious people are more likely to maintain moral standards.
Religion makes people escape from reality.*
Religious beliefs lead to unscientific thinking.*
Mean for Scale

.12 .09 .08 .04

India .03

.00

Mean for Social Axioms Survey .08 .00 .02 .02

*Reversed item.
Note. The Country-Rest column presents only the countries that had an effect of at least .02 or .06 for particular items in the country-rest analysis. Small effect sizes (between .02 and .06) are underlined; medium effect sizes (between .06 and .14) are given in bold; large effect sizes (over .14) are given in bold, italicized, and double-underlined.
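The effect-size bands in the note to Table 5 can be expressed as a small classification function. The following Python sketch is illustrative only: the function name, the label "negligible" for values below .02, and the treatment of the boundaries (.02 and .06 are assigned to the higher band, .14 is still medium because large is defined as over .14) are assumptions, and the example values are not tied to specific items.

```python
def classify_effect_size(partial_eta_squared):
    """Map a partial eta-squared value to the bands named in the note.

    Boundary handling and the 'negligible' label are assumptions; the
    note only gives the ranges .02-.06, .06-.14, and over .14.
    """
    if partial_eta_squared > 0.14:
        return "large"
    if partial_eta_squared >= 0.06:
        return "medium"
    if partial_eta_squared >= 0.02:
        return "small"
    return "negligible"


# Illustrative values of the kind reported in the table.
for value in (0.18, 0.08, 0.03, 0.00):
    print(f"{value:.2f} -> {classify_effect_size(value)}")
```

Applied to a full column of the table, such a function would reproduce the underlining and bold marking that was lost in the plain-text version.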

Table 6
Significance Levels of the Differences between the Mean Item Bias Estimates of the 39 Pancultural Items of Leung and Bond (2004) and the Remaining 21 Items, Estimated per Scale and for the Social Axioms Survey

                                   p-value
Scale                    Country   GDP    Religion
Social Cynicism          .04       ns     .09
Reward for Application   .08       .00    ns
Social Complexity        .08       ns     .03
Fate Control             ns        ns     ns
Social Axioms Survey     .00       .03    .03

Note. The Country column is based on the analysis comparing all 41 countries. There are no values for the Religiosity scale, because only one item had been removed from it in the 39-item version.

Figure Caption
Figure 1 (a to c). Item scores of three items with medium effect sizes of bias by GDP
Figure 2 (a to e). Item scores of five items with medium effect sizes of bias by religion

[Figure 1a. "Individual effort makes little difference in the outcome."* Item score plotted against Social Complexity scale score level (Low, Medium, High), by GDP level (1 to 6). *Reversed item.]

[Figure 1b. "Good luck follows if one survives a disaster." Item score plotted against Fate Control scale score level (Low, Medium, High), by GDP level (1 to 6).]

[Figure 1c. "There is a supreme being controlling the universe." Item score plotted against Religiosity scale score level (Low, Medium, High), by GDP level (1 to 6).]

[Figure 2a. "Females need a better appearance than males." Item score plotted against Social Cynicism scale score level (Low, Medium, High), by religion (Catholic, Protestant, Eastern Orthodox, Islam, Buddhism, Atheist).]

[Figure 2b. "Individual effort makes little difference in the outcome."* Item score plotted against Social Complexity scale score level (Low, Medium, High), by religion (Catholic, Protestant, Eastern Orthodox, Islam, Buddhism, Atheist). *Reversed item.]

[Figure 2c. "All things in the universe have been determined." Item score plotted against Fate Control scale score level (Low, Medium, High), by religion (Catholic, Protestant, Eastern Orthodox, Islam, Buddhism, Atheist).]

[Figure 2d. "There is a supreme being controlling the universe." Item score plotted against Religiosity scale score level (Low, Medium, High), by religion (Catholic, Protestant, Eastern Orthodox, Islam, Buddhism, Atheist).]

[Figure 2e. "Belief in a religion makes people good citizens." Item score plotted against Religiosity scale score level (Low, Medium, High), by religion (Catholic, Protestant, Eastern Orthodox, Islam, Buddhism, Atheist).]