Inconsistent Probability Estimates of a Hypothesis - CiteSeerX

3 downloads 0 Views 105KB Size Report
people were presented with a list of mutually exclusive nonexhaustive causes of a person's death. ... hypothesis presented in lists of nonexhaustive and mu-.
Inconsistent Probability Estimates of a Hypothesis The Role of Contrasting Support Nicolao Bonini1 and Michel Gonzalez2 1

Department of Cognitive Sciences and Education, Center for Decision Research, University of Trento, Rovereto, Italy, 2 Laboratory of Cognitive Psychology, CNRS & University of Provence, Marseille, France

Abstract. This paper studies consistency in the judged probability of a target hypothesis in lists of mutually exclusive nonexhaustive hypotheses. Specifically, it controls the role played by the support of displayed competing hypotheses and the relatedness between the target hypothesis and its alternatives. Three experiments are reported. In all experiments, groups of people were presented with a list of mutually exclusive nonexhaustive causes of a person’s death. In the first two experiments, they were asked to judge the probability of each cause as that of the person’s decease. In the third experiment, people were asked for a frequency estimation task. Target causes were presented in all lists. Several other alternative causes to the target ones differed across the lists. Findings show that the judged probability/frequency of a target cause changes as a function of the support of the displayed competing causes. Specifically, it is higher when its competing displayed causes have low rather than high support. Findings are consistent with the contrastive support hypothesis within the support theory. Keywords: probability judgment, extensionality, consistency, salience, availability, support theory

In various situations, when several alternative hypotheses can explain a given evidential fact (e.g., a car failing to start, the decease of a person), a probability evaluation of each of them is made. In most situations, a person who considers an evidential fact retrieves causal hypotheses from memory. In most studies of probability judgments, however, a number of specific hypotheses are selected from the set of possible hypotheses and displayed for evaluation (Fischhoff, Slovic, & Lichtenstein, 1978; Harries & Harvey, 2000; Russo & Kolzow, 1994; Tversky & Koehler, 1994; Van Schie & Van der Pligt, 1990). This paper addresses the study of the consistency of subjective probabilities of a target hypothesis that is displayed across different lists of alternative hypotheses. Previous studies have shown that the subjective probability of a target hypothesis depends on the set of displayed hypotheses (Bonini, Van Schie, Van der Pligt, & Legrenzi, 1994; Fischhoff et al., 1978; Van der Pligt, Eiser, & Spears, 1987; Van Schie & Van der Pligt, 1990). For example, Fischhoff et al. asked car mechanics as well as lay people to predict the relative frequencies of a number of possible causes for a car failing to start. The same cause (e.g., “battery”) was displayed in two lists that covered a selection of three ” 2005 Hogrefe & Huber Publishers

or six specific causes of the failure, and a residual catch-all category (“all other causes”). The frequency of the target cause was judged higher when fewer causes were displayed. The phenomenon of inconsistent probability judgments for a target hypothesis casts doubts on the reliability of subjective probability. Indeed, for the same evidential fact, the probability of a hypothesis should not depend on the set of displayed hypotheses. There is no reason to believe that the battery cause is a more or less probable cause of the car failing to start as a function of which specific alternatives are selected. The aim of the paper is twofold. First, we document inconsistent probability judgments of a target hypothesis presented in lists of nonexhaustive and mutually exclusive hypotheses. Second, we provide an interpretation for such a phenomenon. How could the results in literature, which have shown inconsistent probability estimates of a target hypothesis as a function of the set of competing hypotheses, be interpreted? Below, we outline a few explanations by grouping them into three main categories. The first two relate to several interpretations that are given in the literature (mainly in the fault tree literature), and that can be used to account for the Experimental Psychology 2005; Vol. 52(1):55Ð66 DOI: 10.1027/1618-3169.52.1.55

56

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

inconsistency phenomenon: the first refers to a specific experimental constraint, the second to the representation of the target hypothesis. The third category relates to the judged support of the target and competing hypotheses.

The Side Effect of the Experimental Demands According to this interpretation inconsistency is a consequence of a constraint imposed by the experimenter. In fault trees and related studies, people are asked to make sure that their probability or frequency estimates sum up to 100 %. A fault tree is a list that hierarchically organizes several categories of causes related to an evidential fact such as a car failing to start, the decease of a person or a firm bankruptcy. The systematic analysis of the problem by a fault tree is believed to enable better and more reliable judgment. The categories of causes of the major event are usually named branches of the tree (see Russo & Kolzow, 1994 for a review on fault tree studies). The sumup-to 100 % probability constraint is justified by the additivity requirement because people are presented with a list of exhaustive and mutually exclusive causes. For example, in the seminal work of Fischhoff et al., both in the full and pruned trees, participants were presented with an exhaustive list of possible causes for a car failing to start. Specifically, in the full fault tree six specific causes were displayed (“battery,” “fuel system,” “engine,” “starting system,” “ignition system,” “mischief ”), plus the catch-all category “all other causes.” In the pruned fault tree, only three specific causes from the previous tree were presented, plus the catch-all category. Two main findings were reported. First, participants underestimated the probability of the catch-all category in the pruned tree relative to the aggregate probability for that category in the full tree (“pruning bias”). From the additivity of probability, it follows that the probability judgment of the catch-all category in the pruned tree should be equal to the sum of the probabilities of the omitted causes plus the probability of the catch-all category in the full tree. However, Fischhoff et al. found that the probability of the catch-all category in the pruned tree was lower than normatively expected (0.22 vs. 0.45). Second, and most relevant for our paper, they found inconsistent probability judgments for specific causes displayed in both trees: the sum of the probability estimates of the three target causes increased from the full (0.55) to the pruned tree (0.78). Experimental Psychology 2005; Vol. 52(1):55Ð66

Although fault tree and subsequent related studies were focused on the “pruning bias,” the underestimation of the catch-all category in the pruned tree will necessarily determine an equivalent overestimation of the target hypotheses in the same tree because of the sum-to-100 % probability requirement. Thus, the inconsistent probability judgments of a hypothesis found in fault tree and related studies could be considered as a side effect of the pruning bias and the sumto-100 % probability constraint. It should be noted that people exhibit nonadditive probability distributions, especially for finer partitions (Teigen, 1983; Tversky & Koehler, 1994; Wright & Whaley, 1983). It is thus questionable whether the inconsistency phenomenon first reported by Fischhoff et al. could be found when people are not asked to make their probability estimates sum to 100 %. To the best of our knowledge this control has not been done. Another interpretation has been advanced to explain both the pruning bias and the inconsistent judgments of a hypothesis in fault tree and related studies (Van der Pligt et al., 1987; Van der Pligt, Van Schie, & Hoevenagel, 1998). This interpretation is based on the anchoring and adjustment heuristic (Kahneman & Tversky, 1973), and again requires the 100 % probability constraint to be applied. To illustrate, let us consider the following: When assessing a cause on a list of exhaustive and mutually exclusive causes, people start with an anchor-value that is later adjusted. The anchor-value is approximately equal to 100 % divided by the total number of displayed causes. As a consequence, the fewer the causes displayed in the tree the higher the anchor-value. Because of an insufficient adjustment of the anchor-value, the use of this heuristic would explain why the probability estimates of a hypothesis (including the catch-all category) increase from the full to the pruned tree.

Hypothesis Redefinition This explanation does not rely on the sum-to-100 % probability constraint. In this interpretation, the inconsistency phenomenon is due to a change in the representation of the target hypothesis across lists or trees. Hirt and Castellan (1988) argued that omissions in a fault tree could induce a subject to redefine the content of a target hypothesis. For example, the cause “battery” could be redefined as a more extensive category in the pruned than in the full tree by covering instances of the omitted cause “starting system.” This would explain the inconsistency in the probability estimates of the target hypothesis across the trees as well as the pruning bias. Russo and Kolzow (1994), in a ” 2005 Hogrefe & Huber Publishers

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

discussion of sources of bias in fault trees, refer to ambiguity due to the presence of vertical incongruity between displayed hypotheses. The ambiguity may trigger a category redefinition of the target hypothesis, which, in turn, may give rise to inconsistency phenomena (e.g., a redefinition of “lung cancer” as the more inclusive category “cancer”). Another explanation of inconsistency in probability judgments assumes that people spontaneously organize the displayed hypotheses before assessing them (Bonini & Van Schie, 1991; Bonini et al., 1994; Rottenstreich & Tversky, 1997; Smith, Shafir, & Osherson, 1993). In this case, the failure of consistency would not be due to a change in the content of the hypothesis but rather to a category organisation. For example, when several instances of a category of causes of death are displayed (e.g., several types of cancer), people may well group them into the same category (e.g., “cancer”), and later assess the inclusive category rather than each single instance. The similarity coverage model (Smith et al., 1993) posits a similar category-generation-and-evaluation process. According to this model, when a subject assesses the probability that a conclusion is true given that a premise is true, she or he generates and assesses the lowest-level category that includes the premise and conclusion categories. For example, when a subject is asked to judge the probability that “bobcats use serotonin as a neurotransmitter given that tigers and cougars do” she or he will assess the probability that the inclusive category (e.g., “feline”) will use serotonin as a neurotransmitter. As another example of category-generation-andevaluation process, Bonini and colleagues (Bonini & Van Schie, 1991; Bonini et al., 1994) reported evidence of inconsistent probability judgments of a target hypothesis that could be due to a spontaneous grouping of displayed causes. In that study, the probability judgment of the target hypothesis “lung cancer” was lower when it was displayed together with three other examples of causes of death by cancer (multiple cancer list) than when it was the only cause of death by cancer (single cancer list). This finding could be explained by saying that subjects in the multiple cancer list grouped the four specific types of cancer into the inclusive category “cancer,” assigned a probability value to this category, and distributed the probability value across its four types. As a consequence of this categorization-and-probability-redistribution process, the probability of “lung cancer” could well be lower in the multiple than in the single cancer list.

” 2005 Hogrefe & Huber Publishers

57

The Contrasting Support Hypothesis We propose an interpretation of the inconsistency of probability judgments of a target hypothesis across different sets of alternative hypotheses. This interpretation is conditioned neither by the specific features of the judgment task, such as the requirement of the sum-to-100 % of the probability estimates, nor by the different number of hypotheses displayed in the sets. Also it is not based on a different representation of the target hypothesis across the sets. This interpretation is based on the notion that the subjective probability of a hypothesis is a function of the perceived support, or strength of evidence, in favor and against it. To this extent, our interpretation is within the support theory framework (see Idsen, Krantz, Osherson, & Bonini, 2001; Rottenstreich, & Tversky, 1997; Tversky & Koehler, 1994). As Tversky and Koehler (1994, p. 549) write: “The support associated with a given hypothesis is interpreted as a measure of the strength of evidence in favor of this hypothesis that is available to the judge. The support may be based on objective data (e.g., the frequency of homicide in the relevant population) or on a subjective impression mediated by judgmental heuristics, such as representativeness, availability, or anchoring and adjustment” (Kahneman, Slovic, & Tversky, 1982). According to support theory, people judge the description of an event (“hypothesis”) rather than the event itself (as assumed by standard probability theory). Also, the probability judgment of a focal hypothesis H equals the ratio of support for the focal hypothesis (focal support) to the total support for the focal hypothesis H and its alternative not-H (contrasting support). Focal support positively contributes to the judged probability of the focal hypothesis, whereas its contrasting support negatively contributes to the judged probability. For example, people may be asked to judge the probability that the decease of a person was due to “lung cancer rather than thrombosis” (assuming the person in question died either of lung cancer or thrombosis). According to Support Theory, the judged probability of the focal hypothesis “lung cancer” equals the ratio between focal support and total support (e.g. the sum of support for “lung cancer” and “thrombosis”). Thus, the higher the contrastive support the lower the subjective probability of the target hypothesis. Extended support theory makes a similar analysis (see Idsen et al., 2001). In this theory, the judged probability of a focal proposition X is based on the perceived evidence in support of X and against X. Experimental Psychology 2005; Vol. 52(1):55Ð66

58

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

However, unlike the support theory, the extended support theory allows for the case where evidence against X is not considered by the judge. Usually, the set not-H of alternative hypotheses to H is ill defined, as in fault trees and related studies. In these situations, the list of specific hypotheses is incomplete, and generally is ended by a catch-all category such as “other causes.” In these situations, the representation of not-H depends both on the retrieval of facts from memory in order to characterize the implicit catch-all category and the alternative hypotheses that have been displayed. Thus, the display of specific competing hypotheses contributes to the representation of not-H and, as a consequence, to its judged support.

The Contrasting Support According to the contrasting support hypothesis, if the displayed alternatives have low support, the support for not-H will be low. By contrast, if the support for the displayed alternatives is high, the support for notH will be high. The prediction that the contrasting support not-H changes as a function of the way the implicit catchall category is unpacked is based on two possible sources: a memory-based or a salience mechanism. With the memory-based mechanism, when low support competing causes are displayed, people may fail to retrieve high support ones. With the salience mechanism, high support hypotheses may well be retrieved, but they are weighed less in the low than in the high competing support display because they were not mentioned. Tversky & Koehler (1994) indicated these two mechanisms as sources of lack of extensionality in probability judgment. Koehler (1994) provided further evidence of memory retrieval as a source of failure of invariance in the judged probability of a hypothesis. The reported findings show that the subjects who generated their own hypotheses expressed less confidence in their truth than the subjects who were presented with the same hypotheses for evaluation. The author argues that “[in the generation condition, subjects] should consider more alternative hypotheses than people who are merely asked to evaluate a hypothesis that is presented to them” (p. 462). This interpretation is consistent with our contrasting support hypothesis. The findings reported by Koehler (1994) could be due to a difference in the perception of contrastive support across the generated vs. evaluated condition because of differences in the retrieval of competing hypotheses across the same conditions. As a consequence, confiExperimental Psychology 2005; Vol. 52(1):55Ð66

dence judgments would be higher in the displayed than in the generated condition.

The Focal Support We don’t make a specific assumption about the type of evaluation frame the judge uses to estimate the probability of the focal hypothesis (e.g., the focal hypothesis is compared to its negation or to a set of specific alternative causes, or to a mix between them). We do, however, assume that the focal support does not depend on the specific set of displayed alternatives. This is because the same label describes the focal hypothesis H across the sets of competing hypotheses. This assumption is congruent with the support theory that states that the perceived support is associated to hypotheses and not to events. Furthermore, as suggested in Idsen et al. (2001), the support for a focal hypothesis may be only a function of the evidence in its favor rather than the evidence in favor and against it. Altogether, there is no reason to expect a change in the focal support across the lists, given the same description of the focal hypothesis.

The Judged Probability Given the same focal support but a different contrasting support for a focal hypothesis across the lists, in accordance with the support theory we expect that the judged probability of the focal hypothesis will be lower in lists displaying high rather than low competing supported hypotheses. Inconsistent probability judgments of the focal hypothesis are thus attributed to differences in its contrasting support. The same prediction can be made by assuming that the focal support changes across the lists because it depends on the listed alternatives. Specifically, by a contrast effect the focal support would be higher in the list displaying low rather than high competing hypotheses. With this account, for example, it should be found that more/stronger reasons in favor of the focal hypothesis would be available in the low than in the high competing support list. Or, the salience of the focal hypothesis would be higher in the former than in the latter list. Regardless of changes in the focal support, the finding of a change in the judged probability of the focal hypothesis would be congruent with the notion of an effect of contrasting support. This is because it would be implausible (on the basis of previous evidence on the catch-all underestimation bias and re” 2005 Hogrefe & Huber Publishers

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

lated findings) to assume the same contrasting support across the lists. To illustrate this, let us consider a situation where the focal hypothesis “lung cancer” is judged in a list covering “infarcts” and “road accident” as two examples of a number of possible competing hypotheses. Lung cancer could, however, be presented in a list covering “influenza” and “diabetes” as two examples of a number of possible competing hypotheses. According to the contrasting support hypothesis, we expect that the focal support for “lung cancer” does not change across the two lists, but its contrasting support does. Specifically, we expect that the contrasting support for “lung cancer” is higher when the alternative hypothesis to “lung cancer” not-H is partially unpacked by high-support competitors (e.g., infarcts, road accident) than when partially unpacked by lowsupport competitors (e.g., influenza or diabetes). As a consequence, we expect that the judged probability of “lung cancer” is lower in the list covering the high rather than the low support competing hypotheses. The paper is organised as follows. In Experiment 1, the effect of the factors display of contrasting support and relatedness between presented hypotheses is tested. In Experiment 2, the role of the factor display of contrasting support is further controlled. In Experiment 3, the effect of the factor display of contrasting support is tested with a frequency judgment task. In the Experiments, the same number of hypotheses is displayed across conditions, and the sum-to100 % probability requirement is absent. As previously discussed in the section The Side Effect of Experimental Demands, previous evidence of inconsistent probability judgments of a target hypothesis could be attributed to the constraint of the additive probabilities. It is thus important to check whether the phenomenon of inconsistent probability judgments of a target hypothesis can be found also in the absence of such experimental demand. This is one aim of this study. In the affirmative case, the inconsistency phenomenon cannot be attributed to the sum-to-100 % probability requirement or to the anchoring and adjustment heuristic. To the best of our knowledge this control has not been done. Finally, in the experiments, the mix of general and specific hypotheses in the list is eliminated to avoid category redefinition of the target hypothesis due to vertical incongruity between displayed hypotheses (see Russo & Kolzow, 1994 for a discussion of this

59

kind of restructuring activity). Thus, if the inconsistency in the probability judgments of the target hypothesis is found in this study, it couldn’t be attributed to its category redefinition due to vertical incongruity between displayed hypotheses.

Experiment 1 According to the contrasting support interpretation, the judged probability of a target hypothesis is higher when low rather than high competing support hypotheses are displayed. According to the relatedness-between-hypotheses interpretation, the judged probability of a target hypothesis is lower when it and its competing hypotheses can be easily grouped together than when they cannot. When the target hypothesis can be easily grouped together with its competing ones, people first organize the displayed hypotheses within the same inclusive category, then assess the inclusive category, and finally distribute the global value across its instances. This categorization-and-probability-redistribution process would allow less value for the target hypothesis than when singularly evaluated. In order to distinguish the effects of the contrasting support factor and the relatedness-between-hypotheses factor, three target hypotheses were presented in different lists. In each list, competing hypotheses had high or low support, and belonged/did not belong to the same category as the target hypothesis. According to the contrasting support interpretation, any target hypothesis should be affected by the degree of support of the displayed competing hypotheses, regardless of the presence or absence of relatedness between them. On the contrary, according to the relatedness interpretation, only the judged probability of the target cause that belongs to the same category as the competing causes should be affected.

Method Following Tversky and Koehler (1994, Study 1),1 participants assessed the likelihood of various possible causes of the death of a person. Four lists of causes of death were presented to four groups of participants. Each list displayed six mutually exclusive nonexhaustive possible causes of death of a person. Participants

1

Although Tversky and Koehler (1994) acknowledged that they were following the method of Russo and Kolzow (1994), the task they used differed in many respects from that used by Russo and Kolzow. For example, in Tversky and Koehler’s study 1, no hierarchical organization of the hypotheses in branch and specific causes was displayed to the subjects. Also, the list of causes was not exhaustive (e.g., they did not present subjects with a catch-all category such as “all other causes”). ” 2005 Hogrefe & Huber Publishers

Experimental Psychology 2005; Vol. 52(1):55Ð66

60

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

judged the percentage probability of each cause being the cause of death of an Italian man aged 40Ð45 years. Three causes were inserted in all the lists (target hypotheses). The other three causes of each list were distinct in four different sets (competing hypotheses). The latter described either three specific types of cancer or three different kinds of disease. All of them had either a low or a high degree of support as a possible cause of death. Material A pilot study was run to devise the four lists of possible causes of death. A group of 50 undergraduate students of the University of Cagliari (Italy), who did not participate in the experiment, were presented with a list of 46 items. These items were selected from fault-tree and related studies about a person’s decease (Bonini et al., 1994; Russo & Kolzow, 1994). Each item described a possible cause of death and was organised across three categories of causes of death: 14 items described different types of cancer; 24 items described other types of disease; 8 items described accidental causes of death. Participants were asked to estimate the percentage due to each cause, among the deaths of Italian men aged 40Ð45 years. Findings from the pilot study show that the total judged frequency largely exceeded 100 % (mean total frequency: 418 %). Selection of the Three Target Causes The causes of death that were judged as most frequent in each of the three categories were “lung cancer” (mean frequency judgment: 27 %), “infarcts” (28 %), and “road accident” (28 %). These three items were consequently selected as target causes. That is, they were inserted in each of the four lists that were used in the experiment. Selection of the Competing Causes Four other sets of three items each were selected. One set contained the causes of death by cancers that were judged as most frequent after “lung cancer”: “leukaemia” (19 %), “liver cancer” (16 %), and “stomach cancer” (15 %). Another set contained the causes of death by cancers that were judged as least frequent: “skin cancer” (10 %), “oesophagus cancer” (9 %), and “larynx cancer” (9 %). A third set contained the causes of death by a disease other than cancer that were judged Experimental Psychology 2005; Vol. 52(1):55Ð66

as most frequent after “infarcts”: “AIDS” (28 %), “cerebral ischemia” (13 %), and “cirrhosis” (10 %). The last set contained the causes of death by diseases other than cancer that were judged as least frequent: “asthma” (2 %), “abdominal hernia” (1 %), and “influenza” (1 %). Each of these four sets of competing causes was inserted in one of the four lists. In sum, each list contained three target causes and three competing causes. The three competing causes were either types of cancer or kinds of disease other than cancer that were judged as either frequent or rare causes of death. Participants Two hundred undergraduate students of the University of Cagliari (Italy) took part in the experiment. They were randomly assigned to four equal-sized groups (N = 50 per group). Design and Procedure The four lists correspond to the crossing of two factors: support of displayed competing causes (henceforth “competing support;” high vs. low support of displayed competing causes) and type of competing causes (henceforth “Competing type;” types of cancer vs. diseases other than cancer). A 2 (Competing Support) ¥ 2 (Competing Type) ¥ 3 (Target Cause) experimental design was used, with the last factor as a within-subject factor. Each group was presented with only one list with the six items printed on one page, in one of two different orders of presentation, and asked to evaluate the probabilities that a person died from the different causes. To make clear that the list was not exhaustive of all possible causes of death, subjects were told that: 1) the person to be judged was randomly selected from the population of the thousands of dead Italian 40Ð45 old males who died from various causes of death; 2) the six displayed causes represented only a few of the many possible causes of the decease of the person; 3) an example of two causes of death that were not included in the list was presented. Participants read the following instructions: Last year thousands of Italian men aged 40Ð45 years died of various causes of death. Imagine that one of these men is randomly selected. In the next page, you will be presented with six of the many possible causes of his death. You are asked to estimate the probability that, in your opinion, the decease of that person was due to each of the six causes. For each cause, write a prob” 2005 Hogrefe & Huber Publishers

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

ability estimate between 0 % (you think that the decease is surely not due to that cause) and 100 % (you think that the decease is surely due to that cause). According to statistics, last year, for example, the probability of death by “suicide” and “inflammation” in the Italian population was 1.5 % and 4.2 %, respectively.

Results and Discussion The mean sum of the probability judgments of the six nonexhaustive causes of death was 131 %. Hence, probabilities were overestimated. The mean probability estimates of the three target causes, and the total probability estimates of the competing causes across the four conditions are presented in Table 1. As expected, the judged probability of the competing causes was higher in the high than in the low support condition. Moreover, in the low competing support condition the judged probability of competing types of cancer was higher than competing noncancer diseases (F(1, 196) = 3.04, p = .08). An ANOVA with a 2 (Competing Support) ¥ 2 (Competing Type) ¥ 3 (Target Cause) design with the last factor as a within-subject factor was performed. As predicted from the contrasting support hypothesis, the mean probability judgments of the target causes were higher in the low than in the high competing support condition (36 % vs. 30 %, respectively; F[1, 196] = 6.91, p ⬍ .01). However, the effect of the competing support interacted with the type of competing causes (F[1, 196] = 8.14, p ⬍ .01). The effect of the competing support was found when the competing causes were noncancer diseases (41 % vs. 28 % for low and high competing support, respectively; F(1, 196) = 15.02, p ⬍ .001). However, it was not found when the competing causes were types of cancer (30 % vs.

61

31 %; F[1, 196] ⬍ 1). The interaction between the competing support and the target cause was not found (F[2, 392] = 2.35, p = .10). This suggests that the manipulation of the competing support affects all target causes (see Table 1). As predicted by the relatedness-between-hypotheses interpretation, the judged probability of “lung cancer” is lower with competing cancers than with competing noncancer diseases (30 % vs. 35 %, respectively; F[1, 196] = 3.68, p = .06). However, the interaction between the type of competing causes and the competing support is statistically significant (F[1, 196] = 5.66, p ⬍ .05). This shows that the effect of the type of competing causes (noncancer diseases vs. cancers) depends on the support for the displayed competing causes. In fact, only in the low competing support condition is the judged probability of “lung cancer” lower with competing cancers (30 %) than with competing noncancer diseases (43 %), F[1, 196] = 9.24, p ⬍ .01. Furthermore, as shown in the left part of Table 1, in the low competing support condition the effect of type of competing causes is not limited to “lung cancer” but concerns all three target causes (F[1, 196] = 10.43, p ⬍ .001). Also, the interaction between the factors type and target is not statistically significant (F[1, 196] ⬍ 1). The explanation based on the perceived relatedness-between-hypotheses does not give an adequate account of reported findings. First, the effect of the factor type of competing causes (cancers vs. noncancer diseases) on the judged probability of “lung cancer” should have been found regardless of the support of the displayed competing causes. Unlike this prediction, it was found only in the low competing support condition. Second, and most important, the relatedness effect should have been found only for “lung cancer,” but it was found for all three-target causes. The above analysis does not distinguish participants whose probability estimations over the six non-

Table 1. Mean probability estimates of causes of death between the two competing support conditions and the two types of competing causes conditions. Competing Causes Low Support

High Support

Diseases

Cancer

Diseases

Cancer

Competing causes Overall

16 %*

26 %*

44 %*

47 %*

Target causes Lung cancer Infarcts Road accident

43 % 36 % 46 %

30 % 26 % 36 %

27 % 29 % 29 %

29 % 31 % 33 %

* Mean sum of the probabilities assigned to the competing causes. ” 2005 Hogrefe & Huber Publishers

Experimental Psychology 2005; Vol. 52(1):55Ð66

62

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

exhaustive causes sum up to less than 1 and those whose total probability estimation was equal or greater than 1. If the former seem to have understood that the set of displayed causes is not exhaustive, there is uncertainty about the latter. Participants whose estimations summed to 1 may have considered the set as an exhaustive one. In this case, the probability of a target cause may consistently vary as a function of the competing causes. However, one cannot say whether the participants who gave estimations summing to more than 1 considered that the displayed hypotheses were exhaustive or not. A typical finding in the probability judgment literature is the subadditivity of probability judgments as a function of the number of displayed alternatives (or, given the same number of alternatives, as a function of the presence of individuating information). For example, Wright and Whalley (1983) and Teigen (1983) reported this phenomenon when judged hypotheses are clearly exhaustive. Thus, people can understand that a set of causes is not exhaustive while assessing probabilities that sum to more than 100 %. Below an analysis is reported based only on those participants whose probability distribution is less than 1. It should be noted that this is a conservative analysis because subjects that could have correctly considered the set as nonexhaustive are discarded from the analysis (e.g., all subjects or a subset of the subjects whose probability distribution was greater than 1). Out of the 200 subjects, 46 subjects (23 %) provided probability judgments that summed to 100 %, 55 subjects (27 %) provided probability judgments that summed to less than 100 %, and 99 subjects (50 %) provided probability judgments that summed to more than 100 %. The 55 subjects were similarly distributed across the four conditions: 14 subjects in the low contrasting support and noncancer diseases condition, 15 subjects in the low contrasting support and cancers condition, 12 subjects in the high contrasting support and noncancer diseases condition, and 14 subjects in the high contrasting support and cancers condition. The mean of the sum of probability distribution over the six causes is 66 %. An ANOVA with a 2 (Competing Support) ¥ 2 (Competing Type) ¥ 3 (Target Cause) design, with the last factor as a within-subject factor, was performed. Although not statistically reliable, a pattern of inconsistent probability estimates of the target causes between the two contrasting support conditions was found. The judged probability of the target cause was higher in the low (19 %) than in the high (16 %) contrasting support condition (F[1, 51] = 2.01, p = .16). This pattern was systematically found for all three Experimental Psychology 2005; Vol. 52(1):55Ð66

target causes. It was found for “infarcts” (16 % vs. 14 %), “lung cancer” (17 % vs. 15 %), and “Road accident” (22 % vs. 20 %). Also, the analysis showed that the probability estimate of the target causes change between the two types of competing causes (cancers vs. noncancer diseases). The probability of the target cause was lower in the competing cancers (16 %) than in the competing noncancer diseases (19 %) condition (F[1, 51] = 4.16, p ⬍ .05). Furthermore, the interaction between competing type and competing support (F[1, 51] ⬍ 1, p = .44) and the interaction between competing support and target cause (F[2, 51] ⬍ 1, p = .79) were not statistically reliable. As with the total sample of subjects (see Table 1), the effect of type of competing causes is not constrained to “lung cancer,” but affects all three target causes. Again, this evidence cannot be accounted by the relatedness-between-hypotheses explanation.

Experiment 2 Experiment 1 showed that the relatedness hypothesis does not give a satisfactory account of the reported findings. Thus, the relatedness hypothesis is no longer considered in the following experiments. In Experiment 1, an effect of contrasting support was evidenced, but it was not found in all conditions. Also, it was not statistically reliable when the analysis of consistency was based only on the subjects whose probability distribution over the six not-exhaustive causes was less than 1. The aim of Experiment 2 was to better control the contrastive support hypothesis. To this end, two lists were devised in order to maximize the difference in the total contrastive support for the target hypotheses. Also, to avoid the grouping of displayed hypotheses the two lists covered unrelated causes of death. This allowed us to attribute the reporting of inconsistencies in the judgment of the target causes only to the degree of support of the displayed competing causes. Finally, unlike the previous experiment, only two target causes were used: one with high and one with low support. Thus, the effect of contrasting support was tested on target causes of different degrees of support.

Method Two lists of five causes of death were presented to two groups of participants. Each list displayed five ” 2005 Hogrefe & Huber Publishers

63

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

mutually exclusive nonexhaustive possible causes of death of a person. As in the previous experiment, participants judged in percentage the probability that each of the displayed items might be the cause of a person’s decease.

Material The lists were based on the results of the pilot study that was described in Experiment 1. Two items (target causes) were common to both lists: one with high support (“lung cancer”) and the other with low support (“diabetes”). In the pilot study, the mean probability judgments of dying by “lung cancer” and “diabetes” were 27 % and 5 %, respectively. In each list, the three remaining items were specific causes of death that did not belong to the same inclusive category of target causes. In the low competing support condition, the three specific items were “asthma,” “abdominal hernia,” and “influenza.” They are the same causes as the ones used in the previous experiment, which had been judged unlikely in the pilot study (mean total judged probability: 4 %). In the high competing support condition, the three specific items were “AIDS,” “road accident,” and “infarcts,” which were judged in the pilot study as the most likely causes of death together with “lung cancer” (mean total judged probability: 84 %).

Participants The participants were 134 students from the University of Cagliari (Italy). Six more students had been discarded from the analysis because they had either not answered or assigned a 100 % probability estimate to a hypothesis and a nonnull probability to another hypothesis. The 134 students were randomly assigned to two groups of 64 (low competing support condition) and 70 (high competing support condition).

Results and Discussion The mean sum of the probability judgments of the five causes was equal to 97 %. It was higher in the high than in the low competing support condition (116 % vs. 76 %, respectively; F[1, 132] = 24.17, p ⬍ .001). The mean probability estimates of the two target causes, and the total probability estimates of the competing causes between the two competing support conditions (high vs. low competing support) are presented in Table 2. An ANOVA with a 2 (Competing Support) ¥ 2 (Target Cause) design was performed, with the last factor as a within-subject factor. As predicted from the contrasting support hypothesis, the judged probability of the target causes was higher in the low (30 %) than in the high (17 %) competing support condition (F[1,132] = 31.49, p ⬍ .001). As the two target causes have different levels of expectation, the competing support effect is less important for “diabetes” than for “lung cancer.” Nonetheless, it is statistically significant in each case (F[1, 132] = 5.92, p ⬍ .05, and F[1, 132] = 32.95, p ⬍ .001, respectively). As in the previous experiment, we further analysed the data by considering only the participants whose probability distribution was less than 1. Those participants shown to have correctly considered the set of displayed causes as not exhaustive. Out of the 134 participants, 34 subjects (25 %) provided probability judgments that summed to 100 %, 58 subjects (43 %) provided probability judgments that summed to less than 100 %, and 42 subjects (32 %) provided probability judgments that summed to more than 100 %. Most of the subjects whose probability judgments summed to less than 100 % were found in the low (40 subjects) than in the high competing support condition (18 subjects). The mean of the sum of the probabilities over the five causes was

Table 2. Mean probability estimates of causes of death between the two competing support conditions.

Design and Procedure The experimental design was a 2 (Competing Support) ¥ 2 (Target Cause) design, with the last factor as a within-subject factor. The instructions and the probability estimation task were the same as in the previous experiment. In each condition, the group of participants was split in two subgroups, with the five items presented in two different orders. ” 2005 Hogrefe & Huber Publishers

Competing Causes Low Support

High Support

Competing causes Overall

16 %*

83 %*

Target causes Lung cancer Diabetes

47 % 13 %

25 % 8%

* Mean sum of the probabilities assigned to the competing causes. Experimental Psychology 2005; Vol. 52(1):55Ð66

64

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

59 % (54 % and 71 %, respectively in the low and high competing support conditions). An ANOVA with a 2 (Competing Support) ¥ 2 (Target Cause) design was performed, with the last factor as a within-subject factor. As predicted from the contrasting support hypothesis, the judged probability of the target causes was higher in the low (22 %) than in the high (11 %) competing support condition (F[1, 56] = 10.14, p ⬍ .01). This finding replicates that reported with the total sample of participants. As the two target causes have different levels of expectation, the competing support effect is less important for “diabetes” than for “lung cancer.” The change in the judged probability for “diabetes” amounts to a 27 % increase (a shift from 7 % to 9 %). The change in the judged probability for “lung cancer” amounts to a 127 % increase (a shift from 15 % to 35 %). Only the change in the judged probability of “lung cancer” between the two competing support conditions was statistically reliable (t[56] = 3.22, p ⬍ .01).

Experiment 3 Experiments 1 and 2 showed that the judged probability of the cause of a death of a randomly selected individual was affected by the degree of support of the displayed competing causes. The same cause of death was judged more likely when the competing causes had low rather than high support. In Experiment 3, we want to examine whether this effect extends to a frequency judgment task. We decided to use a frequency judgment task because there is evidence that frequency judgments exhibit subadditivity to a lesser extent than probability judgments (Tversky & Koehler, 1994). Thus, one expects that frequency judgments would sum to less than 100 % more often than probability judgments. This would ensure us that the displayed causes of death are considered as nonexhaustive ones. We presented two lists of mutually exclusive and nonexhaustive causes of death. They contained only one common target cause (“lung cancer”) and three specific competing causes. In the high competing support list, three high support competing causes were displayed (“road accident,” “AIDS,” and “infarcts”). In the low competing support list, three low support competing causes were displayed (“asthma,” “abdominal hernia,” and “influenza”). Experimental Psychology 2005; Vol. 52(1):55Ð66

Method Two lists of four causes of death were presented to two groups of participants. Each list displayed four mutually exclusive nonexhaustive possible causes of death of thousands of Italian people who died one year before, at the time of the experiment. Participants were asked a frequency judgment task, as in the Study 1 of Tversky and Koehler (1994) on the perception of causes of death. Participants read the following instructions: Last year thousands of Italian men aged 40Ð45 years died of various causes of death. In the next page, you will be presented with four of the many possible causes of death of those deceases. You are asked to estimate the percentage of those deceases due to each of the four causes. If you think that none of those deaths are due to the cause, please write 0 %. If you think that all deaths are due to the cause, please write 100 %. Although it is plausible that you don’t know the exact percentage of death for each cause, please try nevertheless to be as accurate as possible. According to statistics, last year, for example, the percentage of death by “suicide” and “inflammation” in the Italian population aged 40Ð45 years old was 1.5 % and 4.2 %, respectively.

Material Two lists of mutually exclusive and nonexhaustive causes of death were used. In both lists, there was only one target cause (“lung cancer”) and three competing causes. In the high competing support list, three high support competing causes were displayed (“road accident,” “AIDS,” and “infarcts”). In the low competing support list, three low support competing causes were displayed (“asthma,” “abdominal hernia,” and “influenza”). For each condition, two lists with an opposite presentation order of the causes was used.

Participants The participants were 98 undergraduate and graduate students from the University of Trento (Italy), randomly assigned to two groups of 47 (low competing support condition) and 51 (high competing support condition) participants. ” 2005 Hogrefe & Huber Publishers

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

Results and Discussion

65

General Discussion

The mean sum of the estimated frequencies of the four causes was 61 %. It was higher in the high than in the low competing support condition (74 % vs. 47 %, respectively; t[96] = 4.14, p ⬍ .001). The mean frequency of “lung cancer,” and the mean total frequencies of the competing causes in the two competing support conditions (high vs. low competing support) are presented in Table 3. A t test for independent samples was undertaken with the frequency estimates of “lung cancer.” As predicted from the contrasting support hypothesis, the frequency of “lung cancer” was higher in the low (34 %) than in the high (22 %) competing support condition (t[96] = 2.93, p ⬍ .01). As in the two previous experiments, we analysed the data by considering only the subjects whose total estimates were less than 100 %. Those subjects shown to have correctly considered the set of displayed causes as nonexhaustive. Out of the 98 subjects, 20 (20 %) provided frequencies that summed to 100 %, 74 (76 %) provided frequencies that summed to less than 100 %, and only 4 (4 %) provided frequencies that summed to more than 100 %. If we consider the subjects whose total frequencies summed to less than 100 %, there were 40 subjects in the low competing support condition and 34 subjects in the high competing support condition. The mean of the total frequencies over the four causes was 47 % (37 % and 57 %, respectively in the low and high competing support conditions). A t test for independent samples was performed on the frequency estimate of “lung cancer.” As predicted from the contrasting support hypothesis, the frequency of “lung cancer” was higher in the low (27 %) than in the high (16 %) competing support condition (t[72] = 3.08, p ⬍ .01). This finding replicates that reported with the total sample of participants. Table 3. Mean frequency estimates of “lung cancer” between the two competing support conditions. Competing Causes Low Support

High Support

Competing causes Overall

13 %*

53 %*

Target cause Lung cancer

34 %

22 %

* Mean sum of the frequencies assigned to the competing causes. ” 2005 Hogrefe & Huber Publishers

The reported findings showed that the judged probability of a target cause and its frequency estimate changed as a function of the display of its alternatives. This was found when the target cause was displayed across lists of not exhaustive and mutually exclusive causes of death. Specifically, the judged probability/ frequency of the target cause was higher when low rather than high support competing causes of death were displayed. These inconsistent probability/frequency judgments are normatively incorrect. Specifically, they demonstrate a lack of extensionality of subjective probability when people are asked to assess the cause of death of a person in lists that describe the person and the cause in the same way (e.g., the probability that a 40Ð45-year-old Italian man died by “lung cancer”). The reported phenomenon can be attributed neither to the anchoring and adjustment heuristic nor to the conjoint effect of the catch-all underestimation bias and the experimental demand for additive probability/ frequency distribution. This is because in the reported experiments the number of displayed causes is the same across the lists, and the sum-to-100 % probability constraint is absent. We argue that the reported phenomenon is due to differences in the perception of the contrastive support of the target cause across the lists. Specifically, we argue that the perception of the contrastive support is higher when high rather than low support specific alternative causes are displayed. Changes in the perception of the contrastive support of a target hypothesis as a function of the way the catch-all category is unpacked may be due either to a memory-based or a salience mechanism. With the memory-based mechanism, when low support competing causes are displayed, people may fail to retrieve high support ones. With the salience mechanism, high support hypotheses may well be retrieved, but they are weighed less in the low than in the high competing support display because they were not explicitly presented. Tversky and Koehler (1994) also indicated these two mechanisms as sources of lack of extensionality in probability judgment. Future research should try to disentangle the potential role of these two mechanisms on the reported phenomenon. Also, it should try to assess whether the reported inconsistency in judged probability extends to other more familiar domains. Finally, future research should try to check the generality of the reported phenomenon with a different operationalization of the contrastive support of a target hypothesis. In Experimental Psychology 2005; Vol. 52(1):55Ð66

66

N. Bonini & M. Gonzalez: Inconsistent Probability Estimates and Contrasting Support

this study, the contrastive support (high vs. low) of a target hypothesis was measured in terms of the judged probability (high vs. low) of a list of causes of death from a pilot study. This is one of several possible operationalizations of the contrastive support. For example, an argument generation task could be used. With this operationalization, a list of high competing causes would be related, for example, to those causes that are most supported in terms of quantity/quality of arguments in their favor.

Acknowledgement We wish to thank two anonymous referees for their comments on an earlier version of this paper. We also wish to thank Peter Ayton, Dave Krantz and Yuval Rottenstreich for helpful discussions of this study while the first author was on sabbatical at Princeton University in 1998. Also, we wish to thank Paolo Legrenzi, Joop van der Pligt, and Dan Osherson for comments on the Ph.D. thesis of the first author. Finally, we gratefully acknowledge a financial contribution by the M.I.U.R.-F.I.R.B. to the first author.

References Bonini, N., & Van Schie, E. C. M (1991). Context effects in causal estimates: The impact of the presence of related categories. Paper presented at the 13th Research Conference on Subjective Probability, Utility and Decision Making, Fribourg, Switzerland. Bonini, N., Van Schie, E. C. M., Van der Pligt, J., & Legrenzi, P. (1994). Gli effetti della manipolazione del contesto di giudizio sull’invarianza della stima probabilistica dell’accadimento di eventi. Giornale Italiano di Psicologia, 1, 47Ð62. Fischhoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated failure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance, 4, 330Ð344. Harries, C., & Harvey, N. (2000). Are absolute frequencies, relative frequencies, or both effective in reducing cognitive biases? Journal of Behavioral Decision Making, 13, 431Ð 444. Hirt, E. R., & Castellan, N. J. Jr (1988). Probability and category redefinition in the fault tree paradigm. Journal of Experimental Psychology; Human Perception and Performance, 14, 112Ð131. Idsen, L. C., Krantz, D. H., Osherson, D., & Bonini, N. (2001). The relation between probability and evidence judgment: An extension of support theory. Journal of Risk and Uncertainty, 22, 3, 227Ð249.

Experimental Psychology 2005; Vol. 52(1):55Ð66

Kahneman, D., Slovic, P., & Tversky, A. (Eds.) (1982). Judgment under uncertainty: heuristics and biases. New York: Cambridge University Press. Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 251Ð273. Koehler, D. J. (1994). Hypothesis generation and confidence in judgment. Journal of Experimental Psychology: Learning, Memory and Cognition, 20, 461Ð469. Rottenstreich, Y., & Tversky, A. (1997). Unpacking, repacking, and anchoring: Advances in support theory. Psychological Review, 104, 2, 406Ð415. Russo, J. E., & Kolzow, K. (1994). Where is the fault in fault trees? Journal of Experimental Psychology: Human Perception and Performance, 20, 1, 17Ð32. Smith, E. E., Shafir, E., & Osherson, D. (1993). Similarity, plausibility, and judgments of probability. Cognition, 49, 67Ð96. Teigen, K. H. (1983). Studies in subjective probability III: The unimportance of alternatives. Scandinavian Journal of Psychology, 24, 97Ð105. Tversky, A., & Koehler, D. J. (1994). Support theory: A nonextensional representation of subjective probability. Psychological Review, 101, 4, 547Ð567. Van der Pligt, J., Eiser, J. R., & Spears, R. (1987). Comparative judgements and preferences: The influence of the number of response alternatives. British Journal of Social psychology, 26, 269Ð280. Van der Pligt, J., Van Schie, E. C. M., & Hoevenagel, R. (1998). Understanding and valuing environmental issues: The effects of availability and anchoring on judgment. Zeitschrift fur Experimentelle Psychologie, 45 (4), 286Ð302. Van Schie, E. C. M., & Van der Pligt, J. (1990). Influence diagrams and fault trees: The role of salience and anchoring. In K. Borcherding, O. I. Larichev, D. M. Messick (eds.), Contemporary issues in decision making (pp. 177Ð188). Amsterdam: North Holland. Wright, G., & Whalley, P. (1983). The supra-additivity of subjective probability. In B. P. Stigum, F. Wenstop (eds.), Foundations of utility and risk theory with applications (pp. 233Ð 244). Reidel Publishing Company.

Received March 23, 2003 Revision received January 26, 2004 Accepted January 27, 2004 Adress for correspondence Nicolao Bonini Department of Cognitive Sciences and Education Center for Decision Research University of Trento Via Matteo del Ben, 5 I-38068 Rovereto Italy Tel. +39 0464 483 525 Fax +39 0464 483 554 E-mail [email protected]

” 2005 Hogrefe & Huber Publishers