There’s something about Linda: Probability, coherence and rationality

Mark Siebel (accepted for publication in Philosophical Psychology)

ABSTRACT: Gigerenzer has argued that many subjects in the Linda experiment do not violate the conjunction principle because they understand the word ‘probable’ in a non-mathematical way. This raises the question of what kind of reasoning they perform instead of probabilistic reasoning. I propose to view their judgements as resulting from inferences to the maximally coherent whole, where explanatory relations are a central factor in increasing coherence. Furthermore, I examine the consequences of that account for people’s rationality.

1.

The Linda experiment

To what extent are human beings rational in their reasoning? We like to see ourselves as quite rational: sometimes we might be a bit hasty, but most of the time we make justified judgements and valid inferences. But is that true? Does our self-image correspond to reality? Thirty years ago, Amos Tversky and Daniel Kahneman started a series of studies on judgement under uncertainty. They examined the way people estimate and compare probabilities, and their conclusion was rather devastating: we do not reason in accordance with the rules of probability theory but are prone to heuristics and biases that lead to severe errors. Since the principles of the probability calculus provide the standard of rationality for that domain, our reasoning is not rational. Thus, it seems we have to face a fourth humiliation, the others being those due to modern cosmology, the theory of evolution and the psychoanalytic school. We are not at the centre of the universe, we are descended from monkeys, we have repressed attitudes beyond our control – and we are not the rational beings we would like to be. In one of Tversky and Kahneman’s most famous experiments, subjects read a brief personality sketch and have to say to which of two categories the described person is more likely to belong:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which of these two alternatives is more probable?
(a) Linda is a bank teller.
(b) Linda is a bank teller and is active in the feminist movement.

Nearly 90 % of the statistically naive participants chose the second alternative. Among statistically sophisticated subjects, that proportion dropped to 50 %. But when (a) and (b) were embedded in a list of eight alternatives, over 80 % of the trained persons ranked (b) higher than (a) (cf. Kahneman and Tversky, 1982, p. 496; Tversky and Kahneman, 1982, pp. 91-94). This result is taken to show that people commit the conjunction fallacy. They seem to rank the conditional probability of Linda’s being a feminist bank teller, given that the personality sketch is true, higher than the conditional probability of her being a bank teller, given the same assumption. As a result, they violate the probabilistic principle that a conjunction cannot be more likely than one of its conjuncts, because the former entails the latter: p(A & B|C) ≤ p(A|C) (cf. Tversky and Kahneman, 1983, p. 303).

Are those people irrational because they do not reason in accordance with a quite simple and apparently cogent probabilistic rule? I want to argue that there is still hope. In the next section, I describe, but also partly criticise, Gerd Gigerenzer’s approach to Tversky and Kahneman’s experiments. The point to be maintained is that subjects interpret the instructions in a way that prevents them from estimating probabilities in the sense of the mathematical calculus. This raises the question of what kind of reasoning they perform instead. In section 3, I discuss David Chart’s proposal, which, according to its title, considers people as making an inference to the best explanation. It will be shown, however, that Chart gave his account the wrong title, because he actually takes them to infer the best-explained explanandum. Nonetheless, by stressing explanations, he is on the right track. In the fourth part, I first outline the notion of coherence, which can be taken out of its usual context, namely epistemology and philosophy of science, and put to use in psychology. For I want to propose that participants’ judgements can be seen as resulting from inferences to the maximally coherent whole, where explanatory relations are a central factor in increasing coherence.
After briefly pointing out in section 5 how that account could be distinguished empirically from Tversky and Kahneman’s, I devote the final section to the consequences of the coherentist analysis for people’s rationality. I emphasise that, despite Gigerenzer’s argument, a Subjective Bayesian will judge people to be irrational even if they assess not probability but coherence. But I also argue that this invokes too narrow a standard of rationality. One can do worse than choosing the most probable hypothesis; but sometimes it is even more rational to select the one providing more coherence, because this increases the chance of understanding.

2.

Gigerenzer’s worry

Do the participants who choose (b) really make a probabilistic mistake? Of course they do, one might think: they thereby assert that (b) is more likely than (a); and since (a) is one of (b)’s conjuncts, that assertion violates the conjunction rule. However, Gigerenzer and his colleagues have pointed out that things are not that straightforward. They have presented two considerations that call into question whether subjects are in conflict with probability theory. While I am not convinced by the first argument, I find the second quite plausible. The starting point of the first argument is the fact that there is no unanimity about how probability is to be interpreted. There are different camps – including Subjective1 Bayesianism

1

In the following, I leave aside this qualification. From now on, ‘Bayesianism’ just means Subjective Bayesianism, i.e., the interpretation of probability that, among other things, equates it with degree of belief.


and frequentism – that disagree over the right interpretation of probability. Now, subjects in the Linda experiment are asked to think about the probabilities of statements saying that an individual has a certain property (i.e., about single events). From the Bayesian point of view, there is nothing to be said against assigning a probability to such a statement, because doing so just means conferring on it a certain degree of subjective confidence. On the frequency interpretation, however, such an assignment is dubious. According to a frequentist, probability necessarily involves a reference class, because the probability of an event is its relative frequency within a certain class of events. That is, it can vary from reference class to reference class, and it is not defined without such a class. Claiming that the probability of Linda’s being a bank teller is, say, 0.01 thus makes no sense unless a reference class is specified. Whatever probabilities are assigned, Gigerenzer says, such claims cannot amount to a violation of the conjunction principle, because they do not meet a precondition for tackling them with the probability calculus: From the frequency point of view, the laws of probability are mute on the Linda problem, and what has been called a conjunction fallacy is not an error in probabilistic reasoning – probability theory simply does not apply to such cases. (Gigerenzer, 1993, p. 292f.)
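To make the frequentist demand concrete, here is a minimal sketch with an invented population of 1000 people; every count is hypothetical and none comes from any study. It computes relative frequencies within two reference classes, in the spirit of Gigerenzer’s frequency question (‘There are 100 people who fit the description above…’). The frequency of bank tellers changes from class to class, but within any fixed class the feminist bank tellers can never outnumber the bank tellers, since each of them is a bank teller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    teller: bool
    feminist: bool
    fits_sketch: bool  # hypothetical flag: matches the Linda description


def group(count, teller, feminist, fits_sketch):
    """Return `count` identical (hypothetical) people."""
    return [Person(teller, feminist, fits_sketch)] * count


# Invented toy population of 1000 people, for illustration only.
POPULATION = (
    group(2, True, True, True) + group(1, True, False, True)
    + group(40, False, True, True) + group(7, False, False, True)
    + group(10, True, True, False) + group(40, True, False, False)
    + group(200, False, True, False) + group(700, False, False, False)
)


def relative_frequency(event, reference_class):
    """Frequency of `event` among the members of `reference_class`."""
    members = [p for p in POPULATION if reference_class(p)]
    return sum(1 for p in members if event(p)) / len(members)


def is_teller(p):
    return p.teller


def is_feminist_teller(p):
    return p.teller and p.feminist


# Two candidate reference classes for the same single-event question.
CLASSES = {
    "everyone": lambda p: True,
    "fits the sketch": lambda p: p.fits_sketch,
}

for name, members_of in CLASSES.items():
    teller = relative_frequency(is_teller, members_of)
    both = relative_frequency(is_feminist_teller, members_of)
    print(f"{name}: bank teller {teller:.3f}, feminist bank teller {both:.3f}")
```

Changing the reference class changes the numbers, which is the frequentist’s point; fixing one, as the personality sketch arguably does, makes the conjunction rule apply again.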

To be sure, Gigerenzer does not claim that frequentism is true (cf. Gigerenzer, 2001, p. 100; Vranas, 2000, pp. 181f.). He rather argues that whether or not subjects are to be counted as making a mistake depends on the interpretation of probability. Probability theorists are far from agreeing that the reading Tversky and Kahneman seem to presuppose, namely Bayesianism, is the right one. And on the frequency interpretation, participants do not violate the conjunction rule. Hence, the conclusion that participants’ answers are erroneous is too hasty. But note that this argument rests on the questionable premise that no reference class is given. For it seems that the story about Linda provides such a class: women who are 31 years old, single, outspoken and so on.2 Even an advocate of the frequency interpretation can thus be content with conferring probabilities on ‘Linda is a (feminist) bank teller’, because the context offers a reference class that makes these assignments intelligible. ‘Linda’s being a feminist bank teller is more probable than her being a bank teller’ just means that the frequency of feminist bank tellers within the class in question is higher than the frequency of bank tellers. That claim is false, but it satisfies every condition that has to be satisfied in order to apply the apparatus of probability theory in its frequentist variant.

2

Gigerenzer (1993, p. 293f.) himself exploits the reference class when he replaces Tversky and Kahneman’s single-event question with a frequency question: ‘There are 100 people who fit the description above. How many of them are: (a) bank tellers, (b) bank tellers and active in the feminist movement?’ – This experiment is meant to show that people know the conjunction rule but do not apply it in Tversky and Kahneman’s setting, because the instructions were formulated in a way that prevented them from ranking probabilities in the mathematical sense. Cf. also Cosmides and Tooby, 1996; Fiedler, 1988.

However, Gigerenzer (e.g. 2001, pp. 95f.) goes on to offer a second, considerably more convincing consideration. The expression ‘probable’ is ambiguous. Apart from its mathematical sense, there are further interpretations in everyday English, such as ‘plausible’ and ‘supported by the evidence’. If participants understand the task in one of these ways, their answers do not seem to violate the probability calculus, because they are about something else. What they judge is then, for example, that, given Linda’s traits, her being a feminist bank teller is more plausible than her being a bank teller. This judgement does not seem to conflict with the conjunction rule because it is not about probability in the sense of the calculus. According to Hertwig and Gigerenzer (1999, p. 278), subjects are even pushed towards a non-mathematical interpretation by Paul Grice’s relevance maxim.3 This maxim is a conversational rule that participants in a dialogue are usually taken to follow. It says, roughly, that one’s contributions have to be relevant to the topic and goal of the conversation. Applied to the case at hand, this means that subjects can assume every part of the experimenters’ instructions to be relevant. On the mathematical interpretation of ‘probable’, however, the description of Linda is irrelevant to the question subjects are supposed to answer, namely which alternative is more probable. For the probability of a conjunction, given an assumption, cannot be higher than the probability of one of the conjuncts, given the same assumption – regardless of what that assumption is. Take any description C you like; according to probability theory, p(A & B|C) is never higher than p(A|C). It does not matter whether Linda is described as having studied philosophy or as having completed a bank traineeship; in either case the conjunction rule prohibits assigning Linda’s being a feminist bank teller a higher conditional probability than her being a bank teller. Hence, people might choose a reading of ‘probable’ not captured by probability theory because, otherwise, the personality sketch would have to be considered idle (cf. also Gigerenzer, 2001, p. 96).

3.

Chart’s account

Given that many participants in the original experiments do not reason probabilistically, what type of inference do they make instead? In an online paper, Chart (2001) argues that subjects make an inference to the best explanation. This idea points in the right direction because it emphasises the role of explanation. However, a closer look at Chart’s account reveals that it does not actually invoke inferences to the best explanation. What is an inference to the best explanation? Briefly, it is an inference to an explanans that best explains an explanandum. There are some facts, the evidence, which are taken as given, and several hypotheses at stake. The latter are examined for whether, and how well, they explain the evidence. Then the one providing the best explanation is chosen. When we apply this to the Linda case, the situation is as follows. The story about her is the evidence, because it is the part that is not in question in the given context. The assumptions that she is a bank teller and that she is a feminist bank teller are the hypotheses, because they constitute the set from which an element is to be selected. Hence, in an inference to the best explanation, the question would be which of these assumptions offers a better explanation of the story about

3

Cf. Grice, 1975, p. 27; Adler, 1984, p. 167; 1991, pp. 258f. Adler (1991, p. 255) and other researchers have also emphasised that the relevance maxim might tempt subjects to read ‘Linda is a bank teller’ as implying that she is not a feminist. Some experimenters blocked this implicature by using formulations like ‘Linda is a bank teller, whether or not she is active in the feminist movement’. Since about 50 % of the participants in many of these studies still chose ‘Linda is a bank teller and is active in the feminist movement’, it is fair to say that this explanation does not account for all of them. The way of preserving the maxim described in the main text thus remains in play. For a brief overview, see Hertwig and Gigerenzer, 1999, pp. 297f.


Linda. That is, the story is the explanandum and the assumptions are the (putative) explanantia. However, when Chart tries to show that an inference to the best explanation accounts for the outcome of the experiment, he argues just the other way round: Linda’s background, as given, provides absolutely no explanation for her becoming a bank teller. It is completely unexpected, nothing in her background seems prone to cause it, and her background and the career are not at all unified. On the other hand, it does provide a partial explanation for her becoming a feminist bank teller: it explains why she is a feminist. Those political views are expected on the basis of her background, could be caused by several elements of it, and the background and politics are quite well unified.

Obviously, Chart conceives of the alternative assumptions as the explananda and the story about Linda as the explanans. However plausible his account may be, he has therefore given it the wrong title, because it is not an account in terms of an inference to the best explanation. It rather assumes that subjects choose the best-explained explanandum. In my view, neither approach is liberal enough, because each forces us to sort the personality sketch and the assumptions into a single drawer – either ‘explanans’ or ‘explanandum’. Thereby both ignore that the explanatory relations can take different directions for different parts of the sketch. We must not overlook that the story about Linda consists of a mixture of present- and past-tense formulations: she is single and outspoken, and she was deeply concerned with issues of discrimination. As to the past facts, it is reasonable to view some of them, e.g., Linda’s having been concerned with issues of discrimination, as explaining why Linda is active in the feminist movement, because they could have caused that activity. Here a constituent of the sketch is the explanans. But if we look at the present facts, the reverse order of explanation makes more sense. For example, Linda’s being outspoken can be explained by her being an active feminist, because such persons often learn from their companions to express their opinions straight out. This also appears to be the natural order of explanation with respect to Linda’s being single. Why is she single? Because, one could argue, feminists are more likely than non-feminists to postpone marriage in order to preserve their professional options and independence. Here parts of the story about Linda are the explananda. In all of these cases, an effect is explained by a potential cause.
But in the first case, the cause belongs to the personality sketch and the effect is described by one of the hypotheses, whereas in the other two cases, the cause is to be found in the hypotheses and the effects in the sketch. Demanding of the personality sketch that it be either explanans (potential cause) or explanandum (effect) is thus an unnatural limitation. For that reason, I advocate neither an account referring to inferences to the best explanation nor Chart’s account in terms of an inference to the best-explained explanandum. I want to suggest an interpretation that resembles Chart’s insofar as it places great emphasis on explanation. It differs from it, however, in embedding explanation in a wider framework, namely coherence. As we will see, this framework is more generous as to the explanans/explanandum status of the sketch and the hypotheses.
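The point about directions can be made concrete by encoding the three explanatory links just discussed as (explanans, explanandum) pairs and asking which of them each approach is able to use. The labels and this encoding are a hypothetical illustration, not part of Chart’s account or of any published model.

```python
# Hypothetical encoding of the three links discussed in the text.
ORIGIN = {
    "was concerned with discrimination": "sketch",
    "is outspoken": "sketch",
    "is single": "sketch",
    "is an active feminist": "hypothesis",
}

# Each link is (explanans, explanandum).
LINKS = [
    ("was concerned with discrimination", "is an active feminist"),  # past fact -> hypothesis
    ("is an active feminist", "is outspoken"),                       # hypothesis -> present fact
    ("is an active feminist", "is single"),                          # hypothesis -> present fact
]


def counted_links(view):
    """Return the links a given view can use, given the role it assigns the sketch."""
    if view == "best explanation":              # hypothesis must explain the sketch
        wanted = ("hypothesis", "sketch")
    elif view == "best-explained explanandum":  # sketch must explain the hypothesis
        wanted = ("sketch", "hypothesis")
    elif view == "coherence":                   # direction does not matter
        return list(LINKS)
    else:
        raise ValueError(f"unknown view: {view}")
    return [(a, b) for a, b in LINKS if (ORIGIN[a], ORIGIN[b]) == wanted]


for view in ("best explanation", "best-explained explanandum", "coherence"):
    print(f"{view}: {len(counted_links(view))} of {len(LINKS)} links")
```

Each one-directional view discards part of the explanatory structure; only the direction-neutral view keeps all three links.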


4.

The coherentist account

There is a certain use of the expression ‘coherence’ in probability theory, or more exactly, in the Bayesian camp. In the Bayesian view, probabilities are to be interpreted as degrees of confidence, and such degrees are coherent if and only if they satisfy the principles of probability theory. That is, the probability a person assigns to a statement represents the extent to which she believes in its truth. And the person’s doxastic system is considered coherent just in case her degrees of confidence do not violate the rules of the calculus. Moreover, that condition is taken to be necessary for rational belief: the degrees of confidence of a rational person must not violate the axioms of probability theory. For example, if A implies B, then a rational believer is not allowed to place more confidence in A than in B. Tversky and Kahneman concur with the Bayesian account. In a summary of their experimental results, they say: [I]ntuitive judgments of […] probabilities are not likely to be coherent, that is, to satisfy the constraints of probability theory. (Tversky and Kahneman, 1983, p. 313; my emph.)
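The Bayesian requirement that a rational believer place no more confidence in A than in B when A implies B is standardly motivated by the Dutch book argument associated with Ramsey and de Finetti: someone whose degrees of confidence violate the calculus, and who treats them as fair betting prices, can be sold a combination of bets that loses money whatever happens. The sketch below checks this for the Linda case; the credences 0.3 and 0.1 are invented, chosen only so that they violate the conjunction rule.

```python
from itertools import product

# Hypothetical credences violating the conjunction rule: more confidence in
# "bank teller and feminist" (BF) than in "bank teller" (B).
CREDENCE_BF = 0.3
CREDENCE_B = 0.1


def agent_net_payoff(is_teller, is_feminist):
    """Agent's net gain in one possible world.

    The agent regards its credence as the fair price of a ticket paying 1
    if the proposition is true. A bookie sells it a BF ticket at that price
    and buys a B ticket from it at that price.
    """
    bf_true = is_teller and is_feminist
    bought_bf = (1.0 if bf_true else 0.0) - CREDENCE_BF
    sold_b = CREDENCE_B - (1.0 if is_teller else 0.0)
    return bought_bf + sold_b


# Every combination of truth values for "teller" and "feminist".
payoffs = {world: agent_net_payoff(*world) for world in product([True, False], repeat=2)}
for (teller, feminist), value in payoffs.items():
    print(f"teller={teller!s:5} feminist={feminist!s:5} net={value:+.2f}")
```

Because BF can only be true when B is true, the agent loses at least the gap between the two credences in every world.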

There is, of course, nothing to be said against using ‘coherence’ as an abbreviation for a feature like conformity to the probability calculus. But we should not lose sight of the fact that there is also a broader meaning of that term whose home is in epistemology and philosophy of science. Whether or not participants’ judgements are coherent in the narrow probabilistic sense, they might aim at coherence in the broader sense. What is coherence in the broad sense? To adopt Laurence BonJour’s (1985, p. 93) nice formulation, “coherence is a matter of how well a body of beliefs ‘hangs together’: how well its component beliefs fit together, agree or dovetail with each other, so as to produce an organised, tightly structured system of beliefs, rather than either a helter-skelter collection or a set of conflicting subsystems”. BonJour also offers more concrete conditions in order to spell out this rough characterization.4 It was Paul Thagard, however, who first developed a formal, quantitative model of coherence, implemented in a connectionist program called ECHO.5 But it is not necessary to go into the details here. Although at the end of this section I also show what Thagard’s program has to say about the Linda experiment, an intuitive analysis based on a few qualitative principles makes a good starting point. First, the more inconsistencies a system of beliefs contains, the less coherent it is. For example, if we add ¬P to a set containing P, we thereby decrease coherence because we get a logical inconsistency. But note that it is not only logical inconsistency that diminishes coherence. Weaker inconsistencies have a negative effect as well. Just think of a student who believes that he will pass an examination although he is quite aware that he does not know the first thing

4

Cf. also Lehrer, 1992, p. 148, whose constraints have the disadvantage of being mostly negative, because they concern the absence of conflicts. What is missing is an account of relations that contribute positively to coherence. – The following presentation is due to Bartelborth, 1996, sects. IV.D and IV.F.

5

Cf., e.g., Thagard, 1989 and 1992, ch. 4. For another model based on fuzzy logic, cf. Schoch, 2000 and 2002. For probabilistic measures of coherence, i.e., functions that take as input probabilities relating to the assumptions in question and calculate from them a number that is supposed to represent their degree of coherence, cf. Shogenji, 1999, Olsson, 2002, Fitelson, 2003, Bovens and Hartmann, 2003, and Douven and Meijs, 2005.


about its topic. These beliefs are not logically inconsistent, but it is quite improbable that they are both true. The student’s doxastic system is thus less coherent than that of a person who believes that she will fail the examination, other things being equal. Or consider inconsistencies resulting from prototypes. If my prototype of a bird includes the feature ‘can fly’, and if I know that ostriches cannot fly, then acquiring the belief that ostriches are birds leads to a weak inconsistency. Moreover, this may be the right place for coherence in the Bayesian sense: coherence in the broad sense might be decreased to the extent to which degrees of belief do not conform to the probability calculus, that is, to the extent to which they are probabilistically incoherent. Second, although we have to bear in mind that inconsistencies decrease coherence, it is just as important to have an idea of the factors that contribute positively to it. BonJour (1985, p. 98) points out that inferential relations take a central place: The coherence of a system of beliefs is increased by the presence of inferential connections between its component beliefs and increased in proportion to the number and strength of such connections.

Even if a body of beliefs is completely consistent, it need not be coherent, because consistency does not imply that its elements hang together. But if there are inferential connections between them, the elements dovetail with each other. These connections do not have to be deductive. For P and Q to be inferentially connected, it need not be the case that, if one of them is true, the other one must be true as well. Inductive relations also contribute to coherence. If P inductively implies Q, then P and Q cohere. Although P can be true while Q is false, the truth of P at least makes it likely that Q is also true. Hence, they are not isolated but networked with respect to their truth-values, such that P epistemically supports Q. Thagard (1992, p. 65) has called attention to a further kind of relation that increases coherence: explanation. I do not know whether explanatory connections should be considered a subtype of inferential connections or distinct from them. Fortunately, there is no need to answer that question, because the following argument does not hinge on it. I will talk about inferential and explanatory relations, leaving open whether or not the latter are a special kind of the former. In any case, explanatory relations contribute positively to coherence: the more explanatory connections there are, and the stronger they are, the more coherent a body of beliefs is. For example, if P1 explains Q and R, whereas P2 explains only one of them, then {P1, Q, R} is more coherent than {P2, Q, R}. Third, it is not only inconsistencies which diminish coherence. A further issue is whether a body of beliefs contains isolated subsystems: The coherence of a system of beliefs is diminished to the extent to which it is divided into subsystems of beliefs which are relatively unconnected to each other by inferential connections. (BonJour, 1985, p. 98)

Suppose there is a doxastic system that contains no inconsistencies while there are many connections between its elements. It can be divided, however, into two parts such that, although the elements of each part strongly hang together, there are far fewer connections stretching across the border. That is, within the subsystems there is a large amount of epistemic support, but the subsystems do not support each other. A physicist who also believes in astrology might provide an example. Even if his astrological subsystem is coherent, there is no connection between the forces known in physics (for example, gravitation) and the way in which the position of celestial bodies is supposed to affect our fate. The coherence of his system of beliefs is diminished because it consists of subsystems that are relatively unconnected to each other. All in all, coherence is increased by the presence of inferential and explanatory relations and decreased by the presence of inconsistencies and isolated subsystems.

My proposal now is to take participants’ judgements in the Linda experiment as resulting from maximizing coherence in the sense just outlined: they rank highest the hypothesis that leads to the most coherent overall picture. This proposal is not ad hoc. Many psychologists and philosophers have emphasised the crucial role of explanation in cognition (see, e.g., the articles in Keil and Wilson, 2000). In particular, Thagard and his colleagues have shown that the notion of (explanatory) coherence is extremely useful in handling various phenomena. By means of it, one can (partly) account for (i) the way we make sense of other people, (ii) concept combination, (iii) autistic behaviour and (iv) perhaps even emotion formation (cf. Thagard and Kunda, 1998; Thagard, 1997; O’Laughlin and Thagard, 2000; Thagard, 2000, ch. 6). It would therefore appear that coherence is in general an important factor in people’s mental life. To view the reaction in the Linda experiment as driven by coherence is simply to view it as resulting from common mechanisms of information evaluation. That view has the advantage of providing an explanation that has proven successful in other areas and is thus of wide scope. As announced in the preceding section, the issue of what is explanans or explanandum is only of minor importance in a coherentist framework.
The crucial point is how many explanatory relations there are and how strong they are. These connections are allowed to take different directions. Hence, there is no need to consider the story about Linda as a whole as either explanandum or explanans; and there is, accordingly, no need to confer only one of these statuses on the hypotheses. Let us first consider the assumption that Linda is a bank teller. Hardly any inferential or explanatory connections can be found between it and the personality sketch. Prima facie, none of the given properties is entailed or explained by the assumption, and the assumption is neither entailed nor explained by the properties. On the contrary, they display an inconsistency. Although the hypothesis that Linda is a bank teller and the story about her are not logically inconsistent, they do not fit together in a wider sense, because it is quite unlikely that a bank teller is like Linda. Therefore, this hypothesis and the story form an incoherent whole. Things are different when we turn to the conjunctive assumption that Linda is a bank teller and a feminist. Since it contains the preceding assumption as a part, there are of course the same inconsistencies. There is even a further inconsistency, because the hypothesis itself consists of two parts that are unlikely to be realised together. Moreover, the conjunction includes an element (‘Linda is a bank teller’) that neither explains nor is explained by any part of the story about Linda. It is the second conjunct that does all the work. Thus, to choose the conjunction means to incorporate an element that is idle, because there is no explanatory relation between it and other parts of the system. Since this is close to an isolated subsystem, except that the ‘system’ here is just part of a proposition, it could also be seen as decreasing coherence. The crucial point, however, is that various explanatory relations compensate for these factors diminishing coherence. As pointed out in the preceding section, the ‘feminist’ constituent of the hypothesis allows us to explain some parts of the personality sketch, and it can itself be explained by other parts of the sketch. Hence, we get some degree of coherence, or at least less incoherence, such that the conjunction is to be preferred.

As already said in section 1, Tversky and Kahneman (1982, p. 92) also conducted a study in which subjects were offered not only two but eight statements about Linda. That list included the other part of the conjunction, ‘Linda is active in the feminist movement’. Although many subjects still ranked the conjunction higher than Linda’s being a bank teller, they ranked the hypothesis that she is a feminist even higher. Again, that can easily be captured in terms of coherence. Regardless of whether one takes the conjunction or the assumption that Linda is a feminist, the explanatory relations are the same. In the former case, however, we face many aspects decreasing coherence: there are some inconsistencies, and the bank teller element is close to an isolated subsystem. Hence, the hypothesis that Linda is a feminist leads to more coherence than the hypothesis that she is a feminist and a bank teller. The coherentist account thus nicely explains subjects’ judgements in the Linda study. They do not understand ‘probable’ in the sense of probability theory but rank the given alternatives with respect to the amount of coherence they provide.
If they are offered only the two alternatives ‘Linda is a bank teller’ and ‘Linda is a feminist bank teller’, they choose the latter, because the explanatory relations resulting from the ‘feminist’ component lead to more coherence. If they are also offered the hypothesis that Linda is a feminist, they rank it even higher than the conjunction, because it gives rise neither to inconsistencies nor to something close to an isolated subsystem. From a coherentist standpoint, the three alternatives are expected to be ranked in the order in which subjects actually rank them. We arrive at the same result if we make use of Thagard’s specific model of coherence, the program ECHO.6 Let ‘PS’ be the personality sketch, ‘BF’ the hypothesis that Linda is a bank teller and a feminist, ‘B’ the hypothesis that she is a bank teller and ‘F’ the hypothesis that she is a feminist. Then feed the following constraints into Thagard’s program:

explain((PS),BF), explain((PS),F)
representing that parts of PS explain both a part of BF and F,7

contradict(PS,BF), contradict(PS,B)
stating that PS is inconsistent both with a part of BF and with B,

data(PS)
making PS an item that cannot easily be rejected.8
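The settling process behind ECHO’s ranking can be approximated with a short script. The sketch below is a minimal re-implementation of ECHO’s core update rule, not Thagard’s program. Its parameter values (excitation 0.04, inhibition −0.06, decay 0.05, data excitation 0.05, initial activation 0.01) are the defaults commonly reported for ECHO and are assumptions here, as is the choice to add the weights when an explain and a contradict constraint link the same pair. It encodes the constraints just listed together with the two constraints for the conjunction given below (contradict(B,F); explain((BF),B), explain((BF),F)). The ranking should come out as reported; the exact activations may differ somewhat from the published run.

```python
from collections import defaultdict

# Assumed parameter values (commonly reported ECHO defaults).
EXCITATION = 0.04       # added to a link for each explain(...) constraint
INHIBITION = -0.06      # added to a link for each contradict(...) constraint
DATA_EXCITATION = 0.05  # link from the clamped evidence unit to each datum
DECAY = 0.05
INITIAL = 0.01
SPECIAL = "SPECIAL"     # evidence unit, clamped at activation 1


def build_links(explains, contradicts, data):
    """Symmetric link weights keyed by unordered pairs; repeated links add up."""
    weights = defaultdict(float)
    for a, b in explains:
        weights[frozenset((a, b))] += EXCITATION
    for a, b in contradicts:
        weights[frozenset((a, b))] += INHIBITION
    for unit in data:
        weights[frozenset((SPECIAL, unit))] += DATA_EXCITATION
    return weights


def settle(weights, max_cycles=1000, tolerance=1e-6):
    """Update all activations in parallel until they stop changing."""
    units = {u for pair in weights for u in pair}
    act = {u: INITIAL for u in units}
    act[SPECIAL] = 1.0
    for _ in range(max_cycles):
        net = defaultdict(float)
        for pair, w in weights.items():
            a, b = tuple(pair)
            net[a] += w * act[b]
            net[b] += w * act[a]
        new = {SPECIAL: 1.0}
        for u in units - {SPECIAL}:
            if net[u] > 0:   # push towards the maximum activation 1
                value = act[u] * (1 - DECAY) + net[u] * (1.0 - act[u])
            else:            # push towards the minimum activation -1
                value = act[u] * (1 - DECAY) + net[u] * (act[u] + 1.0)
            new[u] = max(-1.0, min(1.0, value))
        change = max(abs(new[u] - act[u]) for u in units)
        act = new
        if change < tolerance:
            break
    return {u: round(value, 2) for u, value in act.items() if u != SPECIAL}


# Three alternatives, with the constraints given in the text.
three = settle(build_links(
    explains=[("PS", "BF"), ("PS", "F"), ("BF", "B"), ("BF", "F")],
    contradicts=[("PS", "BF"), ("PS", "B"), ("B", "F")],
    data=["PS"],
))
# Two alternatives: every constraint mentioning F is dropped.
two = settle(build_links(
    explains=[("PS", "BF"), ("BF", "B")],
    contradicts=[("PS", "BF"), ("PS", "B")],
    data=["PS"],
))
print(sorted(three.items(), key=lambda kv: -kv[1]))
print(sorted(two.items(), key=lambda kv: -kv[1]))
```

Adding the explain and contradict weights on the PS–BF link leaves it mildly negative, which is why BF ends up less rejected than B: B receives the full inhibition from the accepted sketch.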

6

A Java version of ECHO is available at http://cogsci.uwaterloo.ca.

7

One can add ‘explain((BF),PS)’ and ‘explain((F),PS)’ in order to represent that the reverse order of explanation also makes sense for other parts of PS.


Furthermore, BF is a conjunction of B and F, and these conjuncts are inconsistent with each other. The second part of that proposition can easily be written down in ECHO:

contradict(B,F)

But since the program has no predicate that exactly reproduces the first part, it must be represented by explanatory relations:

explain((BF),B), explain((BF),F)

Given these constraints, ECHO assigns the hypotheses values in the following order: F (0.51) > BF (–0.25) > B (–0.61). If you leave aside the hypothesis that Linda is a feminist (and the constraints concerning it), thereby modelling the task with only two alternatives, ECHO displays the ranking BF (–0.38) > B (–0.50). So, although both hypotheses are rejected, the bank-teller hypothesis is rejected to a higher degree than the conjunction. If one has to choose one of them, as subjects in the corresponding experiment do, the conjunction is, from a coherentist standpoint, the better choice.

5.

Distinguishing the coherentist account from Tversky and Kahneman’s account

Tversky and Kahneman (1974, p. 4; 1982, p. 97; 1983, p. 297) account for the result of the Linda experiment as follows. People do not compute probabilities by the rules of the probability calculus but rely on the so-called representativeness heuristic. They estimate the probability of Linda’s being a (feminist) bank teller by the degree to which she is similar to the stereotype of a (feminist) bank teller. Since Linda resembles the stereotype of a feminist bank teller more than the stereotype of a bank teller, subjects find it more probable that she is a feminist bank teller. Smith and Osherson (1989) and Shafir et al. (1990) specified that account on the basis of Tversky’s (1977) contrast model of similarity. According to that model, the similarity of a to b is a function of three arguments: the features common to both a and b, the features a possesses but not b, and the features belonging to b but not to a. Roughly, i.e., ignoring weights, the more common and the less distinctive features there are, the more resemblance there is. That is, subjects prefer ‘Linda is a feminist bank teller’ over ‘Linda is a bank teller’ because, among other things, there are more common features in the personality sketch and the stereotype of a feminist bank teller than in the sketch and the stereotype of a bank teller. In the Linda problem, Tversky and Kahneman’s account can hardly be distinguished empirically from the coherentist account because they make the same predictions. But it should be possible to find other material where the sketch is more similar to the stereotype given by one of the alternatives whereas the other alternative provides more coherence because of explanatory connections. Here is a rough idea, based on one of Jerry Fodor and Ernest Lepore’s (1996, sect. 4) notorious examples. A trout is a typical fish, but it is a poorish example of a pet fish. Among other things, pet fishes are normally not greyish brown but are coloured beautifully 8

Actually, you do not need that constraint in order to get the expected result. But it is part of the Linda situation that the claims of the personality sketch are beyond doubt.

10

(think of gold fishes), and they are typically not as large as trouts are. Now consider a story about an animal that starts with a lot of trout features: ‘It was greyish brown, had fins and gills, was 15 inches long, …’ In the end, however, the story says that a boy buried the animal and shed bitter tears. The question, then, is whether it is more probable that it was a fish or a pet fish. The representativeness account seems to predict that people will prefer ‘fish’ over ‘pet fish’. For there is a higher amount of feature overlap while the set of distinctive features is smaller, including, e.g., the funeral. The coherentist account, on the other hand, predicts that people choose ‘pet fish’. After all, they can thereby explain why a boy buried such a seemingly profane fish and shed bitter tears. It is quite easy to get this result in ECHO. Let ‘S’ refer to the whole story, while ‘S1’ is its first part (mentioning the trout features) and ‘S2’ the second part (the funeral and the sad boy). These propositions are beyond doubt in the given scenario: data(S,S1,S2) ‘PF’ stands for the hypothesis that the animal is a pet fish, and ‘F’ for the hypothesis that it is a fish. Again, both the fact that PF is a conjunction containing F and the fact that S is a conjunction containing S1 and S2 have to be represented by explanatory relations: explain((PF),F) explain((S),S1) explain((S),S2) The fish hypothesis explains the first part of the story and is inconsistent with the second one because boys are normally not sad when a fish dies: explain((F),S1) contradict(F,S2) What about the pet-fish hypothesis? Although it and the ‘trout part’ of the story are weakly inconsistent because pet fishes are usually not grey …, it explains the story as a whole: contradict(PF,S1) explain((PF),S) ECHO then rejects both PF and F, but F is rejected more strongly (–0.59) than PF (–0.36). 6.

Rationality

Given that people reason in a coherentist way, what about their rationality? This is a broad question that cannot be fully answered in two or three pages. Nevertheless, there are some arguments that should not be lightly dismissed. To begin with, we should not turn a blind eye to the fact that the instructions of the Linda experiment are close to a conundrum. If one looks at them without blinkers on, one can have the distinct feeling that experimenters are taking subjects for a ride by telling a story that is completely irrelevant to the question they finally have to answer. After all, as already noted in section 2, however a description D runs, p(A & B|D) is never higher than p(A|D).
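Why the description is idle under the mathematical reading can be made explicit with a standard derivation from additivity, written in the paper’s notation and assuming only that p(·|D) is a probability function (i.e., p(D) > 0):

```latex
p(A \mathbin{\&} B \mid D)
  \;\le\; p(A \mathbin{\&} B \mid D) + p(A \mathbin{\&} \neg B \mid D)
  \;=\; p(A \mid D)
```

The inequality holds because p(A & ¬B|D) ≥ 0, and the equality because A & B and A & ¬B are mutually exclusive and jointly equivalent to A. Nothing in the derivation depends on the content of D, which is why, on this reading, the personality sketch cannot make the conjunction more probable than its conjunct.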


For this reason, it is far from obvious that the Linda study demonstrates irrationality. It is no wonder that participants reach for an interpretation of the task on which the story about Linda is relevant. After all, they do not expect to be involved in a conundrum-like situation but in a serious enterprise. Their interpretation is thus not subject to the verdict ‘irrational task construal’. With reference to Howard Margolis (1987, p. 141), Stanovich and West (2000, p. 655) argue that, even if subjects give an appropriate answer to the question they take to be asked, they can be deemed irrational if their interpretations are “so bizarre – so far from what the very words in the instructions said – that they represent serious cognitive errors” (cf. also Vranas, 2001, p. 108, fn. 2). But construing ‘probable’ as alluding to coherence is not bizarre. That interpretation fits the everyday use of ‘probable’; and it is reasonable not to reach for the mathematical reading intended by the experimenter, because that reading makes the personality sketch idle and thus brings the task critically close to a conundrum. However, my main argument takes a different course. It starts with a fact that can easily be overlooked. Suppose participants in the Linda experiment do not understand ‘probable’ in the sense of the probability calculus but instead assess coherence. Does that show that, from a probabilistic perspective, their answers are not to be counted as errors? This appears to be what Gigerenzer holds, and it is quite plausible at first glance. If they do not rank probabilities, then, it seems, they cannot be in conflict with probability theory. But I think this is too simple a way out, for an advocate of the Bayesian interpretation of probability will not be so generous.9 As pointed out in section 4, on this interpretation probabilities are degrees of confidence, and such degrees are rational only if they meet the constraints of probability theory.
That is, wherever a person’s degrees of confidence come from, if they violate probabilistic principles, then the subject is not a rational believer. In particular, a person whose confidence in Linda’s being a feminist bank teller is stronger than his confidence in Linda’s being a bank teller is mistaken by that standard. The reason for the person’s preference is irrelevant. It does not matter whether he judged the conjunctive assumption to be more probable or judged it to provide more coherence. In both cases, a Bayesian will maintain that he is irrational, for it suffices that he prefers the conjunction, no matter where that preference comes from. In that sense, people violate the conjunction rule even if they do not rank probabilities. But is the Bayesian assessment of the Linda experiment itself rational? I do not think so. Rather, the Bayesian account narrows down rationality by equating it with conformity to the probability calculus. We must be careful not to exaggerate the significance of probability theory for rationality. Having degrees of belief that accord with the rules of probability theory is only one clue to whether someone is a rational believer. There are further factors, and one of them is coherence.10 Stressing coherence does not mean rejecting probability altogether as a yardstick of rationality. It just means recognising that there are many clues besides probability, and that probability must be weighed against further criteria. Consequently, we should be extremely cautious about drawing normative conclusions from the experiments.

9

The following argument is similar to Vranas’s (2000, p. 185; 2001, p. 108).

10

For other criticisms of Bayesianism cf. Glymour, 1998, and Sober, 2002.

Let us assume no one had found any violations of probability theory. Even
without the relevant courses, people were able to estimate and rank probabilities perfectly. Does that entail that they are more rational than we in fact are? No, it merely means that they are better at calculating probabilities. Whether they are also more rational depends on the totality of their ways of gathering information, reasoning, proposing hypotheses, testing them and so on. To be sure, I do not deny that degrees of confidence conforming to the probability calculus have some value. For example, so-called Dutch book arguments show that you can be made to suffer a sure loss in betting situations if your degrees of confidence violate probabilistic rules (cf., e.g., Vranas, 2000, p. 186). Thus, if you want to steer clear of that danger, you had better not violate them. But do not ignore the fact that there are further values besides not being made to suffer a sure loss in a betting situation. One of these values turns up in the given conception of coherence: the value of explaining, and thus understanding, things. Consider two hypotheses of physics, H1 and H2, where H2 is a conjunctive part of H1. Both of them explain many crucial data, but H1 explains more of them than H2. Suppose, furthermore, that the extra element in H1 does not give rise to inconsistencies or other features that would diminish the coherence of the overall system when H1 is embedded in physics. Which hypothesis should a physicist choose? Common sense, as well as the physicist’s sense, says it is reasonable to prefer the conjunction H1 over its part H2, because its higher explanatory power enables a more extensive understanding without leading to unwelcome consequences. An account emphasizing coherence subscribes to that result, whereas a purely probabilistic account does not. Or assume a certain person, Tom, is what may be called a ‘probability addict’. If he has to choose from a set of hypotheses, he always selects the one having the highest probability given the evidence.
In a Linda situation, Tom prefers the assumption that she is a bank teller over the assumption that she is a feminist bank teller because, although Bayes’s theorem tags it as improbable as well, it is at least more likely than the conjunction. This is, in some sense, honourable. But it means being far from understanding the states of affairs in question, because hardly any explanatory relations can be seen. Tom is confronted with two assumptions, ‘Linda is a bank teller’ and ‘Linda is an outspoken woman …’, between which no connection seems to be in the offing, and which are thus close to a mystery to him. If he had chosen the hypothesis that Linda is a feminist bank teller, he would have been in a much better situation. For Tom could then partly understand why she is the way she is, because he would be in a position to explain some of her features. Although the bank teller element would still be a nuisance, life is much more comfortable if that element is accompanied by the possibility of explaining, and thus understanding, things. To a probability addict, such a consideration is completely alien because, to him, adding further assumptions just means reducing probability. He does not recognise that understanding might compensate for a higher probabilistic risk. To conclude, the Bayesians’ assessment of people’s reaction in the Linda experiment is much too harsh. Of course, by preferring the conjunction, subjects display more confidence in the alternative that is less probable. They are therefore not coherent in the probabilistic sense. But that does not mean they are irrational. For they choose the alternative that gives rise to explanations and thus provides more coherence in the broad sense. Since this enables them to make sense of Linda’s features, I do not see why rationality should prohibit them from preferring
the less probable assumption. In such a situation, probabilistic incoherence does not entail irrationality because it is accompanied by coherence in the broad sense. Occasionally, Gigerenzer alludes to the paleontologist Stephen Gould (1992, p. 469), who wrote that, although he acknowledges the conjunction rule, “a little homunculus in my head continues to jump up and down, shouting at me – ‘but she can’t just be a bank teller; read the description’”. Like Gigerenzer (1994, p. 131), I believe Gould should not shrug off his homunculus. For it might be a coherent thinker not addicted to probability.11

11

This paper grew out of the project Explanatory Coherence (University of Leipzig), funded by the Deutsche Forschungsgemeinschaft. For valuable suggestions and criticism I would like to thank Nick Allott, Thomas Bartelborth, Jean François Bonnefon, David Chart, Rui Da Costa Neves, Christoph Dörge, Ralph Hertwig, Frank Stahnisch, Mark Textor, Paul Thagard and an anonymous referee. Da Costa Neves and Bonnefon are planning experiments in which the coherentist account makes predictions different from those of the similarity account.

References

ADLER, J. (1984). Abstraction is uncooperative. Journal for the Theory of Social Behaviour, 14, 165–181.
ADLER, J. (1991). An optimist’s pessimism: Conversation and conjunction. Poznan Studies in the Philosophy of the Sciences and the Humanities, 21, 251–182.
BARTELBORTH, T. (1996). Begründungsstrategien. Ein Weg durch die analytische Erkenntnistheorie. Berlin: Akademie-Verlag.
BONJOUR, L. (1985). The structure of empirical knowledge. Cambridge, MA and London: Harvard University Press.
BOVENS, L. & HARTMANN, S. (2003). Bayesian epistemology. New York and Oxford: Oxford University Press.
CHART, D. (2001). Inference to the best explanation, Bayesianism, and feminist bank tellers. Online paper, http://philsci-archive.pitt.edu/documents/disk0/00/00/03/22/.
COSMIDES, L. & TOOBY, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58, 1–73.
DOUVEN, I. & MEIJS, W. (2005). Measuring coherence. To appear in Synthese.
FIEDLER, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors. Psychological Research, 50, 123–129.
FITELSON, B. (2003). A probabilistic theory of coherence. Analysis, 63, 194–199.
FODOR, J. & LEPORE, E. (1996). The red herring and the pet fish: why concepts still can’t be prototypes. Cognition, 58, 253–270.
GIGERENZER, G. (1993). The bounded rationality of probabilistic mental models. In K. I. MANKTELOW & D. E. OVER (Eds) Rationality: psychological and philosophical perspectives (pp. 284–313). London: Routledge.
GIGERENZER, G. (1994). Why the distinction between single-event probabilities and frequencies is important for psychology (and vice versa). In G. WRIGHT & P. AYTON (Eds) Subjective probability (pp. 129–161). New York: Wiley.
GIGERENZER, G. (2001). Content-blind norms, no norms, or good norms? A reply to Vranas. Cognition, 81, 93–103.
GLYMOUR, C. (1998). Why I am not a Bayesian. In M. CURD & J. A. COVER (Eds) Philosophy of science: the central issues (pp. 584–606). New York and London: Norton.
GOLDMAN, A. I. (1986). Epistemology and cognition. Cambridge, MA and London: Harvard University Press. 5th ed. 1995.
GOULD, S. J. (1992). Bully for brontosaurus. Further reflections in natural history. London: Penguin Books.
GRICE, H. P. (1975). Logic and conversation. In Studies in the way of words (pp. 22–40). Cambridge, MA and London 1989: Harvard University Press.
HERTWIG, R. & GIGERENZER, G. (1999). The ‘conjunction fallacy’ revisited: How intelligent inferences look like reasoning errors. Journal of Behavioral Decision Making, 12, 275–305.
KAHNEMAN, D. & TVERSKY, A. (1982). On the study of statistical intuitions. In D. KAHNEMAN, P. SLOVIC & A. TVERSKY (Eds) Judgment under uncertainty: heuristics and biases (pp. 493–508). Cambridge: Cambridge University Press. 13th ed. 1993.
KEIL, F. C. & WILSON, R. A. (Eds) (2000). Explanation and cognition. Cambridge, MA and London: MIT Press.
KUNDA, Z. & THAGARD, P. (1996). Forming impressions from stereotypes, traits, and behaviors: a parallel-constraint-satisfaction theory. Psychological Review, 103, 284–308.
LEHRER, K. (1990). Theory of knowledge. London and New York: Routledge.
MARGOLIS, H. (1987). Patterns, thinking, and cognition. Chicago: University of Chicago Press.
MISES, R. VON (1957). Probability, statistics, and truth. London: Allen and Unwin.
O’LAUGHLIN, C. & THAGARD, P. (2000). Autism and coherence: a computational model. Mind & Language, 15, 375–392.
OLSSON, E. J. (2002). What is the problem of coherence and truth? The Journal of Philosophy, 99, 246–272.
SCHOCH, D. (2000). Explanatory coherence. Synthese, 122, 291–311.
SCHOCH, D. (2002). Computationale Abduktion. In M. SIEBEL (Ed.) Kommunikatives Verstehen (pp. 198–219). Leipzig: Leipziger Universitätsverlag.
SHOGENJI, T. (1999). Is coherence truth conducive? Analysis, 59, 338–345.
SOBER, E. (2002). Bayesianism – its scope and limits. To appear in R. SWINBURNE (Ed.) Bayes’s Theorem (pp. 21–38). Oxford: Oxford University Press.
STANOVICH, K. E. & WEST, R. F. (2000). Individual differences in reasoning: implications for the rationality debate? Behavioral and Brain Sciences, 23, 645–665.
THAGARD, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12, 435–502.
THAGARD, P. (1992). Conceptual revolutions. Princeton: Princeton University Press.
THAGARD, P. (1997). Coherent and creative conceptual combinations. In T. B. WARD, S. M. SMITH & J. VAID (Eds) Creative thought: an investigation of conceptual structures and processes (pp. 129–141). Washington, D.C.: American Psychological Association.
THAGARD, P. (2000). Coherence in thought and action. Cambridge, MA and London: MIT Press.
THAGARD, P. & KUNDA, Z. (1998). Making sense of people: coherence mechanisms. In S. J. READ & L. C. MILLER (Eds) Connectionist models of social reasoning and social behavior (pp. 3–26). Hillsdale, NJ: Erlbaum.
TVERSKY, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
TVERSKY, A. & KAHNEMAN, D. (1974). Judgment under uncertainty: heuristics and biases. In D. KAHNEMAN, P. SLOVIC & A. TVERSKY (Eds) Judgment under uncertainty: heuristics and biases (pp. 3–20). Cambridge: Cambridge University Press. 13th ed. 1993.
TVERSKY, A. & KAHNEMAN, D. (1982). Judgments of and by representativeness. In D. KAHNEMAN, P. SLOVIC & A. TVERSKY (Eds) Judgment under uncertainty: heuristics and biases (pp. 84–98). Cambridge: Cambridge University Press. 13th ed. 1993.
TVERSKY, A. & KAHNEMAN, D. (1983). Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment. Psychological Review, 90, 293–315.
VRANAS, P. B. M. (2000). Gigerenzer’s normative critique of Kahneman and Tversky. Cognition, 76, 179–193.
VRANAS, P. B. M. (2001). Single-case probabilities and content-neutral norms: a reply to Gigerenzer. Cognition, 81, 105–111.