Bounded Rationality and Directed Cognition

Xavier Gabaix, MIT

David Laibson, Harvard University and NBER∗

October 15, 2000. Preliminary and incomplete.

Abstract: This paper proposes a psychologically sensible bounded rationality model that can be operationalized. The model formalizes the well-known idea that cognitive resources are scarce and should be economized. The model makes quantitative predictions and can be realistically applied to decision problems that require an arbitrarily large number of cognitive state variables. We test the model using experimental data, and find that it significantly outperforms the rational actor model with zero cognition costs.

JEL classification: C70, C91, D80

Keywords: directed cognition, optimization, bounded rationality, heuristics, biases, experimental economics.

∗ Gabaix: Department of Economics, MIT, Cambridge, MA 02142, [email protected]. Laibson: Department of Economics, Harvard University, and NBER, Cambridge MA 02138, [email protected]. We are grateful for the advice of Philippe Aghion, Abhijit Banerjee, Gary Becker, Glenn Ellison, Drew Fudenberg, Lars Hansen, Oliver Hart, Robert Lucas, Paul Milgrom, Andrei Shleifer, Al Roth, Richard Thaler, Martin Weitzman, and Richard Zeckhauser. Our paper benefited from the comments of seminar participants at the University of Chicago, Harvard University, and the University of Michigan. Numerous research assistants helped implement and analyze the experiment; we are particularly indebted to Guillermo Moloche and Stephen Weinberg. Laibson acknowledges financial support from the National Science Foundation (SBR-9510985), the National Institutes of Health (XXX), and the MacArthur Foundation.


1 Introduction

Cognitive resources should be allocated just like other scarce resources. Consider the problem of a chess player who must decide what move to make. This chess player faces a tradeoff between his desire to choose well and his desire to economize on decision time. The player should cut corners when analyzing his options, ignoring certain lines of play, even if there is a chance that he will consequently overlook some critical element of the game. Similar issues arise in almost every decision task. Where should I go to college? Which job offer should I take? Which house should I buy? In all of these problems, the decision-maker should not insist on identifying the best possible selection. Rather, he should make an educated decision and hope for the best. Often the decision-maker must make educated guesses, say when time constraints preclude a complete analysis of the available options.

All economists are familiar with the arguments of the previous paragraph. Since the work of Herbert Simon (1955), economists have recognized that cognition/attention is costly and the efficient allocation of cognition is a critical element of human decision-making. This paper moves beyond such observations by providing an operational theory of cognition, which makes sharp quantitative predictions that can be empirically tested. We provide such a test in this paper.

Our model has three elements: a set of cognitive operations, a set of state variables that encode the current state of beliefs, and a value function. Cognitive operations include the different categories of analysis that we use to deepen our understanding of a given problem. For example, consider the selection of an undergraduate college. Thinking more deeply about a particular college is a cognitive operation that will refine one's estimate of the value of attending that college. For the college choice problem, we could define a long list of cognitive operations (e.g., thinking about Boston University, thinking about Brown University, thinking about Cornell University, etc.).

Cognitive state variables encode the current state of the decision-maker's beliefs. For example, cognitive state variables will capture one's expectations about the value of each of the colleges in one's choice set. Naturally, one's expectations about any given college will be revised if one devotes incremental thought to that college. Cognitive state variables will also capture one's sense of how thoroughly each college has been considered. After dedicating hours of thought to Brown and no thought to other schools, the decision-maker will know that she has less to learn about Brown than Cornell.

The value function is used to evaluate the expected benefit of each cognitive operation. For example, should the decision-maker invest additional time thinking about Boston University or Cornell? Thinking incrementally about Cornell may substantially deepen one's knowledge about Cornell, but this information will not be useful if the decision-maker has a strong preference for an urban lifestyle. There is little use thinking about an option that is unlikely to be chosen at the end of the day. The value function enables the decision-maker to reason in such an option-theoretic way, directing thought in the most productive directions.

Our value function approach uses the standard dynamic programming toolbox, but our approach does not assume that the decision-maker knows the optimal value function. Instead, we propose a modelling strategy which uses 'proxy value functions' that provide good approximations of the optimal value function. Just like optimal value functions, these proxy value functions encode knowledge states and evaluate the usefulness of potential cognitive operations. We argue that proxy value functions are methodologically superior to optimal value functions because proxy value functions generate similar qualitative and quantitative predictions but are easy to work with. By contrast, optimal value functions cannot be calculated for realistically complex bounded rationality problems.

To model complex decisions, we propose a three-step search-theoretic decision algorithm. First, the algorithm evaluates the expected benefit of various cognitive operations. The expected benefit is related to the predicted likelihood that a cognitive operation will reveal information that will change the decision-maker's planned action. Second, the algorithm executes the cognitive operation with the greatest expected benefit. Third, the algorithm repeatedly cycles through these first two steps, stopping when the cognitive costs of analysis outweigh the expected benefits. We call this the directed cognition model.

This paper shows how to practically operationalize this framework for complex decision problems with any number of state variables. To demonstrate the operation of our approach, we analyze a one-player decision problem. We then compare the results of this analysis to the decisions of experimental subjects. In this application, we effectively use hundreds of continuous state variables. Nevertheless, such an application is easy to execute with our proxy value functions, though it is impossible to execute with optimal value functions.

We believe that our model realizes several goals. First, our model is psychologically plausible because it is based on the actual decision algorithms that subjects claim to use. Second, the model is broadly applicable, because

it can be used to analyze a wide range of decision problems, including sequential games. Third, the model generates precise quantitative predictions and can be empirically tested; such tests are provided in this paper using data from an experiment on Harvard undergraduates. The data reject the rational model (i.e., zero cognition cost) in favor of our bounded rationality alternative.

Our approach combines numerous strands in the existing literature on bounded rationality. First, our conceptual approach extends the satisficing literature which was pioneered by Herbert Simon (1955) and subsequently formalized with models of deliberation costs by Conlisk (1996), Payne et al. (1993), and others. Second, our approach extends the heuristics literature started by Daniel Kahneman and Amos Tversky (1974) and Richard Thaler (1991). Third, our approach is motivated by experimental work by Colin Camerer et al. (1994) and theoretical work by Philippe Jehiel (1995) that argues that decision-makers may solve problems by looking forward, rather than using backward induction. Fourth, our emphasis on experimental evaluation is motivated by the work of Ido Erev and Alvin Roth (1998) and Colin Camerer and Teck-Hua Ho (1999), who develop and then econometrically test learning models.

In section 2 we describe our theoretical approach and motivate our use of proxy value functions. In section 3 we introduce a class of one-player decision trees and informally describe the way in which our model applies to this class of problems. In section 4 we provide a detailed description of this application. In section 5 we describe other algorithms that could be used to approximate human analysis of these decision trees. In section 6 we describe an experiment in which we asked undergraduates to solve these decision trees. In section 7 we compare the predictions of the models to the experimental results. In section 8 we discuss other implications and extensions of our framework. In section 9 we conclude.

2 The directed cognition approach to bounded rationality

2.1 A simple model of choice

Consumers routinely choose between multiple goods. We begin with the simplest version of such a decision problem. The consumer must choose either good B or good C. The value of the two goods is not immediately known, though careful thought will better reveal their respective values.


For example, two detailed job offers may be hard to compare immediately. It will be necessary to thoughtfully consider all of the features of the offers (e.g., health plan, pension benefits, vesting rules, maternity leave, sick days, etc.). After such consideration, the potential employee will be in a better position to compare the two options. Similar issues arise whenever a consumer chooses from a set of goods. Having all of the information is not enough. Thinking about the information is also necessary for a smart choice. We build a simple bounded rationality model that describes such thinking.

To analyze this problem, it is helpful to note that a decision between uncertain goods B and C is mathematically equivalent to a decision between a sure thing with payoff zero and good A, where A's payoff is the difference between the payoffs of goods B and C. In this section, we work with the 0 vs. A problem because it simplifies notation.

Let a_t represent the agent's expectations about the payoff of good A at time t. We assume that thinking about A leads to a change in the expectations,

da_t = σ(t) dz_t.    (1)

Here dz_t represents an independent Brownian increment, and captures knowledge gained from introspection. Finally, we assume that thinking generates instantaneous shadow cost q. We can now formally state the agent's problem. The agent picks a time, T, at which to stop thinking such that she maximizes

E[max{a_T, 0} − qT].    (2)

Note that the stopping time need not be fixed in advance, but can be continuously updated as the consumer learns more about the likely value of A. Finally, it is important to emphasize that the state variable in this problem is a, the agent’s expectation about the state of the world. Throughout the analysis that follows, state variables will represent the state of knowledge of the decision-maker.
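To make the stopping problem concrete, here is a minimal Monte Carlo sketch. Everything in it is our own illustration: the function name and parameter values are hypothetical, and the threshold rule it simulates anticipates the policy derived in section 2.5 ("stop when |a_t| ≥ ā").

```python
import numpy as np

rng = np.random.default_rng(0)

def stop_at_threshold(a_bar, sigma=1.0, q=0.1, dt=0.01, t_max=500.0):
    """One draw of max(a_T, 0) - q*T: simulate da_t = sigma*dz_t from a_0 = 0
    and stop thinking once |a_t| >= a_bar (or at the safety cap t_max)."""
    a, t = 0.0, 0.0
    while abs(a) < a_bar and t < t_max:
        a += sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return max(a, 0.0) - q * t

# alpha = 1/4 is the optimal ("divine cognition") threshold of section 2.5;
# the sample mean should approach V(0) = alpha*(1/2 - alpha)*sigma^2/q = 0.625.
sigma, q = 1.0, 0.1
a_bar = 0.25 * sigma**2 / q
print(np.mean([stop_at_threshold(a_bar, sigma, q) for _ in range(1000)]))
```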

2.2 Rational bounded rationality: divine cognition

Suppose our decision-maker can costlessly and instantaneously think about the solution to the problem posed in equations (1) and (2). This assumption is internally inconsistent, because this costless and instantaneous thinking only applies to the solution of the maximization problem (eqs. (1)-(2)) and

not to the underlying analysis of good A, where thinking is assumed to impose shadow cost q. Nevertheless, this internally inconsistent assumption is an interesting benchmark, and it is the only way forward for a mainstream economist whose only behavioral model is maximization. We characterize the results of this ‘rational’ bounded rationality approach in Appendix A assuming either that σ(t) is constant or that σ(t) is affine and decreasing. This analysis is standard, using continuous-time Bellman Equations and boundary conditions to solve for value functions and optimal policies. We refer to such rational bounded rationality as divine cognition, since the boundedly rational decision-maker (who pays cost q to think about A) miraculously and costlessly knows the optimal solution to the problem of how long to think about A. In other words, divine cognition assumes that thinking about A is costly but that thinking about thinking about A is costless.

2.3 The problem with divine cognition

The divine cognition model has two clear advantages. It makes sharp predictions and it is highly parsimonious. But it has one overwhelming disadvantage: realistically, it can only be applied to a small set of situations. Generic generalizations of our decision problem are not analytically tractable.[1] For example, we cannot let the number of goods go from two to N. We cannot consider σ functions that are more general than the constant and affine cases. We cannot consider the empirically sensible case in which σ depends not only on the total time that has transpired, but also on the specific allocation of time among different cognitive activities: σ(t_f, t_g, t_h, ...). We cannot let σ vary with the particular information that was recovered during those previous searches: σ(t_f, θ_f, t_g, θ_g, t_h, θ_h, ...). We cannot consider the possibility that learning about good i also reveals some information about good j (i.e., covariances in information revelation). These are just a few of the realistic attributes of real-world problems.

[1] A strand of the operations research literature is devoted to finding particular classes of problems whose solutions have a simple characterization. One of them uses Gittins indices (Gittins 1979), basically a means to decouple n choice problems. The Gittins solution applies only to setups with stationary environments (such as multi-arm bandit problems), and in particular does not apply (as stressed by Gittins) to finite horizon problems. See Weitzman (1979) and Roberts and Weitzman (1980) for related techniques that solve special cases of the divine cognition problem. We consider problems that do not fall within this class of special cases.

At the moment economic research is hamstrung by its reliance on pure maximization. Economists can't evaluate many real-world problems because we can't solve for the optimal value functions that characterize those problems. We specialize in building toy models that deliver intuition. But we might be more scientific if we could build practical models that make quantitatively meaningful predictions that can be tested.

Moreover, the constraints that economists face are not just momentary difficulties that will pass once we have a new generation of computers or better knowledge of continuous-time maximization techniques. Many real-world problems have far too many state variables to be amenable to our standard methods. Thinking about N goods with a single constant and uniform variance implies N state variables. Thinking about N goods with history-dependent variances generates 2N state variables. Thinking about N goods with history-dependent variances and covariances generates N(N+3)/2 state variables. Thinking about N goods with M attributes with history-dependent variances and covariances generates MN(MN+3)/2 state variables. And so on. Such multidimensional problems lie far, far beyond our current capabilities. At the moment, sophisticated numerical methods can solve generic problems with as many as ten continuous state variables.

Imagine that you are trying to model the boundedly rational thought process of a college student who is thinking about a residential choice. The student picks among 13 dormitories and 4 off-campus apartments, each of which has 8 attributes. For the last case discussed above, the divine cognition model requires that the researcher solve a dynamic programming problem with 9452 continuous state variables. This example is extreme. Perhaps it is not necessary to model the covariances. Perhaps half of the attributes can be ignored because they are sufficiently minor. Perhaps all of the off-campus housing options can be ignored because few students ultimately choose them. In the stripped-down problem, the student chooses among 13 dormitories, each of which has 4 attributes. With history-dependent variances and no covariances, this problem still generates an unworkable number of state variables: 104.

Some economists might argue that we clearly have our work cut out for us. To study bounded rationality (and the rich cognitive states that accompany such problems), we should get busy learning how to solve high-dimensional dynamic programming problems. We suggest that another approach may be more fruitful. We motivate this alternative approach with two observations. First, as we have already noted, the divine cognition approach is not practically feasible. Second, even if it were feasible, the mind-boggling complexity of the resulting analysis should make us at least a little skeptical that it is a good representation of real-world decision-making.
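The state-variable counts in the residential-choice example can be checked directly; a two-line sanity check (our own illustration):

```python
options, attributes = 13 + 4, 8          # dorms + apartments, attributes each
n = options * attributes                 # 136 expectation state variables
print(n * (n + 3) // 2)                  # variances and covariances: 9452

dorms, attributes = 13, 4                # the stripped-down problem
print(2 * dorms * attributes)            # history-dependent variances only: 104
```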

2.4 Directed cognition

We suggest an alternative framework that can handle arbitrarily complex problems, but is nevertheless easy to implement. Moreover, this alternative is parsimonious and makes precise quantitative predictions. We call this approach directed cognition. Specifically, we propose the use of bounded rationality models in which decision-makers use proxy value functions that are not fully forward looking. We make this approach precise below and characterize the proxy value functions. This approach is particularly apt for bounded rationality applications, since modelling cognitive processes necessarily requires a large number of cognitive state variables. Memories and beliefs are all state variables in decisions about cognition.

Reconsider the boundedly rational decision problem described above. The consumer picks a stopping time, T, to maximize

E[max{a_T, 0} − qT]   where   da_t = σ(t) dz_t.

If the consumer had access to a proxy value function, say V̂(a, t), the consumer would continue thinking as long as she expected a net improvement in expected value. Formally, the agent will continue thinking if

max_{τ ∈ ℝ₊} E_t[V̂(a_{t+τ}, t + τ)] − V̂(a_t, t) − qτ > 0.    (3)

We could assume that the agent continues thinking for duration τ* before calculating a new τ*′. Or we could assume that she continuously updates her evaluation of equation (3), stopping thinking only when τ* = 0. In this section, we make the latter assumption for tractability. We now turn to the selection of the proxy value function. In this paper we consider the simplest possible candidate (for the choice between a and 0),

V̂(a, t) = max{a, 0}.    (4)

Equation (4) implies that the decision-maker holds partially myopic beliefs, since she fails to recognize option values beyond the current thinking horizon of τ ∗ .


Note that this framework can be easily generalized to handle problems of arbitrary complexity. For example, if the agent were thinking about N goods, then her proxy value function would be

V̂(a_t¹, a_t², ..., a_t^N, t) = max{a_t¹, a_t², ..., a_t^N}.

We now consider this general N-good problem and analyze the behavior that it predicts. Suppose that thinking for interval τ about good A_i transforms one's expectation from a_t^i to a_t^i + η(i, τ, t), where η(i, τ, t) is a mean-zero random variable. Let a_t^{−i} = max_{j≠i} a_t^j. Then the expression

E_t[V̂(a_t¹, ..., a_t^i + η(i, τ, t), ..., a_t^N, t + τ)] − V̂(a_t¹, a_t², ..., a_t^N, t)    (5)

can be rewritten as

E[max(a_t^i + η(i, τ, t), a_t^{−i})] − max(a_t^i, a_t^{−i})
   = E[(a_t^i + η(i, τ, t) − a_t^{−i})⁺] − (a_t^i − a_t^{−i})⁺,

where x⁺ = max{0, x}. Let σ(i, τ, t) represent the standard deviation of η(i, τ, t), and define u such that η(i, τ, t) = σ(i, τ, t)u. We assume that such a random variable u exists and that it is symmetric around 0. Then the cognition decision reduces to

(i*, τ*) ≡ arg max_{i,τ} w(a_t^i − a_t^{−i}, σ(i, τ, t)) − qτ,    (6)

where

w(a, σ) = v(a, σ) − a⁺ = v(−|a|, σ),    (7)

v(a, σ) = a Φ(a/σ) + σ φ(a/σ).    (8)

For the case in which thinking generates normally distributed increments to knowledge, we can write

w(a, σ) = −|a| Φ(−|a|/σ) + σ φ(a/σ).    (9)

Our boundedly rational agents won't necessarily use the "true" distribution of the information shocks u. Our agent may use simplified versions of it.

Hence, we also consider the symmetric distribution with atoms at u = ±1 (each with probability 1/2). For this special case, which we denote with the superscript 0, we get

v⁰(a, σ) = { (a + σ)/2 if |a| ≤ σ;  a⁺ if |a| > σ },

so that

w⁰(a, σ) = (σ − |a|)⁺ / 2.

The simple, linear form above makes w⁰ convenient to use.
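For readers who want to experiment with these objects, here is a direct transcription of equations (8)-(9) and the two-atom simplification into Python (a sketch; the function names are ours):

```python
import numpy as np
from scipy.stats import norm

def v(a, sigma):
    """Equation (8): v(a, sigma) = a*Phi(a/sigma) + sigma*phi(a/sigma)."""
    return a * norm.cdf(a / sigma) + sigma * norm.pdf(a / sigma)

def w_normal(a, sigma):
    """Equation (9): option value of incremental thought, normal innovations."""
    return -abs(a) * norm.cdf(-abs(a) / sigma) + sigma * norm.pdf(a / sigma)

def w_atoms(a, sigma):
    """Two-point case u = +/-1: w0(a, sigma) = (sigma - |a|)^+ / 2."""
    return max(sigma - abs(a), 0.0) / 2.0

# The gain from thinking is largest when the options are hardest to rank (a near 0).
for a in [0.0, 0.5, 1.0, 2.0]:
    print(a, round(w_normal(a, 1.0), 4), round(w_atoms(a, 1.0), 4))
```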

2.5 Comparing directed cognition to divine cognition

We are now in a position to compare the predictions of the two models. Thinking for τ more minutes will reveal an amount of information of standard deviation

σ(t, τ) = (∫_t^{t+τ} σ²(s) ds)^{1/2}.

The strategy in the directed cognition model is to continue thinking iff there is a τ > 0 such that w(a_t, σ(t, τ)) − qτ > 0. This leads to the policy "continue iff |a_t| < ā_t", where ā_t is implicitly defined by

sup_{τ ≥ 0} w(ā_t, σ(t, τ)) − qτ = 0.    (10)

For the standard case of the directed cognition model (i.e., normally distributed information innovations), ā_t is defined by

sup_{τ ≥ 0} −ā_t Φ(−ā_t/σ(t, τ)) + σ(t, τ) φ(ā_t/σ(t, τ)) − qτ = 0.

For the case with atoms of information at ±1, ā_t is defined by

sup_{τ ≥ 0} (σ(t, τ) − |ā_t|)⁺/2 − qτ = 0,

so

ā_t = sup_{τ ≥ 0} σ(t, τ) − 2qτ.

We now consider the special case in which σ²(t) = σ², so σ²(t, τ) = τσ². We call this the constant variance case. Appendix A also considers the case in which the variance falls linearly with time until it hits zero. For the constant variance case the value function is independent of t:

V(a) = { 0 if a ≤ −ā;  α(1/2 − α)σ²/q + a/2 + (q/σ²)a² if −ā < a < ā;  a if ā ≤ a }.

The stopping rule is "stop if |a| ≥ ā = ασ²/q." For the rational boundedly rational model, α = 1/4. In the directed cognition model with normally distributed innovations, α = .1012. For the directed cognition model with atoms at ±1, α = 1/8. Figure 1 plots the value function under the three different cases: α = 1/4 (solid line), α = .1012 (dot-dashed line), and α = 1/8 (dashed).

[Figure 1 — Value functions.]

We can also calculate the expected additional thinking time until stopping,

T(a) = E[T | a] = { 0 if |a| ≥ ā;  −a²/σ² + α²σ²/q² if |a| < ā }.    (11)

This derivation is reported in Appendix A.
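The three α values can be recovered numerically from the defining condition (10). The sketch below is our own check, with σ = q = 1 since α is scale-free: it solves sup_τ w(ā, σ√τ) − qτ = 0 for ā and reports α = āq/σ².

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import norm

sigma, q = 1.0, 1.0

def best_gain(a_bar, w):
    """sup over tau of w(a_bar, sigma*sqrt(tau)) - q*tau, condition (10).
    (tau is bounded away from 0 so the trivial tau = 0 boundary is excluded.)"""
    res = minimize_scalar(lambda tau: -(w(a_bar, sigma * np.sqrt(tau)) - q * tau),
                          bounds=(0.01, 10.0), method="bounded")
    return -res.fun

w1 = lambda a, s: -abs(a) * norm.cdf(-abs(a) / s) + s * norm.pdf(a / s)
w0 = lambda a, s: max(s - abs(a), 0.0) / 2.0

for name, w in [("normal innovations", w1), ("atoms at +/-1", w0)]:
    a_bar = brentq(lambda a: best_gain(a, w), 1e-4, 1.0)
    print(name, round(a_bar * q / sigma**2, 4))   # ~.1012 and .125
```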

Finally, we can also calculate the expected consumption utility,

C(a) = E[a_T⁺ | a] = { 0 if a ≤ −ā;  (ασ²/q + a)/2 if −ā < a < ā;  a if ā ≤ a }.

2.6 Quantitative relationship between the value functions of divine and directed cognition

We can formalize the normative gap between divine and directed cognition by bounding the distance between their respective value functions. We begin with a general result before turning to a useful corollary.

Proposition 1 Let V(x) represent the value function under the optimal policy (divine cognition). Suppose the variance function, σ(·) = (σ(t))_{t≥0}, is deterministic and weakly decreasing.[2] Then,

S⁰(x, √(2/π) σ(·)) ≤ S¹(x, σ(·)) ≤ V¹(x, σ(·)) ≤ V(x, σ(·)) ≤ S²(x, σ(·)) ≤ S⁰(x, σ(·)) + min(|x|, S⁰(0, σ(·)))/2,    (12)

where Sⁱ is the subjective value function,

Sⁱ(x, σ(·)) = sup_{t ∈ ℝ₊} wⁱ(x, (∫₀ᵗ σ²(s) ds)^{1/2}) − qt,

w⁰ and w¹ are as before in the text,

w²(x, s) = (√(x² + s²) − |x|)/2,

and V¹ is the objective value function induced by directed cognition with normal innovations.

[2] The nonincreasing σ implies that "high payoff" cognitive operations are done first. Hence, thinking displays decreasing returns over time.

Remark 2 One can always write wⁱ(x, s) = E[(x + su)⁺] − x⁺ = ∫_{−∞}^{+∞} (−|x| + su)⁺ fⁱ(u) du for some density fⁱ. Recall,

f⁰(u) = (δ(u + 1) + δ(u − 1))/2   (i.e., u = ±1 with equal probability)

f¹(u) = (1/√(2π)) e^{−u²/2}   (i.e., u is normally distributed)

For the new case, i = 2, we have the density

f²(u) = 1/(2(1 + u²)^{3/2}),

which has finite mean but infinite variance.

Proposition 1 is proved in Appendix B. The Proposition is stated in terms of "subjective value functions", which bound the objective value functions. Such subjective value functions represent the expected utility outcomes under the mistaken assumption that the current cognitive operation (with incremental stopping time τ) is the final cognitive operation. The subjective value functions are therefore based on static beliefs, which presume that there will be no future cognitive operations. Such subjective value functions are relatively easy to calculate because they assume deterministic stopping times. By contrast, optimal value functions incorporate optimally chosen stopping times that depend on future realizations of random variables.

Proposition 1 implies that the objective value functions for divine cognition (V) and directed cognition (V¹) are bounded below by S¹ and above by S². S¹ represents the subjective value function associated with normal innovations. S² represents the subjective value function associated with the f²(u) density.

It is useful to evaluate the bounds at the point x = 0, as this is where the option value V(x) − x⁺ is greatest.[3] V(0, σ(·)) gives the order of magnitude of the option value. With σ(0, t) := (∫₀ᵗ σ²(s) ds)^{1/2}, the S⁰ bounds of Proposition 1 imply

S⁰(0, √(2/π) σ(·)) ≤ V¹(0, σ(·)) ≤ V(0, σ(·)) ≤ S⁰(0, σ(·)),    (13)

or,

sup_{t ∈ ℝ₊} (1/√(2π)) σ(0, t) − qt ≤ V¹(0, σ(·)) ≤ V(0, σ(·)) ≤ sup_{t ∈ ℝ₊} (1/2) σ(0, t) − qt.

The two objective value functions, V¹ and V, are bounded below by the subjective value function associated with the w⁰-strategy with σ multiplied by factor √(2/π) ≈ .80. V¹ and V are bounded above by the subjective value function associated with the w⁰-strategy. So the difference between subjective value functions V¹ and V is bounded by the difference generated by standard deviations √(2/π)σ ≈ .8σ and σ. In this sense, the two value functions are relatively close.

[3] This can easily be proved. [...]

Proposition 1 has a corollary that reinforces these quantitative interpretations.

Corollary 3

V(x, √(2/π) σ(·)) − min(|x|, S⁰(0, √(2/π) σ(·)))/2 ≤ S⁰(x, √(2/π) σ(·)).    (14)

Evaluating the corollary at x = 0 implies

V(0, √(2/π) σ(·)) ≤ V¹(0, σ(·)) ≤ V(0, σ(·)).    (15)

So the difference between V¹ and V is bounded by the difference between objective value functions generated by standard deviations √(2/π)σ ≈ .8σ and σ. Note that the factor √(2/π) is the same whether we use the S⁰ bounds (above) or the V bounds.

Comparative static analysis also implies that our directed cognition model is "close" to the divine cognition model. The subjective and objective value functions are all increasing in σ(·) and decreasing in q. They also share other properties. All of the value functions tend to x⁺ as |x| → ∞; rise with x; have slope 1/2 at 0; and are symmetric around 0 (e.g., V(x) − x⁺ is even).

This section has described our directed cognition model. We were able to compare it to the divine cognition model for a special case in which the decision-maker chooses between A and 0.[4] For this extremely simple example, the divine cognition approach is well-defined and solvable in practice. In most problems of interest, the divine cognition model will not be of any use, since economists can't solve dynamic programming problems with a sufficiently large number of state variables. Below, we turn to an example of such a problem. Specifically, this problem involves the analysis of 10×10 (or 10×5) decision trees. Compared to real-world decision problems, such trees are not particularly complex. However, even simple decision trees like these generate hundreds of cognitive state variables. Divine cognition is useless here. But directed cognition, with its proxy value function, can be applied without difficulty. We turn now to this application.

[4] All of the bounding results in this subsection assume that the decision-maker selects among two options (n = 2). We can show that the lower bounds that we derive generalize to the case n ≥ 3. Obtaining analogous upper bounds for the general case remains an open question.
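As an illustration of how tight these bounds are, consider the constant-variance case of section 2.5 (a weakly decreasing boundary case): V(0) and V¹(0) equal α(1/2 − α)σ²/q at α = 1/4 and α ≈ .1012 respectively, while the S⁰ bounds at x = 0 are σ′²/(16q) for the relevant σ′. A small numerical check (our own):

```python
import math

sigma, q = 1.0, 1.0
value_at_zero = lambda alpha: alpha * (0.5 - alpha) * sigma**2 / q

lower = (2 / math.pi) * sigma**2 / (16 * q)   # S0(0, sqrt(2/pi)*sigma)
upper = sigma**2 / (16 * q)                   # S0(0, sigma)
print(round(lower, 4),                        # 0.0398
      round(value_at_zero(0.1012), 4),        # V1(0) ~ 0.0404
      round(value_at_zero(0.25), 4),          # V(0)  = 0.0625
      round(upper, 4))                        # 0.0625
```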


3 Decision trees and an informal algorithm

We use the directed cognition model to predict how experimental subjects will make choices in decision trees. We analyze decision trees, since trees can be used to represent a wide class of problems. We describe our decision trees at the beginning of this section, and then apply the model to the analysis of these trees. At the end of the paper, we discuss how the model can be generalized to analyze sequential games of complete information.

Consider the decision tree in Figure 2, which is one of twelve randomly generated trees that we asked experimental subjects to analyze. Each starting box in the left-hand column leads probabilistically to boxes in the second column. Branches connect boxes and each branch is associated with a given probability. For example, the first box in the last row contains a 5. From this box there are two branches with respective probabilities .65 and .35. The numbers inside the boxes represent flow payoffs. Starting from the last row, there exist 7,776 outcome paths. For example, the outcome path that follows the highest probability branch at each node is ⟨5, 4, 4, 3, 1, −2, 5, 5, −3, 4⟩. Integrating with appropriate probability weights over all 7,776 paths, the expected payoff of starting from row five is 4.12.

We asked undergraduate subjects to choose one of the boxes in the first column of Figure 2. We told the subjects that they would be paid the expected value associated with whatever box they chose. What box would you choose?

< Insert Table 1 here >

To clarify exposition, we will refer to the final choice of a starting row as the "consumption choice." This observable choice represents a subject's response in the experiment. The subject also makes many "cognition choices" that guide her internal thought processes. Our goal is to jointly model consumption and cognition choices, and then test the model with the observed consumption choices.

3.1 An informal application of the directed cognition model

The directed cognition model can be used to analyze trees like those in Figure 2. We informally describe such an application of the model in this


subsection, before providing a mathematically precise description in the next subsection.

Our application is built on a single basic cognitive operation: extending a partially examined path one column deeper into the tree. Such extensions enable the decision-maker to improve her forecast of the expected value of any given starting row. For example, consider again the last row of Figure 2. Imagine that a decision-maker is trying to approximate the expected value of that starting row, and that he has calculated only the expected value of the two truncated paths leading from column 1 to column 2. Hence, the estimated value would be:

a₁₀ = 5 + (.65)(4) + (.35)(−5).

Following the highest probability path (i.e., the upper path with probability .85), the decision-maker could look ahead one additional column and come up with a revised estimated value:

a₁₀′ = a₁₀ + .65[.15(−3) + .5(4) + .2(−5) + .15(−4)].

Such path extensions are the basic cognitive operation that we consider. Note that path extensions replace the reduced-form cognitive operators in the model discussed in section 2. In that previous analysis we did not specify the nature of the cognitive operators, assuming only that the operators had shadow cost q and revealed information about the available choices. In the current application, the cognitive operators reveal structural information about the decision problem. Path extensions refine the decision-maker's expectations about the value of a starting row.

We will also allow concatenated path extensions. Specifically, if the decision-maker executes a two-step path extension from a particular node, he twice follows the branch with the highest probability. Hence, starting from the last row of Figure 2, his updated estimate would be

a₁₀′′ = a₁₀′ + (.65)(.5)[.3(0) + .05(3) + .65(3)].

Such concatenated path extensions generalize naturally. A τ-step path extension looks τ columns more deeply into the tree. At each intermediate node the extension follows the branch with highest probability.

A given path from a particular starting node can be extended in many different ways, since a path will generally have many subpaths. Let f represent one such path extension; f embeds information about the starting row, the subpath which is to be extended, and the number of steps in that extension. Let σ_f represent the standard deviation of the updated estimate resulting from application of f. Hence,

σ_f² = E(f(a_{i(f)}) − a_{i(f)})²,

where i(f) represents the starting row which begins the path that f extends, a_{i(f)} represents the initial estimate of the value of row i(f), and f(a_{i(f)}) represents the updated value resulting from the path extension.

We now apply the model of section 2. The decision-maker selects the path extension with the greatest expected benefit. Hence, the decision-maker picks the cognitive operation, f, which is maximally useful:

f* ≡ arg max_f w(a_{i(f)} − a_{−i(f)}, σ_f) − q_f.    (16)

Note that q_f is the cognitive cost associated with cognitive operator f. Hence, if f is a τ-step concatenated operation, then q_f = τq. The decision-maker keeps identifying and executing such cognitive operations until he concludes his search. Specifically, we adopt the following three-step procedure.

Step 1 Evaluate the expected gain of the available cognitive operations. The expected gain of a cognitive operation is the option value of changing one's consumption choices as a result of insights generated by the cognitive operation. The expected gain of the best cognitive operation is given by:

G := max_f w(a_{i(f)} − a_{−i(f)}, σ_f) − q_f.    (17)

For the decision tree application, the cognitive operations are path extensions.

Step 2 Choose and execute the best cognitive operation, i.e., f* such that

f* = arg max_f w(a_{i(f)} − a_{−i(f)}, σ_f) − q_f.

Apply f*, yielding updated information about the problem. For the decision tree application, the path extension with the highest expected gain is executed.

Step 3 Keep cycling back to Step 1, as long as the predicted marginal cost of another cycle of cognitive operations is less than the predicted marginal benefit, i.e., as long as G ≥ Fq. When cognition costs exceed cognition benefits, stop thinking and choose the consumption option which is currently best given what is already known about the problem: choose the starting row with the maximal expected value. Here Fq is the fixed cost of solving the problem in equation (17).
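In code, the three-step loop has a very small footprint. The sketch below is generic and entirely our own (all names are placeholders; the callables would be supplied by the decision-tree application of section 4):

```python
def directed_cognition(candidate_ops, expected_gain, execute, choose_best, F, q):
    """Generic Step 1-2-3 loop of the directed cognition model.

    candidate_ops -- () -> iterable of cognitive operations (path extensions)
    expected_gain -- f -> w(a_i(f) - a_-i(f), sigma_f) - q_f, as in (17)
    execute       -- performs f and updates the cognitive state variables
    choose_best   -- () -> consumption choice given current estimates
    """
    while True:
        gains = {f: expected_gain(f) for f in candidate_ops()}  # Step 1, costs F*q
        f_star = max(gains, key=gains.get)
        G = gains[f_star]
        if G > 0:
            execute(f_star)                                     # Step 2
        if G < F * q:                                           # Step 3: stop rule
            return choose_best()
```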

Before turning to the formal presentation of the application, it is helpful to discuss several observations about the model.

First, Step 1 is computationally complex. We view Step 1 as an approximation of the less technical intuitive insights that decision-makers use to identify cognitive operations that are likely to be useful. Even if Step 1 is intuition-based, these intuitions generate some cost. We assume that each pass through Step 1 has shadow cost Fq.

Second, if decision-makers use their intuition to identify cognitive operations with high expected gains, they may sometimes make mistakes and fail to pick the operation with the highest expected gain. Hence, it may be appropriate to use a probabilistic logit model to predict which cognitive operation the decision-maker will execute in Step 2. However, we overlook this issue in the current paper and assume that the decision-maker always selects the cognitive operation with the highest expected gain.

Third, in the decision tree application we use rational expectations to calculate σ, but many other approaches are sensible. For example, σ² could be estimated using past observations of (f(a_{i(f)}) − a_{i(f)})².

Fourth, we assume that Step 3 is backward looking. Specifically, the decision-maker uses past information about the expected gains of cognitive operators to determine whether it is worth paying cost Fq to generate a new set of intuitions. No doubt this treatment could be improved in several ways. But, consistent with our modeling strategy, we chose the simplest approach that was broadly in accord with the psychology and the "cognitive economics" of the task.

4 A formal statement of the model

This section formally describes our applied model. Because of the notational complexity, readers who are satisfied with the informal description of the directed cognition model may wish to jump ahead to section 5.

4.1 Notation for Decision Trees

The following notation is used to formally define the algorithm. A node n is a box in the game, and n′ represents a node in the next column. Let p(n′|n) represent the probability of the branch leading from n to n′.[5] The flow payoff of node n is u_n. For the games that we study the u_n have mean 0 and standard deviation σ_u.

[5] If no such branch exists, p(n′|n) = 0.


A path θ is a set of connected nodes (n₁, ..., n_m), with node n_i in column i. The starting node is b(θ) = n₁. Let π(θ) represent the probability (along the path) of the last node of the path, i.e. π(θ) = ∏_{i=1}^{m−1} p(n_{i+1}|n_i).[6] Let l(θ) = n_m represent the last node in path θ. C is the total number of columns in a given decision tree and col(θ) = m is the column of the last box in θ. Call θ_{[1,j]} = (n₁, ..., n_j) the truncation of θ to its j first nodes, and θ_{−1} = (n₁, ..., n_{m−1}) its truncation by the last node.

If θ = (n₁, ..., n_m) is a path, e(θ) is the path constructed by adding to the last node of θ the node with the highest probability branch: e(θ) = (n₁, ..., n_{m+1}), where[7]

p(n_{m+1}|n_m) = sup{p(n|n_m) : n branching from n_m}.

If Θ is a set of examined paths, e(Θ) is the set of its extensions: i.e., a path θ = (n₁, ..., n_{m+1}) ∈ e(Θ) iff (n₁, ..., n_m) = θ′_{[1,m]} for some θ′ ∈ Θ (i.e., θ is the extension of a segment of a path already in Θ), and (n_m, n_{m+1}) has the highest possible probability among branches starting from n_m that are not already traced in Θ:

p(n_{m+1}|n_m) = sup{p(n|n_m) : there is no θ′ ∈ Θ with θ′_{[1,m+1]} = (n₁, ..., n_m, n)}.

Note that extensions can branch from the "middle" of a pre-existing path.

A time index t represents the accumulated number of incremental extensions which have been implemented by the algorithm. Each tick in t represents an additional one-node extension of a path. θ(t) is the path which was extended/examined during period t. Θ(t) is the current set of all paths that have been extended/examined (including θ(t)). At time t, E(b, t) is the estimate of the value of starting node b. At time t, the "best alternative" to starting node b is starting node a(b, t). If b is the best node, then a(b, t) is the second-best node. If b is not the best node, then a(b, t) is the best node.
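A toy transcription of this notation may help fix ideas. It is our own sketch, with paths as tuples of node ids and trees encoded as dicts mapping each node to its {child: probability} branches (not the experiment's actual data format):

```python
from math import prod

def path_prob(theta, p):
    """pi(theta): probability, along the path, of reaching its last node
    (an empty product, i.e. 1, when theta is a single starting node)."""
    return prod(p[a][b] for a, b in zip(theta, theta[1:]))

def extend(theta, p, taken=frozenset()):
    """e(theta): append the highest-probability branch out of the last node
    whose one-step extension is not already traced in `taken`."""
    last = theta[-1]
    options = [n for n in p.get(last, {}) if theta + (n,) not in taken]
    if not options:
        return None          # theta has reached the last column
    return theta + (max(options, key=lambda n: p[last][n]),)
```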

4.2 The directed cognition algorithm

The algorithm uses two cost variables. Recall that Step 1 of the algorithm generates intuitions about the relative merits of different cognitive operations. Building these intuitions costs qF. Step 2 executes those cognitive operations. The cost of implementing a single cognitive operation is q,

[6] If m = 1, π(θ) = 1.

[7] In case of ties, take the top branch. Note that e(θ) is not defined if θ has already reached the last column and cannot be extended.


and our algorithm allows the decision-maker to implement strings of such operations. Hence, Step 2 costs will always be multiples of q: τq where τ ∈ {0, 1, ..., C − 2}.

To start the algorithm, set t = 0. Let Θ(0) contain the set of paths made of starting boxes. Set

E(b, 0) = u_n + Σ_{n′ branching from n} p(n′|n) u_{n′},

where n represents node (b, 1). Hence, E(b, 0) represents the flow payoff of starting box b plus the probability-weighted payoffs of the boxes that branch from starting box b. Now iterate through three steps, beginning with Step 1.

4.2.1 Step 1

For θ ∈ e(Θ(t)), let

[θ*, τ*] = arg max_{θ ∈ e(Θ(t)), 0 ≤ τ ≤ C − col(θ)} w(E(b(θ), t) − E(a(b(θ), t), t), π(θ)σ(τ)) − qτ.    (18)

The interpretation of the maximum is how much gain we can expect to get by exploring path θ*, τ* steps ahead. Recall, b(θ) is the starting box of θ, and a(b, t) is the best alternative to b.[8]

In the informal description of the algorithm, we assumed that the new information has standard deviation σ. In the equation above, the standard deviation is π(θ)σ(τ). The function σ(τ) is the rational expectations standard deviation generated by a concatenated τ-column extension of any path, given that the probability of reaching the beginning of the extended portion of the path is unity. The factor π(θ) adjusts for the probability of actually reaching the extended portion of the path. Recall that π(θ) is the probability of reaching the last node of path θ, i.e. π(θ) = ∏_{i=1}^{m−1} p(n_{i+1}|n_i). The σ(τ) function is derived below, after Step 3.

Call G the value of the maximum in (18). Go to Step 2.

[8] We adopt the convention that max S = −∞ if S is empty.


4.2.2 Step 2

In this step we execute the (concatenated) cognitive operation with the highest expected gain. We "explore path θ* at depth τ*". In detail:

i. If τ* = 0, go directly to Step 3.

ii. Let θ(t + 1) = θ*. In the set of examined paths, add θ(t + 1), and, if it is the extension of a path θ′ at its terminal node, remove that path:

Θ(t + 1) := Θ(t) ∪ {θ(t + 1)} \ {θ_{−1}(t + 1)}   if θ_{−1}(t + 1) ∈ Θ(t)
Θ(t + 1) := Θ(t) ∪ {θ(t + 1)}   otherwise.

iii. Update the value of the starting box b(θ(t + 1)). If b ≠ b(θ(t + 1)), then set E(b, t + 1) = E(b, t). If b = b(θ(t + 1)), then set

E(b, t + 1) = E(b, t) + π(θ(t + 1)) Σ_{n′ branching from n} p(n′|n) u_{n′},    (19)

where n = l(θ(t + 1)).

iv. Set τ* = τ* − 1. Set t = t + 1.

v. If τ* ≥ 1, set θ(t + 1) = e(θ(t)) and set θ* = e(θ(t)). Then, go to substep ii above.

vi. If τ* = 0, go to Step 3.
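Substep iii is the only place where beliefs change. As a sketch, using the same toy dict encoding as the section 4.1 sketch (u maps nodes to flow payoffs; all names are ours):

```python
def update_estimate(E_b, theta_new, p, u):
    """Equation (19): add the newly examined column's expected payoff,
    weighted by the probability of ever reaching it."""
    pi = 1.0
    for a, b in zip(theta_new, theta_new[1:]):   # pi(theta(t+1)) along the path
        pi *= p[a][b]
    n = theta_new[-1]                            # n = l(theta(t+1))
    return E_b + pi * sum(prob * u[child] for child, prob in p.get(n, {}).items())
```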

4.2.3 Step 3

i. If G ≥ qF, go to Step 1.

ii. If G < qF, stop the search and select as choice node arg max_{b∈B} E(b, t).

This ends the algorithm, but we have one remaining loose end. We still need to derive the standard deviation function σ(τ).


4.3 The value of σ(τ)

For a generic draw of probabilities at a node n′, we have r(n′) branches and probabilities that we can order: p₁(n′) ≥ ... ≥ p_{r(n′)}(n′). They are random variables. Recall that σ_u represents the standard deviation of the payoff in a box. Exploring one column ahead will give an expected square value:

σ²(1) = σ_u² E[p₁² + ... + p_r²].    (20)

Exploring two steps ahead, following the highest probability path between columns, gives:

σ²(2) = E[(p₁² + ... + p_r²)σ_u² + p₁² · σ²(1)] = (1 + E[p₁²])σ²(1).

In general:

σ²(τ) = (1 + E[p₁²] + ... + E[p₁²]^{τ−1})σ²(1) = ((1 − E[p₁²]^τ)/(1 − E[p₁²])) σ²(1).

Consistent with this, we set σ²(0) = 0.
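Given estimates of E[p₁²] and σ²(1), the recursion is one line. The branching draws and payoff standard deviation below are hypothetical, purely to make the sketch runnable:

```python
import numpy as np

def sigma_tau(tau, sigma1_sq, Ep1_sq):
    """sigma(tau) from the recursion above, with sigma(0) = 0."""
    if tau == 0:
        return 0.0
    return np.sqrt((1 - Ep1_sq**tau) / (1 - Ep1_sq) * sigma1_sq)

branch_draws = [np.array([.65, .35]), np.array([.5, .3, .2]), np.array([.85, .15])]
sigma_u = 3.4                                                           # assumed payoff s.d.
sigma1_sq = sigma_u**2 * np.mean([np.sum(d**2) for d in branch_draws])  # eq. (20)
Ep1_sq = np.mean([np.max(d)**2 for d in branch_draws])                  # E[p1^2]
print([round(sigma_tau(t, sigma1_sq, Ep1_sq), 3) for t in range(5)])
```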

5 Other decision algorithms

In the next section, we empirically compare the directed cognition model to the rational actor model and three other choice models, which we call the column cutoff model, the discounting model, and the FTL model.

The rational actor model with zero cognition cost is the standard assumption in almost all economic models. This corresponds to the limit case q = 0. For presentational simplicity, we refer to this benchmark as the rational actor model, but we emphasize that this standard model assumes both rationality and zero cognition costs.

The column cutoff model assumes that decision-makers calculate perfectly rationally but pay attention to only the first Q columns of the C-column tree, completely ignoring the remaining C − Q columns. The discounting model assumes that decision-makers follow all paths, but exponentially discount payoffs according to the column in which those payoffs arise. Both of these algorithms are designed to capture the idea that decision-makers ignore information that is relatively less likely to be useful. For

example, the discounting model can be interpreted as a model in which the decision-maker has a reduced probability of seeing payoffs in "later" columns.

We also consider a bounded rationality model that we called "follow the leaders" (FTL) in previous work (Gabaix and Laibson, 2000). This algorithm features a cutoff probability p. From any starting box b follow all branches that have a probability greater than or equal to p. Continue in this way, moving from left to right across the columns in the decision tree. If a branch has a probability less than p, consider the box to which the branch leads but do not advance beyond that box. Weight all boxes that you consider with their associated cumulative probabilities, and calculate the weighted sum of the boxes. This heuristic defines the FTL value of a starting box b. In our econometric analysis, we estimate p = .30; perfect rationality with zero cognition cost corresponds to p = 0.

More precisely,[9] we take the average payoffs over trajectories starting from box b, with the transition probabilities indicated on the branches, but with the additional rule that the trajectory stops if the last transition probability has been < p. Formally this gives:

E^{FTL(p)}(b) = Σ_{i=1}^{C} Σ_{n_i in column i} u_{n_i} Σ_{(n_j)_{j=2,...,i−1}, n_j in column j} p(n_i|n_{i−1}) ··· p(n₂|b) · 1_{p(n_{i−1}|n_{i−2}) ≥ p} ··· 1_{p(n₂|b) ≥ p}.    (21)

Recall that C is the number of columns in the decision tree. For example, consider the following FTL (p = .60) calculation for the second row in Figure 2. FTL follows all paths that originate in that row, stopping only when the last branch of a path has a probability less than p:

E^{FTL(p)}(b₂) = 0 + 2 + .15 · (−2) + .01 · (−4) + .14 · (−4) + .7(5 + .5 · 1 + .5 · 1) = 5.3.

This FTL value can be compared to the rational value of 5.78.

We consider FTL because it corresponds to some of our intuitions about how decision-makers "see" decision trees that do not contain large outlier payoffs.

[9] See Gabaix and Laibson (2000) for an alternative but equivalent description of FTL.
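A compact recursive implementation of (21) is possible; the sketch below is ours, with trees encoded as (payoff, [(probability, child), ...]) tuples rather than the experiment's actual format, and the example tree made up for illustration:

```python
def ftl_value(node, p_cut, follow=True):
    """FTL value: a box reached by a branch below p_cut is still counted,
    but the trajectory is not extended past it."""
    payoff, branches = node
    total = payoff
    if follow:
        for prob, child in branches:
            total += prob * ftl_value(child, p_cut, follow=(prob >= p_cut))
    return total

leaf = lambda u: (u, [])
toy = (5, [(.65, (4, [(.5, leaf(3)), (.5, leaf(-1))])), (.35, leaf(-5))])
print(ftl_value(toy, p_cut=.60))   # 6.5 on this made-up tree
```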


Subjects follow paths that have high probability branches at every node. In the process of determining which branches to follow, subjects see low probability branches that branch off of high probability paths. Subjects encode such low probability branches, but subjects do not follow them deeper into the tree.

All three of the quasi-rational algorithms described in this section (column cutoff, column discounting, and FTL) are likely to do a poor job when they are applied to a wider class of decision problems. None of these algorithms is based on a structural microfoundation (like cognition costs). The parameters in column cutoff, column discounting, and FTL are unlikely to generalize to other games. In addition, we do not believe that column cutoff, column discounting, and FTL are as psychologically plausible as the directed cognition model. Only the directed cognition model features the property of "thinking about when to stop thinking," an important component of any decision task. For all of these reasons we do not propose column cutoff, column discounting, and FTL as general models of bounded rationality. However, they are interesting benchmarks with which to compare the directed cognition model.

Finally, we need to close all of our models by providing a theory that translates payoff evaluations into choices. Since subjects rarely agree on the "best choice," it would be foolish to assume that subjects all choose the starting row with the highest evaluation (whether that evaluation is rational or quasi-rational). Instead, we assume that actions with the highest evaluations will be chosen with the greatest frequency. Specifically, assume that there are B possible choices with evaluations {E₁, E₂, ..., E_B} given by one of the candidate models. Then the probability of picking box b is given by:

P_b = exp(βE_b) / Σ_{b′=1}^{B} exp(βE_{b′}).    (22)

We estimate the nuisance parameter β in our econometric analysis.
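Equation (22) is the standard logit map. As a sketch (the β and the evaluations below are placeholders, not estimates):

```python
import numpy as np

def choice_probs(E, beta):
    """Logit choice probabilities of equation (22), computed stably."""
    z = beta * np.asarray(E, dtype=float)
    z -= z.max()                       # guard against overflow in exp
    expz = np.exp(z)
    return expz / expz.sum()

print(choice_probs([4.12, 5.78, 1.30, -2.00], beta=0.5).round(3))
```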

6 Experimental Design

We tested the model using experimental data from 259 subjects recruited from the general Harvard undergraduate population. Subjects were guaranteed a minimum payment of $7 and could receive more if they performed well. They received an average payment of $20.08.

6.1 Decision Trees

The experiment was based around twelve randomly generated trees, one of which is reproduced in Figure 2. We chose large trees, because we wanted undergraduates to use heuristics to "solve" these problems. Half of the trees have 10 rows and 5 columns of payoff boxes; the other half of the trees are 10 × 10. The branching structure which exits from each box was chosen randomly and is iid across boxes within each tree. The distributions that we used and the twelve resulting trees themselves are presented in an accompanying working paper. On average the twelve trees have approximately three exit branches per box (excluding the boxes in the last column of each tree, which have no exit branches).

The payoffs in the trees were also chosen randomly and are iid across boxes.[10] The distribution is given below:

individual box payoff = { uniform[−5, 5] with probability .9;  uniform[−5, 5] + uniform[−10, 10] with probability .1 }

Payoffs were rounded to whole numbers and these rounded numbers were used in the payoff boxes.

Experimental instructions (see appendix 1) described the concept of expected value to the subjects. Subjects were told to choose one starting row from each of the twelve trees. Subjects were told that one of the trees would be randomly selected and that they would be paid the true expected value for the starting box that they chose in that tree. Subjects were given a maximal time of 40 minutes to read the experimental instructions and make all twelve of their choices. We analyzed data only from the 251 subjects who made a choice in all of the twelve decision trees.

6.2 Debriefing form and expected value calculations

After passing in their completed tree forms, subjects filled out a debriefing form which asked them to "describe how [they] approached the problem of finding the best starting boxes in the 12 games." Subjects were also asked for background information (e.g., major and math/statistics/economics background). Subjects were then asked to make 14 expected value calculations in simple trees (three 2×2 trees, three 2×3 trees, and one 2×4 tree). Subjects were told to estimate the expected value of each starting row in these trees.

[10] One caveat applies. The payoffs in the boxes in the first three columns were drawn from the uniform distribution with support [−5, 5].


Figure 3 provides an example of one such tree. Subjects were told that they would be given a fixed reward (≈ $1.25) for every expected value that they calculated correctly (i.e., within 10% of the exact answer).

We gave subjects these expected value questions because we wanted to identify any subjects who did not understand the concept of expected value. We exclude from our econometric analysis any subject who did not answer correctly at least seven of the fourteen expected value questions. Out of the 251 subjects who provided an answer for all twelve decision trees, 230, or 92%, answered correctly at least seven of the fourteen expected value questions. Out of this subpopulation, the median score was 12 out of 14 correct; 190 out of 230 answered at least one of the two expected values in Figure 3 correctly.
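The "true expected value" used to pay subjects is a straightforward recursion over the tree. A sketch (ours), using the dict encoding of the section 4 sketches:

```python
def rational_value(node, p, u, memo=None):
    """Exact expected value of a starting box: flow payoff plus
    probability-weighted values of all successors (the q = 0 benchmark)."""
    memo = {} if memo is None else memo
    if node not in memo:
        memo[node] = u[node] + sum(prob * rational_value(child, p, u, memo)
                                   for child, prob in p.get(node, {}).items())
    return memo[node]
```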

7 Statistical analysis

We restrict analysis to the 230 subjects who met our inclusion criteria, but the results are almost identical when we include all 251 subjects who completed the decision trees. Had subjects chosen randomly, they would have chosen starting boxes with average payoff $1.30.[11] Had the subjects chosen the starting boxes with the highest payoff, the average chosen payoff would have been $9.74. In fact, subjects chose starting boxes with average payoff $6.72.

We are interested in comparing five different classes of models: rationality, directed cognition, column cutoff, discounting, and FTL. All of these models have a nuisance parameter, β, which captures the tendency to pick starting boxes with the highest evaluations (see equation 22). For each model, we estimate a different β parameter. Directed cognition, column cutoff, discounting, and FTL also require an additional parameter which we estimate in our econometric analysis.

7.1 Formal model testing

There are G = 12 games, B = 10 starting boxes per game, and I = 230 subjects who satisfy our inclusion criterion (they chose a row in all twelve decision trees and answered correctly at least half of the expected value


questions on the post-experiment debriefing form).

[11] In theory this would have been zero since our games are randomly drawn from a distribution with zero mean. Our realized value is well within the two-standard-error bands.

Call Xⁱ ∈ ℝ^{BG} the choice vector such that Xⁱ_{bg} = 1 if subject i played box b in game g, and 0 otherwise. Call P = E[Xⁱ] the true distribution of choices, P̂ := (1/I) Σ_{i=1}^{I} Xⁱ the empirical distribution of choices, Σ = E[Xⁱ Xⁱ′] − PP′ the (BG × BG) variance-covariance matrix of the choices Xⁱ, and Σ̂ := (Σ_{i=1}^{I} Xⁱ Xⁱ′/I − P̂P̂′)/(1 − 1/I) its estimator.

Say that a model m (rationality, BR, FTL, etc.) gives the value E^m_{bg} to box b in game g. Recall that we parameterize the link between this value and the probability that a player will choose b in game g by

P^m_{bg}(β) = (P(E^m, β))_{bg} = exp(βE^m_{bg}) / Σ_{b′=1}^{B} exp(βE^m_{b′g})

for a given β. To determine β, we minimize the distance (in ℝ^{BG}) between the empirical distribution P̂ and the distribution predicted by the model, given β. Specifically, we take

‖P‖ = (Σ_g Σ_b P_{bg}²)^{1/2}   for P ∈ ℝ^{BG},

L^m(P̂, β) = ‖P^m(β) − P̂‖,

L^m(P̂) = min_{β ∈ ℝ₊} L^m(P̂, β),

= kP m (β) − Pbk L Pb , β ³ ´ ³ ´ = min Lm Pb, β Lm Pb m

β∈R+

³ ´ ³ ´ m m m m b b b and set β := arg minβ∈R+ L P , β . We call L = L Pb the empirical distance between the model and the data. Call Lm and β m the analogues for the true distribution of choices, i.e. Lm = Minβ kP m (β) − P k. 12 b m . ε := Pb − P has a ³distribution L such We ´ √ now derive statistics on 0 that Iε ∼ N(0, Σ) for large I s. Because ε is small, Lm Pb = Lm (P ) + ∂P Lm (P ) · ε + O(1/I), and as, by the envelope theorem, ∂P Lm (P ) = ∆m , where:

we get:

m (β m ) Pbbg − Pbg = ∆m µ bg ´2 ¶1/2 P ³b m (β m ) 0 P − P bg b0 b0 g

12

(23)

We use the fact, exploited by Vuong (1989) and Hansen-Heaton-Luttmer (1995), that bm . b m to get the distribution of L we do not have to explicitly derive the distribution of β

27

b m = Lm + ∆m · ε + O (1/I) L

(24)

where · is the scalar product in RBG . Suppose that ∆m 6= 0, i.e. the model m is misspecified (this will √ bemthe 13 b − case for all the models considered ). Equation (24) implies that I(L m m0 m L ) ∼ N(0, ∆ Σ∆ ), which gives us a consistent estimator of the standard bm: deviation on L ³ ´1/2 0 m Σ m /I d b∆ d . (25) σ bLm = ∆ We will be interested in the difference Lm − Ln of the performance of two theories m and n. Equation (24) implies that b n = Lm − Ln + (∆m − ∆n ) · ε + O (1/I) bm − L L

so that we get a consistent estimator of the standard deviation of L̂^m − L̂^n:

σ̂_{L^m − L^n} = ((∆̂^m − ∆̂^n)′ Σ̂ (∆̂^m − ∆̂^n) / I)^{1/2}.    (26)

Those formulae carry over to models that have been optimized. Our models have a free parameter: in directed cognition, the parameter is q, the cost (per box) of cognition; in column discounting, it is δ, the discount rate; in column cutoff, it is Q, the final column analyzed; in FTL, it is p, the cutoff probability. Say that in general, when we consider a class of models M, it can have a real (or vector) parameter x: M = {M(x), x ∈ X_M}. For instance, if M = FTL and x = .25, then M(x) represents FTL with probability cutoff p = .25, and the space of parameter x is X_{FTL} = [0, 1]. The space can be discrete: for M = column cutoff, X_M = {1, 2, ..., 10}. For each x, M(x) gives a vector of values E^{M(x)} for the boxes, and hence a distance to the empirical distribution, for a given β: ‖P(E^{M(x)}, β) − P̂‖. For model M, we can optimize over parameter x to find the best possible fit by calculating:

L̂^M := min_{x ∈ X_M} min_{β ∈ ℝ₊} ‖P(E^{M(x)}, β) − P̂‖.    (27)

Also call $\hat{x}_{\mathcal{M}}$ and $\hat{\beta}_{\mathcal{M}}$ the argmin values in (27). As this is again an application of the delta method, the standard errors calculated in (25) and (26) remain valid even though we also optimize over the parameter $x$.

^{13} A test that $L^m \neq 0$ can be readily performed using the estimates $\hat{L}^m$ and $\hat{\sigma}_{L^m}$ given in the tables.
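To fix ideas, here is a compact sketch of the two-level estimation in (27): an outer search over the model parameter $x$ and an inner one-dimensional minimization over $\beta$. The helper names and the bounded search interval for $\beta$ are our own assumptions, not the paper's implementation.

```python
# A compact sketch (ours) of the two-level fit in eq. (27): outer search over
# the model parameter x, inner minimization over beta. E_of_x maps a parameter
# value x to the (B x G) array of model values E^{M(x)}.
import numpy as np
from scipy.optimize import minimize_scalar

def choice_probs(E, beta):
    """Logit probabilities: exp(beta*E_bg), normalized within each game g (columns)."""
    Z = np.exp(beta * E)
    return Z / Z.sum(axis=0, keepdims=True)

def L_of_beta(E, beta, P_hat):
    """Distance in R^{BG} between model and empirical choice frequencies."""
    return np.linalg.norm(choice_probs(E, beta) - P_hat)

def fit_model(E_of_x, x_grid, P_hat):
    """Return (L_hat, x_hat, beta_hat) minimizing ||P(E^{M(x)}, beta) - P_hat||."""
    best = (np.inf, None, None)
    for x in x_grid:
        E = E_of_x(x)
        res = minimize_scalar(lambda b: L_of_beta(E, b, P_hat),
                              bounds=(0.0, 50.0), method="bounded")
        if res.fun < best[0]:
            best = (res.fun, x, res.x)
    return best
```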


7.2 Empirical results

For the directed cognition model we estimate $q = .012$, implying that the shadow price of extending a path one column deeper into the tree is a little above 1 cent. Given the amount of time that subjects had, this $q$ value implies a time cost of 2.25 seconds per one-column extension.^{14} We view both of these numbers as reasonable; we had chosen 1 cent per box and 1 second per box as our preferred values before estimating them. For the column-cutoff model, we estimate $Q = 8$, implying that the decision-maker evaluates only the first 8 columns of each tree (and hence all of the columns in 10 × 5 trees). For the column-discounting model we estimate $\delta = .91$, implying that the probability that the decision-maker sees a payoff in column $c$ is $(.91)^c$. Finally, for FTL we estimate $p = .30$, implying that the decision-maker follows branches until she encounters a branch probability below .30. We had exogenously imposed a $p$ value of .25 in our previous work (Gabaix and Laibson, 2000).

Table 2 reports test statistics for each of our models. The rows of the table represent the different models, $m \in$ {rational, directed cognition, column-cutoff, column-discounting, FTL}. Column 1 reports the distance metric $\hat{L}^m$, which measures the distance between the estimated model and the empirical data. Columns 2-5 report the differences $\hat{L}^m - \hat{L}^{m'}$ and the associated asymptotic standard errors. When $\hat{L}^m - \hat{L}^{m'} > 0$, model $m$ performs relatively poorly, since the distance of model $m$ from the empirical data ($\hat{L}^m$) is greater than the distance of model $m'$ from the empirical data ($\hat{L}^{m'}$).

As column 2 shows, all of the decision cost models beat rationality. The differences between the distance measures are: $\hat{L}^{\text{rational}} - \hat{L}^{\text{directed cognition}} = 0.399$, $\hat{L}^{\text{rational}} - \hat{L}^{\text{column cutoff}} = 0.221$, $\hat{L}^{\text{rational}} - \hat{L}^{\text{column discounting}} = 0.190$, and $\hat{L}^{\text{rational}} - \hat{L}^{\text{FTL}} = 0.488$. All of these differences have associated t-statistics of at least 5. Among the decision cost models, the two winners are directed cognition and FTL. Both of these models dominate column cutoff and column discounting. The difference between directed cognition and FTL is not statistically significant. The general success of the directed cognition model is somewhat surprising, since column cutoff, column discounting, and FTL were designed explicitly for this game against nature, whereas directed cognition is a far more generally applicable algorithm.

^{14} Explain calculation.


8 Applications of the directed cognition model

8.1 N-player sequential games of perfect information

The model has a general structure (mental operators, selected on the basis of their expected net payoffs, evaluated via a proxy value function) that can be used to predict behavior in a wide range of settings. Here we sketch such an application to N-player sequential games of perfect information. In this game theory application, players choose between two types of cognitive operators: "forward sweeps" and "backward sweeps." Forward sweeps simulate game paths from left to right, following the arrow of time and following the branches that are predicted to convey the most useful information about the characteristics of the game. Backward sweeps execute constrained backward induction operations, where the backward induction is started at some selected node in the middle of the game and may follow only a subset of paths backward from that node. The quantity of anticipated information, $\sigma_f$, associated with these operations is approximated and used with the $w(x, \sigma)$ function to derive the expected gains of those cognitive operations. The application uses the same three-step procedure as above. Step 1 evaluates the expected gains from different forward and backward sweeps. Step 2 executes the sweep with the highest expected gain. Step 3 iterates back to the beginning of the cycle, until $G < qF$. Then the decision-maker stops thinking and chooses her "best" action. A sketch of this loop appears below.
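The following is a minimal sketch of that three-step loop, assuming hypothetical operator objects that carry a gain estimate and an execute action; the names and the stopping constant $F$ are illustrative placeholders, not the paper's implementation.

```python
# A minimal sketch (ours) of the three-step directed cognition loop described
# above. Operator objects, gain estimates, and the constant F are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CognitiveOperator:
    name: str                               # e.g., "forward sweep from node 7"
    expected_gain: Callable[[], float]      # proxy-value estimate of the gain
    execute: Callable[[], None]             # updates the cognitive state variables

def directed_cognition(operators: List[CognitiveOperator], q: float, F: float) -> None:
    """Step 1: evaluate operators; Step 2: run the best one; Step 3: repeat
    until the best expected gain G falls below q*F."""
    while True:
        G, best = max(((op.expected_gain(), op) for op in operators),
                      key=lambda pair: pair[0])
        if G < q * F:               # stopping rule: further thinking no longer pays
            break
        best.execute()              # deepen understanding along the chosen sweep
```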

8.2 Salience (attention)

The directed cognition model implies that the decision-maker thinks about only a small subset of the aspects of the decision-tree. In this sense, the directed cognition model provides a microfoundation for the psychological concepts of salience and attention. The model predicts that the decision-maker will pay attention to only a small amount of relevant information. By and large, most information in the decision-tree (and in life in general) will be ignored.

First, the decision-maker will not consider information that is not relevant to her decision. Decision-makers won't extend/see paths that start in low-payoff rows. For example, the algorithm does not extend/see any paths in Figure 2 that start in row seven. This row does not catch the eye of the decision-maker, or the attention of the algorithm, since the starting payoffs are not sufficiently promising.

In addition, the decision-maker will not consider information that is unlikely to make a big difference to her decision. In particular, she won't extend/see paths that start with low-probability branches. For example, the algorithm extends row 10, and this extension follows the branch with probability .65; the other branch (with probability .35) is never extended/seen past column two.

These salience properties will generalize to other settings. We attend to information sources that are likely to hold large amounts of information relevant to the decisions we will take. Thus, at a cocktail party we immediately tune into a conversation across the room if we hear our own name spoken. We sensibly focus on information that matters to us and screen out the rest.

8.3 Myopia

The directed cognition model generates decisions that necessarily look myopic from the perspective of a perfectly rational outsider. The algorithm executes operations that reveal relatively large amounts of information about expected payoffs. In almost all settings, the near-term future is more predictable than the distant future, so cost-effective information revelation will tend to overlook long-term events. Long-term events tend to be low probability events and hence are not “worth” evaluating. For example, our algorithm endogenously decides to ignore columns nine and ten in Figure 2. The algorithm estimates that the expected value of information in these columns is more than offset by the cognitive costs of processing that information. Hence, the model predicts that decision-makers will act as if the “future” doesn’t matter. The decision-makers are sophisticatedly ignoring the future, since the low level of predictability makes analysis cost-ineffective.

8.4 Comparative statics with incentives

The directed cognition model generates quantitative predictions about the role of incentives. For example, the model predicts what will happen as incentives are multiplicatively scaled from $a$ to $\lambda a$. The equilibrium strategy maximizes $E[\lambda a^+_\tau - q\tau] = \lambda E[a^+_\tau - (q/\lambda)\tau]$, so strategies adjust exactly as they would if the shadow cost $q$ were replaced by $q/\lambda$. Naturally, higher incentives induce longer thinking and more accurate choices.^{15} The model makes quantitative (testable) predictions about these relationships, as the sketch below illustrates.

^{15} A simple measure of the performance of the choice is the expected consumption utility of the decision-maker, $E[a_\infty 1_{a_T > 0}]$.
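As an illustration (our own, using the constant-variance formulas derived in Appendix A, where the stopping threshold is $\bar{a} = \alpha\sigma^2/q$ with $\alpha = 1/4$ and the expected thinking time at $a = 0$ is $\alpha^2\sigma^2/q^2$): scaling incentives by $\lambda$ is equivalent to replacing $q$ by $q/\lambda$, so the threshold widens by a factor of $\lambda$ and the expected thinking time grows by $\lambda^2$.

```python
# Illustration (ours, not from the paper): incentive comparative statics in the
# constant-variance case of Appendix A. Scaling payoffs by lam is equivalent to
# replacing the cognition cost q with q/lam.
alpha, sigma, q = 0.25, 1.0, 0.01

for lam in (1.0, 2.0, 4.0):
    q_eff = q / lam                        # effective cognition cost
    a_bar = alpha * sigma**2 / q_eff       # stopping threshold widens with lam
    T0 = alpha**2 * sigma**2 / q_eff**2    # expected thinking time grows as lam^2
    print(f"lam={lam:>3}: threshold={a_bar:6.1f}, expected thinking time={T0:8.1f}")
```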


8.5 Anchoring and adjustment

A large psychology literature has studied the phenomenon of anchoring and adjustment.^{16} When subjects think about a problem, they begin with an initial guess that is anchored to initial information. With further thought, their answers tend to adjust toward the correct answer: the more they are encouraged or allowed to think about the problem, the more adjustment occurs, and the closer the final answer is to the correct answer. The directed cognition model generates exactly this dynamic adjustment process. The model explains why initial guesses are distinct from final answers, and predicts quantitatively how additional thinking will affect the respondent's payoffs.

^{16} See Fiske and Taylor (1991, p. 389) for a review.

8.6 Contracts

Many economists believe that incomplete contracts arise because the costs of writing detailed contingent contracts offset their probabilistic benefits. Economists routinely assume that contracting costs exist, but rarely model those costs formally. The directed cognition model provides a preliminary step toward quantitatively measuring the cognition costs that arise when contracts are written or read. For example, our decision-tree application generates implied variation in the cognition costs required to satisfactorily analyze each of our twelve decision trees. By providing a quantitative theory of cognitive costs, the model can predict which situations will lead parties to avoid contract/cognition costs. Hence, extensions of the model can predict when parties will write incomplete contracts or rely on boiler-plate contracts.^{17}

^{17} Boiler-plate contracts do not need to be read because they are 'fair' or already well-understood.

9 Conclusion

We have described a model of boundedly rational choice. This model is modular, and there are many improvements that could be made to each of its parts: expanding the set of mental operators, improving the sophistication of the myopic proxy value function, adding trembles in the choice of the best cognitive operation, etc. In general, we have adopted the simplest possible version of each of the model's features. We wish to highlight two important shortcomings of the model: its practical implementation requires some computer programming, and it has so far been applied only to a restricted class of decision problems. On a positive note, we believe that the model realizes three goals. First, the model is psychologically plausible; indeed, subjects claim to use strategies similar to those in the model. Second, the model generates precise quantitative predictions and can be empirically tested; the data reject the rational model in favor of our bounded rationality alternative. Third, the general structure of the model (choice of mental operators, selected on the basis of their expected payoffs, evaluated via a proxy value function) can be applied to a wide class of problems. In another paper, we are using this framework to analyze N-player games. We hope that applications to more concrete economic settings, such as contracts, savings behavior, and portfolio choice, will follow.

10 Appendix A: Derivation of the value function for the divine and directed cognition models

The value function, $V(a, t)$, is characterized by a partial differential equation in the continuation region,
$$0 = -q + V_t + \frac{1}{2}\sigma^2(t)V_{aa}.$$
Because of the symmetry of the problem, the decision-maker stops when $a_t$ crosses either of the time-contingent thresholds $-\bar{a}_t$ and $\bar{a}_t$. At those thresholds standard value matching conditions apply: $V(-\bar{a}_t, t) = 0$, $V(\bar{a}_t, t) = \bar{a}_t$. Smooth pasting conditions also apply when the thresholds are chosen optimally: $V_a(-\bar{a}_t, t) = 0$, $V_a(\bar{a}_t, t) = 1$.

Let $T(a, t)$ represent the expectation of the remaining time spent on the problem, $T(a, t) = E[\tau \mid a_t = a] - t$. So $T(a, t)$ is the solution of the differential equation
$$0 = 1 + T_t + \frac{1}{2}\sigma^2(t)T_{aa} \tag{28}$$
with value matching conditions $T(-\bar{a}_t, t) = 0$ and $T(\bar{a}_t, t) = 0$.

Let $C(a, t)$ represent the expected value of consumption, not including cognition costs. Then $C(a, t) = E[a_\tau 1_{a_\tau > 0}] = E[a_\tau^+]$. The differential equation that characterizes $C$ is
$$\frac{1}{2}\sigma^2(t)C_{aa} + C_t = 0, \tag{29}$$

with value matching conditions $C(\bar{a}_t, t) = \bar{a}_t$ and $C(-\bar{a}_t, t) = 0$.

We consider two special cases which admit analytic solutions: $\sigma^2(t) = \sigma^2$ and $\sigma^2(t) = \max\{\sigma^2(0) - 2\beta t, 0\}$. We refer to these respectively as the constant and affine cases.

In the constant case, the problem is stationary and $t$ drops out of all expressions. The solution to the Bellman equation is
$$V(a, t) = \frac{q}{\sigma^2}a^2 + C_1 a + C_2 \tag{30}$$
with constants $C_1$ and $C_2$ determined by the value matching conditions. Smooth pasting conditions are also applied when we solve for the optimal $V$. For example, in the optimal case,
$$V(a, t) = \frac{q}{\sigma^2}a^2 + \frac{1}{2}a + \alpha\left(\frac{1}{2} - \alpha\right)\frac{\sigma^2}{q},$$
where $\bar{a}$ is the optimal stopping threshold and $\bar{a} = \alpha\frac{\sigma^2}{q}$, with $\alpha = \frac{1}{4}$. Using equation (28), we can show that the expected thinking time remaining is
$$T(a, t) = -a^2/\sigma^2 + \alpha^2\sigma^2/q^2. \tag{31}$$
Using equation (29), we can show that the expected consumption utility is
$$C(a, t) = \frac{1}{2q}\left(\alpha\sigma^2 + qa\right). \tag{32}$$
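As a sanity check, the closed forms (31) and (32) can be verified by simulation. The sketch below is ours; the parameter values are arbitrary, and the Euler discretization introduces a small overshoot bias at the thresholds.

```python
# Monte Carlo check (ours) of the constant-case formulas (31)-(32): simulate
# da = sigma*dW from a_0, stop at the thresholds +/- a_bar = alpha*sigma^2/q,
# and compare simulated means with the closed forms.
import numpy as np

alpha, sigma, q = 0.25, 1.0, 0.25
a_bar = alpha * sigma**2 / q            # optimal stopping threshold
a0, dt, n_paths = 0.0, 1e-3, 10_000
rng = np.random.default_rng(1)

a = np.full(n_paths, a0)
t_stop = np.zeros(n_paths)
a_stop = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)
for _ in range(1_000_000):              # hard cap; paths stop a.s. long before
    if not alive.any():
        break
    a[alive] += sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
    t_stop[alive] += dt
    hit = alive & (np.abs(a) >= a_bar)
    a_stop[hit] = a[hit]                # payoff a_tau at the stopping time
    alive &= ~hit

print("E[remaining time]: sim", t_stop.mean(),
      "vs (31):", -a0**2 / sigma**2 + alpha**2 * sigma**2 / q**2)
print("E[a_tau^+]:        sim", np.maximum(a_stop, 0).mean(),
      "vs (32):", (alpha * sigma**2 + q * a0) / (2 * q))
```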

Equations (31) and (32) apply to both the rational and the directed cognition cases of our model.

When the variance declines linearly (the affine case), it is convenient to let $t = 0$ represent the moment at which variance falls to zero. Then we have
$$\sigma^2(t) = -2s^2 t \ \text{ for } t \in [t_0, 0],\ t_0 < 0, \qquad \sigma^2(t) = 0 \ \text{ for } t > 0.$$
Now the solution of the Bellman equation is a relatively simple expression,
$$V(a, t) = C_1 v(a, -st) + C_2 a + qt, \tag{33}$$
where
$$v(a, \sigma) = a\Phi\left(\frac{a}{\sigma}\right) + \sigma\phi\left(\frac{a}{\sigma}\right) \tag{34}$$

with $\Phi(\cdot)$ the normal distribution function and $\phi(\cdot)$ the normal density. Note that $v(a, \sigma) \equiv E[(a + \sigma u)^+]$ for a standard normal $u$. The solution proposed in equation (33) can be verified by noting
$$v_a(a, \sigma) = \Phi\left(\frac{a}{\sigma}\right), \qquad v_\sigma(a, \sigma) = \phi\left(\frac{a}{\sigma}\right).$$
Solving for the constants $C_1$ and $C_2$ implies that the value function is given by
$$V(a, t) = \frac{\alpha + 2q/s}{2v(\alpha, 1) - \alpha}\left(v(a, -st) - \frac{a}{2}\right) + \frac{a}{2} + qt \tag{35}$$
for the stopping threshold $\bar{a}_t = -\alpha st$. The optimal $\alpha$ is the (smallest in $\mathbb{R}_+$) solution of
$$\frac{\phi(\alpha)}{2\Phi(\alpha) - 1} = \frac{q}{s}. \tag{36}$$
For boundedly rational agents reasoning with atoms at $\pm 1$, we have $\alpha = 1\Big/\left(2q/s + \sqrt{1 + (2q/s)^2}\right)$. The expected time remaining is
$$T(a, t) = -\frac{2v(a, -st) - a}{\left(2v(\alpha, 1) - \alpha\right)s} - t, \tag{37}$$

and the expected consumption utility is
$$C(a, t) = \frac{\alpha v(a, -st) + \left(v(\alpha, 1) - \alpha\right)a}{2v(\alpha, 1) - \alpha}. \tag{38}$$
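A short numerical companion to this appendix (our own sketch, with arbitrary parameter values): it checks that (33)-(34) satisfy the Bellman equation, solves (36) for the optimal $\alpha$, and verifies the identity $v(a, \sigma) = E[(a + \sigma u)^+]$ by simulation.

```python
# Numerical companion (ours) to the affine case. Parameter values are
# arbitrary illustrations, not estimates from the paper.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def v(a, sig):
    """v(a, sigma) = a*Phi(a/sigma) + sigma*phi(a/sigma), eq. (34)."""
    return a * norm.cdf(a / sig) + sig * norm.pdf(a / sig)

# 1. Check that V(a,t) = C1*v(a,-s*t) + C2*a + q*t (eq. 33) satisfies
#    0 = -q + V_t + (1/2)*sigma^2(t)*V_aa with sigma^2(t) = -2*s^2*t,
#    using central finite differences at an interior point.
s, q, C1, C2 = 1.3, 0.2, 0.8, 0.4
V = lambda a, t: C1 * v(a, -s * t) + C2 * a + q * t
a0, t0, h = 0.5, -2.0, 1e-4
V_t = (V(a0, t0 + h) - V(a0, t0 - h)) / (2 * h)
V_aa = (V(a0 + h, t0) - 2 * V(a0, t0) + V(a0 - h, t0)) / h**2
print("PDE residual (should be ~0):", -q + V_t + 0.5 * (-2 * s**2 * t0) * V_aa)

# 2. Solve eq. (36) for alpha: phi(alpha)/(2*Phi(alpha)-1) = q/s. The left-hand
#    side decreases from +infinity to 0 on (0, inf), so the root is unique.
def optimal_alpha(q_over_s):
    f = lambda a: norm.pdf(a) / (2 * norm.cdf(a) - 1) - q_over_s
    return brentq(f, 1e-8, 20.0)
print("alpha for q/s = 0.2:", optimal_alpha(0.2))

# 3. Check v(a, sigma) = E[(a + sigma*u)^+] for standard normal u.
u = np.random.default_rng(0).standard_normal(1_000_000)
a_, s_ = 0.7, 1.3
print("v:", v(a_, s_), " Monte Carlo:", np.maximum(a_ + s_ * u, 0).mean())
```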

11 Appendix B: Proof of Proposition 1

We derive each of the five inequalities.

1. $S^0\left(x, \sqrt{2/\pi}\,\sigma\right) \leq S^1(x, \sigma)$: because $w^1$ is convex in $x$ and $w^1_x(0, \sigma) = 1/2$, we have
$$w^1(x, \sigma(0, t)) - qt \geq w^1(0, \sigma(0, t)) + x/2 - qt = \frac{1}{\sqrt{2\pi}}\sigma(0, t) + x/2 - qt.$$
Taking the $\sup_t$ of both sides we get
$$S^1(x, \sigma) \geq x/2 + \sup_t \left\{\frac{1}{\sqrt{2\pi}}\sigma(0, t) - qt\right\} = x/2 + S^0\left(0, \sqrt{2/\pi}\,\sigma\right).$$
Applying $\max(\cdot\,, x^+)$ to both sides, we get $S^1(x, \sigma) \geq S^0\left(x, \sqrt{2/\pi}\,\sigma\right)$.

2. $S^1(x, \sigma) \leq V^1(x, \sigma)$: Call $S^1(x_t, t, k) = E\left[x^+_{t+k}\right] - qk$, so that $S^1(x, t) = \sup_{k \in \mathbb{R}_+} S^1(x, t, k)$. As, for $0 \leq s \leq k$, $S^1(x_t, t, k) = E\left[S^1(x_{t+s}, t+s, k-s)\right] - qs \leq E\left[S^1(x_{t+s}, t+s)\right] - qs$, taking the $\sup_{k \in \mathbb{R}_+}$ on the left-hand side we get $S^1(x_t, t) \leq E\left[S^1(x_{t+s}, t+s)\right] - qs$ for $s$ small enough. As $V^1(x_t, t) = E\left[V^1(x_{t+s}, t+s)\right] - qs$, we get, for $D^1(x_t, t) = V^1(x_t, t) - S^1(x_t, t)$, that $D^1(x_t, t) \geq E\left[D^1(x_{t+s}, t+s)\right]$, i.e. that $D^1$ is a supermartingale. The search process stops at a random time $T$, and then $S^1(x_T, T) = V^1(x_T, T) = x_T^+$, so $D^1(x_T, T) = 0$. As $D^1(x_t, t) \geq E\left[D^1(x_T, T)\right] = 0$, we can conclude $V^1(x_t, t) \geq S^1(x_t, t)$.

3. $V^1(x, \sigma(\cdot)) \leq V(x, \sigma(\cdot))$: this is immediate because the inequality holds for any policy $w^i$, as $V$ is the value of the best possible policy.

4. First, observe (with, again, $x_0 = x$):
$$\begin{aligned}
E[\max(x_\tau, 0)] &= E[x_\tau/2 + |x_\tau|/2] && \text{by the identity } \max(x, 0) = (x + |x|)/2 \\
&= x/2 + E[|x_\tau|]/2 && \text{as } x_t \text{ is a martingale which stops a.s.} \\
&\leq x/2 + E[x_\tau^2]^{1/2}/2 && \\
&= x/2 + \left(x^2 + E[\sigma(0, \tau)^2]\right)^{1/2}/2 && \text{as } x_t^2 - \sigma(0, t)^2 \text{ is a martingale} \quad (39) \\
&\leq x/2 + \left(x^2 + \sigma(0, m)^2\right)^{1/2}/2 && \text{with } m = E[\tau \mid x_0 = x], \text{ as } \sigma^2(0, t) \text{ is concave,}
\end{aligned}$$
so we get $V(x) \leq x/2 + \left(x^2 + \sigma(0, m)^2\right)^{1/2}/2 - qm$. Hence, with
$$W^2(x, \sigma) = x/2 + \sup_{t \in \mathbb{R}_+} \left\{\left(x^2 + \sigma(0, t)^2\right)^{1/2}/2 - qt\right\},$$
we get $V(x) \leq W^2(x)$. We just have to prove $W^2 = S^2$, i.e. $x/2 + (x^2 + s^2)^{1/2}/2 = w^2(x, s)$, which can be done by direct calculation of $w^2$.


5. We have, because $\sqrt{a + b} \leq \sqrt{a} + \sqrt{b}$:
$$x/2 + \left(x^2 + \sigma^2(0, t)\right)^{1/2}/2 - qt \leq \left(x + \left(x^2\right)^{1/2}\right)/2 + \sigma(0, t)/2 - qt = x^+ + \sigma(0, t)/2 - qt,$$
so by taking the $\sup_t$ of both sides we get $S^2(x, \sigma) \leq x^+ + S^0(0, \sigma)$, whose right-hand side is less than or equal to $S^0(x, \sigma) + \min\left(|x|, S^0(0, \sigma)\right)/2$ by direct verification using the definition of $w^0$. $\square$

12 References

Bazerman, Max and William F. Samuelson. "I won the auction but don't want the prize," Journal of Conflict Resolution, December 1983, 27, pp. 618-34.

Camerer, Colin, E. J. Johnson, T. Rymon, and S. Sen. "Cognition and Framing in Sequential Bargaining for Gains and Losses," in Ken Binmore, Alan Kirman, and P. Tani, eds., Frontiers of Game Theory. Cambridge, MA: MIT Press, 1994, pp. 27-47.

Camerer, Colin and Ho, Teck-Hua. "Experience weighted attraction learning in normal-form games," Econometrica, 1998.

Conlisk, John. "Why bounded rationality?" J. Econ. Literature, June 1996, 34 (2), pp. 669-700.

Erev, Ido and Roth, Alvin. "Predicting how people play games," Amer. Econ. Rev., September 1998, 88 (2), pp. 848-81.

Fiske, Susan and Shelley Taylor. Social Cognition. New York, NY: McGraw-Hill, 1991.

Gabaix, Xavier and Laibson, David. "A boundedly rational decision algorithm," AER Papers and Proceedings, May 2000.

Gittins, J. C. "Bandit processes and dynamic allocation indices," Journal of the Royal Statistical Society B, 1979, 41 (2), pp. 148-77.

Hansen, Lars; Heaton, John; and Luttmer, Erzo. "Econometric evaluation of asset pricing models," Rev. Fin. Stud., Summer 1995, 8 (1), pp. 237-74.

Jehiel, Philippe. "Limited horizon forecast in repeated alternate games," Journal of Economic Theory, 1995, 67, pp. 497-519.

Macleod, W. Bentley. "Complexity, Bounded Rationality and Heuristic Search," mimeo, University of Southern California, 1999.

Moscarini, Giuseppe and Lones Smith. "The Optimal Level of Experimentation," Yale University and University of Michigan mimeo.

Mullainathan, Sendhil. "A memory-based model of bounded rationality," MIT mimeo, 1998.

Payne, John W., James R. Bettman, and Eric J. Johnson. The Adaptive Decision Maker. Cambridge: Cambridge University Press, 1993.

Roberts, Kevin and Martin Weitzman. "On a general approach to search and information gathering," unpublished, 1980.

Rubinstein, Ariel. Modeling Bounded Rationality. Cambridge, MA: MIT Press, 1998.

Simon, Herbert. "A behavioral model of rational choice," Quart. J. Econ., February 1955, 69 (1).

Thaler, Richard. Quasi Rational Economics. New York: Russell Sage, 1991.

Thaler, Richard. The Winner's Curse. New York: Russell Sage, 1994.

Tversky, Amos and Kahneman, Daniel. "Judgment under uncertainty: Heuristics and biases," Science, 1974.

Vuong, Q. H. "Likelihood ratio tests for model selection and non-nested hypotheses," Econometrica, 1989, 57 (XXX), pp. 307-33.

Weitzman, Martin. "Optimal search for the best alternative," Econometrica, 1979, 47 (3), pp. 641-54.


[Figure 1 appears here in the original document: one of the experimental decision trees, with ten starting rows labeled a-j and the branch probabilities and payoffs along each path. The graphic itself does not survive text extraction; Table 1 below reports each algorithm's predictions for this game.]

Table 1: Predictions of the algorithms for the game in Figure 1

                          Values                                          Probabilities (%)
 Box   Rational  Dir. Cog.  Col. Cutoff  Col. Disc.   FTL     Empirical  Rational  Dir. Cog.  Col. Cutoff  Col. Disc.   FTL
  1      -0.57      0.33       0.39        -0.07      0.77       6.1        3.1       3.6        2.8          3.2       2.7
  2       5.78      5.75       6.76         5.37      5.88       4.3       17.6      17.9       17.8         17.0      18.2
  3      -1.31      0.91      -0.27        -0.20     -1.90       2.6        2.5       4.3        2.3          3.0       1.0
  4       4.11      0.00       4.96         3.65      4.35       3.0       11.1       3.3       10.6         10.0      10.3
  5       6.19      3.60       6.99         5.83      6.22      11.3       19.8       9.5       19.0         19.5      20.6
  6       6.88      5.36       7.68         6.56      6.21      23.9       23.9      16.0       23.1         24.5      20.6
  7      -7.29     -1.00      -6.27        -5.40     -8.29       1.3        0.5       2.5        0.4          0.6       0.1
  8      -1.89     -4.48      -0.88        -2.25     -1.36       1.7        2.1       0.9        2.0          1.6       1.2
  9       3.00      5.06       4.59         2.70      4.64       9.1        8.2      14.6        9.5          7.4      11.5
 10       4.12      7.19       5.54         4.56      5.13      36.5       11.2      27.4       12.5         13.2      13.8

Table 2

Model                             L(m)            L(m)-L(Rational)   L(m)-L(DC)       L(m)-L(CC)       L(m)-L(CD)
Rational                          2.969 (0.120)
Directed cognition (q = 0.012)    2.570 (0.107)   -0.399 (0.080)
Column Cutoff (Q = 8)             2.748 (0.119)   -0.221 (0.026)     0.177 (0.071)
Column Discounting (δ = .91)      2.780 (0.120)   -0.190 (0.037)     0.209 (0.070)    0.032 (0.017)
FTL (p = 0.30)                    2.481 (0.095)   -0.488 (0.046)    -0.089 (0.057)   -0.267 (0.033)   -0.298 (0.034)

L(m) is the distance between the predictions of model m and the empirical data. Parameters q, Q, δ, and p have been set to maximize the fit of the models to which they correspond. Standard errors are in parentheses.