
LEARNING AND RISK ATTITUDES IN ROUTE CHOICE DYNAMICS

Roger B. Chen, Northwestern University, Evanston, IL 60201, USA
Hani S. Mahmassani, Northwestern University, Evanston, IL 60201, USA

ABSTRACT

This study examines individual risk attitudes and travel time perceptions under different learning mechanisms, and their effect on the day-to-day behaviour of traffic flows. Depending on the behavioural framework, risk attitudes are captured either through the shape of the utility function or in the assessment (subjective weighting) of objective probabilities. Under the decision-making framework proposed in this study, individuals subjectively weight objective probabilities according to risk attitudes. This framework is used to examine the role of risk attitudes and travel time uncertainty in the day-to-day evolution of network flows. Additionally, three learning types are considered: i) reinforcement; ii) belief; and iii) Bayesian. Depending on the learning type, individual travel time perceptions vary with new experiences. These learning and risk mechanisms are modelled and embedded inside a microscopic (agent-based) simulation framework to study their collective effects on the day-to-day behaviour of traffic flows. The experiments provide an initial exploration of how different learning rules affect individual travel times and of the role of risk attitudes. Additionally, the role of risk seekers in driving system-wide properties of traffic networks over time is examined.

1. INTRODUCTION AND BACKGROUND

Learning, from a human perspective, is the process of acquiring information or experiences and relating them with current information to make decisions. In the context of route choice, individuals continually learn about the travel times in a network as they make repeated route choices from day to day. Many dynamic system-wide properties of traffic networks, such as the convergence, robustness, and existence of equilibrium states, may result from the learning behaviours of users. Thus, learning plays an important role, from a network performance standpoint, in driving the day-to-day evolution of flows.


In the context of route choice decisions, learning processes allow individuals to relate historical with current travel time experiences, thus shaping their estimates or perceptions of network travel times. Additionally, learning processes may help individuals reduce the perceived uncertainty surrounding these travel time estimates, consequently affecting risk attitudes over time. Learning and risk attitudes are two interrelated parts of a decision-making process. However, the specific mechanisms operating behind their relationship in the context of individual route choice and network traffic flow evolution have not been fully investigated.

1.1 Measures of Risk and Risk Attitudes

Measures of risk and risk attitudes have been extensively examined in studies of decision making under uncertainty, which requires an assessment of (i) the desirability (or "value") of possible outcomes and (ii) their respective likelihoods. Under the classical theory of decisions under uncertainty, the utility of each outcome is weighted by its probability of occurrence (von Neumann and Morgenstern 1947; Bernoulli (1738) 1954). In expected utility theory (EUT), the decision maker's risk attitudes are reflected in the shape of the utility function: a concave utility function indicates risk aversion, while risk seeking is associated with a convex utility function. The expected utility model lends itself to operational use, and thus underlies many normative applications of decision analysis in practice. However, experimental studies of actual decisions have shown that individuals often violate the expected utility perspective. An alternate perspective is provided by prospect theory, including its extension, cumulative prospect theory (Tversky and Kahneman 1992). Under prospect theory, risk attitudes are reflected through a value function and an associated weighting function, which overweights small probabilities and underweights moderate and high probabilities, explaining behaviours encountered in experimental data (Kahneman and Tversky 1979; Payne et al. 1981; Wehrung 1989). This weighting function has been estimated for gains and losses using median data (Tversky and Kahneman 1992). Despite its conceptual attractiveness to behavioural decision theorists, prospect theory has not been made operational using datasets collected outside of controlled laboratory settings.

In the context of route choice, a growing number of researchers are focusing on the effects of learning, travel time uncertainty, and risk. Both early and more recent laboratory experiments reveal that learning and uncertainty are important determinants of route choice decisions, showing that route switching depends on previously experienced travel time differences and their variance (Mahmassani and Liu 1999; Nakayama et al. 1999; Srinivasan and Mahmassani 2000; Mahmassani and Srinivasan 2004; Avineri and Prashker 2003, 2005). Many studies have also examined risk and uncertainty in route choice at a more microscopic level, focusing on individual attitudes and perceptions, but not examining the network performance effects. Econometric methods for measuring risk aversion and their application to survey data on route choice were recently examined (de Palma and Picard 2005). The authors highlight the significance of key socio-economic factors in explaining risk aversion but not risk seeking. Their methodology is consistent with situations where individuals tend to over- or under-evaluate the probability of risky events, but confounds risk aversion with biased perceptions of probabilities.


Route choice has also been modelled as a one-armed bandit problem (a choice between a random and a safe route) under different information regimes (Chancelier et al. 2007). Through numerical examples, the authors show that individuals reduce their uncertainty about travel times as a function of their risk aversion. Risk-neutral individuals tend to select the risky route and stick with it, while more risk-averse individuals pick the safe route more frequently. Interestingly, the authors show that users who are indifferent between the safe and random routes after experiencing one or the other place a value on learning more before settling on a final route choice (convergence). The authors' approach allows study of the individual economic benefits of learning, but not of the interrelationship between all users' choices, reflected in the congestion resulting from their collective decisions.

1.2 Learning and Information Integration

Past travel experiences are likely to influence day-to-day perceptions of network performance. Thus, modelling and understanding the mechanisms by which individuals integrate (or learn from) past experiences and information from other sources is important. Several generic theories of learning have been proposed in a variety of fields, such as machine learning, game theory, and behavioural decision theory. Behavioural decision theorists (psychologists) have examined learning at the individual level, focusing on information acquisition and integration in decision making in both deterministic and uncertain environments (Einhorn and Hogarth 1981; Ariely and Carmon 2000; Wallsten et al. 2006). However, psychological studies have typically ignored the effects of other decision makers and of different information conditions. Information availability plays an important role in determining the feasibility of learning theories in different environments. Economists have investigated learning both experimentally and theoretically, studying how simple information adjustment rules drive equilibrium processes in games under different information environments (Roth and Erev 1993; Crawford 1995; Camerer et al. 2002). Theoretical work has generally relied on the mathematics of stochastic processes to prove theorems about the limiting properties of different rules (Weibull 1995; Fudenberg and Levine 1998). Learning strategies with realistic limiting properties are often regarded as useful models of "actual" learning, but if limiting behaviours take too long to unfold, such theorems are less useful than modelling the actual path of equilibration over time. Additionally, game theory studies are less concerned with the attributes of the players; thus, learning in these studies does not affect the perceptions of payoffs and the related uncertainty. Learning in the context of machine learning concerns determining classifications from new samples, and is more algorithmic than behavioural in nature (Mitchell 1997; Duda et al. 2001). The applicability of such rules to actual human decision making is limited by their intense information processing and calculation requirements. Transportation studies that have examined the integration or learning of past experiences or other information sources have typically used an averaging rule applied to route or departure time choices (Horowitz 1984; Tong et al. 1987; Mahmassani and Chang 1988; Ben-Akiva et al. 1991; Hu and Mahmassani 1995). Although these studies examined different integration rules and their effect on travel choices and system properties, they do not address travel time perceptions and the learning processes for updating these perceptions and other latent attributes, such as the uncertainty or variance associated with travel times, and risk attitudes and perceptions.

To account for both the integration of travel times and the associated uncertainty, a Bayesian updating model has been proposed (Kaysi 1991; Jha et al. 1998). A Bayesian statistical framework can account for updating both the estimate of the mean and the variance of a distribution or statistical process in light of new information (DeGroot 1970). However, Bayesian statistics do not explicitly address the frequency of learning, the information sources for updating, or the effects of the updated parameters (mean and variance) on other latent attributes, such as risk perception. Recently, triggering and terminating mechanisms in the context of travel time learning were examined, but information availability and risk perceptions were not addressed (Chen and Mahmassani 2004). Experimental studies on route choice and learning have revealed that learning plays an important role at the aggregate system level by steering traffic networks towards cooperative states (Helbing et al. 2005) and at the individual level by reducing uncertainty (Avineri and Prashker 2005; Chancelier et al. 2007). Several studies have thus examined some aspect of learning, uncertainty perception, or risk attitudes in the context of route choice. However, the connection between perceived uncertainty, risk attitudes, and their aggregate effects in traffic systems where payoffs (travel time savings) depend on the decisions of all users has not been fully addressed.

1.3 Research Objectives

This study models risk attitudes and travel time perception under different learning rules, and examines their effect on the day-to-day behaviour of traffic flows in a network. Risk attitudes are captured through a subjective weighting function applied to objective probabilities, similar to prospect theory (Tversky and Fox 1995). Additionally, three different types of learning are considered: i) reinforcement; ii) belief; and iii) Bayesian. Changes in travel time perception under these learning rules and risk mechanisms are examined in the context of day-to-day route choices. The learning rules and risk mechanisms are modelled and embedded inside a microscopic (agent-based) simulation framework to study their collective effects on the day-to-day behaviour of traffic flows. Simulation experiments are conducted using this model to examine the effect of different travel time learning processes and risk mechanisms on (i) travel time perceptions over time, including the degree of uncertainty; (ii) risk attitudes and perceptions of uncertainty over time; and (iii) the relationship of the latent attributes described in (i) and (ii) to traffic flow evolution and other dynamic system properties, particularly convergence and stability. This study extends previous work by (i) further considering the individual travel time learning process; (ii) further considering the role of risk perception in route choice and its system-wide effects; (iii) further examining the effects of different perception mechanisms on risk perception; (iv) determining the relationship across different travel time learning processes; and (v) capturing the effect of the above on day-to-day network dynamics, in particular convergence and stability.

2. MODELING FRAMEWORK

Network traffic flow results from the interaction between users, their evaluation of past experiences, the resulting travel decisions, and the supply-side characteristics of the network.

The following section presents different rules by which users integrate past experiences with current ones, including mechanisms that describe route switching decisions. Learning mechanisms determine the role of past experiences in current choices. Route switching or choice mechanisms describe the evaluation and choice of alternatives, given updated individual experiences, perceived uncertainty, and risk attitude. For a given day, an individual's route choice yields an outcome or experience (travel time) that is a function of the individual's decision and those of other system users. This travel experience is integrated with past experiences through some learning mechanism. Based on the acceptability of the experience in light of past experiences, the individual decides to switch routes or keep the current choice. Acceptability is based on the individual's current perception (judgment) of travel time, which depends on travel times experienced over a number of days and, to some extent, on the individual's risk attitudes. Different mechanisms for learning determine the effects of prior experiences on perceptions of current ones. Risk mechanisms that capture the perception of likelihoods reflect risk attitudes and their role in evaluating current choices relative to other alternatives. Furthermore, users may perceive each route to have a distribution of travel times that changes with each new experience. Thus, route choice is a decision between different routes, each with a different perceived (overall) travel time distribution, including its associated uncertainty. Figure 1 below depicts the framework for route choice decisions. The following sections articulate the individual components of this framework, which captures the interaction between risk mechanisms, learning mechanisms, route switching mechanisms, and travel time perception.

Figure 1: Route Choice Decision Framework. [The diagram links the System (Traffic Network) to User Route Choice, Travel Time Learning/Updating, Attitudes and Perceptions, and Route Switching, distinguishing decision flows from information flows, and observed components from unobserved components and influences.]
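As a point of reference, the following is a minimal, self-contained sketch in Python of the day-to-day loop in Figure 1; the names, the two-route toy network, and the placeholder averaging rule are illustrative assumptions for this exposition, not the paper's exact specification:

```python
import random

# Toy two-route network: travel time grows with the flow assigned to a route.
def route_time(flow, cap=100, t0=20.0, b=0.1):
    return t0 * (1.0 + b * flow / max(cap - flow, 1))

def simulate(n_users=200, n_days=50, seed=0):
    rng = random.Random(seed)
    route = [rng.randint(0, 1) for _ in range(n_users)]   # each user's current route
    perceived = [[25.0, 25.0] for _ in range(n_users)]    # mean perceived time per route
    for _ in range(n_days):
        flows = [route.count(0), route.count(1)]
        times = [route_time(f) for f in flows]            # supply side (Figure 1: System)
        for i in range(n_users):
            k = route[i]
            # Learning step: placeholder exponential smoothing; Section 2.2
            # replaces this with reinforcement, belief, or Bayesian rules.
            perceived[i][k] = 0.8 * perceived[i][k] + 0.2 * times[k]
            # Switching step: boundedly-rational rule of Section 2.4.
            best = min((0, 1), key=lambda r: perceived[i][r])
            if perceived[i][k] - perceived[i][best] >= 0.1 * perceived[i][k]:
                route[i] = best
    return flows, times

print(simulate())
```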

2.1 Travel Time Perception

In this study, individuals base route choice decisions on the perceived travel times for routes in the system. These perceived travel times vary across individuals in the system, and are updated in light of travel times experienced across time. Thus, the perceived travel time is constantly updated, or altered and calibrated, as new travel times are experienced from day to day. The perceived updated travel time can be stated as follows:

$$\tau^{u}_{i,k} = \mu^{u}_{i,k} + \varepsilon^{u}_{i,k}, \quad \forall i \in I,\ k \in K \tag{1}$$

where
$\tau^{u}_{i,k}$: updated perceived travel time for individual i on route k
$\mu^{u}_{i,k}$: mean of the updated perceived travel time
$\varepsilon^{u}_{i,k}$: associated error, distributed Normal $N(0, \sigma^{u}_{i,k})$
$\sigma^{u}_{i,k}$: standard deviation of the associated error for individual i on route k.

Consequently, $\tau^{u}_{i,k}$ is distributed Normal $N(\mu^{u}_{i,k}, \sigma^{u}_{i,k})$, with the distribution varying across routes and individuals. As individuals experience new travel times, $\mu^{u}_{i,k}$ and $\sigma^{u}_{i,k}$ are updated accordingly. The perceived experienced travel time can be stated as follows:

$$\tau^{e,n}_{i,k} = \mu^{e,n}_{i,k} + \varepsilon^{e,n}_{i,k}, \quad \forall i \in I,\ k \in K \tag{2}$$

where
$\tau^{e,n}_{i,k}$: perceived experienced travel time for individual i on route k on day n
$\mu^{e,n}_{i,k}$: mean perceived experienced travel time
$\varepsilon^{e,n}_{i,k}$: associated error, distributed Normal $N(0, \sigma^{e,n}_{i,k})$

Consequently, $\tau^{e,n}_{i,k}$ is distributed Normal $N(\mu^{e,n}_{i,k}, \sigma^{e,n}_{i,k})$, with the distribution varying across routes and across individuals. In this study $\mu^{e,n}_{i,k}$ is assumed to be the objective (actual) travel time on a particular route. The perceived experienced travel time is assumed to have the same error as the perceived updated travel time ($\varepsilon^{e,n}_{i,k} = \varepsilon^{u}_{i,k}$). Behaviourally, this implies that individuals perceive their experienced travel times with the same error as the travel times they learn or keep in memory, implying further that the uncertainty associated with the travel times in memory carries over and influences the perception of experienced travel times. Thus, the experienced route travel time users perceive reflects, or is correlated with, past experienced travel times for a particular route.
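For concreteness, a minimal sketch (Python; the class and attribute names are ours) of the perceived travel time object of Eqs. (1)-(2), holding the mean and error dispersion for one (individual, route) pair:

```python
import random
from dataclasses import dataclass

@dataclass
class PerceivedTravelTime:
    """Perceived travel time for one (individual i, route k) pair, per Eq. (1):
    tau = mu + eps, with eps ~ Normal(0, sigma)."""
    mu: float     # mean updated perceived travel time
    sigma: float  # standard deviation of the perception error

    def draw(self, rng: random.Random) -> float:
        """Sample one realized perceived travel time tau."""
        return self.mu + rng.gauss(0.0, self.sigma)

rng = random.Random(1)
belief = PerceivedTravelTime(mu=25.0, sigma=5.0)   # e.g. 25 min, large uncertainty
print(belief.draw(rng))
```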

Experienced travel times are integrated with updated travel times through learning mechanisms. Additionally, individuals make route switching and choice decisions based on these perceived travel times, in conjunction with risk attitudes that affect the perception of gains and losses amongst routes in the choice set. Both learning mechanisms and risk attitudes play important roles in individuals' route choices across time. The next section describes the different learning mechanisms considered in this study, by which individuals update the travel times they hold in memory.

2.2 Learning Mechanisms

Information availability plays an important role in determining which learning mechanisms are feasible under different environments (Duda et al. 2001; Camerer 2003). In addition to relating experiences with current choices, learning processes may also help reduce the uncertainty perceived by individuals, influencing the risk perception of alternatives over time. In the context of day-to-day route choice, individuals update a perceived travel time $\tau^{u}_{i,k}$ with newly experienced travel times $\tau^{e,n}_{i,k}$ under different learning mechanisms. Since perceived travel times are distributed according to a mean $\mu^{u}_{i,k}$ and associated variance $\sigma^{u}_{i,k}$, learning mechanisms update both of these components, yielding new updated travel time distributions in light of new experienced travel times. Several generic theories of learning or information updating have been proposed in the psychology, game theory, and machine learning literatures, such as reinforcement, belief, sophisticated (anticipatory), directional, Bayesian, and Boltzmann learning, each with different information requirements. In this study, three general types of learning are considered: i) reinforcement; ii) belief; and iii) Bayesian. Each is presented next in the context of day-to-day route choice. (Note: hereafter the subscripts i (individual) and k (route) are dropped for convenience.)

1) Reinforcement Learning. Under this learning model, alternatives or routes are "reinforced" by their previous payoffs only when they are chosen and a positive outcome occurs, possibly "spilling over" to similar alternatives (routes with overlapping links) (Erev et al. 1999). In terms of perceived travel times, a reinforcement-type learning rule for updating the mean can be expressed as:

$$\tau^{u\prime} = \frac{\phi\,N_{prior}}{\phi\,N_{prior} + n_{N_e}}\,\tau^{u} + \frac{n_{N_e}}{\phi\,N_{prior} + n_{N_e}}\,\bar{x}_{N_e} \tag{3}$$

where
$\tau^{u\prime}$: updated perceived travel time
$\tau^{u}$: prior perceived travel time
$\phi$: parameter reflecting the weight on past experiences
$N_e$: days from which the travel time experiences have not yet been integrated into memory
$N_{prior}$: total number of times the route was previously chosen and a lower travel time relative to a reference travel time (a travel time gain) was experienced
$n_{N_e}$: total number of times the route was chosen during period $N_e$ and a travel time gain was experienced
$\bar{x}_{N_e}$: sample average of travel times experienced by the individual in period $N_e$ that were below the reference travel time (travel time gains)

Under reinforcement learning strategies, individuals make choices based only on their own experiences, requiring only information on the payoffs received from actual choices (Roth and Erev 1993). In the context of day-to-day route choice, travel times for a particular route are updated only when the route is selected and an improved travel time results relative to the updated travel time for that route, thus "reinforcing" the estimates of these travel times. The reference travel time used for judging improvements is therefore the updated travel time, and the only piece of information individuals require is the travel times experienced on the chosen route. According to the expression above, reinforcement learning is governed by $\phi$, which reflects the "strength of memory" or "rate of forgetting." As the memory weight $\phi$ increases in value, the rate of forgetting decreases, and the effect of past experiences on current travel time perceptions grows. Additionally, the reinforcement of chosen routes by their experienced travel times is reflected in the weights on the prior updated travel time and on the recently experienced travel times. Thus, even if an individual has a high strength of memory (high $\phi$), if the number of times a route has been chosen with a travel time improvement since the last update is high (high $n_{N_e}$), the weight on new experiences will exceed the weight on past experiences. $n_{N_e}$ is affected both by the number of times an individual selects a route and by the frequency of learning, suggesting a trade-off between the rate of learning, the amount of experimentation (how often an individual decides to sample a route), and the success rate (the number of times the route choice yields a positive payoff).
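As an illustration, a minimal sketch of the Eq. (3) update in Python; the function and variable names are ours, and the weighted-average form follows the reconstruction above:

```python
def reinforcement_update(tau_u, phi, n_prior, n_gain, xbar_gain):
    """Eq. (3): weighted average of the prior perceived travel time and the
    mean of the travel time 'gains' experienced since the last update.
      tau_u     -- prior perceived travel time
      phi       -- memory weight in [0, 1] ("strength of memory")
      n_prior   -- times the route was previously chosen with a gain (N_prior)
      n_gain    -- times the route was chosen with a gain during N_e (n_Ne)
      xbar_gain -- sample mean of those gain travel times (xbar_Ne)
    Only a chosen route that yielded a gain is reinforced (updated)."""
    if n_gain == 0:
        return tau_u  # no gains experienced: nothing to reinforce
    w = phi * n_prior / (phi * n_prior + n_gain)
    return w * tau_u + (1.0 - w) * xbar_gain

# High memory weight, but many recent gains: new experiences still dominate.
print(reinforcement_update(tau_u=30.0, phi=0.9, n_prior=5, n_gain=10, xbar_gain=24.0))
```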

A second learning mechanism, similar to reinforcement but also considering the travel times on alternatives or routes not chosen, is belief learning.

2) Belief Learning. A belief learning mechanism assumes that individuals form and update beliefs about the choices of other individuals, making choices based on these beliefs (Crawford 1995). One example of belief learning is fictitious play, where individuals keep track of the relative frequency with which other individuals make choices, selecting the alternative with the highest relative frequency of choice. In this case, the relative frequencies are the "beliefs" individuals use to make their next choices. Because individuals base their own choices on beliefs about other individuals' choices, belief learning requires information on the choices of others and their associated payoffs. In the context of route choice decisions, this can be expressed as:

$$\tau^{u\prime} = \frac{\phi\,N_{prior,I_e}}{\phi\,N_{prior,I_e} + n_{N_e,I_e}}\,\tau^{u} + \frac{n_{N_e,I_e}}{\phi\,N_{prior,I_e} + n_{N_e,I_e}}\,\bar{x}_{N_e,I_e} \tag{4}$$

where
$\tau^{u\prime}$: updated perceived travel time
$\tau^{u}$: prior perceived travel time
$\phi$: parameter reflecting the weight on past experiences
$N_e$: days from which the travel time experiences have not yet been integrated into memory
$I_e$: set of individuals who selected a particular route in the past $N_e$ days
$N_{prior,I_e}$: total number of times the route was previously chosen by the users in $I_e$
$n_{N_e,I_e}$: total number of times the route was chosen during period $N_e$ by the users in $I_e$
$\bar{x}_{N_e,I_e}$: sample average of the travel times experienced by $I_e$ in period $N_e$

Both belief and reinforcement learning are types of weighted average between past and current experiences. The main departures lie in the source of information used to update past experiences, and in the fact that belief learning uses choices that result in both gains and losses. In reinforcement learning, only an individual's own experiences are used, whereas in belief learning, individuals consider the choices of all individuals. Also, the weights in Eqs. 3 and 4 exhibit a trade-off between "strength of memory" and frequency of learning, as in reinforcement learning. Many game theory studies have shown that heterogeneity in beliefs across individuals leads to different equilibria in coordination games (Van Huyck et al. 1991). The adaptive dynamics in coordination games have been shown to produce results similar to experiments with belief learning models (Crawford 1995; Ho and Weigelt 1996; Battalio et al. 2001). Helbing et al. (2005) have shown that day-to-day route choice resembles a coordination game, and that over time players learn to take turns on a two-link network; however, that study did not investigate the different mechanisms that lead to coordination.
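A companion sketch of the Eq. (4) update; it reuses the weighted-average form of Eq. (3), but the counts and sample mean pool the experiences of all users in $I_e$ (again, names are ours):

```python
def belief_update(tau_u, phi, n_prior_pop, n_pop, xbar_pop):
    """Eq. (4): same form as Eq. (3), but the counts and the sample mean come
    from ALL users (the set I_e) who chose the route during the last N_e days,
    and both gains and losses enter the sample."""
    if n_pop == 0:
        return tau_u
    w = phi * n_prior_pop / (phi * n_prior_pop + n_pop)
    return w * tau_u + (1.0 - w) * xbar_pop

# Pool the experiences of every user who drove route k in the last N_e days.
others = [22.0, 27.5, 25.0]
print(belief_update(tau_u=30.0, phi=0.9, n_prior_pop=40,
                    n_pop=len(others), xbar_pop=sum(others) / len(others)))
```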

3) Bayesian Learning. Similar to both reinforcement and belief learning, Bayesian learning is also a type of weighted average between past and current experiences. More specifically, in Bayesian learning probability distributions and their parameters are updated in light of new samples. Bayesian learning is therefore amenable to this study, since the route travel times in memory are assumed to be normally distributed. In the context of route choice decisions, Bayesian learning is expressed as:

$$\tau^{u\prime} = \frac{1/\sigma^{2}}{1/\sigma^{2} + n_{e}/s_{e}^{2}}\,\tau^{u} + \frac{n_{e}/s_{e}^{2}}{1/\sigma^{2} + n_{e}/s_{e}^{2}}\,\bar{x}_{e} \tag{5}$$

$$\sigma^{\prime\,2} = \frac{s_{e}^{2}\,\sigma^{2}}{s_{e}^{2} + n_{e}\,\sigma^{2}} = \frac{1}{1/\sigma^{2} + n_{e}/s_{e}^{2}} \tag{6}$$

where
$\tau^{u\prime}$: updated perceived travel time
$\tau^{u}$: prior perceived travel time
$\sigma^{\prime\,2}$: updated variance in memory
$\sigma^{2}$: prior variance in memory
$n_{e}$: number of experienced travel times in the sample
$\bar{x}_{e}$: sample mean of the experienced perceived travel times
$s_{e}^{2}$: sample variance of the experienced travel times

Bayesian learning departs from the other learning rules in the weight placed on past experiences. Reinforcement and belief learning assume that the weight placed on historical experiences is a characteristic of the individual; under Bayesian learning, these weights are determined statistically as a function of the parameters of the sample of experiences. If "confidence" is taken to be the inverse of variance, then as variance increases, confidence decreases. Three important properties result from Bayesian learning: i) with every experienced travel time, the variance associated with the travel time in memory decreases and confidence increases (since $n_{e}$ and $s_{e}^{2}$ are always positive); ii) as the number of experienced travel times increases, the confidence associated with the posterior travel time in memory increases; and iii) as the confidence associated with the posterior travel time in memory grows to be much greater than that of the sample, the effect of newly experienced travel times decreases.

Interestingly, Bayesian, belief, and reinforcement learning share two common properties: i) updated travel times are a weighted average of the (prior) updated travel time and the travel times experienced; and ii) these weights imply a trade-off between the frequency of updates and the size of each update sample. The departure point between the different rules is the source of the experiences used in learning. Reinforcement relies on travel times from an individual's own choices; belief learning considers experiences from other individuals in the population; Bayesian learning does not specify the source of the sample (how the sample is constructed or taken). These similarities and differences suggest that, all else being equal, Bayesian learning may lead to a different rate of convergence compared to belief and reinforcement learning, since its weights are a function of the actual travel times experienced (through the sample variance) and not just the frequency of choice.
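A minimal sketch of the conjugate-normal update of Eqs. (5)-(6), assuming Python; note that the weights are precisions (inverse variances) rather than a trait of the individual:

```python
def bayesian_update(tau_u, var_u, xbar, s2, n):
    """Eqs. (5)-(6): conjugate normal update of the travel time in memory.
      tau_u, var_u -- prior mean and variance in memory
      xbar, s2, n  -- sample mean, sample variance, and size of new experiences
    """
    prec_prior = 1.0 / var_u       # "confidence" in memory
    prec_sample = n / s2           # "confidence" carried by the new sample
    tau_new = (prec_prior * tau_u + prec_sample * xbar) / (prec_prior + prec_sample)  # Eq. (5)
    var_new = 1.0 / (prec_prior + prec_sample)                                        # Eq. (6)
    return tau_new, var_new

# The variance shrinks with every update, so later samples move the mean less.
print(bayesian_update(tau_u=30.0, var_u=9.0, xbar=24.0, s2=4.0, n=5))
```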
2.3 Risk and Risk Attitudes

Decision making in environments with uncertainty requires the evaluation of the desirability of outcomes and their likelihood of occurrence. Day-to-day route choice may be framed as a choice between routes with travel times expressed as distributions with perceived means and variances. Given a perceived travel time distribution for each route, the choice among alternatives can be framed as a decision that considers the likelihood of a particular route yielding a travel time less than a reference point; in this study, the reference point is taken to be the updated perceived travel time $\tau^{u}$. According to EUT, the classical framework for decisions under uncertainty, individuals consider an expected utility that is a weighted sum of the utilities of outcomes and their probabilities of occurrence. Risk attitudes are reflected in the shape (concavity or convexity) of an individual's utility curve, where gains and losses are mapped through a function u(x), and x is the value (payoff) of an alternative. Although EUT has dominated economic studies, experimental studies have shown behavioural results inconsistent with it (Kahneman and Tversky 1979; Payne et al. 1981; Wehrung 1989). In particular, experimental studies suggest that individuals tend to underweight outcomes that are merely probable in comparison with outcomes obtained with certainty, depending on whether a gain or a loss is involved. An alternative theory that accounts for these inconsistencies is prospect theory (PT). The prospect of a lottery is determined by summing the values of outcomes weighted by their subjective (weighted) probabilities of occurrence, and a choice is made based on these prospects.

Under prospect theory, individuals exhibit a fourfold pattern of risk aversion and risk seeking (Kahneman and Tversky 1979; Tversky and Kahneman 1992; Tversky and Fox 1995): i) risk seeking for gains and ii) risk aversion for losses of low probability; and iii) risk aversion for gains and iv) risk seeking for losses of high probability. This fourfold pattern is based on the assumption of overweighting low probabilities and underweighting high probabilities, independent of gain or loss. This study proposes a similar model, under which individuals subjectively weight the objective probabilities of gains and losses, independent of whether the probability is high or low.

Risk-averse behaviour is indicated by underweighting the probabilities of gains and overweighting the probabilities of losses. Risk-seeking individuals exhibit the converse, underweighting the probabilities of losses and overweighting the probabilities of gains. In this study, a probability weighting function (Eq. 8) that gives this fourfold pattern is used, along with the value function of Eq. 9. Given these functions, this study considers the following model:

$$U = w\!\left(P^{gain}\right)\int_{\Delta T \ge 0} v(\Delta T)\,f(\Delta T)\,d\Delta T + w\!\left(P^{loss}\right)\int_{\Delta T < 0} v(\Delta T)\,f(\Delta T)\,d\Delta T \tag{7}$$

$$w(p) = \frac{(1-\pi)\,p}{(1-\pi)\,p + \pi\,(1-p)}, \qquad 0 \le \pi \le 1 \tag{8}$$

$$v(\Delta T) = \begin{cases} \Delta T^{\alpha} & \text{if } \Delta T \ge 0 \\ -\lambda\,(-\Delta T)^{\alpha} & \text{if } \Delta T < 0 \end{cases} \tag{9}$$

$$P^{gain} = \int_{\Delta T \ge 0} f(\Delta T)\,d\Delta T \tag{10}$$

$$P^{loss} = \int_{\Delta T < 0} f(\Delta T)\,d\Delta T \tag{11}$$

where
$\Delta T$: the difference between a travel time and the best travel time among routes, $T^{best} - T_{k}$
$f(\cdot)$: probability density function (pdf) for a Normal distribution
$\alpha$ and $\lambda$: parameters that determine the shape of the value function $v(\cdot)$
$\pi$: parameter between 0 and 1 that determines the position of the inflection point of the probability weighting function $w(\cdot)$, with separate parameters $\pi^{gain}$ and $\pi^{loss}$ for the probabilities of gains and losses
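To make the evaluation concrete, the following Python sketch scores a route under this model; the functional form of Eq. (8) is the reconstruction given above, and the α and λ values are illustrative, not the study's estimates:

```python
import math

def weight(p, pi):
    """Eq. (8) as reconstructed above: w(0)=0, w(1)=1; pi < 0.5 overweights
    and pi > 0.5 underweights the objective probability p."""
    return (1.0 - pi) * p / ((1.0 - pi) * p + pi * (1.0 - p))

def value(dt, alpha=0.88, lam=2.25):
    """Eq. (9): concave for gains, convex and steeper for losses
    (alpha and lam are illustrative values)."""
    return dt ** alpha if dt >= 0 else -lam * (-dt) ** alpha

def route_score(mu, sigma, pi_gain, pi_loss, n_grid=2000):
    """Eq. (7): values of gains and losses integrated against the Normal
    density of dT (Eqs. 10-11 give the gain/loss probabilities), then
    weighted by the subjective probabilities."""
    p_gain = 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))  # P(dT >= 0)
    gain_part = loss_part = 0.0
    lo, hi = mu - 6.0 * sigma, mu + 6.0 * sigma
    step = (hi - lo) / n_grid
    for j in range(n_grid):                                         # midpoint rule
        dt = lo + (j + 0.5) * step
        f = math.exp(-0.5 * ((dt - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        if dt >= 0.0:
            gain_part += value(dt) * f * step
        else:
            loss_part += value(dt) * f * step
    return weight(p_gain, pi_gain) * gain_part + weight(1.0 - p_gain, pi_loss) * loss_part

# Risk-averse setting from Figure 2: losses overweighted, gains underweighted.
print(route_score(mu=2.0, sigma=4.0, pi_gain=0.75, pi_loss=0.25))
```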


Figure 2: Weighting Functions (Eq. 8) for a Risk-Averse Individual ($\pi^{loss}$ = 0.25; $\pi^{gain}$ = 0.75)

The attractiveness of a route is determined by Eq. 7 as the weighted sum of the value of a gain (positive $\Delta T$) and the value of a loss (negative $\Delta T$). These values are weighted by their probabilities of occurrence, which are subjectively weighted according to Eq. 8, plotted in Fig. 2. Objective probabilities are weighted subjectively according to the parameter $\pi$, which varies with risk attitude. A risk-averse individual corresponds to a low $\pi$ for losses (low $\pi^{loss}$), resulting in an overweighting of loss probabilities, and a high $\pi$ for gains (high $\pi^{gain}$), resulting in an underweighting of gain probabilities, where $\pi^{gain}$ and $\pi^{loss}$ sum to one ($\pi^{gain} + \pi^{loss} = 1$). Risk seekers exhibit the opposite. The value function (Eq. 9) is assumed to be concave for gains and convex for losses, as determined by the shape parameter $\alpha$. Given that the parameter $\lambda$ is greater than one, the function is steeper for losses than for gains.

2.4 Route Switching

Route switching is based on the difference between the best and current routes, following a boundedly-rational rule used in several previous studies (Mahmassani and Jayakrishnan 1991; Hu and Mahmassani 1995). Acceptability, or tolerance for travel time differences, is defined relative to the difference between the current and best travel times. This tolerance-based switching mechanism can be stated as follows:

$$\delta_{i,n} = \begin{cases} 1 & \text{if } \tau^{u,curr}_{i,n} - \tau^{u,best}_{i,n} \ge \Delta_{i}\,\tau^{u,curr}_{i,n}, \text{ where } 0 \le \Delta_{i} \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

where
$\delta_{i,n}$: a binary variable that takes a value of 1 if the difference between the mean current and best learned travel times warrants a switch for individual i, and 0 otherwise
$\Delta_{i}$: the acceptability or tolerance threshold for the difference in travel times, defining the percent improvement over the current travel time required to warrant switching routes.

As the tolerance for travel time differences increases ($\Delta_{i}$ increases), individuals are more tolerant of travel time differences and less willing to switch for only marginal travel time improvements. As the tolerance decreases, individuals are more willing to switch for even small travel time differences.
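The rule reduces to a one-line comparison; a sketch in Python (the threshold and travel times are illustrative):

```python
def switch_route(tau_current, tau_best, delta_i):
    """Eq. (12): boundedly-rational switching indicator. Returns True (switch)
    only when the improvement over the current route's perceived travel time
    is at least the fraction delta_i (0 <= delta_i <= 1) of that time."""
    return (tau_current - tau_best) >= delta_i * tau_current

print(switch_route(30.0, 26.0, 0.10))  # 4 min >= 10% of 30 min -> True, switch
print(switch_route(30.0, 28.0, 0.10))  # 2 min <  10% of 30 min -> False, stay
```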

2.5 Route Choice Evaluation

The preceding subsections presented learning mechanisms for integrating experiences into an updated perceived travel time, in addition to a mechanism for route switching decisions. However, the travel time actually used in evaluating routes was not addressed. Given that updating does not occur every day, individuals may base route choice decisions on the travel time experienced previously, on the updated perceived travel time, or on a weighted combination. Three possibilities are as follows:

$$T_{i} = \tau^{u}_{i,n} \tag{13}$$

$$T_{i} = \begin{cases} \tau^{u}_{i,n} & \text{if travel time updating occurs} \\ \tau^{e,n}_{i,k} & \text{otherwise} \end{cases} \tag{14}$$

$$T_{i} = \begin{cases} \tau^{u}_{i,n} & \text{if travel time updating occurs} \\ \beta\,\tau^{u}_{i,n} + (1-\beta)\,\tau^{e,n}_{i,k} & \text{otherwise} \end{cases} \tag{15}$$

where
$T_{i}$: route travel time used for making route choices
$\beta$: weighting parameter

The expressions above imply different types of route choice evaluation. Under Eq. 13, route choice each day is made only on the basis of the perceived updated travel time: if an extremely long travel time is experienced for a particular route on day n but has little impact on the updated travel time, perhaps due to a long history of short travel times, route switching would not occur. Eq. 14 is the opposite, under which route choice is based on experienced travel times except on days when updating occurs, while Eq. 15 blends the updated and experienced travel times through the weight $\beta$. The next section describes the simulation experiments conducted using the framework and mechanisms described in the previous sections.
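A small dispatcher makes the three evaluation bases explicit; a sketch assuming Python, with the rule names invented for this illustration:

```python
def evaluation_time(tau_u, tau_exp, updated_today, rule, beta=0.5):
    """Eqs. (13)-(15): the travel time entering a day's route evaluation.
      'updated'  -- always the perceived updated travel time (Eq. 13)
      'recent'   -- last experienced time, except on update days (Eq. 14)
      'weighted' -- beta-weighted blend, except on update days (Eq. 15)"""
    if rule == "updated":
        return tau_u
    if rule == "recent":
        return tau_u if updated_today else tau_exp
    if rule == "weighted":
        return tau_u if updated_today else beta * tau_u + (1.0 - beta) * tau_exp
    raise ValueError(f"unknown rule: {rule}")

# A very long experienced time sways Eq. 15 far more than Eq. 13.
print(evaluation_time(25.0, 60.0, updated_today=False, rule="updated"))   # 25.0
print(evaluation_time(25.0, 60.0, updated_today=False, rule="weighted"))  # 42.5
```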

3. DESCRIPTION OF SIMULATION EXPERIMENTS

This section describes the system features and related details of the simulation experiments, the principal factors investigated, and the specific properties and performance descriptors considered in this investigation.

3.1 System Features

The network used for this study, shown in Figure 3, consists of 9 nodes and 12 links. Link cost-flow functions were used, with a linearly varying cost beyond the value $e_{l} \cdot cap_{l}$, according to the following expression for link l:

$$c_{l} = \begin{cases} t^{min}_{l}\left(1 + b_{l}\,\dfrac{v_{l}}{cap_{l} - v_{l}}\right) & v_{l} \le e_{l}\cdot cap_{l} \\[1ex] t^{min}_{l}\left(1 + b_{l}\,\dfrac{e_{l}}{1 - e_{l}} + b_{l}\,\dfrac{v_{l} - e_{l}\cdot cap_{l}}{cap_{l}\,(1 - e_{l})^{2}}\right) & v_{l} > e_{l}\cdot cap_{l} \end{cases} \tag{16}$$

where
$t^{min}_{l}$: the free-flow travel time
$b_{l}$: defines the slope of the curve
$cap_{l}$: the link capacity
$e_{l}$, with $0 \le e_{l} \le 1$: defines the under-saturation limit
$v_{l}$: the link flow.

Links located near the center of the network have smaller capacities than links on the border, and thus their cost-flow functions are more sensitive to varying flows. Links on the border have larger free-flow times than links in the center. Nodes 1, 4, 5, 8, and 9 are origins and destinations, and all possible OD pairs are connected. Parameter values, OD pairs, and base demand values are given in Tables 1 and 2.
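A sketch of this cost-flow function in Python, assuming the piecewise form reconstructed in Eq. (16), with the linear extension matched continuously at the under-saturation limit:

```python
def link_cost(v, t_min, cap, b, e):
    """Eq. (16): congestion cost rising with flow v up to the under-saturation
    limit e*cap, then extended linearly (and continuously) beyond it."""
    v_star = e * cap
    if v <= v_star:
        return t_min * (1.0 + b * v / (cap - v))
    # Linear extension anchored at v = e*cap.
    return t_min * (1.0 + b * e / (1.0 - e) + b * (v - v_star) / (cap * (1.0 - e) ** 2))

# Link 1 of Table 1: t_min = 20, capacity = 360, b = 0.1, e = 0.95.
print(link_cost(300, 20.0, 360, 0.1, 0.95))  # below the limit (342 vehicles)
print(link_cost(350, 20.0, 360, 0.1, 0.95))  # on the linear extension
```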


Figure 3: Network used in Simulation Experiment

Table 1: Link Characteristics and Parameters

link   t_min   capacity   b      e
1      20      360        0.10   0.95
2      12      360        0.10   0.95
3      15      240        0.12   0.95
4      12      180        0.15   0.95
5      12      360        0.10   0.95
6      10      150        0.15   0.95
7      12      180        0.15   0.95
8      15      240        0.12   0.95
9      10      150        0.15   0.95
10     30      360        0.10   0.95
11     15      240        0.12   0.95
12     15      240        0.12   0.95

Table 2: OD Demand

O-D   Routes   Demand
1-8   6        60
1-9   2        40
9-8   2        10
1-5   1        10
5-8   1        10
1-4   1        10
4-8   1        10

In order to initiate the dynamics of the system, travel times for the initial iteration are specified from the initial loading pattern, using the cost-flow functions. The initial mean updated travel time is set to the initial travel time, and the variance is set to $\beta\,\tau^{u}_{0}$, where $\tau^{u}_{0}$ is the initial perceived updated travel time. $\beta$ is interpreted as the initial variance of the perceived travel time over a segment of unit travel time and is the same for all users. Thus, a large $\beta$ indicates that the initial overall level of uncertainty in the system is high, which is realistic for systems with many "new" users. Users are loaded uniformly across ODs and subsequently across paths; different probabilistic loading patterns could also be used. Other specifics varied across simulations are discussed next.

3.2 Experimental Factors

The experimental factors investigated are grouped broadly into two categories: a) factors related to risk and attitudes, and b) factors related to learning. Two scenarios were considered: a) a population with risk seekers and avoiders, and b) a population that does not explicitly consider risk attitudes. Individuals under the first scenario make use of the subjective weighting of objective probabilities shown in Eqs. 7-11. A summary of the factors considered is shown in Table 3.

Table 3: Experimental Factors Considered

Factors relating to experiments considering risk attitudes:
i) Percentage of Risk Seekers and Avoiders
ii) Degree of Risk Attitude ($\pi^{gain}$ and $\pi^{loss}$)

Factors common to all experiments:
i) Demand Level (V)
ii) Initial Uncertainty ($\beta$)
iii) Perceived Travel Times
iv) Learning Mechanisms

3.2.1 Risk Attitude Factors

Equation 7 states that the score or prospect of an alternative is the weighted sum of the values of outcomes weighted by their subjective probabilities. Values are evaluated based on differences from a reference point (the best travel time among routes) and determined according to Eq. 9. Furthermore, in evaluating alternatives, three possibilities are expressed in Eqs. 13-15. The degree to which individuals overweight and underweight objective probabilities is governed by the parameters $\pi^{loss}$ and $\pi^{gain}$. The parameter was normally distributed across the population of agents using a mean $\bar{\pi}^{loss}$ with a variance of $\bar{\pi}^{loss}\,v_{pop}$. The following mean values were used: $\bar{\pi}^{loss}$ = {0.10, 0.2, ..., 0.5}. Additionally, the percentage of risk seekers in the population ($\gamma_{risk}$ × total number of users) was varied by setting $\gamma_{risk}$ = {0, 0.1, 0.2, ..., 1}.

3.2.2 Learning Factors

The main parameter governing the reinforcement and belief learning mechanisms is the weight $\phi_{i} \in [0,1]$ placed on historical experiences, as shown in Eqs. 3 and 4. As $\phi_{i}$ increases, the greater an individual's memory, and the more weight placed on historical experiences. $\phi$ is normally distributed across the population with a mean $\bar{\phi}$ and a variance of $\bar{\phi}\,v_{pop}$.

3.2.3 Population Factors

In addition to the factors described previously, two population-related factors were also considered:

1. Demand Level. Five different demand levels were considered in this study for each OD (a set number of users was assigned to each OD). The base case was 100 users, corresponding to a population factor of 1 (V = 1). Other population levels considered were V = {0.75, 1.2, 1.5, and 2}. Previous studies have shown that convergence is harder to obtain at higher population levels.

2. Initial Degree of Perceived Dispersion. Different levels of initial perceived dispersion, or variance, in travel times were also considered. Dispersion is measured by the initial $\beta$ used to determine the initial variance of travel times. Three different values were considered: $\beta$ = {1, 2, and 2.5}.

3.3 Performance Measures and Properties

1. Day-to-day flow pattern of traffic, in particular convergence. Convergence is reached when users have stopped switching routes for the remainder of the simulation. For cases where strict convergence is unattainable, a plot of the day-to-day flow is shown to facilitate a qualitative analysis.


2. Number of days until convergence. The number of days until convergence is counted from the start of the simulation until convergence is reached. For cases where strict convergence is unattainable, it is the number of days until the flows on all paths change within an acceptable tolerance.
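Operationally, the convergence day can be read off the simulated flow series; a minimal sketch assuming Python, with a tolerance parameter for the non-strict case:

```python
def days_until_convergence(path_flows_by_day, tol=0):
    """First day after which every path flow changes by at most tol for the
    remainder of the simulation; tol=0 gives strict convergence (no further
    route switching). Returns None for 'NC' (no convergence)."""
    n_days = len(path_flows_by_day)
    for day in range(n_days - 1):
        if all(abs(f2 - f1) <= tol
               for later in range(day + 1, n_days)
               for f1, f2 in zip(path_flows_by_day[day], path_flows_by_day[later])):
            return day
    return None

flows = [[60, 40], [55, 45], [55, 45], [55, 45]]  # flows on two paths, four days
print(days_until_convergence(flows))  # -> 1
```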

4. SIMULATION RESULTS

The results from four sets of simulation experiments are presented and discussed in this section. First, the effects of varying the demand level under Bayesian, reinforcement, and belief learning mechanisms are presented. The second set of experiments considers the effects of varying mean $\pi$ values and different percentages of risk seekers and avoiders within the population. The third set considers the effects of varying the initial travel time perception uncertainty (variance) on the convergence of the system. The final set examines the effects of the perceived travel time used in route evaluation on convergence.

4.1 Varying Demand Levels

In traffic systems, demand levels fluctuate over time, due to latent demand for travel and time-varying activity patterns. Past studies have shown that as demand levels increase, convergence is more difficult to obtain (Mahmassani 1984; Chen and Mahmassani 2004). In the first set of experiments conducted in this study, demand levels were varied across different learning mechanisms. Demand levels are varied by scaling the base demand level (100 users) through a demand factor (V); thus, V = 1 corresponds to the base demand level, while V = 2 corresponds to an increase in demand by a factor of two. The results from these experiments are shown in Table 4.

Table 4: Iterations until Convergence for Different Demand Levels (V); NC = no convergence

           Bayesian   Reinforcement   Belief
V = 1.00   11         16              7
V = 1.50   10         61              7
V = 2.00   15         NC              7
V = 3.00   NC         NC              7

Under Bayesian and reinforcement learning, lower demand levels show a greater propensity towards convergence than high levels, confirming past results, here under different learning mechanisms. However, under belief learning, which updates using averages of experiences across all users on a particular route, convergence appears less sensitive to demand levels. High demand showed a lower propensity towards convergence principally because travel times are more sensitive to flow when there is more congestion in the system, as captured in the link flow-cost functions (and as would be predicted by virtually all standard queuing or traffic flow models). Under belief learning, since users update using travel times averaged across all user experiences for a particular route, the effects of travel time fluctuation or variation across users may be reduced, leading to similar travel time perceptions across all users on a particular route, all else being equal.

Finally, strict convergence under reinforcement learning was more difficult to obtain relative to the other learning mechanisms. One plausible explanation is that reinforcement is a selective updating mechanism that updates only for experienced travel time gains (choices that lead to a reduction in travel times). Thus, under reinforcement learning, updating may occur less frequently and with smaller samples than under other mechanisms. One assumption of the learning rules used in this study is that with each update confidence increases (variance decreases), leading to perceived travel time distributions that become tighter around the mean with each update, so that new experiences (travel times) have less of an impact on users' travel time perceptions. Under reinforcement learning, since updating only occurs for travel time gains, the perceived travel time uncertainty (variance) may not decrease at the same rate as under other mechanisms, leading to slower convergence compared to Bayesian and belief learning.

4.2 Varying Initial Uncertainty

Experiments were also conducted to examine the effects of the initial uncertainty, determined by the value of $\beta$, under each of the three learning rules. These results are shown in Table 5 for two demand levels (V = 1 and V = 2). They indicate that under Bayesian and reinforcement learning, as the initial uncertainty increases, convergence in traffic flows is more difficult to obtain. One possible explanation is that when users have a higher perceived uncertainty or judgment error, more new experiences are required to decrease this perception error. In general, reinforcement learning takes more time until convergence than Bayesian learning, since the travel time experiences sampled under reinforcement learning consist only of travel time "gains" (reductions in travel time). Under belief learning, since users update using travel times averaged across all user experiences for a particular route, the effects of travel time fluctuation or variation across users may be reduced, leading to similar travel time perceptions across all users on a particular route, all else being equal. Finally, similar to the results in Table 4, higher demand levels lead to more difficulty with respect to convergence.

Table 5: Number of Iterations until Convergence for Different Initial Perceived Error (β); β = variance associated with a unit of travel time, for two demand levels

Demand: V = 1
                Beta = 1   Beta = 2   Beta = 3
Bayesian        11         13         15
Reinforcement   16         22         38
Belief          7          7          7

Demand: V = 2
                Beta = 1   Beta = 2   Beta = 3
Bayesian        15         16         16
Reinforcement   NC         NC         NC
Belief          7          7          7


4.3 Risk Attitudes

Under a decision process that accounts for perceptions of uncertainty, risk attitudes play an important role in the evaluation of travel time likelihoods in route choice. The parameter $\pi^{loss}$ sets the inflection point in Eq. 8, indicating the degree to which users subjectively overweight or underweight objective probabilities. The results show that risk attitudes do affect the convergence of traffic systems. The results for the Bayesian and belief learning mechanisms under a decision process that takes risk attitudes into account are presented in Figures 4 and 5. Figure 4 shows that under Bayesian learning, as $\pi^{loss}$ increases, the propensity towards convergence is greater relative to a lower $\pi^{loss}$. Furthermore, a high percentage of risk seekers (90%) leads to a greater propensity towards convergence under Bayesian learning, compared to a low percentage (10%). In the risk framework of this study, risk seekers underweight probabilities of losses and overweight probabilities of gains. Thus, risk seekers may have a higher propensity towards switching to routes with larger perceived variances, unless the travel time gain between the current and alternative routes is large. Risk avoiders, on the other hand, show a greater propensity towards staying on routes with lower variances, despite the possibility of a travel time gain from switching. One consequence of the Bayesian learning rule is that as users gain travel time experiences over time, their perceived variance decreases, so users' perceived travel times become insensitive to new experiences. One plausible explanation for the higher propensity towards convergence exhibited by systems with more risk seekers relative to risk avoiders is that risk seekers switch at a greater frequency, due to their propensity towards routes with large variances, thus reducing their perceptions of travel time uncertainty at a greater rate than risk avoiders.

Figure 4: Number of iterations until convergence as the mean $\pi^{loss}$ increases, for different percentages of risk seekers in the population: Bayesian Learning Experiments


Figure 5: Number of iterations until convergence as the mean $\pi^{loss}$ increases, for different percentages of risk seekers in the population: Belief Learning Experiments

The results for belief learning (Fig. 5) show that although a system with a low percentage of risk seekers has a greater propensity towards convergence than one with a high percentage of risk seekers, the difference in propensities is smaller than under a Bayesian learning rule. Under belief learning, perceived travel times are updated using travel times averaged across all users choosing the same route. Thus, the effects of travel time fluctuations or variation across users are reduced, leading to similar travel time perceptions across all users of a particular route, all else being equal, and hence to a greater propensity towards convergence compared to systems where individuals perceive different travel times.

4.4 Initial Perceived Variance (β) and Risk

The parameter $\beta$ indicates the initial dispersion of the perceived travel times; a higher $\beta$ indicates a greater initial perceived variance in travel times (low confidence). The number of iterations until convergence for different percentages of risk seekers in the population and different values of the initial perceived travel time variance ($\beta$) is presented in Figures 6 and 7. The initial perceived dispersion (variance) of travel times seems to have no effect on convergence under Bayesian and belief learning. This departs from previous studies, which show that if the initial perceived variance is too low (low $\beta$), the system has a lower propensity towards converging, since additional learning has marginal effects on the perceived variance (Chen and Mahmassani 2004). One plausible explanation for this difference is that risk attitudes are explicitly considered in this study. Some users may be very risk seeking, switching routes for any small probability of a travel time gain. Thus, a low perceived variance may not have a pronounced effect, since some risk-seeking individuals would be switching regardless.


Figure 6: Number of iterations until convergence as the number of risk-seeking individuals in the population increases, for different initial perceived variances (β): Bayesian Learning


Figure 7: Number of iterations until convergence as the number of risk-seeking individuals in the population increases, for different initial perceived variances (β): Belief Learning

Also, note that a low percentage of risk seekers does not necessarily indicate the absence of extremely risk-seeking behaviours (high $\pi^{loss}$), since the values are drawn from a normal distribution. Thus, for any percentage of risk-seeking users there would be some users with a high degree of risk-seeking behaviour (high $\pi^{loss}$). Finally, under Bayesian learning, as the percentage of users who are risk seeking increases, convergence appears easier to obtain. Also, convergence is easier to obtain under belief learning than under Bayesian learning overall. These results are consistent with those observed in Figures 4 and 5. The effects of varying the initial perceived travel time variance may be reduced by the presence of risk-seeking (and risk-avoiding) users in the system who may switch routes, or stay, despite small probabilities of gains or losses.

4.5 Reference Travel Time

Finally, the effects on convergence of the travel time used in route evaluation were examined (Eqs. 13-15). These results, shown in Figure 8, indicate that as users place more weight on updated travel times, the propensity towards convergence increases, compared to users who place more weight on recently experienced travel times. The results also show that under Bayesian learning, as the number of risk seekers in the population increases, convergence is easier to obtain in general, consistent with the other results in this study: risk seekers may exhibit greater switching relative to risk avoiders, reducing their perceived uncertainty at a greater rate from iteration to iteration and resulting in a higher propensity towards convergence. Additionally, as the experienced travel time is weighted more heavily than the perceived updated travel time, convergence is more difficult to obtain. Also, as users choose (sample) the same route more frequently from day to day, their confidence in the perceived travel time for that route increases (variance decreases), and thus future experiences have less of an impact.


Figure 8: Number of iterations until convergence as the percentage of risk seekers increases, for different types of perceived travel time, under Bayesian Learning

5. CONCLUSIONS

This study examines the role of risk attitudes and individual perceptions of travel time on the day-to-day behaviour of traffic flows. A prospect theory type decision-making framework is used to examine the role of risk attitudes and travel time uncertainty on day-to-day network flows. Additionally, three learning types are considered: i) reinforcement; ii) belief; and iii) Bayesian. These learning and risk mechanisms are modelled and embedded inside a microscopic (agent-based) simulation framework to study their collective effects on the day-to-day behaviour of traffic flows. We also examined the role of risk seekers in driving system-wide properties of traffic networks over time.

The results show that explicitly considering risk attitudes and their effect on an individual's perception of uncertainty does influence the convergence of traffic flows in a network. Additionally, in the case of belief learning, risk attitudes also affect the spread of individuals across routes at convergence. Risk attitudes affect route choice decisions by influencing how individuals perceive uncertainty and how uncertainty relates to the route travel times experienced in the decision-making process. The results also show that the percentage of risk seekers in the population affects the rate of convergence, possibly by affecting the rate of sampling by individuals and by adding variability in travel times for individuals who are not risk seeking. For Bayesian learning, any mechanism that affects the rate of sampling will affect the rate of convergence; convergence under Bayesian learning is a function of both the perceived travel times and the perceived dispersion of these travel times. Reinforcement learning describes how experienced travel times are integrated, but says nothing explicit about how uncertainty changes over time: there is no assumption in reinforcement learning that individuals perceive less dispersion in travel times as more experiences are gained. Thus, unlike in a system with Bayesian learners, convergence is more difficult to achieve. Although belief learning faces the same issue, because it considers the experiences of all users it may lead a system to faster convergence than reinforcement learning.

Finally, the results show that some system-wide properties are common to all cases, regardless of the learning rule or the explicit consideration of risk attitudes. First, as demand levels increase, convergence is more difficult to achieve. Second, as individuals weight their perceived updated travel time more heavily, less switching among routes occurs and individuals choose a particular route more consistently.

REFERENCES

Ariely D. and Z. Carmon (2000). Gestalt Characteristics of Experiences: the Defining Features of Summarized Events. Journal of Behavioral Decision Making, 13(2), 191-201.
Avineri E. and J. N. Prashker (2003). Sensitivity to Uncertainty: the Need for a Paradigm Shift. Transportation Research Record, 1854, 90-98.
Avineri E. and J. N. Prashker (2005). Sensitivity to Travel Time Variability. Transportation Research Part C, 13, 157-183.

24 Learning and Risk Attitudes in Route Choice Dynamics Research Part C, 13, 157-183. Battalio, R., L. Samuelson, and J. van Huyck (2001). Optimization Incentives and Coordination Failure in Laboratory Stag Hunt Games. Econometrica, 69, 749-764. Ben-Akiva, M., A. De Palma, A and I. Kaysi (1991). Dynamic Network Models and Driver Information Systems. Transportation Research, 25A(5), 251-266. Bernoulli D. (1738). Exposition of a New Theory on the Measurement of Risk Translated Sommer L. (1954). Econometrica, 22(1), 23-36. Camerer C., T. Ho, and K. Chong (2002) Sophisticated Experienced-weighted Attraction Learning and Strategic Teaching in Repeated Games. Journal of Economic Theory 104, 137-188. Camerer, C. F. (2003) Behavioral Game Theory. Princeton University Press. Chancelier J-P, M. de Lara, and A. de Palma. (2007) Road Choice and the One Armed Bandit. Transportation Science, 41(1), 1-14. Chen, R. B. and H. S. Mahmassani (2004). Learning and Travel Time Perception in Traffic Networks. Transportation Research Record, 1894, 209-221. Crawford V. P. (1995). Adaptive Dynamics in Coordination Games. Econometrica 63(1), 103-143. de Palma A. and N. Picard (2005) Route Choice under Travel Time Uncertainty. Transportation Research Part A ,39, 295-324. DeGroot, M. (1970). Optimal Statistical Decisions. McGraw-Hill Book Company. Duda R., P. Hart, and D. Stork (2001). Pattern Classification. Wiley Science. Einhorn H. and R. Hogarth (1981) Behavioral Decision Theory: Processes of Judgment and Choice. Annals of Psychology 32, 53-88. Erev, I., Y. Bereby-Meyer, and A. Roth. (1999). The Effect of Adding a Constant to all Payoffs: Experimental Investigation, and Implications for Reinforcement Learning Models. Journal of Economic Behavior and Organization, 39(1), 111-128. Fudenberg D. and D. Levine (1998) The Theory of Learning in Games. Cambridge, Massachusetts, MIT Press. Helbing D., M. Schonhof, M.and H-U. Stark (2005). How Individuals Learn to Take Turns: Emergence of Alternating Cooperation in a Congestion Game and the Prisoner’s Dilemma. Advances in Complex Systems 8, 87-116. Ho T. and K. Wiegelt (1996). Task Complexity, Equilibrium Selection, and Learning: An Experimental Study. Management Science 42, 659-679. Horowitz, J. L. (1984). The Stability of Stochastic Equilibrium in a Two-Link Transportation Network. Transportation Research , 18B(1), 13-28. Hu, T-Y and H.S. Mahmassani (1995). Evolution of Network Flows under Real-time Information: A Day-to-Day Simulation Assignment Framework, Transportation Research Record, 1493, 46-56. Jha, M., S. Madanat S. and S. Peeta. (1998). Perception Updating and Day-to-Day Travel Choice Dynamics in Traffic Networks with Information Provision. Transportation Research 6C(3), 189-212. Kaysi, I. (1991). Framework and Models for Provision of Driver Information System, PhD. Thesis, Department of Civil Engineering, Massachusetts Institute of Technology, Cambridge, MA. Kahneman D. and A. Tversky (1979). Prospect Theory: An Analysis of Decision Under Risk.

Learning and Risk Attitudes in Route Choice Dynamics 25 Econometrica 47(2), 263-292. Kahneman D. and A. Tversky (1982). Advances in Prospect Theory: Cumulative Representation of Uncertainty. Journal of Risk and Uncertainty 5, 297-323. Tong, C-C., H.S. Mahmassani and G.-L. Chang (1987). Travel Time Prediction and Information Availability in Commuter Behaviour Dynamics. Transportation Research Record 1138, 1-7. Mahmassani H. S., and G.-L. Chang (1988). Travel Time Prediction and Departure Time Adjustment Behaviour Dynamics in a Congested Traffic System, Transportation Research B, 22B(3), 217-232. Mahmassani, H.S. and R. Jayakrishnan (1991). System Performance and User Response Under Real-Time Information in a Congested Traffic Corridor, Transportation Research A, Vol. 25A(5), 293-308. Mahmassani, H.S. and Y.-H. Liu (1999). Dynamics of Commuting Decision Behaviour under Advanced Traveler Information Systems. Transportation Research C, 7, 91-107. Mahmassani, H.S., and K.K. Srinivasan (2004). Experiments with Route and Departure Time Choices of Commuters under real-time Information: Heuristics and Adjustment Processes. Chapter 4 in Human Behavior and Traffic Networks, edited by M. Schreckenburg and R. Selten, Springer-Verlag. Mitchell, T. M. (1997) Machine Learning. McGraw Hill. Nakayama, S., R. Kitamura, and S. Fujii (1999). Drivers’ Learning and Network Behavior: A Systematic Analysis of the Driver-Network System as a Complex System. Transportation Research Record 1493, 30-36. Payne, J. W., D. J. Laughhunn, and R. Crum. (1981). Aspiration level effects in risk behavior. Management Science 27, 953-958. Roth A. and I. Erev (1993). Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term. Games and Economic Behavior 8, 164-212. Srinivasan, K.K., and H.S. Mahmassani (2000). Modeling Inertia and Compliance Mechanisms in Route Choice Behavior Under Real-Time Information. Transportation Research Record 1725, 45-53. Tversky, A. and C. Fox (1995). Weighing Risk and Uncertainty. Psychological Review 102(2), 269-283. Van Huyck J., R. Battalio, and R. Beil (1991). Strategic Uncertainty, Equilibrium, Selection, and Coordination Failure in Average Opinion Games. Quarterly Journal of Economics. 106, 885-909. von Neumann J. and O. Morgenstern (1947). Theory of Games and Economic Behavior 2nd Edition. Princeton University Press. Princeton, NJ. Wallsten T., T. J. Pleskac and C. Lejuez (2006). Modeling Behavior in a Clinically Diagnostic Sequential Risk-Taking Task. Psychological Review 112(4), 862-880. Wehrung D. A. (1989) Risk Taking Over Gains and Losses: A Study of Oil executives. Annals of Operations Research 19, 115-139. Weibull J. (1995). Evolutionary Game Theory. Cambridge, Massachusetts, MIT Press.