Transportation Letters: The International Journal of Transportation Research

ISSN: 1942-7867 (Print) 1942-7875 (Online) Journal homepage: http://www.tandfonline.com/loi/ytrl20

Modeling travel mode choice behavior with bounded rationality using Markov Logic Networks

Tao Li, Hongzhi Guan, Jiaqi Ma, Guohui Zhang & Keke Liang

To cite this article: Tao Li, Hongzhi Guan, Jiaqi Ma, Guohui Zhang & Keke Liang (2017): Modeling travel mode choice behavior with bounded rationality using Markov Logic Networks, Transportation Letters, DOI: 10.1080/19427867.2017.1342945

To link to this article: http://dx.doi.org/10.1080/19427867.2017.1342945

Published online: 29 Jun 2017.



Transportation Letters, 2017 https://doi.org/10.1080/19427867.2017.1342945

Modeling travel mode choice behavior with bounded rationality using Markov Logic Networks

Tao Liᵃ, Hongzhi Guanᵃ, Jiaqi Maᵇ, Guohui Zhangᶜ and Keke Liangᵃ

ᵃTransportation Research Center, Beijing University of Technology, Beijing, China; ᵇTransportation Solutions and Technology Applications Division, Leidos, Inc., McLean, VA, USA; ᶜDepartment of Civil Engineering, University of New Mexico, Albuquerque, NM, USA

ABSTRACT

Disaggregate choice models have been widely studied to quantify how travelers' characteristics and the attributes of the available alternatives influence travel-mode choice. However, their modeling capability is constrained by their specifications and by their underlying assumptions about unobserved disturbances. In this study, a Markov Logic Network (MLN)-based approach is developed to combine bounded rationality principles with travelers' mode-choice behavior. The approach builds on logical domain knowledge and probabilistic models. An MLN can extract logical domain knowledge and represent the impacts of significant attributes using independent logical formulas, each weighted according to its relative strength. Travel-mode choice is then determined by travelers' personal preferences and logical domain knowledge. The numerical examples and parameter sensitivity analyses indicate that the approach performs reasonably well. The findings help clarify travel mode-choice model specifications and the interpretation of travel behavior.

KEYWORDS

Discrete choice model; travel-mode choice; Markov Logic Network; sensitivity analysis

CONTACT Tao Li

© 2017 Informa UK Limited, trading as Taylor & Francis Group

Introduction

Disaggregate choice models have been widely investigated and applied in various research areas because they can flexibly quantify the influence of decision-makers' characteristics and of the attributes of the alternatives. Two discrete choice models are prevalent: the logit model (Luce 1959) and the probit model (Aldrich and Nelson 1974). They assume Gumbel-distributed and normally distributed unobserved disturbances, respectively. The probit model is intuitively reasonable, and its assumption of normally distributed disturbances is theoretically well grounded. However, its choice probabilities have no closed form, which leads to significant computational costs and impedes its wide application. The logit model overcomes this constraint and yields convenient, analytical results. The multinomial logit (MNL) model is therefore used extensively because of its simplicity and ease of estimation. Its application is nevertheless limited, to some extent, by the restrictive assumption that errors are independent across alternatives. The Independence from Irrelevant Alternatives (IIA) property of the MNL model can be partially addressed by the nested logit (NL), or hierarchical logit, model (Daly et al. 1978; Heiss 2016; Hensher 2001). This model allows error covariance and different degrees of competition between pairs of alternatives.

Research aimed at reducing or eliminating the effects of the IIA property can be classified into three groups. The first group seeks to account for similarities in the unobserved utilities of the alternatives (inter-alternative correlation). It includes nested logit (Ben-Akiva and Lerman 1988; Vovsha 1997) and cross-nested logit (Bierlaire 2002; McFadden 2000) models. The second group addresses taste heterogeneity among travelers. Models of this kind include the mixed multinomial logit (Davidson and Teye 2012; Greene, Hensher, and Rose 2006) and latent class logit (Greene and Hensher 2003) models. The third group seeks to account for both inter-alternative correlation and taste heterogeneity simultaneously. Examples include mixed GEV (Hess, Bierlaire, and Polak 2005) and latent class GEV (Teye-Ali and Davidson 2013) models. GEV (Generalized Extreme Value) models are presented in the subsequent section. They can capture unobserved similarities among alternatives, thereby relaxing the restrictions of the MNL and NL models. However, none of these models can address the IIA issue completely (Teye, Davidson, and Culley 2014).

To address this gap, we propose a Markov Logic Network (MLN)-based approach to formulate travel-mode choice. MLN was first proposed by Domingos and Richardson of the University of Washington to deal with complexity and uncertainty in artificial intelligence (Richardson and Domingos 2006). It is now generally regarded as a method that handles complexity and uncertainty by combining first-order predicate logic with probabilistic graphical models (Domingos 2006; Kok and Domingos 2007). MLN is used mainly in artificial intelligence, machine learning, and related fields (Domingos and Lowd 2009). Over the years, Domingos' research team has continued to refine the theory of MLN (Lowd and Domingos 2007; Singla and Domingos 2008). The team has also provided a software platform, Alchemy, for learning and developing MLNs. MLN has broad application prospects. Researchers have demonstrated its practical value in information extraction (Poon and Domingos 2007), social networks (Singla et al. 2008), and geographic information systems (Lin 2006), among other areas. In decision-making, Nath and Domingos proposed a decision-theoretic framework based on MLN. They proved that classical planning and Markov decision processes are special cases of this framework (Nath and Domingos 2009).

Unlike logit and nested logit models, the proposed MLN-based approach combines bounded rationality principles with travelers' mode-choice behavior without imposing strict assumptions on unobserved disturbances. The approach builds on logical domain knowledge and probabilistic models. An MLN can extract logical domain knowledge and represent the impacts of significant attributes using independent logical formulas, each weighted according to its relative strength. To reflect travelers' bounded rationality, the proposed approach models travel-mode choice with a utility satisfaction criterion rather than a utility maximization criterion. We also analyze the effects of different utility satisfaction criteria and investigate how strongly travelers rely on the logical formulas when choosing a travel mode. Further, we analyze the sensitivity of the model parameters and discuss the effects of different logical relationships between the variables and travel mode-choice outcomes.

The remainder of this paper is organized as follows. Section "Research methodology" details the definition of MLN, logical domain knowledge extraction, MLN structure establishment, and probability model formulation. Section "Numerical examples and discussions" presents numerical examples and discusses the characteristics of MLN models. Section "Case study" presents an actual survey case of travel-mode choice using the MLN model. Section "Conclusions and recommendations" concludes the paper.

Research methodology

In this study, an MLN-based approach is proposed to overcome the limitations of logit models in travel mode-choice formulation. An MLN is a probabilistic logic that combines a Markov network with first-order logic for uncertain inference (Richardson and Domingos 2006). MLNs attach weights to the formulas of a first-order knowledge base. This provides a compact framework for constructing very large Markov networks and for flexibly incorporating a wide range of domain knowledge into them. Because they build on first-order logic, MLNs can perform uncertain inference from imperfect and contradictory knowledge. Many statistical analysis tasks, such as collective classification, link prediction, link-based clustering, social network modeling, and object identification, can be formulated naturally as specific instances of MLN learning and inference.

Markov networks

A Markov network, also known as a Markov random field, is a model of the joint distribution of a set of variables $X = (X_1, X_2, \ldots, X_n) \in \chi$ (Jordan 2003). It is composed of an undirected graph G and a set of potential functions $\phi_k$. The graph has a node for each variable. The model has a potential function for each clique in the graph, where a clique is a set of mutually connected variables and k indexes the cliques. A potential function is a non-negative real-valued function of the state of the corresponding clique. The joint distribution represented by a Markov network is given by:

$$P(X = x) = \frac{1}{Z} \prod_{k} \phi_k\left(x_{\{k\}}\right) \qquad (1)$$

where $x_{\{k\}}$ is the state of the kth clique, i.e. the state of the variables that appear in that clique. Z, known as the normalization factor, is given by $Z = \sum_{x \in \chi} \prod_{k} \phi_k(x_{\{k\}})$. Markov networks are often represented as log-linear models, with each clique potential replaced by an exponentiated weighted sum of features of the state, leading to

$$P(X = x) = \frac{1}{Z} \exp\left( \sum_{j} w_j f_j(x) \right) \qquad (2)$$

where $w_j$ is the weight of the jth logical formula and represents the traveler's degree of trust in that formula. A feature may be any real-valued function of the state. In this study, binary features $f_j(x) \in \{0, 1\}$ are used. The most direct translation of the potential-function form in Equation (1) uses one feature for each possible state $x_{\{k\}}$ of each clique, with weight $\log \phi_k(x_{\{k\}})$. The size of this representation grows exponentially with clique size. A much smaller number of features can be specified instead, for example logical functions of the clique state. This yields a more compact representation than the potential-function form, especially when large cliques are present.

Markov Logic Network formulation

A first-order knowledge base (KB) is a set of sentences or formulas in first-order logic. Formulas are constructed using four types of symbols: constants, variables, functions, and predicates. Constant symbols represent objects in the domain of interest (e.g. people: Anna, Bob, Chris). Variable symbols range over the objects in the domain. Function symbols (e.g. MotherOf) represent mappings from tuples of objects to objects. Predicate symbols represent relations among objects in the domain (e.g. Friends) or attributes of objects (e.g. Smokes). An interpretation specifies which symbols represent which objects, functions, and relations in the domain. Variables and constants may be typed. In that case, variables range only over objects of the corresponding type, and constants can only represent objects of the corresponding type. For example, the variable x might range over people (e.g. Anna, Bob), and the constant C might represent a city (e.g. Seattle). A first-order KB can be seen as a set of hard constraints on the set of possible worlds, i.e. assignments of truth values to all ground atoms. If a world violates even one formula, it has zero probability. The basic idea of MLNs is to relax these constraints: a world that violates one formula in the KB becomes less probable, but not impossible. The fewer formulas a world violates, the more probable it is.
Each formula has an associated weight that reflects how strong a constraint it is. The higher the weight, the greater the difference in log probability between a world that satisfies the formula and one that does not, other things being equal.

Definition 1: An MLN L is a set of pairs $(F_i, w_i)$, where $F_i$ is a formula in first-order logic and $w_i$ is a real number. Together with a finite set of constants $C = \{c_1, c_2, \ldots, c_n\}$, it defines a Markov network $M_{L,C}$ as follows (Richardson and Domingos 2006):

(1) $M_{L,C}$ contains one binary node for each possible grounding of each predicate appearing in L. The value of the node is 1 if the ground atom is true, and 0 otherwise.

(2) $M_{L,C}$ contains one feature for each possible grounding of each formula $F_i$ in L. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the $w_i$ associated with $F_i$ in L.

The syntax of the formulas in an MLN is the standard syntax of first-order logic (Genesereth and Nilsson 1987; Pearl 1988). Free (unquantified) variables are treated as universally quantified at the outermost level of the formula. An MLN can be viewed as a template for constructing Markov networks: different sets of constants produce different networks. These networks may vary widely in size. However, all of them share certain regularities in structure and parameters given by the MLN; for example, all groundings of the same formula have the same weight. We call each of these networks a ground Markov network to distinguish it from the first-order MLN. From Definition 1 and Equations (1) and (2), the probability distribution over possible worlds x specified by the ground Markov network $M_{L,C}$ is

$$P(X = x) = \frac{1}{Z} \exp\left( \sum_{i} w_i n_i(x) \right) = \frac{1}{Z} \prod_{i} \phi_i\left(x_{\{i\}}\right)^{n_i(x)} \qquad (3)$$


where $n_i(x)$ is the number of true groundings of $F_i$ in x, $x_{\{i\}}$ is the state (truth values) of the atoms appearing in $F_i$, and $\phi_i(x_{\{i\}}) = e^{w_i}$. MLNs are defined here as log-linear models, but they could equally be defined as products of potential functions, as the second equality shows. The product form is the most convenient one in domains with a mixture of hard and soft constraints. In such domains, some formulas hold with certainty, so some worlds have zero probability.

Modified implication operator in first-order logic

In logic, the statement X→Y means: if X holds (X = 1), then Y holds (Y = 1). Let f(x) denote the truth value of X→Y, where x is the joint state of (X, Y): f(x) = 1 if the statement is true and f(x) = 0 otherwise. By the definition of logical implication, the statement is true unless X = 1 and Y = 0 (State ② in Table 1). In reality, however, we cannot judge Y when X = 0 (States ③ and ④), because Y may then hold (Y = 1) or not (Y = 0). States ③ and ④ are therefore less reliable than State ①, so we introduce a function g(x) to highlight the reliability of State ①.

Travelers hold many statements of the form X→Y (see Table 7), and different travelers believe each statement to a different degree. The weights $w_i$ characterize how strongly travelers believe each statement. In this study, we assume that travelers make their final decision by matching the current state of X and Y against their KB. The chosen decision is the one under which the current state of X and Y matches the traveler's first-order knowledge base to the highest degree. This is the key difference between the proposed model and the ordinary logit model. Table 1 shows the truth table of the implication operator, where the f(x) row gives the truth value of X→Y. In first-order logic, States ①, ③, and ④ have the same truth value.
However, real-world experience suggests that States ③ and ④ are less credible than State ①. Therefore, another term, g(x), is added to distinguish the credibility of States ③ and ④ from that of State ①. The truth value of the modified implication operator is represented by f(x) + g(x). Using the modified implication operator, the probability distribution over possible worlds x is

$$P(X = x) = \frac{1}{Z} \exp\left( \sum_{j} w_j \left[ f_j(x) + g_j(x) \right] \right) \qquad (4)$$

where $g_j(x)$ distinguishes credibility. The truth value of the modified implication operator X→Y is 2 in State ①, 1 in States ③ and ④, and 0 in State ②. Hence, for a positive weight, State ① is more probable than any other state, and State ② is the least probable.
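The sketch below makes the effect of g(x) concrete. It is illustrative Python, not the authors' implementation. It evaluates Equation (4) for a single weighted formula X→Y over the four states of Table 1 and compares the result with using the unmodified feature f(x) alone. The function names and the weight value are hypothetical choices for illustration.

```python
import itertools
import math


def implication(x: int, y: int) -> int:
    """Standard truth value f(x) of X -> Y: false only when X = 1 and Y = 0."""
    return 0 if (x == 1 and y == 0) else 1


def credibility_bonus(x: int, y: int) -> int:
    """g(x) from Table 1: 1 only in State 1 (X = 1, Y = 1), otherwise 0."""
    return 1 if (x == 1 and y == 1) else 0


def state_distribution(w: float, modified: bool = True) -> dict:
    """Return P(X = x, Y = y) for one formula X -> Y with weight w.

    modified=True scores each state with w * (f + g), as in Equation (4);
    modified=False scores it with w * f only.
    """
    scores = {}
    for x, y in itertools.product((1, 0), repeat=2):
        truth = implication(x, y)
        if modified:
            truth += credibility_bonus(x, y)
        scores[(x, y)] = math.exp(w * truth)
    z = sum(scores.values())  # normalization factor Z
    return {state: s / z for state, s in scores.items()}


if __name__ == "__main__":
    for modified in (False, True):
        probs = state_distribution(w=1.0, modified=modified)
        label = "modified" if modified else "standard"
        print(label, {state: round(p, 3) for state, p in probs.items()})
```

With the standard operator, States ①, ③ and ④ receive identical probabilities. Adding g(x) makes State ① strictly the most likely and leaves State ② the least likely, as stated above.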

Numerical examples and discussions

Table 1. Logic relationship of modified implication operators.

State    ①   ②   ③   ④
X        1   1   0   0
Y        1   0   1   0
f(x)     1   0   1   1
g(x)     1   0   0   0

Table 2. Logical formulas.

Formula symbol     Logical formula     Weight
f1(x) + g1(x)      A ∩ ¬B → C          w1
f2(x) + g2(x)      ¬A ∩ B → ¬C         w2

Table 3. Probability distribution of different travel-mode choice states.

State of (A, B, C)     Probability
(1,1,1)                (1/Z)·exp(w1 + w2)
(1,1,0)                (1/Z)·exp(w1 + w2)
(1,0,1)                (1/Z)·exp(2w1 + w2)
(1,0,0)                (1/Z)·exp(w2)
(0,1,1)                (1/Z)·exp(w1)
(0,1,0)                (1/Z)·exp(w1 + 2w2)
(0,0,1)                (1/Z)·exp(w1 + w2)
(0,0,0)                (1/Z)·exp(w1 + w2)
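Table 3 can be checked by brute-force enumeration. The following minimal Python sketch is not the authors' code. It scores each of the eight states of (A, B, C) with Equation (4), using the modified implication values of Table 1 (2, 0, or 1) for the two formulas of Table 2. It also recovers the conditional probabilities P(C = 1 | A, B) listed in Table 4 by conditioning the joint distribution. The function names are hypothetical.

```python
import itertools
import math


def modified_implication(antecedent: bool, consequent: bool) -> int:
    """f(x) + g(x) from Table 1: 2 in State 1, 0 in State 2, 1 if the antecedent is false."""
    if antecedent:
        return 2 if consequent else 0
    return 1


def joint_distribution(w1: float, w2: float) -> dict:
    """P(A, B, C) for the two-formula KB of Table 2, computed with Equation (4)."""
    scores = {}
    for a, b, c in itertools.product((1, 0), repeat=3):
        t1 = modified_implication(a == 1 and b == 0, c == 1)  # A ∩ ¬B → C
        t2 = modified_implication(a == 0 and b == 1, c == 0)  # ¬A ∩ B → ¬C
        scores[(a, b, c)] = math.exp(w1 * t1 + w2 * t2)
    z = sum(scores.values())  # normalization factor Z
    return {state: s / z for state, s in scores.items()}


def p_car_given(probs: dict, a: int, b: int) -> float:
    """P(C = 1 | A = a, B = b), obtained by conditioning the joint distribution."""
    return probs[(a, b, 1)] / (probs[(a, b, 1)] + probs[(a, b, 0)])


if __name__ == "__main__":
    probs = joint_distribution(w1=1.0, w2=1.0)
    for state, p in probs.items():
        print(state, round(p, 4))
    print("P(C=1 | A=1, B=0) =", round(p_car_given(probs, 1, 0), 4))
```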

Table 2 shows the two logical formulas (the KB) over the three ground atoms A, B, and C. Here, C = 1 denotes choosing the car and C = 0 choosing the bus. This traveler KB is only slightly more complex than the single statement X→Y discussed in Section "Modified implication operator in first-order logic"; Table 7 shows a more complex KB. In this section, we compute the truth value of each formula for every state of A, B, and C using fi(x) + gi(x). We then compute the probability of each state using Equation (4) and the weights wi (Table 3).

The term f1(x) + g1(x) represents the scenario in which a traveler chooses the car when he/she is satisfied with the utility of the car and dissatisfied with the utility of transit. This is expressed by the formula A ∩ ¬B → C. Similarly, f2(x) + g2(x) signifies that the traveler chooses the bus when he/she is satisfied with the utility of the bus and dissatisfied with the utility of the car, as expressed by ¬A ∩ B → ¬C. The weight wi of each formula is positive and represents the degree to which decision makers trust that formula. If the state of A, B, and C is known, the truth value of each logical formula can be calculated, and the probability of that state then follows from Equation (4).

Let U10 and U20 denote the utility thresholds of the two travel modes, car and transit bus, respectively. Vi is the expected perceived utility of mode i, and εi is a random perception error. The two ground atoms A and B are then defined as follows. If Ucar = V1 + ε1 > U10, then A = 1; if Ucar ≤ U10, then A = 0. If Ubus = V2 + ε2 > U20, then B = 1; if Ubus ≤ U20, then B = 0. Hence

$$P(A = 1) = \mathrm{Prob}(U_{car} > U_{10}), \qquad P(B = 1) = \mathrm{Prob}(U_{bus} > U_{20}) \qquad (5)$$

By the law of total probability, and assuming A and B are independent, the probability of choosing the car as the travel mode is

$$P(C = 1) = \sum_{A \in \{0,1\}} \sum_{B \in \{0,1\}} P(C = 1 \mid A, B) \cdot P(A) \cdot P(B) \qquad (6)$$
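Equations (5) and (6) can be combined into a single routine. For illustration only, the sketch below assumes that the perception errors ε1 and ε2 are independent standard Gumbel variables (location 0, scale 1). The Result analysis uses the same distribution to derive Equation (7). The conditional probabilities are the ratios of the corresponding Table 3 entries. The function names and numeric inputs are hypothetical.

```python
import math


def gumbel_exceed(threshold: float, mean_utility: float) -> float:
    """Prob(V + eps > threshold), assuming eps ~ standard Gumbel(0, 1)."""
    return 1.0 - math.exp(-math.exp(-(threshold - mean_utility)))


def p_car(v1: float, v2: float, u10: float, u20: float, w1: float, w2: float) -> float:
    """P(C = 1) via Equation (6), with A and B treated as independent."""
    # Equation (5): probability that each mode's utility clears its satisfaction threshold
    p_a = gumbel_exceed(u10, v1)
    p_b = gumbel_exceed(u20, v2)
    # P(C = 1 | A, B): ratios of the (A, B, 1) and (A, B, 0) entries of Table 3
    cond = {
        (1, 0): math.exp(2 * w1 + w2) / (math.exp(2 * w1 + w2) + math.exp(w2)),
        (1, 1): 0.5,
        (0, 0): 0.5,
        (0, 1): math.exp(w1) / (math.exp(w1) + math.exp(w1 + 2 * w2)),
    }
    total = 0.0
    for a in (1, 0):
        for b in (1, 0):
            pa = p_a if a == 1 else 1.0 - p_a
            pb = p_b if b == 1 else 1.0 - p_b
            total += cond[(a, b)] * pa * pb
    return total


if __name__ == "__main__":
    print(p_car(v1=0.5, v2=0.0, u10=0.0, u20=0.0, w1=1.0, w2=1.0))
```

Raising U10 in this sketch lowers P(C = 1), and raising U20 raises it. This is the direction of the sensitivities discussed in the Result analysis.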


Result analysis

Table 4 gives the conditional probability of selecting the car for each combination of satisfaction with the car and with the bus. The probability of choosing the car is larger than that of choosing the bus when the traveler is satisfied with the utility of the car and dissatisfied with the utility of the bus. In the opposite case, the probability of choosing the bus is higher. The probability values depend on the weights of the formulas; the complete calculation results are shown in Table 5.

If w1 = w2 = 0, the probabilities of choosing the car and the bus are equal in all four states. In that case travelers place no trust in the two logical formulas, and the decision follows no rule: the mode is selected completely at random. If w1 = w2 → ∞, travelers rely entirely on the two logical formulas. They choose the car if they are satisfied with the utility of the car and dissatisfied with that of the bus. They choose the bus if they are satisfied with the utility of the bus and dissatisfied with that of the car.

Suppose travelers know only the utility distributions of the two modes, the utilities of car and bus are mutually independent and follow Gumbel distributions with the same parameters, and w1 = w2 = w > 0. The probability of choosing the car is then

$$P(C = 1) = \frac{1}{2} - \frac{1}{2} e^{-e^{-(U_{10} - V_1)}} - \frac{1}{2} e^{-e^{-(U_{20} - V_2)}} + \frac{e^{w}}{e^{3w} + e^{w}} \cdot e^{-e^{-(U_{10} - V_1)}} + \frac{e^{3w}}{e^{3w} + e^{w}} \cdot e^{-e^{-(U_{20} - V_2)}} \qquad (7)$$

Equation (7) shows that the selection probability depends on ΔU1 = U10 − V1 and ΔU2 = U20 − V2. Here ΔU1 is the utility (satisfaction) threshold of the car (U10) minus the expected value of its utility distribution (V1), and ΔU2 is defined analogously for the bus. If ΔU1 = ΔU2, then P(C = 1) = 1/2, so the car and the bus are equally likely to be selected.
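As a check on Equation (7), the following sketch compares the closed form with a direct evaluation of Equation (6). It is our own code and makes the same assumptions as the text: standard Gumbel errors and w1 = w2 = w. It also confirms that P(C = 1) = 1/2 when ΔU1 = ΔU2, and it approaches the large-weight limit used in the discussion of Figure 1. The function names are hypothetical.

```python
import math


def eq7(du1: float, du2: float, w: float) -> float:
    """Equation (7), with du1 = U10 - V1 and du2 = U20 - V2."""
    a = math.exp(-math.exp(-du1))  # P(A = 0): car utility does not clear its threshold
    b = math.exp(-math.exp(-du2))  # P(B = 0): bus utility does not clear its threshold
    denom = math.exp(3 * w) + math.exp(w)
    return 0.5 - 0.5 * a - 0.5 * b + math.exp(w) / denom * a + math.exp(3 * w) / denom * b


def total_probability(du1: float, du2: float, w: float) -> float:
    """Equation (6) evaluated term by term, for comparison with eq7."""
    p_a1 = 1.0 - math.exp(-math.exp(-du1))
    p_b1 = 1.0 - math.exp(-math.exp(-du2))
    p_10 = math.exp(3 * w) / (math.exp(3 * w) + math.exp(w))  # P(C=1 | A=1, B=0) with w1 = w2 = w
    p_01 = math.exp(w) / (math.exp(w) + math.exp(3 * w))      # P(C=1 | A=0, B=1) with w1 = w2 = w
    return (p_10 * p_a1 * (1 - p_b1) + 0.5 * p_a1 * p_b1
            + 0.5 * (1 - p_a1) * (1 - p_b1) + p_01 * (1 - p_a1) * p_b1)


if __name__ == "__main__":
    for du1, du2, w in [(0.3, -1.2, 0.8), (0.4, 0.4, 1.7)]:
        print(du1, du2, w, eq7(du1, du2, w), total_probability(du1, du2, w))
```

Expanding Equation (6) with P(C = 1 | A, B) from Table 4 gives exactly the five terms of Equation (7). The two functions therefore agree to floating-point precision.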
Table 4. Conditional probability of travel-mode choice.

Utility of car choice    Utility of bus choice    Probability of choosing the car
Satisfied                Dissatisfied             $P(C = 1 \mid A = 1, B = 0) = \frac{e^{2w_1 + w_2}}{e^{2w_1 + w_2} + e^{w_2}}$
Satisfied                Satisfied                $P(C = 1 \mid A = 1, B = 1) = 0.5$
Dissatisfied             Dissatisfied             $P(C = 1 \mid A = 0, B = 0) = 0.5$
Dissatisfied             Satisfied                $P(C = 1 \mid A = 0, B = 1) = \frac{e^{w_1}}{e^{w_1} + e^{w_1 + 2w_2}}$

The partial derivatives of P(C = 1) with respect to the utility thresholds of the two modes are

$$\frac{\partial P}{\partial U_{10}} = \left( \frac{e^{w}}{e^{3w} + e^{w}} - \frac{1}{2} \right) e^{-(U_{10} - V_1)} \, e^{-e^{-(U_{10} - V_1)}} < 0$$

$$\frac{\partial P}{\partial U_{20}} = \left( \frac{e^{3w}}{e^{3w} + e^{w}} - \frac{1}{2} \right) e^{-(U_{20} - V_2)} \, e^{-e^{-(U_{20} - V_2)}} > 0$$

As the utility threshold U10 increases, the probability of choosing the car decreases. As the utility threshold U20 increases, the probability of choosing the car increases. Thus, when the satisfaction standard for the car rises or that for the bus falls, travelers are more likely to choose the bus. In the limit w1 = w2 → +∞,

$$P(C = 1) = \frac{1}{2} \left[ 1 + e^{-e^{-\Delta U_2}} - e^{-e^{-\Delta U_1}} \right]$$

Figure 1 is a three-dimensional plot of this selection probability against ΔU1 and ΔU2. The selection probability is negatively correlated with ΔU1 and positively correlated with ΔU2. It changes most sharply when ΔU1 and ΔU2 are close to 0. When the absolute values of ΔU1 and ΔU2 exceed 5, the selection probability is essentially unaffected by them. In this limit, travelers trust the two logical formulas in Table 2 absolutely. The final choice is still random because travelers perceive utilities with error. The final travel mode-choice probability is therefore governed by the difference between the expected perceived utility and the satisfaction threshold. The more a mode's expected perceived utility exceeds its satisfaction threshold, the more likely that mode is to be chosen. If w1 = w2 > 0, the range of the selection probability changes. Figure 2 shows how the minimum and maximum values of the selection probability depend on the weight w. When the weights of logical formulas in Table 2 are equal and lower (w1 = w2

Case study

How can the model be used to predict the share of each mode in a city? First, a travel survey combining Revealed Preference (RP) and Stated Preference (SP) questions should be conducted, such as the questionnaire survey in Section "Case study" of this paper. Second, the influencing factors and logical formulas should be determined from the results of the data analysis, as shown in Table 7. Then, maximum-likelihood estimation is used to calibrate the weight of each logical formula from the survey results, as shown in Table 9. In the application stage, future values of the influencing factors are fed into the calibrated model to predict the share of each mode in the city.

ISSN: 1942-7867 (Print) 1942-7875 (Online) Journal homepage: http://www.tandfonline.com/loi/ytrl20

Modeling travel mode choice behavior with bounded rationality using Markov Logic Networks Tao Li, Hongzhi Guan, Jiaqi Ma, Guohui Zhang & Keke Liang To cite this article: Tao Li, Hongzhi Guan, Jiaqi Ma, Guohui Zhang & Keke Liang (2017): Modeling travel mode choice behavior with bounded rationality using Markov Logic Networks, Transportation Letters, DOI: 10.1080/19427867.2017.1342945 To link to this article: http://dx.doi.org/10.1080/19427867.2017.1342945

Published online: 29 Jun 2017.

Submit your article to this journal

View related articles

View Crossmark data

Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=ytrl20 Download by: [174.205.2.203]

Date: 29 June 2017, At: 08:09

Transportation Letters, 2017 https://doi.org/10.1080/19427867.2017.1342945

Modeling travel mode choice behavior with bounded rationality using Markov Logic Networks Tao Lia, Hongzhi Guana, Jiaqi Mab, Guohui Zhangc and Keke Lianga a

Transportation Research Center, Beijing University of Technology, Beijing, China; bTransportation Solutions and Technology Applications Division, Leidos, Inc., McLean, VA, USA; cDepartment of Civil Engineering, University of New Mexico, Albuquerque, NM, USA

ABSTRACT

Disaggregate choice models have been widely studied to quantify the influence of the characteristics of travelers as well as the attributes of alternatives and choices in their travel modes. However, due to their model specifications and primary assumptions on unobserved disturbances, their modeling capability is constrained. In this study, a Markov Logic Network (MLN)-based approach is developed to combine bounded rationality principles with travelers’ behavior in travel mode choices. This approach is established based on logical domain knowledge and probabilistic models. MLN can extract logical domain knowledge and represent the impacts of significant attributes using independent logical formulas that are weighted correspondingly by their relative relationships. Travel-mode choice is determined based on travelers’ personal preferences and logical domain knowledge. The numerical examples and parameter sensitivity analyses indicate this approach performs reasonably well. The research findings are helpful to better understand travel mode-choice model specifications and travel behavior interpretations.

Introduction Disaggregate choice models have been widely investigated and applied in various research areas due to their flexibility in quantifying the influence of the characteristics of decision-makers and the attributes of alternatives and choices. Logit and probit models, developed by Luce (1959) and Aldrich and Nelson (1974), are two prevalent discrete choice models that assume the unobserved disturbances are Gumbel and normally distributed, respectively. A probit model is intuitively reasonable and supported by its theoretical grounds for assumptions about the normal distribution of the unobserved disturbances. However, its property of not having a closed form for its normally-distributed disturbance results in significant computational costs and impedes its wide application. A logit model can overcome this constraint and provide more convenient, analytical modeling results. Therefore, the multi-nomial logit model is extensively used because of its simplicity and ease of estimation, although its application is limited, to some extent, by the restrictive assumptions of error independence across alternatives. The Independence from Irrelevant Alternatives (IIA) property exhibited by the multi-nomial logit model can be solved partially by the nested multi-nomial logit model or hierarchical logit model (Daly et al. 1978; Heiss 2016; Hensher 2001), which allows error covariance and different competitiveness between pairs of alternatives. Many of the research work directed at reducing or eliminating the effects of the IIA property can be classified into three groups; the first set of models, which include nested (Ben-Akiva and Lerman 1988; Vovsha 1997) and cross-nested logit (Bierlaire 2002; McFadden 2000) models, seeks to account for the similarities in the unobserved utilities relating to the alternatives (inter-alternative correlation). The second group of models looks at the existence of taste heterogeneity among travelers. 
Models accounting for this phenomenon include the mixed multi-nomial logit (Davidson and Teye 2012; Greene, Hensher,

CONTACT Tao Li

[email protected]

© 2017 Informa UK Limited, trading as Taylor & Francis Group

KEYWORDS

Discrete choice model; travel-mode choice; Markov Logic Network; sensitivity analysis

and Rose 2006) and latent class logit (Greene and Hensher 2003) models. The third group of models seeks to account simultaneously for both the inter-alternative correlation and taste heterogeneity. Models of this type involve mixed GEV (Hess, Bierlaire, and Polak 2005) and Latent Class GEV (Teye-Ali and Davidson 2013) models. GEV models, or Generalized Extreme Value models, are presented in the subsequent section. GEV models are able to capture unobserved similarities among alternatives, thereby relaxing the restrictions of MNL and NL models. However, none of the models can address the IIA issues completely (Teye, Davidson, and Culley 2014). To address this gap, we proposed a Markov Logic Network (MLN)based approach to formulate travel-mode choices. MLN was first proposed by Domingos and Richardson of the University of Washington to deal with the complexity and uncertainty in the field of artificial intelligence (Richardson and Domingos 2006). At present, it is generally accepted that MLN is a kind of method to deal with the complexity and uncertainty of the first-order predicate logic and probabilistic graph model (Domingos 2006; Kok and Domingos 2007). MLN is mainly used in artificial intelligence, machine learning, and other fields (Domingos and Lowd 2009). Over the years, Domingos research team continues to improve the theoretical system of MLN (Lowd and Domingos 2007; Singla and Domingos 2008), and provide a platform for learning and developing the theoretical system (Alchemy).The application prospect of MLN is very broad. Some scholars have proved the practical value of MLN in the aspects of information collection (Poon and Domingos 2007), social network (Singla et al. 2008), geographic information system (Lin 2006) and so on. In the field of decision-making, Nath et al. Proposed a decision theoretic framework based on MLN, and proved that the classical programming theory and the Markov decision process are special forms

2

T. LI ET AL.

(Nath and Domingos 2009). Different from logit and nested logit models, the developed MLN-based approach aims to combine bounded rationality principles with travelers’ behavior in mode choices without strict model assumptions on unobserved disturbances. This approach is established based on logical domain knowledge and probabilistic models. MLN can extract logical domain knowledge and represent the impacts of significant attributes using independent logical formulas that are weighted correspondingly by their relative relationships. In order to comply with the bounded rationality characteristics of travelers, travel-mode choice is modeled based on a satisfaction utility criterion rather than a maximum utility criterion as in the proposed approach. Additionally, the effects of different utility satisfaction criteria are analyzed and the dependency levels of their logical formulas for travel-mode choices are investigated. Further, the model parameter sensitivity is analyzed and the effects of different logical relationships between the variables and travel mode-choice outcomes are discussed as well. The remainder of this paper is organized as follows: Section Research methodology details the definition of MLN, logical domain knowledge extraction, MLN structure establishment, and probability model formulation. Numerical examples and the characteristics of MLN models are discussed in Section Numerical examples and discussions. Section Case study presents an actual survey case about travel-mode choice using MLN model. The conclusion is provided in Section, Conclusions and recommendations.

Research methodology

In this study, an MLN-based approach is proposed to overcome the limitations of logit models for travel mode-choice formulation. MLN is a probabilistic logic that combines a Markov network with first-order logic for uncertain inference (Richardson and Domingos 2006). MLNs generalize a first-order knowledge base by attaching a weight to each formula, which provides a compact framework for constructing very large Markov networks and flexibly incorporates a wide range of domain knowledge into them. Because they are built on first-order logic, MLNs can perform uncertain inference from imperfect and contradictory knowledge. Many statistical analysis tasks, such as collective classification, link prediction, link-based clustering, social network modeling, and object identification, can be naturally formulated as specific instances of MLN learning and inference.

Markov networks

A Markov network, also known as a Markov random field, is a model for the joint distribution of a set of variables X = (X1, X2, …, Xn) ∈ 𝜒 (Jordan 2003). It is composed of an undirected graph G and a set of potential functions 𝜙k. The graph has a node for each variable, and there is a potential function for each clique (a set of mutually connected variables) in the graph; suppose there are k cliques. A potential function is a non-negative real-valued function of the state of the corresponding clique. The joint distribution represented by a Markov network is given by:

$$P(X = x) = \frac{1}{Z}\prod_{k} \phi_k\left(x_{\{k\}}\right) \tag{1}$$

where $x_{\{k\}}$ is the state of the kth clique (i.e. the state of the variables that appear in that clique). Z, known as the normalization factor, is given by $Z = \sum_{x \in \mathcal{X}} \prod_{k} \phi_k\left(x_{\{k\}}\right)$. Markov networks are often represented as log-linear models, with each clique potential replaced by an exponentiated weighted sum of features of the state, leading to

$$P(X = x) = \frac{1}{Z}\exp\left(\sum_{j} w_j f_j(x)\right) \tag{2}$$

where wj is the weight of the jth logical formula (feature), which represents the traveler's degree of trust in that formula. A feature may be any real-valued function of the state; in this study, binary features fj(x) ∈ {0, 1} are used. In the most direct translation of the potential-function form in Equation (1), there is one feature for each possible state x{k} of each clique, with weight log 𝜙k(x{k}). The size of this representation is exponential in the size of the cliques. However, by specifying a much smaller number of features (e.g. logical functions of the state of the clique), a far more compact representation than the potential-function form can be constructed, especially when large cliques are present.

Markov Logic Network formulation

A first-order knowledge base (KB) is a set of sentences or formulas in first-order logic. Formulas are constructed using four types of symbols: constants, variables, functions, and predicates. Constant symbols represent objects in the domain of interest (e.g. people: Anna, Bob, Chris). Variable symbols range over the objects in the domain. Function symbols (e.g. MotherOf) represent mappings from tuples of objects to objects. Predicate symbols represent relations among objects in the domain (e.g. Friends) or attributes of objects (e.g. Smokes). An interpretation specifies which symbols represent which objects, functions, and relations in the domain. Variables and constants may be typed, in which case variables range only over objects of the corresponding type, and constants can only represent objects of the corresponding type. For example, the variable x might range over people (e.g. Anna, Bob), and the constant C might represent a city (e.g. Seattle). A first-order KB can be viewed as a set of hard constraints on the set of possible worlds (truth assignments to all ground atoms): if a world violates even one formula, it has zero probability. The basic idea of MLNs is to soften these constraints: when a world violates one formula in the KB, it becomes less probable, but not impossible. The fewer formulas a world violates, the more probable it is.
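The soft-constraint idea, and the counting of true groundings that Definition 1 and Equation (3) formalize, can be illustrated by brute force. The Python sketch below assumes a hypothetical MLN with a single formula, Friends(x, y) ∧ Smokes(x) ⇒ Smokes(y), an arbitrary weight of 1.1, and the constants Anna and Bob. It enumerates all 2^6 = 64 worlds over the six ground atoms, counts the true groundings n(x) of the formula in each world, normalizes exp(w·n(x)), and checks the log-linear form against the product-of-potentials form. Exhaustive enumeration is feasible only for toy domains; practical MLN software relies on approximate inference.

```python
import itertools
import math

# Hypothetical example: constants Anna and Bob, and one weighted formula
#   Friends(x, y) AND Smokes(x) => Smokes(y)   with an assumed weight w = 1.1.
people = ["Anna", "Bob"]
w = 1.1

# One binary ground atom per grounding of each predicate (Definition 1, item 1).
atoms = [("Smokes", p) for p in people]
atoms += [("Friends", a, b) for a in people for b in people]


def n_true_groundings(world):
    """n(x): how many of the 4 groundings of the formula are true in `world`."""
    count = 0
    for a, b in itertools.product(people, repeat=2):
        antecedent = world[("Friends", a, b)] and world[("Smokes", a)]
        count += int((not antecedent) or world[("Smokes", b)])  # material implication
    return count


# Enumerate all 2**6 = 64 possible worlds (truth assignments to the ground atoms).
worlds = [dict(zip(atoms, values))
          for values in itertools.product([False, True], repeat=len(atoms))]

# Log-linear form of Equation (3): P(x) = exp(w * n(x)) / Z.
scores = [math.exp(w * n_true_groundings(x)) for x in worlds]
Z = sum(scores)
probs = [s / Z for s in scores]

# Product-of-potentials form of Equation (3): phi = e^w raised to the power n(x).
phi = math.exp(w)
scores_product = [phi ** n_true_groundings(x) for x in worlds]
```

A world in which Anna smokes and is friends with Bob, but Bob does not smoke, violates one grounding and is therefore exactly e^w times less probable than a world with no violations: less probable, but not impossible.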
Each formula has an associated weight that reflects how strong a constraint it is: the higher the weight, the greater the difference in log probability between a world that satisfies the formula and one that does not, all else being equal.

Definition 1: An MLN L is a set of pairs (Fi, wi), where Fi is a formula in first-order logic and wi is a real number. Together with a finite set of constants C = {c1, c2, …, cn}, it defines a Markov network ML,C as follows (Richardson and Domingos 2006):
(1) ML,C contains one binary node for each possible grounding of each predicate appearing in L. The value of the node is 1 if the ground atom is true, and 0 otherwise.
(2) ML,C contains one feature for each possible grounding of each formula Fi in L. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the wi associated with Fi in L.

The syntax of the formulas in an MLN is the standard syntax of first-order logic (Genesereth and Nilsson 1987; Pearl 1988). Free (unquantified) variables are treated as universally quantified at the outermost level of the formula. An MLN can be viewed as a template for constructing Markov networks: given different sets of constants, it produces different networks. Although these networks may vary widely in size, all share certain regularities in structure and parameters given by the MLN (e.g. all groundings of the same formula have the same weight). We call each of these networks a ground Markov network to distinguish it from the first-order MLN. From Definition 1 and Equations (1) and (2), the probability distribution over possible worlds x specified by the ground Markov network ML,C is

$$P(X = x) = \frac{1}{Z}\exp\left(\sum_{i} w_i n_i(x)\right) = \frac{1}{Z}\prod_{i} \phi_i\left(x_{\{i\}}\right)^{n_i(x)} \tag{3}$$


where ni(x) is the number of true groundings of Fi in x, x{i} is the state (truth values) of the atoms appearing in Fi, and $\phi_i(x_{\{i\}}) = e^{w_i}$. Note that although we defined MLNs as log-linear models, they could equally be defined as products of potential functions, as the second equality in Equation (3) shows. This is the most convenient form in domains with a mixture of hard and soft constraints, where some formulas hold with certainty, leading to zero probabilities for some worlds.

Modified implication operator in first-order logic

In logic, the statement X→Y means: if X holds (X = 1), then Y holds (Y = 1). Let f(x) denote the truth value of X→Y, where x = (X, Y) is the joint state of the two variables: f(x) = 1 if the statement is true, and f(x) = 0 otherwise. By the definition of truth value in logic, the statement is true (f(x) = 1) unless X = 1 and Y = 0 (State ②). However, when X = 0 (States ③ and ④), the statement tells us nothing about Y in reality: Y may hold (Y = 1) or not (Y = 0). Therefore, the reliability of States ③ and ④ is lower than that of State ①, and we introduce a function g(x) to highlight the reliability of State ①. Travelers hold many statements of the form X→Y in mind, as shown in Table 7, and different travelers believe each statement to different degrees, so we use weights wi to characterize travelers' belief in each statement. In this study, we assume that travelers make the final decision by matching the current state of X and Y against their KB; in other words, the chosen outcome is the one under which the current state of X and Y best matches the traveler's first-order knowledge base. This is the key difference between the proposed model and the ordinary logit model. The truth table of the implication operator in first-order logic is shown in Table 1, where f(x) is the truth value of X→Y. In first-order logic, the truth values of States ①, ③, and ④ are all equal to 1.
However, experience suggests that the credibility of States ③ and ④ is lower than that of State ①. Therefore, an additional term g(x) is added to distinguish the credibility of States ③ and ④ from that of State ①, and the truth value of the modified implication operator is represented by f(x) + g(x). The probability distribution over possible worlds x using the modified implication operator is

$$P(X = x) = \frac{1}{Z}\exp\left(\sum_{j} w_j\left[f_j(x) + g_j(x)\right]\right) \tag{4}$$

where gj(x) distinguishes the credibility of the states. The truth value of the modified implication operator X→Y is 2 in State ①, 1 in States ③ and ④, and 0 in State ②. Hence, for a positive weight, State ① has the largest probability and State ② the smallest.
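A short Python sketch of the modified operator follows. It encodes f(x) and g(x) as in Table 1, evaluates f(x) + g(x) for States ① to ④, and normalizes exp(w[f(x) + g(x)]) for a single formula X→Y as in Equation (4). The weight w = 1.0 is an arbitrary illustrative value, and the function names are hypothetical.

```python
import math

# States ① to ④ of (X, Y), in the column order of Table 1.
STATES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def f(x, y):
    """Classical truth value of X -> Y: false only when X = 1 and Y = 0."""
    return 0 if (x == 1 and y == 0) else 1


def g(x, y):
    """Extra credibility for State ①, where both X and Y hold."""
    return 1 if (x == 1 and y == 1) else 0


def state_probabilities(w):
    """Equation (4) for a single formula X -> Y with weight w."""
    scores = [math.exp(w * (f(x, y) + g(x, y))) for x, y in STATES]
    z = sum(scores)
    return [s / z for s in scores]


truth = [f(x, y) + g(x, y) for x, y in STATES]  # [2, 0, 1, 1]
probs = state_probabilities(w=1.0)               # illustrative weight only
```

With any positive w, State ① receives the largest probability, States ③ and ④ tie, and State ② receives the smallest, matching the ordering stated above.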

Numerical examples and discussions

Table 1. Logic relationship of modified implication operators.
State    ①   ②   ③   ④
X        1   1   0   0
Y        1   0   1   0
f(x)     1   0   1   1
g(x)     1   0   0   0

Table 2.
Formula symbol     Logical formula     Weight
f1(x) + g1(x)      A ∩ ¬B → C          w1
f2(x) + g2(x)      ¬A ∩ B → ¬C         w2

Table 3. Probability distribution of different travel-mode choice states.
State of (A, B, C)    Probability
(1,1,1)               1/Z·exp(w1+w2)
(1,1,0)               1/Z·exp(w1+w2)
(1,0,1)               1/Z·exp(2w1+w2)
(1,0,0)               1/Z·exp(w2)
(0,1,1)               1/Z·exp(w1)
(0,1,0)               1/Z·exp(w1+2w2)
(0,0,1)               1/Z·exp(w1+w2)
(0,0,0)               1/Z·exp(w1+w2)
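As a check on Table 3, the sketch below uses the two formulas of Table 2 over the atoms A (satisfied with the utility of the car), B (satisfied with the utility of the bus), and C (the car is chosen). It enumerates the eight states of (A, B, C), evaluates f1(x) + g1(x) and f2(x) + g2(x) with the modified implication operator, and normalizes exp(w1·v1 + w2·v2) as in Equation (4). The weights passed in are placeholders, and the helper names are hypothetical; this is an illustration, not the authors' implementation.

```python
import itertools
import math


def modified_implication(antecedent, consequent):
    """f(x) + g(x) from Table 1: 2 if both hold, 0 if only the antecedent holds, else 1."""
    if antecedent and consequent:
        return 2
    if antecedent:
        return 0
    return 1


def formula_values(a, b, c):
    """Truth values (v1, v2) of the two Table 2 formulas in state (A, B, C)."""
    v1 = modified_implication(a and not b, c)        # A ∩ ¬B → C
    v2 = modified_implication((not a) and b, not c)  # ¬A ∩ B → ¬C
    return v1, v2


def state_distribution(w1, w2):
    """Equation (4): P(A, B, C) = exp(w1*v1 + w2*v2) / Z over the eight states."""
    scores = {}
    for state in itertools.product([1, 0], repeat=3):
        v1, v2 = formula_values(*state)
        scores[state] = math.exp(w1 * v1 + w2 * v2)
    z = sum(scores.values())
    return {state: s / z for state, s in scores.items()}


def prob_car_given(a, b, w1, w2):
    """P(C = 1 | A = a, B = b), obtained by conditioning the joint distribution."""
    dist = state_distribution(w1, w2)
    return dist[(a, b, 1)] / (dist[(a, b, 1)] + dist[(a, b, 0)])


# Print the coefficients of (w1, w2) in the exponent for each state, as in Table 3.
for state in itertools.product([1, 0], repeat=3):
    print(state, formula_values(*state))
```

Conditioning the joint distribution on (A, B) in this way reproduces the conditional probabilities of choosing the car that are listed later in Table 4.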

Table 2 lists two logical formulas (the traveler's KB) over the three ground atoms A, B, and C. This KB is similar to the statement X→Y in Section “Modified implication operator in first-order logic”, only slightly more complex; Table 7 shows a more complex KB. In this section, we calculate the truth value of each statement for every state of A, B, and C using fi(x) + gi(x), and then calculate the probability of each state using Equation (4) and the weights wi.

Here, f1(x) + g1(x) represents the scenario in which a traveler chooses the car (C = 1) when he/she is satisfied with the utility of the car and dissatisfied with the utility of transit, as expressed by the formula A ∩ ¬B → C. Similarly, f2(x) + g2(x) signifies that the traveler chooses the bus (C = 0) when he/she is satisfied with the utility of the bus and dissatisfied with the utility of the car, as expressed by the formula ¬A ∩ B → ¬C. wi is the weight of the ith formula; its value is positive and represents the decision maker's degree of trust in that logical formula. If the state of A, B, and C is known, the truth value of each logical formula can be calculated, and the probability of that state then follows from Equation (4), as listed in Table 3.

U10 and U20 denote the utility thresholds of the two travel modes, car and transit bus, respectively. The two ground atoms A and B are defined as follows: if Ucar = V1 + ɛ1 > U10, then A = 1, and if Ucar ≤ U10, then A = 0; similarly, if Ubus = V2 + ɛ2 > U20, then B = 1, and if Ubus ≤ U20, then B = 0. Hence

$$P(A = 1) = \Pr\left(U_{car} > U_{10}\right), \qquad P(B = 1) = \Pr\left(U_{bus} > U_{20}\right) \tag{5}$$

By the law of total probability, the probability of choosing the car as the travel mode is

$$P(C = 1) = \sum_{A \in \{1,0\}} \sum_{B \in \{1,0\}} P(C = 1 \mid A, B)\, P(A)\, P(B) \tag{6}$$
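Equations (5) and (6) can be sketched numerically. The sketch assumes that the error terms ɛ1 and ɛ2 are independent standard Gumbel variables (location 0, scale 1), so that P(V + ɛ > U0) = 1 − exp(−e^(−(U0 − V))); this distributional choice, the function names, and the Monte Carlo check are illustrative assumptions rather than part of the original formulation.

```python
import math
import random


def prob_satisfied(threshold, v):
    """Equation (5) under an ASSUMED standard Gumbel error (location 0, scale 1):
    P(V + eps > U0) = 1 - exp(-exp(-(U0 - V)))."""
    return 1.0 - math.exp(-math.exp(-(threshold - v)))


def prob_car(p_c_given, u10, v1, u20, v2):
    """Equation (6): sum over A and B, treated as independent.
    p_c_given maps (A, B) to P(C = 1 | A, B), e.g. the entries of Table 4."""
    pa = prob_satisfied(u10, v1)
    pb = prob_satisfied(u20, v2)
    total = 0.0
    for a in (1, 0):
        for b in (1, 0):
            weight_a = pa if a == 1 else 1.0 - pa
            weight_b = pb if b == 1 else 1.0 - pb
            total += p_c_given[(a, b)] * weight_a * weight_b
    return total


def simulate_prob_satisfied(threshold, v, n=200_000, seed=1):
    """Monte Carlo check of Equation (5) using inverse-CDF Gumbel draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep both logarithms finite
        eps = -math.log(-math.log(u))         # standard Gumbel draw
        hits += v + eps > threshold
    return hits / n
```

Passing conditionals of 0.5 for every (A, B) returns 0.5 regardless of the thresholds, which corresponds to the zero-weight case discussed in the result analysis.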


Result analysis

Table 4 gives the probability of choosing the car for each combination of satisfaction with the utilities of the car and the bus. As Table 4 shows, the probability of choosing the car exceeds that of choosing the bus when the traveler is satisfied with the utility of the car and dissatisfied with the utility of the bus. Conversely, when the traveler is satisfied with the bus and dissatisfied with the car, the probability of choosing the bus is higher; when the traveler is satisfied with both modes or with neither, the two modes are equally likely (probability 0.5). The probability values are influenced by the weights of the formulas; the complete calculation results are shown in Table 5. If w1 = w2 = 0, the probabilities of choosing the car and the bus are equal in all four states. This indicates that when both weights are 0, travelers place no trust in the two logical formulas, the decision-making process follows no rule, and the choice is completely random. If w1 = w2 → ∞, travelers rely entirely on the two logical formulas: they choose the car if they are satisfied with the utility of the car and dissatisfied with that of the bus, and they choose the bus if they are satisfied with the utility of the bus and dissatisfied with that of the car.

If travelers know only the utility distributions of the two modes (the utilities of car and bus are mutually independent and follow Gumbel distributions with the same parameters) and w1 = w2 = w > 0, then the probability of choosing the car can be written as

$$P(C = 1) = \frac{1}{2} - \frac{1}{2}e^{-e^{-(U_{10}-V_1)}} - \frac{1}{2}e^{-e^{-(U_{20}-V_2)}} + \frac{e^{w}}{e^{3w}+e^{w}}\, e^{-e^{-(U_{10}-V_1)}} + \frac{e^{3w}}{e^{3w}+e^{w}}\, e^{-e^{-(U_{20}-V_2)}} \tag{7}$$

Equation (7) shows that the selection probability depends on ΔU1 = U10 − V1 and ΔU2 = U20 − V2, where ΔU1 is the utility threshold (i.e. the satisfaction threshold) of the car, U10, minus the systematic utility V1 of its utility distribution, and ΔU2 is defined analogously for the bus. If ΔU1 = ΔU2, then P(C = 1) = 1/2: when ΔU1 of the car equals ΔU2 of the bus, the two modes are equally likely to be chosen.
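Equation (7) can be checked against Equation (6) directly. The following sketch assumes w1 = w2 = w and independent standard Gumbel errors (location 0, scale 1); it evaluates the closed form, compares it with the total-probability sum built from the Table 4 conditionals, and confirms that P(C = 1) = 1/2 whenever ΔU1 = ΔU2. The function names and test values are arbitrary illustrative choices.

```python
import math


def eq7(du1, du2, w):
    """Closed form of Equation (7) with w1 = w2 = w, du1 = U10 - V1, du2 = U20 - V2."""
    p = math.exp(-math.exp(-du1))  # P(A = 0): car utility not above its threshold
    q = math.exp(-math.exp(-du2))  # P(B = 0): bus utility not above its threshold
    c = math.exp(3 * w) / (math.exp(3 * w) + math.exp(w))  # P(C=1 | A=1, B=0)
    d = math.exp(w) / (math.exp(3 * w) + math.exp(w))      # P(C=1 | A=0, B=1)
    return 0.5 - 0.5 * p - 0.5 * q + d * p + c * q


def eq6(du1, du2, w):
    """Equation (6) with the Table 4 conditionals, for comparison with eq7."""
    pa = 1.0 - math.exp(-math.exp(-du1))
    pb = 1.0 - math.exp(-math.exp(-du2))
    cond = {
        (1, 0): math.exp(3 * w) / (math.exp(3 * w) + math.exp(w)),
        (0, 1): math.exp(w) / (math.exp(w) + math.exp(3 * w)),
        (1, 1): 0.5,
        (0, 0): 0.5,
    }
    return sum(
        cond[(a, b)] * (pa if a else 1.0 - pa) * (pb if b else 1.0 - pb)
        for a in (1, 0)
        for b in (1, 0)
    )
```

Letting ΔU1 → −∞ and ΔU2 → +∞ drives the closed form to e^(3w)/(e^(3w) + e^w), and the opposite limits drive it to e^w/(e^(3w) + e^w); these are the bounds of the selection probability for a finite common weight w.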
The partial derivatives of P(C = 1) with respect to the utility thresholds of the two modes are

$$\frac{\partial P}{\partial U_{10}} = \left(\frac{e^{w}}{e^{3w}+e^{w}} - \frac{1}{2}\right) e^{-(U_{10}-V_1)}\, e^{-e^{-(U_{10}-V_1)}} < 0$$

$$\frac{\partial P}{\partial U_{20}} = \left(\frac{e^{3w}}{e^{3w}+e^{w}} - \frac{1}{2}\right) e^{-(U_{20}-V_2)}\, e^{-e^{-(U_{20}-V_2)}} > 0$$

The signs follow because $e^{w}/(e^{3w}+e^{w}) = 1/(e^{2w}+1) < 1/2$ for w > 0. As the utility threshold U10 increases, the probability of choosing the car decreases; as the utility threshold U20 increases, the probability of choosing the car increases. This implies that when the satisfaction standard for the car rises or that for the bus falls, travelers are more likely to choose the bus.

Table 4. Conditional probability of travel-mode choice.
Utility of car choice    Utility of bus choice    Probability of choosing the car
Satisfied                Dissatisfied             P(C = 1 | A = 1, B = 0) = exp(2w1+w2) / [exp(2w1+w2) + exp(w2)]
Satisfied                Satisfied                P(C = 1 | A = 1, B = 1) = 0.5
Dissatisfied             Dissatisfied             P(C = 1 | A = 0, B = 0) = 0.5
Dissatisfied             Satisfied                P(C = 1 | A = 0, B = 1) = exp(w1) / [exp(w1) + exp(w1+2w2)]

How can the model be used to predict the share of each mode in a city? First, a travel survey combining Revealed Preference (RP) and Stated Preference (SP) questions should be conducted, such as the questionnaire survey in Section “Case study” of this paper. Second, the influencing factors and logical formulas should be determined from the results of data analysis, as shown in Table 7. Third, maximum-likelihood estimation is used to calibrate the weight of each logical formula from the survey results, as shown in Table 9. Finally, in application, the future values of the influencing factors are fed into the calibrated model to predict the share of each mode in the city.

If w1 = w2 = +∞, then $P(C = 1) = \frac{1}{2}\left[1 + e^{-e^{-\Delta U_2}} - e^{-e^{-\Delta U_1}}\right]$, and the three-dimensional visualization of the selection probability against ΔU1 and ΔU2 is shown in Figure 1. Figure 1 shows that the selection probability is negatively correlated with ΔU1 and positively correlated with ΔU2. The selection probability changes most sharply when ΔU1 and ΔU2 are close to 0; when the absolute values of ΔU1 and ΔU2 exceed 5, the selection probability is essentially unaffected by them. In this case, travelers place absolute trust in the two logical formulas in Table 2, and the final selection results are random only because of travelers' utility perception errors. The difference between the perceived utility and the satisfaction threshold therefore determines the final travel mode-choice probability: the farther a travel mode's systematic utility lies above its satisfaction threshold, the more likely that mode is to be chosen. If w1 = w2 > 0 is finite, the value range of the selection probability changes. The relationship between the minimum and maximum selection probabilities and the weight w is shown in Figure 2. When the weights of logical formulas in Table 2 are equal and lower (w1 = w2