Finding Decision Rules with Genetic Algorithms


by Jim Oliver

Description: Making decisions based on a broad range of choices has never been so easy now that we have genetic algorithms.

Many decisions involve conflicting objectives that cannot be optimized simultaneously. For example, in choosing an apartment, none of the available options typically has the best commute, is in the nicest neighborhood, and rents for the lowest price; in making a decision of this sort, we examine tradeoffs. How these tradeoffs are made depends on the decision process and preferences of the decision maker. Previous research on preferences and decisions spans a broad spectrum of methods, ranging from approaches with formal foundations, such as conjoint methodology, to purely behavioral approaches using protocol analysis. In this article, the starting point is a set of choice data (a report of choices made by an individual) or a preference ranking (a list of alternatives ranked in order of preference). Drawing on the results of other research, we will place varying constraints on the form that preferences or decision making may take. We will show that the genetic algorithm (GA) successfully discovers decision-process models that reproduce the choices.

First, we will review two types of decision models. Compensatory rules explicitly model how good features of an alternative can compensate for bad features. The second approach, termed noncompensatory, recognizes that multiple attributes create complexity; individuals try to simplify the decision situation using a variety of heuristic techniques.

Compensatory rules assume that a decision maker, when evaluating alternatives, trades off one attribute against another so that good characteristics can compensate for bad ones. Multiattribute value techniques quantify the importance of competing objectives, which are then aggregated, yielding a single overall numerical valuation of each alternative.
That is, the value of each option available to the decision maker is a function of its associated set of attributes, such as cost, weight, repair frequency, or lives saved. The simplest functional form assumes each attribute has an (additively) independent value, which is weighted and summed to get the overall value of the alternative. This approach is attractive because, given a data set of an individual's preference ranking of alternatives, these models fit easily with standard regression methods.
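The additive form can be sketched in a few lines. The weights and 0-to-10 attribute scores below are invented for illustration; they do not come from the article:

```python
# Additive multiattribute value model: each attribute is scored
# independently, weighted, and summed into one overall valuation.
# Weights and 0-10 scores are hypothetical.
WEIGHTS = {"rent": 0.5, "commute": 0.3, "neighborhood": 0.2}

def overall_value(scores):
    """Weighted sum of per-attribute scores (higher is better)."""
    return sum(WEIGHTS[attr] * s for attr, s in scores.items())

cheap_far  = {"rent": 8, "commute": 3, "neighborhood": 9}
close_dear = {"rent": 5, "commute": 9, "neighborhood": 6}

# The weighted sum makes the tradeoff explicit: a high rent score
# can compensate for a poor commute.
ranked = sorted([cheap_far, close_dear], key=overall_value, reverse=True)
```

Given a respondent's ranking of such alternatives, the weights are exactly the coefficients a standard regression would estimate, which is why these models fit so easily.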


Noncompensatory rules choose an alternative based on a subset of the attributes, regardless of the values of the other attributes. For example, a house buyer might consider a one-way commute of more than two hours unacceptable and reject such houses, regardless of price and other attributes. To investigate this type of cognitive processing, several research methods have been used, including verbal protocols, eye movements, and mouselab experiments. E. Johnson and J. Payne, and E. Johnson, R. Meyer, and S. Ghose describe several noncompensatory models:

Elimination by aspects (EBA). A Tversky model in which attributes are screened in descending order of importance. Choices that do not meet a threshold value are eliminated. For example, a house buyer may first eliminate all houses outside a certain geographic area, then eliminate those outside a certain price range, and so on.

Lexicographic. Pick the alternative that is best on the most important dimension, regardless of the other attribute values. An example is the student traveler who selects an airline ticket solely on price, despite a 2:00 a.m. departure and three plane changes.

Conjunctive. Alternatives are eliminated that do not pass a combination of thresholds. This model is also called satisficing.

Phased EBA/compensatory. If applying EBA results in a choice set of two or more alternatives, then apply a compensatory rule to those alternatives remaining.
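Two of these heuristics are simple enough to sketch directly. The ticket data and attribute names here are hypothetical, chosen to echo the student-traveler example:

```python
# Lexicographic: pick the alternative that is best on the single most
# important attribute, ignoring everything else.
def lexicographic(alternatives, key, best=min):
    return best(alternatives, key=key)

# Conjunctive (satisficing): keep only alternatives that pass every threshold.
def conjunctive(alternatives, tests):
    return [a for a in alternatives if all(t(a) for t in tests)]

tickets = [
    {"price": 89,  "departure": "2:00 a.m.",  "changes": 3},
    {"price": 240, "departure": "10:00 a.m.", "changes": 0},
]

# The student traveler: the cheapest ticket wins, 2:00 a.m. departure and all.
cheapest = lexicographic(tickets, key=lambda t: t["price"])

# A satisficer instead demands at most one change and a price cap.
ok = conjunctive(tickets, [lambda t: t["changes"] <= 1,
                           lambda t: t["price"] <= 300])
```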

Decision operators

The research methods reported (verbal protocols, eye movements, and mouselab) involve extensive analysis by researchers. We will depart from these approaches and show how a GA can be used to fit a process decision rule to choice data. Roughly, the idea is to begin with a superset of "elementary decision operators," which can be combined to form empirically valid choice rules. The GA searches for a combination of the operators that best fits the choice data. The operators are derived from common decision heuristics. For example, a decision maker can look across alternatives or look across attributes; values can be compared to each other, maximum and minimum values may be noted, alternatives may be eliminated, and so on until a decision is reached.

GA coding and results

We will present the methodology and results for using a GA to discover sequential decision rules for various data sets. We will begin with a simple example to illustrate a type of data set common in marketing and show how the GA is encoded. As expected, the GA quickly learns the decision rule, which is a simple Boolean concept. Little marketing research has been directed toward fitting noncompensatory choice rules to a data set. One group that has is I. Currim, R. Meyer, and N. Le, who use the machine-learning system CLS to discover decision trees that fit a set of choice data. They illustrate the workings of CLS with the data set of hypothetical apartments shown in Table 1.

Table 1: The hypothetical-apartment data set used to illustrate CLS.

Apartment   Rent ($)   Distance (miles)   Acceptable?
1           450        1.5                yes
2           500        1.5                yes
3           600        1.5                no
4           450        2                  yes
5           500        2                  yes
6           600        2                  no
7           450        3                  no
8           500        3                  no
9           600        3                  no

We will assume, as did Currim, Meyer, and Le, that acceptability means a certain combination of thresholds is met. Thus, we will consider a conjunctive rule composed of the elementary operators greater than or equal to a threshold and less than or equal to a threshold. In fact, inspection of Table 1 shows that the decision rule is rather simple: If (rent ≤ 500) and (distance ≤ 2 miles), then accept. In related research, D. Greene and S. Smith used simulation to compare the performance of a GA, CLS, and a logit model. The items in the choice set had binary-valued attributes. The GA performed well, discovering disjunctive-normal-form rules even in noisy environments. These decision rules are encoded easily into a binary structure. The simplicity of this illustrative problem requires only a six-bit chromosome:

Bits      Value   Meaning
1 and 4           Operator (bit 1 applies to rent, bit 4 to distance)
          0       ≥ threshold
          1       ≤ threshold
2 and 3           Rent value
          00      No value
          01      $450/month
          10      $500/month
          11      $600/month
5 and 6           Miles value
          00      No value
          01      1.5 miles
          10      2 miles
          11      3 miles
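This six-bit encoding is small enough to sketch end to end. The fitness function below counts every correct prediction, acceptable or not, and treats a 00 "no value" pattern as a condition that is simply skipped; both are our reading, since the article does not spell out either detail:

```python
# Decode the six-bit chromosome: bit 1 = rent operator, bits 2-3 = rent
# threshold, bit 4 = distance operator, bits 5-6 = distance threshold.
OP    = {"0": ">=", "1": "<="}
RENT  = {"00": None, "01": 450, "10": 500, "11": 600}
MILES = {"00": None, "01": 1.5, "10": 2.0, "11": 3.0}

def decode(ch):
    return OP[ch[0]], RENT[ch[1:3]], OP[ch[3]], MILES[ch[4:6]]

def passes(value, op, threshold):
    if threshold is None:                 # "no value": condition is skipped
        return True
    return value >= threshold if op == ">=" else value <= threshold

# Table 1: (rent, distance, acceptable?)
APARTMENTS = [(450, 1.5, True),  (500, 1.5, True),  (600, 1.5, False),
              (450, 2.0, True),  (500, 2.0, True),  (600, 2.0, False),
              (450, 3.0, False), (500, 3.0, False), (600, 3.0, False)]

def fitness(ch):
    """Number of apartments whose acceptability the rule predicts correctly."""
    r_op, r_val, m_op, m_val = decode(ch)
    score = 0
    for rent, miles, acceptable in APARTMENTS:
        predicted = passes(rent, r_op, r_val) and passes(miles, m_op, m_val)
        score += (predicted == acceptable)
    return score
```

The chromosome 110101 decodes to rent ≤ 500 and distance ≤ 1.5, matching the article's example, while 110110 (rent ≤ 500, distance ≤ 2) reproduces all nine rows of Table 1.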
Each rule represented by a chromosome is assumed to be a rule for acceptability. As an example, the chromosome 110101 decodes to If (rent ≤ 500) and (distance ≤ 1.5), then accept. The objective function rewarded a rule with the number of correctly predicted acceptable apartments. For this simple problem, the GA easily discovered the correct conjunctive rule.

The simple apartment problem is for illustration. Now we will fit a more sophisticated noncompensatory model, EBA, to a more complex data set, which was designed to illustrate the use of compensatory conjoint models. The data set comes from P. Green and Y. Wind. It is the preference ranking for hypothetical carpet cleaners that have five attributes. Two of these attributes, the seal of approval and money-back guarantee, are binary; a sample product either has the attribute or does not. The remaining three attributes are ternary; the product has one of three names, one of three package designs, and one of three prices. Table 2 shows the ranking for 18 potential products.

Table 2: The ranking for 18 potential products.

No.  Package design  Brand name  Price   Seal of approval  Money-back guarantee  Evaluation
1    A               K2R         $1.19   No                No                    13
2    A               Glory       $1.39   No                Yes                   11
3    A               Bissell     $1.59   Yes               No                    17
4    B               K2R         $1.39   Yes               Yes                   2
5    B               Glory       $1.59   No                No                    14
6    B               Bissell     $1.19   No                No                    3
7    C               K2R         $1.59   No                Yes                   12
8    C               Glory       $1.19   Yes               No                    7
9    C               Bissell     $1.39   No                No                    9
10   A               K2R         $1.59   Yes               No                    18
11   A               Glory       $1.19   No                Yes                   8
12   A               Bissell     $1.39   No                No                    15
13   B               K2R         $1.19   No                No                    4
14   B               Glory       $1.39   Yes               No                    6
15   B               Bissell     $1.59   No                Yes                   5
16   C               K2R         $1.39   No                No                    10
17   C               Glory       $1.59   No                No                    16
18   C               Bissell     $1.19   Yes               Yes                   1

Note: Evaluation = respondent's preference ranking.
We will assume that the decision maker is following a slightly generalized EBA strategy. When decision makers are faced with two alternatives, they first look at the most important attribute. Either the alternative that is best on this attribute is selected, or the alternative that does not meet a threshold is eliminated. If a tie occurs, or if both alternatives meet the threshold, then the next most important attribute is considered. To code this problem, a maximum of five steps is allowed and the form shown in Listing 1 is used.

Listing 1: The rule encoding; a maximum of five steps is allowed.

    rule step 1           ...   rule step 5
    ----------------      ...   ----------------
    op    attr   value    ...   op    attr   value
    1-3   4-6    7-9      ...   37-39 40-42  43-45   (bit positions)

    op: operator (positions 1-3; 10-12; ... 37-39)
        000 = less than or equal to threshold
        001 = greater than or equal to threshold
        010 = maximum value
        011 = minimum value
        100 = middle value
        (101, 110, and 111 are not used)

    attr: attribute (positions 4-6; 13-15; ... 40-42)
        001 = package design
        010 = brand name
        011 = price
        100 = approval seal
        101 = guarantee
        (000, 110, and 111 are not used)

    value: attribute value code (positions 7-9; 16-18; ... 43-45)
        e.g., 001 = first attribute value, which for package design is A,
        for price is $1.19, and so on.
As an example, suppose that the chromosome's first nine bits, which represent the first step of the rule, are 000011010. The first three bits, 000, code the less-than-or-equal-to-threshold operator; 011 is the third attribute, price; 010 is the second price value, which is $1.39. Hence, an alternative is considered if it costs $1.39 or less. This decoding of the first nine bits represents the first step of the EBA rule. Should both alternatives in a choice set pass this step, then the next step of the EBA rule (the next nine bits) is invoked.

Each member of the population must be evaluated for fitness. Given a ranking as in Table 2, an objective function that can serve as a fitness measure can be constructed in several ways. The full ranking of the 18 alternatives is equivalent to 17 binary comparisons. That is, rank 1 is preferred to rank 2 (denoted 1>2), 2>3, and so on; transitivity determines the outcome of any other comparison. Thus, testing any rule against these 17 comparisons would evaluate it fully. However, these 17 comparisons are the most sensitive and experimentally the least repeatable. For this reason, the objective function actually used is derived from the maximum number of comparisons that can be made from ranking the 18 alternatives. This is 153, which is 17 + 16 + 15 + ... + 2 + 1. That is, the best alternative (number 1) is compared to the 17 that are worse than it, number 2 is compared to the 16 worse alternatives, and so on. If a rule eliminates both alternatives, then no choice is made and the action is counted as a failure.

We note that more sophisticated objective functions, which sampled the data set rather than performing the exhaustive tests described, improved the GA running time. However, the quality of the solutions did not improve, and the computational time was not excessive in this case, so the simpler objective function typically was used.
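The 153-comparison evaluation can be sketched as follows. The data are our reconstruction of Table 2, and the step semantics (a "best on attribute" step picks the alternative that wins on that attribute, a tie passing control to the next step) are our reading of the generalized EBA above; threshold-elimination steps are omitted for brevity:

```python
from itertools import combinations

# Table 2: (rank, package, brand, price, seal, guarantee); rank 1 = best.
DATA = [
    (13, "A", "K2R", 1.19, 0, 0), (11, "A", "Glory", 1.39, 0, 1),
    (17, "A", "Bissell", 1.59, 1, 0), (2, "B", "K2R", 1.39, 1, 1),
    (14, "B", "Glory", 1.59, 0, 0), (3, "B", "Bissell", 1.19, 0, 0),
    (12, "C", "K2R", 1.59, 0, 1), (7, "C", "Glory", 1.19, 1, 0),
    (9, "C", "Bissell", 1.39, 0, 0), (18, "A", "K2R", 1.59, 1, 0),
    (8, "A", "Glory", 1.19, 0, 1), (15, "A", "Bissell", 1.39, 0, 0),
    (4, "B", "K2R", 1.19, 0, 0), (6, "B", "Glory", 1.39, 1, 0),
    (5, "B", "Bissell", 1.59, 0, 1), (10, "C", "K2R", 1.39, 0, 0),
    (16, "C", "Glory", 1.59, 0, 0), (1, "C", "Bissell", 1.19, 1, 1),
]
RANK, PKG, BRAND, PRICE, SEAL, GUAR = range(6)

def choose(a, b, steps):
    """Apply EBA steps in order; return the preferred alternative, or None."""
    for prefer in steps:
        pa, pb = prefer(a), prefer(b)
        if pa != pb:                   # this step discriminates
            return a if pa > pb else b
    return None                        # no step decided: counted as a failure

# The 76%-fitness rule reported in the text: minimum price, then
# money-back guarantee, then package B.
RULE = [lambda x: -x[PRICE],           # lower price preferred
        lambda x: x[GUAR],             # has money-back guarantee
        lambda x: x[PKG] == "B"]       # package design B

def fitness(steps):
    """Correct predictions over all 153 pairwise preference comparisons."""
    correct = 0
    for a, b in combinations(DATA, 2):
        better = a if a[RANK] < b[RANK] else b
        correct += (choose(a, b, steps) is better)
    return correct
```

For instance, between the products ranked 1 and 2 the price step alone decides correctly ($1.19 beats $1.39), while between ranks 2 and 3 the cheaper product is chosen and the prediction is wrong.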
No single noncompensatory rule had the predictive power of the compensatory preference models in Green and Wind. However, the data set is a specialized one, originally used to illustrate compensatory models, not the noncompensatory models we seek here. Nevertheless, some rules predicted about three-fourths of the choices. For example, a rule that is correct in 76% of the comparisons is: minimum price, money-back guarantee, package B. The rule is interpreted as follows: Given two alternative carpet cleaners, first look at price and select the one with the lowest price. If both are equal, then choose the one with the money-back guarantee, and so on. A three-part rule is enough to decide between any choice pair in the training set. This rule, like others that follow, can have fewer than the maximum of five steps (Listing 1) for two reasons. First, "no-ops" are ignored by the evaluation function and consequently are not shown. Second, the left-to-right decoding means that rule parts positioned to the right are exercised less frequently. Those parts not exercised at all are extraneous and are not reported. However, these parts may become important in future generations of the population.

Another rule with the same fitness (76%) was package B, money-back guarantee, minimum price. Note how both of these rules have the same elements, but the order is different. This difference does not affect the overall prediction rate, but some individual predictions differ. Finally, a rule with a slightly lower fitness, 75%, in which brand does become important, is package B, minimum price, K2R. From these rules a picture emerges showing the importance of price, package B, brand K2R, and the money-back guarantee.

It is interesting that the overall prediction rate is not particularly sensitive to the EBA sequence (although which items are correctly predicted varies). Recall that EBA is a sequential rule that can terminate at any step, frequently the first one. Further research would be needed to explore the reasons for and the generality of this invariance. A variety of factors increases the sensitivity of linear (additive) models to changes in attribute weights; it would be interesting to see whether similar results hold for sequencing in process rules of choice. Because differing decision rules discovered by the GA can have differing behavioral implications, the multiple rules discovered by the GA can guide additional market research, which would be required to distinguish between the competing rules.

Acceptable carpet cleaners

Many choices are binary, such as a purchase decision. Here, rather than reproduce a ranking, a partition is discovered by the GA. We again use the carpet cleaner data set. The top six carpet cleaners arbitrarily are chosen as acceptable. The goal is to find rules that discriminate the acceptable from the unacceptable. The chromosomal structure is an extension of the first example, the problem of determining acceptable apartments. The two threshold operators, greater than or equal to a threshold and less than or equal to a threshold, were used in addition to equal to and not equal to.

The objective function must reward correct matches and punish incorrect matches. This task can be accomplished in many ways, and we considered two. The first required that any rule giving a false positive, that is, predicting acceptability for an unacceptable alternative, be given a fitness of zero. The other approach rewarded a rule one point for each correct match and punished false matches by subtracting one point from the fitness. Other objective functions were not tried because both of these simple approaches provided good solutions. This result is of practical importance because how to reward and punish in a given supervised learning situation is a difficult and, in general, unsolved problem.

Recall that the top six alternatives (from Table 2) were deemed "acceptable"; here are some of the individual rules that were discovered by several runs of the GA. A very simple rule, accept if package design is B, correctly matches five of the six acceptable alternatives. It also falsely matches the alternative that is ranked 14th. A refinement of the rule that does not have any false positives is: accept if (package = B) and (price not equal to $1.59). This correctly matches four of the six acceptable alternatives.

The flexibility of GAs is clear. Different encodings can help discover a variety of process models of choice. The generation of multiple solutions of similar quality is a valuable characteristic of the GA implementation.
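The two objective-function variants can be compared on the simple package-B rule. The data are our reconstruction of Table 2, and scoring counts only predicted-acceptable alternatives, which is one reading of "match":

```python
# Table 2: (rank, package, brand, price, seal, guarantee); ranks 1-6 are
# the "acceptable" cleaners.
DATA = [
    (13, "A", "K2R", 1.19, 0, 0), (11, "A", "Glory", 1.39, 0, 1),
    (17, "A", "Bissell", 1.59, 1, 0), (2, "B", "K2R", 1.39, 1, 1),
    (14, "B", "Glory", 1.59, 0, 0), (3, "B", "Bissell", 1.19, 0, 0),
    (12, "C", "K2R", 1.59, 0, 1), (7, "C", "Glory", 1.19, 1, 0),
    (9, "C", "Bissell", 1.39, 0, 0), (18, "A", "K2R", 1.59, 1, 0),
    (8, "A", "Glory", 1.19, 0, 1), (15, "A", "Bissell", 1.39, 0, 0),
    (4, "B", "K2R", 1.19, 0, 0), (6, "B", "Glory", 1.39, 1, 0),
    (5, "B", "Bissell", 1.59, 0, 1), (10, "C", "K2R", 1.39, 0, 0),
    (16, "C", "Glory", 1.59, 0, 0), (1, "C", "Bissell", 1.19, 1, 1),
]
ACCEPTABLE = {d for d in DATA if d[0] <= 6}

def matched(rule):
    """Alternatives the rule predicts to be acceptable."""
    return {d for d in DATA if rule(d)}

def fitness_strict(rule):
    """Variant 1: any false positive drives the fitness to zero."""
    m = matched(rule)
    return 0 if m - ACCEPTABLE else len(m & ACCEPTABLE)

def fitness_penalty(rule):
    """Variant 2: +1 for each correct match, -1 for each false match."""
    m = matched(rule)
    return len(m & ACCEPTABLE) - len(m - ACCEPTABLE)

package_b = lambda d: d[1] == "B"   # the simple rule from the text
```

The package-B rule matches five of the six acceptable cleaners plus the one ranked 14th, so the strict variant scores it 0 while the penalty variant scores it 5 - 1 = 4.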

A classifier system

A simple classifier system also was used to learn and predict choices. An attraction of the classifier system is that it can implement arbitrary Boolean expressions and places few restrictions on the form of the rules that are discovered. Once again, we use the carpet cleaner data set (Table 2). We assume the top six carpet cleaners are acceptable. The goal is to find a set of rules that discriminates the acceptable from the unacceptable. A classifier system interfaces to the outside world via a list of messages. The input messages communicate information about a particular carpet cleaner, which is described by its attributes. A rule (chromosome) that matches a carpet cleaner from the data set then must predict accurately whether this cleaner is acceptable or unacceptable. The chromosome structure, the first eight bits of which match the input messages, is shown in Table 3.

Table 3: The chromosome structure, the first eight bits of which match the input messages.

Bits   Attribute             Encoding
1, 2   Package               00 = no-op, 01 = A, 10 = B, 11 = C
3, 4   Name                  00 = no-op, 01 = K2R, 10 = Glory, 11 = Bissell
5, 6   Price                 00 = no-op, 01 = $1.19, 10 = $1.39, 11 = $1.59
7      Approval seal         0 = no, 1 = yes
8      Money-back guarantee  0 = no, 1 = yes
9      Decision              0 = unacceptable, 1 = acceptable

As an example, consider the following rule: 101###111. The first two bits, 10, match carpet cleaners with package B. The # symbol is the logical "don't care"; thus, the second two bits, 1#, will match either Glory or Bissell. The rule does not discriminate price (##), but will only match cleaners with a seal of approval and a money-back guarantee. Finally, the last bit, a one, classifies the cleaners that match the rule as acceptable.
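The ternary match against an eight-bit message can be sketched directly. The encoding follows Table 3; the sample cleaner below is a hypothetical one constructed to satisfy the rule, not a row of Table 2:

```python
def encode(package, name, price, seal, guarantee):
    """Eight-bit input message per Table 3 (no-op codes are for rules only)."""
    pkg   = {"A": "01", "B": "10", "C": "11"}[package]
    brand = {"K2R": "01", "Glory": "10", "Bissell": "11"}[name]
    cost  = {1.19: "01", 1.39: "10", 1.59: "11"}[price]
    return pkg + brand + cost + str(int(seal)) + str(int(guarantee))

def matches(rule, message):
    """A rule bit matches if it equals the message bit or is the
    'don't care' symbol #."""
    return all(r == "#" or r == m for r, m in zip(rule, message))

rule = "101###111"             # condition = first 8 bits, decision = last bit
condition, decision = rule[:8], rule[8]

# A hypothetical cleaner: package B, Glory, $1.19, seal and guarantee.
msg = encode("B", "Glory", 1.19, True, True)
```

A K2R cleaner with the same package and price would fail the match, because bit 3 of the condition demands a 1 (Glory or Bissell).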


A simple reward system was used: If a rule matches an input, bids, wins, and predicts the correct choice, it gets a point. Thus, only high-performance rules are reinforced, while low-performance rules are replaced by the products of the GA. Two other processes, typical of classifier systems, also affect rule strength. First, each rule is subject to a tax; second, rules that match and bid, but do not win the bidding, still have their bids subtracted from their fitness. In several runs, the classifier system discovered viable sets of noncompensatory rules that discriminated the acceptable carpet cleaners. While the final rule sets are the most interesting, their evolution is also of interest. The system begins with a random set of rules. After four generations (2,000 training examples), the system had learned a set of six rules that accurately partitioned the cleaners into acceptable and not acceptable. The evolution of the same rule set after a total of 20 generations (10,000 training examples) is shown below:

A carpet cleaner is acceptable if
    Rule 1: (package = B or C) and (money-back guar.) or
    Rule 2: (package = B or C) and (price = $1.19 or $1.59)

A carpet cleaner is unacceptable if
    Rule 3: (brand = Glory) and (price = $1.19 or $1.59) or
    Rule 4: (package = A or C) and (price = $1.39 or $1.59) or
    Rule 5: (package = A or C) and (price = $1.19)

Although individual rule changes were minor, the rule set shrank from six to five active rules. Note that the second and fourth rules, which have the same specificity, both match a cleaner with package C and a price of $1.59, but conflict in their prediction of acceptability. The fourth rule has the higher fitness, so it wins and accurately predicts the cleaner to be unacceptable. An attractive feature of classifier systems is that generality in the rule set can be favored or discouraged. When the parameters were set to favor very general rules, and some false predictions were allowed, the set of rules shown below evolved. Note that only three rules exist:

A carpet cleaner is acceptable if
    Rule 1: (package = B or C) and (money back) or
    Rule 2: (package = B) and (price = $1.19) and (no money back)

A carpet cleaner is unacceptable if
    Rule 3: (package = A or C)

While this rule set is compact, it makes two mistakes. One acceptable cleaner is miscategorized as unacceptable and vice versa. This rule set illustrates a default hierarchy: a very general rule that categorizes many cleaners as unacceptable, and specific rules that split out acceptable cleaners.
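Checking this compact rule set against our reconstruction of Table 2 reproduces the two mistakes, under the reading that an alternative is predicted acceptable exactly when rule 1 or rule 2 fires, with everything else falling to the rule 3 default:

```python
# Table 2: (rank, package, brand, price, seal, guarantee); ranks 1-6
# are the "acceptable" cleaners.
DATA = [
    (13, "A", "K2R", 1.19, 0, 0), (11, "A", "Glory", 1.39, 0, 1),
    (17, "A", "Bissell", 1.59, 1, 0), (2, "B", "K2R", 1.39, 1, 1),
    (14, "B", "Glory", 1.59, 0, 0), (3, "B", "Bissell", 1.19, 0, 0),
    (12, "C", "K2R", 1.59, 0, 1), (7, "C", "Glory", 1.19, 1, 0),
    (9, "C", "Bissell", 1.39, 0, 0), (18, "A", "K2R", 1.59, 1, 0),
    (8, "A", "Glory", 1.19, 0, 1), (15, "A", "Bissell", 1.39, 0, 0),
    (4, "B", "K2R", 1.19, 0, 0), (6, "B", "Glory", 1.39, 1, 0),
    (5, "B", "Bissell", 1.59, 0, 1), (10, "C", "K2R", 1.39, 0, 0),
    (16, "C", "Glory", 1.59, 0, 0), (1, "C", "Bissell", 1.19, 1, 1),
]
RANK, PKG, BRAND, PRICE, SEAL, GUAR = range(6)

def predicted_acceptable(d):
    rule1 = d[PKG] in ("B", "C") and d[GUAR] == 1
    rule2 = d[PKG] == "B" and d[PRICE] == 1.19 and d[GUAR] == 0
    # Rule 3 (package = A or C -> unacceptable) acts as the default:
    # anything not claimed by rules 1 or 2 is predicted unacceptable.
    return rule1 or rule2

mistakes = [d for d in DATA if predicted_acceptable(d) != (d[RANK] <= 6)]
```

The two misclassified cleaners are the one ranked 12th (package C with a money-back guarantee, wrongly accepted by rule 1) and the one ranked 6th (package B, $1.39, no guarantee, wrongly rejected because neither acceptance rule fires).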


The discrimination task for the classifier system was very simple. Other researchers have investigated performance on more difficult Boolean learning tasks. However, we chose a deliberately simple task so it would be easier to compare the results qualitatively with published results for human subjects. (The cognitive relevance of classifier systems is a continuing topic of discussion and investigation.) The classifier system was successful in developing an accurate set of categorization rules. An examination of the rule sets illustrates several positive and negative aspects of using classifier systems in this domain. On the plus side, besides discovering an accurate set of rules, the system evolved a more compact rule set with repeated exposure to the training data. Furthermore, the evolution of the classifier rule set was gradual; new and similar choice information modified some but not all of the rules. This contrasts with compensatory conjoint methods, in which new information changes all the weights. On the minus side, the rules that were discovered are not very psychologically appealing; rather, they seem awkward and different from what might be extracted from a protocol analysis.

A promising alternative

We have shown how GAs can help discover decision rules from a set of choice data. Based on the results of this study, we argue that GAs provide a promising alternative for modeling consumer choice. The GA discovered important classes of rules that fit the data, although care should be taken not to assume these rules match the exact cognitive processes of the decision maker. A major benefit of the GA approach is that it is very flexible. The assumption of specific elementary decision operators is less restrictive than the more common practice of assuming a particular, high-level noncompensatory model and less demanding than protocol analysis. Another benefit is the GA's ability to generate multiple solutions quickly. This feature would be especially valuable to a more interactive system trying to understand an individual's decision making, because the system could tailor the options presented in a manner that efficiently distinguishes between the competing models. Finally, the rule-based approach of the classifier system raised the question of what is desirable in a rule set that mimics choice behavior. We offer three characteristics: the rule set accurately fits the choice data, the rules are intuitively reasonable, and the individual rule structures are consistent with experimental research results. While the classifier system did not achieve these goals uniformly, it seems a promising start.

Thanks to Steve Kimbrough, Scott Moore, and Jerry Lohse for the discussions on this topic.

Suggested reading


Bonelli, P., and A. Parodi. "An Efficient Classifier System and Its Experimental Comparison with Two Representative Learning Methods on Three Medical Domains," in Genetic Algorithms: Proceedings of the Fourth International Conference (GA91), R. K. Belew and L. B. Booker, eds. San Mateo, Calif.: Morgan Kaufmann, 1991, pp. 288-295.

Currim, I., R. Meyer, and N. Le. "Disaggregate Tree-Structured Modeling of Consumer Choice Data," Journal of Marketing Research 25: 253-265, August 1988.

Goldberg, D. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, Mass.: Addison-Wesley, 1989.

Green, P., and Y. Wind. "New Way to Measure Consumers' Judgments," Harvard Business Review, July-Aug. 1975, p. 107.

Greene, D. P., and S. F. Smith. "A Genetic System for Learning Models of Consumer Choice," in Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms, J. J. Grefenstette, ed. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1987, pp. 217-223.

Holland, J., K. Holyoak, R. Nisbett, and P. Thagard. Induction. Cambridge, Mass.: MIT Press, 1986.

Johnson, E., and J. Payne. "Effort and Accuracy in Choice," Management Science 31: 395-414, April 1985.

Johnson, E., D. Schkade, and J. Bettman. "Monitoring Information Processing and Decisions: The Mouselab System." Pittsburgh, Penn.: Graduate School of Industrial Administration, Carnegie Mellon University, 1988.

Johnson, E., R. Meyer, and S. Ghose. "When Choice Models Fail: Compensatory Models in Negatively Correlated Environments," Journal of Marketing Research 26: 255-270, 1989.

Keeney, R., and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. New York, N.Y.: Wiley, 1976.

Liepins, G. E., and L. A. Wang. "Classifier System Learning of Boolean Concepts," in Genetic Algorithms: Proceedings of the Fourth International Conference (GA91), R. K. Belew and L. B. Booker, eds. San Mateo, Calif.: Morgan Kaufmann, 1991, pp. 318-323.

Payne, J., and E. Eason Ragsdale. "Verbal Protocols and Direct Observations of Supermarket Shopping Behavior: Some Findings and a Discussion of Methods," in Advances in Consumer Research, Vol. 5, H. Keith Hunt, ed. Urbana, Ill.: Association for Consumer Research, 1978, pp. 571-577.


Riolo, R. L. "Modeling Simple Human Category Learning with a Classifier System," in Genetic Algorithms: Proceedings of the Fourth International Conference (GA91), R. K. Belew and L. B. Booker, eds. San Mateo, Calif.: Morgan Kaufmann, 1991, pp. 324-333.

Russo, J., and B. Dosher. "Strategies for Multiattribute Binary Choice," Journal of Experimental Psychology: Learning, Memory and Cognition 9: 676-696, October 1983.

Tversky, A. "Elimination by Aspects: A Theory of Choice," Psychological Review 79: 281-299, July 1972.

About the Author

Jim Oliver is a graduate student in the Department of Operations and Information Management, The Wharton School, University of Pennsylvania (Philadelphia). His areas of research include genetic algorithms, computer-assisted negotiations, and interorganizational information systems.

AI Expert, March 1994, pages 33-39. Copyright © 1994 by Miller-Freeman, Inc. All rights reserved.
