Dynamic scheduling for multi-site companies: a decisional approach based on reinforcement multi-agent learning

N. Aissani, A. Bekrar, D. Trentesaux & B. Beldjilali

Journal of Intelligent Manufacturing ISSN 0956-5515 J Intell Manuf DOI 10.1007/s10845-011-0580-y






Received: 23 March 2010 / Accepted: 21 July 2011 © Springer Science+Business Media, LLC 2011

Abstract In recent years, most companies have resorted to multi-site or supply-chain organization in order to improve their competitiveness and adapt to existing real conditions. In this article, a model for adaptive scheduling in multi-site companies is proposed. To do this, a multi-agent approach is adopted in which intelligent agents have reactive learning capabilities based on reinforcement learning. This reactive learning technique allows the agents to make accurate short-term decisions and to adapt these decisions to environmental fluctuations. The proposed model is implemented on a 3-tier architecture that ensures the security of the data exchanged between the various company sites. The proposed approach is compared to a genetic algorithm and a mixed integer linear program algorithm to prove its feasibility and, especially, its reactivity. Experiments on a real case study demonstrate the applicability and the effectiveness of the model in terms of both optimality and reactivity.

Keywords Production control · Scheduling · Multi-agent system · Reinforcement learning · Multi-site company

N. Aissani (B) · B. Beldjilali
LIO, Department of Computer Sciences, University of Oran, Oran, Algeria
e-mail: [email protected]
B. Beldjilali e-mail: [email protected]
A. Bekrar · D. Trentesaux
University Lille Nord de France, 59000 Lille, France
e-mail: [email protected]
D. Trentesaux e-mail: [email protected]
A. Bekrar · D. Trentesaux
UVHC, TEMPO Lab., 59313 Valenciennes, France

Introduction

In recent years, most companies have resorted to multi-site organization in an effort to improve their competitiveness and to adapt to current conditions. These companies have to deal with the influence of four main factors. The first factor is the passage from a supply market (standardized production) to a demand market (differentiated production), which has forced companies to be more responsive, thus shortening the product life cycle. The second factor is the globalization of the economy. The third factor is technological innovation, which results in products so complex that only a small number of companies can control all the know-how in their field of production. The fourth and final factor is the financialization of the economy (Galliano and Soulie 2007). Multi-site organization allows companies to compete on the market and develop their competitive advantage, reviewing the structure of their logistics networks to make them more efficient. These networks are characterized by the number, location, size, mandate and modus operandi of the various entities that make up the company. There are many examples of company mutations. Among the first experiments, Ouchi & Jaeger's Firm Z, described by Mintzberg (1980), presents a project-based organization. Miles and Snow (1992) discussed the possibility of using networks to successfully reintroduce market benefits in companies. Ultimately, however, a multi-site organization still involves distributing the company's functional processes while physically distributing the company entities. In this paper, a model is proposed for manufacturing control in multi-site companies that is able to provide effective and adaptable schedules. For each site, decisions may be made online to meet the company's long-term goals. The proposed model is organized heterarchically, which ensures the controlled distribution of decisions among entities on the same



hierarchical level (Trentesaux 2009; Zbib et al. 2010), and uses a multi-agent approach to benefit from the agents' dynamic behaviour. Our aim is to obtain a system that is able to control the coordination among the different sites online in order to optimize online scheduling, all the while reacting effectively to disturbances. This approach is called "dynamic scheduling". The rest of this paper is organized as follows. First, the problem is introduced by analyzing the literature about planning and scheduling in multi-site companies. Then, the motivation for our research is explained. Next, a formal description of the problem is given and discussed, allowing us to introduce an accurate multi-agent model with reactive learning abilities. This model is deployed on a multi-tier platform. After the proposed model is described, the results of its experimental implementation using data from a real case and our analysis of these results are presented. Finally, conclusions are presented, summarizing our contribution and introducing our prospects for future research.

Control in multi-site companies: state of the art

Companies that have chosen multi-site organization have spread their production over several sites or production units. These production units have to produce a final product for customers, and some of these units produce the same semi-final or final product. Planning and scheduling in multi-site companies is thus focused on the problem of coordinating the production and distribution processes, while taking into account two important constraints: product quantities and costs, and production times (Thierry 2003).

Product quantities and costs

Dealing with quantity constraints involves dealing with the capacity of the supply chain, resources, storage areas and inventory, according to the due dates and the needed quantities, while also keeping in mind that manufacturing certain products requires the presence of other products at some specific point in the process. Quantity constraints must also cope with resource capacity since, for example, each resource cannot be used for more than its available time period (e.g., 12 h/day) (Fontan et al. 2001; Marquès et al. 2009). Determining the optimal supplier quantity discount policies for a given quantity and a constant demand has attracted much attention. For instance, Monahan (1984) analyzed an all-unit quantity discount policy that maximized the vendor's gain without adding to the buyer's costs. Lee and Rosenblatt (1986) generalized Monahan's model to increase the supplier's profit by incorporating constraints imposed on the discount rate and relaxing the assumption of a lot-for-lot supply policy. Weng and Wong (1993) developed a general


all-unit quantity discount model for multiple buyers in order to determine the optimal pricing and replenishment policy. Yang (2004) proposed an optimal pricing and ordering policy for a failing item with a price-sensitive demand. Tsai (2007) introduced a Supply Chain Management (SCM) model capable of dealing with various quantity discount functions by using linearization techniques; this model deals with price variations, which are linked to demand quantity variations. Papers related to this topic are too numerous to all be cited. There are, however, several comprehensive literature reviews available, such as the one by Narasimhan and Mahapatra (2004).

Production times

Multi-site companies have certain specific temporal constraints. For example, depending on the size of the network (i.e., the number of decision-making entities), decision-making can be costly in terms of time. In international companies, transfer times must be taken into consideration. For some companies, it is even crucial to take time into account (e.g., in food companies, deadlines are tight). To deal with these temporal constraints, several approaches have focused on time and on reducing cycle times when planning and scheduling for multi-site companies (Thierry 2003). Tang (1990) presented a discrete time model of a multi-stage production system that faces two major kinds of uncertainty: (1) the output rate at each production stage and (2) the demand for the finished product. He analyzed the impact of these kinds of uncertainty on production planning, inventory control, quality improvement and capacity planning. Dabbene et al. (2005) proposed a hybrid event-time model for a fresh-food supply chain, taking into account both event-driven dynamics (the supply chain itself) and time-driven dynamics (the parameters characterizing the jobs in the supply chain). This model manages a trade-off between logistics and an index measuring the quality of the food itself. Ouhimmou et al. (2008) proposed a mathematical model for tactical planning for a supply chain subset: an integrated furniture company. They developed a heuristic using time decomposition in order to obtain good solutions to large problems within a reasonable time limit. The approaches mentioned above are considered to be static optimization approaches or predictive approaches, in which planning and scheduling are done before launching the operating system. These high-level approaches, which seek to optimize cost and benefits, do not consider operational constraints, such as scheduling constraints, which explains why such predictive or off-line approaches often use linear programming and heuristics. At a lower, more operational level, new techniques from artificial intelligence have been successfully exploited. In their book, Voss and Woodruff (2006) showed that the benefit of establishing a supply chain is the global


optimization of planning: each entity in the supply chain tries to obtain large profits in the shortest time, using bionic approaches and multi-agent approaches. In the same supply chain context, Ait Si Larbi et al. (2008) used precedence constraints in a genetic algorithm for scheduling in multi-site companies. Hong et al. (2007) developed an improved artificial fish-swarm algorithm to optimize supply chain scheduling. However, all these contributions are still predictive, in that they do not consider the real-time behaviour of the supply chain. Intelligent approaches, especially multi-agent approaches, are also used to dynamically optimize and control supply chains. For example, Swaminathan et al. (1997) used a multi-agent approach to model supply chain dynamics, which incorporates supply, process and demand uncertainty, as well as analytic and heuristic decision procedures. Chuin Lau et al. (2008) proposed a multi-agent system for real-time control of a supply chain. Mourtzis et al. (2008) used internet communication and real-time information from RFID sensors to dynamically control a supply chain. Tarantilis (2008) reviewed the literature about real-time control and optimization in supply chains. He concluded that these intelligent approaches offer reactivity, including multi-agent approaches that can produce decisions at any time, taking changes in their environment into account through a process of collaboration or interaction. From this state-of-the-art analysis, it appears that only a few researchers have proposed an integrated adaptive scheduling approach able to support both online scheduling optimization and reactivity in multi-site production systems.
In this paper, our objective is to propose a model of a manufacturing control system that generates online scheduling solutions for multi-site companies (e.g., resource allocation at the time of the job request), while simultaneously respecting an overall goal (e.g., minimizing the overall production time, minimizing the total cost). In the next section, an overview of the literature on dynamic control and distributed scheduling is provided in order to identify best practices for the dynamic control and distributed scheduling of supply chains.

Dynamic control and distributed scheduling

One point that dynamic control and distributed scheduling have in common is that they both adopt a heterarchical model for agent-based manufacturing systems (Silva et al. 1998). This heterarchical model is being used more and more as time goes on. Hierarchical architectures have several drawbacks, notably their limits in terms of scalability, reconfigurability, reliability and redundancy. Vaario and Ueda (1998), Sauer et al. (2000) and Tehrani Nik Nejad et al. (2008) have implemented agent groups in hierarchical architectures. Agents are also appropriate for solving distributed problems because of their autonomy and decision-making capacities. These autonomous, intelligent agents are perfectly suitable for production planning and scheduling. Heterarchical architectures, on the other hand, use a hierarchical organization with links between same-level entities, and thus are more responsive and help to reduce costs (Trentesaux et al. 2000; Trentesaux 2009). Bousbia and Trentesaux (2002) analyzed the possibilities of self-organized heterarchical control systems within a dynamic production environment, highlighting the advantages of such an approach. With heterarchical architectures, it is possible to use a Multi-Agent System (MAS) model because the agents are independent and are able to receive information from their environment, act on that information and generally behave in a rational manner. Aissani et al. have also experimented with this heterarchical approach to control for scheduling in job-shops (2008a) and for scheduling production and maintenance tasks in the petroleum industry (2009). A major problem in these heterarchical system models is "myopic" behavior (Trentesaux 2009), which makes it difficult to provide efficient results and optimization mechanisms. Models could potentially be based on a holonic approach, since such approaches have been successfully used to solve distributed scheduling problems. For example, in the ADACOR project, Leitao and Restivo (2008) used simple scheduling algorithms embedded in holons. These authors integrated dynamic mechanisms to increase system performance and used industrial scenarios that required fast scheduling solutions. Although their approach does not include any learning capabilities for adaptive behavior (as we aim to provide), Leitao and Restivo's experiments showed that their approach could improve performance, especially in terms of agility. The PROSA architecture (Van Brussel et al. 1998) and its extensions have also been used for dynamic control (Ounnar and Pujo 2009).
However, in this study, a multi-agent approach is preferred, since the proposed models do not currently rely on the main assumption of the holonic paradigm: the integration of physical and informational components within a recursive entity (holon). Of course, this does not mean that we refuse to consider this approach in future work, but other theoretical considerations have to be discussed first. Thus, consideration of holonic solutions is beyond the scope of our current work. In the next section, multi-agent based heterarchical organizations and their advantages in terms of adaptability and self-organization are presented.

Heterarchy and multi-agent systems

Bousbia and Trentesaux (2002) analyzed the possibilities of self-organized, heterarchical control systems within a dynamic production environment, highlighting the advantages of such an approach. The heterarchical architecture



is generally considered as an extension of the hierarchical architecture, with relationships between entities on the same hierarchical level and with the summit potentially replaced by a group of entities. Initially proposed for the field of medical biology (McCulloch 1945), this kind of architecture has been applied in several domains (Prabhu 2003; Haruno and Kawato 2006). Heterarchical organization has great potential for communication between same-level decisional entities, which helps increase reactivity in online systems. To formalize this concept, let us consider an oriented graph composed of nodes that are decisional entities, with arcs formalizing the master-slave relationships. If the graph (or a sub-graph) is strongly connected, then it forms a heterarchy. If strongly connected, hierarchically dependent sub-graphs can be identified, it is possible for hierarchy and heterarchy to co-exist within a graph. This coexistence means that there is at least one strongly connected sub-graph. This definition is consistent with, and also formalizes, McCulloch's initial definition of heterarchy (Trentesaux 2009). Heterarchical architectures allow systems to remain responsive and self-organized. However, adaptability requires other techniques, including learning. The learning technique should be chosen to allow the system to remain responsive while improving its performance. Reinforcement learning is an appropriate way to reach this objective in multi-agent systems. Thus, a reactive learning technique, such as reinforcement learning, is integrated into the system's agents to improve the quality of their decision-making so that the system can offer adaptive scheduling. Our initial model, developed for the adaptive scheduling of a single flexible manufacturing system (Aissani et al. 2008a, 2009), has been adapted and extended to the context of multi-site companies.
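This graph-based criterion is easy to operationalize. The sketch below (entity names are invented for illustration, not taken from the paper) classifies a master-slave graph as heterarchical exactly when it is strongly connected:

```python
from collections import defaultdict

def is_heterarchy(nodes, arcs):
    """A decisional graph is a heterarchy if it is strongly connected:
    every entity can reach every other one through master-slave arcs."""
    if not nodes:
        return False
    fwd, bwd = defaultdict(list), defaultdict(list)
    for master, slave in arcs:
        fwd[master].append(slave)
        bwd[slave].append(master)

    def reachable(start, adj):
        seen, stack = {start}, [start]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # strongly connected <=> one node reaches all others (forward arcs)
    # and is reached by all others (backward arcs)
    root = next(iter(nodes))
    return reachable(root, fwd) == set(nodes) == reachable(root, bwd)

sites = {"site_A", "site_B", "site_C"}
hierarchy = [("site_A", "site_B"), ("site_A", "site_C")]      # pure tree
heterarchy = hierarchy + [("site_B", "site_C"), ("site_C", "site_A")]
print(is_heterarchy(sites, hierarchy))    # False
print(is_heterarchy(sites, heterarchy))   # True
```

A pure master-slave tree fails the test (slaves cannot reach the summit), while adding same-level and upward links closes the loops and yields a heterarchy, matching the definition above.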
Both the initial model and its extension are based on a heterarchical organization in which entities are modeled as agents with online learning capacities through reinforcement learning. Reinforcement learning is learning by trial and error. This type of learning helps us manage the heterarchical system's "myopic" behaviour (Zambrano et al. 2011) by constantly improving the agents' performance. In the next section, the adaptation of reinforcement learning (RL) to the multi-agent decisional context is presented.

Agent learning capabilities and reinforcement learning techniques

The integration of learning capabilities has already been studied in the context of scheduling, but not specifically as related to multi-site production. For example, Katalinic and Kordic (2004) examined a scheduling problem for a very expensive electric motor production system. These authors considered the production units as insect colonies that were able to organize themselves to carry out a task, thus allowing production risk problems to be solved more easily. Monostori et al. (2004) proved the ability of multi-agent systems to control job-shops and the usefulness of integrating learning capabilities into the agents in order to make them adaptive and able to react effectively to disturbances. More recently, Aissani et al. (2008b) used intelligent agents with multi-objective learning capabilities based on reinforcement learning. Sallez et al. (2009) developed a multi-agent learning approach for dynamically adapting product routing in flexible cells using stigmergy, which is a reinforcement-based approach.

Reinforcement learning is learning by trial and error. In other words, agents perform actions and wait for an evaluation of the quality of the chosen action to determine whether or not the action should be added to their repertoire. Reinforcement learning is thus a reactive learning technique and is appropriate for generating online solutions and improving them over time. This technique has often been used in robotics to teach robots proper behavior with respect to goals and obstacles (Dongbing and Yang 2007).

Problem distribution must be done intelligently, with an appropriate model that can control different entities that often have different goals and different convergence behaviors. When these entities are also endowed with a reactive learning technique, such as reinforcement learning, learning must be controlled so that the entities learn a policy that allows them to accomplish their own objectives while simultaneously responding to a global goal. To do this, the learning function must be modeled so that entities learn local parameters as well as the overall system parameters. The following section shows how this learning function is modeled and describes the multi-agent architecture, including the RL mechanisms used to control multi-site companies.

Proposed model

In this section, details about the proposed model of a multi-agent system with learning capabilities (MAS-RL) for the dynamic control of a multi-site company are presented. A MILP (Mixed Integer Linear Program) formalization of the multi-site scheduling problem, inspired by a real industrial system, is first presented. This model will then be discussed and used to construct the multi-agent model.

MILP formulation of the problem


This formulation is based upon the real case study of a multi-site off-the-rack clothing company, ENADITEX. ENADITEX consists of 5 sites and various subcontractors distributed over the national territory. The company supplies finished products to 5 demand zones, according to the requests transmitted. This multi-site company manufactures 20 finished products, which require 3 types of


production resources. These resources are present in the different manufacturing sites. For example, sweater production includes several stages: "whipping the sides of the front of the sweater", "decorating with the logo" or "ironing", to name a few. Since several sites may be able to perform several of the operations, the parent company must respond to a diversified request by assigning different quantities to the different sites. This problem is close to what the literature calls the flexible job-shop problem (FJSP), since each product has its own processing operations and each operation can be processed by one among a set of available machines. The assignment of operations to machines is not a priori compulsory, as in the traditional job-shop problem. Many models have been proposed to deal with the FJSP, particularly from the operational research community. Due to the complexity of the FJSP, most of the solving methods proposed use heuristic and meta-heuristic approaches. The few papers that presented exact algorithms were limited to small instances or a limited number of jobs and machines. Brucker and Schlie (1990) were the first to propose a geometric approach to solve the FJSP with two jobs. The tabu search algorithm has been widely used with disjunctive graph representations: Brandimarte (1993), Dauzère-Pérès and Paulli (1994, 1997), and Mastrolilli and Gambardella (2000). This same algorithm was used by Mati et al. (2001) to minimize the makespan and to avoid gridlock in flexible manufacturing systems. Other metaheuristics have been used to solve the problem: for instance, a genetic algorithm was proposed by Kacem et al. (2002), Zhang and Gen (2005), Gao et al. (2007), Vilcot (2007), Pezzella et al. (2008) and Mati et al. (2010). Fattahi et al. (2007) proposed a mathematical model and a hybridization of simulated annealing and tabu search to minimize the makespan in the FJSP.
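To make the FJSP structure concrete, here is a toy instance and a naive greedy dispatch rule (all data are invented for illustration, echoing the sweater example above; this is not the scheduling method proposed in this paper):

```python
# A toy FJSP instance: each job is a sequence of operations, and each
# operation maps the machines able to process it to a processing time.
jobs = {
    "sweater": [{"M1": 3, "M2": 4},   # whip the sides of the front
                {"M2": 2},            # decorate with the logo
                {"M3": 1}],           # iron
    "shirt":   [{"M1": 2, "M3": 3},
                {"M3": 2}],
}

def greedy_makespan(jobs):
    """Greedy dispatch: assign each operation, in job order, to the
    machine that lets it finish earliest; return the makespan."""
    machine_free = {}              # machine -> time it becomes free
    makespan = 0
    for ops in jobs.values():
        job_ready = 0              # precedence: operations run in sequence
        for op in ops:
            # pick the machine minimising this operation's completion time
            m = min(op, key=lambda mc: max(machine_free.get(mc, 0),
                                           job_ready) + op[mc])
            start = max(machine_free.get(m, 0), job_ready)
            finish = start + op[m]
            machine_free[m] = finish
            job_ready = finish
            makespan = max(makespan, finish)
    return makespan

print(greedy_makespan(jobs))  # 8
```

The sketch captures the two defining features named above: routing flexibility (several candidate machines per operation) and precedence within each job; it ignores the multi-site transfer times that the MILP below adds.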
To the best of our knowledge, the FJSP in the current multi-site context, especially with non-negligible transfer times between sites, has never been studied. The developed MILP model is thus inspired by the flexible job-shop scheduling problem because of the presence of classic constraints relevant to this problem (e.g., precedence constraints, disjunctive constraints). However, this model must be adapted to take the specific constraints of multi-site production into account. In this paper, three specific constraints are considered:

• Each job has a priority rating,
• The operational processing time depends on the selected machine, and
• The inter-site transfer time cannot be ignored.

a. Notations for parameters:

$J$: set of jobs, $J = \{1, 2, \ldots, n\}$
$S$: set of sites, $S = \{1, 2, \ldots, s\}$
$M$: set of machines, $M = \{1, 2, \ldots, m\}$
$I_j$: set of operations of job $j$, $I_j = \{1, 2, \ldots, |I_j|\}$, $j \in J$
$O_{ij}$: operation $i$ of job $j$
$M_{ij}$: set of possible machines for operation $O_{ij}$ ($i \in I_j$)
$p_{ijms}$: processing time of operation $O_{ij}$ ($i \in I_j$) on machine $m$ of site $s$
$C_{ss'}$: transfer duration from site $s$ to site $s'$

b. Notations for variables:

$t_{ij} > 0$: completion time of operation $O_{ij}$ ($i \in I_j$)
$b_{ijkl} = 1$ if operation $O_{ij}$ precedes operation $O_{kl}$, $0$ otherwise
$\mu_{ijms} = 1$ if operation $O_{ij}$ is performed on machine $m$ of site $s$, $0$ otherwise
$tr_{ijss'} = 1$ if operation $O_{ij}$ is transferred from site $s$ to site $s'$, $0$ otherwise

c. MILP model

Minimize PFD subject to:

$$t_{ij} + p_{klms} + BM \cdot b_{ijkl} \le t_{kl} + BM \quad \forall i \in I_j, \forall k \in I_l, \forall j, l \in J, \forall m \in M_{ij}, \forall s \in S \quad (1)$$

where $BM$ is a large number.

$$t_{(i+1)j} \ge t_{ij} + \sum_{s' \in S,\, s' \ne s} tr_{ijss'}\, C_{ss'} + \mu_{(i+1)jms}\, p_{(i+1)jms} \quad \forall i \in I_j, \forall j \in J, \forall m \in M_{ij}, \forall s \in S \quad (2)$$

$$t_{ij} + BM(\mu_{ijms} + \mu_{klms} - 2) \le t_{kl} - p_{klms} \quad \forall j, l \in J,\ j < l,\ \forall i \in I_j, \forall k \in I_l, \forall m \in M_{ij}, \forall s \in S \quad (3)$$

$$\sum_{m \in M_{ij}} \sum_{s \in S} \mu_{ijms} = 1 \quad \forall i \in I_j, \forall j \in J \quad (4)$$

$$t_{ij} \ge \sum_{m \in M_{ij}} \sum_{s \in S} \mu_{ijms}\, p_{ijms} \quad \forall i \in I_j, \forall j \in J \quad (5)$$

$$\mu_{ijms} + \sum_{m' \in M_{(i+1)j}} \mu_{(i+1)jm's'} - 1 \le tr_{ijss'} \quad \forall i \in I_j, \forall j \in J, \forall m \in M_{ij}, \forall s, s' \in S,\ s \ne s' \quad (6)$$

$$\mu_{ijms} + \sum_{m' \in M_{(i+1)j}} \mu_{(i+1)jm's'} \ge (1 + \varepsilon)\, tr_{ijss'} \quad \forall i \in I_j, \forall j \in J, \forall m \in M_{ij}, \forall s, s' \in S,\ s \ne s' \quad (7)$$

$$\sum_{s, s' \in S,\, s \ne s'} tr_{ijss'} \le 1 \quad \forall i \in I_j, \forall j \in J \quad (8)$$

$$t_{ij} > 0 \quad \forall i \in I_j, \forall j \in J \quad (9)$$

$$b_{ijkl} \in \{0, 1\} \quad \forall i \in I_j, \forall j \in J, \forall k \in I_l, \forall l \in J \quad (10)$$

$$tr_{ijss'} \in \{0, 1\} \quad \forall i \in I_j, \forall j \in J, \forall s, s' \in S \quad (11)$$

$$\mu_{ijms} \in \{0, 1\} \quad \forall i \in I_j, \forall j \in J, \forall m \in M_{ij}, \forall s \in S \quad (12)$$

The objective of this MILP is to minimize the completion time of the whole production schedule, or the Project Final Date (PFD), where $PFD = \max_{i \in I_j,\, j \in J} t_{ij} = C_{max}$. Constraints (1) define the disjunctive constraints: two operations i and k cannot be performed at the same time on the same machine. The precedence constraints (2) ensure that the operations of one job are performed according to the sequence defined by the job. Constraints (3) state that if two operations Oij and Okl compete for a machine at the same time, the operation with the highest priority is processed first; for that, jobs are sorted in decreasing order of their priorities. Constraints (4) ensure that each operation is performed on exactly one machine. Constraints (5) state that an operation cannot be completed before its processing time on the selected machine has elapsed. Constraints (6) and (7) define the relationship between resource assignment and inter-site transfer: if successive operations of a job are performed on different sites, the transfer variable must be set to one. Constraints (8) ensure that an operation can be transferred at most once. The remaining constraints (9)-(12) define the nature of the variables. Most flexible job-shop problems have been proved to be NP-hard (Conway et al. 1967). Exact methods that explore the whole search space are efficient only for cases in which the number of jobs or machines is limited to two. For an instance of n jobs and m machines, the number of possible solutions is equal to (n!)^m (Zobolas et al. 2008). According to Brandimarte (1993), flexibility greatly increases the complexity of the flexible job-shop problem because it requires an additional level of decisions (e.g., the selection of the machine on which each operation should be processed). In our study, the machines are located in different sites. If a job is not performed entirely at the same site, a transfer time must be added to the final completion time, which also complicates the problem.
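To give a feel for these orders of magnitude, the (n!)^m search-space size, together with the approximate variable and constraint counts discussed next, can be evaluated directly (instance sizes are illustrative, loosely based on the ENADITEX case):

```python
from math import factorial

def model_size(n, m, o, s):
    """Approximate MILP size for n jobs, m machines, o operations and
    s sites, following the counts o*n*m*s*(o*n + m*s) variables and
    o*n*(m*s + o*n + s**2 + 1) constraints."""
    variables = o * n * m * s * (o * n + m * s)
    constraints = o * n * (m * s + o * n + s ** 2 + 1)
    return variables, constraints

# Search space of a 20-job, 3-machine job shop: (20!)**3 possibilities.
print(len(str(factorial(20) ** 3)), "digits")  # 56 digits

# A 20-job, 3-machine-type, 5-operation, 5-site instance already yields
# a large model:
print(model_size(n=20, m=3, o=5, s=5))  # (172500, 14100)
```

Even this modest instance produces on the order of 10^5 variables, which illustrates why exact solvers struggle on real-life cases.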
The complexity of the proposed model is given by the number of variables and constraints, which depends on the size of the instance. For n jobs, m machines, o operations and s sites, there are approximately $o \cdot n \cdot m \cdot s \cdot (o \cdot n + m \cdot s)$ variables and $o \cdot n \cdot (m \cdot s + o \cdot n + s^2 + 1)$ constraints. For real-life instances, this problem is very hard to solve, even with powerful industrial solvers such as IBM Ilog Cplex (IBM 2010). In addition, the specific constraints make it hard to use traditional meta-heuristic approaches, because this would imply simplifying the model drastically and relaxing some of the specific constraints introduced above. In our opinion, one way to take all these specific constraints into account


simultaneously is to adopt a multi-agent approach, which would allow the creation of a very accurate and very precise model of the multi-site system and the associated decision system. The next section presents the heterarchical resolution approach.

Multi-site manufacturing system for adaptive scheduling

An agent-based model of the company's decision-making and operational entities at the site level is proposed. The meta-model that organizes the agents, which makes controlling them easier, is presented. Each agent has a decisional module composed of a perception function, an action function and a learning/decisional function, and the way these functions collaborate is described. The Learning module, which is part of the learning/decisional function, is modeled as a Markov Decision Process and is thus suitable for dynamic environments (Russell and Norvig 1995), such as the supply chain environment. This Markov Decision Process model allows the agent to build an action policy using a reinforcement learning algorithm. The use of the SARSA algorithm (Rummery and Niranjan 1994) to learn this action policy, in order to keep the overall goal in sight, is then highlighted. Finally, the way learning agents interact and negotiate to respond to control problems is explained. In this study, a multi-site company is considered as a set of decision-making groups, where each group is a set of decision-making entities. These entities are modeled using agent technology because of its autonomy and decision-making capacity. These agents are grouped into on-site groups according to the Aalaadin meta-model (Ferber and Gutknecht 1998). The Aalaadin meta-model allows for a simple description of coordination and negotiation schemes through multi-agent systems, which helps to overcome problems of heterogeneity in languages, applications and agent architectures. The multi-agent development platform Madkit (see http://www.madkit.org) was then used to implement this model.
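As a sketch of the update rule involved, one tabular SARSA step looks as follows (state and action names, rewards and parameter values are invented for illustration; the paper's agents use richer state codings):

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """One SARSA step: move Q(s, a) toward the observed reward plus the
    discounted value of the action actually chosen next (on-policy)."""
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * Q.get((s2, a2), 0.0) - Q.get((s, a), 0.0))

Q = {}
# Illustrative transition: in state "idle", action "accept_job" earns
# reward 1 and leads to state "busy", where "process" will be taken next.
sarsa_update(Q, "idle", "accept_job", 1.0, "busy", "process")
print(Q[("idle", "accept_job")])  # 0.1
```

Because the target uses the action the policy actually selects next, SARSA evaluates the policy being followed, which suits agents that must keep acting (and reacting to disturbances) while they learn.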
Each site is modeled as a set composed of one observer agent and a set of stock agents and resource agents: • Observer Agent: On-site, the observer agent is responsible for carrying out the site objectives. It also processes all the acquisition requests, manages the stock supply and communicates with the other sites. • Stock Agent: The stock agent is a reactive agent representing the stock and information about its contents (e.g., part a has failed, or there is a surplus of part b). • Resource Agent: The resource agent represents the site's production resources, which are workshops at this level. It transmits the resource report and launches actions (e.g., execute production, stop production or request stock)

according to the decisions made by its decision-making module.

Figure 1 shows the particular connectivity platform needed by both individual agents and agent groups to ensure the security of data that must be transmitted from site to site.

Fig. 1 Multi-site system model using the Aalaadin model

To organize these groups, the traditional hierarchical company decomposition is extended, using levels with heterarchical relationships among the agents on the same level (Fig. 2).

Fig. 2 Hierarchical and heterarchical organization of multi-site company entities

Deploying this model on the real site requires a platform that is securely connected to the larger Internet network, which allows modular development to facilitate changes and updates. For this study, a multi-tier platform has been chosen. This multi-tier platform is a client-server architecture in which the user interfaces, functional process "logic", computer data storage and data access are developed and maintained as independent modules, most often on separate platforms.

Observer agent behavior

Each site has its own observer agent. A reactive agent, it receives product requests from other sites or directly from customers. In response to these requests, it elaborates a stock request and sends it to its stock agent and to other sites likely to have that product in stock. Finally, the observer agent receives many responses or solutions and awards the contract to the agent that offers the best bid (e.g., the shortest project completion time). A simple duplication (mirror) of this agent may be provided to overcome the problem of server failure.

Stock agent behavior

Each site has its own stock agent. Also a reactive agent, it receives stock requests from the observer agent. If the requested quantity is available, it answers the observer agent directly, telling it that it can respond to the request. Otherwise, it queries the resource agents to determine whether they can produce the necessary quantities and at what cost.

Resource agent behavior

Each site of the company has its own resource agents, which are cognitive agents with a decision-making module that makes them capable of responding online to requests. This decision-making module obtains its intelligence from a reinforcement learning method (i.e., the SARSA algorithm) that is part of the Learning module. The architecture of this agent is presented in Fig. 3. The active part of the agent contains the learning and decisional module. The Perception module, which perceives the environment, codes the system state St and the physical resource state (Mq) with rewards determined by agent interaction; the agent then chooses an action At to execute via the Action module. Depending on the result of this action, the agent receives a numerical reward Rt, which can be positive or negative, destined to reward or punish the executed action according to a Markov Decision Process (Russell and Norvig 1995). In this model, t is a given instant.
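The division of labour among the three agent types can be sketched as follows. This is a minimal, sequential Python illustration with hypothetical class and method names; the actual system runs as concurrent MadKit agents in Java, and a resource agent's proposal is in reality driven by its SARSA-based decisional module rather than the naive cost rule used here:

```python
# Minimal sketch of the three on-site agent roles; class and method
# names are hypothetical, and agents run sequentially here instead of
# as concurrent MadKit agents.

class ResourceAgent:
    def __init__(self, name, unit_time):
        self.name, self.unit_time = name, unit_time

    def propose(self, product, qty):
        # Bid for producing the batch; "cost" is the completion time.
        return {"available": False, "resource": self.name,
                "cost": qty * self.unit_time}

class StockAgent:
    def __init__(self, inventory):
        self.inventory = inventory  # product -> quantity on hand

    def handle_stock_request(self, product, qty, resources):
        # Answer directly if the quantity is in stock; otherwise query
        # the resource agents for production proposals.
        if self.inventory.get(product, 0) >= qty:
            return {"available": True, "cost": 0}
        bids = [r.propose(product, qty) for r in resources]
        return min(bids, key=lambda b: b["cost"])

class ObserverAgent:
    def __init__(self, stock, resources):
        self.stock, self.resources = stock, resources

    def handle_product_request(self, product, qty):
        # Elaborate a stock request; the best offer (shortest
        # achievement time) wins the contract.
        return self.stock.handle_stock_request(product, qty,
                                               self.resources)

site = ObserverAgent(StockAgent({"sweater": 2}),
                     [ResourceAgent("M1", 3), ResourceAgent("M2", 5)])
print(site.handle_product_request("sweater", 10))
```

With the stock short of the requested quantity, the observer receives the two resource bids and awards the contract to the cheaper one.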



Fig. 3 a Learning agent architecture (resource), b Other agent architecture (stock and observer)

The 'learning and decisional' module of the Resource agent is based on the SARSA learning algorithm, which is described in the next section.

SARSA algorithm for solving online scheduling problems

A Markov Decision Process is a tuple ⟨S, A, T, R⟩, where S is a set of problem states, A is a set of actions, T(s, a, s′) → [0, 1] is a function defining the probability that taking action a in state s results in a transition to state s′, and R(s, a, s′) → ℝ defines the reward received after such a transition. If all the parameters of the Markov Decision Process are known, an optimal policy can be found by dynamic programming. If T and R are initially unknown, however, reinforcement learning (RL) methods can be used to learn the optimal policy through direct interaction with the environment. Reinforcement learning methods involve learning to act by trial and error: agents perceive their individual states and perform actions, for which numerical rewards are given. The goal of the agents is thus to maximize the total reward received over time. These methods are often used in robotics, in order to teach a robot the proper behaviour for achieving its goals and overcoming obstacles. The most popular method is Q-learning (Watkins 1989). Other algorithms could also be tested, such as QI- or QIII-learning or TD(λ), but in this paper a more developed algorithm is explored, called the SARSA algorithm (Rummery and Niranjan 1994); Takadama and Fujita (2004) have shown that SARSA can converge earlier than the Q-learning algorithm. Essentially, this algorithm performs a gradient descent through the reward function in order to maximize the return on the path to the goal. Since the goal is reached many times, the values along the path from the start state to the goal converge on their true values. Once every action has been experienced sufficiently in every state, the optimal policy is to act


greedily with regard to the value function. Formally, SARSA replaces the max over the Q-values with the Q-value of the state-action pair that is actually observed in the next step. Since updates are based on the actions actually taken, rather than on the best possible action, SARSA-based modules discover Q-values that are closer to the true expected return under the composite policy. The SARSA algorithm is experimented with here in order to design an adaptive multi-site manufacturing scheduling system based on a heterarchical multi-agent architecture. It is used to learn the function Qπ(s, a), defined as the expected total discounted return when starting in state s, executing action a and thereafter using the policy π to choose actions:

Qπ(s, a) = Σs′ T(s, a, s′)[R(s, a, s′) + γ Qπ(s′, π(s′))]    (13)

The discount factor γ ∈ [0, 1] determines the relative importance of short-term and long-term rewards. For each s and a, a floating-point number Q(s, a) is stored as the current estimate of Qπ(s, a). As experience tuples ⟨s, a, r, s′, a′⟩ are generated through interaction with the environment, the table of Q-values is updated using the following rule:

Q(s, a) = (1 − α)Q(s, a) + α(r + γ Q(s′, a′))    (14)
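Update rule (14) is straightforward to implement as a tabular function; the following is a generic sketch, not the authors' code:

```python
# Tabular SARSA update, Eq. (14):
#   Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * Q(s', a'))
from collections import defaultdict

Q = defaultdict(float)   # (state, action) -> current return estimate

def sarsa_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return Q[(s, a)]

# One experience tuple <s, a, r, s', a'>:
print(sarsa_update("s0", "accept", 1.0, "s1", "accept"))   # -> 0.1
```

Unlike Q-learning, the target uses the Q-value of the action a′ that the policy actually selects in s′, not the maximizing action.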

The learning rate α ∈ [0, 1] determines how much the existing estimate of Qπ(s, a) contributes to the new estimate. If the agent's policy tends towards greedy choices as time passes, the Q(s, a) values will eventually converge on the optimal value function Q∗(s, a). To achieve this, a Boltzmann probability is used, which determines the probability of choosing a random action. Figure 4 shows the steps of the SARSA algorithm. In our case, this algorithm will make the Resource Agent learn its action policy π, which in turn makes it able to choose


Fig. 4 The SARSA algorithm (Rummery and Niranjan 1994)

the best action for each state (accept the operation/request, or not). This algorithm works with the following data:
• State parameters are the current time t ∈ 0…T; the inventory of products j1…jJ and their various stock states Sj1…Sjn (capacity); the list of production sites S1…SS, their resources S1M1…SiMj…SmMK and their states Ss1m1…Ssimj…Ssmmk (e.g., working, stopped) and their sequence Mq; and the production duration of each resource for each operation (Pijms).
• Action parameters are used to assign (or not to assign) an operation to a site or resource.
• Reward functions assign no reward to most states, positive rewards to the goal state and negative rewards to non-desirable states. For more precision and to obtain proper convergence, the reward function combines the state engendered by an action with the resulting CmaxSi:

R(s) = 0 if the request is not accepted; R(s) = 1/CmaxSi if the request is accepted.
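This reward shape can be written directly (the request flag and the CmaxSi value are assumed to be supplied by the agent's perception module):

```python
def reward(accepted: bool, cmax_si: float) -> float:
    # No reward for a refused request; 1/CmaxSi for an accepted one,
    # so shorter schedules on site Si earn strictly larger rewards.
    return 1.0 / cmax_si if accepted else 0.0

assert reward(False, 200.0) == 0.0
assert reward(True, 162.0) > reward(True, 168.0)
```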

This algorithm is integrated into the resource agent, where it plays a central role in the agent's live process. As seen in Fig. 5 (which is an instance of Fig. 3), agent behavior is an endless cycle that can be characterized as follows. After an action, the agent perceives its reward from the environment. This reward is used to update the Q-value table, increasing or decreasing the value attributed to the chosen action in a given state. When the agent receives a new request from another agent, it evaluates its internal state and the request to determine the request's state. Depending on its internal state and the Q-value table, the agent chooses the action that maximizes the Q-value in that state. Following this, the agent performs the chosen action and observes its environment to get the new reward. The resource agents evolve on the MadKit platform in an application server in a 3-tier architecture. The next section presents the deployment of the model within this 3-tier architecture.
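The endless cycle just described, combined with Boltzmann exploration over the Q-values, could look like the following sketch (the toy environment, state coding and action names are hypothetical):

```python
import math, random

def boltzmann_choice(Q, state, actions, tau=1.0):
    # Softmax over Q-values; larger tau means more random exploration.
    weights = [math.exp(Q.get((state, a), 0.0) / tau) for a in actions]
    return random.choices(actions, weights=weights)[0]

def live_cycle(env, Q, actions, alpha=0.1, gamma=0.9, tau=1.0, steps=2000):
    # Endless perceive/decide/act/update cycle of the resource agent
    # (truncated here to a fixed number of steps).
    s = env.observe()                        # Perception: code the state St
    a = boltzmann_choice(Q, s, actions, tau)
    for _ in range(steps):
        r, s_next = env.step(a)              # Action At executed, reward Rt perceived
        a_next = boltzmann_choice(Q, s_next, actions, tau)
        # SARSA update over the (s, a, r, s', a') tuple actually experienced
        Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) \
                    + alpha * (r + gamma * Q.get((s_next, a_next), 0.0))
        s, a = s_next, a_next
    return Q

class ToyEnv:
    # Single-state toy environment: accepting a request pays 1, refusing pays 0.
    def observe(self):
        return "idle"
    def step(self, action):
        return (1.0 if action == "accept" else 0.0), "idle"

random.seed(42)
Q = live_cycle(ToyEnv(), {}, ["accept", "refuse"])
```

After enough iterations, the learned Q-value for accepting dominates the one for refusing, so the exploitation-phase policy accepts requests.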

Fig. 5 Live agent process

The multi-tier platform

Multi-tier architectures separate presentation, processing and data into different layers, with the objective of allowing each layer of the architecture to evolve relatively independently of the others, which also helps to ensure data security. The multi-agent system model connects the data server and the presentation application: it obtains and updates information from the data server and gives responses to the presentation application (interface), according to the diagram in Fig. 6. Each company site has a multi-tier application, i.e., an interface that allows the user to enter or retrieve data from the system. This interface is connected via the Internet to an application server, which implements the core business of the application. This server includes the multi-agent platform that runs the group of agents dedicated to the site. A user query is retrieved and processed by this group of agents. To process a query, the agent group needs a set of data concerning the company site; based on this data, it conducts a search through the database server. The multi-tier application of each site is connected to the others via the Internet, which means that the sites are very well connected and the data are secured, because the data are not directly accessed by the interface (presentation) layer. At this point, the agents are deployed on the application server, but how do they interact to control the supply chain? The next section answers this question by presenting how they respond to a production request.

Multi-agent interaction

In the parent company, the observer agent receives a request and the overall goals of the company (Fig. 7). It sends this


Fig. 6 The deployment of the model

request to the different sites. Initially, the stock agent is consulted to see whether the product is in stock or not. In either case, the stock agent asks the resource agents to propose a solution. These agents launch their decision-making algorithm, based on reinforcement learning, using the system data (e.g., concerning the resource state or the operation duration). A solution is then proposed, along with its reward. All these solutions are sent back to the observer agent, who chooses the best solution in terms of rewards, which completes the scheduling process. This interaction process helps the agents to make a better decision for the production plan, using the Cmax parameter, which is integrated into the evaluation function. The next section presents the MAS-RL implementation and compares and analyzes the results.

Implementation and experiments

The proposed model was simulated in the Borland JBuilder environment because of its potential for facilitating communication and thread programming. This environment also offers a J2EE platform (Java 2 Enterprise Edition) (see http://java.sun.com/j2ee/overview.html), which facilitates the development of multi-tier applications. In addition, Java is compatible with MadKit, the platform chosen for MAS development (see http://www.madkit.org/downloads).

Development of simulation tool

The development of the model presented above on a 3-tier architecture went through three phases (Fig. 8), each tier being deployed on an independent machine. First, Java Server Pages (JSP) were developed; these are the Web interfaces for the different company sites, which can display and collect data for each site. Then, the Java Beans (JB) were developed, which are deployed on the application server. Their role is to


codify information, start the MadKit kernel and manage the site's agents. Finally, the agents and their communication modules were developed, along with the decision-making module and the learning algorithm for the resource agents. The Contract Net protocol was used to develop the communication modules, because it is easily implemented on MadKit using the "Communicator Agent" class, and to develop the resource agent's 'decisional and learning' module, especially the implementation of the SARSA algorithm. Since it was already present in the MadKit platform, it was not necessary to develop the "Communicator Agent", whose role is to intercept non-local messages and reroute them to a distant kernel. It also helps the kernel to synchronize its actions with remote entities (groups, roles, communities). For each site, a server is assumed, supporting each of the three tiers: presentation (the JSP pages presenting the site), application (the MAS application creating the site's agent group) and data (a database that stores the site configuration: S, M) (Fig. 9). A client application was also developed to introduce client requests into the system and to set the configuration of the desired products.

Experiments and investigations

The basic ENADITEX multi-site structure is hierarchical. To illustrate the possibilities of the proposed approach, it was necessary to add a network allowing interaction between the various company sites, in order to fit the model and to show the sites the benefits they could achieve by adopting it. According to this configuration, the model was instantiated as follows: each site has a group composed of one observer agent, one stock agent and as many resource agents as there are machines on the site, which has only one workshop. The production configuration is represented by the number of sites, the number of machines per site, the number of batches to produce, and the number of operations per batch


Fig. 7 Interaction needed to respond to a request (Sequence diagram)

Fig. 8 The components of the three tiers

(site × machine × batch × operation). An experimental production of 3 types of sweaters with 5 operations spread over the 5 sites (5 × 3 × 3 × 5), inspired by the real industrial case, is used. The resulting Gantt chart is shown in Fig. 10. The learning parameters appearing in the evaluation function (14), α and γ, should now be set. α is equal to 1 at the beginning;

thereafter, it decreases according to Eq. 15, because this gives more weight to the experiments already carried out. As soon as a state/action pair has been visited, enough experience for this state has been acquired; in other words, it becomes possible to know what to do in this state. Experiments done by Takadama and Fujita (2004) and Aissani et al. (2008a) have shown that the algorithm converges faster when γ is close to one. Although more extensive experiments must be done, 0.9 was chosen as the value of γ for this study.

α = 1/nbr    (15)

where nbr is the number of times the (state, action) pair has been visited.

Fig. 9 The Deployment diagram

As mentioned earlier, our goal is to propose a system model that is adaptive and able to provide both optimal and reactive scheduling solutions. To illustrate these features, the proposed MAS-RL is compared to a Genetic Algorithm (GA) model developed by Ait Si Larbi et al. (2008) and to the MILP introduced previously. The following part describes the main features of the GA model.



Fig. 10 An example of a Gantt chart for the executed request considering transfer duration

Short description of the GA model

Developed by Holland in 1975, the GA has become a very popular algorithm and has been applied to a very large number of optimization problems. Genetic Algorithms evolve populations of solutions (sometimes also referred to as chromosomes) using selection, crossover and mutation operators (James et al. 2007). When applied to scheduling, the GA generates at each iteration a number of different plans or schedules, which are carried over to the next iteration. The GA operators designed in this experiment are:
Encoding: The first step in a GA implementation involves representing the problem to be solved with a set of chromosomes. Generally, these chromosomes are bit strings, but in our context of job sequences this representation is not appropriate. For this reason, a job is coded by its identifier, its duration and the site, shop and machine on which it can be executed. It is important to mention that transfer times are not considered within this model, since they require high programming complexity and make it difficult to maintain consistency when applying the various genetic operators. This is a perspective for the authors' future work.
Initialization: An initial population is obtained by using uniformly distributed probabilities.
Selection: The tournament and elitist selection procedures are used to achieve maximum potential while retaining the most interesting individuals.
Crossover: The PMX crossover (Partially Mapped Crossover), proposed by Goldberg and Lingle (1985) with the aim of preserving as much as possible the order and position of genes from the parents, is chosen for our GA. The crossover coefficient is 0.8.
Mutation: The mutation coefficient is 0.2.
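The PMX operator named above can be sketched on permutations of job indices as follows; this is the standard textbook version of Goldberg and Lingle's operator, not necessarily the authors' exact implementation:

```python
def pmx(parent1, parent2, cut1, cut2):
    # Partially Mapped Crossover (Goldberg and Lingle 1985): the segment
    # parent1[cut1:cut2] is copied into the child; the remaining positions
    # take parent2's genes, remapped through the segment so the result
    # stays a valid permutation (no job appears twice).
    size = len(parent1)
    child = [None] * size
    child[cut1:cut2] = parent1[cut1:cut2]
    mapping = {parent1[i]: parent2[i] for i in range(cut1, cut2)}
    for i in list(range(cut1)) + list(range(cut2, size)):
        gene = parent2[i]
        while gene in child[cut1:cut2]:
            gene = mapping[gene]
        child[i] = gene
    return child

p1 = [3, 4, 8, 2, 7, 1, 6, 5]   # two parent job sequences
p2 = [4, 2, 5, 1, 6, 8, 3, 7]
print(pmx(p1, p2, 3, 6))        # -> [4, 8, 5, 2, 7, 1, 3, 6]
```

The child inherits the middle segment of the first parent and fills the remaining positions from the second parent, chasing conflicts through the segment mapping.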


Stopping: The stopping condition, chosen after several experiments based on several criteria (e.g., number of lots), is the number of generations to be generated. This value was determined through trial runs with the specific problems examined in this study.
Fitness function: Many different types of objectives are important in a manufacturing context. The most important of these is the makespan objective, which matters when there is a finite number of jobs. The makespan, denoted by Cmax, is defined as the time when the last job leaves the system: Cmax = max(C1, …, Cn), where Cj is the completion time of job j. The makespan objective is closely related to the throughput objective. The fitness function minimizes Cmax but is transformed for a multi-site configuration with the PFD (Sect. 9), expressed as: F(t) = min PFD.

Comparison

The three approaches (the proposed MAS-RL, GA and MILP) are compared on the same case study. Since the GA cannot model the transportation time, and despite the fact that the presented model integrates inter-site transportation times, this value has been set to zero in the MAS-RL model and the transportation constraint is relaxed in the MILP, to produce consistent comparisons. The PFD is used as our optimality criterion of comparison. For the company described above, the presented model has been run for many different kinds of products (e.g., sweaters, pants, skirts). Instances are named using a representation of the different problem dimensions: s_m_j_t, where s is the number of sites, m is the number of machines, j is the number of jobs and t is the number of operations.
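For illustration, the makespan component of the fitness is a one-liner over the job completion times (the actual fitness minimizes the PFD criterion defined earlier in the paper):

```python
def makespan(completion_times):
    # Cmax = max(C1, ..., Cn): the time at which the last job leaves
    # the system.
    return max(completion_times)

print(makespan([12, 7, 15, 9]))   # the last job finishes at t = 15
```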

Table 1 Results comparison of GA, MAS-RL and MILP

Instances   GA                Proposed MAS-RL    MILP
            PFD     Time (s)  PFD      Time (s)  PFD     Time (s)  Gap %   Lower bound  Time (s)
5_2_1_7     200     8         168      14        168     0.62      0.00    –            –
5_3_1_7     165     10        162      15        162     0.69      0.00    –            –
5_3_2_5     1,032   12        1,033    19        1,032   0.62      0.10    –            –
5_3_2_6     1,052   13        1,052    18        1,052   1.91      0.00    –            –
5_3_3_5     1,368   8         1,299    10        1,270   2.29      2.23    –            –
5_3_3_6     1,369   6         1,358    11        1,358   7.81      0.00    –            –
5_3_3_7     1,400   8         1,399    10        1,380   4.90      1.36    –            –
5_3_4_7     1,600   20        1,581    –         –       –         –       –            2,829
5_3_5_7     1,720   18        1,640    –         –       –         4.65    –            48,722
5_3_6_7     1,910   31        1,660    –         –       –         13.09   –            21,217
Table 1 shows some of our results for instances with no transportation time. The results shown in Table 1 are the best solutions for both approaches (GA and MAS-RL), using a maximum of 5 replications for each experiment. For each method, the obtained solution (PFD) and the computation time in seconds are presented. The MILP is executed first without relaxation on Cplex 12.2. When it fails to find a solution in reasonable time, the transfer constraints are relaxed to obtain a lower bound. As shown in Table 1, from three jobs and seven tasks onwards, the MILP takes a long time to obtain a solution (see instances 5_3_3_7, 5_3_5_7 and 5_3_6_7). Cplex is unable to solve the last two instances with the whole group of constraints; however, relaxing the transfer constraints leads to the solution shown in the column Lower bound. For each instance, the gap between the PFD obtained by MAS-RL and the MILP is computed according to the following formula:

gap = 100 × (RL − MILP) / RL
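For example, applying this formula to the values reported for instance 5_3_3_5 (a MAS-RL PFD of 1,299 against a MILP value of 1,270) reproduces the gap printed in Table 1:

```python
def gap(rl_pfd, milp_pfd):
    # Percentage deviation of the MAS-RL solution from the MILP value.
    return 100 * (rl_pfd - milp_pfd) / rl_pfd

print(round(gap(1299, 1270), 2))   # instance 5_3_3_5 -> 2.23
```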

Table 1 shows that the MAS-RL method has an average gap of 2.26% and achieves good results in reasonable computation times, compared to the MILP method. Meanwhile, it takes on average more time to compute these solutions than the GA approach, due to the time needed to generate action policies. However, this time is regained later, when the system is able to react rapidly to disturbances. Unlike our approach, the GA operates off-line, which does not allow it to react to disturbances in real time; the GA approach uses rescheduling to react to disturbances. Policies are generated through a 2-phase process: an exploration phase, in which actions are carried out randomly according to a Boltzmann probability, and an exploitation phase, in which the action policy is generated and becomes ready to use. The exploitation phase attains the best performances (best PFD). Nonetheless, our approach provided better solutions for 50%


of the cases studied, with an average improvement of 17 time units. The proposed MAS-RL approach is also compared to the MILP taking into account transfer durations; the results are shown in Table 2. As seen in Table 2, the MAS-RL approach with the transfer duration constraint reaches very interesting near-optimal solutions: for the instances where optimal values are available, the average gap is only 3%. The MILP is unable to solve the last instance, even after relaxing the transfer constraints; Cplex stopped solving the relaxed MILP at 10% from the optimum. Additionally, the MILP method takes a long time to solve instances 5_3_3_7 to 5_3_6_7 (7,116.5 s on average), making this approach unusable within a reactive context.

Reactive behavior

As mentioned above, one advantage of reinforcement learning algorithms is that they allow evaluation while learning. Figure 11 presents a graph showing the evolution of the PFD values for the case of continuously arriving requests for batches with 7 operations. In the exploration phase, the scheduling solutions have a large PFD (up to 1,000) at 5,000 and 6,000 iterations. However, in the exploitation phase (just before and after 10,000 iterations), the solutions have a PFD value of about 168. These exploration and exploitation stages have been experimented with using the same disturbance twice (i.e., stopping the units), once at the 2,000th iteration and again at the 15,000th. The studied disturbances concern the shutdown of one or more sites. This example concerns the shutdown of site 1 for 8 time units, which is an average length for a site shutdown in the group.


Table 2 Results comparison—Best solutions with transfer time

Instances   Proposed MAS-RL    MILP
            PFD      Time (s)  PFD     Time (s)  Gap %   Lower bound  Time (s)
5_3_2_6     1,092    15        1,092   1.57      0.00    –            –
5_3_3_5     1,338    11        1,300   1.38      2.84    –            –
5_3_3_6     1,439    11        1,379   3.19      4.17    –            –
5_3_3_7     1,490    17        1,429   31,151    4.09    –            –
5_3_4_7     1,700    20        1,621   10,532    4.65    –            –
5_3_5_7     1,720    16        –       –         2.33    1,680        3,714.14
5_3_6_7     1,990    45        –       –         –       1,781.1      18,044.1
Fig. 11 Graph of PFD evolution

As Fig. 11 shows, in the exploitation stage the disturbance is quickly compensated for, and the system is brought back to its best performance levels: PFD ≈ BestPFD. These results show that our system is able to learn an optimal control policy that improves continuously, while at the same time trying to reduce the number of attempts needed to perform operations correctly in order to meet the deadlines of multi-site production. In the GA approach, when a disturbance occurs, the system considers the new state and launches rescheduling operations, which takes time. The proposed MAS-RL, however, capitalizes on the reactivity of RL during the exploitation phase, once an action policy has been created. Figure 12 shows the comparison between the times taken by the two approaches to react to a disturbance (i.e., a resource breakdown). In Table 1 and Fig. 12, times were rounded to the nearest second. The proposed approach is significantly faster than the GA in compensating for the disturbance: the GA approach requires restarting the whole calculation process for the rest of the production with the new constraints, which takes about 11 s, whereas the proposed MAS-RL approach takes no more than one second, corresponding to the time needed to code the state, search for it in the evaluation table (Q-learning table) and choose the corresponding action.
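This near-instant reaction is essentially a single table lookup; a minimal sketch with hypothetical state and action names:

```python
def react(Q, state, actions):
    # Reacting to a disturbance costs one state coding plus one argmax
    # over the Q-table, instead of a full rescheduling run as in the GA.
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

Q = {("site1_down", "reroute_to_site2"): 0.8,
     ("site1_down", "wait"): 0.1}
print(react(Q, "site1_down", ["reroute_to_site2", "wait"]))
```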


These results clearly illustrate the adaptability of the proposed model and its possible extensibility to larger case studies.

Conclusions and future works

In this paper, a control problem for multi-site companies is addressed, with an exclusive focus on dynamic time optimization for supply chain planning. The proposed MAS-RL model is based on a multi-agent system for adaptive scheduling problems in multi-site companies. Agents were organized into groups according to the Aalaadin meta-model, which ensures the coherency of agent coordination and interaction. In order for the agents to be adaptive and provide an optimal schedule in a dynamic context, a reactive learning algorithm using SARSA reinforcement learning has been designed. This algorithm allows the system to be reactive and to respond immediately to disturbances that might arise (e.g., urgent client demands, product life-cycle changes, resource breakdowns). Experiments provided results on the time optimality of the proposed system model in a predictive context, showing that the scheduling solutions provided by the model are very interesting compared to a MILP, particularly in the exploitation phase.

Fig. 12 Results comparison—time taken to react to a disturbance

In addition, the results show that the model's reactivity in the face of disturbances is high compared to the GA approach: the proposed system model is able to compensate for disturbances, providing decisions within a few seconds. Nonetheless, several prospects for future research have been identified. First, MAS-RL will be extended theoretically by taking into consideration more constraints, such as the availability of raw materials. Cost and quality optimization will also be addressed, as will the application of learning techniques to the other agents in the system. As for practical objectives, the performance of the learning model will be tested, experimenting with many other learning algorithms, and the learning parameters will be fine-tuned through experimentation. Finally, the proposed system shall be applied to other real cases, particularly in the oil industry and its supply chain.

References Aissani, N., Trentesaux, D., & Beldjilali, B. (2008). Use of machine learning for continuous improvement of the real time manufacturing control system performances. International Journal of Industrial System Engineering, 3(4), 474–497. Aissani, N., Beldjilali, B., & Trentesaux, D. (2008b). Efficient and effective reactive scheduling of manufacturing system using SARSA-multi-objective-agents. In Proceedings of the 7th international conference MOSIM, Paris, pp. 698–707. Aissani, N., Trentesaux, D., & Beldjilali, B. (2009). Dynamic scheduling of maintenance tasks in the petroleum industry: A reinforcement approach. EAAI: Engineering Applications of Artificial Intelligence, 22, 1089–1103. Ait Si Larbi, E. Y., Aissani, N., & Beldjilali, B. (2008). Un Modèle de Planification pour les Entreprises Multi Sites basé sur les Systèmes Multi Agents et les Algorithmes Génétiques. In Proceedings of the 10th maghrebian conference on information technologies, Oran, Algeria, pp. 506–511. Bousbia, S., & Trentesaux, D. (2002). Self-organization in distributed manufacturing control: State-of-the-art and future trends. IEEE International Conference on Systems, Man & Cybernetics, 5, 6. Brucker, P., & Schlie, R. (1990). Job shop scheduling with multipurpose machines. Computing, 45, 369–375.

Brandimarte, P. (1993). Routing and scheduling in a flexible job shop by tabu search. Annals of Operations Research, 41, 157–183. Chuin Lau, H., Agussurja, L., & Thangarajoo, R. (2008). Real-time supply chain control via multi-agent adjustable autonomy. Computers and Operations Research, 35(11), 3452–3464. Conway, R. W., Maxwell, W. L., & Miller, L. W. (1967). Theory of scheduling. Reading, MA: Addison-Wesley. Dabbene, F., Gay, P., Tortia, C., & Sacco, N. (2005). Optimization of fresh–food supply chains in uncertain environments: An application to the meat-refrigeration process, decision and control, European Control Conference. CDC-ECC;05. 44th IEEE Conference on Volume 12, Issue 15, pp. 2077–2082. Dauzere-Peres S., & Paulli, J. (1994). Solving the general job-shop scheduling problem. Management. Report Series, vol. 182. Erasmus University Rotterdam, Rotterdam School of Management, Rotterdam. Dauzère-Pérès, S., & Paulli, J. (1997). An integrated approach for modeling and solving the general multiprocessor job-shop scheduling problem using tabu search. Annals of Operations Resarch, 70, 281–306. Dongbing, G., & Yang, E. (2007). Fuzzy policy reinforcement learning in cooperative multi-robot systems. Journal of Intelligent and Robotic Systems, 48(1), 7–22. Fattahi, P., Saidi Mehrabad, M., & Jolai, F. (2007). Mathematical modelling and heuristic approaches to flexible job shop scheduling problems. Journal of Intelligent and Manufacturing, 18, 331–342. Ferber, J., & Gutknecht, O. (1998). A meta-model for the analysis and design of organizations in multi-agent systems. In Proceedings ICMAS’98, pp. 128–135. Fontan, G., Merce, C., & Erschler, J. (2001). La planification des flux de production, Performance industrielle et gestion des flux, Hermes Lavoisier, Traité IC2 Information-Commande-Communication, 2001, Chap 3, pp. 69–112. Galliano, D., & Soulie, N. (2007). 
Organisational and spatial determinants of the multiunit firm: Evidence from the French indus, Cahiers du GRES, No 17. Gao, J., Gen, M., Sun, L., & Zhao, X. (2007). A hybrid of genetic algorithm and bottleneck shifting for multiobjective flexible job shop scheduling problems. Computers and Industrial Engineering, 53, 149–162. Goldberg, D. E., & Lingle, R. (1985). Alleles, loci and the traveling salesman problem. In Proceedings of the first international conference on Genetic Algorithms, pp. 10–19. Haruno, M., & Kawato, M. (2006). Heterarchical reinforcementlearning model for integration of multiple cortico-striatal loops: FMRI examination in stimulus-action-reward association learning. Neural Networks, 19(Special Issue), 1242–1254.


Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press. Hong, G., Chuan, L., Zhang, L., & Xianming, Z. (2007) Study on supply chain optimization scheduling of networked manufacturing, ACOS'07: In Proceedings of WSEAS: 6th international conference on applied computer science, Vol 6. IBM. (2010). IBM ILOG CPLEX optimizer, high performance mathematical optimization engines. http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/. James, T. L., Brown, E. C., & Keeling, K. B. (2007). A hybrid grouping genetic algorithm for the cell formation problem. Computers and Operations Research, 34, 2059–2079. Kacem, I., Hammadi, S., & Borne, P. (2002). Approach by localization and multiobjective evolutionary optimization for flexible job shop scheduling problems. IEEE Transactions on Systems, Man and Cybernetics, Part C, 32(1), 408–419. Katalinic, B., & Kordic, V. (2004). Bionic assembly system: Concept, structure and function. In Proceedings of the 5th IDMME, Bath, UK. Lee, H.L., & Rosenblatt, J. (1986). A generalized quantity discount pricing model to increase supplier's profits. Management Science, 33(9), 1167–1185. Leitao, P., & Restivo, F. (2008). A holonic approach to dynamic manufacturing scheduling. Robotics and Computer-Integrated Manufacturing, 24, 625–634. Marquès, G., Lamothe, J., Thierry, C., & Gourc, D. (2009). A supply chain performance analysis of a pull inspired supply strategy faced to demand uncertainties. Journal of Intelligent Manufacturing, 20(6). doi:10.1007/s10845-009-0337-z. Mastrolilli, M., & Gambardella, L. M. (2000). Effective neighbourhood functions for the flexible job shop problem. Journal of Scheduling, 3, 3–20. Mati, Y., Rezg, N., & Xie, X.L. (2001). Geometric approach and taboo search for scheduling flexible manufacturing systems. IEEE Transactions on Robotics and Automation, 17, 805–818.
Mati, Y., Lahlou, C., & Dauzère-Pérès, S.(2010). Modelling and solving a practical flexible job-shop scheduling problem with blocking constraints. International Journal of Production Research, 1366588X, First published on 23 Sep. 2010. McCulloch, W. S. (1945). A heterarchy of values determined by the topology of nervous nets. Bulletin of Mathematical Biology, 7, 89– 93. Miles, R. E., & Snow, C. C. (1992). Managing 21st century network organisations. Organizational dynamics, winter session. Mintzberg, H. (1980). Structure in 5’s: A synthesis of the research. Organization Design Management Science, 26(3). Monahan, J. P. (1984). A quantity pricing model to increase vendor profits. Management Science, 30(6), 720–726. Monostori, L., Csáji, B. Cs., & Kádár, B. (2004). Adaptation and learning in distributed production control. CIRP Annals-Manufacturing Technology, 53(1), 349–352. Mourtzis, D., Papakostas, N., Makris, S., Xanthakis, V., & Chryssolouris, G. (2008). Supply chain modeling and control for producing highly customized products. General assembly of CIRP No58, Manchester, UK (24/08/2008), Vol. 57, No 1, p. 588. Narasimhan, R., & Mahapatra, S. (2004). Decision models in global supply chain management. Industrial Marketing Management, 33, 21–27. Ouhimmou, M., D’Amours, S., Beauregard, R., Ait-Kadi, D., & Singh Chauhan, S. (2008). Furniture supply chain tactical planning optimization using a time decomposition approach. European Journal of Operational Research, 189(3), 952–970. Ounnar, F., & Pujo, P. (2009). Pull control for job shop: Holonic manufacturing system approach using multicriteria decision-making. Journal of Intelligent Manufacturing. doi:10.1007/ s10845-009-0288-4.
Pezzella, F., Morganti, G., & Ciaschetti, G. (2008). A genetic algorithm for the flexible job-shop scheduling problem. Computers & Operations Research, 35(10).
Prabhu, V. V. (2003). Stability and fault adaptation in distributed control of heterarchical manufacturing job shops. IEEE Transactions on Robotics and Automation, 19(1), 142–147.
Rummery, G., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University, Engineering Department.
Russell, S., & Norvig, P. (1995). Artificial intelligence: A modern approach (the intelligent agent book). Prentice Hall Series in Artificial Intelligence.
Sallez, Y., Berger, T., & Trentesaux, D. (2009). A stigmergic approach for dynamic routing of active products in FMS. Computers in Industry, 60(3), 204–216.
Sauer, J., Freese, T., & Teschke, T. (2000). Towards agent-based multi-site scheduling. In ECAI 2000 workshop on new results in planning, scheduling and design (pp. 123–130).
Silva, N., Sousa, P., & Ramos, C. (1998). A holonic manufacturing system implementation. Advanced Summer Institute (ASI'98), Bremen, Germany, 14–17 June 1998.
Swaminathan, J., Smith, S., & Sadeh-Koniecpol, N. (1997). Modeling supply chain dynamics: A multiagent approach. Decision Sciences.
Takadama, K., & Fujita, H. (2004). Lessons learned from comparison between Q-learning and Sarsa agents in bargaining game. In North American Association for Computational Social and Organizational Science (NAACSOS 2004), June 27–29, Pittsburgh, PA.
Tang, C. S. (1990). The impact of uncertainty on a production line. Management Science, 36(12), 1518–1531.
Tarantilis, C. D. (2008). Topics in real-time supply chain management. Computers & Operations Research, 35(11), 3393–3396.
Tehrani Nik Nejad, H., Sugimura, N., Iwamura, K., & Tanimizu, Y. (2008). Multi-agent architecture for dynamic incremental process planning in the flexible manufacturing system. Journal of Intelligent Manufacturing, 21(4), 487–499.
Thierry, C. (2003). Gestion de chaînes logistiques: Modèles et mise en œuvre pour l'aide à la décision à moyen terme [Supply chain management: Models and implementation for medium-term decision support]. HDR thesis, Toulouse 2 University.
Trentesaux, D., Pesin, P., & Tahon, C. (2000). Distributed artificial intelligence for FMS scheduling, control and design support. Journal of Intelligent Manufacturing, 11, 573–589.
Trentesaux, D. (2009). Distributed control of production systems. Engineering Applications of Artificial Intelligence, 22(7), 971–978.
Tsai, J.-F. (2007). An optimization approach for supply chain management models with quantity discount policy. European Journal of Operational Research, 177(2), 982–994.
Vaario, J., & Ueda, K. (1998). An emergent modelling method for dynamic scheduling. Journal of Intelligent Manufacturing, 9, 129–140.
Van Brussel, H., Wyns, J., Valckenaers, P., Bongaerts, L., & Peeters, L. (1998). Reference architecture for holonic manufacturing systems: PROSA. Computers in Industry, 37(3), 255–274.
Vilcot, G. (2007). Algorithmes approchés pour des problèmes d'ordonnancement multicritère de type job shop flexible et job shop multiressource [Approximate algorithms for multicriteria flexible job shop and multi-resource job shop scheduling problems]. PhD thesis, University of François-Rabelais, Tours, France.
Voss, S., & Woodruff, D. L. (2006). Introduction to computational optimization models for production planning in a supply chain. Berlin: Springer.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis, Cambridge University, Cambridge, England.
Weng, Z. K., & Wong, R. T. (1993). General models for the supplier's all-unit quantity discount policy. Naval Research Logistics, 40(6), 971–991.

Yang, P. C. (2004). Pricing strategy for deteriorating items using quantity discount when demand is price sensitive. European Journal of Operational Research, 157, 389–397.
Zambrano, G., Aissani, N., Pach, C., Berger, T., & Trentesaux, D. (2011). An approach for temporal myopia reduction in heterarchical control architectures. In Proceedings of the 20th IEEE international symposium on industrial electronics, 27–30 June 2011.
Zbib, N., Pach, C., Sallez, Y., & Trentesaux, D. (2010). Heterarchical production control in manufacturing systems using the potential fields concept. Journal of Intelligent Manufacturing. doi:10.1007/s10845-010-0467-3.
Zhang, H., & Gen, M. (2005). Multistage-based genetic algorithm for flexible job-shop scheduling problem. Journal of Complexity International, 11, 223–232.
Zobolas, G. I., Tarantilis, C. D., & Ioannou, G. (2008). Exact, heuristic and meta-heuristic algorithms for solving shop scheduling problems. In Studies in Computational Intelligence (SCI) (pp. 1–40). Berlin: Springer.