Switching Techniques, Adaptive Routing and Deadlock ... - IEEE

Parallel Architectures Group Grupo de Arquitecturas Paralelas (GAP)

Switching Techniques, Adaptive Routing and Deadlock Handling in Interconnection Networks
José Duato
Dept. de Ingeniería de Sistemas, Computadores y Automática
Universidad Politécnica de Valencia, Spain




Outline

- Introduction
- Switching techniques
- Optimized switching techniques
- Deadlock handling
- Theory of deadlock avoidance
- Design methodologies
- Application to deadlock recovery
- Application to networks of workstations
- Performance evaluation


Introduction (From W. J. Dally)

- The performance of most digital systems today is limited by their communication or interconnection, not by their logic or memory.
- Most of the power is used to drive wires and most of the clock cycle is spent on wire delay, not gate delay.
- As technology improves, pin density and wiring density are scaling at a slower rate than the components themselves. Also, the frequency of communication between components is lagging far behind the clock rates of modern processors.
- These factors combine to make interconnection the key factor in the success of future digital systems.


Introduction (From W. J. Dally)

- As designers strive to make more efficient use of scarce interconnection bandwidth, interconnection networks are emerging as a nearly universal solution to the system-level communication problems for modern digital systems.
- Originally developed for the demanding communication requirements of multicomputers, interconnection networks are beginning to replace buses as the standard system-level interconnection.
- Interconnection networks are also replacing dedicated wiring in special-purpose systems as designers discover that routing packets is both faster and more economical than routing wires.


Introduction

Interconnection networks are currently being used for many different applications, ranging from internal buses in VLSI circuits to wide area computer networks. These applications include:

- System area networks
- Telephone switches
- Internal networks for ATM switches
- Processor/memory interconnects for vector supercomputers
- Interconnects for multicomputers
- Interconnects for distributed shared-memory multiprocessors
- Clusters of workstations
- Computer networks: local area networks, metropolitan area networks, wide area networks


Introduction

- Parallel computers should be designed using commodity components to be cost-effective.
- Unfortunately, commodity communication subsystems have been designed to meet a different set of requirements, i.e., those arising in computer networks.
- Designing high performance interconnection networks becomes a critical issue to exploit the performance of parallel computers.
- Most manufacturers designed custom interconnection networks.
- Recently, several high performance switches have been developed to build inexpensive parallel computers by connecting cost-effective computers through those switches.


Main design parameters

- Topology: Defines how the nodes are interconnected by channels.
  Direct networks, switch-based networks
- Routing algorithm: Determines the path selected by a message to reach its destination.
  Deterministic routing, adaptive routing
- Switching technique: Determines how and when buffers are reserved and switches are configured.
  Packet switching, circuit switching, wormhole, virtual cut-through

Interconnection Networks

- Shared-Medium Networks
  - Local Area Networks
    - Contention Bus (Ethernet)
    - Token Bus (Arcnet)
    - Token Ring (FDDI Ring, IBM Token Ring)
  - Backplane Bus (Sun Gigaplane, DEC AlphaServer8X00, SGI PowerPath-2)
- Direct Networks (Router-Based Networks)
  - Strictly Orthogonal Topologies
    - Mesh
      - 2-D Mesh (Intel Paragon)
      - 3-D Mesh (MIT J-Machine)
    - Torus (k-ary n-cube)
      - 1-D Unidirectional Torus or Ring (KSR first-level ring)
      - 2-D Bidirectional Torus (Intel/CMU iWarp)
      - 3-D Bidirectional Torus (Cray T3D, Cray T3E)
    - Hypercube (Intel iPSC, nCUBE)
  - Other Topologies: Trees, Cube-Connected Cycles, de Bruijn, Star Graphs, etc.
- Indirect Networks (Switch-Based Networks)
  - Regular Topologies
    - Crossbar (Cray X/Y-MP, DEC GIGAswitch, Myrinet)
    - Multistage Interconnection Networks
      - Blocking Networks
        - Unidirectional MIN (NEC Cenju-3, IBM RP3)
        - Bidirectional MIN (IBM SP, TMC CM-5)
      - Nonblocking Networks: Clos Network
  - Irregular Topologies (DEC Autonet, Myrinet, ServerNet)
- Hybrid Networks
  - Multiple-Backplane Buses (Sun XDBus)
  - Hierarchical Networks (Bridged LANs, KSR)
  - Cluster-Based Networks (Stanford DASH, HP/Convex Exemplar)
  - Other Hypergraph Topologies: Hyperbuses, Hypermeshes, etc.


Direct networks

[Figure: (a) 2-ary 4-cube (hypercube); (b) 3-ary 2-cube; (c) 3-ary 3-D mesh]


Multistage interconnection networks

[Figure: a multistage butterfly network and an Omega network, each with 16 ports labeled 0000 to 1111]


Switch-based irregular topologies

[Figure: a switch-based network of eight switches (0 to 7) joined by bidirectional links, with processing elements attached to the switches, shown next to its graph representation]


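Generalized MIN model

[Figure: N ports on one side and M ports on the other, connected through connection patterns C0, C1, ..., Cg that alternate with switch stages G0, G1, ..., Gg−1]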


Unified View

- Some manufacturers developed switches that are suitable to implement either direct or indirect networks (Inmos C104, SGI SPIDER).
- We can view networks using point-to-point links as a set of interconnected switches, each one connected to zero, one, or more nodes:
  - Direct networks correspond to the case where every switch is connected to a single node.
  - Crossbar networks correspond to the case where there is a single switch connected to all the nodes.
  - Multistage interconnection networks correspond to the case where switches are arranged into several stages and the switches in intermediate stages are not connected to any processor.


Router organization

[Figure: input channels and output channels, each with a link controller (LC), the injection channel and the ejection channel, the switch, and the routing and arbitration unit]


Switching

- Switching: Determines how and when buffers are reserved and switches are configured.
- Flow control: Synchronization protocol for transmitting and receiving a unit of information.
- Unit of flow control: Portion of the message whose transfer must be synchronized.
- Flow control occurs at two levels: message flow control and physical channel flow control.


Packet switching and circuit switching

[Figure: time-space diagrams (channel versus time) for packet switching and for circuit switching]

Virtual cut-through and wormhole switching

[Figure: time-space diagram (channel versus time) of a message made of a header flit (H), data flits (D) and a tail flit (T)]


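Virtual channels

[Figure: a physical channel shared by several virtual channels: flit buffers fed from the switch, a channel multiplexor, the physical channel, a channel demultiplexor, flit buffers leading to the next switch, and a virtual channel controller]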


Performance of switching techniques

- Packet switching is well suited for very short messages.
- Circuit switching is well suited for very long messages.
- Virtual cut-through switching is well suited for messages of any length but requires splitting messages into fixed-size packets.
- Wormhole switching is well suited for messages of any length but saturates at moderate loads. Virtual channels alleviate this situation.
- Wormhole switching has been preferred for electronic routers because buffers can be small and the resulting circuits are compact and fast.
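As a rough illustration of why pipelined techniques cope with messages of any length, the sketch below compares first-order, contention-free latency estimates for store-and-forward packet switching and for cut-through style switching (virtual cut-through or wormhole). The formulas are common textbook approximations, not taken from these slides. The parameter names (`hops`, `flits`, `router_delay`) are hypothetical, and one flit is assumed to cross a link per cycle.

```python
def store_and_forward_latency(hops, flits, router_delay=1):
    """Assumed model: each hop receives the whole packet before forwarding it."""
    return hops * (router_delay + flits)


def cut_through_latency(hops, flits, router_delay=1):
    """Assumed model: the header is forwarded at once and the body is pipelined."""
    return hops * router_delay + flits


if __name__ == "__main__":
    # Compare both estimates for short and long messages over 8 hops.
    for flits in (1, 16, 1024):
        print(flits,
              store_and_forward_latency(8, flits),
              cut_through_latency(8, flits))
```

Under these assumptions the gap between the two estimates grows with both distance and message length, and stays small for very short messages.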


Optimized switching techniques

- Traffic from real applications may be bimodal and may vary over time.
- Wormhole switching can be used for short messages.
- Circuit switching can be used for very long messages.
- Path set-up can be overlapped with useful computation and/or circuits can be reused.
- Physical circuits do not need buffers at intermediate routers and can be made much faster than conventional links, either by using wave pipelining or optical technology.

Optimized router organization

[Figure: pipelined input and output channels, switches S0, S1, ..., Sk with synchronizers, multiplexers, control channels, a wormhole control unit, a PCS control unit, and the connection from/to the local processor]


Performance for multimedia applications

[Figure: three plots of average latency (cycles) versus traffic (flits/node/cycle) comparing 'CS 28+4', 'WSNR 16+16' and 'WH'. Workload: 10% short messages (16 flits), 90% long messages (1024 flits).]


Performance for multimedia applications

[Figure: three latency plots comparing 'WS 16+16', 'WH', 'WH 2 VC' and 'WH 3 VC'. Workload: 10% short messages (16 flits), 90% long messages (1024 flits). The first plot shows only long messages versus long-message traffic. The other two show only short messages versus short-message traffic, at two different traffic scales.]


Performance for multimedia applications

[Figure: average latency versus traffic at 256 Gbps link bandwidth for 32, 16, 8, 4, 2 and 1 slots. Workload: 10% short messages (16 flits), 90% long messages (1024 flits).]

[Figure: average latency versus traffic at 256 Gbps link bandwidth for 32, 16, 8, 4, 2 and 1 slots. Workload: 40% short messages (16 flits), 60% long messages (1024 flits).]


Routing Algorithms

- Number of destinations: unicast routing, multicast routing
- Routing decisions: centralized routing, source routing, distributed routing, multiphase routing
- Implementation: table lookup, finite-state machine
- Adaptivity: deterministic routing, adaptive routing
  - Progressiveness: progressive, backtracking
  - Minimality: profitable, misrouting
  - Number of paths: complete, partial


Situations that may prevent packet delivery

Undeliverable packets:
- Deadlock: prevention, avoidance, recovery
- Livelock: minimal paths, restricted nonminimal paths, probabilistic avoidance
- Starvation: resource assignment scheme


Deadlock handling

- Deadlock prevention: Backtracking
- Deadlock avoidance: Acyclic graph, acyclic subgraph
- Regressive deadlock recovery: Message removal, message abortion
- Progressive deadlock recovery: Disha

Main goal: Design of efficient deadlock-free fully adaptive routing algorithms


Deadlocked configuration

[Figure: four nodes N0 to N3 whose channel buffers are filled with flits labeled 0, 1, 2 and 3]

Messages wait for resources held by other messages in a cyclic way ⇒ Removing cyclic dependencies will avoid deadlock


Allowing cyclic dependencies

Example for the unidirectional ring:
- cAi channels can be used to forward messages to all the destinations.
- cHi channels can only be used if the destination is higher than the current node.

[Figure: unidirectional ring with nodes n0 to n3, channels cA0 to cA3 and channels cH0 to cH2]

- There exist cyclic dependencies between cAi channels.
- However, cHi channels have no cyclic dependencies.
- There is no deadlock because messages waiting for resources can always escape by using cHi channels.
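The two dependency claims for the ring example can be checked mechanically. The following is a minimal sketch, assuming a four-node unidirectional ring and the routing rule stated on the slide. The helper names (`allowed`, `dependencies`, `has_cycle`) are hypothetical, and the check covers only direct channel dependencies; it is not a full deadlock-freedom proof.

```python
N = 4  # ring size used in the slide example


def allowed(node, dest):
    """Channels the slide's rule permits at `node` for a message to `dest`."""
    if node == dest:
        return []
    channels = [("A", node)]          # cA_i: usable for every destination
    if dest > node:
        channels.append(("H", node))  # cH_i: only if destination is higher
    return channels


def dependencies():
    """Pairs (held, wanted): a message holding `held` may request `wanted`."""
    deps = set()
    for node in range(N):
        nxt = (node + 1) % N
        for dest in range(N):
            for held in allowed(node, dest):
                for wanted in allowed(nxt, dest):
                    deps.add((held, wanted))
    return deps


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(v):
        state[v] = 1
        for w in graph[v]:
            if state.get(w) == 1 or (w not in state and visit(w)):
                return True
        state[v] = 2
        return False

    return any(v not in state and visit(v) for v in graph)


deps = dependencies()
a_only = {(a, b) for a, b in deps if a[0] == "A" and b[0] == "A"}
h_only = {(a, b) for a, b in deps if a[0] == "H" and b[0] == "H"}
print("cA cycle:", has_cycle(a_only), "cH cycle:", has_cycle(h_only))
```

The cA channels form the cycle cA0, cA1, cA2, cA3, cA0, while the cH channels only form the chain cH0, cH1, cH2.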


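Theory of deadlock avoidance (informal)

[Figure: interconnection network]

Adaptive routing function and selection function

[Figure: the routing function takes the current node nc and the destination node nd and supplies a set of output channels; the selection function picks one of them]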


Routing subfunction

- Network channels can be split into two subsets: adaptive and escape channels.
- The routing function will be referred to as routing subfunction when restricted to escape channels.


Approach to avoid deadlock

An adaptive routing function may allow cyclic dependencies between channels as long as:
- There exists a subset of channels (escape channels) that have no cyclic dependencies between them.
- It is possible to establish a path from the current node to the destination node using only escape channels.
- For wormhole switching, when a message reserves an escape channel and then an adaptive channel, it must be able to select an escape channel at the current node, i.e., escape channels should have no cyclic dependencies indirectly through adaptive channels.


Deadlock produced by indirect dependencies

- A set of messages are cyclically waiting for channels occupied by other messages in the set.
- Some messages are able to use escape channels but reach another cycle.
- Messages using escape channels are cyclically waiting indirectly through adaptive channels ⇒ There is a deadlock.


Design methodology

- Based on the extension of other routing functions
- Allows the use of all the alternative minimal paths
- Does not increase the number of physical channels
- Provides a way to:
  - Extend the network topology and the routing function
  - Guarantee the absence of deadlocks


Design methodology

Steps:
1. Given an interconnection network I1, define a minimal-path, connected, deadlock-free routing function R1.
2. Split each physical channel into a set of additional virtual channels. The new routing function R can use any of the new channels belonging to a minimal path or, alternatively, the channels supplied by R1.
3. Verify that the extended channel dependency graph for R1 is acyclic. If it is, the routing algorithm is valid. Otherwise, it must be discarded. This step is not required for store-and-forward and virtual cut-through switching.


Design example

Routing algorithm for n-dimensional meshes
- Basic algorithm: dimension-order routing.
- Step 2: split each physical channel c_i into k virtual channels a_{i,1}, a_{i,2}, ..., a_{i,k-1}, b_i.
- New algorithm: route over any minimal path using any of the a channels. Alternatively, route over the lowest useful dimension using the corresponding b channel.

The MIT Reliable Router uses two virtual channels for fully adaptive minimal routing and two virtual channels for dimension-order routing in the absence of faults (on a 2-D mesh).
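The routing rule of this design example can be sketched for a 2-D mesh as follows. This is an illustrative sketch only: the function name `route`, the direction labels (`"X+"`, `"Y-"`, ...) and the coordinate representation are assumptions, and the selection among the returned candidates is left open.

```python
def route(current, dest, k=2):
    """Candidate outputs (direction, virtual channel) on a 2-D mesh.

    Assumes k virtual channels per physical channel: adaptive channels
    a1 .. a(k-1) plus one escape channel b, as in the design example.
    """
    (x, y), (dx, dy) = current, dest
    useful = []  # directions that bring the message closer to dest
    if dx != x:
        useful.append("X+" if dx > x else "X-")
    if dy != y:
        useful.append("Y+" if dy > y else "Y-")
    if not useful:
        return []  # already at the destination
    # Any 'a' channel on any minimal direction (fully adaptive part).
    adaptive = [(d, f"a{j}") for d in useful for j in range(1, k)]
    # The 'b' channel only on the lowest useful dimension (dimension order).
    escape = [(useful[0], "b")]
    return adaptive + escape


if __name__ == "__main__":
    print(route((0, 0), (2, 1)))
    print(route((2, 0), (2, 3)))
```

A selection function would typically try the adaptive candidates first and fall back to the escape candidate when they are busy.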


Example routing paths for 2-D meshes

[Figure: 4 x 4 mesh with nodes 0 to 15. Legend: source node, destination node, channels supplied by R]


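Extended channel dependency graph for R1

[Figure: extended channel dependency graph for a 3 x 3 mesh (nodes 0 to 8). Vertices are the channels b_ij from node i to an adjacent node j, for example b01, b10, b12, b21, b03, b30, b14, b25, b34, b43, b45, b54.]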
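The verification step can be sketched for a small mesh. The code below is an illustration under stated assumptions: a 3 x 3 mesh, dimension-order (X then Y) routing as the escape function, and adaptive channels available on every minimal hop. Nodes are identified by coordinates rather than by the labels 0 to 8 of the figure, an escape channel is represented by its (from, to) node pair, and all function names are hypothetical.

```python
from itertools import product

K = 3  # 3 x 3 mesh
NODES = list(product(range(K), range(K)))


def step(a, b):
    """Move coordinate a one position towards b."""
    return a + (1 if b > a else -1)


def escape_hop(node, dest):
    """Next node under dimension-order routing (the escape channel)."""
    (x, y), (dx, dy) = node, dest
    return (step(x, dx), y) if x != dx else (x, step(y, dy))


def minimal_hops(node, dest):
    """Neighbours of `node` lying on some minimal path to `dest`."""
    (x, y), (dx, dy) = node, dest
    hops = []
    if x != dx:
        hops.append((step(x, dx), y))
    if y != dy:
        hops.append((x, step(y, dy)))
    return hops


def extended_dependencies():
    """Direct and indirect dependencies between escape channels."""
    edges = set()
    for dest in NODES:
        for src in NODES:
            if src == dest:
                continue
            held = (src, escape_hop(src, dest))
            # Explore every node reachable from the head of `held`
            # through minimal (adaptive) hops towards dest.
            frontier, seen = [held[1]], {held[1]}
            while frontier:
                node = frontier.pop()
                if node == dest:
                    continue
                edges.add((held, (node, escape_hop(node, dest))))
                for nxt in minimal_hops(node, dest):
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(v):
        state[v] = 1
        for w in graph[v]:
            if state.get(w) == 1 or (w not in state and visit(w)):
                return True
        state[v] = 2
        return False

    return any(v not in state and visit(v) for v in graph)


print("extended graph has a cycle:", has_cycle(extended_dependencies()))
```

With these assumptions no cycle is found: dependencies only lead from X channels to X channels further along the same direction, from X channels to Y channels, or from Y channels to Y channels further along the same direction.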


Performance evaluation for the 2-D mesh

[Figure: plot versus normalized accepted traffic comparing Deterministic (1 vc), Deterministic (2 vc) and Adaptive (2 vc). Network size: 256 processors. Message length: 16 flits. Random traffic.]

Performance evaluation for the 3-D mesh

[Figure: plot comparing Deterministic (1 vc), Deterministic (2 vc) and Adaptive (2 vc)]

Performance evaluation for the 2-D torus

[Figure: average latency (cycles) for Deterministic (2 vc), Part-Adaptive (2 vc) and Adaptive (3 vc)]

Performance evaluation for the 3-D torus

[Figure: plot comparing Deterministic (2 vc), Part-Adaptive (2 vc), Part-Adaptive (3 vc) and Adaptive (3 vc)]


Performance evaluation for the 3-D torus (II)

[Figure: average latency (cycles) for Deterministic (2 vc), Part-Adaptive (2 vc) and Adaptive (3 vc). Network size: 512 processors. Message length: 16 flits. Local traffic.]

Performance evaluation for the 3-D torus (III)

[Figure: average latency (cycles). Network size: 512 processors. Message length: 16 flits. Bit-reversal traffic pattern.]


Accurate performance evaluation for the 3-D torus

[Figure: average latency (ns) versus traffic (flits/node/us) for Deterministic (2 vc), Part-Adaptive (2 vc) and Adaptive (3 vc)]

Accurate performance evaluation for the 3-D torus (II)

[Figure: plot comparing Deterministic (2 vc), Part-Adaptive (2 vc) and Adaptive (3 vc). Network size: 512 processors. Message length: 16 flits. Local traffic.]

Accurate performance evaluation for the 3-D torus (III)

[Figure: plot for the bit-reversal traffic pattern. Network size: 512 processors. Message length: 16 flits.]


Application to deadlock recovery

- Routing resources (channels or buffers) are split into two classes: adaptive and escape.
- Adaptive resources can be freely used by all the packets.
- When a packet is waiting for longer than a timeout, it moves to an escape resource.
- Once a packet uses an escape resource, it cannot use an adaptive resource again.
- This routing scheme eliminates all the indirect dependencies between adaptive and escape resources.
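The timeout rule can be sketched as a small per-packet state machine. This is a minimal sketch under assumptions: the `Packet` fields, the function name `next_resource_class` and the use of an integer cycle counter are all hypothetical, and how the escape resource itself is allocated is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    blocked_since: Optional[int] = None  # cycle at which the packet got blocked
    on_escape: bool = False              # True once an escape resource was used


def next_resource_class(pkt, adaptive_free, now, timeout):
    """Return 'adaptive', 'escape' or 'wait' for the packet's next hop."""
    if pkt.on_escape:
        return "escape"  # never go back to adaptive resources
    if adaptive_free:
        pkt.blocked_since = None
        return "adaptive"
    if pkt.blocked_since is None:
        pkt.blocked_since = now
    if now - pkt.blocked_since > timeout:
        pkt.on_escape = True  # presumed deadlocked: move to escape resources
        return "escape"
    return "wait"


if __name__ == "__main__":
    p = Packet()
    for cycle, free in [(0, True), (1, False), (9, False), (10, False), (11, True)]:
        print(cycle, next_resource_class(p, free, cycle, timeout=8))
```

Keeping packets on escape resources once they have switched is what removes the indirect dependencies mentioned above.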


Router organization for Disha

[Figure: input channels and output channels with link controllers (LC), the injection and ejection channels, the switch, the routing and arbitration unit, and a deadlock buffer]


Routing on edge and deadlock buffers

[Figure: two 4 x 4 meshes with nodes labeled row by row as 0 1 2 3 / 7 6 5 4 / 8 9 10 11 / 15 14 13 12]

- Deadlock buffers can only be used in increasing label order.
- When a deadlock is detected, the packet header can be routed to the deadlock buffer.
- Edge buffers allow fully adaptive minimal routing.
- Escape channels are defined so that the routing subfunction is able to deliver messages for any destination (including deadlock buffers).


Extended channel dependency graph for edge buffers

[Figure: nodes n0 to n8 with the channels c10, c21, c32, c41, c50, c65, c74 and c83]


Performance evaluation

[Figure: average latency (cycles) versus traffic for Avoidance-Det (2 VC), Recovery-Det (2 VC), Avoidance-Adap (3 VC) and Recovery-Adap (3 VC); the curves are annotated with the values .27, .49, .56 and .77]


Injection limitation

- Prevents performance degradation at saturation
- Reduces the frequency of deadlock occurrence to negligible values

[Figure: RESERVE and RELEASE signals update a busy output channels counter; a comparator checks the counter against a threshold and produces the INJECTION PERMITTED signal]
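The counter-and-comparator mechanism can be sketched in a few lines. This is an illustrative sketch: the class and method names are hypothetical, and the slide does not state whether the comparison against the threshold is strict, so a strict comparison is assumed here.

```python
class InjectionLimiter:
    """Tracks busy output channels and gates message injection."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.busy = 0  # busy output channels counter

    def reserve(self):
        """An output channel has been reserved by some message."""
        self.busy += 1

    def release(self):
        """An output channel has been released."""
        if self.busy == 0:
            raise ValueError("release without matching reserve")
        self.busy -= 1

    def injection_permitted(self):
        # Assumption: inject only while the count is below the threshold.
        return self.busy < self.threshold


if __name__ == "__main__":
    limiter = InjectionLimiter(threshold=2)
    limiter.reserve()
    limiter.reserve()
    print(limiter.injection_permitted())
    limiter.release()
    print(limiter.injection_permitted())
```

The threshold trades injection bandwidth at low load against congestion near saturation, so it would normally be tuned per network.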


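Improved injection limitation mechanism

[Figure: RESERVE and RELEASE signals from the virtual channels V0, V1, ..., Vn-1 of the physical channels are combined with a bitwise OR and feed a busy output channels counter; a translation table indexed by the message number and a comparator produce the INJECTION PERMITTED signal]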


Improved deadlock detection mechanism

[Figure: a switch with input channels and output channels numbered 0 to 3, each monitored by a counter that is compared against a threshold]


Application to networks of workstations

- Networks of workstations are emerging as a cost-effective alternative to parallel computers.
- Switch-based interconnects like Autonet, Myrinet and ServerNet have been proposed to build networks of workstations with irregular topology.
- The irregularity provides:
  - Wiring flexibility.
  - Scalability.
  - Incremental expansion capability.


Drawback: The irregularity makes deadlock avoidance and routing quite complicated.

Simplest solution: Avoid deadlock by eliminating all the cyclic dependencies between channels ⇒ Many messages are routed following non-minimal paths.
→ Higher message latency
→ Waste of resources
→ Lower throughput

Alternative solution: Allow cyclic dependencies between channels
→ Reduces contention by increasing routing adaptivity
→ Allows more messages to follow minimal paths


Switch-based networks with irregular topologies

[Figure: a switch-based network of eight switches (0 to 7) joined by bidirectional links, with processing elements attached to the switches, shown next to its graph representation]


The Autonet routing algorithm

General characteristics:
- Deadlock-free routing scheme (up/down routing).
- Provides partially adaptive communication between nodes.
- Distributed.
- Implemented using table lookup.


The up/down routing algorithm

[Figure: irregular network of eight switches (0 to 7) with the "up" direction of the links marked]

- Routing is based on an assignment of direction to the operational links.
- Routing rule: a legal route must traverse zero or more links in the "up" direction followed by zero or more links in the "down" direction.
- Each cycle has at least one link in the "up" direction and one link in the "down" direction.
- Cyclic dependencies are avoided: messages cannot cross a link in the "up" direction after one in the "down" direction.
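The routing rule can be checked with a short sketch. The slides do not say how directions are assigned, so the code assumes the commonly described scheme: a breadth-first spanning tree from a root switch, where the "up" end of a link is the end closer to the root, with ties broken by the lower switch identifier. The example topology and the names `bfs_levels`, `is_up` and `legal_route` are hypothetical.

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every switch from the root of the spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    """True if traversing the link from u to v goes in the 'up' direction."""
    return (level[v], v) < (level[u], u)


def legal_route(path, level):
    """A legal route never takes an 'up' link after a 'down' link."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(u, v, level):
            if gone_down:
                return False
        else:
            gone_down = True
    return True


if __name__ == "__main__":
    # Hypothetical four-switch network forming one cycle: 0-1, 0-2, 1-3, 2-3.
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    level = bfs_levels(adj, root=0)
    print(legal_route([3, 1, 0, 2], level))  # up, up, down
    print(legal_route([1, 3, 2], level))     # down, then up
```

In this small example the minimal route from switch 1 to switch 2 through switch 3 is rejected, so the message has to go through the root instead.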


Routing efficiency

[Figure: eight-switch network (switches 0 to 7) with the "up" direction marked. From 7 to 0: OK. From 2 to 5: lack of adaptivity. From 4 to 1: non-minimal routing.]

- The basic routing rule prevents minimal routing and adaptivity in most cases because of "down" to "up" conflicts.
- The probability of non-minimal routing increases with network size.


A design methodology for adaptive routing algorithms

interconnection network + deadlock-free routing function
⇒ (new methodology)
physical channels duplicated or split into two virtual channels (original and new) + extended routing function


Extended routing function

- Newly injected messages can use the new channels without any restriction. For performance reasons, only minimal paths are allowed.
- Original channels are used exactly in the same way as in the original routing function.
- Once a message reserves one of the original channels, it cannot use any of the new channels again.
- When the routing table provides both kinds of channels, give preference to new channels.

The extended routing function is deadlock-free.


Improving the efficiency of the methodology

Idea:
- Focus on minimal routing, even if adaptivity is reduced
- Restrict the transition from new channels to original channels

Improved adaptive routing function:
- Newly injected messages can only use new channels
- At intermediate switches, a higher priority is assigned to the new channels belonging to minimal paths
- If all the new channels are busy, then an original channel belonging to a minimal path (if any) is selected
- If none exists, then the one that provides the shortest path is used (this ensures deadlock-freedom)
- Once a message reserves an original channel, it can no longer reserve a new one
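The priority order of the improved routing function can be sketched as a selection routine. This is only one possible reading of the rules above: the function name `choose_output`, its parameters and the channel labels are hypothetical, and the behaviour when a newly injected message finds every new channel busy (here: wait) is an assumption.

```python
def choose_output(newly_injected, used_original, free_new_minimal,
                  original_minimal, original_shortest):
    """Return (kind, channel), or None when the message has to wait.

    free_new_minimal:   free new channels on minimal paths
    original_minimal:   original channels on minimal paths (may be empty)
    original_shortest:  original channel giving the shortest legal path
    """
    # New channels on minimal paths have the highest priority, unless the
    # message has already reserved an original channel.
    if not used_original and free_new_minimal:
        return ("new", free_new_minimal[0])
    # Newly injected messages may only use new channels (assumed: wait).
    if newly_injected:
        return None
    # Otherwise fall back to an original channel on a minimal path ...
    if original_minimal:
        return ("original", original_minimal[0])
    # ... or to the original channel that provides the shortest path.
    return ("original", original_shortest)


if __name__ == "__main__":
    print(choose_output(False, False, [], ["o1"], "o9"))
    print(choose_output(False, True, ["n1"], [], "o9"))
```

Once the routine returns an original channel, the caller would set `used_original` for that message so that later hops never return a new channel.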


Performance evaluation

Evaluation of four routing schemes:
- Basic up/down routing scheme (UD).
- Up/down routing scheme using two virtual channels per physical channel (UD-2VC).
- Adaptive routing scheme using two virtual channels per physical channel (A-2VC).
- Improved adaptive routing scheme using two virtual channels per physical channel (MA-2VC).

Performance evaluation carried out by simulation.


Network model:
- Topology generated randomly (8-port switches)
- 4 nodes (processors) connected to each switch
- Two adjacent switches are connected by a single link
- One routing control unit per switch (assigned in a round-robin fashion)
- Message destination is randomly chosen among nodes
- It takes one clock cycle to compute the routing algorithm, to transfer one flit from an input buffer to an output buffer, or to transfer one flit across a physical channel


Simulation results (I)

[Figure: average latency (cycles) versus traffic (flits/cycle/node) for UD, UD-2VC, A-2VC and MA-2VC. Network size: 16 switches. Message length: 16 flits.]

Simulation results (II)

[Figure: average latency (cycles) versus traffic (flits/cycle/node) for UD, UD-2VC, A-2VC and MA-2VC]

Simulation results (III)

[Figure: average latency (cycles) for UD, UD-2VC, A-2VC and MA-2VC]

Simulation results (IV)

[Figure: average latency (cycles)]

Simulation results (V)

[Figure: average latency (cycles)]

Simulation results for application traces

[Figure: number of messages over time. Traces from Barnes-Hut executed on 64 processors.]

Simulation results for application traces

[Figure: latency (cycles) versus time (cycles) for MA-2VC, UD-2VC and UD]


Zoom of the first peak

[Figure: latency (cycles) versus time (cycles), from 1.8e+07 to 2e+07 cycles, for MA-2VC, UD-2VC and UD]

Zoom of the second peak

[Figure: latency (cycles) versus time (cycles), from 3.7e+07 to 3.9e+07 cycles, for MA-2VC, UD-2VC and UD]

Zoom of the third peak

[Figure: latency (cycles) versus time (cycles), from 5.2e+07 to 5.4e+07 cycles, for MA-2VC, UD-2VC and UD]


Final Remarks

- Hybrid switching techniques may considerably increase performance by using the appropriate switching technique for each message class.
- Circuit switching can take advantage of wave pipelining and optical technology to increase link bandwidth.
- Flexible deadlock avoidance and recovery schemes allow the design of more efficient routing algorithms.
- These routing algorithms have been implemented in the MIT Reliable Router and the Cray T3E.
- Adaptive routing and virtual channels are especially interesting when applications produce bursty traffic that saturates the network during some time intervals (usually prior to synchronization points).
- Adaptive routing and virtual channels must be implemented efficiently to minimize the increment in clock cycle time.

