Massively Parallel Processing Using Optical Interconnections

C. Salisbury and R. Melhem
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260

This chapter discusses why and how optical communication technology might be used in future large scale parallel processing systems. Commercial systems currently being built can interconnect thousands of processors, and future systems may be even larger. Systems with thousands of processors are often called massively parallel systems. In these systems, the ability to efficiently communicate and share data between processors is critical to obtaining high performance. While optical technologies related to computation have been under investigation for many years, the technologies closest to practical implementation are those that can be applied to transmitting and processing optical signals. The use of optical technology to provide communication links between a large number of electronic processors holds promise for constructing very high performance parallel computer systems in the future.

1 Introduction

Small scale parallel systems of two or more processors have been used commercially since the 1970s. Some of these systems were used to maintain large databases while the user waited for the next advance in technology to provide more power in a single processor. Other users were working at the forefront of engineering and scientific disciplines, where difficult problems could be solved only with large amounts of computational power.

At the other end of the computer power spectrum, micro-computers developed in the early 1980s found a large market and were put to use in a wide variety of applications. This fueled the development of successively more powerful, yet low cost microprocessors. The computational power available on a desktop today often exceeds the power once available only on "mainframe" computers.

As challenging engineering and scientific problems have grown larger and the cost of high-end processors has increased, researchers have investigated ways to provide powerful, low cost processing systems by coupling a large number of microprocessors and putting them to work in parallel. To do this effectively requires significant changes from the familiar serial approach to computing. These changes affect virtually all areas of computing, including computer architecture, network architecture, operating systems, language compilers, and computational algorithms.

To understand the changes that are required, a large number of parallel systems have been designed and built by both research and commercial organizations over the last two decades. These systems have used different models of parallel operation, often described as single-instruction, multiple-data (SIMD) and multiple-instruction, multiple-data (MIMD). A review of parallel systems issues and a description of many of the machines that were built is provided in [27]. Some of the systems described in this review were intended to interconnect over one hundred processors.

Examples include the NYU Ultracomputer, the Intel Paragon, and the IBM GF11. The Thinking Machines CM-2 system was built from very simple microprocessors and designed to interconnect over 64,000 of them. Other massively parallel systems that have been marketed commercially include the KSR-1 from Kendall Square Research and the TC2000 from BBN Systems. Systems currently being marketed include the Fujitsu VPP700[19], the Hitachi SR2201[26], the NEC SX4[14], the IBM SP-2[13], and the Cray T3E[15]. The latter two are built from microprocessors, and up to 2048 microprocessors can be combined in a single system.

The focus of this chapter is on the use of optical technology to interconnect a very large number of processors. While networks such as the Internet can connect an extremely large number of processors and may use optical links, this type of processor interconnection represents distributed processing rather than parallel processing. Parallel processing requires extremely short communication delays so that a processor can use the network with minimal impact on its computation speed. In many parallel architectures, processors use the network to access parts of memory. As a result, specialized networking hardware and techniques are required rather than a general purpose network such as the Internet. Despite the use of specialized networks, communication speed in parallel systems is still much slower than computation speed. Parallel algorithms are therefore designed to minimize the amount of communication required.

In the rest of this chapter, we will discuss optical devices, architectures, and control techniques that can be used to build high speed interconnection networks to improve overall system performance. In section 2, we will describe the potential advantages of optical interconnections over electrical interconnections. A number of relevant optical technologies and devices are discussed briefly in section 3. In section 4, we review some proposed optical architectures and discuss the techniques used to control them. Some final observations are given in section 5.

2 Why Optics?

The characteristics of electronic signals and optical signals have been compared for purposes ranging from connecting continents via telecommunication links to connecting gates on a VLSI circuit[3]. Some studies are theoretical in nature in order to understand the fundamental physical limitations[44]. While optics have been used for telecommunication systems for many years, studies such as [39, 42, 50] have concluded that optical interconnections can be cost effective over distances as short as those between circuit boards. The advantage of optical connections increases as distance and data rate increase. Thus, optical interconnection networks can often provide better performance than electronic networks. Two key measures used to describe network performance are latency and bandwidth.

• High bandwidth. The bandwidth of a communication link is the rate at which data can be sent through the link. For optical networks, the rate is independent of the distance the signal is to be transmitted. For electronic networks, however, the attenuation of a signal is related to its frequency. With a fixed amount of power, increasing the data rate of a signal reduces the distance over which it can be transmitted. Thus, optical links have an advantage when higher data rates (e.g., above 200 Mb/s) or longer distances (e.g., above one meter) are required[42]. To understand how large the potential optical bandwidth is, consider a hypothetical processor that operates at 500 MHz and produces a 64 bit result every cycle. To transmit these results over a network requires a bandwidth of 32 Gb/s. Current electronic busses may transmit a maximum of 1 to 10 Gb/s using parallel transmission lines each operating at a rate of hundreds of Mb/s. Optical systems in research labs have been shown to operate at burst speeds of up to 250 Gb/s over a single optical link[46]. While practical systems operating at such speeds are years away, optics have the potential to increase communication bandwidth by orders of magnitude.

• Low latency. Latency refers to the end-to-end delay in communication. Since optical signals can readily be converted to and from electronic form, low latency designs for electronic networks can be adapted for optics. Latency may be reduced further if the network can provide all-optical connections.

Optical interconnections also have advantages for system construction and packaging.

• Reduced electrical power and cooling requirements. In high density VLSI chips, electrical power consumption and the resultant need for cooling are important issues. It has been estimated that in a massively parallel system such as the CM-5, 60% to 80% of the power is consumed by the interconnection network[3]. Optical connections require less power than electrical connections as the data rate and distance of the connection increase.

• Increased fan-out. Fan-out refers to sending a signal from a single transmitter to multiple receivers, such as when a signal is placed on a bus. The fan-out of an electronic signal is limited by the need for termination devices at each receiver to prevent unwanted signal reflections. Optical signals do not have this problem and can support a much larger degree of fan-out.

• Immunity to electromagnetic interference. The conductors in electronic circuits act as antennas for electromagnetic radiation. This is a source of noise on an electrical connection. Photons do not interact with each other or with other radiation, thus extending the range of operating environments for optical systems.

• Increased connection density. Electronic signals interfere with each other when placed close together, thus limiting the density of interconnections. The amount of interference is related to the frequency of the signal, so that circuit density decreases as bandwidth increases. Light beams do not interact in this way. This increases both the flexibility of circuit design and the possible interconnection density.

Although there are advantages to optical interconnections, it is not clear when they will become common in parallel processing systems. The introduction of optical technology depends on the mass production of optical devices with suitable function, cost, reliability, performance, and packaging characteristics, and on the general acceptance of massively parallel systems.
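The bandwidth arithmetic in the "High bandwidth" item above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `required_bandwidth_bps` and the constant names are hypothetical, and the numbers are the ones quoted in the text (500 MHz, 64 bits per cycle, 1 to 10 Gb/s for electronic busses, 250 Gb/s for the laboratory optical link).

```python
def required_bandwidth_bps(clock_hz: float, bits_per_cycle: int) -> float:
    """Bits per second needed to transmit one result every clock cycle."""
    return clock_hz * bits_per_cycle


# Figures quoted in the text (hypothetical constant names).
ELECTRONIC_BUS_BPS = (1e9, 10e9)   # range given for current electronic busses
OPTICAL_BURST_BPS = 250e9          # laboratory burst rate over one optical link

demand = required_bandwidth_bps(500e6, 64)

print(f"processor demand : {demand / 1e9:.1f} Gb/s")
print(f"fastest bus      : {ELECTRONIC_BUS_BPS[1] / 1e9:.1f} Gb/s "
      f"({demand / ELECTRONIC_BUS_BPS[1]:.1f}x too slow)")
print(f"optical burst    : {OPTICAL_BURST_BPS / 1e9:.1f} Gb/s "
      f"({OPTICAL_BURST_BPS / demand:.2f}x the demand)")
```

Under these figures the hypothetical processor outruns even the fastest quoted electronic bus, while a single optical link at the laboratory burst rate would leave headroom.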

3 Optical Technologies in Computer Interconnection Networks.

3.1 Overview

In this section, we review basic optical technologies and their application to computer interconnection networks. The intent is to provide a high level overview of the operation of optical devices and to describe the types of technological advances that are necessary to make optical interconnection networks a reality.

Optical devices have a wide range of applications including telecommunications, image processing, pattern recognition, neural networks, printers, CD players, data storage, remote sensing, and defense applications such as radar, sonar, and electronic warfare. This wide range of applications has fostered research into devices with a correspondingly wide range of functions. In some cases, devices that perform similar functions have characteristics that make them most suitable for different applications. The result is an enormous variety of devices and technologies from which to choose the components of an optical network.

Initial developments in optical technologies were used by the telecommunications industry. Telecommunications networks require the propagation of a large number of low bandwidth signals over long distances with bit error rates of about $10^{-9}$. High-speed trunk connections are essentially point-to-point and equipment must often function reliably in extreme physical environments. Optical fiber characteristics are a significant concern in network design. For practical implementations, research has been directed toward miniaturizing devices, increasing their reliability, and developing a means to manufacture large quantities at low cost.

The resulting technology is, however, not sufficient to meet the needs of shorter distance communications. Applications such as cable TV, local area networking, and multimedia require increased signal processing capabilities from optoelectronic devices. Parallel processor interconnection networks are extremely sensitive to network latency and require high performance as well. For example, an interconnection network requires the transmission of accurately synchronized, high bandwidth signals over short distances. It may require frequent switching or large fan-out and requires error rates of $10^{-12}$ or less. The integration of more complex optical circuitry into semiconductor devices has fostered the development of new semiconductor materials and manufacturing techniques.

In the rest of this section, we list several types of optical devices and the underlying physical principles that are used to generate, convey, manipulate, and detect optical signals. A more complete overview is available from many sources such as [18, 28, 45], while detailed discussions on devices are available from [4, 20, 23, 49, 52]. The physical principles are covered in, for example, [21, 32].

3.2 Physical basis of optical technologies

An optical signal is generated in a semi-conductor when electrons and holes recombine across the junction between differently doped materials and emit photons. The same principle is behind both lasers and light emitting diodes. However, lasers use higher electrical currents so that stimulated emissions dominate spontaneous emissions, resulting in a coherent beam of light with a very narrow spectrum. The wavelengths emitted are a complex function of the physical characteristics of the device.

Photodetectors receive optical signals by essentially the reverse process. Light energy is converted to electrical energy by the absorption of photons and the creation of electron-hole pairs in the semi-conductor. These pairs move through the semi-conductor material to create an electrical current.

Optical waveguides such as optical fibers are able to conduct light using the principle of total internal reflection. This occurs when a thin, transparent material is surrounded by materials with a lower index of refraction. Light traveling along the waveguide which strikes the interface with the surrounding material is reflected back into the waveguide. The physical characteristics of the waveguide affect the distance the signal can be carried.

The need to increase device capabilities for routing and filtering optical signals has fostered the development of new semi-conductor materials and the advancement of new technologies that provide electronic control over optical signals. In some crystals, the speed at which light travels depends on the polarization of the light and on the direction it is traveling relative to the crystal structure. These crystals are called birefringent, and light passing through them will emerge as two images, each with different polarization. This birefringence property may be altered by the presence of an externally applied electric field. This change is known as the electro-optic effect. Electro-optic crystals include quartz (SiO2), lithium niobate (LiNbO3), and gallium arsenide (GaAs).

The acousto-optic effect is similar to the electro-optic effect in that it changes the refractive index of a crystal. In this case, the changes are due to the mechanical strain produced by an acoustic wave. An acoustic transducer is used to inject an acoustic wave into the crystal, creating a periodic variation in the crystal's refractive index. This variation can be used to diffract an input optical signal or to change its polarization, based on its wavelength. Materials widely used in acousto-optic devices include lithium niobate (LiNbO3), tellurium dioxide (TeO2), and fused quartz (SiO2).

These physical effects have been exploited to produce a wide variety of devices. These devices can be placed into two categories based on their means of conveying optical signals. One category of devices is most suited for use with optical fibers, while the other is appropriate for handling optical signals conveyed directly through free space.
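The total internal reflection condition mentioned above for waveguides can be made concrete with Snell's law: light striking the core/cladding interface is totally reflected when its angle of incidence, measured from the normal to the interface, exceeds the critical angle arcsin(n_cladding / n_core). The sketch below is illustrative only. The function names are hypothetical, and the refractive indices (1.48 for the core, 1.46 for the surrounding material) are assumed values chosen for the example, not figures from this chapter.

```python
import math


def critical_angle_deg(n_core: float, n_cladding: float) -> float:
    """Critical angle (degrees from the interface normal) for total internal reflection."""
    if n_cladding >= n_core:
        # No total internal reflection unless the surrounding index is lower.
        raise ValueError("n_cladding must be lower than n_core")
    return math.degrees(math.asin(n_cladding / n_core))


def is_guided(incidence_deg: float, n_core: float, n_cladding: float) -> bool:
    """True if a ray at this incidence angle is reflected back into the waveguide."""
    return incidence_deg > critical_angle_deg(n_core, n_cladding)


# Assumed example indices (not taken from the text).
N_CORE, N_CLADDING = 1.48, 1.46

print(f"critical angle: {critical_angle_deg(N_CORE, N_CLADDING):.2f} degrees")
print("ray at 85 degrees guided:", is_guided(85.0, N_CORE, N_CLADDING))
print("ray at 70 degrees guided:", is_guided(70.0, N_CORE, N_CLADDING))
```

The closer the two indices are, the closer the critical angle is to 90 degrees, so only rays travelling nearly parallel to the waveguide axis remain trapped.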

3.3 Fiber-based systems

Fiber-optic interconnection networks use optical fibers as "wires" between light sources, switching and routing devices, and detectors. The operation of a device may either determine or depend on the wavelength of the optical signal, making wavelength sensitivity an important device characteristic.

• Sources.

  - Light emitting diodes emit light with a broad range of wavelengths. An electronic signal can be used to modulate the intensity of the emitted light. As with electronic data transmission, digital data is transmitted in an optical signal through the use of an encoding technique (e.g., NRZ or pulse bipolar).

  - Lasers emit a narrow spectrum of coherent light. For low-loss transmission over glass optical fibers, the wavelengths most often used are in the infra-red regions around 850, 1300, or 1550 nm. For the short distances involved in interconnection networks, the choice of wavelength is probably not critical. A common semi-conductor laser design is the Fabry-Perot laser. Mirrored surfaces placed at both ends of an active region of the semiconductor are used to obtain the lasing effect. The width of the emitted spectrum can be reduced by distributed feedback (DFB) and distributed Bragg reflector (DBR) designs. Rather than using mirrors, these designs use a grating inside the laser cavity in order to select the wavelength of the light that propagates within the active region.

  - Tunable lasers based on the DBR and DFB designs can be made by electronically altering the index of refraction of the material within a section of the laser cavity. The current injected into this section determines the density of carriers, which in turn affects the material's index of refraction and the output wavelength. Multiple quantum well (MQW) designs have even narrower line widths and a broader tuning range. Alternatively, wavelength control may be obtained by selecting a desired laser from an array of lasers that emit at different, fixed wavelengths. It may also be possible to select the desired wavelength from a source with a wide spectral bandwidth by using a narrow bandwidth tunable filter.

• Propagation.

  - Optical fibers have characteristics such as dispersion and attenuation that affect the intensity and shape of the propagating signal. For long distance transmissions, single-mode glass fibers with a core diameter of two to ten microns and an overall diameter of 125 microns may be used. Amplification and reshaping of the optical pulses are required periodically. If amplification alone is needed, erbium-doped fiber amplifiers may be used to amplify a wide range of wavelengths simultaneously[49]. These are not primary concerns in an interconnection network, however, where the distances are short enough that it may be possible to transmit satisfactory signals over low cost optical fibers made from polymers[34].

• Routing.

  - Optical signals can be routed under electrical control using the electro-optic and acousto-optic effects described earlier. Devices that use these effects can often be designed as either switches or filters. Two key characteristics of a device are the range of wavelengths over which the device operates and the speed of operation. To understand how the electro-optic effect can be used, consider the optical directional coupler in Figure 1. In this device, electrodes are deposited over two waveguides which have been routed closely together in a semi-conductor. An optical signal is input at one end of one waveguide. A voltage is applied across the electrodes, creating an electric field that affects the index of refraction in the two waveguides in opposite ways. In effect, this field changes the amount of internal reflection experienced by the input signal, allowing light to leak out of one waveguide and become trapped in the other. By carefully choosing the device geometry and the applied voltage, the portion of the signal appearing at each of the output ports can be controlled. Thus, this device can be operated as either a modulator or a switch.

    [Figure 1: An optical directional coupler. Labels: signal in, V applied, switched signal out.]

  - Optical filters can be used to separate signals based on wavelength or polarization[29]. A Fabry-Perot filter consists of two highly reflective mirrors in parallel, forming a resonant cavity. At certain wavelengths determined by geometry, light passes through the cavity. The other wavelengths are trapped inside the cavity by the reflective surfaces. An electronically tunable optical filter can be built using the acousto-optic effect. The basic geometry of the filter is shown in Figure 2[8]. Polarized incoming light passes through a crystal of, for example, lithium niobate. A transducer places an acoustic signal into the crystal. The interaction between the acoustic and optic signals in the crystal results in a change in the polarization of light at a selected wavelength. A polarization filter is then used to remove the unwanted signals from the output light beam. By driving the transducer with several acoustic signals, this device can be used to select several corresponding optical signals. Its tuning time is in the microsecond range. Electro-optical filters can be built with tuning times in the nanosecond range, but over a much narrower range of wavelengths.

    [Figure 2: An acousto-optic filter. Labels: input signals, polarizer, semi-conductor crystal, acoustic wave alters polarization of selected wavelength, polarizer/analyzer, output signal on selected wavelength.]

  - A passive star coupler is essentially a glass hub with optical fibers from transmitters entering at one end and optical fibers to receivers leaving at the other end. The light signal from any transmitter is diffused by the coupler so that it is distributed equally to all of the fibers leading to the receivers. The coupler is called passive because it uses no electrical power and performs no logical or routing functions. It simply diffuses and transmits light. Signal routing is accomplished by using wavelength sensitive devices and/or suitable network control techniques. When the same set of nodes is connected to the input and output fibers as shown in Figure 3, the passive star has the logical appearance of a bus.

    [Figure 3: Nodes connected to a passive star coupler. Labels: network node, star coupler.]

• Detection.

  - Photodiodes and phototransistors may be used to detect an optical signal. Direct detection is used when the signal modulates the intensity of the light. Coherent detection is used when the signal has been modulated onto a continuous wave. The detected signal must be amplified, reshaped, and decoded by receiver circuitry to convert it to the form used by the electronics.

  - Tunable detectors. An alternative to using a tunable transmitter is to use a tunable receiver. This effect can be obtained by placing a tunable filter in front of the detector.
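Because the passive star coupler described in the Routing item distributes each transmitter's light equally over all output fibers, every receiver sees only a fraction of the transmitted power. The following sketch estimates that fraction. It assumes an ideal, lossless coupler (real couplers add excess loss, which is ignored here), and the function names are hypothetical.

```python
import math


def per_receiver_power_mw(p_in_mw: float, n_outputs: int) -> float:
    """Power reaching each output fiber of an ideal passive star (equal split)."""
    return p_in_mw / n_outputs


def splitting_loss_db(n_outputs: int) -> float:
    """Power reduction per output, in decibels, for an ideal equal split."""
    return 10.0 * math.log10(n_outputs)


# Assumed example: a 1 mW transmitter feeding stars of increasing size.
for n in (4, 16, 64, 256):
    print(f"{n:4d} outputs: {per_receiver_power_mw(1.0, n):.5f} mW each, "
          f"{splitting_loss_db(n):.1f} dB splitting loss")
```

The logarithmic growth of the splitting loss is one reason the number of nodes that can share a single passive star is bounded by transmitter power and receiver sensitivity.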

3.4 Free space systems

One of the advantages of optical interconnections is that optical signals can propagate close to each other, and even pass through each other without interacting. To exploit the high connection density possible with optical signals, two-dimensional arrays of light sources and detectors are used. This, in effect, creates a single, two-dimensional optical beam. The information content of the beam can be independently controlled at each location within its cross-section by using spatial light modulators. To direct the light in free space, lenses, mirrors, and beam-splitters are used. A simple free space system is depicted in Figure 4.

[Figure 4: Simple free space interconnections. Labels: array of sources, array of optical switches, array of detectors.]

• Sources.

  - Two-dimensional arrays of lasers can be built using vertical cavity surface emitting laser (VCSEL) technology and gallium arsenide (GaAs) substrates[59]. Previous semiconductor lasers emitted light from the edge of the substrate rather than the surface, and hence could not be built in two-dimensional arrays.

  - It is also possible to produce a two-dimensional array of signals using optical modulators. In such systems, the signal from a single laser source is imaged into an array of spots which illuminate the modulators in the array. The modulators may be self-electro-optic effect devices (SEEDs), which use an electronic signal to control the reflectivity of the surface of the device. Reflection of the "read-out" beam by the SEED array produces a modulated two-dimensional optical beam of signals.

• Propagation.

  - Lenses are used to focus and collimate beams of light between sources and detectors. There are two approaches. Micro-optics refers to components that range in size from microns to millimeters and which can be manufactured in arrays and mounted on a substrate[3]. For example, a lenslet array is an array of small lenses, each of which focuses the signal from a single light source. Bulk optics refers to objects that operate on the entire cross section of the beam. These objects require more space than integrated optics.

  - A critical issue in using free space optical transmission in parallel processing is establishing and maintaining the proper alignment of the optical components[3, 54]. High quality lenses are needed to focus light precisely, and each component must be carefully aligned in three spatial and three angular coordinates to within very narrow tolerances. This alignment is sensitive to thermal and vibrational influences in the environment. This requirement has fostered the development of misalignment tolerant designs[54] and the integration of optical components onto substrates[35].

• Routing.

  - A spatial light modulator (SLM) is a two-dimensional array of cells which operates on the cross-section of an optical beam. The operation of each cell can be controlled individually by electronic or optical means. Each cell may be transparent or may modify a characteristic of the incident light, such as the intensity, polarization, direction, or phase. The key characteristics of an SLM are the cell density and the speed of cell operation. Many implementations of SLMs have been proposed using magnetic garnets, liquid crystals, multiple quantum well (MQW) structures, deformable mirrors, and devices using the electro-optic and acousto-optic effects[3].

  - A hologram, like a lens, can be used to create an optical image of an object[21, 18]. Unlike a lens, the hologram contains an optical interference pattern with all the information necessary to recreate the image. The image is reproduced by illuminating the hologram with a laser. An image consisting of a pattern of spots can be used to route the input signal to one or more different locations. The interference pattern in the hologram of a single spot at a given location can be computed and reproduced on a black-and-white printer. This can be used to generate the pattern that would be present if the hologram were produced photographically, and is known as a computer generated hologram (CGH). Similarly, subject to the limits of computational complexity, a CGH can be created for any pattern of spots that represents a desired signal routing. An SLM may be designed with each cell as a CGH that routes the corresponding signal in the cross-section of the optical beam.

  - A beam splitter is a semi-transparent mirror often placed at a 45 degree angle to the incident signal. A portion of the signal is reflected in a new direction and the remaining signal passes through unaffected. The portion of the beam that is reflected is determined by the amount of reflective coating at each location of the mirror. A beam splitter that reflects or transmits light based on its polarization is called a polarizer or analyzer.

• Detection.

  - An array of photodetectors is used to receive the signal beam from an array of transmitters. In many designs, electronic processing logic is combined with photodetectors, VCSELs, or modulators into cells of an array built into a single chip. This arrangement is called a smart pixel array. It functions much like an SLM with increased processing flexibility, in that each signal within the beam can be independently received, processed, and, if desired, retransmitted.

4 Optical Interconnection Networks.

With an understanding of the general capabilities of optical devices, we now turn our attention to how the devices can be used in optical interconnection networks for massively parallel processing. Systems can be built from several different models of parallel computation, so that a network may interconnect processors, memories, micro-computers, or clusters of components. We will refer to these devices attached to the network as nodes. To fully understand an interconnection network, we need to know not only the internal components and how they are connected, but also the techniques that are used to control the message traffic in the network. We begin this discussion by describing the requirements of a network used for typical parallel applications.

4.1 Network requirements.

A network must not only support the communication needs of the application, but must do so in a cost-effective manner. Some considerations for each of these follow.

4.1.1 Communication in parallel applications.

Algorithms used in parallel applications often decompose the problems to be solved into regular structures. Each processor is assigned a piece of a problem and must communicate with processors working on related pieces. In the most general case, any node of a parallel system may communicate with any other node, and the interconnection network must be able to provide for this. However, a more typical case is that a node will communicate with only a small number of other nodes, and in a regular pattern[31]. Typical communication patterns found in parallel applications include the following.

• Broadcast and multicast. In a multicast, one processor communicates the same information to a set of processors. If the set includes all processors, then it is referred to as a broadcast.

• Reduction. This is the inverse of a multicast. A set of processors communicate some information to a single processor. This processor then combines the information in some way. Often, a reduction is followed by a broadcast.

• Nearest neighbor. This pattern is often found in algorithms in which the processors are logically arranged as an n-dimensional array called a mesh. Information held by one processor affects only the processors adjacent to it along a dimension of the array. This pattern is found in some matrix multiplication algorithms, in algorithms for solving linear systems of equations, and in algorithms that use a systolic or pipelined data flow.

• Hypercube. In this pattern, every bit in a b-bit processor address represents a dimension of a binary hypercube with $2^b$ processors (i.e., a b-dimensional binary array). Processors communicate only with the b processors whose address is exactly one bit different from their own. There are two variations on the use of this pattern. In divide-and-conquer, all processors communicate across a given dimension at one time. This pattern is found in a wide variety of applications such as sorting. Another variation is the recursive doubling pattern. In this pattern, $2^{i-1}$ processors send messages over dimension i, in sequence from i = 1 to i = b. This can be used to propagate information from one processor to all other processors. The hypercube pattern is efficient because only b steps are required to exchange information between $2^b$ processors.

Many parallel applications are built from loops consisting of synchronized phases of computation and communication. One consequence of this is that message traffic may arrive in bursts. Another is that the communication pattern may be repetitive. Thus, nodes tend to communicate often with a small subset of other nodes. The localized, repeating aspects of a communication pattern are often described as communication locality of reference. Processor caches and virtual memory systems are used to exploit locality of memory references to improve computer performance. Similarly, it is desirable to build a network that exploits locality of communication references to improve performance.

The size of a message in a parallel application can range from a few bytes to a few thousand bytes, depending on the application and on the design of the parallel system. Fast programming techniques can reduce software communication overhead to a few microseconds[55]. To avoid becoming a performance bottleneck, switching and control operations for network hardware must be performed in the nanosecond range.
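The hypercube pattern described above maps directly onto bit operations on processor addresses. The sketch below is an illustration under stated assumptions, not code from any of the systems cited: the function names are hypothetical, addresses are plain integers, and dimensions are numbered from 0 in the code (the text numbers them from 1). It lists the b neighbors of a node and generates the send schedule of a recursive doubling broadcast, in which the number of senders doubles at every step.

```python
def hypercube_neighbors(node: int, b: int) -> list[int]:
    """Addresses that differ from `node` in exactly one of its b address bits."""
    return [node ^ (1 << d) for d in range(b)]


def recursive_doubling_broadcast(b: int, root: int = 0) -> list[list[tuple[int, int]]]:
    """Return one list of (sender, receiver) pairs per step of the broadcast.

    Step d (0-based) uses dimension d: every node that already holds the
    message sends it to its neighbor across that dimension.
    """
    informed = [root]
    steps = []
    for d in range(b):
        pairs = [(sender, sender ^ (1 << d)) for sender in informed]
        steps.append(pairs)
        informed.extend(receiver for _, receiver in pairs)
    return steps


# Example: a 3-dimensional hypercube with 2**3 = 8 processors.
print("neighbors of node 5:", hypercube_neighbors(0b101, 3))
for step, pairs in enumerate(recursive_doubling_broadcast(3), start=1):
    print(f"step {step}: {len(pairs)} sender(s) -> {pairs}")
```

After b steps every one of the $2^b$ processors holds the message, which is the property that makes the pattern efficient.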

4.1.2 Network structure and scalability

A straightforward way of providing fast communication between any pair of network nodes is with a direct connection through a cross-bar switch. A cross-bar switch allows any of its N inputs to be independently connected directly to any of its outputs. Many different approaches have been used for designing small optical cross-bars (N ≤ 32) using fibers or other waveguides.

• One approach fans out each input signal to a column in an N x N array of optical switches, and couples the signals from each row of switches into a single output fiber[3].

• A passive star coupler with N wavelengths may be used as a cross-bar. Each node receives messages on a unique wavelength. Network control must handle or avoid collisions.

• A highly integrated cross-bar has been proposed in [56], where light and sound interact in a thin crystal film deposited on the surface of a semiconductor. A lens is created at each end of the film, bracketing the area where the acousto-optic interaction occurs. The acoustic signal that determines the switch settings is a combination of acoustic signals that control the diffraction of each optical input. With this approach, an integrated cross-bar of approximately 12 x 12 may be possible.

• An electronic cross-bar switch with optical inputs and outputs is described in [30]. Bytes of data are transmitted using a separate wavelength for each bit. These wavelengths are then multiplexed into an optical fiber. At the switch, the bits are demultiplexed, spatially separated, and switched through a set of parallel one-bit wide electronic cross-bars. The output optical signal is regenerated and combined in the same way as the input signal. This design is expected to connect up to 64 nodes.

While the performance of a cross-bar switch interconnecting N nodes increases linearly with the number of nodes, the increase in internal complexity is O(N^2). Complexity is an estimate of the number of switches, wires, transmitters, receivers, or other network components that are required by a design. Designs like a cross-bar perform well because they can provide a direct connection between any pair of nodes simultaneously. However, they present an enormously complex wiring problem and have a significant hardware cost. A cross-bar network for one thousand processors would require millions of components. Thus, this approach is impractical for massively parallel processing.
It is desirable to have an architecture where both performance and cost increase linearly with the number of nodes. Such an architecture is said to be scalable. While a perfectly scalable architecture does not exist, there are a wide variety of designs reflecting different tradeoffs between cost and performance. Optical interconnection networks often use designs similar to those used for electronic networks. These designs include buses, meshes, multi-stage interconnection networks, and trees.

The simplest hardware arrangement is a bus, which connects all nodes to a single shared communication link. A set of access control procedures must be defined to handle communication. As we will see, many optical architectures use a bus-based approach.

A mesh is an n-dimensional array of nodes with a fixed number of nodes in each dimension. A common electronic design called a hypercube has exactly two nodes in each dimension. When nodes at the edge of the array are connected to the nodes at the opposite, parallel edge of the array, the network is called a torus. In many electronic designs, each node is connected only to its nearest neighbors along each dimensional axis of the array. Fiber optic designs for an n-dimensional array of nodes have been proposed using several different physical interconnection patterns.

In a multi-stage interconnection network (MIN), sets of routers or switches are interconnected in stages. A message is sent through a path consisting of one router at each stage. Well-known MINs include the omega, butterfly, shuffle, and Clos networks. Both electronic and optical MINs can be constructed by interconnecting a large number of small (e.g., 2x2) cross-bar switches. Compared to a cross-bar, a MIN offers less performance (O(log N) for latency), but has less complexity (O(N log N)).

A tree or hierarchical arrangement has also been used in both fiber-optic and electronic designs. The processing nodes are placed at the leaves of the tree, and the branches of the tree are formed by links between switches or routers.
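To make the scaling comparison between a cross-bar and a MIN concrete, the following sketch counts switching elements for both. It assumes, for illustration only, that N is a power of two, that a cross-bar needs one switch point per input-output pair, and that the MIN is an omega-style network with log2 N stages of N/2 2x2 switches. The function names are ours and do not come from any of the cited designs.

```python
def crossbar_switch_points(n):
    """One switch point per (input, output) pair: O(N^2)."""
    return n * n


def min_stages(n):
    """Stages in an omega-style MIN; n is assumed to be a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    return n.bit_length() - 1


def min_switch_count(n):
    """Each stage holds n/2 two-by-two switches: O(N log N) overall."""
    return min_stages(n) * (n // 2)


if __name__ == "__main__":
    for n in (16, 64, 1024):
        print(n, crossbar_switch_points(n), min_switch_count(n), min_stages(n))
```

Under these assumptions, N = 1024 gives 1,048,576 cross-bar switch points against 5,120 switches arranged in 10 stages for the MIN, which shows the trade of added latency for reduced complexity.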

4.2 Controlling an optical network.

The object of network control is to provide communication paths for the exchange of data between nodes when the number of connections the network can provide simultaneously is limited. These paths are provided by controlling when, where, and how the optical signal is transmitted. After some general discussion, we will look more closely at techniques that can be used to control one simple network, the optical bus.

4.2.1 Managing connections.

Two approaches are commonly used to pass data through a network. In packet switching, messages are routed between senders and receivers through intermediate nodes. This is done using routing information added to each message. While research is being done to decode addresses and route messages optically at rates up to 250 Gb/s[22], in the near term packet switching will require electronic processing. This means that an optical signal must be converted to electronic form, stored in a buffer for processing, and retransmitted optically over the next link toward the destination. A message may be delayed when the link is in use for another message or a buffer is not available at the next node in the path. Packet switched networks require mechanisms to ensure that messages make continued progress toward their destinations, and that traffic congestion does not degrade performance. Adaptive routing may be provided so that component failures can be tolerated. Since routing latency can be much larger than transmission time in an optical network, the best performance is normally obtained when the number of routing steps is minimized.

Another approach is to use circuit switching, in which the network provides a direct physical connection between the source and destination of a message. With an all-optical connection, circuit switching can exploit the full performance capability of optics. However, circuit switched networks require complex controls to coordinate the entire network to provide the nodes with the connections needed by the parallel application. For example, when the connection passes through optical switches, filters, or other electronically controlled devices, message transmission must be coordinated with device control. This coordination can be predetermined and implemented for the duration of the parallel application. This is called static control. Alternatively, dynamic control can be used to establish circuits that meet the immediate needs of the parallel application.
Dynamic control can be implemented in a centralized manner in which a single network controller determines how circuits should be provided in response to application needs. However, in a large system the controller can become a performance bottleneck, and if it fails the entire system will halt. To avoid this, a distributed approach can be used in which the nodes themselves control the network using a protocol for establishing circuits.

4.2.2 Optical Circuits

Optical circuits can be provided using the following two techniques.

• We can exploit the wavelength sensitivity of optical devices and reduce the number of network components (and hence cost) by managing the use of wavelengths for circuit establishment. Since signals on different wavelengths do not interfere with each other, they can be sent simultaneously through optical fibers and couplers and separated only when they need to be routed or received. This is known as wavelength division multiplexing (WDM)[6], and is shown pictorially in Figure 5 for three wavelengths, λ1, λ2, and λ3. WDM is often used with tunable transmitters and/or receivers. It allows a structure such as a passive star to provide many connections simultaneously, as well as offering broadcast and multicast capability. WDM is not a complete solution for massively parallel processing, however, because the number of different wavelengths available in a network in the near term is likely to be on the order of tens rather than thousands.

• Another technique often used to coordinate circuit establishment is time division multiplexing (TDM)[11]. In TDM, the operation of the network is divided into small time intervals. Through the use of a global clock, these intervals, or time slots, are synchronized across all network components and attached nodes. During each time slot the network provides a specified set of connections. A sequence of these sets is created, and the network automatically cycles through this sequence so that all required connections are established in turn. The length of a time slot is chosen to allow the devices and nodes to establish a new set of connections and transmit a message. Since the connection requirements are predetermined, device control signals are prespecified and new connections can be established very quickly. A 10 Gb/s network could transmit four words of 64 bits in a time slot of 30 nanoseconds. While TDM can be used in either optical or electronic networks, the large optical bandwidth exceeds the requirements of a single processor and makes multiplexing especially attractive in optical networks[9].

Figure 5: Multiplexing three wavelengths together. [Diagram labels: wavelengths 1, 2, and 3; Multiplexer; Demultiplexer.]

WDM and TDM are most often used in fiber-based architectures. Free-space architectures normally exploit the high connection density provided by the cross section of a two-dimensional optical beam, often referred to as spatial parallelism.
The use of TDM and WDM together is referred to as time-wavelength division multiplexing (TWDM). Since an all-optical connection is required to fully exploit the potential performance of an optical network, we will look more closely at the performance of a circuit switched network.

Static control of a circuit-switched optical network can be implemented using a TWDM schedule built from prior knowledge of the application's overall communication pattern. However, in some cases a static schedule will provide a large number of unused or seldom used connections. For example, when the communication pattern of an application cannot be precisely determined in advance, a static schedule may need to provide all possible connections. The number of time slots required can be very large, and a message may have a long access delay until the network provides the connection it needs.

To handle situations where the communication pattern cannot be determined in advance, a dynamic protocol may be used to establish connections. As with packet switching, this network control is performed in the electronic domain. This establishment delay can be long relative to the time needed to transmit a message in a high bandwidth optical network.

A more flexible implementation of TDM can be used to balance these two types of delay. In some cases, a compiler can use sophisticated algorithms to identify communication patterns similar to the ones described earlier[33] and multiplex these patterns together[47]. This use of compiled communication reduces establishment delays by providing the network with TDM to change connections rapidly. It also reduces access delay by establishing a sequence of connections tailored to the application's immediate needs. TDM can also be combined with dynamic protocols such as those described in the next section. It may also be possible to manage connections with TDM using dynamic techniques analogous to those used to manage pages in virtual memory[9].
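The time slot figure quoted earlier in this section (four 64-bit words at 10 Gb/s in a 30 nanosecond slot) can be checked with a few lines of arithmetic. The interpretation of the leftover time as a margin for device switching and guard time is our assumption; the text states only that the slot must allow the devices to establish connections and transmit a message.

```python
def transmission_time_ns(bits, rate_gbps):
    """Time to serialize `bits` on a link running at `rate_gbps` Gb/s."""
    # 1 Gb/s is exactly one bit per nanosecond, so the result is in ns.
    return bits / rate_gbps


WORDS_PER_SLOT = 4
BITS_PER_WORD = 64
LINK_RATE_GBPS = 10.0
SLOT_LENGTH_NS = 30.0

payload_bits = WORDS_PER_SLOT * BITS_PER_WORD
tx_ns = transmission_time_ns(payload_bits, LINK_RATE_GBPS)
margin_ns = SLOT_LENGTH_NS - tx_ns  # assumed to cover switching and guard time

if __name__ == "__main__":
    print(f"payload: {payload_bits} bits")
    print(f"transmission time: {tx_ns:.1f} ns")
    print(f"margin left in the slot: {margin_ns:.1f} ns")
```

Transmitting 256 bits takes 25.6 ns, which leaves roughly 4.4 ns of the 30 ns slot for everything other than data transmission.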

4.2.3 Optical busses

A bus architecture is a simple, well studied design that is easily implemented in optics. Like a 2x2 switch, a bus can be used as the building block of more complex networks. An optical bus can be logically implemented with a variety of physical designs, or through a passive star as shown in Figure 3[16]. These designs exploit the fan-out capability of optics and don't require complex optical switches.

A wide variety of approaches for controlling these networks have been studied. In general, a wavelength and a time interval are reserved for communication between a particular set of nodes[11]. Some designs restrict the set to a single source and a single destination, while others employ a multicast capability or a protocol to handle conflicts in data transmission. The choice of protocol depends on the number and type of transmitters and receivers at each node (i.e., fixed or tunable).

One simple technique is known as broadcast-and-select. All nodes transmit and receive using the same wavelength. A message and the address of its destination are broadcast to all nodes, which then process the information. The destination node selects (i.e., receives) the message, and all other nodes discard it. All nodes must examine all message traffic, placing a large processing load on the network interface electronics. Collisions occur when two nodes transmit messages at the same time on the same wavelength. Since all nodes receive the same wavelength, these collisions can be detected and messages resent using a randomized algorithm such as those used in electronic networks.

Performance may be improved by the use of multiple wavelengths for message transmission. For example, each node may have a tunable transmitter and a receiver tuned to a unique, fixed wavelength. This arrangement provides self-routing of messages. Each node receives only traffic destined for itself.
Self-routing allows full use of optical bandwidth without overwhelming the electronics with message processing. Collisions occur only for transmissions to the same destination node. However, the sending nodes cannot detect these collisions because they don't receive on the wavelength used by the destination node. Thus, some kind of reservation or acknowledgment scheme must be used. A static TWDM reservation scheme can be developed using N - 1 time slots. In time slot i, the node at address A can transmit messages to the node at address (A + i) mod N. When the number of wavelengths is limited, a static reservation scheme can still be developed by extending the length of the fixed schedule. An acknowledgment scheme can be implemented by dividing time slots into two sub-slots: one for message transmission and a second for acknowledgment. Messages that are successfully transmitted are acknowledged in the subsequent sub-slot. Unacknowledged messages are assumed to have been in a collision, and a random retransmission scheme is used.

The considerations are slightly different when each node is provided with a tunable receiver and a fixed, unique transmission wavelength. In this design, a media access protocol must be used to inform the destination node that a message is coming. Again, a fixed TWDM allocation scheme can be used for this purpose. An alternative is to use a separate control channel and reservation protocol, requiring an additional fixed receiver at each node.

We can further increase the performance (and cost) of the network if each node is provided with multiple transmitters or receivers. For example, consider a network where each node has a tunable transmitter and a receiver on every wavelength. In such networks, control protocols must resolve destination conflicts. These occur when several nodes transmit messages to the same destination at the same time, but on different wavelengths. The control scheme must allocate both a wavelength and a time slot to each connection. The additional receivers allow each node to detect collisions for all destinations. This provides an alternative to increasing the length of a static TWDM schedule when the number of wavelengths in the network is limited. At low loads, the number of time slots required may be reduced by allowing multiple senders to transmit in any time slot, detecting and recovering from collisions as necessary.

Many protocols have been developed to dynamically allocate connections rather than use a static TWDM schedule. These protocols often use a control channel to allow a sender to reserve an optical wavelength for communication to a receiver. Slotted protocols use synchronized time slots in a communication channel. Contention protocols rely on collision detection and recovery. Both kinds of protocol have been proposed for use in the control channel and in the data channel. Various assumptions have been made about the speed with which network devices can be tuned. These designs can be characterized by the number of transmitters and receivers required at each node, as well as by the use of fixed or tunable devices. A survey of these protocols is provided in [40] and [41].

In addition to performance considerations, networks built from a single passive star are limited in size by the optical power budget.
This refers to the amount of the emitted signal that arrives at the detector. Emitted signal strength and detector sensitivity are characteristics of the devices, and can be related to electrical power consumption. Signal losses during transmission are affected by the number and type of network components and the amount of fan-out required. The physical restriction on the size of a passive star means that networks to interconnect thousands of processors will require more complex architectures.
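The static TWDM reservation scheme described in this section, in which the node at address A may transmit to node (A + i) mod N in time slot i, can be sketched as follows. The sketch assumes one fixed receive wavelength per node, so that a slot is free of collisions exactly when no destination appears twice in it. The function names are ours.

```python
def static_twdm_schedule(n):
    """schedule[i - 1][a] is the destination of node a in time slot i."""
    return [[(a + i) % n for a in range(n)] for i in range(1, n)]


def is_conflict_free(schedule):
    """True if no destination (receive wavelength) is used twice in a slot."""
    return all(len(set(slot)) == len(slot) for slot in schedule)


def covers_all_pairs(schedule, n):
    """True if every ordered pair of distinct nodes gets exactly one slot."""
    pairs = [(a, d) for slot in schedule for a, d in enumerate(slot)]
    wanted = {(a, d) for a in range(n) for d in range(n) if a != d}
    return len(pairs) == len(wanted) and set(pairs) == wanted


if __name__ == "__main__":
    schedule = static_twdm_schedule(4)
    for i, slot in enumerate(schedule, start=1):
        print(f"slot {i}: " + ", ".join(f"{a}->{d}" for a, d in enumerate(slot)))
    print(is_conflict_free(schedule), covers_all_pairs(schedule, 4))
```

Because each slot is a cyclic shift of the node addresses, every node both sends and receives exactly once per slot, and N - 1 slots are enough to give every node a connection to every other node.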

4.3 Fiber-based Architectures

One approach to building an optical network is to simply replace electronic links with optical links. This approach does not account for the advantages or the limitations of optics[7]. One way to exploit the optical advantages of high fan-out and multiple signal wavelengths is to use passive stars as the building blocks of larger networks. Each star can be controlled using techniques described in the previous section. Since optical components are complex and costly, the number of these components is an important consideration in network design.

4.3.1 Packet Switched Architectures

Packet switching is commonly used in mesh and MIN electronic architectures. Both architectures can be built with routers that process message headers to determine the next link in the path to the destination. Many optical networks have been designed to interconnect an n-dimensional array of nodes using passive stars to form connections along the dimensions. Exploiting the fan-out of the passive star greatly reduces the number of routing steps required compared to electronic mesh networks. The optical designs differ in the arrangement of the connections, the routing algorithms, and the number and type of network components required. Three examples are described below.

[Diagram legend: star coupler, network node.]

Figure 6: A small WMCH mesh connected with optical passive stars.

Figure 7: A Bus-Mesh with two nodes at each mesh location.

• A wavelength division multiple access channel hypercube (WMCH) is proposed in [16]. Nodes are arranged in a mesh, and all processors that share a dimensional axis are connected via a single passive star. Figure 6 shows an example of a small two-dimensional mesh. Tunable transmitters or receivers can be used with WDM so that messages are sent directly to the desired node along any one dimension using the techniques described earlier. Messages pass through the array one dimension at a time until they reach their destination. Using passive star couplers with a fan-out of 256, a network of 64K nodes could be constructed with a maximum of one intermediate routing step. Compared to networks with point-to-point connections, the WMCH uses fewer transmitters and receivers and requires far fewer message routing steps.

• A Bus-Mesh architecture is described in [57] for a two-dimensional array of nodes. Each bus connects the transmitters from a row of nodes to the receivers of a column of nodes. There are no nodes at the diagonal locations of the mesh. Each row of nodes transmits at a unique wavelength and each column receives at a unique wavelength. This allows all busses to be implemented with a single passive star coupler. TDM is used to give each node an equal opportunity to transmit. Multiple nodes can occupy a mesh location as shown in Figure 7 by adding additional TDM slots. Routing requires a maximum of 2 steps.

• The dBus architecture for interconnecting processors in a mesh based on de Bruijn digraphs is described in [36]. An optical bus is used to connect the transmitters of nodes along one dimension to the receivers of nodes along another dimension. For example, in a three-dimensional array the nodes at array locations (x, a, b) transmit to nodes at locations (a, b, x), where a and b are any fixed values and x represents all locations connected by the bus along a dimension of the array. A two-dimensional dBus array is shown in Figure 8. Each node has only one transmitter and one receiver. The busses may be implemented by a star coupler and WDM can be used to allow multiple messages to be sent on the bus at one time. Compared to the WMCH architecture, the dBus requires significantly fewer star couplers, transmitters, and receivers, at the cost of a slightly larger average number of routing steps between nodes. A dBus-array can also be implemented using TWDM and a MIN architecture known as a dilated slipped banyan network[53].

Figure 8: A dBus array. [Diagram shows a 4 x 4 array of nodes labeled 00 through 33.]
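The WMCH routing rule described above, in which a message is corrected one dimension at a time, can be sketched for the two-dimensional case. The (row, column) addressing, the choice to correct the column first, and the function name are our assumptions for illustration; they are not taken from [16].

```python
def wmch_route(src, dst):
    """Dimension-order route between (row, col) addresses in a 2-D WMCH.

    Nodes that share a row or a column are assumed to share one passive
    star, so they reach each other in a single transmission.
    """
    if src == dst:
        return [src]
    if src[0] == dst[0] or src[1] == dst[1]:
        return [src, dst]
    # Correct the column over the row star, then the row over the column star.
    return [src, (src[0], dst[1]), dst]


FAN_OUT = 256       # nodes per passive star, as in the 64K-node example
DIMENSIONS = 2
node_count = FAN_OUT ** DIMENSIONS

if __name__ == "__main__":
    print(node_count)
    print(wmch_route((3, 7), (200, 9)))
    print(wmch_route((3, 7), (3, 9)))
```

With a fan-out of 256 in each of two dimensions, the array holds 256 x 256 = 65,536 nodes, and no route needs more than one intermediate node, matching the figures quoted for the WMCH.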

4.3.2 Circuit Switched Architectures

While the packet switched architectures use large fan-out to reduce the number of routing steps, circuit switched architectures avoid routing entirely by providing direct connections. Circuit switched architectures that have been proposed include the following:

• An optical bus may be used to interconnect a small number of nodes. Optical busses may be bi-directional or uni-directional. Different designs and control techniques have been analyzed in studies such as [16, 24].

• A MIN can be constructed from electronically controlled optical switches. When the switches can be set to provide an all-optical path for any connection, the MIN implements circuit switching. A circuit switched MIN that provides all-to-all connectivity is described in [53]. A distributed protocol that can be used to establish all-optical paths in MINs is given in [48].

• Figure 9 depicts a circuit switched optical mesh constructed by attaching electronically controlled optical switches to the processing nodes and interconnecting the switches in a mesh arrangement using optical fibers. Signals can be sent between switch controllers using a mesh network that parallels the optical network and consists of low-bandwidth electronic links. The optical switches can be set to provide all-optical, TDM connections using a reservation protocol between the controllers, as described in [58].

Figure 9: An optical two-dimensional mesh network. [Diagram shows 16 nodes numbered 0 through 15.]

• A hierarchical optical architecture based on the Fat-Tree network is proposed in [17]. The space-wavelength hierarchical architecture (SWHA) places processing nodes at the leaves of a hierarchy, and uses switches called FatNodes inside the hierarchy. A FatNode is a space/wavelength switch constructed from passive couplers and an acousto-optic tunable filter (AOTF). Signals from the processing nodes are sent up the hierarchy using WDM. At each level, the incoming signals are coupled. The AOTF then separates the wavelengths, switching them either up or down the hierarchy. A coupler at each level merges the signals switched back down the hierarchy with signals heading down from higher levels. The number of wavelengths heading up/down the hierarchy can be reconfigured to meet the traffic requirements of the application, and can provide increased bandwidth at the top of the hierarchy. Spatial reuse of wavelengths is achieved by partitioning wavelengths consistently at each level of the hierarchy. For example, a set of wavelengths can be reserved for communication between processing nodes joined by a level 1 FatNode. Since these wavelengths will never be switched to a level 2 FatNode, each group of processing nodes that communicates through a level 1 FatNode can use the same wavelengths without conflict. Once the wavelengths have been partitioned at different levels of the hierarchy, each processor can be assigned a frequency on which to receive messages from processors communicating through each different level. Thus, each processor must have a receiver capable of receiving one wavelength per level of hierarchy. A fixed TWDM schedule can be constructed to provide each processor with a direct optical connection to every other processor connected through a given hierarchical level using a broadcast-and-select approach.

• Another way to interconnect a large number of nodes using passive star couplers is with the partitioned optical passive star (POPS) topology described in [10]. As shown in Figure 10, the nodes are divided into g equal-sized groups.
Transmitters from a group of nodes are connected to receivers in another group via a single passive star. Thus, communication between groups requires one dedicated transmitter and receiver at each node. Overall, each node requires g transmitters and receivers for communicating with each group including its own. Complete interconnection of these g groups requires g^2 couplers. Communication takes place in time slots used alternately for control and data. A two step procedure is used to establish connections dynamically. In the first step, each node submits a connection request to a controller node within the group. The group controller resolves contention for the use of the couplers through which the group transmits. The resolution of these conflicts can be arranged in a TDM cycle to reduce the frequency of request submissions to the group controller. The second step consists of an exchange of control information. The controllers simultaneously broadcast the intended use of the couplers to all nodes in each group. Each node examines the broadcasts to determine if it is the destination of multiple messages. The nodes resolve these destination conflicts and respond with status information indicating which node should be allowed to transmit. The control protocol makes use of the broadcast capability of passive stars and exploits locality of references to reduce the frequency of control operations and to share the high bandwidth of the optical network.

• A power-optimal network built from interconnected passive stars is described in [5]. Groups of log N transmitting nodes are connected to all N receiving nodes in a manner based on addresses so that no two receivers hear identical sets of transmitters. To allow for a small number of transmitters and receivers at each node, extra stages of couplers are placed in the optical path to fan-out and fan-in the optical signals. Interconnections for all N nodes can be established in a power-optimal manner with three stages of couplers. A static TDM schedule is used to provide all-to-all communications.

Figure 10: A POPS network. [Diagram shows 16 source nodes, numbered 0 through 15, connected through couplers to 16 destination nodes, numbered 0 through 15.]
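The component counts given above for the POPS topology can be tabulated with a small sketch. The assignment of consecutive node addresses to groups and the identification of a coupler by its (source group, destination group) pair are our assumptions for illustration, as is the 16-node, 4-group example; none of these details is taken from [10].

```python
def pops_parameters(n, g):
    """Component counts for n nodes split into g equal-sized groups."""
    if g <= 0 or n % g:
        raise ValueError("n must be a positive multiple of g")
    return {
        "group_size": n // g,
        "couplers": g * g,            # one per (source group, destination group)
        "transmitters_per_node": g,   # one toward each group, including its own
        "receivers_per_node": g,
    }


def pops_coupler(src, dst, n, g):
    """Coupler used by a message, named by (source group, destination group)."""
    group_size = n // g
    return (src // group_size, dst // group_size)


if __name__ == "__main__":
    print(pops_parameters(16, 4))
    print(pops_coupler(5, 14, 16, 4))
```

The sketch makes the design trade visible: increasing g shrinks the fan-out required of each coupler (n / g nodes per side) but increases the number of couplers quadratically and the number of transmitters and receivers per node linearly.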

4.4 Free Space Optical Architectures.

Within the confines of a parallel processor, optical signals need to travel only short distances in a carefully controlled environment free of dust and other particles. Thus, processors can exchange optical signals directly through free space. The emergence of smart pixel technology has given free space architectures a boost by providing a signal processing capability not easily achieved through prior SLMs. Proposed architectures for free space optical networks include the following.

• Small cross bar switches are the building blocks of multi-stage interconnection networks. A Banyan MIN built from 2x2 free space optical switches is described in [38]. Larger free space cross bar switches that can be used to build MINs are described in [12]. Both designs use birefringent computer generated holograms which are sensitive to the polarization of light. The polarization of the input signals is adjusted by electronically controlled polarization modulators, thus affecting the path of the signal through the hologram and thereby switching the signal.

Figure 11: The ICN architecture. [Diagram labels: processor clusters, transmitter array, faceted mirror, receiver array, optical data channels.]

• The interconnection-cached network (ICN) described in [2, 37] is one design for a large free space interconnection network. This design is based on circuit boards which contain a cluster of processors. Within the cluster, an electronic cross-bar network is used. Between clusters, an optical backbone network is used. The backbone provides a high data rate but is slow to reconfigure. Thus, the distribution of work to processors must take into account communication patterns between clusters of processors to avoid the need to reconfigure the backbone network. As shown in Figure 11, circuit boards containing processors are stacked in one or more columns. Each board is connected to an array of transmitters and an array of receivers. Optical signals traveling in a beam along a column are reflected to receivers by a faceted mirror. Similarly, transmitted signals can be inserted into the beam. At one end of the column, the beam is merged into a uni-directional ring that encompasses all columns, again using faceted mirror splitter/combiner cubes. At the other end of the column, a splitter cube is used to direct a portion of the beam into the column. A direct connection from any cluster to any other cluster is formed by a careful arrangement of the mirror facets and sub-arrays of transmitters and receivers used for each connection. Since the path from any VCSEL transmitter ends at a unique photodiode detector, an all-optical connection is provided between every pair of clusters.

• Smart pixels are used in the hyperplane and intelligent backplane designs described in [25, 51]. Each node in the network is electronically attached to a smart pixel array which serves as its interface to the optical backplane. The design looks much like Figure 4 without the need for an array of switches. The pixel arrays are arranged in a column, and signals are sent from one array to another in a daisy-chained fashion to form a unidirectional ring. A sub-array of pixels is allocated to a communication channel, allowing parallel data transmission.
Electronic logic in the pixel array determines the use of each channel. Signals may be injected from the node into the channel, extracted from the channel to the node, or retransmitted along the channel.
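The three per-channel actions named above for a smart pixel array (inject, extract, retransmit) can be sketched for a single message on the unidirectional ring. This is a simplified model under our own assumptions: arrays are numbered 0 to n-1 in ring order, one message occupies a channel at a time, and the function names are hypothetical rather than drawn from [25, 51].

```python
def ring_path(src, dst, n):
    """Arrays visited by a signal injected at src and extracted at dst."""
    path = [src]
    node = src
    while node != dst:
        node = (node + 1) % n  # unidirectional ring: always the next array
        path.append(node)
    return path


def pixel_action(node, src, dst):
    """Action taken on the channel by the pixel array at `node`."""
    if node == src:
        return "inject"
    if node == dst:
        return "extract"
    return "retransmit"


if __name__ == "__main__":
    path = ring_path(6, 1, 8)
    for node in path:
        print(node, pixel_action(node, 6, 1))
```

In this model the number of retransmissions grows with the ring distance (dst - src) mod n, which is one reason the assignment of nodes to positions on the ring matters for latency.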

5 Conclusions

In the past decade, advances have been made in a wide range of areas related to the use of optics in interconnection networks. Developments have included new materials, fabrication techniques, devices, and the network architectures and control techniques that tie them all together. Electronic interconnection networks have been built with a wide range of architectures, and optical device capabilities can only increase the number of design options. It remains to be seen which device technologies can be developed to meet the cost, function, and performance requirements of practical parallel systems. In the meantime electronic technologies will also improve, raising the bar for the introduction of optical interconnection networks.

For massively parallel processing to become commonplace, significant issues of cost, performance, and programmability must be addressed. Optical interconnection networks are likely to be successful to the extent that computing techniques in general are successful in exploiting the power of massive parallelism. Optical technology is capable of extending the size of usable parallel systems beyond that possible with electrical interconnections. Continued advances in this field will be necessary, however, before this potential can be realized.

References

[1] 11th International Parallel Processing Symposium, Workshop on Optics and Computer Science. IEEE, April 1997.
[2] S. Araki et al. Experimental free-space optical network for massively parallel computers. Applied Optics, 35(8):1269–1281, March 1996.
[3] H. Arsenault and Y. Sheng. An Introduction to Optics in Computers, volume TT8 of Tutorial Texts in Optical Engineering. SPIE Optical Engineering Press, 1992.
[4] N. Berg and J. Pellegrino, editors. Acousto-optic Signal Processing. Optical Engineering. Marcel Dekker, Inc., second edition, 1996.
[5] Y. Birk. Power-optimal layout of passive, single-hop, fiber-optic interconnections whose capacity increases with the number of stations. In INFOCOM '93: 12th Joint Conference of the Computer and Communication Societies, pages 565–572. IEEE, March 1993.
[6] C. A. Brackett. Dense wavelength division multiplexing networks: Principles and applications. IEEE Journal on Selected Areas in Communications, 8(6):948–964, August 1990.
[7] R. Chamberlain and R. Krchnavek. Architectures for optically interconnected multicomputers. In GLOBECOM '93: Proceedings of the IEEE Global Telecommunication Conference, pages 1181–1186. IEEE, November 1993.
[8] I. Chang. Acousto-optic tunable filters. In Berg and Pellegrino [4], pages 139–167.
[9] D. M. Chiarulli, S. P. Levitan, R. G. Melhem, and C. Qiao. Locality based control algorithms for reconfigurable optical interconnection networks. Applied Optics, 33:1528–1537, March 1994.
[10] D. M. Chiarulli, S. P. Levitan, R. G. Melhem, J. P. Teza, and G. Gravenstreter. Partitioned optical passive star (POPS) multiprocessor interconnection networks with distributed control. Journal of Lightwave Technology, 14(7):1601–1612, July 1996.
[11] I. Chlamtac and A. Ganz. Channel allocation protocols in frequency-time controlled high speed networks. IEEE Trans. on Communications, 36(4):430–440, April 1988.
[12] N. Cohen, D. Mendlovic, B. Leibner, and E. Marom. Folded architecture for modular all-optical switch. In 11th International Parallel Processing Symposium, Workshop on Optics and Computer Science [1].
[13] IBM Corporation. http://ibm.tc.cornell.edu/ibm/pps.

[14] NEC Corporation. marketing brochure, 1996.
[15] Cray Research, Inc. marketing brochure, 1996.
[16] P. Dowd. Wavelength division multiple access channel hypercube processor interconnection. IEEE Transactions on Computers, 42(10):1223–1241, October 1992.
[17] P. Dowd, K. Bogineni, K. Aly, and J. Perreault. Hierarchical scalable photonic architectures for high-performance processor interconnection. IEEE Transactions on Computers, 42(9):1105–1120, September 1993.
[18] D. Feitelson. Optical Computing: A survey for computer scientists. The MIT Press, 1988.
[19] Fujitsu. marketing brochure, 1996.
[20] H. Ghafouri-Shiraz and B. S. K. Lo. Distributed Feedback Laser Diodes. John Wiley and Sons, 1996.
[21] A. Ghatak and K. Thyagarajan. Optical Electronics. Cambridge University Press, 1989.
[22] I. Glesk and P. Prucnal. Demonstration of 250 Gb/s all-optical routing control of a photonic crossbar switch. In Tamir et al. [52], pages 25–33. Proceedings of the Fourth Weber Research Institute (WRI) International Symposium on Guided-Wave Optoelectronics: Device Characterization, Analysis, and Design held October 1994.
[23] P. E. Green, Jr. Fiber Optic Networks. Prentice Hall, 1993.
[24] Z. Guo, R. Melhem, R. Hall, D. Chiarulli, and S. Levitan. Pipelined communication in optically interconnected arrays. Journal of Parallel and Distributed Computing, 12:269–281, 1991.
[25] H. Hinton and T. Szymanski. Intelligent optical backplanes. In Proceedings of the Second International Workshop on Massively Parallel Processing Using Optical Interconnections, pages 133–143. IEEE, October 1995.
[26] Hitachi. marketing brochure, 1996.
[27] K. Hwang. Advanced Computer Architecture. McGraw-Hill, New York, NY, 1993.
[28] Special issue on lightwave systems and components. IEEE Communications Magazine, 27(10), October 1989.
[29] H. Kobrinski and K.-W. Cheung. Wavelength-tunable optical filters: Applications and technologies. IEEE Communications Magazine, 27(10):53–63, October 1989.
[30] A. V. Krishnamoorthy et al. The amoeba chip: An optoelectronic switch for multiprocessor networking using dense-WDM. In Proceedings of the Third International Conference on Massively Parallel Processing Using Optical Interconnections, pages 94–100. IEEE, October 1996.
[31] D. Lahaut and C. Germain. Static communications in parallel scientific programs. In PARLE '94 Parallel Architecture and Languages. IEEE, July 1994.
[32] W. B. Leigh, editor. Devices for Optoelectronics. Marcel Dekker, Inc., 1996.

[33] J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 2(3):361–375, 1991.
[34] Y. Li and T. Wang. Side-coupling polymer fiber optics for optical interconnects. In Optics in Computing [43], pages 255–257.
[35] L. Y. Lin, J. L. Shen, S. S. Lee, and M. C. Wu. Surface-micromachined micro-XYZ stages for free-space micro-optical bench. IEEE Photonics Technology Letters, 9(3):345–347, March 1997.
[36] G. Liu, K. Lee, and H. Jordan. n-dimensional processor arrays with optical dbuses. In Proceedings of the Second International Workshop on Massively Parallel Processing Using Optical Interconnections, pages 116–123. IEEE, October 1995.
[37] Y. Lyuu and E. Schenfeld. MICA, a mapped interconnection-cached architecture. Proc. of Frontiers of Massively Parallel Computation, pages 80–89, 1995.
[38] D. M. Marom, P. Shames, F. Xu, R. R. Rao, and Y. Fainman. Compact free-space multistage interconnection network demonstration. In Optics in Computing [43], pages 192–194.
[39] D. A. B. Miller. Optics for low-energy communication inside digital processors: quantum detectors, sources and modulators as efficient impedance converters. Optics Letters, 14(2):146–148, January 1989.
[40] B. Mukherjee. WDM-based local lightwave networks part I: Single-hop systems. IEEE Network, 6(3):12–27, May 1992.
[41] B. Mukherjee. WDM-based local lightwave networks part II: Multihop systems. IEEE Network, 6(4):20–31, July 1992.
[42] R. A. Nordin, A. F. J. Levi, R. N. Nottenburg, J. O'Gorman, T. Tanbun-Ek, and R. A. Logan. A systems perspective on digital interconnection technology. Journal of Lightwave Technology, 10(6):811–827, June 1992.
[43] Optics in Computing, volume 8 of OSA Technical Digest Series, Washington, DC, March 1997. Optical Society of America.
[44] H. Ozaktas. Comparison of fully three-dimensional optical, normally conducting and superconducting interconnections. In 11th International Parallel Processing Symposium, Workshop on Optics and Computer Science [1].
[45] J. Powers. An Introduction to Fiber Optic Systems. Irwin, second edition, 1997.
[46] P. Prucnal, I. Glesk, and J. Sokoloff. Demonstration of all-optical self-clocked demultiplexing of TDM data at 250 Gb/s. In Proceedings of the First International Workshop on Massively Parallel Processing Using Optical Interconnections, pages 106–117. IEEE, April 1994.
[47] C. Qiao and R. Melhem. Reconfiguration with time division multiplexing MINs for multiprocessor communications. IEEE Transactions on Parallel and Distributed Systems, 5(4):337–352, 1994.
[48] C. Salisbury, R. Melhem, and C. Qiao. Distributed path management in switched optical banyan networks. In Optics in Computing [43], pages 195–197.

[49] D. Spirit and M. O'Mahony, editors. High Capacity Optical Transmission Explained. John Wiley and Sons, 1995.
[50] C. W. Stirk and J. Neff. The cost of optical interconnects vs. MCMs. In Optics in Computing [43], pages 21–23.
[51] T. Szymanski and H. Hinton. Design of a terabit free-space photonic backplane for parallel computing. In Proceedings of the Second International Workshop on Massively Parallel Processing Using Optical Interconnections, pages 16–27. IEEE, October 1995.
[52] T. Tamir, G. Griffel, and H. Bertoni, editors. Guided-Wave Optoelectronics. Plenum Press, 1995. Proceedings of the Fourth Weber Research Institute (WRI) International Symposium on Guided-Wave Optoelectronics: Device Characterization, Analysis, and Design held October 1994.
[53] R. Thompson. The dilated slipped banyan switching network architecture for use in an all optical local area network. IEEE Journal of Lightwave Technology, 9(12):1780–1787, December 1991.
[54] F. A. P. Tooley. Challenges in optically interconnecting electronics. IEEE Journal of Selected Topics in Quantum Electronics, 2(1):3–13, April 1996.
[55] M. Welsh, A. Basu, and T. von Eicken. ATM and fast Ethernet network interfaces for user-level communication. In Proceedings of the 3rd International Symposium on High Performance Computer Architecture, HPCA '97, pages 332–342. IEEE, February 1997.
[56] R. Weverka, K. Wagner, R. McLeod, K. Wu, and C. Garvin. Low-loss acousto-optic photonic switch. In Berg and Pellegrino [4], pages 479–573.
[57] K. A. Williams, T. Q. Dam, and D. H.-C. Du. A media-access protocol for time- and wavelength-division multiplexed passive star networks. IEEE Journal on Selected Areas in Communications, 11(4):560–567, May 1993.
[58] X. Yuan, R. Melhem, and R. Gupta. Compiled communication for all-optical TDM networks. In Supercomputing '96. IEEE, November 1996.
[59] J. Zhou, J. He, and M. Cada. Optimal design of combined distributed-feedback/Fabry-Perot structures for vertical cavity surface emitting semiconductor lasers. In Tamir et al. [52], pages 75–81. Proceedings of the Fourth Weber Research Institute (WRI) International Symposium on Guided-Wave Optoelectronics: Device Characterization, Analysis, and Design held October 1994.
