Virtual-Memory-Mapped Network Interfaces
In today's multicomputers, software overhead dominates the message-passing latency cost. We designed two multicomputer network interfaces that significantly reduce this overhead. Both support virtual-memory-mapped communication, allowing user processes to communicate without expensive buffer management and without making system calls across the protection boundary separating user processes from the operating system kernel. Here we compare the two interfaces and discuss the performance trade-offs between them.

Matthias A. Blumrich, Cezary Dubnicki, Edward W. Felten, Kai Li, and Malena R. Mesarina

Princeton University
In Princeton's SHRIMP (Scalable High-Performance Really Inexpensive Multiprocessor) project, we are working to develop high-performance communication mechanisms that will integrate commodity desktop computers such as PCs and workstations into inexpensive, high-performance multicomputers. Our primary performance metrics are the end-to-end latency and bandwidth available to user processes. Our goal is to provide a low-latency, high-bandwidth communication mechanism whose performance is competitive with or better than that of mechanisms used in specially designed multicomputers.

The network interfaces of existing multicomputers and workstation networks require a significant amount of software overhead at the operating system and user levels to provide protection, buffer management, and message-passing protocols. In fact, message-passing primitives on many multicomputers, such as the csend/crecv of Intel's NX/2 [1], often execute more than 1,000 instructions to send and receive a message. By comparison, the hardware overhead of data transfer is negligible. For example, sending and receiving a message on Intel's Delta multicomputer requires 67 µs, of which less than 1 µs is due to time on the wire [2]. Other recent multicomputers, such as Intel's Paragon [3], Meiko's CS-2, and TMC's CM-5 [4], have lower message-passing latencies than Delta. These designs treat communication as a service of the operating system.

The challenge in designing network interfaces is to provide appropriate hardware support to achieve minimal software message-passing overhead, to accommodate multiprogramming under a variety of scheduling policies without sacrificing protection, and to overlap communication with computation.

As the first step of our research, we developed an idea we call virtual-memory-mapped communication. This approach allows programs to pass messages directly between user processes without crossing the protection boundary to the operating system kernel, thus reducing software message-passing overhead significantly. Implementation of this approach requires network interface support. We designed two network interfaces for the SHRIMP multicomputer, which uses Pentium PCs and an Intel Paragon routing network.

Our first design makes minimal modifications to the traditional DMA-based network interface design, while implementing virtual-memory mapping in software. The design requires a system call to initiate outgoing data transfer, but its virtual-memory-mapped communication can reduce the send latency overhead by as much as 78 percent. The interface transfers received messages directly to memory, typically reducing the receive software overhead to only a few instructions. Our second design implements virtual-memory mapping completely in hardware [5]. This approach provides fully protected, user-level message passing, and it allows user programs to initiate an outgoing block data transfer with a single memory store instruction.



Virtual-memory-mapped communication

Figure 1 illustrates the basic idea of virtual-memory-mapped communication: Applications create a mapping between two virtual-memory address spaces over the network. That is, the user maps a piece of the sender's virtual memory to an equal-size piece of the receiver's virtual memory across the network. The mapping operation requires a system call to provide protection between users and processes in a multiprogrammed environment. But once the mapping is established, the sending and receiving processes can use the mapped memory as send and receive buffers, and can communicate without kernel involvement.

Figure 1. Virtual-memory mapping.

Virtual-memory-mapped communication has several advantages over traditional, kernel-dispatch-based message passing. One is that virtual-memory-mapped communication incurs low overhead, since data can move between user processes without context switching and message dispatching. Another advantage is that virtual-memory-mapped communication moves memory buffer management to the user level. Applications or libraries can manage their communication buffers directly, without the expensive overhead of the unnecessary context switches and protection boundary crossings that are commonly used. Recent studies indicate that moving communication buffer management out of the kernel to the user level can greatly reduce the software overhead of message passing. The use of a compiled, application-tailored runtime library can improve the latency of multicomputer message passing by about 30 percent [6].

In addition, virtual-memory-mapped communication takes advantage of the protection provided by virtual-memory systems. Since mappings are established at the virtual-memory level, virtual-address translation hardware guarantees that an application can use only mappings created by itself. This eliminates the per-message software protection checking of traditional message-passing implementations.

Several ways of implementing virtual-memory mapping are possible. To achieve a simple, low-cost design, we investigated various combinations of hardware and software. Our results are the SHRIMP-I network interface, designed to provide minimal hardware support, and the SHRIMP-II network interface, intended to provide as much hardware support as needed to minimize communication latency.


SHRIMP-I network interface

Our design goal for the SHRIMP-I network interface was to start with a traditional, DMA-based network interface and add the minimal hardware support needed for implementing virtual-memory-mapped communication. The resulting network interface supports the DMA-based model and optionally implements virtual-memory-mapped communication with some software assistance.

Figure 2 shows a block diagram of the SHRIMP-I network interface data path. The card uses DMA transactions to interface between the EISA (Extended Industry Standard Architecture) bus of a Pentium PC and a network interface chip connected to an Intel Paragon routing network. DMA transactions are limited to the size of a memory page and cannot cross page boundaries, since pages are the unit of protection.

Figure 2. SHRIMP-I network interface data path.

The card provides control through a set of memory-mapped registers, which device driver programs use to compose packets, initiate packet transfers, examine interface status, and set up receiving memory addresses. Incoming packets optionally can generate interrupts to the host processor. The arbiter controls sharing of the bidirectional data path to the network interface chip, giving incoming data priority over outgoing data.

The hardware supports physical-memory mapping for incoming data. That is, each packet carries a receive destination physical-memory address in its packet header, and the hardware automatically initiates a DMA transfer to this address upon packet arrival, without host CPU intervention.

The beginning of every packet carries a header of two 64-bit words. The first 64-bit word contains routing information for the Paragon network and is stripped by the network hardware. The second 64-bit word is the SHRIMP-I packet header, containing four fields: version, destination address, packet size, and action. The version field identifies the version of the network interface that generated the packet. The destination address specifies a physical base address on the destination machine to receive the packet's data. The packet size field specifies the number of 32-bit data words in the body of the packet. The action field tells the receiving network interface how to handle the packet.
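As a concrete sketch, the header word can be modeled by the C helper below. The bit positions and widths are our assumptions for illustration; the article specifies only that the four fields share a single 64-bit word.

    #include <stdint.h>

    /* Build the second 64-bit header word of a SHRIMP-I packet.
     * The bit layout is assumed; only the four fields and their
     * meanings come from the article. */
    static inline uint64_t shrimp1_header(uint64_t version,    /* interface version    */
                                          uint64_t action,     /* receiver handling    */
                                          uint64_t size_words, /* 32-bit words in body */
                                          uint64_t dest_addr)  /* physical base address */
    {
        return (version    << 60) |              /* 4 bits, assumed  */
               (action     << 56) |              /* 4 bits, assumed  */
               (size_words << 40) |              /* 16 bits, assumed */
               (dest_addr & ((1ull << 40) - 1)); /* 40 bits, assumed */
    }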

Writing a packet header to the send registers initiates a send operation. That starts the send state machine, which builds a network packet and transfers the data directly from memory via the network interface chip. When the packet arrives at the destination, its header is stored in the receive registers. By default, the packet's data is delivered to the physical memory indicated by the destination address field in the packet header. Optionally, the packet's action field can instruct the receiving logic to deliver the data to a physical address provided by the receiver (in a memory-mapped register). In addition, the action field can cause an interrupt to the receiving host processor immediately after packet delivery. An interrupt freezes the incoming data path until the host processor explicitly restarts it by writing to a special control register.

For software flexibility and debugging support, the user can program the receiving logic to override the actions indicated in the packet's action field. Specifically, the receive control register can ignore the action field and use the physical address from the special receive register as the destination address for the next incoming packet. The user can also program the receive control register to interrupt the CPU after every packet (or never to interrupt). Finally, the user can program the receive logic to freeze the incoming data path after each packet arrival, an action useful for debugging.

The SHRIMP-I network interface supports both traditional message passing and virtual-memory-mapped communication. In traditional message passing, the receiver provides the destination address, and an interrupt is raised upon message arrival (as indicated by action bits in the message header). This option allows the operating system kernel to manage memory buffers and dispatch messages.

Before virtual-memory-mapped communication can take place, a mapping operation to map a user-level send buffer to a user-level receive buffer is necessary. The mapping operation pins both buffers in physical memory. Once a mapping is established, we can use it to send messages without interrupting the receiving processor. That is, we can perform a receive operation entirely at the user level, without making a system call.

Virtual-memory-mapped communication is an optimization that reduces software message-passing overhead at the expense of additional mapping steps and increased consumption of physical memory caused by the pinning of send and receive buffers. If physical memory becomes scarce, one can always use traditional message passing with kernel-allocated memory buffers instead of virtual-memory-mapped communication.
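A minimal driver-level sketch of this register interface follows. The article describes memory-mapped send, receive, and control registers but not their addresses, order, or bit encodings, so the entire layout below is hypothetical.

    #include <stdint.h>

    /* Hypothetical memory-mapped register block for the SHRIMP-I card. */
    typedef struct {
        volatile uint64_t send_header;  /* writing here starts the send state machine */
        volatile uint64_t recv_header;  /* header of the most recently arrived packet */
        volatile uint64_t recv_dest;    /* receiver-supplied destination address      */
        volatile uint64_t recv_ctrl;    /* override action bits, interrupt policy     */
        volatile uint64_t restart;      /* write to unfreeze the incoming data path   */
    } shrimp1_regs;

    /* Assumed control bits for recv_ctrl. */
    #define RC_IGNORE_ACTION   (1u << 0)  /* use recv_dest, not the packet's field  */
    #define RC_INTR_EVERY_PKT  (1u << 1)  /* interrupt the CPU after every packet   */
    #define RC_FREEZE_ON_PKT   (1u << 2)  /* freeze incoming path after each packet */

    /* Initiate a send: a single store to the send registers; the card's
     * DMA engine then pulls the packet body directly from host memory. */
    static void shrimp1_send(shrimp1_regs *regs, uint64_t header)
    {
        regs->send_header = header;
    }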

SHRIMP-II network interface

For the SHRIMP-II network interface, we wanted to provide hardware support for protected, low-latency, user-level message passing to minimize software overhead.


The design shares the main idea of the SHRIMP-I network interface: supporting virtual-memory-mapped communication. The principal difference is that the SHRIMP-II network interface implements virtual-memory mapping in hardware, allowing programs to perform message passing completely at the user level with full protection.

Figure 3 shows the data path of the SHRIMP-II network interface, which connects to both the EISA bus and the Xpress memory extension connector of the Intel Xpress memory bus. The network interface uses the connection with the Xpress bus to "snoop" (detect) ordinary memory write transactions and to filter outgoing data destined for other nodes. It uses the connection with the EISA bus to transfer DMA bulk data between the local memory and the network.


Figure 3. SHRIMP-II network interface data path.

The key component that allows the SHRIMP-II network interface to support virtual-memory mapping in hardware is the network interface page table (NIPT). This table has an entry for each physical page of main memory. Each NIPT entry contains information about whether and how a page is mapped, specifies the destination node and the physical page number that the page is mapped to, and includes various control information for sending and receiving data.

The SHRIMP-II network interface supports two update strategies: automatic update and deliberate update. A user program selects an update strategy for the mapped-out pages at the time a mapping is created.
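A sketch of one NIPT entry appears below. The article lists what each entry records but not its exact layout, so the field names and widths are our assumptions.

    #include <stdint.h>

    /* One network interface page table (NIPT) entry, indexed directly
     * by local physical page number. Layout is illustrative. */
    typedef struct {
        uint32_t mapped_out  : 1;  /* page is mapped to a remote page         */
        uint32_t mapped_in   : 1;  /* page may receive incoming data          */
        uint32_t auto_update : 1;  /* 1 = automatic update, 0 = deliberate    */
        uint32_t dest_node   : 13; /* destination node in the routing network */
        uint32_t dest_page   : 16; /* physical page number on that node       */
    } nipt_entry;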

24

IEEE Micro

The mapping system call sets pages mapped for automatic update to be cached with a write-through strategy. To initiate an automatic update operation, the source process writes to mapped memory; this write takes place on the Xpress bus. It is convenient to think of the address of this write as a physical page number and an offset on that page. While the write is updating main memory, the network interface snoops it and directly indexes into the NIPT, using the page number, to obtain the mapping information. If the page is mapped out for automatic update, the network interface constructs a packet header using the destination and physical mapping information from the NIPT entry, along with the original offset from the write address. The written data is appended to this header, and the now-complete packet goes into the outgoing FIFO buffer. When it eventually reaches the head of the FIFO, the network interface chip injects it into the network.

When the packet arrives at the destination processor, the network interface chip puts it into the incoming FIFO buffer. Once the packet reaches the head of this FIFO, the interface again uses the page number to index into the NIPT and determines whether that page has been mapped in. If it has, the EISA DMA logic uses the destination address from the packet to transfer the data directly to main memory. The snooping architecture of the PC system ensures that the caches remain consistent with main memory during this transfer. Therefore, a SHRIMP system can use regular, cacheable DRAM as send and receive buffers for message passing without special hardware.

User programs select deliberate update to obtain the highest transfer bandwidth. Data written to a deliberate-update page does not automatically transfer to the destination node; it transfers only when the user-level application issues an explicit send command. The send command initiates an EISA DMA transfer to move data from memory to the outgoing FIFO and then to the network. Therefore, deliberate-update pages can be cached with a write-back strategy, but memory must be consistent with the cache at the time the send is initiated.

To allow an application to issue user-level commands that control some operations of the network interface without involving the kernel, we provide a mechanism called virtual-memory-mapped commands. The network interface decodes command memory, located in the node's physical-address space but not corresponding to actual RAM. References to command memory simply transmit information to or from the network interface at user level.

The current network interface supports one command memory space the same size as the actual physical memory and associates a unique command page with each page of physical memory. Since the two address spaces are linear and of equal size, the association is determined by simply adding or subtracting a fixed offset.
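Returning to the automatic-update path described above, the snoop-and-forward sequence can be summarized as a software model of what the hardware does on each write. This is our model, not the hardware's logic; it reuses the illustrative nipt_entry sketch from earlier, and enqueue_outgoing stands in for the outgoing FIFO.

    #include <stdint.h>

    #define PAGE_SHIFT 12                       /* 4-Kbyte pages assumed      */
    #define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

    extern nipt_entry nipt[];                   /* one entry per physical page */
    void enqueue_outgoing(unsigned node, uint64_t dest_addr, uint32_t data);

    /* Model of the snoop on one memory write: index the NIPT by page
     * number and, if the page is mapped out for automatic update, form
     * a packet for the corresponding remote address. */
    void snoop_write(uint32_t phys_addr, uint32_t data)
    {
        nipt_entry e = nipt[phys_addr >> PAGE_SHIFT];

        if (e.mapped_out && e.auto_update) {
            uint64_t dest = ((uint64_t)e.dest_page << PAGE_SHIFT)
                          | (phys_addr & PAGE_MASK);
            enqueue_outgoing(e.dest_node, dest, data);  /* into outgoing FIFO */
        }
        /* Otherwise the write only updates local memory. */
    }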

The operating system kernel gives a user-level process access to a command page by mapping that command page into the process's virtual-memory space. For example, if physical page p currently holds the contents of some virtual page of process X, the kernel can give X access to the command pages that control p. This allows X to tell the network interface how to operate with p directly from user level. If the kernel later decides to reallocate p to another process, it can revoke X's right to access the command pages corresponding to p.

The command memory mechanism uses physical-address space (but not physical memory) to achieve low-overhead control of the network interface. The amount of physical-address space it consumes is a small constant times that of the local physical memory. We currently use the command space to implement the send command for deliberate updates.
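The command-space arithmetic, and the deliberate-update send it enables, can be sketched as follows. The base constant is illustrative (the article says only that the offset is fixed); the store-of-size encoding comes from the send macro described under System software support below.

    #include <stdint.h>

    /* The command space mirrors physical memory at a fixed offset, so
     * the command address for a physical address is found by simple
     * arithmetic. CMD_SPACE_BASE is an assumed, illustrative value. */
    #define CMD_SPACE_BASE 0x40000000ull

    static inline volatile uint32_t *command_addr(uint64_t phys_addr)
    {
        return (volatile uint32_t *)(uintptr_t)(phys_addr + CMD_SPACE_BASE);
    }

    /* Deliberate-update send: a user-level store of the transfer size
     * to the command address of the send buffer tells the interface to
     * start the DMA from the matching physical page. */
    static inline void deliberate_send(uint64_t phys_addr, uint32_t size_words)
    {
        *command_addr(phys_addr) = size_words;  /* store decoded as a command */
    }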

System software support

An advantage of virtual-memory-mapped communication is the diversity of communication models it can support. We designed a simple communication model for supporting multicomputer programs; we do not describe models suitable for other application classes. For all models, virtual-memory-mapped communication for the network interface requires system calls to create mappings and primitives to send messages using mapped memory.

Our multicomputer interface uses two system calls for mapping creation: map-send and map-recv. They are similar to the NX/2 csend and crecv calls. The first is

mapid = map-send(node-id, process-id, bind-id, mode, sendbuf, size)

where node-id is the network address of the receiving node, process-id indicates the receiving process, and bind-id is a binding identifier (whose function is similar to the message type in the NX/2 send and receive primitives). Mode indicates whether the mapping should be an automatic or deliberate update (meaningless for the SHRIMP-I network interface, which does not support automatic update). Sendbuf is the starting address of the send buffer, and size is the number of words in the send buffer. This call is used on the sender's side to establish a mapping. It returns a mapid, which identifies this mapping for send operations. For SHRIMP-I, mapid is just an index into a kernel-level mapping table specific to the calling process. For SHRIMP-II, mapid is the virtual address in the command space corresponding to sendbuf.

The second system call is

map-recv(bind-id, recvbuf, size, ihandler)

Here, bind-id is the binding identifier that matches the mapping request made by the map-send call, recvbuf is the starting address of the receive buffer, and size is the number of words in the receive buffer. If ihandler is non-null, it specifies a user-level interrupt handler that will be called for every message received on this mapping. The map-recv call provides the mapping identified by bind-id with a receiving physical-memory address so that the sender's side can create a physical-memory mapping for the virtual-memory mapping.

The mapping calls pin the memory pages of both the send and receive buffers into physical memory to create a stable physical-memory mapping. This enables data transfers on both sending and receiving sides without CPU involvement on SHRIMP-II and with minimal sender-side overhead on SHRIMP-I.

Every mapping is unidirectional and asymmetric, from the source (sending buffer) to the destination (receiving buffer). A mapping can be established only if the receive buffer's size is the same as the send buffer's size. Mapid can be viewed as a handle that selects a mapping for a send operation. SHRIMP-I uses it to provide multiple and overlapped mappings for the same memory. SHRIMP-II uses it to calculate the base address in the command space.

For security, we must verify that the sending process has permission to transmit data to the receiving process. In our multicomputer programming model, only objects owned by processes belonging to the same task group can be mapped to each other. The operating system fully controls a process's membership in a given task group, so all processes within a task group trust each other. For example, processes cooperating on the execution of a given multicomputer program will usually belong to the same task group.

Both SHRIMP-I and SHRIMP-II support the following send operation:

send(mapid, send-offset, size)

For SHRIMP-I, this operation is a system call that builds a packet for each memory page. It simply looks up the mapid in the mapping table, finds the destination physical address, builds a packet header, and initiates the outgoing data transfer. This call returns immediately after the data is sent out to the network. For SHRIMP-II, the send operation is a deliberate-update macro. If the data to be sent resides within one page, this macro executes a user-level store of size to the address mapid+send-offset in the command space. The network interface decodes this write as a command to initiate the requested transfer from the corresponding physical-memory page. For a message spanning multiple pages, one store is issued for each page.

Since a destination object is allocated in user space, both SHRIMP-I and SHRIMP-II can deliver data directly to the user memory without a receive interrupt. The user process can observe the message delivery, for example, by polling a flag located at the end of the message buffer. Thus, it can implement user-level buffer management and avoid the overhead of kernel buffer management, message dispatching, interrupts, and receive system calls.
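Putting the primitives together, a hedged usage sketch follows. The C prototypes, constants, and flag convention are ours (the article gives the calls but not C types), and identifiers use underscores where the article prints hyphens.

    #include <stddef.h>

    #define WORDS   1024
    #define BIND_ID 7                  /* illustrative binding identifier */
    enum { DELIBERATE = 0, AUTOMATIC = 1 };   /* assumed mode encodings   */

    /* Prototypes following the article's descriptions; types assumed. */
    long map_send(int node_id, int process_id, int bind_id, int mode,
                  void *sendbuf, int size);
    int  map_recv(int bind_id, void *recvbuf, int size,
                  void (*ihandler)(void));
    void send(long mapid, int send_offset, int size);

    static int sendbuf[WORDS], recvbuf[WORDS];

    /* Sender: map the buffer once, then reuse the mapping per transfer. */
    void sender(int peer_node, int peer_process)
    {
        long mapid = map_send(peer_node, peer_process, BIND_ID,
                              DELIBERATE, sendbuf, WORDS);
        /* ... fill sendbuf, ending with a nonzero flag word ... */
        send(mapid, 0, WORDS);  /* SHRIMP-I: one system call per page;
                                   SHRIMP-II: a short user-level macro */
    }

    /* Receiver: no receive call and no interrupt; poll a flag placed
     * at the end of the message buffer, as the article suggests. */
    void receiver(void)
    {
        volatile int *flag = &recvbuf[WORDS - 1];

        map_recv(BIND_ID, recvbuf, WORDS, NULL);  /* NULL: no handler */
        while (*flag == 0)
            ;                     /* spin until the last word lands */
    }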


Cost and performance

Both the SHRIMP-I and SHRIMP-II network interfaces support traditional, DMA-based message passing, and provide the option of virtual-memory-mapped communication. This option can eliminate a large amount of software overhead in traditional message passing, such as buffer management and message dispatching.

Compared to traditional network interface designs, the SHRIMP-I interface only adds the destination physical address of a packet in its header, with receiving logic to deliver data accordingly. This simple change makes the network interface very flexible. Although the interface requires a system call to send a message, it requires no CPU involvement to receive and dispatch messages.

The SHRIMP-II network interface supports protected, user-level virtual-memory-mapped communication. Automatic-update mode allows a single store instruction to initiate a send with only the local write-buffer latency. Deliberate-update mode requires a few user-level instructions to send up to a page of data.

Table 1 shows the overhead components of message passing on three kinds of network interfaces: traditional, SHRIMP-I, and SHRIMP-II [5]. The SHRIMP-II network interface implements virtual-memory-mapping translation in hardware so that the send operation can proceed at the user level. Although the SHRIMP-I network interface requires a system call to send a message, it provides virtual-memory-mapped communication with very little additional hardware over the traditional design.

Table 1. Message-passing overhead for three kinds of network interface designs.

    Message-passing overhead          Traditional   SHRIMP-I   SHRIMP-II
    Sender software
      Send system call                     X            X
      Send argument processing             X            X
      Verify/allocate receive buffer       X
      Preparing packet descriptors         X            X
      Initiation of send                   X            X           X
    Hardware
      DMA data via I/O bus                 X            X           X
      Data transfer over network           X            X           X
      Data transfer via I/O bus            X            X           X
    Receiver software
      Interrupt service                    X         Optional    Optional
      Receive buffer management            X
      Message dispatch                     X
      Copy data to user space              X
      Receive system call                  X

We implemented the send operation for both the SHRIMP-I and SHRIMP-II network interfaces using Pentium-based PCs and compared the cost of our send with that of the csend/crecv primitives of NX/2. For passing a small message (less than 100 bytes), the software overhead of a send for the SHRIMP-I network interface is 117 instructions plus the system call overhead and, optionally, an interrupt. For SHRIMP-II, the overhead of a send implemented as a macro is only 15 user-level instructions, with no system call necessary. In contrast, the software overhead of a csend and a crecv in NX/2 is 483 instructions plus two system calls (csend and crecv) and an interrupt.

For passing a large message, the primitives for the SHRIMP-I network interface require only 26 additional instructions for each additional page transferred. For SHRIMP-II, this overhead is only eight user-level instructions. NX/2's csend and crecv require additional network transactions to allocate receive buffer space on the receiving side, and they must prepare data descriptors needed by the network interface to initiate a send.

The cost of mapping on both SHRIMP-I and SHRIMP-II is similar to that of passing a small message using csend and crecv in NX/2. For applications that have static communication patterns, the amortized overhead of creating a mapping can be negligible [5].

We should point out that the semantics of NX/2's csend/crecv primitives are richer than the virtual-memory-mapped communication supported by the SHRIMP interfaces. Our comparison shows that rich semantics often come with substantial overhead. Since both SHRIMP network interfaces support traditional message passing and virtual-memory-mapped communication, they allow user programs to optimize for common cases.

Related work

Traditional network interface design is based on DMA data transfer. Recent examples include the NCube [7] and iPSC/860 [8]. In this scheme an application sends messages by making operating system calls to initiate DMA data transfers. The network interface initiates an incoming DMA data transfer when a message arrives and interrupts the local processor when the transfer completes so that it can dispatch the arriving message.


The main disadvantage of traditional network interfaces is that message passing usually takes thousands of CPU cycles.

One solution to the problem of software overhead is to add a separate processor on every node just for message passing [9], [10]. Recent examples of this approach are the Intel Paragon and the Meiko CS-2, mentioned earlier. The basic idea is for the "compute" processor to communicate with the "message" processor either through mailboxes in shared memory or through closely coupled data paths. The compute and message processors can then work in parallel to overlap communication and computation. In addition, the message processor can poll the network device, eliminating interrupt overhead. This approach, however, does not eliminate the overhead of the software protocol on the message processor, which is still hundreds of CPU instructions. In addition, the node is complex and expensive to build.

Several projects have reduced communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [11]-[13]. Writing and reading these registers queues and dequeues data from the FIFOs. While this is efficient for fine-grained, low-latency communication, it requires the use of a nonstandard CPU and does not support the protection of multiple contexts in a multiprogramming environment.

An alternative approach employs memory-mapped network interface FIFOs [4]. In this scheme, the controller has no DMA capability. Instead, the host processor communicates with the network interface by reading or writing special memory locations that correspond to the FIFOs. This approach results in good latency for short messages. However, for longer messages the DMA-based controller is preferable because it uses the bus burst mode, which is much faster than processor-generated single-word transactions.

Among commercially available massively parallel processors, the machine with the lowest latency is the Cray T3D, which supports shared memory without caching. The T3D requires a large amount of custom hardware design, and it is not clear whether the overhead from sharing remote memories without caching degrades the performance of message-passing applications.

MINIMAL ADDITIONS TO THE TRADITIONAL network interface design can reduce software overhead by up to 78 percent. With more hardware support, the software overhead for sending a message can be reduced to a single user-level instruction. Although virtual-memory-mapped communication requires the use of map system calls, it can avoid a receive system call and a receive interrupt.

For multicomputer programs that exhibit static communication patterns (that is, transfers from a given send buffer to a fixed destination buffer), the net gain can be substantial.

Both the SHRIMP-I and SHRIMP-II network interfaces provide users with flexible functionality, including the specification of data delivery locations and optional receiver interrupt generation. In addition, a good part of both designs consists of programmable logic, allowing for experimentation with hardware protocols at various stages of data transfer.

We built a simulator for software development, and we are constructing 16-node prototypes of the SHRIMP-I and SHRIMP-II systems. We expect the SHRIMP-I system to be operational sometime during the second quarter of 1995. At that time we also expect to have an operational two-node SHRIMP-II system.

Acknowledgments

We thank Otto Anshus, Doug Clark, Liviu Iftode, and Jonathan Sandberg for numerous discussions of the SHRIMP-I and SHRIMP-II network interface designs. We also thank David DeWitt and Jeffrey Naughton for several discussions of the SHRIMP project, and Michael Carey for his innovative suggestion of using SHRIMP as our project name.

References

1. P. Pierce, "The NX/2 Operating System," Proc. Third Conf. Hypercube Concurrent Computers and Applications, ACM, New York, 1988, pp. 384-390.
2. R.J. Littlefield, "Characterizing and Tuning Communications Performance for Real Applications," presentation at the First Intel Delta Applications Workshop, Tech. Report CCSF-14-92, Calif. Inst. of Technology, Pasadena, Calif., 1992, pp. 179-190.
3. Paragon XP/S Product Overview, Intel Corp., Santa Clara, Calif., 1991.
4. C. Leiserson et al., "The Network Architecture of the Connection Machine CM-5," Proc. Fourth ACM Symp. Parallel Algorithms and Architectures, ACM, 1992, pp. 272-285.
5. M. Blumrich et al., "A Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer," Proc. 21st Int'l Symp. Computer Architecture, IEEE Computer Society Press, Los Alamitos, Calif., 1994, pp. 142-153.
6. E.W. Felten, Protocol Compilation: High-Performance Communication for Parallel Programs, PhD thesis, available as Tech. Report 93-09-09, Dept. of Computer Science and Engineering, Univ. of Washington, Seattle, 1993.
7. J. Palmer, "The NCube Family of High-Performance Parallel Computer Systems," Proc. Third Conf. Hypercube Concurrent Computers and Applications, 1988, pp. 845-851.


8. iPSC/860 Technical Reference Manual, Intel Corp., Santa Clara, Calif., 1991.
9. R. Nikhil, G. Papadopoulos, and Arvind, "*T: A Multithreaded Massively Parallel Architecture," Proc. 19th Int'l Symp. Computer Architecture, ACM, 1992, pp. 156-167.
10. J.-M. Hsu and P. Banerjee, "A Message-Passing Coprocessor for Distributed Memory Multicomputers," Proc. Supercomputing, CS Press, 1990, pp. 720-729.
11. S. Borkar et al., "Supporting Systolic and Memory Communication in iWarp," Proc. 17th Int'l Symp. Computer Architecture, CS Press, 1990, pp. 70-81.
12. D.S. Henry and C.F. Joerg, "A Tightly-Coupled Processor-Network Interface," Proc. Fifth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, ACM, 1992, pp. 111-122.
13. W.J. Dally, "The J-Machine System," Artificial Intelligence at MIT: Expanding Frontiers, P. Winston and S. Shellard, eds., MIT Press, Cambridge, Mass., 1990, pp. 558-580.

Matthias A. Blumrich is a PhD candidate at Princeton University, where he is implementing the network interface for SHRIMP-II. His research centers on high-performance computer systems, with a preference for hardware design and implementation. He has worked on a variety of R&D projects, both in academia and industry, ranging from multiprocessor architecture simulation to printed circuit board layout. Blumrich received his BS in electrical engineering from the State University of New York at Stony Brook and his MA from Princeton.


Edward W. Felten is an assistant professor in the Department of Computer Science at Princeton. His research interests include parallel and distributed systems, scientific computing, and operating systems. He received the BS in physics from California Institute of Technology and the PhD in computer science from the University of Washington. He is a member of the IEEE Computer Society.

Kai Li is an associate professor in Princeton's Department of Computer Science. His research interests are operating systems, computer architecture, fault tolerance, and parallel computing. He is an editor of the IEEE Transactions on Parallel and Distributed Systems and a member of the editorial board of the International Journal of Parallel Programming. He received his PhD from Yale University and is a member of the IEEE Computer Society.

Malena R. Mesarina, a technical staff member in Princeton’s Department of Computer Science, is participating in the hardware design of a parallel computer network. Previously, she worked on microprocessor embedded systems as a hardware design engineer at Mayflower Communications, a satellite communications company in Massachusetts. Mesarina holds a bachelor’s degree in electrical engineering from Boston University and is a member of the IEEE and the Society of Women Engineers.

Cezary Dubnicki, a research staff member at Princeton University, designs and implements systems software for SHRIMP. His research interest is architectural and systems support for parallel and distributed programming, including network interface design, high-performance message passing, and distributed shared memory. He received an MS in computer science from Warsaw University and a PhD in computer science from the University of Rochester.

Direct questions to the authors at Princeton University, Dept. of Computer Science, 35 Olden St., Princeton, NJ 08544; [email protected].

