New Communication Network Protocol for a Data Acquisition System

Tomohisa Uchida, Hirofumi Fujii, Yasushi Nagasaka, and Manobu Tanaka

Abstract— Event builders based on a communication network (e.g., IEEE 802.3 (Ethernet), asynchronous transfer mode (ATM), etc.) have been used in high energy physics experiments. In particular, Ethernet with the TCP/IP protocol suite is widely used because its infrastructure is very cost effective. However, in an event builder, every event fragment heads for the same destination at the same time, so data flow control is required to avoid congestion, and it is difficult to achieve good transfer efficiency with the TCP/IP protocol suite. In order to solve this problem, we have developed a simple network protocol encapsulated in the Ethernet frame. The protocol is designed to keep the senders transmitting in sequence so that no congestion occurs. We have implemented the protocol on a small hardware device (a field programmable gate array (FPGA)) and measured its performance so that it will be ready for general data acquisition (DAQ) systems.

I. INTRODUCTION

Ethernet based DAQ systems for event builders are widely used in high energy physics experiments [1] [2] [3] because of their high performance and low price [2]. Fig. 1 shows a schematic diagram of a typical Ethernet based DAQ system, which consists of N senders (front-end devices), an Ethernet switch and a receiver (event processor). The receiver collects event fragment data from the senders via the switch.

Fig. 1. Typical Ethernet based DAQ system: senders 0 through N connected to the receiver through a network switch; data flows from the senders to the receiver and control flows in the opposite direction.

T. Uchida is with The Graduate University for Advanced Studies, 1-1 Oho, Tsukuba-shi, Ibaraki, Japan (e-mail: [email protected]). H. Fujii is with the High Energy Accelerator Research Organization (KEK), 1-1 Oho, Tsukuba-shi, Ibaraki, Japan. Y. Nagasaka is with the Hiroshima Institute of Technology, 2-1-1 Miyake, Saeki-ku, Hiroshima-shi, Hiroshima, Japan. M. Tanaka is with the High Energy Accelerator Research Organization (KEK), 1-1 Oho, Tsukuba-shi, Ibaraki, Japan.

In many DAQ systems, the number of event processors is much smaller than the number of front-end devices, so most of the data from the front-end devices head for the same event processor. The TCP/IP protocol suite [4] was developed for communication between two terminals and is not designed for aggregating two or more data sources. In an Ethernet based DAQ system, congestion therefore occurs at the switches, causing packet losses and large latency. This aggregating behavior depends on the switches and is too complex to predict. Quality of service (QoS) in an IP network is one solution to avoid congestion. However, QoS is an additional protocol, so special network devices that support it are required, and the performance depends on these devices [5] [6]. TCP/IP and QoS require complex processing, while many front-end devices have physical constraints such as board size and power consumption. Therefore we have developed a new protocol, named the data collection protocol (DCP), which is suitable for a small hardware implementation.

II. DATA COLLECTION PROTOCOL

We designed DCP to provide congestion avoidance, fair data transfer among senders, reliable data delivery, a small hardware implementation and predictable performance. In order to realize these functions, we introduce a token passing mechanism and a simple re-transmission mechanism, both designed to be as simple as possible for a small hardware implementation. Ethernet switches have many functions, but many of their behaviors are too complex to predict. In order to avoid these complexities, we use only the packet switching function of the switches. An Ethernet switch that has only the packet switching function is known as an Ethernet switching hub (HUB), which is usually used in a small office or a home to construct a small local area network (LAN). If we can use a HUB, it helps in constructing a cost-effective DAQ system. To use a HUB, we must assume that packets are discarded if two or more senders transmit data to the receiver simultaneously; since DCP avoids congestion, these discards do not occur.


A. Frame structure

DCP is encapsulated in an Ethernet frame. Fig. 2 shows the Ethernet frame structure. In the figure, the MAC header includes the preamble, the start frame delimiter, the address fields and the length/type field. The DCP header length is 16 bytes and consists of a packet type and parameters; the meaning of the parameters depends on the packet type.

Fig. 2. Ethernet frame structure: MAC header (22 octets), payload (46 to 1500 octets) and frame check sequence (FCS, 4 octets); the payload carries the DCP header (16 octets) and the DCP payload (30 to 1484 octets).
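The paper gives only the DCP header length (16 bytes) and states that it holds a packet type plus type-dependent parameters. The C sketch below is an assumed, illustrative layout, not the actual DCP wire format; the parameter field names (ring_id, length, seq_number) are hypothetical and are based on the quantities mentioned in sections II.B and II.C.

#include <stdint.h>

/* Packet kinds used by DCP (token, data, acknowledgment, re-transmission
   request), as described in sections II.B and II.C.  The numeric values
   are assumptions for illustration. */
enum dcp_type { DCP_TOKEN, DCP_DATA, DCP_ACK, DCP_RETRANS };

/* Assumed 16-byte DCP header carried in the Ethernet payload.  Only the
   total length and the "type + parameters" structure come from the paper. */
struct dcp_header {
    uint8_t  type;        /* one of enum dcp_type                      */
    uint8_t  ring_id;     /* assumed: logical token-ring number        */
    uint16_t length;      /* assumed: DCP payload length in octets     */
    uint32_t seq_number;  /* assumed: start sequence number (SN)       */
    uint64_t reserved;    /* padding up to the 16-byte header length   */
} __attribute__((packed));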

B. Token passing mechanism

We introduce a token passing mechanism on Ethernet. The mechanism is similar to those of IEEE 802.5 (Token Ring) and the fiber distributed data interface (FDDI).

Fig. 3. Data transfer with a token passing mechanism: the receiver and senders 0 through 2 exchange token, data and ACK packets; time flows from top to bottom.

Fig. 3 shows the token passing mechanism in a DCP network. In a DCP network, all senders are logically connected as a ring; in the figure, senders 0 through 2 are connected in the ring. Transmission by the senders is controlled by a token, a special packet that circulates through the ring. The sender holding the token can transmit data; when it is done, it passes the token to the next sender in the ring. If a sender has no data to transmit, it simply passes the token to the next sender. We call this ring a logical token ring. Since our token ring is a logical one, it is possible to establish multiple logical token rings, which are identified by a token-ring number.
This function is useful for an event builder DAQ system. Because only one sender transmits to the receiver at any time, DCP avoids congestion. The token passing mechanism also makes the transmission opportunities of the senders fair, so the transfer latency is minimized.
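As an illustration of the sender-side behavior just described, here is a minimal C sketch of the token handling loop. The helper functions (receive_token, have_data, send_data_window, pass_token_to_next) are hypothetical stand-ins for the FPGA sender logic and are stubbed so that the example compiles; they are not part of the paper.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helpers standing in for the FPGA sender logic. */
static bool receive_token(void)       { return true;   } /* wait for the token packet     */
static bool have_data(void)           { return false;  } /* event fragment ready to send? */
static void send_data_window(void)    { puts("data");  } /* transmit up to one window     */
static void pass_token_to_next(void)  { puts("token"); } /* forward the token on the ring */

/* Sender behavior in the logical token ring: only the token holder
   transmits, then the token is passed to the next sender. */
int main(void) {
    for (int cycle = 0; cycle < 3; cycle++) {
        if (!receive_token())
            continue;                 /* not our turn yet                  */
        if (have_data())
            send_data_window();       /* transmit while holding the token  */
        pass_token_to_next();         /* give the next sender its turn     */
    }
    return 0;
}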

C. Reliable data delivery

Packet losses may occur due to FCS errors or buffer overflow at the receiver. In order to deliver data reliably, we introduce simple sliding window and re-transmission mechanisms. In the sliding window mechanism, DCP uses a sequence number (SN) and an acknowledgment (ACK). Conceptually, each octet of data is assigned an SN. Senders transmit data to the receiver with the start SN and the data length. This mechanism enables a sender to transmit multiple bytes or packets before waiting for an acknowledgment. A window is the number of data bytes that the sender is allowed to send before waiting for an acknowledgment; the initial window sizes are indicated at setup. In a sliding-window operation, for example, the sender might have bytes numbered 1 to 100 to send to a receiver, with a window size of 50. The sender would place a window around the first 50 bytes, transmit them together and wait for an acknowledgment. The receiver would respond with an ACK carrying SN = 51, indicating that it has received bytes 1 to 50 and is expecting byte 51 next. The sender would then slide the window forward by 50 bytes and transmit bytes 51 to 100. The receiver would respond with an ACK carrying SN = 101, indicating that it is expecting byte 101 next. In the re-transmission mechanism, DCP uses a re-transmission request (RETRANS). The receiver compares the start SN with the expected SN for every data packet received. If these values are equal, the receiver sends the sender an ACK packet. If they are not equal, the receiver sends the sender a RETRANS with the expected start SN, and the sender re-transmits data starting from the requested SN.
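The receiver-side decision between ACK and RETRANS can be summarized in a few lines of C. This is a minimal sketch of the comparison described above; the structures and field names are illustrative, not the DCP packet format.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical packet descriptors; the field names are illustrative only. */
struct dcp_data  { uint32_t start_sn; uint32_t length; };
struct dcp_reply { int is_retrans; uint32_t sn; };

/* Receiver-side check from section II.C: acknowledge in-order data,
   otherwise request re-transmission from the expected sequence number. */
static struct dcp_reply on_data(uint32_t *expected_sn, const struct dcp_data *d) {
    struct dcp_reply r;
    if (d->start_sn == *expected_sn) {
        *expected_sn += d->length;   /* accept the data                 */
        r.is_retrans = 0;            /* ACK with the next expected SN   */
    } else {
        r.is_retrans = 1;            /* RETRANS with the expected SN    */
    }
    r.sn = *expected_sn;
    return r;
}

int main(void) {
    uint32_t expected = 1;
    struct dcp_data first = { 1, 50 };    /* bytes 1..50  -> ACK SN=51      */
    struct dcp_data wrong = { 101, 50 };  /* out of order -> RETRANS SN=51  */
    struct dcp_reply a = on_data(&expected, &first);
    struct dcp_reply b = on_data(&expected, &wrong);
    printf("%s %u, %s %u\n", a.is_retrans ? "RETRANS" : "ACK", (unsigned)a.sn,
                             b.is_retrans ? "RETRANS" : "ACK", (unsigned)b.sn);
    return 0;
}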

III. IMPLEMENTATION
DCP is simple enough to implement on a small hardware component such as an FPGA or a programmable logic device (PLD).

Fig. 4. Block diagram of the hardware prototype: a sender FPGA card and a NIC connected through the PCI bus, with a 100BASE-T Ethernet link.

We implemented a sender on an FPGA together with a general Ethernet controller (EC) chip. Many general EC chips have a PCI-bus interface, so we constructed the hardware prototype on the PCI bus of a PC motherboard.


Fig. 4 shows the block diagram of this prototype. The hardware prototype system consists of an FPGA card and general network interface cards (NICs). Each NIC contains a single EC chip, a REALTEK RTL8139D with a 100BASE-T interface, and the FPGA card carries one FPGA. The prototype processes a received packet as follows. First, the packet is received by the EC on the NIC. Next, the direct memory access controller (DMAC) in the EC transfers the packet data from the EC to the FPGA via the PCI bus. Finally, the FPGA processes the packet data. The prototype transmits a packet as follows. First, the FPGA requests the DMAC to transfer the packet data. Next, the DMAC transfers the packet data from the FPGA to the EC via the PCI bus. Finally, the EC transmits the packet to the Ethernet network. Fig. 5 shows the block diagram of the FPGA. The FPGA consists of three kinds of functional blocks: a PCI target functional block, a PCI master functional block and three DCP sender functional blocks. The PCI target function is used for packet data transfer between the NICs and the FPGA. The PCI master function is used to inform the ECs of the DMA status. The DCP sender function processes DCP. Since one sender uses 25% of the logic elements, a smaller FPGA can be used when only one sender is implemented. A DCP implementation with Gigabit Ethernet is not difficult because the FPGA performance is sufficient for it.

Fig. 5. Block diagram of the FPGA (ALTERA EP1S10780C7ES): a PCI target function, a PCI master function and a DCP sender function (RX, DCP processor, TX) connected to the PCI bus.

IV. MEASUREMENT

We measured the DCP system performance. The test bed for this measurement is shown in Fig. 6; it consists of three senders and a receiver. In order to measure the performance, we implemented a receiver on the Linux operating system (OS). The software runs in user space and is written using standard functions of the OS (e.g., the socket() functions). Each sender transfers 100 Mbytes of data to the receiver as fast as possible, and these data are aggregated by the receiver via the HUB.
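The receiver program itself is not listed in the paper. The following is a minimal sketch of how such a user-space receiver might read DCP frames on Linux with a packet socket; the EtherType value 0x88B5 is only a placeholder, since the paper does not specify the value used for DCP.

#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Placeholder EtherType for DCP frames; the real value is not given in the paper. */
#define ETH_P_DCP 0x88B5

int main(void) {
    /* A packet socket delivers whole Ethernet frames to user space, which is
       needed because DCP is encapsulated directly in Ethernet frames. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_DCP));
    if (fd < 0) { perror("socket"); return 1; }

    unsigned char frame[1518];
    for (;;) {
        ssize_t len = recv(fd, frame, sizeof(frame), 0);
        if (len < 0) { perror("recv"); break; }
        /* frame[0..13]: destination/source MAC and EtherType as seen by the
           packet socket; frame[14..29]: 16-byte DCP header; rest: DCP payload. */
        printf("received %zd-byte DCP frame\n", len);
    }
    close(fd);
    return 0;
}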

We used two different HUBs, whose specifications are summarized in Table I; there is no large difference between them.

TABLE I
SUMMARY OF HUB SPECIFICATIONS

                              HUB-A                   HUB-B
  Model                       BUFFALO LSW10/100-5P    COREGA FSW-8A
  Frame switching mechanism   Store and forward       Store and forward
  Number of ports             5 (RJ45)                8 (RJ45)
  Packet buffer size          128 KByte               128 KByte

In Fig. 7 through Fig. 11, the points where the transfer rate decreases sharply are due to the Linux OS; this behavior is not observed on the FPGA card.

A. Fair transfer

We designed DCP to transfer data fairly among senders with the token passing mechanism. In order to observe this fair data transfer, we measured the transfer rate variation of each sender. Because of this fair data transfer, we expected all transfer rates from the senders to the receiver to have nearly equal bandwidth occupancy for user data. Fig. 7 shows the results of the two-sender system with HUB-A. All senders transfer data fairly and each transfer rate is about 46% bandwidth occupancy for user data.

B. Scalability

We measured the DCP network scalability in the two- and three-sender systems. The total bandwidth occupancy for user data is 92%, which is calculated from the free token round trip time. We therefore expected values of about 46% for the two-sender system and about 30% for the three-sender system. Fig. 7 shows the results of the two-sender system with HUB-A: all senders transfer data fairly and each transfer rate is about 46% bandwidth occupancy for user data, which is the expected value. Fig. 8 shows the results of the three-sender system with HUB-A: all senders transfer data fairly and each transfer rate is about 30% bandwidth occupancy for user data, which is the expected value.

C. Independence of HUBs

Next we measured the dependence on different HUBs. DCP has no strong dependence on HUBs. Fig. 9 shows the results of the three-sender system with HUB-B; comparing Fig. 8 and Fig. 9, we cannot find any dependence on the HUBs.
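As a minimal check of the scalability expectation in subsection B above, the expected per-sender occupancy is simply the 92% total user-data occupancy divided by the number of senders:

#include <stdio.h>

int main(void) {
    /* Total user-data bandwidth occupancy, calculated in the paper from the
       free token round trip time: 92% of the 100BASE-T link. */
    const double total_occupancy = 0.92;

    /* With fair token passing, each of the N senders is expected to get an
       equal share of the total occupancy. */
    for (int n = 2; n <= 3; n++) {
        printf("%d senders: about %.0f%% per sender\n",
               n, 100.0 * total_occupancy / n);
    }
    return 0;  /* prints about 46% for 2 senders and about 31% (~30%) for 3 */
}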


Fig. 6. Schematic diagram of the test bed for performance measurements: senders 0 through 2 connected to the receiver through HUB-A or HUB-B (data flows from the senders to the receiver, control in the opposite direction).

Fig. 7. DCP transfer rate variations in the two-sender system with HUB-A.

Fig. 8. DCP transfer rate variations in the three-sender system with HUB-A.

Fig. 9. DCP transfer rate variations in the three-sender system with HUB-B.

Fig. 10. TCP transfer rate variations with HUB-A.

Fig. 11. TCP transfer rate variations with HUB-B.


V. COMPARISON TO TCP

For comparison with TCP, we measured data transfer with aggregating data flows in the same way as in the DCP performance measurements.

A. Measurement

We wrote programs on Linux for measuring TCP using the socket() functions. Before these measurements, we confirmed that the average transfer rate of each connection is above 92% bandwidth occupancy for user data in a single connection.

TABLE II
PC SPECIFICATIONS

             CPU              RAM    Linux distribution
  Sender 0   Celeron 1.3GHz   128M   Fedora Core 2
  Sender 1   Celeron 2.6GHz   256M   Fedora Core 2
  Sender 2   Celeron 2.0GHz   512M   KNOPPIX 3.4
  Receiver   Celeron 1.0GHz   256M   Vine Linux 2.6

The system setup of this measurement is the same as in Fig. 6 in the previous section, and Table II summarizes the specifications of the PCs in the system. Each sender transfers 100 Mbytes of data to the receiver as fast as possible, and these data are aggregated by the receiver via the HUB. Fig. 10 shows the results obtained with HUB-A and Fig. 11 shows the results obtained with HUB-B. All transfer rate variations are very large and the transfer behaviors differ from each other. These behaviors are too complex to predict the performance, and the results show that the TCP transfer behavior strongly depends on the characteristics of the network devices. The main reason for this complicated behavior is the re-transmission mechanism of TCP. Since TCP is a best-effort transfer, all senders transmit data independently and simultaneously; packet losses then occur, and these losses induce re-transmissions. As a consequence of these independent re-transmissions, the TCP behavior becomes very complex and its performance prediction is difficult. TCP certainly delivers data reliably, but its performance is not predictable when it aggregates data flows by itself.

B. Comparison

Since the data transfer behavior of TCP is complex, its performance prediction is difficult. TCP shows a strong dependence even on switches that have the same specifications (see Table I). This result shows that the packet switching behavior depends on the internal structure of the switches, which makes it difficult to predict or evaluate the performance. In contrast to TCP, the data transfer behavior of DCP is quite simple. It has no strong dependence on switches, so we can predict the performance with independent, measurable parameters: the free token round trip time, the ACK round trip time, the sender performance and the receiver performance. Once we have measured these parameters, we can predict the behavior of the DCP data transfer.

Because the token passing mechanism keeps the data transfer fair among the senders, all transfer rate variations are the same. This fair transfer makes it possible to minimize the buffer capacities of the senders and the variation of the data transfer latency.
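The paper lists the parameters from which DCP performance can be predicted but does not give an explicit formula. The sketch below shows only one assumed way such a prediction could be organized; every number in it is an invented placeholder, not a measurement from the paper.

#include <stdio.h>

/* Illustrative (assumed) throughput estimate for one DCP sender: in each
 * token cycle the sender moves at most one window of data, and the cycle
 * time is bounded by the token and ACK round trips plus the slower of the
 * sender and receiver processing rates. */
int main(void) {
    const double window_bytes  = 32.0 * 1024.0; /* assumed window size              */
    const double token_rtt_s   = 200e-6;        /* assumed free token round trip    */
    const double ack_rtt_s     = 150e-6;        /* assumed ACK round trip           */
    const double endpoint_rate = 11.5e6;        /* assumed min(sender, receiver) B/s */

    double cycle_s   = token_rtt_s + ack_rtt_s + window_bytes / endpoint_rate;
    double predicted = window_bytes / cycle_s;

    printf("predicted user throughput per sender: %.2f MB/s\n", predicted / 1e6);
    return 0;
}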

VI. SUMMARY

We have developed a new data communication protocol for data acquisition, the data collection protocol (DCP). The protocol is simple and compact enough to implement on a small hardware device. It also provides small latency, reliable data delivery and predictable behavior. These results show that we can construct a cost-effective DAQ system.

VII. REFERENCES

[1] T. J. Pavel et al., "Network Performance Testing for the BaBar Event Builder," Proceedings of the CHEP'98 Conference, 1998.
[2] S. Stancu et al., "The use of Ethernet in the Data Flow of the ATLAS Trigger & DAQ," Proceedings of the CHEP'03 Conference, 2003.
[3] M. Nakao et al., "Switchless Event Building Farm for the BELLE Data Acquisition System," IEEE Trans. Nucl. Sci., vol. 48, pp. 2385-2390, 2001.
[4] W. R. Stevens, "TCP/IP Illustrated, Volume 1: The Protocols," Addison-Wesley, 1994.
[5] Y. Yasu et al., "Quality of Service on Gigabit Ethernet for Event Builder," The 3rd International Data Acquisition Workshop on Networked Data Acquisition Systems, Lyon, France, October 20, 2000.
[6] Y. Yasu et al., "Quality of Service on Linux for the ATLAS TDAQ Event Building Network," Proceedings of the CHEP 2001 Conference.
