Investigations on fault tolerant clock synchronization within a powerline communication structure

Georg Gaderer1, Patrick Loschmidt1, Thilo Sauter1, Gerd Bumiller2

1 Research Unit for Integrated Sensor Systems, Austrian Academy of Sciences, Viktor Kaplanstrasse 2, A-2700 Wiener Neustadt, {Georg.Gaderer, Patrick.Loschmidt, Thilo.Sauter}@OEAW.ac.at
2 iAd Ges.m.b.H, Grosshabersdorf, Germany, [email protected]

Abstract— In modern powerline communication (PLC) systems, clock synchronization is a crucial issue. First, the PLC network itself needs synchronized clocks to control the time-sliced communication. Second, backbone networks and access points have to be coordinated in a fault tolerant fashion in order to ensure fast log-on and log-off of nodes travelling from one access point to another. This paper presents an approach to synchronizing clocks in such a system with special support for IEEE 1588 compliant master groups. In the lower levels of the hierarchical system, attention has to be paid to the special behaviour of the PLC network. To tackle this, a methodology for using the IEEE 1588 format and protocol stack is presented. Finally, measurements of the clock quality are analysed for both Ethernet and PLC by evaluating the Allan deviation.

1-4244-0113-5/06/$20.00 © 2006 IEEE.

I. INTRODUCTION

In recent years, the emerging technology of powerline communication overcame one of the most cost-intensive drawbacks of conventional data transmission, the expensive channel infrastructure. State-of-the-art broadband powerline technology can only compete with last-mile technology, and it currently seems that wireless communication has strategic advantages due to its widespread usage. This mainly results from the reduced costs in the mass market for wireless shared media transmission technology.

Another important market issue is the situation of powerline communication in medium area networks. In this case the hierarchical structure of the powerline cabling, from the high voltage transformer station over the medium voltage transformer to the customer, supports the data distribution flow, especially for control and metering applications, which typically use master-slave in combination with one-to-many communication schemes at low data rates. It is also known that the channel disturbances decrease dramatically with the powerline voltage level. The ongoing enhancements lead to changes in the market, which is now observing the possibilities of this type of communication in medium area networks.

Apart from obvious applications for synchronized clocks such as data timestamping, it is shown in [1] that distributing tasks in this kind of network requires local clocks at every node. With the introduction of synchronized clocks, the possibilities of such networks are enhanced by the better implicit coordination between the nodes. Besides that, there are additional applications like logging and data collecting tasks. In these cases not the delivery time but the sample time of the data is essential and benefits from synchronized nodes. Examples of such applications are fault detection in energy supply lines or volume balancing in water distribution grids. The latter helps water suppliers to detect leakage and theft: by taking snapshots of water meters at predefined times a balance can be drawn, and by comparing it with the amount of water fed in, manipulation of the supply network can be detected.

Moreover, synchronized clocks are not only a middleware service for measurement and distributed computing; they are additionally needed for the powerline itself as a frameclock supply. Yet, long term errors of clocks can be cancelled out, since synchronization is usually done via adaptation of the clock rate. Through the distributed character of the clock supply node and the control servo for the rate, this structure is in fact a distributed PLL. Consequently the whole network is able to run a TDMA (Time Division Multiple Access) based communication scheme relying only on the nodes' internal clocks. Further, the common time base allows switching power supply lines, resulting in a new hierarchy, without the need for resynchronization. Also, the introduction of powerline synchronized clocks in low bandwidth, large area PLC networks consequently has the advantage of synchronizing whole cities with high accuracy at costs not achievable with conventional approaches. Thus, there is no need for external synchronization (e.g. GPS, DCF77) in every node, but (in the ideal case) only in the master of the whole network.

Addressing the above mentioned application fields and considering the properties of time as a communication variable, fault tolerance has to be included in this kind of real-time network.
To ensure a highly stable reference time, the clock values are distributed from reference nodes to clients. If, for architectural reasons, all reference nodes are reduced to one single clock node in the system, this node is a typical single point of failure. Even if a second reference node is provided on a hot standby basis, the system is out of sync during the time-consuming switch-over to the new clock reference. Ideally the loss of one master is not even noticeable for the clients. Besides showing the measures for a fault tolerant clock synchronization, the further aim of this paper is to present the first results of clock synchronization in the REMPLI network.

The remainder of this paper is structured as follows: After a short survey on the quality of clocks and the state of the art in clock synchronization, Section IV discusses the boundary conditions for clock synchronization in powerline networks. Finally a solution is proposed in Section V and evaluated on an experimental basis in Section VI. A conclusion rounds off the paper and discusses possibilities for further work.

II. QUALITY OF CLOCKS

Describing the quality of a clock is rather straightforward; for example, the influence of non-ideal oscillators can be observed directly. This section gives some short definitions of the quantities used later on. A clock $p$ runs with the accuracy $\alpha$ if the deviation of its value $C_p(t)$ from the real time $t$ is at most $\alpha$ at every moment of the observation period $T$, which means

$$ |C_p(t) - t| \le \alpha. \qquad (1) $$

The precision $\pi$ is defined as the maximum deviation between two clocks $p$ and $q$, which satisfies the condition

$$ |C_p(t) - C_q(t)| \le \pi. \qquad (2) $$
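As an illustration of definitions (1) and (2), the following minimal sketch computes the smallest bounds that hold over a set of sampled instants. The function names and the clock readings are hypothetical and chosen only to show that two clocks can agree well with each other (good precision) while both deviate from real time (poor accuracy).

```python
from typing import Sequence


def accuracy(clock: Sequence[float], true_time: Sequence[float]) -> float:
    """Smallest alpha that satisfies (1) at all sampled instants."""
    return max(abs(c - t) for c, t in zip(clock, true_time))


def precision(clock_p: Sequence[float], clock_q: Sequence[float]) -> float:
    """Smallest pi that satisfies (2) at all sampled instants."""
    return max(abs(p - q) for p, q in zip(clock_p, clock_q))


# Hypothetical readings (seconds) of two clocks at real times 0, 1, 2, 3.
t = [0.0, 1.0, 2.0, 3.0]
cp = [0.5, 1.5, 2.5, 3.5]      # clock p: constant offset of +0.5 s
cq = [0.25, 1.25, 2.25, 3.25]  # clock q: constant offset of +0.25 s

alpha_p = accuracy(cp, t)    # deviation of clock p from real time
pi_pq = precision(cp, cq)    # mutual deviation of clocks p and q
print(alpha_p, pi_pq)
```

In this example the mutual deviation of the two clocks is half the deviation of clock p from real time.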

A system which sets its clock control parameters in such a way that the precision $\pi$ (the difference of the clocks with respect to each other) is kept as low as possible is said to perform internal clock synchronization. In contrast, a system which also tries to minimize the difference to a worldwide timescale such as TAI (International Atomic Time) or GPS time performs external clock synchronization.

The Allan deviation is used to estimate the stability of a clock. The introduction of this variance is necessary, since the classical variance diverges for random walk noise. The Allan deviation converges for all noise types commonly observed in crystal oscillators, is easy to compute, and estimates noise processes faster and more accurately. The Allan deviation $\sigma_y$ is defined by

$$ \sigma_y(\tau) = \sqrt{ \frac{1}{2(N-1)} \sum_{i=1}^{N-1} (y_{i+1} - y_i)^2 } \qquad (3) $$

where $y_i$ is a set of frequency offset measurements and $N$ is the number of values. All $N$ measurements must be taken over equally spaced intervals of $\tau$ seconds. Equivalently, each frequency offset can be estimated by evaluating the phase offsets $x_i$. Since $y_i = \frac{x_{i-1} - x_i}{\tau}$, the Allan deviation can also be estimated, after a realignment of the summation limits, by evaluating

$$ \sigma_y(\tau) = \sqrt{ \frac{1}{2(M-2)\tau^2} \sum_{i=1}^{M-2} (x_{i+2} - 2x_{i+1} + x_i)^2 } \qquad (4) $$

with $M$ samples of $x_i$. At a later point of this paper the Allan deviation will be used in a modified form to have every clock synchronized node determine the quality of its local clock.

III. STATE OF THE ART

Although clock synchronization is well investigated in the academic laboratory environment, the practical introduction into PLC needs a re-investigation of the developed techniques. There are essentially two competing paradigms in clock synchronization: the democratic and the master-slave style approaches.

A. IEEE 1588

The story of a simple master-slave clock synchronization with adequate performance in real-time networks is quite a successful one. Since the publication of the IEEE 1588 standard, many industrial products have been using this protocol. Besides the obvious advantages of IEEE 1588, namely its simplicity and potentially high accuracy, the protocol has one drawback regarding fault tolerance: the master is by definition a single point of failure. The standard provides a so-called Best Master Clock algorithm, which elects as master the node that announces the best clock as reference. Although the standard covers the re-election after a failure, two issues remain. One is the Byzantine master problem, which occurs if the master acts as a so-called babbling idiot and sends out obviously wrong time values. This can also lead to multi-master configurations in which not only the network traffic increases, but the slaves also face the problem of deciding which is the better reference. The other is the complete failure: since the re-election of a master takes time, all other clocks run in free-running mode during that period.

B. SynUTC

Besides the fully master-slave oriented clock synchronization, democratic algorithms are also well investigated [2]–[4].
These algorithms take the time of all reference clocks and combine the values into one ensemble clock value. A very basic approach would be to combine all samples by calculating the mean value. This solution has the obvious disadvantage that any Byzantine node can degrade the ensemble accuracy. The same is true for the simple case that some clocks are wrongly adjusted. Advanced algorithms like the external clock synchronization of [4] require at least 2F + 1 reference time servers to mask F arbitrary failures of reference time servers. The same requirements are addressed by [3]. The difference is that locally not only the clock value is maintained, but also a confidence interval. This interval is adjusted according to the local precision, which has the advantage that nodes which are included in the ensemble time can be weighted according to their internal structure. For example, a GPS clock delivers a highly precise, externally aligned clock as long as its receiver has a connection to the satellites. Yet an OCXO-controlled non-GPS node is, once synchronized, also able to maintain a high accuracy clock. Finally, the main advantage of SynUTC is its continuously high performance, since the drawback of a master re-election does not appear due to its democratic approach.
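The effect of a single faulty reference on a plain mean, and the reason for requiring 2F + 1 servers, can be illustrated with a generic fault-tolerant average that discards the F smallest and F largest readings before averaging. This is only a sketch of the general idea and not the interval-based algorithm of [3] or the algorithm of [4]; the function name and the readings are hypothetical.

```python
from typing import Sequence


def fault_tolerant_average(readings: Sequence[float], f: int) -> float:
    """Discard the f smallest and f largest readings and average the rest.

    With at least 2f + 1 readings of which at most f are faulty, every
    value that is kept lies between two correct readings.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if len(readings) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 readings to mask f faults")
    kept = sorted(readings)[f:len(readings) - f]
    return sum(kept) / len(kept)


# Hypothetical clock readings in seconds; the last server is a babbling idiot.
readings = [10.0, 10.25, 10.5, 10.75, 1000.0]

plain_mean = sum(readings) / len(readings)      # pulled far off by the fault
robust = fault_tolerant_average(readings, f=1)  # the faulty value is masked
print(plain_mean, robust)
```

The plain mean is dominated by the single faulty value, whereas the trimmed average stays within the range of the correct readings.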

C. Discussion of applicable approaches

Section III-A has shown that in the IEEE 1588 protocol the master is a single point of failure, whereas completely democratic architectures do not seem to be reasonable either. This is due to the fact that in a network with $n$ participants every node has to tell the remaining $(n-1)$ peers its local time together with the confidence interval. All other nodes need to distribute their time too, and therefore

$$ M_{democ} = n \times (n-1) \qquad (5) $$

unidirectional communication links have to be established to distribute the synchronization data, where $M_{democ}$ is the number of links. This is significantly more than the $(n-1)$ links required for the strict master-slave principle of IEEE 1588. Furthermore, the efficiency $\eta$, i.e. the ratio of the links required for the master-slave principle to the links actually established, decreases with the number of nodes,

$$ \eta_{democ} = \frac{n-1}{n \times (n-1)} = \frac{1}{n}. \qquad (6) $$

IV. MARGIN CONDITIONS FOR PLC

Engineering applications and middleware services in powerline networks is, compared to high performance networks, a challenging task. The most obvious difference is the limited bandwidth and the problem of the highly disturbed channel. Whereas in the case of Ethernet a typical error can be expected to be a damaged or lost frame, the burstiness of disturbances on the powerline channel can stop all communication for longer periods of time. Although applications like data collection can deal with those drawbacks, the achievable precision of clock synchronization is limited in principle in that case.

The PLC communication topology also has a special influence on the clock synchronization. Since the REMPLI system uses PLC from the high voltage transformer station, where the access point resides, over medium voltage networks down to the low voltage grid, the topology may change at any time due to load regulation by the energy supplier. This behaviour is relatively easy to handle for usual data communication: it has to be ensured that packets from the access point arrive only once at the node or are otherwise ignored. For the clock synchronization a second issue has to be taken into account: the variable delay, and therefore the variable jitter, within the network.

Another issue is the relatively high number of nodes using a shared medium, which is also an application for synchronized clocks in REMPLI: the common frameclock. This frameclock has to be aligned system wide in order to ensure a fast re-logon of nodes travelling from one point in the hierarchy to another due to the load balancing of the energy suppliers mentioned before. This leads to the need for a high performance clock synchronized backbone network, ensuring that any access point is able to synchronize its nodes with the same reference clock. More details on the system concept can be found in [5]. Notably, all the arguments so far require only a high precision of the nodes, not accuracy.

V. PROPOSED SOLUTION

Analysing the margin conditions of powerline networks, it seems natural to divide the different tasks within the network structure into three stages:
• A highly fault tolerant backbone network, setting up the reference time. Nodes in this network are equipped with a GPS receiver, MCXO, OCXO or even an atomic clock.
• An underlying access network, where the access points reside. The main purpose of this network is high speed data transport from and to the access points, providing synchronized clocks for
• the PLC network, where the nodes which communicate with the access points are finally synchronized over bridges or additional repeater levels.

Depending on the application needs, the second network can be combined with the first by shrinking it and providing the reference time with one node, which is also an access point.

A. Setting up the Reference Time

As already mentioned, the reference time is kept in a highly reliable backbone network, which consists of Ethernet nodes, each equipped with a highly accurate time source. For cost reasons, and because of the ability to keep the absolute reference time, this is done via GPS receivers. These are coupled directly to the nodes of the master group, which can be interpreted as a fully democratic subnet where each member of the group talks to all others. This approach has the advantage that a failure of a single master has hardly any influence on the associated slaves, except that the overall accuracy of the master group is reduced.

Within the master group the democratic SynUTC protocol is used to synchronize the participating nodes. The master group nodes determine a fault-tolerant average value of the current time and pass it on to all IEEE 1588 slaves. The transmission takes place via a so-called master group speaker, represented by the switches in Figure 1. This speaker and the switches for the cross-linking of the master group must also offer the possibility for redundancy on the physical and on the protocol layer. The associated master for each node is the speaker of the superordinate group, which acts transparently like an IEEE 1588 master and passes the ensemble time from the group downward. The very heart of this approach is to enhance IEEE 1588 networks with this transparently integrable master group to a hybrid architecture in order to increase stability and fault tolerance.

Fig. 1. Master group concept. [Diagram: GPS receivers, master group, fault tolerant switches/group speaker, IEEE 1588 slaves.]

The efficiency for $m$ masters (including the group speaker) with $n$ nodes compared to the IEEE 1588 master-slave principle is given by

$$ \eta_{hybrid} = \frac{(m-1) + n}{m \times (m-1) + n} = \frac{m + n - 1}{m^2 - m + n}. \qquad (7) $$
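The link counts behind (5) to (7) are easy to tabulate. The following minimal sketch evaluates them for a few network sizes; the function names and the example sizes are hypothetical and serve only to make the formulas concrete.

```python
def links_master_slave(n: int) -> int:
    """Links needed when one master serves the other n - 1 nodes."""
    return n - 1


def links_democratic(n: int) -> int:
    """Unidirectional links in a fully democratic network, see (5)."""
    return n * (n - 1)


def efficiency_democratic(n: int) -> float:
    """Efficiency of the democratic scheme, see (6); equals 1 / n."""
    if n < 2:
        raise ValueError("need at least two nodes")
    return links_master_slave(n) / links_democratic(n)


def efficiency_hybrid(m: int, n: int) -> float:
    """Efficiency of m democratic masters serving n slaves, see (7)."""
    if m < 1 or n < 1:
        raise ValueError("need at least one master and one slave")
    return (m + n - 1) / (m * (m - 1) + n)


# Hypothetical network sizes: (masters m, slaves n).
for m, n in [(1, 10), (4, 100), (4, 1000)]:
    print(m, n, round(efficiency_hybrid(m, n), 4))
```

With a single master the hybrid scheme reduces to plain master-slave operation and the efficiency is exactly 1.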

Note that for the common case of only a few masters synchronizing a substantially larger number of client nodes ($m \ll n$) the complexity approaches that of the master-slave method.

B. Access Network

In the REMPLI project the access network is an Ethernet network, where clocks are synchronized using the IEEE 1588 standard for clock synchronization. To achieve a precision below 100 µs, adapted Ethernet hardware is also required, capable of timestamping arriving packets and cancelling out communication delays. The requirements for and the implementation of this specialized hardware are described in [5].

C. Powerline Network

The PLC physical layer communication is done on a separate ASIC with a signal processing unit for mixing, filtering, up- and subsampling as well as synchronization detection. A finite state machine (FSM) controls all states for transmitting and receiving data. This FSM can be configured from an integrated DSP and is fully predictable, which allows easy verification of all states. In order to detect synchronization events properly, the complete module performs an energy normalized correlation of complex synchronization sequences on the equivalent complex base band. Further, by the use of a threshold, it generates a frame clock event for connected on-board modules. As a synchronization sequence, a complex Barker sequence or other complex synchronization sequences designed especially for single frequency networks [6] can be used. At the detection of a synchronization event the state of the counter, which is used by the state machine to control transmit and receive, is automatically stored in a register. The difference between the expected and the actual synchronization event is calculated. After the network management has verified that this burst is valid (correct CRC), it is passed on to the DSP for a software PLL to adjust the timings of the state machine.

Since the PLC physical layer communication is handled by a separate, specialized DSP, only the frame clock events, which are automatically generated by the state machine for PLC communication, can be detected by other REMPLI modules. Consequently these events are used by the clock synchronization core, which also generates the timestamps in the Ethernet case. This has the advantage that the format of timestamps is, within the PLC network as well as in the backbone networks, fully compliant with the IEEE 1588 format. The principle is shown in [5].

Nevertheless, a clock synchronization using pure IEEE 1588 does not seem applicable to PLC for a number of reasons. First, the communication delay from powerline master to slave is highly asymmetric compared with the delay in the opposite direction. IEEE 1588 is not able to deal with that form of delay, since the measurements are only initiated by the slaves and therefore only the mean of both delays is taken into account. Secondly, IEEE 1588 is in general not able to synchronize clocks with a high precision in networks where the repeater levels may change at any moment, since a changing repeater level automatically affects the network delay. Finally, a re-synchronization with two packets for each event, sent by one master in a network with potentially hundreds of nodes connected via a shared medium, overloads a network with limited bandwidth such as powerline.
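The asymmetry argument can be made concrete with the usual IEEE 1588 delay request-response arithmetic, in which the slave estimates its offset from the four timestamps of a Sync and Delay_Req exchange under the assumption of equal delays in both directions. The sketch below is a simplified model; the helper names and all delay values are hypothetical and not measurements from the REMPLI network.

```python
from typing import Tuple


def simulate_exchange(true_offset: int, d_ms: int, d_sm: int,
                      t1: int = 0, turnaround: int = 1000
                      ) -> Tuple[int, int, int, int]:
    """Timestamps (in microseconds) of one Sync / Delay_Req exchange.

    true_offset is slave time minus master time, d_ms the master-to-slave
    delay and d_sm the slave-to-master delay.
    """
    t2 = t1 + d_ms + true_offset   # slave clock when Sync arrives
    t3 = t2 + turnaround           # slave clock when Delay_Req leaves
    t4 = t3 - true_offset + d_sm   # master clock when Delay_Req arrives
    return t1, t2, t3, t4


def ptp_estimate(t1: int, t2: int, t3: int, t4: int) -> Tuple[float, float]:
    """Offset and mean path delay as estimated under the symmetry assumption."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay


# Hypothetical case: true offset 100 us, total round-trip delay 1000 us.
sym_offset, sym_delay = ptp_estimate(*simulate_exchange(100, 500, 500))
asym_offset, asym_delay = ptp_estimate(*simulate_exchange(100, 800, 200))
print(sym_offset, asym_offset)
```

Both cases yield the same mean delay, but in the asymmetric case the estimated offset is wrong by half the difference of the two one-way delays, which is exactly the error that the mean-based measurement cannot see.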

VI. RESULTS

The experimental part of the proposed clock synchronization is done in two stages: first the fault tolerant backbone network and access point synchronization, and second the powerline synchronization, whose performance can be evaluated by means of jitter analysis of the delivered frameclock. Results for the fault tolerant backbone network can be found in [7]. Nevertheless, the IEEE 1588 synchronization needs to be evaluated further, and not only because of the enhanced hardware used in the REMPLI project. First, as presented in [5], the access point as well as the node hardware relies on the novel Hyperstone Hy32SX processor, which has a special cell supporting IEEE 1588 clock synchronization. Second, the setup itself has an influence on the servo control. In the case of the investigated structure the application for clock synchronized nodes is the generation of pulses with a frequency of a few kHz, which have to be coupled together with a very low phase error. In the common case the result of those errors is a phase jitter between two nodes. In any case, the servo design is influenced by this requirement: the clocks have to synchronize in terms of phase error as fast as possible, but may take longer to lock onto the absolute time of their respective master (group).

A. Backbone network

The backbone network is evaluated by observing the already mentioned Allan deviation as well as the dynamic Allan deviation. The latter is defined by

$$ \sigma_y(\tau, t) = \sqrt{ \frac{1}{2(\lfloor T \rfloor + 2)\tau^2} \sum_{i=\lfloor t \rfloor}^{\lfloor t+T \rfloor - 2} (x_{i+2} - 2x_{i+1} + x_i)^2 }. \qquad (8) $$

Here $T$ is the window size, which for an evaluation with one sample per second corresponds to the window length in seconds. With this deviation the Allan deviation can be observed as a function of time.

For the final evaluation the clock synchronization between the access points was evaluated. The IEEE 1588 master was considered to be a perfect clock, because the master group behind it performs transparent clock synchronization and only internal clock synchronization is needed. The stationary case is shown in Figure 2. A rough analysis of the curve shown in this figure suggests that the phase error is dominated by white phase noise. Further investigations using the conventional Allan deviation, as in Figure 3, support this theory.

Fig. 2. Phase error in the stable case. [Plot "Raw Phase Error": phase error at slave (s) versus time t (s).]

Fig. 3. Allan deviation for the stationary case of Figure 2. [Plot "Allan Deviation over whole sample interval": σ(τ) (s) versus τ.]

Finally, the investigation of the clock quality has to cover the non-stationary case. Figure 4 shows the successive change of the dynamic Allan deviation during power-up. It can be seen that the clock quality reaches the value of the stationary case as soon as the servo has reached its stable state. The respective Allan deviation is calculated with a window size of 100 seconds.

Fig. 4. Dynamic Allan deviation during power-up. [Plot "Dynamic Allan Deviation with power-up errors (100 s window)": σ(t, τ) versus t and τ.]

In conclusion, the results show that in the stationary case the clock at an access point exhibits white phase noise. Moreover, this behaviour is the same as that of the oscillator driving the synchronization cell, which means that the phase noise is not influenced by the synchronization algorithm, while the absolute time of the clock is aligned to that of the IEEE 1588 master.

Fig. 5. Test setup for PLC. [Diagram: PLC master, three PLC repeaters and PLC slave connected in a chain via 60 dB dampers; the frame clocks of repeaters and slave are fed to channels 1 to 4 of the DSO, with the master frame clock as trigger.]
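The two estimators used in this subsection can be sketched in a few lines. The function `allan_deviation` follows (4) directly; `dynamic_allan_deviation` simply applies the same estimator to every window of consecutive phase samples, which is an assumption made for illustration and may differ from (8) in its normalisation constant. The function names and the sample data are hypothetical.

```python
import math
from typing import List, Sequence


def allan_deviation(x: Sequence[float], tau: float) -> float:
    """Estimate sigma_y(tau) from M phase samples spaced tau apart, as in (4)."""
    m = len(x)
    if m < 3:
        raise ValueError("need at least three phase samples")
    # Sum of squared second differences of the phase samples.
    acc = sum((x[i + 2] - 2 * x[i + 1] + x[i]) ** 2 for i in range(m - 2))
    return math.sqrt(acc / (2 * (m - 2) * tau ** 2))


def dynamic_allan_deviation(x: Sequence[float], tau: float,
                            window: int) -> List[float]:
    """Allan deviation of each window of `window` consecutive samples.

    Assumption: every window is normalised as in (4).
    """
    if window < 3:
        raise ValueError("window must contain at least three samples")
    return [allan_deviation(x[t:t + window], tau)
            for t in range(len(x) - window + 1)]


# Hypothetical phase samples (seconds), one sample per second.
ramp = [0.0, 1.0, 2.0, 3.0, 4.0]    # constant frequency offset only
spike = [0.0, 0.0, 1.0, 0.0, 0.0]   # single phase disturbance

print(allan_deviation(ramp, tau=1.0))
print(allan_deviation(spike, tau=1.0))
print(dynamic_allan_deviation(spike, tau=1.0, window=3))
```

A pure phase ramp, i.e. a constant frequency offset, gives an Allan deviation of zero, while the windowed version shows when a disturbance enters and leaves the observation window.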

B. Powerline Network

To measure the quality of the PLC synchronization in the laboratory, the test setup shown in Figure 5 was chosen. A PLC master running on its local oscillator and equipped with special test software is used to supply the slaves with synchronization packets. This master unit transmits only PLS telegrams at configurable time intervals, and these are the only communication telegrams on the line. The frame clock of the PLC master is used as the external trigger event for the digital storage oscilloscope (DSO). Because the interface to the powerline itself has no direction, dampers of 60 dB are used to emulate the powerline network. This guarantees good communication with the neighbouring PLC station on the one hand and no communication with the station after next on the other. Thus, a packet generated by the PLC master reaches the slave after being forwarded three times by the intermediate repeaters. Because this is just a synchronization test, no responses to the master are sent. Each PLC repeater and the PLC slave generates a frameclock, which is connected to the DSO. This allows the measurement of the jitter for each repeater level with respect to the master clock. The results are shown in Table I, where the master transmits a PLS telegram every 12th frame at the high PLST rate and every 505th frame at the low PLST rate.

TABLE I
JITTER MEASUREMENTS IN THE REMPLI PLC ENVIRONMENT

Bandwidth   PLST rate   Jitter at repeater level
                        1         2         3         4
117 kHz     low         11 µs     15 µs     31 µs     33 µs
117 kHz     high        12 µs     16 µs     21 µs     26 µs
18 kHz      low         70 µs     100 µs    121 µs    142 µs
18 kHz      high        65 µs     85 µs     105 µs    130 µs

In the final system it will not be possible to use this high rate for PLS telegrams, due to the resulting communication overhead. Consequently, every packet coming from the master will be used for synchronization of the frame clock, resulting in an effectively higher rate. The allowed number of repeater levels to reach the target node is limited for normal packets, which consequently do not guarantee synchronization for far end nodes. Only broadcasts like the PLS telegram are able to reach every node in the network, even those which are not logged in. The special PLS telegrams can also be used during standard operation. Thus, the presented test resembles a worst case scenario for the synchronization, since no other traffic is on the line. During normal operation the situation for the lower repeater levels will be more like the high PLST rate, and only for the highest used repeater level like the low rate. Due to the better synchronization of the lower repeater levels, the jitter of the last repeater level, which depends on the previous ones, will also be better.

VII. CONCLUSION

This paper outlines the issues that occur when using clock synchronization in environments which demand redundancy and fault tolerance. As presented in Section III-A, the failure of a single clock synchronization master leads to instability in the connected network. Consequently, the introduction of master groups in the backbone network significantly improves the overall clock quality. The loss of synchronization messages during the election of a new master is avoided, which results in steadier deviation boundaries. The presented measurements show that the quality of clock synchronization achievable in practice meets the requirements of the given project. Further investigations will deal with the cascading of multiple time control loops within a heterogeneous network (Ethernet, PLC) and additional improvements to the synchronization quality concerning the algorithm used.

REFERENCES

[1] L. Lamport, “Time, clocks and the ordering of events in a distributed system,” Communications of the ACM, vol. 21, no. 7, pp. 558–565, 1978.
[2] H. Kopetz, Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer Academic Publishers, 1997.
[3] U. Schmid, M. Horauer, and N. Kerö, “How to distribute GPS-time over COTS-based LANs,” in Proceedings of the 31st IEEE Precise Time and Time Interval Systems and Application Meeting (PTTI’99), Dana Point, California, Dec. 1999. [Online]. Available: http://www.auto.tuwien.ac.at/Projects/SynUTC/papers.html
[4] C. Fetzer and F. Cristian, “Integrating external and internal clock synchronization,” J. Real-Time Systems, vol. 12, no. 2, pp. 123–172, Mar. 1997.
[5] G. Gaderer, T. Sauter, and G. Bumiller, “Clock synchronization in powerline networks,” in Proceedings of the 2005 IEEE International Symposium on Power Line Communications and its Applications, April 2005, pp. 71–75.
[6] G. Bumiller, “Verification of single frequency network transmission with laboratory measurements,” submitted to ISPLC 2006, March 2006.
[7] U. Schmid, “Synchronized UTC for distributed real-time systems,” in Proceedings 19th IFAC/IFIP Workshop on Real-Time Programming (WRTP’94), Lake Reichenau, Germany, 1994, pp. 101–107. [Online]. Available: http://www.auto.tuwien.ac.at/Projects/SynUTC/papers.html
