IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 3, MARCH 2003, p. 203

Parallel Computation of the Euclidean Distance Transform on a Three-Dimensional Image Array

Yu-Hua Lee, Shi-Jinn Horng, and Jennifer Seitzer

Abstract—In two- and three-dimensional image arrays, computing the Euclidean distance transform (EDT) is an important task. With the increasing use of 3D voxel images, it is useful to consider the distance transform of a 3D digital image array. Because the EDT is a global operation, computing it is prohibitively time consuming in image processing applications. To make the transform efficient, we employ parallelism. In this paper, we first derive several important geometric relations and properties among parallel planes. We then develop a parallel algorithm for the three-dimensional Euclidean distance transform (3D-EDT) on the EREW PRAM computation model. The time complexity of our parallel algorithm is $O(\log^2 N)$ for an $N \times N \times N$ image array, which is currently the best known result. A generalized parallel algorithm for the 3D-EDT is also proposed. We implement the proposed algorithms sequentially; their performance exceeds that of the existing algorithms proposed by Yamada and by Toriwaki. Finally, we develop the corresponding parallel programs on both an emulated EREW PRAM computer and the IBM SP2 to verify the speed-up properties of the proposed algorithms.

Index Terms—Computer vision, Euclidean distance, distance transform, image processing, parallel algorithm, three-dimension, EREW PRAM model.

1 INTRODUCTION

In a two-dimensional space, we usually consider a binary image represented as an $N \times N$ array of 1s and 0s, in which the clusters of 1s (black pixels) correspond to the components of the scene and the 0s (white pixels) correspond to the background. Often, we are interested in the shape and position of the black pixels relative to each other. Extracting such information from a binary image can be simplified considerably by a number of computational techniques, the most important of which include the medial axis transform (MAT) introduced by Blum [5] and the distance transform (DT) introduced by Rosenfeld and Pfaltz [31], [32]. The two-dimensional DT converts an image array of black and white pixels into an image array in which each pixel has a value denoting the distance to the nearest 1-pixel. A 2D binary $N \times N$ image array can be represented by $a_{i,j} = 0$ or $1$, for $i, j = 0, \ldots, N-1$. The two-dimensional Euclidean distance transform (2D-EDT) is defined as follows. Let $B_{2D} = \{(x, y) : a_{x,y} = 1\}$ be the coordinates of the 1-pixels of the binary image. The Euclidean distance of pixel $a_{i,j}$ with respect to the 1-pixels is computed by

$$d^2_{i,j} = \min_{(x,y) \in B_{2D}} \left[ (i-x)^2 + (j-y)^2 \right], \quad \text{for all } i, j = 0, \ldots, N-1.$$

Whereas the elements of a two-dimensional image array are called pixels, the elements of a three-dimensional image array are called voxels. Similarly, the three-dimensional DT converts a 3D image array of black and white voxels into a 3D image array in which each voxel has a value or coordinate that represents the distance to the nearest black voxel. A 3D binary $N \times N \times N$ image array can be represented by $a_{i,j,k} = 0$ or $1$, for $i, j, k = 0, \ldots, N-1$. Let $B_{3D} = \{(x, y, z) : a_{x,y,z} = 1\}$ be the coordinates of the 1-voxels of the 3D binary image. The three-dimensional Euclidean distance transform (3D-EDT) of voxel $a_{i,j,k}$ with respect to the 1-voxels is computed by

$$d^2_{i,j,k} = \min_{(x,y,z) \in B_{3D}} \left[ (i-x)^2 + (j-y)^2 + (k-z)^2 \right], \quad \text{for all } i, j, k = 0, \ldots, N-1.$$

Y.-H. Lee is with the Information and Communication Research Division, Chung-Shan Institute of Science and Technology, Lung-Tan, Tao-Yuan, Taiwan, Republic of China. E-mail: [email protected].
S.-J. Horng is with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, Republic of China. E-mail: [email protected].
J. Seitzer is with the Department of Computer Science, The University of Dayton, Ohio. E-mail: [email protected].
Manuscript received 28 Sept. 2000; revised 11 Feb. 2002; accepted 20 Aug. 2002. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 112918.
1045-9219/03/$17.00 © 2003 IEEE

Published by the IEEE Computer Society

Because the 2D-EDT and 3D-EDT are global operations, computing them exactly in image processing is extremely time consuming. Consequently, many EDT approximation algorithms have been proposed. For example, Danielsson [12] proposed a Euclidean distance propagation algorithm that operates sequentially using a two-component descriptor. Such approximation algorithms are based on the city block distance metric, the chessboard distance metric, or the octagonal distance metric (a combination of the first two). Other metrics have also been used in the literature, such as the chamfer distance [6], [7], the quasi-Euclidean distance [7], and the weighted distance [14]. Each EDT approximation algorithm attempts to compute distances that are as close as possible to the actual Euclidean distances. The DT is widely applied in image processing: it is used for morphological operations, expanding or shrinking objects [3], image matching, image compression [27], machine vision and 2D or 3D computer graphics [15], [36], skeletonization, and computing Voronoi diagrams [4], [38]. See [8] for an overview of applications of the distance transform. Yamada [37] first proposed a parallel algorithm for computing the exact 2D-EDT of an $N \times N$ image array, which runs in $O(N)$ time using $N \times N$ processors on an eight-neighbor connected mesh; the sequential time complexity of Yamada's algorithm is $O(N^3)$. Paglieroni [28], [29] proposed a sequential scan-based algorithm for the 2D-EDT problem, which also takes $O(N^3)$ time. A great advantage of Paglieroni's algorithm, however, is that it computes the 2D-EDT dimension by dimension. This property makes the algorithm easy to parallelize and implement, and also suitable for hardware implementation and real-time applications; Paglieroni also proposed a special hardware architecture for real-time use. Because most image processing problems require real-time computation, parallel computation is a natural approach. Kolountzakis and Kutulakos [19] proposed an $O(N^2 \log N)$ sequential algorithm for the same problem and parallelized it on the EREW PRAM model, achieving $O(N^2 \log N / p)$ time using $p$ processors, where $1 \le p \le N$. More recently, Chen and Chuang [10] proposed an algorithm with lower time complexity, which runs in $O(N^2/p + N \log N)$ time on the EREW PRAM model, where $1 \le p \le N$.
Chuang [9], [10] also proposed an algorithm for computing the 2D-EDT on a mesh-connected SIMD computer; for an $N \times N$ image array, it runs in $O(N)$ time on a 2D $N \times N$ torus-connected mesh. Lee et al. [23], [22] proposed several fast parallel algorithms for computing the exact Euclidean distance transform. Their running time is $O(\log^2 N)$ both on the EREW PRAM model and on the hypercube computer with $N^2$ processors [22], and $O(\log N)$ both on the mesh of trees model using $N \times N \times \frac{N}{\log N}$ processors and on the hypercube computer using $N^{2.5}$ processors [23]. Recently, Pan et al. [30] proposed a parallel algorithm for the same problem that runs in $O(\log^2 N)$ time on a 2D $N \times N$ reconfigurable mesh. With the increasing prevalence of 3D voxel images, it is useful to consider the distance transform of a 3D digital image array. Saito and Toriwaki [34] presented several scan-based EDT algorithms for an $n$-dimensional image array; for the 3D-EDT problem, Toriwaki's EDT algorithm takes $O(N^4)$ time. In the past, we studied the 2D distance transform [20], [21], [22], [23], [24], [25]. Now, we consider the 3D-EDT problem and develop a parallel algorithm for it on the EREW PRAM model. In this paper, we first derive several important geometric relations and properties among parallel planes. We then develop the parallel algorithm for the 3D-EDT on the EREW PRAM computation model; its time complexity is $O(\log^2 N)$ for an $N \times N \times N$ image array. A generalized parallel algorithm for the 3D-EDT is also given. Finally, we implement the proposed algorithms sequentially and compare their performance with that of other proposed algorithms, and we develop parallel programs on an emulated EREW PRAM computer and on an IBM SP2 to verify the performance of the proposed parallel algorithms.

The paper is organized as follows: Section 2 introduces the EREW PRAM model and the notation on which our algorithm is based. Section 3 derives several important theorems, which are the essential concepts of our parallel algorithm. Section 4 describes our parallel algorithm, procedure 3D_EDT_EREW, in detail, and also proposes a generalized parallel algorithm for the 3D-EDT. Section 5 describes the implementation of the proposed algorithms and compares their performance with that of others. Finally, Section 6 presents some concluding remarks.

2 PRELIMINARIES AND NOTATION

2.1 The EREW PRAM Model

The parallel shared-memory model is an extension of the sequential model. It consists of a number of processors, each with its own local memory in which it executes its own program and accesses its own data. All processors communicate and exchange data through a common global memory, also referred to as shared memory. Computer organizations are characterized by the multiplicity of the hardware provided to service the instruction and data streams. In this paper, we use the single instruction, multiple data stream (SIMD) organization for the parallel random-access machine (PRAM). That is, all processors operate synchronously under the control of a common clock, and in each unit of time all active processors execute the same instruction on different data. There are several variants of the PRAM model; the three most common are the exclusive read exclusive write (EREW) PRAM, the concurrent read exclusive write (CREW) PRAM, and the concurrent read concurrent write (CRCW) PRAM. In the EREW PRAM model, a single memory location cannot be accessed simultaneously by more than one processor. The CREW PRAM model allows more than one processor to read a single memory location simultaneously, but not to write it simultaneously. In the CRCW PRAM model, both simultaneous reads and simultaneous writes to a single memory location are allowed. The computational power of these three models differs considerably: the CRCW PRAM model is the most powerful, followed by the CREW PRAM model, and finally the EREW PRAM model. In general, the more powerful the model, the more complex and difficult its implementation. For simplicity, we assume that an arithmetic instruction or a shared-memory access takes one unit of time in any PRAM model. Our algorithms are based on the EREW PRAM model. Many practical parallel processing architectures exist, such as the mesh, tree, mesh of trees, pyramid, hypercube, and butterfly; in general, the more practical the machine, the higher the communication time between processors. Interested readers can refer to [1], [2], [17] for further discussion of massively parallel processing models, architectures, and algorithms.

Fig. 1. An illustration of 2D axes X and Y.

2.2 Definitions

Fig. 1 illustrates the directions of the axes X and Y in two dimensions, and Fig. 2 depicts the directions of the axes X, Y, and Z in three dimensions. Note that the axes X and Y are drawn pointing in different directions in Figs. 1 and 2; however, the usual geometric definitions and relations of these axes are the same in both figures. In the two-dimensional case, the distance transform (DT) converts an image of black and white pixels into an image in which each pixel has a value or coordinate representing the distance to, or the location of, the nearest black pixel. We formally define the distance transform as follows:

Definition 1. Assume a 2D binary $N \times N$ image array is represented by $a_{i,j} = 0$ or $1$, for $i, j = 0, \ldots, N-1$, where $(0, 0)$ is the upper-left corner pixel of the image. Let $B_{2D} = \{(x, y) : a_{x,y} = 1\}$ be the coordinates of the 1-pixels of the binary image. The two-dimensional Euclidean distance transform (2D-EDT) of pixel $a_{i,j}$ with respect to the 1-pixels is computed by

$$d^2_{i,j} = \min_{(x,y) \in B_{2D}} \left[ (i-x)^2 + (j-y)^2 \right], \quad \text{for all } i, j = 0, \ldots, N-1. \tag{1}$$

In the three-dimensional case, the DT converts a 3D image array of black and white voxels into a 3D image array in which each voxel has a value or coordinate representing the distance to, or the location of, the nearest black voxel. We extend the definition of the two-dimensional Euclidean distance transform to the three-dimensional Euclidean distance transform as follows:

Fig. 3. An illustration of Theorem 1.

Definition 2. Assume a 3D binary $N \times N \times N$ image array is represented by $a_{i,j,k} = 0$ or $1$, for $i, j, k = 0, \ldots, N-1$. Let $B_{3D} = \{(x, y, z) : a_{x,y,z} = 1\}$ be the coordinates of the 1-voxels of the 3D binary image. The three-dimensional Euclidean distance transform (3D-EDT) of voxel $a_{i,j,k}$ with respect to the 1-voxels is computed by

$$d^2_{i,j,k} = \min_{(x,y,z) \in B_{3D}} \left[ (i-x)^2 + (j-y)^2 + (k-z)^2 \right], \quad \text{for all } i, j, k = 0, \ldots, N-1. \tag{2}$$
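To make Definitions 1 and 2 concrete, here is a minimal brute-force sketch in Python that evaluates (1) and (2) literally by scanning every 1-pixel or 1-voxel. The function names are ours, not the paper's; voxels are indexed as `vol[i][j][k]` with $k$ selecting the plane, and every entry is set to `math.inf` when the image contains no 1-pixel (a case the definitions leave open). It is meant only as a ground-truth reference for small arrays.

```python
import math
from itertools import product


def edt2d_squared(img):
    """Squared 2D-EDT by Definition 1: brute force over all 1-pixels.

    Costs O(N^4) for an N x N image; returns math.inf everywhere if there is no 1-pixel.
    """
    n = len(img)
    ones = [(x, y) for x, y in product(range(n), repeat=2) if img[x][y] == 1]
    return [[min(((i - x) ** 2 + (j - y) ** 2 for x, y in ones), default=math.inf)
             for j in range(n)] for i in range(n)]


def edt3d_squared(vol):
    """Squared 3D-EDT by Definition 2: brute force over all 1-voxels (O(N^6) for N x N x N)."""
    n = len(vol)
    ones = [(x, y, z) for x, y, z in product(range(n), repeat=3) if vol[x][y][z] == 1]
    return [[[min(((i - x) ** 2 + (j - y) ** 2 + (k - z) ** 2 for x, y, z in ones),
                  default=math.inf)
              for k in range(n)] for j in range(n)] for i in range(n)]
```

For example, `edt2d_squared([[0, 0, 0], [0, 1, 0], [0, 0, 0]])` returns `[[2, 1, 2], [1, 0, 1], [2, 1, 2]]`. These versions are far too slow for real images, but they are handy for checking faster algorithms.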

Definition 3. Let $V = (x, y, z)$ be the coordinates of a three-dimensional image array of size $N^3$ voxels, where $x, y, z$ are integers and $0 \le x, y, z \le N-1$. Let $\Gamma_{z_1} = (x, y, z_1)$ and $\Gamma_{z_2} = (x, y, z_2)$ be two planes of $V$, where $x, y$ are integers with $0 \le x, y \le N-1$, and $z_1$ and $z_2$ are constants with $z_1 \le z_2$. Each plane is a two-dimensional image array. The Z-coordinates of planes $\Gamma_{z_1}$ and $\Gamma_{z_2}$ are $z_1$ and $z_2$, respectively; plane $\Gamma_{z_1}$ is above plane $\Gamma_{z_2}$.

Definition 4. Let $\Gamma_r = (x, y, r)$, where $0 \le x, y, r \le N-1$, and $V = \{\Gamma_r : 0 \le r \le N-1\}$. That is, $V$ consists of $N$ 2D $N \times N$ image arrays, denoted by the planes $\Gamma_r$, $0 \le r \le N-1$ (i.e., planes $\Gamma_0, \Gamma_1, \ldots, \Gamma_{N-1}$).

Definition 5. Let $R = (x_R, y_R, z_R)$ be a voxel of $V$ with coordinate $(x_R, y_R, z_R)$. Let $N_R(\Gamma_r)$, also written $N_{x_R, y_R, z_R}(\Gamma_r)$, $0 \le r \le N-1$, denote the nearest 1-voxel of voxel $R$ with respect to all 1-voxels in plane $\Gamma_r$. Let $N_R$, also written $N_{x_R, y_R, z_R}$, denote the nearest 1-voxel of voxel $R$ with respect to all 1-voxels in $V$; $N_R$ must be one of the set $\{N_R(\Gamma_r) : 0 \le r \le N-1\}$. Let $|R N_R(\Gamma_r)|$ and $|R N_R|$ denote the Euclidean distances from voxel $R$ to $N_R(\Gamma_r)$ and to $N_R$, respectively. Then

$$N_R = \left\{ N_R(\Gamma_r) \;\middle|\; \min_{0 \le r \le N-1} |R N_R(\Gamma_r)| \right\}. \tag{3}$$

3 ESSENTIAL CONCEPTS

The following theorem was proposed by Kolountzakis and Kutulakos [19]. For completeness, we restate it here; Fig. 3 illustrates it.

Fig. 2. An illustration of 3D axes X, Y, and Z.

Theorem 1 [19]. Let $P = (a, j)$ and $Q = (b, j)$, with $b < a$, be two pixels located in the same column, with $Q$ above $P$. Let $N_P = (x, y)$ and $N_Q = (z, w)$ be the nearest 1-pixels of pixels $P$ and $Q$, respectively. Then $z \le x$; that is, $N_Q$ is above $N_P$ (or in the same row).

Proof. Because $N_P$ and $N_Q$ are the nearest 1-pixels of $P$ and $Q$, respectively, we have

$$(x-a)^2 + (y-j)^2 \le (z-a)^2 + (w-j)^2 \tag{4}$$

and

$$(z-b)^2 + (w-j)^2 \le (x-b)^2 + (y-j)^2. \tag{5}$$

Adding (4) and (5) and expanding, the terms $a^2$, $b^2$, $x^2$, $z^2$, $(y-j)^2$, and $(w-j)^2$ cancel, leaving $-2ax - 2bz \le -2az - 2bx$, i.e., $(a-b)(x-z) \ge 0$. Since $a > b$, we obtain $z \le x$, which means that $N_Q$ is above $N_P$. ∎

Lee et al. [23], [22] proposed the following theorem, on which they based several parallel algorithms for computing the Euclidean distance transform of a two-dimensional digital image. Fig. 4 demonstrates their essential idea.

Fig. 4. An illustration of Theorem 2.

Theorem 2 [23], [22]. Let $P = (a, j)$, $Q = (b, j)$, and $R = (c, j)$, with $a > b > c$, be three pixels located in the same column $j$, with $R$ above $Q$ and $P$ below $Q$. Let $N_P$, $N_Q$, and $N_R$ be the nearest 1-pixels of pixels $P$, $Q$, and $R$, respectively, and suppose $N_Q = (x, y)$, $N_P = (s, u)$, and $N_R = (z, w)$. Then $z \le x \le s$; that is, $N_Q$ is below $N_R$, but above $N_P$.

Proof. By Theorem 1, $z \le x$ (because $R$ is above $Q$) and $x \le s$ (because $Q$ is above $P$). That is, $z \le x \le s$, which means that $N_Q$ is below $N_R$, but above $N_P$. ∎

Fig. 5. An illustration of Theorem 3.

Theorem 3. Let $Q = (x_Q, y_Q, z_1)$ be a voxel located in plane $\Gamma_{z_1}$. Let $N_Q(\Gamma_{z_1}) = (x_{N_Q(\Gamma_{z_1})}, y_{N_Q(\Gamma_{z_1})}, z_1) \in \Gamma_{z_1}$ be the nearest 1-voxel of voxel $Q$ in plane $\Gamma_{z_1}$, and let $N_S(\Gamma_{z_1}) = (x_{N_S(\Gamma_{z_1})}, y_{N_S(\Gamma_{z_1})}, z_1) \in \Gamma_{z_1}$, with $N_S(\Gamma_{z_1}) \ne N_Q(\Gamma_{z_1})$, be another 1-voxel in plane $\Gamma_{z_1}$. Let $P = (x_Q, y_Q, z_2)$ be a voxel located in plane $\Gamma_{z_2}$, so that $P$ is under $Q$ in the direction of the Z axis. Then $N_P(\Gamma_{z_1}) = N_Q(\Gamma_{z_1})$; that is, the nearest 1-voxel of $P$ in plane $\Gamma_{z_1}$ is the same as the nearest 1-voxel of $Q$ in plane $\Gamma_{z_1}$.

Proof. Let $|Q N_Q(\Gamma_{z_1})| = a$, $|Q N_S(\Gamma_{z_1})| = b$, and $|PQ| = c$. Because $N_Q(\Gamma_{z_1})$ is the nearest 1-voxel of $Q$ in plane $\Gamma_{z_1}$, $|Q N_Q(\Gamma_{z_1})| < |Q N_S(\Gamma_{z_1})|$, i.e., $a < b$.

By assumption, $P$ is under $Q$, so $PQ \perp Q N_Q(\Gamma_{z_1})$ and $PQ \perp Q N_S(\Gamma_{z_1})$. By the Pythagorean theorem,

$$|P N_Q(\Gamma_{z_1})| = (a^2 + c^2)^{1/2} \quad \text{and} \quad |P N_S(\Gamma_{z_1})| = (b^2 + c^2)^{1/2}. \tag{6}$$

Because $a < b$, (6) gives $(a^2 + c^2)^{1/2} < (b^2 + c^2)^{1/2}$, that is,

$$|P N_Q(\Gamma_{z_1})| < |P N_S(\Gamma_{z_1})|. \tag{7}$$

Since (7) holds for every other 1-voxel $N_S(\Gamma_{z_1})$ in plane $\Gamma_{z_1}$, the nearest 1-voxel of $P$ in plane $\Gamma_{z_1}$ is $N_Q(\Gamma_{z_1})$, i.e., $N_P(\Gamma_{z_1}) = N_Q(\Gamma_{z_1})$. ∎

Fig. 5 illustrates the above theorem. By the same argument, $N_Q(\Gamma_{z_2}) = N_P(\Gamma_{z_2})$.

Theorem 4. Let $Q_\alpha = (x_Q, y_Q, \alpha)$, $0 \le \alpha \le N-1$, be a set of voxels, with $Q_\alpha$ located in plane $\Gamma_\alpha$; that is, there is one voxel in each plane, all on the same line parallel to the Z axis. Let $N_{Q_\alpha}$ be the nearest 1-voxel of voxel $Q_\alpha$. Then $N_{Q_\alpha} \in \{N_{Q_r}(\Gamma_r) : 0 \le r \le N-1\}$ for $0 \le \alpha \le N-1$. That is, the nearest 1-voxel of $Q_\alpha$ must be one of $N_{Q_0}(\Gamma_0), N_{Q_1}(\Gamma_1), \ldots, N_{Q_{N-1}}(\Gamma_{N-1})$.

Proof. Because voxels $Q_1, Q_2, \ldots, Q_{N-1}$ are all under $Q_0$, Theorem 3 gives $N_{Q_0}(\Gamma_1) = N_{Q_1}(\Gamma_1)$, $N_{Q_0}(\Gamma_2) = N_{Q_2}(\Gamma_2)$, $\ldots$, $N_{Q_0}(\Gamma_{N-1}) = N_{Q_{N-1}}(\Gamma_{N-1})$; that is, $N_{Q_0}(\Gamma_r) = N_{Q_r}(\Gamma_r)$ for $0 \le r \le N-1$. By Definition 5, $N_{Q_0}$ must be one of $N_{Q_0}(\Gamma_0), N_{Q_0}(\Gamma_1), \ldots, N_{Q_0}(\Gamma_{N-1})$; hence $N_{Q_0} \in \{N_{Q_r}(\Gamma_r) : 0 \le r \le N-1\}$. By Theorem 3 and its symmetric form for planes above a voxel, the same argument applies to every $Q_\alpha$, so $N_{Q_\alpha} \in \{N_{Q_r}(\Gamma_r) : 0 \le r \le N-1\}$ for $0 \le \alpha \le N-1$. ∎

Fig. 6 illustrates the above theorem.
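Theorems 3 and 4 reduce the 3D problem to per-plane 2D results: the nearest 1-voxel of $a_{i,j,k}$ is the best of the $N$ per-plane candidates on the vertical line through $(i, j)$, where the candidate from plane $\Gamma_r$ lies at squared distance equal to its in-plane squared distance at $(i, j)$ plus $(k - r)^2$. The Python sketch below checks that reduction on small volumes. It is our illustration, with hypothetical helper names: the plane phase is done by brute force, and the vertical step naively scans all planes instead of using the paper's $O(\log^2 N)$ procedure.

```python
import math


def sq_edt_2d(img):
    """Brute-force squared 2D-EDT of one plane (Definition 1); math.inf if the plane has no 1-pixel."""
    n = len(img)
    ones = [(x, y) for x in range(n) for y in range(n) if img[x][y] == 1]
    return [[min(((i - x) ** 2 + (j - y) ** 2 for x, y in ones), default=math.inf)
             for j in range(n)] for i in range(n)]


def edt3d_by_planes(vol):
    """Squared 3D-EDT assembled from per-plane results, as Theorems 3 and 4 allow.

    vol[i][j][k] is voxel a_{i,j,k}; plane Gamma_r is the slice with k == r.
    """
    n = len(vol)
    # Plane phase: g[r][i][j] = squared distance from (i, j, r) to the nearest 1-voxel of plane r.
    g = [sq_edt_2d([[vol[i][j][r] for j in range(n)] for i in range(n)]) for r in range(n)]
    # Naive vertical step: voxel (i, j, k) compares the N candidates on its vertical line.
    return [[[min(g[r][i][j] + (k - r) ** 2 for r in range(n))
              for k in range(n)] for j in range(n)] for i in range(n)]
```

Because only the in-plane distances at $(i, j)$ are needed, the vertical step never has to revisit the 2D images themselves; that is what makes the two phases separable.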

Fig. 6. An illustration of Theorem 4.

Theorem 5. Let $Q = (x_Q, y_Q, \alpha)$ and $P = (x_P, y_P, \beta)$, with $\alpha < \beta$, be two voxels located on the same line parallel to the Z axis, with $Q$ above $P$. Let $N_Q = (x_{N_Q}, y_{N_Q}, a)$ and $N_P = (x_{N_P}, y_{N_P}, b)$ be the nearest 1-voxels of voxels $Q$ and $P$, respectively. Then $b \ge a$; that is, $N_P$ is below $N_Q$ (or in the same plane).

Proof. By assumption, $|Q N_Q| \le |Q N_P|$ and $|P N_P| \le |P N_Q|$. That is,

$$(x_{N_Q} - x_Q)^2 + (y_{N_Q} - y_Q)^2 + (a - \alpha)^2 \le (x_{N_P} - x_Q)^2 + (y_{N_P} - y_Q)^2 + (b - \alpha)^2. \tag{8}$$

By definition, $x_P = x_Q$ and $y_P = y_Q$. Then

$$(x_{N_P} - x_Q)^2 + (y_{N_P} - y_Q)^2 + (b - \beta)^2 \le (x_{N_Q} - x_Q)^2 + (y_{N_Q} - y_Q)^2 + (a - \beta)^2. \tag{9}$$

Adding (8) and (9), the in-plane terms cancel and we get

$$(a - \alpha)^2 + (b - \beta)^2 \le (b - \alpha)^2 + (a - \beta)^2. \tag{10}$$

Expanding and eliminating the common terms $a^2$, $b^2$, $\alpha^2$, and $\beta^2$, (10) can be rewritten as

$$b(\beta - \alpha) \ge a(\beta - \alpha). \tag{11}$$

Because $\beta - \alpha > 0$, (11) becomes $b \ge a$, which means that $N_P$ is below $N_Q$. ∎

Theorem 6. Let $Q = (x_Q, y_Q, \alpha)$ and $P = (x_P, y_P, \beta)$, with $\alpha < \beta$, be two voxels located on the same line parallel to the Z axis, with $Q$ above $P$. If $N_Q = N_Q(\Gamma_r)$, $0 \le r \le N-1$, then $N_P \in \{N_Q(\Gamma_\omega)\}$, where $N_Q(\Gamma_\omega) = N_P(\Gamma_\omega)$ for $r \le \omega \le N-1$.

Proof. Because $Q$ and $P$ lie on the same line parallel to the Z axis, Theorem 4 gives $N_P \in \{N_P(\Gamma_r)\}$ with $N_P(\Gamma_r) = N_Q(\Gamma_r)$ for $0 \le r \le N-1$. Because $P$ is below $Q$, Theorem 5 shows that $N_P$ is also below $N_Q$. Hence, if $N_Q = N_Q(\Gamma_r)$, then $N_P$ must be one of $N_Q(\Gamma_r), N_Q(\Gamma_{r+1}), \ldots, N_Q(\Gamma_{N-1})$, i.e., $N_P \in \{N_Q(\Gamma_\omega)\}$ and $N_Q(\Gamma_\omega) = N_P(\Gamma_\omega)$ for $r \le \omega \le N-1$. ∎

Symmetrically, if $N_P = N_P(\Gamma_r)$, $0 \le r \le N-1$, then $N_Q \in \{N_P(\Gamma_\omega)\}$, where $N_P(\Gamma_\omega) = N_Q(\Gamma_\omega)$ for $0 \le \omega \le r$. Theorems 5 and 6 therefore show that if $N_Q$ (respectively, $N_P$) is found first, the search region for $N_P$ (respectively, $N_Q$) is restricted to the planes from $N_Q$ (respectively, $N_P$) to the end (respectively, beginning) of the image along the Z axis. $N_P$ (respectively, $N_Q$) cannot lie strictly between the beginning (respectively, end) of the image and $N_Q$ (respectively, $N_P$), so that region need not be searched. Suppose we have three voxels $R$, $Q$, and $P$ along the Z axis, with $R$ above $Q$ and $P$ below $Q$. If $N_Q$ is found first, then $N_R$ and $N_P$ can be found simultaneously in parallel: the region for $N_R$ is restricted to the planes from the beginning of the image to $N_Q$, and the region for $N_P$ to the planes from $N_Q$ to the end of the image. Moreover, if the nearest 1-voxels of $m$ voxels are found in the current step, the nearest 1-voxels of $2m$ voxels can be found in the next step. Following this procedure, the nearest 1-voxels of all $N$ voxels on a vertical line are found after $O(\log N)$ steps.

4 THE PARALLEL ALGORITHM ON EREW PRAM MODEL

4.1 The Sketch of Algorithm

The parallel algorithm consists of two main phases: the plane phase and the vertical phase. During the plane phase, for each plane, we find the nearest 1-voxel of each voxel in that plane. Because each plane is a 2D image array, this is a 2D-EDT problem, so we can use the parallel algorithm proposed by Lee et al. [22]. At the end of the plane phase, each voxel in each plane holds the coordinate of its nearest 1-voxel over the entire plane. In the vertical phase, for each vertical line, we first compute the coordinate of the nearest 1-voxel of the middle voxel. We then recursively compute the middle voxel of the upper half and the middle voxel of the lower half of the vertical line, each within the search region bounded by Theorem 6. Thus, at the end of the vertical phase, each voxel in each vertical line holds the coordinate of its nearest 1-voxel over the entire 3D image array. From this coordinate, the distance from the voxel to its nearest 1-voxel is easy to compute: if the nearest 1-voxel of voxel $a_{i,j,k}$ has coordinate $(x, y, z)$, the distance is

$$\sqrt{(i-x)^2 + (j-y)^2 + (k-z)^2}.$$

4.2 Detailed Description of Algorithm

Assume voxel $a_{i,j,k}$ is stored in processor $PE(i, j, k)$, $0 \le i, j, k \le N-1$. Without loss of generality, assume $N = 2^n$ for a positive integer $n$. Let $N_{i,j,k}(\Gamma_k) = (X_{N_{i,j,k}(\Gamma_k)}, Y_{N_{i,j,k}(\Gamma_k)}, k)$ denote the nearest 1-voxel in plane $\Gamma_k$ of voxel $a_{i,j,k}$, which was computed in the plane phase and stored in processor $PE(i, j, k)$, $0 \le i, j, k \le N-1$.
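The recursive vertical phase described in Section 4.1 can be sketched sequentially for a single vertical line $(i, j)$. This is a minimal sketch under our own assumptions, not the paper's code: `plane_nearest[r]` holds the in-plane nearest 1-voxel $(x, y)$ produced by the plane phase (or `None` for a plane without 1-voxels), ties are broken toward the smaller plane index, and the names `vertical_phase_column` and `distance` are illustrative. Each recursive call searches only the planes allowed by Theorem 6.

```python
import math


def vertical_phase_column(i, j, plane_nearest):
    """Nearest 1-voxel (x, y, h) of every voxel (i, j, k) on one vertical line.

    plane_nearest[r] is the (x, y) of the nearest 1-voxel of (i, j) in plane r,
    or None if plane r has no 1-voxel. All entries are None if no plane has one.
    """
    n = len(plane_nearest)
    result = [None] * n
    if all(c is None for c in plane_nearest):
        return result

    def cost(k, r):
        # Squared distance from voxel (i, j, k) to the candidate of plane r.
        c = plane_nearest[r]
        if c is None:
            return math.inf
        return (i - c[0]) ** 2 + (j - c[1]) ** 2 + (k - r) ** 2

    def solve(k_lo, k_hi, r_lo, r_hi):
        # Voxels k_lo..k_hi can only have their nearest 1-voxel in planes r_lo..r_hi (Theorem 6).
        if k_lo > k_hi:
            return
        mid = (k_lo + k_hi) // 2
        best = min(range(r_lo, r_hi + 1), key=lambda r: (cost(mid, r), r))
        x, y = plane_nearest[best]
        result[mid] = (x, y, best)
        solve(k_lo, mid - 1, r_lo, best)   # upper half: planes r_lo..best
        solve(mid + 1, k_hi, best, r_hi)   # lower half: planes best..r_hi

    solve(0, n - 1, 0, n - 1)
    return result


def distance(i, j, k, nearest):
    """Euclidean distance from voxel (i, j, k) to its nearest 1-voxel (x, y, z)."""
    x, y, z = nearest
    return math.sqrt((i - x) ** 2 + (j - y) ** 2 + (k - z) ** 2)
```

On a PRAM, the two recursive calls at each level run in parallel and the minimum over a region is a reduction; this sequential version only demonstrates that the shrinking search regions still yield the exact nearest 1-voxel.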


Let $N_{i,j,S_v(q)} = (E_v(q), W_v(q), Z_v(q))$ denote the coordinate of the nearest 1-voxel of voxel $a_{i,j,S_v(q)}$, where

$$S_v(q) = \left(\frac{N}{2^{v+1}} - 1\right) + \frac{N}{2^v}\, q \quad \text{for } 0 \le v \le n-1 \text{ and } 0 \le q \le 2^v - 1,$$

and $S_v(0) = N - 1$ if $v = n$. According to Theorem 6, at the $v$th step we can simultaneously find the $2^v$ nearest 1-voxels $N_{i,j,S_v(q)}$ of voxels $a_{i,j,S_v(q)}$, $0 \le q \le 2^v - 1$, each within its corresponding region. The region required for computing $N_{i,j,S_v(q)}$ is denoted by $RG_v(q) = FRG_v(q) \ldots LRG_v(q)$, where $FRG_v(q)$ and $LRG_v(q)$ denote the first and last indices of region $RG_v(q)$, respectively. Initially, $FRG_0(0) = 0$ and $LRG_0(0) = N-1$; that is, $RG_0(0) = 0 \ldots N-1$. Following Theorem 6, the regions of the $v$th and $(v+1)$th steps are related as follows: for $0 \le v \le n-1$ and $0 \le q \le 2^v - 1$, in the $(v+1)$th step, if $p = 2q$, then

$$RG_{v+1}(p) = FRG_v(q) \ldots Z_v(q), \qquad RG_{v+1}(p+1) = Z_v(q) \ldots LRG_v(q),$$

where $Z_v(q)$ is the Z-coordinate of the nearest 1-voxel of voxel $a_{i,j,S_v(q)}$ computed in the $v$th step. For example,

$$RG_1(0) = FRG_1(0) \ldots LRG_1(0) = FRG_0(0) \ldots Z_0(0) = 0 \ldots Z_0(0),$$
$$RG_1(1) = FRG_1(1) \ldots LRG_1(1) = Z_0(0) \ldots LRG_0(0) = Z_0(0) \ldots N-1.$$

Let $h$ be an index of region $RG_v(q)$. Suppose $N_{i,j,h}(\Gamma_h)$ is the nearest 1-voxel found in plane $\Gamma_h$ for voxel $a_{i,j,S_v(q)}$ such that the squared distance $(i - X_{N_{i,j,h}(\Gamma_h)})^2 + (j - Y_{N_{i,j,h}(\Gamma_h)})^2 + (S_v(q) - h)^2$ is minimal. Then set the coordinate $(E_v(q), W_v(q), Z_v(q))$ of the nearest 1-voxel $N_{i,j,S_v(q)}$ of voxel $a_{i,j,S_v(q)}$ to $(X_{N_{i,j,h}(\Gamma_h)}, Y_{N_{i,j,h}(\Gamma_h)}, h)$. The formal parallel algorithm for computing the 3D-EDT is given in procedure 3D_EDT_EREW.

Procedure 3D_EDT_EREW
Input: A 3D binary $N \times N \times N$ image array, each element represented by $a_{i,j,k} = 0$ or $1$, for $0 \le i, j, k \le N-1$.
Output: A 3D $N \times N \times N$ array, each element represented by $N_{i,j,k}$, for $0 \le i, j, k \le N-1$.
1.0 The Plane Phase
1.1 for $k := 0$ to $N-1$ pardo
1.2 Each processor $PE(i, j, k)$ computes $N_{i,j,k}(\Gamma_k)$. This can be done with the parallel algorithm proposed by Lee et al. [22] on each plane.
1.3 end;
2.0 The Vertical Phase
2.1 for $v := 0$ to $n$ do
2.2 Each processor computes $S_v(q)$ by the formula $S_v(q) = (N/2^{v+1} - 1) + (N/2^v)\,q$ for $0 \le v \le n-1$ and $0 \le q \le 2^v - 1$; $S_v(0) = N-1$ for $v = n$.
2.3 for each processor $PE(i, j, k)$, $0 \le i, j, k \le N-1$ pardo
2.4 Let $h$ be the index of region $RG_v(q)$ such that the squared distance $(i - X_{N_{i,j,h}(\Gamma_h)})^2 + (j - Y_{N_{i,j,h}(\Gamma_h)})^2 + (S_v(q) - h)^2$ is minimal. Then set the coordinate $(E_v(q), W_v(q), Z_v(q))$ of the nearest 1-voxel $N_{i,j,S_v(q)}$ of voxel $a_{i,j,S_v(q)}$ to $(X_{N_{i,j,h}(\Gamma_h)}, Y_{N_{i,j,h}(\Gamma_h)}, h)$. That is,

$$N_{i,j,S_v(q)} = \left\{ \left(X_{N_{i,j,h}(\Gamma_h)}, Y_{N_{i,j,h}(\Gamma_h)}, h\right) \;\middle|\; \min_{h \in RG_v(q)} \left[ \left(i - X_{N_{i,j,h}(\Gamma_h)}\right)^2 + \left(j - Y_{N_{i,j,h}(\Gamma_h)}\right)^2 + \left(S_v(q) - h\right)^2 \right] \right\} = \left(E_v(q), W_v(q), Z_v(q)\right).$$

2.5 end;
2.6 end;

After the plane phase, each voxel in each plane holds the coordinate of its nearest 1-voxel over the entire plane, that is, $N_{i,j,h}(\Gamma_h) = (X_{N_{i,j,h}(\Gamma_h)}, Y_{N_{i,j,h}(\Gamma_h)}, h)$. To make the algorithm easier to follow, we demonstrate the vertical phase in detail. Assume $N = 64$. Initially, $v = 0$, so $q = 0$ and $S_0(0) = 31$. Thus, the region required for computing $N_{i,j,31}$ is $RG_0(0) = 0 \ldots N-1$. With $h$ ranging over $0 \ldots N-1$,

$$N_{i,j,31} = \left\{ \left(X_{N_{i,j,h}(\Gamma_h)}, Y_{N_{i,j,h}(\Gamma_h)}, h\right) \;\middle|\; \min_{0 \le h \le N-1} \left[ \left(i - X_{N_{i,j,h}(\Gamma_h)}\right)^2 + \left(j - Y_{N_{i,j,h}(\Gamma_h)}\right)^2 + (31 - h)^2 \right] \right\} = \left(E_0(0), W_0(0), Z_0(0)\right).$$

At the next iteration, $v = 1$, so $q = 0$ gives $S_1(0) = 15$ and $q = 1$ gives $S_1(1) = 47$. Thus, the region required for computing $N_{i,j,15}$ is $RG_1(0) = 0 \ldots Z_0(0)$, and the region for $N_{i,j,47}$ is $RG_1(1) = Z_0(0) \ldots 63$, where $Z_0(0)$ is the Z-coordinate of the nearest 1-voxel of voxel $a_{i,j,31}$ computed in the previous iteration. Then the nearest 1-voxels $N_{i,j,15}$ and $N_{i,j,47}$ of voxels $a_{i,j,15}$ and $a_{i,j,47}$ are found simultaneously:

$$N_{i,j,15} = \left\{ \left(X_{N_{i,j,h}(\Gamma_h)}, Y_{N_{i,j,h}(\Gamma_h)}, h\right) \;\middle|\; \min_{0 \le h \le Z_0(0)} \left[ \left(i - X_{N_{i,j,h}(\Gamma_h)}\right)^2 + \left(j - Y_{N_{i,j,h}(\Gamma_h)}\right)^2 + (15 - h)^2 \right] \right\} = \left(E_1(0), W_1(0), Z_1(0)\right),$$

$$N_{i,j,47} = \left\{ \left(X_{N_{i,j,h}(\Gamma_h)}, Y_{N_{i,j,h}(\Gamma_h)}, h\right) \;\middle|\; \min_{Z_0(0) \le h \le 63} \left[ \left(i - X_{N_{i,j,h}(\Gamma_h)}\right)^2 + \left(j - Y_{N_{i,j,h}(\Gamma_h)}\right)^2 + (47 - h)^2 \right] \right\} = \left(E_1(1), W_1(1), Z_1(1)\right).$$

At the next iteration, using the information computed in the previous iteration, we simultaneously find the nearest 1-voxels of four voxels on each vertical line. Using the same technique, we compute the 3D-EDT for the remaining voxels until all voxels have been processed.
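The indexing of Step 2.2 and the region recurrence for $RG_{v+1}(2q)$ and $RG_{v+1}(2q+1)$ can be checked with a small sequential simulation of one vertical line. In this sketch (the names are ours), `g[h]` stands for the squared in-plane distance $(i - X_{N_{i,j,h}(\Gamma_h)})^2 + (j - Y_{N_{i,j,h}(\Gamma_h)})^2$ from the plane phase, with `math.inf` for a plane without 1-voxels, and the inner loop plays the role of one `pardo` step. Because the recurrence in the text is stated only for $v \le n-1$, we assume, consistent with Theorem 6, that the last voxel $S_n(0) = N-1$ uses the lowest child region $Z_{n-1}(2^{n-1}-1) \ldots N-1$.

```python
import math


def schedule(n_exp):
    """Voxel indices S_v(q) handled at steps v = 0..n for N = 2**n_exp (formula of Step 2.2)."""
    N = 2 ** n_exp
    steps = [[(N // 2 ** (v + 1) - 1) + (N // 2 ** v) * q for q in range(2 ** v)]
             for v in range(n_exp)]
    steps.append([N - 1])                      # v = n: S_n(0) = N - 1
    return steps


def vertical_phase_levels(g, n_exp):
    """Level-synchronous vertical phase for one line (i, j).

    g[h]: squared in-plane distance from (i, j) to the nearest 1-voxel of plane h.
    Returns (Z, d2): chosen plane and squared distance for every voxel k.
    (If every g[h] is math.inf, all d2 are math.inf and Z is meaningless.)
    """
    N = 2 ** n_exp
    Z, d2 = [None] * N, [math.inf] * N
    regions = [(0, N - 1)]                     # RG_0(0) = 0 ... N-1
    for v, targets in enumerate(schedule(n_exp)):
        if v == n_exp:
            regions = regions[-1:]             # S_n(0) = N - 1 lies below every earlier voxel
        found = []
        for s, (lo, hi) in zip(targets, regions):   # one pardo step on the PRAM
            h = min(range(lo, hi + 1), key=lambda r, s=s: (g[r] + (s - r) ** 2, r))
            Z[s], d2[s] = h, g[h] + (s - h) ** 2
            found.append(h)
        # RG_{v+1}(2q) = FRG_v(q) ... Z_v(q),  RG_{v+1}(2q+1) = Z_v(q) ... LRG_v(q)
        regions = [half for (lo, hi), h in zip(regions, found) for half in ((lo, h), (h, hi))]
    return Z, d2
```

With `n_exp = 6`, `schedule(6)` begins `[[31], [15, 47], ...]`, matching the $N = 64$ walk-through above, and every index $0, \ldots, 63$ appears in exactly one step.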

4.3 Time Complexity Analysis

The time complexity of the algorithm is analyzed as follows. The plane phase is essentially a 2D-EDT problem, so we can use the parallel algorithm proposed by Lee et al. [22], in which each plane independently computes $N_{i,j,k}(\Gamma_k)$, for $0 \le k \le N-1$. Their algorithm computes the 2D-EDT of a 2D $N \times N$ image array in $O(\log^2 N)$ time; hence, the plane phase of our algorithm takes $O(\log^2 N)$ time. In the vertical phase, $N_{i,j,S_v(q)}$ is computed for each region $RG_v(q)$ by the processors allocated to that region. In each iteration, two consecutive regions overlap in only one element. For example, in the $(v+1)$th iteration, the regions required for computing $N_{i,j,S_{v+1}(0)}$ and $N_{i,j,S_{v+1}(1)}$ are $RG_{v+1}(0) = 0 \ldots Z_v(0)$ and $RG_{v+1}(1) = Z_v(0) \ldots LRG_v(0)$, respectively. $N_{i,j,S_{v+1}(0)}$ is computed by processors $PE(i, j, 0)$ through $PE(i, j, Z_v(0))$, and $N_{i,j,S_{v+1}(1)}$ by processors $PE(i, j, Z_v(0))$ through $PE(i, j, LRG_v(0))$; only $PE(i, j, Z_v(0))$ is used by both regions. Note that the nearest 1-voxel $N_{i,j,S_v(q)}$ of $a_{i,j,S_v(q)}$ is never computed again, and there are no memory access conflicts during the parallel computation on the EREW PRAM model. In the worst case, a region $RG_v(q)$ spans $0 \ldots N-1$; since the minimum over a region of at most $N$ elements can be computed by a binary-tree reduction among the region's processors, Steps 2.3 to 2.5 take at most $O(\log N)$ time. The loop in Steps 2.1 through 2.6 runs $n + 1 = \log N + 1$ times and therefore takes $O(\log^2 N)$ time. Thus, the vertical phase takes $O(\log^2 N)$ time. From the above description, we obtain the following theorem.

Theorem 7. The 3D-EDT of a binary image of size $N^3$ voxels can be computed in $O(\log^2 N)$ time on an EREW PRAM model using $N^3$ PEs.
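The $O(\log N)$ bound for Steps 2.3 to 2.5 rests on computing a minimum over a region of at most $N$ elements in logarithmically many parallel rounds without concurrent memory accesses. A standard way to do this is a binary-tree reduction; the sketch below simulates it sequentially and counts the rounds. It is our illustration of the idea, not code from the paper, and `tree_min_reduction` is a hypothetical name. For Step 2.4, `values` would be the squared distances computed for each $h \in RG_v(q)$, and the returned index gives $h$ relative to the start of the region.

```python
def tree_min_reduction(values):
    """Simulate an EREW-style binary-tree minimum reduction.

    Returns (index_of_minimum, rounds); ties go to the smaller index.
    In the round with stride s, the processor at position i (a multiple of 2*s)
    combines cells i and i + s. Each cell is touched by at most one processor
    per round, so no concurrent reads or writes occur.
    """
    cells = [(v, i) for i, v in enumerate(values)]  # (value, original index)
    n = len(cells)
    rounds, stride = 0, 1
    while stride < n:
        # All iterations of this loop are independent: one parallel round.
        for i in range(0, n - stride, 2 * stride):
            cells[i] = min(cells[i], cells[i + stride])
        stride *= 2
        rounds += 1
    return cells[0][1], rounds
```

For a region of $m$ elements the loop runs $\lceil \log_2 m \rceil$ rounds; for example, `tree_min_reduction([7, 3, 9, 3, 5])` returns `(1, 3)`.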

4.4 Generalized Parallel Algorithm

For a 3D $N \times N \times N$ binary image array, we now consider the general case of the 3D-EDT. Assume that the binary image array is allocated to an EREW PRAM model with $p^3$ PEs, where $1 \le p \le N$, conceptually arranged as a $p \times p \times p$ 3D array. Each PE is responsible for computing the 3D-EDT of an $N/p \times N/p \times N/p$ subimage array. Accordingly, we generalize the parallel algorithm 3D_EDT_EREW as follows. The algorithm consists of two main phases, the plane phase and the vertical phase. As mentioned before, the plane phase is a 2D-EDT problem for each plane; in procedure 3D_EDT_EREW, it is computed independently on each plane by the parallel algorithm proposed by Lee et al. [22]. Thus, for the general case of the 3D-EDT problem, we must first generalize that algorithm. Suppose a 2D $N \times N$ binary image array is allocated to an EREW PRAM model with $p^2$ PEs. Recall that the parallel algorithm proposed by Lee et al. [22] consists of three phases. During the first phase (row phase), for each row, the column index of the nearest 1-pixel of each pixel in the row is computed independently. Each row of $p$ processors holds $N/p \times N$ image data, so each PE is allocated $N/p \times N/p$ image data. For the pixels of each row, the row phase applies the parallel prefix minimum operation twice, which runs in $O(N/p + \log p)$ time by the general form of the prefix operation: $O(N/p)$ time to find the local nearest column index of a 1-pixel within the allocated subimage, and another $O(\log p)$ time to merge the local nearest column indices of the $p$ subimages. This is repeated $N/p$ times, because each PE holds $N/p$ rows of image data, so the row phase takes $O((N/p)^2 + (N/p)\log p)$ time.
In the second phase (column phase), each PE computes the 2D-EDT for the $(N/p) \times (N/p)$ subimage allocated to it. Using the sequential 2D-EDT algorithm of Lee et al. [22], each PE takes $O((N/p)\log(N/p))$ time per column to find the local nearest 1-pixel of the pixels in that column. Then, for each column independently, the coordinate of the nearest 1-pixel of the middle pixel is computed, and this nearest 1-pixel divides the pixels of the column into two parts. We then recursively compute the nearest 1-pixel of the middle pixel of each part, repeating this $\log p$ times. Computing the nearest 1-pixel of each middle pixel takes $O(\log p)$ time, and each such 1-pixel divides the column into two parts; therefore, finding the nearest 1-pixel of every pixel in a column takes $O(\log^2 p)$ time. Each column of $p$ processors holds $N \times (N/p)$ image data, and each PE is allocated an $(N/p) \times (N/p)$ block. Because each PE holds $N/p$ columns, this is repeated $N/p$ times, so the second phase takes $\{O((N/p)\log(N/p)) + O(\log^2 p)\} \cdot N/p = O((N/p)^2\log(N/p) + (N/p)\log^2 p)$ time. In the last phase, each processor computes the results locally, which takes only $O((N/p)^2)$ time. Thus, the generalized 2D-EDT algorithm takes $O((N/p)^2\log(N/p) + (N/p)\log^2 p)$ time for a 2D $N \times N$ binary image array allocated to an EREW PRAM model with $p^2$ PEs.

Consequently, the plane phase of the generalized procedure 3D_EDT_EREW takes $O((N/p)^3\log(N/p) + (N/p)^2\log^2 p)$ time for a 3D $N \times N \times N$ binary image array allocated to an EREW PRAM model with $p^3$ PEs, where $1 \le p \le N$. This is $O(N^3\log N)$ when $p = 1$ and $O(\log^2 N)$ when $p = N$.

In the vertical phase, each vertical of $p$ processors holds $(N/p) \times (N/p) \times N$ image data, and each PE is allocated an $(N/p) \times (N/p) \times (N/p)$ block. Each PE first sequentially computes the nearest 1-voxel for the voxels of each of its subverticals, which takes $O((N/p)\log(N/p))$ time per subvertical; the $p$ subvertical results of each vertical are then merged. By Theorem 6, only the boundary voxels of each subimage need to be merged. For each vertical, whose $p$ subimages are allocated to $p$ processors, the coordinate of the nearest 1-voxel of the middle voxel is computed independently.
The voxels of each vertical are then divided into two parts by the nearest 1-voxel of the middle voxel, and we recursively compute the nearest 1-voxel of the middle voxel of each part, repeating this $\log p$ times. Computing the nearest 1-voxel of each middle voxel takes $O(\log p)$ time, and each such 1-voxel divides the vertical into two parts; therefore, finding the nearest 1-voxel of every voxel in a vertical takes $O(\log^2 p)$ time. Because each PE holds $(N/p)^2$ verticals, this is repeated $(N/p)^2$ times, so the vertical phase takes $\{O((N/p)\log(N/p)) + O(\log^2 p)\} \cdot (N/p)^2 = O((N/p)^3\log(N/p) + (N/p)^2\log^2 p)$ time. This is $O(N^3\log N)$ when $p = 1$ and $O(\log^2 N)$ when $p = N$. From the above description, we obtain the following theorem:

Theorem 8. The 3D-EDT of a binary image of $N^3$ voxels can be computed in $O((N/p)^3\log(N/p) + (N/p)^2\log^2 p)$ time on an EREW PRAM model using $p^3$ PEs.
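Both the column phase and the vertical phase rest on the same split: the nearest 1-element of the middle position divides a line into two parts that can be solved independently. The Python sketch below shows this recursion sequentially for a single line; it illustrates the idea under stated assumptions and is not the authors' implementation. The function name `nearest_along_line` and its input are hypothetical: `offsets[k]` is assumed to hold the squared distance from the line to the nearest 1-element of cross-section `k` (for the column phase, $(j - c_k)^2$ with $c_k$ the row-phase result for row $k$; for the vertical phase, the squared in-plane EDT value of plane $k$), or `None` if that cross-section has no 1-element. The split is valid because, with ties broken toward the smaller index, the optimal cross-section index is non-decreasing along the line.

```python
import math


def nearest_along_line(offsets):
    """For each position i on a line, return the cross-section index k that
    minimises (i - k)**2 + offsets[k], or None if every offset is None."""
    n = len(offsets)
    best = [None] * n

    def cost(i, k):
        return math.inf if offsets[k] is None else (i - k) ** 2 + offsets[k]

    def solve(lo, hi, klo, khi):
        # Invariant: the answers for positions lo..hi lie within klo..khi.
        if lo > hi:
            return
        mid = (lo + hi) // 2
        # min() keeps the first (smallest) k on ties, which keeps the split monotone.
        k_mid = min(range(klo, khi + 1), key=lambda k: cost(mid, k))
        best[mid] = k_mid
        solve(lo, mid - 1, klo, k_mid)   # first part: candidates up to k_mid
        solve(mid + 1, hi, k_mid, khi)   # second part: candidates from k_mid on

    solve(0, n - 1, 0, n - 1)
    if all(o is None for o in offsets):
        return [None] * n
    return best


print(nearest_along_line([0, None, None, None, 0]))
# -> [0, 0, 0, 4, 4]
```

Position 2 is equidistant from both 1-elements and takes the smaller index. Because the candidate ranges of the subproblems on one recursion level overlap only at their endpoints, each level examines $O(n)$ candidates, so one line of length $n$ costs $O(n \log n)$ sequentially, matching the per-column bound quoted above.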


Fig. 7. Sequential performance comparisons with an increasing number of 1-voxels.

5 IMPLEMENTATION AND PERFORMANCE COMPARISON

5.1 The Performance of the Sequential Algorithms

We implemented the sequential algorithm presented in this paper (denoted 3DEDT_LEE) and compared its performance with the algorithms of Yamada [37] (denoted 3DEDT_YD) and of Saito and Toriwaki [34] (denoted 3DEDT_SCAN). Yamada [37] presented a 2D-EDT algorithm based on distance propagation: each pixel propagates its distance information to its eight neighbors, and the propagation is iterated over the whole image array until the result converges. For an $N \times N$ image array, Yamada's algorithm requires $N$ iterations to converge in the worst case; each iteration scans the whole image array in $O(N^2)$ sequential time, so the worst case takes $O(N^3)$ time. Here, we directly extend Yamada's distance propagation approach to an $N \times N \times N$ 3D image array: each voxel propagates its distance information to its 26 neighbors, iterating over the whole image array until the result converges. In the worst case, $O(N)$ iterations are required, so the time complexity of Yamada's 3D-EDT algorithm is $O(N^4)$. Paglieroni [28], [29] proposed an algorithm and architecture for the 2D-EDT problem based on the scan approach. For an $N \times N$ 2D image array, Paglieroni's algorithm consists of two main passes, the row scan and the column scan. In the row scan, each row is scanned to the right (forward) and to the left (reverse), and the results are merged. Likewise, in the column scan, each column is scanned down (forward) and up (reverse), and the results are merged. After the column scan, the 2D-EDT result is obtained. Paglieroni's algorithm still takes $O(N^3)$ time; its great advantage, however, is that it computes the 2D-EDT dimension by dimension, independently. This property makes the algorithm easy to parallelize and implement, and thus also suitable for hardware implementation and real-time applications. Saito and Toriwaki [34] presented several EDT algorithms based on the scan approach for an n-dimensional image array. For the 3D-EDT problem, Saito and Toriwaki's algorithm also takes $O(N^4)$ time.

For a $128 \times 128 \times 128$ 3D image array, we generate $V_N \times V_N$ uniformly distributed 1-voxels for the performance test. Fig. 7 compares the running times of the sequential 3D-EDT programs as the number of 1-voxels increases. The programs were run on an Intel Pentium III 850 CPU with 512MB RAM under MS Windows 98SE. As Fig. 7 shows, program 3DEDT_YD is very time-consuming when the 1-voxels are sparse, because the propagation distances are long and more iterations are required. As the number of 1-voxels increases, the running time of 3DEDT_YD levels off to a stable value; its running time is very sensitive to both the number and the distribution of 1-voxels. Program 3DEDT_SCAN runs much faster and more stably than 3DEDT_YD, but 3DEDT_LEE outperforms both. Fig. 8 compares the sequential programs as the size of the voxel image array increases. For every tested size N, program 3DEDT_LEE runs fastest. Furthermore, the running-time ratio rtt = 3DEDT_SCAN/3DEDT_LEE increases roughly in proportion to N: rtt = 2.2 for N = 64, 3.658 for N = 128, and 6.69 for N = 256.

Fig. 8. Sequential performance comparisons with increasing size of the voxel image array.
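To make the sensitivity of 3DEDT_YD to sparse 1-voxels concrete, the following Python sketch implements a simple synchronous distance-propagation scheme in 2D: every pixel stores the coordinates of the nearest 1-pixel found so far and, in each iteration, adopts a neighbor's candidate if that candidate is closer. It is a hypothetical reconstruction of the propagation idea described for Yamada [37], not Yamada's published procedure, and it is restricted to 2D (eight neighbors) for brevity; a 3D version would loop over a third offset to reach all 26 neighbors. The function name `propagate_nearest` is illustrative. Because information travels one pixel per iteration, the iteration count grows with the distance between 1-pixels. The result is always an upper bound on the exact distance, and for some 1-pixel configurations this simple scheme may not be exact.

```python
def propagate_nearest(image):
    """Synchronous 8-neighbour propagation of nearest-1-pixel coordinates.

    Returns (nearest, iterations): nearest[i][j] is the (row, col) of the
    1-pixel assigned to pixel (i, j), or None if the image has no 1-pixel;
    iterations counts every sweep, including the final one with no change.
    """
    rows, cols = len(image), len(image[0])
    nearest = [[(i, j) if image[i][j] else None for j in range(cols)]
               for i in range(rows)]

    def sq_dist(i, j, f):
        return float("inf") if f is None else (i - f[0]) ** 2 + (j - f[1]) ** 2

    iterations = 0
    while True:
        iterations += 1
        changed = False
        updated = [r[:] for r in nearest]       # synchronous (Jacobi-style) update
        for i in range(rows):
            for j in range(cols):
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        a, b = i + di, j + dj
                        if 0 <= a < rows and 0 <= b < cols:
                            cand = nearest[a][b]
                            if sq_dist(i, j, cand) < sq_dist(i, j, updated[i][j]):
                                updated[i][j] = cand
                                changed = True
        nearest = updated
        if not changed:
            return nearest, iterations
```

With a single 1-pixel in the corner of an $8 \times 8$ image, the sketch needs 8 sweeps (7 that propagate plus 1 that confirms convergence), whereas an all-ones image converges after 1 sweep, mirroring the trend reported for 3DEDT_YD.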

5.2 The Performance of the Parallel Algorithms

We use the same platform (Intel Pentium III 850 CPU, 512MB RAM, MS Windows 98SE) to emulate the EREW PRAM model computer. For a $256 \times 256 \times 256$ 3D voxel image array, we partition the image array into p subimage arrays, where p is the number of emulated EREW PRAM processors. The computational load differs from processor to processor, depending on the 1-voxel pattern in the subimage array assigned to it, so the running time on the EREW PRAM model computer is bounded by the worst computation time over the subimage arrays. Therefore, for each phase, we take the worst computation time among the processors, sum these worst times over the phases, and compare the result with the optimal speed-up curve. Fig. 9 shows the running times of the proposed parallel algorithms, both the optimal speed-up time (OPT) and the emulated EREW PRAM model computer running time; the two are very close. In addition, we implemented the 3D-EDT programs on an IBM SP2 using MPI (Message Passing Interface) to obtain the speed-up curve of the proposed parallel algorithms. However, because the IBM SP2 is a nonshared memory architecture, the data exchange time exceeds the processor computation time. The results are shown in Fig. 10.

Fig. 9. Emulated performance of the parallel algorithms on the EREW PRAM model.
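The timing rule used for the emulation fits in a few lines. In the Python sketch below, `phase_times[phase][pe]` is a hypothetical table of measured per-PE times: since the PEs of one phase run concurrently, a phase ends when its slowest PE finishes, and since phases run one after another, these maxima are summed. The reference `ideal_time` assumes that OPT means the total measured work divided evenly over the p PEs; how the OPT curve in Fig. 9 was computed is not spelled out, so this definition is an assumption, as are all three function names.

```python
def emulated_time(phase_times):
    """Emulated PRAM time: sum over phases of the slowest PE in that phase."""
    return sum(max(per_pe) for per_pe in phase_times)


def ideal_time(phase_times):
    """Assumed OPT reference: all measured work spread evenly over the p PEs."""
    p = len(phase_times[0])
    return sum(sum(per_pe) for per_pe in phase_times) / p


def efficiency(phase_times):
    """Ratio of ideal to emulated time (1.0 means perfectly balanced loads)."""
    return ideal_time(phase_times) / emulated_time(phase_times)


# Made-up illustrative numbers: 2 phases, 2 PEs, unbalanced first phase.
print(emulated_time([[1.0, 3.0], [2.0, 2.0]]))  # -> 5.0
print(efficiency([[1.0, 3.0], [2.0, 2.0]]))     # -> 0.8
```

Any load imbalance within a phase pushes the efficiency below 1.0, which is why the emulated curve can only approach, never beat, the ideal one under these assumptions.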

6 CONCLUDING REMARKS

In this paper, we have presented a parallel algorithm that computes the three-dimensional Euclidean distance transform on the EREW PRAM model. We first derived several important geometric relations and properties among parallel planes; based on these properties, we effectively reduced the time complexity of the transform. The parallel algorithm developed on the EREW PRAM model using $N^3$ PEs runs in $O(\log^2 N)$ time for an $N \times N \times N$ binary image array. We also proposed a generalized parallel algorithm for the 3D-EDT, which runs in $O((N/p)^3\log(N/p) + (N/p)^2\log^2 p)$ time for an $N \times N \times N$ binary image array on an EREW PRAM model computer with $p^3$ PEs, where $1 \le p \le N$. We implemented the proposed algorithm sequentially and compared its performance with the algorithms of Yamada and of Saito and Toriwaki; in this comparison, the algorithm presented in this paper outperformed the other two. We also implemented the parallel algorithms on two platforms, an emulated EREW PRAM model computer and an IBM SP2. The former achieves near-optimal speed-up; the latter, however, spends too much time on data exchange. A meaningful direction for future work would therefore be techniques that reduce the data exchange overhead on nonshared memory parallel architectures.

ACKNOWLEDGMENTS

This work was partly supported by the National Science Council under Contract No. NSC-89-2213-E-011-108/89-2213-E-267-002. Part of this work was carried out when the second author was visiting the Computer Science Department at the University of Dayton, Ohio, July-September 2000.

Fig. 10. MPI (Message Passing Interface) program running on the IBM SP2.

REFERENCES

[1] S.G. Akl, The Design and Analysis of Parallel Algorithms. N.J.: Prentice-Hall, 1989.
[2] S.G. Akl and K.A. Lyons, Parallel Computational Geometry. N.J.: Prentice-Hall, 1993.
[3] C. Arcelli and G. Sanniti di Baja, “A Width-Independent Fast Thinning Algorithm,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 7, pp. 463-474, 1985.
[4] C. Arcelli and G. Sanniti di Baja, “Computing Voronoi Diagrams in Digital Pictures,” Pattern Recognition Letters, vol. 4, pp. 383-389, 1986.
[5] H. Blum, “A Transformation for Extracting New Descriptors of Shape,” Models for the Perception of Speech and Visual Form, Mass.: MIT Press, W. Wathen-Dunn, ed., pp. 362-380, 1967.
[6] G. Borgefors, “Distance Transformations in Arbitrary Dimensions,” Computer Vision, Graphics, and Image Processing, vol. 27, pp. 321-345, 1984.
[7] G. Borgefors, “Distance Transformations in Digital Images,” Computer Vision, Graphics, and Image Processing, vol. 34, pp. 344-371, 1986.
[8] G. Borgefors, “Applications of Distance Transforms,” Aspects of Visual Form Processing, A. Arcelli et al., eds., pp. 83-108, 1994.
[9] G. Borgefors, “On Digital Distance Transforms in Three Dimensions,” Computer Vision, Graphics, and Image Processing, vol. 64, pp. 368-376, 1996.
[10] L. Chen and H.Y.H. Chuang, “A Fast Algorithm for Euclidean Distance Maps of a 2-D Binary Image,” Information Processing Letters, vol. 51, pp. 25-29, 1994.
[11] L. Chen and H.Y.H. Chuang, “An Efficient Algorithm for Complete Euclidean Distance Transform on Mesh-Connected SIMD,” Parallel Computing, vol. 21, pp. 841-852, 1995.
[12] P.E. Danielsson, “Euclidean Distance Mapping,” Computer Vision, Graphics, and Image Processing, vol. 14, pp. 227-248, 1980.
[13] A. Fujiwara, T. Masuzawa, and H. Fujiwara, “An Optimal Parallel Algorithm for the Euclidean Distance Maps of 2-D Binary Images,” Information Processing Letters, vol. 54, pp. 295-300, 1995.
[14] A. Fujiwara, M. Inoue, T. Masuzawa, and H. Fujiwara, “A Parallel Algorithm for Weighted Distance Transforms,” Proc. 11th Int’l Parallel Processing Symp., pp. 407-412, 1997.
[15] T. He and A. Kaufman, “Collision Detection for Volumetric Objects,” Proc. 1997 IEEE Visualization Conf., pp. 27-34, 1997.
[16] T. Hirata and T. Kato, “A Unified Linear-Time Algorithm for Computing Distance Maps,” Information Processing Letters, vol. 58, pp. 129-133, 1996.
[17] J. JáJá, An Introduction to Parallel Algorithms. Addison-Wesley, 1992.


[18] J.F. Jenq and S. Sahni, “Serial and Parallel Algorithms for the Medial Axis Transform,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, pp. 1218-1224, 1992.
[19] M.N. Kolountzakis and K.N. Kutulakos, “Fast Computation of the Euclidean Distance Maps for Binary Images,” Information Processing Letters, vol. 43, pp. 181-184, 1992.
[20] Y.H. Lee and S.J. Horng, “Fast Parallel Chessboard Distance Transform Algorithm,” Proc. Int’l Conf. Parallel and Distributed Systems, pp. 488-493, 1996.
[21] Y.H. Lee and S.J. Horng, “The Equivalence of the Chessboard Distance Transform and the Medial Axis Transform,” Proc. 10th Int’l Parallel Processing Symp., pp. 424-428, 1996.
[22] Y.H. Lee, S.J. Horng, T.W. Kao, F.S. Jaung, Y.J. Chen, and H.R. Tsai, “Parallel Computation of Exact Euclidean Distance Transform,” Parallel Computing, vol. 22, pp. 311-325, 1996.
[23] Y.H. Lee, S.J. Horng, T.W. Kao, and Y.J. Chen, “Parallel Computing Euclidean Distance Transform on the Mesh of Trees and Hypercube Computer,” Computer Vision and Image Understanding, vol. 68, pp. 109-119, 1997.
[24] Y.H. Lee and S.J. Horng, “Equivalence of the Chessboard Distance Transform and the Medial Axis Transform,” Int’l J. Computer Math., vol. 65, pp. 165-177, 1997.
[25] Y.H. Lee and S.J. Horng, “Optimal Computing the Chessboard Distance Transform on Parallel Processing Systems,” Computer Vision and Image Understanding, vol. 73, pp. 374-390, 1999.
[26] Y.H. Lee, S.J. Horng, and J. Seitzer, “Fast Computation of the 3-D Euclidean Distance Transform on the EREW PRAM Model,” Proc. 2001 Int’l Conf. Parallel Processing, pp. 471-478, 2001.
[27] J. Mayer and G.G. Langdon, “Post-Processing Enhancement of Decompressed Images Using Variable Order Bezier Polynomials and Distance Transform,” IEEE Proc. 1998 Data Compression Conf., p. 561, 1998.
[28] D. Paglieroni, “A Unified Distance Transform Algorithm and Architecture,” Machine Vision and Applications, vol. 5, pp. 47-55, 1992.
[29] D. Paglieroni, “Distance Transforms: Properties and Machine Vision Applications,” CVGIP: Graphical Models and Image Processing, vol. 54, pp. 56-74, 1992.
[30] Y. Pan, M. Hamdi, and K. Li, “Euclidean Distance Transform for Binary Images on Reconfigurable Mesh-Connected Computers,” IEEE Trans. Systems, Man, and Cybernetics, vol. 30, pp. 240-244, 2000.
[31] A. Rosenfeld and J.L. Pfaltz, “Sequential Operations in Digital Picture Processing,” J. ACM, vol. 13, pp. 471-494, 1966.
[32] A. Rosenfeld and J.L. Pfaltz, “Distance Function on Digital Pictures,” Pattern Recognition, vol. 1, pp. 33-61, 1968.
[33] A. Rosenfeld and A.C. Kak, Digital Picture Processing. New York: Academic, 1982.
[34] T. Saito and J. Toriwaki, “New Algorithms for Euclidean Distance Transformation of an n-Dimensional Digitized Picture with Applications,” Pattern Recognition, vol. 27, pp. 1551-1565, 1994.
[35] O. Schwarzkopf, “Parallel Computation of Distance Transforms,” Algorithmica, vol. 6, pp. 685-697, 1991.
[36] A. Utsumi, T. Miyasato, F. Kishino, J. Ohya, and R. Ryohei, “Multiple-Camera Based Hand Pose Estimation Method Using Distance Transformation,” J. Institute of Image Information and Television Engineers, vol. 51, pp. 2116-2125, 1997.
[37] H. Yamada, “Complete Euclidean Distance Transformation by Parallel Operation,” Proc. Seventh Int’l Conf. Pattern Recognition, vol. 1, pp. 69-71, 1984.
[38] Q.Z. Ye, “The Signed Euclidean Distance Transform and Its Applications,” Proc. Ninth Int’l Conf. Pattern Recognition, pp. 495-499, 1988.


Yu-Hua Lee received the BS degree in electrical engineering from Chung Yuan Christian University, Chung-Li, Tao-Yuan, Taiwan, Republic of China, and the MS degree in electrical engineering from National Taiwan Institute of Technology, Taipei, Taiwan, in 1992 and 1994, respectively. Currently, he is an associate researcher in the Information and Communication Research Division, Chung-Shan Institute of Science and Technology, Lung-Tan, Tao-Yuan, Taiwan. His research interests include image processing, parallel computing, parallel algorithms, VLSI digital system design, and digital signal processing (DSP).

Shi-Jinn Horng received the BS degree in electronics engineering from National Taiwan Institute of Technology, the MS degree in information engineering from National Central University, and the PhD degree in computer science from National Tsing Hua University in 1980, 1984, and 1989, respectively. He has published more than 100 research papers and received many awards. Currently, he is a full professor in the Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology. His research interests include VLSI design, multiprocessing systems, and parallel algorithms. For more information, visit Dr. Horng’s homepage: http://gold.ee.ntust.edu.tw.

Jennifer Seitzer received the PhD degree in June 1997 for her work in theoretical artificial intelligence, and did postdoctoral study in computer networking at Purdue University during the summer of 1998. She is an assistant professor in the Computer Science Department at the University of Dayton. Her current research and study involve both intelligent systems and computer networking, for which she has been funded by several organizations including the US National Science Foundation and the United States Air Force.
