
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 42, NO. 7, JULY 1994


Fast Discrete Cosine Transform Pruning

Athanassios N. Skodras

Abstract-A new fast pruning algorithm is proposed for computing the $N_0$ lowest frequency components of a length-$N$ discrete cosine transform, where $N_0$ is any integer less than or equal to $N$, and $N = 2^m$. The computational complexity of the developed algorithm is lower than that of any existing algorithm, resulting in significant time savings. In the special case that $N_0 = 2^{m_0}$, the required numbers of multiplications and additions are $\frac{1}{2} m_0 N$ and $(m_0 + 1)N + (\frac{1}{2} m_0 - 2)N_0 + 1$, respectively.

I. INTRODUCTION

The discrete cosine transform (DCT) is the central part of many speech and image coding applications, and hence, it is important to have efficient methods for computing it. The first major breakthrough in this area came with the development of the fast discrete cosine transform in 1974 [1]. Later, alternate indirect and direct approaches to compute the DCT appeared in the literature [2]. The most recently proposed algorithms belong to the second category [3], [4]. Their structure is similar to a decimation-in-frequency Sande-Tukey fast Fourier transform (FFT), and computation of the DCT is performed from two identical lower order DCT's. The decomposition is repeatedly applied until eventually only trivial two-point DCT's remain.

Common to all these algorithms is the fact that they assume the same number of input and output points. However, the most useful information about the signal is kept by the low-frequency DCT components. Therefore, only low-frequency DCT components have to be computed. Although many algorithms exist for the computation of the pruning FFT [5]-[7], only one FCT pruning algorithm has recently been proposed by Wang [8].

In this correspondence, a new pruning fast discrete cosine algorithm that computes any number (not only powers of two) of low-frequency components is derived. The fast discrete cosine transform (FCT) computation is presented in Section II, and the pruning algorithm is analyzed in Section III. Computational saving is evaluated in Section IV, and some comparisons are reported in Section V.

II. DIRECT FCT COMPUTATION

The most commonly used discrete cosine transform is the even type DCT-II, which for a sequence $x_n$, $n = 0, 1, \ldots, N-1$, is defined as [2]

$$X_k = \frac{2}{N}\,\varepsilon_k \sum_{n=0}^{N-1} x_n \cos\frac{(2n+1)k\pi}{2N}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where $N = 2^m$ and $\varepsilon_k = 1/\sqrt{2}$ for $k = 0$ and $\varepsilon_k = 1$ otherwise. If we assume, for simplicity, that the scaling factors are absorbed into $X_k$ and apply the input mapping [9]

$$\tilde{x}_n = x_{2n}, \qquad \tilde{x}_{N-n-1} = x_{2n+1}, \qquad n = 0, 1, \ldots, N/2 - 1 \tag{2}$$

(1) is rewritten as

$$X_k = \sum_{n=0}^{N-1} \tilde{x}_n \cos\frac{(4n+1)k\pi}{2N}, \qquad k = 0, 1, \ldots, N-1. \tag{3}$$

By decomposing (3) into even- and odd-indexed frequency components and taking into account the trigonometric identity $\cos(A+B) = 2\cos A\cos B - \cos(A-B)$, we end up with two $N/2$-point DCT's [4]. Further decomposition yields a decimation-in-frequency fast discrete cosine transform algorithm similar to the corresponding FFT algorithm. The required additions (post-additions) are calculated in a regular manner after the bit-reversal operation in $m - 1$ stages [4], where $m = \log_2 N$. There are $N/2$ butterflies in each of the $m$ stages. Each butterfly needs one multiplication and two additions to be calculated. As a consequence, for the computation of the $N$-point FCT, $\mu_N$ multiplications and $\alpha_N$ additions/subtractions are required, where

$$\mu_N = \frac{m}{2}N \tag{4}$$

and

$$\alpha_N = (mN) + \left(\frac{m}{2}N - N + 1\right) = \frac{3}{2}mN - N + 1. \tag{5}$$
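The operation counts can be tabulated with a few lines of C. The sketch below assumes the counts $\frac{m}{2}N$ multiplications and $\frac{3}{2}mN - N + 1$ additions for the full transform, and, for the pruned special case $N_0 = 2^{m_0}$, the counts $\frac{1}{2}m_0N$ and $(m_0+1)N + (\frac{1}{2}m_0 - 2)N_0 + 1$ as read from the abstract. All function names are hypothetical.

```c
/* Operation counts for a length-N FCT with N = 2^m. */
unsigned long fct_multiplications(unsigned m)
{
    unsigned long N = 1UL << m;
    return m * N / 2;                 /* one multiplication per butterfly */
}

unsigned long fct_additions(unsigned m)
{
    unsigned long N = 1UL << m;
    unsigned long butterfly_adds = m * N;           /* two per butterfly */
    unsigned long post_adds = m * N / 2 + 1 - N;    /* post-additions    */
    return butterfly_adds + post_adds;
}

/* Pruned special case N0 = 2^m0 with m0 <= m (counts assumed from the
   abstract). */
unsigned long pruned_multiplications(unsigned m, unsigned m0)
{
    unsigned long N = 1UL << m;
    return m0 * N / 2;
}

unsigned long pruned_additions(unsigned m, unsigned m0)
{
    unsigned long N = 1UL << m;
    unsigned long N0 = 1UL << m0;
    /* (m0 + 1)N + (m0/2 - 2)N0 + 1, reordered to stay non-negative */
    return (m0 + 1) * N + m0 * N0 / 2 + 1 - 2 * N0;
}
```

Two sanity checks follow from these formulas: for $m_0 = m$ the pruned counts reduce to the full-transform counts, and for $m_0 = 0$ (only the DC term) the addition count is $N - 1$ with no multiplications.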

The term in the first set of parentheses in (5) is due to the additions needed to calculate the butterflies, and that in the second set of parentheses is due to the post-additions. The migration of the bit-reversal operation to the beginning of the algorithm results in a faster in-place computation due to the input-reordering capability [9], without affecting the computational complexity given above. Additionally, it makes it possible to develop an efficient pruning algorithm, as analyzed in the following section. The flow graph of the algorithm for $N = 16$ is given in Fig. 1 (there is no difference between broken and solid lines at present).

III. THE PRUNING FCT

Manuscript received September 9, 1992; revised October 14, 1993. The associate editor coordinating the review of this paper and approving it for publication was Prof. Henrik V. Sorensen. The author is with the Electronics Laboratory, University of Patras, Patras, Greece. IEEE Log Number 9401280.

The DCT possesses a high energy compaction property, which is superior to that of any known transform with a fast computational algorithm. It is this property that makes the DCT the most appropriate transform for lossy data compression by simply retaining $N_0 < N$ coefficients in the transform representation [2].

1053-587X/94$04.00 © 1994 IEEE


Fig. 1. Flow graph representation of a 16-point FCT. (Inputs appear in bit-reversed order: x0, x8, x4, x12, x2, x10, x6, x14, x1, x9, x5, x13, x3, x11, x7, x15.)

Let us suppose that only the first $N_0$ low-frequency components are needed, where $N_0$ is any integer less than or equal to $N$ and not necessarily a power of 2. The flow graph for the computation of $N_0 = 3$ out of $N = 16$ points is shown in Fig. 1 (solid lines). It is evident that a number of operations can be saved if only the first $N_0$ points are calculated.

Pruned Flow Graph Analysis: At each stage $s$, there are $B_s$ blocks or bunches of butterflies and $b_s$ butterflies per block, where

$$B_s = 2^{m-1-s}, \qquad b_s = 2^s, \qquad s = 0, 1, \ldots, m-1. \tag{6}$$

Let $q$ and $r$ be the quotient and remainder of $N_0/b_s$, respectively, or

$$q = \lfloor N_0 / b_s \rfloor, \qquad r = N_0 \bmod b_s.$$

A number of complete butterflies ($n_{cb}$) and/or incomplete butterflies ($n_{ib}$) has to be computed at each stage. The exact count of complete and incomplete butterflies depends on the relation between the number of points to be computed ($N_0$) and the number of butterflies per block ($b_s$).

- If $q > 1$, then only complete butterflies exist, the number of which equals the total number of butterflies per stage, or $n_{cb} = B_s b_s = N/2$. This is the case for the first stage of Fig. 1.
- If $q = 1$, then both complete and incomplete butterflies exist, as for the second stage of Fig. 1. The numbers of incomplete and complete butterflies at this stage are listed in Table I.
- If $q < 1$, then only incomplete butterflies exist, as for the last two stages of Fig. 1. The number of incomplete butterflies per stage is listed in Table I.

The number of complete and incomplete butterflies per stage is summarized in Table I.

TABLE I
NUMBER OF COMPLETE AND INCOMPLETE BUTTERFLIES PER STAGE
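The stage-by-stage case analysis can be illustrated with a small C sketch. It computes the block structure of (6) together with the quotient and remainder of $N_0/b_s$, and then applies the three cases exactly as stated in the text. The type and function names (`stage_info`, `stage_structure`, `classify_stage`) are hypothetical, and the sketch deliberately stops short of the per-stage butterfly counts of Table I.

```c
/* Block structure of one stage of the pruned flow graph. */
typedef struct {
    unsigned Bs;   /* number of blocks at this stage             */
    unsigned bs;   /* butterflies per block                      */
    unsigned q;    /* quotient of N0 / bs                        */
    unsigned r;    /* remainder of N0 / bs                       */
} stage_info;

typedef enum { ONLY_COMPLETE, MIXED, ONLY_INCOMPLETE } stage_kind;

/* Stage s of an N = 2^m point transform, 0 <= s <= m - 1,
   with N0 output points requested. */
stage_info stage_structure(unsigned m, unsigned s, unsigned N0)
{
    stage_info st;
    st.Bs = 1u << (m - 1 - s);
    st.bs = 1u << s;
    st.q = N0 / st.bs;
    st.r = N0 % st.bs;
    return st;
}

/* The three cases of the text: q > 1, q = 1, q < 1. */
stage_kind classify_stage(stage_info st)
{
    if (st.q > 1)
        return ONLY_COMPLETE;
    if (st.q == 1)
        return MIXED;
    return ONLY_INCOMPLETE;
}
```

For $N = 16$ and $N_0 = 3$ this classification reproduces the description of Fig. 1: only complete butterflies in the first stage, a mix in the second, and only incomplete butterflies in the last two.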

The Pruning Algorithm: The above analysis of the pruning FCT flow graph leads to an efficient algorithm for its implementation (Fig. 2). Although the structure of this algorithm looks similar to that of the FFT algorithm (triple nested loop structure), it has two notable differences: (i) The innermost loop is actually a double loop: one for calculating the complete butterflies and the other for calculating the incomplete butterflies at each stage. (ii) All the butterflies of each block are calculated before proceeding to the next block's butterfly calculations. This is because all butterflies of the same block are multiplied by the same weighting factor (cosine).

The arrangement of the cosines becomes highly irregular after migrating the bit-reversal stage to the beginning of the algorithm. Fig. 3 illustrates all the cosines needed for the computation of an FCT of length $N = 16$. It is seen that at each stage $s$, $s = 0, 1, \ldots, m-1$, the first weighting factor is $C^k$, where $k = 2^s$. The second factor of the same stage is derived from the first one by adding $N/2$ to its argument; namely, the second factor is $C^{k+N/2}$. The next two weighting factors are calculated from the first two by adding $N/4$ to their arguments. This procedure is repeated until all coefficients at each stage have been calculated. Careful examination of Fig. 3 leads to a recursive algorithm for the calculation of these cosines (Fig. 4). All cosine values are put in order in a look-up table, from which they are retrieved one after the other during the computation. The length of the look-up table is equal to $N - 1$, since each block needs one factor and $\sum_{s=0}^{m-1} B_s = N - 1$.
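The ordering of the cosine arguments described above can be sketched in C as follows. This is not the paper's Fig. 4 listing; it is a minimal reconstruction from the prose, with the hypothetical name `cosine_exponents`, that generates only the exponents $k$ of the factors $C^k$ and leaves the cosine evaluation itself aside. At each stage it starts from $k = 2^s$ and repeatedly doubles the list by adding $N/2$, $N/4$, and so on, until one exponent per block has been produced.

```c
#include <stddef.h>

/* Fills et[] with the cosine-argument exponents, stage by stage, for an
   N = 2^m point transform. et must have room for N - 1 entries.
   Returns the number of entries written. */
size_t cosine_exponents(unsigned m, unsigned *et)
{
    unsigned N = 1u << m;
    size_t p = 0;                              /* next free table slot   */

    for (unsigned s = 0; s < m; s++) {
        size_t base = p;                       /* first entry of stage s */
        size_t len = 1;                        /* entries so far         */
        size_t blocks = (size_t)1 << (m - 1 - s);

        et[p++] = 1u << s;                     /* first factor: k = 2^s  */

        /* Double the list: add N/2 to the first entry, then N/4 to the
           first two, and so on, until there is one entry per block. */
        for (unsigned inc = N / 2; len < blocks; inc /= 2) {
            for (size_t i = 0; i < len; i++)
                et[p++] = et[base + i] + inc;
            len *= 2;
        }
    }
    return p;
}
```

For $N = 16$ the table holds 15 entries, matching the stated look-up table length of $N - 1$.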


/* Pruning FCT Algorithm
   N=2^m, N0=any integer less than or equal to N
   Author: A. N. Skodras, University of Patras, June 1992 */
p_fct()
{
/* Input parameters: N, N0, m, x[], ct[] */
/* Output parameters: x[] */
unsigned int i, j, k, l, p, q, r, Bs, bs, bls, ble, ncb, nib;
double a, c;

Bs = N; bs = bls = 1; p = 0;

/* Program for generating the look-up table containing the cosine weighting factors */
lut_of_cos()
{
/* Input parameters: N, N0, m, PI02 = 1.570796327 */
/* Output parameters: ct[] */
unsigned int e, i, k, l, p, t, inc, len, mm1, et[1024];
/* et[] is a temporary exponent (cosine arguments) table
   ct[] is the cosine table */

p = 0; mm1 = m - 1; e = 1;

for(k=O; k l ; e p = s > > l ; loops=loops l ) ; j + + ) { I= ep; for(i=l; (i