Physics and image data compression

Adolph E. Smith, Michael Gormish, and Martin Boliek
Ricoh California Research Center, 2282 Sand Hill Road, Suite 115, Menlo Park, CA 94025
Tel: (415) 496-5723  Fax: (415) 854-8740

ABSTRACT

We show how several basic image compression methods (predictive coding, transform coding, and pyramid coding) are based on self-similarity and a 1/f^2 power law. Phase transitions often show a self-similarity which is characterized by a spectral power law. Natural images often show a self-similarity which is also characterized by a power law spectrum near 1/f^2. Exploring physical analogs leads to greater unity among current methods of compression and may lead to improved techniques.

1. INTRODUCTION

Because of limited capacity in transmission and storage, a great deal of work has been done on image data compression methods. These methods were first developed by examining pixel statistics, specifically entropy and correlation. Later methods made use of studies of the Human Visual System (HVS) to determine how more lossy coding could be achieved. An important function of animal vision is to represent information as concisely as possible. The goal of image compression algorithms is the same, although conciseness is desirable for storage rather than for understanding. This similarity suggests that a comparison of the physics in vision and in data compression may be useful.

One of the most interesting developments in physics in the past few decades has been in phase transitions, the study of how matter passes from one state to another. Two examples are the change from magnetic to non-magnetic and the change from normal conductivity to superconductivity. In studying the variables which control the change, physicists have learned that there is a great similarity in the laws describing the change. In particular, near the critical point, such as the superconducting temperature, these variables obey power laws. When systems obey power laws there is an invariance with respect to multiplicative scaling: the object has the same form although its size is changed. Essentially there is scale invariance. A well known example is fractals, shapes whose form is independent of size. We show how the notion of self-similarity unifies the compression schemes of predictive coding, transform coding, some fractal compression methods, and pyramidal coding.

2. EDGES AND PHASE TRANSITIONS

A rapid local change in image amplitude or luminance is considered to be an edge in an image. Edges in an image are really two dimensional representations of some rapid change in the scene, such as changes in luminance or color, commonly caused by changes from one object to another. Edges in color images are more complicated but are still a fundamental attribute. Figure 1 shows an edge in an image on the left and, on the right, a phase transition for H2O changing from liquid to gas. The edge graph shows values of pixel intensities along a line segment drawn in Lena. Almost any segment of sufficient length from Lena will show a similar change somewhere. The phase transition graph shows the density of water as the change from liquid to gas is made by increasing the pressure at constant temperature. We equate pixel luminance with density and horizontal position with pressure. For the edge displayed, the pixels on the right of the edge can be modeled by one set of pixel statistics while the pixels on the left obey a different statistical model. It is also clear that few pixels are in the transition region, thus one model can be used almost perfectly until the transition and a different model is accurate after the transition.

Just as it is important to know the boiling temperature of water, it is extremely valuable to know the location of an edge for image understanding or compression. The analogy can be extended by using a two dimensional edge for the image and examining pressure and temperature for H2O. The phase transition now occurs at a set of pairs (pressure, temperature) which divides liquid from solid, or solid from gas, or gas from liquid. This set of pairs defining the location of a phase transition corresponds to a set of locations (x, y) which define an edge in an image. A "triple point" occurs in an image where three objects overlap. An example of a triple point from Lena can be seen in Figure 2.

To make the analogy complete we must determine the critical point for an edge in an image. In the case of water there is a critical temperature (374°C) above which there is no difference between liquid and gas: the "edge" on the pressure-temperature phase diagram ends. In regions near the end of this edge, the critical point, water behaves as many critical systems do. There is power law behavior and bubbles of all scales form. With the analogies we have drawn so far it is hard to find a critical point for an edge in an image. A critical point would logically be a place where an edge ends (without touching another edge). This is unusual for images since edges tend to arise from boundaries of objects and are thus closed curves. However, critical points can arise from image processing, either natural visual processing or digital processing. We have not explicitly identified the critical points for an image, but when contrast and brightness are adjusted in an image, edges appear and disappear. At high contrast settings a processed image can appear to be made solely of black and white pixels, and as brightness is changed edges can disappear. As this occurs, regions of the image change from black to white. These changes from black to white can look very much like the displays of Ising models near their critical point. While we have not explicitly defined the critical point for an image, we believe critical points do exist, at least in the processing operations. It is important to note that while water has only one critical point, an image could have a different critical point associated with each edge.

[Figure 1: Phase transitions in images and physics. Left panel, "A Luminance Edge": pixel intensity versus horizontal position along a line in Lena. Right panel, "H2O Under Constant Temperature": density versus pressure.]

[Figure 2: Triple points in images and water. A triple point region from Lena alongside the H2O phase diagram: pressure versus temperature, showing the solid, liquid, and vapor regions, the triple point, and the critical point.]

In physical systems, behavior is often specified by changes of state (phase transitions) such as melting points, boiling points, normal to superconducting temperatures, etc. These are characterized by rapid changes with respect to temperature, magnetic field strength, etc. In these phase transitions, the systems show a transition from locality to globality. For example, in the transition from normal conduction to superconductivity, the electron behavior changes from local random scattering to a collective motion of all the electrons in the material. An edge is local information. In visual perception, local visual information is combined to form a global representation. The global structure of the image emerges out of the local interaction of the edge elements. This interaction may be analogous to the physical situation.

3. SELF-SIMILARITY AND 1/F^2

One purpose of vision is, ideally, to have the eye produce a description of reality that is independent of the distance from the eye. We need to have invariant representations of objects in the environment in spite of changing sensations [1]. This means that a scale invariant representation is a goal for human vision. A scale invariant description necessarily implies a visual sensitivity which falls off as 1/f [2]. This can be seen by considering a surface with image energy E between frequencies f and nf, viewed at distance d. If the distance from the eye to the scene, d, is changed to ad, the energy is shifted so that it lies between spatial frequencies af and anf. If the energy at frequency f is E(f), then E(f) = 2πf G(f), where G(f) is the power spectrum. In order to have constant energy received by the eye at different distances, the total energy in these frequency ranges must be constant. Knill expresses this as

\int_{f}^{nf} 2\pi f \, G(f) \, df \;=\; \int_{af}^{anf} 2\pi f \, G(f) \, df \;=\; K .   (1)

Therefore

G(f) = \frac{\mathrm{const}}{f^{2}} .   (2)
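As a quick check that (2) satisfies (1), the two integrals can be evaluated symbolically. The sketch below is only an illustration and assumes sympy; the symbols n, a, and c stand for the band ratio, the viewing-distance scale factor, and the constant in G(f) = c/f^2.

```python
import sympy as sp

# Symbols: x = frequency variable, f = lower band edge, n = band ratio,
# a = viewing-distance scale factor, c = the constant in G(f) = c / f**2.
x, f, n, a, c = sp.symbols('x f n a c', positive=True)

G = c / x**2                                                        # power spectrum of equation (2)
E_band = sp.integrate(2 * sp.pi * x * G, (x, f, n * f))             # energy between f and nf
E_scaled = sp.integrate(2 * sp.pi * x * G, (x, a * f, a * n * f))   # energy between af and anf

print(sp.simplify(sp.expand_log(E_band)))              # 2*pi*c*log(n): independent of f
print(sp.simplify(sp.expand_log(E_band - E_scaled)))   # 0: equation (1) is satisfied
```

The band energy depends only on the ratio n, not on f or on the scale factor a, which is exactly the scale invariance required above.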

In other words, the amplitude spectrum falls off as 1/f, and the power spectrum as 1/f^2. Edges in natural images are often self-similar since they do not become smoother with scaling. In fact a one dimensional symmetric edge around the origin, which can be represented by h(x) = sgn(x) = |x|/x, has a Fourier transform given by H(f) = -i/(2πf). Thus edges have a power spectrum of |H(f)|^2 = a/f^2. Recent work suggests that the ensemble of natural images is fractal-like, with a power spectrum proportional to 1/f^β with 2 < β < 4. Knill et al. investigated the power spectra of images to which the eye is optimally tuned [3]. They reported that the eye is tuned to maximal sensitivity in the range 2.8 < β < 3.6. These values of β, while not identical, are consistent with the 1/f amplitude spectrum of natural images. Thus we believe the eye has adapted to respond to the 1/f-like spectrum found in nature. We now show how the 1/f amplitude spectrum has been utilized in image compression methods.

4. BASIC COMPRESSION CODING

Image coding methods can be classified in a variety of ways. One grouping into four categories is: pulse code modulation (PCM), predictive coding, transform coding, and quantization. Full implementations usually combine ideas from two or more of these groups [4]. We show how several image compression techniques which have been developed separately provide compression by utilizing self-similarity, phase transitions, or 1/f^2 power spectra, which we have previously connected.
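The 1/f^β behavior claimed above is easy to check on any grayscale image. The following is a rough sketch rather than part of the original analysis: it assumes numpy and a 2-D array holding a grayscale image (the variable lena_gray and the function name are illustrative), and it estimates β by radially averaging the 2-D power spectrum and fitting the slope on a log-log scale.

```python
import numpy as np

def spectral_slope(image):
    """Estimate beta in G(f) ~ 1/f**beta from a 2-D grayscale image array."""
    img = np.asarray(image, dtype=float)
    img -= img.mean()                         # remove the DC term so it does not dominate
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2

    # Integer radial frequency of every FFT bin, measured from the spectrum's center.
    h, w = power.shape
    fy, fx = np.indices((h, w))
    radius = np.hypot(fy - h // 2, fx - w // 2).astype(int)

    # Radially averaged power spectrum (skip radius 0 and the corner frequencies).
    r_max = min(h, w) // 2
    radial = np.array([power[radius == r].mean() for r in range(1, r_max)])
    freqs = np.arange(1, r_max)

    # Slope of log-power versus log-frequency gives -beta.
    slope, _ = np.polyfit(np.log(freqs), np.log(radial), 1)
    return -slope

# Usage (hypothetical): load Lena as a grayscale array and print the estimate.
# beta = spectral_slope(lena_gray)
# print(beta)   # natural images typically give a value in the neighborhood of 2
```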

4.1. Predictive coding

Predictive coding uses a set of known pixels (previously transmitted) to predict the current pixel. Provided that this prediction is good, the difference between the prediction and the actual pixel value can be transmitted or stored more efficiently than the actual pixel intensity. The simplest predictive coding is horizontal differential pulse code modulation. Differential pulse code modulation uses a model of pixel statistics to determine prediction coefficients in order to minimize the squared error of the pixel prediction. A common model of pixel statistics is the assumption that the pixel intensity has the autocorrelation function r(m,n) = σ^2 ρ_x^|m| ρ_y^|n|, where m and n are the horizontal and vertical distances between pixels, ρ_x and ρ_y are constants between 0 and 1, and σ^2 is the mean square pixel intensity. For one dimensional encoding this can be simplified to r(m) = σ^2 ρ^|m|. If the pixel intensities are assumed to have a Gaussian probability density function then this is a first order Gauss-Markov model and the best predictor is the previous pixel. This autocorrelation model, without the Gaussian assumption, has been used in many prediction models [4]. Since the Fourier transform of the autocorrelation function yields the power spectrum, if we assume the image is sampled from a continuous function with autocorrelation r(m) = σ^2 ρ^|m|, then that function has the power spectrum PS given by

\mathrm{PS} = \mathcal{F}\!\left[\sigma^{2}\rho^{|m|}\right]
 = \frac{2\sigma^{2}}{-\ln\rho\,\left(1 + \left(\dfrac{2\pi f}{-\ln\rho}\right)^{2}\right)}
 = \frac{A}{B + f^{2}} .   (3)
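The prediction-and-residual idea itself is simple to sketch. The code below is a minimal illustration, not the authors' coder: it assumes numpy, uses the previous-pixel predictor implied by the first order Gauss-Markov model, and operates on a single image row (the function names and the toy row are illustrative).

```python
import numpy as np

def dpcm_encode_row(row):
    """Previous-pixel DPCM: transmit the first pixel, then the prediction residuals."""
    row = np.asarray(row, dtype=np.int16)
    residuals = np.diff(row)            # e[n] = x[n] - x[n-1], predictor = previous pixel
    return row[0], residuals

def dpcm_decode_row(first, residuals):
    """Invert the encoder by accumulating the residuals onto the first pixel."""
    return np.concatenate(([first], first + np.cumsum(residuals)))

# With the Gauss-Markov model above (rho close to 1), residuals cluster near zero
# and can be entropy coded with fewer bits than the raw pixel values.
row = np.array([100, 101, 103, 180, 181, 182], dtype=np.int16)  # an "edge" at index 3
first, res = dpcm_encode_row(row)
assert np.array_equal(dpcm_decode_row(first, res), row)
print(res)   # residuals [1 2 77 1 1]: large only at the edge (the phase transition)
```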

For high frequencies, equation (3) approaches a 1/f^2 power spectrum. Note that no image can have a perfect 1/f^2 spectrum at low frequencies, since this would require infinite power. Improved image models employ an autocovariance function which is nonseparable but often circularly symmetric, like r(d) = σ^2 ρ^d, where d is the distance between two pixels. This model leads to a 1/f^2 power spectrum for a one dimensional sample of the plane taken in any direction. We see that predictive coding is related to the self-similar 1/f^2 power spectrum found in nature, which in this case has been exploited by examining pixel statistics.

4.2. Transform coding

Transform coding methods such as the now ubiquitous Discrete Cosine Transform (DCT) take advantage of spectral 1/f properties in two ways: data compaction and quantization. One key component of transform compression methods is energy compaction. Transforming several blocks of pixels can yield some transform coefficients with large variances and some with near zero variance. A transform can be chosen depending on the specific image to concentrate energy in the minimum number of coefficients. The Karhunen-Loève Transform (KLT) is a data dependent transform which maximizes the image energy in the first coefficients. However, it is possible to use a transform which does not depend on the image data, because almost all images have a spectrum close to 1/f. The DCT is very successful because it concentrates image energy into a few coefficients almost as well as the KLT, yet it does not need to be adapted for each image. The low frequency coefficients always contain more energy than high frequency coefficients, as would be expected for a 1/f^2 spectrum.

The lossy part of transform compression methods is due to quantization. In the JPEG standard, which uses the DCT, each two-dimensional frequency coefficient can be quantized by a different amount. Human visual system response studies have shown the eye to be much more sensitive to low frequencies, and virtually unable to detect image fluctuation above 60 cycles per degree of foveal vision [4]. The typical quantization tables in JPEG use this information by quantizing the higher frequencies much more than lower frequencies. In the examples of quantization tables given in the JPEG standard [5], some high frequency coefficients are divided by 120 while the lowest frequency horizontal coefficient is divided by 11. JPEG treats block DC coefficients in a special manner by performing predictive coding. This allows even greater precision when quantizing the single most important coefficient. Predictive coding also allows exploitation of the correlation from block to block of the coefficient most likely to be correlated. Thus JPEG and other DCT based schemes use the power spectrum which results from phase transitions for energy compaction, together with HVS specific quantization.
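A small numerical sketch can make the energy compaction and the frequency-weighted quantization concrete. It is only an illustration under assumed inputs: scipy's DCT routines are used, and the 8x8 ramp block and the step sizes that grow with frequency are invented for the example, not taken from the JPEG tables.

```python
import numpy as np
from scipy.fft import dctn, idctn

# An 8x8 block from a smooth, ramp-like region (illustrative data, not from Lena).
block = np.add.outer(np.arange(8), np.arange(8)) * 4.0 + 64.0

# Type-II 2-D DCT with orthonormal scaling, as used (up to scaling) in JPEG.
coeffs = dctn(block, norm='ortho')

# Energy compaction: the DC term plus a few low-frequency terms hold nearly all the energy.
energy = coeffs ** 2
print(energy[0, 0] / energy.sum())          # close to 1 for a smooth block

# Coarser quantization at higher frequencies (hypothetical steps growing with u + v).
steps = 8.0 + 4.0 * np.add.outer(np.arange(8), np.arange(8))
quantized = np.round(coeffs / steps)
reconstructed = idctn(quantized * steps, norm='ortho')
print(np.abs(reconstructed - block).max())  # small error despite coarse high-frequency steps
```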

While there is no standardized wavelet encoding scheme as there is for the DCT, all wavelet transforms rely on self-similarity. Wavelets do not have the pure frequency interpretation available for the Fourier and cosine transforms, but the basis functions definitely vary in frequency. Wavelet basis functions are translations and scalings of a single function. Thus while one basis function captures an aspect of the whole image, the next smaller wavelet represents the same aspect, on a smaller scale, of a portion of the image.

4.3. Fractals and pyramid coding

Fractal methods have had tremendous success in image creation, especially for science fiction movies. Their use for image compression of natural scenes has grown more recently and a variety of fractal related methods are now in use. Extremely high compression using Barnsley's "Iterated Function System" (IFS) has been reported [6]. While this method of image compression will not be discussed, because the details of encoding are not available, we note that the IFS relies on one portion of the image being the affine transform of another portion. This is perhaps the epitome of self-similarity in data compression. Pyramid coding methods, which have been used longer than fractals, are actually identical to some fractal image creation methods. Both the compression method and the fractal landscape forgeries rely heavily on the self-similar nature of images.

Pyramid data compression schemes progressively transmit first a low resolution image and then successively refine the image by transmitting higher resolution corrections. Ideally, the low resolution images maintain the appearance of the high resolution image, thus allowing early recognition [7]. The decoding process for pyramid compressed images exactly matches one of the construction processes for fractal Brownian surfaces. To decode a pyramid image, a receiver interpolates a subsampled low resolution image to the next larger size and then adds a correction term to each pixel at the new image size. These correction terms have been encoded so that when they are added to the interpolated picture, a higher resolution image closer to the original is obtained. Data compression occurs because the correction values tend to be smaller than the pixel values and require fewer bits to encode. If the low resolution images were created by subsampling without filtering, then the higher resolution image can be formed by adding a correction only to the interpolated pixels, and using the lower resolution value as the fourth pixel. If the low resolution images were formed by filtering and then sampling, a correction term must be added to every pixel.

The creation of fractal surfaces follows a similar path. Initially, given a lattice of Brownian related heights, a higher resolution lattice can be created by interpolating between low resolution points and adding a Gaussian random variable to the interpolated midpoints. For better fractals of different dimensions, the method of successive random additions can be used [8]. In this method the low resolution lattice is viewed as an approximation to the next level, so new points are interpolated and a Gaussian correction is added to all points in the higher resolution lattice. The only difference between the creation of this type of fractal landscape and decoding an image pyramid is that the fractal adds a random variable while the image decoder adds a precomputed value.
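The parallel can be made concrete with a minimal sketch of one refinement level. It assumes numpy; the nearest-neighbor interpolation, lattice sizes, and noise level are illustrative (real pyramid coders interpolate and filter more carefully), but the structure shows the single point of difference noted above.

```python
import numpy as np

def refine(lattice, correction):
    """One pyramid decoding step: interpolate to the next size, then add stored corrections."""
    upsampled = np.kron(lattice, np.ones((2, 2)))   # nearest-neighbor interpolation
    return upsampled + correction                   # corrections were encoded by the sender

def fractal_refine(lattice, sigma, rng):
    """Successive random additions: same interpolation, but add Gaussian noise instead."""
    upsampled = np.kron(lattice, np.ones((2, 2)))
    return upsampled + rng.normal(0.0, sigma, upsampled.shape)

rng = np.random.default_rng(0)
low = rng.normal(128.0, 20.0, (4, 4))               # a coarse 4x4 "image" or height lattice

# Pyramid decoding adds precomputed corrections; fractal synthesis adds random ones.
corrections = rng.normal(0.0, 2.0, (8, 8))          # stand-in for transmitted residuals
image_level = refine(low, corrections)
surface_level = fractal_refine(low, sigma=2.0, rng=rng)
print(image_level.shape, surface_level.shape)       # both (8, 8)
```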
4.4. Segmentation coding

Image compression techniques which concentrate almost exclusively on use of the human visual system for compression are often called "second generation techniques." The majority of these techniques are called contour-texture methods or image segmentation coding [9]. Essentially, an image is divided into a large number of regions along what are hopefully visually significant contours. The image can be stored or transmitted using a description of the contours and a method to fill the regions the contours define. Segmentation coding methods vary in how the image is divided into regions and in how the texture within each region is represented. Perhaps the simplest example is segmenting an image using an intensity threshold and filling the regions with a constant average grayscale value. Many segmentation methods begin with simple or directional edge detection, where an edge is a rapid change in grayscale. This can be done with linear filters. Although more processing must be done, because simple filtering does not provide connected edges forming closed regions, the connection with phase transitions is obvious. We have previously identified phase transitions with rapid changes of intensity, which these second generation techniques use for segmentation.
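The threshold-and-fill example just mentioned can be sketched in a few lines. This is a toy illustration, not a practical coder: it assumes numpy, a single global threshold, and one constant gray level per region (the function names and the toy image are invented for the example).

```python
import numpy as np

def threshold_segment(image, threshold):
    """Simplest segmentation coder: a binary region mask plus one mean gray level per region."""
    img = np.asarray(image, dtype=float)
    mask = img >= threshold                       # region membership plays the role of the contour
    means = (float(img[mask].mean()), float(img[~mask].mean()))
    return mask, means                            # this is all that would be stored or transmitted

def threshold_reconstruct(mask, means):
    """Fill each region with its constant average gray level."""
    return np.where(mask, means[0], means[1])

# Usage on a toy image: a bright square on a dark background.
image = np.full((8, 8), 40.0)
image[2:6, 2:6] = 200.0
mask, means = threshold_segment(image, threshold=128.0)
print(means)                                                      # (200.0, 40.0)
print(np.abs(threshold_reconstruct(mask, means) - image).max())   # 0.0 for this toy image
```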

Even if the texture within a region is described by a fractal [10] and regions are determined by region growing (not an edge filtering technique), the connection between this form of image compression and phase transitions is clear. There are several regions of a picture describable by simple models, and the change from one region to another happens extremely rapidly. In fact, almost all segmentation coding assumes that no pixels need to be represented by the average of two textures. It is interesting to note that the authors saw the connections between phase transitions, 1/f^2 power spectra, self-similarity, and the other compression techniques before thinking about segmentation coding. It is exactly this sort of image compression method which we would have hoped would be developed through practical examination of the connections presented here.

5. CONCLUSIONS

We have shown how self-similarity relates to methods of image compression which have been developed independently, whether empirically, from pixel statistics, or from properties of the human visual system. Specifically, we have seen the dependence of some fractal and wavelet coding, DCT coding, and differential coding on self-similarity and the 1/f^2 power spectrum. We have described a physical analog for image coding algorithms. There has been much recent thinking about the physics of information. Our study suggests that some sort of computer architecture based on phase transitions could be ideal for image processing. Further study is needed to explicitly define critical points in images and to see which physical systems are able to successfully mimic image compression algorithms.

6. ACKNOWLEDGEMENT

We are grateful to Eric Saund for helpful comments.

7. REFERENCES

1. D. Marr, Vision, Chapt. 1, W. H. Freeman, San Francisco, 1982.
2. David J. Field, "Relations between the statistics of natural images and the response properties of cortical cells," J. Opt. Soc. Am. A, vol. 4, no. 12, pp. 2379-2394, Dec. 1987.
3. David C. Knill, David Field, and Daniel Kersten, "Human discrimination of fractal images," J. Opt. Soc. Am. A, vol. 7, no. 6, pp. 1113-1123, 1990.
4. Arun Netravali and Barry Haskell, Digital Pictures: Representation and Compression, Plenum Press, New York, 1988.
5. JPEG, JPEG-10-R2: Working Draft for Development of JPEG CD, 18 May 1991.
6. Michael Barnsley, Fractals Everywhere, pp. 86-117, Academic Press, Boston, 1988.
7. Kou-Hu Tzou, "Progressive image transmission: a review and comparison of techniques," Optical Engineering, vol. 26, no. 7, pp. 581-589, July 1987.
8. Dietmar Saupe, "Algorithms for random fractals," in The Science of Fractal Images, Heinz-Otto Peitgen and Dietmar Saupe, Eds., pp. 71-136, Springer-Verlag, New York, 1988.
9. Murat Kunt, Athanassios Ikonomopoulos, and Michel Kocher, "Second-generation image-coding techniques," Proceedings of the IEEE, vol. 73, no. 4, pp. 549-574, April 1985.
10. J. Jang and S. A. Rajala, "Segmentation based image coding using fractals and the human visual system," Proc. ICASSP 90, pp. 1957-1960, 1990.