This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2014.2363743, IEEE Transactions on Circuits and Systems for Video Technology 1

Robust Histogram Shape Based Method for Image Watermarking

Tianrui Zong, Yong Xiang, Senior Member, IEEE, Iynkaran Natgunanathan, Song Guo, Senior Member, IEEE, Wanlei Zhou, Senior Member, IEEE, and Gleb Beliakov, Senior Member, IEEE

Abstract—Cropping and random bending are two common attacks in image watermarking. In this paper, we propose a novel image watermarking method to deal with these attacks, as well as other common attacks. In the embedding process, we first pre-process the host image by a Gaussian low-pass filter. Then a secret key is used to randomly select a number of gray levels and the histogram of the filtered image with respect to these selected gray levels is constructed. After that, a histogram-shape-related index is introduced to choose the pixel groups with the highest number of pixels and a safe band is built between the chosen and non-chosen pixel groups. A watermark embedding scheme is proposed to insert watermarks into the chosen pixel groups. The usage of the histogram-shape-related index and safe band results in good robustness. Moreover, a novel high-frequency component modification mechanism is also utilized in the embedding scheme to further improve robustness. At the decoding end, based on the available secret key, the watermarked pixel groups are identified and watermarks are extracted from them. The effectiveness of the proposed image watermarking method is demonstrated by simulation examples.

Index Terms—Image watermarking, histogram, cropping, random bending, Gaussian filtering.

EDICS: VTA-SV-02

I. INTRODUCTION

Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected]. Yong Xiang is the corresponding author. T. Zong, Y. Xiang, I. Natgunanathan, W. Zhou, and G. Beliakov are with the School of Information Technology, Deakin University, Melbourne Burwood campus, Burwood, VIC 3125, Australia (tel: +61 3 92517740, fax: +61 3 92446831, e-mail: {tzong, yxiang, iynkaran.natgunanathan, wanlei.zhou, gleb.beliakov}@deakin.edu.au). S. Guo is with the School of Computer Science and Engineering, The University of Aizu, Japan (tel: +81 242 37 2579, fax: +81 242 37 2548, e-mail: [email protected]). This work was supported in part by the Australian Research Council under grant DP110102076.

With the rapid development of modern communication networks, information is now transmitted at speeds never seen before. At the same time, illegally manipulated copies of digital media can be easily transmitted and distributed. As a result, copyright protection has become a major issue worldwide. Digital watermarking is a promising technique to tackle this problem. Whilst digital watermarking can be applied to audio [1], [2], image [3], [4] and video [5], [6], this paper focuses on image watermarking. A good image watermarking method should be imperceptible, robust and secure. Imperceptibility means that watermarks should be perceptually unobtrusive. Robustness indicates the ability

of correctly extracting watermarks after the watermarked image undergoes different kinds of attacks. There are two types of attacks: signal processing attacks (e.g., compression, filtering, and noise addition) and geometric attacks (e.g., scaling, rotation, shearing, cropping, and random bending). Security refers to the resistance to unauthorized watermark decoding without knowing the secret key.

Over the last decade, various image watermarking methods have been developed. Many of these methods are robust to common signal processing attacks but do not cope well with geometric attacks. For example, the methods in [3] and [4] are resistant to median filtering and JPEG compression but very sensitive to rotation. In order to tackle geometric attacks, various techniques have been utilized in image watermarking. These techniques can be broadly classified into non-blind and blind watermarking techniques. The non-blind watermarking techniques [7]-[10] need the host image for watermark extraction at the decoding end, so their practical usage is very limited. On the other hand, the blind watermarking techniques do not require the host image at the decoder, which makes them more suitable for real-world applications. Most of the blind watermarking techniques can be divided into exhaustive search [11], [12], transform domain [13]-[17], template [18]-[20], moment [23], [24] and feature [25]-[29] based techniques.

In [11] and [12], by assuming possible attack parameters, an exhaustive search is carried out at the decoding end to find the embedded watermarks in the received image. The exhaustive search based methods are computationally very expensive. Moreover, their false detection probability is high. In [13] and [14], the watermarks are embedded in a domain generated by the Fourier-Mellin transform, which is invariant to scaling and rotation. However, the Fourier-Mellin domain is inherently vulnerable to cropping.
The discrete wavelet transform and discrete Fourier transform are combined for image watermarking in [15]. Whilst the method in [15] is robust to affine attacks, it is vulnerable to random bending attacks (RBAs). Kang et al. develop a watermarking method to resist geometric attacks by using uniform log-polar mapping in [16], but it cannot tackle RBAs either. In [17], the discrete cosine transform and singular value decomposition are combined to deal with cropping attacks. However, this method does not perform well against signal processing attacks and some geometric attacks such as scaling and rotation. Some watermarking methods embed not only the watermarks but also a template as additional information [18]-[20].

1051-8215 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


At the decoding end, the template can provide information about the geometric attack which the watermarked image has undergone. The method proposed by Pereira et al. embeds a 25-point template in the discrete Fourier transform domain by using a log-polar map and a log-log map to resist rotation and scaling [18]. Unfortunately, this method is vulnerable to the combination of rotation and scaling. To resist general affine transformations, a line structure is applied to the template in [19]. In [20], the Chirp-Z transform is employed for more accurate and faster template matching. The main drawback of the template-based methods in [18]-[20] is that the template can be estimated by attackers and then removed. Besides, it is also difficult for the template-based schemes to estimate the parameters used in RBAs [21], which leads to a severe degradation in robustness [22]. Moreover, the template-based methods cannot guarantee the successful extraction of embedded watermarks under cropping attacks due to the permanent loss of image information.

In [23] and [24], the spatial statistical properties of an image are used to extract moment invariants, which are then employed for watermark embedding. This approach is invariant to affine attacks but highly vulnerable to cropping attacks. In [25]-[28], the watermarks are embedded into feature regions in the spatial domain, which can resist affine attacks and cropping attacks. A method is proposed in [29] to optimally select the feature regions so as to resist as many attacks as possible. However, the feature region detection processes in [25]-[29] depend on the relationship between neighboring pixels. This implies that the decoding process will be affected by pixel position shifting, which can be caused by RBAs. In summary, all of the aforementioned watermarking schemes are unable to cope with both cropping attacks and RBAs.

Cropping attacks and RBAs are two common geometric attacks [21].
By cutting out part of the image, a cropping attack can destroy the synchronization between the transmitter and the receiver without significantly degrading perceptual quality. An RBA can be realized by printing the watermarked image on a high-quality printer and then scanning it again with a high-quality scanner [30]. As a result, the watermarked image is randomly modified. This can displace the sampling grid locations, thereby causing the de-synchronization problem [21]. Moreover, with some image processing tools, cropping and random bending can easily be performed even by a person who does not have much knowledge about image watermarking. In [31], fractal coding is applied to withstand cropping and random bending, but this scheme is vulnerable to global transforms such as affine transforms.

Recently, it has been shown in [32]-[34] that the histogram shape of an image is invariant to pixel position shifting and insensitive to cropping, and thus has the potential to be exploited to tackle cropping attacks and RBAs. However, the methods in [32]-[34] do not explicitly employ the histogram shape of the image. Instead, they use the mean of the histogram to select gray levels in a certain range and use the pixels corresponding to these gray levels to form pixel groups for watermark embedding. A common drawback of these methods is that many potential pixel groups which could contain more pixels are not formed and utilized for

watermark embedding, which compromises their robustness. Moreover, signal processing attacks and interpolation errors caused by geometric attacks can also affect the performance of histogram-based methods. Since the method in [33] does not consider these impacts, it is sensitive to many attacks. In [32], a low-pass pre-filtering step is applied to tackle this problem, but the negative impact of pre-filtering on watermark decoding is not effectively handled. As a result, this method cannot guarantee a 100% detection rate even when attacks are absent. In [34], the histogram is combined with spatial feature regions, but this combination brings other issues. First, instability is induced by the insufficient robustness and accuracy of spatial feature regions, a problem which does not exist in methods using the histogram only. Second, the lack of adequate pixels in a single feature region makes the method vulnerable to attacks. Note that the histogram has also been used in reversible watermarking [35], [36]. However, attacks are usually not considered in reversible watermarking.

In this paper, the histogram shape of the image is explicitly exploited in watermark embedding and decoding. In the embedding phase, a Gaussian low-pass filter is first applied to the host image to extract the image feature robust to signal processing attacks. Then, we construct the histogram of the filtered image over a number of gray levels randomly selected by using a secret key. Instead of using the histogram mean as in [32]-[34], we define a histogram-shape-related index to select the pixel groups with the highest number of pixels and propose to build a safe band between the selected and non-selected pixel groups, which is beneficial to robustness. A watermark bit is inserted into a selected group by moving some pixels to certain gray levels within the pixel group. Moreover, during watermark embedding, a novel high-frequency component modification (HFCM) scheme is implemented to compensate for the side effect of Gaussian filtering, which further enhances robustness. In the decoding phase, by using the secret key, one can find the watermarked pixel groups in the received image and then extract the watermarks from them.

Compared to other histogram-based watermarking methods such as those in [32]-[34], the major contributions of this paper are: a new histogram-shape-related index for pixel group selection assisted by a secret key, a new pixel transfer approach to reduce the extent of pixel movements, a safe band to create an error buffer between the selected and non-selected pixel groups, and an HFCM scheme to compensate for the side effect of Gaussian filtering. The performance of the proposed image watermarking method is illustrated by simulation examples and compared with recent works.

The remainder of this paper is organized as follows. Section II and Section III illustrate the proposed image watermarking method. Simulation results are provided in Section IV to show the superior performance of our method in terms of perceptual quality and robustness against common attacks. The selection of the parameters of the proposed method is also discussed in that section. Finally, Section V concludes the paper.


II. WATERMARK EMBEDDING PROCESS

The watermark embedding process of the proposed method consists of four steps: Gaussian filtering, histogram construction, pixel group selection, and HFCM-based watermark embedding.

Fig. 1. Block diagram of the watermark embedding process (Gaussian filtering → histogram construction → pixel group selection → HFCM-based watermark embedding; inputs: host image I, secret key, and watermark W; parameters T_G, α, and β; output: watermarked image I_W).

A. Gaussian filtering

It is known [32], [37] that robustness to common signal processing attacks can be achieved by embedding watermarks into the low-frequency component of an image. Thus, we first pre-process the host gray-scale image I by a 2-D Gaussian low-pass filter

F(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (1)

where (x, y) denotes the position of the pixel and \sigma is the standard deviation of the distribution, which is usually chosen as \sigma = 1 [32]. The filtered image I_{low} can be expressed as

I_{low}(x, y) = F(x, y, \sigma) * I(x, y) \qquad (2)

where * denotes the convolution operator. If I_{high} stands for the high-frequency component removed from the host image I by the Gaussian filtering, it follows that

I_{high}(x, y) = I(x, y) - I_{low}(x, y). \qquad (3)

In practice, the size of the Gaussian mask F is often chosen as (2k\sigma + 1) \times (2k\sigma + 1), where k is a positive integer. Since 99.7% of the energy of the Gaussian distribution is concentrated within 3 standard deviations of the mean, we set k = 3. Thus, the size of the Gaussian mask F used in this paper is 7 \times 7.

B. Histogram construction

Assume that the filtered image I_{low} has K gray levels; e.g., an 8-bit gray-scale image has K = 256 gray levels, ranging from 0 to 255. The histogram of an image records the number of pixels at each gray level value. Clearly, the shape of the histogram is related to the image content. In order to introduce security into our method, a pseudonoise (PN) sequence p(n) = [p(1), p(2), \ldots, p(K)] of length K is used as a security key to randomly select S gray levels from the K available gray levels, where

K/2 \le S < K. \qquad (4)

Each element of p(n) randomly takes an integer from the range [0, K-1], where p(i) \ne p(j) if i \ne j. Denoting the S selected gray levels by K_1, K_2, \ldots, K_S, the ith selected gray level K_i is given by K_i = p(i). Then, we construct the histogram of the image I_{low} by

H_S = \{h_S(K_i) \mid i = 1, 2, \ldots, S\} \qquad (5)

where h_S(K_i) is the number of pixels corresponding to gray level K_i.

C. Pixel group selection

After constructing the histogram H_S, we take each L_B neighboring gray levels in H_S to form a bin. In total, one can form

M_B = \lfloor S / L_B \rfloor \qquad (6)

bins, where \lfloor \cdot \rfloor is the floor function. It is easy to see that the number of pixels in the ith bin is

h_B(i) = h_S(K_{(i-1) L_B + 1}) + h_S(K_{(i-1) L_B + 2}) + \cdots + h_S(K_{i L_B}) \qquad (7)

where i = 1, 2, \ldots, M_B. Further, we take each two neighboring bins to form a group, which yields \lfloor M_B / 2 \rfloor groups. For the ith group, the two bins (called Bin 1 and Bin 2) contain h_B(2i-1) and h_B(2i) pixels, respectively, where i = 1, 2, \ldots, \lfloor M_B / 2 \rfloor. Thus, the number of pixels in the ith group is

h_G(i) = h_B(2i-1) + h_B(2i), \quad i = 1, 2, \ldots, \lfloor M_B / 2 \rfloor. \qquad (8)

Next, we select the pixel groups that are suitable for hiding watermarks. Let N_S be the total number of pixels corresponding to the S selected gray levels, i.e.,

N_S = \sum_{i=1}^{S} h_S(K_i). \qquad (9)

To make the selection of pixel groups adaptive to the histogram shape, we propose a pixel group selection criterion based on the ratio between h_G(i) and N_S, i.e., based on

g(i) = \frac{h_G(i)}{N_S} \qquad (10)

where i = 1, 2, \ldots, \lfloor M_B / 2 \rfloor. As shown in Appendix A, g(i) in (10) is insensitive to common geometric attacks, including cropping attacks and RBAs, which is essential to tackling geometric attacks. If g(i) is greater than a predetermined threshold T_G, the ith pixel group is considered suitable for watermark embedding. The threshold T_G balances robustness and embedding rate: the larger T_G, the higher the robustness, due to more pixels in each chosen group, but the lower the embedding rate, as fewer pixel groups will be chosen for watermark embedding. In this paper, T_G is empirically chosen as

T_G = \frac{\text{Total number of image pixels}}{4 L_B}. \qquad (11)
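Subsections II-B and II-C can be condensed into a short sketch. The following NumPy code is illustrative only: the function name and parameter defaults (S = 246 and L_B = 3, values used in the paper's examples) are assumptions, and since T_G in (11) is a pixel count, the code compares it against the group pixel count h_G(i) = g(i)·N_S rather than against the ratio g(i) itself.

```python
import numpy as np

def select_groups(img, key, S=246, K=256, LB=3):
    """Sketch of histogram construction (Sec. II-B) and pixel group
    selection (Sec. II-C); `key` seeds the PN sequence p(n)."""
    rng = np.random.default_rng(key)
    # PN sequence: a random permutation of [0, K-1] with distinct entries;
    # its first S entries are the selected gray levels K_1, ..., K_S.
    p = rng.permutation(K)
    selected = p[:S]
    # Histogram H_S restricted to the selected gray levels, eq. (5).
    full_hist = np.bincount(img.ravel(), minlength=K)
    hS = full_hist[selected]
    # Bins of LB neighboring selected gray levels, eqs. (6)-(7).
    MB = S // LB
    hB = hS[:MB * LB].reshape(MB, LB).sum(axis=1)
    # Groups of two neighboring bins, eq. (8).
    MG = MB // 2
    hG = hB[:MG * 2].reshape(MG, 2).sum(axis=1)
    # Histogram-shape-related index g(i), eq. (10), and threshold, eq. (11).
    NS = hS.sum()
    g = hG / NS
    TG = img.size / (4 * LB)
    # T_G is a pixel count, so compare it with h_G(i) = g(i) * N_S.
    chosen = np.where(hG > TG)[0]
    return selected, hB, hG, g, chosen
```

For a flat 64 × 64 test image the threshold 4096/12 ≈ 341 exceeds the typical group count, so few or no groups qualify; on natural images, groups covering the histogram peaks are the ones selected.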


We assume without loss of generality that L_G pixel groups are suitable for embedding watermarks, where

L_G \le \lfloor M_B / 2 \rfloor. \qquad (12)

Denote the number of watermark bits to be embedded by L_W. If L_W = L_G, all of the L_G pixel groups will be chosen to embed watermarks. If L_W < L_G, the L_W pixel groups with the highest g(i) values will be chosen to embed the L_W watermark bits. In the watermark decoding process, the g(i) values will be estimated from the post-attack watermarked image and then used to identify the L_W pixel groups containing watermarks.

Although g(i) in (10) is insensitive to common geometric attacks, it can still change to some extent after attacks, which may lead to errors in identifying the watermarked pixel groups. To minimize the chance of this happening, we introduce a safe band for g(i) between the L_W chosen pixel groups and the non-chosen pixel groups in the embedding process. Let g_{min} be the minimum g(i) value among the L_W chosen pixel groups. We set the width of the safe band to \alpha \cdot g_{min}, where \alpha is a factor which controls the strength of the safe band. The safe band is implemented as follows. For each non-chosen group, if the corresponding g(i) satisfies

(1 - \alpha) \cdot g_{min} < g(i) < g_{min} \qquad (13)

we randomly pick [g(i) - (1 - \alpha) \cdot g_{min}] \cdot N_S pixels in this non-chosen group. Then we change the gray level values of these pixels such that they fall into the nearest chosen group. After this modification, the g(i) value of the corresponding non-chosen group reduces to no more than g_{min} - \alpha \cdot g_{min}. This forms the safe band of width \alpha \cdot g_{min}. At the watermark decoding end, this group-level safe band significantly diminishes the chance of classifying non-watermarked pixel groups as watermarked pixel groups. We note that while increasing \alpha improves robustness, it degrades perceptual quality, as a larger \alpha generally results in more pixel modifications. We will discuss how to empirically choose a suitable \alpha value in Section IV-A.

D. HFCM-based watermark embedding

1) Watermark embedding via pixel transfer between bins: Assume that w_1, w_2, \ldots, w_{L_W} are the watermark bits to be embedded into the L_W chosen groups, respectively. Recall that each chosen group, say the ith group, is composed of two bins: Bin 1 containing h_B(2i-1) pixels and Bin 2 containing h_B(2i) pixels. Then, one can embed the watermark bit w_i into the ith chosen group by using the following embedding rule:

\frac{h_B(2i-1)}{h_B(2i)} \ge 2 \ \text{if } w_i = 1; \qquad \frac{h_B(2i-1)}{h_B(2i)} \le \frac{1}{2} \ \text{if } w_i = 0. \qquad (14)

The above embedding rule implies that if w_i = 1, a certain number of pixels (say N_1) should be transferred from Bin 2 to Bin 1 such that h_B(2i-1) \ge 2 h_B(2i). Similarly, if w_i = 0, some pixels (say N_0) need to be transferred from Bin 1 to Bin 2 such that h_B(2i-1) \le \frac{1}{2} h_B(2i). The minimum N_0 and N_1 are

N_0 = \frac{2 h_B(2i-1) - h_B(2i)}{3}, \qquad N_1 = \frac{2 h_B(2i) - h_B(2i-1)}{3}. \qquad (15)

Clearly, the way of transferring pixels has a direct impact on perceptual quality. In order to reduce the perceptual quality degradation, the amount by which pixel values change during the pixel transfer should be small. Without loss of generality, assume that we wish to embed watermark bit "0" into the ith group. This requires us to transfer no fewer than N_0 pixels from Bin 1 into Bin 2 within this group. Referring to (7), the L_B gray levels in Bin 1 are K_{(i-1)L_B+1}, K_{(i-1)L_B+2}, \ldots, K_{iL_B} and those in Bin 2 are K_{iL_B+1}, K_{iL_B+2}, \ldots, K_{(i+1)L_B}. Denote the numbers of pixels at these gray levels by N_{K_{(i-1)L_B+1}}, N_{K_{(i-1)L_B+2}}, \ldots, N_{K_{(i+1)L_B}}, respectively.

For the methods in [32] and [34], the N_0 pixels are randomly selected from the gray levels K_{(i-1)L_B+1}, \ldots, K_{iL_B} in Bin 1 and then moved to the gray levels K_{iL_B+1}, \ldots, K_{(i+1)L_B} in Bin 2, respectively. That is, all selected pixels jump at least L_B gray levels to reach their new destinations, which means considerable changes to the gray level values of all selected pixels. This has a negative impact on perceptual quality. Here, we propose a new pixel transfer approach to reduce the extent of pixel movements. Our approach considers the following two cases.

• Case 1: If N_{K_{iL_B+1}} \ge N_0, move N_0 pixels from gray level K_{iL_B+1} to gray level K_{iL_B+2} within Bin 2. Then select N_0 pixels from Bin 1 as follows:
  - If N_{K_{iL_B}} \ge N_0, choose all N_0 pixels from gray level K_{iL_B}.
  - If N_{K_{iL_B}} < N_0, choose N_{K_{iL_B}} pixels from gray level K_{iL_B} and the remaining N_0 - N_{K_{iL_B}} pixels from the other gray levels in Bin 1.
  The pixels selected in Bin 1 are moved to gray level K_{iL_B+1} in Bin 2.
• Case 2: If N_{K_{iL_B+1}} < N_0, move all N_{K_{iL_B+1}} pixels from gray level K_{iL_B+1} to gray level K_{iL_B+2}. Then select N_0 pixels from the L_B gray levels in Bin 1 in the way described in Case 1. Among these selected pixels, move the first N_{K_{iL_B+1}} pixels to gray level K_{iL_B+1} and the remaining N_0 - N_{K_{iL_B+1}} pixels to gray level K_{iL_B+2}.

Fig. 2 shows an example illustrating how to transfer pixels from Bin 1 to Bin 2 in Case 1. Here, for the sake of illustrative simplicity, we assume i = 1 and L_B = 3. In this scenario, the gray levels in Bin 1 are K_1, K_2, K_3 and the gray levels in Bin 2 are K_4, K_5, K_6. One can see that in our approach, many selected pixels are only shifted to the next gray level. This means that the modifications on these pixels are minimal, which is beneficial to perceptual quality. It should be noted that although the above example only shows how to embed watermark bit "0" into group 1, the approach can be easily extended to embed any watermark bit into any selected pixel group.
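The embedding rule (14) and the minimum transfer counts (15) can be sketched as follows. This is a simplified illustration (the function name is hypothetical): it only tracks the bin counts before and after embedding, omitting the Case 1/Case 2 gray-level placement of the moved pixels.

```python
import math

def embed_bit_counts(h1, h2, bit):
    """Apply the embedding rule (14) to one chosen group with Bin 1 /
    Bin 2 pixel counts h1 = h_B(2i-1), h2 = h_B(2i).
    Returns (pixels moved, new h1, new h2)."""
    if bit == 1:
        # Enforce h1/h2 >= 2 by moving N1 = ceil((2*h2 - h1)/3) pixels
        # from Bin 2 to Bin 1 (no move if the ratio already holds).
        n = max(0, math.ceil((2 * h2 - h1) / 3))
        return n, h1 + n, h2 - n
    # Enforce h1/h2 <= 1/2 by moving N0 = ceil((2*h1 - h2)/3) pixels
    # from Bin 1 to Bin 2.
    n = max(0, math.ceil((2 * h1 - h2) / 3))
    return n, h1 - n, h2 + n
```

For example, with h1 = 100 and h2 = 50, embedding "0" moves N_0 = 50 pixels, yielding counts 50 and 100, which satisfies (14) with equality; this matches the minimum N_0 in (15).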

Fig. 2. Illustration of transferring pixels from Bin 1 to Bin 2, where i = 1, L_B = 3, the horizontal axis denotes gray level, and the vertical axis denotes the number of pixels. (a) Original state. (b) Transfer pixels from gray level K_4 to gray level K_5. (c) Transfer pixels from gray levels K_1, K_2, K_3 to gray level K_4. (d) Final state.

2) Performance enhancement via HFCM: We first discuss the necessity of HFCM. Denoting the watermarked low-frequency component without using HFCM by \tilde{I}^W_{low}, the corresponding watermarked image can be obtained as \tilde{I}^W = \tilde{I}^W_{low} + I_{high}, where I_{high} is the high-frequency component of the host image I. Let \tilde{I}' be the received image at the watermark decoding stage. Clearly, in the absence of attacks, we have \tilde{I}' = \tilde{I}^W. Since watermarks are hidden in the low-frequency component of \tilde{I}', the 7 \times 7 Gaussian mask F used at the watermark embedding stage is applied to the received image \tilde{I}' to extract its low-frequency component \tilde{I}'_{low}. Thus we obtain

\tilde{I}'_{low} = F * \tilde{I}' = F * \tilde{I}^W = F * (\tilde{I}^W_{low} + I_{high}). \qquad (16)

To correctly extract the embedded watermarks from \tilde{I}'_{low}, the ideal case should be \tilde{I}'_{low} = \tilde{I}^W_{low}. Otherwise, detection errors may occur even if attacks are absent. Unfortunately, from (16), it is obvious that \tilde{I}'_{low} is generally quite different from \tilde{I}^W_{low}, which is a side effect of Gaussian filtering. To correctly extract the embedded watermarks, I_{high} needs to be properly modified to compensate for the side effect of Gaussian filtering such that the watermarked low-frequency component at the embedding end equals its counterpart at the decoding end. Specifically, after a pixel in I_{low} is altered due to watermark embedding, the corresponding pixel in I_{high} will be modified by the HFCM procedure presented below.

Prior to modifying I_{high}, it is important to find the relationship between I_{high} and I_{low}. For the 7 \times 7 Gaussian mask F introduced in Subsection II-A, its (i, j)th element is F(i, j, 1). To save space, we relabel F(i, j, 1) as F_{ij}, as shown in Fig. 3. Assume that the Gaussian mask F is applied to the host image I, centered at pixel I_{44} (see Fig. 4). Then the low-frequency component I_{44\,low} of I_{44} can be calculated as

I_{44\,low} = \sum_{i=1}^{7} \sum_{j=1}^{7} I_{ij} F_{ij} \qquad (17)

which can be further written as

I_{44\,low} = Q_{44} + I_{44} F_{44} \qquad (18)

where

Q_{44} = \sum_{i=1}^{7} \sum_{j=1}^{7} I_{ij} F_{ij} - I_{44} F_{44}. \qquad (19)

Fig. 3. Illustration of a 7 \times 7 Gaussian mask with (i, j)th element F_{ij}, i, j = 1, \ldots, 7.

Fig. 4. Illustration of Gaussian filtering: the 7 \times 7 Gaussian mask centered at pixel I_{44}, and at pixel I_{54} after a one-pixel shift.

As mentioned before, watermarks are embedded into I_{low} by modifying the gray level values of some of its pixels, and only one pixel is modified each time. Assuming that the pixel to be modified is I_{44} and the other pixels do not change in the meantime, Q_{44} will be a constant. From (18), it follows that

I_{44} = \frac{I_{44\,low} - Q_{44}}{F_{44}}. \qquad (20)

Since I_{44} = I_{44\,low} + I_{44\,high}, it follows from (20) that the high-frequency component I_{44\,high} can be expressed as

I_{44\,high} = I_{44} - I_{44\,low} = \frac{I_{44\,low}(1 - F_{44}) - Q_{44}}{F_{44}}. \qquad (21)

In (21) we derived the relationship between I_{high} and I_{low}. By using this relationship, we can calculate the desired high-frequency component from the watermarked low-frequency component. Given that I_{44\,low} is modified by an amount \xi due to watermark embedding, the watermarked low-frequency component I^W_{44\,low} can be expressed as

I^W_{44\,low} = I_{44\,low} + \xi. \qquad (22)
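The relationships (17)-(21) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it samples the \sigma = 1 Gaussian of (1) on a 7 \times 7 grid (unnormalized, exactly as written in (1)) and verifies that (20) recovers the central pixel value and that (21) is consistent with I = I_{low} + I_{high} for that pixel.

```python
import numpy as np

# Assumed 7x7 Gaussian mask F(i, j, 1) sampled from eq. (1), sigma = 1.
k = np.arange(-3, 4)
g1 = np.exp(-k**2 / 2.0)
F = np.outer(g1, g1) / (2 * np.pi)

# Random 7x7 image patch; its center plays the role of I_44.
rng = np.random.default_rng(0)
I = rng.integers(0, 256, size=(7, 7)).astype(float)

I44_low = np.sum(I * F)                               # eq. (17)
Q44 = I44_low - I[3, 3] * F[3, 3]                     # eq. (19)
I44 = (I44_low - Q44) / F[3, 3]                       # eq. (20)
I44_high = (I44_low * (1 - F[3, 3]) - Q44) / F[3, 3]  # eq. (21)

assert np.isclose(I44, I[3, 3])                 # (20) recovers the pixel
assert np.isclose(I44_high, I[3, 3] - I44_low)  # consistent with (3)
```

The identities hold for any mask normalization, since (20) and (21) only rearrange (18).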

Now, we show how to modify I44 high according to the change of I44 low . Recall that the received image I 0 is the same as the watermarked image I W . Then, for the received image I 0 , it follows from (21) that 0 I44 low (1

− F44 ) − Q44 . (23) F44 Also, to cancel the side effect of Gaussian filtering, we should W 0 ensure I44 low = I44 low . Thus, from (22), it leads to 0 I44 high

=

0 I44 low = I44 low + ξ.

(24)

Substituting (24) into (23), we obtain the desired high0 frequency component I44 high as follows 0 I44 high

=

(I44 low + ξ)(1 − F44 ) − Q44 . F44

(25)

0 W As described before, in order to make I44 low = I44 low , we 0 have to modify I44 high to its desired counterpart I44 high . Based on (21) and (25), the modification amount is 0 = I44 high − I44 high (1 − F44 )ξ = . (26) F44 Next, we show that changing I44 will also affect the lowfrequency components of neighboring pixels at the receiver end. Since a Gaussian mask is concentrated on the central part and degrades significantly towards the edges, we only consider the neighboring pixels I33 , I34 , I35 , I43 , I45 , I53 , I54 and I55 . Furthermore, because of the circular and symmetrical property of Gaussian distribution, the impact on I34 , I43 , I45 and I54 will be the same. Similarly, the impact on I33 , I35 , I53 and I55 will also be the same. So, in the following, we only analyze the case of I54 and I55 . For I54 , similar to (17), we can write I54 low as

∆I44 high

I54 low =

7 X 7 X

Ii(j+1) Fij .

(27)

i=1 j=1

Clearly, one term on the right-hand side of the above equation is related to I44 . If I44 is altered, I54 low will be affected. Denoting Q34 =

7 X 7 X

Ii(j+1) Fij − I44 F34

(28)

From (24) and (30), it yields 0 I54 low = Q34 +

(I44 low + ξ − Q44 ) F34 . F44

(31)

Based on (29) and (31), we have ∆I54 low

0 = I54 low − I54 low F34 ξ . = F44

(32)

Referring to (26), we can further develop (32) into ∆I54 low = F34 ξ + F34 ∆I44 high

(33)

which is the amount of variation of I54 low caused by modifying I44 and Gaussian filtering. Similarly, the variation amount of I55 low can be expressed as ∆I55 low = F33 ξ + F33 ∆I44 high .

(34)

The above HFCM procedure can be applied to other pixels in the same way. Note that changing the high-frequency component of I44 will affect the neighboring pixels and degrade perceptual quality. So, in practice, we introduce a parameter β to control the strength of HFCM, i.e., replace ∆I44 high with β · ∆I44 high . The selection of β will be discussed in Section IV-A. It is also worth noting that the embedding capacity of a watermarking method using histogram depends on the number of gray levels employed in a bin. The fewer gray levels used in a bin, the higher embedding capacity, and vice versa. For the proposed method, since the HFCM step does not impose any restriction on the bin size, the minimum number of gray levels in each bin is 1. In contrast, to tackle the problem caused by pre-filtering, the methods in [32] and [34] require 3 gray levels in each bin. For the method in [33], it needs at least 2 gray levels in each bin. Hence, the proposed watermarking method can have higher embedding capacity than other histogrambased watermarking methods [32]-[34]. Remark 1: Although the standard invertible transform domains, such as the widely used discrete wavelet transform, are applicable to embedding watermarks, most of them are not invariant to geometric attacks [15], [39]. While certain transform domains such as Fourier-Mellin transform and logpolar domain are invariant to affine transforms, they are not robust to other geometric attacks considered in the paper. On the contrary, Gaussian filter is rotation invariant due to its isotropic and circular shape [40]. This property is beneficial for obtaining geometric invariant feature points [37].


it follows from (20), (27) and (28) that

I54^low = Q34 + (I44^low − Q44) F34 / F44.  (29)

At the decoding end, according to (29), the low-frequency component of the pixel I′54, denoted as I′54^low, can be expressed as

I′54^low = Q34 + (I′44^low − Q44) F34 / F44.  (30)

III. WATERMARK DECODING PROCESS

The watermark decoding process is shown in Fig. 5, which involves Gaussian filtering, histogram construction, identification of watermarked groups, and watermark extraction.
• Gaussian filtering: Since the Gaussian low-pass filter is not invariant to scaling attacks, the dimension of the 7 × 7 Gaussian mask used in the embedding process may no longer be appropriate for the received image I′ after scaling attacks. Hence, as in the method of [32], a σ-based matching scheme is employed to determine the size of the Gaussian

1051-8215 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2014.2363743, IEEE Transactions on Circuits and Systems for Video Technology 7

Fig. 5. Block diagram of watermark decoding process.

mask to be used in the decoding process. The obtained Gaussian mask is applied to I′ to extract the corresponding low-frequency component I′_low.
• Histogram construction: Based on the security key, find the S gray levels K1, K2, . . . , KS used for watermark embedding, from the K possible gray levels. Then construct the histogram of I′_low, denoted as H′_S, by mimicking (5).
• Identification of watermarked groups: Take each LB neighboring gray levels in H′_S to form a bin and take each two neighboring bins as a group. Compute the number of pixels in each bin, h′_B(i), and that in each group, h′_G(i), by referring to (7) and (8), respectively. After that, calculate N′_S and then g′(i) by imitating (9) and (10), respectively. The LW groups with the highest g′(i) values are identified as watermarked groups.
• Watermark extraction: For the ith watermarked group, h′_B(2i − 1) and h′_B(2i) denote the number of pixels in the first bin and that in the second bin, respectively. If h′_B(2i − 1)/h′_B(2i) ≥ 1, the extracted watermark bit is "1"; otherwise, watermark bit "0" is extracted.

Remark 2: In the proposed method, without knowing the secret key at the decoding end, an offender needs to search C_K^S times to extract watermarks. For example, if K = 256 (i.e., the image is an 8-bit gray scale image) and S is set to 246, then C_K^S = C_256^246 ≈ 2.7883 × 10^17. Lowering the value of S can further enhance security. In contrast, estimating the watermark sequences in [32] and [34] without authorization is easy, and the method in [33] does not use any security measure. So our method is much more secure than the methods in [32]-[34].

Remark 3: Since the proposed watermarking method exploits the histogram shape of an image, it does not perform well against attacks which can significantly change the histogram shape, such as brightness, contrast or gamma modification. This is a common issue for all histogram-based watermarking methods and the majority of other watermarking methods. While the watermarking methods in [46] and [47] are designed to resist this type of attack, they are vulnerable to geometric attacks. So, the proposed watermarking method is particularly suitable for applications where geometric attacks are of primary concern.

IV. SIMULATION RESULTS

In this section, we evaluate the perceptual quality and robustness of the proposed method by simulations, in comparison with the methods in [32]-[34]. Ten standard 8-bit gray scale images Barbara, Calculator, Conch, Desk, Elaine, Lena, Office, Peppers, Pirate and Venice are used as host images in the simulations, which are shown in the top two rows of Fig. 6. For each image, two different embedding rates are used: i) the number of watermark bits is LW = 25; ii) the number of watermark bits is LW = LG, where LG is the maximum number of watermark bits that can be embedded into the

image. The LG values of these images are all greater than 25 (i.e., LG > 25) and the average LG value is 30.

Perceptual quality is measured by the peak signal-to-noise ratio (PSNR) index and the structural similarity (SSIM) index [41], [42]; the larger the PSNR or SSIM value, the better the perceptual quality. Robustness is measured by the bit error rate (BER) index, where a smaller BER indicates better robustness. The robustness of our method and the methods in [32]-[34] is compared in the absence of attacks and under various signal processing attacks (including median filtering, JPEG, Gaussian noise, and salt & pepper noise) and geometric attacks (including rotation, scaling, shearing, cropping, and random bending). These attacks are generated by programs written in MATLAB [43]-[45] and their effects on images are illustrated in Fig. 7. The performance indices PSNR, SSIM and BER are calculated by averaging the results obtained from the ten images.

A. Parameters of the proposed method

For the proposed method, we set S = 246 and LB = 3. Regarding the parameters α and β, we empirically investigate their influence on perceptual quality and robustness using the aforementioned host images, which facilitates the selection of α and β.

1) Selection of α: As mentioned in Section II-C, α controls the strength of the safe band and consequently the robustness. A larger α improves the robustness. On the other hand, increasing α results in more pixel modifications and thus has a more negative impact on the perceptual quality. Fig. 8(a) shows how α affects the perceptual quality of the proposed method with LW = 25 and β = 0. Fig. 8(b) shows the influence of α on the robustness of our method under various attacks, including closed-loop, median filtering 5 × 5, JPEG 30%, Gaussian noise with variance 10, salt & pepper noise with variance 10, rotation 25°, scaling 80%, shearing 10%, cropping 10%, global bending with factor 10, high-frequency bending with factor 1 and jittering 1%. We can see that when α is greater than 0.15, further increasing α yields only a small improvement in robustness but causes considerable perceptual quality degradation. Hence, in order to achieve high robustness while maintaining good perceptual quality, we empirically choose α = 0.15.

2) Selection of β: Fig. 9 shows the impact of β on perceptual quality and robustness, where LW = LG and α = 0.15. As shown in Fig. 9(a), increasing β makes both PSNR and SSIM decrease. This is not surprising, as a larger β means stronger HFCM and, equivalently, more changes to the high-frequency components of pixels, which degrades perceptual quality. On the other hand, we can see from Fig. 9(b) that when β ≤ 0.2, increasing β reduces BER, and the reduction is more significant when an attack (e.g., the Gaussian noise attack in this case) is present. However, after β exceeds 0.2, further increasing β returns a larger BER.
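The PSNR and BER indices used throughout these simulations follow their standard definitions; a minimal sketch is given below (SSIM is omitted, as it requires the full windowed computation of [42]).

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means better perceptual quality.
    mse = np.mean((np.asarray(original, float) - np.asarray(distorted, float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ber(embedded, extracted):
    # Bit error rate: fraction of watermark bits decoded incorrectly; lower is better.
    embedded, extracted = np.asarray(embedded), np.asarray(extracted)
    return float(np.mean(embedded != extracted))
```

For example, `ber([1, 0, 1, 1], [1, 0, 0, 1])` gives 0.25, i.e., one wrong bit out of four.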


Fig. 6. Top two rows: original images Barbara, Calculator, Conch, Desk, Elaine, Lena, Office, Peppers, Pirate and Venice. Bottom two rows: watermarked counterparts of these images, where LW = LG and β = 0.2.

This interesting phenomenon can be explained as follows. Assume that HFCM is applied to the (i, j)th pixel of the host image and changes the value of this pixel in the embedding process. When Gaussian filtering is performed on the received image in the decoding process, it could alter the values of the pixels surrounding the (i, j)th pixel. The larger the β value, the more likely the values of these pixels will be changed. When this happens, BER starts to increase. Therefore, to balance perceptual quality and robustness, β can be empirically chosen from the range [0, 0.2]. In the remaining simulations, we consider three β values: 0, 0.1 and 0.2.

B. Perceptual Quality

Table I shows the PSNRs and SSIMs of the proposed watermarking method under different β and LW. As expected, raising β degrades perceptual quality. Moreover, as LW increases, perceptual quality also slightly decreases. The reason is that embedding more watermark bits requires more pixel groups, meaning more pixels are modified, which reduces perceptual quality.

It is mentioned in [10] that a PSNR of 40 dB indicates good perceptual quality, whilst [41] uses the SSIM value of

0.9. Since the PSNRs (resp. SSIMs) obtained by our method are well above 40 dB (resp. 0.9), the watermarked images are of good quality. For example, the bottom two rows of Fig. 6 show the watermarked counterparts of the images Barbara, Calculator, Conch, Desk, Elaine, Lena, Office, Peppers, Pirate and Venice produced by our method, where LW = LG and β = 0.2 (which yield a PSNR of 45.13 dB and an SSIM of 0.947, as shown in Table I). Clearly, there is no visual difference between the original images and their watermarked versions.

C. Robustness

Table II shows the BERs of the four watermarking methods against common attacks, where each image is embedded with LW = 25 watermark bits. The compared methods are configured to have very close perceptual quality, with our method slightly higher than the others. For example, in the case of β = 0.2, the PSNR and SSIM indices of the proposed method are (PSNR = 45.54 dB, SSIM = 0.952), while those of the methods in [32], [33] and [34] are (PSNR = 44.26 dB, SSIM = 0.932), (PSNR = 44.59 dB, SSIM = 0.938) and (PSNR = 43.75 dB, SSIM = 0.937), respectively. This ensures that the comparison is fair. From Table II,


Fig. 7. Illustration of attacks. (a) Original; (b) Median filter; (c) JPEG; (d) Gaussian noise; (e) Salt & pepper noise; (f) Rotation and scaling back; (g) Scaling; (h) Shearing; (i) Cropping; (j) Global bending with factor 10; (k) HF bending (factor 1); (l) Random jittering.
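The attacks illustrated in Fig. 7 were generated with MATLAB programs [43]-[45]; an equivalent sketch of a few of them in Python is given below, assuming NumPy and SciPy are available and using a random stand-in image rather than the actual test set.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 128)).astype(np.float64)  # stand-in host image

# Rotation by 25 degrees (cf. panel (f)); reshape=False keeps the original size.
rotated = ndimage.rotate(img, 25, reshape=False, mode='nearest')

# Scaling to 80% (cf. panel (g)), with spline interpolation.
scaled = ndimage.zoom(img, 0.8)

# Cropping: discard the outer 5% of rows and columns on each side (cf. panel (i)).
r = int(img.shape[0] * 0.05)
cropped = img[r:img.shape[0] - r, r:img.shape[1] - r]

# Additive Gaussian noise with variance 10 (cf. panel (d)), clipped to 8-bit range.
noisy = np.clip(img + rng.normal(0, np.sqrt(10), img.shape), 0, 255)
```

Each attacked image can then be fed to the decoder to measure the BER, as done for Tables II and III.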


Fig. 8. (a) PSNR and SSIM versus α, where LW = 25 and β = 0; (b) BER versus α under common attacks.

TABLE I
PSNRs AND SSIMs OF THE PROPOSED METHOD VERSUS β, WHERE LW = 25 AND LW = LG, RESPECTIVELY.

Perceptual quality   LW = 25: β = 0   β = 0.1   β = 0.2   LW = LG: β = 0   β = 0.1   β = 0.2
PSNR (dB)            51.83            47.04     45.54     51.14            46.64     45.13
SSIM                 0.982            0.962     0.952     0.981            0.958     0.947

we can see that the proposed method is robust against common attacks, even in the case of β = 0 (i.e., when HFCM is not activated). This good performance results from several factors. Firstly, the pixel groups employed to embed watermarks in our method contain more pixels than those in [32]-[34] do, thanks to the pixel group selection criterion based on histogram shape. Secondly, the new method builds a safe band between the pixel groups selected for watermark embedding and the non-selected pixel groups, which the methods in [32]-[34] do not have. Thirdly, the proposed method uses the unique HFCM scheme.

Table III shows the BERs of the four methods when the number of watermark bits embedded into an image is increased to LW = LG. Again, all four methods have almost the same

perceptual quality, with the perceptual quality of the proposed method slightly higher than that of the other methods. From Table III, it can be seen that when HFCM is enabled, the proposed method achieves satisfactory robustness under all attacks. In contrast, the methods in [32]-[34] do not perform very well under most attacks.

V. CONCLUSION

In this paper, we propose a new image watermarking method that is robust to common attacks. To deal with signal processing attacks, a Gaussian low-pass filter is employed to pre-process the host image such that watermarks will only be embedded into the low-frequency component of the host


TABLE II
BERs OF THE PROPOSED METHOD AND THE METHODS IN [32]-[34] UNDER COMMON ATTACKS, WHERE LW = 25.

Attacks               Proposed, β = 0   β = 0.1   β = 0.2   Method in [32]   Method in [33]   Method in [34]
Closed-loop           0.006             0.004     0         0.050            0                0.074
Median filtering      0.056             0.035     0.016     0.091            0                0.118
JPEG 10%              0.201             0.146     0.112     0.212            0.490            0.245
JPEG 30%              0.007             0.005     0         0.061            0.485            0.199
Gaussian noise        0.027             0.008     0.001     0.071            0.473            0.146
Salt & pepper noise   0.006             0.004     0         0.050            0                0.074
Rotation 10°          0.025             0.014     0.003     0.075            0.267            0.228
Rotation 25°          0.033             0.020     0.005     0.083            0.307            0.230
Scaling 80%           0.028             0.015     0.007     0.073            0.283            0.234
Scaling 120%          0.027             0.011     0         0.060            0.240            0.204
Shearing 10%          0.024             0.020     0.004     0.064            0.039            0.206
Cropping 10%          0.006             0.005     0.004     0.060            0                0.053
Cropping 20%          0.009             0.006     0.004     0.138            0                0.073
Cropping 30%          0.019             0.011     0.006     0.142            0                0.076
Global bending        0.025             0.011     0.001     0.071            0.056            0.197
HF bending            0.008             0.005     0.004     0.050            0                0.083
Jittering 1%          0.007             0.005     0.004     0.053            0                0.076

TABLE III
BERs OF THE PROPOSED METHOD AND THE METHODS IN [32]-[34] UNDER COMMON ATTACKS, WHERE LW = LG.

Attacks               Proposed, β = 0   β = 0.1   β = 0.2   Method in [32]   Method in [33]   Method in [34]
Closed-loop           0.020             0.009     0.006     0.066            0                0.081
Median filtering      0.099             0.059     0.032     0.112            0                0.121
JPEG 10%              0.236             0.176     0.113     0.244            0.493            0.251
JPEG 30%              0.033             0.015     0.012     0.082            0.496            0.232
Gaussian noise        0.058             0.020     0.011     0.096            0.476            0.189
Salt & pepper noise   0.023             0.008     0.004     0.066            0                0.086
Rotation 10°          0.071             0.031     0.021     0.091            0.268            0.237
Rotation 25°          0.073             0.035     0.025     0.106            0.310            0.251
Scaling 80%           0.066             0.032     0.023     0.093            0.297            0.249
Scaling 120%          0.065             0.031     0.013     0.080            0.241            0.226
Shearing 10%          0.059             0.024     0.010     0.084            0.062            0.242
Cropping 10%          0.029             0.013     0.004     0.080            0                0.073
Cropping 20%          0.036             0.017     0.005     0.160            0                0.083
Cropping 30%          0.048             0.024     0.010     0.162            0                0.083
Global bending        0.061             0.029     0.014     0.091            0.071            0.227
HF bending            0.035             0.015     0.007     0.069            0                0.087
Jittering 1%          0.031             0.012     0.007     0.068            0                0.086

image. To tackle geometric attacks (including cropping attacks and RBAs), a histogram-shape-related index is utilized to form and select the most suitable pixel groups for watermark embedding. Besides, a safe band is introduced between the selected pixel groups and the non-selected pixel groups to improve robustness to geometric attacks. Furthermore, a novel HFCM scheme is proposed to compensate for the side effect of Gaussian filtering, which further enhances robustness. Due to the usage of a secret key, the proposed watermarking method is also secure. The superior performance of the proposed method is demonstrated by simulation results.

APPENDIX I
APPROXIMATE INSENSITIVITY OF g(i) TO COMMON GEOMETRIC ATTACKS

Assume that the host image I has N = RC pixels, where R and C denote the number of rows and columns of the image, respectively. The parameters hG(i), NS and g(i) related to the host image I are defined in (8)-(10), respectively. For an image that has undergone geometric attacks, the corresponding symbols and parameters are I′, N′, R′, C′, h′G(i), N′S and g′(i), respectively. Note that after geometric attacks, an interpolation process is generally needed for images [38]. The interpolation process will introduce interpolation errors such as aliasing, blocking and blurring [38]. However, in order to maintain the perceptual quality of the attacked image at an acceptable level, the errors introduced by interpolation are often small [32] and thus will be ignored in the following analysis.

We first consider a scaling attack with scaling factors a1 and a2 in the vertical and horizontal directions, respectively. After scaling, R′ = a1R and C′ = a2C, which lead to

N′ = R′C′ = a1a2N.  (35)


Fig. 9. Impact of β on perceptual quality and robustness, where LW = LG and α = 0.15. (a) PSNR and SSIM versus β; (b) BER versus β under closed-loop and Gaussian noise (variance 10) attacks.

Similar to (35), we can get

h′G(i) = a1a2 hG(i)  (36)

and

N′S = a1a2 NS.  (37)

Considering the definition of g(i) together with (36) and (37), we have

g′(i) = h′G(i) / N′S = a1a2 hG(i) / (a1a2 NS) = g(i).  (38)

This means that g(i) is invariant to scaling attacks. Then we consider rotation, translation, shearing and affine attacks. It is obvious that rotation and translation only make the pixels change their positions. Under a shearing attack, the image will be stretched, which will only shift the positions of


the pixels. Similarly, an affine transform also only causes pixel position displacements, as it can be regarded as a combination of scaling, rotation, translation and shearing attacks. Since g(i) depends on the histogram shape and the latter is irrelevant to the pixel positions, g(i) is insensitive to rotation, translation, shearing and affine attacks.

Under a cropping attack, parts of the host image will be removed, which can cause a de-synchronization problem for many watermarking schemes [30]. Strictly speaking, the degree of insensitivity of g(i) under cropping attacks depends on the image, the cropped area and the function for selecting the S gray levels. Although the "invariance" of the histogram shape of an image to cropping attacks is only approximate, it holds reasonably well for most natural images in practice. We tested the insensitivity of g(i) under a 30% cropping attack using 100 randomly selected images, including street view, satellite, plant, animal, biomedical, portrait, landscape, geological, underwater, architecture, military, indoor and kitchenware images, from the online image database http://decsai.ugr.es/cvg/index2.php. For each image, we selected the 25 largest g(i) values, so 2500 g(i) values were considered in total. The average absolute relative variation of g(i) caused by 30% cropping is 5.3%, which is relatively small. In the proposed method, the safe band introduced in the encoding process is 15%, much greater than 5.3%, so our method is robust to cropping attacks.

An RBA normally uses different transformation parameters for different parts of the image, so it is very difficult to estimate the transformation parameters used in the attack and apply inverse transformations to the attacked image. The main effect of an RBA on an image is local warping, which is a kind of pixel position shifting.
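The insensitivity arguments in this appendix can be checked numerically. The sketch below uses a random stand-in image and a simplified g(i) mimicking (8)-(10); pixel permutation models position-only attacks (rotation, translation, shearing, RBA warping), and pixel replication models scaling without interpolation error. Both are simplifying assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(200, 200))  # stand-in 8-bit host image

def g_values(image, levels, bin_size=3):
    # Group statistic g(i) = h_G(i) / N_S over the selected gray levels,
    # mimicking (8)-(10): bins of `bin_size` levels, groups of two bins.
    counts = np.array([(image == k).sum() for k in levels])
    bins = counts[:len(counts) // bin_size * bin_size].reshape(-1, bin_size).sum(axis=1)
    groups = bins[:len(bins) // 2 * 2].reshape(-1, 2).sum(axis=1)
    return groups / counts.sum()

levels = np.arange(246)  # S = 246 selected gray levels (illustrative choice)
g_host = g_values(img, levels)

# Position-only attacks permute pixels without changing their values,
# so g(i) is exactly unchanged:
permuted = rng.permutation(img.flatten()).reshape(img.shape)
assert np.allclose(g_values(permuted, levels), g_host)

# 2x "scaling" modelled as pixel replication: every count scales by a1*a2,
# which cancels in g(i) as in (38):
scaled = np.kron(img, np.ones((2, 2), dtype=int))
assert np.allclose(g_values(scaled, levels), g_host)
```

With real interpolation, and with cropping, the equalities become approximate, which is exactly why the safe band is needed.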
Since histogram shape is invariant to pixel positions, g(i) is insensitive to RBAs.

ACKNOWLEDGMENT

The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions that improved the quality of the paper.

REFERENCES

[1] Y. Xiang, I. Natgunanathan, D. Peng, W. Zhou, and S. Yu, "A dual-channel time-spread echo method for audio watermarking," IEEE Trans. Inf. Forensics Security, vol. 7, no. 2, pp. 383–392, Apr. 2012.
[2] Y. Xiang, D. Peng, I. Natgunanathan, and W. Zhou, "Effective pseudonoise sequence and decoding function for imperceptibility and robustness enhancement in time-spread echo-based audio watermarking," IEEE Trans. Multimedia, vol. 13, no. 1, pp. 2–13, Feb. 2011.
[3] N. K. Kalantari, S. M. Ahadi, and M. Vafadust, "A robust image watermarking in the ridgelet domain using universally optimum decoder," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 3, pp. 396–406, Mar. 2010.
[4] E. Nezhadarya, Z. J. Wang, and R. K. Ward, "Robust image watermarking based on multiscale gradient direction quantization," IEEE Trans. Inf. Forensics Security, vol. 6, no. 4, pp. 1200–1213, Dec. 2011.
[5] L. Wang, H. Ling, F. Zou, and Z. Lu, "Real-time compressed-domain video watermarking resistance to geometric distortions," IEEE MultiMedia, vol. 19, no. 1, pp. 70–79, Jan. 2012.
[6] M. Lee, K. Kim, and H. Lee, "Digital cinema watermarking for estimating the position of the pirate," IEEE Trans. Multimedia, vol. 12, no. 7, pp. 605–621, Nov. 2010.


[7] N. Johnson, Z. Duric, and S. Jajodia, "Recovery of watermarks from distorted images," in Proc. 3rd Int. Workshop Inf. Hiding, 1999, pp. 318–332.
[8] F. Davoine, "Triangular meshes: A solution to resist to geometric distortions based watermark-removal softwares," in Proc. EURASIP Signal Process. Conf., 2000, pp. 493–496.
[9] P. Dong, J. Brankov, N. Galatsanos, and Y. Yang, "Geometric robust watermarking based on a new mesh model correction approach," in Proc. IEEE Int. Conf. Image Process., 2002, pp. 493–496.
[10] H. Zhang, H. Z. Shu, G. Coatrieux, J. Zhu, Q. M. J. Wu, Y. Zhang, H. Q. Zhu, and L. M. Luo, "Affine Legendre moment invariants for image watermarking robust to geometric distortions," IEEE Trans. Image Process., vol. 20, no. 8, pp. 2189–2199, Aug. 2011.
[11] J. Lichtenauer, I. Setyawan, T. Kalker, and R. Lagendijk, "Exhaustive geometrical search and the false positive watermark detection probability," in Proc. SPIE, vol. 5020, Security and Watermarking of Multimedia Contents V, 2003, pp. 203–214.
[12] M. Barni, "Effectiveness of exhaustive search and template matching against watermark desynchronization," IEEE Signal Process. Lett., vol. 12, no. 2, pp. 158–161, Feb. 2005.
[13] J. J. K. Ruanaidh and T. Pun, "Rotation, scale and translation invariant spread spectrum digital image watermarking," Signal Process., vol. 66, no. 3, pp. 303–317, May 1998.
[14] C. Y. Lin, J. A. Bloom, I. J. Cox, M. L. Miller, and Y. M. Lui, "Rotation, scale, and translation resilient watermarking for images," IEEE Trans. Image Process., vol. 10, no. 5, pp. 767–782, May 2001.
[15] X. Kang, J. Huang, Y. Shi, and Y. Lin, "A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 8, pp. 776–786, Aug. 2003.
[16] X. Kang, J. Huang, and Z. Wenjun, "Efficient general print-scanning resilient data hiding based on uniform log-polar mapping," IEEE Trans. Inf. Forensics Security, vol. 5, no. 1, pp. 1–12, Mar. 2010.
[17] D. Rosiyadi, S. J. Horng, P. Fan, X. Wang, M. K. Khan, and P. Yi, "Copyright protection for e-government document images," IEEE Trans. Multimedia, vol. 19, no. 3, pp. 62–73, Jul.–Sept. 2013.
[18] S. Pereira, J. J. K. Ruanaidh, F. Deguillaume, G. Csurka, and T. Pun, "Template based recovery of Fourier-based watermarks using log-polar and log-log maps," in Proc. Int. Conf. Multimedia Computing Systems, Special Session Multimedia Data Security Watermarking, Jul. 1999, pp. 870–874.
[19] S. Pereira and T. Pun, "Robust template matching for affine resistant image watermarks," IEEE Trans. Image Process., vol. 9, no. 6, pp. 1123–1129, Jun. 2000.
[20] S. Pereira and T. Pun, "An iterative template matching algorithm using the Chirp-Z transform for digital image watermarking," Pattern Recognit., vol. 33, no. 1, pp. 173–175, Jan. 2000.
[21] V. Licks and R. Jordan, "Geometric attacks on image watermarking systems," IEEE MultiMedia, vol. 12, no. 3, pp. 68–78, Sept. 2005.
[22] M. Alvarez-Rodriguez and F. Perez-Gonzalez, "Analysis of pilot-based synchronization algorithms for watermarking of still images," Signal Process.: Image Comm., vol. 17, no. 8, pp. 611–633, Sept. 2002.
[23] M. Alghoniemy and A. H. Tewfik, "Geometric invariance in image watermarking," IEEE Trans. Image Process., vol. 13, no. 2, pp. 145–153, Feb. 2004.
[24] P. Dong, J. G. Brankov, N. P. Galatsanos, Y. Yang, and F. Davoine, "Digital watermarking robust to geometric distortions," IEEE Trans. Image Process., vol. 14, no. 12, pp. 2140–2150, Dec. 2005.
[25] J. S. Seo and C. D. Yoo, "Image watermarking based on invariant regions of scale-space representation," IEEE Trans. Signal Process., vol. 54, no. 4, pp. 1537–1549, Apr. 2006.
[26] J. S. Tsai, W. B. Huang, C. L. Chen, and Y. H. Kuo, "A feature-based digital image watermarking for copyright protection and content authentication," in Proc. IEEE Int. Conf. Image Process., Sept. 2007, pp. 469–472.
[27] X. Gao, C. Deng, X. Li, and D. Tao, "Geometric distortion insensitive image watermarking in affine covariant regions," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 3, pp. 278–286, May 2010.
[28] Y. T. Lin, C. Y. Huang, and G. C. Lee, "Rotation, scaling, and translation resilient watermarking for images," IET Image Process., vol. 5, no. 4, pp. 328–340, Jun. 2011.
[29] J. S. Tsai, W. B. Huang, and Y. H. Kuo, "On the selection of optimal feature region set for robust digital image watermarking," IEEE Trans. Image Process., vol. 20, no. 3, pp. 735–743, Mar. 2011.

[30] F. Petitcolas, R. J. Anderson, and M. G. Kuhn, "Attacks on copyright marking systems," in Proc. Int. Workshop Inf. Hiding, 1998, pp. 218–238.
[31] J. L. Dugelay, S. Roche, C. Rey, and G. Doerr, "Still-image watermarking robust to local geometric distortions," IEEE Trans. Image Process., vol. 15, no. 9, pp. 2831–2842, Sept. 2006.
[32] S. Xiang, H. J. Kim, and J. Huang, "Invariant image watermarking based on statistical features in the low-frequency domain," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 6, pp. 777–790, Jun. 2008.
[33] X. Li, B. Guo, and H. Shen, "A new histogram based image watermarking scheme resisting geometric attacks," in Proc. Int. Conf. Inf. Assurance and Security, Aug. 2009, pp. 239–242.
[34] C. Deng, X. Gao, X. Li, and D. Tao, "Local histogram based geometric invariant image watermarking," Signal Process., vol. 90, no. 12, pp. 3256–3264, Dec. 2010.
[35] S. W. Jung, L. T. Ha, and S. J. Ko, "A new histogram modification based reversible data hiding algorithm considering the human visual system," IEEE Signal Process. Lett., vol. 18, no. 2, pp. 95–98, Feb. 2011.
[36] G. Coatrieux, W. Pan, N. Cuppens-Boulahia, F. Cuppens, and C. Roux, "Reversible watermarking based on invariant image classification and dynamic histogram shifting," IEEE Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 111–120, Jan. 2013.
[37] C. S. Lu, S. W. Sun, C. Y. Hsu, and P. C. Chang, "Media hash-dependent image watermarking resilient against both geometric attacks and estimation attacks based on false positive-oriented detection," IEEE Trans. Multimedia, vol. 8, no. 4, pp. 668–685, Aug. 2006.
[38] P. Thevenaz, T. Blu, and M. Unser, "Image interpolation and resampling," in Handbook of Medical Image Process. and Anal., N. B. Isaac and I. N. Bankman, Eds., 2nd ed. Burlington, VT: Academic, 2009, ch. 28, pp. 465–493.
[39] X. Kang, J. Huang, Y. Shi, and Y. Lin, "Watermarking digital image and video data: A state-of-the-art overview," IEEE Signal Process. Mag., vol. 17, no. 5, pp. 20–46, Sept. 2000.
[40] P. Zhu and P. M. Chirlian, "On critical point detection of digital shapes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 737–748, Aug. 1995.
[41] S. Li, J. J. Ahmad, D. Saupe, and C.-C. J. Kuo, "An improved DC recovery method from AC coefficients of DCT-transformed images," in Proc. IEEE Int. Conf. Image Process., Sept. 2010, pp. 2085–2088.
[42] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[43] C. A. Glasbey and K. V. Mardia, "A review of image warping methods," Journal of Applied Statistics, vol. 25, no. 2, pp. 155–171, 1998.
[44] D. Damien, D. Jean-Francois, M. Benoit, and B. Michel, "Compensation of geometrical deformations for watermark extraction in digital cinema application," in Proc. Security and Watermarking of Multimedia Contents III, Jan. 2001, pp. 149–157.
[45] V. Licks, F. Ourique, R. Jordan, and F. Perez-Gonzalez, "The effect of the random jitter attack on the bit error rate performance of spatial domain image watermarking," in Proc. Int. Conf. Image Process., 2003, pp. 455–458.
[46] V. Saxena and J. P. Gupta, "Towards increasing the robustness of image watermarking scheme against histogram equalization attack," in Proc. 15th Signal Process. and Commun. Appli. Conf., Jun. 2007, pp. 1–4.
[47] X. G. Kang, J. W. Huang, and W. J. Zeng, "Improving robustness of quantization-based image watermarking via adaptive receiver," IEEE Trans. Multimedia, vol. 10, no. 6, pp. 953–959, Oct. 2008.

Tianrui Zong received the B.E. degree in automation science and electrical engineering from BeiHang University, Beijing, China, in 2009 and the M.Sc. degree in signal processing and communications from the University of Edinburgh, Edinburgh, U.K., in 2010. He is currently a Ph.D. candidate in the School of Information Technology, Deakin University, Melbourne, Australia, where he has been a research assistant since 2013. His research interests include digital watermarking, privacy preservation, image processing, and telecommunications.

1051-8215 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Yong Xiang (SM’12) received the Ph.D. degree in electrical and electronic engineering from The University of Melbourne, Australia. He is a Professor and the Director of the Artificial Intelligence and Image Processing Research Cluster, School of Information Technology, Deakin University, Australia. His research interests include signal and system estimation, information and network security, multimedia (speech/image/video) processing, and wireless sensor networks. He has served as Program Chair, TPC Chair, Symposium Chair, and Session Chair for a number of international conferences. He currently serves as an Associate Editor of IEEE Access. Dr. Xiang is a senior member of the IEEE.

Iynkaran Natgunanathan received the B.Sc. Eng. (Hons.) degree in electronics and telecommunication engineering from the University of Moratuwa, Sri Lanka, in 2007 and the Ph.D. degree from Deakin University, Australia, in 2012. From 2006 to 2008, he was a Software Engineer at Millennium Information Technology (Pvt) Ltd, Malabe, Sri Lanka. In 2012, he joined the School of Information Technology at Deakin University, Australia, where he was a research assistant from 2012 to 2014 and is currently a postdoctoral research fellow. His research interests include digital watermarking, privacy preservation, psychoacoustics, audio and image processing, and telecommunications.

Song Guo (SM’11) received the Ph.D. degree in computer science from the University of Ottawa, Canada. He is currently a Professor at the School of Computer Science and Engineering, the University of Aizu, Japan. His research interests are mainly in the areas of protocol design and performance analysis for computer networks and distributed systems. He has published over 250 papers in refereed journals and conferences in these areas and received three IEEE/ACM best paper awards. Dr. Guo currently serves as an Associate Editor of the IEEE Transactions on Parallel and Distributed Systems and the IEEE Transactions on Emerging Topics in Computing, with duties on emerging paradigms in computational communication systems, and serves on the editorial boards of many other journals. He has also served on the organizing and technical committees of numerous international conferences. Dr. Guo is a senior member of the IEEE and the ACM.

Wanlei Zhou (M’92–SM’09) received the B.Eng. and M.Eng. degrees from the Harbin Institute of Technology, Harbin, China, in 1982 and 1984, respectively, and the Ph.D. degree from The Australian National University, Canberra, Australia, in 1991. He also received the D.Sc. degree from Deakin University, Victoria, Australia, in 2002. He is currently the Alfred Deakin Professor, Chair Professor of Information Technology, and the Head of the School of Information Technology, Faculty of Science and Technology, Deakin University, Australia. His research interests include distributed and parallel systems, network and system security, mobile computing, bioinformatics, and e-learning. Professor Zhou has published more than 200 papers in refereed international journals and refereed international conference proceedings. Since 1997, he has served as General Chair, Steering Committee Chair, PC Chair, Publication Chair, and Session Chair for more than 20 international conferences.

List of figure captions

Fig. 1. Block diagram of the watermark embedding process.

Fig. 2. Illustration of transferring pixels from Bin 1 to Bin 2, where i = 1, LB = 3, the horizontal axis denotes gray level, and the vertical axis denotes the number of pixels. (a) Original state. (b) Transfer pixels from gray level K4 to gray level K5. (c) Transfer pixels from gray levels K1, K2, K3 to gray level K4. (d) Final state.

Fig. 3. Illustration of a 7 × 7 Gaussian mask.

Fig. 4. Illustration of Gaussian filtering.

Fig. 5. Block diagram of the watermark decoding process.

Fig. 6. Top two rows: original images Barbara, Calculator, Conch, Desk, Elaine, Lena, Office, Peppers, Pirate, and Venice. Bottom two rows: watermarked counterparts of these images, where LW = LG and β = 0.2.

Fig. 7. Illustration of attacks. (a) Original; (b) median filtering; (c) JPEG compression; (d) Gaussian noise; (e) salt & pepper noise; (f) rotation and scaling back; (g) scaling; (h) shearing; (i) cropping; (j) global bending with factor 10; (k) HF bending (factor 1); (l) random jittering.

Fig. 8. (a) PSNR and SSIM versus α, where LW = 25 and β = 0; (b) BER versus α under common attacks.

Fig. 9. Impact of β on perceptual quality and robustness, where LW = LG and α = 0.15. (a) PSNR and SSIM versus β; (b) BER versus β under closed-loop and Gaussian noise (variance 10) attacks.

Gleb Beliakov (M’08–SM’08) received the Ph.D. degree in Physics and Mathematics in Moscow, Russia, in 1992. He has worked as a Lecturer and a Research Fellow at Los Andes University and at the Universities of Melbourne and South Australia. He is currently a Professor with the School of Information Technology at Deakin University, Australia. His research interests are in the areas of aggregation operators, multivariate approximation, global optimization, and decision support systems. He is the author of over 150 research papers and a monograph in these areas, as well as a number of software packages. He serves as an Associate Editor of the IEEE Transactions on Fuzzy Systems, Fuzzy Sets and Systems, and TOP journals.
