MP3 robust Audio Watermarking - CiteSeerX

MP3 robust Audio Watermarking Michael Arnold, Sebastian Kanka Department for Security Technology for Graphics and Communication Systems Fraunhofer-Institute for Computer Graphics 64283 Darmstadt, Germany [email protected] This paper presents methods for audio watermarking in the frequency domain. It will consider desired properties and possible applications of watermarking algorithms. Special attention is given to statistical methods relying on the so-called patchwork techniques. It will present a solution of robust watermarking of audio data and reflect the security properties of the technique. Experimental results show good robustness of the approach against MP3 compression and other common signal processing manipulations. Enhancements of the presented methods are discussed.

Introduction The problems associated with copyright protection, i.e. the protection of intellectual property rights (IPR), arise from the transition from analog to the digital data representation. Easy copying with a bit-by-bit reproduction of original and all copies, simple and inexpensive distribution of information across networks with the help of compression mechanisms causes new problems. In the case of audio data personalized audio players are often used to establish a usage control as a kind of IPR protection between service providers and consumer. The control over the data is lost if it emerges from such secured envirionements. An other idea of the protection of IPR, also known as watermarking, is to integrate the information relevant to copyright directly into the data.

Desired features of the algorithms A watermark establishes a link between the raw data and corresponding information which can serve different purposes according to the intended application. Secret watermarks can be used as authentication and content integrity mechanisms in a variety of ways. This implies that the watermark is a secured link readable only by authorized persons with the knowledge about the secret. Removing this kind of watermark should only be possible with a heavy degradation of the carrier signal. Public watermarks act as an information carrier with the watermark readable by everybody. These public watermarks are - in the ideal case - not detectable or removable by a third party. This requirement can be lowered since public watermarks mainly act as information links. Therefore these links do not need to fulfill the same security constraints as the secret watermarks. According to the intention and the kind of watermark, watermarking techniques should possess certain signal, security and general properties.

Signal processing properties The watermark should be not perceivable by an observer, which in turn means that the carrier signal should not be altered in a perceptible way. The watermark should be robust against intentional or anticipated manipulations, e.g. compression, filtering, resampling, requantisation, cropping, scaling, etc. Security properties The watermarking procedure should rely on a key to ensure security, not on the algorithm’s secrecy (Kerckhoffs, A., 1883). The algorithm should be published and therefore be available to all users for review. The watermark should be statistically undetectable. The watermarking procedure should have a mathematical formulation to permit the validation. The coding procedure should be symmetric or asymmetric (in the sense of the public key cryptographic algorithms), according to the application. Robustness against attacks, which use multiple watermarked copies in order to destroy the watermark or to generate a different valid watermark also known as collusion attacks. General properties The algorithm should allow real-time processing (Busch, Funk, & Wolthusen, 1999). The algorithm must be adjustable to different degrees of robustness, quality and different amount of data. The algorithm should be tunable to different media types (Cox, Kilian, Leighton, & Shamoon, 1995). The algorithm should support multiple watermarks [(Piron et al., 1999), (Busch et al., 1999)].

Audio watermarking and applications Copyright protection The natural application of every watermarking algorithm is to establish a reliable link be-

2

MICHAEL ARNOLD, SEBASTIAN KANKA

tween the copyright owner and the material via the watermark. The copyright owner will be authenticated by the knowledge of the secret key to read the secret watermark. Monitoring Embedding a secret watermark to enable the tracing of illegal copying (Busch et al., 1999). Fingerprinting In point-to-point distribution environements - like the internet - information about authenticated customers can be embedded as secret watermarks right before the secure delivery of the data. The embedding of the client identity enables the detection of the head of an unauthorized redistribution chain. Indication of content manipulation The indication of content manipulation (tamper-proofing) from the authorized state could be detected by means of a public watermark. The presented applications above use the watermarking as a security mechanism. The watermark is also useful for applications which benefit from a robust and reliable connection between a mark corresponding raw data. Information carrier A public watermark embedded into the data stream can act as a link to external databases storing information about the copyright and license conditions. This embedded unique registration numbers are for the benefit of the customer.

Embedding the watermark. 1. Map the secret key and the watermark to the seed of a random number generator. Start the generator in order to pseudorandomly select two intermixed subsets A = faigi=1;:::;M and B = fbigi=1;:::;M of equal size M N from the original set. Furthermore we define ~a and ~b as vector of elements from A or B with a total number lower or equal M. 2. Formulate the test hypothesis (H0 ) and alternative hypothesis (H1 ). The appropiate test statistic z will be a function of vectors ~a and ~b 1 with the pdf φ(z) in the unmarked and φm (z) in the marked case: H0 : The test statistic follows the distribution φ(z). The watermark is not embedded in the audio data. H1 : The test statistic follows the distribution φm (z). The watermark is embedded in the audio data. The two kinds of errors are incorporated in hypothesis testing:

Z +∞ I: φ(z)dz = PI T ZT φm (z)dz = PII II: ∞

The large amount of data of a typical audio signal in CD quality (178 kbyte) offers the possibility of using statistical methods relying on large sets. The audio watermarking method presented below can be categorized as follows: It is a statistical method. Embeds a 1-bit watermark. Doesn’t need the original audio stream or additional data to read the watermark. Embeds the same watermark in every time slice of about 1.2 seconds and is therefore a block based approach. The method works in the Fourier domain. The algorithm is symmetric. The algorithm presented below was motivated by the Patchwork approach (Bender, Gruhl, Morimoto, & Lu, 1996). Similiar methods are working in the time domain (Bassia & Pitas, 1998). Our audio watermarking method has been adapted to the frequency domain and does not require the original in order to detect the watermark in contrast to other approaches (Boney, Tewfik, & Hamdy, 1996; Cox et al., 1995). Let us assume a dataset containing 2N values (in our method frequency coefficients of the Fourier domain).

(Type

I error)

(1)

II error)

(2)

3. Define the error probability PI for false detection and calculate threshold T from equation (1). 4. Calculate the vector of parameters ~k 2 K = fki gi=1;:::;2N , for a given error propability PII and threshold T and embedding functions eA ; eB :

Z

T ∞

The watermarking method

(Type

φm ( f (eA (~a;~b;~k); eB (~a;~b;~k)))dz = PII

(3)

5. Alter the selected elements ai 2 A and bi 2 B ; i 1; : : : ; M according to the embedding functions eA ; eB : a0i = eA (~a;~b;~k); b0i = eB (~a;~b;~k); i = 1; : : : ; M

=

(4)

Besides the desired error probabilities the changes have to be distributed in a way, which achieves also the inaudibility. Detection. 1. Map the secret key and watermark to the seed of the random number generator in order to generate the subsets A and B . 2. Decide for the probability of correct rejection 1 PI according to the application and calculate the threshold T from equation (1). 3. Calculate the sample mean E (z) = E ( f (~a0 ;~b0 )) and decide between the two mutually exclusive propositions: H0 : E (z) T watermark not embedded H1 : E (z) > T watermark is embedded

(5) (6)

In subsequent sections we will investigate different test statistics and their applicability to the watermarking problem. 1

z = f (~a;~b) original and z = f (~a0 ;~b0 ) marked data

3

MP3 ROBUST AUDIO WATERMARKING

The embedding function. The embedding functions should introduce relative changes, which are robust against differences in scale: eA = (1 + k)ai; eB = (1 k)bi a0i = (1 + k)ai; b0i = (1 k)bi; i = 1; : : : ; M

(7) (8)

Test statistic I The natural choice for the test statistic is the difference between the population means of A and B rather than their values per se. The standardized test statistic is given by : z = f (~a0 ;~b0 ) =

(9)

H0 : φ(z) = N (0; 1) k(a¯ + b¯ ) H1 : φm (z) = N (z¯m ; 1); z¯m = σˆ a¯0 b¯ 0

(10)

= z1 PII

(12)

leading to k p 4

(z1 PI + z1 PII )ε (z1 PI + z1 PII

)2 ε2

ε :=

p

2σˆ max a¯+b¯ 2

(13)

with σˆ 2max := max(σˆ 2a¯ ; σˆ 2b¯ )

(14)

Test statistic II An extension of the test statistic according to (9) can be made by normalizing the random variable z from equation 0 ¯0 (9) to the mean a¯ +2 b of the whole set A [ B :

z=

f (~a0 ;~b0 ) =

a¯0 b¯ 0 σa¯0 b¯ 0 a¯0 b¯ 0 =2 1 a¯0 +b¯ 0 a¯0 + b¯ 0 2 σa¯0 +b¯ 0

The general approach in quality evaluation is to compare the original signal with the watermarked one for different k. To find the minimum embedding factor, with imperceptible or perceptible, but not annoying differences, between the reference and the watermarked signal, we used the so-called Subjective Diff-Grades (SDG) (Neubauer & Herre, 1998): SDG 0.0 -1.0 -2.0 -3.0 -4.0

Description imperceptible perceptible, but not annoying slightly annoying annoying very annoying

Table 1 Subjective Diff-Grades used in listening tests

(11)

with N (µ; σ2 ) the normal distribution with mean µ and variance σ2 . With the equations (10) and (1) the threshold can be calculated as T = z1 PI . According to the symmetry of the normal distributions φ(z) and φm (z), k can be calculated from T

Features of the algorithm Quality evaluation

a¯0 b¯ 0 σa¯0 b¯ 0

Both sample means can be assumed normally distributed under the Central Limit Theorem, because of the large amount (M 30) of Fourier coefficients. Therefore a¯ 0 b¯ 0 is normally distributed, and one can make the estimations σa¯0 σˆ a¯0 , σb¯ 0 σˆ b¯ 0 and σ2a¯0 b¯ 0 σˆ 2a¯0 b¯ 0 . The independence ¯ which are from the same set of the random variables a¯ and b, A [ B of Fourier coefficients, leads to the approximations a¯ b¯ 0. Therefore we can formulate the two mutually exclusive propositions:

z¯m

Therefore according to equation (15) we expect z to be a random variable with mean 0 in the unmarked and an approximated mean of z¯m 2k in the marked case. In contrast to test statistic I this results in a complicated function not suitable for the calculation of the threshold and embedding parameter. We measured z¯m for different k in order to investigate the equation (15). The results verify the linear relation between z¯m and k, but shows a factor 1:54 of instead of the expected value 2.

(15)

The selected 5 test items were excerpts of length 12s from the SQAM program material. The test pairs presented to the listener consist of the orginal and watermarked signal with different embedding factors k. The listener has to classify the difference in terms of the SDG scale. The result of the subjective quality evaluation were averaged over the number of listeners. The embedding factor k = 0:10 was used for the following robustness tests.

Security The watermark and the marking procedure should also possess certain security properties. The space of distinguishable watermarks should be large enough, which is surely the case, because the number of different watermarks NW = 2M M for M = 256 results in approximately 4:7 10152 watermarks. The probability of false detection of a watermark can be determined by performing the following steps: 1. Determine the number of bits B out of the M bits of the embedded watermark, which gives a false detection. 2. Calculate the probability that i = B; : : : ; M bits match with the embedded bitpattern from: M

∑

i=B

M B

M M B 2M M

(16)

4

MICHAEL ARNOLD, SEBASTIAN KANKA

We used different audio tracks and marked all the tracks with the same watermark and an embedding factor of k = 0:15. Afterwards we constructed 30 different bit patterns with a certain percentage of identical to the original pattern and measured the values for the random variables. The threshold T indicating detection of the watermark were chosen to be half the size of the maximum value z for the embedded watermark. The thresholds are plotted as horizontal lines. For both cases we receive a critical similarity of about 67% with the original watermark.

Filtering To test the robustness against filtering a bandpass filter was applied to the watermarked signal by amplifying the signal by -9 dB in the low and high frequency domain. The cut-off frequencies has been 441 Hz for the low-pass and 4410 Hz for the high-pass filter. This filtering results in audible distortions in comparison to the original signal. Resampling The original audio signals were sampled with a sampling rate of 44.1 kHz. The sampling rate of the watermarked audio data was reduced to 22,05 kHz and resampled to the original rate of 44,1 kHz. According to the Nyquist theorem such a resampling results in an unrecoverable loss of the high frequencies above 11,025 kHz. This causes audible distortions especially in audio tracks carrying high frequencies. Requantization Audio tracks sampled at 8-bit are often used in games and multimedia applications. We therefore tested the process of requantization of a 16-bit watermarked audio signal to 8-bit and back to 16-bit. This increases the incoherent background noise of the audio track due to the rounding errors during the processing.

Figure 1. Security evaluation about the faked watermarks

According to equation (16), the probability of detecting a watermark which isn’t embedded is 10 35 for statistic I.

Robustness against cropping The embedding of the watermark in every time slice of about 1.2 seconds enables the detection of the watermark even in the case of cropping or cutting. The necessary precondition for reading the watermark is to have a contiguous part of at least about 2.5 seconds from the watermarked audio stream. Test results.

Robustness To test the robustness of the presented watermarking algorithm we randomly choose 400 different combinations of watermarks and keys and embedded this bit pattern 10 times into one audio track. All files were 16-bits signed stereo sampled at 44.1 kHz (CD quality). Test conditions. All audio signals were watermarked with a embedding factor of k = 0:10 and subsequently manipulated (Fig. (2)). Audio signal

Measurement of Embedding

Manipulation

Detection random variable z

Figure 2. Test scenario to measure detection values z

To measure the robustness the total error probability Perror = PI + PII due to the overlap of the pdf’s in the unmarked and marked case were calculated from equations (12). MPEG 1 Layer III audio compression The robustness against MPEG 1 audio Layer III compression has been tested by using a compression rate of 128 kbps for the watermarked signal.

Error/%

MPEG 1.22

Filtering 1.47

Resampling 1.29

Requantisation 1.43

Table 2 Total error probabilities for different manipulations

The error probabilities of the several manipulation of the watermarked signal can be easily reduced by using more frequency coefficients. This is equivalent to the embedding to the watermark in a longer time slice.

Possible enhancements of the algorithm The watermarking algorithm presented leaves room for further research in a variety of ways: Further investigation of test statistics and embedding functions to improve probabilities for correct detection and rejection. Integration of a psychoacoustic model to improve the overall quality and robustness against compression. Alteration of the algorithm to embed a bitstream. Embedding of multiple watermarks.

MP3 ROBUST AUDIO WATERMARKING

Improve the robustness against the so-called jitter attack (Petitcolas, Anderson, & Kuhn, 1998). Conclusion The watermarking of audio data is an appropiate mechanism to protect the IPR. We discussed the different kinds of watermarks like private and public, which offer the possibility for a wider range of applications. The presented algorithm is a 1-bit algorithm based on a statistical approach working in the Fourier domain. Different test statistics were presented and investigated. The watermarking techniques were evaluated in terms of quality, security and robustness. The security is guaranteed by the immense number of distinguisible watermarks and the vanishing probability of faking a watermark even if the Fourier coefficients are known to the attacker. Besides attacks relying on the statistical features of the watermark, robustness against signal manipulations were investigated. The algorithm shows good robustness against common signal processing like MPEG 1 Layer III compression, low- and high-pass filtering, resampling, requantization and cropping. Possible enhancements were discussed and are subject of current research.

References Bassia, P., & Pitas, I. (1998). Robust audio watermarking in the time domain. In S. Theodoridis et al. (Eds.), Signal processing IX, theories and applications: proceedings of eusipco-98, ninth european signal processing conference, rhodes, greece, 8–11 september 1998 (pp. 25–28). Patras, Greece: Typorama Editions. Bender, W., Gruhl, D., Morimoto, N., & Lu, A. (1996). Techniques for data hiding. IBM Systems Journal, 35(3&4), 313–336. Boney, L., Tewfik, A. H., & Hamdy, K. N. (1996). Digital watermarks for audio signals. In 1996 ieee int. conf. on multimedia computing and systems (pp. 473–480). Hiroshima, Japan. Busch, C., Funk, W., & Wolthusen, S. (1999). Digital watermarking: From concepts to real-time video applications. IEEE Computer Graphics and Applications, 19(1), 25–35. Cox, I. J., Kilian, J., Leighton, T., & Shamoon, T. (1995). Secure spread spectrum watermarking for multimedia (Technical Report No. 95-10). NEC Research Institute. Kerckhoffs, A. (1883). La cryptographie militaire. Journal des Sciences Militaire, 9th series, 161-191. Neubauer, C., & Herre, J. (1998). Digital watermarking and its influence on audio quality. In Anonymous (Ed.), Proceedings of the 105th convention of the audio engineering society, san francisco, USA 26–29 september, 1998. Petitcolas, F. A. P., Anderson, R. J., & Kuhn, M. G. (1998). Attacks on copyright marking systems. In D. Aucsmith (Ed.), Second international workshop on information hiding, 14–17 april, 1998, portland, oregon, USA (Vol. 1525, pp. 219–239). Berlin, Germany / Heidelberg, Germany / London, UK / etc.: SpringerVerlag. Piron, L., Arnold, M., Kutter, M., Funk, W., Boucqueau, M., & Craven, F. (1999). Octalis benchmarking: comparison of four watermarking techniques. In P. W. Wong & E. J. Delp (Eds.), Security and watermarking of multimedia contents (Vol. 3657, pp. 240–250). San Jose, California.

5