Spatial Decomposition Method

INTRODUCTION

Because the subjective comparison of concert halls is difficult and often biased by matters of taste, researchers have strived to develop objective ways to measure certain features of the acoustics. This work has led to the international standard ISO 3382-1:2009. Standard measurements include 6-10 receiver positions in the audience area and a few source positions on stage. The capturing microphone has omnidirectional or figure-of-eight directivity. The standard parameters are usually calculated in octave bands by integrating the sound energy decay curves.

Recently, ISO 3382-1:2009 has been criticized in many respects [1, 2]. For example, the algorithms used to compute the parameters have uncertainties, the applied frequency range is limited, and an omnidirectional source in the few recommended positions does not correspond to the characteristic sound source in concert halls, a real orchestra. Moreover, it is a generally accepted belief among acousticians that only two parameters correlate with subjective perception: Strength (G) correlates with loudness and early decay time (EDT) with reverberance. Other perceptually relevant factors, e.g., intimacy, have no corresponding objective parameter [3].

Due to the shortcomings of the standard, we have recently taken a novel approach to measuring concert hall acoustics. The sound source is a loudspeaker orchestra consisting of 34 loudspeakers that simulate a symphony orchestra. The directivity of the loudspeakers approximates the directivity of real instruments [4]. Originally this loudspeaker orchestra was developed for subjective studies [3, 5]. The spatial impulse responses from the sources are captured with a six-microphone array. The Spatial Decomposition Method (SDM) [6] encodes the spatial room impulse response into a set of image-sources or plane waves for acoustic analysis and auralization purposes.

A recent article [7] presents a technique to visualize the temporal development of the spatial sound field using SDM-encoded data, incorporating the plan and section drawings of the studied hall. In this paper, these recently proposed techniques are briefly overviewed and applied to the measurements of 11 unoccupied European concert halls. Visualization results are discussed.

METHODS

This section presents the spatial analysis of the sound field with SDM, and the calculation of the directional energy distribution from the spatial analysis output.

Spatial Decomposition Method

A spatial room impulse response is captured with $N$ microphones at locations $\mathbf{r}_n$:

$$ h_n(t) = h(t \mid \mathbf{r}_n, \mathbf{x}) = \sum_{r=0}^{R} h_{r,n}(t) + w_n(t), \tag{1} $$

where $w_n(t)$ is the measurement noise, $h_{r,n}(t)$ are the individual acoustic events, $r = 0, \ldots, R$ is the index of each acoustic event, $t$ is time, and $\mathbf{x}$ is the source position. The acoustic events can be, for example, the direct sound, discrete reflections, diffractions, or diffuse reflections. The whole impulse response is altered by several acoustic phenomena. The majority of the acoustic events are attenuated according to the 1/r distance law and affected by air absorption. In addition, the frequency response of an event is altered by the absorption of the surfaces in the enclosure. Moreover, the directivities of the microphones and the sound source have an effect on the impulse response.

The SDM analyses the captured spatial room impulse response in small time windows for each discrete time step $k$, i.e., at every $\Delta t = 1/f_s$, where $f_s$ is the sampling frequency. The direction of a time window is estimated from the time delay of arrival (TDOA) estimates with the generalized correlation method. The generalized correlation method with direct weighting estimates the TDOA between microphones $i$ and $j$ as follows [8]:

$$ \hat{\tau}_{i,j}^{(k)} = \arg\max_{\tau} \, \mathcal{F}^{-1}\left\{ H_i^{(k)}(\omega) \left( H_j^{(k)}(\omega) \right)^{*} \right\}(\tau), \tag{2} $$

where $\mathcal{F}^{-1}\{\cdot\}$ is the inverse Fourier transform, $H_n^{(k)}(\omega)$ are the frequency-domain versions of the windowed impulse responses, and $(\cdot)^{*}$ denotes the complex conjugate. For subsample accuracy, each TDOA estimate is interpolated with the exponential fit [9]. The set of instantaneous TDOA estimates is denoted by

$$ \hat{\boldsymbol{\tau}}_k = \left[ \hat{\tau}_{1,2}^{(k)}, \hat{\tau}_{1,3}^{(k)}, \ldots, \hat{\tau}_{N-1,N}^{(k)} \right]^{T}, $$

where $N$ is the number of microphones, and the corresponding microphone position difference vectors by $\mathbf{V} = [\mathbf{r}_1 - \mathbf{r}_2, \mathbf{r}_1 - \mathbf{r}_3, \ldots, \mathbf{r}_{N-1} - \mathbf{r}_N]^{T}$. The least-squares solution for the slowness vector is given as [10, p. 75]:

$$ \hat{\mathbf{m}}_k = \mathbf{V}^{+} \hat{\boldsymbol{\tau}}_k, \tag{3} $$

where $(\cdot)^{+}$ is the Moore-Penrose pseudo-inverse, and the direction of the arriving sound wave is given as the opposite direction of the slowness vector, $\hat{\mathbf{n}}_k = -\hat{\mathbf{m}}_k / \|\hat{\mathbf{m}}_k\|$. The distance to the plane wave at discrete time step $k$ is given directly by $d_k = c k \Delta t$, where $c$ denotes the speed of sound. These four parameters, the normal vector and the distance of each plane wave, can be translated to azimuth and elevation angles $[\hat{\theta}, \hat{\phi}]$ with standard coordinate transformations. The distance can be omitted from the presentation since it is directly described by the time moment as shown above.

Although the sound wave propagation in real measurements is spherical, it can be treated as planar because the distances between the sensors are small. The localization result is therefore a set of plane waves instead of a set of image-sources. The two can be treated in a similar manner in the analysis due to the far-field assumption.

The SDM analysis above first assigns a location to each time step in the spatial room impulse response. Second, each location is given a pressure value from an omnidirectional impulse response $h_p$, which is recorded by one of the microphones of the applied array. Ideally, the pressure value is obtained from an omnidirectional microphone located in the geometric center of the microphone array. If the pressure microphone is not in the geometric center of the array, the value of the pressure signal has to be predicted according to the locations of the estimated plane waves. However, in this paper the dimensions of the array are small compared to the dimensions of the enclosure, and predicting the pressure value would require some computational effort due to Fourier interpolation of the signal. We therefore directly use the signal of the topmost microphone (see Fig. 1) as the pressure signal, i.e., $h_p(\Delta t) = h_5(\Delta t)$. In the lateral plane, the microphone selected for pressure is in the center of the array.
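The direction estimation of Eqs. (2) and (3) can be illustrated with a minimal sketch. It is not the authors' implementation: the speed of sound (343 m/s), the sampling rate, the 128-sample analysis window, the synthetic Gaussian test pulse, and the microphone ordering are all assumptions made for the example, and a three-point fit on the logarithm of the correlation peak stands in for the exponential fit of [9].

```python
import itertools
import numpy as np

def gcc_tdoa(h_i, h_j, fs):
    """TDOA in seconds between two windowed responses, Eq. (2), direct weighting."""
    nfft = 2 * max(len(h_i), len(h_j))                # zero-pad so circular lags do not wrap
    cross = np.fft.rfft(h_i, nfft) * np.conj(np.fft.rfft(h_j, nfft))
    cc = np.fft.fftshift(np.fft.irfft(cross, nfft))   # lag 0 sits at index nfft // 2
    k = int(np.argmax(cc))
    delta = 0.0
    # Subsample refinement: parabola through the log of the three peak samples
    # (an assumed stand-in for the exponential fit cited in the text).
    if 0 < k < nfft - 1 and min(cc[k - 1], cc[k], cc[k + 1]) > 0:
        la, lb, lc = np.log(cc[k - 1 : k + 2])
        denom = la - 2.0 * lb + lc
        if denom != 0.0:
            delta = 0.5 * (la - lc) / denom
    return (k - nfft // 2 + delta) / fs

fs = 48000.0    # assumed sampling rate [Hz]
c = 343.0       # assumed speed of sound [m/s]
# Six microphones, two per Cartesian axis, 10 cm apart on each axis.
mics = 0.05 * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                        [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)

# Synthetic plane wave arriving from the unit direction n_true.
n_true = np.array([0.6, 0.64, 0.48])
m_true = -n_true / c                                  # slowness vector points along propagation
samples = np.arange(128)
arrival = 64.0 + fs * (mics @ m_true)                 # pulse arrival per microphone, in samples
windows = np.exp(-0.5 * ((samples[None, :] - arrival[:, None]) / 3.0) ** 2)

pairs = list(itertools.combinations(range(len(mics)), 2))   # (1,2), (1,3), ..., (N-1,N)
tau_hat = np.array([gcc_tdoa(windows[i], windows[j], fs) for i, j in pairs])
V = np.array([mics[i] - mics[j] for i, j in pairs])         # position difference vectors
m_hat = np.linalg.pinv(V) @ tau_hat                          # Eq. (3)
n_hat = -m_hat / np.linalg.norm(m_hat)                       # direction of arrival
```

With this sign convention, the correlation peak lies at the arrival time at microphone i minus that at microphone j, which equals the slowness vector projected on r_i - r_j, so the pseudo-inverse of V recovers the slowness vector directly.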
Then, each spatial impulse response is represented by three values $[h_p(\Delta t k), \hat{\theta}(\Delta t k), \hat{\phi}(\Delta t k)]$ at each time moment $\Delta t k$ (see Eq. (4)). As a result, the output of SDM is a monaural impulse response in which every single sample now also has a location in 3D space. Therefore, the samples of this impulse response can be conceptually represented as a set of image-sources (or plane waves), i.e., image-source locations with the corresponding pressure values:

$$ h'_l(t \mid \hat{\theta}_l(t), \hat{\phi}_l(t)) = [h_l(t), \hat{\theta}_l(t), \hat{\phi}_l(t)] = \mathrm{SDM}\{\mathbf{h}_l(t)\}, \tag{4} $$

where $\hat{\theta}$ and $\hat{\phi}$ denote the azimuth and elevation angle estimates, respectively, and $\mathbf{h}_l(t)$ is a set of impulse responses for loudspeaker channel $l$, i.e., a spatial impulse response, measured with a microphone array.
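The structure of the SDM output in Eq. (4) can be sketched as follows. The axis convention (x to the front, y to the left, z up) and the function name `normals_to_angles` are assumptions for illustration only; the text specifies just that standard coordinate transformations are used.

```python
import numpy as np

def normals_to_angles(n_hat):
    """Convert direction vectors of shape (K, 3) to azimuth and elevation in radians."""
    n_hat = np.asarray(n_hat, dtype=float)
    norm = np.linalg.norm(n_hat, axis=1)
    azimuth = np.arctan2(n_hat[:, 1], n_hat[:, 0])                  # angle in the lateral plane
    elevation = np.arcsin(np.clip(n_hat[:, 2] / norm, -1.0, 1.0))   # angle above that plane
    return azimuth, elevation

# Three example samples: pressure values with directions front, left, and up.
h_p = np.array([1.0, -0.5, 0.25])
normals = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
theta_hat, phi_hat = normals_to_angles(normals)

# One row per time sample: [pressure, azimuth, elevation], as in Eq. (4).
sdm_output = np.column_stack([h_p, theta_hat, phi_hat])
```

Each row pairs one sample of the monaural pressure response with its estimated direction, so the array can be read either as an impulse response or as a list of plane waves.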

TABLE 1: Physical and acoustical parameters for the measured halls. The acoustic parameters are averaged over all measured source positions, 5 receiving positions, and the 500 Hz and 1000 Hz octave bands. Measurements for Helsinki Music Centre were conducted in Dec. 2011 and for the other halls in Nov. 2012.

Hall                            Abbr.  Shape           V [m³]    N      G [dB]  EDT [s]
Amsterdam Concertgebouw         AC     Shoebox         18 780    2 040  2.8     2.4
Wuppertal Stadthalle            WS     Shoebox         17 000*   1 500  3.6     2.6
Vienna Musikverein              VM     Shoebox         15 000    1 680  4.1     3.1
Berlin Konzerthaus              BK     Shoebox         15 000    1 580  2.7     2.1
Munich Herkulessaal             MH     Shoebox         13 590    1 300  2.9     2.1
Brussels Palais des beaux-arts  BB     Curved shoebox  12 520    2 150  3.6     1.6
Stuttgart Beethovenhalle        SB     Fan             16 000    2 000  1.8     2.0
Cologne Philharmonie            CP     Fan             19 000*   2 000  1.9     1.6
Munich Gasteig                  MG     Fan             29 700    2 400  1.2     2.1
Berlin Philharmonie             BP     Vineyard        21 000    2 220  2.1     1.9
Helsinki Music Centre           MT     Vineyard        25 000    1 700  2.2     2.0

V: volume, N: number of seats, G: Strength, EDT: early decay time, *: estimated.

Directional energy distribution

A directional energy distribution is calculated from the SDM output for the time integral from $t_0$ to $\tau + t_0$:

$$ G_{\mathrm{SDM}}^{\tau}(\theta) = 10 \log_{10} \left( \frac{1}{L} \sum_{l=1}^{L} \int_{t=t_0}^{\tau + t_0} \left[ h'_l(t \mid \hat{\theta}_l(t) = \theta) \right]^2 dt \right) - G_{\mathrm{ref}}, \tag{5} $$

for $\theta \in [-\pi, \pi)$, where $l = 1, \ldots, L$ indexes the loudspeaker channels. In our case the number of loudspeaker channels is $L = 25$, and the total number of loudspeakers is 34 (see Fig. 2), since some of the loudspeakers are connected to the same signal channel [4]. The above equation yields the spatial development of the arriving energy over azimuth angles, marginalizing over elevation angles. The corresponding analysis for the median plane is calculated by first making the appropriate coordinate transform in the Cartesian coordinate domain. $G_{\mathrm{ref}}$ is the normalization term for a free-field response at 10 m distance, averaged over the different source channels, and $t_0$ denotes the time instant of the earliest direct sound from all sources. As a directional energy distribution in the time domain, this method does not take phase-related phenomena into account. The time dependency of the directional energy distribution allows the early and the late parts of the sound field to be investigated separately. In this paper, the directional energy distribution is investigated at 5 ms, 30 ms, 50 ms, 80 ms, 100 ms, and 2.5 s.
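A discrete-time version of Eq. (5) can be sketched as a weighted histogram over azimuth. The 1 degree bin width, the function name `directional_energy_db`, and passing the normalization term as a precomputed value in decibels are assumptions made for this example, not details given in the text.

```python
import numpy as np

def directional_energy_db(h, azimuth, fs, t0, tau, g_ref_db=0.0, n_bins=360):
    """Directional energy in dB per azimuth bin, following Eq. (5).

    h, azimuth: arrays of shape (L, K) holding the SDM pressure samples and their
    azimuth estimates in radians for L loudspeaker channels.
    """
    h = np.asarray(h, dtype=float)
    azimuth = (np.asarray(azimuth, dtype=float) + np.pi) % (2.0 * np.pi) - np.pi
    n_channels, n_samples = h.shape
    k0 = int(round(t0 * fs))
    k1 = min(n_samples, int(round((t0 + tau) * fs)))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    energy = np.zeros(n_bins)
    for l in range(n_channels):
        # Sum the squared pressure of all samples whose direction falls in each bin.
        e, _ = np.histogram(azimuth[l, k0:k1], bins=edges, weights=h[l, k0:k1] ** 2)
        energy += e
    energy /= n_channels * fs                 # channel average and dt = 1 / fs
    with np.errstate(divide="ignore"):        # empty bins map to -inf dB
        level_db = 10.0 * np.log10(energy) - g_ref_db
    return level_db, edges

# Toy input: two channels, unit pressure, every sample arriving from 0.3 rad.
fs = 48000.0
h = np.ones((2, 1000))
az = np.full((2, 1000), 0.3)
level_db, edges = directional_energy_db(h, az, fs, t0=0.0, tau=1000 / fs)
```

Because only the azimuth is binned, samples from all elevations contribute to the same bin, which corresponds to the marginalization over elevation described above.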

EXPERIMENTS

Eleven European concert halls were measured with the loudspeaker orchestra described above. The naming of the halls and some of their features are presented in Table 1. A detailed description of the loudspeaker orchestra can be found, for example, in [4]. Please note that the visualization results from Munich Gasteig are not shown for brevity. The impulse responses were measured with the sine-sweep technique from 25 loudspeaker channels to a microphone array of type G.R.A.S. vector intensity probe VI-50. The array consists of 6 microphones on a regularly spaced open sphere, two on each Cartesian coordinate axis. The spacing between the two microphones on an axis is 10 cm, as shown in Fig. 1.

FIGURE 1: The loudspeaker orchestra (left) and the microphone array (right) applied in the measurements. The scale marker on the array indicates 100 mm.

The SDM analysis was performed on each of the spatial impulse responses individually. For each receiver position, the directional energy distributions are first calculated for each individual loudspeaker and then summed over all the loudspeakers, as in Eq. (5), to obtain the total response of the loudspeaker orchestra. In addition, a 3 degree smoothing window is applied directly to the energy distribution to smooth it.
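The smoothing step can be sketched as a circular moving average. The text gives only the 3 degree width, so the rectangular window shape, the 1 degree bin grid, and applying the window to linear energy before conversion to decibels are assumptions of this example.

```python
import numpy as np

def smooth_circular(energy, width_bins=3):
    """Moving average over a circular (wrap-around) azimuth axis."""
    if width_bins < 1 or width_bins % 2 == 0:
        raise ValueError("width_bins must be a positive odd integer")
    half = width_bins // 2
    kernel = np.ones(width_bins) / width_bins
    # Wrap the ends so that bins near -180 and +180 degrees smooth into each other.
    padded = np.pad(np.asarray(energy, dtype=float), half, mode="wrap")
    return np.convolve(padded, kernel, mode="valid")

# Toy input: all energy in the first of 360 one-degree bins.
energy = np.zeros(360)
energy[0] = 3.0
smoothed = smooth_circular(energy, width_bins=3)
```

The wrap-around padding keeps the total energy unchanged and avoids a discontinuity where the azimuth axis closes on itself.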

RESULTS, ANALYSIS, AND DISCUSSION

The directional energy distributions for all the concert halls in two receiver positions (at distances of 15 m and 23 m from the first row of loudspeakers) are shown in Figs. 2-4. The innermost (gray) and outermost (red) lines correspond to the directional energy distribution at 5 ms and 2.5 s, respectively. The other lines, expanding from the center, correspond to the directional energy distribution at 30 ms (thick black line), 50 ms, 80 ms, and 100 ms, in increasing order.

The first observation from the visualizations is that the early part of the directional energy distribution has a triangular shape in shoebox-shaped concert halls. The shoebox-shaped halls have more energy in the early sound field. In contrast, in vineyard and fan-shaped halls this triangularity is missing, and the directional energy distribution of the early sound is shaped more like an ellipse. These visible differences between the hall types depend clearly on the side wall reflections. Whereas the rectangular shape provides early lateral reflections, the generic receiver positions in vineyard-shaped halls lack the necessary reflecting surfaces.

The difference between the 200 ms and final cumulative energies was found to correlate strongly with the standard EDT measure [7], indicating the amount of reverberance. Here we approximate the spatial reverberation as the area between the outermost black line (100 ms) and the red line (2.5 s). Even though the non-shoebox-shaped halls are missing the early sound energy from the sides, there is still quite a lot of surrounding reverberation, suggesting potential problems in clarity. The late part of the sound field is circular in most of the halls, although in vineyard and fan-shaped halls it is closer to elliptical and the total power is often weaker than in shoebox-shaped halls. Thus, most shoebox-shaped halls also have quite a lot of enveloping reverberation to accompany the strong and wide early sound.

Finally, Fig. 4 shows that in some halls the geometry of the ceiling introduces energy to the early sound field, indicating strong ceiling reflections. These reflections can sometimes cause an image-shift effect. This was also experienced by the authors in situ with the loudspeaker orchestra.

CONCLUSIONS AND FUTURE WORK

A new method to visualize spatial impulse responses measured from concert halls was applied to ten concert halls. The method illustrates the wide-band sound energy distribution in space at different time windows. When mapped over the plan and section drawings of the halls, the visualizations show intuitively which surfaces reflect the energy of the whole orchestra, as the visualizations consist of the average energy of a large number of point sources on stage. This paper shows the sound energy distribution in only two positions in each hall. More detailed analysis and visualizations at other receiver positions are left as future work. In addition, future work includes the mapping of these visualizations to perceived differences between the measured concert halls. Such work requires quite extensive listening tests and possibly visualizations at different frequency bands. However, the presented approach allows the links between architecture and acoustics to be understood much more intuitively than with the ISO 3382-1:2009 parameters.

ACKNOWLEDGMENTS

The research leading to these results has received funding from the Academy of Finland, project no. [257099] and the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no. [203636].

REFERENCES

[1] J. Bradley, “Review of objective room acoustics measures and future needs”, Applied Acoustics 72, 713–720 (2011).
[2] L. Kirkegaard and T. Gulsrud, “In search of a new paradigm: How do our parameters and measurement techniques constrain approaches to concert hall design?”, Acoustics Today 7, 7–14 (2011).
[3] T. Lokki, J. Pätynen, A. Kuusinen, and S. Tervo, “Disentangling preference ratings of concert hall acoustics using subjective sensory profiles”, Journal of the Acoustical Society of America 132 (2012).
[4] J. Pätynen, “A virtual loudspeaker orchestra for studies on concert hall acoustics”, PhD dissertation, Aalto University School of Science, Espoo, Finland (2011).
[5] T. Lokki, J. Pätynen, A. Kuusinen, H. Vertanen, and S. Tervo, “Concert hall acoustics assessment with individually elicited attributes”, Journal of the Acoustical Society of America 130, 835–849 (2011).
[6] S. Tervo, J. Pätynen, A. Kuusinen, and T. Lokki, “Spatial decomposition method for room impulse responses”, J. Audio Eng. Soc. 1–14 (2013), in press.
[7] J. Pätynen, S. Tervo, A. Kuusinen, and T. Lokki, “Analysis of concert hall acoustics via visualizations of time-frequency and spatiotemporal responses”, J. Acoust. Soc. Am. 12 (2013), in press.
[8] C. Knapp and G. Carter, “The generalized correlation method for estimation of time delay”, IEEE Trans. Acoust., Speech and Signal Proc. 24, 320–327 (1976).
[9] L. Zhang and X. Wu, “On cross correlation based-discrete time delay estimation”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 4, 981–984 (2005).
[10] T. Pirinen, “Confidence scoring of time delay based direction of arrival estimates and a generalization to difference quantities”, Ph.D. thesis, Tampere University of Technology (2009), publication 854.

FIGURE 2: Directional energy distributions of shoebox-shaped concert halls in positions R3 (left) and R5 (right) in the lateral plane. Radial axis ticks: −6 dB, 0 dB, and 6 dB.

FIGURE 3: Directional energy distributions of vineyard and fan-shaped concert halls in positions R3 (left) and R5 (right) in the lateral plane. Radial axis ticks: −6 dB, 0 dB, and 6 dB.

FIGURE 4: Directional energy distributions of selected concert halls in positions R3 (left) and R5 (right) in the median plane. Radial axis ticks: −6 dB, 0 dB, and 6 dB.