successive deletion of main sources in acoustic ... - Acoustic Camera

SUCCESSIVE DELETION OF MAIN SOURCES IN ACOUSTIC MAPS WORKING IN THE TIME DOMAIN Dirk Döbler; Ralf Schröder1; Gunnar Heilmann2 1

Society for the promotion of applied computer science (GFaI e.V.) {[email protected]; [email protected]} 2 gfai tech GmbH {[email protected]}

Abstract Beamforming for the computation of acoustic mappings is a well established, fast and practical method in the field of acoustic source localization. A disadvantage of the method is the still relatively low image contrast due to sidelobes in the map caused by the respective array pattern and signal frequencies. Thus, weaker sources may be masked by the stronger ones. Optimizing the array geometry can only yield a limited amount of contrast improvement, and simply increasing the microphone channel number is also limited by cost and handling factors. Therefore, algorithmic contrast improvements in acoustic maps are an ongoing topic of intense research. Many of the algorithms developed for this purpose are very expensive computationally. The paper presents a new and very fast time domain method which successively deletes the dominant sources in the beamforming map and hence uncovers weaker sound sources. The contrast improvements are partially very dramatic and they offer the advantage of an interactive analysis of the acoustic emissions nearly in real time. The properties of the algorithm are discussed, and simulations and application examples are shown. Keywords: acoustic beamforming, contrast improvement, time domain, acoustic source eraser, side lobe suppression

1 Introduction The low contrast of standard beamforming [1] is a limiting factor for the application of the technique. If the differences between the sound levels of sources in an acoustic scene are greater than the maximum contrast of the used microphone arrays (the level distance between main lobe and sidelobes), weaker sources are masked by the sidelobes of the stronger sources. A typical example is illustrated in Figure 1. It shows an acoustic photo of

1

INTERNOISE 2010 │ JUNE 13-16 │ LISBON │ PORTUGAL

five uncorrelated noise sources (frequency domain 100 Hz - 20 kHz) with the levels s1 (90 dB), s2 (80 dB), s3 (70 dB), s4 (60 dB), and s5 (50 dB). The maximum contrast of the array (ring array, 48 channels, sampling rate 48 kHz) for this broadband signal is about 12 dB. Therefore the source s2 (80 dB) is still visible in the mapping, whereas all the other point sources are masked. A separation in the frequency domain is not possible, due to the broadband emission of the sources. For that reason, in [2] and [3] the main sources are eliminated by correlative methods and the contrast of the mapping is improved.

Figure 1 – Acoustic photo of five noise sources (90 dB, 80 dB, 70 dB, 60 dB, 50 dB). This article points out that all hidden sources from Figure 1 can be uncovered by means of a purely subtractive decomposition of the signal.

2 Time domain signal decomposition To start the iterative process, at first only the signal of the strongest source s1 is reconstructed according to equation (1) using standard delay-and-sum beamforming:

1 fˆ ( s1 , t ) = M

M

∑ w f (t − ∆ ) i

i

(1)

i

i =1

This is the estimated time function of the source s1 at the spatial location s1. In equation (1), M denotes the number of microphones in the array, the wi are optional weighting factors, the fi are the time functions of the individual microphone channels, and ∆i is the relative time delay from the focused source location (s1 in our case) to microphone number i. f ( s1, t )

1 M

− ∆i

fˆ ( s1, t )

+

t Figure 2 – Signal reconstruction of source s1 by delay-and-sum beamforming.

2


The process of the estimation of the time function of source s1 by the beamformer is shown in Figure 2. In the ideal case (exact focus point, no noise, no other sources or distortions) this time function would be a perfect estimation of the true source’s time function at location s1. In a second step, this time function is translated now by +∆i and it is subtracted from all microphone channels, as is shown in Figure 3.

fˆ ( s1, t + ∆ i )

−

=

Figure 3 – The reconstructed signal of source s1 is subtracted from all microphones. Under ideal conditions, now the source s1 would be completely removed from the original time signal. From this modified original signal a new acoustic photo is calculated, which does not contain the source s1 and all its related sidelobes. The sidelobes of all the frequency components are automatically taken into account simultaneously because the procedure is working in the time domain. The result is shown in Figure 4. By this process, weaker sources which have been previously masked become visible, hence Figure 4 now already shows sources s2 and s3. By recursively proceeding the proposed method (up to an appropriate recursion depth as stop criterion), all 5 sources can be detected successively, as is demonstrated in Figure 5 to Figure 7.

Figure 4 – The source s1 is deleted.

Figure 5 – Source s2 is deleted as well.

Figure 6 – Sources s1, s2, s3 are deleted.

Figure 7 – Source s5 is unmasked.

3


Of course, in reality the reconstructed time function for a source generated by the beamformer is not completely identical to the original time function of that source. An acoustic photo has a discrete resolution, so that the point for the maximum of a source can not be determined with the required accuracy. Furthermore, the reconstructed time function does also contain signal parts from other sources in the photo. So, in practice, after the subtraction is applied, always some small signal artefacts of the deleted sources will remain in the time functions of the microphone channels. With increasing level reduction during the progress of the method, this results in a reappearance of already deleted sources. But because those signals stem from the residue of exactly the same sources that have already been deleted, they will also reappear at the same locations again, and no additional ghost sources in wrong locations will be created. With the repeated application of the method on these artefact sources, all the reappearing signal parts of the stronger sources can be removed almost completely. Finally, by merging all partial mappings, all sources could be depicted in one mapping.

3 Application example As an application example, in Figure 8 a typical acoustic photo of a money sorting machine is shown. Those automatons are used to sort out foreign currencies, false coins etc. out of a stream of coins at high speed. The acoustic photo has been taken with an older matrix array not optimized for acoustic beamforming; while this array exhibits a slightly better amplitude contrast than a ring array at the same time it has an aliasing pattern which is much worse especially at the higher frequencies. In Figure 8, the image contrast has been increased to 8 dB but still only the loudest source at the coin baffle plate in front of the money outlet is clearly visible. Further increasing the contrast does not show up new sources.

Figure 8 – Acoustic photo of a money sorter, only the loudest source is visible. After deletion of the predominant sound source by the subtractive beamforming method described above, quieter sources will become visible, as can be seen in Figures 9, 10 and 11. In Figure 9, the next loudest sound source uncovered now is generated by the coins dropping into the money collection bag below the outlet, while two additional weaker sources are apparent in Figure 9. Here it is still difficult to decide if these are other true sources or rather just ghost sources due to array aliases of the newly discovered strongest source in Figure 9.

4


Figure 9 – Money sorter with the loudest source deleted, now the money bag appears.

Figure 10 – After deletion of the money bag two additional sources become visible.

Figure 11 – Money sorter, detecting and sorting out a false coin.

5


In Figure 10, again after deletion of the dominant source of Figure 9 it can be clearly seen that the two new spots are not aliasing figures but that they represent real sound sources. The one in the upper left is created by the coins falling down from the band conveyor to the outlet, while the source in the lower part of the image is a single false coin that has been detected by the machine and sorted out at the moment. This coin source is even more present after deletion of the conveyor belt source, as can be seen in Figure 11. The last iteration step, shown in Figure 12, shows two sources obviously exhibiting a dipole character and not emitting in the array direction. These two sources are caused by vibrating surfaces of the metal coverings of the machine. The other emissions are artifacts due to room resonances, reflexions and other machinery running inside the hall.

Figure 12 – Money sorter, two dipole-like sources are detected.

4 Advantages and limitations The advantages in comparison to correlative techniques in the frequency domain mainly result from the simplicity of the proposed new time domain method: •

• • •

It is very fast. If the resolution of the acoustic photo is not too high (up to 1500 points), it is possible to 'erase' sources in real-time. This allows interactive working with the acoustic material and a better understanding of the source mechanisms. No additional reference channel is required. The signal of a reference microphone can however be used instead of the reconstructed time function. No falsification by windowing or averaging errors, which often result from the required estimation of the covariance matrix in the spectral domain. Up to 50 dB contrast improvement under ideal conditions, but about 30 dB seem to be a realistic value for practical applications.

Disadvantages: • •

Correlated sources (reflections, spatially differentiated sources with common stimulation) can not be deleted. The method can not be applied for main sources outside the image field.

6


5 Conclusions The described new source deletion technique allows for an improvement of the contrast in acoustic mappings by means of a purely subtractive signal decomposition. The method works very fast in the time domain and it avoids the time resolution loss and the windowing and averaging errors that result from the usual cross-spectral matrix calculations in the frequency domain. Unlike with the CLEAN-method known from the frequency domain, an explicit computation of the point spread function for every frequency band is not necessary, because all sidelobe effects are taken into account simultaneously because our method is working in the time domain. Under ideal conditions (anechoic room, point sound sources) contrast improvements up to 50 dB can be achieved. Under more realistic assumptions, contrasts of 20 dB to 30 dB are possible. This is still a dramatic improvement compared to the contrast values given by usual array geometries (about 8 dB to 13 dB). The low resource requirements allow for interactive working on acoustic mappings, just like an 'Acoustic Eraser'. However, the correlated signal parts of a sound source, that are spatially separated in the mapping (reflections, sources with a common stimulation), can not be removed with the proposed technique. Ongoing research work is trying to extend the method to coherent sound sources as well and to automate the iterative process of maximum finding, source selection and source deletion until a plausible stopping criterion is reached.

References [1] Johnson, D.H.; Dudgeon, D.E. Array Signal Processing. Concepts and Techniques. PTR Prentice Hall, 1993. [2] Guidati, S. Offenlegungsschrift DE 102008026657 A1: Verfahren und Vorrichtung zur bildgebenden Darstellung von akustischen Objekten. Deutsches Patent- und Markenamt, München, 2008. [3] Kern, M. Ein Beitrag zur Erweiterung von Beamfoming-Methoden. Dissertation, Technische Universität Berlin, 2008. [4] Dougherty, R.P.: Advanced Time-domain Beamforming Techniques. AIAA-Paper 20042955, Proc. of 10th AIAA/CEAS Aeroacoustics Conference, Manchester, UK, May 2004.

7