
Chemometrics: pushing the limits of advanced spectroscopy

Ludovic Duponchel
Laboratoire de Spectrochimie Infrarouge et Raman, UMR CNRS 8516
Université Lille I, Villeneuve d’Ascq, France
http://lasir.univ-lille1.fr


We need chemometrics tools!

Modern spectroscopic instrumentation produces huge amounts of data.

The data are also increasingly complex (natural or industrial samples, acquisitions combining several spectrometers). Chemometrics tools must be developed to extract as much useful information as possible: chemometrics is a way to push the limits of instrumentation.

Outline

• The potential of Multivariate Curve Resolution methods
  • Probing the hydration shell of molecules with terahertz spectroscopy
  • Extending the potential of time-resolved spectroscopy: analysis of photochromic molecules
  • Exploring single cells with infrared imaging at the SOLEIL French synchrotron facility
• The super-resolution concept: a way to increase the spatial resolution in imaging spectroscopy
  • Raman analysis of environmental samples: the dust case

Multivariate Curve Resolution

Multivariate curve resolution (MCR): one of the most important chemometrics methods of the last 20 years.

Objective: “Recovery of the response profile of pure components in an unresolved and unknown mixture obtained from evolutionary processes with no a priori”.

Response profiles: spectra, pH profiles, elution profiles, …

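Multivariate Curve Resolution: the concept

Example: capillary electrophoresis with diode array detection (CE-DAD). The pure compound spectra and their corresponding concentration profiles are extracted simultaneously, with no prior knowledge:

$$\mathbf{D}_{(t \times \lambda)} = \mathbf{C}_{(t \times n)}\,\mathbf{S}^{T}_{(n \times \lambda)} + \mathbf{E} = \sum_{i=1}^{n} \mathbf{c}_i\,\mathbf{s}_i^{T} + \mathbf{E}$$

[Figure: the experimental data set D (retention times × wavelengths) is decomposed into n pure contributions: pure concentration profiles c1 … cn (columns of C) and pure spectral profiles s1 … sn (rows of Sᵀ), plus a residual E.]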
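To make the bilinear model concrete, the NumPy sketch below simulates a hypothetical CE-DAD-like data set with three co-eluting compounds. The Gaussian elution peaks and absorption bands, their positions and widths, and the function names are assumptions chosen for illustration only. Without noise, the rank of D equals the number of pure contributions n.

```python
import numpy as np

def gaussian(axis, center, width):
    """Gaussian peak/band evaluated on a 1-D axis."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

def simulate_bilinear(n_times=100, n_wavelengths=80, noise=0.0, seed=0):
    """Build D = C @ St + E for three hypothetical co-eluting compounds."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_times, dtype=float)          # retention-time axis
    wl = np.arange(n_wavelengths, dtype=float)   # wavelength axis (channel index)
    # Pure concentration profiles (columns of C): overlapping elution peaks
    C = np.column_stack([gaussian(t, c, 8.0) for c in (30.0, 45.0, 60.0)])
    # Pure spectra (rows of St): overlapping absorption bands
    St = np.vstack([gaussian(wl, c, w) for c, w in ((20.0, 10.0), (40.0, 12.0), (55.0, 9.0))])
    E = noise * rng.standard_normal((n_times, n_wavelengths))
    return C @ St + E, C, St

D, C, St = simulate_bilinear()
print(D.shape, np.linalg.matrix_rank(D))  # (100, 80) 3
```

Each row of D (one retention time) is a mixture spectrum; only the full matrix reveals that three pure contributions are present.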

Case #1: Probing the hydration shell of molecules

Understanding hydration mechanisms is crucial.

Example: the dynamics of solvated proteins is strongly correlated with the dynamics of the solvent.

The behavior of water molecules in the vicinity of the solvated molecule is modified. A classical hydration model:

[Figure: a solvated molecule surrounded by « bound water » (formation of the hydration shell), itself surrounded by « bulk water ». B. Born, M. Havenith (2009) J Infrared Milli Terahz Waves 30 1245–1254]

Observation: there are many studies on this topic, but many questions remain (extent of the hydration shell, time scales of the hydration process, …).
Main idea: by directly probing the hydrogen-bond network of water, terahertz spectroscopy (100 GHz to 10 THz) can provide additional knowledge on the structure of the hydration shell.
Objective: develop a global methodology (spectroscopic acquisition and chemometrics) to probe the hydration shell of molecules.

Case #1: The terahertz spectral domain

The terahertz (THz) spectral domain is a good way to probe low-energy bonds.

Problem: water absorbs very strongly, so THz measurements on liquid samples are not easy. Transmission measurements require either powerful sources (which raises temperature-regulation problems) or a reduced sample volume.
Our solution: an original microfluidic THz sensor to characterize the hydration of molecules on a chip.

Case #1: Spectral acquisition in a microfluidic system

A vector network analyzer (VNA) measures the amplitude and phase of both the reflected and transmitted signals (S parameters). The pseudo-absorbance (pa) is then calculated over the whole spectral domain.

[Figure: microfluidic circuit (6 mm × 1.6 mm) with injection and analysis areas; a Goubau line (wave guide) crosses the 50 µm microchannel and is connected to the vector network analyzer for the pseudo-absorbance measurement.]

Case #1: Spectral data observation

Feasibility study of the concept: analysis of the ethanol/water system. A Savitzky-Golay first derivative is applied to the water/ethanol spectral data set.

[Figure: first derivative versus frequency (10 to 90 GHz) for 18 water/ethanol mixtures with ethanol volume fractions of 0, 5, 10, 15, 20, 25, 30, 35, 45, 55, 60, 65, 70, 80, 85, 90, 95 and 100 %.]

The variance is concentrated in particular frequency zones.
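The Savitzky-Golay first derivative fits a low-order polynomial by least squares in a sliding window and takes the derivative of that polynomial at the window centre. Below is a minimal NumPy sketch of that idea; the window half-width and polynomial order are assumptions, since the slides do not give them. In practice, `scipy.signal.savgol_filter(spectra, window_length, polyorder, deriv=1, delta=step, axis=1)` provides the same filter with edge handling.

```python
import numpy as np
from math import factorial

def savgol_coefficients(half_window, polyorder, deriv=1):
    """Convolution weights of a Savitzky-Golay filter (least-squares polynomial fit)."""
    k = np.arange(-half_window, half_window + 1, dtype=float)
    # Vandermonde matrix: column j holds k**j
    A = np.vander(k, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse gives the fitted coefficient a_deriv;
    # the deriv-th derivative at the window centre is deriv! * a_deriv.
    return factorial(deriv) * np.linalg.pinv(A)[deriv]

def savgol_derivative(spectra, half_window=5, polyorder=2, deriv=1, delta=1.0):
    """Filter each row (one spectrum per row); edge points are left as NaN."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    w = savgol_coefficients(half_window, polyorder, deriv)
    out = np.full_like(spectra, np.nan)
    for i, row in enumerate(spectra):
        # np.convolve flips its kernel, so pass reversed weights to correlate
        out[i, half_window:-half_window] = np.convolve(row, w[::-1], mode="valid")
    return out / delta ** deriv   # derivative per frequency unit, not per channel

# Usage on a hypothetical 10 to 90 GHz axis sampled every 0.5 GHz
freq = np.linspace(10.0, 90.0, 161)
d = savgol_derivative(freq ** 2, half_window=5, polyorder=2, delta=0.5)
```

For a polynomial of degree at most `polyorder`, the filtered derivative is exact away from the edges (here, 2·freq for freq²).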

Case #1: Univariate spectral data observation

[Figure: first derivative versus alcohol volume ratio (0 to 100 %) at three single frequencies: 42.5 GHz, 56.25 GHz and 79.5 GHz.]

Particular trends are observed at different frequencies, with various shapes of decrease and increase. These trends are not correlated with one another, so different spectral contributions are present. However, univariate observations cannot tell us the real number of contributions present in the chemical system, and there is no guarantee that a given frequency is specific to a single compound (the response is often the sum of several contributions).
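A small NumPy sketch can illustrate the point: a single-frequency trend is a weighted sum of all contributions, whereas the singular values of the full matrix D reveal how many there are. The three concentration profiles, the derivative-like spectra, the noise level and the relative threshold in `estimate_rank` are hypothetical choices; thresholding singular values is only a simple heuristic for rank estimation.

```python
import numpy as np

def estimate_rank(D, rel_tol=1e-3):
    """Count singular values above rel_tol * largest (simple heuristic)."""
    s = np.linalg.svd(D, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0])), s

rng = np.random.default_rng(1)
ratio = np.linspace(0.0, 1.0, 18)       # hypothetical ethanol fraction of each mixture
freq = np.linspace(10.0, 90.0, 161)     # GHz
# Three hypothetical contributions: concentration profiles (columns of C) ...
C = np.column_stack([1 - ratio, ratio, 4 * ratio * (1 - ratio)])
# ... and overlapping, derivative-like pure spectra (rows of St)
St = np.vstack([np.sin(freq / 9.0), np.cos(freq / 13.0), np.sin(freq / 5.0 + 1.0)])
D = C @ St + 1e-6 * rng.standard_normal((ratio.size, freq.size))

channel = int(np.argmin(np.abs(freq - 42.5)))
univariate_trend = D[:, channel]   # one curve mixing all three contributions
rank, s = estimate_rank(D)
print(rank)  # 3
```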

Case #1: Towards a multivariate spectral data analysis

Univariate spectral data analysis gives only a partial, and very often biased, view of the chemical system.

Our goal: use chemometrics methods to retrieve information about the hydration shell in the ethanol/water system, using the whole spectral domain with no a priori knowledge.

A well-suited analysis method: multivariate curve resolution based on alternating least squares (MCR-ALS).

Case #1: The multivariate curve resolution method

The MCR-ALS method:
• is a model-free approach;
• is based on factor analysis;
• unravels the pure contributions of all species in the spectroscopic data set D (concentration profiles C and corresponding pure spectra S) without requiring chemical information about the underlying physicochemical system.

It performs a bilinear decomposition of the total instrumental response of a system, D (n × f), into the product of two simpler matrices, C (n × p) and Sᵀ (p × f):

$$\mathbf{D}_{(n \times f)} = \mathbf{C}_{(n \times p)}\,\mathbf{S}^{T}_{(p \times f)} + \mathbf{E}_{(n \times f)}$$

where:
• D (n × f): spectral data matrix; n = number of ethanol/water mixtures, f = number of frequencies in the spectral domain;
• C (n × p): matrix of pure concentration profiles; p = mathematical rank of D = number of pure contributions in the chemical system;
• Sᵀ (p × f): matrix of pure spectral profiles;
• E (n × f): residual matrix (unmodelled variations).
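The alternating least squares loop can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used for these results: non-negativity is imposed on C only (Sᵀ is left unconstrained because derivative spectra take negative values), and the initial spectra come from a simple successive-projection "purest row" heuristic. The function names, tolerances and the synthetic test data are hypothetical; full MCR-ALS implementations offer more constraints (closure, unimodality, …).

```python
import numpy as np

def purest_rows(D, n_components):
    """Initial spectra: successively pick the row least explained by rows already chosen."""
    residual = D.astype(float).copy()
    picked = []
    for _ in range(n_components):
        idx = int(np.argmax(np.linalg.norm(residual, axis=1)))
        picked.append(idx)
        u = residual[idx] / np.linalg.norm(residual[idx])
        residual -= np.outer(residual @ u, u)   # remove that direction from every row
    return D[picked].astype(float), picked

def mcr_als(D, St_init, n_iter=500, tol=1e-10):
    """D ~ C @ St with C >= 0; returns C, St and the lack of fit (%)."""
    St = St_init.copy()
    lof_prev = np.inf
    for _ in range(n_iter):
        # C-step: least squares on D.T = St.T @ C.T, then clip to non-negativity
        C = np.clip(np.linalg.lstsq(St.T, D.T, rcond=None)[0].T, 0.0, None)
        # S-step: unconstrained least squares D = C @ St
        St = np.linalg.lstsq(C, D, rcond=None)[0]
        E = D - C @ St
        lof = 100.0 * np.sqrt(np.sum(E ** 2) / np.sum(D ** 2))
        if abs(lof_prev - lof) < tol:
            break
        lof_prev = lof
    return C, St, lof

# Usage on hypothetical data: three separated profiles, derivative-like spectra
t = np.arange(100.0)
f = np.linspace(10.0, 90.0, 161)
C_true = np.column_stack([np.exp(-0.5 * ((t - c) / 6.0) ** 2) for c in (20, 50, 80)])
St_true = np.vstack([np.sin(f / 9.0), np.cos(f / 13.0), np.sin(f / 5.0 + 1.0)])
D = C_true @ St_true
St0, rows = purest_rows(D, 3)
C, St, lof = mcr_als(D, St0)
```

The lack of fit (percentage of D not explained by C·Sᵀ) is the usual convergence criterion; like any MCR result, C and Sᵀ are only defined up to the rotational and scaling ambiguities the constraints leave open.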

Case #1: Multivariate Curve Resolution results for the water/ethanol system

[Figure: D = C·Sᵀ + E. Panels: data D (first derivative (a.u.) versus frequency, 10 to 90 GHz); C (relative contribution versus alcohol volume ratio, 0 to 100 %); Sᵀ (first derivative versus frequency, 10 to 90 GHz); residuals E (scale ×10⁻⁴, versus frequency).]

Two of the resolved species represent the “non-bulk” water, i.e. the hydration shell (light and dark blue in the figure).

The hydration shell is more complex than the usual, well-established model.

Case #2: Extending the potential of time-resolved spectroscopy

Time-resolved spectroscopy: “The recording of spectra at a series of time intervals after the excitation of the system with a light pulse (or other perturbation) of appropriately short duration.” (1996, 68, 2280; IUPAC Compendium of Chemical Terminology, 2nd Edition (1997))

Case #2: Time resolution, a few examples

Modern spectroscopic techniques (ultrafast lasers) provide information about the nature of excited states, energy transfer, molecular dynamics, the molecular environment, …
• Time-resolved infrared spectroscopy: resolution 10⁻⁹ s
• Time-resolved pump-probe absorption spectroscopy: resolution 10⁻¹² s
• Time-resolved Raman spectroscopy: resolution 10⁻¹⁵ s

Case #2: Is high time resolution enough?

Faster instruments make it possible to follow very fast reactions and to observe intermediates or transient species.

[Figure: spectral data matrix D (time × spectral range), from spectrum #1 (initial product) to spectrum #N (final product).]

Time resolution is not species resolution! Even with a high time resolution, the recorded spectra often represent mixtures rather than pure species.

Case #2: Time-resolved spectroscopy is not enough

Multivariate curve resolution is a good way to extract the kinetic profiles and spectral signatures of pure species from time-resolved experiments.

[Figure: spectral data matrix D (time × spectral range, spectrum #1 to #N) = kinetic profiles C (concentration versus time) · pure species spectra Sᵀ (spectral range).]

Chemometrics is very often the only way to obtain the spectra of intermediates or transient species, because their very short lifetimes make them unstable and impossible to isolate.

Case #2: Extending the potential of time-resolved spectroscopy

One of the most studied families of photochromic molecules: spirooxazines.
Photochromism: reversible transformation of a chemical compound between two forms by the absorption of radiation, where the two forms have different absorption spectra.

[Figure: photochromic conversion of a spirooxazine from its colorless form to its blue form.]

The transformation occurs in 15 picoseconds! Objective: understand the reaction pathway (a good way to optimize the photochromic effect of new compounds).

Picosecond pump-probe time-resolved absorption spectroscopy in my laboratory

[Figure: experimental setup. Ti:Sa laser (emission ~750 nm, 80 fs pulses); second harmonic generator (emission ~380 nm): pump, activating radiation hν1; continuum spectra generator (emission 400 to 750 nm): probe; spirooxazine sample; monochromator, CCD and acquisition board.]

Case #2: Spectroscopic data, a first insight

[Figure: absorbance versus wavelength (400 to 750 nm) and time (ps), indexed by spectrum number.]

Experimental data: spectral range 400 to 750 nm; time range 0.8 to 100 ps; data set of 46 spectra. Delay times (ps) by spectrum number:
• #1 to #23: 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0
• #24 to #46: 4.3, 4.6, 5.0, 5.6, 6.2, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100

General observations: a rapid signal decay between 0.8 and 3.4 ps (i.e. spectra #1 to #20) in the 400 to 550 nm spectral range; from 3.4 ps onwards, a new spectral contribution emerges in the 600 to 750 nm spectral range.

Case #2: Species resolution by MCR-ALS

[Figure: MCR-ALS with a chemical rank of 4 resolves the data into concentration profiles versus time (ps) and pure absorbance spectra versus wavelength (400 to 750 nm), leading to a proposed photochromic pathway.]

Case #3: Multivariate Curve Resolution in imaging spectroscopy

Imaging spectroscopy is a powerful tool for characterizing the molecular distribution of different chemical compounds in heterogeneous materials.

The classical approach: from the experimental data cube, a specific wavelength λi is selected for the compound of interest, which generates a chemical map.

[Figure: data cube sliced at λi to give a chemical map.]

Drawbacks (too often forgotten):
• All compounds present in the analyzed sample must be known. A forgotten compound leads to overestimated concentrations (non-selective spectral ranges give false distribution maps, i.e. a biased view of the sample). For an unknown compound, it is impossible to select a specific spectral range (no image at all).
• It is very difficult, or impossible, to find a specific spectral range for each compound, because of the complexity of the sample (high number of species) and/or the large bandwidths of the spectroscopy considered (e.g. band overlaps in IR spectroscopy).

What can we do with Multivariate Curve Resolution?

MCR methodology for imaging spectroscopy:
• unfolding of the experimental data cube (x × y × λ) into a spectral data matrix D (xy × λ);
• rank evaluation of the D matrix (number of pure compounds nc);
• simultaneous extraction, with no prior knowledge, of C (xy × nc, pure compound concentrations) and Sᵀ (nc × λ, pure spectra) by multivariate curve resolution;
• refolding of each column of C into the pixel space (x × y) to generate chemical maps.

[Figure: experimental data cube unfolded into D = C·Sᵀ; the first pure spectrum gives the molecular characterization, and refolding the first column of C gives the concentration distribution of the first pure compound (color bar = concentration scale).]

Case #3: Infrared imaging at the SOLEIL French synchrotron facility

Characterizing single cancer cells (HeLa type) with infrared microspectrometry.

[Figure: visible image of the cell with the region of interest (scale bar 20 µm).]

• Region of interest: 50.4 µm × 22.4 µm; step between mapping positions: 0.8 µm (x = 28, y = 63 positions).
• Spectral range: 1000 to 4000 cm⁻¹, with 4 cm⁻¹ spectral resolution.
• 1764 spectra in the spectral data cube (x × y × λ).
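The unfolding, rank evaluation and refolding steps of the imaging methodology map directly onto array reshapes. The NumPy sketch below uses the 28 × 63 grid of the HeLa example, but the 50 spectral channels, the three random concentration maps and spectra, and the function names are hypothetical. The MCR-ALS extraction itself is left out: the true C is used in its place to check that refolding returns the original maps.

```python
import numpy as np

def unfold(cube):
    """(x, y, n_wavelengths) data cube -> D matrix, one pixel spectrum per row."""
    nx, ny, nwl = cube.shape
    return cube.reshape(nx * ny, nwl)

def refold(C, nx, ny):
    """(x*y, nc) concentration matrix -> nc chemical maps of shape (x, y)."""
    return C.T.reshape(C.shape[1], nx, ny)

# Hypothetical cube: 28 x 63 pixels, 50 spectral channels, 3 species
nx, ny, nwl, nc = 28, 63, 50, 3
rng = np.random.default_rng(0)
maps_true = rng.random((nc, nx, ny))          # concentration maps
St_true = rng.random((nc, nwl))               # pure spectra
cube = np.einsum("kxy,kw->xyw", maps_true, St_true)

D = unfold(cube)                              # (1764, 50)
s = np.linalg.svd(D, compute_uv=False)
rank = int(np.sum(s > 1e-8 * s[0]))           # rank evaluation of D
C_true = unfold(np.moveaxis(maps_true, 0, -1))   # (1764, 3), same pixel order as D
maps_back = refold(C_true, nx, ny)            # (3, 28, 63)
print(D.shape, rank)                          # (1764, 50) 3
```

Using the same row-major reshape for unfolding and refolding keeps pixel (ix, iy) at row ix·63 + iy in both D and C, which is what makes the refolded maps line up with the sample.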

Case #3: MCR-ALS analysis

Singular value decomposition indicates three spectral contributions (nc = 3), so that D (x·y = 1764 spectra × λ) = C (1764 × 3) · Sᵀ (3 × λ).

[Figure: the three resolved pure spectra (labelled bands at 1186.7, 1710.1/2710.6 and 3441.8 cm⁻¹; relative intensity and shift of the amide bands) and the corresponding concentration maps.]

Spectroscopic imaging: a great potential, but…

Current trends: a desire to analyze smaller samples (e.g. in nanosciences) or to observe more details on bulk samples.

A general observation: the diffraction limit sets the maximal lateral resolution, i.e. the minimal distance d between two objects for them to be considered resolved (really observed):

$$d = 0.61\,\frac{\lambda}{\mathrm{NA}}$$

with λ the radiation wavelength and NA the numerical aperture of the optical system.

A limited spatial resolution, but… As a rough approximation:
• Raman (using a visible laser): spatial resolution ~1 µm;
• mid-infrared: spatial resolution ~10 µm.

Such spatial resolutions are incompatible with the spectroscopic characterization of micron-sized samples.
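The diffraction-limit formula d = 0.61 λ / NA is easy to evaluate. In the sketch below, the wavelengths and numerical apertures are hypothetical example values, not parameters of the instruments used here; mid-infrared positions are given in wavenumbers, so λ (µm) = 10⁴ / ν̃ (cm⁻¹) is used first.

```python
def diffraction_limit_um(wavelength_um, numerical_aperture):
    """Lateral resolution d = 0.61 * lambda / NA (same unit as lambda)."""
    return 0.61 * wavelength_um / numerical_aperture

def wavenumber_to_wavelength_um(wavenumber_cm1):
    """lambda (um) = 1e4 / wavenumber (cm^-1)."""
    return 1.0e4 / wavenumber_cm1

# Hypothetical objective apertures, for illustration only
print(diffraction_limit_um(0.532, 0.9))                                 # green laser: ~0.36 um
print(diffraction_limit_um(wavenumber_to_wavelength_um(1000.0), 0.65))  # 1000 cm^-1: ~9.4 um
```

These are idealized limits; actual instruments often fall short of them, which is consistent with the rough orders of magnitude above (sub-micron to micron for visible Raman, around 10 µm for mid-infrared).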

Solutions to push the diffraction limit?

Instrumental solution: near-field spectroscopy. It has many drawbacks: very sensitive instrumentation, very high cost, difficulty collecting photons, average spectral quality.

Toward super-resolution

Solutions to push the diffraction limit? Chemometrics solution: develop a super-resolution concept applied to images acquired with our classical far-field spectrometers, to go further with our current instrumentation.

Concept: simultaneously exploit several low-resolution images of the same object (observed from “different angles”) to obtain one higher-resolution image.

Condition of application: sub-pixel shifts between the low-resolution images (image shifts smaller than the pixel size of the low-resolution images).

[Figure: low-resolution images #1 and #2 with pixel size x, shifted from each other by less than x.]

An application of super-resolution (SR) in spatial observation: NASA's Viking mission. The mission objectives were to obtain high-resolution images of the Martian surface, characterize the structure and composition of the atmosphere and surface, and search for evidence of life.

[Figure: super-resolution from the simultaneous use of 24 shifted images of Mars acquired by Viking Orbiter 1: 742 m/pixel before, 186 m/pixel after.]

Super-resolution principle

To understand the super-resolution concept, we must link the ideal high-resolution (HR) image and a low-resolution (LR) image: this is the image formation/degradation model. The observation model describes the direct acquisition of an LR image by an imaging degradation system:

ideal object x (HR image) → move (translation, rotation) → blur (optical blur: defocusing, diffraction limit, optical aberrations; point spread function, PSF, since the detector is not a point) → undersampling (factors L1, L2) → + noise ni → yi, the ith observed LR image.
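The degradation chain can be written as a short forward simulation. The NumPy sketch below makes several assumptions not stated in the slides: translations are whole HR pixels (i.e. sub-pixel at LR scale) with zero-filled borders and no rotation, the PSF is a separable Gaussian, detector integration plus undersampling is an L × L block average, and the noise is Gaussian. All names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalized 1-D Gaussian kernel (hypothetical PSF profile)."""
    radius = int(np.ceil(3 * sigma)) if radius is None else radius
    k = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-0.5 * (k / sigma) ** 2)
    return g / g.sum()

def degrade(x, shift, sigma, L, noise_std=0.0, rng=None):
    """One LR observation: move -> blur -> under-sample -> add noise."""
    dy, dx = shift
    # Move: translate by (dy, dx) >= 0 HR pixels, uncovered pixels set to 0
    moved = np.zeros_like(x, dtype=float)
    moved[dy:, dx:] = x[: x.shape[0] - dy, : x.shape[1] - dx]
    # Blur: separable Gaussian PSF applied along rows, then columns
    g = gaussian_kernel(sigma)
    blurred = np.apply_along_axis(np.convolve, 0, moved, g, mode="same")
    blurred = np.apply_along_axis(np.convolve, 1, blurred, g, mode="same")
    # Under-sample: average non-overlapping L x L blocks
    n1, n2 = x.shape[0] // L, x.shape[1] // L
    lr = blurred.reshape(n1, L, n2, L).mean(axis=(1, 3))
    # Noise: additive Gaussian
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        lr = lr + noise_std * rng.standard_normal(lr.shape)
    return lr

lr = degrade(np.ones((16, 16)), shift=(0, 0), sigma=1.0, L=2)
```

On a uniform HR image, interior LR pixels stay at 1 because the PSF is normalized; only pixels near the zero-filled border lose intensity.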

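Super-resolution principle: the observation model with some linear algebra and matrices (just two slides)

Each low-resolution image yi is a shifted, blurred, under-sampled and noisy version of the high-resolution image x. Images are stacked as column vectors:
• LR image (N1 × N2 pixels): $\mathbf{y}_i = [y_{i,1}, y_{i,2}, \ldots, y_{i,M}]^T$, with $M = N_1 N_2$;
• HR image (N1L1 × N2L2 pixels): $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$, with $N = N_1 L_1 \times N_2 L_2$.

$$\mathbf{y}_i = \mathbf{E}_i\,\mathbf{F}_i\,\mathbf{M}_i\,\mathbf{x} + \mathbf{n}_i, \qquad 1 \le i \le p$$

where p low-resolution images are available, and:
• $\mathbf{E}_i$ $(N_1 N_2 \times N_1 L_1 N_2 L_2)$ is the under-sampling matrix;
• $\mathbf{F}_i$ $(N_1 L_1 N_2 L_2 \times N_1 L_1 N_2 L_2)$ is the blur matrix;
• $\mathbf{M}_i$ $(N_1 L_1 N_2 L_2 \times N_1 L_1 N_2 L_2)$ is the shift matrix.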

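Super-resolution principle: generalization to the p available low-resolution images

$$\begin{bmatrix}\mathbf{y}_1\\ \vdots\\ \mathbf{y}_p\end{bmatrix} = \begin{bmatrix}\mathbf{E}_1\mathbf{F}_1\mathbf{M}_1\\ \vdots\\ \mathbf{E}_p\mathbf{F}_p\mathbf{M}_p\end{bmatrix}\mathbf{x} + \begin{bmatrix}\mathbf{n}_1\\ \vdots\\ \mathbf{n}_p\end{bmatrix} \quad\Longleftrightarrow\quad \mathbf{y} = \mathbf{H}\,\mathbf{x} + \mathbf{n}$$

Super-resolution is what we call an « inverse problem » (the reciprocal of the modelling). Objective: retrieve x (the high-resolution image) from y (the low-resolution images) and an estimate of the H matrix.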
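A toy version of the inverse problem y = Hx + n can be solved by plain least squares once H is assembled. In this NumPy sketch, which is an illustration rather than the reconstruction algorithm used in the slides, Fi is taken as the identity (no blur), Mi is a whole-HR-pixel shift with zero-filled borders, Ei is an L × L block average, the data are noise-free, and the 8 × 8 image and all names are hypothetical. With the four offsets of a 2 × 2 sub-pixel grid, H is square and invertible, so x is recovered exactly. Real data with blur and noise make the problem ill-posed and call for regularization (e.g. Tikhonov).

```python
import numpy as np

def shift_zero_fill(img, dy, dx):
    """M_i: translate an HR image by (dy, dx) >= 0 HR pixels; uncovered pixels become 0."""
    out = np.zeros_like(img, dtype=float)
    out[dy:, dx:] = img[: img.shape[0] - dy, : img.shape[1] - dx]
    return out

def downsample(img, L):
    """E_i: average non-overlapping L x L blocks (detector integration + under-sampling)."""
    n1, n2 = img.shape[0] // L, img.shape[1] // L
    return img.reshape(n1, L, n2, L).mean(axis=(1, 3))

def observe(x_img, shifts, L):
    """Stacked forward model y = H x for all shifts (F_i = identity: no blur)."""
    return np.concatenate([downsample(shift_zero_fill(x_img, dy, dx), L).ravel()
                           for dy, dx in shifts])

def build_H(hr_shape, shifts, L):
    """Assemble H column by column by applying the forward model to unit HR images."""
    n = hr_shape[0] * hr_shape[1]
    columns = []
    for j in range(n):
        unit = np.zeros(n)
        unit[j] = 1.0
        columns.append(observe(unit.reshape(hr_shape), shifts, L))
    return np.column_stack(columns)

L = 2                                                       # L1 = L2 = L
shifts = [(dy, dx) for dy in range(L) for dx in range(L)]   # p = 4 sub-pixel offsets
rng = np.random.default_rng(0)
x_true = rng.random((8, 8))                                 # hypothetical HR image
y = observe(x_true, shifts, L)                              # four stacked 4 x 4 LR images
H = build_H(x_true.shape, shifts, L)                        # (64, 64)
x_hat = np.linalg.lstsq(H, y, rcond=None)[0].reshape(x_true.shape)
print(np.allclose(x_hat, x_true))                           # True (noise-free, blur-free)
```

Building H explicitly is only practical for tiny images; larger problems are usually solved iteratively using the forward operator and its adjoint.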

Super-resolution for Raman spectroscopic imaging

Chemical analysis of environmental samples: the dust case.

[Figure: visible image of collected dust, with the region of interest (scale bars 1 µm and 10 µm) containing micron-size particles.]

Raman micro-imaging: the spatial resolution is limited by the laser spot size and the wavelength of the photons.

Region of interest: 6 × 12 µm (pixel size 1 µm²). Conventional Raman mapping experiment with a 1 µm step, giving a spectral data cube of 6 × 12 × λ, i.e. 72 Raman spectra.

[Figure: region of interest and example Raman spectra.]

Generation of chemical maps: MCR-ALS simultaneous extractions detect 5 chemical contributions, assigned to Na2SO4, PbSO4, CaSO4·H2O / NaNO3, CaSO4·H2O / organic (?), and CaSO4·2H2O.

[Figure: the five MCR-ALS chemical maps and the corresponding pure spectra.]

A possible molecular identification, but a “poor” spatial resolution for the characterization of such particles.

Sub-pixel moves between mappings: multiples of 200 nm in the x and y directions.

[Figure: mappings #1, #2, …, #25, each giving a spectral data cube that MCR-ALS resolves into contributions #1 to #5.]

For each chemical contribution, 25 low-resolution images are combined by super-resolution.

[Figure: low-resolution images and the corresponding super-resolution images for each chemical contribution.]

A possible molecular identification and a sub-micron spatial resolution. Experimental over-sampling alone does not change the spatial resolution:
• LabRam HR intrinsic spatial resolution: ~600 nm (green laser);
• super-resolution: ~200 nm!
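With a 1 µm mapping step and moves in multiples of 200 nm, there are 1000 / 200 = 5 sub-positions per axis, hence the 5 × 5 = 25 mappings. The sketch below enumerates those offsets and shows the simplest fusion step, shift-and-add interleaving onto a 200 nm grid, for one 6 × 12 chemical map. This is an assumption for illustration: the slides do not say which reconstruction was used, the offset-to-pixel convention is hypothetical, and interleaving alone still leaves each sample blurred by the laser spot (a full reconstruction solves y = Hx + n).

```python
import itertools
import numpy as np

STEP_NM = 1000              # conventional mapping step (1 um)
MOVE_NM = 200               # sub-pixel move between mappings
L = STEP_NM // MOVE_NM      # 5 sub-positions per axis -> 25 mappings

# Stage offsets of the 25 mappings, in HR-pixel units and in nm
offsets = list(itertools.product(range(L), range(L)))
offsets_nm = [(MOVE_NM * a, MOVE_NM * b) for a, b in offsets]

def interleave(lr_maps, offsets, L):
    """Shift-and-add: the map acquired with offset (a, b) fills HR pixels (i*L + a, j*L + b)."""
    n1, n2 = lr_maps[0].shape
    hr = np.full((n1 * L, n2 * L), np.nan)
    for img, (a, b) in zip(lr_maps, offsets):
        hr[a::L, b::L] = img
    return hr

# Hypothetical 6 x 12 maps, each filled with a value encoding its offset
lr_maps = [np.full((6, 12), 10.0 * a + b) for a, b in offsets]
hr = interleave(lr_maps, offsets, L)   # (30, 60) grid with 200 nm pixels
```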

Conclusions