IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 13, NO. 1, JANUARY 2009


Autoregressive-Model-Based Missing Value Estimation for DNA Microarray Time Series Data

Miew Keen Choong, Member, IEEE, Maurice Charbit, Member, IEEE, and Hong Yan, Fellow, IEEE

Abstract—Missing value estimation is important in DNA microarray data analysis. A number of algorithms have been developed to solve this problem, but they have several limitations. Most existing algorithms cannot deal with the situation where a particular time point (column) of the data is missing entirely. In this paper, we present an autoregressive-model-based missing value estimation method (ARLSimpute) that takes into account the dynamic property of microarray temporal data and the local similarity structures in the data. ARLSimpute is especially effective when a particular time point contains many missing values or is missing entirely. Experimental results suggest that our proposed algorithm is an accurate missing value estimator in comparison with other imputation methods on simulated as well as real microarray time series datasets.

Index Terms—Autoregressive (AR) model, microarray data analysis, missing value estimation, time series analysis.

I. INTRODUCTION

DNA microarray technology, which provides information on the gene expression levels of thousands of genes under different experimental conditions, has gained extensive usage in biological studies. Unfortunately, datasets obtained from DNA microarray experiments often suffer from missing value problems. Diverse reasons lead to this problem, including insufficient resolution, image corruption, technical errors during hybridization, systematic errors on slides, or artifacts on the microarray [1]. Since most algorithms for biological studies require a complete matrix of gene array values in order to analyze the data, missing values must be estimated before further analysis can take place. For a gene expression matrix, as shown in Fig. 1, the simplest ways of dealing with missing values are to ignore them, flag them, replace them with zeros [2], or compute a "row average" or "column average." However, since these approaches do not consider the correlation structure of the data, they usually produce poor results [3]. Troyanskaya

Manuscript received January 30, 2008; revised May 21, 2008 and September 8, 2008. Current version published January 4, 2009. This work was supported in part by the Hong Kong Research Grant Council under Project CityU 122607. M. K. Choong is with the School of Electrical and Information Engineering, University of Sydney, Sydney, N.S.W. 2006, Australia (e-mail: [email protected]). M. Charbit is with the Department of Signal and Image Processing, Ecole Nationale Supérieure des Télécommunications, Paris 75634, France (e-mail: [email protected]). H. Yan is with the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong, and also with the School of Electrical and Information Engineering, University of Sydney, Sydney, N.S.W. 2006, Australia (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TITB.2008.2007421

Fig. 1. Gene expression data matrix.

et al. [3] summarize two imputation methods, namely k-nearest neighbors imputation (KNNimpute) and singular value decomposition imputation (SVDimpute), where the former is shown to be outperformed by the latter from the biological viewpoint. The KNNimpute algorithm, which uses the closest k genes to estimate missing values, works well on non-time-series or noisy data. SVDimpute, which takes all gene-profile correlation information into consideration, yields the best results on time series data with low noise levels. The projection onto convex sets (POCS) algorithm [4], which combines the advantages of SVDimpute and KNNimpute, achieves better performance still. Several other methods have also been developed recently to estimate missing values. Bayesian principal component analysis (BPCA) [5] is shown to perform exceptionally well [6], [7]. However, BPCA is a sophisticated method that is highly dependent on the number of principal axes [6]. The fixed-rank approximation algorithm (FRAA) proposed by Friedland et al. [8] estimates all missing entries in the gene expression data matrix simultaneously based on the singular value decomposition (SVD). Local least-squares imputation (LLSimpute) by Kim et al. [9] exploits the local similarity structures in the data and uses least-squares optimization to find the missing values, which are represented as a linear combination of similar genes. Many imputation methods simply ignore genes with many missing values. These algorithms are useful only if a number of genes have no missing values at all. In the case where all values are missing for a time point (an entire column), the KNNimpute and BPCA algorithms will not work. The results of the LLSimpute algorithm will be just like the row average method, whereas the results of FRAA will be similar to the zero-imputation method. The row average and zero-imputation methods work in all situations, but their imputation results are very poor. With



typically short microarray temporal data, simply excluding the corrupted time point makes the dataset even shorter and more problematic to analyze. In some studies, e.g., correlation analysis or spectral analysis, imputation of a column that contains no values is essential before further analysis can take place. If the missing data are not filled in, frequency bias and ripples will occur in the spectrum obtained using a spectral analysis method. As such, a good imputation method is vital for investigating biological processes from microarray data. Our motivation here is to propose an algorithm that performs well when there are many missing values at particular time points, and even when the experiments for entire time points fail or are missing. In addition, many imputation methods disregard the dependency among observations in a time series. In fact, the temporal profiles of gene expression data may contribute to the accuracy of missing value imputation. Thus, we introduce an imputation method for microarray time series data that exploits the dependency among observations.

II. THEORY AND METHODOLOGY

The algorithm we propose, ARLSimpute, consists of two major processes: in the first, we assume no missing data and estimate the AR coefficients; in the second, we assume that the AR coefficients are known and estimate the missing data. We repeat the two steps iteratively until convergence is achieved. Fig. 2 shows the flowchart of these steps. The convergence criterion is defined as the difference between the imputed missing data points of the current iteration and those of the previous iteration. In this paper, the tolerance is set to 0.005, which typically requires 4–6 iterations.

Fig. 2. Flowchart of our algorithm for missing value estimation.

A. Estimation of AR Coefficients

First, all the missing values in the dataset are initialized by setting them to zero. Let S = {y_1, ..., y_t, ..., y_n} be a stationary time series that follows an AR model of order p (AR-p). The AR model in matrix form can be described as

$$ y_j = Y_j a_j + \varepsilon_j \qquad (1) $$

or, if we use the forward–backward linear prediction method,

$$
\begin{bmatrix}
y[p+1] \\ y[p+2] \\ \vdots \\ y[n] \\ y[1] \\ y[2] \\ \vdots \\ y[n-p]
\end{bmatrix}
=
\begin{bmatrix}
y[p] & y[p-1] & \cdots & y[1] \\
y[p+1] & y[p] & \cdots & y[2] \\
\vdots & \vdots & & \vdots \\
y[n-1] & y[n-2] & \cdots & y[n-p] \\
y[2] & y[3] & \cdots & y[p+1] \\
y[3] & y[4] & \cdots & y[p+2] \\
\vdots & \vdots & & \vdots \\
y[n-p+1] & y[n-p+2] & \cdots & y[n]
\end{bmatrix}
\begin{bmatrix}
a_1 \\ a_2 \\ \vdots \\ a_p
\end{bmatrix}
+ \varepsilon_j \qquad (2)
$$

where ε is a noise sequence that we assume to be normally distributed with zero mean and variance σ². The forward–backward linear prediction method is used instead of forward-only or backward-only prediction because it doubles the number of equations available to determine the coefficients. If the data length n is small, which is typical of microarray data, then the number of AR equations is small, and the linear system in (2) becomes unstable and error-prone. However, there is typically a small number of genes, say k, that are strongly correlated. We assume that the strongly correlated genes have the same AR coefficients. We then combine the k coexpressed genes by stacking the vectors y_cj and matrices Y_cj as follows:

$$
y_c = \begin{bmatrix} y_{c1} \\ \vdots \\ y_{ck} \end{bmatrix}, \qquad
Y_c = \begin{bmatrix} Y_{c1} \\ \vdots \\ Y_{ck} \end{bmatrix}. \qquad (3)
$$

This method has been proven effective in improving the accuracy of estimated frequencies [10]. With the combination of k genes, we find the jointly modeled AR coefficients using a least-squares solution based on the SVD [10]–[12]. In this way, we improve the stability and accuracy of the estimated AR coefficients [10] by zeroing the small singular values [13]. In addition, we capture the gene-wise correlation of the microarray dataset. The coexpressed genes are identified based on Euclidean distance, which has been shown to outperform other similarity measures [3]. Correlation coefficients can also be used in our algorithm. In fact, it is interesting to note that there is a close


relationship between Euclidean distance and correlation coefficients geometrically [14]. There is no theoretical result for determining the value of k optimally [9]. In this paper, we set k to the number of genes whose distance values are less than the global median of the Euclidean distances of the dataset. To keep the computation time reasonable while maintaining the quality of the results, we set a lower boundary for k of 10 and an upper boundary of 300.
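To make Section II-A concrete, the following is a minimal NumPy sketch (our own illustration, not the authors' code) of the stacked forward–backward system of (2)–(3) and the truncated-SVD least-squares solve; all function names, and our reading of the global-median rule as the median distance to the target gene, are assumptions:

```python
import numpy as np

def fb_prediction_system(y, p):
    """Forward-backward linear prediction equations for one series, as in (2)."""
    n = len(y)
    rows, rhs = [], []
    for t in range(p, n):                 # forward: predict y[t] from the p past samples
        rows.append(y[t - p:t][::-1])     # y[t-1], ..., y[t-p]
        rhs.append(y[t])
    for t in range(0, n - p):             # backward: predict y[t] from the p future samples
        rows.append(y[t + 1:t + p + 1])   # y[t+1], ..., y[t+p]
        rhs.append(y[t])
    return np.asarray(rows), np.asarray(rhs)

def ar_coeffs_stacked(genes, p, rel_tol=1e-6):
    """Jointly estimate AR-p coefficients for k coexpressed genes by stacking
    their forward-backward equations, cf. (3), and solving the least-squares
    problem with a truncated SVD (small singular values are zeroed)."""
    blocks = [fb_prediction_system(np.asarray(g, float), p) for g in genes]
    Yc = np.vstack([Y for Y, _ in blocks])        # stacked Y_c
    yc = np.concatenate([y for _, y in blocks])   # stacked y_c
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    inv_s = np.zeros_like(s)
    keep = s > rel_tol * s[0]                     # zero the small singular values
    inv_s[keep] = 1.0 / s[keep]
    return Vt.T @ (inv_s * (U.T @ yc))            # a = V S^+ U^T y_c

def select_coexpressed(data, target, k_min=10, k_max=300):
    """Pick the genes nearest (Euclidean) to the target; k is the number of genes
    closer than the median distance, clipped to [k_min, k_max]."""
    d = np.linalg.norm(data - data[target], axis=1)
    k = int(np.clip((d < np.median(d)).sum(), k_min, k_max))
    return np.argsort(d)[:k]
```

Stacking the forward and backward equations doubles the row count of the system, which is what makes the least-squares solve usable on short series.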


TABLE I DATASETS ANALYZED IN THE PAPER

B. Estimation of Missing Data

We assume that the AR-p parameters, i.e., a_1, ..., a_p, and σ² are known. In the following, we assume that {y_1, ..., y_s} are the observed data and {x_1, ..., x_m} are the missing data. The indices of x and y are not necessarily consecutive. Then the negative log-likelihood, up to constants that do not depend on the missing data, is given by

$$
\ell(z) = \sum_{t=p+1}^{n} \Big( y_t - \sum_{j=1}^{p} a_j\, y_{t-j} \Big)^2 = e^T e \qquad (4)
$$

where the superscript T denotes the transpose and

$$ e = Az \qquad (5) $$

where z is a column vector consisting of the observed data y and the missing data x, and A is the (n − p) × n Toeplitz matrix

$$
A = \begin{bmatrix}
-a_p & \cdots & -a_1 & 1 & 0 & \cdots & 0 \\
0 & -a_p & \cdots & -a_1 & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & & \ddots & \ddots & 0 \\
0 & \cdots & 0 & -a_p & \cdots & -a_1 & 1
\end{bmatrix}. \qquad (6)
$$

Now, if we separate the observed data from the missing data and split A into block form, we may write

$$ e = Bx + Cy \qquad (7) $$

where B = [B_1 B_2 ···] and C = [C_1 C_2 ···] are block submatrices of A whose columns correspond to the locations of the missing data x and the observed data y, respectively. The minimization of the norm of e with respect to x is a classical least-squares problem, whose solution is given by

$$ x = -B^{\#} C y \qquad (8) $$

where B^# denotes the pseudoinverse of B.

C. Datasets

First, we test our proposed method on a simulated dataset. The simulated dataset consists of five different AR processes of order 4. There are a total of 3000 genes with 15 time points. We add white noise to the simulated dataset, with a signal-to-noise ratio of 30 dB for each time series. We also apply our proposed method to several microarray time series datasets, each preprocessed by removing rows that contain missing values, yielding "complete" matrices. The first dataset is from the study of the intraerythrocytic

developmental cycle (IDC) of Plasmodium falciparum [15] using the 3D7 strain. This dataset (referred to as 3D7 hereafter) contains 2626 genes with 53 time points and no missing values. Another dataset covers the cell cycle of the yeast Schizosaccharomyces pombe [16] using a cdc25-22 block release, containing 3724 genes and 51 samples (referred to as cdc25 hereafter). We also study the HeLa cell experiment using a thymidine–nocodazole block (Thy-Noc) [17]. Since there are a large number of genes, we concentrate only on the top 1134 genes identified as cell-cycle regulated by Whitfield et al. [17]. Of these 1134 genes, 920 genes with 19 samples contain no missing values. The datasets analyzed in this paper are summarized in Table I.

D. Performance Measure

We use the normalized rms error (NRMSE) as the performance measure for our method, calculated as

$$
\mathrm{NRMSE} = \sqrt{ \frac{ \sum_{i=1}^{m} \sum_{j=1}^{n} \big[\tilde{Y}(i,j) - Y(i,j)\big]^2 }{ \sum_{i=1}^{m} \sum_{j=1}^{n} \big[Y(i,j)\big]^2 } } \qquad (9)
$$

where Y is the true value, Ỹ is the estimated value, and m and n are the total numbers of rows and columns, respectively.

III. RESULTS AND DISCUSSION

The performance of our proposed method is compared with the zero-imputation method, the row average method, KNNimpute, LLSimpute (with its k-value estimator), BPCA, and FRAA. It is stated in [3] that k in the range 10–20 produces the best estimation for the KNNimpute algorithm. As such, we try all k values within 10–20 and report the best results for KNNimpute. For FRAA, we choose the parameters proposed by Friedland et al. [8], using 5 iterations, and search over significant values within 2–20 for the best NRMSE. We adopt the KNNimpute algorithm that is available from the Bioinformatics Toolbox in Matlab 7. All other algorithm codes are available for download from the researchers' respective Web sites. For the datasets tested in this paper, the AR model order is selected as 4.
Several criteria, such as the Akaike information criterion, the combined information criterion, and the Bayesian information criterion, can be used for order selection [18], [19]. Different datasets may have different optimal prediction orders.
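Returning to the estimation step of Section II-B, a minimal sketch (our own illustration, assuming real-valued data and known AR-p coefficients) builds the Toeplitz matrix of (6), splits its columns into B and C as in (7), and solves (8) with a pseudoinverse:

```python
import numpy as np

def impute_missing(z, missing_mask, a):
    """Given a series z (missing entries initialized arbitrarily), a boolean mask
    of the missing positions, and AR coefficients a = [a1, ..., ap], return z with
    the missing entries replaced by the least-squares solution x = -B^# C y of (8)."""
    n, p = len(z), len(a)
    # Toeplitz matrix A of (6): row t encodes z[t+p] - a1 z[t+p-1] - ... - ap z[t].
    A = np.zeros((n - p, n))
    for t in range(n - p):
        A[t, t:t + p] = -a[::-1]    # -ap, ..., -a1
        A[t, t + p] = 1.0
    B = A[:, missing_mask]          # columns at the missing positions (multiply x)
    C = A[:, ~missing_mask]         # columns at the observed positions (multiply y)
    x = -np.linalg.pinv(B) @ (C @ z[~missing_mask])
    out = z.copy()
    out[missing_mask] = x
    return out
```

When the observed data exactly satisfy the AR recursion and B has full column rank, this least-squares solve recovers the missing entries exactly; with noise it returns the minimizer of the residual norm, as in the paper's derivation.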


TABLE II NRMSE RESULTS FOR SIMULATED DATA

TABLE III NRMSE RESULTS FOR SIMULATED DATA WHERE THE OVERALL MISSING PERCENTAGES ARE 15% AND 20%

Using the "complete" matrices obtained after removing rows that contain missing values, we test the ability of our algorithm in situations where a particular column contains many missing values, or even when the column contains no values. We randomly remove 90%, 99%, and 100% (column containing no values) of one, two, and five columns of the dataset. Our missing value estimation method is also tested by randomly removing a certain percentage (5%–20%) of expression values, with the condition that 99% of values are missing in two columns of the dataset.

A. Simulated Dataset

The results for the simulated dataset are shown in Tables II and III. Table II shows the NRMSE for the simulated data where one, two, and five columns of the dataset contain 90%, 99%, and 100% missing values, respectively. It can be observed that the results of the proposed method are superior to those of existing methods. For cases where an entire time point is missing, KNNimpute and BPCA do not function. The results of the LLSimpute algorithm are the same as those of the row average method, whereas the results of FRAA are similar to those of the zero-imputation method. Our proposed method still works exceptionally well, with low NRMSE. Table III shows the results of the tests on the simulated data where the overall missing percentages are 15% and 20%, with the condition that 99% of values are missing in two columns of the dataset. Since the simulated data are short, we were not able to simulate overall missing percentages of less than 15%. The results once again show the superiority of our proposed algorithm in these situations.
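The masking protocol above and the NRMSE of (9) can be sketched as follows (our own illustration; the row-average baseline merely stands in for any imputation routine, and all names are ours):

```python
import numpy as np

def nrmse(Y_est, Y_true):
    """Normalized rms error as in (9), computed over the whole matrix."""
    num = ((Y_est - Y_true) ** 2).sum()
    den = (Y_true ** 2).sum()
    return float(np.sqrt(num / den))

def mask_columns(Y, cols, frac, rng):
    """Mark `frac` of the entries in each listed column as missing
    (frac = 1.0 empties the column entirely)."""
    mask = np.zeros(Y.shape, dtype=bool)
    m = Y.shape[0]
    k = int(round(frac * m))
    for c in cols:
        rows = rng.choice(m, size=k, replace=False)
        mask[rows, c] = True
    return mask

# Example protocol: 99% of two columns missing, row-average imputation baseline.
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 15))
mask = mask_columns(Y, cols=[3, 7], frac=0.99, rng=rng)
Y_obs = np.where(mask, np.nan, Y)                    # hide the masked entries
row_avg = np.nanmean(Y_obs, axis=1, keepdims=True)   # per-gene mean of observed values
Y_imp = np.where(mask, row_avg, Y_obs)               # row-average imputation
error = nrmse(Y_imp, Y)
```

Note that (9) sums over the entire matrix, so the observed (unchanged) entries dilute the error contributed by the imputed ones; the same masks can be reused across methods for a fair comparison.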

B. Real Datasets

Tables IV–VI show the performance comparison of the existing algorithms and our proposed method. Our method has lower NRMSE values in all the situations tested except for 90% missing values in five columns for cdc25 and Thy-Noc. For cases where an entire time point is missing, BPCA, LLSimpute, and FRAA still run their computations as usual; after these intensive calculations, however, BPCA fails to impute the missing values, while LLSimpute and FRAA reduce to row-average and zero imputation, respectively. It can be noted that in some cases the NRMSE decreases for 100% missing values compared to 99%. The numbers of missing values in the two cases are very close, so the NRMSE does not differ much; the lower NRMSE in some cases may be due to the bias caused by the remaining 1% of data in the particular time point. Figs. 3 and 4 show the results for the 3D7 and cdc25 datasets, where the overall missing percentages are 5%, 10%, 15%, and 20%, with the condition that 99% of values are missing in two columns of the datasets. The horizontal axis denotes the percentage of missing values, whereas the vertical axis denotes the NRMSE. ARLSimpute shows better performance overall, and the results are indeed encouraging. KNNimpute does not work in the case of 3D7, where not a single gene is without missing values, and thus its performance is omitted from the diagram. Again, for Thy-Noc, since the dataset is short, only 15% and 20% missing values are tested, and the results are shown in Table VII. The performance of FRAA and KNNimpute may depend on parameter selection. LLSimpute has an automatic k-value estimator and may avoid the problem of selecting the correct value of k manually. In our algorithm, we propose using the global median of the Euclidean distances of the dataset, with a lower boundary of 10 and an upper boundary of 300, to select an appropriate k value for a particular dataset.
This may prove to be a simple yet effective k-value selector. Kim et al. [9] report that their method is better than BPCA. However, in the results shown here, BPCA outperforms LLSimpute on all of the datasets tested, which coincides with the investigation of Wong et al. [7]. The aforementioned experimental results show that our algorithm achieves significantly lower error than KNNimpute, zero imputation, the row average method, LLSimpute, BPCA, and FRAA in situations where a particular time point


TABLE IV NRMSE RESULTS FOR THE 3D7 DATA

TABLE V NRMSE RESULTS FOR THE cdc25 DATA

TABLE VI NRMSE RESULTS FOR THE THY-NOC DATA

Fig. 3. Estimation performance (NRMSE) comparison of zero impute, row average, LLSimpute, BPCA, FRAA, and ARLSimpute for the 3D7 data.


Fig. 4. NRMSE comparison of KNNimpute, zero impute, row average, LLSimpute, BPCA, FRAA, and ARLSimpute for the cdc25 data.

TABLE VII NRMSE RESULTS FOR THE THY-NOC DATA WHERE THE OVERALL MISSING PERCENTAGES ARE 15% AND 20%

contains many missing values, and even when the column contains no values at all. Our method also has several limitations. When the time series is too short (fewer than ten points), (2) is ill-defined because few linear equations are available, so the method cannot be used reliably. Fortunately, most microarray time series are longer than ten time points, so our method is useful in most cases. Our algorithm can only be applied directly to uniformly sampled time series data; further research is needed to generalize it to the imputation of nonuniformly sampled data.

IV. CONCLUSION

We have developed a computational method, ARLSimpute, to solve the problem of microarray missing value estimation for the situation where a particular column contains many missing values, and even when values in an entire column are missing. Our imputation method takes into account the dynamic behavior of microarray time series data, where each observation may depend on prior ones. There are basically two steps in our algorithm. The first step is to select the k genes most correlated with the target gene in the entire dataset, in order to capture the gene-wise correlation of the gene array; the selection is based on Euclidean distances among genes. The AR coefficients are estimated by stacking the k genes. The second step is to estimate the missing values using the AR coefficients estimated in the previous step. Our experimental results suggest that ARLSimpute is competitive with existing methods. Our proposed algorithm may be useful for missing value estimation, especially when a particular experiment contains many missing values, and even when its values are missing entirely.

REFERENCES

[1] Y. H. Yang, M. J. Buckley, S. Dudoit, and T. P. Speed, "Comparison of methods for image analysis in cDNA microarray data," Dept. Stat., Univ. California, Berkeley, Tech. Rep. 584, 2000.
[2] A. A. Alizadeh, M. B. Eisen, R. E. Davis, C. Ma, I. S. Lossos, A. Rosenwald, J. C. Boldrick, H. Sabet, T. Tran, X. Yu, J. I. Powell, L. Yang, G. E. Marti, T. Moore, J. Hudson, L. Lu, D. B. Lewis, R. Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, J. C. Byrd, D. Botstein, P. O. Brown, and L. M. Staudt, "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, vol. 403, pp. 503–511, 2000.
[3] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman, "Missing value estimation methods for DNA microarrays," Bioinformatics, vol. 17, pp. 520–525, 2001.
[4] X. Gan, A. W.-C. Liew, and H. Yan, "Microarray missing data imputation based on a set theoretic framework and biological consideration," Nucl. Acids Res., vol. 34, pp. 1608–1619, 2006.
[5] S. Oba, M.-A. Sato, I. Takemasa, M. Monden, K.-I. Matsubara, and S. Ishii, "A Bayesian missing value estimation method for gene expression profile data," Bioinformatics, vol. 19, pp. 2088–2096, 2003.
[6] X. Wang, A. Li, Z. Jiang, and H. Feng, "Missing value estimation for DNA microarray gene expression data by support vector regression imputation and orthogonal coding scheme," BMC Bioinf., vol. 7, pp. 1–10, 2006.
[7] D. S. V. Wong, F. K. Wong, and G. R. Wood, "A multi-stage approach to clustering and imputation of gene expression profiles," Bioinformatics, vol. 23, pp. 998–1005, 2007.
[8] S. Friedland, A. Niknejad, and L. Chihara, "A simultaneous reconstruction of missing data in DNA microarrays," Linear Algebra Appl., vol. 416, pp. 8–28, 2006.
[9] H. Kim, G. H. Golub, and H. Park, "Missing value estimation for DNA microarray gene expression data: Local least squares imputation," Bioinformatics, vol. 21, pp. 187–198, 2005.


[10] M. K. Choong, D. Levy, and Y. Hong, "Study of microarray time series data based on forward–backward linear prediction and singular value decomposition," Int. J. Data Mining Bioinf., 2009.
[11] D. W. Tufts and R. Kumaresan, "Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood," Proc. IEEE, vol. 70, no. 9, pp. 975–989, Sep. 1982.
[12] D. W. Tufts, R. Kumaresan, and I. Kirsteins, "Data adaptive signal estimation by singular value decomposition of a data matrix," Proc. IEEE, vol. 70, no. 6, pp. 684–685, Jun. 1982.
[13] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[14] H. Yan, "Efficient matching and retrieval of gene expression time series data based on spectral information," in Lecture Notes in Computer Science, vol. 3482, O. Gervasi, M. L. Gavrilova, and V. Kumar, Eds. New York: Springer-Verlag, 2005, pp. 357–373.
[15] M. Llinas, Z. Bozdech, E. D. Wong, A. T. Adai, and J. L. DeRisi, "Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains," Nucl. Acids Res., vol. 34, pp. 1166–1173, 2006.
[16] A. Oliva, A. Rosebrock, F. Ferrezuelo, S. Pyne, H. Chen, S. Skiena, B. Futcher, and J. Leatherwood, "The cell cycle-regulated genes of Schizosaccharomyces pombe," PLoS Biol., vol. 3, pp. 1239–1260, 2005.
[17] M. L. Whitfield, G. Sherlock, A. J. Saldanha, J. I. Murray, C. A. Ball, K. E. Alexander, J. C. Matese, C. M. Perou, M. M. Hurt, P. O. Brown, and D. Botstein, "Identification of genes periodically expressed in the human cell cycle and their expression in tumors," Mol. Biol. Cell, vol. 13, pp. 1977–2000, 2002.
[18] P. Stoica, Spectral Analysis of Signals. Englewood Cliffs, NJ: Prentice-Hall, 2005.
[19] P. M. T. Broersen, "Finite sample criteria for autoregressive order selection," IEEE Trans. Signal Process., vol. 48, no. 12, pp. 3550–3558, Dec. 2000.

Miew Keen Choong (M'03) received the B.Eng. (Hons.) degree in electrical engineering from the University of Malaya, Kuala Lumpur, Malaysia, in 2001, and the M.Eng.Sc. degree from Multimedia University, Cyberjaya, Malaysia, in 2004. She is currently working toward the Ph.D. degree at the University of Sydney, Sydney, N.S.W., Australia. She was an R&D Engineer in the R&D Department, OYLE, Shah Alam, Malaysia. In 2003, she joined the Faculty of Engineering, Multimedia University, as a Tutor, and became a Lecturer in 2004. Her current research interests include medical image analysis, signal processing, and bioinformatics. Ms. Choong was a Scholar of Hong Leong Management Company. She is a Member of the Institution of Engineers (IEM), Malaysia, and the Board of Engineers Malaysia (BEM).


Maurice Charbit (M'97) received the Ing. degree from the Institut Polytechnique de Grenoble, Grenoble, France, in 1972, and the Ph.D. degree from the University of Paris XI, Paris, France, in 1982. He is currently a Professor at the Ecole Nationale Supérieure des Télécommunications, Paris. His current research interests include applied statistics, geolocalization and tracking, array processing, and image processing.

Hong Yan (S’88–M’89–SM’93–F’06) received the Ph.D. degree from Yale University, New Haven, CT. He is currently a Professor of computer engineering at City University of Hong Kong, Kowloon, Hong Kong. He is also an Honorary Professor of electrical and information engineering at the University of Sydney, N.S.W., Australia. His current research interests include image processing, pattern recognition, and bioinformatics. Prof. Yan is a Fellow of the International Association for Pattern Recognition (IAPR) and the Institution of Engineers Australia (IEAust).