NEURAL NETWORKS FOR X-RAY IMAGE SEGMENTATION

Darryl Davis, Su Linying*, Bernadette Sharp, School of Computing, Staffordshire University, Stafford ST16 0DG, UK, [email protected]

* Mr. Su Linying is a visiting scholar at this university until March 1999. He is from the College of Computer, Inner Mongolia University, China.

1. OVERVIEW

The work described here is part of an Intelligent Multi-Agent Image Analysis System, which is being developed to promote the automated diagnosis and classification of digital images. Image analysis by content continues to be a challenging problem. Model-based approaches have met with some success in domains where objects can be well described using geometric primitives, but such explicit models are often difficult to obtain. Knowledge-based approaches typically use a set of rules to describe the properties and geometric relationships of each object to be recognized, together with rules for object-specific recognition strategies that reduce the overall complexity of the interpretation process. However, these approaches require a significant amount of effort to acquire the necessary knowledge base. Most approaches to the image interpretation problem involve two stages, segmentation and region classification, and both stages may be successfully addressed using artificial neural networks. Image segmentation entails the division of the image into regions of similar attributes. The most basic attribute for segmentation is image amplitude: the luminance (gray level) for a monochrome image such as X-ray film. The real problems we encountered in image segmentation are how such similarities of image attributes can be described quantitatively, what independent values can be used to separate the image regions, and how those values can be extracted. These are the problems of image feature extraction.

Different imaging applications have their own definitions of attribute similarity and their own approaches to feature extraction. Among the numerous traditional techniques advocated for image segmentation, it is possible to discern three main groupings: statistical classification, edge and boundary detection, and region growing. Each has its characteristic strengths and weaknesses [1], and one common characteristic is that they all need predefined similarities of image attributes for segmentation. They lack generality, adaptability and flexibility. Because such segmentation methods cannot easily learn from changes in their environment, they are unsuited to co-operating with other processes within a multi-agent architecture. The application of neural networks to image segmentation simplifies complex image processing within a unified strategy. Neural networks do not require well-described, explicit properties or relationships of the objects to be recognized. Given some ground-truth exemplars they can learn the general features of use in segmentation, and they can learn to classify image attributes according to their similarity without predefined categories. As more exemplars are supplied to the networks and further training and refinement take place, their performance may improve. This type of segmentation strategy can combine all the capabilities required to segment an image in such a manner that progressive steps towards the final segmentation can modify earlier results. Neural networks thus provide a single technique that unifies the definition of attribute similarity, image feature extraction, segmentation, and refinement.

The current project addresses how to separate the thigh figure from the background of the X-ray film, how to separate hard tissue (bone) from the thigh figure, and how to identify irregularities where present. Using neural networks it is possible to deal with the first two processes together, and by building a hybrid network it may be possible to perform all three tasks. Network topologies investigated include Back Propagation (BP), Counter Propagation (CP), the Self-Organizing Feature Map (SOFM) and the Bi-directional Associative Memory (BAM). Using these networks as basic components, a hybrid neural network is derived which takes into account not only local information but also more global distributions of image pixels, in order to deal with inconsistencies of image labeling. A one-layer neural network, known as WISARD, is used to validate segmentation performance.

When the SOFM is used for segmentation, we deal with the inconsistency of image labeling by mixing the input patterns' position information into the input pattern vectors, to raise the divergence among them. By inconsistency of image labeling we mean that identical input patterns are mapped to different output labels. This strategy can be applied to other networks by extending the pixel-intensity input vectors with position information. We call an input pattern Neighborhood Dependent when only the neighborhood's pixel intensity attributes are included; if the input pattern also includes positional information, we call it Neighborhood-Position Dependent. A group of image-size-independent neural networks, comprising Back Propagation Networks (BPN), Counter Propagation Networks (CPN), the Self-Organizing Feature Map (SOFM) and the Bidirectional Associative Memory (BAM), together with an image-size-dependent hybrid network consisting of a BPN and a SOFM, have been designed here for segmentation and compared on their performance.
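For concreteness, the following is a minimal sketch (ours, not taken from the paper) of how the two kinds of input pattern might be assembled from a gray-level image; the window size, the scaling of intensities, and the normalization of the position terms are assumptions, and pixels are assumed to lie far enough from the image border for the window to fit.

```python
import numpy as np

def neighborhood_pattern(img, r, c, size=3):
    """Neighborhood-dependent pattern: gray levels of the size x size
    window centred on pixel (r, c), flattened and scaled to [0, 1]."""
    h = size // 2
    win = img[r - h:r + h + 1, c - h:c + h + 1]
    return win.astype(float).flatten() / 255.0

def neighborhood_position_pattern(img, r, c, size=3):
    """Neighborhood-position-dependent pattern: the same intensities,
    extended with the pixel's (row, col) position normalized by the
    image dimensions, so that identical windows at different locations
    yield divergent input vectors."""
    rows, cols = img.shape
    pos = np.array([r / rows, c / cols])
    return np.concatenate([neighborhood_pattern(img, r, c, size), pos])
```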

2. SEGMENTATION

A binary label strategy is used to represent the segmentation for figure or hard-tissue extraction. Any monochrome graphic may serve as the desired labels of the corresponding sample images. This strategy is used to simplify the networks' operation and the labeling of samples. All of the networks except the SOFM adopt a topological strategy whereby the input and output vectors are of the same dimension. These dimensions describe a rectangular region of interest within the input and output images. We may represent this size by two parameters, the height (Row) and width (Col) of the region; thus the size of the input layer, as well as that of the output layer, is Row x Col.

2.1 Back Propagation Networks

A three-layer BPN was built, with image pixel gray-level values taken as input values and data in the open interval (-0.5, +0.5) generated as the output. We use (2.1) as the activation function; in a fully trained network most of the output units should converge towards -1/2 or +1/2. We take zero as a threshold value and convert the real values to binary, so that a binary image is derived depicting whether each corresponding pixel lies inside or outside the process-derived areas. Where learning stalls, i.e. where the derivative of the activation function, f'(x) (2.2), approaches 0, we simply add 0.1 to the derivative, so that f'(x) + 0.1 is used in the training rule instead of f'(x) [2].

f(x) = 1 / (1 + e^{-3x}) - 1/2    (2.1)

f'(x) = 3/4 - 3 f(x)^2    (2.2)

Equation (2.1) is symmetric, which may speed up learning [3]. Equation (2.2) takes its largest value, 0.75, when x is near 0, so quicker convergence to a stable state may hopefully be achieved. The number of hidden-layer units of the BPN varies in this application according to the size of the input and output regions and the amount of training data. For example, if the size of the output region is k, we prefer k/2 as the number of hidden units. We also require at least k^2/ε training examples to achieve a fraction 1 - ε of correct classifications. There is an extra bias unit in both the input and hidden layers, which improves the convergence properties of the network and provides a "threshold" effect on each unit it targets [4].
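A minimal sketch of the activation function (2.1), its derivative (2.2), and the derivative-offset trick described above (the surrounding network plumbing is omitted; the 0.1 offset follows the text):

```python
import numpy as np

def f(x):
    """Symmetric activation (2.1): outputs lie in the open interval (-0.5, 0.5)."""
    return 1.0 / (1.0 + np.exp(-3.0 * x)) - 0.5

def f_prime(x, offset=0.1):
    """Derivative (2.2) plus a constant offset, so learning does not
    stall when f'(x) approaches 0 for saturated units."""
    return 0.75 - 3.0 * f(x) ** 2 + offset

def binarize(outputs):
    """Threshold the real-valued outputs at zero to obtain the binary
    segmentation label for each pixel of the output region."""
    return (outputs > 0.0).astype(np.uint8)
```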

2.2 Counter Propagation Networks

A three-layer CPN, similar in structure to the BPN described above, was also built, adopting the winner-takes-all learning rule. A series of image pixel gray-level values, after normalization, is taken as the input vector to the network, and a series of 1s and 0s is generated as the output to represent the correspondence of pixels to derived features. In this architecture the weight matrix connecting the input layer to the competitive layer is required to be normalized; after the weights change, the weight matrix is renormalized. The number of competitive units of the CPN, which is related to the size of the input and output regions, determines the precision of the segmentation. A CPN with fewer competitive units has good generality but gives coarse boundaries; with a large number of competitive units it can give a more precise mapping, at the cost of over-specialization and a greater requirement for training data and computation. Let S be the size of the input image region; S is also the size of the output, since both regions are always the same size in this architecture. If the CPN is to label all possible segments, we would need at least 2^S competitive units. For example, if we take a vertical line from the raw image as the input region, say 480 pixels, there would be 2^480 units in the competitive layer, which is obviously impractical given current computing capability. In practice we usually adopt cN as the number of competitive units, where c is a small positive constant, such as 3-5, and N is the input (or output) size. We built a CPN of size 480 x 1024 x 480, corresponding to the number of input, competitive, and output units respectively.
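A sketch of the winner-takes-all update with weight renormalization described above (the learning rates and the outstar update rule are assumptions, not the authors' exact implementation):

```python
import numpy as np

def train_cpn_step(W_in, W_out, x, y, alpha=0.1, beta=0.1):
    """One CPN training step.
    W_in:  (units, n) normalized weights, input -> competitive layer.
    W_out: (units, p) weights, competitive layer -> output layer.
    x: normalized input vector (n,); y: target output vector (p,)."""
    winner = np.argmax(W_in @ x)                  # winner-takes-all competition
    W_in[winner] += alpha * (x - W_in[winner])    # move winner towards the input
    W_in[winner] /= np.linalg.norm(W_in[winner])  # renormalize its weight row
    W_out[winner] += beta * (y - W_out[winner])   # learn the associated output
    return winner
```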

2.3 Self-Organizing Feature Map

The segmentation process associated with the use of the SOFM network employs both texture and gray-level information, using 8-D vectors consisting of the gray-level intensity and 7 texture features around the pixel of interest. The texture features are derived from the image block moments (2.3)-(2.5):

m_pq = Σ_{n=1..N} Σ_{m=1..M} n^p m^q g(n, m)    (2.3)

μ_pq = Σ_{n=1..N} Σ_{m=1..M} (n - m_10/m_00)^p (m - m_01/m_00)^q g(n, m)    (2.4)

ν_pq = μ_pq / μ_00^r    (2.5)

where g(n, m) represents the gray level of the pixel at position (n, m), the relative row and column of the pixel in the current process region, and r = (p + q)/2. The size of the region of interest around the processed pixel for texture measurements is varied according to how much neighborhood relationship is desired in the segmentation. The moments derived from (2.3)-(2.5) are invariant to image translation and rotation [5]. We call this kind of input Neighborhood Dependent: it does not matter where the region is located in the image. To address the problem of inconsistent segmentation, we may instead feed the networks Neighborhood-Position Dependent input vectors. In this case the texture information is derived from the amended equations (2.6) and (2.7):

m_pq = Σ_{n=a..b} Σ_{m=c..d} n^p m^q g(n, m)    (2.6)

μ_pq = Σ_{n=a..b} Σ_{m=c..d} (n - m_10/m_00)^p (m - m_01/m_00)^q g(n, m)    (2.7)

In equations (2.6) and (2.7) the image region from which the pixels generate texture is referenced by its coordinates in the whole image, from the a-th to the b-th columns and the c-th to the d-th rows. Since we use the Euclidean distance to measure the competition in SOFM learning, it is necessary to normalize the input vectors and to weight the features by importance. This prevents features of lesser importance from overriding more important features in the mapping. In this application we found that the gray-level intensities of the pixels are more important than the other features for discriminating their classes. Therefore weighted feature values f_1 v_1, ..., f_8 v_8 replace the features in both training and recall modes; the most important feature is associated with the largest v, and the least important feature with the smallest v. The size and the radius of the receptive fields of the SOFM are adjustable architecture parameters; a 16 x 16 map, with a radius of three-quarters of its size, is recommended [6]. In addition, every pixel in the training set has associated with it a binary ground-truth label, which defines the pixel's class. Once an SOFM is trained, every unit of the competitive layer is assigned 1 or 0, according to the majority label of those pixels in the training set which most closely match it. Any particular X-ray image of the thigh may now be segmented by extracting the relevant information vector for each pixel, finding the closest matching unit in the SOFM, and assigning to the pixel the label assigned to that unit. This process forms regions in the image, which then constitute the segmentation.
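A sketch of the texture-feature computation from equations (2.3)-(2.5), as we read them. The paper does not list which seven (p, q) moment orders make up the 8-D vector, so the selection below, the block size, and the intensity scaling are assumptions:

```python
import numpy as np

def moments(block, p, q):
    """Raw moment (2.3), central moment (2.4) and normalized central
    moment (2.5) of a gray-level block, with r = (p + q) / 2."""
    g = block.astype(float)
    n, m = np.mgrid[1:g.shape[0] + 1, 1:g.shape[1] + 1]
    m00, m10, m01 = g.sum(), (n * g).sum(), (m * g).sum()
    m_pq = (n**p * m**q * g).sum()                               # (2.3)
    mu_pq = ((n - m10 / m00)**p * (m - m01 / m00)**q * g).sum()  # (2.4)
    nu_pq = mu_pq / m00 ** ((p + q) / 2.0)                       # (2.5); mu_00 = m00
    return m_pq, mu_pq, nu_pq

def feature_vector(img, r, c, size=9,
                   orders=((0, 2), (2, 0), (1, 1), (0, 3), (3, 0), (1, 2), (2, 1))):
    """8-D input vector: the pixel's intensity plus seven normalized
    moments of its surrounding block; the seven orders are an assumption."""
    h = size // 2
    block = img[r - h:r + h + 1, c - h:c + h + 1]
    return np.array([img[r, c] / 255.0] +
                    [moments(block, p, q)[2] for p, q in orders])
```

In use, each component would then be multiplied by its importance weight v before training or recall, as described above.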

2.4 Bi-directional Associative Memory

A discrete Bi-directional Associative Memory, defined in Hamming space (±1) [7], has also been used to perform image segmentation, since the boundary contour of the thigh plays a very important role in segmentation and this contour is usually expressed in binary images. This interpretation straightforwardly unifies the expression of the boundary contour and the binary segmentation labels. The requirement that input patterns be orthogonal limits the application of the BAM to image segmentation, and the low storage capacity of the BAM network may be a further restriction. This capacity, the maximal number P of patterns that can be stored, can be roughly expressed as

P = min(n, p)    (2.8)

where n and p represent the dimensions of the input and output vectors, respectively [8]. In our application we always keep n and p equal; thus the capacity of the BAM network is proportional to the size of the input vector. A BAM network uses a fixed weight connection between the input and output vectors, and offers a simple mechanism for adding or removing a pair of associations: the outer-product term y_p x_p^T is simply added to, or subtracted from, the weight matrix. This means it may associate input and output features dynamically.
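The paper gives no code, so the following is a minimal sketch of a discrete bipolar BAM with the add/remove mechanism just described (function names are ours):

```python
import numpy as np

def bam_weights(pairs):
    """Build the BAM weight matrix W = sum of y x^T over the stored pairs."""
    return sum(np.outer(y, x) for x, y in pairs)

def add_pair(W, x, y):
    """Associate a new (x, y) pair by adding its outer product."""
    return W + np.outer(y, x)

def remove_pair(W, x, y):
    """Forget a stored pair by subtracting its outer product."""
    return W - np.outer(y, x)

def recall(W, x, steps=10):
    """Bidirectional recall in Hamming space (+/-1): iterate x -> y -> x."""
    sign = lambda v: np.where(v >= 0, 1, -1)
    for _ in range(steps):
        y = sign(W @ x)      # forward pass
        x = sign(W.T @ y)    # backward pass
    return x, y
```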

2.5 Hybrid Network

Since we face computational limitations of storage and amount of calculation, we have to segment the image using local information, which raises inconsistency and convergence problems. A simple hybrid of neural networks employed here may break the storage restriction and take account of both local and global information during the segmentation. One hybrid (Figure 1) consists of a BPN and a SOFM (or SOFMs), where the BPN is applied to the top level of a quadtree representation derived from the raw image to be segmented, and the SOFM generates the boundary detail while recursively travelling down the quadtree to its bottom (most detailed) level, which represents the segmentation labels.

[Figure 1. Hybrid ANN Architecture (blocks: BPN, SOFM)]

The BPN takes the image's global information, a resolution-reduced raw image, as its input vector and generates a binary image of the same size, which represents the segmentation. Let p_ij be a point at the current level of the image quadtree representation. If p_ij is not on the boundary of the segmentation, four points at (2i, 2j), (2i, 2j+1), (2i+1, 2j) and (2i+1, 2j+1) are generated as p_ij's children at the level directly below, and they inherit the same class (inside or outside of the process-derived region) from their parent. If p_ij is on the boundary, its four children's classes are predicted by the SOFM.
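A sketch of one quadtree descent step as just described, assuming the standard quadtree child indexing; is_boundary and sofm_predict are hypothetical stand-ins for the boundary test and the trained SOFM:

```python
def refine(labels, img, is_boundary, sofm_predict):
    """Expand a coarse binary label image to the next (finer) quadtree
    level: non-boundary points pass their class to their four children,
    while boundary points have their children re-predicted by the SOFM."""
    rows, cols = len(labels), len(labels[0])
    fine = [[0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            for ci, cj in ((2*i, 2*j), (2*i, 2*j+1),
                           (2*i+1, 2*j), (2*i+1, 2*j+1)):
                if is_boundary(labels, i, j):
                    fine[ci][cj] = sofm_predict(img, ci, cj)  # SOFM decides
                else:
                    fine[ci][cj] = labels[i][j]               # inherit class
    return fine
```

Repeated from the BPN's coarse output down to the bottom level, this yields the full-resolution segmentation labels.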

3. EVALUATION

The BPN is probably the best-known and most widely used of the currently available types of neural network. For back propagation networks to succeed in applications, the key issue is that the hidden-layer units must be trained to recognize the right sets of features. These features must be sufficiently general that the network responds correctly even when its input differs from those it has previously encountered. This goal may be achieved by choosing enough representative raw images and associated labels as training data, an appropriate number of hidden units, and a suitable training algorithm. Since the BPN in the hybrid architecture treats one image as one independent input vector, it requires many more exemplars to generalize than the single-BPN application, where several hundred training patterns can be collected from a single sample image. The hybrid network can therefore give a more precise segmentation with good noise immunity, but it is harder to generalize than the single BPN.

The CPN employs a competitive hidden layer that acts as a look-up table. Its performance is not as good as a BPN's, but it converges faster. The CPN requires more units in its hidden layer than the BPN in order to adapt to the boundaries between pattern classes; too few competitive units lead to coarse contours at the boundaries and to over-generalization.

The SOFM is unique among the topologies we have used so far in that it organizes its own representation of categories over the input data. The other networks either had supervised training methods, in which the network was taught to recognize exemplar patterns via an adaptive weight-changing algorithm (BPN, CPN), or had fixed weights and were unable to learn (BAM). All of the other networks employed here are multi-point to multi-point mappings, and the mappings take place over disjoint regions, so that recall requires less computation than with the SOFM; each application of the SOFM yields the segmentation of only one point. Since the SOFM adopts a winner-takes-all learning rule similar to the CPN's while also preserving topology, its convergence is slower than the CPN's and faster than the BPN's. The SOFM is an unsupervised learning network, so the training data need not be labeled at all; a subsequent labeling process determines the interpretation of the mapping. It is therefore possible to reuse a trained SOFM by re-labeling it without re-training.

The discrete BAM may be the simplest network topology considered so far for the segmentation application. With its fixed weight connection between input and output patterns (2.9), we can easily change the mapping by adding associated pairs to, or removing them from, the weight matrix W:

W = Σ_i y_i x_i^T    (2.9)

Unfortunately, this application does not perform well when the exemplar patterns are too close to each other: the interaction between such patterns can create spurious stable states. Whichever neural network is employed to segment the image, it is crucially important to collect training exemplars that include all the general features of use in segmentation and that are uniformly distributed. In fact we cannot enumerate these features; we may not even know what they are. Approaches to this problem belong to the domain of feature extraction. Except for the SOFM application, which performed explicit feature extraction, we here feed the networks directly with the pixels' gray levels as features; in effect, we leave the feature extraction to the networks' learning process. Correct classification with this type of feature relies on having enough exemplars to contain all the general information uniformly. In practice we may not have enough such exemplars, or the exemplars may have unknown distributions. A dynamic process, which collects exemplars and trains the networks dynamically, may solve this problem; some self-growing neural networks have this dynamic property [10]. Whenever an invalid segmentation occurs, the dynamic process begins: new training data are collected and the network is retrained.
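A sketch of such a dynamic retraining loop (validate, collect_exemplars and train are hypothetical stand-ins; this is an illustration of the idea, not the authors' implementation):

```python
def dynamic_training(net, train, validate, collect_exemplars, images):
    """Retrain on demand: whenever a segmentation fails validation,
    collect new exemplars around the failure and retrain the network."""
    for img in images:
        seg = net.segment(img)
        if not validate(seg):                      # invalid segmentation
            data = collect_exemplars(img, seg)     # gather new training pairs
            train(net, data)                       # refine the network
            seg = net.segment(img)                 # re-segment with new weights
    return net
```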

4. VALIDATION

Since we may represent the result of segmentation as a binary image, a one-layer neural network, known as WISARD, can be used to validate the result. This network can be trained dynamically with the ground-truth segmentation label images or with subsequent reasonable segmentation results. Every four pixels of the binary image are treated as a 4-bit address into one of the RAMs, each of which has 16 units holding 1 or 0. Before the network is trained, all units of every RAM are initialized to 0. During training, the corresponding unit of every RAM is set to 1, the units being selected by the addresses derived from the training image. During recall, the network reads one unit from each RAM and sums them to determine whether the current binary image (segmentation) is valid. In this project the network is built on a normal image size, say 480 x 640, so there are 76,800 RAMs (480 x 640 / 4), which is also the maximum of the network's output. Ninety-five percent of this amount may be set as the threshold: we say that the segmentation is valid if the network's response reaches 76,800 x 95% or greater.

Computing the centroid curve of the segmentation may provide more general validation information. The centroid curve is defined on the gray-level image with background or soft tissue removed, by a series of centroids as given by equations (2.10)-(2.14):

C_x = m_01 / m_00    (2.10)

C_y = m_10 / m_00    (2.11)

m_00 = Σ_{j=y1..y2} Σ_{i=x1..x2} g(i, j)    (2.12)

m_01 = Σ_{j=y1..y2} Σ_{i=x1..x2} i g(i, j)    (2.13)

m_10 = Σ_{j=y1..y2} Σ_{i=x1..x2} j g(i, j)    (2.14)

where each point is a centroid over an image region defined by (x1, y1) and (x2, y2) as a pair of vertices of a rectangular region. Here we choose a vertical line of the image, say x1 = x2, as the process region. Figure 2 shows a background-removed X-ray image of the thigh, derived by the hybrid network segmentation application, together with its centroid curve.

[Figure 2. Result Image: the background-removed thigh and its centroid curve]
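A sketch of the WISARD-style RAM discriminator described above; grouping the pixels four at a time in raster order is our assumption about the addressing, and the class name is ours:

```python
import numpy as np

class Wisard:
    """One-layer RAM network: one 16-cell RAM per group of 4 pixels."""
    def __init__(self, n_pixels):
        self.rams = np.zeros((n_pixels // 4, 16), dtype=np.uint8)

    def _addresses(self, binary_image):
        """Turn each group of 4 binary pixels into a RAM address 0..15."""
        bits = binary_image.flatten()[:len(self.rams) * 4].reshape(-1, 4)
        return bits @ np.array([8, 4, 2, 1])

    def train(self, binary_image):
        """Set to 1 the unit addressed in each RAM by the training image."""
        self.rams[np.arange(len(self.rams)), self._addresses(binary_image)] = 1

    def response(self, binary_image):
        """Sum the addressed units; the image is accepted as a valid
        segmentation if the response reaches 95% of the RAM count."""
        return self.rams[np.arange(len(self.rams)),
                         self._addresses(binary_image)].sum()
```

For a 480 x 640 image this gives 76,800 RAMs, so the validation threshold of the text would be response(seg) >= 0.95 * 76800.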

5. CONCLUSIONS

The work described here relates to an ongoing project aimed at developing an intelligent framework for diagnosing and classifying digital X-ray thigh images. Our strategy of using local and global information at various stages of neural network based processing is proving feasible. The percentage of wrong labeling (less than 15% on average), while almost inevitable, may be decreased by increasing the sample size and the neural network capacity. Our results suggest that an approach combining the hierarchical hybrid network, for figure-background segmentation, with the SOFM, for hard-tissue labeling, is to be preferred.

REFERENCES

[1] R. Wilson, M. Spann, 1988. Image Segmentation and Uncertainty, Letchworth, Hertfordshire, England: Research Studies Press Ltd.
[2] S. Fahlman, 1988. Faster-learning variations on back-propagation: an empirical study, Proc. of the 1988 Connectionist Models Summer School.
[3] W. Stornetta, B. Huberman, 1987. An improved three-layer back-propagation algorithm, Proc. First IEEE Int'l Conf. on Neural Networks (San Diego, CA, June 21-24, 1987), II-637 to II-644.
[4] J. Dayhoff, 1990. Neural Network Architectures: An Introduction, New York: Van Nostrand Reinhold, 65-66.
[5] W. Pratt, 1991. Digital Image Processing, New York: Wiley-Interscience, John Wiley & Sons, Inc., 636-646.
[6] N. Campbell, B. Thomas, T. Troscianko, 1997. Automatic Segmentation and Classification of Outdoor Images Using Neural Networks, International Journal of Neural Systems, Vol. 8, No. 1, 137-144.
[7] J. Freeman, D. Skapura, 1992. Neural Networks: Algorithms, Applications, and Programming Techniques, Reading, MA: Addison-Wesley Publishing Company, 128-130.
[8] A. Maren, 1990. Handbook of Neural Computing Applications, San Diego: Academic Press, Inc., 175.
[9] R. Boyle, R. Thomas, 1988. Computer Vision: A First Course, Oxford: Blackwell Scientific Publications, 10-18.
[10] C.M. Bishop, 1995. Neural Networks for Pattern Recognition, Oxford: Oxford University Press, 357-359.
