The 7th International Conference for Internet Technology and Secured Transactions (ICITST-2012)

MedCloud: Healthcare Cloud Computing System

Dalia Sobhy

Yasser El-Sonbaty

Mohamad Abou Elnasr

College of Computer Engineering Arab Academy of Science and Technology and Maritime Transport Alexandria, Egypt, 1029 Email: [email protected]

College of Computer Engineering Arab Academy of Science and Technology and Maritime Transport Alexandria, Egypt, 1029 Email: [email protected]

College of Computer Engineering Arab Academy of Science and Technology and Maritime Transport Alexandria, Egypt, 1029 Email: [email protected]

Abstract—Existing systems for storing patient data do not scale with the growing number of patients and applications. Cloud computing promises low cost, high scalability, availability, and disaster recoverability, which makes it a natural solution for some of the problems faced in storing and analysing patients’ medical records. This paper examines the impact of cloud computing on improving healthcare services. More specifically, it details the architectural design of a personal health record system called “MedCloud” that integrates services from the Hadoop [1] ecosystem in conjunction with the HIPAA privacy and security rules [2]. A scalable platform is proposed for developers to use in application development, and a web portal built on the Restlet framework [3] gives users access to the MedCloud system. The development of the MedCloud model is then illustrated through an issues analysis, followed by an in-depth performance evaluation.

Index Terms—Cloud Computing, Hadoop, HBase, Restlet, HIPAA

I. INTRODUCTION

Healthcare is always a major concern for the community, and “Towards smarter IT” is the main slogan for a successful healthcare institution. There is a great need for new strategies that reduce healthcare costs and improve the quality of service. IT has positively affected the healthcare sector by providing more accurate and timely information regarding patient care [4].

Several healthcare providers and insurance companies store patients’ medical data in the form of electronic medical records (EMRs) [5] in centralized databases. EMRs have long been an essential way of storing patients’ medical records electronically [6]. The problem is that a patient typically has various healthcare providers, including physicians, specialists, therapists, and other medical practitioners, and may also have different insurance companies. For a proper diagnosis, healthcare providers require a complete view of a patient’s health status, based on the aggregation of all of the patient’s medical records. Each provider commonly has its own database, so a healthcare provider may have to request a patient’s EMR from other providers. The interoperation and sharing of data among different EMR systems is extremely slow. As a result, there is a need for a common place for storing EMRs to accelerate their sharing [7]. Such a common place would remove the delay of transferring EMRs back and forth between different healthcare providers.

978-1-908320-08/7/$25.00©2012 IEEE

Cloud computing is a promising way to solve this problem. Moving data into the cloud is convenient for users, because they do not need to deal with the complexities of direct hardware management [8]. It also helps developers create different healthcare applications that share the same data, saving the time otherwise spent gathering patients’ data from different sources.

Currently, cloud computing and open standards can be a significant foundation for streamlining healthcare. They can be used for maintaining medical records, monitoring patients, managing care and diseases efficiently, and analysing patients’ data. It is popularly believed that using clouds to manage and administer healthcare applications will result in a revolutionary change in the way healthcare is delivered today. Making access to healthcare data ubiquitous will not only improve healthcare, because the data will be accessible from anywhere at any time, but will also help cut costs drastically. A fundamental step towards successfully moving healthcare into the cloud is an in-depth understanding of how to use cloud computing models effectively.

In 2003, the Privacy Rule issued under the federal Health Insurance Portability and Accountability Act (HIPAA) established a national standard for the privacy of health information. The entities covered by HIPAA are healthcare providers, health plans, and healthcare clearinghouses. According to HIPAA, the data records are maintained and transmitted in the form of electronic records. The main aim of HIPAA’s Privacy Rule is to protect individuals’ health information while still allowing the flow of health information needed to provide high-quality healthcare and to protect the public’s health [2].

In this paper, a cloud computing system, “MedCloud”, is proposed for storing EMRs. The main objective is to provide a common platform for developers to use instead of individual platforms, which simplifies the development phase. The MedCloud system provides users with the fundamental services for building an efficient healthcare cloud application. The proposed system uses the Hadoop ecosystem for the server implementation. Column-oriented databases running on top of distributed file systems are suitable for data storage and analysis. Users access the system through the Restlet web framework. A detailed description of the system architecture is also provided; the proposed model addresses privacy and security concerns based on HIPAA, and


some functional parts of the system have been implemented. Finally, performance measurements for the system are presented. The rest of this paper is organized as follows: Section II discusses related work, Section III gives a detailed description of the proposed architecture, Section IV presents the implementation and performance evaluation of the proposed model, and Section V concludes the paper.

II. RELATED WORK

Currently, Amazon Elastic Compute Cloud [9] is one of the most popular cloud platforms. It provides a virtual computing environment in which users run their applications, resulting in improved performance, but it is inflexible when it comes to sharing data. It also implements a pay-per-time pricing model. Google App Engine [10] is another cloud platform, which allows users to run web applications written in the Python and Java programming languages, but it provides only limited function modules. It also provides a web-based Administration Console for managing running web applications, but it is unsuitable for high-performance distributed applications [11].

In the medical field, various medical software systems have been developed worldwide in an attempt to improve healthcare. Care2X is an open source hospital information system that implements a client-server architecture. Its benefits are flexibility, ease of use, the ability for developers to build their own tools, easy selection of different departments and stations, and strong support from the Care2X community. Its major drawbacks are that there is no real standard between modules and that it is still under development and needs considerable further work. Furthermore, it has poor documentation and deficient security measures [12].

In [13], the authors proposed an open source private cloud solution for rural healthcare in India. They deployed Care2X on the private cloud and restricted their application to serving only rural areas in emergencies.
However, if it were applied to all hospitals and patients in a country, it could help save patients’ lives at any place and at any time. Moreover, statistics about the most common diseases would help in finding new ways to reduce vulnerability to these diseases.

Although these systems seem to be competent, they cannot achieve high-performance computing and complex analysis. A platform built specifically for the medical domain is therefore recommended. Since cloud computing has the potential of becoming a revolutionary technology in domain-oriented service computing, a cloud-oriented approach would be a good choice for developing a scalable medical system.

III. MEDCLOUD: A CLOUD COMPUTING SYSTEM

This section gives a detailed explanation of the MedCloud system. The system requirements are addressed first. After that, the building blocks of the system are illustrated. A sequence diagram then shows the milestones of a user’s access to the system.


A. System Requirements

In this part, we address the system requirements with respect to the HIPAA privacy and security rules [2]. Although HIPAA is an American standard, it can easily be applied in other countries. The system features are customizable depending on the country, covered entities, and users.

1) Requirements:
• Users are categorized by privilege level, and the access level is defined during registration.
• The hospital or medical institution chooses the access level for its users from a predefined list of access categories that is set with respect to HIPAA’s privacy and security rules [2].
• The Notice of Privacy Practices (NPP), which contains the policies and regulations of the Privacy Rule [2], should be posted on the system’s website.
• Participants can submit a list of authorized persons to the HIO (Health Information Organization), depending on the country.
• Any documentation that requires signatures may be provided as a scanned image of the signed documentation or as an electronic document with an electronic signature.
• Based on HIPAA’s Security Rule, covered entities have to apply restrictive security measures:
  - The medical information system should apply technical policies and procedures, and implement the required hardware and software, so that only authorized persons can access e-PHI.
  - Secure data transmission is required.

B. Proposed Architecture

In this section, the implementation and components of the proposed cloud computing system “MedCloud” are described. In the MedCloud model, domain-specific services were built for users; the services were selected and designed based on the requirements described above. Users are mainly developers, who use the system to build new applications that benefit from shared data. The MedCloud system is application agnostic: different applications can access the system simultaneously through the web portal.
The system is divided mainly into three layers (the data layer, the server layer, and the application layer) plus the client. With the explosive growth of medical information, the real challenge is how to manage the computation and storage requirements effectively. The Data Layer provides efficient data storage. The Server Layer is responsible for distributed computing and management of the whole system.

Fig. 1. The architecture of MedCloud

1) Data Storage Layer: As seen in Fig. 1, there is an EMR store for storing medical information. A distributed file system is necessary for storing EMRs. It is a file system designed for storing very large files with streaming data access patterns. It also runs on clusters of commodity hardware and continues working even if a node fails, which is important for guaranteeing availability. However, it is not a general-purpose file system and does not provide fast individual record lookups in files. Consequently, a data warehouse that supports fast record lookups and updates for large tables is required.

Applications such as statistical analysis of EMRs and medical imaging transfer require terabytes or petabytes of storage to fulfil their computational and storage requirements. SQL-like databases are restricted to certain capacities and are unable to handle massive storage needs [14]. This led to the development of horizontally scalable, distributed, non-relational data stores, called NoSQL databases. Because they offer quick reads and writes, mass storage support, easy expansion, and low cost, NoSQL data stores best suit the current demands [15]. They are categorized into document stores, key-value stores, and column-oriented databases [16].

MedCloud uses a column-oriented data store. NoSQL data stores of this kind are used for data-warehousing applications, high-performance data analysis, and business intelligence processing. MedCloud needed an efficient data warehousing tool, and a column-oriented data store running on top of a distributed file system has the potential to yield the required high performance. Furthermore, column-oriented databases apply the column-family methodology; each row in a table is uniquely identified by a single row key, and there are no foreign keys [17]. A sample design for the patient table and the visit table is given in Table I and Table II, respectively.

TABLE I
PATIENT TABLE

Row-key        Column family   Columns
(patient ID)   info:           First Name, Middle Name, Last Name, Date Of Birth, Gender, Marital Status, Address, Telephone, Blood Group

TABLE II
VISIT TABLE

Row-key                  Column family   Columns
(visit ID)(patient ID)   info:           AdmissionDate, DischargeDate, Room num, Physician ID
                         cardiac:        Diagnosis ID, LabTest ID, Symptom ID, Sign ID, RiskFactor ID

In the Patient Table, there is one column family that contains most of the necessary information for a patient. Some information, such as the patient’s ID and full name, is mandatory; the rest is optional, which provides flexibility in information gathering, because hospitals and clinics may differ in the information they have available. In the Visit Table, a visit can be any hospital or clinic visit. There are two column families in the table: one for the basic information about a visit and the other for information about cardiac diagnosis. If more diagnoses need to be added to the system, more column families may be added.

Table III shows some of the services provided by the system, for instance the addition, deletion, and retrieval of patients’ information. Other services can be added to the system. The major advantage of this platform is that it is customizable, and services can be added later without affecting the system’s efficiency.

TABLE III
SERVICES TABLE

Provided Service | Requirements | Description
AddPatient | All information is required | Add new patient if not found
DeletePatient | Patient’s ID | Delete patient’s data from the database
UpdatePatientByID | Patient’s ID | Update patient’s data if ID is given
UpdatePatientByName | Patient’s name | Update patient’s data if name is given
RetrievePatientByID | Patient’s ID | Show patient’s data if ID is given
RetrievePatientByName | Patient’s name | Preview patient’s data if name is given
RetrievePatientHistory | Patient’s ID, start date, and end date | Show the patient’s medical history for a specific period of time
CountPatientsByDiag | Type of diagnosis and current year | Count the number of patients suffering from a particular diagnosis for the current year
CountPatientsByDiagYears | Type of diagnosis, start date, and end date | Count the number of patients suffering from a particular diagnosis for a specific period of time

2) Server Management Layer: The proposed architecture is a master-slave architecture. The master has two main components:
• Query Manager: a vital element in the system. It accepts the queries from the application layer. It holds the metadata of the file system and the data locations in the database that are required for each query. It also controls system-wide activities such as garbage collection of unused chunks and chunk migration between slaves.
• Concurrency Manager: responsible for managing and distributing the jobs/requests across the slaves in coordination with the Query Manager. It is also used for data replication across the slaves.

In the slave part, there are:
• Data Storage Manager: a worker that handles data storage and stores the data files in the distributed file system.
• Task Manager: the slave counterpart of the Concurrency Manager; it is in charge of instantiating and monitoring individual tasks within a job.
• Coordination Manager: the key component in the MedCloud system. It handles and manages the requests and responses in the case of multi master-slave communication.

3) Application Layer: The function of this layer is to provide services for users. An application is a collection of software components (services) that work together to achieve a specific goal. A user requests a service via network access. HTTP (Hypertext Transfer Protocol) [18] is the standard network protocol of the Internet; therefore, a web framework with an HTTP server is required so that users can pass service requests over HTTP. The application server accepts a request, compares it with the available services, and then replies according to different response strategies. Note that the services are published based on the REST architectural style [19]. All the components of this layer run on top of the web framework. The function of the application layer is introduced through the following elements:
• Authenticator: responsible for validating the client’s login details, i.e. logging in and out of the system. Healthcare providers and medical employees are the main users of a healthcare system. Naturally, employees, including administrative staff, clinical staff, management, and IT staff, should be authenticated before accessing EMRs.
• Authorizer: Since patients’ information is private and critical, the HIPAA privacy and security rules are applied within the system. This enforces the required security measures and helps gain users’ trust. The Authorizer is the crucial part that distinguishes a user’s permissions and shows the services permitted for this user based on the HIPAA privacy rules. A healthcare provider has to assign a person for adding users to and deleting users from the system.
• Request Receiver: receives a service request and invokes the corresponding service implementation; the Server Layer processes the request, and the response is then routed back to the user.
• Data Integrator: checks and validates the input data during any operation.
• NPP Registry: contains the HIPAA privacy and security policies and regulations.
• Authorizers Registry: holds a list of all the authorized users. Once a healthcare provider adds a user, the user is automatically added to this registry.
• Services Registry: comprises all the services provided by the MedCloud system. It corresponds to Table III. Any new services are added to this registry.
• Disclosure Tracker: records all the users or healthcare providers who accessed each patient’s medical records.
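As a rough illustration of how the Authorizer, Authorizers Registry, and Disclosure Tracker described above could fit together, the following Python sketch checks whether a user's access category permits a requested service and logs every granted access. It is not the paper's implementation: the category names, the mapping of categories to Table III services, and all class and function names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical access categories and the Table III services each may call.
# The paper does not enumerate its categories; these are placeholders.
SERVICES_BY_CATEGORY = {
    "clinical_staff": {"RetrievePatientByID", "RetrievePatientHistory",
                       "UpdatePatientByID"},
    "administrative_staff": {"AddPatient", "RetrievePatientByID"},
    "it_staff": set(),
}


@dataclass
class User:
    username: str
    category: str


@dataclass
class DisclosureTracker:
    """Records which user accessed which patient's record, and when."""
    entries: list = field(default_factory=list)

    def record(self, user, patient_id, service):
        stamp = datetime.now(timezone.utc)
        self.entries.append((stamp, user.username, patient_id, service))

    def accessed_by(self, patient_id):
        return [name for _, name, pid, _ in self.entries if pid == patient_id]


class Authorizer:
    def __init__(self, registry, tracker):
        self.registry = registry  # Authorizers Registry: username -> User
        self.tracker = tracker

    def permitted_services(self, username):
        user = self.registry.get(username)
        if user is None:
            return set()  # unknown users are granted nothing
        return SERVICES_BY_CATEGORY.get(user.category, set())

    def authorize(self, username, service, patient_id):
        if service not in self.permitted_services(username):
            return False
        # Only granted accesses are logged as disclosures in this sketch.
        self.tracker.record(self.registry[username], patient_id, service)
        return True
```

A caller would build the registry when a healthcare provider adds users, then call `authorize` before dispatching each service request; `accessed_by` answers the question of who has seen a given patient's record.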

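Returning to the data layer, the row-key and column-family layout of Tables I and II can be sketched with plain Python dictionaries. This is only an in-memory model of the schema, not HBase code: the zero-padding width of the composite key, the column spellings such as `info:FirstName`, and the helper names are assumptions for illustration.

```python
# Column families taken from Tables I and II. Columns are addressed as
# "family:qualifier", the usual convention in column-family stores.
PATIENT_FAMILIES = {"info"}
VISIT_FAMILIES = {"info", "cardiac"}


def visit_row_key(visit_id, patient_id, width=8):
    """Build the composite (visit ID)(patient ID) row key.

    Zero-padding to a fixed width is an assumption; it keeps keys the same
    length so that they sort consistently as strings.
    """
    return f"{visit_id:0{width}d}{patient_id:0{width}d}"


def put(table, families, row_key, column, value):
    """Store one cell, rejecting columns outside the declared families."""
    family, _, qualifier = column.partition(":")
    if family not in families or not qualifier:
        raise ValueError(f"unknown column {column!r}")
    table.setdefault(row_key, {})[column] = value


patients, visits = {}, {}

# Patient Table: one row per patient, a single "info" family.
put(patients, PATIENT_FAMILIES, f"{42:08d}", "info:FirstName", "Jane")
put(patients, PATIENT_FAMILIES, f"{42:08d}", "info:BloodGroup", "O+")

# Visit Table: basic visit data plus the cardiac diagnosis family.
key = visit_row_key(7, 42)
put(visits, VISIT_FAMILIES, key, "info:AdmissionDate", "2012-09-01")
put(visits, VISIT_FAMILIES, key, "cardiac:DiagnosisID", "D1")
```

Supporting another diagnosis type would amount to adding one more name to `VISIT_FAMILIES`, which mirrors the paper's remark that further column families may be added.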
C. Issues Analysis

The MedCloud framework will provide a set of services that work across all programming models and that support storage, file management, monitoring, and security. Developers and end-users are the two kinds of users in the system, and each uses the system in a different way.

1) Application Deployment: The MedCloud system provides a specially crafted software development kit (SDK) with which developers create their own applications without having to know the system’s internals. After a developer completes an application, it can easily be deployed. As stated before, a specific platform is provided for developers to use in designing their own applications. Pricing and usage regulations will be formulated in future work.

2) Service Request: Fig. 2 shows the sequence diagram of the request handling phases for the retrievePatient(ID) service.





• Steps 1, 2: The client accesses the system securely through the web server.
• Steps 3, 4: The authenticator validates the client’s login details.
• Steps 5, 6: If they are valid, the login details are passed to the access controller, which sends a list of the services granted to this client based on the HIPAA privacy rules.
• Steps 7, 8: The client sends a retrievePatient(ID) request to the Service Request Handler, which forwards the incoming request to the Patient Server Resource.
• Step 9: The Patient Server Resource contains the definitions of the available services. It creates a new HBase client instance.


Fig. 2. Sequence Diagram for a read request

Fig. 3. MedCloud Deployment Model

• Steps 10, 11: The request is handled through communication between the HBase client and the HBase server.
• Steps 12, 13: The HBase client calls the getRow(ID) method, which searches for a particular row across the HBase region servers via the HBase master server.
• Steps 14, 15: The response, either the requested patient record or a “not found” message, is sent to the client.
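The request sequence above can be walked through with a small, self-contained Python sketch. It replaces the web server, access controller, and HBase cluster with in-memory stand-ins, so it shows only the order of the checks, not the actual Restlet or HBase calls. The credential store, the granted-services table, the numeric status codes, and every function name here are assumptions for illustration.

```python
# In-memory stand-ins (all hypothetical). A real system would never keep
# plaintext passwords; this is only to make the flow runnable.
USERS = {"dr_a": "s3cret"}
GRANTED = {"dr_a": {"retrievePatient"}}
PATIENT_TABLE = {"42": {"info:FirstName": "Jane", "info:LastName": "Doe"}}


class FakeHBaseClient:
    """Stand-in for the HBase client instance created in step 9."""

    def __init__(self, table):
        self._table = table

    def get_row(self, row_key):
        # Steps 10-13: look the row up; None means it does not exist.
        return self._table.get(row_key)


def authenticate(username, password):
    # Steps 3-4: validate the login details.
    return username in USERS and USERS[username] == password


def retrieve_patient(username, password, patient_id):
    if not authenticate(username, password):
        return 401, "invalid login"
    # Steps 5-6: the access controller lists the services granted to the user.
    if "retrievePatient" not in GRANTED.get(username, set()):
        return 403, "service not granted"
    # Steps 7-9: the request reaches the patient resource, which opens a client.
    client = FakeHBaseClient(PATIENT_TABLE)
    row = client.get_row(patient_id)
    # Steps 14-15: return the record or a "not found" message.
    if row is None:
        return 404, "not found"
    return 200, row
```

For example, `retrieve_patient("dr_a", "s3cret", "42")` returns the stored row with status 200, while an unknown patient ID yields `(404, "not found")`.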

Fig. 4. Requests per second versus number of machines

IV. IMPLEMENTATION AND PERFORMANCE EVALUATION

In this section, the physical components of the system are described. Fig. 3 shows the deployment model of the MedCloud system. The MedCloud architecture assigns two nodes as masters for the management of the Hadoop cluster. As for the slaves, the system can add as many as needed, because Hadoop is designed to scale out as nodes are added. The cluster is implemented using the fifth update of Cloudera’s third release (CDH3u5). Cloudera’s package includes Hadoop, which is used to achieve scalability, and its necessary services.

In the practical design, Hadoop is deployed on 23 cloud servers from Rackspace Open Cloud, including 3 major nodes for cluster management. The first master node runs the NameNode [1] together with the HMaster and ZooKeeper; it performs the tasks of the query and concurrency managers. A ZooKeeper cluster is required to handle the job of the coordination manager, so ZooKeeper runs on 3 nodes. The second master runs the Secondary NameNode, the JobTracker, and ZooKeeper. The third node runs ZooKeeper only. As for the data nodes, i.e. the slaves, 20 nodes are used, each of which runs a region server on top of a DataNode, together with a TaskTracker. The data nodes handle data storage management and job execution.

After the Hadoop cluster was configured, Thrift was used as the client API for communicating with HBase, because of its simplicity and noticeable benefits [20]. Once the connection with the HBase server is established, the Restlet Framework (restlet 2.1rc1) is used as the web platform. The masters are HP machines with an Intel(R) Core(TM)2 Quad 2.66 GHz processor, 320 GB of disk storage, and 8 GB of RAM. The slaves have the same specifications, except for 160 GB of disk storage.
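The node roles described above can be written down as a simple mapping, which makes it easy to check that the counts add up to the 23 servers. The hostnames below are invented for this sketch; the role assignment follows the text. The helper `tolerated_failures` relies on the general property that a ZooKeeper ensemble needs a majority of its servers to be up, which is background knowledge rather than something the paper states.

```python
# Role layout of the cluster as described in the text (hostnames are made up).
CLUSTER = {
    "master-1": ["NameNode", "HMaster", "ZooKeeper"],
    "master-2": ["SecondaryNameNode", "JobTracker", "ZooKeeper"],
    "master-3": ["ZooKeeper"],
}
for i in range(1, 21):
    # Each slave runs a region server on a DataNode, plus a TaskTracker.
    CLUSTER[f"slave-{i:02d}"] = ["DataNode", "RegionServer", "TaskTracker"]


def nodes_running(role):
    """Return the sorted hostnames that run the given role."""
    return sorted(name for name, roles in CLUSTER.items() if role in roles)


def tolerated_failures(ensemble_size):
    """Number of servers a majority-quorum ensemble can lose and keep working."""
    return (ensemble_size - 1) // 2
```

With this layout, `len(CLUSTER)` is 23, `nodes_running("ZooKeeper")` lists the three management nodes, and `tolerated_failures(3)` is 1, so the coordination service survives the loss of a single ZooKeeper node.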


The test measures the system’s scalability by issuing random read requests to the HBase data store. ApacheBench is used as the benchmarking tool for measuring performance. The number of requests per second is computed for an increasing number of data nodes. The experiment parameters are 1000 input requests, a concurrency level of 100, and a 100 Mbps transfer rate. Every time 5 data nodes are added to the cluster, 200,000 records are added to the database as well: for instance, 5 data nodes (250,000 records), 10 data nodes (450,000 records), 15 data nodes (650,000 records), and so on. Each record contains 8 rows with 3 copies. As seen in Fig. 4, the system serves approximately 200 requests per second. Although the data grows progressively, the number of requests per second stays constant as the number of data nodes increases. This means that when the data size grows and data nodes are added accordingly, the MedCloud system maintains its performance level, which indicates that the system is scalable.

V. CONCLUSION

This paper presented a cloud-oriented approach to the development of medical systems. The main idea is the need for a medical system that people in different places can easily access and use, and that helps developers create their own healthcare applications sharing the EMRs. The proposed model offers a flexible and portable platform for application development. Scalability and privacy were the major concerns. The MedCloud system addresses these issues by deploying a Hadoop cluster for scalability and by designing the system based upon HIPAA requirements. We provided the users with an easy way to access the system


via the Restlet web server. Finally, the promising results of the conducted tests are a good indicator of the usability of the proposed system.

REFERENCES

[1] T. White, Hadoop: The Definitive Guide, 1st ed. Sebastopol, CA: O’Reilly Media, June 2009, ISBN 978-0-596-52197-4.
[2] U.S. Department of Health and Human Services, Summary of the HIPAA Privacy Rule, Office for Civil Rights, 200 Independence Avenue, S.W., Washington, D.C. 20201, May 2003.
[3] T. T. Jerome Lovel, Restlet in Action. Manning, 2011.
[4] J. Walther and C. de Jong, “Technology for more effective healthcare,” IEEE MultiMedia, vol. 16, no. 4, pp. 5–7, April 2009.
[5] Institute of Medicine, The computer-based electronic medical record: An essential technology for healthcare. NAP, Washington, DC, 1997.
[6] Y. Shuli, Y. Xiaoping, and L. Huiling, “Research on the EMR storage model,” in International Forum on Computer Science Technology and Applications (IFCSTA 2009), vol. 1, December 2009, pp. 222–226.
[7] P. Ray and J. Wimalasiri, “The need for technical solutions for maintaining the privacy of EHR,” in 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS ’06), September 2006, pp. 4686–4689.
[8] K. M. Yashpalsinh Jadeja, “Cloud computing - concepts, architecture and challenges,” in International Conference on Computing, Electronics and Electrical Technologies (ICCEET), 2012.
[9] (2012, September). [Online]. Available: http://aws.amazon.com/ec2/
[10] (2012, October). [Online]. Available: https://appengine.google.com/
[11] X. J.-B. Zeng Shu-Qing, “The improvement of PaaS platform,” in 2010 First International Conference on Networking and Distributed Computing (ICNDC), October 2010, pp. 156–159.
[12] C. Corporation. (2012, August). [Online]. Available: http://www.care2x.org/
[13] M. Lakshmi and J. Dhas, “An open source private cloud solution for rural healthcare,” in 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN), July 2011, pp. 670–674.
[14] D. J. Abadi, “Data management in the cloud: Limitations and opportunities,” IEEE Data Engineering Bulletin, vol. 32, no. 1, pp. 3–12, March 2009.
[15] J. Han, E. Haihong, G. Le, and J. Du, “Survey on NoSQL database,” in 2011 6th International Conference on Pervasive Computing and Applications (ICPCA), October 2011, pp. 363–366.
[16] S. C. S. Rabi Prasad Padhy, Manas Ranjan Patra, “RDBMS to NoSQL: Reviewing some next-generation non-relational database’s,” International Journal of Advanced Engineering Sciences and Technologies (IJAEST), vol. 11, no. 1, pp. 15–30, 2011.
[17] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, “Bigtable: A distributed storage system for structured data,” in Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’06), vol. 7. Berkeley, CA, USA: USENIX Association, 2006, p. 15.
[18] (2012, April). [Online]. Available: http://www.w3.org/Protocols/
[19] A comparison of SOAP and REST implementations of a service based interaction independence middleware framework, ser. WSC ’09, no. 10. Austin, Texas: Winter Simulation Conference, 2009.
[20] L. George, HBase: The Definitive Guide. O’Reilly Media, August 2011.
