Session S2E

Preparing for Future Data Center Professionals: Integrating Storage Technology into the Computer Information Technology Curriculum

Wei Hao, Hetal Jasani, Traian Marius Truta
Northern Kentucky University, [email protected], [email protected], [email protected]

Abstract - Information brings economic value to the customers and data is the "soul" of the enterprise. Data centers are playing increasingly important roles in enterprises, and storage technology is one of the fundamental technologies behind data centers. Storage knowledge and skills are needed by data center professionals. Thus, we have developed a new course, Storage Administration, for Computer Information Technology (CIT) major students at Northern Kentucky University (NKU). Since our CIT program emphasizes hands-on learning, we have built the course around hands-on laboratory components, which are developed with open source software and simulator software. In this paper, we describe the hands-on laboratory components in detail.

Index Terms – Storage Technology, CIT Education, Lab Modules.

1. INTRODUCTION

The demand for data fuels the expansion of storage requirements beyond traditional corporate databases and data warehouses. International Data Corporation (IDC) research [1] shows that the digital universe (information that is either created, captured, or replicated in digital form) totaled 281 exabytes in 2007. In 2011, the amount of digital information produced in the year was expected to reach nearly 1,800 exabytes, six times that produced in 2007. Storage technology is playing an increasingly important role in IT. A few universities have offered storage technology courses [2, 3, 4, 5]. In general, their approaches were theory heavy, and these courses failed to deliver hands-on laboratory experience for students. Hands-on labs help students better understand and apply what they have learned in class lectures. A second drawback of the existing courses is that emerging technologies, such as virtualization and Cloud computing, were not taught at all. These emerging technologies are changing the way future data centers are built, configured, and operated. The new knowledge and skills required of future data center professionals are not taught by the current storage curriculum.

To address the above problems, we have developed a new course entitled CIT 465/565 - Storage Administration for senior undergraduate students majoring in CIT and for graduate students in the Master of Science in Computer Information Technology (MSCIT) program at NKU. Since both of our CIT programs emphasize hands-on learning, we have developed not only lecture components but also laboratory components for the course. The lecture components cover three parts: storage fundamentals, storage networks, and emerging technologies and data centers. The storage fundamentals part focuses on fundamental storage concepts, such as storage devices, disk interfaces, disk geometry, disk partitions, disk performance, file systems, Redundant Array of Independent Disks (RAID), hot swap, Logical Volume Management (LVM), and storage planning. The storage networks part emphasizes Direct-Attached Storage (DAS), Storage Area Network (SAN), Network-Attached Storage (NAS), Network File System (NFS), Common Internet File System (CIFS), IP-SAN, Internet Small Computer System Interface (iSCSI), and Content-Addressed Storage (CAS). The emerging technologies and data centers part covers virtualization, storage virtualization, data centers, and Cloud computing.

Correspondingly, three laboratory modules are designed for those three parts. Lab module 1 is designed for the storage fundamentals part; it includes a hard disk installation lab, a disk performance monitoring and testing lab, and a software RAID and LVM lab. Lab module 2 is designed for the storage networks part; it includes an EMC Navisphere Manager Simulator lab and Openfiler labs. Lab module 3 is designed for the emerging technologies and data centers part; it includes a virtualization lab, a network optimization lab, and a Cloud computing lab. We successfully offered this course in Fall 2010.
At the end of the course, student evaluations showed that students not only learned storage concepts but also gained hands-on experience in managing storage systems. The students liked this course, especially the hands-on lab modules, which helped them better understand the storage technologies. In this paper, we discuss our hands-on lab modules. The rest of the paper is organized as follows: related work is described in Section 2. In Section 3, we describe lab module 1, consisting of lab exercises on storage fundamentals. We present lab module 2 on storage networks in Section 4. Section 5 discusses lab module 3, which includes the labs on emerging technologies and data centers. Section 6 summarizes the paper.

978-1-61284-469-5/11/$26.00 ©2011 IEEE October 12 - 15, 2011, Rapid City, SD 41st ASEE/IEEE Frontiers in Education Conference S2E-1

2. RELATED WORK

Several universities have offered storage-related courses in their undergraduate curricula. Michigan Technological University offered one such course, entitled "Storage Area Networking" [2]. This class covered the dominant mass storage technologies, specifically rotating magnetic and optical media. It also covered distributed network storage methods, such as iSCSI, DAS, NAS, and SAN technologies. Pennsylvania State University offered a course named "Designing High Availability Information Management and Storage Architectures" [3]. It focused on the concepts of DAS, NAS, SAN, and various SAN topologies. It also covered Fibre Channel architecture and Storage-over-IP technologies, such as Fibre Channel over IP (FCIP), Internet Fibre Channel Protocol (iFCP), iSCSI, and InfiniBand. Georgia Southern University offered another course, called "Storage Technologies" [4]. This course included modern storage infrastructure technologies such as SAN, NAS, DAS, CAS, storage virtualization technologies, local and remote replication, and backup and recovery. The same faculty presented another storage technology course [5], whose topics included the I/O system, minimal elements of queuing theory, storage networking protocols, the common information model, storage area protocols (iSCSI, iFCP, FCIP), business continuity, and disaster recovery.

3. LAB MODULE 1: STORAGE FUNDAMENTAL LABS

The primary objective of lab module 1 is to understand the fundamental concepts of storage. This module consists of three hands-on lab exercises: a hard disk installation lab, a disk performance monitoring and testing lab, and a software RAID and LVM lab.

1. Hard Disk Installation on a Linux Machine

To prepare for a storage administrator career, students need to know how to install a new disk. We designed this lab to help students become familiar with the hard disk installation procedure on a Linux machine. The procedure for adding a new disk involves the following steps: (1) connecting the disk to an HBA (Host Bus Adapter) and setting up the BIOS for the disk; (2) partitioning the disk; (3) creating filesystems within disk partitions; (4) mounting the filesystems; (5) setting up automatic mounting; (6) labeling disk partitions; (7) setting up swapping on swap partitions. The software and materials used for this lab include one Ubuntu 10.04 live CD and one removable SATA hard disk.
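Steps (2) through (5) can also be rehearsed without a spare drive by running the same tools against a file-backed image. The sketch below assumes e2fsprogs is available; the device name and mount point in the comments are hypothetical examples, not values from the lab.

```shell
# Practice the filesystem steps on a file-backed image instead of a
# physical disk (a real drive would appear as e.g. /dev/sdb).
truncate -s 64M disk.img                 # stand-in for the new disk
# Step (3): create an ext4 filesystem (no root needed on an image file).
if command -v mkfs.ext4 >/dev/null; then mkfs.ext4 -F -q disk.img; fi
# Steps (4)-(5) require root on a real system:
#   mount /dev/sdb1 /data
#   echo '/dev/sdb1 /data ext4 defaults 0 2' >> /etc/fstab
stat -c %s disk.img                      # image size in bytes
```

The sixth field of the fstab entry (fsck pass number) is 2 for non-root filesystems, which is why it appears above.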

2. Disk Performance Monitoring and Testing

An important task that storage administrators face daily is monitoring disk health status. They need to predict possible disk failures and prevent the loss of critical data. Thus, we introduce smartmontools [6] to students in this lab. Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) is a system in hard disks designed to report conditions that may indicate impending failure. Smartmontools is a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests. The purpose of S.M.A.R.T. is to warn a system administrator of impending drive failure while there is still time to take action, such as copying the data to a replacement device. Smartmontools comes with two programs: smartctl, which is meant for interactive use, and smartd, which continuously monitors S.M.A.R.T. attributes. In the lab, we ask students to use smartctl to enable S.M.A.R.T. support and offline data collection on the disk, check the overall health of the disk, run a self-test on the disk, and set up smartd to run tests automatically.

Storage planning is another responsibility of the storage administrator. In order to plan well, the administrator needs to understand the performance of storage devices. For example, if an application requires 1TB of storage capacity and performs 5000 IOPS (Input/output Operations Per Second), then the storage administrator needs to determine the number of disks needed to meet the application requirements. In this lab, we introduce several open-source disk performance test tools to students: hdparm [7], iostat [8], and iometer [9]. We ask students to use those tools to measure the performance of different storage devices, such as SATA, SCSI, and USB drives. Based on the measurements, students plot graphs to compare read/write and sequential/random access rates among the different storage devices.

3. Software RAID, Hot Swap, and LVM

RAID is a typical setup for storage systems. Storage administrators need to hot swap a bad disk at run time, but buying servers with hardware RAID and hot swap support is expensive. To save cost, we designed a software RAID lab that simulates a more costly hardware RAID environment. The software and materials used in the lab include one PC, one Ubuntu 10.04 live CD, and three 4GB USB flash drives.

The first part of this lab covers RAID 1 (mirror) configuration and hot swap. First, students boot up the PC with the Ubuntu live CD and then download and install the mdadm package [10]. Second, they connect two USB drives to the PC. The USB drives appear as SCSI devices on Ubuntu, such as /dev/sdb1 (the first USB drive) and /dev/sdc1 (the second USB drive). A RAID 1 array is created on the partitions of the two USB drives via mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1. Third, the students check the RAID status while the RAID 1 array is building. Two approaches are used to check the status: one is the mdadm command (for example, mdadm --detail /dev/md0); the other is the Linux proc interface (for example, more /proc/mdstat). Fourth, after the RAID 1 build process is complete, the students are asked to use the mdadm command to simulate a disk failure. For example, mdadm /dev/md0 -f /dev/sdb1 fails one USB drive in the RAID 1 array. Fifth, the students are asked to perform the hot-swap procedure to replace the "bad" disk (USB drive). The students replace the failed USB drive with a new one, and the new drive is added into the RAID array via mdadm /dev/md0 -a /dev/sdd1.

Storage administrators often use RAID and LVM together: RAID provides reliability, while LVM [11] provides the ability to add, remove, and resize partitions on demand. The second part of this lab covers RAID and LVM. First, the students download and install LVM on Ubuntu via apt-get install lvm2. Second, they create a physical volume via the pvcreate command. Third, they create a volume group via the vgcreate command. Fourth, they create a logical volume via the lvcreate command. Fifth, they create a filesystem on their logical volume via the mkfs.ext3 command. Sixth, the filesystem is mounted via the mount command. Seventh, the students extend the size of their logical volume via the lvextend command and resize the file system for the extended logical volume via the e2fsck and resize2fs commands.

The outcomes of this lab module are that students are able to
• Understand the functionalities of storage administration.
• Explain the theory and practice of RAID/LVM storage and management.
• Demonstrate the ability to predict possible storage failure and measure storage performance.

4. LAB MODULE 2: STORAGE NETWORKS LABS

Lab module 2 focuses on understanding the fundamental concepts of storage networks. This lab module consists of a Navisphere Manager Simulator lab and labs based on the open source storage software Openfiler.

1. Navisphere Manager Simulator

Storage administrators need to perform storage management on SAN disk array systems, and EMC CLARiiON is a popular SAN disk array product in the enterprise. In this lab, we use the Navisphere Manager Simulator to help students understand the configuration and management of EMC CLARiiON storage systems. The simulator can be downloaded from the EMC Education web site [12]. This lab includes (1) configuring storage pools; (2) creating LUNs (Logical Unit Numbers) for storage groups; (3) configuring snapshots; (4) creating clones; (5) creating SANCopy full and incremental sessions; (6) creating MirrorView synchronous and asynchronous images; (7) expanding a LUN to create metaLUNs; and (8) migrating a LUN to another LUN. Figure 1 shows the Navisphere Manager Simulator.
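Outside the simulator, CLARiiON arrays can also be driven from EMC's naviseccli command line. The sketch below echoes a few representative calls rather than executing them; the storage processor address and LUN number are placeholders, and the exact options available vary by array software release.

```shell
# Representative CLARiiON CLI calls (echoed so the flow is readable
# without a live array; SP address and LUN id are placeholders).
SP=10.0.0.10
echo "naviseccli -h $SP getagent"              # verify connectivity to the SP
echo "naviseccli -h $SP getlun 0"              # inspect LUN 0
echo "naviseccli -h $SP storagegroup -list"    # list storage groups
```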

Figure 1. Navisphere Manager Simulator.

2. Openfiler

Openfiler [13] is open source storage software based on the Linux operating system. Openfiler supports client access to storage at both the file and block levels. In this lab, we designed the following exercises: (1) configure Openfiler to support locally attached USB drives; (2) set up a NAS server to support the NFS and CIFS protocols; (3) set up a SAN server to support the iSCSI protocol. The software and materials used in this lab include one Ubuntu 10.04 live CD, one Openfiler CD, one removable hard disk, two USB flash drives, and two PCs.

The first part of this lab is to configure Openfiler to support locally attached USB drives. Students use one PC with the removable disk as a server. They install Openfiler on the removable disk and then boot it up. Two USB drives are connected to the Openfiler server as storage devices. Through the Openfiler GUI, students create logical volumes on the USB drives. Students also configure LDAP authentication for Openfiler. The other PC serves as a client machine; it boots up with the Ubuntu live CD to verify Openfiler's configuration. The Openfiler configuration is shown in Figure 2.

Figure 2. Openfiler configuration.

The second part of the lab is to configure Openfiler as a NAS server. Students configure access control rules and NFS/CIFS shares for the NAS server. Then they configure the Linux client machine to access the NFS shares on the NAS server, and they configure a Windows VM on the Linux client machine to access the CIFS shares. Figure 3 shows the NAS configuration.
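On the client side, verifying the NAS server boils down to two mount commands. In this sketch the server address, export path, and share name are hypothetical, and the commands are echoed rather than executed because real mounts require root and a live server.

```shell
SERVER=192.168.1.50   # hypothetical Openfiler address
# Linux client mounting the NFS export:
echo "mount -t nfs ${SERVER}:/mnt/vg0/nfs_share /mnt/nas"
# Linux client with cifs-utils (the Windows VM instead uses Map Network Drive):
echo "mount -t cifs //${SERVER}/cifs_share /mnt/cifs -o username=student"
```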

Figure 3. NAS configuration.

The third part of the lab is to use Openfiler to set up a SAN server, which supports the iSCSI protocol for block-level data access. Students configure access control rules for the SAN server and configure iSCSI targets on the server. Figure 4 shows the iSCSI target configuration on the server. Students download iSCSI initiator software from the Microsoft web site [14] and install it on the Windows VM on the Linux client machine. They configure the iSCSI initiator to access the iSCSI target devices on the server. Figure 5 shows the configuration of the iSCSI initiator.
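A Linux client could reach the same target with open-iscsi instead of the Microsoft initiator. The target address and IQN below are hypothetical (Openfiler generates target names under its iqn.2006-01.com.openfiler prefix), and the commands are echoed because discovery and login require root and a live target.

```shell
TARGET=192.168.1.50   # hypothetical Openfiler SAN address
# Discover targets on the default iSCSI port (3260), then log in:
echo "iscsiadm -m discovery -t sendtargets -p ${TARGET}:3260"
echo "iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.demo -p ${TARGET}:3260 --login"
# After login, the LUN appears as a new SCSI block device (e.g. /dev/sdb).
```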

Figure 5. iSCSI initiator.

The outcomes of this lab module are that students are able to
• Understand the functionalities of storage network administration.
• Set up a NAS server to support file-level data access via the NFS and CIFS protocols.
• Set up a SAN server to support the iSCSI protocol for block-level data access.

Figure 4. iSCSI target configuration.

5. LAB MODULE 3: EMERGING TECHNOLOGY AND DATA CENTER LABS

Information brings economic value to the customers and data is the "soul" of the enterprise. Data centers are playing increasingly important roles in enterprises. Emerging technologies, such as virtualization, network optimization, and Cloud computing, are changing the way future data centers are built, configured, and operated, and new knowledge and skills are required of future data center professionals. We designed this lab module to help students understand those emerging technologies and prepare them for future success. This lab module consists of three lab exercises: a virtualization lab, a network optimization lab, and a Cloud computing lab.

1. Virtualization

Virtualization is an important trend in enterprise IT and is transforming data centers. In this lab, VMware ESX servers [15] are used to help students understand virtualization concepts. We introduce datastores, virtual disks, the Virtual Machine File System (VMFS), and vStorage provisioning to students. Students use VMware vCenter software [16] to create VMs and VM templates on the ESX server (as shown


in Figure 6). Students are asked to compare the differences between vStorage thin provisioning and thick provisioning and to do storage planning based on the two provisioning approaches (as shown in Figure 7 [17]). Students are also asked to configure Openfiler as storage for the ESX server.

Figure 6. VMware vCenter server.
Figure 7. Storage provisioning.

2. Network Optimization

Network optimization is used to accelerate applications hosted at data centers via data caching, eliminating redundant transmissions, compressing and prioritizing data, and streamlining distributed file system protocols. Many commercial network optimization products, such as Cisco WAAS, Riverbed Steelhead, and Blue Coat ProxySG, are widely deployed in enterprise data centers. In this lab, we use Cisco routers/switches together with the open source Apache web server [18] and Squid proxy server [19] to design a small-scale network optimization exercise. We run an Apache web server on the VMware ESX server (vSphere server). We use the Squid proxy server software to build a hierarchical web cache [20] running on top of storage networks. We configure Cisco routers to transparently intercept HTTP requests from clients via the Web Cache Communication Protocol (WCCP) [21]. The intercepted HTTP requests are redirected via the Generic Routing Encapsulation (GRE) protocol [22] to the proxy server, where the web requests can be optimized. The setup for this lab is shown in Figure 8.

Figure 8. Network optimization setup.

3. Cloud Computing

Cloud computing is widely acknowledged to be the future of data centers. Thus, we designed a Cloud computing lab. In this lab, we use the open source Cloud computing software Eucalyptus [23] to build a small-scale private Cloud on top of storage networks. The Eucalyptus interface is compatible with the Amazon EC2 Cloud [24], and Eucalyptus is included in the Ubuntu Enterprise Cloud (UEC). Eucalyptus consists of five components: the Cloud Controller (CLC), Walrus, the Storage Controller (SC), the Cluster Controller (CC), and the Node Controller (NC). The CLC provides the interface with which users of the Cloud interact. Walrus provides persistent storage and access control for virtual machine images and user data. The SC provides block-level network storage for virtual machine images. The NC manages the virtual machine lifecycle. The CC operates as the liaison between the NC and the CLC. In this lab, students install the CLC, the CC, Walrus, and the SC on one machine, and they install the NC on another machine. They set up a private Cloud based on Eucalyptus, as shown in Figure 9. Based on the constructed private Cloud, students design a virtual lab prototype in which instructors can reserve virtual machine images for use by students.
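Once the private Cloud is up, students drive it with the EC2-compatible euca2ools client. The image id, keypair, and instance id below are placeholders, and the commands are echoed so the workflow can be read without a running Cloud.

```shell
# Typical euca2ools session against the private Cloud (placeholders:
# emi-12345678, mykey, i-12345678).
echo "euca-describe-availability-zones verbose"              # check registered NC capacity
echo "euca-run-instances emi-12345678 -k mykey -t m1.small"  # launch a VM instance
echo "euca-describe-instances"                               # watch pending -> running
echo "euca-terminate-instances i-12345678"                   # release the instance
```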

Figure 9. A private Cloud based on Eucalyptus.

The outcomes of this lab module are that students are able to
• Understand two important applications of storage networks: data centers and Cloud computing.
• Demonstrate the ability to design and build a small-scale data center and a small-scale Cloud computing environment.
• Work in a team and quickly get up to speed with various open source software, network equipment, and other hardware resources.

6. CONCLUSIONS

CIT 465/565 Storage Administration has been developed for CIT major students at NKU. We successfully offered this course in Fall 2010. At the end of the course, student evaluations showed that the students liked this course, especially the hands-on lab modules. The hands-on lab modules have facilitated active learning in the classroom. Through the developed lab modules, students have learned not only storage fundamentals but also emerging technologies. The knowledge and skills the students have learned from the labs will prepare them for future success in business.

REFERENCES

[1] IDC report, http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf, 2009.
[2] Davis B., SAT 3200 Storage Area Networking, http://www.ece.mtu.edu/faculty/btdavis/courses/mtu_sat4200_f07/, 2007.
[3] Cameron B., IST 402 Designing High Availability Information Management and Storage Architecture, 2008.
[4] Jovanovic V., CSCI 5090 Storage Technologies, http://cit.georgiasouthern.edu/cs/FacPics/vladan.html, 2010.
[5] Jovanovic V., Mirzoev T., "Teaching Network Storage Technology: Assessment Outcomes and Directions", Proceedings of the 9th ACM SIGITE Conference on Information Technology Education, Cincinnati, OH, USA, 2008.
[6] smartmontools package, http://sourceforge.net/apps/trac/smartmontools/wiki.
[7] hdparm utility, http://sourceforge.net/projects/hdparm/.
[8] iostat command, http://en.wikipedia.org/wiki/Iostat.
[9] iometer utility, http://www.iometer.org/.
[10] Vadala D., "Managing RAID on Linux", O'Reilly, 2002.
[11] LVM, Logical Volume Manager, http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux).
[12] Navisphere Manager Simulator, https://education.emc.com/default_guest.aspx.
[13] Openfiler storage software, http://www.openfiler.com/.
[14] Microsoft iSCSI Software Initiator Version 2.08, http://www.microsoft.com/downloads/en/details.aspx?familyid=12cb3c1a-15d6-4585-b385-befd1319f825&displaylang=en.
[15] VMware ESX server, http://www.vmware.com/products/vsphere/esxi-and-esx/index.html.
[16] VMware vCenter server, http://www.vmware.com/products/vcenter-server/overview.html.
[17] VMware vStorage thin provisioning, http://www.vmware.com/files/pdf/VMware-vStorage-ThinProvisioning-DS-EN.pdf.
[18] Apache web server, http://httpd.apache.org/.
[19] Squid proxy server, http://www.squid-cache.org/.
[20] Ross K., "Hash Routing for Collections of Shared Web Caches", IEEE Network, pp. 37-44, 1997.
[21] Web Cache Communication Protocol v2, http://www.cisco.com/en/US/docs/ios/12_0t/12_0t3/feature/guide/wccp.html.
[22] RFC 2784, Generic Routing Encapsulation (GRE), http://www.faqs.org/rfcs/rfc2784.html.
[23] Nurmi D., Wolski R., Grzegorczyk C., Obertelli G., Soman S., Youseff L., Zagorodnov D., "The Eucalyptus Open-Source Cloud-Computing System", Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, May 2009, Washington, DC.
[24] Vliet J., Paganelli F., "Programming Amazon EC2", O'Reilly Media, first edition, 2011.

AUTHOR INFORMATION

Wei Hao, Assistant Professor, Computer Science Department, Northern Kentucky University, [email protected].
Hetal Jasani, Assistant Professor, Computer Science Department, Northern Kentucky University, [email protected].
Traian Marius Truta, Associate Professor, Computer Science Department, Northern Kentucky University, [email protected].
