Availability benefits of Linux on z Systems

David Raften, [email protected]


CONTENTS

1 Why Linux on z Systems
  1.1 Availability, the hidden expense
2 The z System hardware level
  2.1 Designed for zero down time
  2.2 Call Home
  2.3 IBM zAware
3 Availability at the z/VM Hypervisor level
  3.1 Hardware virtualization at native speed
  3.2 Live Guest Relocation
  3.3 Thousands times easier on z System
4 The Linux level
  4.1 Linux Health Checker – LNXHC
5 Disaster Recovery
  5.1 Disaster Recovery Disaster
    5.1.1 D/R using recovery service provider
    5.1.2 D/R using in-house recovery site
    5.1.3 Maintaining a Consistency Group
  5.2 GDPS/PPRC and GDPS Virtual Appliance
6 Summary
Appendix A – Selected z System availability features
  6.1 Unplanned outage avoidance
  6.2 Planned outage avoidance
Appendix B – References


Data center managers are constantly being asked to do more with a fixed or declining budget. Yet the data center needs more capacity to support more workloads, and it must be more secure and more available. With these external forces, business as usual does not work.

There are four categories of data center expenses:

• Hardware
• Software
• People / System Management
• Facilities

In most places in the world, most of the I/T budget goes to either software fees or system management. While a single z System server may be more expensive than a single x86-based server, server hardware is actually one of the smallest components of the budget. In fact, hardware costs have tended to remain constant for over 10 years while software and system management costs continue to increase. Moving workloads to Linux on z Systems can help reduce expenses by significantly reducing software and people/system management costs, simplifying the infrastructure and avoiding "server sprawl", while also reducing environmental expenses. This frees more resources to concentrate on developing a secure, modern, highly available infrastructure.

1 WHY LINUX ON Z SYSTEMS

The Linux operating system is designed to run on any platform: x86, Power, or z System. The major difference between a Linux distribution running on a "PC" and one on a z Systems mainframe is the device drivers and the instruction set architecture used to interface with the host hardware. The Linux kernel, GNU environment, compilers, utilities, network protocols, memory management, process management, and so on are all the same. Linux is designed so one can create a Linux application on one platform and run it on any other. More importantly, it has the same system management interface on any platform. From a system programming or application programming point of view, it does not matter what the underlying hardware is.

Although the Linux operating system and applications do not care what the underlying hardware is, it does matter from a cost and availability perspective. In the past, many data centers chose to run a single application at a time on a server, often getting 10% utilization out of the server. Today, with virtualization such as VMware, the utilization is higher, but not by much. When one considers the number of servers configured for:

• Development
• Quality Assurance / Test
• Production
• Backup for production at the primary site
• Then doubled for disaster recovery

the utilization is often no more than 35%. Although some users run the primary production server at a higher utilization, the average across the data center is low. This is even more dramatic when one considers that the configuration for a single application is then duplicated for each of the hundreds or thousands of applications being run. Each of the tens of thousands of servers incurs expenses for:




• Software. Often the biggest data center expense; many products charge by the number of cores on the server. It doesn't matter if the server is 35% busy, 100% busy, or 0% busy.

• System Management. Also a large part of the data center budget: how do you maintain all the software to keep it current? How do you maintain the hardware?

• Facilities. Each server uses electricity, floor space, and cooling chiller systems. The availability of electricity has driven many companies to spend millions of dollars to create new data centers away from cities, where power is more readily available.

• Hardware. If you buy a product expecting to use all of it, are you happy if you can only use a third? Why is this acceptable for servers?

Because z System can improve the average utilization of all the servers to near 100%, significantly fewer servers are needed to run the workload. The savings can be even greater by then using faster servers. Many sites have seen a 20:1 reduction in the number of cores after moving applications to Linux on z, with up to a 40% reduction in total data center expenses.
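As a rough illustration of the consolidation arithmetic behind such claims, a short Python sketch; the utilization figures, capacity ratio, and example input are assumptions chosen for the example, not a sizing methodology:

# Illustrative only: back-of-the-envelope consolidation arithmetic.
# The utilization figures, the per-core capacity ratio, and the example
# input are assumptions, not a sizing methodology.

def consolidated_cores(distributed_cores,
                       avg_distributed_util=0.35,
                       avg_target_util=0.95,
                       per_core_capacity_ratio=7.0):
    """Estimate how many cores are needed after consolidation."""
    useful_work = distributed_cores * avg_distributed_util
    needed = useful_work / (avg_target_util * per_core_capacity_ratio)
    return max(1, round(needed))

# Example: 2,000 distributed cores at 35% average utilization.
print(consolidated_cores(2000))   # roughly 105 cores, about a 19:1 reduction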

1.1 AVAILABILITY, THE HIDDEN EXPENSE

When calculating expenses, hardware, software, system management, and facilities costs are what is typically tracked. But there is an expense that is not often looked at: the cost to the business when applications are unavailable. Depending on the industry, this cost can be $1 million or more for each hour of downtime. It can be estimated by summing (a simple sketch follows below):

• Missed business opportunity. Look at the number of transactions not run during the period of the outage and the average revenue generated by each transaction. This needs to be adjusted for the transactions that can be deferred until the system comes up again.

• Loss of productivity. What is the hourly cost of all the affected employees who can no longer do their job? What is the hourly cost of the data center?

• Loss of brand image and customers. If the system is often unavailable, or even just performing badly, how many customers will permanently move to your competition?

• Other factors. These include financial penalties, overtime payments, and wasted goods.

The cost per minute of an outage increases with the duration of the outage. The impact on customer service is subjective and depends on how frequently outages occur and for how long. The more customers are affected by an outage, the greater the chance of them taking their business elsewhere. Different hardware platforms have different availability characteristics, and this affects the bottom line of the Total Cost of Ownership for the application solution.
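A minimal Python sketch of how the measurable components above might be summed into an hourly outage cost; every figure is a hypothetical placeholder, and intangibles such as lost brand image are omitted because they resist simple quantification:

# Hypothetical figures for illustration only; every value below is an
# assumption, not data from this paper. Intangibles such as lost brand
# image are omitted because they are hard to quantify.

def outage_cost_per_hour(txn_per_hour, revenue_per_txn, deferred_fraction,
                         affected_employees, hourly_labor_cost,
                         other_hourly_costs):
    """Sum the measurable outage-cost components described above."""
    missed_business = txn_per_hour * revenue_per_txn * (1 - deferred_fraction)
    lost_productivity = affected_employees * hourly_labor_cost
    return missed_business + lost_productivity + other_hourly_costs

# Example: 50,000 transactions/hour at $15 each, 40% deferrable,
# 800 affected employees at $60/hour, $25,000/hour in penalties and waste.
print(outage_cost_per_hour(50000, 15, 0.40, 800, 60, 25000))   # 523000.0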


2 THE Z SYSTEM HARDWARE LEVEL

2.1 DESIGNED FOR ZERO DOWN TIME

The IBM z Systems "mainframe" servers were designed with over 50 years of experience, with availability as the primary core value. The "z" in z System stands for "zero down time." Almost all the features developed for z Systems are available to any operating system and application being hosted, including z/VM and Linux. In every generation, the z System looks for new ways to provide additional protection for its core components with redundancy and seamless failover. These components include the core CPs, cache, memory, I/O, power, and cooling. Other areas are addressed as well, such as security, since an outage caused by an external attack is still an outage, and interaction with applications to proactively detect problems before they occur. As soon as the first Linux transaction runs on z Systems, it gets all the availability protection that the z System is known for, without any application change.

IBM has a requirement that each System z server be better than its predecessor. To accomplish this, for each of the major subsystems the z Systems addresses the different levels of availability concerns. Some major functions include the following:

• Unplanned outage avoidance. While many platforms provide "n+1" components, the z Systems goes further. It has dual power supplies and Transparent CPU Sparing: if there is a problem with any general or special purpose core, spare cores that come with the server detect this and take over. This is invisible to the applications, which continue without any interruption. z Systems has a Redundant Array of Independent Memory (RAIM). Based on the RAID concept for disk, memory can be set up to recover if there are any failures in a memory array, providing protection at the dynamic random access memory (DRAM), dual inline memory module (DIMM), and memory channel levels. There is extensive error detection and correction on all components, including the bus and fabric. For security there is a hardware-based, tamper-resistant cryptographic accelerator, and the z System is the only class of servers to obtain EAL Level 5 certification. Every part is stress tested multiple times during the different manufacturing phases.



• Planned outage avoidance. Every major hardware component supports dynamic maintenance and repair. This encompasses the Cores (including oscillators), Power, Cooling, Memory (including the I/O cage, STIs, and channels), and the cryptographic processor. The z System supports dynamic firmware updates and dynamic driver load updates, which can be done while your key applications are running. Dynamic I/O reconfiguration allows redefinition of channel types, dynamic swapping of processor types allows for price advantages, and dynamic LPAR add allows for workload flexibility.



• Power and Thermal management. While providing the ability to save on data center power consumption, power and thermal management also provides improved availability for the server. Static power save mode is designed to reduce power consumption on z System servers when full performance is not required. It can be switched on and off during runtime with no disruption to currently running workloads, aside from the change in performance. In addition to providing a 20% - 30% reduction in power consumption (depending on system configuration), the availability benefit comes from better silicon reliability when operating at lower temperatures, as well as less mechanical component wear. Typical examples of using static power save mode include:

  o Periods of lower utilization - weekends, third shift.

  o Capacity backup systems - systems used for emergency backup; keep them "running" but reduce energy consumption. Systems can quickly be brought back to full performance.

• Reduced IBM interaction ("touches") with customer systems. The design includes extra hardware that will not be replaced on first failure at the customer site, better problem management, and better diagnostics with first failure data capture.

A more detailed, although not exhaustive, list of availability features of the z Systems can be found in Appendix A – Selected z System availability features. The z System hardware provides other features to improve availability through proactive error detection and notification. These include:

• Call Home

• IBM zAware

2.2 CALL HOME

The Call Home service is an automated notification process that detects problem conditions on the server and reports them to IBM Support, sometimes even before the problem manifests itself. The service watches your system for error conditions such as:

• Primary SE loss of communications with the Alternate SE
• Memory Sparing Threshold is reached
• High humidity is sensed inside the machine
• Alternate SE is fenced due to automatic switchover

The server also looks for degraded conditions, where the server is still operating but some hardware is not working:

• Loss of channels due to CPC hardware failure
• Loss of memory
• The drawer is no longer functioning
• Capacity BackUp (CBU) resources have expired
• Processor cycle time reduced due to a temperature problem
• CPC was IMLed during cycle time reduction
• Repeated intermittent problems

When it detects a problem, the Call Home service automatically gathers the basic information needed to resolve the problem and sends an email with log files or other diagnostics for the failure condition. IBM Support processes the email information, opens a Problem Management Record (PMR), and assigns it to a support engineer who investigates the problem. The Call Home service ensures that the PMR contains the required information about the system and problem. The IBM Support server also sends an email to alert a designated administration contact with the PMR number. The ability to diagnose and report on troublesome but still-working components has at times allowed IBM customer engineers (CEs) to arrive with a replacement and dynamically change the part before any failure has occurred.

2.3 IBM ZAWARE

The IBM System z Advanced Workload Analysis Reporter (IBM zAware) is an integrated, self-learning analytics solution that helps identify unusual workload behavior based on message pattern recognition analytics. It intelligently examines Linux on System z messages for potential deviations, inconsistencies, or variations from the norm, providing out-of-band monitoring and machine learning of operating system health. Large operating system environments can sometimes generate more than 25 million messages per day, which can make manual analysis time-consuming and error-prone when exceptional problems occur. IBM zAware provides a graphical user interface (GUI) and APIs for easy drill-down into message anomalies, which can lead to faster problem detection and resolution, increasing availability. IBM zAware provides:

• Support for native or guest Linux on z Systems message log analysis
• The ability to process message streams with or without message IDs
• The ability to group multiple systems that have similar operational characteristics for modeling and analysis
  o Recognition of dynamic activation and deactivation of a Linux image into a group, and appropriate modeling and analysis
  o User-defined grouping. For Linux on IBM z Systems, the user can group multiple systems' data into a combined model: by workload (one for all web servers, one for all databases, and so on); by "solution" (for instance, one model for your cloud); or by VM host
• A heat map display, which provides a consolidated, aggregated, higher-level view with the ability to drill down to detail views
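As a purely conceptual illustration of message-pattern anomaly scoring (a toy Python sketch; it is not the IBM zAware implementation, its algorithms, or its APIs), one can compare current message frequencies against a learned baseline:

from collections import Counter
from math import log

# Toy illustration of baseline-versus-current message frequency scoring.
# IBM zAware's analytics are far more sophisticated; this only conveys the
# general idea of flagging deviations from learned message behavior.

def anomaly_scores(baseline_msgs, current_msgs):
    base, cur = Counter(baseline_msgs), Counter(current_msgs)
    base_total = sum(base.values()) or 1
    cur_total = sum(cur.values()) or 1
    scores = {}
    for msg_id, count in cur.items():
        expected = base.get(msg_id, 0) / base_total
        observed = count / cur_total
        # High score: a message appearing far more often than its baseline
        # rate, or one never seen during the learning period at all.
        scores[msg_id] = observed * log((observed + 1e-9) / (expected + 1e-9))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

baseline = ["EXT4-fs mounted", "CPU online", "eth0 up"] * 100
current = ["EXT4-fs mounted", "I/O error on dasda", "I/O error on dasda", "eth0 up"]
print(anomaly_scores(baseline, current))   # the I/O error scores highest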


3 AVAILABILITY AT THE Z/VM HYPERVISOR LEVEL

IBM z/VM is the premier mainframe virtualization platform, supporting thousands of virtual servers in a single footprint, more than any other platform. The z/VM hypervisor is designed and developed in conjunction with the z System hardware. As such, it can exploit new hardware functions for performance, security, and availability and pass these benefits on to its guests such as Linux, as well as z/VSE, z/OS, and z/TPF. There are many examples of z/VM exploiting hardware functions, including High Performance FICON (zHPF) for more I/O throughput, HyperPAV for less I/O contention, Simultaneous Multi-Threading (SMT) for performance, hardware-based cryptographic acceleration, zEDC Express for high-performance, low-latency hardware data compression that reduces disk space and improves channel and networking bandwidth, and of course all of the availability features described in the previous chapter.

Beyond the performance and security capabilities, from an availability perspective the power of z/VM is its ability to efficiently virtualize hardware components, together with its implementation of Live Guest Relocation.

3.1 HARDWARE VIRTUALIZATION AT NATIVE SPEED

The ability to virtualize hardware components adds another layer of availability. Efficient virtualization of processor, memory, communications, I/O, and networking resources helps reduce the need to duplicate and manage hardware, programming, and data resources. z/VM can significantly over-commit these real resources and allow users to create a set of virtual machines with assets that exceed the amount of real hardware available. This reduces hardware requirements and simplifies system management.

Because resources are virtualized, a z/VM guest sees only what z/VM presents to it. If there is a problem with a hardware component, z/VM can seamlessly switch to the redundant component and hide this from the guest. One example is the ability to balance workload across multiple cryptographic devices: should one device fail or be taken offline, z/VM can transparently shift Linux systems using that device to an alternate cryptographic device without user intervention.

Another example is Multi-VSwitch Link Aggregation support. It allows a port group of OSA-Express network adapter features to span multiple virtual switches within a single z/VM system or between multiple z/VM systems. A single VSwitch can provide a link aggregation group across multiple network adapters and make that highly available connection available to guests transparently. Sharing a Link Aggregation Port Group among multiple virtual switches increases optimization and utilization of the OSA-Express adapters when handling larger traffic loads and enables sharing the network traffic among multiple adapters while still presenting only a single network interface to the guest.

HiperSockets can be used for communication between Linux, z/OS, z/VM, and z/VSE instances on the same server. It provides an internal virtual IP network using memory-to-memory communication. This not only improves response time, but also saves processor utilization. In addition, the complete virtualization of the network infrastructure provides efficient and secure communication.

If your system does not have good performance, do you consider it "available?" Your users may not, nor may the help desk representatives they are complaining to. z/VM exploits the hardware functions through direct execution of the machine instructions. Since it knows what hardware it is running on, there is no need for an additional layer to trap hardware-directed instructions, such as for disk or network access, and then emulate them. This allows z/VM and its guests to run significantly faster than other solutions that need to interrupt execution, emulate instructions, and then follow pointers to the emulated code. The system runs at native speed.

The virtualization capabilities in a single System z footprint can support thousands of virtual Linux servers. Since a single IBM System z server doesn't require external networking to communicate between the virtual Linux servers, all of the Linux servers are in a single box, communicating via very fast internal I/O connections. The ability of z/VM to provide simple virtualization at high performance helps provide availability as seen by the end user.

3.2 LIVE GUEST RELOCATION

The most prevalent outage type in a z Systems environment is for software or hardware maintenance or upgrades. The IBM z/VM Single System Image Feature provides live guest relocation, a process where a running virtual machine can be relocated from one z/VM member system of a cluster to another. Virtual servers can be moved to another LPAR on the same or a different z Systems server without disruption to the business. Relocating virtual servers can be useful for load balancing and for moving workload off of a physical server or member system that requires maintenance. After maintenance is applied to a member, guests can be relocated back to that member, thereby allowing z/VM maintenance while keeping the Linux on System z virtual servers available. Checks are in place before a Linux guest is relocated to help avoid application disruption (a conceptual sketch follows the list). Some checks include:

• The guest has enough resources available on the target system, such as memory, CPU, and so on.

• It has the same networking definition, for example VLAN and VSWITCH.

• It is disconnected and accessible when the guest is being relocated.

• It has access to the same or equivalent devices on the target system.
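The Python sketch below is only a conceptual illustration of this kind of eligibility checking; the field names and rules are assumptions for the example, not z/VM's actual relocation logic:

from dataclasses import dataclass

# Conceptual sketch only: z/VM performs its own, far more detailed eligibility
# checks before a live guest relocation. The fields and rules below are
# illustrative assumptions, not the actual z/VM logic.

@dataclass
class Member:                 # a z/VM SSI member (relocation target)
    free_memory_mb: int
    free_cpu: float
    vswitches: set
    vlans: set
    device_ids: set

@dataclass
class Guest:                  # the Linux guest to be relocated
    memory_mb: int
    cpu: float
    vswitch: str
    vlan: int
    device_ids: set

def relocation_blockers(guest, target):
    """Return reasons blocking relocation; an empty list means eligible."""
    problems = []
    if guest.memory_mb > target.free_memory_mb:
        problems.append("insufficient memory on target member")
    if guest.cpu > target.free_cpu:
        problems.append("insufficient CPU capacity on target member")
    if guest.vswitch not in target.vswitches or guest.vlan not in target.vlans:
        problems.append("equivalent network (VSWITCH/VLAN) not defined on target")
    if not guest.device_ids <= target.device_ids:
        problems.append("guest devices not accessible from target member")
    return problems

member = Member(8192, 2.0, {"VSW1"}, {100}, {"0.0.0100", "0.0.0200"})
guest = Guest(4096, 1.0, "VSW1", 100, {"0.0.0100"})
print(relocation_blockers(guest, member) or "eligible for relocation")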

As well as the Linux instance itself, the memory contents and interrupt stack are also relocated. The design point is to avoid unplanned outages while performing a planned outage.

VMware supports live guest relocation with its vMotion technology, but it has a different design point. It was designed not to provide for planned outages, but rather to try to help avoid unplanned outages if there is a possible future hardware problem. It does whatever it takes to move guests off of one server and on to another quickly, which sometimes results in guest availability being negatively affected. z/VM runs on z System servers, where hardware availability is not an issue.

Another differentiation is the flexibility of where a guest can be relocated. x86 servers often do not support full backward compatibility. In that environment one must plan the target for each guest and upgrade the servers as a group, or else have the administrator fence off specific instruction sets if a guest is moved to an older server model. By design, z System supports full backward compatibility: applications written in 1965 can still run on today's servers. While some hardware features such as hardware encryption may not be consistent across the servers, the Linux guests will still run uninterrupted.

3.3 THOUSANDS TIMES EASIER ON Z SYSTEM

Sites have virtualized over 50 distributed Linux cores on a single z System core. With 140 usable cores on the z13, these users can obtain a server reduction of over 7000 to 1. IBM internal tests have in fact run over 41,000 separate Linux guests on a single server, all managed by a single z/VM hypervisor.

Massive virtualization massively reduces the amount of hardware in the infrastructure. There are that many times fewer servers, and since Linux-to-Linux communication can take place over virtual links, exponentially fewer cables and ports. From an availability point of view, the end-to-end availability as measured by the user requires all of the components to be available. For example, if there are four components that are touched, such as servers, routers, and ports, each 99% available, then the net effective availability is .99 x .99 x .99 x .99 = .9606, or only about 96%. System management is that much easier: cloning servers is much easier than installing servers, and provisioning new servers can be done in just a couple of minutes as compared to days for real servers. All this affects availability.

There is also less hardware that can fail. The distributed model for providing high availability is to deploy redundant physical servers. Often this means more than just two; rather, several physical servers are clustered together so that if any one of them fails there will be enough spare capacity spread around the surviving servers in the cluster to absorb the failed server's work. But something often not considered is that as the number of physical servers increases, so does the number of potential points of failure - you have eliminated single points of failure, but by increasing physical components you have increased the odds that something will fail. By contrast, you can put z/VM LPARs on the same server and eliminate all single points of failure with only two z/VM instances - except for the z System server itself which, as explained above, is highly available. Furthermore, since CPU capacity can be shared between those two LPARs, if one entire z/VM should fail the surviving z/VM will instantly and transparently inherit the failed z/VM's CPU capacity (although not its memory). It is like squeezing a balloon - one side gets smaller and the other gets bigger.

From a disaster recovery point of view, recovery planning and actions are that much easier. There are fewer servers, hypervisors, and multi-vendor provisioning tools to worry about. In the event of a total site failure, bringing production images and workloads up at the recovery site can now consistently be done within a single shift. Best of all, if a z/OS system is already installed in the site, the same planning, tools, skills, and infrastructure can be used for z/OS as for Linux on z. See Chapter 5, Disaster Recovery, for more discussion of this.
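The end-to-end availability arithmetic above generalizes to any chain of serially required components; a minimal Python sketch:

from functools import reduce

# Serial availability: every component in the request path must be up.
def end_to_end_availability(component_availabilities):
    return reduce(lambda a, b: a * b, component_availabilities, 1.0)

# Four components at 99% each, as in the example above:
print(end_to_end_availability([0.99] * 4))   # about 0.9606
# Fewer physical components (for example, virtual links in place of routers
# and ports) directly raises the availability seen by the end user:
print(end_to_end_availability([0.99] * 2))   # about 0.9801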


As an additional benefit, having fewer servers greatly reduces software fees, system management expenses, total hardware expenses, energy usage, and floor space.


4 THE LINUX LEVEL

One advantage of Linux is that it looks and feels the same across platforms from an application and systems point of view. Applications can be ported without changes, and it has the same system management interface independent of the base hardware platform. But Linux functionality is not identical across all platforms. The Linux distributors include a number of functions in their code to take advantage of z System hardware capabilities. A few examples include the capability to avoid unplanned and planned outages, improved dump processing, management of hardware resources, load balancing, exploitation of the CryptoExpress hardware, and failover across cryptographic adapters. It is recommended that you talk to your Linux sales representative for a complete list of their value-add capabilities for IBM z System.

4.1 LINUX HEALTH CHECKER – LNXHC

The Linux Health Checker tool can identify potential problems before they impact your system's availability or cause outages. It collects the active Linux settings and system status for a system and compares them with the values provided by health-check authors or defined by the user. It produces output in the form of detailed messages, which provide information about potential problems and the suggested actions to take. Although the Linux Health Checker will run on any Linux platform that meets the software requirements, currently available health check plug-ins focus on Linux on z Systems. Examples of health checks include:

• Configuration errors
• Deviations from best-practice setups
• Hardware running in degraded mode
• Unused accelerator hardware
• Single points of failure

Some specific health checks include, but are not limited to:

• Verify that the bootmap file is up-to-date
• Screen users with superuser privileges
• Check whether the path to the OpenSSL library is configured correctly
• Check for CHPIDs that are not available
• Confirm that automatic problem reporting is activated
• Ensure that panic-on-oops is switched on
• Check whether the CPUs run with reduced capacity
• Check for an excessive number of unused I/O devices
• Spot getty programs on the /dev/console device
• Check Linux on z/VM for the "nopav" DASD parameter
• Check file systems for adequate free space
• Check file systems for an adequate number of free inodes
• Check whether the recommended runlevel is used and set as default
• Check the kernel message log for out-of-memory (OOM) occurrences
• Check for an excessive error ratio for outbound HiperSockets traffic
• Check the inbound network traffic for an excessive error or drop ratio
• Confirm that the dump-on-panic function is enabled
• Identify bonding interfaces that aggregate qeth interfaces with the same CHPID
• Identify qeth interfaces that do not have an optimal number of buffers
• Identify network services that are known to be insecure
• Identify unusable I/O devices
• Identify multipath setups that consist of a single path only
• Identify unused terminals (TTY)
• Identify I/O devices that are in use although they are on the exclusion list
• Identify I/O devices that are not associated with a device driver

The Linux Health Checker is available for download from http://lnxhc.sourceforge.net/
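As a conceptual illustration of what one of the checks listed above does, a standalone Python sketch of the free-space idea; it is not an LNXHC plug-in and does not use the LNXHC plug-in interface, and the paths and threshold are arbitrary examples:

import os
import shutil

# Standalone sketch of the "file systems for adequate free space" idea from
# the list above. It is not an LNXHC plug-in and does not use the LNXHC
# plug-in interface; paths and threshold are arbitrary examples.

def check_free_space(paths=("/", "/var", "/tmp"), min_free_fraction=0.10):
    """Warn about file systems with less than min_free_fraction space free."""
    findings = []
    for path in paths:
        if not os.path.exists(path):
            continue
        usage = shutil.disk_usage(path)
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            findings.append("%s: only %.1f%% free (threshold %.0f%%)"
                            % (path, free_fraction * 100, min_free_fraction * 100))
    return findings

for finding in check_free_space():
    print("WARNING:", finding)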

5 DISASTER RECOVERY

When most people think about when they would need to implement a disaster recovery plan, they think of major "front page" events such as natural or man-made disasters: flooding, earthquakes, or plane crashes. In reality, it is much more likely that a site will sustain a failure or temporary outage due to other, smaller factors. Real examples have included an air conditioner failure, a train derailment, a snake shorting out the power supply, a coffee machine leaking water, and smoke from a nearby restaurant. Often the management decision is to not declare a disaster because it would take too long to restore service at the recovery site, there will be data loss, and there is no easy plan to bring service back to the primary site. This is usually not due to issues with the z System servers, but rather with the distributed environment. The decision is to "gut it out" and wait until service can be restored. While this is happening, money is being lost for the company.

5.1 DISASTER RECOVERY DISASTER

There are two common options for how the recovery site is managed: it can be "in-house," owned by the company, or it can be managed by a business resiliency service provider. These have two very different implications for recovering x86 servers, with far smaller differences for z System servers.

5.1.1 D/R using recovery service provider

One difference between z System and distributed environments is variability. There are many different distributed operating systems such as Windows, Unix (AIX, Sun, HP-UX, ...), or Linux (RHEL, SUSE, ...), each with different release and version levels. On top of that are the different hypervisors such as VMware, KVM, or Hyper-V. As a result, many of these operating systems have dependencies tied to a specific hardware abstraction layer, which is in turn tied to physical or virtual systems. It is impossible for a recovery service provider to duplicate the exact same hardware configuration for all its customers, so in a disaster recovery situation, especially a regional event, the hardware configuration at the recovery site will be different from what is being run at the production site. In fact, it may even be different from what was tested. Massive virtualization on z System, using z/VM with its management tools such as IBM Wave, greatly simplifies this, so that a Linux image is recovered at the same level it was running at in the primary production site, with compatible hardware.

How do you recover on dissimilar hardware? With a finite amount of assets at the recovery site, a recovery service provider cannot mirror the specific hardware configuration of every client, including the server type, storage type, firewalls, load balancers, routers, gateways, and so on. Kernel drivers may be tied to specific hardware, so before restored systems can be started, it may be necessary to first modify or update the operating system level and device drivers to match the target recovery hardware. Since the service providers typically guarantee only an "equal or greater" hardware platform, nothing is known ahead of time about what will be used. If there are issues, multiple skill sets are needed for problem determination. Consequently, you may run into performance issues once the recovered systems come up, due to applications being tied to specific hardware devices, or some systems may simply not recover on the new hardware. This issue can be eliminated by running Linux on z. Even though there can be different levels of z System hardware (z196, zEC12, z13...) at different driver levels, all of the z System hardware is backwards compatible. In addition, the extreme virtualization provided by z/VM reduces the amount of hardware variability.

Unlike disk is an issue for the same reasons as unlike servers. Although SCSI-attached disk with FB-format data is supported with Linux on z, many choose to place the Linux data on ECKD-formatted disk for advantages in system management, reliability, and lower CPU consumption. This disk is storage agnostic: ECKD disk on any storage vendor appears as the same generic (3390) disk due to the standardized interface. This, plus the fact that there is no internal disk on z System, reduces the complexity of managing different disk devices and driver levels.

Distributed systems need to restore the production images prior to restoring databases. These production images often have many different drive volumes (C-drive, D-drive, etc.), sometimes with a dozen or more drives for each system. This can easily amount to hundreds of drives that need to be restored using tools such as Tivoli Storage Manager, Symantec NetBackup servers, Fibre Channel libraries, and so on. If restoring from tape, one quickly runs into a tape drive bottleneck; if restoring from LAN, the network becomes a bottleneck. This process can typically take six or more hours just to bring up the backup/restore servers before database restores can even be started. This process is not needed with System z: just connect to the RESLIB volume containing the z/VM libraries, IPL the LPARs, and you have immediate access to the applications and data.

Database restoration on distributed systems can also be an issue. If using tape, the manner in which tapes store data, the data volume, and the file data size are all factors in restores. If it takes mounting multiple tapes to restore a single server's data, then other systems are waiting for access to the tape drives. If the data is restored via the network, there needs to be enough network bandwidth on the backup/restore server network adapters and LAN to restore the data.
Crossing low-bandwidth hops can cause a restore bottleneck. z Systems can run 50 - 100 restore jobs at a time when restoring from Fibre Channel library media. Restores occur via the SAN FICON environment and are not LAN based. In addition, there are 8x8 configurable FICON paths to reach 64 GB of bandwidth per subsystem.

Production sites have hundreds or more servers, but for testing purposes the disaster recovery provider may not have the same number of servers available. For example, one could have 250 servers in production but only be able to access 150 servers for a D/R test. This leaves a hole in the D/R plan, since one is never sure that all the applications will come up without any problems. This is not the case with z System, as the business resiliency providers all host enough z servers for valid testing.

Finally, z System supports extensive end-to-end automation such as GDPS (see section "GDPS/PPRC and GDPS Virtual Appliance"). This not only speeds up processes but, more importantly, removes people as a "Single Point of Failure" during recovery. It is designed to automate all actions needed to restart production workload in under one hour. This works not only with z/OS, but also with Linux on z images.

Due to the time needed to fully restore the distributed environment, bring up the applications, and resolve any data consistency issues between tape restores and disk remote copy restores, many sites have gotten to the point of just bringing up the distributed server environments and data after three days, then declaring "Success!" without actually running the applications or resolving the consistency issues. This leaves a big hole in the D/R testing, with the possibility of unknown problems coming up should a real situation happen. z Systems are often fully restored and tested within a single shift.
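A back-of-the-envelope Python sketch of the restore-bottleneck arithmetic discussed above; the data volume, stream counts, and per-stream throughput are assumptions for illustration, not measurements:

# Illustrative restore-time arithmetic; the data volume, stream counts, and
# per-stream throughput below are assumptions, not measurements.

def restore_hours(total_tb, streams, mb_per_sec_per_stream):
    aggregate_mb_per_sec = streams * mb_per_sec_per_stream
    total_mb = total_tb * 1024 * 1024
    return total_mb / aggregate_mb_per_sec / 3600

# 40 TB of images and databases to restore:
print(restore_hours(40, streams=8, mb_per_sec_per_stream=150))    # ~9.7 hours
print(restore_hours(40, streams=64, mb_per_sec_per_stream=150))   # ~1.2 hours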

5.1.2 D/R using in-house recovery site

Due to the issues described above, many larger corporations have chosen to invest in the hardware and facility expenses of a dedicated in-house recovery site. This resolves many of those issues, but at the cost of keeping another copy of the physical hardware (servers, disk, routers, and so on) at the remote site, the floor space and energy usage of all that equipment, and the system management expense of making sure the D/R site stays a mirror image of the production site. Consequently, many clients find themselves slowly moving production into their recovery environments over time to justify costs. Eventually the fine line between production and recovery environments becomes blurred. A further complication is that sites often want to run Development and Test workloads on the recovery servers. In the wake of a disaster, the testing infrastructure disappears just when it is needed the most, since it is preempted for production. One needs to plan for where this work will now be run.

The most significant issue that is not resolved by an in-house disaster recovery site is the complexity of recovery. The more heterogeneous servers there are, the more one needs to constantly fine-tune and practice the D/R plan. Some considerations include:

• Documentation – Is the plan well documented, detailed, easy to follow, consolidated, and current? In a real disaster, experienced staff may not be available.

• Complexity of applications – Which applications are the critical ones that need to be restarted first? What about their dependencies? Is the e-mail system more important than customer-facing applications so you can communicate problems that may come up during recovery?

• Plethora of server types and levels – The more components, the more people on site are required to manage the recovery, and the more things can go wrong. What about compatibility of the different hardware and software levels? Does the configuration at the D/R site reflect the same configuration as the production site?

• Multiple disk types – How do you ensure data consistency across vendors? How do you protect against corrupted data? Do the tapes have all the needed current data?

• How many backup tools are used – For each tool, are multiple people trained to use it?

• Is there a plan to get back to the original environment – This is something often not taken into consideration. How do you resynchronize the data back onto the original site?

Despite having an in-house recovery site, due to the complexity of trying to manage and control the recovery of hundreds or thousands of distributed servers, meeting the recovery time objective (RTO) is at times not attainable.

5.1.3 Maintaining a Consistency Group

Once the databases are restored, they may not be usable. Different applications have different Recovery Time Objectives (RTO), or how long the business can accept the application being unavailable, and Recovery Point Objectives (RPO), or how much data the business can afford to lose. The least expensive option is to make a copy of the database every 24 hours and send the tapes off site. This supports an RPO of 24 hours and an RTO of typically three days. At the other end of the spectrum is disk-to-disk remote copy, which supports an RPO of 0 (no data loss) with an RTO of two hours or less. A list of disaster recovery options can be found at http://en.wikipedia.org/wiki/Seven_tiers_of_disaster_recovery. As one moves up the tiers, the cost increases.

With that in mind, many use different D/R options depending upon the application. Many recommend a mixed-tier approach, with the D/R solution used for each application depending on its recovery time objective (RTO) and recovery point objective (RPO). Some servers will then recover on pre-staged dedicated assets and others on hot-site syndicated hardware made available within 24 hours of an event. In this case, systems and data may be recovered in order of priority with a staggered RTO. Critical database application data, usually on z System, can be made available within four hours or less, while applications, web services, and data on distributed systems can be made available in 24 hours or more.

This causes complications. It is often the case that applications share common files. Not only that, but often "Tier 1" applications rely on data generated by "Tier 3" applications. How is data consistency maintained when some data is 30 seconds old and other data is 24 hours old? Are the applications run and the data corruption accepted? How is the corruption resolved? Can the required nightly batch jobs be run?

Even if all the data is replicated by disk remote copy, when different disk vendors are used there is still the issue of a common consistency group between the vendors. The IBM SAN Volume Controller (SVC) can resolve the issue of providing a single consistency group across disk vendors by using the same Metro Mirror or Global Mirror session for all the disk being virtualized under it. There are several tools that can be used to help manage and monitor the remote copy environment, including IBM Spectrum Control, Virtual Storage Center (VSC), Tivoli Productivity Center for Replication (TPC-R), and GDPS. Note that the GDPS Control LPAR requires z/OS with ECKD disk.
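To illustrate the consistency exposure that mixed recovery tiers create, a small Python sketch with hypothetical tiers and RPO values (not a recommendation for any particular tiering):

# Hypothetical recovery tiers; the RPO values are illustrative only. The
# point: applications sharing data but recovered from different tiers can
# come back with data captured hours apart.

rpo_seconds = {
    "core database (disk remote copy)": 0,
    "web tier (hot-site restore)": 4 * 3600,
    "batch reporting (daily tape)": 24 * 3600,
}

skew_hours = (max(rpo_seconds.values()) - min(rpo_seconds.values())) / 3600
print("Worst-case data-age skew across shared data: %.1f hours" % skew_hours)
for app, rpo in sorted(rpo_seconds.items(), key=lambda kv: kv[1]):
    print("  %s: data may be up to %.1f hours old" % (app, rpo / 3600))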


5.2 GDPS/PPRC AND GDPS VIRTUAL APPLIANCE

In a real site disaster, one does not have the luxury of several weeks' notice to update D/R plans and get the key personnel to the remote site ahead of time. In fact, the key personnel may be unavailable, may not physically be able to get to the remote site or connect to it through the network, may have other priorities such as the physical safety of family members or their home, or may not survive the event. GDPS is an integrated, end-to-end, automated disaster recovery solution designed to remove people as a single point of failure. There are several different flavors of GDPS, depending on the type of replication being used: GDPS/PPRC HyperSwap Manager and GDPS/PPRC to manage and automate synchronous Metro Mirror replication, GDPS/XRC and GDPS/Global Mirror for asynchronous replication, and GDPS/Active-Active based on long-distance software-based replication.

GDPS/PPRC enables HyperSwap, the ability to dynamically switch to secondary disk without requiring applications to be quiesced. Swapping 10,000 device pairs with under seven seconds of user impact time, this provides near-continuous data availability for planned actions and unplanned events. It provides disk remote copy management and data consistency for remote disk up to 200 km away with qualified DWDMs. GDPS/PPRC is designed to fully automate the recovery at the remote site, including disk reconfiguration, managing servers, Sysplex resources, CBU, activation profiles, and so on. GDPS/PPRC can be used with any disk vendor that supports the Metro Mirror protocol. GDPS automation includes:

• Disk error detection
• HyperSwap for disk availability
• "Freeze" capability to ensure data consistency, with intelligent determination of the freeze trigger (mirroring or disk issues)
• Perform disk reconfiguration
• Perform tape reconfiguration
• Perform CF reconfiguration
• Manage CBU / OOCUoD policies
• Manage STP configuration
• Shut down discretionary workload on Site 2
• Load Production IODF
• Modify activation profile on HMC
• IPL Prod LPARs
• Respond to startup messages
• Initiate application startup
• Verify network connections
• Manage z/OS resources such as Couple Data Sets, checkpoint data sets, etc.
• "Toggle" between sites

All this is done to support a Recovery Time Objective (RTO) of less than an hour with a Recovery Point Objective (RPO) of zero.

GDPS/PPRC is application and data independent. It can be used to provide a consistent recovery for z/OS as well as non-z/OS data. This is especially important when a multi-tier application has dependencies upon multiple operating system architectures. It is not enough that z/OS data is consistent; it needs to be consistent with non-IBM System z data to allow rapid business resumption. As well as everything listed above, additional automation of the Linux on z environment includes:

• Coordinated Site Takeover with z/OS
• Coordinated HyperSwap with z/OS
• Single point of control
• Coordinated recovery from a Linux node or cluster failure
• Monitor heartbeats for node or cluster failure
• Automatically re-IPL failing node(s) in the failing cluster
• Data consistency across System z, Linux and/or z/VM
• Disk Subsystem maintenance (planned actions)
• Non-disruptively HyperSwap z/VM and guests or native Linux
• Live Guest Relocation
• Orderly shutdown / startup
• Start / Stop Linux clusters and nodes
• Start / Stop maintenance mode for clusters and nodes
• Disk Subsystem failure (unplanned actions)
• Non-disruptively HyperSwap z/VM and guests following a HyperSwap trigger
• Policy-based order to restart Linux clusters and nodes
• Single point of control to manage disk mirroring configurations

GDPS Virtual Appliance is based on GDPS/PPRC and is designed for sites that do not have the z/OS skills to manage the GDPS controlling system ("K-Sys"). The GDPS Virtual Appliance delivers the GDPS/PPRC capabilities through a self-contained GDPS controlling system that is delivered as an appliance. A graphical user interface is provided for monitoring the environment and performing various actions, including maintaining the GDPS controlling system itself, making z/OS invisible to the system programmers. This provides IBM z Systems customers who run z/VM and its associated guests, such as Linux on z Systems, with high availability and disaster recovery benefits similar to what is available for z/OS systems. The automation capability of GDPS is unique and without peer in the distributed world.

6 SUMMARY

With the proliferation of smart phones and mobile computing, users have increasingly high expectations for availability, and when service is unavailable it is easy to share frustrations with friends on social media. Users seeing this information can become dissatisfied with a brand even if they were not personally affected. This impacts customer retention and bottom-line profitability. Given a choice of infrastructure on which to place customer-facing and mission-critical applications, one would want to choose the platform that can provide the most benefit for the corporation. Much has been written about the Total Cost of Ownership (TCO) benefits of Linux on z System, including what is found at www.ibm.com/systems/z/os/linux/resources/doc_wp.html, even without considering the availability impacts on cost. When one adds the benefits of a highly available and secure hardware base, extreme virtualization that is also designed to share hardware resources, additional RAS customization supplied by the Linux distributor, and fast, automated end-to-end disaster recovery, placing Linux applications on z System becomes the best choice for the business.

APPENDIX A – SELECTED Z SYSTEM AVAILABILITY FEATURES

6.1 UNPLANNED OUTAGE AVOIDANCE

Unplanned outage avoidance by using "n+1" components is what one normally thinks of when thinking about availability, but the z System goes well beyond that. A partial list of availability features includes:

• Power:
  o N+1 power subsystems
  o N+1 internal batteries
  o Dual AC inputs
  o Voltage transformation module (VTM) technology with triple redundancy on the VTM



• Cooling:
  o Hybrid cooling system
  o N+1 blowers
  o Modular refrigeration units



• Cores:
  o Dual instruction and execution with instruction retry
  o Concurrently checkstop individual cores without outage
  o Transparent CPU Sparing: if there is a problem with a core, spares that come with the server detect this and take over, invisibly to the applications, which continue without any interruption
  o Point-to-point SMP fabric



• Memory:
  o Redundant Array of Independent Memory (RAIM). Based on the RAID concept for disk, memory can be set up to recover if there are any failures in a memory array. This provides protection at the dynamic random access memory (DRAM), dual inline memory module (DIMM), and memory channel levels.
  o Extensive error detection and correction from DIMM-level failures, including components such as the controller application specific integrated circuit (ASIC), the power regulators, the clocks, and the board
  o Error detection and correction from memory channel failures such as signal lines, control lines, and drivers/receivers on the MCM
  o ECC on memory, control circuitry, system memory data bus, and fabric controller
  o Dynamic memory chip sparing
  o Hardware memory scrubbing
  o Storage protection facility
  o Memory capacity backup
  o Partial memory restart



• Cache / Arrays:
  o Translation lookaside buffer retry / delete
  o Redundant branch history table
  o Concurrent L1 and L2 cache delete
  o Concurrent L1 and L2 cache directory delete
  o L1 and L2 cache relocate
  o ECC for cache



• Input / Output:
  o FCP end-to-end checking
  o Redundant I/O interconnect
  o Multiple channel paths
  o Redundant Ethernet service network with VLAN
  o System Assist Processors (SAPs)
  o Separate I/O CHPIDs
  o Shared I/O capability
  o Address limit checking
  o Dynamic path reconnect
  o Channel subsystem monitoring



• Security:
  o Integrated cryptographic accelerator
  o Tamper-resistant Crypto Express feature
  o Trusted Key Entry (TKE) 5.2 with optional Smart Card reader
  o EAL Level 5 certified – the only platform that attained this level



• General:
  o Extensive testing of all parts, components, and the system during the manufacturing phases
  o Comprehensive field tracking
  o Transparent Oscillator failover
  o Automatic Support Element switchover
  o Service processor reboot and sparing
  o ECC on drawer interconnect
  o Redundant drawer interconnect
  o Frame Bolt Down Feature
  o Storage Protection Keys
  o FlashExpress (improved dump data capture)


6.2 PLANNED OUTAGE AVOIDANCE

Another aspect of availability is the avoidance of planned outages. Some System z features in support of this include:

• Power:
  o Concurrent internal battery maintenance
  o Concurrent power maintenance



• Cooling:
  o Concurrent thermal maintenance



• Cores:
  o Concurrent processor book repair / add
  o Transparent Oscillator maintenance



• Memory:
  o Concurrent memory repair / add
  o Concurrent memory upgrade
  o Concurrent memory bus adapter replacement
  o Concurrent MBA hub upgrade
  o Concurrent repair on all parts in an I/O cage
  o Upgrade on any I/O card type
  o Concurrently checkstop individual channels
  o Concurrent STI repair
  o Concurrent I/O cage controller maintenance
  o Dynamic I/O reconfiguration
  o Hot-pluggable I/O
  o Transparent SAP sparing
  o Dynamic SAP reassignment
  o Dynamic I/O Enablement



• Security:
  o Dynamically add Crypto Express processor
  o Concurrent Crypto-PCI upgrade



• General:
  o Concurrent Microcode (Firmware) updates – install and activate driver levels and MicroCode Load (MCL) levels, based upon bundle number, while applications are still running
  o Concurrent major LIC upgrades (CPUs, LPAR, channels, OSA, Power and Thermal, Service Processor, HMC, ...)
  o Dynamic swapping of processor types
  o On/Off Capacity Upgrades on Demand (OOCUoD)
  o Capacity Backup (CBU)
  o Concurrent service processor maintenance
  o Dynamic logical partition (LPAR) add
  o Dynamic add of a logical CP to a partition

APPENDIX B – REFERENCES

• High-Availability of System Resources: Architectures for Linux on IBM System z Servers (ZSW03236USEN)
• Comparing Virtualization Methods (ZSL03210USEN)
• IBM Systems Journal: http://researchweb.watson.ibm.com/journal/index.html
• GDPS Family: An Introduction to Concepts and Capabilities (SG24-6374): http://www.redbooks.ibm.com/redpieces/abstracts/sg246374.html?Open
• GDPS Home Page: http://www.ibm.com/systems/z/advantages/gdps/index.html
• Linux on z Systems Tuning: http://www.ibm.com/developerworks/linux/linux390/perf/tuning_diskio.html
• Linux on System z Disk I/O Performance: http://www.vm.ibm.com/education/lvc/LVC0918.pdf
• Effectively running Linux on IBM System z in a virtualized environment and cloud: http://events.linuxfoundation.org/sites/events/files/eeus13_mild.pdf
• Mainframe Total Cost of Ownership Issues: http://www-01.ibm.com/software/htp/tpf/tpfug/tgs07/tgs07e.pdf
