XSEDE: Accelerating Scientific Discovery

Scientific Cyberinfrastructure

John Towns | University of Illinois
Tim Cockerill and Maytal Dahan | University of Texas at Austin
Ian Foster | University of Chicago
Kelly Gaither | University of Texas at Austin
Andrew Grimshaw | University of Virginia
Victor Hazlewood | University of Tennessee
Scott Lathrop | Shodor Education Foundation
Dave Lifka | Cornell University
Gregory D. Peterson | University of Tennessee
Ralph Roskies and J. Ray Scott | Pittsburgh Supercomputing Center
Nancy Wilkins-Diehr | San Diego Supercomputer Center

Driven by community needs, the Extreme Science and Engineering Discovery Environment (XSEDE) project substantially enhances the productivity of a growing community of scientists. XSEDE’s integrated, comprehensive suite of advanced digital services federates with other high-end facilities and with campus-based resources, serving as the foundation for a national e-science infrastructure ecosystem.

Computing in Science & Engineering, September/October 2014. Copublished by the IEEE CS and the AIP. 1521-9615/14/$31.00 © 2014 IEEE

Scientists are increasingly integrating distributed resources and instruments directly into their research and educational pursuits. Researchers use advanced digital resources and services every day to expand their understanding of our world. Computational technologies and resources such as supercomputers, visualization systems, storage systems, and collections of data, software, and networks (what we refer to as advanced digital services) are critical to the success of those researchers. Access to an array of integrated, well-supported, high-end digital services is necessary for advancing knowledge in many domains, spanning research supported by many funding agencies (such as the US National Science Foundation, Department of Energy, National Institutes of Health, Department of Defense, and National Oceanic and Atmospheric Administration). More pointedly, research now requires more than just supercomputers, and the Extreme Science and Engineering Discovery Environment (XSEDE) represents a step toward a more comprehensive and cohesive set of digital services. Imagine someone creating a new environmental observatory that collects real-time video from thousands of sites around the country, with those video streams and myriad other instrument feeds collected, processed, integrated, shared, and distributed to scientists and educators. We want new and existing communities to access such digital services at a fraction of the cost they would otherwise incur, because they can outsource much of the complexity to XSEDE. According to the NSF workshop report History & Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures, “Cyberinfrastructure is the set of organizational practices, technical infrastructure and social norms that collectively provide for the smooth operation of research and education work at a distance. All three are objects of design and engineering; a cyberinfrastructure will fail if any one is ignored.”1 Given that the term cyberinfrastructure doesn’t lend itself to intuitive interpretation, we’ve adopted the more contemporary term “e-science infrastructure” (or e-infrastructure). Other major domain-specific science projects (the Large Hadron Collider, the Southern California Earthquake Center, the Network for Earthquake Engineering Simulation, the Ocean Observing Infrastructure project, and the iPlant Collaborative) have developed, or are working to develop, integrated e-infrastructures for specific research challenges. The Advanced Cyberinfrastructure (ACI) Division of NSF’s Computer and Information Science and Engineering (CISE) Directorate initially funded XSEDE as a five-year, $130 million effort to extend the scope and impact of the preceding HPC-focused TeraGrid2 project and to offer a more comprehensive, powerful solution. XSEDE is a virtual organization that provides a dynamic distributed infrastructure, support services, and technical expertise that enable scholars, researchers, and engineers to address the most important and challenging problems. It is an integrated e-science infrastructure ecosystem with well-defined interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. XSEDE is implementing tools, methods, and policies to federate with other high-end facilities and with campus-based resources, serving as the foundation for a national e-science infrastructure ecosystem. Common authentication and trust mechanisms, a global namespace and filesystems, remote job submission and monitoring, and file transfer services are further examples of XSEDE’s advanced digital services. XSEDE’s innovative, open architecture facilitates a high degree of integration, making it possible to add new technology capabilities continuously.
Enabling this architecture are XSEDE’s professional systems engineering approach and technology identification efforts, which ensure robustness and security while continuously strengthening XSEDE by incorporating new and improved technologies and services driven by the evolving needs of existing, emerging, and new communities. XSEDE is led by principal investigators (PIs) from the following:

■ National Center for Supercomputing Applications, University of Illinois (NCSA; www.ncsa.illinois.edu);
■ National Institute for Computational Sciences, University of Tennessee (NICS; www.nics.tennessee.edu);
■ Pittsburgh Supercomputing Center (PSC; http://www.psc.edu);
■ Texas Advanced Computing Center, University of Texas at Austin (TACC; www.tacc.utexas.edu); and
■ San Diego Supercomputer Center, University of California, San Diego (SDSC; www.sdsc.edu).

These are world-class e-infrastructure centers with vast experience; XSEDE also includes technology and education partners whose expertise strongly complements theirs. Here, we’ll discuss how these centers provide support and ongoing expansion so that the scientific community can use XSEDE’s advanced digital services. Managing the XSEDE project presents some unusual challenges in its scale, complexity, and highly distributed nature; we’ll look at some of these issues and how they’re addressed.

XSEDE’s Impact

To provide a better sense of XSEDE’s use and impact, let’s consider some examples of how XSEDE enables discovery.

Impact on Wall Street

Mao Ye, assistant professor of finance at the University of Illinois at Urbana-Champaign, and colleagues Chen Yao (also of Illinois) and Maureen O’Hara (Cornell) used XSEDE resources at PSC and SDSC, along with XSEDE’s Extended Collaborative Support Service, to analyze the impact of high-frequency trading (HFT) on the stock market. As a result of their work, both NASDAQ and NYSE changed their policies on reporting odd-lot trading. Previously, odd lots (trades of fewer than 100 shares) hadn’t been shown on the publicly available “consolidated tape”; only big investment banks and sophisticated computer-powered high-frequency traders saw them, by paying for data from individual exchanges. Odd lots were thought to be used only by small retail investors and thus to be unimportant. Ye and his colleagues discovered that more and more big trades were being “sliced and diced” into lots of fewer than 100 shares, so they remained hidden. Thanks to their work, US regulators decided that all odd lots would be included in the publicly available consolidated tape. On 9 December 2013, millions of previously hidden US stock trades were revealed for the first time.

www.computer.org/cise




[Figure 1 chart: 18.3% of orders cancelled within 0.001 second; 24.2% in 0.001–0.05 second; 31.4% in 0.05–1 second; 26.1% after more than 1 second.]

Figure 1. Fleeting orders: On 30 August 2011, approximately 3 million orders were submitted to the NASDAQ exchange to trade the stock SPDR S&P 500 Trust (ticker symbol SPY). This image shows that 18.3 percent of the orders were cancelled within 1 millisecond, and 42.5 percent of orders had a lifespan of less than 50 milliseconds, less time than it takes to transfer a signal between New York and California. More than 40 percent of orders, in other words, disappeared before a trader in California could react.

Figure 2. Carbon dioxide molecules (gray and red) trapped in a metal organic framework (MOF).

In addition, the US Senate is discussing another Ye proposal that would put a speed limit on HFT. As Figure 1 shows, on the day analyzed, more than 40 percent of all orders were submitted and canceled before a trader located far from the exchange (or the computer-driven order-placement system such a high-frequency trader relies on) could even notice, much less react to, their existence. This creates a bias in the system favoring those physically close to the exchange’s trading systems.
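As a quick arithmetic check on Figure 1, the 42.5 percent figure in the caption is the sum of the two shortest-lifespan bins, and the four bins together account for every order. In this sketch, t is an order’s lifespan from submission to cancellation (a symbol introduced here only for illustration), the bin edges follow the labels in the figure, and the order count is approximate because the caption gives roughly 3 million orders.

```latex
\begin{aligned}
\Pr(t < 50\,\mathrm{ms}) &= \underbrace{18.3\%}_{t \le 1\,\mathrm{ms}} + \underbrace{24.2\%}_{1\text{--}50\,\mathrm{ms}} = 42.5\% \\[4pt]
N_{t < 50\,\mathrm{ms}} &\approx 0.425 \times 3 \times 10^{6} \approx 1.3 \times 10^{6}\ \text{orders} \\[4pt]
18.3\% + 24.2\% + 31.4\% + 26.1\% &= 100.0\%
\end{aligned}
```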


Improving Carbon Capture

Human-generated carbon dioxide is “likely” (with a greater than 66 percent probability) to be altering the global climate.3 Capturing the gas from smokestacks and other waste streams, however, could allow continued use of fossil fuels with much less damage to the climate and the economy. A family of materials called metal organic frameworks (MOFs) can be “tuned” to trap specific target molecules. A team led by Patrick Nugent of the University of South Florida and Youssef Belmabkhout of the King Abdullah University of Science and Technology tested a series of MOFs in the lab to measure how well the materials trap carbon dioxide. Brian Space at the University of South Florida in turn used these data in computer models to suggest new MOFs to test in the laboratory. This back-and-forth between computer models and laboratory experiments allowed the researchers to produce MOFs that are superior at capturing carbon dioxide (see Figure 2). The new materials can trap carbon dioxide even in the presence of water vapor, a real-world component of exhaust gases that prevented previous candidate materials from working effectively. The researchers used a number of XSEDE supercomputers, including PSC’s Blacklight, SDSC’s Trestles, TACC’s Ranger, and Georgia Tech’s Keeneland, taking advantage of each machine’s strengths in different phases of the work.

HPC Helps Win the 2013 Nobel Prize in Chemistry

Some skeptics still wonder whether anyone can make fundamental discoveries in biochemistry by using high-performance computing (HPC). The awarding of the 2013 Nobel Prize in Chemistry to Martin Karplus, Michael Levitt, and Arieh Warshel should allay that concern and provide clear evidence of the value of HPC for making discoveries. Using resources from XSEDE’s NSF-funded HPC predecessor, they established how simulating interactions between constituent atoms, using both classical and quantum mechanics, can predict the behavior of biomolecules.
Sven Lidin, Chairman of the Nobel Committee for Chemistry, likens this approach to doing chemistry outside of a traditional laboratory: “This is how far theoretical chemistry has come … to us, ‘theory’ has become the new ‘experiments.’” HPC has become a pervasive tool in almost all fields of modern science. It took more than 20 years for the work of Karplus, Levitt, and Warshel to earn the 2013 Nobel Prize; it wouldn’t be surprising if some of today’s more than 5,000 XSEDE users become future Nobel laureates.

Current Resources and Services

XSEDE supports an evolving set of hardware and software resources. Through funding from NSF and other state and federal agencies, new resources are regularly introduced into XSEDE’s portfolio. The more traditional systems include large, tightly coupled clusters, massively parallel systems, and large shared-memory resources. Recently, XSEDE has incorporated systems oriented toward data-intensive research and high-throughput computing. The list of currently available resources is at www.xsede.org/resources.

Each computing resource has associated disk storage, and users can also reach both a wide-area file system spanning the resources and archival storage resources for longer-term storage. Disk capacities for each resource are currently in the range of 1–10 petabytes. Archival storage resources provide tens to hundreds of petabytes, and these are expected to grow as technological innovations are deployed for production use.

A wide range of software supporting diverse fields of study is available on various XSEDE resources, along with many tools and support packages. Several hundred software packages are supported, and a complete list is maintained at www.xsede.org/software. This set includes visualization tools spanning a broad range of applicability, from simple 2D plots to complex 3D visualizations with high-resolution rendering. All systems supported by XSEDE have consistently formatted user documentation to ease use by the community, complemented by documentation for the software supported on the resources. XSEDE resources support a number of programming models and usage modalities, from highly scalable message-passing applications to threaded parallelism to high-throughput and ensemble computing. A broad range of data types and data formats is likewise supported.
XSEDE has a growing set of self-paced, Web-based, and in-person training offerings covering topics from introductory programming to advanced use of systems and applications. A complete list of training is available at www.xsede.org/training.

The Case for XSEDE

XSEDE provides access to a comprehensive set of high-end digital services, integrated into a general-purpose infrastructure. It’s judiciously distributed but architecturally and functionally integrated. The motivation for this approach and infrastructure emerges from two fundamental arguments.

First, scientific advancement across multiple disciplines requires a variety of resources and services, and thus a comprehensive e-infrastructure composed of heterogeneous digital resources. These requirements are well motivated and described in a number of reports,4–7 and in particular in the reports of the six task forces of the NSF Advisory Committee for Cyberinfrastructure (ACCI) that address long-term cyberinfrastructure issues.8–13 These reports led to the development of two key NSF documents: Cyberinfrastructure Vision for 21st Century Discovery (CIF21)14 and a subsidiary document, Advanced Computing Infrastructure: Vision and Strategic Plan.15

The second argument, learned from decades of experience underpinning NSF’s high-end computing programs, is that high-end computational science is better served when these capabilities leverage the aggregate expertise of a small number of leading institutions, rather than being fully centralized at a single institution or fully decentralized. Full centralization reduces agility in adapting to changing user demands and creates a single point of failure for the entire high-end computational science and engineering enterprise. Different sites each offer unique human talent to address particular community needs, whether in architecture or in the expression of particular algorithms and methodologies. For the nation’s scientific vitality, it’s best to have several leadership perspectives for addressing the broad range of disciplinary needs. Users can then work most closely on resources, and with people, at the sites they find most effective in addressing their particular needs and challenges.
This point was clearly articulated in the very first recommendation of the Community Input on the Future of High-Performance Computing Workshop, sponsored by the NSF ACCI HPC Task Force and held in December 2009:

Supporting a relatively small number of centers and equipping these centers with diverse and complementary machines that are upgraded regularly will provide continued access to HPC resources, with the added benefit of expanding the variety of state-of-the-art platforms available to researchers. NSF coordination, balancing and oversight of these centers, with an emphasis on sharing distributed responsibilities, would ensure uninterrupted access to leading-edge computing resources.16

Even with a heterogeneous set of digital resources and services and the leveraged expertise of multiple institutions, a growing number of scientific research activities can move forward efficiently and effectively only if these resources and services are tightly yet flexibly integrated and interoperable. This is the foundational motivation for XSEDE, which will support “progress toward the resolution of (Grand Challenge) problems … (which) will require unusual coordination of and collaboration between [italics added] the diverse communities of researchers.”17

XSEDE’s vision is a world of digitally enabled scholars, researchers, and engineers participating in multidisciplinary collaborations while seamlessly accessing computing resources and sharing data to tackle society’s grand challenges. To realize this vision, XSEDE will substantially enhance the productivity of a growing community through access to advanced digital services that support open research. Making codes run faster and more easily allows researchers to get more science done in a fixed amount of time. Lowering the barrier to accessing and using digital services enables additional research in established communities and in new communities that haven’t yet harnessed these services. Such productivity increases can be the difference between an infeasible project and a feasible one, and they can shorten the time to publication of scientific findings. XSEDE strives to provide the perception of a single environment rather than a set of different resources within different administrative domains. That environment can include not only local resources connecting to XSEDE but also any additional frequently accessed resources (such as campus systems, national resources, and collaborators’ resources, databases, or instruments). Compute and data resources should be accessible in a uniform fashion.
Productivity for those using multiple sites is greatly enhanced by XSEDE features such as single sign-on extended to support campus credential-based authentication; submission of a single resource allocation request with a single review process; a single namespace for files; documentation with a standard organization and look and feel; and a coordinated system of help desk support, general user support, and extended collaborative support services. XSEDE also makes it easier for researchers to migrate jobs from heavily used systems to those with more availability, or to quickly access resources on architectures that users discover are more suitable for portions of their problems. Lowering usability barriers helps new communities incorporate advanced digital resources into their regular work environment and will unleash new developments in science that these communities are only beginning to articulate.


XSEDE’s Goals and Measuring Progress

To support XSEDE’s mission and to guide the project’s activities toward realizing XSEDE’s vision, we’ve defined three strategic goals:

■ Deepen and extend. XSEDE will deepen the use (that is, make more effective use) of the advanced digital research services ecosystem by existing scholars, researchers, and engineers, and extend its use to new communities. We’ll contribute to the preparation (workforce development) of the current and next generation of scholars, researchers, and engineers in the use of advanced digital technologies via education, training, and outreach, and we’ll raise general awareness of the value of advanced digital research services.
■ Advance the ecosystem. Exploiting its internal efforts and drawing on those of others, XSEDE will advance the broader ecosystem of advanced digital services by creating an open and evolving e-infrastructure and by enhancing the array of technical expertise and support services offered.
■ Sustain the ecosystem. XSEDE will sustain the advanced digital services ecosystem by assuring and maintaining a reliable and secure infrastructure and by providing excellent user support services. XSEDE will further operate an effective, productive, and innovative virtual organization.

Each strategic goal contains subgoals that define the objectives to be met for successfully delivering the project’s mission and realizing its vision. For each goal, we track key performance indicators (KPIs) to measure progress. While the goals and associated activities are clearly linked, they fall under the high-level umbrella project goal of “accelerating science” or “science impact.” This overarching goal is the collective realization of all of XSEDE’s activities.

Communities Supported by XSEDE

As of calendar year (CY) 2013, the national and global community that relies on XSEDE each year includes more than 10,000 researchers, with sustained growth of more than 10 percent per year, working on approximately 2,500 projects in an expanding number of disciplines. This community spans hundreds of institutions in all 50 states plus the District of Columbia, Puerto Rico, and the Virgin Islands. XSEDE currently supports hundreds of international collaborators of US researchers at over 100 institutions in more than 35 countries, reflecting the increasingly international nature of collaborative research.

The XSEDE User Portal

To effectively serve this diverse audience, we recognized the value of a unified Web resource for services and information geared specifically to the XSEDE user community. The XSEDE User Portal (XUP) offers detailed, current, and reliable information about the hardware, software, and services available via XSEDE. New users can create an account in minutes. The XUP is the central place where XSEDE users interactively manage their allocations, log in to XSEDE systems, transfer files, manage profiles, add publications, read and subscribe to user news, receive user support, register for training events, and communicate with other users on the user forums. XSEDE also offers a mobile Web version of the XUP that provides access to allocation information, resource information, user news, documentation, training, and more. The goal of the XUP is to lower the barrier of entry into XSEDE and make scientists more productive by putting direct control of services and resources at their fingertips.

Supporting and Expanding the Community

XSEDE’s services are available at no cost to the open research community. Many researchers aren’t aware that these resources exist or don’t make the most effective use of them. The XSEDE team possesses deep expertise in the operation and use of advanced digital resources and offers this expertise to the community. We continually work to raise awareness of XSEDE’s resources and services across the nation: in new domain areas, among underrepresented demographic groups, and among the next generation of computational researchers, educators, innovators, decision makers, and students.
Extended Collaborative Support Services

The Extended Collaborative Support Service (ECSS) improves the productivity of the XSEDE user community through meaningful collaborations, consultations, and training activities, and it expands the XSEDE user base by engaging members of underrepresented communities and diverse domain areas. Members of the community are paired with expert ECSS staff to work together on challenging problems ranging from code optimization to workflows to scientific visualization.

In-depth support, lasting from weeks to a year, can be requested at any time through the XSEDE allocations process. ECSS draws on the skills of more than 80 technical experts from a dozen institutions, covering many scientific disciplines and computational techniques. ECSS provides support in the following five key areas.

Extended support for research teams. ECSS works with individual researchers (or small groups of researchers) to optimize their application codes, improve their work and data flows, and increase the effectiveness of their use of digital infrastructure.

Novel and innovative projects. ECSS proactively identifies fields that aren’t traditional users of XSEDE but whose researchers may be empowered to make transformative breakthroughs by exploiting XSEDE’s resources and services. Examples include genomics, history, digital humanities, computational finance, epidemiology, social network analysis, media analytics, and machine learning. XSEDE has enabled researchers in very diverse fields of study to reach well beyond what they could previously achieve.

Extended support for community capabilities. ECSS pairs an expert with the developers of a widely used community code to deploy, harden, and optimize the software, increasing the productivity not only of a particular research team but of an entire community. For example, we worked with the developers of the genomics codes All-Paths and Trinity to deploy these codes on various XSEDE systems in response to requests from many different researchers. Such efforts are initiated by user requests, or internally in consultation with our User Advisory Committee, which helps identify codes of widespread interest. This team also characterizes the availability and usage of community codes.

Extended support for science gateways. ECSS pairs experts with researchers interested in Web portals, also called science gateways, that serve various communities. Commonly, a research team has an existing science gateway but needs more powerful supporting resources. Many gateway users benefit from advanced computing resources without having to learn how to use those resources. Increasingly, we support individual principal investigators who want to make their codes more accessible to larger communities through Web interfaces. In 2013, over 40 percent of XSEDE users accessed XSEDE through one of more than 35 science gateways.


Extended support for training, education, and outreach. ECSS provides the technical expertise needed for our training and outreach efforts. We conduct advanced training, engage communities at professional meetings, and deliver tutorials. To improve the skills of the Campus Champions (discussed later) and spread ECSS expertise to campuses, we initiated the Campus Champion Fellows program, which pairs a Campus Champion with an ECSS expert to work closely on a real-world project for one year. Fellows then apply their acquired expertise to better serve their campus community. As part of ongoing professional development efforts, ECSS’s monthly public symposium broadcasts highlight work on ECSS projects for the benefit of the broader community as well as XSEDE’s own staff.

User Services

Not surprisingly, XSEDE can’t provide an extended level of support to all of the 10,000+ researchers who use XSEDE resources and services each year. Instead, we provide a broad set of support services to help users work effectively in the XSEDE ecosystem, and we focus on attracting and preparing future users to do the same.

Training. This service is designed to develop and enhance the community’s skills for effectively conducting research and education activities with XSEDE services. The training offerings range from beginning to advanced topics. Delivery modalities include interactive in-person classes, online self-paced training modules, and Web-based offerings, including video-based training and hybrid courses in which the primary instructor is remote but sites bring in local assistants for hands-on sessions. Education and training work together to issue certificates of core competencies gained. Materials from all XSEDE training and education events are made freely available via the HPC University repository (http://hpcuniversity.org/trainingMaterials).

User information and interfaces. This type of support enables the discovery, understanding, and effective utilization of XSEDE’s powerful capabilities and services by the community. The User Information and Interfaces team develops APIs that include authenticated interfaces to our services and features such as allocations, user news, tickets, user guides, documentation, queues, and resource information. The same team develops new and improved mobile interfaces, which users increasingly rely on for access to XSEDE services.


User engagement. The staff interact with the community to gauge its productivity and satisfaction and to provide a mechanism for collecting and reporting on emerging needs and requirements. This is accomplished by conducting user surveys, interviewing users, and analyzing help tickets. Results inform continuous improvement of XSEDE.

Allocations. This service facilitates easy access to XSEDE’s advanced digital resources, enabling scientists to achieve their research and education goals. Multiple allocation mechanisms are supported, including startup allocations (requested at any time) that enable researchers to explore whether the XSEDE environment suits their needs; education and training allocations (requested at any time) that enable PIs to use XSEDE to teach courses or provide training; and annual research allocations, which can be requested quarterly and enable PIs to conduct their science and engineering research.

Education and Outreach

The Education and Outreach (E&O) team’s mission is to be the national resource center for computational science and engineering education and outreach, and a matchmaker between the community and XSEDE’s broad array of resources and services. We seek to increase the number of new users and participants, especially from underrepresented groups. To increase access to and use of XSEDE resources and services on campuses across the country, our efforts include programs in education, champions, under-represented community engagement, student engagement, and campus bridging.

The Education team prepares a diverse community of current and next-generation researchers, scholars, educators, and practitioners in the use of data analysis and management, modeling, simulation, and visualization techniques. The Education program works with faculty to incorporate computational tools, resources, and methods into undergraduate and graduate certificate and degree programs across all disciplines. XSEDE provides a set of core undergraduate and graduate competencies as a foundation for developing course curricula, a series of curriculum development workshops, and a rich repository of reviewed, high-quality learning materials. Besides providing an imprimatur, this work reduces duplication of effort and helps universities develop curricula much more quickly.

The Champions Program engages a larger, more diverse, and sustained community of researchers and builds campus capacity to provide digital resource support to researchers, educators, staff, and students across the country. The Campus Champions Program identifies individuals across the country who can raise awareness of XSEDE on their campus, quickly get new users started with XSEDE resources and services, help individuals get their own allocations, serve as a source of general information about XSEDE, help arrange local training, and provide XSEDE with timely feedback to improve its resources and services. The program’s growth, enthusiasm, and success have led us to expand it to include three new types of Champions:

■ regional champions, providing an increased level of support to champions within a region to scale the program nationally;
■ student champions, assisting campus champions in serving their local community; and
■ domain champions, advising researchers, faculty, and students in specific fields of study.

The Under-represented Community Engagement team engages a significant number of faculty and students from under-represented groups in using XSEDE resources and services. The changing dynamics of the US population mean that, to expand the pool of well-prepared scientists and engineers, we must also work to increase the number of currently under-represented students in computational science and engineering fields. To quote Richard Tapia of Rice University: “No first-world nation can maintain its economic health when such a large part of its population is outside mainstream activity including all technological, scientific, and computational activity.”18 XSEDE’s Minority Faculty Council provides recommendations and strategies for broadening participation. Additionally, the Minority Research Community (MRC) was formed to build community, promote peer support, help members seek assistance, and share information among researchers and faculty at Minority Serving Institutions (MSIs). The MRC conducts webinars and conference calls that introduce MSI campuses to Campus Bridging resources, science gateways, allocations, technical assistance, and opportunities for student engagement. Engaging students, especially from under-represented groups, is key to creating a vibrant, diverse, and collaborative community, empowered through mentoring, education, and training, that will expand the knowledgeable HPC workforce for generations to come.

The Scholars Program selects under-represented minority and female students from majority-serving and minority-serving institutions. Scholars attend the annual XSEDE conference, participate in webinars to learn about computational science tools, and gain skills that make them more competitive for summer internships and Research Experiences for Undergraduates (REU) opportunities. The Summer Research Experience matches students with XSEDE-focused projects each summer. Students receive mentoring from project personnel during a summer immersion program and have the opportunity to present their work at the annual XSEDE conference.

XSEDE Campus Bridging makes it convenient and intuitive for users to simultaneously employ their research group, departmental, and campus systems (on their own campus and at other campuses) and XSEDE resources and services, as transparently and easily as possible. The two-part mission is as follows:

- User facing: to make it possible for researchers, educators, and students to access remote digital resources (across XSEDE, other campuses, and so on) individually or in concert, as simply as if they were peripherals attached to their own laptop or personal computer.
- National leadership: to disseminate software, training materials, and information that enable the US research community to leverage the nation’s aggregate e-infrastructure (XSEDE or other federally funded, state, campus, and laboratory resources) to maximize US innovation and global competitiveness.

The Campus Bridging team is creating and distributing tools that will align the national e-infrastructure to broaden usage, improve interoperability between campus-based resources and XSEDE, and widely deploy educational resources to aid development of a 21st-century knowledge workforce.

Delivering and Supporting New Capabilities in an Evolving Environment

Now let’s look at how XSEDE supports scientific needs in a time of continuous innovation and ongoing developments.

Architecture for a National Distributed e-Science Infrastructure Ecosystem

XSEDE is developing and deploying an advanced distributed systems architecture rooted in researcher requirements and hardened by systems engineering. This architecture brings new capabilities to the community and allows for individualized experiences, consistent and enduring software interfaces, improved data management, and mechanisms for transparently integrating campus resources into the overall XSEDE ecosystem. With the support of our partners at the Software Engineering Institute, a leader in the field of software engineering, we extended and adapted traditional systems engineering practices to take on the difficult but necessary ongoing task of defining and documenting a comprehensive architectural design for our distributed environment. We’re using a structured systems engineering process in which user requirements map to architectural designs, which in turn are composed of architectural components that are implemented, integrated, and deployed as production services. XSEDE’s ecosystem addresses not just traditional supercomputer center functions (such as a user portal and help desk) but also remote access to XSEDE computational resources, interoperation with other facilities, and, increasingly, the delivery of digital services that support research workflows, such as community management, collaboration, data storage, gateways, and instruments. We’ve made great strides in formally defining XSEDE stakeholder Use Cases. In addition, we distilled essential elements of our architecture from the Use Cases and derived generalizations of them, which we call Canonical Use Cases. We’ll continue to disseminate definition and design documents to allow others to benefit from our work. The Use Cases and architecture definition are publicly available in the XSEDE Digital Object Repository (www.ideals.illinois.edu/handle/2142/35973/browse?type=title).
Realizing the XSEDE Architecture: Software Development and Integration

The Software Development and Integration (SD&I) team delivers deployment-ready software and services that satisfy stakeholder requirements and implement XSEDE’s architecture; provides software and service maintenance and support; and enables the extended community of XSEDE software providers to use SD&I’s software engineering documentation and related software and services. SD&I transforms architectural designs into new architecture software components that are delivered to Operations for deployment. In addition, we enhance or fix existing operational software components and assist in retiring obsolete ones. SD&I uses four key processes to accomplish that mission:

- Active design reviews ensure that SD&I-delivered software components implement the functionality defined by the Use Cases, and that these components meet users’ quality needs.
- Open, continuous planning ensures that SD&I effort is focused on XSEDE stakeholder-approved priorities.
- Continuous development and integration organizes development, integration, and testing teams to work on as many specific software delivery and retirement tasks as possible.
- Engineering improvements advance SD&I’s software engineering processes and documentation so that development, integration, and testing teams can efficiently deliver quality software on time in coordination with all our stakeholders.

XSEDE Operations

The XSEDE Operations team provisions and supports an integrated e-infrastructure that incorporates a spectrum of digital capabilities to support research and education. XSEDE Operations is subdivided into six groups: Security, Data Services, Networking, Software Testing and Deployment, Accounting and Account Management, and Systems Operational Support.

The Security group protects the confidentiality, integrity, and availability of XSEDE resources and services. This includes monitoring to detect and respond to security incidents. In addition, the Security group provides guidance on security policies.

The Data Services group facilitates data movement and management, including maintaining the existing operational infrastructure, improving user documentation for data movement and XSEDE storage resources, configuring the XSEDE-Wide File System, and preparing for upcoming software deployments.

The Networking group monitors, maintains, and improves XSEDEnet, the high-speed interconnect between the core Service Providers (SPs). Each site connects to Internet2’s Advanced Layer 2 Service (AL2S) at 10 Gbps, and many are moving to 100 Gbps. Campus Bridging sites can also connect to XSEDEnet SP sites via AL2S. The Networking group also monitors XSEDEnet performance using the perfSONAR Performance Toolkit (http://psps.perfsonar.net/toolkit).

The Software Testing and Deployment group performs operational testing as part of the XSEDE engineering process and supports and coordinates the deployment of XSEDE software. This testing ensures that software deployed for XSEDE is of sufficient quality to perform as expected, enabling users to focus on their scientific inquiry.

The Accounting and Account Management (A&AM) group manages the interfaces and databases for XSEDE-wide resource allocation and usage accounting. A&AM supports the XSEDE Resource Allocation Service, through which researchers submit requests for computational or data resources to the XSEDE Resource Allocations Committee for allocation decisions, and through which allocation usage is tracked.

The Systems Operational Support group provides front-line support via the XSEDE Operations Center (XOC), as well as system administration for a diverse set of more than 50 centralized enterprise services. The XOC provides a 24/7 help desk for assistance and problem resolution, with issues tracked in a ticketing system to ensure quality service.

XSEDE Technology Investigation Service

Because XSEDE is a continually evolving research support ecosystem, we must constantly look out for new technologies that address the changing needs of the communities it supports. The Technology Investigation Service (TIS) identifies and evaluates technologies to close the gap between user needs and XSEDE’s service offerings. TIS builds on its technology expertise and experience, focusing on technologies that offer enhanced capabilities and are promising candidates for insertion into XSEDE’s environment.

The TIS Technology Identification team engages the community to identify candidate technologies relevant to XSEDE and the e-science community at large. We maintain the XSEDE Technology Evaluation Database (XTED), a publicly available, Web-accessible catalog of technology projects. Through this catalog, you can easily identify technology projects by capability, developer, application, and scope.

The TIS Technology Evaluation team evaluates and recommends technologies to improve XSEDE. Input from those outside XSEDE is welcome and encouraged.
This team works closely with the A&D team, XSEDE Operations, and the User Engagement team to determine appropriate technologies for evaluation. Once an evaluation is complete, the review is shared with the SD&I team for possible insertion into XSEDE. Completed evaluations are recorded in XTED and the XSEDE Digital Object Repository.

Managing the Program

XSEDE employs a professional management and systems engineering approach that ensures continuous and timely improvements while maintaining robustness and security. These efforts are connected to user needs through continuous feedback collection and advisory input. XSEDE establishes policies and procedures to ensure ongoing responsiveness to the requirements of existing, emerging, and future communities by inviting leaders in these communities to participate in the partnership’s oversight and management. The partnership also provides regular and transparent means for the broader communities to provide input into the evolution of XSEDE’s environment and services. Thus, XSEDE not only transforms the conduct of science and the education of current and future generations of science, technology, engineering, and mathematics (STEM) practitioners, but is itself transformed by the communities that use it for high-impact research and education.

XSEDE Governance

Because XSEDE is a distributed project involving 20 partner institutions and many stakeholders, including the NSF and thousands of researchers, its organizational structure and governance are designed to balance efficiency and inclusiveness while promoting effective project performance. The model combines strong central management, which provides rapid response to issues and opportunities, with delegation and decentralization of decision-making authority, openness to genuine stakeholder participation, and professional project management practices, including formal risk management and change control. Stakeholders have input through distinct advisory bodies that have direct access to the XSEDE project director and senior management team through regularly scheduled meetings. To remain well informed of the user community’s requirements, XSEDE leadership receives advice and counsel from the XSEDE Advisory Board, the User Advisory Committee, the XD Service Providers Forum, and the TEOS Advisory Committee.
These advisory committees are intimately involved with XSEDE management in guiding the project toward optimal operations, service, and support.

Convenient and flexible access to an array of integrated and well-supported high-end digital services is critical for the advancement of knowledge. XSEDE’s vision is a world of digitally enabled scholars, researchers, and engineers participating in multidisciplinary collaborations while seamlessly accessing computing resources and sharing data to tackle society’s grand challenges.


Acknowledgments

XSEDE represents a very progressive effort and wouldn’t be possible without the support of the US National Science Foundation via grant OCI 10-53575. In addition, the Technology Identification Service is supported via NSF grant OCI 09-46505. A project of this scale isn’t developed and executed by a small group of individuals, and the authors wish to extend their great appreciation to the more than 300 individuals directly involved in the project and the hundreds more indirectly involved. Finally, no such project succeeds without the strong support and guidance of insightful program officers, and we wish to thank Barry Schneider (formerly of the NSF) for his tireless efforts in the early days of XSEDE and Rudi Eigenmann of the NSF for continuing to drive us toward excellence.

References

1. P.N. Edwards et al., Understanding Infrastructure: Dynamics, Tensions, and Design, tech. report, US National Science Foundation, 2007; http://deepblue.lib.umich.edu/handle/2027.42/49353.
2. C. Catlett et al., “TeraGrid: Analysis of Organization, System Architecture, and Middleware Enabling New Types of Applications,” Advances in Parallel Computing, L. Grandinetti, ed., IOS Press, ch. 3, 2008, pp. 225–249.
3. Intergovernmental Panel on Climate Change (IPCC), Climate Change 2007: Synthesis Report. Contribution of Working Groups I, II and III to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, tech. report, Core Writing Team (R.K. Pachauri and A. Reisinger), eds., IPCC, 2007; www.ipcc.ch/publications_and_data/publications_ipcc_fourth_assessment_report_synthesis_report.htm.
4. President’s Information Technology Advisory Committee, Computational Science: Ensuring America’s Competitiveness, tech. report, 2005; www.nitrd.gov/pitac/reports/20050609_computational/computational.pdf.
5. J. Davis, Cyberinfrastructure in Chemical and Biological Process Systems: Impact and Directions, tech. report, NSF, 2006.
6. M. Welshons, Our Cultural Commonwealth: The Report of the Am. Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences, tech. report, Am. Council of Learned Societies, 2006.
7. NSF Task Force on Cyberlearning, Fostering Learning in the Networked World: The Cyberlearning Opportunity and Challenge; A 21st Century Agenda for the National Science Foundation, tech. report NSF 08-204, NSF, 2008.




8. NSF Advisory Committee for Cyberinfrastructure Task Force on Campus Bridging, Final Report, tech. report, NSF, Mar. 2011.
9. NSF Advisory Committee for Cyberinfrastructure Task Force on Cyberlearning and Workforce Development, Final Report, tech. report, NSF, Mar. 2011.
10. NSF Advisory Committee for Cyberinfrastructure Task Force on Grand Challenges, Final Report, tech. report, NSF, Mar. 2011.
11. NSF Advisory Committee for Cyberinfrastructure Task Force on Software for Science and Engineering, Final Report, tech. report, NSF, Mar. 2011.
12. NSF Advisory Committee for Cyberinfrastructure Task Force on Data and Visualization, Final Report, tech. report, NSF, Mar. 2011.
13. NSF Advisory Committee for Cyberinfrastructure Task Force on High-Performance Computing, Final Report, tech. report, NSF, Mar. 2011.
14. NSF Cyberinfrastructure Council, Cyberinfrastructure Vision for 21st Century Discovery, tech. report, NSF, Mar. 2007; www.nsf.gov/pubs/2007/nsf0728/index.jsp?org=EF.
15. NSF, Advanced Computing Infrastructure: Vision and Strategic Plan, tech. report NSF 12-051, NSF, Feb. 2012; www.nsf.gov/pubs/2012/nsf12051/nsf12051.pdf.
16. T. Zacharia and J.L. Kinter, Community Input on the Future of High Performance Computing, tech. report, NSF, Dec. 2009; www.nsf.gov/cise/aci/taskforces/TaskForceReport_HPC.pdf.
17. NSF Advisory Committee for Cyberinfrastructure Task Force on Grand Challenges, Cyberscience and Engineering: Final Report, tech. report, NSF, Nov. 2010.
18. R. Tapia, “Diversifying the Science and Technology Community,” A White House Roundtable Dialogue for President Clinton’s Initiative on Race: Proc. Panel Discussion and Position Papers, Am. Assoc. for the Advancement of Science, 1998.

John Towns is the Executive Director of Science and Technology at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. He’s also a principal investigator and project director for the Extreme Science and Engineering Discovery Environment (XSEDE) project and the operations manager for the Illinois Campus Cluster Program. His background is in computational astrophysics, using a variety of computational architectures with a focus on application performance analysis. At NCSA, he provides leadership and direction in the support of an array of computational science and engineering research projects that make use of advanced computing resources and services. Towns has an MS in physics and astronomy from the University of Illinois. Contact him at [email protected].


Tim Cockerill is currently the director of Center Programs for the Texas Advanced Computing Center at the University of Texas at Austin. His research has centered on the development of optoelectronic devices in compound semiconductor materials, and his current interests are in applying organizational and program management methodologies in academic environments. Cockerill has a PhD in electrical and computer engineering from the University of Illinois at Urbana-Champaign. He’s a member of IEEE. Contact him at [email protected].

Maytal Dahan is a software engineer and project lead in the Web and Mobile Applications Group for the Texas Advanced Computing Center at the University of Texas at Austin. Her research has focused on the development of Web applications to help advance eScience. She leads the User Information and Interfaces area in XSEDE and is deputy director for User Services. Dahan has an MS in software engineering from the University of Texas at Austin. Contact her at [email protected].

Ian Foster is the director of the Computation Institute (a joint project between the University of Chicago and Argonne), a professor of computer science at the University of Chicago, and an Argonne Distinguished Fellow at Argonne National Laboratory. His research interests include parallel, distributed, and data-intensive computing. Foster has a PhD in computer science from Imperial College. He’s a Fellow of the American Association for the Advancement of Science (AAAS), ACM, and the British Computer Society (BCS). Contact him at [email protected].

Kelly Gaither is the Director of Visualization and a senior research scientist at the Texas Advanced Computing Center at the University of Texas at Austin. Her research includes scientific visualization, feature detection, large-scale visualization clusters, remote and collaborative visualization, and visualization interfaces and applications. Gaither has a PhD in computational engineering from Mississippi State University. She’s a member of IEEE and ACM and is an editorial board member for the IEEE Transactions on Visualization and Computer Graphics. Contact her at [email protected].

Andrew Grimshaw is a professor at the University of Virginia, the chief designer and architect of Mentat, Legion, and Genesis II, and the co-architect of XSEDE. His research interests include grid computing, high-performance computing (HPC), compilers for parallel systems, and operating systems. Grimshaw has a PhD in computer science from the University of Illinois at Urbana-Champaign. He’s the president of the Open Grid Forum (OGF), having served both as a member of the OGF’s board of directors and as Architecture Area Director. Contact him at [email protected].

Victor Hazlewood is the chief operating officer of the Joint Institute for Computational Sciences (JICS) at the University of Tennessee and is currently the deputy director of operations for XSEDE, the US national cyberinfrastructure project. His work has spanned a variety of areas of IT and HPC, including systems, operations, security, and grid middleware and infrastructure. Hazlewood has a BS in computer science from Texas A&M University and is currently a doctoral candidate in electrical engineering and computer science at the University of Tennessee. He’s a Certified Information Systems Security Professional. Contact him at [email protected].

Scott Lathrop is with the Shodor Education Foundation and splits his time between serving as the XSEDE Director of Education and Outreach and as the Blue Waters Technical Program Manager for Education. Lathrop coordinates education and outreach activities among the XSEDE Service Providers involved in the XSEDE project. He also coordinates undergraduate and graduate education activities for the NSF-funded Blue Waters project. Lathrop has a BA in mathematics from Rochester University. Contact him at [email protected].

Dave Lifka is the director of the Center for Advanced Computing and associate CIO for Cornell University. His areas of expertise include sustainable models for academic research computing facilities, parallel job scheduling and resource management systems, data management, high-throughput systems, Web services, and cloud computing. Lifka has a PhD in computer science from the Illinois Institute of Technology. His scheduling technologies have been commercially licensed, and he has received a ComputerWorld/Smithsonian award for innovations in IT. Contact him at [email protected].

Gregory D. Peterson is the director of the National Institute for Computational Sciences and a professor in the Electrical Engineering and Computer Science Department at the University of Tennessee, Knoxville. He also serves as a co-principal investigator and director of operations for XSEDE. His research interests include emerging computer architectures, electronic design automation, performance evaluation, and computational science and engineering. Peterson earned his doctorate in electrical engineering from Washington University in St. Louis. Contact him at [email protected].


Ralph Roskies is a professor of physics at the University of Pittsburgh and a founder and co-scientific director of the Pittsburgh Supercomputing Center (PSC). His work focuses on being an advisor to, and reviewer of, US and international supercomputing centers. Roskies has a PhD in physics from Princeton University. He’s currently a member of the Board of Regents of the National Library of Medicine and a Fellow of the American Physical Society. Contact him at [email protected].

J. Ray Scott is the director of Systems and Operations at the Pittsburgh Supercomputing Center; at XSEDE, he is director and co-principal investigator of the Technology Investigation Service (TIS). His interests include Big Data research and operations. Scott has a BA in mathematics from Gettysburg College. Contact him at [email protected].

Nancy Wilkins-Diehr is an associate director at the San Diego Supercomputer Center and a co-principal investigator on the XSEDE program. Her work focuses on HPC and managing large virtual organizations of technical experts. Wilkins-Diehr has an MS in aerospace engineering from San Diego State University. She’s a member of ACM’s Special Interest Group on HPC (SIGHPC). Contact her at [email protected].
