Workshop and Tutorial Schedule
|Saturday, June 22||Sunday, June 23|
|SW Mudd 227||T2||T3||T4||T5|
|SW Mudd 233||T1||-||-||-|
|SW Mudd 627||W11||W11||-||-|
|SW Mudd 633||W22||W22||W3||W3|
Thanks to Columbia University, wireless network connectivity (802.11b) will be available in the rooms where the ICS'02 tutorials, workshops and technical sessions will be held. We plan to have some of the tutorial and workshop proceedings available online during the sessions, so bring your wireless card. More details are available at http://www.columbia.edu/acis/networks/wireless/
[This is a shorter version of the actual tutorial presented. It excludes proprietary information about IBA companies, their products, and the trends. Please check with Prof. Panda (firstname.lastname@example.org) for additional information.]
This tutorial is intended for researchers, scientists, engineers, managers, developers, professors, and students engaged in research, design, and development of next generation high performance computing systems (clusters, servers, and data centers).
The emerging InfiniBand Architecture (IBA) standard is generating a lot of excitement about building next generation high performance computing systems in a radically different manner. This is leading to the following common questions among many scientists, engineers, managers, developers, and users associated with High Performance Computing:
This tutorial is designed to provide answers to the above questions. We will start with the background behind the origin of the IBA standard. Then we will familiarize attendees with the novel features of IBA (such as elimination of the standard PCI-bus based architecture; provision for multiple transport services and mechanisms to support QoS and protection in the network; uniform treatment of interprocessor communication and I/O; hardware support for remote DMA, atomic, and multicast operations; support for virtual lanes and service levels; and support for low latency communication with Virtual Interface). We will compare and contrast the IBA standard with other on-going developments/standards. We will show how the IBA standard allows next generation computing systems to be designed to deliver not only high performance but also RAS (Reliability, Availability, and Serviceability). Open research challenges in designing communication and I/O subsystems of next generation HPC systems with IBA will be outlined. Challenges in developing efficient programming model layers (Message Passing Interface (MPI), Distributed Shared Memory (DSM), and Get/Put) on top of IBA-based communication subsystems will be discussed. Performance numbers obtained on clusters with first generation InfiniBand products and their comparisons with other contemporary interconnects (Myrinet, Gigabit Ethernet, and GigaNet) will be presented. The tutorial will conclude with an overview of on-going IBA related research projects, IBA products, and the market time frame for the IBA products.
Dhabaleswar K. Panda is a Professor of Computer Science at the Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high performance computing, user-level communication protocols, interprocessor communication and synchronization, network-based computing, and Quality of Service. He has published over 100 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on VIA and InfiniBand. His research group has collaborated with IBM T.J. Watson in designing a high performance VIA implementation for the IBM Netfinity cluster system and with Intel on designing a comprehensive micro-benchmark suite to evaluate VIA/IBA implementations. His research group is currently collaborating with Sandia National Laboratories and Mellanox (a leading company producing IBA products) on designing next generation High Performance Computing systems with InfiniBand.
Dr. Panda has served on Program Committees and Organizing Committees of several parallel processing and high performance computing conferences and on editorial boards for several parallel processing journals. He was General Co-Chair for the 2001 International Conference on Parallel Processing; Program Co-Chair of the 1999 International Conference on Parallel Processing, 1997 and 1998 Workshops on Communication and Architectural Support for Network-Based Parallel Computing (CANPC); Program Co-Chair of the Int'l Workshop on Communication Architecture for Clusters (CAC '01); an Associate Editor of the IEEE Transactions on Parallel and Distributed Computing; Co-Guest-Editor for two special issue volumes of Journal of Parallel and Distributed Computing on "Workstation Clusters and Network-based Computing"; an IEEE Distinguished Visitor Speaker; and an IEEE Chapters Tutorials Program Speaker. Currently, he is serving as a Program Co-Chair of the International Workshop on Communication Architecture for Clusters (CAC '02). Dr. Panda is a recipient of the NSF Faculty Early CAREER Development Award, the Lumley Research Award (1997 and 2001) at the Ohio State University, and an Ameritech Faculty Fellow Award. Dr. Panda is listed as a distinguished scientist in "Who's Who in America" and in "American Men & Women of Science".
The target audience is a mixture of computer scientists, computational scientists and code developers interested in performance analysis of parallel architectures and "real-life" applications. The tutorial will also be useful to those trying to define needs for future-generation, high-end computing systems, from either the buyer's or the designer's point of view.
This tutorial presents a methodical, simplified approach to performance analysis and modeling of large-scale, parallel, scientific applications. The heart of the tutorial covers analytical modeling of application scalability using several real case studies. The case studies demonstrate how performance modeling can be used to estimate performance that can be expected from a future computer system, diagnose system performance 'glitches' in comparison with true application performance during system installation, accurately identify performance bottlenecks in existing systems, provide a tuning "roadmap" to application developers, and enable "point-design" studies for computer architects designing new systems.
We will not emphasize any particular machine in the tutorial, nor performance rankings; rather, we will generally address performance of RISC processors and of widely utilized parallel systems such as the SGI Origin 2000, IBM SP2/3, Compaq HPC systems, clusters, and Cray T3E.
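As a rough illustration of what an analytical scalability model can look like, the sketch below predicts runtime as perfectly divisible compute time plus a per-step communication cost that grows with the logarithm of the processor count. It is not taken from the tutorial material: the function names, the tree-style communication pattern, and every parameter value are assumptions chosen only to show the shape of such a model.

```python
import math


def predicted_runtime(procs, work_s, steps, msg_bytes, latency_s, bandwidth_bps):
    """Toy analytical model (all terms are illustrative assumptions).

    work_s        -- total single-processor compute time in seconds
    steps         -- number of iterations, each ending in one collective exchange
    msg_bytes     -- bytes sent per message
    latency_s     -- per-message latency in seconds
    bandwidth_bps -- link bandwidth in bytes per second
    """
    compute = work_s / procs  # assume compute divides evenly across processors
    # Assume a tree-shaped exchange: ceil(log2(P)) message hops per step.
    hops = math.ceil(math.log2(procs)) if procs > 1 else 0
    comm = steps * hops * (latency_s + msg_bytes / bandwidth_bps)
    return compute + comm


def speedup(procs, **params):
    """Predicted speedup relative to the one-processor prediction."""
    return predicted_runtime(1, **params) / predicted_runtime(procs, **params)


if __name__ == "__main__":
    params = dict(work_s=100.0, steps=10, msg_bytes=1_000_000,
                  latency_s=1e-5, bandwidth_bps=1e8)
    for p in (1, 4, 16, 64):
        print(p, round(predicted_runtime(p, **params), 4), round(speedup(p, **params), 2))
```

Changing the latency or bandwidth parameters and re-running the loop gives a quick, hypothetical "point-design" comparison between two imagined interconnects.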
The authors are members of an internationally recognized team of performance evaluation experts and have nearly thirty years of combined experience in application benchmarking, optimization, and performance modeling. They have given numerous tutorials and invited and contributed lectures on performance at major conferences and at various universities and other institutions. One is a Gordon Bell prize winner and co-author of a new SIAM monograph on performance. See http://www.c3.lanl.gov/par_arch/
This tutorial is intended for researchers and practitioners who want to track new developments in short range wireless communication, but who don't have the time or patience to read all the specifications. Computer professionals who want to develop a better understanding of technology trends and identify new market opportunities in the area of wireless networking will also benefit from this tutorial. A basic understanding of layered network architecture is expected. No background in analog radio, signal processing, or wireless communication is required.
The promise of untethered computing in the workplace is becoming a reality. IEEE 802.11b, the 11Mbps wireless LAN standard, has finally arrived, and early market response has been positive. As the WLAN market takes off, Bluetooth, another emerging standard for short-range wireless networking, is also gathering force. Several vendors have demonstrated Bluetooth products, including cordless headsets, PCMCIA cards, and LAN access points. Both standards are competing for the same airwaves, but are they also chasing the same market? Will Bluetooth and 802.11b complement each other, or will one technology eventually displace the other?
This tutorial will explain the key design aspects of the 802.11 and Bluetooth standards and illustrate how technology innovation and market forces are shaping their evolution.
Pravin Bhagwat is an entrepreneur and a well-known researcher in the area of wireless and mobile networking. Currently, he is directing a large-scale 802.11 deployment project in India and also working as a visiting professor in the computer science department, IIT Kanpur. He was the principal architect at ReefEdge, Inc., a wireless networking infrastructure and software company based in NJ. He played an active role in the standardization of the Bluetooth PAN profile and also served as the chair of the Internet Engineering Task Force BOF on IP over Bluetooth. Prior to working for ReefEdge, he worked as a technology consultant in the Networking Research group at AT&T Labs-Research, and as a member of research staff at IBM Thomas J. Watson Research Center. He is the chief architect of BlueSky, an indoor wireless networking system for palmtop computers, and the co-inventor of TCP splicing, a technique for building fast application layer proxies. He actively serves on program committees of networking conferences and has published numerous technical papers and patents in the area of mobile computing and wireless communication. He received his Ph.D. in computer science from the University of Maryland, College Park. He also holds an adjunct faculty appointment at Winlab, Rutgers University.
This tutorial is intended to provide industry and university-based computer architects and processor designers with an overview of minimally clocked systems and the impact of such a design style on processor performance and power consumption.
This tutorial addresses the problem of minimally clocked processor design. Minimally clocked or Globally Asynchronous Locally Synchronous (GALS) systems are an intermediate style of design between synchronous and fully asynchronous systems. GALS systems contain several independent synchronous blocks which operate with their own local clocks and communicate asynchronously with each other. The main feature of these systems is the absence of a global timing reference and the use of several distinct local clocks (or clock domains), possibly running at different frequencies. In the case of high-end core processors, global clock distribution issues are perhaps the best motivating factor for the study of GALS systems: with each technology shrink, the clock distribution network of a large chip grows rapidly in complexity and demands more design effort, power, and die area.
As opposed to fully synchronous processors, minimally clocked processors offer the advantage of fine-grain control of local clock speeds and voltages, thus providing additional power-saving capabilities under a wide variety of applications and workloads.
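To make the power argument concrete, the sketch below applies the standard CMOS dynamic power relation, P = a * C * V^2 * f, to each clock domain separately. It is a minimal illustration and not the tutorial's own model: the `ClockDomain` class, the domain names, and the capacitance, voltage, and frequency values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClockDomain:
    """One locally synchronous block with its own clock and supply voltage."""
    name: str
    capacitance_f: float   # effective switched capacitance in farads (assumed value)
    voltage_v: float       # local supply voltage in volts
    frequency_hz: float    # local clock frequency in hertz
    activity: float = 1.0  # switching activity factor between 0 and 1

    def dynamic_power_w(self) -> float:
        # Standard CMOS dynamic power: P = a * C * V^2 * f
        return self.activity * self.capacitance_f * self.voltage_v ** 2 * self.frequency_hz


def total_dynamic_power_w(domains) -> float:
    """Sum dynamic power over all clock domains."""
    return sum(d.dynamic_power_w() for d in domains)


if __name__ == "__main__":
    # Hypothetical two-domain processor, both domains at full speed and voltage.
    baseline = [
        ClockDomain("integer", 1e-9, 1.5, 1e9),
        ClockDomain("floating_point", 1e-9, 1.5, 1e9),
    ]
    # Same design, with the lightly used domain slowed down and run at lower voltage.
    scaled = [
        ClockDomain("integer", 1e-9, 1.5, 1e9),
        ClockDomain("floating_point", 1e-9, 1.2, 0.5e9),
    ]
    print(total_dynamic_power_w(baseline), total_dynamic_power_w(scaled))
```

Because voltage enters quadratically, lowering the voltage of a single under-used domain reduces total power more than frequency scaling alone would, which is the kind of fine-grain control a single global clock does not allow.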
Diana Marculescu is an Assistant Professor of ECE at Carnegie Mellon University. She received her Ph.D. in Computer Engineering in 1998 from the University of Southern California and her M.S. in Computer Science from "Politehnica" University of Bucharest in 1991. After spending 2 years at the University of Maryland, Dr. Marculescu joined Carnegie Mellon University, where she is currently leading the Energy Aware Computing (EnyAC) group focusing on techniques and tools for enabling synergistic hardware/software power management and novel paradigms for energy-delay efficient computing. Diana Marculescu is a recipient of a National Science Foundation CAREER Award (2000-2004) and a member of the organizing committee of the ACM/IEEE International Symposium on Low Power Electronics and Design. She also serves on the technical program committee of several conferences, including IEEE/ACM International Conference on Computer-Aided Design and IEEE Design, Automation and Test in Europe Conference. Her research interests are in the area of energy aware computing, VLSI, computer architecture and CAD for power modeling and estimation.
David H. Albonesi is an Associate Professor of Electrical and Computer Engineering at the University of Rochester and Director of the Advanced Computer Architecture Laboratory. He received his B.S.E.E. from the University of Massachusetts Amherst in 1982, his M.S.E.E. from Syracuse University in 1986, and his Ph.D. in Electrical and Computer Engineering from the University of Massachusetts Amherst in 1996. Prior to receiving his Ph.D., he held technical and management leadership positions for 10 years at IBM Corporation (1982-86) and Prime Computer, Incorporated (1986-1992). The primary focus of his industry work was on the design, implementation, and debugging of low-latency, high-bandwidth memory hierarchies for high performance processors, the development of shared memory multiprocessor systems, and the development and application of architectural evaluation, design implementation, and hardware emulation tools. For this work, he received three corporate excellence awards and four U.S. patents. At Rochester, he leads the Complexity-Adaptive Processing (CAP) project and is also conducting research in understanding and improving dynamic branch prediction, multithreaded architectures, and VLIW architectures for voice and video applications. Dr. Albonesi has received a National Science Foundation CAREER Award and an IBM Faculty Partnership Award. He co-founded the Workshop on Complexity-Effective Design that was initially held at the 27th International Symposium on Computer Architecture, was held last year at ISCA-28, and will be held again this year at ISCA-29.
Pradip Bose received his B.Tech degree in Electronics and Electrical Communication Engineering from the Indian Institute of Technology, Kharagpur, India in 1977 and the M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Illinois, Urbana-Champaign, in 1981 and 1983 respectively. Since May 1983, Dr. Bose has been a Research Staff Member at the IBM T. J. Watson Research Center, Yorktown Heights, NY. During this time, Dr. Bose has conducted research projects that led to well-known IBM products such as RS/6000 and POWER3. From 1989 to 1990, he led the UNDP (United Nations Development Program) funded program to establish a Center for Advanced Research on Fifth Generation Computer Systems at Indian Statistical Institute (ISI), Calcutta, India, as part of his assignment as a Visiting Associate Professor at ISI. His current research interests include high performance, low power computer architectures and their performance evaluation, verification and testing. Dr. Bose has over 60 refereed publications and is the author of a book by MIT Press (to appear in late 2002). He is active in many conference committees and is a senior member of IEEE; in 2001-2002, he was Program Chair of IEEE Int'l. Symp. on Performance Analysis of Systems and Software (ISPASS), and he is a member of the program committees of MICRO-35 and HPCA-9. His most recent conference tutorials include offerings (with other co-speakers) at ISCA-2001, HPCA-2001 and Sigmetrics-2001.
This tutorial is intended for researchers, professionals, and students engaged in designing, developing, and using energy-efficient compute and storage clusters.
Power consumption is rapidly becoming a key design issue for servers deployed in large data centers and web hosting facilities. In fact, a significant fraction of the operation cost of these centers is due to power consumption and cooling. Computing nodes in these densely packed systems also often overheat, leading to intermittent failures. These problems are likely to worsen as newer server-class processors offer higher levels of performance at the expense of increased power consumption.
Energy conservation techniques have traditionally focused on single-node systems, be they portable and mobile computers, or single-node servers. However, most data center users employ clusters of servers for scalability, reliability, and cost considerations. Consequently, energy management techniques in this environment must take a holistic approach. This tutorial will present an in-depth look at techniques for energy management in server clusters, composed of both compute and storage nodes.
While we will provide a brief introduction to single-node energy management techniques, the bulk of the tutorial will focus on mechanisms and policies for energy management in clusters. These mechanisms and policies will be discussed with an emphasis on practicality. The tutorial will include case studies where we will examine how to put together clusters that meet a performance and energy budget, along with workloads for evaluating clusters, and metrics for measuring energy/performance tradeoffs. The tutorial will conclude with an overview of research activities in cluster energy management in industry, research labs, and academia.
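As a small illustration of the kind of energy/performance bookkeeping described above, the sketch below sizes the active portion of a cluster for a given load and computes an energy-delay product. It is only a sketch under stated assumptions: the helper names, the linear per-node capacity model, and the wattage figures are hypothetical and do not come from the tutorial.

```python
import math


def nodes_needed(load_rps, per_node_capacity_rps):
    """Smallest number of active nodes that can serve the load (assumes linear scaling)."""
    return max(1, math.ceil(load_rps / per_node_capacity_rps))


def cluster_power_w(active_nodes, total_nodes, active_w, standby_w):
    """Cluster power when idle nodes are placed in a low-power standby state."""
    return active_nodes * active_w + (total_nodes - active_nodes) * standby_w


def energy_delay_product(power_w, runtime_s):
    """Energy (joules) multiplied by runtime (seconds); lower is better."""
    energy_j = power_w * runtime_s
    return energy_j * runtime_s


if __name__ == "__main__":
    # Hypothetical 8-node front-end cluster serving 250 requests/s,
    # with each node able to sustain 100 requests/s.
    active = nodes_needed(250, 100)
    always_on = cluster_power_w(8, 8, active_w=200, standby_w=10)
    right_sized = cluster_power_w(active, 8, active_w=200, standby_w=10)
    print(active, always_on, right_sized)
    print(energy_delay_product(right_sized, runtime_s=2))
```

The energy-delay product is one common way to weigh energy savings against slowdown; a policy that halves power but doubles runtime leaves energy unchanged and makes this metric worse.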
Consumption in Data Centers
Energy Conservation for Single-Node Systems
Energy Conservation for Front-End and Compute Clusters (1)
4:15-4:30 - Break
Energy Conservation for Front-End and Compute Clusters (2)
Energy Conservation for Storage Clusters
and Energy Conservation
Ram Rajamony is a researcher at the Low-Power Computing Research Center at IBM's Austin Research Laboratory. His research interests are in energy-efficient computing, high-performance computing, networking, and operating systems. He has published papers in venues such as ISCA, HPCA, PPoPP, and PACT. He won the Best Student Paper award at SIGMETRICS in 1998. Dr. Rajamony holds one patent and has many more pending at the USPTO. He served on the Texas Advanced Technology Program committee in 1999 and the SAN-2001 program committee. Dr. Rajamony received his Ph.D. from Rice University in 1998.
Ricardo Bianchini received his Ph.D. degree in Computer Science from the University of Rochester in 1995. From 1995 until 1999, he was an Assistant Professor at the Federal University of Rio de Janeiro, Brazil. Since January 2000, he has been an Assistant Professor at Rutgers University. Prof. Bianchini's current research interests include power and energy conservation, system support for network servers, and next-generation I/O architectures. He has published more than 50 technical papers and has been on the program committee of several conferences and workshops. He is currently the Coordinator of the IEEE Task Force on Cluster Computing for South America.