21st ACM International Conference on Supercomputing

June 17-21, 2007
Crowne Plaza Seattle
Seattle, WA, USA
Sunday, June 17
8:30 - 12:15 International Workshop on Advanced Low Power Systems
8:30 - 15:00 13th Workshop on Job Scheduling Strategies for Parallel Processing
8:30 - 17:00 Tutorial: How to think algorithmically in parallel?
13:30 - 17:00 OpenSpeedShop Tutorial: An Open Source Performance Analysis Framework for Cluster Platforms
Monday, June 18
8:30-8:45 Welcome
8:45-9:45 Keynote 1: Current trends in computer architectures: Multi-cores, Many-cores and Special-cores.
Avi Mendelson, Senior Computer Architect, Mobility Group, Intel, and Adjunct Professor in the CS/EE departments, Technion.
9:45-10:45 Session 1: Algorithms and applications I

Scalability of the Nutch search engine. Jose Moreira, Dilma Da Silva, Parijat Dube, Maged Michael, Doron Shiloach and Li Zhang (IBM T.J. Watson Research Center)

Scalability Analysis of SPMD Codes Using Expectations. Cristian Coarfa (Baylor College of Medicine), John Mellor-Crummey (Rice University), Nathan Froyd (CodeSourcery) and Yuri Dotsenko (Rice University)
10:45-11:00 Coffee break
11:00-12:30 Session 2: Runtime systems

Proactive Fault Tolerance for HPC with Xen Virtualization. Arun Babu Nagarajan (North Carolina State University), Frank Mueller (North Carolina State University), Christian Engelmann (Oak Ridge National Laboratory) and Stephen L. Scott (Oak Ridge National Laboratory)

Optimization and Bottleneck Analysis of Network Block I/O in Commodity Storage Systems. Manolis Marazakis, Angelos Bilas and Vassilis Papaefstathiou (FORTH-ICS)

GridRod - A Dynamic Runtime Scheduler for Grid Workflows. Shahaan Ayyub and David Abramson (Monash University)
12:30-14:00 Lunch
14:00-15:30 Session 3: Workload characterization

Locality of Sampling and Diversity in Parallel System Workloads. Dror G. Feitelson (Hebrew University)

Modeling Correlated Workloads by Combining Model Based Clustering and A Localized Sampling Algorithm. Hui Li, Michael Muskulus and Lex Wolters (Leiden University)

Characteristics of Workloads Used in High Performance and Technical Computing. Razvan Cheveresan, Matt Ramsay, Chris Feucht (Sun Microsystems) and Ilya Sharapov (Apple)
15:30-15:45 Coffee break
15:45-17:45 Session 4: Algorithms and applications II

An Operation Stacking Framework for Large Ensemble Computations. Mehmet Belgin, Calvin J. Ribbens and Godmar Back (Virginia Tech)

Executing Irregular Scientific Applications on Stream Architectures. Mattan Erez (University of Texas at Austin), Jung Ho Ahn (Hewlett Packard Labs), Jayanth Gummaraju, William J. Dally and Mendel Rosenblum (Stanford University)

Novel Force Matrix Transformations with Optimal Load-Balance for 3-body Potential based Parallel Molecular Dynamics in a Heterogeneous Cluster Environment. Sumanth J.V, David Swanson and Hong Jiang (University of Nebraska-Lincoln)

Representation-transparent Matrix Algorithms with Scalable Performance. Peter Gottschling, Michael D. Adams and David S. Wise (Indiana University)
Tuesday, June 19
8:30-9:30 Keynote 2: Harnessing Massive Parallelism in the era of Parallelism for the Masses.
Craig Stunkel, Senior Manager of Deep Computing Software and Applications, IBM Research
9:30-11:00 Session 5: Architecture - Processor

Tradeoff between Data-, Instruction-, and Thread-Level Parallelism in Stream Processors. Jung Ho Ahn (Stanford University and HP Labs), Mattan Erez (University of Texas at Austin) and William J. Dally (Stanford University)

An L2-Miss-Driven Early Register Deallocation for SMT Processors. Joseph Sharkey and Dmitry Ponomarev (SUNY Binghamton)

Sequencer Virtualization. Perry Wang, Jamison Collins, Gautham Chinya, Bernard Lint, Asit Mallick, Koichi Yamada and Hong Wang (Intel)
11:00-11:15 Coffee break
11:15-12:45 Session 6: Message passing systems

Automatic Nonblocking Communication for Partitioned Global Address Space Programs. Wei Chen, Dan Bonachea, Costin Iancu and Kathy Yelick (UC Berkeley and Lawrence Berkeley Lab)

A Study of Process Arrival Patterns for MPI Collective Operations. Ahmad Faraj (IBM), Pitch Patarasuk and Xin Yuan (Florida State University)

High Performance MPI Design using Unreliable Datagram for Ultra-Scale InfiniBand Clusters. Matthew J. Koop, Sayantan Sur, Qi Gao and Dhabaleswar K. Panda (The Ohio State University)
12:45-14:00 Lunch
14:00-16:00 Session 7: Architecture - Memory hierarchy

Compression in Cache Design. Ali-Reza Adl-Tabatabai, Anwar Ghuloum, Shobhit Kanaujia (Intel)

Performance Driven Data Cache Prefetching in a Dynamic Software Optimization System. Jean Christophe Beyler and Philippe Clauss (Universite Louis Pasteur)

Optimization of Data Prefetch Helper Threads with Path-Expression Based Statistical Modeling. Tor M. Aamodt (University of British Columbia) and Paul Chow (University of Toronto)

Increasing Cache Capacity through Word Filtering. Prateek Pujara and Aneesh Aggarwal (Binghamton University)
16:00 Social event and banquet
Wednesday, June 20
9:00-10:30 Session 8: Architecture - Multiprocessor systems

Active Memory Operations. Zhen Fang (Intel), Lixin Zhang (IBM Austin Research Labs), John Carter (University of Utah), Ali Ibrahim (AMD) and Mike Parker (Cray)

Cooperative Cache Partitioning for Chip Multiprocessors. Jichuan Chang and Guri Sohi (University of Wisconsin-Madison)

A Low Cost Mixed-mode Parallel Processor Architecture for Embedded Systems. Shorin Kyo, Takuya Koga, Hanno Lieske, Shouhei Nomoto and Shin'ichiro Okazaki (NEC Corporation)
10:30-10:45 Coffee break
10:45-12:45 Session 9: Application optimization

Sensitivity Analysis for Automatic Parallelization on Multi-Cores. Silvius Rus, Maikel Pennings and Lawrence Rauchwerger (Texas A&M)

Adaptive Performance Control for Distributed Scientific Coupled Models. Mohamed Hussein, Ken Mayes and John Gurd (Manchester University)

Adaptive Strassen's Matrix Multiplication. Paolo D'Alberto (ECE Carnegie Mellon University) and Alex Nicolau (University of California at Irvine)

Scheduling FFT Computation on SMP and Multi-core Systems. Ayaz Ali, Lennart Johnsson and Jaspal Subhlok (University of Houston)
12:45-13:00 Closing
14:30 - 18:00 Workshop on Manycore Computing - Keynotes (open to ICS attendees)
Thursday, June 21
8:30 - 17:00 Workshop on Manycore Computing - Discussion panel (by invitation only)

Keynote talks

Current trends in computer architectures: Multi-cores, Many-cores and Special-cores

Avi Mendelson - Senior Computer Architect, Mobility Group, Intel and Adjunct Professor in CS/EE departments, Technion.

Power, thermal, and process limitations encourage modern processors to integrate multiple cores on the same die in order to maintain the overall performance growth expected by the industry. While two years ago most processors were single-core configurations, the majority of current processors contain dual or quad cores, and the number of cores on a die is expected to grow over time.

One can observe three different trends in the development of future computer architectures: (1) Multi-cores - integrating a set of cores, each of which preserves the same "single thread" performance as the previous generation; (2) Many-cores - integrating a large number of cores, trading single-threaded performance for MTL (multi-threaded level) performance; and (3) Special-cores - integrating multi-cores with many-cores, fixed-function logic, and/or programmable logic such as FPGAs.
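(A purely illustrative aside, not part of the abstract: the many-core tradeoff can be made concrete with Amdahl's law, using assumed numbers. Suppose each small core runs at a fraction s = 0.5 of a big core's speed, and n = 8 small cores replace the big one; for a workload with parallel fraction p, the runtime relative to the big core is

    T(p) = \frac{1-p}{s} + \frac{p}{n s}, \qquad
    T(0.9) = \frac{0.1}{0.5} + \frac{0.9}{4} = 0.425, \qquad
    T(0.5) = \frac{0.5}{0.5} + \frac{0.5}{4} = 1.125,

so the many-core design is roughly 2.4x faster when 90% of the work is parallel, yet slower than the single big core when only half of it is - one reason the success of each trend hinges on software as much as on hardware.)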

The success of each of these trends depends on both hardware and software technologies. In my talk I will explain why the market is moving from single-core to multi/many/special-core architectures, examine each of the new trends, and try to predict the success/failure factors (software and hardware) for each of them.

Harnessing Massive Parallelism in the era of Parallelism for the Masses

Craig Stunkel - Senior Manager of Deep Computing Software and Applications, IBM Research

As we embrace this new era of multicore and heterogeneous processors, we find ourselves ill-equipped to leverage the full potential of increasing parallelism. There are no widely adopted parallel programming languages. With massive datasets, the growing importance of sensor-based systems, and increasingly large data models, the I/O wall is looking more daunting than the memory wall. Current high-end software technology is not easily used by less sophisticated developers.

However, the stakes are high for the entire industry: we must effectively leverage parallel systems, even down to laptops and embedded systems, to profit from the multicore revolution. In response, there is a renewed focus on creatively addressing parallelism and on the mismatch between "free" computation and expensive communication. Parallel languages are receiving serious attention again. Researchers are investigating new programming models such as transactional memory. Attention is returning to parallel compiler functionality and performance. Performance tools for parallel programs are improving. As a result, the effort needed to create efficient, scalable programs should decrease noticeably over the next few years.
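(Illustrative sketch, not material from the talk: the transactional-memory model mentioned above can be tried today with GCC's experimental -fgnu-tm extension; the shared counter and thread count below are assumptions of the sketch.)

    /* Minimal transactional-memory sketch.
       Compile: gcc -fgnu-tm -pthread tm_counter.c */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;   /* shared state, no lock declared */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            /* The block executes atomically: the TM runtime detects
               conflicting accesses and retries the transaction. */
            __transaction_atomic { counter += 1; }
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld (expected 400000)\n", counter);
        return 0;
    }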

On the very high end, minimization of noise and communication delay remains paramount. Operating systems must be as simple and predictable as possible to minimize jitter and to increase reliability. Synchronization and collective communication must be efficiently supported. Whenever possible, we must find ways of bringing the program to the data instead of the other way around. I will use examples from IBM Blue Gene systems to illustrate some high-end scaling strategies for systems software and for application porting and tuning.
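(Illustrative sketch, not code from the talk or from Blue Gene: the simplest form of "bringing the program to the data" is the standard MPI partial-reduction pattern, in which each rank reduces the slice it already holds and only one scalar per rank crosses the network; the array size and contents are assumptions of the sketch.)

    /* Compile: mpicc data_local_sum.c   Run: mpirun -np 4 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    #define LOCAL_N 1000000   /* values resident on each rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Stand-in for data that already lives on this node. */
        static double data[LOCAL_N];
        for (int i = 0; i < LOCAL_N; i++)
            data[i] = rank + 1.0;

        /* The computation travels to the data: reduce locally... */
        double local_sum = 0.0;
        for (int i = 0; i < LOCAL_N; i++)
            local_sum += data[i];

        /* ...then ship only 8 bytes per rank instead of 8 MB. */
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                   MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %.0f\n", global_sum);
        MPI_Finalize();
        return 0;
    }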
