Sunday, June 17

8:30-12:15
International Workshop on Advanced Low Power Systems

8:30-15:00
13th Workshop on Job Scheduling Strategies for Parallel Processing

8:30-17:00
Tutorial: How to think algorithmically in parallel?

13:30-17:00
OpenSpeedShop Tutorial: An Open Source Performance Analysis Framework for Cluster Platforms
Monday, June 18

8:30-8:45
Welcome

8:45-9:45
Keynote 1: Current trends in computer architectures: Multi-cores, Many-cores and Special-cores.
Avi Mendelson, Senior Computer Architect, Mobility Group, Intel; Adjunct Professor in the CS/EE departments, Technion.
9:45-10:45
Session 1: Algorithms and applications I

Scalability of the Nutch search engine. Jose Moreira, Dilma Da Silva, Parijat Dube, Maged Michael, Doron Shiloach and Li Zhang (IBM T.J. Watson Research Center)

Scalability Analysis of SPMD Codes Using Expectations. Cristian Coarfa (Baylor College of Medicine), John Mellor-Crummey (Rice University), Nathan Froyd (CodeSourcery) and Yuri Dotsenko (Rice University)

10:45-11:00
Coffee break
11:00-12:30
Session 2: Runtime systems

Proactive Fault Tolerance for HPC with Xen Virtualization. Arun Babu Nagarajan (North Carolina State University), Frank Mueller (North Carolina State University), Christian Engelmann (Oak Ridge National Laboratory) and Stephen L. Scott (Oak Ridge National Laboratory)

Optimization and Bottleneck Analysis of Network Block I/O in Commodity Storage Systems. Manolis Marazakis, Angelos Bilas and Vassilis Papaefstathiou (FORTH-ICS)

GridRod - A Dynamic Runtime Scheduler for Grid Workflows. Shahaan Ayyub and David Abramson (Monash University)

12:30-14:00
Lunch
14:00-15:30
Session 3: Workload characterization

Locality of Sampling and Diversity in Parallel System Workloads. Dror G. Feitelson (Hebrew University)

Modeling Correlated Workloads by Combining Model Based Clustering and A Localized Sampling Algorithm. Hui Li, Michael Muskulus and Lex Wolters (Leiden University)

Characteristics of Workloads Used in High Performance and Technical Computing. Razvan Cheveresan, Matt Ramsay, Chris Feucht (Sun Microsystems) and Ilya Sharapov (Apple)

15:30-15:45
Coffee break
15:45-17:45
Session 4: Algorithms and applications II

An Operation Stacking Framework for Large Ensemble Computations. Mehmet Belgin, Calvin J. Ribbens and Godmar Back (Virginia Tech)

Executing Irregular Scientific Applications on Stream Architectures. Mattan Erez (University of Texas at Austin), Jung Ho Ahn (Hewlett Packard Labs), Jayanth Gummaraju, William J. Dally and Mendel Rosenblum (Stanford University)

Novel Force Matrix Transformations with Optimal Load-Balance for 3-body Potential based Parallel Molecular Dynamics in a Heterogeneous Cluster Environment. Sumanth J.V., David Swanson and Hong Jiang (University of Nebraska-Lincoln)

Representation-transparent Matrix Algorithms with Scalable Performance. Peter Gottschling, Michael D. Adams and David S. Wise (Indiana University)
Tuesday, June 19

8:30-9:30
Keynote 2: Harnessing Massive Parallelism in the era of Parallelism for the Masses.
Craig Stunkel, Senior Manager of Deep Computing Software and Applications, IBM Research
9:30-11:00
Session 5: Architecture - Processor

Tradeoff between Data-, Instruction-, and Thread-Level Parallelism in Stream Processors. Jung Ho Ahn (Stanford University and HP Labs), Mattan Erez (University of Texas at Austin) and William J. Dally (Stanford University)

An L2-Miss-Driven Early Register Deallocation for SMT Processors. Joseph Sharkey and Dmitry Ponomarev (SUNY Binghamton)

Sequencer Virtualization. Perry Wang, Jamison Collins, Gautham Chinya, Bernard Lint, Asit Mallick, Koichi Yamada and Hong Wang (Intel)

11:00-11:15
Coffee break
11:15-12:45
Session 6: Message passing systems

Automatic Nonblocking Communication for Partitioned Global Address Space Programs. Wei Chen, Dan Bonachea, Costin Iancu and Kathy Yelick (UC Berkeley and Lawrence Berkeley Lab)

A Study of Process Arrival Patterns for MPI Collective Operations. Ahmad Faraj (IBM), Pitch Patarasuk and Xin Yuan (Florida State University)

High Performance MPI Design using Unreliable Datagram for Ultra-Scale InfiniBand Clusters. Matthew J. Koop, Sayantan Sur, Qi Gao and Dhabaleswar K. Panda (The Ohio State University)

12:45-14:00
Lunch
14:00-16:00
Session 7: Architecture - Memory hierarchy

Compression in Cache Design. Ali-Reza Adl-Tabatabai, Anwar Ghuloum and Shobhit Kanaujia (Intel)

Performance Driven Data Cache Prefetching in a Dynamic Software Optimization System. Jean Christophe Beyler and Philippe Clauss (Universite Louis Pasteur)

Optimization of Data Prefetch Helper Threads with Path-Expression Based Statistical Modeling. Tor M. Aamodt (University of British Columbia) and Paul Chow (University of Toronto)

Increasing Cache Capacity through Word Filtering. Prateek Pujara and Aneesh Aggarwal (Binghamton University)

16:00
Social event and banquet
Wednesday, June 20

9:00-10:30
Session 8: Architecture - Multiprocessor systems

Active Memory Operations. Zhen Fang (Intel), Lixin Zhang (IBM Austin Research Labs), John Carter (University of Utah), Ali Ibrahim (AMD) and Mike Parker (Cray)

Cooperative Cache Partitioning for Chip Multiprocessors. Jichuan Chang and Guri Sohi (University of Wisconsin-Madison)

A Low Cost Mixed-mode Parallel Processor Architecture for Embedded Systems. Shorin Kyo, Takuya Koga, Hanno Lieske, Shouhei Nomoto and Shin'ichiro Okazaki (NEC Corporation)

10:30-10:45
Coffee break
10:45-12:45
Session 9: Application optimization

Sensitivity Analysis for Automatic Parallelization on Multi-Cores. Silvius Rus, Maikel Pennings and Lawrence Rauchwerger (Texas A&M)

Adaptive Performance Control for Distributed Scientific Coupled Models. Mohamed Hussein, Ken Mayes and John Gurd (Manchester University)

Adaptive Strassen's Matrix Multiplication. Paolo D'Alberto (ECE, Carnegie Mellon University) and Alex Nicolau (University of California at Irvine)

Scheduling FFT Computation on SMP and Multi-core Systems. Ayaz Ali, Lennart Johnsson and Jaspal Subhlok (University of Houston)

12:45-13:00
Closing
14:30-18:00
Workshop on Manycore Computing - Keynotes (open to ICS attendees)
Thursday, June 21

8:30-17:00
Workshop on Manycore Computing - Discussion panel (by invitation only)
Keynote talks

Current trends in computer architectures: Multi-cores, Many-cores and Special-cores
Avi Mendelson - Senior Computer Architect, Mobility Group, Intel, and Adjunct Professor in the CS/EE departments, Technion
Power, thermal, and process limitations encourage modern processors to integrate multiple cores on the same die in order to maintain the overall performance growth expected by the industry. While two years ago most processors were single-core, the majority of current processors contain two or four cores, and the number of cores per die is expected to grow over time.

One can observe three different trends in the development of future computer architectures: (1) multi-cores - integrating a set of cores, each of which preserves the same "single-thread" performance as the previous generation; (2) many-cores - integrating a large number of cores, trading single-thread performance for multi-threaded (MTL) performance; and (3) special-cores - an integration of multi-cores or many-cores with fixed-function logic and/or programmable logic such as FPGAs.

The success of each of these trends depends on both hardware and software technologies. In my talk I will explain why the market is moving from single-core to multi-/many-/special-core architectures, examine each of the new trends, and try to predict the success or failure factors (software and hardware) for each of them.
Harnessing Massive Parallelism in the era of Parallelism for the Masses
Craig Stunkel - Senior Manager of Deep Computing Software and Applications, IBM Research
As we embrace this new era of multicore and heterogeneous processors, we find ourselves ill-equipped to leverage the full potential of increasing parallelism. There are no widely adopted parallel programming languages. With massive datasets, the growing importance of sensor-based systems, and increasingly large data models, the I/O wall is looking more daunting than the memory wall. Current high-end software technology is not easily used by unsophisticated developers.
However, the stakes are high for the entire industry: we must effectively leverage parallel systems, even down to laptops and embedded systems, to profit from the multicore revolution. In response, there is a renewed focus on creatively addressing parallelism and the mismatch between "free" computation and expensive communication. Parallel languages are receiving serious attention again. Researchers are investigating new programming models such as transactional memory. There is a renewed focus on parallel compiler functionality and performance. Performance tools for parallel programs are improving. As a result, the effort needed to create efficient, scalable programs should noticeably decrease over the next few years.
On the very high end, minimization of noise and communication delay remains paramount. Operating systems must be as simple and predictable as possible to minimize jitter and to increase reliability. Synchronization and collective communication must be efficiently supported. Whenever possible, we must find ways of bringing the program to the data instead of the other way around. I will use examples from IBM Blue Gene systems to illustrate some high-end scaling strategies for systems software and for application porting and tuning.