HPCA/CGO/PPoPP/CC joint program
Saturday February 24th, 2018
HPCA | CGO | PPoPP | CC | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
[08:00 - 18:15] Registration | |||||||||||
[08:30 - 10:00] Room: Europa 3 | [08:30 - 10:00] Room: Europa 7 | [09:15 - 10:00] Room: Europa 2 | [09:15 - 10:00] Room: Europa 6 | [08:30 - 10:00] Room: Europa 5 | [08:30 - 08:45] Room: Europa 1 | ||||||
AACBB: Accelerator Architecture in Computational Biology and Bioinformatics | HIPINEB: High-Performance Interconnection Networks in the Exascale and Big-Data Era | LLVM Performance Workshop | RWDSL'18: 3rd International Workshop on Real World Domain Specific Languages | WPMVP: Workshop on Programming Models for SIMD/Vector Processing | CC: International Conference on Compiler Construction | ||||||
Opening Remarks | Opening | How to Evaluate "In-Memory Computing" Performances without Hardware Measurements? | Welcome | Keynote TBA | Opening | ||||||
Keynote 1: "Accelerating Genome Analysis: A Primer on an Ongoing Journey"
Onur Mutlu (ETH, CMU) |
Keynote: "The three L's in modern high-performance networking: Low latency, Low cost, Low processing load" | Industrial Experience with the Migration of Legacy Models using a DSL | Vectorization of a spectral finite-element numerical kernel (Application) | [08:45 - 10:00] Room: Europa 1 | |||||||
Exploring Speed/Accuracy Trade-offs | CC Keynote | ||||||||||
Accelerating Duplicate Marking In The Cloud |
Rethinking Compilers in the Rise of Machine Learning and AI
Xipeng Shen (North Carolina State University, USA) |
||||||||||
[10:00 - 10:30] Coffee Break with Snack | |||||||||||
[10:30 - 12:10] Room: Europa 3 | [10:30 - 12:00] Room: Europa 7 | [10:30 - 12:00] Room: Europa 2 | [10:30 - 11:50] Room: Europa 6 | [10:30 - 12:00] Room: Europa 5 | [10:30 - 12:00] Room: Europa 1 | ||||||
AACBB: Accelerator Architecture in Computational Biology and Bioinformatics | HIPINEB Technical Session 1 (research papers) | LLVM Performance Workshop | RWDSL'18: 3rd International Workshop on Real World Domain Specific Languages | WPMVP: Workshop on Programming Models for SIMD/Vector Processing | Session 1: Polyhedral Compilation | ||||||
Invited Talk: "Next Generation Sequencing: Big Data meets High Performance Computing Architectures"
Bertil Schmidt (JGU Mainz) |
Analysis and improvement of Valiant routing in low-diameter networks | Optimizing LLVM IR for Guided Vectorization | Saiph: Towards a DSL for High-Performance Computational Fluid Dynamics. | Small SIMD Matrices for CERN High Throughput Computing | Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling | ||||||
GAME: GPU Acceleration of Metagenomics Clustering | Node-type-based load-balancing routing for Parallel Generalized Fat-Trees | Efficient use of memory by reducing size of AST dumps in cross file analysis by clang static analyzer | CFDlang: High-level code generation for high-order methods in fluid dynamics | SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors | A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds | ||||||
Exact Alignment with FM-index on the Intel Xeon Phi Knights Landing Processor | Analyzing topology parameters for achieving energy-efficient k-ary n-cubes | MIPP: a Portable C++ SIMD Wrapper and its use for Error Correction Coding in 5G Standard | Polyhedral Expression Propagation | ||||||||
Optimizations of Sequence Alignment on FPGA: A Case Study of Extended Sequence Alignment | |||||||||||
[12:00 - 13:30] Lunch | |||||||||||
[13:30 - 15:10] Room: Europa 3 | [13:30 - 15:00] Room: Europa 7 | [13:30 - 15:00] Room: Europa 2 | [13:30 - 14:50] Room: Europa 6 | [13:30 - 15:00] Room: Europa 5 | [13:30 - 15:00] Room: Europa 1 | ||||||
AACBB: Accelerator Architecture in Computational Biology and Bioinformatics | HIPINEB Technical Session 2 (research papers) | LLVM Performance Workshop | RWDSL'18: 3rd International Workshop on Real World Domain Specific Languages | WPMVP: Workshop on Programming Models for SIMD/Vector Processing | Session 2: Data-Flow and Pointer/Alias Analysis | ||||||
Keynote 2: "Automata Processor and its Applications in Bioinformatics"
Srinivas Aluru (Georgia Tech) |
Evaluating Energy Saving Strategies on Torus, K-Ary N-Tree, and Dragonfly | Cache-aware Scheduling and Performance Modeling with LLVM-Polly and Kerncraft | dsmodels: A Little Language for Dynamical Systems | Ikra-Cpp: A C++/CUDA DSL for Object-Oriented Programming with Structure-of-Arrays Layout | Computing Partially Path-Sensitive MFP Solutions in Data Flow Analyses | ||||||
Streaming Gap-Aware Seed Alignment on the Cache Automaton | VEF3 traces: towards a complete framework for modelling network workloads for exascale systems | Enabling Automatic Partitioning of Data-Parallel Kernels with Polyhedral Compilation | D'Artagnan: An Embedded DSL Framework for Distributed Embedded Systems | Usuba, Optimizing & Trustworthy Bitslicing Compiler | An Efficient Data Structure for Must-Alias Analysis | ||||||
Processing-in-Storage Architecture for Large-Scale Biological Sequence Alignment | Improving the Efficiency of Future Exascale Systems with rCUDA | A Data Layout Transformation for Vectorizing Compilers | Parallel Sparse Flow-Sensitive Points-to Analysis | ||||||||
The Genomic Benchmark Suite: Characterization and Architecture Implications | |||||||||||
[15:00 - 15:30] Coffee Break with Snack | |||||||||||
[15:30 - 17:50] Room: Europa 3 | [15:30 - 17:00] Room: Europa 7 | [15:30 - 17:00] Room: Europa 2 | [15:30 - 17:00] Room: Europa 6 | [15:30 - 17:00] Room: Europa 5 | [15:30 - 17:00] Room: Europa 1 | ||||||
AACBB: Accelerator Architecture in Computational Biology and Bioinformatics | Panel Session: "Industrial perspective of high-speed communication technology evolution" | LLVM Performance Workshop | RWDSL'18: 3rd International Workshop on Real World Domain Specific Languages | WPMVP: Workshop on Programming Models for SIMD/Vector Processing | Session 3: Code Generation and Optimisation | ||||||
Invited Talk: "Addressing Computational Burden to Realize Precision Medicine"
Can Alkan (Bilkent University) |
Industrial perspective of high-speed communication technology evolution
Moderated by Prof. Young Cho (University of Southern California). Panelists: Eitan Zahavi (Mellanox Technologies, Israel), Ola Torudbakken (Skala Norge AS, Norway), Cyriel Minkenberg (Rockley Photonics Inc., Switzerland) |
Tensor Comprehensions | Q#: Enabling Scalable Quantum Computing and Development with a High-level DSL | Investigating automatic vectorization for real-time 3D scene understanding | PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation | ||||||
Burrows-Wheeler Short Read Aligner on AWS EC2 F1 | LLVM Q&A Panel: Questions Welcome | A Task-Based DSL for Microcomputers | Panel Discussion | Finding Missed Compiler Optimizations by Differential Testing | |||||||
Towards BIMAX: Binary Inclusion-MAXimal parallel implementation for gene expression analysis | Close | Fast and Flexible Instruction Selection with Constraints | |||||||||
Memory: The Dominant Bottleneck in Genomic Workloads | |||||||||||
Gene Sequencing: Where Time Goes | |||||||||||
Are Next-Generation HPC Systems Ready for Population-level Genomics Data Analytics? | |||||||||||
Closing remarks | |||||||||||
[18:15] Departure of the buses to the Heurigen | |||||||||||
[18:30] Heurigen: Toni & Birgit Nigl | |||||||||||
Sunday February 25th, 2018
HPCA | CGO | PPoPP | CC | ||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[08:00 - 18:30] Registration | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
[08:30 - 10:00] Room: Europa 5 | [08:30 - 10:00] Room: Europa 7 | [08:30 - 10:00] Room: Pacific 3 | [08:30 - 10:00] Room: Europa 3 | [08:30 - 10:00] Room: Europa 2 | [08:30 - 10:00] Room: Pacific 1 | [08:30 - 10:00] Room: Pacific 2 | [08:30 - 10:00] Room: Europa 6 | [08:45 - 10:00] Room: Europa 1 | |||||||||||||||||||||||||||||||||||||||||||||||
WP3: Second Workshop on Pioneering Processor Paradigms | Accelerating Big Data Processing with Hadoop, Spark and Memcached on Datacenters with Modern Architectures | Tutorial: Improving security with reversibility and session types | PMAM: Workshop on Programming Models and Applications for Multicores and Manycores | GPGPU: Workshop on General Purpose Processing Using GPU | An Introduction to Intel® Threading Building Blocks (Intel® TBB) and its Support for Heterogeneous Programming | Productive parallel programming on FPGA with high-level synthesis | Debugging and Profiling Task Parallel Programs with TASKPROF | CC Keynote | |||||||||||||||||||||||||||||||||||||||||||||||
Welcome and Introduction Pradip Bose | Session 1 | Session 1 | Opening Remarks | Welcome: The Organizers | Session 1 | Session 1 | Session 1 |
Compiler and Language Design for Quantum Computing
Bettina Heim (Microsoft Research, USA) |
|||||||||||||||||||||||||||||||||||||||||||||||
Keynote I: TBD
Mikko H. Lipasti (MICRO 2017 Test of Time Award, University of Wisconsin - Madison) |
Keynote: "Building the next Generation of MapReduce Programming Models over MPI to Fill the Gaps between Data Analytics and Supercomputers" |
Keynote 1: "Initial Steps toward Making GPU a First-Class Computing Resource: Sharing and Resource Management"
Jun Yang (William Kepler Whiteford Professor of Electrical and Computer Engineering, University of Pittsburgh) |
|||||||||||||||||||||||||||||||||||||||||||||||||||||
[09:40 - 10:00] Room: Europa 5 | [09:30 - 10:00] Room: Europa 2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
WP3: Retrospective Survey I | GPGPU Session 1: Persistent Data Structures | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
On the Evaluation of Computer Architectures | A Case For Persist Barriers in GPUs | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
[10:00 - 10:30] Coffee Break with Snack | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
[10:30 - 11:20] Room: Europa 5 | [10:30 - 12:00] Room: Europa 7 | [10:30 - 12:00] Room: Pacific 3 | [10:30 - 12:00] Room: Europa 3 | [10:30 - 12:00] Room: Europa 2 | [10:30 - 12:00] Room: Pacific 1 | [10:30 - 12:00] Room: Pacific 2 | [10:30 - 12:00] Room: Europa 6 | [10:30 - 12:00] Room: Europa 1 | |||||||||||||||||||||||||||||||||||||||||||||||
WP3: Invited Talk | Accelerating Big Data Processing with Hadoop, Spark and Memcached on Datacenters with Modern Architectures | Tutorial: Improving security with reversibility and session types | PMAM Session 1: GPU and Accelerator | GPGPU Session 2: Applications/Frameworks | An Introduction to Intel® Threading Building Blocks (Intel® TBB) and its Support for Heterogeneous Programming | Productive parallel programming on FPGA with high-level synthesis | Debugging and Profiling Task Parallel Programs with TASKPROF | Session 4: Compilation for Specialised Domains | |||||||||||||||||||||||||||||||||||||||||||||||
40 years since dusk: will hardware capabilities finally make our systems more capable?
Lluis Vilanova (Technion) |
Session 2 | Session 2 | Extending ILUPACK with a Task-Parallel Version of BiCG for Dual-GPU Servers | Overcoming the Difficulty of Large-scale CGH Generation on multi-GPU Cluster | Session 2 | Session 2 | Session 2 | Compiling for Concise Code and Efficient I/O | |||||||||||||||||||||||||||||||||||||||||||||||
[11:20 - 12:00] Room: Europa 5 | Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators | Transparent Avoidance of Redundant Data Transfer on GPU-enabled Apache Spark | Termination Checking and Task Decomposition for Task-Based Intermittent Programs | ||||||||||||||||||||||||||||||||||||||||||||||||||||
WP3: New/Exploratory paradigms | Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution | GPU-based Acceleration of Detailed Tissue-Scale Cardiac Simulations | A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F# | ||||||||||||||||||||||||||||||||||||||||||||||||||||
A Multi-component Branch Predictor Design for Low Resource Budget Processors | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
FFT implementation using mono-instruction set computer architecture | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
[12:00 - 13:30] Lunch | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
[13:20 - 14:20] Room: Europa 5 | [13:30 - 15:00] Room: Europa 7 | [13:30 - 15:00] Room: Pacific 2 | [13:30 - 15:00] Room: Pacific 3 | [13:30 - 15:00] Room: Europa 3 | [13:30 - 14:30] Room: Europa 2 | [13:30 - 15:00] Room: Pacific 1 | [13:30 - 15:00] Room: Europa 6 | [13:30 - 15:00] Room: Europa 1 | |||||||||||||||||||||||||||||||||||||||||||||||
WP3: Second Workshop on Pioneering Processor Paradigms | PULP: An open hardware platform, the story so far | Turning HPC clusters into High Performance & High Throughput facilities by using remote GPU virtualization | Tutorial: Improving security with reversibility and session types | PMAM Session 2: Fine-grain Parallelism | GPGPU: Workshop on General Purpose Processing Using GPU | An Introduction to Intel® Threading Building Blocks (Intel® TBB) and its Support for Heterogeneous Programming | High Performance Distributed Deep Learning: A Beginner's Guide | Session 5: Code Translation and Transformation | |||||||||||||||||||||||||||||||||||||||||||||||
Keynote II: TBD
TBD |
PULP concept and goals | [Session 1.1] Presentation of remote GPU virtualization techniques and rCUDA features (50 minutes) | Session 3 | Fast and Accurate Performance Analysis of Synchronization |
Keynote 2: "Generating High Performance GPU Code using Rewrite Rules with Lift"
Christophe Dubach (University of Edinburgh) |
Session 3 | Session 1 | Tail Call Elimination and Data Representation for Functional Languages on the Java Virtual Machine | |||||||||||||||||||||||||||||||||||||||||||||||
[14:20 - 15:00] Room: Europa 5 | State of the art of open source hardware design | [Session 1.2] Practical demonstration about how to install and use rCUDA (40 minutes) | Supporting Fine-grained Dataflow Parallelism in Big Data Systems | CAnDL: A Domain Specific Language for Compiler Analysis | |||||||||||||||||||||||||||||||||||||||||||||||||||
WP3: Retrospective Survey II | Summary of PULP systems: PULP, PULPino, PULPissimo | Intra-Task Parallelism in Automotive Real-Time Systems | Semantic Reasoning about the Sea of Nodes | ||||||||||||||||||||||||||||||||||||||||||||||||||||
This Architecture Tastes Like Microarchitecture | PULP cores: OR10N, RI5CY, Zero-riscy, Ariane | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Project CrayOn: Back to the future for a more General-Purpose GPU? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
[15:00 - 15:30] Coffee Break with Snack | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
[15:30 - 15:50] Room: Europa 5 | [15:30 - 17:30] Room: Europa 7 | [15:30 - 17:00] Room: Pacific 2 | [15:30 - 17:00] Room: Pacific 3 | [15:30 - 17:00] Room: Europa 3 | [15:30 - 16:30] Room: Europa 2 | [15:30 - 17:00] Room: Pacific 1 | [15:30 - 17:00] Room: Europa 6 | [15:30 - 17:00] Room: Europa 1 | |||||||||||||||||||||||||||||||||||||||||||||||
WP3: Retrospective Survey III | PULP: An open hardware platform, the story so far | Turning HPC clusters into High Performance & High Throughput facilities by using remote GPU virtualization | Tutorial: Improving security with reversibility and session types | PMAM Session 3: Cache and Pipeline | GPGPU Session 3: Concurrent Kernels | An Introduction to Intel® Threading Building Blocks (Intel® TBB) and its Support for Heterogeneous Programming | High Performance Distributed Deep Learning: A Beginner's Guide | Session 6: Compile- and Run-Time Analysis | |||||||||||||||||||||||||||||||||||||||||||||||
45-year CPU evolution: one law and two equations | Advanced PULP silicon implementations | [Session 2] Guided exercises so that the audience uses rCUDA in a cluster located at Technical University of Valencia, Spain | Session 4 | Understanding Parallelization Tradeoffs for Linear Pipelines | MaxPair: Enhance OpenCL Concurrent Kernel Execution by Weighted Maximum Matching | Session 4 | Session 2 | Towards a Compiler Analysis for Parallel Algorithmic Skeletons | |||||||||||||||||||||||||||||||||||||||||||||||
[15:30 - 15:50] Room: Europa 5 | Acceleration for PULP systems, examples from cryptography and neural networks | Time for attendees to freely exercise with rCUDA in the remote cluster (a set of exercises is proposed) | An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs | Generalized Profile-Guided Iterator Recognition | |||||||||||||||||||||||||||||||||||||||||||||||||||
WP3: Panel Session | PULP Programming | VAIL: A Victim-Aware Cache Policy for Improving Lifetime of Hybrid Memory | Efficient Dynamic Analysis for Node.js | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Panel TBD
Invited Pioneers and speakers plus the retrospective paper authors |
[17:00 - 17:05] Room: Europa 3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
[15:30 - 15:50] Room: Europa 5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
WP3: Recap/discussion; closing remarks, action items | Closing Remarks | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Discussion driven by workshop organizers. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
[18:00] HPCA/CGO/PPoPP Welcome Reception and Poster Session | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
[19:45] (Anthony’s Bar) Women-in-Computer-Architecture (WICARCH) get-together | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Monday February 26th, 2018
HPCA | CGO | PPoPP | |||||||
---|---|---|---|---|---|---|---|---|---|
[08:00 - 18:00] | Registration | ||||||||
[08:30 - 08:45] | Opening | ||||||||
[08:45 - 09:55] | (Europa 4) HPCA Keynote: What is the role of Architecture and Software Researchers on the Road to Quantum Supremacy? Margaret Martonosi (Princeton University) | ||||||||
[09:55 - 10:20] | Coffee Break with Snack | ||||||||
[10:20 - 10:30] | Room: Europa 4 | [10:20 - 11:45] | Room: Europa 2 | [10:20 - 11:35] | Room: Europa 3 | ||||
Test of Time Award Session | Session 1: Managed Runtimes | Session 1: Concurrent Data Structures | |||||||
HPCA Test of Time Award | SIMD Intrinsics on Managed Language Runtimes | Session chair: Xipeng Shen (North Carolina State University) | |||||||
[10:30 - 12:00] | Room: Europa 4 | CollectionSwitch: A Framework for Efficient and Dynamic Collection Selection | Interval-Based Memory Reclamation | ||||||
Best Paper Session | Analyzing and Optimizing Task Granularity on the JVM | Harnessing Epoch-based Reclamation for Efficient Range Queries | |||||||
Session chair: Josep Torrellas (UIUC) | A Persistent Lock-Free Queue for Non-Volatile Memory | ||||||||
Amdahl's Law in the Datacenter Era: A Market for Fair Processor Allocation | |||||||||
iNPG: Accelerating Critical Section Access with In-Network Packet Generation for NoC based Many-cores | |||||||||
Enabling Efficient Network Service Function Chain Deployment on Heterogeneous Server Platform | |||||||||
Reducing Data Transfer Energy by Exploiting Similarity within a Data Transaction | |||||||||
[11:45 - 13:15] | Lunch | ||||||||
[13:15 - 14:55] | Room: Europa 4 | [13:15 - 14:55] | Room: Europa 5+6 | [13:15 - 14:55] | Room: Europa 2 | [13:15 - 14:55] | Room: Europa 3 | ||
Session 2A: Architecture for Neural Network | Session 2B: Cache and Memory | Session 2: Resilience and Security | Session 2: Compilers and runtime systems | ||||||
Session chair: Rajeev Balasubramonian (University of Utah) | Session chair: Paul V. Gratz (Texas A&M University) | Automating Efficient Variable-Grained Resiliency for Low-Power IoT Systems | Session chair: I-Ting Angelina Lee (Washington University in St. Louis) | ||||||
Making Memristive Neural Network Accelerators Reliable | A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores | Resilient Decentralized Android Application Repackaging Detection Using Logic Bombs | Juggler: A Dependency-Aware Task Based Execution Framework for GPUs | ||||||
Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-based Deep Learning | SIPT: Speculatively Indexed, Physically Tagged Caches | nAdroid: Statically Detecting Ordering Violations in Android Applications | HPVM: Heterogeneous Parallel Virtual Machine | ||||||
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks | Domino Temporal Data Prefetcher | SGXElide: Enabling Enclave Code Secrecy via Self-Modification | Hierarchical Memory Management for Mutable State | ||||||
In-situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems | ProFess: A Probabilistic Hybrid Main Memory Management Framework for High Performance and Fairness | SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks | |||||||
[14:55 - 15:15] | Coffee Break with Snack | ||||||||
[15:15 - 16:55] | Room: Europa 4 | [15:15 - 16:55] | Room: Europa 5+6 | [15:15 - 15:25] | Room: Europa 2 | [15:15 - 16:30] | Room: Europa 3 | ||
Session 3A: Security | Session 3B: GPU Cache and Memory | Test of Time Award Session | Session 3: Performance | ||||||
Session chair: David R. Kaeli (Northeastern University) | Session chair: Bradford M. Beckmann (AMD) | CGO Test of Time Award | Session chair: Milind Chabbi (Baidu Research) | ||||||
RCoal: Mitigating GPU Timing Attack via Subwarp-based Randomized Coalescing Techniques | Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls | [15:25 - 16:55] | Room: Europa 2 | Bridging the Gap between Deep Learning and Sparse Matrix Format Selection | |||||
Are Coherence Protocol States vulnerable to Information Leakage? | LATTE-CC: Latency Tolerance Aware Adaptive Cache Compression Management for Energy Efficient GPUs | Session 3: Best Paper Finalists | Optimizing N-Dimensional, Winograd-Based Convolution for Manycore CPUs | ||||||
Record-Replay Architecture as a General Security Framework | GETM: high-performance GPU transactional memory via eager conflict detection | Poker: Permutation-based SIMD Execution of Intensive Tree Search by Path Encoding | vSensor: Leveraging Fixed-Workload Snippets of Programs for Performance Variance Detection | ||||||
The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern DRAM Devices | Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management | High Performance Stencil Code Generation with LIFT | |||||||
Qubit Allocation | |||||||||
Dominance-based Duplication Simulation (DBDS): Code Duplication to Enable Compiler Optimizations | |||||||||
[16:55 - 17:15] | Break | ||||||||
[17:15 - 18:55] | Room: Europa 4 | [17:15 - 18:55] | Room: Europa 5+6 | [17:00 - 19:00] | Room: Europa 7 | [17:15 - 17:45] | Room: Europa 3 | [17:15 - 17:45] | Room: Europa 3 |
Session 4A: Microarchitecture and Benchmark | Session 4B: Persistent and NVM memory | ||||||||
Session chair: Benjamin Lee (Duke University) | Session chair: Hai Li (Duke University) | Student Research Competition | CGO & PPoPP Artifact Evaluation | CGO & PPoPP Artifact Evaluation | |||||
A Novel Register Renaming Technique for Out-of-Order Processors | Crash Consistency in Encrypted Non-Volatile Main Memory Systems | ||||||||
Wait of a Decade: Did SPEC CPU 2017 Broaden the Performance Horizon? | Adaptive Memory Fusion: Towards Transparent, Agile Integration of Persistent Memory | ||||||||
Architectural Support for Task Dependence Management with Flexible Software Scheduling | Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems | [18:00 - 19:00] | Room: Europa 2 | [18:00 - 19:00] | Room: Europa 3 | ||||
GDP: Using Dataflow Properties to Accurately Estimate Interference-free Performance at Runtime | Enabling Fine-Grain Restricted Coset Coding Through Word-Level Compression for PCM | ||||||||
[19:15 - 20:15] | Room: Europa 4 | CGO Business Meeting | PPoPP Business Meeting | ||||||
HPCA Business Meeting | |||||||||
Tuesday February 27th, 2018
HPCA | CGO | PPoPP | |||||
---|---|---|---|---|---|---|---|
[08:00 - 17:00] | Registration | ||||||
[08:00 - 09:40] | Room: Europa 4 | [08:00 - 09:40] | Room: Europa 5+6 | [08:00 - 09:40] | Room: Europa 2 | [08:00 - 09:40] | Room: Europa 3 |
Session 5A: GPU | Session 5B: Secure memory | Session 4: Linear Algebra and Vectorization | Session 4: Best Paper Candidates | ||||
Session chair: Minsoo Rhu (POSTECH) | Session chair: Rui Hou (Chinese Academy of Sciences) | The Generalized Matrix Chain Algorithm | Session chair: Idit Keidar (Technion) | ||||
Perception-Oriented 3D Rendering Approximation for Modern Graphics Processors | D-ORAM: Path-ORAM Delegation for Low Execution Interference on Cloud Servers with Untrusted Memory | CVR: Efficient Vectorization of SpMV on X86 Processors | Cache-Tries: Concurrent Lock-Free Hash Tries with Constant-Time Operations | ||||
Warp Scheduling for Fine-Grained Synchronization | Secure DIMM: Moving ORAM Primitives Closer to Memory | Look-Ahead SLP: Auto-vectorization in the Presence of Commutative Operations | Featherlight On-the-fly False-sharing Detection | ||||
WIR: Warp Instruction Reuse to Minimize Repeated Computations in GPUs | Comprehensive VM Protection against Untrusted Hypervisor through Retrofitted AMD Memory Encryption | Conflict-Free Vectorization of Associative Irregular Applications with Recent SIMD Architectural Advances | Register Optimizations for Stencils on GPUs | ||||
G-TSC: Timestamp Based Coherence for GPUs | SYNERGY: Rethinking Secure-Memory Design for Error-Correcting Memories | FlashR: Parallelize and Scale R for Machine Learning using SSDs | |||||
[09:40 - 10:05] | Coffee Break with Snack | ||||||
[10:05 - 11:45] | Room: Europa 4 | [10:05 - 11:45] | Room: Europa 5+6 | [10:05 - 11:45] | Room: Europa 2 | [10:05 - 11:45] | Room: Europa 3 |
Session 6A: Novel Architecture | Session 6B: In-Memory Computing | Session 5: Static and Dynamic Analysis | Session 5: Concurrency control and fault tolerance | ||||
Session chair: Kei Hiraki (University of Tokyo) | Session chair: Jishen Zhao (UCSD) | Scalable Concurrency Debugging with Distributed Graph Processing | Session chair: Walter Binder (USI) | ||||
A Case for Packageless Processors | RC-NVM: Enabling Symmetric Row and Column Memory Accesses for In-Memory Databases | Lightweight Detection of Cache Conflicts | DisCVar: Discovering Critical Variables Using Algorithmic Differentiation for Transient Faults | ||||
Extending the Power-Efficiency and Performance of Photonic Interconnects for Heterogeneous Multicores | GraphR: Accelerating Graph Processing Using ReRAM | CUDAAdvisor: LLVM-Based Runtime Profiling for Modern GPUs | Practical Concurrent Traversals in Search Trees | ||||
Routerless Networks-on-Chip | GraphP: Reducing Communication of PIM-based Graph Processing with Efficient Data Partition | May-Happen-in-Parallel Analysis with Static Vector Clocks | Communication-Avoiding Parallel Minimum Cuts and Connected Components | ||||
HeatWatch: Optimizing 3D NAND Read Operations With Self-Recovery and Temperature Awareness | PM3: Power Modeling and Power Management for Processing-in-Memory | Safe Privatization in Transactional Memory | |||||
[11:45 - 13:15] | Lunch | ||||||
[11:45 - 12:30] | (lunch room) Women in Academia and Industry Lunch Session | ||||||
[12:35 - 13:10] | (Europa 4) Women in Academia and Industry Panel | ||||||
[13:15 - 14:25] | (Europa 4) CGO Keynote: Biological Computation Sara-Jane Dunn (Microsoft Research Limited) | ||||||
[14:25 - 14:50] | Coffee Break with Snack | ||||||
[14:50 - 16:30] | Room: Europa 4 | [14:50 - 16:30] | Room: Europa 5+6 | [14:50 - 16:30] | Room: Europa 2 | [14:50 - 16:30] | Room: Europa 3 |
Session 7A: Industry Track | Session 7B: Best of CAL | Session 6: Memory usage Optimisation | Session 6: Models and Libraries | ||||
Session chair: Lieven Eeckhout (Ghent University) | Session chair: Dan Sorin (Duke University) | DeLICM: Scalar Dependence Removal at Zero Memory Cost | Session chair: Zoltan Majo (Ergon Informatik AG) | ||||
Don't Correct the Tags in a Cache, just Check their Hamming Distance from the Lookup Tag | Resistive Address Decoder | Loop Transformations Leveraging Hardware Prefetching | Making Pull-Based Graph Processing Performant | ||||
Reliability-aware Data Placement for Heterogeneous Memory Architecture | Transcending Hardware Limits with Software Out-of-order Processing | Transforming Loop Chains via Macro Dataflow Graphs | An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines | ||||
SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters | Sensing CPU voltage noise through Electromagnetic Emanations | Local Memory-Aware Kernel Perforation | LazyGraph: Lazy Data Coherency for Replicas in Distributed Graph-Parallel Computation | ||||
Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level | PAM: Parallel Augmented Maps | ||||||
[17:00] | Departure of the buses to Palais Liechtenstein | ||||||
[18:00] | Banquet at Palais Liechtenstein | ||||||
Wednesday February 28th, 2018
HPCA | CGO | PPoPP | |||||
---|---|---|---|---|---|---|---|
[08:00 - 09:00] | (Europa 4) PPoPP Keynote: From confusion to clarity: hardware concurrency programming models 2008-2018 Peter Sewell (University of Cambridge) | ||||||
[09:00 - 09:25] | Coffee Break with Snack | ||||||
[09:25 - 11:05] | Room: Europa 4 | [09:25 - 11:05] | Room: Europa 5+6 | [09:25 - 11:05] | Room: Europa 2 | [09:25 - 11:05] | Room: Europa 3 |
Session 8A: Industry Track (applications) | Session 8B: Memory | Session 7: Program Generation and Synthesis | Session 7: Parallel frameworks and applications | ||||
Session chair: Andrew Putnam (Microsoft) | Session chair: Guangyu Sun (Peking University) | AutoPA: Automatically Generating Active Driver from Original Passive Driver Code | Session chair: Bernhard Egger (Seoul National University) | ||||
Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective | ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism | Synthesizing an Instruction Selection Rule Library from Semantic Specifications | Efficient Shuffle Management with SCache for DAG Computing Frameworks | ||||
Amdahl's Law in Big Data Analytics: Alive and Kicking in TPCx-BB (BigBench) | DUO: Dual Use of On-chip Redundancy for High Reliability | Synthesizing Programs That Expose Performance Bottlenecks | High-Performance Genomics Data Analysis Framework with In-Memory Computing | ||||
Memory Hierarchy for Web Search | Memory System Design for Ultra Low Power, Computationally Error Resilient Processor Microarchitectures | Program Generation for Small-Scale Linear Algebra Applications | Griffin: Uniting CPU and GPU in Information Retrieval Systems for Intra-Query Parallelism | ||||
Characterizing Resource Sensitivity of Database Workloads | NACHOS: Software-Driven Hardware-Assisted Memory Disambiguation for Accelerators | swSpTRSV: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures | |||||
[11:05 - 11:20] | Break | ||||||
[11:20 - 12:35] | Room: Europa 4 | [11:20 - 12:35] | Room: Europa 5+6 | [11:20 - 12:35] | Room: Europa 2 | [11:20 - 12:10] | Room: Europa 3 |
Session 9A: Accelerators | Session 9B: Power | Session 8: Compilation for Specialised Domains | Session 8: Race Detection | ||||
Session chair: Xuehai Qian (USC) | Session chair: Guru Venkataramani (George Washington University) | Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming | Session chair: Jesper Larsson Träff (TU Wien) | ||||
OuterSPACE: An Outer product based SPArse matrix multiplication acCElerator | Power and Energy Characterization of an Open Source 25-core Manycore Processor | Register Allocation for Intel Processor Graphics | VerifiedFT: A Verified, High-Performance Dynamic Race Detector | ||||
Searching for Potential gRNA Off-Target Sites for CRISPR/Cas9 using Automata Processing across Different Platforms | A Spot Capacity Market to Increase Power Infrastructure Utilization in Multi-Tenant Data Centers | A Compiler for Cyber-Physical Digital Microfluidic Biochips | Efficient Parallel Determinacy Race Detection for Two-Dimensional Dags | ||||
Characterizing and Mitigating Output Reporting Bottlenecks in Spatial-Reconfigurable Automata Processing Architectures | GPGPU Power Modeling for Multi-Domain Voltage-Frequency Scaling | ||||||
[12:35] | [12:35 - 12:45] | Room: Europa 2 | [12:10] | ||||
Best Paper Award Session | |||||||
HPCA Closing | CGO 2018 Best Paper Award | PPoPP Closing | |||||
[12:45] | |||||||
CGO Closing | |||||||