HPCA/CGO/PPoPP/CC joint program

Saturday February 24th, 2018

HPCA		CGO		PPoPP	CC
[08:00 - 18:15] Registration
[08:30 - 10:00] Room: Europa 3	[08:30 - 10:00] Room: Europa 7	[09:15 - 10:00] Room: Europa 2	[09:15 - 10:00] Room: Europa 6	[08:30 - 10:00] Room: Europa 5	[08:30 - 08:45] Room: Europa 1
AACBB: Accelerator Architecture in Computational Biology and Bioinformatics	HIPINEB: High-Performance Interconnection Networks in the Exascale and Big-Data Era	LLVM Performance Workshop	RWDSL'18: 3rd International Workshop on Real World Domain Specific Languages	WPMVP: Workshop on Programming Models for SIMD/Vector Processing	CC: International Conference on Compiler Construction
Opening Remarks	Opening	How to Evaluate "In-Memory Computing" Performances without Hardware Measurements?	Welcome	Keynote TBA	Opening
Keynote 1: "Accelerating Genome Analysis: A Primer on an Ongoing Journey" Onur Mutlu (ETH, CMU)	Keynote: "The three L's in modern high-performance networking: Low latency, Low cost, Low processing load"		Industrial Experience with the Migration of Legacy Models using a DSL	Vectorization of a spectral finite-element numerical kernel (Application)	[08:45 - 10:00] Room: Europa 1
Exploring Speed/Accuracy Trade-offs					CC Keynote
Accelerating Duplicate Marking In The Cloud					Rethinking Compilers in the Rise of Machine Learning and AI Xipeng Shen (North Carolina State University, USA)

[10:00 - 10:30] Coffee Break with Snack
[10:30 - 12:10] Room: Europa 3	[10:30 - 12:00] Room: Europa 7	[10:30 - 12:00] Room: Europa 2	[10:30 - 11:50] Room: Europa 6	[10:30 - 12:00] Room: Europa 5	[10:30 - 12:00] Room: Europa 1
AACBB: Accelerator Architecture in Computational Biology and Bioinformatics	HIPINEB Technical Session 1 (research papers)	LLVM Performance Workshop	RWDSL'18: 3rd International Workshop on Real World Domain Specific Languages	WPMVP: Workshop on Programming Models for SIMD/Vector Processing	Session 1: Polyhedral Compilation
Invited Talk: "Next Generation Sequencing: Big Data meets High Performance Computing Architectures" Bertil Schmidt (JGU Mainz)	Analysis and improvement of Valiant routing in low-diameter networks	Optimizing LLVM IR for Guided Vectorization	Saiph: Towards a DSL for High-Performance Computational Fluid Dynamics.	Small SIMD Matrices for CERN High Throughput Computing	Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling
GAME: GPU Acceleration of Metagenomics Clustering	Node-type-based load-balancing routing for Parallel Generalized Fat-Trees	Efficient use of memory by reducing size of AST dumps in cross file analysis by clang static analyzer	CFDlang: High-level code generation for high-order methods in fluid dynamics	SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors	A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds
Exact Alignment with FM-index on the Intel Xeon Phi Knights Landing Processor	Analyzing topology parameters for achieving energy-efficient k-ary n-cubes			MIPP: a Portable C++ SIMD Wrapper and its use for Error Correction Coding in 5G Standard	Polyhedral Expression Propagation
Optimizations of Sequence Alignment on FPGA: A Case Study of Extended Sequence Alignment

[12:00 - 13:30] Lunch
[13:30 - 15:10] Room: Europa 3	[13:30 - 15:00] Room: Europa 7	[13:30 - 15:00] Room: Europa 2	[13:30 - 14:50] Room: Europa 6	[13:30 - 15:00] Room: Europa 5	[13:30 - 15:00] Room: Europa 1
AACBB: Accelerator Architecture in Computational Biology and Bioinformatics	HIPINEB Technical Session 2 (research papers)	LLVM Performance Workshop	RWDSL'18: 3rd International Workshop on Real World Domain Specific Languages	WPMVP: Workshop on Programming Models for SIMD/Vector Processing	Session 2: Data-Flow and Pointer/Alias Analysis
Keynote 2: "Automata Processor and its Applications in Bioinformatics" Srinivas Aluru (Georgia Tech)	Evaluating Energy Saving Strategies on Torus, K-Ary N-Tree, and Dragonfly	Cache-aware Scheduling and Performance Modeling with LLVM-Polly and Kerncraft	dsmodels: A Little Language for Dynamical Systems	Ikra-Cpp: A C++/CUDA DSL for Object-Oriented Programming with Structure-of-Arrays Layout	Computing Partially Path-Sensitive MFP Solutions in Data Flow Analyses
Streaming Gap-Aware Seed Alignment on the Cache Automaton	VEF3 traces: towards a complete framework for modelling network workloads for exascale systems	Enabling Automatic Partitioning of Data-Parallel Kernels with Polyhedral Compilation	D'Artagnan: An Embedded DSL Framework for Distributed Embedded Systems	Usuba, Optimizing & Trustworthy Bitslicing Compiler	An Efficient Data Structure for Must-Alias Analysis
Processing-in-Storage Architecture for Large-Scale Biological Sequence Alignment	Improving the Efficiency of Future Exascale Systems with rCUDA			A Data Layout Transformation for Vectorizing Compilers	Parallel Sparse Flow-Sensitive Points-to Analysis
The Genomic Benchmark Suite: Characterization and Architecture Implications

[15:00 - 15:30] Coffee Break with Snack
[15:30 - 17:50] Room: Europa 3	[15:30 - 17:00] Room: Europa 7	[15:30 - 17:00] Room: Europa 2	[15:30 - 17:00] Room: Europa 6	[15:30 - 17:00] Room: Europa 5	[15:30 - 17:00] Room: Europa 1
AACBB: Accelerator Architecture in Computational Biology and Bioinformatics	Panel Session: "Industrial perspective of high-speed communication technology evolution"	LLVM Performance Workshop	RWDSL'18: 3rd International Workshop on Real World Domain Specific Languages	WPMVP: Workshop on Programming Models for SIMD/Vector Processing	Session 3: Code Generation and Optimisation
Invited Talk: "Addressing Computational Burden to Realize Precision Medicine" Can Alkan (Bilkent University)	Industrial perspective of high-speed communication technology evolution moderated by Prof. Young Cho (University of Southern California), Panelists: Eitan Zahavi, Mellanox Technologies, Israel, Ola Torudbakken, Skala Norge AS, Norway, Cyriel Minkenberg, Rockley Photonics Inc., Switzerland	Tensor Comprehensions	Q#: Enabling Scalable Quantum Computing and Development with a High-level DSL	Investigating automatic vectorization for real-time 3D scene understanding	PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation
Burrows-Wheeler Short Read Aligner on AWS EC2 F1		LLVM Q&A Panel: Questions Welcome	A Task-Based DSL for Microcomputers	Panel Discussion	Finding Missed Compiler Optimizations by Differential Testing
Towards BIMAX: Binary Inclusion-MAXimal parallel implementation for gene expression analysis			Close		Fast and Flexible Instruction Selection with Constraints
Memory: The Dominant Bottleneck in Genomic Workloads
Gene Sequencing: Where Time Goes
Are Next-Generation HPC Systems Ready for Population-level Genomics Data Analytics?
Closing remarks

[18:15] Departure of the buses to the Heurigen
[18:30] Heurigen: Toni & Birgit Nigl

Sunday February 25th, 2018

HPCA				CGO	PPoPP								CC
[08:00 - 18:30] Registration
[08:30 - 10:00] Room: Europa 5		[08:30 - 10:00] Room: Europa 7		[08:30 - 10:00] Room: Pacific 3	[08:30 - 10:00] Room: Europa 3	[08:30 - 10:00] Room: Europa 2		[08:30 - 10:00] Room: Pacific 1		[08:30 - 10:00] Room: Pacific 2		[08:30 - 10:00] Room: Europa 6	[08:45 - 10:00] Room: Europa 1
WP3: Second Workshop on Pioneering Processor Paradigms		Accelerating Big Data Processing with Hadoop, Spark and Memcached on Datacenters with Modern Architectures		Tutorial: Improving security with reversibility and session types	PMAM: Workshop on Programming Models and Applications for Multicores and Manycores	GPGPU: Workshop on General Purpose Processing Using GPU		An Introduction to Intel® Threading Building Blocks (Intel® TBB) and its Support for Heterogeneous Programming		Productive parallel programming on FPGA with high-level synthesis		Debugging and Profiling Task Parallel Programs with TASKPROF	CC Keynote
Welcome and Introduction Pradip Bose		Session 1		Session 1	Opening Remarks	Welcome: The Organizers		Session 1		Session 1		Session 1	Compiler and Language Design for Quantum Computing Bettina Heim (Microsoft Research, USA)
Keynote I: TBD Mikko H. Lipasti (MICRO 2017 Test of Time Award, University of Wisconsin - Madison)					Keynote: "Building the next Generation of MapReduce Programming Models over MPI to Fill the Gaps between Data Analytics and Supercomputers"	Keynote 1: "Initial Steps toward Making GPU a First-Class Computing Resource: Sharing and Resource Management" Jun Yang (William Kepler Whiteford Professor of Electrical and Computer Engineering, University of Pittsburgh)
[09:40 - 10:00] Room: Europa 5						[09:30 - 10:00] Room: Europa 2
WP3: Retrospective Survey I						GPGPU Session 1: Persistent Data Structures
On the Evaluation of Computer Architectures						A Case For Persist Barriers in GPUs

[10:00 - 10:30] Coffee Break with Snack
[10:30 - 11:20] Room: Europa 5		[10:30 - 12:00] Room: Europa 7		[10:30 - 12:00] Room: Pacific 3	[10:30 - 12:00] Room: Europa 3	[10:30 - 12:00] Room: Europa 2		[10:30 - 12:00] Room: Pacific 1		[10:30 - 12:00] Room: Pacific 2		[10:30 - 12:00] Room: Europa 6	[10:30 - 12:00] Room: Europa 1
WP3: Invited Talk		Accelerating Big Data Processing with Hadoop, Spark and Memcached on Datacenters with Modern Architectures		Tutorial: Improving security with reversibility and session types	PMAM Session 1: GPU and Accelerator	GPGPU Session 2: Applications/Frameworks		An Introduction to Intel® Threading Building Blocks (Intel® TBB) and its Support for Heterogeneous Programming		Productive parallel programming on FPGA with high-level synthesis		Debugging and Profiling Task Parallel Programs with TASKPROF	Session 4: Compilation for Specialised Domains
40 years since dusk: will hardware capabilities finally make our systems more capable? Lluis Vilanova (Technion)		Session 2		Session 2	Extending ILUPACK with a Task-Parallel Version of BiCG for Dual-GPU Servers	Overcoming the Difficulty of Large-scale CGH Generation on multi-GPU Cluster		Session 2		Session 2		Session 2	Compiling for Concise Code and Efficient I/O
[11:20 - 12:00] Room: Europa 5					Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators	Transparent Avoidance of Redundant Data Transfer on GPU-enabled Apache Spark							Termination Checking and Task Decomposition for Task-Based Intermittent Programs
WP3: New/Exploratory paradigms					Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution	GPU-based Acceleration of Detailed Tissue-Scale Cardiac Simulations							A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#
A Multi-component Branch Predictor Design for Low Resource Budget Processors
FFT implementation using mono-instruction set computer architecture

[12:00 - 13:30] Lunch
[13:20 - 14:20] Room: Europa 5	[13:30 - 15:00] Room: Europa 7		[13:30 - 15:00] Room: Pacific 2	[13:30 - 15:00] Room: Pacific 3	[13:30 - 15:00] Room: Europa 3		[13:30 - 14:30] Room: Europa 2		[13:30 - 15:00] Room: Pacific 1		[13:30 - 15:00] Room: Europa 6		[13:30 - 15:00] Room: Europa 1
WP3: Second Workshop on Pioneering Processor Paradigms	PULP: An open hardware platform, the story so far		Turning HPC clusters into High Performance & High Throughput facilities by using remote GPU virtualization	Tutorial: Improving security with reversibility and session types	PMAM Session 2: Fine-grain Parallelism		GPGPU: Workshop on General Purpose Processing Using GPU		An Introduction to Intel® Threading Building Blocks (Intel® TBB) and its Support for Heterogeneous Programming		High Performance Distributed Deep Learning: A Beginner's Guide		Session 5: Code Translation and Transformation
Keynote II: TBD TBD	PULP concept and goals		[Session 1.1] Presentation of remote GPU virtualization techniques and rCUDA features (50 minutes)	Session 3	Fast and Accurate Performance Analysis of Synchronization		Keynote 2: "Generating High Performance GPU Code using Rewrite Rules with Lift" Christophe Dubach (University of Edinburgh)		Session 3		Session 1		Tail Call Elimination and Data Representation for Functional Languages on the Java Virtual Machine
[14:20 - 15:00] Room: Europa 5	State of the art of open source hardware design		[Session 1.2] Practical demonstration about how to install and use rCUDA (40 minutes)		Supporting Fine-grained Dataflow Parallelism in Big Data Systems								CAnDL: A Domain Specific Language for Compiler Analysis
WP3: Retrospective Survey II	Summary of PULP systems: PULP, PULPino, PULPissimo				Intra-Task Parallelism in Automotive Real-Time Systems								Semantic Reasoning about the Sea of Nodes
This Architecture Tastes Like Microarchitecture	PULP cores: OR10N, RI5CY, Zero-riscy, Ariane
Project CrayOn: Back to the future for a more General-Purpose GPU?

[15:00 - 15:30] Coffee Break with Snack
[15:30 - 15:50] Room: Europa 5	[15:30 - 17:30] Room: Europa 7		[15:30 - 17:00] Room: Pacific 2	[15:30 - 17:00] Room: Pacific 3	[15:30 - 17:00] Room: Europa 3		[15:30 - 16:30] Room: Europa 2		[15:30 - 17:00] Room: Pacific 1		[15:30 - 17:00] Room: Europa 6		[15:30 - 17:00] Room: Europa 1
WP3: Retrospective Survey III	PULP: An open hardware platform, the story so far		Turning HPC clusters into High Performance & High Throughput facilities by using remote GPU virtualization	Tutorial: Improving security with reversibility and session types	PMAM Session 3: Cache and Pipeline		GPGPU Session 3: Concurrent Kernels		An Introduction to Intel® Threading Building Blocks (Intel® TBB) and its Support for Heterogeneous Programming		High Performance Distributed Deep Learning: A Beginner's Guide		Session 6: Compile- and Run-Time Analysis
45-year CPU evolution: one law and two equations	Advanced PULP silicon implementations		[Session 2] Guided exercises so that the audience uses rCUDA in a cluster located at Technical University of Valencia, Spain	Session 4	Understanding Parallelization Tradeoffs for Linear Pipelines		MaxPair: Enhance OpenCL Concurrent Kernel Execution by Weighted Maximum Matching		Session 4		Session 2		Towards a Compiler Analysis for Parallel Algorithmic Skeletons
[15:30 - 15:50] Room: Europa 5	Acceleration for PULP systems, examples from cryptography and neural networks		Time for attendees to freely exercise with rCUDA in the remote cluster (a set of exercises is proposed)		An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs								Generalized Profile-Guided Iterator Recognition
WP3: Panel Session	PULP Programming				VAIL: A Victim-Aware Cache Policy for Improving Lifetime of Hybrid Memory								Efficient Dynamic Analysis for Node.js
Panel TBD Invited Pioneers and speakers plus the retrospective paper authors					[17:00 - 17:05] Room: Europa 3
[15:30 - 15:50] Room: Europa 5
WP3: Recap/discussion; closing remarks, action items					Closing Remarks
Discussion driven by workshop organizers.

[18:00] HPCA/CGO/PPoPP Welcome Reception and Poster Session
[19:45] (Anthony’s Bar) Women-in-Computer-Architecture (WICARCH) get-together

Monday February 26th, 2018

HPCA				CGO				PPoPP
[08:00 - 18:00]	Registration
[08:30 - 08:45]	Opening
[08:45 - 09:55]	(Europa 4) HPCA Keynote: What is the role of Architecture and Software Researchers on the Road to Quantum Supremacy? Margaret Martonosi (Princeton University)
[09:55 - 10:20]	Coffee Break with Snack
[10:20 - 10:30]	Room: Europa 4			[10:20 - 11:45]	Room: Europa 2			[10:20 - 11:35]	Room: Europa 3
	Test of Time Award Session				Session 1: Managed Runtimes				Session 1: Concurrent Data Structures
	HPCA Test of Time Award				SIMD Intrinsics on Managed Language Runtimes				Session chair: Xipeng Shen (North Carolina State University)
[10:30 - 12:00]	Room: Europa 4				CollectionSwitch: A Framework for Efficient and Dynamic Collection Selection				Interval-Based Memory Reclamation
	Best Paper Session				Analyzing and Optimizing Task Granularity on the JVM				Harnessing Epoch-based Reclamation for Efficient Range Queries
	Session chair: Josep Torrellas (UIUC)								A Persistent Lock-Free Queue for Non-Volatile Memory
	Amdahl's Law in the Datacenter Era: A Market for Fair Processor Allocation
	iNPG: Accelerating Critical Section Access with In-Network Packet Generation for NoC based Many-cores
	Enabling Efficient Network Service Function Chain Deployment on Heterogeneous Server Platform
	Reducing Data Transfer Energy by Exploiting Similarity within a Data Transaction

[11:45 - 13:15]	Lunch
[13:15 - 14:55]	Room: Europa 4	[13:15 - 14:55]	Room: Europa 5+6	[13:15 - 14:55]	Room: Europa 2			[13:15 - 14:55]	Room: Europa 3
	Session 2A: Architecture for Neural Network		Session 2B: Cache and Memory		Session 2: Resilience and Security				Session 2: Compilers and runtime systems
	Session chair: Rajeev Balasubramonian (University of Utah)		Session chair: Paul V. Gratz (Texas A&M University)		Automating Efficient Variable-Grained Resiliency for Low-Power IoT Systems				Session chair: I-Ting Angelina Lee (Washington University in St. Louis)
	Making Memristive Neural Network Accelerators Reliable		A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores		Resilient Decentralized Android Application Repackaging Detection Using Logic Bombs				Juggler: A Dependency-Aware Task Based Execution Framework for GPUs
	Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-based Deep Learning		SIPT: Speculatively Indexed, Physically Tagged Caches		nAdroid: Statically Detecting Ordering Violations in Android Applications				HPVM: Heterogeneous Parallel Virtual Machine
	Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks		Domino Temporal Data Prefetcher		SGXElide: Enabling Enclave Code Secrecy via Self-Modification				Hierarchical Memory Management for Mutable State
	In-situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems		ProFess: A Probabilistic Hybrid Main Memory Management Framework for High Performance and Fairness						SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

[14:55 - 15:15]	Coffee Break with Snack
[15:15 - 16:55]	Room: Europa 4	[15:15 - 16:55]	Room: Europa 5+6	[15:15 - 15:25]	Room: Europa 2			[15:15 - 16:30]	Room: Europa 3
	Session 3A: Security		Session 3B: GPU Cache and Memory		Test of Time Award Session				Session 3: Performance
	Session chair: David R. Kaeli (Northeastern University)		Session chair: Bradford M. Beckmann (AMD)		CGO Test of Time Award				Session chair: Milind Chabbi (Baidu Research)
	RCoal: Mitigating GPU Timing Attack via Subwarp-based Randomized Coalescing Techniques		Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls	[15:25 - 16:55]	Room: Europa 2				Bridging the Gap between Deep Learning and Sparse Matrix Format Selection
	Are Coherence Protocol States vulnerable to Information Leakage?		LATTE-CC: Latency Tolerance Aware Adaptive Cache Compression Management for Energy Efficient GPUs		Session 3: Best Paper Finalists				Optimizing N-Dimensional, Winograd-Based Convolution for Manycore CPUs
	Record-Replay Architecture as a General Security Framework		GETM: high-performance GPU transactional memory via eager conflict detection		Poker: Permutation-based SIMD Execution of Intensive Tree Search by Path Encoding				vSensor: Leveraging Fixed-Workload Snippets of Programs for Performance Variance Detection
	The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern DRAM Devices		Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management		High Performance Stencil Code Generation with LIFT
					Qubit Allocation
					Dominance-based Duplication Simulation (DBDS): Code Duplication to Enable Compiler Optimizations

[16:55 - 17:15]	Break
[17:15 - 18:55]	Room: Europa 4	[17:15 - 18:55]	Room: Europa 5+6	[17:00 - 19:00]	Room: Europa 7	[17:15 - 17:45]	Room: Europa 3	[17:15 - 17:45]	Room: Europa 3
	Session 4A: Microarchitecture and Benchmark		Session 4B: Persistent and NVM memory
	Session chair: Benjamin Lee (Duke University)		Session chair: Hai Li (Duke University)		Student Research Competition		CGO & PPoPP Artifact Evaluation		CGO & PPoPP Artifact Evaluation
	A Novel Register Renaming Technique for Out-of-Order Processors		Crash Consistency in Encrypted Non-Volatile Main Memory Systems
	Wait of a Decade: Did SPEC CPU 2017 Broaden the Performance Horizon?		Adaptive Memory Fusion: Towards Transparent, Agile Integration of Persistent Memory
	Architectural Support for Task Dependence Management with Flexible Software Scheduling		Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems			[18:00 - 19:00]	Room: Europa 2	[18:00 - 19:00]	Room: Europa 3
	GDP: Using Dataflow Properties to Accurately Estimate Interference-free Performance at Runtime		Enabling Fine-Grain Restricted Coset Coding Through Word-Level Compression for PCM
[19:15 - 20:15]	Room: Europa 4						CGO Business Meeting		PPoPP Business Meeting

	HPCA Business Meeting

Tuesday February 27th, 2018

HPCA				CGO		PPoPP
[08:00 - 17:00]	Registration
[08:00 - 09:40]	Room: Europa 4	[08:00 - 09:40]	Room: Europa 5+6	[08:00 - 09:40]	Room: Europa 2	[08:00 - 09:40]	Room: Europa 3
	Session 5A: GPU		Session 5B: Secure memory		Session 4: Linear Algebra and Vectorization		Session 4: Best Paper Candidates
	Session chair: Minsoo Rhu (POSTECH)		Session chair: Rui Hou (Chinese Academy of Science)		The Generalized Matrix Chain Algorithm		Session chair: Idit Keidar (Technion)
	Perception-Oriented 3D Rendering Approximation for Modern Graphics Processors		D-ORAM: Path-ORAM Delegation for Low Execution Interference on Cloud Servers with Untrusted Memory		CVR: Efficient Vectorization of SpMV on X86 Processors		Cache-Tries: Concurrent Lock-Free Hash Tries with Constant-Time Operations
	Warp Scheduling for Fine-Grained Synchronization		Secure DIMM: Moving ORAM Primitives Closer to Memory		Look-Ahead SLP: Auto-vectorization in the Presence of Commutative Operations		Featherlight On-the-fly False-sharing Detection
	WIR: Warp Instruction Reuse to Minimize Repeated Computations in GPUs		Comprehensive VM Protection against Untrusted Hypervisor through Retrofitted AMD Memory Encryption		Conflict-Free Vectorization of Associative Irregular Applications with Recent SIMD Architectural Advances		Register Optimizations for Stencils on GPUs
	G-TSC: Timestamp Based Coherence for GPUs		SYNERGY: Rethinking Secure-Memory Design for Error-Correcting Memories				FlashR: Parallelize and Scale R for Machine Learning using SSDs

[09:40 - 10:05]	Coffee Break with Snack
[10:05 - 11:45]	Room: Europa 4	[10:05 - 11:45]	Room: Europa 5+6	[10:05 - 11:45]	Room: Europa 2	[10:05 - 11:45]	Room: Europa 3
	Session 6A: Novel Architecture		Session 6B: In-Memory Computing		Session 5: Static and Dynamic Analysis		Session 5: Concurrency control and fault tolerance
	Session chair: Kei Hiraki (University of Tokyo)		Session chair: Jishen Zhao (UCSD)		Scalable Concurrency Debugging with Distributed Graph Processing		Session chair: Walter Binder (USI)
	A Case for Packageless Processors		RC-NVM: Enabling Symmetric Row and Column Memory Accesses for In-Memory Databases		Lightweight Detection of Cache Conflicts		DisCVar: Discovering Critical Variables Using Algorithmic Differentiation for Transient Faults
	Extending the Power-Efficiency and Performance of Photonic Interconnects for Heterogeneous Multicores		GraphR: Accelerating Graph Processing Using ReRAM		CUDAAdvisor: LLVM-Based Runtime Profiling for Modern GPUs		Practical Concurrent Traversals in Search Trees
	Routerless Networks-on-Chip		GraphP: Reducing Communication of PIM-based Graph Processing with Efficient Data Partition		May-Happen-in-Parallel Analysis with Static Vector Clocks		Communication-Avoiding Parallel Minimum Cuts and Connected Components
	HeatWatch: Optimizing 3D NAND Read Operations With Self-Recovery and Temperature Awareness		PM3: Power Modeling and Power Management for Processing-in-Memory				Safe Privatization in Transactional Memory

[11:45 - 13:15]	Lunch
[11:45 - 12:30]	(lunch room) Women in Academia and Industry Lunch Session
[12:35 - 13:10]	(Europa 4) Women in Academia and Industry Panel
[13:15 - 14:25]	(Europa 4) CGO Keynote: Biological Computation Sara-Jane Dunn (Microsoft Research Limited)
[14:25 - 14:50]	Coffee Break with Snack
[14:50 - 16:30]	Room: Europa 4	[14:50 - 16:30]	Room: Europa 5+6	[14:50 - 16:30]	Room: Europa 2	[14:50 - 16:30]	Room: Europa 3
	Session 7A: Industry Track		Session 7B: Best of CAL		Session 6: Memory usage Optimisation		Session 6: Models and Libraries
	Session chair: Lieven Eeckhout (Ghent University)		Session chair: Dan Sorin (Duke University)		DeLICM: Scalar Dependence Removal at Zero Memory Cost		Session chair: Zoltan Majo (Ergon Informatik AG)
	Don't Correct the Tags in a Cache, just Check their Hamming Distance from the Lookup Tag		Resistive Address Decoder		Loop Transformations Leveraging Hardware Prefetching		Making Pull-Based Graph Processing Performant
	Reliability-aware Data Placement for Heterogeneous Memory Architecture		Transcending Hardware Limits with Software Out-of-order Processing		Transforming Loop Chains via Macro Dataflow Graphs		An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines
	SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters		Sensing CPU voltage noise through Electromagnetic Emanations		Local Memory-Aware Kernel Perforation		LazyGraph: Lazy Data Coherency for Replicas in Distributed Graph-Parallel Computation
	Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level						PAM: Parallel Augmented Maps

[17:00]	Departure of the buses to Palais Liechtenstein
[18:00]	Banquet at Palais Liechtenstein

Wednesday February 28th, 2018

HPCA				CGO		PPoPP
[08:00 - 09:00]	(Europa 4) PPoPP Keynote: From confusion to clarity: hardware concurrency programming models 2008-2018 Peter Sewell (University of Cambridge)
[09:00 - 09:25]	Coffee Break with Snack
[09:25 - 11:05]	Room: Europa 4	[09:25 - 11:05]	Room: Europa 5+6	[09:25 - 11:05]	Room: Europa 2	[09:25 - 11:05]	Room: Europa 3
	Session 8A: Industry Track (applications)		Session 8B: Memory		Session 7: Program Generation and Synthesis		Session 7: Parallel frameworks and applications
	Session chair: Andrew Putnam (Microsoft)		Session chair: Guangyu Sun (Peking University)		AutoPA: Automatically Generating Active Driver from Original Passive Driver Code		Session chair: Bernhard Egger (Seoul National University)
	Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective		ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism		Synthesizing an Instruction Selection Rule Library from Semantic Specifications		Efficient Shuffle Management with SCache for DAG Computing Frameworks
	Amdahl's Law in Big Data Analytics: Alive and Kicking in TPCx-BB (BigBench)		DUO: Dual Use of On-chip Redundancy for High Reliability		Synthesizing Programs That Expose Performance Bottlenecks		High-Performance Genomics Data Analysis Framework with In-Memory Computing
	Memory Hierarchy for Web Search		Memory System Design for Ultra Low Power, Computationally Error Resilient Processor Microarchitectures		Program Generation for Small-Scale Linear Algebra Applications		Griffin: Uniting CPU and GPU in Information Retrieval Systems for Intra-Query Parallelism
	Characterizing Resource Sensitivity of Database Workloads		NACHOS: Software-Driven Hardware-Assisted Memory Disambiguation for Accelerators				swSpTRSV: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures

[11:05 - 11:20]	Break
[11:20 - 12:35]	Room: Europa 4	[11:20 - 12:35]	Room: Europa 5+6	[11:20 - 12:35]	Room: Europa 2	[11:20 - 12:10]	Room: Europa 3
	Session 9A: Accelerators		Session 9B: Power		Session 8: Compilation for Specialised Domains		Session 8: Race Detection
	Session chair: Xuehai Qian (USC)		Session chair: Guru Venkataramani (George Washington University)		Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming		Session chair: Jesper Larsson Träff (TU Wien)
	OuterSPACE: An Outer product based SPArse matrix multiplication acCElerator		Power and Energy Characterization of an Open Source 25-core Manycore Processor		Register Allocation for Intel Processor Graphics		VerifiedFT: A Verified, High-Performance Dynamic Race Detector
	Searching for Potential gRNA Off-Target Sites for CRISPR/Cas9 using Automata Processing across Different Platforms		A Spot Capacity Market to Increase Power Infrastructure Utilization in Multi-Tenant Data Centers		A Compiler for Cyber-Physical Digital Microfluidic Biochips		Efficient Parallel Determinacy Race Detection for Two-Dimensional Dags
	Characterizing and Mitigating Output Reporting Bottlenecks in Spatial-Reconfigurable Automata Processing Architectures		GPGPU Power Modeling for Multi-Domain Voltage-Frequency Scaling

[12:35]				[12:35 - 12:45]	Room: Europa 2	[12:10]
					Best Paper Award Session
	HPCA Closing				CGO 2018 Best Paper Award		PPoPP Closing
				[12:45]

					CGO Closing
