Seamless Optimization of the GEMM Kernel for Task-based Programming Models
International Conference on Supercomputing
30:42
Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs
International Conference on Supercomputing
38:51
MegTaiChi: Dynamic Tensor-based Memory Management Optimization for DNN Training
International Conference on Supercomputing
25:43
Dense Dynamic Blocks: Optimizing SpMM for Processors with Vector and Matrix Units Using ML...
International Conference on Supercomputing
24:37
Toward Accelerated Stencil Computation by Adapting Tensor Core Unit on GPU
International Conference on Supercomputing
28:36
High Throughput Multidimensional Tridiagonal Systems Solvers on FPGAs
International Conference on Supercomputing
30:09
AnySeq/GPU - A Novel Approach For Faster Sequence Alignment On GPUs
International Conference on Supercomputing
28:13
Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures
International Conference on Supercomputing
26:24
Cloak: Tolerating Non-Volatile Cache Read Latency
International Conference on Supercomputing
29:27
Fast-Track Cache: A Huge Racetrack Memory L1 Data Cache
International Conference on Supercomputing
27:16
Parallel K-Clique Counting on GPUs
International Conference on Supercomputing
30:46
SnuHPL: High Performance LINPACK for Heterogeneous GPUs
International Conference on Supercomputing
29:33
SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring
International Conference on Supercomputing
26:10
VICO: Demand-driven Verification for Improving Compiler Optimizations
International Conference on Supercomputing
23:02
Optimized MPI Collective Algorithms for Dragonfly Topology
International Conference on Supercomputing
20:01
Lifting C Semantics for Dataflow Optimization
International Conference on Supercomputing
18:43
Towards Low-Latency I/O Services for Mixed Workloads Using Ultra-Low Latency SSDs
International Conference on Supercomputing
28:46
Bring Orders into Uncertainty: Enabling Efficient Uncertain Graph Processing via Novel Path Sampl...
International Conference on Supercomputing
24:11
CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression
International Conference on Supercomputing
23:21
Efficient Exact K-Nearest Neighbor Graph Construction for Billion-Scale Datasets on GPUs TensorCores
International Conference on Supercomputing
28:19
Software-Defined Floating-Point Number Formats And Their Application To Graph Processing
International Conference on Supercomputing
27:25
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest
International Conference on Supercomputing
7:28
KrakenOnMem: A Memristor-Augmented HW/SW Framework for Taxonomic Profiling
International Conference on Supercomputing
29:15
Clairvoyant: A Log-Based Transformer-Decoder for Failure Prediction in Large-Scale Systems
International Conference on Supercomputing
28:08
Preparing for Performance Analysis at Exascale
International Conference on Supercomputing
22:09
uiCA: Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures
International Conference on Supercomputing
29:33
Beyond Time Complexity: Data Movement Complexity Analysis for Matrix Multiplication
International Conference on Supercomputing
23:12
Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs
International Conference on Supercomputing
25:07
Handling Heavy-tailed Input of Transformer Inference on GPUs
International Conference on Supercomputing
24:05
PAME: Precision-Aware Multi-Exit DNN Serving for Reducing Latencies of Batched Inferences
International Conference on Supercomputing
16:51
A Data-Centric Optimization Framework for Machine Learning
International Conference on Supercomputing
24:33
GAPS: GPU-Acceleration of PDE Solvers for Wave Simulation
International Conference on Supercomputing
30:01
The Rise of Matrix Processing
International Conference on Supercomputing
1:13:17
SnuQS: Scaling Quantum Circuit Simulation using Storage Devices
International Conference on Supercomputing
29:58
ASAP: Automatic Synthesis of Area-Efficient and Precision-Aware CGRAs
International Conference on Supercomputing
12:46
LITE: A Low-Cost Practical Inter-Operable GPU TEE
International Conference on Supercomputing
21:51
Large-Scale Visual Analysis in the Age of Data
International Conference on Supercomputing
1:26:18
Efficiently Emulating High-Bitwidth Computation with Low-Bitwidth Hardware
International Conference on Supercomputing
21:41
Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models
International Conference on Supercomputing
16:19
Low Overhead and Context Sensitive Profiling of GPU-accelerated Applications
International Conference on Supercomputing
32:11
Calipers: A Criticality-aware Framework for Modeling Processor Performance
International Conference on Supercomputing
28:12
The Computing and Information Science and Engineering Landscape: A Look Forward
International Conference on Supercomputing
41:50