International Conference on Supercomputing
Seamless Optimization of the GEMM Kernel for Task-based Programming Models
30:42
Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs
38:51
MegTaiChi: Dynamic Tensor-based Memory Management Optimization for DNN Training
25:43
Dense Dynamic Blocks: Optimizing SpMM for Processors with Vector and Matrix Units Using ML...
24:37
Toward Accelerated Stencil Computation by Adapting Tensor Core Unit on GPU
28:36
High Throughput Multidimensional Tridiagonal Systems Solvers on FPGAs
30:09
AnySeq/GPU - A Novel Approach For Faster Sequence Alignment On GPUs
28:13
Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures
26:24
Cloak: Tolerating Non-Volatile Cache Read Latency
29:27
Fast-Track Cache: A Huge Racetrack Memory L1 Data Cache
27:16
Parallel K-Clique Counting on GPUs
30:46
SnuHPL: High Performance LINPACK for Heterogeneous GPUs
29:33
SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring
26:10
VICO: Demand-driven Verification for Improving Compiler Optimizations
23:02
Optimized MPI Collective Algorithms for Dragonfly Topology
20:01
Lifting C Semantics for Dataflow Optimization
18:43
Towards Low-Latency I/O Services for Mixed Workloads Using Ultra-Low Latency SSDs
28:46
Bring Orders into Uncertainty: Enabling Efficient Uncertain Graph Processing via Novel Path Sampl...
24:11
CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression
23:21
Efficient Exact K-Nearest Neighbor Graph Construction for Billion-Scale Datasets on GPUs with TensorCores
28:19
Software-Defined Floating-Point Number Formats And Their Application To Graph Processing
27:25
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest
7:28
KrakenOnMem: A Memristor-Augmented HW/SW Framework for Taxonomic Profiling
29:15
Clairvoyant: A Log-Based Transformer-Decoder for Failure Prediction in Large-Scale Systems
28:08
Preparing for Performance Analysis at Exascale
22:09
uiCA: Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures
29:33
Beyond Time Complexity: Data Movement Complexity Analysis for Matrix Multiplication
23:12
Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs
25:07
Handling Heavy-tailed Input of Transformer Inference on GPUs
24:05
PAME: Precision-Aware Multi-Exit DNN Serving for Reducing Latencies of Batched Inferences
16:51
A Data-Centric Optimization Framework for Machine Learning
24:33
GAPS: GPU-Acceleration of PDE Solvers for Wave Simulation
30:01
The Rise of Matrix Processing
1:13:17
SnuQS: Scaling Quantum Circuit Simulation using Storage Devices
29:58
ASAP: Automatic Synthesis of Area-Efficient and Precision-Aware CGRAs
12:46
LITE: A Low-Cost Practical Inter-Operable GPU TEE
21:51
Large-Scale Visual Analysis in the Age of Data
1:26:18
Efficiently Emulating High-Bitwidth Computation with Low-Bitwidth Hardware
21:41
Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models
16:19
Low Overhead and Context Sensitive Profiling of GPU-accelerated Applications
32:11
Calipers: A Criticality-aware Framework for Modeling Processor Performance
28:12
The Computing and Information Science and Engineering Landscape: A Look Forward
41:50