International Conference on Supercomputing

Seamless Optimization of the GEMM Kernel for Task-based Programming Models

30:42

Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs

38:51

MegTaiChi: Dynamic Tensor-based Memory Management Optimization for DNN Training

25:43

Dense Dynamic Blocks: Optimizing SpMM for Processors with Vector and Matrix Units Using ML...

24:37

Toward Accelerated Stencil Computation by Adapting Tensor Core Unit on GPU

28:36

High Throughput Multidimensional Tridiagonal Systems Solvers on FPGAs

30:09

AnySeq/GPU - A Novel Approach For Faster Sequence Alignment On GPUs

28:13

Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures

26:24

Cloak: Tolerating Non-Volatile Cache Read Latency

29:27

Fast-Track Cache: A Huge Racetrack Memory L1 Data Cache

27:16

Parallel K-Clique Counting on GPUs

30:46

SnuHPL: High Performance LINPACK for Heterogeneous GPUs

29:33

SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring

26:10

VICO: Demand-driven Verification for Improving Compiler Optimizations

23:02

Optimized MPI Collective Algorithms for Dragonfly Topology

20:01

Lifting C Semantics for Dataflow Optimization

18:43

Towards Low-Latency I/O Services for Mixed Workloads Using Ultra-Low Latency SSDs

28:46

Bring Orders into Uncertainty: Enabling Efficient Uncertain Graph Processing via Novel Path Sampl...

24:11

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression

23:21

Efficient Exact K-Nearest Neighbor Graph Construction for Billion-Scale Datasets on GPUs TensorCores

28:19

Software-Defined Floating-Point Number Formats And Their Application To Graph Processing

27:25

MASTIFF: Structure-Aware Minimum Spanning Tree/Forest

7:28

KrakenOnMem: A Memristor-Augmented HW/SW Framework for Taxonomic Profiling

29:15

Clairvoyant: A Log-Based Transformer-Decoder for Failure Prediction in Large-Scale Systems

28:08

Preparing for Performance Analysis at Exascale

22:09

uiCA: Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures

29:33

Beyond Time Complexity: Data Movement Complexity Analysis for Matrix Multiplication

23:12

Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs

25:07

Handling Heavy-tailed Input of Transformer Inference on GPUs

24:05

PAME: Precision-Aware Multi-Exit DNN Serving for Reducing Latencies of Batched Inferences

16:51

A Data-Centric Optimization Framework for Machine Learning

24:33

GAPS: GPU-Acceleration of PDE Solvers for Wave Simulation

30:01

The Rise of Matrix Processing

1:13:17

SnuQS: Scaling Quantum Circuit Simulation using Storage Devices

29:58

ASAP: Automatic Synthesis of Area-Efficient and Precision-Aware CGRAs

12:46

LITE: A Low-Cost Practical Inter-Operable GPU TEE

21:51

Large-Scale Visual Analysis in the Age of Data

1:26:18

Efficiently Emulating High-Bitwidth Computation with Low-Bitwidth Hardware

21:41

Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models

16:19

Low Overhead and Context Sensitive Profiling of GPU-accelerated Applications

32:11

Calipers: A Criticality-aware Framework for Modeling Processor Performance

28:12

The Computing and Information Science and Engineering Landscape: A Look Forward

41:50
