Selected Publications

    is/are with the HPCP Lab.
A trailing 1 after an author's name denotes a co-first author.
* indicates a (co-)corresponding author.

  1. GraNNDis: Fast Distributed Graph Neural Network Training Framework for Multi-Server Clusters
    Jaeyong Song, Hongsun Jang, Hunseong Lim, Jaewon Jung, Youngsok Kim, and Jinho Lee
    33rd International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2024 (to appear)
    System SW AI GPU

  2. CR2: Community-aware Compressed Regular Representation for Graph Processing on a GPU
    Shinnung Jeong, Sungjun Cho, Yongwoo Lee, Hyunjun Park, Seonyeong Heo, Gwangsun Kim, Youngsok Kim, and Hanjun Kim
    53rd International Conference on Parallel Processing (ICPP), Aug. 2024 (to appear)
    System SW GPU

  3. PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
    Si Ung Noh1, Junguk Hong1, Chaemin Lim, Seongyeon Park, Jeehyun Kim, Hanjun Kim, Youngsok Kim*, and Jinho Lee*
    51st ACM/IEEE International Symposium on Computer Architecture (ISCA), June–July 2024
    System SW PIM

  4. Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU
    Kiung Jung, Seok Namkoong, Hongjun Um, Hyejun Kim, Youngsok Kim, and Yongjun Park
    25th ACM International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), June 2024
    System SW AI

  5. Fully Harnessing the Performance Potential of DRAM-less Mobile Flash Storage
    Jaesun No, Gyusun Lee, Youngsok Kim, and Jinkyu Jeong
    38th International Conference on Massive Storage Systems and Technology (MSST), June 2024
    System SW Mobile

  6. MPC-Wrapper: Fully Harnessing the Potential of Samsung Aquabolt-XL HBM2-PIM on FPGAs
    Jinwoo Choi1, Yeonan Ha1, Hanna Cha, Seil Lee, Sungchul Lee, Jounghoo Lee, Shin-haeng Kang, Bongjun Kim, Hanwoong Jung, Hanjun Kim, and Youngsok Kim*
    32nd IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2024
    Architecture PIM FPGA

  7. AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
    Seongyeon Park, Junguk Hong, Jaeyong Song, Hajin Kim, Youngsok Kim, and Jinho Lee
    29th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), Mar. 2024
    System SW GPU

  8. Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
    Hongsun Jang, Jaeyong Song, Jaewon Jung, Jaeyoung Park, Youngsok Kim, and Jinho Lee
    30th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Mar. 2024
    Received the HPCA 2024 Best Paper Award (Honorable Mention)!
    Architecture SmartSSD AI

  9. McCore: A Holistic Management of High-Performance Heterogeneous Multicores
    Jaewon Kwon, Yongju Lee, Hongju Kal, Minjae Kim, Youngsok Kim, and Won Woo Ro
    56th IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2023
    Architecture

  10. Virtual PIM: Resource-aware Dynamic DPU Allocation and Workload Scheduling Framework for Multi-DPU PIM Architecture
    Donghyeon Kim, Taehoon Kim, Inyong Hwang, Taehyeong Park, Hanjun Kim, Youngsok Kim, and Yongjun Park
    32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2023
    System SW PIM

  11. Enabling Fine-Grained Spatial Multitasking on Systolic-Array NPUs Using Dataflow Mirroring
    Jinwoo Choi, Yeonan Ha, Jounghoo Lee, Sangsu Lee, Jinho Lee, Hanhwi Jang*, and Youngsok Kim*
    IEEE Transactions on Computers (TC), Aug. 2023
    Architecture AI Modeling

  12. Occamy: Memory-efficient GPU Compiler for DNN Inference
    Jaeho Lee, Shinnung Jeong, Seungbin Song, Kunwoo Kim, Heelim Choi, Youngsok Kim, and Hanjun Kim
    60th ACM/IEEE Design Automation Conference (DAC), July 2023
    System SW GPU AI

  13. Design and Analysis of a Processing-in-DIMM Join Algorithm: A Case Study with UPMEM DIMMs
    Chaemin Lim, Suhyun Lee, Jinwoo Choi, Jounghoo Lee, Seongyeon Park, Hanjun Kim, Jinho Lee, and Youngsok Kim*
    2023 ACM International Conference on Management of Data (SIGMOD), June 2023
    System SW PIM Database

  14. Pipe-BD: Pipelined Parallel Blockwise Distillation
    Hongsun Jang, Jaewon Jung, Jaeyong Song, Joonsang Yu, Youngsok Kim, and Jinho Lee
    26th Design, Automation, and Test in Europe Conference (DATE), Apr. 2023
    System SW AI GPU

  15. Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
    Jaeyong Song1, Jinkyu Yim1, Jaewon Jung, Hongsun Jang, Hyung-Jin Kim, Youngsok Kim, and Jinho Lee
    28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2023
    System SW AI GPU

  16. SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators
    Mingi Yoo1, Jaeyong Song1, Jounghoo Lee, Namhyung Kim, Youngsok Kim*, and Jinho Lee*
    29th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb.–Mar. 2023
    Architecture AI

  17. GuardiaNN: Fast and Secure On-Device Inference in TrustZone Using Embedded SRAM and Cryptographic Hardware
    Jinwoo Choi1, Jaeyeon Kim1, Chaemin Lim1, Suhyun Lee, Jinho Lee, Dokyung Song, and Youngsok Kim*
    23rd ACM/IFIP International Middleware Conference (Middleware), Nov. 2022
    System SW Mobile AI Security

  18. Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing
    Shinnung Jeong, Yongwoo Lee, Jaeho Lee, Heelim Choi, Seungbin Song, Jinho Lee, Youngsok Kim, and Hanjun Kim
    31st International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2022
    System SW GPU

  19. Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators
    Mingi Yoo1, Jaeyong Song1, Hyeyoon Lee, Jounghoo Lee, Namhyung Kim, Youngsok Kim, and Jinho Lee
    31st International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2022
    Received the PACT 2022 Best Paper Award!
    Architecture AI

  20. GCoM: A Detailed GPU Core Model for Accurate Analytical Modeling of Modern GPUs
    Jounghoo Lee, Yeonan Ha, Suhyun Lee, Jinyoung Woo, Jinho Lee, Hanhwi Jang, and Youngsok Kim*
    49th ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2022
    Architecture GPU Modeling

  21. SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs
    Seongyeon Park, Hajin Kim, Tanveer Ahmad, Nauman Ahmed, Zaid Al-Ars, Peter Hofstee, Youngsok Kim, and Jinho Lee
    36th IEEE International Parallel and Distributed Processing Symposium (IPDPS), May–June 2022
    System SW GPU

  22. Dataflow Mirroring: Architectural Support for Highly Efficient Fine-Grained Spatial Multitasking on Systolic-Array NPUs
    Jounghoo Lee1, Jinwoo Choi1, Jaeyeon Kim, Jinho Lee, and Youngsok Kim*
    58th ACM/IEEE Design Automation Conference (DAC), Dec. 2021
    Architecture AI

  23. Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing
    Mingi Yoo1, Jaeyong Song1, Jounghoo Lee, Namhyung Kim, Youngsok Kim, and Jinho Lee
    IEEE Computer Architecture Letters (CAL), June 2021
    Architecture AI

  24. Thread-aware Area-efficient High-level Synthesis Compiler for Embedded Devices
    Changsu Kim, Shinnung Jeong, Sungjun Cho, Yongwoo Lee, William Song, Youngsok Kim, and Hanjun Kim
    19th ACM/IEEE International Symposium on Code Generation and Optimization (CGO), Mar. 2021
    System SW FPGA

  25. Real-Time Object Detection System with Multi-Path Neural Networks
    Seonyeong Heo, Sungjun Cho, Youngsok Kim, and Hanjun Kim
    26th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Apr. 2020
    System SW GPU AI

  26. FlexLearn: Fast and Highly Efficient Brain Simulations Using Flexible On-Chip Learning
    Eunjin Baek1, Hunjun Lee1, Youngsok Kim, and Jangwoo Kim
    52nd IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2019
    Architecture Neuromorphic

  27. μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization
    Youngsok Kim, Joonsung Kim, Dongju Chae, Daehyun Kim, and Jangwoo Kim
    14th ACM European Conference on Computer Systems (EuroSys), Mar. 2019
    System SW Mobile AI

  28. Flexon: A Flexible Digital Neuron for Efficient Spiking Neural Network Simulations
    Dayeol Lee1, Gwangmu Lee1, Dongup Kwon, Sunghwa Lee, Youngsok Kim, and Jangwoo Kim
    45th ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2018
    Architecture Neuromorphic

  29. DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture
    Dongup Kwon1, Jaehyung Ahn1, Dongju Chae, Mohammadamin Ajdari, Jaewon Lee, Suheon Bae, Youngsok Kim, and Jangwoo Kim
    45th ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2018
    Architecture FPGA

  30. Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks
    Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu
    23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2018
    Architecture PIM Mobile

  31. GPUpd: A Fast and Scalable Multi-GPU Architecture Using Cooperative Projection and Distribution
    Youngsok Kim, Jae-Eon Jo, Hanhwi Jang, Minsoo Rhu, Hanjun Kim, and Jangwoo Kim
    50th IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2017
    Architecture GPU

  32. CloudSwap: A Cloud-Assisted Swap Mechanism for Mobile Devices
    Dongju Chae, Joonsung Kim, Youngsok Kim, Jangwoo Kim, Kyung-Ah Chang, Sang-Bum Suh, and Hyogun Lee
    16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2016
    System SW Mobile

  33. Efficient Footprint Caching for Tagless DRAM Caches
    Hakbeom Jang1, Yongjun Lee1, Jongwon Kim, Youngsok Kim, Jangwoo Kim, Jinkyu Jeong, and Jae W. Lee
    22nd IEEE International Symposium on High-Performance Computer Architecture (HPCA), Mar. 2016
    Architecture

  34. DCS: A Fast and Scalable Device-Centric Server Architecture
    Jaehyung Ahn1, Dongup Kwon1, Youngsok Kim, Mohammadamin Ajdari, Jaewon Lee, and Jangwoo Kim
    48th IEEE/ACM International Symposium on Microarchitecture (MICRO), Dec. 2015
    Architecture FPGA

  35. Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities
    Sangho Lee, Youngsok Kim, Jangwoo Kim, and Jong Kim
    35th IEEE Symposium on Security and Privacy (S&P), May 2014
    Security GPU

  36. GPUdmm: A High-Performance and Memory-Oblivious GPU Architecture Using Dynamic Memory Management
    Youngsok Kim, Jaewon Lee, Jae-Eon Jo, and Jangwoo Kim
    20th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2014
    Architecture GPU

  37. ScaleGPU: GPU Architecture for Memory-Unaware GPU Programming
    Youngsok Kim, Jaewon Lee, Donggyu Kim, and Jangwoo Kim
    IEEE Computer Architecture Letters (CAL), July 2013
    Architecture GPU