Projects · C++ · Python · CUDA · Deep Learning System

Needle: High-performance DL System

A deep learning framework with custom GPU and CPU backends in C++ and Python

2023.07.29 · by Zhenlin Wang

Introduction

Needle is a deep learning framework with custom GPU and CPU backends written in C++ and Python. It is an attempt to emulate PyTorch’s imperative style, in particular its approach to automatic differentiation and computational graph traversal. At the same time, it supports accelerated computing through a custom ndarray implementation written in low-level C++ and CUDA, which lets tensor operations run on GPUs and other specialized hardware.
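To make the idea of imperative-style automatic differentiation concrete, here is a minimal sketch of reverse-mode autodiff over a computational graph that is built as the expression executes. It is an illustration only, not Needle's actual code: the `Value` class and the `topo_order` and `backward` helpers are hypothetical names, and the sketch works on Python scalars rather than GPU-backed ndarrays.

```python
class Value:
    """A scalar node in a computational graph (hypothetical, not Needle's tensor class)."""

    def __init__(self, data, inputs=(), grad_fns=()):
        self.data = float(data)
        self.inputs = inputs      # nodes this value was computed from
        self.grad_fns = grad_fns  # one local-gradient function per input
        self.grad = 0.0           # accumulated d(output)/d(this node)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))


def topo_order(root):
    """Return every node reachable from root, with inputs before their consumers."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)

    visit(root)
    return order


def backward(root):
    """Reverse-mode autodiff: traverse the graph from the output back to the inputs."""
    root.grad = 1.0
    for node in reversed(topo_order(root)):
        for parent, grad_fn in zip(node.inputs, node.grad_fns):
            parent.grad += grad_fn(node.grad)


x = Value(3.0)
y = Value(4.0)
z = x * y + x  # the graph is recorded while this line runs
backward(z)
print(z.data, x.grad, y.grad)
```

Visiting nodes in reverse topological order ensures a node's gradient is fully accumulated before it is propagated to its inputs, which matters here because `x` feeds into the result along two paths.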

Key contributions

Tech Stack & Methodology

         

Acknowledgement

This project is inspired by 10-414/714 Deep Learning Systems at Carnegie Mellon University. Extensions built on top of the course material are still under development (more to come!).