Using C++ and CUDA Extensions to write high-performance kernels in PyTorch

This article describes how to use Torch’s CUDA extension library to write high-performance kernels for PyTorch modules.

Background

Occasionally, you may need to process a tensor in a way (a transform or a custom kernel) that PyTorch’s standard library doesn’t provide. You could detach the tensor, perform the transformation, then move it back to…