1. For mixed precision (FP16) with the NVIDIA NGC containers, please see:
Automatic Mixed Precision for Deep Learning
2. In the NVIDIA NGC containers (PyTorch, TensorFlow, or MXNet), operations on the FP32 data type use TF32 by default on GPUs with TF32 Tensor Cores. Please see:
Accelerating AI Training with NVIDIA TF32 Tensor Cores
To disable TF32, set the following environment variable:
NVIDIA_TF32_OVERRIDE=0
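
As a rough sketch of one way to apply this from Python: the variable is probably safest set in the environment of the process before it starts, since the NVIDIA libraries may read it only once at initialization. The helper name `tf32_disabled_env` is hypothetical and used here only for illustration, and the child command simply echoes the variable back. In practice you would replace it with your own training command.

```python
import os
import subprocess
import sys


def tf32_disabled_env(base_env=None):
    """Return a copy of the environment with TF32 disabled (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    # The variable name comes from the text above; "0" requests that TF32 be disabled.
    env["NVIDIA_TF32_OVERRIDE"] = "0"
    return env


# Launch a child process with the override already in place.
# The child here only prints the variable; substitute your training script.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['NVIDIA_TF32_OVERRIDE'])"],
    env=tf32_disabled_env(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

When starting the job from a shell or with `docker run`, the same effect can be had by exporting the variable or passing it with `-e NVIDIA_TF32_OVERRIDE=0`.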