Nov 24, 2015

nnForge v2.0.1

Hi,

I recently improved the performance of the CUDA backend significantly in nnForge v2.0.1:
  • Multiple improvements that reduce total buffer sizes, which allows running larger chunks (3x for ImageNet):
    • Taking buffer sizes into account when coloring graph
    • Maxout, ReLU, and MaxSubsampling layers consume much less memory in CUDA backend
    • Action graph is optimized to exclude unnecessary concurrency, taking device width into account
  • Migrated to cuDNN v3
  • Reusing CUDA streams
  • Allocating a chunk of memory for fixed working buffers, which improves performance
  • A few bug fixes
See the buffer graph coloring for the optimized action graph of a VGG-A-like schema to the right. You can get this and other interesting graphs by specifying the "--debug_mode 1" option.
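
To give a feel for why buffer sizes matter when coloring, here is a minimal Python sketch. It is not nnForge's actual code (nnForge is C++/CUDA), and the function name `color_buffers`, the buffer names, and the sizes are all made up for illustration. It assumes a simple greedy strategy: buffers that are never live at the same time may share one allocation (one "color"), and handling the largest buffers first lets big buffers share with each other.

```python
def color_buffers(sizes, conflicts):
    """Greedy, size-aware coloring of buffers (illustrative sketch only).

    sizes: dict mapping buffer name -> size in bytes
    conflicts: iterable of (a, b) pairs of buffers that are live at the same time
    Returns (assignment, color_sizes): buffer name -> color index, and the
    number of bytes that must be allocated for each color.
    """
    neighbors = {name: set() for name in sizes}
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    assignment = {}
    color_sizes = []
    # Largest buffers first, so the first buffer placed in a color is also
    # the largest one in it and therefore fixes that color's allocation size.
    for name in sorted(sizes, key=lambda n: (-sizes[n], n)):
        used = {assignment[n] for n in neighbors[name] if n in assignment}
        for color in range(len(color_sizes)):
            if color not in used:
                assignment[name] = color
                break
        else:
            # No existing color is free of conflicts: open a new one.
            assignment[name] = len(color_sizes)
            color_sizes.append(sizes[name])
    return assignment, color_sizes


# Hypothetical chain of four layer outputs; each one conflicts only with
# the buffer produced right before it.
sizes = {"a": 100, "b": 80, "c": 100, "d": 20}
conflicts = [("a", "b"), ("b", "c"), ("c", "d")]
assignment, color_sizes = color_buffers(sizes, conflicts)
print(assignment, color_sizes, sum(color_sizes))
```

In this toy chain the two 100-byte buffers end up sharing one allocation and the two smaller ones share another, so the total is 180 bytes instead of the 300 needed when every buffer gets its own memory. A coloring that ignores sizes could just as well pair a large buffer with a tiny one and waste the difference.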