Apr 19, 2014

nnForge v1.1.4

Here is the new v1.1.4 release of nnForge. It contains mostly performance improvements, but they are rather significant: the performance of the convolutional layer in the GPU (CUDA) backend improved from 1.15 TFLOPs to 2 TFLOPs for the Galaxy Zoo example when running on an NVIDIA GeForce GTX Titan, and the performance of the whole network improved from 1 TFLOPs to 1.55 TFLOPs. See the NSight VSE profiling screenshot:

2 TFLOPs on a Titan running at 784 MHz... how efficient is that? Let's see: 2 TFLOPs / (784 MHz * 14 SM * 192 fma/SM * 2 op/fma) = 47% of theoretical peak, which I consider a pretty good number. And there is certainly room for improvement here. Training could also benefit from these improvements; I plan to port these changes to the Hessian calculators and updaters soon.
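The efficiency arithmetic above can be reproduced in a few lines. This is only a back-of-the-envelope sketch, assuming (as the formula does) that each CUDA core issues one fused multiply-add per clock and that one FMA counts as 2 floating-point operations; the function names are hypothetical and are not part of nnForge.

```python
# Back-of-the-envelope efficiency check for the GTX Titan numbers above.
# Assumption: one FMA per core per clock, one FMA = 2 floating-point ops.

def peak_flops(clock_hz: float, sm_count: int, cores_per_sm: int,
               ops_per_fma: int = 2) -> float:
    """Theoretical peak throughput in FLOP/s (hypothetical helper)."""
    return clock_hz * sm_count * cores_per_sm * ops_per_fma


def efficiency(achieved_flops: float, peak: float) -> float:
    """Fraction of the theoretical peak actually achieved."""
    return achieved_flops / peak


# Figures from the post: 784 MHz, 14 SMs, 192 FMA units per SM.
titan_peak = peak_flops(784e6, 14, 192)
conv_eff = efficiency(2.0e12, titan_peak)

print(f"peak: {titan_peak / 1e12:.2f} TFLOPs")
print(f"efficiency: {conv_eff:.1%}")
```

The peak works out to about 4.21 TFLOPs, so 2 TFLOPs is roughly 47.5% of it, which matches the 47% quoted above once rounded down.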

Here are all the changes in this release:
  • Limited C++11 support added: you can build everything except the CUDA backend, because NVCC does not yet support C++11
  • Improved testing and validation (feed-forward) performance of convolutional layers in the CUDA backend for Kepler, while greatly simplifying the code
  • Improved performance of the max subsampling 2d tester in the CUDA backend. The implementation is still far from optimal

Apr 12, 2014

Galaxy Zoo

I took second place in the Galaxy Zoo competition. The organizers requested a report from all prize winners; here is mine. Sander Dieleman won the challenge by a large margin. He used convolutional neural networks too, although his approach was more sophisticated. Team 6789, which took third place, used convnets too!

Apr 5, 2014

nnForge v1.1.3

I labelled the latest changes in nnForge with v1.1.3:
  • Snapshot functionality is fully redesigned: it now uses backpropagation; the feature is still in beta
  • Ability to define custom error functions added
    • Cross-entropy error function added; use with care, as it is not tested yet
  • Galaxy Zoo example added (see the Galaxy Zoo challenge)
  • cuda_max_global_memory_usage_ratio is set to 0.8 by default; this should help those running the code on their primary video card
  • per_layer_mu mode added, giving more robust training in some cases
  • Fixes:
    • Fixed crash when using output transformer
    • Fixed backprop for local_contrast_subtractive_2d_updater in the CUDA backend
    • Fixed build with Boost 1.55