- Performance of weight updaters for convolutional layers is improved in the CUDA backend, mostly for the Kepler architecture. The increase is much smaller than for forward propagation; I got >800 GFLOPs when training a single Galaxy Zoo network on a GeForce GTX Titan
- Convolutional 1D, 2D, 3D, and 4D layers are now fully supported by the CUDA backend (previously 2D and 3D only)
- Fixed training multiple networks with CPU backend
- Fixed supervised_data_mem_reader for float input data
May 18, 2014
nnForge v1.1.5
I ported the performance improvements for forward propagation of convolutional layers in the GPU backend to the hessian calculators and weight updaters. Get the latest release of nnForge, v1.1.5:
Apr 19, 2014
nnForge v1.1.4
Here is the new v1.1.4 release of nnForge. It contains mostly performance improvements, but they are rather significant: the performance of the convolutional layer in the GPU (CUDA) backend improved from 1.15 TFLOPs to 2 TFLOPs for the Galaxy Zoo example running on an NVIDIA GeForce GTX Titan, and whole-network performance improved from 1 TFLOPs to 1.55 TFLOPs. See the NSight VSE profiling screenshot:
2 TFLOPs on a Titan running at 784 MHz... how efficient is that? Let's see: 2 TFLOPs / (784 MHz * 14 SMs * 192 fma/SM * 2 op/fma) = 47% of theoretical peak, which I consider a pretty good number. And there is certainly room for improvement here. Training could also benefit from these improvements; I plan to port these changes to the hessian calculators and updaters soon.
Here are all the changes in this release:
- Limited C++11 support added: you can build everything except the CUDA backend, since NVCC does not yet support C++11
- Improved testing and validating (feed-forward) performance of convolutional layers in the CUDA backend for Kepler, while greatly simplifying the code
- Improved performance of the max subsampling 2D tester in the CUDA backend. The implementation is still far from optimal
Apr 12, 2014
Galaxy Zoo
I took second place in the Galaxy Zoo competition. The organizers requested a report from all prize winners; here is mine. Sander Dieleman won the challenge by a large margin. He used convolutional neural networks too, although his approach was more sophisticated. Team 6789, which took third place, used convnets as well!
Apr 5, 2014
nnForge v1.1.3
I labelled the latest changes in nnForge with v1.1.3:
- Snapshot functionality is fully redesigned - it now does backpropagation; the feature is still in beta
- Ability to define custom error functions is added
- Cross-entropy error function is added, use with care - not tested yet
- Galaxy Zoo example added - see Galaxy Zoo challenge
- cuda_max_global_memory_usage_ratio is set to 0.8 by default - this should help those running code on a primary videocard
- per_layer_mu mode is added - More robust training in some cases
- Fixes:
- Fixed crash when using output transformer
- Fixed backprop for local_contrast_subtractive_2d_updater in CUDA backend
- Fixed build with Boost 1.55
Feb 7, 2014
nnForge v1.1.2
I brushed up the parameters of the nnForge toolset. I also changed the default values for some of them; if you run GTSRB you will probably need to update the config file. Here is the full change list:
- Deterministic transformer added for testing and validating
- Snapshots are made on ANNs from the batch directory
- Toolset parameters changed:
- learning_rate_decay_rate is exposed as a command line parameter
- training_speed parameter renamed to learning_rate, training_speed_degradation is dropped
- training_iteration_count renamed to training_epoch_count
- train command does batch train, batch_train command is removed
- validate and test now work in batch mode, validate_batch and test_batch removed
- mu_increase_factor is set to 1.0 by default
- max_mu set to 1.0 by default
- Bug-fixes
Jan 11, 2014
nnForge v1.1.1
I've just published new nnForge release v1.1.1:
- All convolutional updaters, testers, and hessians in the CUDA backend now use a space-filling curve, improving performance when training large networks
- Improved overlap of training with loading/processing of input data for all stages by loading data in a separate host thread, CUDA backend only
- In-memory supervised data reader added
- Added NVTX profiling for reading input data, CUDA backend only
- Fixed:
- Binding a texture to a too-large linear buffer
- Average subsampling backprop in the CUDA backend for non-even configs
- Performance in Windows with the WDDM driver
Dec 27, 2013
Moved to blogger
Moved the blog from Zoho to the Blogger platform for a number of reasons, including better uptime, design, and the ability to edit posts in place. All posts are copied over to the new platform.