Nov 30, 2014

nnForge v1.1.11

Hi, I am releasing nnForge v1.1.11 with a number of significant changes:

  • Padding added to sparse convolutional layers
  • Sparse convolutional layers implemented in GPU backend (Kepler+ only)
  • Fixed bug with dropout when error function is fused with last activation function
  • Array with random numbers extended to 256K elements (for dropout)

Nov 3, 2014

nnForge v1.1.10

Hi, here is nnForge v1.1.10. The main new feature is zero-padding for convolutional layers; I should have implemented it long ago. The full list of changes:

  • You can now specify zero-padding for input data for convolutional layers
  • Memory usage calculations improved
  • Learning rate is now per part (was per parameter): training consumes less memory, so bigger networks can be trained
  • Dropout implementation is simplified
  • Minor fixes
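
To illustrate what zero-padding changes, here is a minimal sketch of the output-size arithmetic for one dimension of a convolutional layer. It assumes stride 1; the function and parameter names (`conv_output_size`, `zero_pad_1d`, `left_padding`, `right_padding`) are hypothetical and are not nnForge's API.

```python
def conv_output_size(input_size, window_size, left_padding=0, right_padding=0):
    """Output length along one dimension for a stride-1 convolution (assumed)."""
    padded = input_size + left_padding + right_padding
    if window_size > padded:
        raise ValueError("window is larger than the padded input")
    return padded - window_size + 1


def zero_pad_1d(values, left_padding, right_padding):
    """Return a copy of values with zeros added on both sides."""
    return [0.0] * left_padding + list(values) + [0.0] * right_padding


# A window of 5 over a dimension of 32 shrinks it to 28 without padding;
# padding 2 zeros on each side keeps the output at 32.
unpadded = conv_output_size(32, 5)
padded = conv_output_size(32, 5, left_padding=2, right_padding=2)
```

Without padding each convolutional layer shrinks the feature maps, which limits how many layers can be stacked on small inputs; padding removes that constraint.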

Oct 4, 2014

nnForge v1.1.9

I released nnForge v1.1.9:
  • More sparse cases supported in GPU backend for convolutional layers, improved perf
  • convert_data_type_transformer added
  • Hessian based learning algo is removed
  • Galaxy Zoo example removed. Use previous releases to get it
  • Reporting average weights/updates after each batch
  • Image classifier demo added, improved perf for running single entry through the tester

Aug 23, 2014

nnForge v1.1.8

Hi, nnForge v1.1.8 is released:
  • Sparse (in feature map dimension) convolutional layer added, with full support in CPU backend and fully connected (spatial) 1x1 support in GPU backend
  • You can use -std=c++11 now with CUDA 6.5 toolkit
  • Gradient check added
  • GTSRB switched to batch training
  • Boost and OpenCV libs default paths are /usr now
  • Improved performance for 1x1 convolutions in GPU backend
  • Minor fixes
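
For readers unfamiliar with gradient checking, the general idea can be sketched as follows: compare the gradient computed by backpropagation with a central finite-difference estimate. This is a generic illustration on a toy loss, not nnForge's implementation; all names here are hypothetical.

```python
def numerical_gradient(f, params, eps=1e-5):
    """Central finite-difference estimate of df/dparams."""
    grad = []
    for i in range(len(params)):
        original = params[i]
        params[i] = original + eps
        plus = f(params)
        params[i] = original - eps
        minus = f(params)
        params[i] = original  # restore the parameter
        grad.append((plus - minus) / (2.0 * eps))
    return grad


def max_relative_error(analytic, numeric, tiny=1e-12):
    """Largest relative difference between two gradients."""
    return max(abs(a - n) / max(abs(a), abs(n), tiny)
               for a, n in zip(analytic, numeric))


def loss(w):
    # Toy squared-error loss for a single linear unit.
    return 0.5 * (w[0] * 3.0 + w[1] * -2.0 - 1.0) ** 2


def analytic_gradient(w):
    # Hand-derived gradient of the toy loss above.
    residual = w[0] * 3.0 + w[1] * -2.0 - 1.0
    return [residual * 3.0, residual * -2.0]


weights = [0.4, -0.7]
error = max_relative_error(analytic_gradient(weights),
                           numerical_gradient(loss, weights))
```

A small relative error suggests the analytic gradient is right; a large one usually points to a bug in the backward pass.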

Jul 12, 2014

nnForge v1.1.7

It is a big release. I added a number of useful features you would expect an NN lib to have. Here is the full list:
  • Mini-batches added
  • Weight decay added
  • Momentum added
  • Cross Entropy error function is renamed to Negative Log Likelihood, true Cross Entropy added
  • Sigmoid layer added, with correct biases initialization for the classifier
  • Splitting single epoch into multiple epochs through epoch_count_in_training_set parameter
  • max_subsampling layer supports 1D and 4D in GPU backend (was 2D and 3D only)
  • rotate_band_data_transformer is extended to all dimensions (was 2D only)
  • extract_data_transformer extended to data of any dimension in case input and output windows match
  • snapshot_data: added scaling and 3D (video)
  • Sigmoid+Cross-entropy and Softmax+Negative-log-likelihood fusion implemented in CPU and GPU backends to increase accuracy
  • Max L2 bound on incoming weights implementation is dropped (*)
  • Conversion to bw image fixed in GTSRB example
  • max subsampling updater and hessian - corner cases fixed in CPU backend
(*) I did that because L2 bound on incoming weights didn't improve quality in any problem I worked on. Supporting it is not free. So I decided to drop it.
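
As a rough illustration of why fusing the last activation with the error function can help numerical accuracy, here is a minimal sketch of a fused Softmax + Negative Log Likelihood computation. It is a generic formulation using the log-sum-exp trick, not a copy of nnForge's kernels, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def fused_softmax_nll(logits, target_index):
    """Return (loss, gradient w.r.t. logits) for softmax followed by NLL."""
    log_probs = log_softmax(logits)
    loss = -log_probs[target_index]
    # The fused gradient is simply softmax(logits) - one_hot(target).
    grad = [math.exp(lp) for lp in log_probs]
    grad[target_index] -= 1.0
    return loss, grad


# Extreme logits: computing exp(1000.0) directly would overflow a double,
# while the fused form stays finite.
loss, grad = fused_softmax_nll([1000.0, 0.0, -1000.0], target_index=1)
```

Computing softmax and the logarithm separately can produce log(0) or a division by a tiny probability in the backward pass; the fused form avoids both.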

Jun 27, 2014

Jun 6, 2014

nnForge v1.1.6

I implemented a number of quite useful features in nnForge recently:
  • Stochastic Gradient Descent training method is added
  • Resume training functionality added
  • Duplicating output to log file
  • Logging current settings at the toolset initialization
  • rgb_to_yuv_convert_layer_tester added in CPU backend
  • Readers are redesigned to allow variable data readers
  • classifier_result is extended to top-N
  • Added possibility to split single reader into multiple epochs
  • Multiple fixes
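
For context, a top-N classification result counts a sample as correct when the actual class is among the N highest-scoring predictions. The sketch below shows that metric in general terms; it does not reflect the interface of nnForge's classifier_result, and the names are hypothetical.

```python
def top_n_hit(scores, actual_class, n):
    """True if actual_class is among the n highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return actual_class in ranked[:n]


def top_n_accuracy(all_scores, actual_classes, n):
    """Fraction of samples whose actual class is within the top n."""
    hits = sum(top_n_hit(s, a, n) for s, a in zip(all_scores, actual_classes))
    return hits / len(actual_classes)


# Three samples with three classes each (made-up scores for illustration).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_n_accuracy(scores, labels, 1)  # only the first sample is a hit
top2 = top_n_accuracy(scores, labels, 2)  # the second sample is a hit too
```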

May 18, 2014

nnForge v1.1.5

I ported performance improvements for forward propagation of convolutional layers in GPU backend to hessian calculators and weight updaters. Get latest release of nnForge v1.1.5:
  • Performance of weight updaters for convolutional layers is improved in CUDA backend, mostly for Kepler architecture. The increase is much smaller than for forward prop; I got >800 GFLOPs when training a single Galaxy Zoo network on GeForce Titan
  • Convolutional 1D, 2D, 3D, and 4D layers are fully supported by CUDA backend (it was 2D and 3D only before)
  • Fixed training multiple networks with CPU backend
  • Fixed supervised_data_mem_reader for float input data

Apr 19, 2014

nnForge v1.1.4

Here is new v1.1.4 release of nnForge. It contains mostly performance improvements but they are rather significant: the performance of convolutional layer in GPU (CUDA) backend improved from 1.15 TFLOPs to 2 TFLOPs for Galaxy Zoo example when running on NVIDIA GeForce GTX Titan, and the whole network performance improved from 1 TFLOPs to 1.55 TFLOPs. See the NSight VSE profiling screenshot:

2 TFLOPs on Titan running at 784 MHz... how efficient is it? Let's see: 2 TFLOPs / (784 MHz * 14 SM * 192 fma/SM * 2 op/fma) = 47% of theoretical peak, which I consider a pretty good number. And there is certainly room for improvement here. Training could also benefit from these improvements; I plan to port these changes to hessian calculators and updaters soon.
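
The efficiency estimate above can be reproduced with a few lines of arithmetic. The numbers are the ones quoted in the text; the helper name `peak_flops` is just for illustration.

```python
def peak_flops(clock_hz, sm_count, fma_per_sm_per_clock, ops_per_fma=2):
    """Theoretical peak single-precision throughput in FLOP/s."""
    return clock_hz * sm_count * fma_per_sm_per_clock * ops_per_fma


# GeForce GTX Titan figures from the text: 784 MHz, 14 SMs, 192 FMA units per SM.
titan_peak = peak_flops(784e6, 14, 192)
efficiency = 2.0e12 / titan_peak  # measured 2 TFLOPs vs. theoretical peak

print(f"peak: {titan_peak / 1e12:.2f} TFLOPs, efficiency: {efficiency:.0%}")
# prints "peak: 4.21 TFLOPs, efficiency: 47%"
```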

Here are all the changes in this release:
  • C++11 limited support added: you can build everything except for CUDA backend - this is due to NVCC not yet supporting C++11
  • Improved testing and validating (feed forward) performance of convolutional layers in CUDA backend for Kepler at the same time greatly simplifying the code
  • Improved performance of max subsampling 2d tester for CUDA backend. The implementation is far from optimal yet

Apr 12, 2014

Galaxy Zoo

I took second place in the Galaxy Zoo competition. The organizers requested a report from all prize winners; here is mine. Sander Dieleman won the challenge by a large margin. He used convolutional neural networks too, although his approach was more sophisticated. Team 6789, which took third place, used convnets too!

Apr 5, 2014

nnForge v1.1.3

I labelled the latest changes in nnForge with v1.1.3:
  • Snapshot functionality is redesigned fully - it is now doing backpropagation, the feature is still in beta
  • Ability to define custom error functions is added
    • Cross-entropy error function is added, use with care - not tested yet
  • Galaxy Zoo example added - see Galaxy Zoo challenge
  • cuda_max_global_memory_usage_ratio is set to 0.8 by default - This should help those running code on a primary videocard
  • per_layer_mu mode is added - More robust training in some cases
  • Fixes:
    • Fixed crash when using output transformer
    • Fixed backprop for local_contrast_subtractive_2d_updater in CUDA backend
    • Fixed build with Boost 1.55

Feb 7, 2014

nnForge v1.1.2

I brushed up parameters for nnForge toolset. I also changed default values for some of them; if you run GTSRB you will probably need to update config file. Here is the full change list:
  • Deterministic transformator added for testing and validating
  • snapshots are made on ANNs from batch directory
  • Toolset parameters changed:
    • learning_rate_decay_rate is exposed as a command line parameter
    • training_speed parameter renamed to learning_rate, training_speed_degradation is dropped
    • training_iteration_count renamed to training_epoch_count
    • train command does batch train, batch_train command is removed
    • validate and test now work in batch mode, validate_batch and test_batch removed
    • mu_increase_factor is set to 1.0 by default
    • max_mu set to 1.0 by default
  • Bug-fixes

Jan 11, 2014

nnForge v1.1.1

I've just published new nnForge release v1.1.1:
  • Using space-filling curve for all the convolutional updaters, testers and hessians in CUDA backend; performance of training large networks improved
  • Improved concurrent training and loading/processing input data for all the stages by loading data in a separate host thread, CUDA backend only
  • In-memory supervised data reader added
  • Added NVTX profiling for reading input data, CUDA backend only
  • Fixed:
    • Binding texture to too large linear buffer
    • Average subsampling backprop in CUDA backend is wrong for non-even configs
    • Fixed performance in Windows with WDDM driver