Maxim Milakov
A researcher in machine learning and high-performance computing

nnForge v2.3.0 (2016-11-30)

I have added multi-GPU support to <a href="http://nnforge.org/">nnForge</a>! Both training and inference can now be done on multiple GPUs, though only a single node is supported. Training is parallelized with a data-parallel approach, where the mini-batch is split across multiple GPUs.<br />
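For those unfamiliar with the data-parallel scheme, here is a minimal sketch of the idea: each GPU computes gradients on its slice of the mini-batch, and the averaged gradient drives a single weight update. The names and structure below are purely illustrative, not nnForge's actual API.

```cpp
// Minimal sketch of data-parallel training: the mini-batch is split across
// devices, each computes gradients on its slice, and the averaged gradient
// drives one weight update. Names are illustrative, not the nnForge API.
#include <cstddef>
#include <vector>

// Stand-in for the per-device forward/backward pass (hypothetical).
std::vector<float> compute_gradients_on_device(
    int device_id, const std::vector<float>& batch_slice, std::size_t weight_count)
{
    return std::vector<float>(weight_count, 0.1f); // dummy gradient
}

void train_step(const std::vector<float>& mini_batch,
                std::vector<float>& weights,
                int device_count,
                float learning_rate)
{
    std::size_t slice_size = mini_batch.size() / device_count;
    std::vector<std::vector<float> > per_device(device_count);

    // In the real thing each slice is processed concurrently on its own GPU.
    for (int d = 0; d < device_count; ++d)
    {
        std::vector<float> slice(
            mini_batch.begin() + d * slice_size,
            mini_batch.begin() + (d + 1) * slice_size);
        per_device[d] = compute_gradients_on_device(d, slice, weights.size());
    }

    // Average the gradients across devices, then update the weights once.
    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        float sum = 0.0f;
        for (int d = 0; d < device_count; ++d)
            sum += per_device[d][i];
        weights[i] -= learning_rate * sum / device_count;
    }
}
```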
The framework has moved to C++11; you will need gcc 4.7 or newer to build the library, and MS VS 2013 on Windows.

nnForge v2.2.0 (2016-07-05)

Hi, <a href="http://nnforge.org/">nnForge</a> v2.2.0 is published!<br />
<ul>
<li>Convolutional layer</li>
<ul>
<li>strides added</li>
<li>w/out bias option added</li>
</ul>
<li>check_gradient command added</li>
<li>ImageNet: reproduced the ResNet-50 result (7.5% Top-5, single crop)</li>
<li>Average subsampling layer allows specifying output size instead of subsampling window sizes</li>
<li>Added profiling to CUDA backend</li>
<li>Max subsampling layer:</li>
<ul>
<li>round_up mode added</li>
<li>Strides added</li>
</ul>
<li>Step learning rate decay policy added (see the sketch after this list)</li>
<li>update_bn_weights action added (though the mean and invsigma calculated during training already work well)</li>
<li>Spatial Transformer:</li>
<ul>
<li>affine_grid_generator_layer added</li>
<li>linear_sampler layer added</li>
</ul>
<li>Utilizing cudnnFindConvolution*AlgorithmEx functions to get maximum perf (cuDNN v5 is required for that)</li>
<li>Added strides to sparse convolution layer</li>
</ul>
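The step decay policy mentioned above follows the usual pattern: multiply the learning rate by a fixed factor every N epochs. A minimal sketch, with illustrative parameter names rather than nnForge's actual options:

```cpp
// Sketch of a "step" learning rate decay policy: the rate is multiplied
// by a fixed factor every step_epochs epochs. Parameter names are
// illustrative, not nnForge's option names.
#include <cmath>
#include <cstdio>

float step_decay_rate(float base_rate, int epoch, int step_epochs, float factor)
{
    return base_rate * std::pow(factor, epoch / step_epochs); // integer division
}

int main()
{
    for (int epoch = 0; epoch < 10; ++epoch)
        std::printf("epoch %d: lr = %g\n",
                    epoch, step_decay_rate(0.1f, epoch, 3, 0.5f));
}
```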

nnForge v2.1.0 (2016-02-21)

Two months have passed since the last release, and this one is pretty big. A number of layers have been added, and the functionality of existing layers is extended. Here is the full list of changes in <a href="http://nnforge.org/">nnForge</a> v2.1.0:<br />
<ul>
<li>New layers added: Concat, Reshape, CDFMax, PrefixSum, Upsampling, Add (element-wise), CDF2PDF, EntryConvolution</li>
<li>Average and Max subsampling layers are now capable of subsampling in feature map and entry directions</li>
<li>MSE layer reworked into a generic LError layer (L2 by default; see the sketch after this list)</li>
<li>Max subsampling can do MIN as well</li>
<li>Optional scale parameter for AverageSubsampling layer added</li>
<li>Detailed info on layers in the schema dumped</li>
<li>Dumping graph with layer configs in debug mode</li>
<li>Added dumping data in CSV format</li>
<li>Runtime layer replacement with data layers</li>
<li>Bug fixes</li>
</ul>
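For reference, the generic L^p error that the LError layer generalizes MSE into looks roughly like the following sketch; the signature is hypothetical, not nnForge's code, and p = 2 reproduces the old MSE behaviour up to normalization.

```cpp
// Sketch of a generic L^p error of the kind the LError layer computes:
// the sum over outputs of |actual - target|^p. With p = 2 this matches
// the old MSE behaviour up to normalization. Hypothetical signature.
#include <cmath>
#include <cstddef>
#include <vector>

float lp_error(const std::vector<float>& actual,
               const std::vector<float>& target,
               float p = 2.0f)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < actual.size(); ++i)
        sum += std::pow(std::fabs(actual[i] - target[i]), p);
    return sum;
}
```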

nnForge v2.0.2 (2015-12-20)

A small release, <a href="http://nnforge.org/">nnForge</a> v2.0.2:<br />
<ul>
<li>Gradient modifier layer added</li>
<li>structured_data_constant_reader added</li>
<li>Error function layers accept an optional third input layer, a mask</li>
<li>ADAM training algo implemented; use "--momentum_type adam", and note that the learning rate should generally be much smaller than for other methods (see the sketch after this list)</li>
<li>Changed default value for cuda_fixed_working_buffers_ratio to 0.4</li>
</ul>
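For reference, here is a minimal sketch of the ADAM update for a single weight, following Kingma and Ba's paper. The hyper-parameter names below are the standard ones from the paper, not toolset options.

```cpp
// Minimal sketch of the ADAM update for a single weight (Kingma & Ba):
// exponentially decayed first and second moments of the gradient, with
// bias correction. Standard hyper-parameter names, not toolset options.
#include <cmath>

struct adam_state { float m = 0.0f, v = 0.0f; int t = 0; };

float adam_step(adam_state& s, float w, float g, float lr,
                float beta1 = 0.9f, float beta2 = 0.999f, float eps = 1e-8f)
{
    ++s.t;
    s.m = beta1 * s.m + (1.0f - beta1) * g;       // first moment
    s.v = beta2 * s.v + (1.0f - beta2) * g * g;   // second moment
    float m_hat = s.m / (1.0f - std::pow(beta1, s.t)); // bias correction
    float v_hat = s.v / (1.0f - std::pow(beta2, s.t));
    return w - lr * m_hat / (std::sqrt(v_hat) + eps);
}
```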
<div>
I get a very nice 5.4 TFLOPS across the whole model when training VGG-A with cuDNN v4 RC.</div>

nnForge v2.0.1 (2015-11-24)

<a href="http://2.bp.blogspot.com/-fiJZWph7TBA/VlOKyKWlF2I/AAAAAAAABUs/OzA29BBtrnc/s1600/backward_prop_cuda_per_entry_buffers_00006.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="320" src="http://2.bp.blogspot.com/-fiJZWph7TBA/VlOKyKWlF2I/AAAAAAAABUs/OzA29BBtrnc/s320/backward_prop_cuda_per_entry_buffers_00006.png" width="38" /></a>Hi,<br />
<br />
I have significantly improved the performance of the CUDA backend in <a href="http://nnforge.org/">nnForge</a> v2.0.1:<br />
<ul>
<li>Multiple improvements that reduce total buffer sizes, allowing larger chunks to be run (3x for ImageNet):</li>
<ul>
<li>Buffer sizes are taken into account when coloring the graph (see the sketch after this list)</li>
<li>Maxout, ReLU, and MaxSubsampling layers consume much less memory in CUDA backend</li>
<li>Action graph is optimized to exclude unnecessary concurrency, taking device width into account</li>
</ul>
<li>Migrated to cuDNN v3</li>
<li>Reusing CUDA streams</li>
<li>Allocating a single chunk of memory for fixed working buffers, which improves performance</li>
<li>A few bug fixes</li>
</ul>
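The buffer coloring idea can be shown with a tiny greedy scheme: two buffers may share one allocation ("color") when their lifetimes in the action graph do not overlap. This is only an illustration of the principle; nnForge's actual coloring also weighs buffer sizes, as noted above.

```cpp
// Greedy sharing of memory buffers: two tensors reuse the same allocation
// ("color") when their lifetime intervals do not overlap. Illustration of
// the principle only; the real coloring also takes buffer sizes into account.
#include <cstdio>
#include <vector>

struct lifetime { int first_use, last_use; };

std::vector<int> color_buffers(const std::vector<lifetime>& lts)
{
    std::vector<int> color(lts.size());
    std::vector<int> slot_free_from; // per color: first step it is free again
    for (std::size_t i = 0; i < lts.size(); ++i)
    {
        int chosen = -1;
        for (std::size_t c = 0; c < slot_free_from.size(); ++c)
            if (slot_free_from[c] <= lts[i].first_use) { chosen = (int)c; break; }
        if (chosen < 0)
        {
            chosen = (int)slot_free_from.size(); // no free slot: new allocation
            slot_free_from.push_back(0);
        }
        slot_free_from[chosen] = lts[i].last_use + 1;
        color[i] = chosen;
    }
    return color;
}

int main()
{
    // Buffers ordered by first use, as a topologically sorted action
    // graph would produce them.
    std::vector<lifetime> lts = {{0, 2}, {1, 3}, {3, 5}, {4, 6}};
    std::vector<int> c = color_buffers(lts);
    for (std::size_t i = 0; i < c.size(); ++i)
        std::printf("buffer %zu -> allocation %d\n", i, c[i]);
}
```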
<div>
See the buffer graph coloring for the optimized action graph of a VGG-A-like schema on the right. You can get this and other interesting graphs by specifying the "--debug_mode 1" option.</div>
<div>
<br /></div>

nnForge v2.0.0 (2015-11-07)

Hi all,<br />
<br />
Six months have passed since the last nnForge release, and there is a good reason for that: I have been working on a major redesign of the framework, and now it is out! See <a href="http://nnforge.org/">nnForge</a> v2.0.0:<br />
<ul>
<li>The model is now an arbitrary DAG (directed acyclic graph)</li>
<li>Running independent actions in multiple streams in the CUDA backend</li>
<li>Memory buffers are heavily reused</li>
</ul>
<div>
The changes are so radical that I had to drop support for the old trained-data storage format. Unfortunately, this means you will have to re-train your models from scratch.</div>
<div>
<br /></div>
<div>
Expect more goodies in the near future!</div>

nnForge v1.2.0 (2015-04-30)

Hi, this is a pretty big release of <a href="http://nnforge.org/">nnForge</a>. The most important improvement is that model schemas are now stored in <a href="https://developers.google.com/protocol-buffers/">Protobuf</a> format: you now define the schema via a plain text file. Use the convert_schema action to convert from the old binary format to the new one. I also implemented <a href="http://arxiv.org/abs/1312.6229">Overfeat</a> functionality, which allows running inference efficiently on large input data with fine-grained results.<br />
<br />
All the changes are:<br />
<ul>
<li>Schema:</li>
<ul>
<li>Model schema is now stored in Protobuf format. Use convert_schema to convert schemas from the old binary format to the new one</li>
<li>Input and output data normalizers are stored in protobuf format now. Use convert_input_normalizer and convert_output_normalizer to convert existing binary normalizers to new format</li>
<li>Schema and data are now considered compatible if the non-empty layers match; layers with empty data don't matter</li>
</ul>
<li>Training data:</li>
<ul>
<li>Improvements in supervised_image_stream_reader</li>
<li>embed_data_transformer added</li>
</ul>
<li>Training:</li>
<ul>
<li>Nesterov momentum added (see the --momentum_type option and the sketch after this list)</li>
<li>uniform_intensity_data_transformer added</li>
<li>Momentum data is kept between epochs (it is saved and restored as well)</li>
<li>ROC result outputs accuracy, precision, recall, and F-score now (in addition to AUC)</li>
</ul>
<li>Visualization:</li>
<ul>
<li>snapshot_invalid now saves images, including binary classifier case</li>
</ul>
<li>Inference:</li>
<ul>
<li>Overfeat functionality added (see the tiling option of the max subsampling layer, and the untile layer)</li>
</ul>
</ul>
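For those wondering how Nesterov momentum differs from the classic kind, here is a sketch of both updates for a single weight, in the common Sutskever-style formulation; this is illustrative, and nnForge's exact formulas may differ.

```cpp
// Classic vs. Nesterov momentum for a single weight, in the common
// "lookahead gradient folded in" (Sutskever) formulation. Illustrative
// only; the exact formulas in nnForge may differ.
#include <cstdio>

struct state { float w = 0.0f, v = 0.0f; };

// Classic momentum: v <- mu*v - lr*g;  w <- w + v
void classic_step(state& s, float g, float lr, float mu)
{
    s.v = mu * s.v - lr * g;
    s.w += s.v;
}

// Nesterov momentum: v <- mu*v - lr*g;  w <- w + mu*v - lr*g
void nesterov_step(state& s, float g, float lr, float mu)
{
    s.v = mu * s.v - lr * g;
    s.w += mu * s.v - lr * g;
}

int main()
{
    state a, b;
    for (int i = 0; i < 5; ++i)
    {
        classic_step(a, /*g=*/1.0f, /*lr=*/0.1f, /*mu=*/0.9f);
        nesterov_step(b, 1.0f, 0.1f, 0.9f);
    }
    std::printf("classic w = %f, nesterov w = %f\n", a.w, b.w);
}
```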

nnForge v1.1.13 (2015-03-26)

<a href="http://nnforge.org/">nnForge</a> v1.1.13 is published with a number of improvements:<br />
<br />
<ul>
<li>Data transformers:</li>
<ul>
<li>Stretch added to distort sampler transformer</li>
<li>perspective distortions added to distort_2d transformer</li>
<li>reshape_data_transformer added</li>
<li>elastic_deformation_2d_data_transformer added</li>
</ul>
<li>Mixture of models:</li>
<ul>
<li>Added --test_validate_save_output and --test_validate_load_output options</li>
<li>Running testing and validation from a mixture of output_values</li>
</ul>
<li>Readers:</li>
<ul>
<li>supervised_shuffle_entries_data_reader is made deterministic</li>
<li>deterministic image data reader is extended to sampler</li>
</ul>
<li>Layers:</li>
<ul>
<li>Parametric ReLU added (with CPU and GPU backends)</li>
<li>Average subsampling is reverted to native implementation (3D and 4D support)</li>
</ul>
<li>Others:</li>
<ul>
<li>Taking ReLUs into account when initializing weights</li>
<li>validate_progress_network_data_pusher is extended with frequency parameter</li>
<li>Quasi-random training data randomization is dropped</li>
<li>Memory consumption reduced during testing</li>
<li>Resume training (-R) can now be combined with training multiple ANNs (-N)</li>
<li>VS2013 projects and solution added (using CUDA 7.0)</li>
<li>Fixed fancy backprop for analyzer</li>
<li>Bug-fixes</li>
</ul>
</ul>

nnForge v1.1.12 (2015-01-21)

I finally started using <a href="https://developer.nvidia.com/cuDNN">cuDNN</a> for some layers of the <a href="http://nnforge.org/">nnForge</a> library, and performance improved. Fermi GPUs are no longer supported; nnForge will run on Kepler and Maxwell GPUs only (or CPUs). You will need cuDNN version <b>v2 RC2</b> or later. Here are all the changes in nnForge v1.1.12:<br /><ul>
<li>Using cuDNN for a lot of layers now, Fermi is no longer supported</li>
<li>New transformers added: convert_to_polar_data_transformer, negate_data_transformer</li>
<li>New readers added: supervised_shuffle_entries_data_reader, image related readers (from raw jpegs stored in a single file)</li>
<li>Dropout functionality is moved into its own layer, with better randomization (see the sketch after this list)</li>
<li>Soft rectified linear layer removed</li>
</ul>
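Conceptually, a standalone dropout layer does something like the following sketch. The "inverted" dropout variant is shown here for illustration; nnForge's own scaling may differ.

```cpp
// Sketch of dropout as a standalone layer: during training each activation
// is zeroed with probability p and the survivors are scaled by 1/(1-p)
// ("inverted dropout"), so inference needs no rescaling. Illustrative only.
#include <random>
#include <vector>

void dropout_forward(std::vector<float>& acts, float p, std::mt19937& rng,
                     bool training)
{
    if (!training || p <= 0.0f)
        return; // at inference time dropout is a no-op
    std::bernoulli_distribution drop(p);
    float scale = 1.0f / (1.0f - p);
    for (float& a : acts)
        a = drop(rng) ? 0.0f : a * scale;
}
```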

nnForge v1.1.11 (2014-11-30)

Hi, I am releasing <a href="http://nnforge.org/">nnForge</a> v1.1.11 with a number of significant changes:<br />
<br />
<ul>
<li>Padding added to sparse convolutional layers</li>
<li>Sparse convolutional layers implemented in GPU backend (Kepler+ only)</li>
<li>Fixed bug with dropout when the error function is fused with the last activation function</li>
<li>Array with random numbers extended to 256K elements (for dropout)</li>
</ul>
<br />

nnForge v1.1.10 (2014-11-03)

Hi, here is <a href="http://nnforge.org/">nnForge</a> v1.1.10. The main new feature is zero-padding for convolutional layers; I should have implemented it long ago. The full list of changes:<br />
<br />
<ul>
<li>You can now specify zero-padding of the input data for convolutional layers (see the sketch after this list)</li>
<li>Memory usage calculations improved</li>
<li>Learning rates are per part now (previously per parameter): training consumes less memory, so bigger networks can be trained</li>
<li>Dropout implementation is simplified</li>
<li>Minor fixes</li>
</ul>
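The effect of zero-padding on the output size of a convolution (stride 1) is easy to state in code; this is the general formula, not anything nnForge-specific:

```cpp
// Output size of a convolution along one dimension, with zero-padding on
// both sides of the input, at stride 1. With padding = kernel/2 (odd
// kernels) the spatial size is preserved.
#include <cstdio>

int conv_output_size(int input, int kernel, int padding)
{
    return input + 2 * padding - kernel + 1;
}

int main()
{
    std::printf("%d\n", conv_output_size(32, 5, 0)); // 28: shrinks
    std::printf("%d\n", conv_output_size(32, 5, 2)); // 32: preserved
}
```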

nnForge v1.1.9 (2014-10-04)

I released <a href="http://nnforge.org/">nnForge</a> v1.1.9:<br />
<ul>
<li>More sparse cases supported in the GPU backend for convolutional layers, with improved performance</li>
<li>convert_data_type_transformer added</li>
<li>Hessian based learning algo is removed</li>
<li>Galaxy Zoo example removed. Use previous releases to get it</li>
<li>Reporting average weights/updates after each batch</li>
<li>Image classifier demo added; improved performance when running a single entry through the tester</li>
</ul>

nnForge v1.1.8 (2014-08-23)

Hi, <a href="http://nnforge.org/">nnForge</a> v1.1.8 is released:<br />
<ul>
<li>Sparse (in feature map dimension) convolutional layer added, with full support in CPU backend and fully connected (spatial) 1x1 support in GPU backend</li>
<li>You can use -std=c++11 now with CUDA 6.5 toolkit</li>
<li>Gradient check added</li>
<li>GTSRB switched to batch training</li>
<li>Default paths for the Boost and OpenCV libs are /usr now</li>
<li>Improved performance for 1x1 convolutions in GPU backend</li>
<li>Minor fixes</li>
</ul>

nnForge v1.1.7 (2014-07-12)

It is a big release. I added a number of useful features you would expect an NN lib to have. Here is the full list:<br /><ul>
<li>Mini-batches added</li>
<li>Weight decay added</li>
<li>Momentum added</li>
<li>Cross Entropy error function is renamed to Negative Log-Likelihood; true Cross Entropy added (see the sketch after this list)</li>
<li>Sigmoid layer added, with correct bias initialization for the classifier</li>
<li>Splitting single epoch into multiple epochs through epoch_count_in_training_set parameter</li>
<li>max_subsampling layer supports 1D and 4D in GPU backend (was 2D and 3D only)</li>
<li>rotate_band_data_transformer is extended to all dimensions (was 2D only)</li>
<li>extract_data_transformer extended to data of any dimension in case input and output windows match</li>
<li>snapshot_data: added scaling and 3D (video)</li>
<li>Sigmoid+Cross-entropy and Softmax+Negative-log-likelihood fusion implemented in CPU and GPU backends to increase accuracy</li>
<li>Max L2 bound on incoming weights implementation is dropped (*)</li>
<li>Conversion to bw image fixed in GTSRB example</li>
<li>Max subsampling updater and hessian: corner cases fixed in the CPU backend</li>
</ul>
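To make the renaming above concrete, here is a sketch of the two error functions: negative log-likelihood, which only penalizes the target side and suits softmax outputs, and true cross entropy, which also penalizes the (1 - target) side and suits sigmoid outputs. The exact definitions in nnForge may differ slightly.

```cpp
// Negative log-likelihood vs. "true" cross entropy for one output vector y
// with targets t. NLL suits softmax outputs; cross entropy also penalizes
// the (1 - t) side and suits sigmoid outputs. Sketch only.
#include <cmath>
#include <cstddef>
#include <vector>

float negative_log_likelihood(const std::vector<float>& y,
                              const std::vector<float>& t)
{
    float e = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i)
        e -= t[i] * std::log(y[i]);
    return e;
}

float cross_entropy(const std::vector<float>& y, const std::vector<float>& t)
{
    float e = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i)
        e -= t[i] * std::log(y[i]) + (1.0f - t[i]) * std::log(1.0f - y[i]);
    return e;
}
```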
<div>
(*) I did that because the L2 bound on incoming weights didn't improve quality on any problem I worked on, and supporting it is not free, so I decided to drop it.</div>

Jetson TK1 (2014-06-27)

I got my own <a href="https://developer.nvidia.com/jetson-tk1">Jetson TK1</a>, with Ubuntu 14.04 LTS running on it!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-mokVJzJOJBM/U61FI5BojqI/AAAAAAAABL0/4l3XAuAL8do/s1600/Jetson.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-mokVJzJOJBM/U61FI5BojqI/AAAAAAAABL0/4l3XAuAL8do/s1600/Jetson.jpg" height="226" width="400" /></a></div>
<br />

nnForge v1.1.6 (2014-06-06)

I implemented a number of quite useful features in <a href="http://nnforge.org/">nnForge</a> recently:<br />
<ul>
<li>Stochastic Gradient Descent training method is added</li>
<li>Resume training functionality added</li>
<li>Duplicating output to log file</li>
<li>Logging current settings at the toolset initialization</li>
<li>rgb_to_yuv_convert_layer_tester added in CPU backend</li>
<li>Readers are redesigned to allow variable data readers</li>
<li>classifier_result is extended to top-N</li>
<li>Added the possibility to split a single reader into multiple epochs</li>
<li>Multiple fixes</li>
</ul>

nnForge v1.1.5 (2014-05-18)

I ported the performance improvements for forward propagation of convolutional layers in the GPU backend to the hessian calculators and weight updaters. Get the latest release of <a href="http://nnforge.org/">nnForge</a> v1.1.5:<br />
<ul>
<li>Performance of weight updaters for convolutional layers is improved in the CUDA backend, mostly for the Kepler architecture. The increase is much smaller than for forward prop; I got >800 GFLOPs when training a single <a href="https://github.com/milakov/nnForge/tree/master/examples/galaxy_zoo">Galaxy Zoo</a> network on a GeForce Titan</li>
<li>Convolutional 1D, 2D, 3D, and 4D layers are fully supported by CUDA backend (it was 2D and 3D only before)</li>
<li>Fixed training multiple networks with CPU backend</li>
<li>Fixed supervised_data_mem_reader for float input data</li>
</ul>

nnForge v1.1.4 (2014-04-19)

Here is the new v1.1.4 release of <a href="http://nnforge.org/">nnForge</a>. It contains mostly performance improvements, but they are rather significant: the performance of the convolutional layer in the GPU (CUDA) backend improved from 1.15 TFLOPs to 2 TFLOPs for the <a href="https://github.com/milakov/nnForge/tree/master/examples/galaxy_zoo">Galaxy Zoo</a> example when running on an NVIDIA GeForce GTX Titan, and whole-network performance improved from 1 TFLOPs to 1.55 TFLOPs. See the NSight VSE profiling screenshot:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-N5JVGpvAv7A/U1GfHY42WlI/AAAAAAAABIs/ipOKPepmygw/s1600/nsight_flops.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-N5JVGpvAv7A/U1GfHY42WlI/AAAAAAAABIs/ipOKPepmygw/s1600/nsight_flops.png" /></a></div>
<br />
<div>
2 TFLOPs on a Titan running at 784 MHz... how efficient is that? Let's see: 2 TFLOPs / (784 MHz * 14 SMs * 192 FMA/SM * 2 ops/FMA) = 47% of theoretical peak, which I consider a pretty good number, and there is certainly room for improvement here. Training could also benefit from these improvements; I plan to port these changes to the hessian calculators and updaters soon.<br />
<br />
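The same efficiency arithmetic, spelled out in code:

```cpp
// The peak-FLOPS arithmetic from the paragraph above, as code.
#include <cstdio>

int main()
{
    const double clock_hz   = 784e6;  // Titan clock
    const double sm_count   = 14;
    const double fma_per_sm = 192;    // CUDA cores per SM
    const double peak_flops = clock_hz * sm_count * fma_per_sm * 2; // 2 ops/FMA
    const double measured   = 2e12;   // 2 TFLOPs achieved
    std::printf("peak: %.2f TFLOPs, efficiency: %.0f%%\n",
                peak_flops / 1e12, 100.0 * measured / peak_flops);
}
```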
Here are all the changes in this release:<br />
<ul>
<li>Limited C++11 support added: you can build everything except the CUDA backend, since NVCC does not yet support C++11</li>
<li>Improved testing and validation (feed-forward) performance of convolutional layers in the CUDA backend for Kepler, while at the same time greatly simplifying the code</li>
<li>Improved performance of the max subsampling 2d tester for the CUDA backend. The implementation is still far from optimal</li>
</ul>
</div>

Galaxy Zoo (2014-04-12)

I took second place in the <a href="https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge">Galaxy Zoo competition</a>. The organizers requested a report from all prize winners; <a href="https://github.com/milakov/nnForge/blob/master/examples/galaxy_zoo/galaxy_zoo.pdf?raw=true">here is mine</a>. Sander Dieleman won the challenge by a large margin. He used convolutional neural networks too, although <a href="http://benanne.github.io/2014/04/05/galaxy-zoo.html">his approach was more sophisticated</a>. Team <i>6789</i>, which took third place, used convnets as well!

nnForge v1.1.3 (2014-04-05)

I labelled the latest changes in <a href="http://nnforge.org/">nnForge</a> as v1.1.3:<br />
<ul>
<li>Snapshot functionality is fully redesigned: it now does backpropagation; the feature is still in beta</li>
<li>Ability to define custom error functions is added</li>
<ul>
<li>Cross-entropy error function is added; use with care, as it is not tested yet</li>
</ul>
<li>Galaxy Zoo example added - see <a href="https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge">Galaxy Zoo challenge</a></li>
<li>cuda_max_global_memory_usage_ratio is set to 0.8 by default; this should help those running the code on a primary video card</li>
<li>per_layer_mu mode is added, for more robust training in some cases</li>
<li>Fixes:</li>
<ul>
<li>Fixed crash when using output transformer</li>
<li>Fixed backprop for local_contrast_subtractive_2d_updater in CUDA backend</li>
<li>Fixed build with Boost 1.55</li>
</ul>
</ul>

nnForge v1.1.2 (2014-02-07)

I brushed up the parameters of the <a href="http://nnforge.org/">nnForge</a> toolset. I also changed the default values for some of them; if you run GTSRB, you will probably need to update the config file. Here is the full change list:<br />
<ul>
<li>Deterministic transformer added for testing and validation</li>
<li>snapshots are made on ANNs from batch directory</li>
<li>Toolset parameters changed:</li>
<ul>
<li>learning_rate_decay_rate is exposed as a command line parameter</li>
<li>training_speed parameter renamed to learning_rate, training_speed_degradation is dropped</li>
<li>training_iteration_count renamed to training_epoch_count</li>
<li>train command does batch train, batch_train command is removed</li>
<li>validate and test now work in batch mode, validate_batch and test_batch removed</li>
<li>mu_increase_factor is set to 1.0 by default</li>
<li>max_mu set to 1.0 by default</li>
</ul>
<li>Bug-fixes</li>
</ul>

nnForge v1.1.1 (2014-01-11)

I've just published a new <a href="http://nnforge.org/">nnForge</a> release, v1.1.1:<br />
<ul>
<li>Using a space-filling curve for all the convolutional updaters, testers, and hessians in the CUDA backend; performance of training large networks improved</li>
<li>Improved concurrency of training and of loading/processing input data at all stages by loading data in a separate host thread (CUDA backend only)</li>
<li>In-memory supervised data reader added</li>
<li>Added NVTX profiling for reading input data, CUDA backend only</li>
<li>Fixed:</li>
<ul>
<li>Binding texture to too large linear buffer</li>
<li>Average subsampling backprop in the CUDA backend was wrong for non-even configs</li>
<li>Poor performance on Windows with the WDDM driver</li>
</ul>
</ul>

Moved to Blogger (2013-12-27)

I moved the blog from Zoho to the Blogger platform for a number of reasons, including better uptime, design, and the ability to edit posts in place. All posts have been copied to the new platform.

nnForge v1.1.0 (2013-11-23)

I've just published a new <a href="http://nnforge.org/">nnForge</a> release, v1.1.0, with a lot of new functionality and fixes:<br />
<div>
<ul>
<li>Squared Hinge Loss error function added</li>
<li>Local contrast subtractive layer hessian and updater implementations added to both CPU and GPU backends</li>
<li>Maxout layer added with CPU and GPU backends implemented</li>
<li>Added tester functionality for the rgb_to_yuv_convert layer in the CUDA backend</li>
<li>Learning rate decay functionality for tail iterations is added</li>
<li>Fixed:</li>
<ul>
<li>Functionality bug in L2 incoming weights regularizer</li>
<li>Functionality bug for rectangular local contrast subtractive</li>
<li>Recovered snapshot_invalid functionality</li>
</ul>
</ul>
</div>

Convolutional Neural Networks talk (2013-10-23)

I just gave a presentation on convolutional neural networks at the Computer Vision meet-up at Yandex. <a href="https://drive.google.com/file/d/0B2hfQbOo3RqBZWMyT1JSMlBLVVk/edit?usp=sharing">Here</a> are the slides (in Russian).