Nov 30, 2016

nnForge v2.3.0

I have added multi-GPU support to nnForge! Both training and inference can now run on multiple GPUs. Only a single node is supported. Training is parallelized with a data-parallel approach, where each mini-batch is split across the GPUs.
The framework has moved to C++11: you will need gcc 4.7 or newer to build the library, or MS VS 2013 on Windows.
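
To illustrate the data-parallel idea, here is a minimal Python sketch, not nnForge code. It assumes a toy one-parameter model with squared error, and the helper names (split_batch, shard_gradient, data_parallel_gradient) are hypothetical. Each shard stands in for the work one GPU would do; the partial gradients are summed and averaged over the whole mini-batch.

```python
def split_batch(batch, num_devices):
    """Split a mini-batch into near-equal contiguous shards, one per device."""
    base, extra = divmod(len(batch), num_devices)
    shards, start = [], 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


def shard_gradient(w, shard):
    """Sum of per-sample gradients of 0.5 * (w * x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in shard)


def data_parallel_gradient(w, batch, num_devices):
    """Average gradient over the mini-batch, computed shard by shard."""
    shards = split_batch(batch, num_devices)
    # On real hardware each shard would be processed concurrently on its own GPU.
    partial_sums = [shard_gradient(w, shard) for shard in shards]
    return sum(partial_sums) / len(batch)
```

Up to floating-point rounding, the result does not depend on the number of devices, which is why splitting the mini-batch leaves the training math unchanged.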

Jul 5, 2016

nnForge v2.2.0

Hi, nnForge v2.2.0 is published!
  • Convolutional layer
    • strides added
    • w/out bias option added
  • check_gradient command added
  • Imagenet: reproduced ResNet50 result (7.5% Top5 single crop)
  • Average subsampling layer allows specifying output size instead of subsampling window sizes
  • Added profiling to CUDA backend
  • Max subsampling layer:
    • round_up mode added
    • Strides added
  • Step learning rate decay policy added
  • Added update_bn_weights action (but calculating mean and invsigma during training works well)
  • Spatial Transformer:
    • affine_grid_generator_layer added
    • linear_sampler layer added
  • Utilizing cudnnFindConvolution*AlgorithmEx functions to get maximum perf (cuDNN v5 is required for that)
  • Added strides to sparse convolution layer
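
Strides and the round_up mode both affect the output size of a layer. The Python sketch below shows the usual arithmetic for one dimension without padding. It assumes round_up means rounding the number of window steps up instead of down, as in the "ceil" pooling mode of other frameworks; the release notes do not spell out nnForge's exact definition, and output_size is a hypothetical helper.

```python
def output_size(input_size, window, stride, round_up=False):
    """Output length along one dimension for a strided window, no padding.

    Assumption: round_up=True rounds the step count up, so a final partial
    window is still emitted; round_up=False drops it.
    """
    if window > input_size or stride < 1:
        raise ValueError("window must fit the input and stride must be >= 1")
    span = input_size - window
    if round_up:
        steps = -(-span // stride)  # integer ceiling division
    else:
        steps = span // stride
    return steps + 1
```

For example, an input of 8 elements with window 3 and stride 2 yields 3 outputs when rounding down and 4 when rounding up.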

Feb 21, 2016

nnForge v2.1.0

Two months have passed since the last release, and this one is pretty big: a number of layers have been added and the functionality of existing layers has been extended. Here is the full list of changes in nnForge v2.1.0:
  • New layers added: Concat, Reshape, CDFMax, PrefixSum, Upsampling, Add (element-wise), CDF2PDF, EntryConvolution
  • Average and Max subsampling layers are now capable of subsampling in feature map and entry directions
  • MSE Layer reworked into generic LError layer (L2 by default)
  • Max subsampling can do MIN as well
  • Optional scale parameter for AverageSubsampling layer added
  • Detailed info on layers in the schema dumped
  • Dumping graph with layer configs in debug mode
  • Added dumping data in CSV format
  • Runtime layer replacement with data layers
  • Bug fixes
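
Going only by their names, PrefixSum and CDF2PDF look like inverse operations: a running sum turns a discrete probability density into a cumulative distribution, and adjacent differences turn it back. The Python sketch below shows that relationship on a flat list. It is an assumption about what the layers compute; the axis they operate on in nnForge is not stated here, and the function names are hypothetical.

```python
from itertools import accumulate


def prefix_sum(values):
    """Inclusive running sum: element i is the sum of values[0..i]."""
    return list(accumulate(values))


def cdf_to_pdf(cdf):
    """Adjacent differences of a cumulative sequence (first element kept as is)."""
    previous = [0.0] + list(cdf[:-1])
    return [c - p for c, p in zip(cdf, previous)]
```

Applying cdf_to_pdf to the output of prefix_sum returns the original sequence, up to floating-point rounding.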

Dec 20, 2015

nnForge v2.0.2

Small release nnForge v2.0.2 here:
  • Gradient modifier layer added
  • Structured_data_constant_reader added
  • Error function layers accept the 3rd optional input layer - mask
  • ADAM training algo implemented, use "--momentum_type adam", rate should generally be much smaller than for other methods
  • Changed default value for cuda_fixed_working_buffers_ratio to 0.4
I get a very nice 5.4 TFLOPS on the whole model when training VGG-A with cuDNN v4 RC.
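
For readers unfamiliar with ADAM, here is a minimal Python sketch of one update step for a single scalar weight, following the standard published formulation. It is not the nnForge implementation, and the default values for beta1, beta2, and eps are the commonly used ones, assumed here for illustration.

```python
import math


def adam_step(w, grad, m, v, t, rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for a scalar weight; t is the 1-based step number."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1.0 - beta2 ** t)
    w = w - rate * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

Because the gradient is divided by its own running magnitude, each step has a size of roughly rate regardless of how large the raw gradient is. That is one plausible reason why the rate usually needs to be much smaller than with plain momentum methods.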

Nov 24, 2015

nnForge v2.0.1


I significantly improved performance of CUDA backend recently in nnForge v2.0.1:
  • Multiple improvements to reduce total buffer sizes, which allows running larger chunks (3x for ImageNet):
    • Taking buffer sizes into account when coloring graph
    • Maxout, ReLU, and MaxSubsampling layers consume much less memory in CUDA backend
    • Action graph is optimized to exclude unnecessary concurrency - taking into account device width here
  • Migrated to cuDNN v3
  • Reusing CUDA streams
  • Allocating chunk of mem for fixed working buffers - improves perf
  • A few bug fixes
See buffer graph coloring for the optimized action graph of VGG-A-like schema to the right. You can get this and other interesting graphs by specifying "--debug_mode 1" option.
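
The idea of taking buffer sizes into account when coloring the graph can be sketched as follows. This is a simplified greedy heuristic in Python, not the algorithm nnForge actually uses; the inputs (a size per logical buffer and a set of conflicting pairs that are alive at the same time) and the function name color_buffers are assumptions made for illustration.

```python
def color_buffers(sizes, conflicts):
    """Group logical buffers so that non-conflicting ones share one allocation.

    sizes: dict mapping buffer name to its size in bytes.
    conflicts: set of frozenset({a, b}) pairs that must not share memory.
    Returns a list of groups; each group is one physical allocation.
    """
    groups = []
    # Largest buffers first, so big buffers tend to share with other big ones
    # and each group's size is set by its first member.
    for name in sorted(sizes, key=lambda n: (-sizes[n], n)):
        for group in groups:
            if all(frozenset((name, other)) not in conflicts
                   for other in group["members"]):
                group["members"].append(name)
                break
        else:
            groups.append({"members": [name], "size": sizes[name]})
    return groups
```

With four buffers of 100, 80, 60, and 10 bytes where only neighbours in that order conflict, this sketch needs two allocations totalling 180 bytes instead of 250.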

Nov 7, 2015

nnForge v2.0.0

Hi all,

Six months have passed since the last nnForge release, and there is a good reason for it: I have been working on a major framework redesign, and now it is out! See nnForge v2.0.0:
  • The model is now arbitrary DAG (directed acyclic graph)
  • Running independent actions in multiple streams in CUDA backend
  • Memory buffers are heavily reused
The changes are so radical, I had to drop support for the old trained data storage format. Unfortunately this means you will have to re-train your models from scratch.
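
To show why a DAG model exposes work that can run in parallel streams, here is a small Python sketch that groups actions into "waves". All actions in a wave depend only on earlier waves, so they are independent of each other and could be issued to different streams. This is a plain topological layering under assumed inputs, not nnForge's scheduler, and the action names in the usage example are hypothetical.

```python
def schedule_waves(deps):
    """Group DAG actions into waves of mutually independent actions.

    deps: dict mapping each action to the list of actions it depends on.
    Every dependency must itself be a key of deps.
    """
    remaining = {action: set(d) for action, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(a for a, d in remaining.items() if not d)
        if not ready:
            # Nothing is runnable: the graph has a cycle or an unknown dependency.
            raise ValueError("graph is not a valid DAG")
        waves.append(ready)
        for action in ready:
            del remaining[action]
        for pending in remaining.values():
            pending.difference_update(ready)
    return waves
```

For a model where two convolution branches read the same input and feed a concat, the two branches land in the same wave:

```python
waves = schedule_waves({
    "input": [],
    "conv_a": ["input"],
    "conv_b": ["input"],
    "concat": ["conv_a", "conv_b"],
})
```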

Expect more goodies in near future!

Apr 30, 2015

nnForge v1.2.0

Hi, this is a pretty big release of nnForge. The most important improvement is that model schemas are now stored in Protobuf format. You now define the schema in a plain text file. Use the convert_schema action to convert from the old binary format to the new one. I also implemented Overfeat functionality, which allows running inference efficiently on large input data with fine-grained results.

All the changes are:
  • Schema:
    • Model schema is now stored in Protobuf format. Use convert_schema to convert schemas in the old binary format to the new one
    • Input and output data normalizers are stored in Protobuf format now. Use convert_input_normalizer and convert_output_normalizer to convert existing binary normalizers to the new format
    • Schema and data are compatible now if non-empty layers match. Now empty-data layers don't matter
  • Training data:
    • Improvements in supervised_image_stream_reader
    • embed_data_transformer added
  • Training:
    • Nesterov momentum added (see --momentum_type option)
    • uniform_intensity_data_transformer added
    • Momentum data is kept between epochs (it is saved and restored as well)
    • ROC result outputs accuracy, precision, recall, and F-score now (in addition to AUC)
  • Visualization:
    • snapshot_invalid now saves images, including binary classifier case
  • Inference:
    • Overfeat functionality added (see tiling option of max subsampling layer, and untile layer)
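
For reference, the extra numbers now reported alongside AUC can all be derived from the four confusion-matrix counts of a binary classifier at a given decision threshold. The Python sketch below uses the standard definitions and assumes "F-score" means the balanced F1 score; the function name and the choice to return 0.0 for undefined ratios are illustrative, not taken from nnForge.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_score = 2.0 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }
```

Unlike AUC, these values depend on the threshold used to turn scores into positive or negative predictions.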