Oct 4, 2014
nnForge v1.1.9
I released nnForge v1.1.9:
- More sparse cases supported in the GPU backend for convolutional layers, improved performance
- convert_data_type_transformer added
- Hessian-based learning algorithm removed
- Galaxy Zoo example removed; use previous releases to get it
- Average weights/updates reported after each batch
- Image classifier demo added; improved performance for running a single entry through the tester
Aug 23, 2014
nnForge v1.1.8
Hi, nnForge v1.1.8 is released:
- Sparse (in feature map dimension) convolutional layer added, with full support in CPU backend and fully connected (spatial) 1x1 support in GPU backend
- You can use -std=c++11 now with CUDA 6.5 toolkit
- Gradient check added
- GTSRB switched to batch training
- Boost and OpenCV libs default paths are /usr now
- Improved performance for 1x1 convolutions in GPU backend
- Minor fixes
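The gradient check mentioned in the list above compares analytic gradients against central finite differences. nnForge's own implementation is not shown here; the following is a minimal NumPy sketch of the general technique, with all function and parameter names hypothetical:

```python
import numpy as np

def numerical_gradient_check(f, w, analytic_grad, eps=1e-5, tol=1e-4):
    """Compare an analytic gradient against central finite differences."""
    w = w.astype(np.float64)
    num_grad = np.zeros_like(w)
    for i in range(w.size):
        orig = w.flat[i]
        w.flat[i] = orig + eps
        loss_plus = f(w)
        w.flat[i] = orig - eps
        loss_minus = f(w)
        w.flat[i] = orig  # restore the weight
        num_grad.flat[i] = (loss_plus - loss_minus) / (2.0 * eps)
    # Relative error between numeric and analytic gradients
    denom = np.maximum(np.abs(num_grad) + np.abs(analytic_grad), 1e-12)
    rel_err = np.max(np.abs(num_grad - analytic_grad) / denom)
    return rel_err < tol

# Example: for f(w) = 0.5 * ||w||^2 the analytic gradient is w itself
w = np.array([1.0, -2.0, 3.0])
ok = numerical_gradient_check(lambda v: 0.5 * np.dot(v, v), w.copy(), w)
```

Central differences are preferred over forward differences because their truncation error is O(eps^2) rather than O(eps).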
Jul 12, 2014
nnForge v1.1.7
This is a big release. I added a number of useful features you would expect an NN library to have. Here is the full list:
- Mini-batches added
- Weight decay added
- Momentum added
- Cross Entropy error function renamed to Negative Log Likelihood; true Cross Entropy added
- Sigmoid layer added, with correct bias initialization for the classifier
- Splitting single epoch into multiple epochs through epoch_count_in_training_set parameter
- max_subsampling layer supports 1D and 4D in GPU backend (was 2D and 3D only)
- rotate_band_data_transformer is extended to all dimensions (was 2D only)
- extract_data_transformer extended to data of any dimension when input and output windows match
- snapshot_data: added scaling and 3D (video)
- Sigmoid+Cross-entropy and Softmax+Negative-log-likelihood fusion implemented in CPU and GPU backends to increase accuracy
- Max L2 bound on incoming weights implementation is dropped (*)
- Conversion to bw image fixed in GTSRB example
- Max subsampling updater and hessian: corner cases fixed in CPU backend
(*) I did that because the L2 bound on incoming weights didn't improve quality in any problem I worked on, and supporting it is not free, so I decided to drop it.
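Mini-batches, weight decay, and momentum from the list above combine into a single per-batch update rule. This is a generic NumPy sketch of classic momentum SGD with L2 weight decay, not nnForge code; all names and default values are hypothetical:

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One mini-batch update: classic momentum SGD with L2 weight decay."""
    # L2 weight decay adds lambda * w to the mini-batch gradient
    g = grad + weight_decay * w
    # Velocity accumulates a decaying sum of past gradients
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity

w = np.array([1.0, -1.0])
v = np.zeros_like(w)
w, v = sgd_step(w, grad=np.array([0.5, -0.5]), velocity=v)
```

Here `grad` would be the gradient averaged over one mini-batch; averaging (rather than summing) keeps the learning rate roughly independent of the batch size.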
Jun 6, 2014
nnForge v1.1.6
I implemented a number of quite useful features in nnForge recently:
- Stochastic Gradient Descent training method added
- Resume-training functionality added
- Duplicating output to log file
- Logging current settings at the toolset initialization
- rgb_to_yuv_convert_layer_tester added in CPU backend
- Readers are redesigned to allow variable data readers
- classifier_result is extended to top-N
- Added the possibility to split a single reader into multiple epochs
- Multiple fixes
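The top-N extension of classifier_result mentioned above counts a sample as correct when the true label appears among the N highest-scoring classes. A minimal NumPy sketch of that metric (hypothetical names, not nnForge code):

```python
import numpy as np

def top_n_accuracy(scores, labels, n=5):
    """Fraction of samples whose true label is among the n highest scores."""
    # Sort class indices by descending score, keep the first n per sample
    top_n = np.argsort(-scores, axis=1)[:, :n]
    hits = (top_n == labels[:, None]).any(axis=1)
    return hits.mean()

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])
labels = np.array([1, 1])
acc1 = top_n_accuracy(scores, labels, n=1)  # second sample misses at top-1
acc2 = top_n_accuracy(scores, labels, n=2)  # but is recovered at top-2
```

Top-5 accuracy is the usual choice for large label sets, where near-misses among visually similar classes are common.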
May 18, 2014
nnForge v1.1.5
I ported the performance improvements for forward propagation of convolutional layers in the GPU backend to the hessian calculators and weight updaters. Get the latest release, nnForge v1.1.5:
- Performance of weight updaters for convolutional layers improved in the CUDA backend, mostly for the Kepler architecture. The increase is much smaller than for forward prop; I got >800 GFLOPs training a single Galaxy Zoo network on a GeForce Titan
- Convolutional 1D, 2D, 3D, and 4D layers are fully supported by CUDA backend (it was 2D and 3D only before)
- Fixed training multiple networks with CPU backend
- Fixed supervised_data_mem_reader for float input data
Apr 19, 2014
nnForge v1.1.4
Here is new v1.1.4 release of nnForge. It contains mostly performance improvements but they are rather significant: the performance of convolutional layer in GPU (CUDA) backend improved from 1.15 TFLOPs to 2 TFLOPs for Galaxy Zoo example when running on NVIDIA GeForce GTX Titan, and the whole network performance improved from 1 TFLOPs to 1.55 TFLOPs. See the NSight VSE profiling screenshot:
2 TFLOPs on a Titan running at 784 MHz... how efficient is it? Let's see: 2 TFLOPs / (784 MHz * 14 SM * 192 fma/SM * 2 op/fma) = 47% of the theoretical peak, which I consider a pretty good number. And there is certainly room for improvement here. Training could also benefit from these improvements; I plan to port these changes to the hessian calculators and updaters soon.
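The peak-efficiency arithmetic above can be checked directly; the constants are the Kepler GK110 figures quoted in the text:

```python
# Theoretical single-precision peak of a GTX Titan at the quoted clock
clock_hz = 784e6        # 784 MHz
sm_count = 14           # SMX units on GK110
fma_per_sm = 192        # CUDA cores per SMX, one FMA each per cycle
ops_per_fma = 2         # an FMA counts as a multiply plus an add
peak_flops = clock_hz * sm_count * fma_per_sm * ops_per_fma  # ~4.21 TFLOPs

measured_flops = 2e12   # the 2 TFLOPs measured for the convolutional layer
efficiency = measured_flops / peak_flops  # ~0.47, i.e. 47% of peak
```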
Here are all the changes in this release:
- Limited C++11 support added: you can build everything except the CUDA backend, since NVCC does not yet support C++11
- Improved testing and validation (feed-forward) performance of convolutional layers in the CUDA backend for Kepler, while greatly simplifying the code
- Improved performance of the max subsampling 2d tester in the CUDA backend. The implementation is still far from optimal