Oct 4, 2014
nnForge v1.1.9
I released nnForge v1.1.9:
- More sparse cases supported in the GPU backend for convolutional layers, improved performance
- convert_data_type_transformer added
- Hessian-based learning algorithm removed
- Galaxy Zoo example removed; use previous releases to get it
- Average weights/updates reported after each batch
- Image classifier demo added; improved performance for running a single entry through the tester
Aug 23, 2014
nnForge v1.1.8
Hi, nnForge v1.1.8 is released:
- Sparse (in feature map dimension) convolutional layer added, with full support in CPU backend and fully connected (spatial) 1x1 support in GPU backend
- You can use -std=c++11 now with CUDA 6.5 toolkit
- Gradient check added
- GTSRB switched to batch training
- Boost and OpenCV libs default paths are /usr now
- Improved performance for 1x1 convolutions in GPU backend
- Minor fixes
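The gradient check mentioned in the list above compares analytic gradients against central finite differences. nnForge's own implementation is not shown here; the following is a minimal NumPy sketch of the general technique, with all function and parameter names hypothetical:

```python
import numpy as np

def numerical_gradient_check(f, w, analytic_grad, eps=1e-5, tol=1e-4):
    """Compare an analytic gradient against central finite differences."""
    w = w.astype(np.float64)
    num_grad = np.zeros_like(w)
    for i in range(w.size):
        orig = w.flat[i]
        w.flat[i] = orig + eps
        loss_plus = f(w)
        w.flat[i] = orig - eps
        loss_minus = f(w)
        w.flat[i] = orig  # restore the weight
        num_grad.flat[i] = (loss_plus - loss_minus) / (2.0 * eps)
    # Relative error between numeric and analytic gradients
    denom = np.maximum(np.abs(num_grad) + np.abs(analytic_grad), 1e-12)
    rel_err = np.max(np.abs(num_grad - analytic_grad) / denom)
    return rel_err < tol

# Example: for f(w) = 0.5 * ||w||^2 the analytic gradient is w itself
w = np.array([1.0, -2.0, 3.0])
ok = numerical_gradient_check(lambda v: 0.5 * np.dot(v, v), w.copy(), w)
```

Central differences are preferred over forward differences because their truncation error is O(eps^2) rather than O(eps).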
Jul 12, 2014
nnForge v1.1.7
This is a big release. I added a number of useful features you would expect an NN library to have. Here is the full list:
- Mini-batches added
- Weight decay added
- Momentum added
- Cross Entropy error function renamed to Negative Log Likelihood; true Cross Entropy added
- Sigmoid layer added, with correct bias initialization for the classifier
- Splitting single epoch into multiple epochs through epoch_count_in_training_set parameter
- max_subsampling layer supports 1D and 4D in GPU backend (was 2D and 3D only)
- rotate_band_data_transformer is extended to all dimensions (was 2D only)
- extract_data_transformer extended to data of any dimension when input and output windows match
- snapshot_data: added scaling and 3D (video)
- Sigmoid+Cross-entropy and Softmax+Negative-log-likelihood fusion implemented in CPU and GPU backends to increase accuracy
- Max L2 bound on incoming weights implementation is dropped (*)
- Conversion to bw image fixed in GTSRB example
- Max subsampling updater and hessian: corner cases fixed in CPU backend
(*) I did that because the L2 bound on incoming weights didn't improve quality in any problem I worked on, and supporting it is not free, so I decided to drop it.
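Mini-batches, weight decay, and momentum from the list above combine into a single per-batch update rule. This is a generic NumPy sketch of classic momentum SGD with L2 weight decay, not nnForge code; all names and default values are hypothetical:

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One mini-batch update: classic momentum SGD with L2 weight decay."""
    # L2 weight decay adds lambda * w to the mini-batch gradient
    g = grad + weight_decay * w
    # Velocity accumulates a decaying sum of past gradients
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity

w = np.array([1.0, -1.0])
v = np.zeros_like(w)
w, v = sgd_step(w, grad=np.array([0.5, -0.5]), velocity=v)
```

Here `grad` would be the gradient averaged over one mini-batch; averaging (rather than summing) keeps the learning rate roughly independent of the batch size.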
Jun 6, 2014
nnForge v1.1.6
I implemented a number of quite useful features in nnForge recently:
- Stochastic Gradient Descent training method added
- Resume-training functionality added
- Duplicating output to log file
- Logging current settings at the toolset initialization
- rgb_to_yuv_convert_layer_tester added in CPU backend
- Readers are redesigned to allow variable data readers
- classifier_result is extended to top-N
- Added the possibility to split a single reader into multiple epochs
- Multiple fixes
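The top-N extension of classifier_result mentioned above counts a sample as correct when the true label appears among the N highest-scoring classes. A minimal NumPy sketch of that metric (hypothetical names, not nnForge code):

```python
import numpy as np

def top_n_accuracy(scores, labels, n=5):
    """Fraction of samples whose true label is among the n highest scores."""
    # Sort class indices by descending score, keep the first n per sample
    top_n = np.argsort(-scores, axis=1)[:, :n]
    hits = (top_n == labels[:, None]).any(axis=1)
    return hits.mean()

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])
labels = np.array([1, 1])
acc1 = top_n_accuracy(scores, labels, n=1)  # second sample misses at top-1
acc2 = top_n_accuracy(scores, labels, n=2)  # but is recovered at top-2
```

Top-5 accuracy is the usual choice for large label sets, where near-misses among visually similar classes are common.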
May 18, 2014
nnForge v1.1.5
I ported the performance improvements for forward propagation of convolutional layers in the GPU backend to the hessian calculators and weight updaters. Get the latest release, nnForge v1.1.5:
- Performance of weight updaters for convolutional layers improved in the CUDA backend, mostly for the Kepler architecture. The increase is much smaller than for forward prop; I got >800 GFLOPs training a single Galaxy Zoo network on a GeForce Titan
- Convolutional 1D, 2D, 3D, and 4D layers are fully supported by CUDA backend (it was 2D and 3D only before)
- Fixed training multiple networks with CPU backend
- Fixed supervised_data_mem_reader for float input data
Apr 19, 2014
nnForge v1.1.4
Here is new v1.1.4 release of nnForge. It contains mostly performance improvements but they are rather significant: the performance of convolutional layer in GPU (CUDA) backend improved from 1.15 TFLOPs to 2 TFLOPs for Galaxy Zoo example when running on NVIDIA GeForce GTX Titan, and the whole network performance improved from 1 TFLOPs to 1.55 TFLOPs. See the NSight VSE profiling screenshot:
2 TFLOPs on a Titan running at 784 MHz... how efficient is it? Let's see: 2 TFLOPs / (784 MHz * 14 SM * 192 fma/SM * 2 op/fma) = 47% of the theoretical peak, which I consider a pretty good number. And there is certainly room for improvement here. Training could also benefit from these improvements; I plan to port these changes to the hessian calculators and updaters soon.
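The peak-efficiency arithmetic above can be checked directly; the constants are the Kepler GK110 figures quoted in the text:

```python
# Theoretical single-precision peak of a GTX Titan at the quoted clock
clock_hz = 784e6        # 784 MHz
sm_count = 14           # SMX units on GK110
fma_per_sm = 192        # CUDA cores per SMX, one FMA each per cycle
ops_per_fma = 2         # an FMA counts as a multiply plus an add
peak_flops = clock_hz * sm_count * fma_per_sm * ops_per_fma  # ~4.21 TFLOPs

measured_flops = 2e12   # the 2 TFLOPs measured for the convolutional layer
efficiency = measured_flops / peak_flops  # ~0.47, i.e. 47% of peak
```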
Here are all the changes in this release:
- Limited C++11 support added: you can build everything except the CUDA backend, since NVCC does not yet support C++11
- Improved testing and validation (feed-forward) performance of convolutional layers in the CUDA backend for Kepler, while greatly simplifying the code
- Improved performance of the max subsampling 2d tester in the CUDA backend. The implementation is still far from optimal