Maxim Milakov
A researcher in machine learning and high-performance computing

nnForge v2.3.0 (2016-11-30)

I have added multi-GPU support to <a href="http://nnforge.org/">nnForge</a>! Both training and inference can now be done on multiple GPUs, though only a single node is supported. Training is parallelized with a data-parallel approach, where the mini-batch is split across multiple GPUs.<br />
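For those unfamiliar with the data-parallel scheme, here is a minimal sketch of the idea: each GPU computes gradients on its slice of the mini-batch, and the averaged gradient drives a single weight update. The names and structure below are purely illustrative, not nnForge's actual API.

```cpp
// Minimal sketch of data-parallel training: the mini-batch is split across
// devices, each computes gradients on its slice, and the averaged gradient
// drives one weight update. Names are illustrative, not the nnForge API.
#include <cstddef>
#include <vector>

// Stand-in for the per-device forward/backward pass (hypothetical).
std::vector<float> compute_gradients_on_device(
    int device_id, const std::vector<float>& batch_slice, std::size_t weight_count)
{
    return std::vector<float>(weight_count, 0.1f); // dummy gradient
}

void train_step(const std::vector<float>& mini_batch,
                std::vector<float>& weights,
                int device_count,
                float learning_rate)
{
    std::size_t slice_size = mini_batch.size() / device_count;
    std::vector<std::vector<float> > per_device(device_count);

    // In the real thing each slice is processed concurrently on its own GPU.
    for (int d = 0; d < device_count; ++d)
    {
        std::vector<float> slice(
            mini_batch.begin() + d * slice_size,
            mini_batch.begin() + (d + 1) * slice_size);
        per_device[d] = compute_gradients_on_device(d, slice, weights.size());
    }

    // Average the gradients across devices, then update the weights once.
    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        float sum = 0.0f;
        for (int d = 0; d < device_count; ++d)
            sum += per_device[d][i];
        weights[i] -= learning_rate * sum / device_count;
    }
}
```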
The framework has moved to C++11; you will need gcc 4.7 or newer to build the library, and MS VS 2013 on Windows.

nnForge v2.2.0 (2016-07-05)

Hi, <a href="http://nnforge.org/">nnForge</a> v2.2.0 is published!<br />
<ul>
<li>Convolutional layer</li>
<ul>
<li>strides added</li>
<li>w/out bias option added</li>
</ul>
<li>check_gradient command added</li>
<li>ImageNet: reproduced the ResNet-50 result (7.5% Top-5, single crop)</li>
<li>Average subsampling layer allows specifying output size instead of subsampling window sizes</li>
<li>Added profiling to CUDA backend</li>
<li>Max subsampling layer:</li>
<ul>
<li>round_up mode added</li>
<li>Strides added</li>
</ul>
<li>Step learning rate decay policy added (see the sketch after this list)</li>
<li>update_bn_weights action added (though the mean and invsigma calculated during training already work well)</li>
<li>Spatial Transformer:</li>
<ul>
<li>affine_grid_generator_layer added</li>
<li>linear_sampler layer added</li>
</ul>
<li>Utilizing cudnnFindConvolution*AlgorithmEx functions to get maximum perf (cuDNN v5 is required for that)</li>
<li>Added strides to sparse convolution layer</li>
</ul>
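The step decay policy mentioned above follows the usual pattern: multiply the learning rate by a fixed factor every N epochs. A minimal sketch, with illustrative parameter names rather than nnForge's actual options:

```cpp
// Sketch of a "step" learning rate decay policy: the rate is multiplied
// by a fixed factor every step_epochs epochs. Parameter names are
// illustrative, not nnForge's option names.
#include <cmath>
#include <cstdio>

float step_decay_rate(float base_rate, int epoch, int step_epochs, float factor)
{
    return base_rate * std::pow(factor, epoch / step_epochs); // integer division
}

int main()
{
    for (int epoch = 0; epoch < 10; ++epoch)
        std::printf("epoch %d: lr = %g\n",
                    epoch, step_decay_rate(0.1f, epoch, 3, 0.5f));
}
```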

nnForge v2.1.0 (2016-02-21)

Two months have passed since the last release, and this one is pretty big. A number of layers have been added, and the functionality of existing layers is extended. Here is the full list of changes in <a href="http://nnforge.org/">nnForge</a> v2.1.0:<br />
<ul>
<li>New layers added: Concat, Reshape, CDFMax, PrefixSum, Upsampling, Add (element-wise), CDF2PDF, EntryConvolution</li>
<li>Average and Max subsampling layers are now capable of subsampling in feature map and entry directions</li>
<li>MSE layer reworked into a generic LError layer (L2 by default; see the sketch after this list)</li>
<li>Max subsampling can do MIN as well</li>
<li>Optional scale parameter for AverageSubsampling layer added</li>
<li>Detailed info on layers in the schema dumped</li>
<li>Dumping graph with layer configs in debug mode</li>
<li>Added dumping data in CSV format</li>
<li>Runtime layer replacement with data layers</li>
<li>Bug fixes</li>
</ul>
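For reference, the generic L^p error that the LError layer generalizes MSE into looks roughly like the following sketch; the signature is hypothetical, not nnForge's code, and p = 2 reproduces the old MSE behaviour up to normalization.

```cpp
// Sketch of a generic L^p error of the kind the LError layer computes:
// the sum over outputs of |actual - target|^p. With p = 2 this matches
// the old MSE behaviour up to normalization. Hypothetical signature.
#include <cmath>
#include <cstddef>
#include <vector>

float lp_error(const std::vector<float>& actual,
               const std::vector<float>& target,
               float p = 2.0f)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < actual.size(); ++i)
        sum += std::pow(std::fabs(actual[i] - target[i]), p);
    return sum;
}
```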

nnForge v2.0.2 (2015-12-20)

A small release, <a href="http://nnforge.org/">nnForge</a> v2.0.2:<br />
<ul>
<li>Gradient modifier layer added</li>
<li>structured_data_constant_reader added</li>
<li>Error function layers accept an optional third input layer, a mask</li>
<li>ADAM training algo implemented; use "--momentum_type adam", and note that the learning rate should generally be much smaller than for other methods (see the sketch after this list)</li>
<li>Changed default value for cuda_fixed_working_buffers_ratio to 0.4</li>
</ul>
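For reference, here is a minimal sketch of the ADAM update for a single weight, following Kingma and Ba's paper. The hyper-parameter names below are the standard ones from the paper, not toolset options.

```cpp
// Minimal sketch of the ADAM update for a single weight (Kingma & Ba):
// exponentially decayed first and second moments of the gradient, with
// bias correction. Standard hyper-parameter names, not toolset options.
#include <cmath>

struct adam_state { float m = 0.0f, v = 0.0f; int t = 0; };

float adam_step(adam_state& s, float w, float g, float lr,
                float beta1 = 0.9f, float beta2 = 0.999f, float eps = 1e-8f)
{
    ++s.t;
    s.m = beta1 * s.m + (1.0f - beta1) * g;       // first moment
    s.v = beta2 * s.v + (1.0f - beta2) * g * g;   // second moment
    float m_hat = s.m / (1.0f - std::pow(beta1, s.t)); // bias correction
    float v_hat = s.v / (1.0f - std::pow(beta2, s.t));
    return w - lr * m_hat / (std::sqrt(v_hat) + eps);
}
```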
<div>
I get a very nice 5.4 TFLOPS across the whole model when training VGG-A with cuDNN v4 RC.</div>

nnForge v2.0.1 (2015-11-24)

<a href="http://2.bp.blogspot.com/-fiJZWph7TBA/VlOKyKWlF2I/AAAAAAAABUs/OzA29BBtrnc/s1600/backward_prop_cuda_per_entry_buffers_00006.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="320" src="http://2.bp.blogspot.com/-fiJZWph7TBA/VlOKyKWlF2I/AAAAAAAABUs/OzA29BBtrnc/s320/backward_prop_cuda_per_entry_buffers_00006.png" width="38" /></a>Hi,<br />
<br />
I have significantly improved the performance of the CUDA backend in <a href="http://nnforge.org/">nnForge</a> v2.0.1:<br />
<ul>
<li>Multiple improvements that reduce total buffer sizes, allowing larger chunks to be run (3x for ImageNet):</li>
<ul>
<li>Buffer sizes are taken into account when coloring the graph (see the sketch after this list)</li>
<li>Maxout, ReLU, and MaxSubsampling layers consume much less memory in CUDA backend</li>
<li>Action graph is optimized to exclude unnecessary concurrency, taking device width into account</li>
</ul>
<li>Migrated to cuDNN v3</li>
<li>Reusing CUDA streams</li>
<li>Allocating a single chunk of memory for fixed working buffers, which improves performance</li>
<li>A few bug fixes</li>
</ul>
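The buffer coloring idea can be shown with a tiny greedy scheme: two buffers may share one allocation ("color") when their lifetimes in the action graph do not overlap. This is only an illustration of the principle; nnForge's actual coloring also weighs buffer sizes, as noted above.

```cpp
// Greedy sharing of memory buffers: two tensors reuse the same allocation
// ("color") when their lifetime intervals do not overlap. Illustration of
// the principle only; the real coloring also takes buffer sizes into account.
#include <cstdio>
#include <vector>

struct lifetime { int first_use, last_use; };

std::vector<int> color_buffers(const std::vector<lifetime>& lts)
{
    std::vector<int> color(lts.size());
    std::vector<int> slot_free_from; // per color: first step it is free again
    for (std::size_t i = 0; i < lts.size(); ++i)
    {
        int chosen = -1;
        for (std::size_t c = 0; c < slot_free_from.size(); ++c)
            if (slot_free_from[c] <= lts[i].first_use) { chosen = (int)c; break; }
        if (chosen < 0)
        {
            chosen = (int)slot_free_from.size(); // no free slot: new allocation
            slot_free_from.push_back(0);
        }
        slot_free_from[chosen] = lts[i].last_use + 1;
        color[i] = chosen;
    }
    return color;
}

int main()
{
    // Buffers ordered by first use, as a topologically sorted action
    // graph would produce them.
    std::vector<lifetime> lts = {{0, 2}, {1, 3}, {3, 5}, {4, 6}};
    std::vector<int> c = color_buffers(lts);
    for (std::size_t i = 0; i < c.size(); ++i)
        std::printf("buffer %zu -> allocation %d\n", i, c[i]);
}
```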
<div>
See the buffer graph coloring for the optimized action graph of a VGG-A-like schema on the right. You can get this and other interesting graphs by specifying the "--debug_mode 1" option.</div>
<div>
<br /></div>

nnForge v2.0.0 (2015-11-07)

Hi all,<br />
<br />
Six months have passed since the last nnForge release, and there is a good reason for that: I have been working on a major redesign of the framework, and now it is out! See <a href="http://nnforge.org/">nnForge</a> v2.0.0:<br />
<ul>
<li>The model is now an arbitrary DAG (directed acyclic graph)</li>
<li>Running independent actions in multiple streams in the CUDA backend</li>
<li>Memory buffers are heavily reused</li>
</ul>
<div>
The changes are so radical that I had to drop support for the old trained-data storage format. Unfortunately, this means you will have to re-train your models from scratch.</div>
<div>
<br /></div>
<div>
Expect more goodies in the near future!</div>

nnForge v1.2.0 (2015-04-30)

Hi, this is a pretty big release of <a href="http://nnforge.org/">nnForge</a>. The most important improvement is that model schemas are now stored in <a href="https://developers.google.com/protocol-buffers/">Protobuf</a> format: you now define the schema via a plain text file. Use the convert_schema action to convert from the old binary format to the new one. I also implemented <a href="http://arxiv.org/abs/1312.6229">Overfeat</a> functionality, which allows running inference efficiently on large input data with fine-grained results.<br />
<br />
All the changes are:<br />
<ul>
<li>Schema:</li>
<ul>
<li>Model schema is now stored in Protobuf format. Use convert_schema to convert schemas from the old binary format to the new one</li>
<li>Input and output data normalizers are stored in protobuf format now. Use convert_input_normalizer and convert_output_normalizer to convert existing binary normalizers to new format</li>
<li>Schema and data are now considered compatible if the non-empty layers match; layers with empty data don't matter</li>
</ul>
<li>Training data:</li>
<ul>
<li>Improvements in supervised_image_stream_reader</li>
<li>embed_data_transformer added</li>
</ul>
<li>Training:</li>
<ul>
<li>Nesterov momentum added (see the --momentum_type option and the sketch after this list)</li>
<li>uniform_intensity_data_transformer added</li>
<li>Momentum data is kept between epochs (it is saved and restored as well)</li>
<li>ROC result outputs accuracy, precision, recall, and F-score now (in addition to AUC)</li>
</ul>
<li>Visualization:</li>
<ul>
<li>snapshot_invalid now saves images, including binary classifier case</li>
</ul>
<li>Inference:</li>
<ul>
<li>Overfeat functionality added (see the tiling option of the max subsampling layer, and the untile layer)</li>
</ul>
</ul>
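For those wondering how Nesterov momentum differs from the classic kind, here is a sketch of both updates for a single weight, in the common Sutskever-style formulation; this is illustrative, and nnForge's exact formulas may differ.

```cpp
// Classic vs. Nesterov momentum for a single weight, in the common
// "lookahead gradient folded in" (Sutskever) formulation. Illustrative
// only; the exact formulas in nnForge may differ.
#include <cstdio>

struct state { float w = 0.0f, v = 0.0f; };

// Classic momentum: v <- mu*v - lr*g;  w <- w + v
void classic_step(state& s, float g, float lr, float mu)
{
    s.v = mu * s.v - lr * g;
    s.w += s.v;
}

// Nesterov momentum: v <- mu*v - lr*g;  w <- w + mu*v - lr*g
void nesterov_step(state& s, float g, float lr, float mu)
{
    s.v = mu * s.v - lr * g;
    s.w += mu * s.v - lr * g;
}

int main()
{
    state a, b;
    for (int i = 0; i < 5; ++i)
    {
        classic_step(a, /*g=*/1.0f, /*lr=*/0.1f, /*mu=*/0.9f);
        nesterov_step(b, 1.0f, 0.1f, 0.9f);
    }
    std::printf("classic w = %f, nesterov w = %f\n", a.w, b.w);
}
```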

nnForge v1.1.13 (2015-03-26)

<a href="http://nnforge.org/">nnForge</a> v1.1.13 is published with a number of improvements:<br />
<br />
<ul>
<li>Data transformers:</li>
<ul>
<li>Stretch added to distort sampler transformer</li>
<li>perspective distortions added to distort_2d transformer</li>
<li>reshape_data_transformer added</li>
<li>elastic_deformation_2d_data_transformer added</li>
</ul>
<li>Mixture of models:</li>
<ul>
<li>Added --test_validate_save_output and --test_validate_load_output options</li>
<li>Running testing and validation from a mixture of output_values</li>
</ul>
<li>Readers:</li>
<ul>
<li>supervised_shuffle_entries_data_reader is made deterministic</li>
<li>deterministic image data reader is extended to sampler</li>
</ul>
<li>Layers:</li>
<ul>
<li>Parametric ReLU added (with CPU and GPU backends)</li>
<li>Average subsampling is reverted to native implementation (3D and 4D support)</li>
</ul>
<li>Others:</li>
<ul>
<li>Taking ReLUs into account when initializing weights</li>
<li>validate_progress_network_data_pusher is extended with frequency parameter</li>
<li>Quasi-random training data randomization is dropped</li>
<li>Memory consumption reduced during testing</li>
<li>Resume training (-R) can now be combined with training multiple ANNs (-N)</li>
<li>VS2013 projects and solution added (using CUDA 7.0)</li>
<li>Fixed fancy backprop for analyzer</li>
<li>Bug-fixes</li>
</ul>
</ul>

nnForge v1.1.12 (2015-01-21)

I finally started using <a href="https://developer.nvidia.com/cuDNN">cuDNN</a> for some layers of the <a href="http://nnforge.org/">nnForge</a> library, and performance improved. Fermi GPUs are no longer supported; nnForge will run on Kepler and Maxwell GPUs only (or CPUs). You will need cuDNN version <b>v2 RC2</b> or later. Here are all the changes in nnForge v1.1.12:<br /><ul>
<li>Using cuDNN for a lot of layers now, Fermi is no longer supported</li>
<li>New transformers added: convert_to_polar_data_transformer, negate_data_transformer</li>
<li>New readers added: supervised_shuffle_entries_data_reader, image related readers (from raw jpegs stored in a single file)</li>
<li>Dropout functionality is moved into its own layer, with better randomization (see the sketch after this list)</li>
<li>Soft rectified linear layer removed</li>
</ul>
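Conceptually, a standalone dropout layer does something like the following sketch. The "inverted" dropout variant is shown here for illustration; nnForge's own scaling may differ.

```cpp
// Sketch of dropout as a standalone layer: during training each activation
// is zeroed with probability p and the survivors are scaled by 1/(1-p)
// ("inverted dropout"), so inference needs no rescaling. Illustrative only.
#include <random>
#include <vector>

void dropout_forward(std::vector<float>& acts, float p, std::mt19937& rng,
                     bool training)
{
    if (!training || p <= 0.0f)
        return; // at inference time dropout is a no-op
    std::bernoulli_distribution drop(p);
    float scale = 1.0f / (1.0f - p);
    for (float& a : acts)
        a = drop(rng) ? 0.0f : a * scale;
}
```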

nnForge v1.1.11 (2014-11-30)

Hi, I am releasing <a href="http://nnforge.org/">nnForge</a> v1.1.11 with a number of significant changes:<br />
<br />
<ul>
<li>Padding added to sparse convolutional layers</li>
<li>Sparse convolutional layers implemented in GPU backend (Kepler+ only)</li>
<li>Fixed bug with dropout when the error function is fused with the last activation function</li>
<li>Array with random numbers extended to 256K elements (for dropout)</li>
</ul>
<br />

nnForge v1.1.10 (2014-11-03)

Hi, here is <a href="http://nnforge.org/">nnForge</a> v1.1.10. The main new feature is zero-padding for convolutional layers; I should have implemented it long ago. The full list of changes:<br />
<br />
<ul>
<li>You can now specify zero-padding of the input data for convolutional layers (see the sketch after this list)</li>
<li>Memory usage calculations improved</li>
<li>Learning rates are per part now (previously per parameter): training consumes less memory, so bigger networks can be trained</li>
<li>Dropout implementation is simplified</li>
<li>Minor fixes</li>
</ul>
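The effect of zero-padding on the output size of a convolution (stride 1) is easy to state in code; this is the general formula, not anything nnForge-specific:

```cpp
// Output size of a convolution along one dimension, with zero-padding on
// both sides of the input, at stride 1. With padding = kernel/2 (odd
// kernels) the spatial size is preserved.
#include <cstdio>

int conv_output_size(int input, int kernel, int padding)
{
    return input + 2 * padding - kernel + 1;
}

int main()
{
    std::printf("%d\n", conv_output_size(32, 5, 0)); // 28: shrinks
    std::printf("%d\n", conv_output_size(32, 5, 2)); // 32: preserved
}
```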

nnForge v1.1.9 (2014-10-04)

I released <a href="http://nnforge.org/">nnForge</a> v1.1.9:<br />
<ul>
<li>More sparse cases supported in the GPU backend for convolutional layers, with improved performance</li>
<li>convert_data_type_transformer added</li>
<li>Hessian based learning algo is removed</li>
<li>Galaxy Zoo example removed. Use previous releases to get it</li>
<li>Reporting average weights/updates after each batch</li>
<li>Image classifier demo added; improved performance when running a single entry through the tester</li>
</ul>

nnForge v1.1.8 (2014-08-23)

Hi, <a href="http://nnforge.org/">nnForge</a> v1.1.8 is released:<br />
<ul>
<li>Sparse (in feature map dimension) convolutional layer added, with full support in CPU backend and fully connected (spatial) 1x1 support in GPU backend</li>
<li>You can use -std=c++11 now with CUDA 6.5 toolkit</li>
<li>Gradient check added</li>
<li>GTSRB switched to batch training</li>
<li>Default paths for the Boost and OpenCV libs are /usr now</li>
<li>Improved performance for 1x1 convolutions in GPU backend</li>
<li>Minor fixes</li>
</ul>

nnForge v1.1.7 (2014-07-12)

It is a big release. I added a number of useful features you would expect an NN lib to have. Here is the full list:<br /><ul>
<li>Mini-batches added</li>
<li>Weight decay added</li>
<li>Momentum added</li>
<li>Cross Entropy error function is renamed to Negative Log-Likelihood; true Cross Entropy added (see the sketch after this list)</li>
<li>Sigmoid layer added, with correct bias initialization for the classifier</li>
<li>Splitting single epoch into multiple epochs through epoch_count_in_training_set parameter</li>
<li>max_subsampling layer supports 1D and 4D in GPU backend (was 2D and 3D only)</li>
<li>rotate_band_data_transformer is extended to all dimensions (was 2D only)</li>
<li>extract_data_transformer extended to data of any dimension in case input and output windows match</li>
<li>snapshot_data: added scaling and 3D (video)</li>
<li>Sigmoid+Cross-entropy and Softmax+Negative-log-likelihood fusion implemented in CPU and GPU backends to increase accuracy</li>
<li>Max L2 bound on incoming weights implementation is dropped (*)</li>
<li>Conversion to bw image fixed in GTSRB example</li>
<li>Max subsampling updater and hessian: corner cases fixed in the CPU backend</li>
</ul>
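To make the renaming above concrete, here is a sketch of the two error functions: negative log-likelihood, which only penalizes the target side and suits softmax outputs, and true cross entropy, which also penalizes the (1 - target) side and suits sigmoid outputs. The exact definitions in nnForge may differ slightly.

```cpp
// Negative log-likelihood vs. "true" cross entropy for one output vector y
// with targets t. NLL suits softmax outputs; cross entropy also penalizes
// the (1 - t) side and suits sigmoid outputs. Sketch only.
#include <cmath>
#include <cstddef>
#include <vector>

float negative_log_likelihood(const std::vector<float>& y,
                              const std::vector<float>& t)
{
    float e = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i)
        e -= t[i] * std::log(y[i]);
    return e;
}

float cross_entropy(const std::vector<float>& y, const std::vector<float>& t)
{
    float e = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i)
        e -= t[i] * std::log(y[i]) + (1.0f - t[i]) * std::log(1.0f - y[i]);
    return e;
}
```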
<div>
(*) I did that because the L2 bound on incoming weights didn't improve quality on any problem I worked on, and supporting it is not free, so I decided to drop it.</div>

Jetson TK1 (2014-06-27)

I got my own <a href="https://developer.nvidia.com/jetson-tk1">Jetson TK1</a>, with Ubuntu 14.04 LTS running on it!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-mokVJzJOJBM/U61FI5BojqI/AAAAAAAABL0/4l3XAuAL8do/s1600/Jetson.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-mokVJzJOJBM/U61FI5BojqI/AAAAAAAABL0/4l3XAuAL8do/s1600/Jetson.jpg" height="226" width="400" /></a></div>
<br />

nnForge v1.1.6 (2014-06-06)

I implemented a number of quite useful features in <a href="http://nnforge.org/">nnForge</a> recently:<br />
<ul>
<li>Stochastic Gradient Descent training method is added</li>
<li>Resume training functionality added</li>
<li>Duplicating output to log file</li>
<li>Logging current settings at the toolset initialization</li>
<li>rgb_to_yuv_convert_layer_tester added in CPU backend</li>
<li>Readers are redesigned to allow variable data readers</li>
<li>classifier_result is extended to top-N</li>
<li>Added the possibility to split a single reader into multiple epochs</li>
<li>Multiple fixes</li>
</ul>

nnForge v1.1.5 (2014-05-18)

I ported the performance improvements for forward propagation of convolutional layers in the GPU backend to the hessian calculators and weight updaters. Get the latest release of <a href="http://nnforge.org/">nnForge</a> v1.1.5:<br />
<ul>
<li>Performance of weight updaters for convolutional layers is improved in the CUDA backend, mostly for the Kepler architecture. The increase is much smaller than for forward prop; I got >800 GFLOPs when training a single <a href="https://github.com/milakov/nnForge/tree/master/examples/galaxy_zoo">Galaxy Zoo</a> network on a GeForce Titan</li>
<li>Convolutional 1D, 2D, 3D, and 4D layers are fully supported by CUDA backend (it was 2D and 3D only before)</li>
<li>Fixed training multiple networks with CPU backend</li>
<li>Fixed supervised_data_mem_reader for float input data</li>
</ul>

nnForge v1.1.4 (2014-04-19)

Here is the new v1.1.4 release of <a href="http://nnforge.org/">nnForge</a>. It contains mostly performance improvements, but they are rather significant: the performance of the convolutional layer in the GPU (CUDA) backend improved from 1.15 TFLOPs to 2 TFLOPs for the <a href="https://github.com/milakov/nnForge/tree/master/examples/galaxy_zoo">Galaxy Zoo</a> example when running on an NVIDIA GeForce GTX Titan, and whole-network performance improved from 1 TFLOPs to 1.55 TFLOPs. See the NSight VSE profiling screenshot:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-N5JVGpvAv7A/U1GfHY42WlI/AAAAAAAABIs/ipOKPepmygw/s1600/nsight_flops.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-N5JVGpvAv7A/U1GfHY42WlI/AAAAAAAABIs/ipOKPepmygw/s1600/nsight_flops.png" /></a></div>
<br />
<div>
2 TFLOPs on a Titan running at 784 MHz... how efficient is that? Let's see: 2 TFLOPs / (784 MHz * 14 SMs * 192 FMA/SM * 2 ops/FMA) = 47% of theoretical peak, which I consider a pretty good number, and there is certainly room for improvement here. Training could also benefit from these improvements; I plan to port these changes to the hessian calculators and updaters soon.<br />
<br />
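The same efficiency arithmetic, spelled out in code:

```cpp
// The peak-FLOPS arithmetic from the paragraph above, as code.
#include <cstdio>

int main()
{
    const double clock_hz   = 784e6;  // Titan clock
    const double sm_count   = 14;
    const double fma_per_sm = 192;    // CUDA cores per SM
    const double peak_flops = clock_hz * sm_count * fma_per_sm * 2; // 2 ops/FMA
    const double measured   = 2e12;   // 2 TFLOPs achieved
    std::printf("peak: %.2f TFLOPs, efficiency: %.0f%%\n",
                peak_flops / 1e12, 100.0 * measured / peak_flops);
}
```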
Here are all the changes in this release:<br />
<ul>
<li>Limited C++11 support added: you can build everything except the CUDA backend, since NVCC does not yet support C++11</li>
<li>Improved testing and validation (feed-forward) performance of convolutional layers in the CUDA backend for Kepler, while at the same time greatly simplifying the code</li>
<li>Improved performance of the max subsampling 2d tester for the CUDA backend. The implementation is still far from optimal</li>
</ul>
</div>

Galaxy Zoo (2014-04-12)

I took second place in the <a href="https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge">Galaxy Zoo competition</a>. The organizers requested a report from all prize winners; <a href="https://github.com/milakov/nnForge/blob/master/examples/galaxy_zoo/galaxy_zoo.pdf?raw=true">here is mine</a>. Sander Dieleman won the challenge by a large margin. He used convolutional neural networks too, although <a href="http://benanne.github.io/2014/04/05/galaxy-zoo.html">his approach was more sophisticated</a>. Team <i>6789</i>, which took third place, used convnets as well!

nnForge v1.1.3 (2014-04-05)

I labelled the latest changes in <a href="http://nnforge.org/">nnForge</a> as v1.1.3:<br />
<ul>
<li>Snapshot functionality is fully redesigned: it now does backpropagation; the feature is still in beta</li>
<li>Ability to define custom error functions is added</li>
<ul>
<li>Cross-entropy error function is added; use with care, as it is not tested yet</li>
</ul>
<li>Galaxy Zoo example added - see <a href="https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge">Galaxy Zoo challenge</a></li>
<li>cuda_max_global_memory_usage_ratio is set to 0.8 by default; this should help those running the code on a primary video card</li>
<li>per_layer_mu mode is added, for more robust training in some cases</li>
<li>Fixes:</li>
<ul>
<li>Fixed crash when using output transformer</li>
<li>Fixed backprop for local_contrast_subtractive_2d_updater in CUDA backend</li>
<li>Fixed build with Boost 1.55</li>
</ul>
</ul>

nnForge v1.1.2 (2014-02-07)

I brushed up the parameters of the <a href="http://nnforge.org/">nnForge</a> toolset. I also changed the default values for some of them; if you run GTSRB, you will probably need to update the config file. Here is the full change list:<br />
<ul>
<li>Deterministic transformer added for testing and validation</li>
<li>snapshots are made on ANNs from batch directory</li>
<li>Toolset parameters changed:</li>
<ul>
<li>learning_rate_decay_rate is exposed as a command line parameter</li>
<li>training_speed parameter renamed to learning_rate, training_speed_degradation is dropped</li>
<li>training_iteration_count renamed to training_epoch_count</li>
<li>train command does batch train, batch_train command is removed</li>
<li>validate and test now work in batch mode, validate_batch and test_batch removed</li>
<li>mu_increase_factor is set to 1.0 by default</li>
<li>max_mu set to 1.0 by default</li>
</ul>
<li>Bug-fixes</li>
</ul>

nnForge v1.1.1 (2014-01-11)

I've just published a new <a href="http://nnforge.org/">nnForge</a> release, v1.1.1:<br />
<ul>
<li>Using a space-filling curve for all the convolutional updaters, testers, and hessians in the CUDA backend; performance of training large networks improved</li>
<li>Improved concurrency of training and of loading/processing input data at all stages by loading data in a separate host thread (CUDA backend only)</li>
<li>In-memory supervised data reader added</li>
<li>Added NVTX profiling for reading input data, CUDA backend only</li>
<li>Fixed:</li>
<ul>
<li>Binding texture to too large linear buffer</li>
<li>Average subsampling backprop in the CUDA backend was wrong for non-even configs</li>
<li>Poor performance on Windows with the WDDM driver</li>
</ul>
</ul>

Moved to Blogger (2013-12-27)

I moved the blog from Zoho to the Blogger platform for a number of reasons, including better uptime, design, and the ability to edit posts in place. All posts have been copied to the new platform.

nnForge v1.1.0 (2013-11-23)

I've just published a new <a href="http://nnforge.org/">nnForge</a> release, v1.1.0, with a lot of new functionality and fixes:<br />
<div>
<ul>
<li>Squared Hinge Loss error function added</li>
<li>Local contrast subtractive layer hessian and updater implementations added to both CPU and GPU backends</li>
<li>Maxout layer added with CPU and GPU backends implemented</li>
<li>Added tester functionality for the rgb_to_yuv_convert layer in the CUDA backend</li>
<li>Learning rate decay functionality for tail iterations is added</li>
<li>Fixed:</li>
<ul>
<li>Functionality bug in L2 incoming weights regularizer</li>
<li>Functionality bug for rectangular local contrast subtractive</li>
<li>Recovered snapshot_invalid functionality</li>
</ul>
</ul>
</div>

Convolutional Neural Networks talk (2013-10-23)

I just gave a presentation on convolutional neural networks at the Computer Vision meet-up at Yandex. <a href="https://drive.google.com/file/d/0B2hfQbOo3RqBZWMyT1JSMlBLVVk/edit?usp=sharing">Here</a> are the slides (in Russian).