Metadata-Version: 2.1
Name: spatial-correlation-sampler
Version: 0.2.0
Summary: Correlation module for pytorch
Home-page: https://github.com/ClementPinard/Pytorch-Correlation-extension
Author: Clément Pinard
Author-email: clement.pinard@ensta-paristech.fr
License: UNKNOWN
Description: 
        [![PyPI](https://img.shields.io/pypi/v/spatial-correlation-sampler.svg)](https://pypi.org/project/spatial-correlation-sampler/)
        
        
        # Pytorch Correlation module
        
        This is a custom C++/CUDA implementation of the Correlation module, used e.g. in [FlowNetC](https://arxiv.org/abs/1504.06852).
        
        This [tutorial](http://pytorch.org/tutorials/advanced/cpp_extension.html) was used as a basis for the implementation, as well as
        [NVIDIA's CUDA code](https://github.com/NVIDIA/flownet2-pytorch/tree/master/networks/correlation_package).
        
        - Build and install the C++ and CUDA extensions by executing `python setup.py install`,
        - Benchmark C++ vs. CUDA by running `python benchmark.py {cpu, cuda}`,
        - Run gradient checks on the code by running `python grad_check.py --backend {cpu, cuda}`.
        
        # Requirements
        
        This module is expected to compile for Pytorch `1.2`, on `Python > 3.5` and `Python 2.7`.
        
        # Installation
        
        This module is available on pip:
        
        `pip install spatial-correlation-sampler`
        
        For a CPU-only version, you can install from source with
        
        `python setup_cpu.py install`
        
        # Known Problems
        
        This module needs compatible versions of gcc and CUDA to be compiled.
        Namely, CUDA 9.1 and below will need gcc5, while CUDA 9.2 and 10.0 will need gcc7.
        See [this issue](https://github.com/ClementPinard/Pytorch-Correlation-extension/issues/1) for more information.
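        
        The rule above can be written as a small lookup, for instance to check a build machine before compiling. This is only a sketch: the helper name `required_gcc_major` is hypothetical and not part of this package, and since this README only covers the CUDA versions listed above, the function refuses to guess for any other version.
        
        ```python
        def required_gcc_major(cuda_version):
            """Return the gcc major version this README names for a CUDA version.

            `cuda_version` is a "major.minor" string such as "9.1" or "10.0".
            Hypothetical helper, not provided by spatial-correlation-sampler.
            """
            major, minor = (int(part) for part in cuda_version.split(".")[:2])
            if (major, minor) <= (9, 1):
                return 5  # CUDA 9.1 and below: gcc5
            if (major, minor) in ((9, 2), (10, 0)):
                return 7  # CUDA 9.2 and 10.0: gcc7
            # Anything else is not documented here, so do not guess.
            raise ValueError("no gcc version documented for CUDA " + cuda_version)
        ```
        
        For example, `required_gcc_major("9.2")` selects gcc7, while a version not listed above raises a `ValueError`.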
        
        # Usage
        
        The API has a few differences from NVIDIA's module:
         * The output is now a 5D tensor, which reflects the horizontal and vertical shifts.
         ```
        input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW)
         ```
         * Output sizes `oH` and `oW` no longer depend on the patch size, but only on the kernel size and padding.
         * Patch size `patch_size` is now the whole patch, and not only the radii.
         * `stride1` is now `stride` and `stride2` is `dilation_patch`, which behaves like the dilation of dilated convolutions.
         * The equivalent `max_displacement` is then `dilation_patch * (patch_size - 1) / 2`.
         * To get the right parameters for FlowNetC, you would use
         ```
        kernel_size=1,
        patch_size=21,
        stride=1,
        padding=0,
        dilation_patch=2
         ```
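        
        As a quick sanity check of the relations listed above, the following sketch computes the equivalent `max_displacement` and the 5D output shape in plain Python. The function names are hypothetical and not part of this package. The formula used for `oH` and `oW` is an assumption: it is the usual convolution output-size formula, which matches the statement that output sizes depend only on kernel size and padding (plus stride), but it is not quoted from the module's source.
        
        ```python
        def max_displacement(patch_size, dilation_patch):
            # Formula taken from the list above.
            return dilation_patch * (patch_size - 1) / 2


        def correlation_output_shape(batch, height, width,
                                     kernel_size, patch_size, stride, padding):
            # Assumption: oH and oW follow the standard convolution size formula.
            out_h = (height + 2 * padding - kernel_size) // stride + 1
            out_w = (width + 2 * padding - kernel_size) // stride + 1
            # Output layout from the list above: B x PatchH x PatchW x oH x oW
            return (batch, patch_size, patch_size, out_h, out_w)


        # FlowNetC parameters, with an input of batch size 4 and spatial size 48 x 64
        flownetc = dict(kernel_size=1, patch_size=21, stride=1, padding=0)
        print(max_displacement(patch_size=21, dilation_patch=2))
        print(correlation_output_shape(4, 48, 64, **flownetc))
        ```
        
        With the FlowNetC parameters, the equivalent `max_displacement` is 20, and under the assumed size formula a `4 x C x 48 x 64` input would give a `4 x 21 x 21 x 48 x 64` output.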
        
        # Benchmark
        
         * Default parameters are from `benchmark.py`. FlowNetC parameters are the same as those used in `FlowNetC` with a batch size of 4, described in [this paper](https://arxiv.org/abs/1504.06852), implemented [here](https://github.com/lmb-freiburg/flownet2) and [here](https://github.com/NVIDIA/flownet2-pytorch/blob/master/networks/FlowNetC.py).
         * Feel free to file an issue to add entries for your hardware!
        
        ## CUDA Benchmark
        
         * See [here](https://gist.github.com/ClementPinard/270e910147119831014932f67fb1b5ea) for a benchmark script working with [NVIDIA](https://github.com/NVIDIA/flownet2-pytorch/tree/master/networks/correlation_package)'s code, and Pytorch.
         * Benchmarks are launched with the environment variable `CUDA_LAUNCH_BLOCKING` set to `1`.
         * Only `float32` is benchmarked.
         * The benchmarks with FlowNetC correlation parameters were launched with the following commands:
         
         ```bash
         CUDA_LAUNCH_BLOCKING=1 python benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256 cuda
         
         CUDA_LAUNCH_BLOCKING=1 python NV_correlation_benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256
         ```
        
         | implementation | Correlation parameters |  device |     pass |      min time |      avg time |
         | -------------- | ---------------------- | ------- | -------- | ------------: | ------------: |
         |           ours |                default | 980 GTX |  forward |  **5.745 ms** |  **5.851 ms** |
         |           ours |                default | 980 GTX | backward |     77.694 ms |     77.957 ms |
         |         NVIDIA |                default | 980 GTX |  forward |     13.779 ms |     13.853 ms |
         |         NVIDIA |                default | 980 GTX | backward | **73.383 ms** | **73.708 ms** |
         |                |                        |         |          |               |               |
         |           ours |               FlowNetC | 980 GTX |  forward |  **26.102 ms** |  **26.179 ms** |
         |           ours |               FlowNetC | 980 GTX | backward | **208.091 ms** | **208.510 ms** |
         |         NVIDIA |               FlowNetC | 980 GTX |  forward |      35.363 ms |      35.550 ms |
         |         NVIDIA |               FlowNetC | 980 GTX | backward |     283.748 ms |     284.346 ms |
         
        ### Notes
         * The overhead of our implementation during the backward pass when `kernel_size` > 1 needs some investigation. Feel free to
         dive into the code to improve it!
         * The backward pass of NVIDIA's module is not entirely correct when `stride1` > 1 and `kernel_size` > 1, because not everything
         is computed, see [here](https://github.com/NVIDIA/flownet2-pytorch/blob/master/networks/correlation_package/src/correlation_cuda_kernel.cu#L120).
        
        ## CPU Benchmark
        
          * No other implementation is available on CPU.
          * It is obviously not recommended to run it on CPU if you have a GPU.
        
         | Correlation parameters |               device |     pass |    min time |    avg time |
         | ---------------------- | -------------------- | -------- | ----------: | ----------: |
         |                default | E5-2630 v3 @ 2.40GHz |  forward |  159.616 ms |  188.727 ms |
         |                default | E5-2630 v3 @ 2.40GHz | backward |  282.641 ms |  294.194 ms |
         |               FlowNetC | E5-2630 v3 @ 2.40GHz |  forward |  2.138 s |  2.144 s |
         |               FlowNetC | E5-2630 v3 @ 2.40GHz | backward | 7.006 s | 7.075 s |
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown
