Sunday, June 15, 2014

Hardware Accelerated OpenCV Configuration

Note: This is a series of articles in which we document our attempt to build an autonomous drone for the Sparkfun AVC.  The previous post is here.

Part of the Sparkfun AVC is finding and popping red balloons. We wanted to do this using onboard vision processing, and for performance reasons the vision system needs hardware acceleration. We are using the Jetson TK1 board from NVIDIA with the OpenCV library. Here are instructions for configuring OpenCV with CUDA hardware acceleration on the Jetson TK1.

Imaging the Jetson Board

The first step is to image the Jetson TK1 with the latest version of Linux for Tegra (L4T); at the time of writing this is Rel-19, available at https://developer.nvidia.com/linux-tegra-rel-19. NVIDIA provides good instructions for this in the quick start guide on the L4T page.

The flash command I used was:
sudo ./flash.sh -S 8GiB jetson-tk1 mmcblk0p1

Installing CUDA

You must be part of the CUDA/GPU Computing Registered Developer program to get CUDA. Signing up is free on the NVIDIA developer website. You will have to log in or create an account.

Download the CUDA Toolkit; at the time of writing this is the CUDA 6.0 Toolkit for L4T Rel-19.2, available from developer.nvidia.com/jetson-tk1-support. That page also contains a Getting Started with Linux guide with fuller instructions; I give only the minimum required for this specific case, so refer to that guide for troubleshooting.

Install the CUDA Toolkit:
cd Downloads/
sudo dpkg -i cuda-repo-l4t-r19.2_6.0-42_armhf.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-6-0

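Create symbolic links to the CUDA libraries for compatibility. I put them in /usr/local/lib. The link targets are given as absolute paths, because a relative target would be resolved from /usr/local/lib and the link would dangle:
sudo ln -s /usr/local/cuda-6.0/lib/*.so /usr/local/lib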
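
A quick way to confirm that the links resolve is to list any symbolic links in the directory whose targets do not exist. This is only a minimal sketch: the function name list_broken_links is hypothetical, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
# list_broken_links DIR: print symbolic links directly inside DIR whose target is missing.
# (list_broken_links is a hypothetical helper name, not part of CUDA or L4T.)
list_broken_links() {
    # "test -e" follows the link, so it fails for a dangling link
    find "$1" -maxdepth 1 -type l ! -exec test -e {} \; -print
}
```

Running list_broken_links /usr/local/lib should print nothing if every link points at a real library.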

Ensure that your PATH contains the CUDA executables:
export PATH=/usr/local/cuda-6.0/bin:$PATH

Ensure that your LD_LIBRARY_PATH includes the CUDA libraries and custom built libraries:
export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/cuda-6.0/lib:$LD_LIBRARY_PATH
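
The export lines above only affect the current shell. If you want them in a startup file such as ~/.bashrc, one option is a small helper that prepends a directory only when it is not already listed, so the variables do not grow each time the file is sourced. This is a sketch under that assumption; prepend_path is a hypothetical name.

```shell
# prepend_path VAR DIR: put DIR at the front of the colon-separated variable VAR
# unless DIR is already listed there. (prepend_path is a hypothetical helper name.)
prepend_path() {
    var_name=$1
    dir=$2
    eval "current=\${$var_name:-}"        # read the current value of VAR, empty if unset
    case ":$current:" in
        *":$dir:"*) ;;                    # already present, nothing to do
        *) eval "export $var_name=\"\$dir\${current:+:\$current}\"" ;;
    esac
}

# Same directories as the export lines above
prepend_path PATH /usr/local/cuda-6.0/bin
prepend_path LD_LIBRARY_PATH /usr/local/cuda-6.0/lib
prepend_path LD_LIBRARY_PATH /usr/local/lib
```

Calling the helper a second time with the same directory leaves the variable unchanged.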

Compiling OpenCV

Download and extract the OpenCV code. I am using version 2.4.9 because, at the time of writing, that is the latest version.

Create a build directory in the same directory as the source directory. I put the source code in my Downloads directory, which currently contains "cuda-repo-l4t-r19.2_6.0-42_armhf.deb", "opencv-2.4.9.zip", "opencv-2.4.9", "opencv-2.4.9-build".
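
The layout above can be produced with a small helper that creates the build directory next to the source directory. This is just a sketch; prepare_build_dir is a hypothetical name, and the "-build" suffix simply follows the directory names listed above.

```shell
# prepare_build_dir SRC_DIR: create "<SRC_DIR>-build" beside the source tree
# and print its path. (prepare_build_dir is a hypothetical helper name.)
prepare_build_dir() {
    src_dir=$1
    build_dir="${src_dir%/}-build"        # drop one trailing slash, then add the suffix
    mkdir -p "$build_dir" && printf '%s\n' "$build_dir"
}
```

For example, prepare_build_dir ~/Downloads/opencv-2.4.9 would create and print ~/Downloads/opencv-2.4.9-build.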

The GUI functions in OpenCV depend on GTK, so if you plan to use them, install libgtk2.0-dev:
sudo apt-get install libgtk2.0-dev

You need CMake to configure OpenCV so install that:
sudo apt-get install cmake

Change into the build directory for OpenCV:
cd ~/Downloads/opencv-2.4.9-build

Configure OpenCV with the appropriate CUDA_ARCH_BIN for your GPU's compute capability, which can be determined with the deviceQuery CUDA sample. The Jetson TK1 has compute capability 3.2, which I passed as 32, so I ran:
cmake -DCUDA_ARCH_BIN=32 ../opencv-2.4.9
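
Compute capability is normally written in major.minor form (for example 3.2), while the cmake line above uses the value without the dot. The conversion is trivial, but a sketch makes the relationship explicit. The function name arch_bin_from_capability is hypothetical, and the script only prints the cmake command rather than running it.

```shell
# arch_bin_from_capability CAP: turn a "major.minor" compute capability
# into the dotless form used above. (Hypothetical helper name.)
arch_bin_from_capability() {
    printf '%s\n' "$1" | tr -d '.'
}

capability="3.2"   # assumed value for the Jetson TK1; confirm with deviceQuery on your board
echo "cmake -DCUDA_ARCH_BIN=$(arch_bin_from_capability "$capability") ../opencv-2.4.9"
```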

Compile and install using make. The command I used was:
sudo make -j4 install

The -j4 flag instructs make to run four jobs simultaneously. This gives a considerable speed-up on the Jetson, which has four large ARM cores, but the output from different jobs may appear out of order or interleaved.
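
If you would rather not hard-code the job count, it can be derived from the number of available cores. The sketch below assumes the nproc utility from GNU coreutils and falls back to 4, the value used above, when it is missing. make_jobs is a hypothetical name, and the script only prints the make command. Note that nproc reports the cores currently available to the process, which may be fewer than four if some cores are offline.

```shell
# make_jobs: print a job count for make, taken from nproc when it is installed,
# otherwise 4 as used above. (make_jobs is a hypothetical helper name.)
make_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 4
    fi
}

echo "sudo make -j$(make_jobs) install"
```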

Using GPU accelerated OpenCV

The GPU module for OpenCV is quite simple to use. The functions in the gpu namespace generally have the same semantics as their CPU counterparts; the main difference is that they take cv::gpu::GpuMat arguments instead of cv::Mat arguments. Data must be uploaded to the GPU before it can be processed there and downloaded again before the CPU can use the result. GpuMat provides upload and download functions, each taking a single Mat argument, to transfer data between the CPU and GPU.

For example, the code below shows the difference between using the CPU and the GPU for the threshold function.

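#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

using namespace cv;

// THRESH_VAL and MAX_VAL are placeholders for your own values;
// THRESH_BINARY is one possible threshold type, chosen here for illustration
Mat src_host, dest_host;

// on the CPU
threshold(src_host, dest_host, THRESH_VAL, MAX_VAL, THRESH_BINARY);

// on the GPU
gpu::GpuMat src, dest;
src.upload(src_host);      // copy the input from CPU memory to the GPU
gpu::threshold(src, dest, THRESH_VAL, MAX_VAL, THRESH_BINARY);
dest.download(dest_host);  // copy the result back to CPU memory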

My next article will discuss the specific algorithm we are using to find balloons.
