
Introduction to Image Processing in ROS


1 Introduction

This page will describe some of the basic image processing concepts in ROS. The goal of this page is not to teach computer vision or image processing. Rather, the goal is to introduce the ROS packages and related non-ROS packages that are useful for image processing when using ROS.

2 Primary Concepts

The first step to working with any sort of camera in ROS is to find a driver that will work with your camera. If you are using a simple USB or built-in webcam, there are a variety of drivers to choose from, such as:

  1. uvc_camera
  2. libuvc_camera
  3. usb_cam

Note that in this instance "UVC" stands for USB video device class, the protocol that these cameras communicate over. Also note that uvcvideo is the kernel driver responsible for communicating with these cameras. It may take a bit of testing and tweaking of parameters to get a particular camera to work as you would like. Additionally, these packages have overlapping, but not identical, functionality. I typically reach for uvc_camera out of habit, but I have found instances where the other packages had features I needed.
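As a concrete starting point, here is a minimal launch fragment for the usb_cam driver. The device path, resolution, and pixel format are assumptions that you will likely need to adjust for your particular camera:

```xml
<launch>
  <node name="usb_cam" pkg="usb_cam" type="usb_cam_node" output="screen">
    <!-- /dev/video0 is the typical device path for a built-in webcam -->
    <param name="video_device" value="/dev/video0"/>
    <param name="image_width" value="640"/>
    <param name="image_height" value="480"/>
    <!-- many UVC cameras support "yuyv" or "mjpeg"; check your camera -->
    <param name="pixel_format" value="yuyv"/>
  </node>
</launch>
```

Running `v4l2-ctl --list-formats-ext` is a handy way to discover which resolutions and pixel formats your camera actually supports.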

If you have some sort of special machine vision camera, there may already be a ROS driver (e.g. mv_bluefox_driver, pointgrey_camera_driver, or pylon_camera). If there is no ROS driver available for your camera, then it is up to you to write one (this is generally not too difficult).

2.1 What does a camera driver do?

The camera driver's primary goal is to get data off of the camera and convert it into a sensor_msgs/Image message and then publish it on some topic (often called image_raw). The drivers usually also offer a variety of parameters for configuring the functionality of the camera, defining the namespaces that should be used, and defining camera calibration information. If you are using a pre-made driver it is up to you to determine how to properly set these parameters. If you are writing your own driver, it is up to you to decide the extent of the reconfiguration options that your driver will support.
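To make the message contents concrete, here is a sketch (plain Python, no ROS required) of the fields a driver must fill in a sensor_msgs/Image message. The field names match the real message definition; the pixel buffer here is just a placeholder:

```python
# Sketch of the fields a camera driver fills in a sensor_msgs/Image.
# Field names follow the real message definition; the pixel data is fake.

def make_image_msg(height, width, encoding="bgr8"):
    bytes_per_pixel = {"mono8": 1, "bgr8": 3, "rgb8": 3}[encoding]
    step = width * bytes_per_pixel          # full row length in bytes
    data = bytes(step * height)             # placeholder pixel buffer
    return {
        "height": height,                   # image rows
        "width": width,                     # image columns
        "encoding": encoding,               # pixel format, e.g. "bgr8"
        "is_bigendian": 0,
        "step": step,
        "data": data,                       # raw pixel bytes, row-major
    }

msg = make_image_msg(480, 640)
# for bgr8: 640 pixels * 3 bytes/pixel = 1920 bytes per row
```

In a real driver you would populate these fields (plus a header with a timestamp and frame_id) and publish with a rospy/roscpp publisher on image_raw.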

2.2 Image Pipeline

The image_pipeline is a meta-package that contains many of the key tools for working with cameras in the ROS world. This meta-package is "designed to process raw camera images into useful inputs to vision algorithms: rectified mono/color images, stereo disparity images, and stereo point clouds." Generically, the tools in this meta-package are designed to interact with a camera driver and metadata on the calibration parameters of a specific camera to automatically provide a variety of image streams that have been pre-processed for easy use in image processing algorithms. Here are the most important components:

image_proc
This implements a set of nodelets that listen to a raw camera stream (image_raw) and information on camera calibration (camera_info) and produce four output topics. The output topics are:
  1. image_mono: Monochrome unrectified image
  2. image_rect: Monochrome rectified image
  3. image_color: Color unrectified image
  4. image_rect_color: Color rectified image

    If you need to rectify or convert your image's colorspace, you are far better off running the image_proc node and having it do the work.
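Launching image_proc is typically a one-liner; it just needs to run in the camera's namespace so that it finds the image_raw and camera_info topics. Here "my_camera" is a placeholder namespace:

```xml
<launch>
  <!-- image_proc must run in the camera's namespace so it finds
       image_raw and camera_info; "my_camera" is a placeholder. -->
  <node name="image_proc" pkg="image_proc" type="image_proc" ns="my_camera"/>
</launch>
```

The four output topics listed above then appear under /my_camera/.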

camera_calibration
This package uses OpenCV's camera calibration tools to calibrate monocular or stereo cameras using a checkerboard calibration target. Once calibration is complete, the calibration information is written to a YAML file that can then be used with your camera driver to provide a camera_info topic. There is a great tutorial on using this package on the ROS wiki.
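A typical invocation looks like the following; the checkerboard size, square dimension, and topic names are example values you will need to match to your calibration target and camera namespace:

```
rosrun camera_calibration cameracalibrator.py --size 8x6 --square 0.108 \
    image:=/my_camera/image_raw camera:=/my_camera
```

Note that `--size` counts interior corners of the checkerboard (not squares), and `--square` is the side length of one square in meters.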
image_view
This is a simple image viewer for viewing camera topics. You can also view camera streams directly in rviz. This is nice because markers and robot models are automatically superimposed onto the image streams in rviz (assuming that your /tf tree is defined appropriately).
depth_image_proc
This package contains a variety of nodelets for processing depth images, which are the type of data produced by a Kinect. A depth image is basically just a grayscale image where the pixel values correspond to depth (instead of intensity). Given a depth image and the camera's calibration information, it is possible to build a point cloud; this is one of the primary features of this package.
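The back-projection that depth_image_proc performs is just the pinhole model applied in reverse at every pixel. Here is the same math in plain NumPy; the intrinsics (fx, fy, cx, cy) are made-up example values, not real calibration data:

```python
import numpy as np

# Back-project a depth image into 3D points using the pinhole model,
# the same math depth_image_proc performs when building point clouds.
# The intrinsics (fx, fy, cx, cy) are made-up example values.

def depth_to_points(depth, fx, fy, cx, cy):
    """depth: (H, W) array of depths in meters; returns (H, W, 3) XYZ."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.dstack((x, y, depth))

depth = np.full((4, 4), 2.0)                # a flat wall 2 m away
pts = depth_to_points(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
# the pixel at (u, v) = (2, 2) sits on the optical axis: (0, 0, 2)
```

This is also why the camera_info topic matters: without fx, fy, cx, and cy, depth values cannot be placed in 3D space.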

2.3 Image Transport

There are several packages available for transporting image data in non-standard formats or through non-standard mechanisms. These are often very helpful.

image_transport
This package allows publishing and subscribing to image topics in low-bandwidth, compressed formats. Transporting images in a compressed format can save considerable bandwidth at the expense of minimal computational effort. This is especially useful when sending images over a network. As of right now, this package only supports C++, but even if you ultimately want to use Python for image processing, this package could be useful for writing a middle-man decompression node to save bandwidth.
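One common pattern for such a middle-man node is the republish tool that ships with image_transport: it subscribes to one transport and republishes on another. A launch fragment might look like the following (the topic names are placeholders):

```xml
<!-- Subscribe to the compressed transport of image_raw and republish
     it as a plain raw image topic; topic names are placeholders. -->
<node name="republish" pkg="image_transport" type="republish"
      args="compressed in:=/my_camera/image_raw raw out:=/my_camera/image_raw_decompressed"/>
```

Run on the receiving machine, this keeps the compressed stream on the network while giving local Python nodes an uncompressed topic to subscribe to.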
web_video_server
This is a new version of mjpeg_server, and is very similar to jpeg_streamer. When configured properly, this package provides a video stream of a ROS image transport topic that can be accessed via HTTP. This could be useful whenever you are trying to view an image stream in a browser.

3 Pinhole Cameras

3.1 Camera Calibration Resources

4 Using OpenCV with ROS

4.1 vision_opencv

cv_bridge
This tool is used for converting ROS messages of type sensor_msgs/Image into OpenCV image formats. You should always expect to use this package when combining ROS and OpenCV.
image_geometry
This package can utilize a sensor_msgs/CameraInfo message to instantiate a class containing a pinhole camera model. This class then has many image-related geometry computations available. For example, image_geometry.PinholeCameraModel.project3dToPixel takes in a tuple/list of \((x,y,z)\) world coordinates of a point, and it returns the corresponding pixel coordinates \((u,v)\). The inverse operation (up to an unknown depth along the ray) is image_geometry.PinholeCameraModel.projectPixelTo3dRay. This package helps to guarantee that you correctly interpret and use the information from a calibrated camera (which can be tricky). Using this package for geometry calculations is highly recommended.
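For intuition, here is the projection that project3dToPixel computes for an ideal (undistorted, rectified) pinhole camera, written out directly. The focal lengths and principal point below are made-up example intrinsics, not values from a real camera_info message:

```python
# The projection a pinhole camera model performs, written out directly.
# fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
# The values used below are made-up example intrinsics.

def project_3d_to_pixel(point, fx, fy, cx, cy):
    x, y, z = point
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (u, v)

uv = project_3d_to_pixel((0.1, 0.0, 1.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# u = 500 * 0.1 / 1.0 + 320 = 370.0, and v stays at the principal point, 240.0
```

Because depth \(z\) divides out, projectPixelTo3dRay can only return a ray, not a point; recovering \((x,y,z)\) requires extra information such as a depth image or stereo disparity.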
opencv_apps
This package contains many image processing nodes that perform a single, common operation. Using these nodes, one can build up a sequence of image processing operations using only a single launch file. Example nodes include an edge detection node, a face detection node, multiple optical flow nodes, etc. This can be a great way to prototype new applications.

5 Miscellaneous Useful Image Processing Packages

5.1 Tag Tracking

There are a variety of tag tracking solutions available as ROS packages. They vary in terms of their robustness, ease of use, number of tags tracked, etc. A good introduction can be found in the ROS Tag Tracking mini project from this class in 2014.

5.2 Optical Flow and Visual SLAM

There are a variety of solutions available for optical flow and visual SLAM. Most of these are experimental, research-grade solutions, but you may find them useful.

ME 495: Embedded Systems in Robotics by Jarvis Schultz is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.