
ROS and Parallel Processing in Python


1 Introduction

ROS code in Python is inherently multithreaded. Every time you create a callback for a subscriber, timer, service provider, etc., you are, behind the scenes, creating a new thread to process that callback. Writing multithreaded code, or code with any type of parallel execution, raises a variety of issues that are not relevant in normal, serially-executed code. Since writing Python code in ROS means that you are certainly writing parallel code, it makes sense to learn a little bit about threading and parallel processing in Python in general.

2 Parallel Processing in General

2.1 Process vs Threads

A process is simply an instance of a running program in Linux. The operating system itself manages when and how processes get CPU time via its scheduler. Here are a few relevant facts about processes (a short Python sketch follows the list):

  • Every process gets a unique identifier called a process id (PID)
  • ps is the command line tool for viewing running processes
  • Every process has a parent process
    • The parent to all processes is called init
    • Inherently, this means that any process can create children processes
  • Typically processes get their own, isolated memory
  • Here are a few useful commands for working with processes:
    • pstree is a command line tool useful for viewing parent/child relationships
    • kill is used for sending different signals to processes
    • killall kills processes by name
    • pgrep looks up processes, and pkill signals them, based on a variety of attributes (name, user, etc.)
  • A daemon is a process that runs for a long time, and doesn't require direct control from a user through a terminal or GUI (network managers, printing services, etc.)
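
Here is a minimal Python sketch illustrating the PID and parent/child facts above (it assumes a typical Linux system where ps supports the -o and --pid options):

import os
import subprocess

# Every running process has its own PID and a parent PID:
print("my PID: %d" % os.getpid())
print("my parent's PID: %d" % os.getppid())

# Any process can create children; this Python process becomes the
# parent of the ps process launched below:
subprocess.call(["ps", "-o", "pid,ppid,comm", "--pid", str(os.getpid())])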

A thread exists within a process, and this is a common way to add parallel execution to a given program.

  • Many processes are multithreaded.
  • A scheduler splits CPU time up between the threads in a given program.
  • On CPUs with more than one core, threads can run at the exact same time. The proliferation of multicore processors has resulted in a dramatic increase in the number of multithreaded programs over the years.
  • Threads almost always share memory and resources.

2.2 Synchronization: Locks

Writing code that can be executed in a parallel fashion can be very challenging. The primary issue is called synchronization, and it arises both in multithreaded processes and in separate processes that share memory. Synchronization is all about making access to shared resources safe. Improper synchronization can lead to race conditions, segmentation faults, memory corruption, and other serious problems. Let's look at a few examples:

  • Imagine two different threads/processes want to write to the same file at the same time. What should happen to the file?
  • Imagine two different threads/processes are sharing a piece of data in memory. At a single instant in time, the first thread/process wants to read the value, and the second thread/process wants to set the value. Who gets priority? Will the first thread/process get the unmodified or the modified copy of the memory?

A lock, also called a mutex, is how access to shared resources is controlled. There are many different techniques for implementing locks, and they may be implemented in hardware or software. Some of the common techniques are monitors, semaphores, spinlocks, barriers, and reader-writer locks. The correct technique is often problem-specific. Most modern languages have either built-in support, or commonly-used extensions that provide a variety of synchronization mechanisms.

Improper use of locks for synchronizing threads/processes can lead to many other issues that are unique to multithreading/multiprocessing. For example, deadlock can occur when two or more threads/processes hold interdependent locks and can never continue: thread a is waiting for thread b to release a lock before continuing, while at the same time thread b is stuck waiting for thread a to release a different lock. Other common issues are priority inversion and starvation.
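
Here is a minimal sketch of that deadlock scenario in plain Python (no ROS). The timeouts are only there so the script reports the deadlock instead of hanging forever; the timeout argument to acquire() requires Python 3.

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    with lock_a:
        time.sleep(0.1)                    # give worker_2 time to grab lock_b
        if lock_b.acquire(timeout=1.0):    # worker_2 is holding lock_b
            lock_b.release()
        else:
            print("worker_1 gave up waiting for lock_b (deadlock)")

def worker_2():
    with lock_b:
        time.sleep(0.1)                    # give worker_1 time to grab lock_a
        if lock_a.acquire(timeout=1.0):    # worker_1 is holding lock_a
            lock_a.release()
        else:
            print("worker_2 gave up waiting for lock_a (deadlock)")

t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2)
t1.start()
t2.start()
t1.join()
t2.join()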

A note of caution: the terminology used to describe different ways of keeping code synchronized is often inconsistent. Locks, semaphores, mutexes, etc. are sometimes used interchangeably, while in other situations they have specific meanings.

3 Threads in rospy

Documentation for the threading mechanisms in rospy is sparse. One could figure out what is really going on by reading the source code, but that would require quite a lot of work! Through the years there has been much talk of re-writing rospy to provide more powerful and flexible threading models, but as far as I know, nothing has actually been done to accomplish this. The best documentation on rospy threading that I've found is a tiny discussion on answers.ros.org. Perhaps we should write some?

Here are some rules of thumb that I've extracted through testing (the sketch after this list shows one way to check them yourself):

  • Every unique topic subscribed to gets its own thread
    • Multiple subscribers to the same topic share a thread and the callbacks are handled with a queue
  • Each service provider gets its own thread
  • Timers get their own thread
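
One way to check these rules is to log which thread each callback runs in. A minimal sketch follows; the chatter topic and String message type are just placeholders for whatever your node actually subscribes to:

#!/usr/bin/env python
import threading
import rospy
from std_msgs.msg import String

def sub_cb(msg):
    # fires only when something publishes on the (example) chatter topic
    rospy.loginfo("subscriber callback running in thread: %s",
                  threading.current_thread().name)

def timer_cb(event):
    rospy.loginfo("timer callback running in thread: %s",
                  threading.current_thread().name)

if __name__ == '__main__':
    rospy.init_node('thread_inspector')
    rospy.Subscriber('chatter', String, sub_cb)
    rospy.Timer(rospy.Duration(1.0), timer_cb)
    rospy.spin()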

Note that the threading model used by roscpp is completely different! The spin function in roscpp builds an execution queue, where callbacks are queued and serviced in the order that they are received. If a use case requires parallel processing with roscpp, then you must specifically write your code to make that happen. See the roscpp documentation on Multi-threaded Spinning. The primary mechanisms for generating this behavior are ros::MultiThreadedSpinner and ros::AsyncSpinner.

3.1 Why should we care about the rospy threading model?

There are many situations where you may find yourself running into synchronization issues that cause extremely hard-to-debug complications. This is made worse in rospy because the threading happens in the background, i.e. you are not explicitly controlling threads; rospy is doing it for you. So it is easy not to realize that you should be controlling for synchronization.

Some examples of situations where you may need and/or want to explicitly deal with parallel processing are:

  • Whenever you are writing a custom GUI. GUIs often have many callbacks that are being executed in parallel. Combining a GUI with ROS means even more care needs to be taken.
  • Whenever you have multiple callbacks interacting with the same data you need to be careful that they aren't both trying to modify that data at the same time.
  • Often doing things in parallel is much faster. So in practice, you may find yourself wanting to add extra parallel execution to your code to increase performance. Some example algorithms in robotics that are often parallelized:
    • Particle filters
    • Search/planning algorithms
    • Template matching in image processing

4 Python-Specific Notes on Parallel Processing

4.1 Global Interpreter Lock (GIL)

  • Python code is not compiled ahead of time into machine code; instead, an interpreter reads the code you type/run, compiles it to bytecode, and executes that bytecode.
    • That is why Python is called an interpreted language.
  • CPython, the most widely used Python interpreter, implements a Global Interpreter Lock to help prevent concurrency issues with its interpreted code
    • This is a mutex that prevents multiple threads from executing Python bytecode at the same time.
    • Often Python functions in a given module actually pass your data to compiled libraries (from C or Fortran), as opposed to more Python code. These lower-level libraries may be designed to leverage the parallel processing capabilities of your computer, and they can release the GIL while they run, so their performance is not limited by it. However, they do need to be written to be thread-safe themselves.
  • People often critique Python for this because it hinders Python's ability to be truly parallel (the sketch below shows the effect on a CPU-bound task)
    • The plus side is that it helps prevent some issues that can occur with synchronization, it doesn't require all low-level libraries to be thread-safe, and single-threaded applications can often run faster.
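
A classic way to see the GIL's effect is to time a CPU-bound function run sequentially versus on two threads. On CPython the threaded version is typically no faster, and often a bit slower, because only one thread executes Python bytecode at a time. A minimal sketch:

import threading
import time

def countdown(n):
    # pure-Python, CPU-bound work that never releases the GIL
    while n > 0:
        n -= 1

N = 10000000

start = time.time()
countdown(N)
countdown(N)
print("sequential:  %.2f s" % (time.time() - start))

start = time.time()
t1 = threading.Thread(target=countdown, args=(N,))
t2 = threading.Thread(target=countdown, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print("two threads: %.2f s" % (time.time() - start))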

4.2 Atomic operations

  • Many basic operations in Python are atomic (meaning they execute as a single, uninterruptible step with respect to other threads)
    • This automatically prevents some classes of concurrency issues
  • Examples:
    • Reading or writing a single global variable (int, float, string)
    • Reading an item from a list
    • In-place list modification
    • Dictionary fetching
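
By contrast, compound operations such as x += 1 are not atomic: they compile to several bytecode instructions (load, add, store), and a thread switch can occur between any of them. The dis module makes this easy to see (the exact instruction names vary between Python versions):

import dis

x = 0

def increment():
    global x
    x += 1    # looks like one step...

# ...but disassembles into separate load/add/store bytecodes, and the
# interpreter may switch threads between any two of them:
dis.dis(increment)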

4.3 Threading module

  • This is Python's basic high-level module for multithreaded support
  • Capabilities:
    • Create threads
    • Implement Locks
    • Timers
    • Introspection
      • How many threads are there?
      • Which are running?
  • Key types of locks (a short usage sketch follows this list):
    threading.Lock()
    Once a given thread has acquired this lock, all other attempts to acquire the lock block until the lock is released. Any thread may release it.
    threading.RLock()
    This lock, once acquired, can only be released by the thread that originally acquired the lock. Subsequent attempts to acquire are non-blocking for the original thread; the original thread must call release for every time it calls acquire.
    threading.Semaphore()
    This lock maintains a counter that represents the initial value plus the number of calls to release() minus the number of calls to acquire(). Calls to acquire() block until they can return without the counter going negative. This can be used to allow up to a fixed number of threads access to a shared resource at once.
    threading.BoundedSemaphore()
    This is very similar to the previous lock, but it raises a ValueError if the counter exceeds its initial value (indicating too many calls to release()). This can be useful for automatically detecting bugs.
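
Here is a small, self-contained sketch (plain Python, no ROS) showing thread creation, a Lock protecting shared data, a Semaphore limiting concurrency, and the introspection functions mentioned above:

import threading
import time

counter = 0
counter_lock = threading.Lock()
slots = threading.Semaphore(2)       # allow at most 2 workers in at a time

def worker():
    global counter
    with slots:                       # blocks if 2 workers are already inside
        with counter_lock:            # serialize access to the shared counter
            counter += 1
        time.sleep(0.1)

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()

# Introspection: list every thread currently alive in this process
for t in threading.enumerate():
    print("%s alive=%s" % (t.name, t.is_alive()))

for t in threads:
    t.join()
print("counter = %d" % counter)       # always 5 thanks to the lock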

4.4 Multiprocessing module

  • This module opens multiple Python interpreters as subprocesses of the current Python interpreter (a short usage sketch follows this list)
    • This effectively bypasses the GIL problem
    • Can make parallel processing much faster!
  • Caution:
    • It is more difficult to share memory between processes
    • It can be a bit challenging to ensure code exits cleanly
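
A minimal sketch using the module's Pool, which farms work out to separate interpreter processes. The if __name__ == '__main__' guard matters here because, on platforms that spawn rather than fork, the child processes re-import this file:

import multiprocessing

def square(x):
    # stand-in for a CPU-heavy task; it runs in a separate interpreter,
    # so the parent's GIL is not a bottleneck
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(square, range(10)))    # [0, 1, 4, 9, ...]
    pool.close()
    pool.join()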

5 Example

5.1 Bad multithreading

In the following example, we set up two Timers to run callbacks at slightly different rates. There is a single global variable that is a list of 100 elements. Each callback attempts to manipulate this list: the first callback tries to fill the list with zeros, and the second tries to fill it with ones. If either callback detects that the other callback has manipulated its data, it prints a warning.

#!/usr/bin/env python
import rospy

DT = 1/100.0
# shared data that both timer callbacks read and write
data = [0]*100

def t1cb(te):
    # fill the shared list with zeros, wait briefly, then check whether
    # the other callback overwrote it in the meantime
    global data
    data = [0]*100
    rospy.sleep(0.1*DT)
    if any(data) == 1:
        rospy.logwarn("t1cb error: one detected in data")
    return

def t2cb(te):
    # fill the shared list with ones, wait briefly, then check whether
    # the other callback overwrote it in the meantime
    global data
    data = [1]*100
    rospy.sleep(0.1*DT)
    if any(data) == 0:
        rospy.logwarn("t2cb error: zero detected in data")
    return

rospy.init_node('multi_thread', log_level=rospy.INFO)
# each Timer services its callback in its own thread
timer1 = rospy.Timer(rospy.Duration(DT), t1cb)
timer2 = rospy.Timer(rospy.Duration(1.3*DT), t2cb)

def main():
    rospy.spin()

if __name__=='__main__':
    main()

Running the above node will print many warnings! Each callback is in a different thread and they are both modifying the same data at the same time.

5.2 Fixed multithreading

We fix the example from above using the threading module. A threading.Lock object is instantiated as a global variable. This lock is then used to ensure that neither callback can modify the global list while the other callback is using it. Several logging and print statements have also been added to show the status of the global lock and to print information about the threads that are created within the process.

#!/usr/bin/env python
import rospy
import threading

DT = 1/10.0
data = [0]*100
lock = threading.Lock()    # protects access to the shared list

def t1cb(te):
    global data
    # hold the lock for the entire write-sleep-check sequence
    with lock:
        data = [0]*100
        rospy.sleep(0.1*DT)
        if any(data) == 1:
            rospy.logwarn("t1cb error: one detected in data")
        rospy.loginfo("Inside  with timer 1 locked? " + str(lock.locked()))
    rospy.sleep(0.4*DT)
    rospy.loginfo("Outside with timer 1 locked? " + str(lock.locked()))
    return

def t2cb(te):
    global data
    with lock:
        data = [1]*100
        rospy.sleep(0.1*DT)
        if any(data) == 0:
            rospy.logwarn("t2cb error: zero detected in data")
        rospy.loginfo("Inside  with timer 2 locked? " + str(lock.locked()))
    rospy.sleep(0.4*DT)
    rospy.loginfo("Outside with timer 2 locked? " + str(lock.locked()))
    return

rospy.init_node('fixed_multi_thread', log_level=rospy.INFO)
timer1 = rospy.Timer(rospy.Duration(DT), t1cb)
timer2 = rospy.Timer(rospy.Duration(DT), t2cb)

def main():
    # print information about every thread that exists in this process
    for thread in threading.enumerate():
        print "name = ",thread.name, "\talive = ",thread.is_alive(),"\tclass = ",thread.__class__
    rospy.spin()

if __name__=='__main__':
    main()

6 Useful resources

ME 495: Embedded Systems in Robotics by Jarvis Schultz is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.