Threads, Spinning, and Async in ROS 2
Overview
- Python ROS 2 nodes (unlike their counterparts in ROS 1) are single-threaded and sequential by default.
- Lack of threads makes reasoning about ROS 2 python nodes easier, makes them more similar to C++ nodes (which were single-threaded in ROS 1) and also provides more control over execution
- ROS 2 python, however, supports highly configurable modes of execution, which require knowledge of both the python concurrency model and how ROS 2 fits in with it.
- In most cases, the single-threaded sequential of ROS 2 makes coding much easier and eliminates whole classes of possible bugs
- This default model, however, is sometimes overly restrictive and reduces potential performance
- There is also one common case where sticking with the default model makes coding much more difficult: using service clients.
Service Clients
- When a service is called using
service.async_call, the following happens:- The request is sent to the server
- The server processes the request and computes the response
- The response is sent back to the client
- The client receives the response
- The full sequence of events in a service call take an unspecified amount of time during which the client does not have a response.
What should the client do while waiting for the response?
- Wait until the response is received before doing anything. In this case the request is synchronous and the client blocks until it receives the result
- Perform other useful work while waiting for the result to be received. In this case the request is asynchronous and does not block (i.e., it is non-blocking). If the node cares about the response, it needs some way to process the response when it is received.
Steps to Call a Service
- Create a service client (call it
client) usingNode.create_client - Wait for the service to become available using
client.wait_for_service(timeout_sec…)= - Issue a request with
client.call_async- Synchronous calls are generally not well supported in ROS 2, although there is a synchronous calling method in
rclpy
- Synchronous calls are generally not well supported in ROS 2, although there is a synchronous calling method in
- If you don't want to block
- Do a little bit of work
- Check for a result
- Repeat until a result is ready
- If you do want to block, wait for the result until it is ready
- The Service Tutorial uses
rclpy.spin_until_future_completeto accomplish this task - But the code in the tutorial WILL NOT WORK if used from within another callback function!
- What is a future anyway? And why is it spinning?
- The Service Tutorial uses
The rest of these notes address how to easily wait for a service call to complete in ROS 2, while also explaining in some detail the concepts and machinery required to make it happen. For quick instruction see Patterns.
Events
- External events in ROS include
- Receiving a message on a topic (subscriber callback)
- A timer triggering (timer callback: the ROS time is external to the node)
- Receiving a response to a service request (future object)
- A parameter changing (the parameter callback)
- As external events occur, they must be explicitly processed by the node
- Each ROS node must explicitly process events by "spinning"
rclpy.spin_oncecauses the node to check for and process the next pending event (e.g., by calling a subscriber callback for a received message)rclpy.spincreates an event loop, which repeatedly checks for events and handles them- It is essentially an infinite loop that repeatedly calls
spin_once()
- It is essentially an infinite loop that repeatedly calls
- There is also
rclpy.spin_until_future_completewhich will callspin_once()repeatedly until a specified event occurs.
- As a general rule (there are exceptions) your node should be organized around a single call to
rclpy.spin()with periodic tasks being handled with a timer- Keeping all of the spinning in one place significantly simplifies the architecture of your node and helps you avoid common pitfalls.
Callbacks
- A callback is a function (or callable object) that is provided to an object to be called at a later time.
- In ROS 2, callbacks are explicitly registered when (for example)
- Creating a subscriber (the callback is called whenever a message is received)
- A timer is created (the callback is called whenever the timer is triggered)
- A service server is created (the callback is called whenever somebody calls the server)
- Callbacks are also implicitly used when a service client calls a service
- The callback is what receives the response from the service server
- Callbacks are only called whenever
rcply.spin_onceis called (which can be and usually is from withinrclpy.spin).
Callback Groups
- By default, ROS 2 only allows only one callback function to be pending on the callstack.
- This behavior means that spinning from within a callback will not allow other callback functions to be called (via
rclpy.spin_once). - In other words, the callbacks cannot be nested by default.
- This behavior means that spinning from within a callback will not allow other callback functions to be called (via
- The behavior of what callbacks can be called by
rclpy.spin_onceis controlled using Callback Groups - There are two primary callback groups types:
MutuallyExclusiveCallbackGroupandReentrantCallbackGroup.
MutuallyExclusiveCallbackGroup
- No more than one callback in the same
MutuallyExclusiveCallbackGroupcan ever be running concurrently. - By default, all callbacks are placed in the same
MutuallyExclusiveCallbackGroup. - The mutually exclusive property makes it easy to reason about callbacks that may touch the same variables (e.g., global variables or member variables) because
callbacks must always run to completion before another one can be called.
- For example, when a
timeris part of aMutuallyExclusiveCallbackGroupyou know that the timer callback will always run to completion before it is called again, even if the timer expires sooner or if multiple threads are used
- For example, when a
- If you explicitly
spin_oncein a callback within aMutuallyExclusiveCallbackGroupnone of the callbacks in that group will be called, even if there is a pending event.- Callbacks in other callback groups can, however, be called
ReentrantCallbackGroup
- All callbacks in a
ReentrantCallbackGroupcan run concurrently, this means that it is possible for a callback to be only partially complete before it is called again - If you explicitly
spin_oncefrom within a callback that is part of aReentrantCallbackGroupthen any callback in that same group can be called - For example, if a timer is in a
ReentrantCallbackGroup, and youspin_oncefrom the timer callback after the timer has expired (e.g., your callback took longer than the timer's period), thetimercallback will be called again.
ROS and Python Async I/O
Python
- The python asyncio library, along with the
asyncandawaitkeywords, enables single-threaded cooperative concurrency in python. - The heart of any asynchronous (async) program is an event loop. In ROS, the event loop is provided by
rclpy.spin. - The event loop is responsible for scheduling the execution of Tasks.
- Tasks are able to run coroutines which are functions that can suspend themselves mid-execution and then resume
- When a coroutine suspends itself mid-execution the event loop can schedule another Task
- A coroutine is created by declaring a function with the
asynckeyword:async def my_coroutine(): - A coroutine can suspend itself using the
awaitkeyword: this allows it to either- Call another coroutine and wait for it to finish (e.g.,
await my_couroutine1()) - Wait for the result of a Future object to become available.
- Call another coroutine and wait for it to finish (e.g.,
A function that may take a long time to compute a result (call it long_function) can instead
- Create a
Future()object (call itfuture) - Use a task to schedule a co-routine (call it
data_provider) that has access tofutureand can provide it with the data, when that data is available - Return
futureto the caller - When the co-routine that calls
long_function(call itcallback) first getsfuture,future.done()isFalse, meaning that data is not available callbackneeds to somehow wait forfuture.done()to becomeTrueand it can then find the data infuture.result()- One way of waiting for
future.done()to become true is toawaitthefuture - When the co-routine calls
awaitonfutureit suspends its execution, allowing other co-routines to run, includingdata_provider - When
data_providergets the data, it provides that data tofuture- Now
future.done()will be True andfuture.result()will contain the data - The next time
callbackis scheduled, it will resume from where it suspended execution viaawait, and thefuturewill now have data.
- Now
ROS 2
- In ROS 2, the event loop is provided by
rclpy.spin, and callbacks are scheduled by this event loop - Callbacks in ROS 2 can be functions or co-routines.
- If a callback is a co-routine, it can
awaiton a future - For a ROS 2 service client,
client.call_async()returns a future - So, if a callback is
asyncit canawaitthe result ofclient.call_async().- It will suspend its operation and return control to the event loop (
rclpy.spin) - If the callback groups are set up correctly,
rclpy.spin()will be able to call the server client callback when the response arrives - The future will then be done, and the callback that issued the service request will have the result and be able to continue
- To get the response from the service use:
response = await client.call_async()
- It will suspend its operation and return control to the event loop (
ROS 2 vs Python
- In
python asynciothe event loop is created by callingasyncio.run. In ROS 2, an event loop iteration us implemented using anrclpy.executor - In
python asyncioa task is created withasyncio.create_task. In ROS 2, a task is created withrclpy.executor.create_taskand is associated with a given executor. - In python
asyncio.Futurefutures are used. In ROS 2 the futures arerclpy.task.Futureobjects and are associated with an executors event loop. - Overall, the ROS 2 task objects behave in the same way as the corresponding
python asynciocounterparts. However, ROS 2 facilitates the creation of different types of event loops, and these objects are associated with these loops.- For example, in ROS 2 there is a
MultiThreadedExecutorthat uses multiple threads to process events.
- For example, in ROS 2 there is a
Deadlocks
- A deadlock occurs when code cannot make progress
- It is easy to accidentally cause a deadlock (where your program can make no further progress).
- You get no warning or error or any other indication that your program is in a deadlock. The program just stops but stays in memory and does not exit.
- When using multi-threading, deadlocks can happen in several different ways (e.g., by acquiring and releasing locks out of order)
- Deadlocks are, unfortunately possible even with single-threaded concurrency like
asyncio - In general, with ROS 2, a deadlock will occur if any part of your code depends on a
callbackrunning but either- The code is not able to
spin - Callback's
AandBare in the sameMutuallyExclusiveCallbackGroupand callbackArequires callbackBto run to make progress
- The code is not able to
- The most common source of
deadlocksoccurs when waiting for a response from a serviceclient(which implicitly has a hidden callback that must happen to receive the response data). - By sticking with certain patterns, deadlocks can be avoided.
Patterns
- The ROS 2 architecture provides a lot of control over concurrency, and generally helps avoid data races (when two different pieces of code try to write to the same memory)
- While there are many ways of handling the issues of deadlocks (such as spinning in the right places and making sure callback groups are setup properly) some patterns can help reduce the decisions that the programmer needs to make while also simplifying the reasoning about code
- Here are some general guidelines that generally work well and cover probably 80% of cases
Basic ROS 2 Node
- Use
rclpy.spinin only one place in your program (likely the main entry point)- Now you know exactly where the spinning occurs
- By following the other guidelines you will never need to spin again
- An exception to this rule is if you need to do some one-time setup when your node starts
- Then you may want to call services and spin until they are complete
- Most setup, however, is not one-time. You may want the ability to, for example, reset and re-run the setup.
- Structure the main execution of your code around a timer callback
- Usually you want to be issuing commands at a single fixed frequency
- The timer should advance the progress of your algorithm by one time-step
- Subscribers do very little work and instead store data that is then used by the timer
- This setup allows you to easily keep track of the node's state in a single location and allows. some calculations or decisions to take place over multiple timer ticks without needing to pause.
- If you have a timer running at
X hz, it's easy to subdivide that intoX/N hz. Multiple timers could also be used, but doing a subdivision can be a good choice because it gives more deterministic performance and control in terms of when the various tasks occur relative to each other.
- It is also possible to publish directly from a subscriber:
- This pattern may be appropriate when you want published output to happen as close to the frequency/timing of the incoming messages as possible
- This pattern provides the least amount of flexibility for, example, handling a flood of messages in an organized manner, and couples the timing of the node to the timing of the publishers it subscribes to
Service Clients
Here is information on two types of service calls: blocking/synchronous and non-blocking/asynchronous. Examples can be found at https://github.com/m-elwin/async_experiments and a walkthrough of the code is provided at here
Blocking Calls
- Blocking calls are the easiest to handle and are preferred unless performance requirements dictate otherwise
- Assume that you want to call the service from
callback - Make
callbackanasyncfunction. - Create a new
MutuallyExclusiveCallbackGroupand assign everyservice clientto that callback group. - All other callback functions remain in the default
MutuallyExclusiveCallbackGroup. - From within
callbackyou can nowawaitthe result of thecall_asyncon the service client:- This
awaitwill send the service request, suspendcallback, and return control to the event loop - When the event loop receives the response,
callbackwill be resumed where it had been suspended and you will have access to the response.
- This
- See the AwaitClient example in https://github.com/m-elwin/async_experiments.
Non-blocking
- For non-blocking service calls you need to implement logic that periodically checks the result of the future returned by
service_client.call_async - The logic that checks the result should be in a timer that checks the
.done()status of the future.- The
futurestatus can only be changed when the callback finishes (so other callbacks or called) or if it callsawait(so other callbacks can execute) - Therefore, it does not make sense to loop and check the
..done()status, instead check it once and if not, defer to the next time the callback is called.
- The
- In this pattern, the work is being divided up (by you) into short spurts that can be executed within a timer period, often in conjunction with a state machine
- See the FutureClient example in https://github.com/m-elwin/async_experiments.
Walkthrough
These walkthroughs take you through the code in https://github.com/m-elwin/async_experiments in the order in which it is executed and provide a conceptual guide to what is happening.
Deadlock
ros2 launch async_experiments experiment.launch.xml experiment:=deadlock- The deadlock node starts in the
async_client.py:deadlock_entryfunction. - The
DeadlockClientconstructor (__init__) is called:- It creates a service client
- It creates a timer
rclpy.spinis called, which implements an event loop as follows in pseudocode:while Node_Has_Not_Exited: Check for an event If an event has occured, call the appropriate callback
- After some time a timer event occurs and the timer callback (
DeadlockClient.timer_callback) is called from the event loop. - The timer callback calls the "delay" service using
call_asyncand retrieves aFutureobject calledfuture- This call sends a request to the node running the "delay" service, so that it may generate a response
- The future keeps track of the status of the response
- Initially, the response has not been received, so
future.done()returns False
- The timer enters a loop that waits for
future.done()to return true - Meanwhile, the node implementing the "delay" service does some processing, and sends a response
- The
DeadlockClientnode is still looping, waiting forfuture.done()to be true- It will loop forever, because this loop does not call anything that checks for responses that can change the status of the
Future()object sofuture.done()will always beFalse.
- It will loop forever, because this loop does not call anything that checks for responses that can change the status of the
Await
ros2 launch async_experiments experiment.launch.xml experiment:=await- The
awaitnode starts in theasync_client.py:await_entryfunction. - The
AwaitClientconstructor (__init__) is called:- A callback group called
cbgroupis created - A service client called
_clientis created and assigned tocbgroup - A timer is created, it's callback group is the default
MutuallyExclusiveCallbackGroup, which is different thancbgroup. - Based on this setup, the
_clientcallback can run when thetimer_callbackhas calledawaitand not complted.
- A callback group called
rclpy.spinis called, which implements an event loop as follows in pseudocode:while Node_Has_Not_Exited: Check for an event If an event has occured, call the appropriate callback
- After some time a timer event occurs and the timer callback (
AwaitClient.timer_callback) is called from the event loop. - The
timer_callbackcalls the "delay" service (running on thedelay_servernode) usingcall_asyncand retrieves aFutureobject- We refer to this object
futurein these notes. In the code we don't name this object because we directly await on the return value _clientalso has a reference tofuture(since it createdfutureit maintains its own reference to it)- The
call_asyncsends a request to thedelay_servernode so that the "delay" service may generate a response - The future keeps track of the status of the response
- The
timer_callbackcallsawaiton the future object - Initially, the response has not been received, so
future.done()returns False, thus thetimer_callbackfunction suspends itself, allowing therclpy.spinevent loop to continue processing events.
- We refer to this object
- Meanwhile, after receiving the request, the
delay_servernode begins creating the request Control of the
awaitnode is inrclpy.spin:while Node_Has_Not_Exited: Check for an event If an event has occured, call the appropriate callback
- Meanwhile,
delay_servernode finishes its computation and sends it's response back to the_client- The
awaitnode event loop receives this event and calls the default callback associated with_client
- The
- The
_client.callbackdoes the following:- Store the response in the
future - Signal that
futureis done
- Store the response in the
- After the service client callback completes the event loop is re-entered.
- Python detects that
future.done()is true and resumes executing the co-routine wherefuturewasawaited(thetimer_callback)
- Python detects that
- The
timer_callbackresumes execution from after theawaitstatement- Because the
futureisdone()it knows the service call has completed - It finishes and the event loop
rclpy.spin()is re-entered
- Because the
Concurrency
Execution Models
- In sequential computing, only one computation happens at a given time. The computations happen in sequence.
- When using a sequential model, callback groups don't matter (unless you explicitly
spin_oncefrom within a callback).- Unless
spin_onceis called, nothing attempts to call the callbacks - Each callback will run until completion.
- Unless
- When using a sequential model, callback groups don't matter (unless you explicitly
- In parallel computing, multiple computations can happen simultaneously.
- Callback groups matter, and there are other synchronization issues related to multi-threading
- In concurrent computing, multiple computations can be pending at the same time, but are not executing sequentially.
- For example, consider the following chess-playing scenarios
- A person playing one chess game, followed by another chess game: this is like sequential computation.
- Two pairs of people playing two chess games simultaneously: this is like parallel computation.
- One person playing two chess games against two opponents. First the person makes a move in game 1, then makes a move in game 2. Both games are in progress at the same time, but the person is not making a move in each game simultaneously.
Process
- A process is an abstraction used by the Linux Kernel (and other operating systems) to segregate memory address space.
- Code running in each process thinks it has access to the full memory-address space: in reality the kernel maps a processes memory into the physical ram
- Code running in each process cannot access memory used by other processes without operating-system intervention, increasing stability and security.
- Programmers do not need to worry about the intricacies of physical memory: to them memory is just a flat contiguous address space
- Every process has a "main" thread, which executes the machine code within the context of the process's memory address space
Thread
- A thread is subordinate unit of execution within a process, and it is what actually executes the machine code
- By default each process has a single "main" thread. But it can also create additional threads to run code simultaneously.
- All threads within a process share the same memory address space
- Each thread has its own stack (for calling functions and storing local variables)
- The heap and global variables are shared by threads
- Unlike processes, threads can directly read and write the memory used by another thread
- Each thread is scheduled for execution by the kernel
- If the machine has multiple CPUs then threads can be executed in parallel
- If there are more threads than CPUs (as is usually the case), the kernel rapidly switches between threads, executing little bits of code each time
- In this case the code is running concurrently, but it is fast enough to simulate simultaneity
- If two threads access the same memory and are not coordinated properly, a bug called a race condition can occur.
- A race condition can result in data corruption
- It can also result in a deadlock, where no thread can continue executing
- The exact interleaving of the instructions across multiple threads is non-deterministic because it depends on how the kernel
schedules execution, which in turn depends on what else is happening on the computer
- Non-deterministic bugs in multi-threaded code may be from a race condition
- Be cautious and have a plan before introducing extra threads into your program
- - As the operating system switches between threads, it is possible for instructions to be interrupted before completing:
- Generally, a single python statement maps to multiple machine instructions and thus a statement can be interrupted in the middle of executing
- Consider
i = i + 1. The thread can be interrupted in the middle of reading the value ofi, adding1to it, or storing the result ini - In the meantime, another thread can modify
i, leading to incorrect results.
- Atomic operations finish executing without interruption. Atomic operations guarantee that a complete result will be computed and seen by all threads.
- Synchronization primitives such as a mutex or a semaphore can be used to coordinate execution between threads
Threads in Python
- The most common python interpreter, CPython, implements a Global Interpreter Lock (GIL)
- The GIL simplifies the creation of the interpreter while preserving good single-threaded performance
- It also severely limits multi-threaded performance in python.
- The GIL prevents multiple python instructions from executing simultaneously on a multi-core system.
- Thus, multi-threading with CPython is like multi-threading on a single-core CPU: the python code is not executed simultaneously
- Operations on each thread are interleaved (that is, the python interpreter executes some commands on one thread, then switches to another).
- The order of this interleaving is generally non-deterministic
- The GIL does not prevent all race-conditions
- You still need to synchronize threads that read/write to the same shared variables.
- Bugs that occur due to specific orderings of operations on multiple threads can still occur
- Non-atomic python statements can be interrupted before completion
- The GIL only applies to python bytecode instructions
- Python code can call C code, and that C code can bypass the GIL
- Python waits for a system function (e.g., to reading from a file), it is not executing bytecode. Thus, another thread can run while the other thread waits for the system.
- Thus, multiple-threads in python can improve performance of input/output (I/O) bound (but not CPU bound) python programs but not
- CPU bound means performance is limited by the available CPU resources
- I/O bound means performance is limited by input/output operations (such as reading a file)
- Atomic Operations: The following operations in python are guaranteed to complete once started, prior to another thread being run
- Reading or writing a single variable of a basic type (int, float, string, etc)
- Assigning an object to a variable (e.g., x = y)
- Reading an item from a list
- Modifying an item in a list
- Getting an item from a dictionary
- A complete list of python atomic operations
Example
If you perform a non-atomic operation, the thread you are running on can be interrupted in the middle.
Assume we have two threads, both performing i = i + 1, a non-atomic operation. Thus what is actually executed gets broken into several interruptable steps
i = 0 # Two threads are started Thread 1 | Thread 2 i = i + 1 | i = i + 1
Operations can happen in multiple ways leading to different results: for example
Thread 1, reads i, it is 0 Thread 2, reads i, it is 0 Thread 1 adds 1 to what it read, yielding 1 Thread 1 stores 1 in i Thread 1 reads i it is 1 Thread 1 adds 1 to what it read, yielding 2 Thread 2 adds 1 to what it read, yielding 1 Thread 1 stores 2 in i Thread 2 stores 1 in i