Launching the comma body
At comma, our mission is:
In order to ship a bigger model, we needed to make the model run faster on the comma three and ensure that it wouldn’t draw more power. Since the driving model runs in a 20Hz loop, all the services using the GPU (camerad, modeld, and UI) need to comfortably fit in a 50ms window. openpilot already has its own optimized GPU model runner called thneed, and we were able to further optimize it (#23772) to shave a few more milliseconds off the model execution time. We also removed a copy of the frame in the UI (#24318) and significantly sped up the camera debayering (#24557).
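The 50ms figure follows directly from the 20Hz loop rate. A minimal sketch of that budget check, assuming the GPU services run back to back within one frame; the per-service timings are made-up placeholders, not measurements from the comma three:

```python
def frame_budget_ms(loop_hz):
    """Time available per iteration of a loop running at loop_hz."""
    return 1000.0 / loop_hz


def fits_in_budget(service_times_ms, loop_hz):
    """True if the summed GPU time of all services fits in one frame.

    Assumes the services share the GPU serially, which is a simplification.
    """
    return sum(service_times_ms.values()) <= frame_budget_ms(loop_hz)


# Hypothetical timings, for illustration only.
example_times_ms = {"camerad": 8.0, "modeld": 30.0, "ui": 7.0}

print(frame_budget_ms(20))
print(fits_in_budget(example_times_ms, 20))
```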
On the power draw front, we reduced the power draw by 500mW, which was a bit more than the bigger model cost us. We slightly downclocked the GPU (#24088) for about 150mW of savings. We saved another 350mW with camerad optimizations (#24452). Previously, camerad would debayer into RGB, convert the RGB into YUV, and output both RGB and YUV images for each camera. Now, camerad debayers straight into YUV with a single, well-optimized OpenCL kernel and only outputs YUV images.
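To illustrate what debayering straight into YUV means, here is a rough pure-Python sketch. It is not camerad's OpenCL kernel: it assumes an RGGB Bayer layout, 8-bit values, the simplest possible 2x2 block debayer, and full-range BT.601 coefficients, none of which are stated in this post. The point is only that Y, U, and V are computed from the raw samples in one pass, without writing out an intermediate RGB image.

```python
def debayer_rggb_to_yuv(raw):
    """Convert an RGGB Bayer mosaic directly to half-resolution Y, U, V planes.

    raw: 2D list of numbers with even height and width, laid out as
         R G
         G B
    in every 2x2 block (an assumed layout, for illustration).
    """
    height, width = len(raw), len(raw[0])
    y_plane, u_plane, v_plane = [], [], []
    for r in range(0, height, 2):
        y_row, u_row, v_row = [], [], []
        for c in range(0, width, 2):
            red = raw[r][c]
            green = (raw[r][c + 1] + raw[r + 1][c]) / 2  # average both greens
            blue = raw[r + 1][c + 1]
            # Full-range BT.601 RGB -> YUV, applied without storing RGB.
            y_row.append(0.299 * red + 0.587 * green + 0.114 * blue)
            u_row.append(-0.168736 * red - 0.331264 * green + 0.5 * blue + 128)
            v_row.append(0.5 * red - 0.418688 * green - 0.081312 * blue + 128)
        y_plane.append(y_row)
        u_plane.append(u_row)
        v_plane.append(v_row)
    return y_plane, u_plane, v_plane
```

A uniform gray mosaic should come out with Y equal to the input level and U and V at the neutral value of 128, which is a quick sanity check for the coefficients.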
Lowering power draw is always a good thing: your device runs cooler, the fan spins less, and device reliability improves. In the next release, we’ll further reduce power draw by removing a copy of each frame in the video encoder and removing a copy of the frame in the UI.
We track release metrics constantly to ensure driver monitoring (DM) is behaving as intended. One of the most important metrics is the number of distracted alerts per segment in the field. When we compared this to the offline test results from the validation set, we found a 60% discrepancy between the two. This likely explains the complaints about DM not working properly that our tests did not predict. The cause turned out to be that our validation test runs on CPUs on PCs while the DM model runs on the comma three’s DSP, which introduces non-trivial errors when running on the device.
We made two major improvements to fix this issue. We have been using a technique called quantization-aware training to make the DM model more robust to running in 8-bit on the DSP. We found a bug in our training code where some layers were not set up correctly to be quantization aware. Fixing it significantly improved the on-device prediction accuracy and consistency. Additionally, because the SNPE DSP runtime quantizes on a per-tensor basis, we tweaked the model to equalize the different types of outputs, which makes the quantized outputs more fine-grained and therefore more accurate. Note how the red line becomes much less jagged in the plots below.
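The effect of per-tensor quantization on mixed-scale outputs can be shown with a toy example. This sketch is not SNPE's quantizer or the DM model; it assumes simple min/max affine quantization to 8 bits over one output tensor, and the values are invented. When one output is much larger than the others, all outputs share a single coarse step size and the small ones collapse onto the same level. Bringing the outputs to a similar scale, which is one reading of "equalizing" them, restores resolution.

```python
def quantize_per_tensor(values, bits=8):
    """Quantize and dequantize a list of floats with one shared scale.

    Uses min/max affine quantization: every value in the tensor is
    snapped to one of 2**bits evenly spaced levels between min and max.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    if scale == 0:
        return list(values)  # all values identical, nothing to quantize
    return [lo + round((v - lo) / scale) * scale for v in values]


# Invented outputs: two small values next to one large value.
mixed = quantize_per_tensor([0.1, 0.2, 100.0])

# Same outputs after rescaling the large one to a comparable range.
equalized = quantize_per_tensor([0.1, 0.2, 1.0])

print(mixed[:2])      # the two small outputs after quantization
print(equalized[:2])
```

In the mixed case the step size is roughly 0.39, so 0.1 and 0.2 land on the same level; in the equalized case the step is roughly 0.0035 and they stay distinct.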
Unlike all shipping ACC systems, openpilot has always disengaged on accelerator pedal press. Most community forks offered a feature to configure this behavior, and now openpilot ships with its own toggle.
Although many forks had this feature, they often just commented out the two lines that caused the disengagement. Implementing the toggle properly (#23588) required a bunch more work across the whole stack. In order to make it clear when you’re overriding openpilot’s ACC, the green engaged border turns grey while pressing the accelerator pedal. In the panda safety code, we enforce that no longitudinal actuation can be commanded while overriding (commaai/panda#884). The longitudinal planner also needed to be updated in order to ensure a smooth transition back to openpilot ACC after releasing the accelerator pedal (#23639). We also had to ensure that all the car ports were sending the same signals that the stock ACC system does while the user is overriding.
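The behavior described above can be summarized as a small decision function. This is an illustrative sketch only, not openpilot's controls code or the panda safety code; the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LongitudinalState:
    engaged: bool      # openpilot remains engaged
    overriding: bool   # driver is overriding ACC (grey border in the UI)
    accel: float       # longitudinal command actually sent


def apply_gas_policy(engaged, gas_pressed, disengage_on_gas, planned_accel):
    """Decide engagement and longitudinal output for one control step."""
    if not engaged:
        return LongitudinalState(False, False, 0.0)
    if gas_pressed:
        if disengage_on_gas:
            # Toggle on: pressing the accelerator disengages.
            return LongitudinalState(False, False, 0.0)
        # Toggle off: stay engaged, but command no longitudinal
        # actuation while the driver is overriding.
        return LongitudinalState(True, True, 0.0)
    return LongitudinalState(True, False, planned_accel)


print(apply_gas_policy(True, True, False, 1.2))
```

The key design point from the text is the middle branch: while overriding, the system stays engaged but sends no longitudinal command, and that restriction is enforced in the safety layer rather than left to the controls code alone.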
In the last release blog post, we introduced the MTBF report. Here’s the dump for the previous release:
MTBF analysis for openpilot v0.8.13-release
MTBF is the point estimate
MTBF_L is the two-sided 90.0% confidence interval
4127934 segments, 68226 hours, 21832 engaged hours
Immediate disables - 112 hours MTBF
+----+------------------------+----------+---------+-----------+
| | event | MTBF | count | dongles |
+====+========================+==========+=========+===========+
| 0 | event controlsMismatch | 143.63 | 152 | 37 |
+----+------------------------+----------+---------+-----------+
| 1 | event canError | 1364.51 | 16 | 12 |
+----+------------------------+----------+---------+-----------+
| 2 | booted onroad | 1984.74 | 11 | 11 |
+----+------------------------+----------+---------+-----------+
| 3 | event brakeUnavailable | 3118.88 | 7 | 6 |
+----+------------------------+----------+---------+-----------+
| 4 | event steerUnavailable | 3638.7 | 6 | 1 |
+----+------------------------+----------+---------+-----------+
| 5 | event accFaulted | 10916.1 | 2 | 2 |
+----+------------------------+----------+---------+-----------+
| 6 | event relayMalfunction | 21832.2 | 1 | 1 |
+----+------------------------+----------+---------+-----------+
Soft disables - 32 hours MTBF
+----+----------------------------+----------+---------+-----------+
| | event | MTBF | count | dongles |
+====+============================+==========+=========+===========+
| 0 | event commIssue | 46.55 | 469 | 47 |
+----+----------------------------+----------+---------+-----------+
| 1 | event cameraMalfunction | 49.51 | 441 | 39 |
+----+----------------------------+----------+---------+-----------+
| 2 | event steerTempUnavailable | 216.16 | 101 | 40 |
+----+----------------------------+----------+---------+-----------+
| 3 | event radarFault | 389.86 | 56 | 3 |
+----+----------------------------+----------+---------+-----------+
| 4 | event overheat | 1455.48 | 15 | 14 |
+----+----------------------------+----------+---------+-----------+
| 5 | event vehicleModelInvalid | 1559.44 | 14 | 6 |
+----+----------------------------+----------+---------+-----------+
| 6 | event calibrationInvalid | 2183.22 | 10 | 6 |
+----+----------------------------+----------+---------+-----------+
| 7 | event modeldLagging | 3118.88 | 7 | 5 |
+----+----------------------------+----------+---------+-----------+
| 8 | event espDisabled | 5458.05 | 4 | 4 |
+----+----------------------------+----------+---------+-----------+
| 9 | event usbError | 7277.39 | 3 | 3 |
+----+----------------------------+----------+---------+-----------+
| 10 | event lowMemory | 21832.2 | 1 | 1 |
+----+----------------------------+----------+---------+-----------+
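The per-event MTBF values in the tables above are consistent with dividing the engaged hours by the event count. A minimal sketch of that arithmetic, assuming this is how the point estimate is computed; it uses the rounded 21832 engaged hours from the summary line, so results only match the table to within rounding, and it does not attempt the confidence interval:

```python
ENGAGED_HOURS = 21832  # from the summary line of the report


def mtbf_hours(engaged_hours, event_count):
    """Point estimate of mean time between failures, in engaged hours."""
    if event_count == 0:
        raise ValueError("MTBF is undefined with zero events")
    return engaged_hours / event_count


# Event counts copied from the immediate-disable table.
immediate_counts = {
    "controlsMismatch": 152,
    "canError": 16,
    "booted onroad": 11,
    "brakeUnavailable": 7,
    "steerUnavailable": 6,
    "accFaulted": 2,
    "relayMalfunction": 1,
}

for event, count in immediate_counts.items():
    print(f"{event}: {mtbf_hours(ENGAGED_HOURS, count):.2f} hours")

# Combined MTBF across all immediate-disable events.
print(f"all: {mtbf_hours(ENGAGED_HOURS, sum(immediate_counts.values())):.0f} hours")
```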
The MTBF of all the immediate disable events improved significantly, except for event controlsMismatch. The regression was largely due to a bug in the Nissan safety mode (commaai/panda#877). This bug had always been in the Nissan code, but it went unnoticed until it started getting hit in 0.8.13. The rest of the controlsMismatch events are uniformly distributed amongst a bunch of small bugs listed under Car Bug Fixes.
As openpilot approaches the reliability of automotive ECUs, we’re now finding ECU bugs that are cutting into our MTBF numbers. A bug in the Volkswagen Atlas showed up in our report as accFaulted events. Since we query the car’s ECU firmware versions, we’re able to specifically filter out these known bugs from our report.
The comma body is the first non-car robot that openpilot supports. It runs the same openpilot that drives your car, and all of its code lives in selfdrive/car/body, just like a normal car port. openpilot started out only supporting one car, then we wrote a nice car abstraction layer. The body will do the same thing for openpilot. It’s trivial to support a new car in openpilot, and soon it’ll be similarly easy to use openpilot in any robotics project.
Check out the body blog post and order one at commabody.com!
Since openpilot will never work the same on every car, clearly communicating openpilot compatibility is important. Previously, our supported cars documentation simply split up the cars into officially and community supported cars, which said more about how actively the port was maintained than the quality of the openpilot experience. The new compatibility page clearly communicates the most relevant information that determines the quality of your openpilot experience with a tier system.
Cars are sorted into one of three tiers based on the quality of the lateral and longitudinal support, as well as whether the code is actively maintained. Gold tier cars have all five stars, Silver have four, and the rest are Bronze tier. Cars can always move within the tiers, and often move up purely with software updates.
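The star-to-tier mapping described above is simple enough to write down directly. A small sketch, assuming a car's rating is just a count of stars out of five; the function name is hypothetical and this is not the code that generates the compatibility page:

```python
def tier_for_stars(stars):
    """Map a star count (0 to 5) to a compatibility tier."""
    if not 0 <= stars <= 5:
        raise ValueError("stars must be between 0 and 5")
    if stars == 5:
        return "Gold"
    if stars == 4:
        return "Silver"
    return "Bronze"


print(tier_for_stars(4))
```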
As part of this project, we also moved all the static information in CARS.md to live inside openpilot, using information from the same car interface we use to control the car to generate the supported cars documentation on both GitHub and the website.
Latency logger is a new tool to track openpilot’s end-to-end latency, from the start of the rolling shutter on the camera frame to sending out the actuation messages on the CAN bus. Each of the diamonds below is a specific event, and new ones can be added for more granularity with a single line of code.
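As a rough idea of how such a tool can be structured, here is a toy event logger. It is not the actual latency logger; the class, method, and event names are hypothetical. Each call records a named timestamp for a frame, and the latency between any two events falls out by subtraction, which is why adding a new measurement point can be a single line.

```python
from collections import defaultdict


class LatencyLogger:
    """Collects named timestamps per frame and reports latencies."""

    def __init__(self):
        self.events = defaultdict(list)  # frame_id -> [(name, t_ms), ...]

    def log(self, frame_id, name, t_ms):
        """Record one event; adding a new measurement point is one call."""
        self.events[frame_id].append((name, t_ms))

    def end_to_end_ms(self, frame_id, start, end):
        """Latency between two named events of the same frame."""
        times = dict(self.events[frame_id])
        return times[end] - times[start]

    def stage_durations(self, frame_id):
        """Time between consecutive events, ordered by timestamp."""
        ordered = sorted(self.events[frame_id], key=lambda e: e[1])
        return [(f"{a}->{b}", tb - ta)
                for (a, ta), (b, tb) in zip(ordered, ordered[1:])]


# Invented timestamps for a single frame.
logger = LatencyLogger()
logger.log(1, "frame_start", 0.0)
logger.log(1, "model_done", 30.0)
logger.log(1, "can_sent", 45.0)
print(logger.end_to_end_ms(1, "frame_start", "can_sent"))
print(logger.stage_durations(1))
```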
The comma body ships with three features at launch: balancing, video streaming, and joystick control. openpilot’s existing streaming tools weren’t good enough to remotely drive the body, so we drove the latency down across the stack, all the way from camerad (#24557) to decoding on your PC. After all the optimizations, the end-to-end latency of the streamer is under 100ms! Once the body has arms and knees, the next step is to stream into a VR headset.
We’re hiring great engineers to own and work on all parts of the openpilot stack. If anything here interests you, apply for a job or join us on GitHub!