openpilot 0.8.13

This release is a big push on long-term stability and reliability. We looked at the data from our previous release and fixed the bugs it surfaced.

Reliability & Stability

One of the goals for openpilot 1.0 is to hit 1000 hours mean time between failures (MTBF), where a failure is an unplanned disengagement. openpilot’s state machine is driven by events of several different types. For MTBF, we’re interested in the immediate disable event type. We also track soft disable events, which lead to a state where openpilot will disengage in three seconds unless the triggering condition clears. Soft disable events can be either user triggered (e.g. unbuckling your seatbelt) or system triggered (e.g. a temporary system lag). For this report, we only include the system triggered events.
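To make the difference between the two event types concrete, here is a minimal sketch of such a state machine. It is an illustration of the behavior described above, not openpilot's actual controls code: the class name, state names, and the 100 Hz loop period are assumptions. Only the three-second countdown comes from the text.

```python
from enum import Enum

DT = 0.01                                   # hypothetical control loop period (100 Hz)
SOFT_DISABLE_FRAMES = int(round(3.0 / DT))  # three-second countdown, from the text


class State(Enum):
    ENABLED = "enabled"
    SOFT_DISABLING = "softDisabling"
    DISABLED = "disabled"


class EngagementStateMachine:
    """Toy state machine: immediate disables act at once, soft disables count down."""

    def __init__(self):
        self.state = State.ENABLED
        self.frames_left = 0

    def update(self, immediate_disable: bool, soft_disable: bool) -> State:
        """Advance by one control step given the events active in that step."""
        if self.state == State.DISABLED:
            return self.state  # re-engagement is out of scope for this sketch

        if immediate_disable:
            self.state = State.DISABLED
        elif self.state == State.ENABLED:
            if soft_disable:
                self.state = State.SOFT_DISABLING
                self.frames_left = SOFT_DISABLE_FRAMES
        elif self.state == State.SOFT_DISABLING:
            if not soft_disable:
                self.state = State.ENABLED  # condition cleared in time
            else:
                self.frames_left -= 1
                if self.frames_left <= 0:
                    self.state = State.DISABLED
        return self.state
```

In this sketch, a soft disable that clears within the countdown never produces a disengagement, which is why most soft disable events in the report did not end in one.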

MTBF Report

In order to calculate the MTBF, we first count the eligible segments and calculate the mean time engaged and mean segment length to determine the total engaged hours. Then, we count the individual event occurrences from the logs.

Here’s the raw dump from our MTBF notebook. It shows the total amount of data from the release and the MTBF, both split by device type. Our goal is an overall MTBF of 1000 hours, but the report currently only includes the MTBF (in hours) for individual events. Note that most of the soft disables didn’t actually result in a disengagement, but we include them because we’re no less interested in fixing them.

MTBF analysis for openpilot v0.8.12-release

+-------------+------------+---------+-----------------+
|             |   segments |   hours |   engaged hours |
+=============+============+=========+=================+
| comma two   |    6986308 |  116438 |           30274 |
+-------------+------------+---------+-----------------+
| comma three |    2287966 |   38133 |            9915 |
+-------------+------------+---------+-----------------+
| total       |    9274274 |  154571 |           40189 |
+-------------+------------+---------+-----------------+


Immediate disables
+----+------------------------+-----------+-----------+----------+---------+
|    | event                  |      MTBF |     count |     MTBF |   count |
|    |                        |   (three) |   (three) |    (two) |   (two) |
+====+========================+===========+===========+==========+=========+
|  0 | event accFaulted       |    220.32 |        45 |   186.88 |     162 |
+----+------------------------+-----------+-----------+----------+---------+
|  1 | event brakeUnavailable |   1416.36 |         7 | 15137    |       2 |
+----+------------------------+-----------+-----------+----------+---------+
|  2 | event canError         |    826.21 |        12 |   369.2  |      82 |
+----+------------------------+-----------+-----------+----------+---------+
|  3 | event controlsMismatch |    210.95 |        47 |   288.32 |     105 |
+----+------------------------+-----------+-----------+----------+---------+
|  4 | event steerUnavailable |    354.09 |        28 |   840.94 |      36 |
+----+------------------------+-----------+-----------+----------+---------+


Soft disables
+----+---------------------------+-----------+-----------+----------+---------+
|    | event                     |      MTBF |     count |     MTBF |   count |
|    |                           |   (three) |   (three) |    (two) |   (two) |
+====+===========================+===========+===========+==========+=========+
|  0 | event commIssue           |    102.21 |        97 |   185.73 |     163 |
+----+---------------------------+-----------+-----------+----------+---------+
|  1 | event deviceFalling       |   9914.52 |         1 |   617.84 |      49 |
+----+---------------------------+-----------+-----------+----------+---------+
|  2 | event lowMemory           |      0    |         0 | 30274    |       1 |
+----+---------------------------+-----------+-----------+----------+---------+
|  3 | event modeldLagging       |   9914.52 |         1 |     0    |       0 |
+----+---------------------------+-----------+-----------+----------+---------+
|  4 | event radarFault          |    152.53 |        65 |  1513.7  |      20 |
+----+---------------------------+-----------+-----------+----------+---------+
|  5 | event usbError            |      0    |         0 |  3784.25 |       8 |
+----+---------------------------+-----------+-----------+----------+---------+
|  6 | event vehicleModelInvalid |     29.33 |       338 |    86.74 |     349 |
+----+---------------------------+-----------+-----------+----------+---------+
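The arithmetic behind the tables above can be sketched in a few lines. This is not the MTBF notebook itself: the function names are hypothetical, and the 60-second segment length is an assumption inferred from the segments and hours columns (6986308 segments for 116438 hours).

```python
from typing import Optional

SEGMENT_LENGTH_S = 60.0  # assumption: one-minute segments, consistent with the table


def total_hours(n_segments: int, mean_segment_length_s: float = SEGMENT_LENGTH_S) -> float:
    """Total logged hours from the segment count and mean segment length."""
    return n_segments * mean_segment_length_s / 3600.0


def engaged_hours(n_segments: int, mean_engaged_s_per_segment: float) -> float:
    """Total engaged hours from the segment count and mean engaged time per segment."""
    return n_segments * mean_engaged_s_per_segment / 3600.0


def mtbf_hours(total_engaged_hours: float, event_count: int) -> Optional[float]:
    """Mean engaged hours between occurrences of one event.

    Returns None when the event never occurred; the raw dump prints 0 in that case.
    """
    if event_count == 0:
        return None
    return total_engaged_hours / event_count
```

With the comma two numbers from the table, mtbf_hours(30274, 162) gives about 186.88 hours for accFaulted and mtbf_hours(30274, 2) gives 15137 hours for brakeUnavailable. The comma three rows differ in the last digit if you start from the rounded 9915 engaged hours, since the dump was computed from unrounded values.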

Fixes

Having not tracked the MTBF explicitly before, we were pretty happy with this. For this release, we focused on fixing all the immediate disables, but we also took a sizable chunk out of the soft disables. We’re going to continue to use this report to drive the MTBF up, and soon we should be fixing bugs that happen once in a few thousand hours.

Immediate disables

canError can be one of two different things: a lag in the boardd process or a broken cable from the car harness to the device. We split this out into two separate events for better tracking in the next release (#23362).

The rest of the events are car port bugs, all caused by panda and openpilot disagreeing on the car’s state. When panda and openpilot disagree, panda will block openpilot’s messages to the car. If the messages are blocked for too long, the ECUs in the car receiving those messages may fault. If the ECU faults, an event like brakeUnavailable or steerUnavailable will be thrown; otherwise, openpilot’s internal controlsMismatch is thrown. See Car bug fixes for the specific fixes.

Soft disables

vehicleModelInvalid is by far the most common event, and we expect to have fixed ~90% of them. See paramsd fixes for details. radarFault is similar to canError above, but indicates a failure of the connection to the radar CAN bus wires. The deviceFalling events were mostly false positives, so we disabled the falling device detector until it can be improved.

Driver Monitoring Improvements

In the previous release, we refactored the calculation of distracted driver poses so that pitch and yaw are now treated independently. This change allows more flexibility in policy tuning. In this update, we further refined the driver pose policy in the following ways.

The driver pose learner was prone to being too lax or too strict if you drive in an unusual position for the first few minutes of the drive. By adding upper and lower bounds to the pose learner, it now will adapt well to those extreme initial cases. With this we were also able to remove the upper pitch threshold, allowing some room for leaning back.
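One way to picture the bounded learner is a running mean whose output is clipped to a fixed range. The sketch below is only an illustration under that assumption: the class name, the running-mean update, and the bound values are hypothetical and are not taken from the driver monitoring code.

```python
class BoundedPoseLearner:
    """Running mean of the driver's natural pose angle, clipped to fixed bounds."""

    def __init__(self, lower: float, upper: float):
        if lower > upper:
            raise ValueError("lower bound must not exceed upper bound")
        self.lower = lower
        self.upper = upper
        self.mean = 0.0
        self.count = 0

    def update(self, observed_angle: float) -> float:
        """Fold one observation into the running mean and return the bounded offset."""
        self.count += 1
        self.mean += (observed_angle - self.mean) / self.count
        return self.offset

    @property
    def offset(self) -> float:
        """Learned offset, never outside [lower, upper]."""
        return min(max(self.mean, self.lower), self.upper)


# Hypothetical bounds in radians: an unusual starting posture cannot drag the
# learned offset past 0.1 rad, so the policy ends up neither too lax nor too strict.
learner = BoundedPoseLearner(lower=-0.1, upper=0.1)
for _ in range(100):
    learner.update(0.5)
```

The raw mean is kept unclipped internally, so the offset moves back inside the bounds on its own once the driver settles into a normal position.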

Driver pose has much stronger correlation with brake predictions

We also found a strong correlation between driver behavior when openpilot is engaged and whether the driving model predicts it should brake at various speeds, more so than the probability of whether openpilot is engaged. Therefore, vehicle speed and the braking probability prediction from the driving model are used for fine-tuning the DM policy at runtime. Alerts will generally feel fairer as a result.

Localizer improvements

Roll compensation

Before we jump into roll compensation, we need to understand the vehicle model in openpilot. The lateral MPC gives us the curvature the car needs to achieve at this instant, which is translated to a steering angle using a vehicle model. We currently use a single track vehicle model, which takes into account the mass of the car, center of gravity location, tire stiffness, and steer ratio, some of which are learnt online. Apart from these factors, we also learn slow and fast angle offsets in paramsd to account for all the environmental effects on the curvature that are not explained by the simple vehicle model.

Single Track Vehicle Model

The slow angle offset, as the name suggests, is designed to capture effects that are nearly constant over long periods of time. A bias in the steering wheel zero-position is typically what the slow angle offset learns. The fast angle offset, however, changes rapidly during a drive, accounting for effects like road bank and lateral gusts. Despite being tuned to capture the transient signals (and not be too sensitive to the noise), the learnt fast angle offset is sometimes inaccurate and often laggy. This leads to over-steering (i.e. turn-cutting), a common problem while entering or exiting turns given that almost all turns are somewhat banked. But the localizer already outputs information about the car orientation, from which the bank of the road can be determined. In this release, we updated the vehicle model and derived the steady state roll-compensation for calculating the steering angle.

Adding road roll to the vehicle model
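For intuition, here is a sketch of the textbook steady-state single track model with a gravity term added for road roll. It is not openpilot's vehicle model code. The class, the parameter values in the usage note, the small-angle approximation, and the sign convention (positive roll means the road is banked so that gravity supplies part of the lateral force for positive curvature) are all assumptions.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class SingleTrackModel:
    mass: float             # kg
    wheelbase: float        # m
    cg_to_front: float      # m, center of gravity to front axle
    front_stiffness: float  # N/rad, front axle cornering stiffness
    rear_stiffness: float   # N/rad, rear axle cornering stiffness
    steer_ratio: float      # steering wheel angle / road wheel angle

    @property
    def understeer_gradient(self) -> float:
        """Extra road wheel angle (rad) per m/s^2 of lateral acceleration."""
        a = self.cg_to_front
        b = self.wheelbase - a
        return (self.mass / self.wheelbase) * (b / self.front_stiffness - a / self.rear_stiffness)

    def steering_wheel_angle(self, curvature: float, speed: float, roll: float = 0.0) -> float:
        """Steady-state steering wheel angle (rad) for a curvature (1/m) at a speed (m/s).

        The tires only need to supply the lateral acceleration that gravity
        does not already provide on a banked road (small-angle approximation).
        """
        tire_lateral_accel = speed ** 2 * curvature - G * roll
        road_wheel_angle = self.wheelbase * curvature + self.understeer_gradient * tire_lateral_accel
        return road_wheel_angle * self.steer_ratio
```

With hypothetical values such as SingleTrackModel(1500, 2.7, 1.2, 80000, 100000, 15), a curvature of 0.01 1/m at 20 m/s needs less steering on a road banked by 0.05 rad than on a flat one. Without the roll term, that difference would have to be absorbed by the fast angle offset.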

Results so far are extremely encouraging. We use “deviation from the planned path while in a turn” as a metric for the extent of turn-cutting. Both the average and the standard deviation of this metric, for left and right turns alike, have dropped by nearly half.

Fun Fact: The average deviation is biased in opposite directions for left and right turns. Roads are usually banked away from the turn to avoid tire slippage due to centrifugal forces on the car.

Deviation from the ideal path - 0.8.12-release vs 0.8.13

paramsd fixes

Nearly 90% of vehicleModelInvalid errors are caused by steerRatio or stiffnessFactor going out of acceptable bounds. This typically happens during extended periods of straight driving, when there is no effective information about steerRatio or stiffnessFactor. Because the filter standard deviations were unbounded, the filter states varied continuously (as uncertainty kept increasing) and “corrected” rapidly with new information (turns). With this release, we bound the uncertainty by observing current values of the filter (#23726). Additionally, the observation noise of the steering angle was tuned to accommodate noisy steering angle values (the least count of the steering angle on the Prius is 0.1 degrees!).

paramsd simulation of a long, straight road
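The effect of observing the filter's own current value can be seen in a scalar Kalman filter. The sketch below is an illustration of the idea only, not the paramsd change from #23726: the function name and all noise values are hypothetical. Because the pseudo-observation equals the current estimate, the innovation is zero and the state does not move, but the variance update still shrinks the uncertainty.

```python
from typing import Optional


def propagate_variance(p0: float, process_noise: float, steps: int,
                       self_obs_noise: Optional[float] = None) -> float:
    """Variance of a scalar Kalman filter state after `steps` with no informative data.

    If `self_obs_noise` is given, each step also applies a pseudo-observation
    of the current estimate with that observation variance.
    """
    p = p0
    for _ in range(steps):
        p += process_noise  # predict: uncertainty grows every step
        if self_obs_noise is not None:
            gain = p / (p + self_obs_noise)  # Kalman gain for the pseudo-observation
            p = (1.0 - gain) * p             # update: uncertainty shrinks again
    return p


# Hypothetical numbers for a long, straight road (no turns to learn from).
unbounded = propagate_variance(p0=0.1, process_noise=0.01, steps=1000)
bounded = propagate_variance(p0=0.1, process_noise=0.01, steps=1000, self_obs_noise=1.0)
```

Without the pseudo-observation the variance grows linearly with time, so the first turn after a long straight produces a large, abrupt correction. With it, the variance settles at a fixed point set by the ratio of the two noise terms.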

comma two focus improvements

The camera module on the comma two has no fixed focus (unlike the comma three), but requires moving a lens to the correct position to achieve proper focus. However, any longitudinal accelerations also act on this lens and can move it out of position. The camera module in the original EON suffered heavily from this, and we implemented a sag compensation. This would apply an opposing force whenever a forward/backward acceleration was measured by the accelerometer.

In the comma two, however, the effect of acceleration is much weaker, and the compensation was actually pulling the lens out of focus when a large acceleration was measured. Removing the compensation was an easy fix, but we also needed to verify with enough data that this change actually helped.

camerad computes the image sharpness and logs it in the cameraState packet. That packet is present in the qlogs, so the data is easy to process. Below you can see histograms of the image sharpness under different accelerations, before and after the change. Note that before the change the sharpness under larger acceleration is significantly lower, while after the change the correlation is almost completely gone.

comma two focus scores under different accelerations - 0.8.12-release vs 0.8.13
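A comparison like the one in the histograms can be approximated by grouping sharpness samples by acceleration magnitude. The sketch below assumes the (acceleration, sharpness) pairs have already been extracted from the logs. The function name, the bin width, and the sample values are hypothetical and are not real measurements.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_sharpness_by_accel(samples: Iterable[Tuple[float, float]],
                            bin_width: float = 1.0) -> Dict[float, float]:
    """Average sharpness per bin of absolute longitudinal acceleration.

    Keys are the lower edge of each bin in m/s^2.
    """
    bins = defaultdict(list)
    for accel, sharpness in samples:
        bin_start = int(abs(accel) // bin_width) * bin_width
        bins[bin_start].append(sharpness)
    return {start: mean(values) for start, values in sorted(bins.items())}


# Made-up samples, for illustration only: (acceleration in m/s^2, sharpness score)
samples = [(0.2, 0.75), (-0.5, 0.25), (2.3, 0.5), (-2.9, 0.25)]
by_bin = mean_sharpness_by_accel(samples)
```

If the compensation were still hurting focus, the higher-acceleration bins would show a clearly lower mean than the bin nearest zero.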

NEOS 19

eon-neos-builder#54

This update to the comma two OS mostly improves stability. We fixed a rare bug in Android’s NetworkPolicy service which resulted in Zygote, a core Android daemon, restarting. When Zygote restarts, openpilot stays running, but openpilot’s UI is hidden by the boot animation while some core Android services restart. From the user’s perspective, this would look like a spontaneous reboot, but in reality, openpilot remains up. Along with the stability fixes, we also updated all the Python packages.

AGNOS 4

This update to the comma three OS is largely preparation for things to come. We added casync, which will allow us to dramatically reduce the download size of future AGNOS updates. GSM auto configuration has also been improved, which should fix SIM cards from some carriers not working in the comma three.
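The reason a chunk-based tool like casync can shrink downloads is that chunks already present on the device do not need to be fetched again. The sketch below shows that idea only. It swaps in fixed-size chunks and SHA-256 for simplicity, whereas casync itself uses content-defined chunking, and the function names are hypothetical.

```python
import hashlib
from typing import List


def chunk_digests(image: bytes, chunk_size: int) -> List[str]:
    """Split an image into fixed-size chunks and return the digest of each chunk."""
    return [
        hashlib.sha256(image[i:i + chunk_size]).hexdigest()
        for i in range(0, len(image), chunk_size)
    ]


def chunks_to_download(new_image: bytes, installed_image: bytes, chunk_size: int = 4) -> List[str]:
    """Digests of chunks in the new image that the installed image cannot provide."""
    already_have = set(chunk_digests(installed_image, chunk_size))
    needed: List[str] = []
    for digest in chunk_digests(new_image, chunk_size):
        if digest not in already_have and digest not in needed:
            needed.append(digest)
    return needed


# Toy images: only the middle chunk changed, so only one chunk must be fetched.
installed = b"AAAABBBBCCCC"
update = b"AAAAXXXXCCCC"
missing = chunks_to_download(update, installed)
```

Fixed-size chunks lose their alignment as soon as bytes are inserted or removed, which is the problem content-defined chunking is meant to avoid.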

ADB support was also added. This allows any of the nice tools written for Android, like the Snapdragon Profiler, to work with the comma three.

Watching openpilot run on a comma three in the Snapdragon Profiler

Cars

DBC cleanup project

If you’ve followed openpilot development, you’ll have noticed that the newer car ports are much simpler and cleaner than the older ones. This is mostly for the same reason that we started with giraffes: we didn’t expect how similar all the cars would be. Compare the Volkswagen port to the Honda port. The code to add support for an individual Volkswagen model is only a few lines in documentation, 5 lines of code, and about 15 lines of firmware values. In contrast, the Honda port is full of different code paths depending on the car model, including a separate DBC for nearly every car model.

We’ve reduced the Toyota DBCs down to only three, split by platform. Much of the work was simply moving repeated definitions of messages into common base DBCs. We also found common signals to replace many of the vehicle specific ones. We were able to confidently move to the new signals by verifying them with our millions of Toyota segments. The Honda DBCs are up next for the same treatment.

The DBC cleanups are part of a larger car porting experience improvement project. We’re starting by cleaning up the car code, from the DBCs to the CANParser (#23642). Once the code’s clean, we’re going to focus on building great docs and tools to do the three main things you do with cars in openpilot: fingerprinting, individual model ports, and full brand ports. You’ll be able to go for a single drive in dashcam mode, then develop your full port on your desk with the data from that drive. If you’ve got thoughts, join our Discussion on GitHub.

Enhancements

  • Subaru ECU firmware fingerprinting thanks to martinl! (#1878)

Bug fixes

  • Fixed controls mismatch on Honda Nidec and Bosch (commaai/panda#840)
  • Fixed inaccurate steering angle offsetting initialization on some Toyotas (#23747)
  • Fixed brake press false positives on some GM cars thanks to jshuler! (#23712)
  • Fixed rare brake faults on Hondas (#23657)
  • Fixed rare LKAS faults on Chryslers thanks to adhintz and jyoung8607! (#23515)

Car Ports

  • Hyundai Santa Fe Plug-in Hybrid 2022 support thanks to sunnyhaibin! (#2332)
  • Mazda CX-5 2022 support thanks to Jafaral! (#23704)
  • Subaru Impreza 2020 support thanks to martinl! (#21011)
  • Toyota Avalon 2022 support thanks to sshane! (#23381)
  • Toyota Prius v 2017 support thanks to CT921! (#23636)
  • Volkswagen Caravelle 2020 support thanks to jyoung8607! (#23735)

Tools

replay

Our replay tool has a brand new terminal UI, thanks to deanlee (#23608)! Replaying is one of the core tools for debugging and developing openpilot, so if you haven’t yet, try it out!

cabana

Cabana, our CAN analysis and reverse engineering tool, now supports CAN-FD. openpilot has supported CAN-FD since 0.8.11, and now cabana support makes the workflow for a CAN-FD car port no different than any other car port.

Join the team

We’re hiring great engineers to own and work on all parts of the openpilot stack. If anything here interests you, apply for a job or join us on GitHub!
