Autonomous driving systems use the most powerful CPUs and the most complex software ever seen in the automotive industry. On the hardware front, semiconductor manufacturers are pushing technology to the point where, for the first time in recent history, hardware is becoming less reliable. The miniaturization used in modern processors and memory makes them more susceptible to errors caused by electromagnetic interference, crosstalk between neighboring cells, and bombardment by particles such as alpha particles. The complex role that software is being asked to take on, combining simultaneous image processing, deep learning, precise localization and intelligent control algorithms, can lead to errors caused by subtle software interaction problems. These errors, known as Heisenbugs, are non-reproducible, elusive bugs that can go undetected even with the most rigorous testing.
The occurrence of such hardware and software errors in an autonomous driving system can impact the system’s safety. To achieve ISO 26262 safety certification, these errors must be detected and handled. To help address these errors and functional safety system challenges, BlackBerry QNX has developed QNX Loosely Coupled Lock Step (LCLS).
To detect and recover from hardware and software errors in autonomous driving systems, system designers must implement compensation mechanisms. In previous-generation systems, hardware lock step has been used to detect faulty CPU operation. Fault detection was done by having duplicate CPUs execute the same code. If one of the CPUs misbehaved, the discrepancy between the two revealed that something had gone wrong. However, since both CPUs will “correctly” execute the same code, hardware lock step does not compensate for random bit flips in memory or for Heisenbugs. One could also use a hardware analyzer to check the internal states and determine whether something has gone wrong, but this technique is not practical for today’s high-performance hardware, where there are far too many internal states for a hardware checker to analyze in real time.
Clearly, hardware diagnostics alone are not enough to detect all these errors. Pairing them with real-time software checking provides an efficient and complete means of verifying system operation. Such a system runs redundant copies of the software, each of which performs the safety-critical calculations, and compares the outputs of these copies to perform the verification.
This, in essence, is the concept of QNX Loosely Coupled Lock Step.
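The idea of running redundant copies and comparing their outputs can be illustrated with a minimal sketch. This is not the QNX LCLS API: the names run_redundant, MismatchError, braking_distance and faulty_braking_distance are hypothetical, and the copies run sequentially in one process purely for illustration.

```python
from typing import Callable, Sequence


class MismatchError(Exception):
    """Raised when redundant copies of a calculation disagree."""


def run_redundant(copies: Sequence[Callable[[float], float]], x: float) -> float:
    """Run every copy on the same input and compare the outputs.

    Hypothetical helper: returns the agreed value, or raises
    MismatchError if any copy produced a different result.
    """
    if len(copies) < 2:
        raise ValueError("redundancy needs at least two copies")
    outputs = [copy(x) for copy in copies]
    first = outputs[0]
    if any(out != first for out in outputs[1:]):
        raise MismatchError(f"redundant outputs disagree: {outputs!r}")
    return first


def braking_distance(speed_mps: float) -> float:
    # Illustrative safety-critical calculation: v^2 / (2 * a),
    # with an assumed deceleration of 8 m/s^2.
    return speed_mps ** 2 / (2 * 8.0)


def faulty_braking_distance(speed_mps: float) -> float:
    # Simulates a copy whose result was corrupted, e.g. by a bit flip.
    return braking_distance(speed_mps) + 1.0
```

With two healthy copies, run_redundant([braking_distance, braking_distance], 20.0) returns the agreed value; substituting faulty_braking_distance for one copy raises MismatchError, which is the point at which a real system would trigger its recovery action.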
Managing redundancy – the need for flexible configurations
QNX LCLS provides a means to test whether the hardware or software has been pushed beyond its safe operating capability. Autonomous driving is inherently about building a functionally safe system, and QNX LCLS helps detect potential errors that may interfere with that safe operation.
Depending on the safety design, redundant copies of safety-critical computations may need to be deployed on the same core (using temporal separation), on different cores of a multi-core processor, on different processors on the same board, or on different ECUs over a network. In each of these deployment configurations, synchronizing the software copies and comparing their outputs become a challenge. With QNX LCLS, multiple software copies can be deployed transparently, dramatically reducing the development effort required to implement a redundancy scheme.
QNX LCLS middleware has been designed to offer many flexible configurations. Servers can be deployed on different cores in a multi-core processor, on different processors within an electronic control unit (ECU), or across ECUs over an Ethernet network. In addition, the servers in a group can be flexibly and dynamically arranged as a redundant pair, in a two-out-of-three majority voting scheme, or in other arrangements.
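The voting arrangements mentioned above can be sketched with a generic strict-majority voter. This is an illustrative assumption, not the QNX LCLS implementation: majority_vote is a hypothetical name, and the outputs are assumed to be directly comparable values.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of servers.

    Returns None when no value has a strict majority, which a real
    system would treat as a detected but uncorrectable fault.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Strict majority: more than half of the servers must agree.
    return value if count > len(outputs) // 2 else None
```

With three servers, two agreeing outputs are enough to mask a single faulty one, so majority_vote([42, 42, 41]) yields 42. With a redundant pair, the same function can only detect a disagreement: majority_vote([7, 8]) yields None because neither output can be trusted over the other.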