Engineering for Safety and Economy in Driving Automation

By Alison Young

Functional Safety Architect

NXP Semiconductors

By Bart Vermeulen

Technical Director, In-Vehicle Architectures

NXP Semiconductors

By Luc van Dijk

IC Architect

NXP Semiconductors

February 08, 2023



The main goal of autonomous driving is to eliminate accidents caused by human error. In fully autonomous vehicles, reverting to driver control in the event of a system failure is not an option: there is no driver, and no manual controls are provided for taking over. Safety-critical systems must therefore keep acting on their own rather than rely on a “human fallback system” as a failsafe. Although this can be achieved through full redundancy of systems, alternative architectures that minimize the duplication of functions and systems are needed to avoid added cost and weight.

Automotive network architectures are adopting a zonal structure to reduce vehicle weight and cost for greater fuel economy, space savings, and affordability.

Domain and Zonal Architectures

Figure 1 compares the typical topologies of domain and zonal vehicle architectures. In the domain-based architecture on the left, sensors and actuators are connected according to the functional domain to which they belong. Each domain has a dedicated processor as part of the domain controller. In the zonal architecture on the right, sensors and actuators are connected according to their physical location in the vehicle. The zonal controllers, central compute module, or a combination of both handle the processing tasks traditionally executed by the domain controllers and central gateway.

Figure 1: Typical topologies of domain and zonal vehicle architectures.

High-priority data traffic, such as safety-critical control commands and certain types of sensor data, must reach its destination and receive a response within a specified maximum time. For medium-priority traffic, such as in-vehicle entertainment data, acceptable transmission and response times can be maintained by ensuring that, on average, enough transmission bandwidth is available in the communication subsystem. Best-effort data traffic comes with no specific latency requirement. It is sufficient that the data arrives eventually and “as fast as possible,” with information retransmitted if the communication subsystem is temporarily at its limits.
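One way to picture the three traffic classes is a strict-priority transmit queue. The sketch below is illustrative only: the class names, the `StrictPriorityQueue` type, and the strict-priority policy itself are assumptions made for this example. A real in-vehicle network would add bandwidth shaping and deadline monitoring on top of simple ordering.

```python
import heapq
from enum import IntEnum
from itertools import count


class TrafficClass(IntEnum):
    """Hypothetical labels for the three traffic classes (lower = more urgent)."""
    HIGH = 0         # bounded worst-case latency, e.g. brake commands
    MEDIUM = 1       # sufficient average bandwidth, e.g. entertainment data
    BEST_EFFORT = 2  # no latency requirement


class StrictPriorityQueue:
    """Always transmits the most urgent frame first; FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._sequence = count()  # tie-breaker that preserves arrival order

    def push(self, traffic_class, frame):
        heapq.heappush(self._heap, (int(traffic_class), next(self._sequence), frame))

    def pop(self):
        _, _, frame = heapq.heappop(self._heap)
        return frame

    def __len__(self):
        return len(self._heap)


queue = StrictPriorityQueue()
queue.push(TrafficClass.BEST_EFFORT, "diagnostic log")
queue.push(TrafficClass.MEDIUM, "audio stream chunk")
queue.push(TrafficClass.HIGH, "brake command")
transmit_order = [queue.pop() for _ in range(len(queue))]
```

Strict priority alone protects the high-priority class but can starve best-effort traffic, which is acceptable here only because that class carries no latency requirement.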

Evolution and Functional Safety

Studying an autonomous braking system (Figure 2) helps to explain the effects of domain and zonal architectures on the desired automotive safety integrity level (ASIL) grade, as defined in ISO 26262.

Figure 2: Data flow scheme in a typical autonomous-braking system.

The black-marked boxes in Figure 2 are electronic control units (ECUs), and the gray-marked boxes represent information exchanged between the ECUs. The radar unit sends radar data to the object detection function, which extracts data about detected objects as input for the distance thresholding function. Distance thresholding calculates the deceleration required to maintain the distance to the vehicle in front and sends the appropriate brake command to the braking ECU if the distance falls below a predefined limit.
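The distance thresholding step can be sketched as follows. This is a minimal illustration, not the algorithm of any production system: the function name, the default threshold, minimum gap, and deceleration cap are all hypothetical, and the constant-deceleration formula a = v² / (2d) is an assumption chosen for simplicity.

```python
from typing import Optional


def brake_command(
    distance_m: float,
    closing_speed_mps: float,
    threshold_m: float = 30.0,     # hypothetical pre-defined distance limit
    min_gap_m: float = 5.0,        # hypothetical gap to keep to the lead vehicle
    max_decel_mps2: float = 9.0,   # hypothetical cap on commanded deceleration
) -> Optional[float]:
    """Return the deceleration to command in m/s^2, or None if no braking is needed."""
    # No command while the gap is at or above the limit, or while it is not shrinking.
    if distance_m >= threshold_m or closing_speed_mps <= 0.0:
        return None
    usable_m = distance_m - min_gap_m
    if usable_m <= 0.0:
        return max_decel_mps2  # already inside the minimum gap: brake at the cap
    # Constant deceleration that removes the closing speed within the usable distance.
    required = closing_speed_mps ** 2 / (2.0 * usable_m)
    return min(required, max_decel_mps2)
```

For example, a 25 m gap closing at 10 m/s leaves 20 m of usable distance, so the sketch commands 10² / (2 × 20) = 2.5 m/s².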

The safety goals for this system are defined to avoid unintended braking and to avoid the unavailability of the required braking torque when it is needed. As there is a potential for life-threatening or fatal injury in the event of a malfunction, both goals should meet the requirement for ASIL D, which is the highest integrity requirement, according to ISO 26262.

The domain and zonal vehicle architectures affect these safety goals differently. Figure 3 shows an example of the relevant part for autonomous braking in a domain-based architecture.

Figure 3: Domain-based autonomous braking control.

Here, the radar, brake, and domain controller are connected via a single CAN bus. The radar module receives data from the radar front end, executes object detection, and performs distance thresholding tasks. Brake control commands are sent via the CAN bus to the brake module, which executes the commanded task.

Figure 4 shows how the same function may be implemented in a zonal architecture. The radar and brake units are connected via two separate CAN buses to two separate zone modules. These modules are both connected to a central compute module and possibly to other zone modules within the vehicle. The radar module contains only a sensor, and the brake module contains only an actuator. Unlike the radar and brake modules in the domain-based architecture, neither of these modules performs major processing in the zonal architecture. Instead, the central compute module performs the object detection and distance thresholding computations. Hence, this architecture is referred to as a zonal architecture with central processing.

Figure 4: Autonomous braking in a zonal architecture with central processing.

Other approaches may be taken, such as executing the object detection and distance thresholding tasks within the zone module A and/or B. Such variants are known as zonal architectures with local zone processing.

Calculating FIT for ASIL-D Compliance

The Probabilistic Metric for random Hardware Failures (PMHF), a recognized ISO 26262 safety metric, is the average probability per hour of the violation of a Safety Goal, expressed in failures in time (FIT), where 1 FIT corresponds to one failure per 10⁹ hours of operation. ISO 26262 requires a PMHF below 10 FIT (10⁻⁸ failure probability per hour) for ASIL D and below 100 FIT (10⁻⁷ failure probability per hour) for ASIL C.
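The relationship between FIT and failure probability per hour is a simple scaling, since 1 FIT is one failure per 10⁹ hours. The short sketch below makes the conversion explicit; the function and constant names are chosen for this example only, and the table holds just the two limits quoted above.

```python
FIT_PER_HOUR = 1e-9  # 1 FIT = one failure per 10^9 hours of operation

# PMHF limits in FIT for the two ASIL grades quoted in the text.
PMHF_LIMIT_FIT = {"D": 10.0, "C": 100.0}


def fit_to_probability_per_hour(fit: float) -> float:
    """Convert a failure rate in FIT to a failure probability per hour."""
    return fit * FIT_PER_HOUR


def probability_per_hour_to_fit(probability: float) -> float:
    """Convert a failure probability per hour to a failure rate in FIT."""
    return probability / FIT_PER_HOUR


for asil, limit_fit in PMHF_LIMIT_FIT.items():
    per_hour = fit_to_probability_per_hour(limit_fit)
    print(f"ASIL {asil}: {limit_fit:.0f} FIT = {per_hour:.0e} per hour")
```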

A maximum PMHF value is assigned to each Safety Goal based on its ASIL grade and the ISO 26262 standard. This value is divided over the three different component groups distinguished in the example architectures: the sensor fusion and processing, communication, and actuator components.

Each of these component groups has its own failure probability PMHFx, where “x” is the index of the component group. The overall PMHF value for the application is the sum of the individual PMHFx values. To meet the overall functional safety requirements for the application, this sum must be less than or equal to the maximum PMHF value that corresponds to the ASIL safety target.
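The budget check described above can be sketched in a few lines. The per-group FIT values below are invented purely to illustrate the arithmetic and are not taken from any measurement or from the figures in this article; the function names are likewise hypothetical.

```python
from typing import Dict, Mapping


def total_pmhf(contributions_fit: Mapping[str, float]) -> float:
    """Overall PMHF is the sum of the per-group PMHFx values (all in FIT)."""
    return sum(contributions_fit.values())


def meets_target(contributions_fit: Mapping[str, float], max_pmhf_fit: float) -> bool:
    """True if the summed PMHF stays within the maximum allowed for the ASIL target."""
    return total_pmhf(contributions_fit) <= max_pmhf_fit


def relative_contribution(contributions_fit: Mapping[str, float]) -> Dict[str, float]:
    """Share of each component group in the overall PMHF (fractions summing to 1)."""
    total = total_pmhf(contributions_fit)
    return {group: value / total for group, value in contributions_fit.items()}


# Hypothetical budget split over the three component groups, in FIT.
example_fit = {
    "sensor fusion and processing": 4.0,
    "communication": 3.0,
    "actuator": 2.0,
}

ASIL_D_MAX_FIT = 10.0  # maximum PMHF value assigned to an ASIL D Safety Goal
print(total_pmhf(example_fit), meets_target(example_fit, ASIL_D_MAX_FIT))
print(relative_contribution(example_fit))
```

With these invented numbers, the total is 9 FIT and the target is met. Raising the communication contribution from 3 to 5 FIT, for instance because more network hops are involved, would push the total to 11 FIT and violate the same target.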

In the transition from a domain to zonal architecture, the architectural changes and related task remapping affect the PMHF value of an application. In zonal architectures, a far greater number of communication and processing components are needed to execute the same application compared to a domain-based architecture.  

We have calculated the overall PMHF of the Safety Goals in our example application, comparing the domain-based architecture with the two variations of the zonal architecture described previously. The relative contribution of each component group to the overall PMHF was then calculated, and the results are shown in Figure 5. The diagram confirms that the more distributed, zonal architectures cause a significant increase in the in-vehicle network (IVN) communication contribution to the overall application PMHF. The contribution of the processing, by contrast, does not change significantly, because the overall amount of processing is the same in the different architectures.

Figure 5: Relative PMHF contribution for each component group and architecture.

Fail Operational for Autonomous Driving

Fully autonomous driving, in which passengers cannot take over in the event of a failure, calls for fail-operational systems that ensure full or degraded functionality when a failure occurs. Various architectures can realize this, although each comes with advantages and disadvantages.

Architecture Variant 1

Homogeneous redundancy duplicates the system into two independent, parallel implementations (Figure 6). This variant provides fail-operational behavior in the presence of a random failure in one of the two implementations. Only one of the parallel implementations is active at a time, although the standby (redundant) path may be periodically self-tested to detect latent faults. If a failure occurs in the main path, the second path can be selected to ensure availability.

Figure 6: Full-redundant architecture.

This approach is based on the assumption that a systematic fault is unlikely to affect both implementations at the same time. The impact of a systematic fault can be reduced further by using different components in the two paths, which is referred to as diversification. The disadvantage is a doubled silicon component count and, therefore, increased overall system cost.
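The selection between the main and standby paths in this variant can be sketched as below. This is a simplified model under stated assumptions: the `Path` and `RedundantPair` names are hypothetical, fault detection is reduced to a `healthy` flag set elsewhere, and the self-test simply compares one probe result against an expected value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Path:
    """One of the two parallel implementations (names are illustrative)."""
    name: str
    compute: Callable[[float], float]
    healthy: bool = True  # assumed to be cleared by some fault-detection mechanism


class RedundantPair:
    """Only one path is active at a time; the other is kept on standby."""

    def __init__(self, main: Path, standby: Path):
        self.active = main
        self.standby = standby

    def self_test_standby(self, probe: float, expected: float) -> bool:
        """Periodic check of the idle path to expose latent faults."""
        self.standby.healthy = self.standby.compute(probe) == expected
        return self.standby.healthy

    def step(self, value: float) -> float:
        """Run the active path, switching to standby first if it has failed."""
        if not self.active.healthy:
            if not self.standby.healthy:
                raise RuntimeError("no healthy path available")
            self.active, self.standby = self.standby, self.active
        return self.active.compute(value)
```

The periodic self-test matters because a latent fault in the standby path would otherwise go unnoticed until the moment the main path fails, which is exactly when the standby path is needed.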

Architecture Variant 2

A second variant (Figure 7) uses single CAN buses to connect the sensor fusion, processing, and actuator components. This avoids duplicating the CAN bus structure (i.e., the cable) and instead uses a new type of CAN transceiver that allows continued operation when a single failure occurs within the network structure. Intra-zone CAN availability is improved while the backbone network remains fully redundant. This variant saves the expense of redundant transceivers and the weight of the additional cabling while providing the same availability as a fully redundant architecture.

Figure 7: CAN and backbone improved availability.

Architecture Variant 3

A third variant, characterized by not duplicating the processing modules and Ethernet switches, is shown in Figure 8.

Figure 8: Processor availability improvement and fully redundant networks.

A second processor operating in parallel increases the availability of the processing modules. This processor may have a lower performance specification, in which case the system falls back to a fail-degraded operation mode. This can be acceptable for certain use cases, such as safely moving the vehicle to the side of the road.

Architecture Variant 4

The fourth variant (Figure 9) combines intra-zone CAN and processing availability improvements with a fully redundant backbone network. This arrangement combines the cable savings seen in variant 2 with the reduction in the number of controller modules seen in variant 3.

Figure 9: Intra-zone CAN and processor availability improvement with redundant backbone.

The Future of Vehicle Network Architectures

Higher levels of vehicle autonomy remove human involvement from the driving process. At these higher levels, autonomous driving systems must fail into an operational state. Although full redundancy is an impracticable solution to this requirement, the thoughtful adoption of availability-improved communication and processing functionality can allow the same availability at a lower overall system cost.

Vehicle network architectures are moving to zonal architectures, aiming to support greater functionality while minimizing vehicle weight and cost. On the other hand, zonalization calls for careful design to ensure that safety-critical systems, such as autonomous braking, can reach the required ASIL. We have shown that the average probability of the violation of a Safety Goal due to the contribution of the in-vehicle network increases significantly in zonal networks when compared with traditional vehicle network topologies.