Fig. 1: Illustration of radiation vulnerability of electronic components. (Courtesy of the ESA)
Radiation is something that we encounter on a daily basis, but typically in such low doses that we do not take note of its effects. In high-radiation environments, however, such as near nuclear reactors or in outer space, the high-energy particles that we can neglect in day-to-day life can have a substantial impact upon the functionality of our electronics. The process of protecting against these impacts is called radiation hardening.
The smaller electronic components are, the more difficult it is to assure continued performance under high-radiation conditions. With large, spread-out components, a high-energy particle is more likely to miss a component entirely, and if it does hit, it is likely to damage only one component at a time. If the components are small and closely packed, however, that single high-energy particle can take out multiple bits of data. This type of immediate damage is referred to as Single Event Effects (SEE). Even if a device escapes SEE, however, long-term performance degradation can still occur. This paper will delve into the mechanisms of each failure type and describe methods for mitigating those failures.
As an electronic device is bombarded with radiation, the particles can disrupt the semiconductor's crystal structure or cause charge to build up in the insulator. This damage does not necessarily cause an immediately noticeable change, but it accumulates over time and usually manifests as a slow degradation in performance over the life of the component. Eventually, once the radiation damage reaches a certain level, the component will no longer function as designed. The accumulated radiation dose is called the Total Ionizing Dose (TID), and the maximum TID that a circuit can typically withstand before failure is the main specification used to designate the radiation hardness of an electrical device.
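Because a TID rating is a cumulative limit, it can be turned into a rough lifetime estimate. The sketch below assumes a constant dose rate and a simple design margin; the function name, the margin parameter, and the numbers are hypothetical and chosen only for illustration, not taken from any datasheet or mission.

```python
def years_until_tid_limit(tid_rating_krad, dose_rate_krad_per_year, margin=2.0):
    """Estimate years of operation before the derated TID limit is reached.

    Assumes the dose accumulates at a constant rate. `margin` is a
    hypothetical design margin: the part is treated as if it could only
    tolerate tid_rating_krad / margin.
    """
    if dose_rate_krad_per_year <= 0 or margin <= 0:
        raise ValueError("dose rate and margin must be positive")
    return tid_rating_krad / margin / dose_rate_krad_per_year


# Illustrative values only: a part rated to 100 krad in a 5 krad/year environment.
print(years_until_tid_limit(100.0, 5.0))  # 10.0
```

A real environment rarely delivers a constant dose rate, so an estimate like this is only a first-order screening step.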
At a high level, an SEE occurs when a high-energy particle collides with an electronic component and either immediately damages it (destructive SEE) or causes a temporary change in its behavior (non-destructive SEE). Depending on the circuit layout, the effects can be classified into five main categories:
Single-Event Latchup (SEL):
The Issue: Latchup is an event that causes large current draws across a specific type of circuit layout called a pnpn structure. It is typically seen in CMOS circuits, but could also occur in any other device with the same structure. When a disruption causes the bias across this circuit to change direction, it begins to draw more and more current from the external network, and can continue until the device fails due to thermal overstress. 
The Solution: SEL mitigation strategies exist at the component design level, in the implementation scheme and use case, and in operational fault detection strategies. Component manufacturers are continually decreasing the size of their parts, which exacerbates radiation risk. However, they can mitigate this risk through low substrate resistivity, pseudo-collectors between the P-channel devices and the P-well, and other design tactics. Once a component has been selected, the user can further reduce risk by placing the component in current-limited circuits, using it only within its designated voltage range, and operating at lower temperatures. Finally, if latchup does occur during use, it can be detected by current measurements and cleared by decreasing the component's supply current to a low enough value (typically by power-cycling the device). If the latchup is left long enough to be destructive, however, the electrical properties of the part become unpredictable, and the part should then be electrically isolated from the rest of the system.
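The detect-and-reset step can be sketched in software. The example below is a minimal illustration, assuming the supply current is sampled periodically and that a sustained reading above some threshold indicates latchup. The class name, threshold, and sample values are hypothetical, and the actual power cycle is left to whatever hardware interface a real system provides.

```python
class LatchupMonitor:
    """Flags a suspected latchup when the supply current stays above a threshold."""

    def __init__(self, threshold_amps, max_consecutive=3):
        self.threshold_amps = threshold_amps
        self.max_consecutive = max_consecutive
        self._over = 0  # count of consecutive over-threshold samples

    def update(self, current_amps):
        """Feed one current sample; return True when a power cycle is requested."""
        if current_amps > self.threshold_amps:
            self._over += 1
        else:
            self._over = 0
        if self._over >= self.max_consecutive:
            self._over = 0  # assume the caller power-cycles the part now
            return True
        return False


# Hypothetical current trace in amps: nominal draw, then a sustained surge.
monitor = LatchupMonitor(threshold_amps=0.5)
samples = [0.10, 0.11, 0.95, 0.97, 0.99, 0.10]
trips = [monitor.update(s) for s in samples]
print(trips)  # [False, False, False, False, True, False]
```

Requiring several consecutive high samples avoids tripping on brief inrush spikes, at the cost of a slower response. Since a sustained latchup can become destructive, the sample interval and count would need to be short relative to the part's thermal limits.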
Single-Event Burnout (SEB):
The Issue: Single-event burnouts typically occur within MOSFETs. When the MOSFET is in its off state, an ion strike can create a transient current that activates the parasitic bipolar transistor within the circuit. When activated, the circuit begins to draw more and more current, ultimately causing device failure, as in the SEL scenario.
The Solution: The only effective mitigation for SEB is to operate the devices within their safe operating voltages. MOSFETs are only susceptible to SEB when they are in their off state and when the applied voltage is outside of their safe operating range.
Single-Event Gate Rupture (SEGR):
The Issue: Single-event gate ruptures are another MOSFET failure mode. Charge deposited by an ion strike can accumulate under the gate, increasing the electric field across the gate insulator to its dielectric breakdown point.
The Solution: Similar to SEB, SEGR only occurs when MOSFETs are outside of their safe operating voltages, and therefore using proper voltages is the only effective mitigation strategy.
Single-Event Transients (SET):
The Issue: When a charged particle impacts a circuit, it can generate a transient electrical response. The effect of the SET varies depending on where it initiates, how quickly it dissipates, and when it occurs relative to timing clock edges. If sensitive components are downstream of the transient, then it could turn into a system-level upset.
The Solution: Transients can be dissipated by capacitive filtering within the circuit, resampling the circuit output on a timescale that is longer than the expected transient duration, and by designing the overall circuit to be less sensitive to spurious transient signals. 
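The resampling idea can be illustrated with a simple majority vote over consecutive samples of a digital signal. This is a sketch under the assumption that a transient corrupts at most one sample and that the voting window spans a longer time than the transient. The function name and sample data are hypothetical.

```python
def majority_filter(samples, window=3):
    """Return a filtered bit stream where each output is the majority
    of the most recent `window` input samples (fewer at the start)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    filtered = []
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1): i + 1]
        # Output 1 only if strictly more than half of the recent samples are 1.
        filtered.append(1 if 2 * sum(recent) > len(recent) else 0)
    return filtered


# A one-sample glitch is suppressed.
print(majority_filter([0, 0, 0, 1, 0, 0, 0]))  # [0, 0, 0, 0, 0, 0, 0]
# A genuine transition still passes, delayed by one sample.
print(majority_filter([0, 0, 1, 1, 1]))        # [0, 0, 0, 1, 1]
```

The trade-off is latency: a wider window rejects longer transients but delays the response to real signal changes.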
Single-Event, Multi-Cell, and Multi-Bit Upsets (SEU, MCU, and MBU):
The Issue: An SEU is the technical term for when a particle strike causes a corruption of stored information, such as a bit flip in memory. MCU and MBU are the same phenomenon, but occurring over multiple cells or bits. As component sizes become smaller and smaller, a single particle impact can affect multiple neighboring bits or cells, hence MCU/MBU.
The Solution: The effects of an SEU are localized, and there are many algorithms for detecting and correcting corrupted bits. As more bits are simultaneously corrupted, especially within the same data word, detection becomes more difficult. One way designers decrease the probability of multiple corrupted bits within a word is to interleave bits from different words, so that each bit within a word is in a different physical location.
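To see why interleaving helps, the sketch below pairs it with a Hamming(7,4) code, one common single-error-correcting scheme chosen here purely as an illustration. The source does not name a specific algorithm, and the helper names and data values are hypothetical. A burst of four adjacent flipped cells is uncorrectable when the words are stored contiguously, but becomes four correctable single-bit errors when the words are interleaved.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def interleave(words):
    """Store bit i of every word side by side, so neighbors belong to different words."""
    return [word[i] for i in range(len(words[0])) for word in words]


def deinterleave(cells, n_words):
    """Recover the original words from an interleaved cell array."""
    return [cells[w::n_words] for w in range(n_words)]


def flip_burst(cells, start, length):
    """Return a copy of `cells` with `length` adjacent bits flipped (one multi-cell upset)."""
    hit = list(cells)
    for i in range(start, start + length):
        hit[i] ^= 1
    return hit


data = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
codewords = [hamming74_encode(d) for d in data]

# Interleaved layout: the 4-cell burst lands on one bit in each of the 4 words.
cells = flip_burst(interleave(codewords), start=9, length=4)
recovered = [hamming74_decode(w) for w in deinterleave(cells, len(codewords))]

# Contiguous layout: the same burst lands on 4 bits of a single word.
flat = flip_burst([b for w in codewords for b in w], start=9, length=4)
recovered_flat = [hamming74_decode(flat[i:i + 7]) for i in range(0, len(flat), 7)]

print(recovered == data)       # True
print(recovered_flat == data)  # False
```

The interleaving depth here equals the number of words, so any burst of up to four adjacent cells touches each word at most once. A wider burst would need either deeper interleaving or a stronger code.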
Radiation hardening is a critical design element for electronics applications in high-radiation environments. With the multitude of failure modes due to radiation impacts, there are many different mitigation schemes that must be considered when designing a circuit. Currently, there is no fool-proof way to guard against all radiation damage, but by taking these strategies into consideration, the lifetime of electrical components can be greatly extended.
© Ashley Clark. The author grants permission to copy, distribute and display this work in unaltered form, with attribution to the author, for noncommercial purposes only. All other rights, including commercial rights, are reserved to the author.
H. J. Barnaby, "Total Ionizing Dose Effects in Modern CMOS Technologies," IEEE Trans. Nucl. Sci. 53, 3102 (2006).
"Understanding Latch-Up in Advanced CMOS Logic," Fairchild Semiconductor, AN-600, January 1989, Revised April 1999.
S. Liu et al., "Single-Event Burnout and Avalanche Characteristics of Power DMOSFETs," IEEE Trans. Nucl. Sci. 53, 3379 (2006).
J. R. Brews et al., "A Conceptual Model of Single-Event Gate-Rupture in Power MOSFET's," IEEE Trans. Nucl. Sci. 40, 1959 (1993).
P. E. Dodd and L. W. Massengill, "Basic Mechanisms and Modeling of Single-Event Upset in Digital Microelectronics," IEEE Trans. Nucl. Sci. 50, 583 (2003).