Background
Side-channel analysis (SCA) exploits physical leakage from a device during cryptographic operations. Power consumption correlates with the data being processed, and this is the foundation of Correlation Power Analysis (CPA). While software AES is well studied and straightforward to attack, hardware AES is a different beast: choosing the right attack requires understanding the physical leakage at the hardware level first. To the best of the author's knowledge, the MSPM0G3507 has no prior published side-channel research.
The Target
| Item | Details |
|---|---|
| Device | Texas Instruments MSPM0G3507 |
| Peripheral | Hardware AES-128 engine |
| Capture | ChipWhisperer Nano, 8 MSPS clocked from the MSPM0 clock output, synchronous at 1 sample/clock cycle |
| Dataset | 100,000 traces |
Traces are captured with the ChipWhisperer Nano and stored in the eStraces format. All analysis — reverse CPA, leakage confirmation, and the final key recovery — is performed with eShard's scared library. The firmware is minimal — plaintext is loaded, hardware encrypts it, ciphertext is read back:

    DL_AES_init(AES, DL_AES_MODE_ENCRYPT_ECB_MODE, DL_AES_KEY_LENGTH_128);
    DL_AES_setKey(AES, &g_key[0], DL_AES_KEY_LENGTH_128);

    /* Trigger high - SCA capture window start */
    DL_GPIO_setPins(GPIO_USER_PINS_PORT, GPIO_USER_PINS_TRIGGER_PIN);

    DL_AES_loadDataIn(AES, &g_plaintext[0]);
    while (DL_AES_isBusy(AES))
        ;
    DL_AES_getDataOut(AES, &g_ciphertext[0]);

    /* Trigger low - SCA capture window end */
    DL_GPIO_clearPins(GPIO_USER_PINS_PORT, GPIO_USER_PINS_TRIGGER_PIN);

The trigger wraps the full AES operation, including the ciphertext readout. The key is fixed and loaded once before the trigger, which mirrors a typical deployment where a long-lived key is set up once and then reused for many encryptions.
Software vs Hardware AES: Different Leakage, Different Attack
A standard CPA attack on software AES targets the output of the last round SubBytes operation using a Hamming Weight model. This works because on a CPU, every intermediate value produced during AES execution must be written to memory or moved across a data bus, and those bus transitions leak power proportional to the Hamming Weight of the value being moved. The SubBytes output is a clean, key-dependent intermediate value that sits right at the boundary between the ciphertext and the key, making it the ideal attack target.
Hardware AES works fundamentally differently. The MSPM0G3507 AES peripheral executes encryption in an iterative architecture, where each round is processed sequentially by dedicated hardware logic, taking a fixed number of clock cycles to complete. According to the datasheet, AES-128 encryption requires exactly 168 cycles regardless of clock frequency. This corresponds to roughly 16–17 cycles per round on average, including the initial AddRoundKey and final round, and matches the spacing observed in the traces. A fully parallel combinational implementation would complete in a handful of cycles with no observable round structure at all.
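The cycle budget above can be checked with a few lines of arithmetic. This is only a sketch: the 168-cycle figure and the 1 sample/cycle capture come from the text, while the 8 MHz clock is an assumption (it presumes the AES engine runs at the same rate as the clock output that drives the 8 MSPS capture).

```python
# Figures taken from the text: 168 cycles per AES-128 encryption, 10 rounds.
AES_CYCLES = 168
ROUNDS = 10
# Assumption: the AES engine is clocked at the same 8 MHz that drives the capture.
CLOCK_HZ = 8_000_000

cycles_per_round = AES_CYCLES / ROUNDS        # average cycles spent per round
samples_per_encryption = AES_CYCLES           # 1 sample per clock cycle
duration_us = AES_CYCLES / CLOCK_HZ * 1e6     # wall-clock time of one encryption

print(f"{cycles_per_round:.1f} cycles/round")
print(f"{samples_per_encryption} samples per encryption")
print(f"{duration_us:.1f} us per encryption")
```

Under these assumptions one encryption spans 168 samples, which is why a frame of a couple of hundred samples is enough to cover the whole computation.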
Within each round, the operations — SubBytes, ShiftRows, MixColumns, AddRoundKey — are implemented in hardware logic. The output of SubBytes feeds directly into ShiftRows, which feeds into MixColumns, and so on. None of these intermediate values are written to software-visible registers or external data buses. This is the critical difference: a standard SW-based CPA model targeting SubBytes output finds no dominant bus-level leakage to correlate with, because the physical event it models, a value being written across a software-visible data bus, does not occur for these internal states. Combinational logic still produces switching activity, but without a matching leakage model it does not yield a clean, high-SNR correlation peak. The leakage that actually dominates is elsewhere.
| | Software AES (CPU) | Hardware AES (Peripheral) |
|---|---|---|
| Execution | One operation per clock cycle, sequential | Iterative hardware — one round per ~16 cycles, state written to register at each round boundary |
| Intermediate values | Written to registers / memory — bus transitions leak | Directly wired between stages — never on a bus |
| Primary leakage point | SubBytes output written to data bus → Hamming Weight | State register written at round completion → Hamming Distance |
| Selection function | LastSubBytes() | Custom HD on last round state transition |
The key question for hardware AES is: where does a data-dependent value actually get written to storage? The answer is at the round boundary — when the AES engine finishes a round and writes the new state back into the internal state register, overwriting the previous round's state. That register write is the physical leakage event, and it leaks in proportion to how many bits changed — the Hamming Distance between the old state and the new state.
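The difference between the two leakage models can be made concrete with a small NumPy sketch. The register contents below are hypothetical values chosen for illustration, not data from the target.

```python
import numpy as np

def hamming_weight(x):
    """Number of set bits in each uint8 element."""
    x = np.asarray(x, dtype=np.uint8)
    return np.unpackbits(x[..., np.newaxis], axis=-1).sum(axis=-1)

def hamming_distance(old, new):
    """Number of bits that flip when `old` is overwritten by `new`."""
    old = np.asarray(old, dtype=np.uint8)
    new = np.asarray(new, dtype=np.uint8)
    return hamming_weight(np.bitwise_xor(old, new))

# Hypothetical register bytes before and after a round boundary
old_state = np.array([0x00, 0xFF, 0x3C], dtype=np.uint8)
new_state = np.array([0x0F, 0xFF, 0xC3], dtype=np.uint8)

print(hamming_weight(new_state))               # [4 8 4]
print(hamming_distance(old_state, new_state))  # [4 0 8]
```

The second byte shows why the choice of model matters: writing 0xFF over 0xFF has the maximum Hamming Weight but flips no bits, so a register-overwrite model predicts no data-dependent switching there.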
Confirming Leakage: Reverse CPA
Before launching a full attack, it is critical to confirm and precisely locate exploitable leakage. Formal methods such as Test Vector Leakage Assessment (TVLA) are commonly used — they are model-agnostic and can indicate whether any statistical difference exists between two sets of traces. In practice, TVLA is typically run with a fixed, known key to ensure deterministic behavior and valid statistical comparisons.
In this work, Reverse CPA is used instead. By explicitly using the known key, Reverse CPA not only confirms the presence of leakage but also precisely maps it across the time domain of the AES execution. For a previously undocumented target like the MSPM0G3507, this provides a "leakage map" that is more useful for attack timing than the pass/fail results of TVLA.
Reverse CPA computes exact intermediate values from the known key and correlates them against every sample point in the trace. This is not an attack. It "cheats" with the key, but it answers the critical questions:
- Is there real, exploitable leakage in the traces?
- Where in the AES execution does it occur?
- Does the chosen leakage model correctly describe the physical events?
The resulting time-domain correlation peaks reveal the most vulnerable points in the hardware AES execution, guiding the CPA attack and providing the resolution needed to target individual clock cycles.
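The idea behind reverse CPA can be sketched without any library: compute a known intermediate, apply the leakage model, and take the Pearson correlation against every sample. The snippet below is a minimal illustration on synthetic traces, not the scared implementation; the trace count, noise level, and leak position are arbitrary assumptions.

```python
import numpy as np

def hamming_weight(x):
    x = np.asarray(x, dtype=np.uint8)
    return np.unpackbits(x[..., np.newaxis], axis=-1).sum(axis=-1)

def reverse_cpa(traces, intermediate):
    """Pearson correlation between HW(known intermediate) and every sample.

    traces:       (n_traces, n_samples) float array
    intermediate: (n_traces,) uint8 array computed with the known key
    returns:      (n_samples,) correlation trace
    """
    model = hamming_weight(intermediate).astype(np.float64)
    model -= model.mean()
    centered = traces - traces.mean(axis=0)
    num = model @ centered
    den = np.sqrt((model ** 2).sum() * (centered ** 2).sum(axis=0))
    return num / den

# Synthetic data: Gaussian noise everywhere, HW leakage injected at one sample
rng = np.random.default_rng(0)
n_traces, n_samples, leak_at = 5000, 100, 42
values = rng.integers(0, 256, n_traces, dtype=np.uint8)
traces = rng.normal(0.0, 2.0, (n_traces, n_samples))
traces[:, leak_at] += hamming_weight(values)

corr = reverse_cpa(traces, values)
print(int(np.argmax(np.abs(corr))))  # index of the strongest correlation
```

The position of the peak in `corr` is the "leakage map" entry for that intermediate; repeating this for each intermediate of interest builds the full map.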
Step 1: Plaintext and ciphertext correlation
The first check correlates the raw plaintext and ciphertext bytes against the trace using identity selection functions. This immediately shows where in the trace the input data is processed and where the output appears, without any key knowledge required.

    import scared

    # ths: the captured trace set, loaded beforehand

    @scared.reverse_selection_function
    def identity_plaintext(plaintext):
        return plaintext

    rev_plaintext = scared.CPAReverse(
        selection_function=identity_plaintext,
        model=scared.HammingWeight()
    )

    @scared.reverse_selection_function
    def identity_ciphertext(ciphertext):
        return ciphertext

    rev_ciphertext = scared.CPAReverse(
        selection_function=identity_ciphertext,
        model=scared.HammingWeight()
    )

    container = scared.Container(ths)
    rev_plaintext.run(container)
    rev_ciphertext.run(container)

Step 2: Per-round S-box output correlation
The second check uses the known key to compute the S-box output at each of the 10 rounds and correlates it against the trace segment of interest. This requires key knowledge but precisely locates where each round leaks, giving a map of the AES execution within that segment.

    import numpy as np
    import scared
    from scared import aes

    def make_sbox_sf(round_num):
        # Build a reverse selection function for the SubBytes output of one round
        def sbox_output(plaintext, key):
            return aes.encrypt(plaintext, key, at_round=round_num,
                               after_step=aes.Steps.SUB_BYTES)
        sbox_output.__name__ = f'sbox_output_round_{round_num}'
        return scared.reverse_selection_function(sbox_output)

    results_per_round = {}
    for round_num in range(1, 11):
        sf = make_sbox_sf(round_num)
        rev = scared.CPAReverse(
            selection_function=sf,
            model=scared.HammingWeight()
        )
        container = scared.Container(ths)
        rev.run(container)
        # Keep the strongest absolute correlation across the 16 state bytes
        results_per_round[round_num] = np.max(np.abs(rev.results), axis=0)

All 10 rounds are clearly visible and well-separated across the trace with peak correlations of ~0.05–0.06. The leakage is real and strong enough to attack. This is an important step. It rules out trace quality or alignment problems before investing in the attack itself.
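One way to turn the per-round correlation traces into a usable map is to record where each round peaks and how far apart the peaks are. The helper below is a hypothetical sketch; the dictionary it is run on is synthetic placeholder data with arbitrary peak positions, standing in for `results_per_round`, and does not reproduce the measured results.

```python
import numpy as np

def locate_round_peaks(results_per_round):
    """Map each round to (sample index of the peak, peak |correlation|)."""
    return {
        r: (int(np.argmax(c)), float(np.max(c)))
        for r, c in results_per_round.items()
    }

# Synthetic placeholder: a small noise floor plus one peak per round
rng = np.random.default_rng(1)
fake_results = {}
for r in range(1, 11):
    c = np.abs(rng.normal(0.0, 0.004, 256))
    c[10 + 20 * r] = 0.055          # arbitrary position, for illustration only
    fake_results[r] = c

peaks = locate_round_peaks(fake_results)
spacing = np.diff([peaks[r][0] for r in range(1, 11)])
print(peaks[1], spacing)
```

On real data, a near-constant spacing between consecutive peaks is a quick sanity check that the traces are aligned and that each peak really belongs to a distinct round.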
The Selection Function: HD on the Last Round State Transition
What the hardware leaks at the last round
The last round of AES-128 encryption performs SubBytes, ShiftRows, and AddRoundKey — no MixColumns. When this round completes, the AES peripheral writes the resulting ciphertext back into the same internal state register that held the last round input. This register write is the leakage point.
The power consumed during that write is proportional to the number of bits that change — the Hamming Distance between the state before the round and the state after it. Working backwards from the ciphertext with a key hypothesis k:

    state_after_sub_and_sr = ciphertext ⊕ k                          ← undo AddRoundKey
    state_before_subbytes  = InvSubBytes( state_after_sub_and_sr )   ← undo SubBytes
    Leakage = HD( state_after_sub_and_sr, state_before_subbytes )
            = HW( state_after_sub_and_sr ⊕ state_before_subbytes )

This is the Hamming Distance — specifically the HD between the state before SubBytes and the state after SubBytes+ShiftRows at the last round. ShiftRows is a byte permutation, so while it reorders bytes globally, it does not affect the per-byte Hamming Distance computation used in the CPA model. The XOR of the two states is the exact bit-difference vector, and counting its set bits with HammingWeight() gives the HD directly.
Why HammingWeight() is the correct model
Hamming Distance between two values A and B is mathematically identical to the Hamming Weight of their XOR:

    HD(A, B) = HW(A ⊕ B)

The selection function computes the XOR between the last round input and output states and returns that XOR vector. The HammingWeight() model then counts the set bits — giving the Hamming Distance. The selection function and model work as a pipeline: the selection function computes what to measure, and the model computes how much it leaked. Together they model the dominant data-dependent component of the register transition inside the AES peripheral.

    import numpy as np
    import scared

    @scared.attack_selection_function
    def last_round_subbytes_hd(ciphertext, guesses):
        # Computes HD across the SubBytes operation in the final round.
        # state_after_sub_and_sr = ciphertext XOR guess (undo final AddRoundKey)
        # state_before_subbytes = InvSubBytes(state_after_sub_and_sr)
        # HD = HW(state_after_sub_and_sr XOR state_before_subbytes)
        res = np.empty((len(ciphertext), len(guesses), 16), dtype=np.uint8)
        for i, guess in enumerate(guesses):
            state_after_sub_and_sr = ciphertext ^ guess
            state_before_subbytes = scared.aes.INV_SBOX[state_after_sub_and_sr]
            res[:, i, :] = np.bitwise_xor(
                state_after_sub_and_sr, state_before_subbytes
            )
        return res

    # HammingWeight() counts set bits in the XOR vector = HD(input, output)
    att = scared.CPAAttack(
        selection_function=last_round_subbytes_hd,
        model=scared.HammingWeight(),
        discriminant=scared.maxabs
    )

This selection function targets the Hamming Distance specifically across the SubBytes operation in the final round — from the state before SubBytes to the state after SubBytes+ShiftRows. While the round-boundary register write conceptually involves the full last-round transition, ShiftRows only permutes bytes without adding bit flips, so per-byte HD values remain unchanged and scared's byte-independent CPA works correctly. Targeting SubBytes, which dominates switching activity in iterative hardware AES due to the nonlinear S-box logic, produced the strongest localized leakage peaks in reverse CPA (~sample 200), and enabled full key recovery from 45,500 traces.
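To see the selection function and model working together end to end, the following sketch runs a single-byte CPA against simulated leakage. It is a toy reconstruction under stated assumptions: the S-box is generated locally rather than taken from scared, the key byte, trace count, and noise level are hypothetical, and the "traces" are one simulated sample per encryption.

```python
import numpy as np

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
    return r

def build_sbox():
    """AES S-box: field inverse followed by the affine transform."""
    sbox = np.zeros(256, dtype=np.uint8)
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv ^ 0x63
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s
    return sbox

def hamming_weight(x):
    x = np.asarray(x, dtype=np.uint8)
    return np.unpackbits(x[..., np.newaxis], axis=-1).sum(axis=-1)

SBOX = build_sbox()
INV_SBOX = np.argsort(SBOX).astype(np.uint8)   # inverse permutation of the S-box

# Simulate the last round for one byte position (all values hypothetical)
rng = np.random.default_rng(0)
true_key = 0x3C
n = 4000
state_in = rng.integers(0, 256, n, dtype=np.uint8)   # state before SubBytes
state_out = SBOX[state_in]                           # state after SubBytes
ciphertext = state_out ^ np.uint8(true_key)          # final AddRoundKey
leak = hamming_weight(state_in ^ state_out) + rng.normal(0.0, 2.0, n)

# Selection function: XOR across the S-box for every key guess
guesses = np.arange(256, dtype=np.uint8)
after = ciphertext[:, None] ^ guesses[None, :]       # undo AddRoundKey, shape (n, 256)
before = INV_SBOX[after]                             # undo SubBytes
model = hamming_weight(after ^ before).astype(np.float64)  # HW of the XOR = HD

# Pearson correlation of each guess column against the simulated leakage
m = model - model.mean(axis=0)
l = leak - leak.mean()
corr = (l @ m) / np.sqrt((l ** 2).sum() * (m ** 2).sum(axis=0))
recovered = int(np.argmax(np.abs(corr)))
print(hex(recovered))
```

The real attack differs in scale rather than in principle: it repeats this for all 16 byte positions and for every sample in the frame, and it needs far more traces because the measured correlation is much weaker than in this simulation.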
The Attack
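
    # N_TRACES: number of traces used for the attack
    frame = range(50, 206)
    container = scared.Container(ths[:N_TRACES], frame=frame)
    att.run(container)

    # Best-scoring guess for each of the 16 last-round key bytes
    last_round_key = np.argmax(att.scores, axis=0).astype(np.uint8)
    # Walk the key schedule back from round 10 to the master key
    original_key = scared.aes.inv_key_schedule(last_round_key, round_in=10)[0][0]
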
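The key-schedule inversion in the last line of the snippet can be illustrated in plain Python. This is a from-scratch sketch of the AES-128 key expansion run backwards, not scared's implementation; the function names are hypothetical, and the test vector is the key expansion example from FIPS-197 Appendix A.1.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
    return r

def build_sbox():
    """AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv ^ 0x63
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s)
    return sbox

def inv_key_schedule_128(last_round_key):
    """Recover the AES-128 master key from the round-10 key (16 bytes)."""
    sbox = build_sbox()
    rcon = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]
    w = [None] * 44                       # 44 four-byte words, w[40..43] = round key 10
    for j in range(4):
        w[40 + j] = list(last_round_key[4 * j: 4 * j + 4])
    # Forward rule: w[i] = w[i-4] ^ t(w[i-1]); solve it for w[i-4]
    for i in range(43, 3, -1):
        t = w[i - 1]
        if i % 4 == 0:
            t = [sbox[b] for b in t[1:] + t[:1]]   # RotWord then SubWord
            t[0] ^= rcon[i // 4 - 1]
        w[i - 4] = [a ^ b for a, b in zip(w[i], t)]
    return bytes(b for word in w[:4] for b in word)

# FIPS-197 Appendix A.1: round-10 key of 2b7e151628aed2a6abf7158809cf4f3c
round10 = bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")
master = inv_key_schedule_128(round10)
print(master.hex())  # 2b7e151628aed2a6abf7158809cf4f3c
```

Because every step of the AES-128 key expansion is invertible, recovering any single round key in full is equivalent to recovering the master key.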
The CPA attack runs over a selected frame covering the AES computation window (samples 50 to 205, the half-open range(50, 206) in the code), where leakage was confirmed via reverse CPA. No trace preprocessing (resampling, filtering, or re-alignment) was required thanks to synchronous clocking directly from the MSPM0 clock output. Raw traces appear entirely positive-shifted because the MSPM0G3507 exhibits very small, mostly positive current variations during AES execution, and the Nano's AC-coupled input removes the DC component without introducing negative excursions. Figures use DC-removed versions for clarity, and the CPA attack is performed on these DC-removed traces; because CPA is invariant to a constant offset, it produces identical results on the raw traces. The recovered last round key is then inverted through the AES-128 key schedule to recover the original master key. The dataset contains 100,000 traces, and the full key is recovered cleanly from only 45,500.
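The offset-invariance claim follows from the fact that Pearson correlation subtracts the mean of each variable before comparing them. A minimal check on synthetic data is sketched below; it assumes DC removal means subtracting the same constant from a given sample point in every trace, and all numeric values are arbitrary.

```python
import numpy as np

def pearson(model, samples):
    """Pearson correlation between a leakage model and one sample point."""
    m = model - model.mean()
    s = samples - samples.mean()
    return float((m @ s) / np.sqrt((m @ m) * (s @ s)))

rng = np.random.default_rng(0)
n = 2000
model = rng.integers(0, 9, n).astype(np.float64)         # stand-in for per-trace HD values
raw = 0.01 * model + rng.normal(0.0, 0.05, n) + 0.3      # synthetic samples with a positive offset
dc_removed = raw - raw.mean()                            # constant offset removed

same = np.isclose(pearson(model, raw), pearson(model, dc_removed))
print(same)  # True
```

If the DC component were instead removed trace by trace using each trace's own mean, the subtraction would no longer be a single constant and small differences in the correlation could appear.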
Key Takeaways
Know Your Target
Software and hardware AES leak at fundamentally different points. The selection function must model the actual physical event, not the software equivalent.
Register Transitions Leak
Hardware AES leaks when the round result is written back to the state register, with power proportional to bit transitions, modelled as Hamming Distance.
Reverse CPA as Diagnostic
Reverse CPA confirmed that real leakage existed in the traces before the attack, ruling out capture quality issues and locating all 10 rounds precisely.
Synchronous Sampling
Clocking the CW Nano directly from the MSPM0 clock output ensures a fixed phase relationship, eliminating jitter at 1 sample/cycle resolution.
To the best of the author's knowledge, the MSPM0G3507 had no prior documented side-channel attacks, and this is the first publicly documented key extraction from its hardware AES engine.
TI Advisory — TI-PSIRT-2021-100116
This work was not reported to Texas Instruments. Texas Instruments has a general standing advisory — TI-PSIRT-2021-100116 (January 2022) — covering physical security attacks against their silicon, including power analysis and electromagnetic side-channel attacks. TI acknowledges that devices without documented mitigations against specific physical attacks are potentially vulnerable, and assigns a CVSS base score ranging from 4.8 to 6.1 depending on attack complexity and confidentiality impact. The MSPM0G3507 carries no documented mitigations against power analysis. The advisory is publicly available at https://www.ti.com/seclit/ca/swra739/swra739.pdf.
Resources
| Resource | Link |
|---|---|
| ChipWhisperer Nano (CW1101) | chipwhisperer.readthedocs.io |
| MSPM0G3507 | ti.com/product/MSPM0G3507 |
| WeActStudio MSPM0G3507CoreBoard | github.com/WeActStudio/WeActStudio.MSPM0G3507CoreBoard |
| eShard scared — documentation | eshard.gitlab.io/scared |
| eShard scared — source | github.com/eshard/scared |