bedri-zija.github.io  /  research
Breaking Hardware AES on the MSPM0G3507: From Zero to Key
Why the software AES attack does not transfer to hardware AES, what the hardware actually leaks, and how to exploit it.
Tags: CPA, Hardware AES, MSPM0G3507, scared, Hamming Distance, Side-Channel

Background

Side-channel analysis (SCA) exploits physical leakage from a device during cryptographic operations. A device's power consumption depends on the data it processes. Correlation Power Analysis (CPA) exploits this by correlating measured power with predicted, key-dependent intermediate values. Software AES is well studied and straightforward to attack. Hardware AES is a different beast: you have to understand its physical leakage before you can choose the right attack. The MSPM0G3507 itself has no prior published side-channel research.


The Target

Fig 1 — Hardware setup. Target is a WeActStudio MSPM0G3507CoreBoard with onboard USB-to-UART used for communication. The board's decoupling capacitor on the VCORE pin is removed. Approximately 1.5 V is applied externally to overdrive the internal LDO (nominal 1.2 V VCORE). The measurement path is: power supply → 100 nF bypass cap (high side of shunt) → shunt → low side of shunt → VCORE pin. The ChipWhisperer Nano is connected to the low side of the shunt, which is the VCORE node. Green wire: trigger output from the DUT to the Nano trigger input. Blue wire: GND. Yellow wire: clock output from the MSPM0G3507 to the Nano clock input for synchronous 1 sample/cycle capture. One SMA coaxial line routes the analog signal from the shunt node to the CW Nano measurement input; the other is the external power supply feed to the VCORE rail.
Device: Texas Instruments MSPM0G3507
Peripheral: Hardware AES-128 engine
Capture: ChipWhisperer Nano, 8 MSPS, clocked from the MSPM0 clock output (synchronous, 1 sample per clock cycle)
Dataset: 100,000 traces

Traces are captured with the ChipWhisperer Nano and stored in the eStraces format. All analysis is performed with eShard's scared library: leakage confirmation with reverse CPA and the final key recovery. The firmware is minimal. The plaintext is loaded, the hardware encrypts it, and the ciphertext is read back:

C
DL_AES_init(AES, DL_AES_MODE_ENCRYPT_ECB_MODE, DL_AES_KEY_LENGTH_128);
DL_AES_setKey(AES, &g_key[0], DL_AES_KEY_LENGTH_128);

/* Trigger high - SCA capture window start */
DL_GPIO_setPins(GPIO_USER_PINS_PORT, GPIO_USER_PINS_TRIGGER_PIN);

DL_AES_loadDataIn(AES, &g_plaintext[0]);
while (DL_AES_isBusy(AES))
    ;

DL_AES_getDataOut(AES, &g_ciphertext[0]);

/* Trigger low - SCA capture window end */
DL_GPIO_clearPins(GPIO_USER_PINS_PORT, GPIO_USER_PINS_TRIGGER_PIN);

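The trigger wraps the full AES operation, including the ciphertext readout. The key is fixed and loaded once before the trigger, so key loading falls outside the capture window. This mirrors a typical deployment, in which a device encrypts many messages under a single stored key.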

Fig 2 — First trace of the dataset. Complete AES execution including ciphertext readout, captured between the two trigger edges.

Software vs Hardware AES: Different Leakage, Different Attack

A standard CPA attack on software AES targets the last-round SubBytes operation using a Hamming Weight model. This works because, on a CPU, intermediate values produced during AES execution are held in registers or memory and moved across data buses. Those bus transitions leak power roughly in proportion to the Hamming Weight of the value being moved. The last-round SubBytes value is a clean, key-dependent intermediate that can be computed from the ciphertext and a single key-byte guess, which makes it an ideal attack target.

Hardware AES works fundamentally differently. The MSPM0G3507 AES peripheral uses an iterative architecture: each round is processed sequentially by dedicated hardware logic and takes a fixed number of clock cycles. According to the datasheet, AES-128 encryption takes 168 clock cycles. Averaged over the 10 rounds, that is about 16.8 cycles per round (168 / 10), with the initial AddRoundKey counted inside that total. This matches the round spacing observed in the traces. A fully parallel combinational implementation would complete in a handful of cycles and show no observable round structure at all.

Within each round, the operations — SubBytes, ShiftRows, MixColumns, AddRoundKey — are implemented in hardware logic. The output of SubBytes feeds directly into ShiftRows, which feeds into MixColumns, and so on. None of these intermediate values are written to software-visible registers or external data buses. This is the critical difference: a standard SW-based CPA model targeting SubBytes output finds no dominant bus-level leakage to correlate with, because the physical event it models, a value being written across a software-visible data bus, does not occur for these internal states. Combinational logic still produces switching activity, but without a matching leakage model it does not yield a clean, high-SNR correlation peak. The leakage that actually dominates is elsewhere.

| | Software AES (CPU) | Hardware AES (peripheral) |
|---|---|---|
| Execution | Sequential CPU instructions | Iterative hardware: one round per ~16–17 cycles, state written to a register at each round boundary |
| Intermediate values | Written to registers / memory; bus transitions leak | Wired directly between stages; never on a software-visible bus |
| Primary leakage point | SubBytes output written to the data bus → Hamming Weight | State register written at round completion → Hamming Distance |
| Selection function | LastSubBytes() | Custom HD on the last-round state transition |

The key question for hardware AES is: where does a data-dependent value actually get written to storage? The answer is at the round boundary — when the AES engine finishes a round and writes the new state back into the internal state register, overwriting the previous round's state. That register write is the physical leakage event, and it leaks in proportion to how many bits changed — the Hamming Distance between the old state and the new state.
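A toy simulation shows the difference between the two leakage models. The sketch below makes two assumptions. First, the register's power draw is exactly the number of flipped bits plus Gaussian noise. Second, as a simplification, the old and new register contents are independent random bytes; in real AES the new state is a function of the old one. The helper names are illustrative and not part of scared.

```python
import numpy as np

# HW_TABLE[v] = number of set bits in byte v
HW_TABLE = np.array([bin(v).count("1") for v in range(256)], dtype=np.float64)


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))


def simulate_register_write(n_traces=5000, noise_sigma=1.0, seed=0):
    """Toy register: power = bits flipped by the write + Gaussian noise."""
    rng = np.random.default_rng(seed)
    old = rng.integers(0, 256, n_traces)  # register content before the write
    new = rng.integers(0, 256, n_traces)  # value written (assumed independent of old)
    power = HW_TABLE[old ^ new] + rng.normal(0.0, noise_sigma, n_traces)
    corr_hd = pearson(power, HW_TABLE[old ^ new])  # Hamming Distance model
    corr_hw = pearson(power, HW_TABLE[new])        # Hamming Weight of the new value only
    return corr_hd, corr_hw


corr_hd, corr_hw = simulate_register_write()
print(f"HD model: {corr_hd:.3f}, HW model: {corr_hw:.3f}")
```

When the old value is uniform, each bit of old ⊕ new is independent of the matching bit of new. The Hamming Weight of the written value therefore carries essentially no information about this leakage, while the Hamming Distance model captures almost all of it.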


Confirming Leakage: Reverse CPA

Before launching a full attack, it is critical to confirm exploitable leakage and locate it precisely. Test Vector Leakage Assessment (TVLA) is the usual formal method. It is model-agnostic and uses Welch's t-test to check, sample by sample, whether two sets of traces differ statistically. In its common fixed-vs-random form, the key is held fixed and traces taken with one fixed plaintext are compared against traces taken with random plaintexts. |t| > 4.5 is the conventional leakage threshold.
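TVLA is not used in this work, but the statistic behind its fixed-vs-random test is short enough to sketch. The example uses synthetic data and a hypothetical welch_t helper. Real TVLA campaigns add details, such as interleaved acquisition and repeated tests, that are omitted here.

```python
import numpy as np

TVLA_THRESHOLD = 4.5  # conventional |t| threshold used by TVLA


def welch_t(fixed_traces, random_traces):
    """Per-sample Welch's t-statistic between two trace sets of shape (n, n_samples)."""
    m_f, m_r = fixed_traces.mean(axis=0), random_traces.mean(axis=0)
    v_f, v_r = fixed_traces.var(axis=0, ddof=1), random_traces.var(axis=0, ddof=1)
    n_f, n_r = len(fixed_traces), len(random_traces)
    return (m_f - m_r) / np.sqrt(v_f / n_f + v_r / n_r)


# Synthetic example: sample 3 of the fixed-plaintext set carries a small mean shift
rng = np.random.default_rng(0)
fixed = rng.normal(0.0, 1.0, (2000, 10))
random_set = rng.normal(0.0, 1.0, (2000, 10))
fixed[:, 3] += 0.5

t = welch_t(fixed, random_set)
leaking = np.flatnonzero(np.abs(t) > TVLA_THRESHOLD)
print("samples flagged as leaking:", leaking)
```

The t-statistic flags which samples differ. It does not say which intermediate value causes the difference.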

In this work, Reverse CPA is used instead. Because it explicitly uses the known key, Reverse CPA does more than confirm that leakage is present. It maps the leakage across the time domain of the AES execution and ties each peak to a specific intermediate value. For a previously undocumented target like the MSPM0G3507, this "leakage map" is more useful for attack timing than TVLA's verdict, which flags leaking samples without identifying the intermediate responsible.

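Reverse CPA computes exact intermediate values from the known key and correlates them against every sample point in the trace. This is not an attack, since it "cheats" with the key. It does, however, answer the two critical questions: is there exploitable leakage at all, and where in the trace does each intermediate leak?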

The resulting time-domain correlation peaks reveal the most vulnerable points in the hardware AES execution, guiding the CPA attack and providing the resolution needed to target individual clock cycles.
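At its core, Reverse CPA is a Pearson correlation between a leakage model of the known intermediate values and every sample column. The plain NumPy sketch below shows that computation under a Hamming Weight model. It is an illustration, not scared's implementation, and reverse_cpa is a hypothetical helper. The synthetic traces leak word 0 at sample 5 and word 1 at sample 12.

```python
import numpy as np

HW_TABLE = np.array([bin(v).count("1") for v in range(256)], dtype=np.float64)


def reverse_cpa(intermediates, traces):
    """Correlate HW(known intermediate) with every trace sample.

    intermediates: (n_traces, n_words) uint8 values computed with the known key
    traces:        (n_traces, n_samples) float samples
    returns:       (n_words, n_samples) matrix of Pearson correlations
    """
    model = HW_TABLE[intermediates]            # leakage prediction per word
    model = model - model.mean(axis=0)
    centered = traces - traces.mean(axis=0)
    cov = model.T @ centered                   # (n_words, n_samples)
    norm = np.outer(np.sqrt((model ** 2).sum(axis=0)),
                    np.sqrt((centered ** 2).sum(axis=0)))
    return cov / norm


# Synthetic check: word 0 leaks at sample 5, word 1 at sample 12
rng = np.random.default_rng(0)
n_traces, n_samples = 3000, 20
intermediates = rng.integers(0, 256, size=(n_traces, 2), dtype=np.uint8)
traces = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
traces[:, 5] += HW_TABLE[intermediates[:, 0]]
traces[:, 12] += HW_TABLE[intermediates[:, 1]]

corr = reverse_cpa(intermediates, traces)
print(np.argmax(np.abs(corr), axis=1))  # sample index of the strongest peak per word
```

Swapping the intermediate (plaintext, ciphertext, or S-box output at a given round) and repeating this computation produces the per-step maps shown in the figures.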

Step 1: Plaintext and ciphertext correlation

The first check correlates the raw plaintext and ciphertext bytes against the trace using identity selection functions. This immediately shows where in the trace the input data is processed and where the output appears, without any key knowledge required.

Python
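import estraces
import scared

# Hypothetical file name: the dataset path is not given in this post
ths = estraces.read_ths_from_ets_file("traces.ets")

@scared.reverse_selection_function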
def identity_plaintext(plaintext):
    return plaintext

rev_plaintext = scared.CPAReverse(
    selection_function=identity_plaintext,
    model=scared.HammingWeight()
)

@scared.reverse_selection_function
def identity_ciphertext(ciphertext):
    return ciphertext

rev_ciphertext = scared.CPAReverse(
    selection_function=identity_ciphertext,
    model=scared.HammingWeight()
)

container = scared.Container(ths)
rev_plaintext.run(container)
rev_ciphertext.run(container)
Fig 3 — Reverse CPA: plaintext correlation (blue) peaks strongly at the start of the AES window; ciphertext correlation (red) shows a clear peak at the end of the window, confirming the ciphertext readout is captured within the trigger frame.

Step 2: Per-round S-box output correlation

The second check uses the known key to compute the S-box output at each of the 10 rounds and correlates it against the trace segment of interest. This requires key knowledge but precisely locates where each round leaks, giving a map of the AES execution within that segment.

Python
import numpy as np
from scared import aes

def make_sbox_sf(round_num):
    def sbox_output(plaintext, key):
        return aes.encrypt(plaintext, key,
                          at_round=round_num,
                          after_step=aes.Steps.SUB_BYTES)
    sbox_output.__name__ = f'sbox_output_round_{round_num}'
    return scared.reverse_selection_function(sbox_output)

results_per_round = {}
for round_num in range(1, 11):
    sf = make_sbox_sf(round_num)
    rev = scared.CPAReverse(
        selection_function=sf,
        model=scared.HammingWeight()
    )
    container = scared.Container(ths)
    rev.run(container)
    results_per_round[round_num] = np.max(np.abs(rev.results), axis=0)
Fig 4 — Reverse CPA S-box output correlation per round. All 10 rounds are clearly separated and locatable across the trace. Peak correlations reach ~0.05–0.06, confirming real leakage.

All 10 rounds are clearly visible and well separated, with peak correlations of about 0.05 to 0.06. The leakage is real and strong enough to attack. This step matters because it rules out trace-quality or alignment problems before any effort goes into the attack itself.
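As a rough sanity check on these peaks, a common rule of thumb treats a correlation as significant when it exceeds about 4/√N. That is the band that noise-only correlations rarely leave for N traces. For this dataset:

```latex
\rho_{\mathrm{thr}} \approx \frac{4}{\sqrt{N}} = \frac{4}{\sqrt{100\,000}} \approx 0.0126,
\qquad
\frac{0.05}{0.0126} \approx 4.0, \quad \frac{0.06}{0.0126} \approx 4.8
```

The observed peaks therefore sit roughly four to five times above this noise band. This is a heuristic, not a formal hypothesis test.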


The Selection Function: HD on the Last Round State Transition

What the hardware leaks at the last round

The last round of AES-128 encryption performs SubBytes, ShiftRows, and AddRoundKey — no MixColumns. When this round completes, the AES peripheral writes the resulting ciphertext back into the same internal state register that held the last round input. This register write is the leakage point.

The power consumed during that write is modelled as proportional to the number of bits that change, i.e. a Hamming Distance. The model used here takes that distance across the last-round SubBytes. It works backwards from the ciphertext with a per-byte key hypothesis k:

Last round state transition
state_after_sub_and_sr = ciphertext ⊕ k                            ← undo AddRoundKey
state_before_subbytes = InvSubBytes( state_after_sub_and_sr )  ← undo SubBytes

Leakage = HD( state_after_sub_and_sr, state_before_subbytes )
         = HW( state_after_sub_and_sr ⊕ state_before_subbytes )

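This is the Hamming Distance across the last-round SubBytes. For each byte position j, the model compares two values: the S-box output that ends up at position j (ciphertext ⊕ k, i.e. before the final AddRoundKey) and the S-box input that produced it.

ShiftRows is a byte permutation, so it only decides where each S-box input/output pair lands. Because both values are indexed by the same ciphertext position, the permutation drops out and every prediction depends on a single key byte, as byte-wise CPA requires.

Strictly, state_before_subbytes is therefore ShiftRows(state before SubBytes). The model counts the bit flips between each S-box input and its output, not between the old and new contents of a fixed register byte; a position-exact register model would also have to undo ShiftRows. The XOR of the two values is the exact bit-difference vector, and counting its set bits with HammingWeight() gives the HD directly.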
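To make the distinction concrete, the sketch below evaluates both per-byte predictions on the FIPS-197 Appendix B example. That example's master key, 2b7e151628aed2a6abf7158809cf4f3c, is the same key recovered later in this post.

- sbox_hd_model mirrors last_round_subbytes_hd for a single trace.
- register_hd_model is a hypothetical position-exact alternative, shown only for comparison; it was not used in this work.

The S-box is computed from its definition rather than taken from scared.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """AES S-box: GF(2^8) inverse (0 maps to 0) followed by the affine transform."""
    inverse = [0] * 256
    for a in range(1, 256):
        inverse[a] = next(b for b in range(1, 256) if gf_mul(a, b) == 1)
    return [x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63
            for x in inverse]


SBOX = build_sbox()
INV_SBOX = [0] * 256
for value, image in enumerate(SBOX):
    INV_SBOX[image] = value


def hw(x):
    return bin(x).count("1")


def sbox_hd_model(ciphertext, round10_key):
    """Per-byte HD across the last SubBytes (same quantity as last_round_subbytes_hd)."""
    return [hw((c ^ k) ^ INV_SBOX[c ^ k]) for c, k in zip(ciphertext, round10_key)]


def register_hd_model(ciphertext, round10_key):
    """Hypothetical position-exact model: HD(round-10 input state, ciphertext) per byte.

    Byte index i = row + 4 * column, as in FIPS-197.
    """
    x = [c ^ k for c, k in zip(ciphertext, round10_key)]  # undo AddRoundKey
    before = []
    for i in range(16):
        row, col = i % 4, i // 4
        # The pre-ShiftRows byte at (row, col) sits at (row, (col - row) mod 4) afterwards
        before.append(INV_SBOX[x[row + 4 * ((col - row) % 4)]])
    return [hw(b ^ c) for b, c in zip(before, ciphertext)]


# FIPS-197 Appendix B: ciphertext and round-10 key for master key 2b7e151628aed2a6abf7158809cf4f3c
ciphertext = bytes.fromhex("3925841d02dc09fbdc118597196a0b32")
round10_key = bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")
print(sbox_hd_model(ciphertext, round10_key)[0])      # 1: HW(0xe9 ^ 0xeb)
print(register_hd_model(ciphertext, round10_key)[0])  # 4: HW(0xeb ^ 0x39)
```

Byte 0 does not move under ShiftRows, yet the two models still disagree (1 bit versus 4 bits). One compares an S-box input with its output; the other compares the old and new contents of the register byte. For bytes in rows 1 to 3, the register model's prediction also depends on a key byte from a different column.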

Why HammingWeight() is the correct model

Hamming Distance between two values A and B is mathematically identical to the Hamming Weight of their XOR:

Mathematical identity
HD( A, B ) = HW( A ⊕ B )

For each byte, the selection function computes the XOR between the last-round S-box input and its output and returns that XOR vector. The HammingWeight() model then counts the set bits, which gives the Hamming Distance. The two work as a pipeline: the selection function computes what to measure, and the model computes how much it leaked. Together they approximate the dominant data-dependent switching inside the AES peripheral during the last round.

Python
@scared.attack_selection_function
def last_round_subbytes_hd(ciphertext, guesses):
    # Computes HD across the SubBytes operation in the final round.
    # state_after_sub_and_sr = ciphertext XOR guess  (undo final AddRoundKey)
    # state_before_subbytes  = InvSubBytes(state_after_sub_and_sr)
    # HD = HW(state_after_sub_and_sr XOR state_before_subbytes)
    res = np.empty((len(ciphertext), len(guesses), 16), dtype=np.uint8)
    for i, guess in enumerate(guesses):
        state_after_sub_and_sr = ciphertext ^ guess
        state_before_subbytes  = scared.aes.INV_SBOX[state_after_sub_and_sr]
        res[:, i, :] = np.bitwise_xor(
            state_after_sub_and_sr,
            state_before_subbytes
        )
    return res

# HammingWeight() counts set bits in the XOR vector = HD(input, output)
att = scared.CPAAttack(
    selection_function=last_round_subbytes_hd,
    model=scared.HammingWeight(),
    discriminant=scared.maxabs
)

This selection function targets the Hamming Distance across the SubBytes operation in the final round, between each S-box input byte and its output byte. The full round-boundary register write compares the old and new contents of each register position, which would also require undoing ShiftRows. The model here leaves that permutation out, so each prediction depends on one key byte and scared's byte-independent CPA applies directly. SubBytes, with its nonlinear S-box logic, is expected to dominate switching activity in iterative hardware AES. In practice this model produced the strongest localized leakage peak in reverse CPA (around sample 200) and enabled full key recovery from 45,500 traces.
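The attack can be rehearsed on simulated data before touching real traces. The NumPy sketch below assumes that the leakage of one key byte is exactly the HD-across-SubBytes model plus Gaussian noise, with σ = 0.5 chosen arbitrarily. It then runs a single-byte CPA over all 256 guesses, scoring each guess by absolute correlation in the spirit of scared.maxabs.

This is a toy model: real traces have far lower SNR, which is why the actual attack needed 45,500 traces. The helper names are hypothetical.

```python
import numpy as np


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_inv_sbox():
    """Inverse AES S-box as a uint8 lookup table, derived from the S-box definition."""
    inverse = [0] + [next(b for b in range(1, 256) if gf_mul(a, b) == 1)
                     for a in range(1, 256)]
    sbox = [x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63
            for x in inverse]
    inv = np.zeros(256, dtype=np.uint8)
    inv[sbox] = np.arange(256, dtype=np.uint8)
    return inv


INV_SBOX = build_inv_sbox()
HW = np.array([bin(v).count("1") for v in range(256)], dtype=np.float64)


def hd_prediction(ciphertext_bytes, guess):
    """HW(x ^ InvSbox(x)) with x = ciphertext byte ^ key guess (HD across SubBytes)."""
    x = ciphertext_bytes ^ np.uint8(guess)
    return HW[x ^ INV_SBOX[x]]


def cpa_single_byte(ciphertext_bytes, leakage):
    """Absolute Pearson correlation for each of the 256 key-byte guesses."""
    leak = leakage - leakage.mean()
    scores = np.empty(256)
    for guess in range(256):
        pred = hd_prediction(ciphertext_bytes, guess)
        pred = pred - pred.mean()
        scores[guess] = abs((pred * leak).sum()) / np.sqrt((pred ** 2).sum() * (leak ** 2).sum())
    return scores


rng = np.random.default_rng(42)
TRUE_KEY_BYTE = 0xD0  # stand-in secret (byte 0 of the round-10 key recovered in this post)
ciphertext_bytes = rng.integers(0, 256, 2000, dtype=np.uint8)
leakage = hd_prediction(ciphertext_bytes, TRUE_KEY_BYTE) + rng.normal(0.0, 0.5, 2000)

scores = cpa_single_byte(ciphertext_bytes, leakage)
print(f"best guess: {int(np.argmax(scores)):#04x}, correlation {scores.max():.2f}")
```

Wrong guesses produce predictions f(x ⊕ δ) that, because of the S-box nonlinearity, are only partially correlated with the true f(x). The correct guess therefore stands out.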


The Attack

Python
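N_TRACES = 45_500         # traces used for the successful key recovery
frame = range(50, 206)    # samples 50..205 (range end is exclusive)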
container = scared.Container(ths[:N_TRACES], frame=frame)
att.run(container)

last_round_key = np.argmax(att.scores, axis=0).astype(np.uint8)
original_key   = scared.aes.inv_key_schedule(last_round_key, round_in=10)[0][0]

The CPA attack runs over a selected frame covering the AES computation window, range(50, 206), i.e. samples 50 to 205, where reverse CPA confirmed leakage. Apart from DC removal, no trace preprocessing (resampling, filtering, or re-alignment) was required, because the Nano is clocked synchronously from the MSPM0 clock output.

Raw traces appear entirely positive-shifted. The MSPM0G3507 shows very small, mostly positive current variations during AES execution, and the Nano's AC-coupled input removes the DC component without introducing negative excursions. The figures use DC-removed traces for clarity, and the CPA attack is also run on these DC-removed traces. Since Pearson correlation is invariant to a constant offset, the attack gives identical results on the raw traces.

The recovered last round key is then inverted through the AES-128 key schedule to recover the original master key. The dataset contains 100,000 traces, and the full key is recovered cleanly from the first 45,500.
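The offset-invariance argument follows from the centering step inside Pearson correlation. The sketch below checks it on synthetic data. It assumes "DC removal" means subtracting one global value from every sample. Subtracting each trace's own mean is not a constant shift across traces and can change the correlations slightly. The offset of 0.75 and the helper name are illustrative.

```python
import numpy as np


def pearson_per_sample(model, traces):
    """Pearson correlation between a model vector and each column of traces."""
    m = model - model.mean()
    t = traces - traces.mean(axis=0)
    return (m @ t) / (np.sqrt((m ** 2).sum()) * np.sqrt((t ** 2).sum(axis=0)))


rng = np.random.default_rng(1)
n_traces, n_samples = 1000, 8
model = rng.integers(0, 9, n_traces).astype(np.float64)  # stand-in HD predictions (0..8)
traces = rng.normal(0.0, 1.0, (n_traces, n_samples))
traces[:, 3] += 0.2 * model                             # one leaking sample

raw = traces + 0.75            # illustrative constant positive offset
dc_removed = raw - raw.mean()  # subtract a single global DC value

corr_raw = pearson_per_sample(model, raw)
corr_dc = pearson_per_sample(model, dc_removed)
print(np.allclose(corr_raw, corr_dc))  # True: the offset cancels when each column is centered
```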

Fig 5 — Terminal output. Full AES-128 key recovered from 45,500 traces. Key correct: True.
N_TRACES : 45,500
Last round key : d0 14 f9 a8 c9 ee 25 89 e1 3f 0c c8 b6 63 0c a6
Original key : 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c
Key correct : True ✓
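The final inversion step can be checked without scared. The pure-Python sketch below is a hypothetical invert_aes128_key_schedule helper, not scared's inv_key_schedule. It runs the FIPS-197 AES-128 key expansion backwards, from the round-10 words w[40..43] to w[0..3]. The S-box is rebuilt from its definition so the block is self-contained. Given the last round key above, it returns the original key shown in the terminal output.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


inverse = [0] + [next(b for b in range(1, 256) if gf_mul(a, b) == 1) for a in range(1, 256)]
SBOX = [x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63 for x in inverse]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def invert_aes128_key_schedule(last_round_key):
    """Recover the AES-128 master key from the round-10 key."""
    if len(last_round_key) != 16:
        raise ValueError("expected a 16-byte round key")
    w = [None] * 44  # 44 four-byte words; w[4r .. 4r+3] is round key r
    for j in range(4):
        w[40 + j] = list(last_round_key[4 * j:4 * j + 4])
    for i in range(43, 3, -1):
        temp = w[i - 1]
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # SubWord(RotWord(w[i-1]))
            temp[0] ^= RCON[i // 4 - 1]                    # XOR with Rcon[i/4]
        # Forward rule is w[i] = w[i-4] ^ temp, so w[i-4] = w[i] ^ temp
        w[i - 4] = [a ^ b for a, b in zip(w[i], temp)]
    return bytes(b for word in w[:4] for b in word)


master = invert_aes128_key_schedule(bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6"))
print(master.hex(" "))  # 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c
```

Working backwards is possible because each expansion step is an XOR with a word that is already known. For that reason, recovering any single round key of AES-128 is equivalent to recovering the master key.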
Fig 6 — CPA correlation for best guess per byte. All 16 bytes converge sharply at the same sample point (~sample 200), confirming a clean single-point last-round leakage.

Key Takeaways

Know Your Target

Software and hardware AES leak at fundamentally different points. The selection function must model the actual physical event, not the software equivalent.

Register Transitions Leak

Hardware AES leaks when the round result is written back to the state register, with power proportional to bit transitions, modelled as Hamming Distance.

Reverse CPA as Diagnostic

Reverse CPA confirmed that real leakage was present in the traces before the attack. This ruled out capture-quality issues and located all 10 rounds precisely.

Synchronous Sampling

Clocking the CW Nano directly from the MSPM0 clock output ensures a fixed phase relationship, eliminating jitter at 1 sample/cycle resolution.

To the best of the author's knowledge, the MSPM0G3507 had no prior documented side-channel attacks, and this is the first publicly documented key extraction from its hardware AES engine.


TI Advisory — TI-PSIRT-2021-100116

This work was not reported to Texas Instruments. Texas Instruments has a general standing advisory — TI-PSIRT-2021-100116 (January 2022) — covering physical security attacks against their silicon, including power analysis and electromagnetic side-channel attacks. TI acknowledges that devices without documented mitigations against specific physical attacks are potentially vulnerable, and assigns a CVSS base score ranging from 4.8 to 6.1 depending on attack complexity and confidentiality impact. The MSPM0G3507 carries no documented mitigations against power analysis. The advisory is publicly available at https://www.ti.com/seclit/ca/swra739/swra739.pdf.


Resources

ChipWhisperer Nano (CW1101) chipwhisperer.readthedocs.io
MSPM0G3507 ti.com/product/MSPM0G3507
WeActStudio MSPM0G3507CoreBoard github.com/WeActStudio/WeActStudio.MSPM0G3507CoreBoard
eShard scared — documentation eshard.gitlab.io/scared
eShard scared — source github.com/eshard/scared