White Paper
Hardware Security
for AI Accelerators
© Rambus Inc.
Introduction
The rapid growth of artificial intelligence and machine learning (AI/ML) in applications spanning
virtually every industry is driving the development of dedicated accelerator hardware, and its
broad deployment across data centers and the network edge. The virtuous cycle of greater AI/ML
processing power enabling new applications which then spur demand for more processing power
is in full swing.
The accelerating value creation of AI/ML combined with its wider deployment raises the
motivation for and the risk of attacks. Protecting AI/ML assets, be they hardware, software
or data, is increasingly mission critical. This white paper discusses the threats, fundamental
security techniques, and use cases that illustrate how these assets can be safeguarded.
AI Assets
For our use cases, we’ll be examining both edge and data center (server) devices. An edge device
may contain an ML accelerator for local inference. Alternatively, it may be a simple edge device
which collects inputs and transmits this data to the cloud for inferencing. Our example server
device has an accelerator card with a dedicated ML accelerator chip. This could be performing
training, or inference in the cloud as a counterpart to the simple edge device.
Across these devices, the AI assets needing protection start with the AI accelerator. Attackers
could attempt to tamper with the accelerator hardware to deny its usage or bypass its security.
This is particularly true in the case of a smart edge device that’s outside the hardened data
center environment. In addition, both edge and server ML accelerators can be subject to
attempts to tamper with their firmware, again, to deny usage or bypass security.
Training data is another important asset. An attacker can tamper with training data to distort
the resulting model. This is called an AI poisoning attack. In addition, an attacker could attempt
to steal the training data. For our use cases, we assume that training occurs only in the data
center.
The inference model is another one of our assets. An attacker could modify or replace the
inference model to induce incorrect behavior, or they could attempt to steal the inference model
itself. An attacker could also tamper with the input data to produce misclassifications. Attackers
could steal the input data, which could have privacy implications. Finally, the inference results
themselves require protection: they drive actions in the physical world and, if tampered with,
could cause property loss, injury, or loss of life.
For edge devices, we assume attackers can run malicious firmware on the CPU and can read the
contents of SRAM. Attackers can monitor, intercept and change network traffic, and they can mount
physical side-channel and fault attacks on the device, such as power analysis, EM analysis, and
fault injection.
The threat model for servers and their ML accelerator cards differs from that of edge devices
in that servers are in a hardened data center environment. For our data center use cases, we
assume attackers won’t have physical access to the hardware. However, potential threats are still
numerous and complex.
Attackers can subvert the host CPU hypervisor and access any process or memory region. They
can read the flash in both the host and the accelerator, as well as the contents of SSDs. Beyond
reading, attackers could attempt to change the contents of flash or SSDs. Attackers could
run malicious software on the host CPU and malicious firmware on the accelerator CPU. Further,
attackers can read SRAM and DRAM contents on the host and the accelerator. They can monitor,
intercept, and change network and bus traffic.
Security Measures
In our use cases, we’ll be employing a number of security techniques to safeguard AI/ML assets.
These are described in the paragraphs below:
Encryption: Unencrypted data (plaintext) is converted to encrypted data (ciphertext) using an
encryption algorithm and a secret key. Changing the key alters the ciphertext generated by the
encryption algorithm, and knowledge of the key is needed to decrypt the ciphertext back to
plaintext. Modern block ciphers make recovering the plaintext without the key practically
infeasible by brute force. AES (Advanced Encryption Standard) is an example of an encryption
algorithm.
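As a rough illustration of the key-dependence property described above, the toy sketch below builds a keystream from SHA-256 and XORs it with the plaintext. This is not AES and offers no real security; it only shows that changing the key changes the ciphertext, and that the key is needed to recover the plaintext:

```python
import hashlib

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Derive a keystream by hashing the key with a counter (toy construction,
    # NOT a real cipher such as AES -- for illustration only).
    keystream = b""
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

# XOR stream constructions are symmetric: decryption is the same operation.
toy_decrypt = toy_encrypt

msg = b"model weights v1"
ct1 = toy_encrypt(b"key-A", msg)
ct2 = toy_encrypt(b"key-B", msg)
assert ct1 != ct2                          # changing the key changes the ciphertext
assert toy_decrypt(b"key-A", ct1) == msg   # the correct key recovers the plaintext
```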
Hashing: A hash function creates a fixed-length “fingerprint” for any set of data (message). Quick
to compute, a hash function is deterministic, so the same input produces the same output hash
value. Comparing the hash of a message with an earlier hash can determine if the message is
unaltered. Matching hashes confirm the integrity of the message. Properties of an ideal hash
function are that it is infeasible to find two data sets that produce the same fingerprint, and
that it is infeasible to determine the message from the hash. SHA-2 and SHA-3 (Secure Hash
Algorithm) are examples of hash functions.
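The fingerprint properties above can be demonstrated with Python's standard hashlib module (the message contents are made up for illustration):

```python
import hashlib

message = b"training-batch-0042"
fingerprint = hashlib.sha256(message).hexdigest()

# Deterministic: hashing the same message again yields the same fingerprint.
assert hashlib.sha256(message).hexdigest() == fingerprint

# Fixed length: a SHA-256 digest is always 256 bits (64 hex characters).
assert len(fingerprint) == 64

# Any alteration changes the hash, revealing tampering.
tampered = b"training-batch-0043"
assert hashlib.sha256(tampered).hexdigest() != fingerprint
```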
Signing: In symmetric-key authentication, signer and verifier share a secret key that is used to
cryptographically create a signature. The signature is sent with the message. Upon receipt, the
verifier recomputes the signature using the key; if it matches the one sent, the authenticity of
the message is confirmed. HMAC (Keyed-Hash Message Authentication Code) is one of the most
commonly used signing algorithms.
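A minimal sketch of symmetric signing and verification with Python's standard hmac module (the key and messages are illustrative):

```python
import hmac
import hashlib

shared_key = b"pre-provisioned secret"   # known only to signer and verifier
message = b"inference result: class=7"

# Signer computes the tag and sends it alongside the message.
tag = hmac.new(shared_key, message, hashlib.sha256).digest()

# Verifier recomputes the tag and compares in constant time.
ok = hmac.compare_digest(tag, hmac.new(shared_key, message, hashlib.sha256).digest())
assert ok

# A modified message produces a different tag and fails verification.
forged_tag = hmac.new(shared_key, b"inference result: class=2", hashlib.sha256).digest()
assert not hmac.compare_digest(tag, forged_tag)
```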
System Monitoring: A set of processes which check the operation of system hardware and
software against a set of normal parameters. Operation outside the norm signals the system may
be under attack.
The hardware root of trust (HRT) is the foundation for security. It contains the keys and other
secure data needed for authentication and encryption. A baseline function of the root of trust is
verifying that we have authentic, untampered boot code with which to start up the system. Ideally,
the HRT should be purpose-built for security, with complexity minimized so it can be hardened
against attack. It siloes cryptographic operations away from main processing, so the main
processor(s) in the ML accelerator can be optimized for performance while the HRT is kept simple
for security.
Further, it should offer a full feature set for executing complex algorithms and cryptographic
protocols. Using a layered security model, it should provide the robust security of hardware with
the flexibility of software. And it should include strong anti-tamper features to guard against side-
channel attacks such as differential power analysis or fault injection.
Use Cases
Protecting ML Accelerator Availability: Firmware
In our first use case, we guard against an attacker attempting to tamper with the ML accelerator
firmware to deny or disrupt usage or bypass security. The solution entails secure boot and
firmware protection.
[Figure: Accelerator card — ASIC containing a CPU, ML accelerator and hardware root of trust, with DRAM, flash and a PCIe interface. Numbered callouts mark the secure boot and firmware protection flow.]
The HRT can also monitor firmware updates in a similar manner, and can provide rollback
protection using one-time programmable (OTP) memory.
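One way OTP-based rollback protection can work is a monotonic minimum-version counter: because OTP bits can only be programmed once, the recorded version can only move forward, so firmware older than that version is rejected at boot. A simplified model of the idea (the class and its interface are illustrative, not a real HRT API):

```python
class OtpVersionCounter:
    """Models a monotonic anti-rollback counter held in OTP memory.

    Real OTP bits can only be burned once, so the stored minimum
    version can only increase; it can never be rolled back.
    """
    def __init__(self) -> None:
        self._min_version = 0

    def advance(self, version: int) -> None:
        # OTP semantics: the recorded version only ever moves forward.
        self._min_version = max(self._min_version, version)

    def accepts(self, version: int) -> bool:
        # Firmware older than the recorded minimum version is refused.
        return version >= self._min_version

otp = OtpVersionCounter()
otp.advance(3)             # the device has run firmware v3

assert otp.accepts(3)      # current version still boots
assert otp.accepts(4)      # upgrades are allowed
assert not otp.accepts(2)  # rollback to a vulnerable v2 is refused
```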
Protecting ML Accelerator Availability: Hardware
Now let's look at protecting the ML accelerator hardware from a tamper attack aiming to deny
usage or bypass security. The solution in this case is system monitoring.
[Figure: Edge device — SoC containing a CPU, ML accelerator and hardware root of trust, with SRAM, flash, sensors, actuators, U/I, network and memory interfaces. Numbered callouts correspond to the monitoring steps below.]
1. The HRT can monitor system status and memory contents and detect tampering activity.
The HRT can also detect attacks like fault injection. It can monitor test and debug
logic, hardware configuration, and other hardware status in the SoC using dedicated
connections into the ML accelerator logic.
2. It can also monitor the ML accelerator operation, ensuring it’s operating only when
expected.
3. It can periodically hash known SRAM state to detect tampering.
4. It can periodically hash invariant flash data and ensure it’s not changing.
5. Internal logic in the HRT can detect physical attacks like fault injection.
6. The HRT also monitors network traffic, looking for anomalous traffic that might indicate
an attack or a compromised software stack.
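Steps 3 and 4 above, periodic hashing of invariant memory against a recorded baseline, can be sketched as follows (the memory contents are made up for illustration):

```python
import hashlib

def region_digest(region: bytes) -> bytes:
    # Fingerprint a memory region with SHA-256.
    return hashlib.sha256(region).digest()

# At provisioning time, record the hash of invariant flash contents.
flash = bytearray(b"\x00" * 16 + b"BOOTCODE" + b"\xff" * 8)
baseline = region_digest(bytes(flash))

# Periodic check: unchanged memory still matches the baseline.
assert region_digest(bytes(flash)) == baseline

# Simulated tampering flips one byte; the next periodic check detects it.
flash[20] ^= 0x01
assert region_digest(bytes(flash)) != baseline
```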
Protecting Training Data
In this use case, we provide protection against both tampering and theft of training data.
The solution signs and authenticates training data before its use, and keeps training data
encrypted when not in use to protect it from theft.
[Figure: Host — CPU, DRAM, SSD and network interface — connected over PCIe to an accelerator card whose ASIC contains a CPU, ML accelerator and hardware root of trust, with DRAM and flash. Numbered callouts correspond to the steps below.]
1. The signed, encrypted training data is stored in the SSD on the host. Even if the host is
compromised, decrypted training data is never present on the host and is protected from
theft.
2. The signed, encrypted training data is sent over the PCIe interface to the accelerator
card and stored in DRAM.
3. The HRT decrypts the training data and hashes the encrypted data.
4. It then verifies the signature and compares hashes.
5. If the hashes match, the verified training data is sent to the ML accelerator.
Protecting the Inference Model
In this use case, we use similar techniques to protect the inference model from tampering,
replacement, or theft. Once training is complete, the inference model should be signed and
encrypted.
[Figure: Edge device — SoC containing a CPU, ML accelerator and hardware root of trust, with SRAM, flash, sensors, actuators, U/I, network and memory interfaces. Numbered callouts mark the inference model protection flow.]
Protecting Input Data Integrity
The integrity of input data must also be protected. This can be done by authenticating
communication with the source of the input data.
[Figure: Simple edge device — SoC with a CPU, hardware root of trust, SRAM, sensors, actuators and U/I — connected over the network to a host, which bridges over PCIe to an accelerator card whose ASIC contains a CPU, ML accelerator and hardware root of trust, with DRAM and flash. Numbered callouts correspond to the steps below.]
Here we examine the case of a simple edge device that doesn't have an ML accelerator. Instead,
it relies on inference being done on a server in the cloud.
1. In this case, an HRT in the edge device and an HRT in the accelerator can mutually
authenticate and provide a secure communication channel to protect the input data
integrity.
2. The host communicates with the edge device over the network interface and bridges the
connection over PCIe, enabling the two HRTs to communicate.
3. A mutual authentication protocol such as MACsec using pre-provisioned keys and IDs
ensures the edge device is legitimate as is the server. All input data going from the edge
device to the accelerator passes through the secure channel with data encryption.
4. Data from sensors on the edge device is subjected to an integrity check to ensure that
the input data is not tampered with when transmitted to the AI accelerator.
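A sketch of the per-frame integrity check over the authenticated channel; here a sequence number is bound into the tag so that replayed frames are rejected as well (the framing, key and readings are assumptions for illustration):

```python
import hmac
import hashlib

channel_key = b"session key from mutual authentication"  # assumed already established

def send(seq: int, reading: bytes) -> tuple[int, bytes, bytes]:
    # Bind a sequence number into the tag so replayed frames fail too.
    tag = hmac.new(channel_key, seq.to_bytes(8, "big") + reading,
                   hashlib.sha256).digest()
    return seq, reading, tag

def receive(seq: int, reading: bytes, tag: bytes, expected_seq: int) -> bool:
    if seq != expected_seq:          # out-of-order or replayed frame
        return False
    good = hmac.new(channel_key, seq.to_bytes(8, "big") + reading,
                    hashlib.sha256).digest()
    return hmac.compare_digest(tag, good)

frame = send(1, b"temp=21.5C")
assert receive(*frame, expected_seq=1)                           # intact frame accepted
assert not receive(1, b"temp=99.9C", frame[2], expected_seq=1)   # altered reading rejected
assert not receive(*frame, expected_seq=2)                       # replayed frame rejected
```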
Protecting Input Data Confidentiality
Since input data can carry confidential information, it too must be safeguarded to protect the
privacy of users. A single host may handle multiple workloads, and compromising one application
could give an attacker access to another workload's private data if it is not protected. The
solution is to encrypt input data in transit to the accelerator.
[Figure: Host connected over PCIe to an accelerator card — ASIC with a CPU, ML accelerator and hardware root of trust, plus DRAM and flash; encrypted input data arrives over the host's network interface. Numbered callouts correspond to the steps below.]
1. Data communicated over the network interface can be encrypted using keys mutually
established between the sending device and HRT in the accelerator.
2. Each workload can have a different key, managed by the HRT.
3. The encrypted input data is sent from the host to the accelerator. The host never sees
the decrypted input data, so a compromise of the host does not compromise user
privacy.
4. Based on the workload ID supplied by the host, the HRT derives the correct key and
decrypts the data.
5. The decrypted input data is then used by the accelerator for inference.
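Step 4's per-workload key handling can be sketched as an HMAC-based derivation from a root key held inside the HRT; each workload ID yields an independent key, so compromising one workload reveals nothing about another's data (the root key and workload IDs are illustrative):

```python
import hmac
import hashlib

root_key = b"HRT device root key"  # held only inside the hardware root of trust

def derive_workload_key(workload_id: str) -> bytes:
    # HMAC-based derivation: each workload ID maps to an independent key.
    return hmac.new(root_key, b"workload:" + workload_id.encode(),
                    hashlib.sha256).digest()

key_a = derive_workload_key("tenant-A/vision")
key_b = derive_workload_key("tenant-B/nlp")

assert key_a != key_b                                    # isolation between workloads
assert derive_workload_key("tenant-A/vision") == key_a   # reproducible from the ID alone
```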
Protecting Inference Results
In the final use case, we protect inference results from tampering. Here we use authentication
and a secure communication channel between the edge device and the server when inference is
performed.
[Figure: Edge device and server accelerator card, each containing a hardware root of trust, communicating over the network and PCIe to protect inference results in transit. Numbered callouts mark the authenticated result flow.]
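The sign-and-verify flow for inference results can be sketched as follows: the server's HRT signs each result before it leaves the card, and the edge device checks the tag before the result is allowed to drive an actuator (the session key and result format are illustrative):

```python
import hmac
import hashlib

session_key = b"key from mutual authentication"  # assumed shared channel key

# Server side: the HRT signs the inference result before transmission.
result = b'{"class": "stop_sign", "confidence": 0.97}'
tag = hmac.new(session_key, result, hashlib.sha256).digest()

# Edge side: verify before the result may drive an actuator.
def verified(result: bytes, tag: bytes) -> bool:
    good = hmac.new(session_key, result, hashlib.sha256).digest()
    return hmac.compare_digest(tag, good)

assert verified(result, tag)
# A result tampered with in transit fails verification and is discarded.
assert not verified(b'{"class": "speed_limit", "confidence": 0.97}', tag)
```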
The CryptoManager Root of Trust RT-630 is a fully programmable hardware security core
offering security-by-design for AI/ML applications. It protects against a wide range of hardware
and software attacks through state-of-the-art anti-tamper and security techniques. It is built on a
custom 32-bit siloed and layered secure co-processor, along with dedicated secure memories.
The 800G MACsec Protocol Engine supports aggregate bandwidth of 100 to 800 Gbps over as
many as 64 channels. It provides line-rate operation, so there’s no sacrifice in performance to
achieve robust Layer 2 MACsec security between networked devices. Supporting all IEEE MACsec
standards, it has options for Cisco extensions and IPsec ESP AES-GCM protocol.