AI Data Security
Best Practices for Securing Data Used to Train & Operate AI Systems
Executive summary
This Cybersecurity Information Sheet (CSI) provides essential guidance on securing
data used in artificial intelligence (AI) and machine learning (ML) systems. It also
highlights the importance of data security in ensuring the accuracy and integrity of AI
outcomes and outlines potential risks arising from data integrity issues in various stages
of AI development and deployment.
This CSI provides a brief overview of the AI system lifecycle and general best practices
to secure data used during the development, testing, and operation of AI-based
systems. These best practices include adopting techniques such as data
encryption, digital signatures, data provenance tracking, secure storage, and trust
infrastructure. This CSI also provides an in-depth examination of three significant areas
of data security risks in AI systems: data supply chain, maliciously modified (“poisoned”)
data, and data drift. Each section provides a detailed description of the risks and the
corresponding best practices to mitigate those risks.
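To illustrate how two of these techniques, cryptographic hashing and digital signatures, fit together in practice, the minimal Python sketch below signs a SHA-256 digest of a dataset snapshot so that consumers can verify its integrity before training. The file name train.csv, the choice of Ed25519, and the third-party cryptography package are illustrative assumptions, not a method prescribed by this CSI.

    # Illustrative sketch only: sign a dataset digest at publication time so
    # downstream consumers can detect any post-signing modification.
    # Assumes the third-party "cryptography" package (pip install cryptography).
    import hashlib
    from cryptography.hazmat.primitives.asymmetric import ed25519

    def hash_dataset(path: str) -> bytes:
        """Compute a SHA-256 digest of a dataset file, streamed in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.digest()

    # Producer: sign the digest when the dataset snapshot is published.
    private_key = ed25519.Ed25519PrivateKey.generate()
    public_key = private_key.public_key()
    signature = private_key.sign(hash_dataset("train.csv"))  # hypothetical file

    # Consumer: re-hash and verify before training; verify() raises
    # cryptography.exceptions.InvalidSignature if the data was altered.
    public_key.verify(signature, hash_dataset("train.csv"))

In a real deployment, the public key would be distributed through trust infrastructure (for example, a certificate authority), so that provenance, and not merely integrity, can be established.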
This guidance is intended primarily for organizations using AI systems in their
operations, with a focus on protecting sensitive, proprietary, or mission-critical data. The
principles outlined in this information sheet provide a robust foundation for securing AI
data and ensuring the reliability and accuracy of AI-driven outcomes.
This document was authored by the National Security Agency’s Artificial Intelligence
Security Center (AISC), the Cybersecurity and Infrastructure Security Agency (CISA),
the Federal Bureau of Investigation (FBI), the Australian Signals Directorate’s Australian
Cyber Security Centre (ASD’s ACSC), New Zealand’s Government Communications