CMS Security Data Lake (SDL)

A centralized repository for security data created to improve CMS’s security posture and support threat detection and threat hunting activities

Contact: CRM Team | CRMPMO@cms.hhs.gov
CMS Slack Channel
  • #cyber-risk-management
  • #security-datalake

What is the CMS Security Data Lake (SDL)?

The CMS Security Data Lake (SDL) is a centralized repository designed to store, process, maintain, secure, and govern large amounts of security data. Unlike most traditional databases and data warehouses, the CMS SDL can process all data types relevant to CMS's security posture including: 

  • Structured data with standardized formatting  
  • Semi-structured data like emails, markup languages, and websites 
  • Unstructured data including logs, telemetry, events, and other data sources

The CMS SDL allows CMS to store this raw data from diverse sources and formats, and enables security stakeholders to access, analyze, transform, and research the full body of available data in a cost-efficient way. Analyzing this data provides CMS with the ability to:

  • Strengthen our real-time visibility into the enterprise IT security posture with actionable intelligence and threat detection data
  • Scale secure and proven products and services that enable teams across CMS to achieve their goals quickly and safely
  • Promote cross-functional collaboration among various security stakeholders
  • Pursue and advance low-risk, proven technologies that are likely to be adopted quickly
  • Create, mature, and diffuse services among our partners that are shared, reusable and sustainable
  • Easily add, remove, or replace tools as needed

In addition to the abilities listed above, the CMS SDL directly responds to both CMS priorities and federal system security requirements designed to improve the security posture of all US government systems.

Government priorities and requirements

The White House has prioritized cybersecurity improvements, the adoption of best practices, and the implementation of innovative security tools across federal agencies. 

In response, the CMS Information Security and Privacy Group (ISPG) has identified five organizational priorities that relate to cybersecurity at CMS. The CMS SDL addresses these priorities in the following ways: 

Risk-based program management

The CMS SDL provides a centralized repository for storing and managing data from various sources. This makes it easier to implement security controls and monitor access to the data, as opposed to having data spread across multiple systems or silos. This helps teams make more informed risk-based decisions.

Innovation unleashed through experimentation and adaptation 

Not only is the CMS SDL an innovative product, but it helps teams review and scale other products, tools, and services quickly. 

Resilient enterprise security posture

By aggregating and analyzing data from various sources within the SDL, CMS can perform advanced threat detection and security analytics. This can help identify unusual patterns or anomalies that may indicate security breaches. 

Remove, simplify, and automate wherever possible

The CMS SDL can be integrated with other CMS security tools. This allows for real-time monitoring and security incident alerts, making it easier for CMS to detect and respond to threats.

Advance CMS toward Zero Trust security

The CMS SDL enhances Zero Trust security by serving as a centralized repository for user and device behavior data, network traffic logs, and access control policies. Collecting and analyzing this data allows CMS to continuously monitor and verify access requests, detect anomalies, and enforce access controls, aligning with the core principles of Zero Trust.

Why is CMS transitioning to the CMS SDL? 

As our Next Generation Reporting and CRM programs continue to mature, DIR wanted to acknowledge the feedback from CMS’s cybersecurity stakeholders in the community (you) and build a data management strategy on a foundation that is flexible enough to meet our current and future requirements. In short, the shift towards the SDL was driven by the goal of allowing security management teams to make better and faster decisions regarding CMS data.

Key factors driving CMS to transition are:

  • Improved reporting with additional data sources
  • Aggregation, normalization, and grouping of data to enhance analysis and reporting
  • Allow CMS stakeholders to use the SDL as a self-service resource
  • Let stakeholders build their own reports and dashboards and add their own data
  • Enhance scalability and flexibility in data processing and data management
  • Bring additional security data from multiple sources into one feed (lessen data silos)
  • Set the groundwork for employing advanced analytics, machine learning, and artificial intelligence to improve threat detection and response times

CMS SDL on Confluence

Learn more about our transition from our "Legacy" Data Warehouse (LDW) to the more efficient Security Data Lake (SDL).

Learn more about the CMS SDL

Who can use the CMS SDL? 

The open format of the CMS SDL provides a flexible and cost-effective solution for teams across the CMS enterprise to address the agency’s strategic security priorities. The CMS SDL is recommended for teams engaged in the following activities:

Continuous Diagnostics and Monitoring (CDM)

The CMS SDL is directly related to Continuous Diagnostics and Monitoring (CDM) and the work that’s being done by the Cyber Risk Management (CRM) Team. The CMS SDL can help teams:

  • Manage configuration settings using data on asset compliance status, security policies, and severity of vulnerabilities
  • Manage hardware assets using data on hardware assets, inventory of EC2 and managed instances, and AWS resource tags
  • Assess and mitigate vulnerabilities using data on vulnerabilities, detection, and mitigation status

Security Operations

The CMS SDL’s centralized data management enables robust access control, encryption, and audit capabilities. Additionally, the CMS SDL will: 

  • Enable and improve collection, detection, triage, investigation, incident response and lessons learned
  • Provide more actionable intelligence and higher-fidelity alerting to speed up triage and incident response
  • Use AI tools to analyze low-fidelity alerts for advanced attacks, analyze false positives to refine and tune existing detections and analytics, and identify other patterns and trends
  • Offer robust detection logic using detection-as-code, Python, and community-driven and community-developed analytics, which will reduce cost, improve portability, and avoid vendor lock-in
  • Enhance purple and red teaming and tabletop testing with improved data
  • Support collection policies that are not limited by cost or storage constraints
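
To make the detection-as-code idea above more concrete, here is a minimal Python sketch. It is illustrative only and does not describe how detections are actually implemented in the CMS SDL: the event fields (user, action), the Detection class, the run_detections helper, and the sample rule are all hypothetical. The point is that a detection is an ordinary, testable piece of code that can be version-controlled and reviewed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical event shape: a plain dict such as
# {"user": "admin", "action": "login_failed"}
Event = dict


@dataclass(frozen=True)
class Detection:
    """A detection rule that can live in version control like any other code."""
    name: str
    severity: str
    predicate: Callable[[Event], bool]


def run_detections(events: Iterable[Event], detections: list[Detection]) -> list[dict]:
    """Return one alert for every (event, detection) pair whose predicate matches."""
    alerts = []
    for event in events:
        for detection in detections:
            if detection.predicate(event):
                alerts.append({
                    "detection": detection.name,
                    "severity": detection.severity,
                    "event": event,
                })
    return alerts


# Illustrative rule (an assumption, not a CMS detection):
# flag failed logins against an account named "admin".
FAILED_ADMIN_LOGIN = Detection(
    name="failed_admin_login",
    severity="medium",
    predicate=lambda e: e.get("action") == "login_failed" and e.get("user") == "admin",
)

SAMPLE_EVENTS = [
    {"user": "alice", "action": "login_ok"},
    {"user": "admin", "action": "login_failed"},
]

alerts = run_detections(SAMPLE_EVENTS, [FAILED_ADMIN_LOGIN])
```

Because each rule is plain Python, it can be unit tested against sample events before it is deployed, and it does not depend on any one vendor's rule format.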

Threat Intelligence

The CMS SDL provides the context needed to feed all core functions of Security Operations including triage, investigation, and incident response. Additionally, the CMS SDL will offer better "strategic and operational" intelligence by enabling:

  • Threat modeling exercises
  • Quantitative data analysis including loss exceedance curves and probabilistic estimation in real dollars 
  • Internally-sourced intelligence based on actual incident data that’s stored in the CMS SDL 
  • Fulfilling CISO and CTI threat intelligence requirements 
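
As a rough illustration of the loss exceedance curves mentioned above, the following Python sketch estimates one with a simple Monte Carlo simulation. Everything in it is an assumption made for the example: the scenario probabilities and dollar ranges are invented, and losses are drawn from a uniform range for simplicity, whereas a real analysis would be calibrated against actual incident data and would likely use a different distribution.

```python
import random


def simulate_annual_losses(scenarios, trials=10_000, seed=0):
    """Monte Carlo: for each simulated year, sum the losses of the scenarios that occur.

    Each scenario is (annual_probability, min_loss, max_loss), with losses in dollars.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    for _ in range(trials):
        total = 0.0
        for probability, low, high in scenarios:
            if rng.random() < probability:
                total += rng.uniform(low, high)
        totals.append(total)
    return totals


def exceedance_probability(losses, threshold):
    """Fraction of simulated years whose total loss is greater than the threshold."""
    return sum(1 for loss in losses if loss > threshold) / len(losses)


# Hypothetical scenarios, not CMS figures.
SCENARIOS = [
    (0.10, 50_000, 500_000),       # e.g. a moderate incident
    (0.02, 1_000_000, 5_000_000),  # e.g. a rare, severe incident
]

losses = simulate_annual_losses(SCENARIOS)

# Each point on the curve answers: "How likely is it that annual losses exceed $X?"
curve = [(t, exceedance_probability(losses, t)) for t in (0, 100_000, 1_000_000)]
```

Plotting the thresholds against their exceedance probabilities gives the curve, which lets decision makers compare risk against a stated loss tolerance in dollar terms.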

Threat Hunting

Threat hunting is a proactive, data-driven approach that relies on up-to-date, high-quality, comprehensive data. Current threat hunting is heavily dependent on atomic indicators of compromise (IOCs). The CMS SDL will allow for:

  • More advanced threat hunting, such as anomaly-based hunting and hunting for specific threat actor groups
  • Greater focus on the riskiest stage of the kill chain: post-exploitation
  • Improved analytics, detections, preventive controls, and incident response
  • Faster Observe, Orient, Decide, Act (OODA) loops that will allow CMS to be more responsive to attacks 
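
To show what anomaly-based hunting can look like in contrast to matching atomic IOCs, here is a small Python sketch that compares each host's activity today against its own historical baseline using a z-score. The host names, the counts, and the threshold of 3 standard deviations are assumptions chosen for illustration; this is a sketch of the general technique, not a description of CMS tooling.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Return True if today's count is more than `threshold` standard deviations from the baseline mean."""
    if len(history) < 2:
        return False  # not enough history to estimate a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # flat baseline: any change is a deviation
    return abs(today - mu) / sigma > threshold


def hunt(baselines, today_counts, threshold=3.0):
    """Return the sorted list of hosts whose activity today deviates from their own baseline."""
    return sorted(
        host
        for host, today in today_counts.items()
        if host in baselines and is_anomalous(baselines[host], today, threshold)
    )


# Hypothetical daily outbound connection counts per host over the past week.
BASELINES = {
    "host-a": [12, 15, 11, 14, 13, 12, 14],
    "host-b": [5, 6, 5, 7, 6, 5, 6],
}
TODAY = {"host-a": 40, "host-b": 6}

suspicious_hosts = hunt(BASELINES, TODAY)
```

Unlike an IOC match, this approach needs no prior knowledge of the attacker's infrastructure, but it does depend on having enough comprehensive historical data to form a baseline.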

Software and Container Security

The CMS SDL is also used to test and validate tools and services that are currently used by CMS including:

  • Snyk to scan and fix vulnerabilities and license violations in open-source dependencies and containers
  • Semgrep 
  • Grype
  • GitLeaks
  • Other DAST tools

Software-as-a-Service (SaaS) Governance

SaaS governance involves defining data ownership, access policies, and data lifecycle management rules. Implementing data governance practices within the CMS SDL helps reinforce security policies and ensure compliance with current regulations and standards.

  • Use AppOmni to monitor SaaS services, track issues, run scans, detail policies, and offer insight into associated risks
  • Use BitSight to provide an overview of the company portfolio, company ratings, product ratings, product information, changes in ratings, and details about potential security threats to a product
  • Incorporate SaaS security and operational health into CMS’s risk-based security posture

What changes can I expect from the CMS SDL?

Our goal is to keep the transition process seamless for our users while maintaining 100% data fidelity and dashboard functionality. You will not need to do anything differently when the SDW dashboards go into production. You will see a series of dashboards with “SDW” in the name, and that is how you’ll know that you are accessing a dashboard using the Security Data Warehouse as the primary source. You will be encouraged to start using them as soon as they are available.

Currently, all CRM dashboards use the LDW as their primary data source. Since June 2023, the core CRM dashboards have been re-engineered to use the SDW as the primary source. 

The core CRM dashboards will be in production within the first week of November 2023. Stakeholders will see a broad announcement on Slack when the dashboards are in production. Both versions of the dashboards will remain available during this transition period. During this period the development team will gather feedback, answer questions, and make any necessary adjustments. Once the transition period is over (February 1, 2024), the Legacy dashboards will be removed.

How can I get help? 

During this period of transition, you may experience minor disruptions, unexpected issues, or notifications highlighting upcoming updates. The development team is proactively working on two primary issues from our "Current Issues" list during the transition period. Please visit our CRM Data Quality Ticket Status Tracker for the latest ticket updates.

The CMS Cyber Risk Management (CRM) Team can help answer your questions and get your team onboarded to the CMS SDL. You can reach out to the team on CMS Slack in the #cyber-risk-management channel or via email at CRMPMO@cms.hhs.gov.