
CMS Security Data Lake (SDL)

A centralized repository for security data created to improve CMS’s security posture and support threat detection and threat hunting activities

Contact: CRM Team | CRMPMO@cms.hhs.gov
CMS Slack Channel
  • #security-datalake
  • #cyber-risk-management

What is the CMS Security Data Lake (SDL)?

The CMS Security Data Lake (SDL) is a centralized repository designed to store, process, maintain, secure, and govern large amounts of security data. Unlike most traditional databases and data warehouses, the CMS SDL can process all data types relevant to CMS's security posture including: 

  • Structured data with standardized formatting  
  • Semi-structured data, such as markup languages, logs, telemetry, events, or other data sources

The CMS SDL allows CMS to store this raw data from diverse sources and formats and enables security stakeholders to access, analyze, transform, and research the full body of available data in a cost effective way. Analyzing this data provides CMS with the ability to: 

  • Strengthen our real-time visibility into the enterprise IT security posture with actionable intelligence and threat detection data
  • Take a data-driven approach to scale security products and services that enable teams across CMS to achieve their goals quickly and safely
  • Promote cross-functional collaboration among various security stakeholders
  • Create, mature, and diffuse services among our partners that are shared, reusable, and sustainable
  • Easily add, remove, or replace tools as needed

In addition to the abilities listed above, the CMS SDL directly responds to both CMS priorities and federal system security requirements designed to improve the security posture of all US government systems.

Government priorities and requirements

The White House has prioritized cybersecurity improvements, the adoption of best practices, and the implementation of innovative security tools across federal agencies. 

In response, the CMS Information Security and Privacy Group (ISPG) has identified five organizational priorities that relate to cybersecurity at CMS. The CMS SDL addresses these priorities in the following ways: 

Risk-based program management

The CMS SDL provides a centralized repository for storing and managing data from various sources. This makes it easier to implement data governance controls and monitor access to the data, as opposed to having data spread across multiple systems or silos. This helps teams make more informed risk-based decisions.

Innovation unleashed through experimentation and adaptation 

Not only is the CMS SDL an innovative product, but it helps teams review and scale other products, tools, and services quickly. 

Resilient enterprise security posture

By aggregating and analyzing data from various sources within the SDL, CMS can perform advanced threat detection and security analytics. This can help identify unusual patterns or anomalies that may indicate security breaches. 

First-class integrations, using open standards, ease of automation.

The CMS SDL can be integrated with other CMS security tools. The SDL is built with simplicity and open standards in mind. This allows for real-time monitoring, security incident alerting, and third-party tool integrations, making it easier for CMS to promptly detect and respond to threats.

Advance CMS toward Zero Trust security

The CMS SDL powers CMS' Zero Trust maturity program by providing access to user and device behavior data, network traffic logs, and access control policies. Collecting and analyzing this data allows CMS to continuously monitor and verify access requests, detect anomalies, and mature the various Zero Trust pillars.

Why is CMS transitioning to the CMS SDL? 

As our Next Generation Reporting and CRM programs continue to mature, DIR wanted to acknowledge the feedback from CMS’ cybersecurity stakeholders (you) and build a data management strategy on a foundation flexible enough to meet our current and future requirements. In short, the shift to the SDL is intended to let security management teams make better and faster decisions about CMS' systems.

Key factors driving CMS to transition are:

  • Improved reporting with additional data sources
  • Aggregation, normalization, and grouping of data to enhance analysis and reporting
  • Allow CMS stakeholders to use the SDL as a self-service resource
  • Let stakeholders build their own reports and dashboards and add their own data
  • Enhance scalability and flexibility in data processing and data management
  • Bring additional security data from multiple sources into one feed (lessen data silos)
  • Set the groundwork for employing advanced analytics, machine learning, and artificial intelligence to improve threat detection and response times
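As a rough illustration of what aggregation, normalization, and grouping can look like, the Python sketch below maps records from two hypothetical sources into one common shape and counts findings per asset. The source names, field names, severity mapping, and sample records are all assumptions made for illustration; they do not describe the actual SDL schema.

```python
from collections import Counter

# Hypothetical raw records from two sources that use different field names.
SCANNER_A = [
    {"host": "web-01", "sev": "HIGH", "finding": "F-001"},
    {"host": "web-01", "sev": "low", "finding": "F-002"},
]
SCANNER_B = [
    {"asset_name": "WEB-01", "severity": 4, "id": "F-003"},
    {"asset_name": "db-01", "severity": 2, "id": "F-004"},
]

# Assumed mapping from scanner B's numeric scale to severity labels.
NUMERIC_SEVERITY = {1: "low", 2: "medium", 3: "high", 4: "critical"}


def normalize_a(record):
    """Map a scanner A record onto the common schema."""
    return {
        "asset": record["host"].lower(),
        "severity": record["sev"].lower(),
        "finding": record["finding"],
    }


def normalize_b(record):
    """Map a scanner B record onto the common schema."""
    return {
        "asset": record["asset_name"].lower(),
        "severity": NUMERIC_SEVERITY[record["severity"]],
        "finding": record["id"],
    }


def findings_per_asset(records):
    """Group normalized records by asset and count them."""
    return Counter(r["asset"] for r in records)


# Aggregate both feeds into one list, then group.
normalized = [normalize_a(r) for r in SCANNER_A] + [normalize_b(r) for r in SCANNER_B]
counts = findings_per_asset(normalized)
```

Once every record shares the same keys, a report or dashboard can group by any of them without knowing which tool produced the data.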

CMS CRM DW on Confluence

Learn more about our transition from our "Legacy" Data Warehouse (LDW) to the more efficient Security Data Lake (SDL).

Learn more about the CMS SDL

Who can use the CMS SDL? 

The open format of the CMS SDL provides a flexible and cost-effective solution for teams across the CMS enterprise to address the agency’s strategic security priorities. The CMS SDL is recommended for teams engaged in the following activities:

Continuous Diagnostics and Monitoring (CDM)

The CMS SDL is directly related to Continuous Diagnostics and Monitoring (CDM) and the work that’s being done by the Cyber Risk Management (CRM) Team. The CMS SDL can help teams:

  • Manage configuration settings using data on asset compliance status, security policies, and severity of vulnerabilities
  • Manage hardware assets using data on hardware assets, inventory of EC2 and managed instances, and AWS resource tags
  • Assess and mitigate vulnerabilities using data on vulnerabilities, detection, and mitigation status

Security Operations

The CMS SDL’s centralized data management enables robust access control, encryption, and audit capabilities. Additionally, the CMS SDL will: 

  • Enable and improve collection, detection, triage, investigation, incident response, and lessons learned
  • Provide more actionable intelligence and higher-fidelity alerting to speed up triage and incident response
  • Use AI tools to analyze low-fidelity alerts for advanced attacks, analyze false positives to refine and tune existing detections and analytics, and identify other patterns and trends
  • Offer robust detection logic using detection-as-code, Python, and community-driven analytics to reduce cost, improve portability, and avoid vendor lock-in
  • Enhance purple and red teaming and tabletop testing with improved data
  • Support collection policies that are not limited by cost or storage constraints
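To make the detection-as-code idea in the list above concrete, here is a minimal Python sketch of a detection written as ordinary, testable code. The `Detection` class, the event fields, the threshold, and the sample events are hypothetical and are not taken from the SDL itself; the IP addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A detection rule kept in version control like any other code."""
    name: str
    severity: str
    threshold: int

    def matches(self, events):
        """Return source IPs with at least `threshold` failed logins."""
        failures = {}
        for event in events:
            if event["action"] == "login" and event["outcome"] == "failure":
                failures[event["src_ip"]] = failures.get(event["src_ip"], 0) + 1
        return sorted(ip for ip, n in failures.items() if n >= self.threshold)


# Hypothetical rule and sample events, for illustration only.
BRUTE_FORCE = Detection(name="repeated-failed-logins", severity="medium", threshold=3)

SAMPLE_EVENTS = (
    [{"action": "login", "outcome": "failure", "src_ip": "198.51.100.7"}] * 3
    + [{"action": "login", "outcome": "failure", "src_ip": "203.0.113.9"}]
    + [{"action": "login", "outcome": "success", "src_ip": "203.0.113.9"}]
)

alerts = BRUTE_FORCE.matches(SAMPLE_EVENTS)
```

Because the rule is plain Python, it can be reviewed, unit tested against sample events, and moved between tools without being rewritten in a vendor-specific query language.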

Threat Intelligence

The CMS SDL provides the context needed to feed all core functions of Security Operations including triage, investigation, and incident response. Additionally, the CMS SDL will offer better "strategic and operational" intelligence by enabling:

  • Threat modeling exercises
  • Quantitative data analysis including loss exceedance curves and probabilistic estimation in real dollars 
  • Internally-sourced intelligence based on actual incident data that’s stored in the CMS SDL 
  • Fulfilling CISO and CTI threat intelligence requirements 
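For readers unfamiliar with loss exceedance curves, the sketch below shows one common way to estimate one: a Monte Carlo simulation over a set of risk scenarios, each with an annual probability and a 90% confidence interval for the loss in dollars. The scenarios, probabilities, dollar ranges, and the choice of a lognormal loss distribution are assumptions made for illustration and are not CMS data or the SDL's method.

```python
import math
import random

# Hypothetical scenarios: annual probability of occurrence and a 90%
# confidence interval (low, high) for the loss in dollars if it occurs.
SCENARIOS = [
    {"name": "phishing-led compromise", "probability": 0.30, "low": 50_000, "high": 500_000},
    {"name": "ransomware outage", "probability": 0.05, "low": 200_000, "high": 5_000_000},
]

Z_90 = 1.645  # z-score bounding a two-sided 90% interval of a normal distribution


def sample_loss(low, high, rng):
    """Draw a loss from a lognormal whose 5th/95th percentiles are low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_90)
    return rng.lognormvariate(mu, sigma)


def simulate_annual_losses(scenarios, trials, seed=0):
    """Simulate `trials` years and return the total loss for each year."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for scenario in scenarios:
            if rng.random() < scenario["probability"]:
                total += sample_loss(scenario["low"], scenario["high"], rng)
        totals.append(total)
    return totals


def exceedance_probability(totals, threshold):
    """Fraction of simulated years whose total loss exceeds `threshold`."""
    return sum(1 for t in totals if t > threshold) / len(totals)


totals = simulate_annual_losses(SCENARIOS, trials=10_000)
# Each point pairs a dollar amount with the chance of losing more than that in a year.
curve = [(x, exceedance_probability(totals, x)) for x in (100_000, 500_000, 1_000_000, 5_000_000)]
```

Plotting `curve` with dollars on the horizontal axis and probability on the vertical axis gives the loss exceedance curve; incident data stored in a data lake could, in principle, be used to calibrate the probabilities and ranges.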

Threat Hunting

Threat hunting is a proactive, data-driven approach that relies on up-to-date, high-quality, comprehensive data. Current threat hunting is heavily dependent on atomic indicators of compromise (IOCs). The CMS SDL will allow for:

  • More advanced threat hunting, such as anomaly-based hunting and hunting for specific threat actor groups
  • Greater focus on the riskiest stage of the kill chain: post-exploitation
  • Improved analytics, detections, preventive controls, and incident response
  • Faster Observe, Orient, Decide, Act (OODA) loops that will allow CMS to be more responsive to attacks 
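As a simple illustration of anomaly-based hunting, in contrast to matching atomic IOCs, the sketch below compares today's activity for each host against that host's own recent baseline using a z-score. The host names, counts, and threshold are hypothetical, and a z-score is only one of many possible anomaly measures.

```python
from statistics import mean, stdev

# Hypothetical daily counts of outbound connections per host.
BASELINE = {
    "web-01": [102, 98, 110, 95, 105, 101, 99],
    "db-01": [20, 22, 19, 21, 20, 23, 18],
}
TODAY = {"web-01": 104, "db-01": 95}


def z_score(history, value):
    """Number of standard deviations `value` sits from the historical mean."""
    spread = stdev(history)
    if spread == 0:
        # A flat history gives no spread to compare against, so no score.
        return 0.0
    return (value - mean(history)) / spread


def hunt_leads(baseline, today, threshold=3.0):
    """Return (host, score) pairs whose activity deviates from the baseline."""
    leads = []
    for host, history in baseline.items():
        score = z_score(history, today[host])
        if abs(score) >= threshold:
            leads.append((host, round(score, 1)))
    return leads


leads = hunt_leads(BASELINE, TODAY)
```

With this sample data only `db-01` is returned as a lead, because its count is far outside its own history even though it is lower than the normal count for `web-01`. No known-bad indicator is needed to surface it.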

Software and Container Security

The CMS SDL is also used to test and validate tools and services that are currently used by CMS including:

  • Snyk to scan and fix vulnerabilities and license violations in open-source dependencies and containers
  • Semgrep 
  • Grype
  • GitLeaks
  • Other security testing tools, including DAST tools

Software-as-a-Service (SaaS) Governance

SaaS governance involves defining data ownership, access policies, and data lifecycle management rules. Implementing data governance practices within the CMS SDL helps reinforce security policies and ensure compliance with current regulations and standards. 

  • Use AppOmni to monitor SaaS services, track issues, run scans, detail policies, and offer insight into associated risks
  • Use BitSight to provide an overview of the company portfolio, company and product ratings, product information, changes in ratings, and details about potential security threats to products
  • Include SaaS Security and operational health into CMS’ risk-based security posture

 

Zero Trust as a Security Model

Zero Trust is a security model that is built on continuous validation at every stage of digital interaction. The Zero Trust (ZT) security model, also known as Zero Trust Architecture (ZTA), maintains that no user or application should be trusted by default. As a result, organizations that implement a Zero Trust model move from checking permissions only at initial sign-on to continuously checking permissions as users or devices move through a system. This constant validation provides enhanced security for systems, devices, and users. Zero Trust is well suited to SaaS applications because it can help mitigate risks associated with access to sensitive data, tracking user activity, security posture, and cyberattacks.
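The difference between checking permissions once at sign-on and checking them continuously can be sketched in a few lines of Python. The request fields, the policy table, and the user names below are hypothetical; a real deployment would draw these signals from identity, device, and policy services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    resource: str
    mfa_verified: bool
    device_compliant: bool


# Hypothetical policy: which users may reach which resources.
POLICY = {"alice": {"reports", "dashboards"}, "bob": {"dashboards"}}


def authorize(request):
    """Evaluate every request on its own; nothing is trusted from earlier ones."""
    if not request.mfa_verified:
        return False
    if not request.device_compliant:
        return False
    return request.resource in POLICY.get(request.user, set())


# The same user asks for the same resource twice, but the device falls
# out of compliance between the two requests.
first = AccessRequest("alice", "reports", mfa_verified=True, device_compliant=True)
later = AccessRequest("alice", "reports", mfa_verified=True, device_compliant=False)
decisions = [authorize(first), authorize(later)]
```

Under a sign-on-only model the second request would ride on the earlier approval. Here it is denied because the decision is recomputed from current signals each time.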

Use Cases for Zero Trust

  • Replacing or augmenting VPNs: Zero Trust can provide an extra layer of protection for organizations that are looking to replace or augment their VPNs.
  • Improving access control for the cloud: Zero Trust can reduce the risk of unauthorized cloud-based access by verifying all requests.

How can I get help? 

During this period of transition, you may experience minor disruptions, unexpected issues, or notifications highlighting upcoming updates. The development team is actively working on the two primary issues on our "Current Issues" list. Please visit our CRM Data Quality Ticket Status Tracker for the latest ticket updates. 

The CMS Cyber Risk Management (CRM) Team can help answer your questions and get your team onboarded to the CMS SDL. You can reach out to the team on CMS Slack in the #cyber-risk-management channel or via email at CRMPMO@cms.hhs.gov.