
Enhancing Security Monitoring and Logging for Operational Technology in Data Centers (Part 2)


Introduction

In data centers (DC), security monitoring and logging play a critical role in maintaining the operational integrity of the facility. This part of the series delves into the role of Information Technology (IT) and Operational Technology (OT) in supporting data center activities, highlighting the importance of monitoring for effective DC operations. Additionally, we explore the Purdue Enterprise Reference Architecture, a model that provides a structured approach to securing industrial control systems (ICS) used in data centers.

I. The Role of IT and OT in Data Center Operations

A. Intersection of IT and OT in Data Centers

Data centers operate at the convergence of network, compute, and storage systems, all of which are supported by a complex array of industrial control systems (ICS). These systems bridge the gap between the physical and digital realms, enabling the seamless operation of data centers.

B. Increasing Threat Vectors

As the number of ICS components in data centers grows, so does the number of potential threat vectors. Unlike traditional IT environments, where the primary concern is data confidentiality, the main risk in OT environments is the loss of service availability. This makes operational metrics a critical input to security detection and monitoring systems.

II. Importance of Monitoring in Data Center Operations

A. Criticality of Availability

In the context of industrial control systems and building management systems, availability is paramount. Any disruption can lead to significant operational failures, making continuous monitoring essential.

B. Influence on Security Systems

Operational metrics from OT systems can directly enhance security detection and monitoring systems. By understanding the operational status of critical components, security systems can be more effectively tuned to detect and respond to threats.

III. The Purdue Model: A Framework for Securing Data Center ICS

Overview of the Purdue Enterprise Reference Architecture

The Purdue Model is a six-layer framework (Levels 0 through 5, with a DMZ at Level 3.5) used to secure ICS environments by defining different levels of systems and their functions. It plays a role comparable to that of the OSI model in networking, but is tailored for industrial environments.

Breakdown of the Purdue Model

  1. Level 4/5 – Enterprise Layer

    • This layer includes traditional IT systems like email, Enterprise Resource Planning (ERP), and other business-specific applications. Here, confidentiality and integrity of information are the primary security concerns.

  2. Level 3.5 – Demilitarized Zone (DMZ)

    • The DMZ serves as a buffer between the IT and OT networks, reducing the attack surface. It hosts remote access services, external connections, and patch management systems, ensuring that only validated resources connect to the ICS network.

  3. Level 3 – Facilities/Process Control Network

    • This layer supports the ICS infrastructure, including Active Directory services, data historians, and network infrastructure. It is crucial for data center operations and the backbone of the control network.

  4. Level 2 – Supervisory Control

    • Level 2 handles real-time monitoring, operations supervision, and control. It includes engineering workstations, Human-Machine Interfaces (HMI), and systems like the Building Management System (BMS) and Electrical Power Management System (EPMS).

  5. Level 1/0 – Intelligent Devices & Physical Processes

    • The lowest levels of the Purdue Model encompass sensors, actuators, Programmable Logic Controllers (PLCs), and other devices that directly control physical processes within the data center.
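The layering above can be turned into a simple check on network flows. The following is a minimal sketch, assuming each asset has been tagged with a Purdue level; the asset names, the `Asset` class, and the `crosses_dmz_directly` helper are hypothetical and only illustrate the idea that IT-to-OT traffic should terminate in the Level 3.5 DMZ rather than pass straight through it.

```python
from dataclasses import dataclass

# Level 3.5 is the DMZ that separates IT (Levels 4/5) from OT (Levels 0-3).
DMZ_LEVEL = 3.5


@dataclass(frozen=True)
class Asset:
    name: str     # hypothetical asset name
    level: float  # Purdue level, e.g. 4.0 for enterprise IT, 2.0 for a BMS


def crosses_dmz_directly(src: Asset, dst: Asset) -> bool:
    """Return True if a flow links an IT asset and an OT asset directly,
    that is, one endpoint sits above the DMZ and the other below it."""
    low, high = sorted((src.level, dst.level))
    return low < DMZ_LEVEL < high


# Illustrative inventory (assumed names, not taken from a real site).
erp = Asset("erp-server", 4.0)
gateway = Asset("remote-access-gateway", 3.5)
bms = Asset("bms-server", 2.0)
plc = Asset("chiller-plc", 1.0)

flows = [(erp, bms), (erp, gateway), (gateway, bms), (bms, plc)]
violations = [(s.name, d.name) for s, d in flows if crosses_dmz_directly(s, d)]
print(violations)
```

Flows that start or end in the DMZ are treated as acceptable here; a real policy would likely also restrict which protocols and directions are allowed at each boundary.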

IV. Common OT Systems in Data Centers

A. Mechanical Systems

  • Mechanical infrastructure includes the cooling stack, water storage, and treatment systems, which are vital for maintaining the temperature and environment within the data center. Key components include cooling towers, chillers, pumps, and air handling units.

B. Electrical Systems

  • Electrical infrastructure involves power distribution and monitoring systems, including substations, transformers, power distribution centers, and backup power sources. These systems are essential for ensuring a stable and reliable power supply.

C. Other Building Support Systems

  • These systems include life safety, access control, and other building support mechanisms, such as fire panels, smoke detectors, leak detection, and fire smoke dampers. They are critical for ensuring the safety and security of the data center environment.

Defining Critical Assets in Data Centers

Importance of Speed in Design and Operations

  • Data center designers and operators must move swiftly to meet capacity demands. This requires a robust risk management strategy to address cybersecurity challenges and secure the necessary capacity.

System Classification and Risk Management

  • Classifying systems within the Purdue Model framework allows for a structured approach to aligning security controls and prioritizing risk management activities. Systems lower in the model (Levels 0-2) have the greatest impact on human safety and operational availability.

Domain Classification for IT Operations

  • Data center operations can be further classified by the physical and logical domain sizes influenced by the control systems. This classification helps prioritize cybersecurity efforts based on the potential impact of damage or outages, from regional levels down to individual machines.

Event and Monitoring Requirements for Data Center Equipment

Devices responsible for monitoring, protecting, and controlling equipment in a data center are often in operation for decades. As cybersecurity threats continue to evolve, it is essential that these devices provide information that supports the timely investigation and resolution of security incidents.

The following tables outline the types of information these systems should provide to enhance our ability to monitor their security posture and supply data that drives effective analytics. These tables are organized into three key areas and should serve as initial guidance rather than a comprehensive set of recommendations:

1. Security-Relevant Features

This section highlights capabilities that enhance security measures, crucial for detecting, preventing, and mitigating cyber threats. Examples include:

- Real-Time Alerts: Automated notifications for unauthorized access or suspicious activity.

- Access Logs: Detailed records of system access for security audits.

- Encryption Protocols: Data encryption in transit and at rest to protect sensitive information.


Security Features (each entry lists the feature, its purpose, and notes)

Authentication (unique ID) and authorization (ACL)
  Purpose: Unique identifier for access logs, and restricted access to resources.
  Notes: Not all devices/protocols have authentication.
    ● A Modbus device, for example, will reply to any request.

Certificates, encryption keys
  Purpose: Trusted users.
  Notes: TLS certificates as an example.
    ● Have they expired, and if not, when do they expire?
    ● When do they need to be renewed?
    ● Log a message/warning when these are about to expire?

Heartbeat
  Purpose: Signal uninterrupted asset identity.
  Notes: Capture loss of connectivity to the client.
    ● Signal if unresponsive to a heartbeat message.

Default credentials alerts
  Purpose: Provide a mechanism to detect whether default credentials have been changed.
  Notes: Credential management:
    ● Passwords are not preferred.
    ● A device that will not do anything until default passwords or trust lists are changed.

Logs
  Purpose: Provide a history of access and actions on assets.
  Notes: Separate dedicated log for security events.
    ● Access logs
    ● Maintenance logs (changes to system or user configuration)
    ● Centrally managed security log (protections to detect alterations)

Secure communication protocols
  Purpose: Mitigate man-in-the-middle vulnerabilities.
  Notes: On startup, the device may report the protocols it is configured to use, and which ones are enabled/disabled. This item may fit better in a separate startup category (which still needs to be created) than as a standalone requirement.
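Two of the features above, certificate expiry warnings and heartbeat monitoring, are easy to illustrate. The following is a minimal sketch, assuming the monitoring system already knows each certificate's expiry time and the last time each device was heard from; the function names, the 30-day warning window, the 60-second timeout, and the device names are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def cert_status(not_after: datetime, now: datetime, warn_days: int = 30) -> str:
    """Classify a certificate as 'expired', 'expiring', or 'ok'.

    warn_days is an assumed warning window, not a recommended value.
    """
    if now >= not_after:
        return "expired"
    if not_after - now <= timedelta(days=warn_days):
        return "expiring"
    return "ok"


def missed_heartbeats(last_seen: dict[str, datetime], now: datetime,
                      timeout: timedelta) -> list[str]:
    """Return the sorted names of devices silent for longer than `timeout`."""
    return sorted(name for name, seen in last_seen.items()
                  if now - seen > timeout)


# Fixed reference time so the example is reproducible.
now = datetime(2024, 9, 1, 12, 0, tzinfo=timezone.utc)

# Certificate that expires in about two weeks: should trigger a warning.
print(cert_status(datetime(2024, 9, 15, tzinfo=timezone.utc), now))

# Hypothetical devices and the time each last answered a heartbeat.
last_seen = {
    "power-meter-01": now - timedelta(seconds=20),
    "hvac-ctrl-07": now - timedelta(minutes=5),
}
print(missed_heartbeats(last_seen, now, timeout=timedelta(seconds=60)))
```

In practice the expiry time would come from the device's certificate and the heartbeat timestamps from the protocol in use; both are passed in as plain values here to keep the sketch independent of any particular device or library.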

 

2. Configuration Data

This area covers the data that assets should provide to support investigations, especially after a security incident. Key aspects include:

- Firmware and Software Versions: Identifying outdated or vulnerable components.

- Network Configurations: Tracing unauthorized access routes.

- User Permissions: Ensuring only authorized personnel access sensitive system areas.

In general, any configuration change should generate a risk signal.

 

Configuration Items (each entry lists the item, its purpose, and notes)

Acceptable system parameters (valid range)
  Purpose: Create a baseline of system-wide parameters once configured. Detect changes to this baseline.
  Notes: A change of system parameters could be a security event.
    ● Baseline configurations should likely be stored upstream of assets, in an asset or configuration management system.
    ● Include settings at the control system level and device level.

Firmware control
  Purpose: Verification (i.e., signed) that the firmware is of the correct version. Device specific.
  Notes: Facilitate firmware management.
    ● Provide a mechanism to know if new firmware is available.
    ● Verify the firmware version.
    ● Ideally, update devices in place without triggering an outage.

Network links (source and destination IPs, ports, and protocols)
  Purpose: Network monitoring.
  Notes: Identification of resources to communicate with.
    ● Mapping of source and destination identifiers, and inclusion in logs.
    ● A change could be a security event.

Device identifier (MAC and IP address)
  Purpose: Network identity.
  Notes: Name, type, IP and MAC address, serial number, certificate, category.
    ● Example: "I am an HVAC controller" or "I am a power meter".

Device classification
  Purpose: Verify that the correct asset (or asset class) is physically/logically connected to the correct location.
  Notes: Similar to device identifier.
    ● Assists with detection of unauthorized assets connected to various locations within the system hierarchy.

Session state
  Purpose: Track connection details.
  Notes: Provides a mechanism to know whether a device is still alive (or disconnected, reconnected, or replaced).

Sensor/device units (temperature, voltage)
  Purpose: Units associated with a measurement, and often a scaling factor as well.
  Notes: Useful to know if a unit of measurement or scaling factor ever changes. Distinguishing process-level configuration from system-level configuration is a challenge.

Application in controller
  Purpose: Programming for specific applications. Potentially monitoring for changes in code.
  Notes: Like firmware control. Track the application version to help triage known security alerts.

Documentation that describes all possible security events for a given device
  Purpose: List of possible events and what each one means, like an error code lookup.
  Notes: Exposing these in documentation carries a potential risk.
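The baseline idea that runs through these configuration items can be sketched in a few lines. The following is a minimal illustration, assuming a device's configuration can be exported as a flat dictionary; the keys (`firmware`, `temp_unit`, `scale`, `peer_ip`), their values, and the helper names are hypothetical.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a configuration deterministically (keys sorted before hashing)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_config(baseline: dict, current: dict) -> dict:
    """Return {key: (baseline_value, current_value)} for every changed key."""
    keys = baseline.keys() | current.keys()
    return {k: (baseline.get(k), current.get(k))
            for k in sorted(keys)
            if baseline.get(k) != current.get(k)}


# Hypothetical baseline captured once the device was commissioned.
baseline = {"firmware": "2.4.1", "temp_unit": "C", "scale": 0.1,
            "peer_ip": "10.0.2.15"}

# Hypothetical current state: scaling factor and network peer have changed.
current = dict(baseline, scale=1.0, peer_ip="10.0.9.99")

# A cheap fingerprint comparison first, then a detailed diff for the log.
if fingerprint(current) != fingerprint(baseline):
    for key, (old, new) in diff_config(baseline, current).items():
        print(f"config change: {key}: {old!r} -> {new!r}")
```

Storing only the fingerprint upstream keeps the comparison cheap, while the full diff gives an investigator the specific parameter that moved, such as a scaling factor or a network peer.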

 

3. Operational Data

Operational data, though not directly tied to security, is crucial for detecting anomalies that may signal security issues and for system maintenance. Examples include:

- System Uptime: Monitoring uptime and downtime to spot irregular patterns suggesting security breaches.

- Performance Metrics: Tracking CPU and memory usage to identify unusual behavior.

- Event Logs: Keeping detailed logs of system events to correlate with security incidents.

Operational Items (each entry lists the item, its purpose, and notes)

Sensor/device valid range (temperature, voltage, network traffic)
  Purpose: Provides the ability to detect an operational anomaly as a potential signal of an attack.
  Notes: Operational anomalies would be used as a proxy for a potential security event.

Sensor/device functional range (temperature, voltage, network traffic)
  Purpose: Provides the ability to detect an operational anomaly as a potential signal of an attack.
  Notes: Like a valid range, except with additional customer context to restrict the operational range to the asset's function. For example, a temperature sensor may have a valid range of 0-255 °C but a functional range of 15-65 °C.

Exceeded thresholds/parameters
  Purpose: Alert/alarm on user-defined conditions for a given parameter.
  Notes: Built-in alerts/alarms tied to an asset's functional range contain the complexity of thresholding within the asset.

System/application alerts
  Purpose: Flag different conditions of the system/application with different priorities.
  Notes: System/application alerts may combine knowledge from various sensors across multiple assets to flag system conditions.
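The distinction between a valid range and a functional range can be sketched as follows. This is a minimal example that reuses the temperature figures from the notes above (valid 0-255 °C, functional 15-65 °C); the `Range` class, the `classify_reading` helper, and the label strings are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        """Inclusive bounds check."""
        return self.low <= value <= self.high


def classify_reading(value: float, valid: Range, functional: Range) -> str:
    """Label a reading against the sensor's valid and functional ranges."""
    if not valid.contains(value):
        # Outside what the sensor can report: faulty or spoofed data.
        return "invalid"
    if not functional.contains(value):
        # Plausible for the sensor, but abnormal for this asset's function.
        return "out_of_functional_range"
    return "normal"


# Ranges taken from the temperature sensor example in the notes above.
VALID = Range(0, 255)
FUNCTIONAL = Range(15, 65)

for reading in (22.5, 80.0, 300.0):
    print(reading, classify_reading(reading, VALID, FUNCTIONAL))
```

Keeping the two ranges separate lets a monitoring system treat an out-of-functional-range reading as an operational anomaly worth correlating with other signals, while an invalid reading points more directly at a device or data integrity problem.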

Conclusion

Enhancing security monitoring and logging for Operational Technology (OT) in data centers is not just a best practice but a critical necessity for maintaining both operational availability and safety. As data centers become increasingly integral to the functioning of modern society, the importance of robust security measures cannot be overstated. By leveraging established frameworks like the Purdue Model, which provides a structured approach to segmenting and securing different layers of industrial control systems, data center operators can gain a comprehensive understanding of their network architecture and the role of critical systems within it.

Understanding the intricacies of these critical systems allows operators

 
