Unified Event Management
2008-07-22 TWO work-in-progress sections from the Unified Event Management paper, put up here for people to throw darts at! It’s a work in progress, collaborating via Google Docs…
4. Reference Architecture
The reference architecture is illustrated abstractly as seven logical layers, as shown in the figure Unified Event Management. The use case is a Call Center Service, but this could be swapped for any other IT service; Enterprise Resource Planning and Customer Relationship Management are two common examples.
This is the beauty of a Unified Computing System: it can run multiple heterogeneous services concurrently on the same infrastructure.

Unified Event Management System
Each of the seven logical layers produces events that must be collected in a predictable and reliable (read: mature) way because other processes take these events as input. Unifying the events from seven layers into one Centralized Event Aggregation point is an example of an Operational Pattern. The other Operational Patterns for Unified Event Management are Manage, Filter, Correlate, Review and Network Operations Center (NOC). The first five of these Operational Patterns are processes executed and maintained by the sixth, organizational Operational Pattern, Network Operations Center (NOC).
The seven layers of the Unified Computing System are explained below.
- The first infrastructure layer is the data center itself, called Facilities, where the Call Center Service is running. This data center provides the space, power and cooling for the service. Each of these components can be monitored and provide metrics and events for the aggregator.
- On top of the Facilities is another core infrastructure layer, the Network. This is the LAN and SAN connectivity, and devices that provide this connectivity all produce events to be aggregated.
- Storage is the third infrastructure layer, providing all storage for the stateless compute platforms. Servers are stateless, which means they boot from SAN and all system and application data lives “on a storage array, over the network”.
- The fourth infrastructure layer is the Compute layer, represented as the CPU and Memory capabilities to run service workloads. These workloads use the Compute layer to access the lower layers.
- The fifth infrastructure layer is the Virtualization layer. This provides enterprise data center features such as security and high availability to the service workloads.
- The sixth layer is where Applications run. Many heterogeneous Applications can run concurrently on the same Unified Computing System infrastructure. Application examples are many, from desktops to web to ERP to CRM.
- The final, seventh layer is a representation of the Service. This is the interface with the business customer who pays for the service, and the end users who use the service.
To ease understanding, the Unified Event Management illustration gives examples of products/features that exist in each layer. These examples can be swapped out, for the most part, with other competing products/features. For example, the Application might not be VMware View and could be an Oracle application running an Enterprise Resource Planning (ERP) service.
On the right-hand side of the Unified Event Management illustration are the Operational Patterns. There are six of these:
- Centralized Event Aggregation collects all events from all seven layers into a centralized, secure, accessible location and enables the other Operational Patterns.
- Manage is the set-up and maintenance of the event management system.
- Filter is the ongoing process of learning how to exclude unimportant events and highlight important events.
- Correlate makes sense of the centralized events from all seven layers to provide improved fault isolation and root cause analysis.
- Review is a management pattern to monitor, measure and improve events and the Unified Event Management system.
- Network Operations Center (NOC) is a people / organizational pattern representing the staff who execute the work in Unified Event Management.
| In summary, there are seven layers in a Unified Computing System that generate events. These events are processed by a Unified Event Management system that consists of six Operational Patterns. |
As explained in section 2.1 A Note on Operational Patterns and Implementations, these are abstract patterns that ease the explanation of the Unified Event Management system. The next section explains each pattern in detail, then the last section provides an implementation of each pattern.
5. Unified Event Management Patterns (Best Practice)
Remember that patterns are just (!) abstractions; operational patterns are simple, universal, vendor-agnostic ways to explain an operational problem and solution without going into implementation details. This leaves the door open for multiple ways to implement the operational pattern, letting the implementer decide which products and techniques to use.
For Event Management, ITIL lists a number of best practices which I’m using as the basis for the operational patterns in Unified Event Management. The next section takes these patterns and provides vendor-specific implementations of them.
5.1 Centralized Event Aggregation
The first operational pattern for Unified Event Management concerns the centralized collection of events. Regardless of the tools you use in the actual implementation, you will follow a similar pattern. First of all, what is the problem to solve?
| Pattern Attribute | Problem | Solution |
|---|---|---|
| Sources | There are seven layers in the Unified Computing System, all of which produce events that need to be collected prior to filtering and correlation. | The Centralized Event Aggregator (CEA) should be able to accept all event formats over multiple transport protocols. |
| Formats | Each layer uses different transport protocols and message formats. | The source format needs to be searchable as text, which requires some programmatic means of translating the different event formats into human-readable, interrogatable forms. |
| Communication | Events need to be sent over the network from the source to the centralized target. | All sources should have a route to the aggregator so that events can be sent in real-time. |
| Retention | Events need to be kept for a specified time period (years or indefinitely) depending on audit and compliance constraints. | By setting a retention policy, the system should self-regulate, including expanding storage as required. Archiving is critical and, when used, should be tested regularly to ensure the data is recoverable. |
| Security | Availability, Integrity and Confidentiality are all critical to event management. | The aggregator should be highly available, with redundant components and disaster recovery and BCP procedures. If the aggregator is unavailable, sources should queue their events and alert that there is an operational exception. Both the aggregator and its data should be backed up. Provisions should be made to ensure events are not dropped, lost or editable within the system. Role Based Access Controls should limit access to the aggregator, complemented by network and end-point security measures and auditable means to ensure compliance. |
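To make the Formats row concrete, here is a minimal sketch of translating two different source formats into one common, searchable record. The `Event` record, the plain-text line layout and the JSON field names (`ts`, `host`, `msg`) are all assumptions made for illustration; real sources and real aggregators will differ.

```python
import json
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# The seven layers named in section 4.
LAYERS = ("Facilities", "Network", "Storage", "Compute",
          "Virtualization", "Application", "Service")


@dataclass(frozen=True)
class Event:
    """Hypothetical common record held by the aggregator."""
    timestamp: datetime
    layer: str
    source: str
    message: str

    def __post_init__(self):
        # Reject events that do not belong to one of the seven layers.
        if self.layer not in LAYERS:
            raise ValueError(f"unknown layer: {self.layer!r}")


# Assumed plain-text layout: "<ISO 8601 timestamp> <host> <free text>"
TEXT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+(.*)$")


def from_text(line: str, layer: str) -> Event:
    """Translate an assumed plain-text event line into an Event."""
    match = TEXT_LINE.match(line)
    if match is None:
        raise ValueError(f"unparseable event line: {line!r}")
    stamp, host, message = match.groups()
    return Event(datetime.fromisoformat(stamp), layer, host, message)


def from_json(payload: str, layer: str) -> Event:
    """Translate an assumed JSON event (epoch seconds in 'ts') into an Event."""
    data = json.loads(payload)
    stamp = datetime.fromtimestamp(data["ts"], tz=timezone.utc)
    return Event(stamp, layer, data["host"], data["msg"])


text_event = from_text(
    "2008-07-22T10:15:00+00:00 san-switch-01 port 4 link down", "Network")
json_event = from_json(
    '{"ts": 1216721700, "host": "esx-07", "msg": "HA failover"}',
    "Virtualization")
```

Once both events share one shape, the later patterns (Filter, Correlate) can work on them without caring which layer or protocol they came from.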
5.2 Manage
The Manage process monitors and maintains the Unified Event Management process through the following activities:
Monitor
- Monitor the source agents to ensure they are working as expected.
- Monitor the Centralized Event Aggregation tool to ensure it is working as expected.
- Monitor the Filter and Correlation access points on the CEA to ensure they are available as expected.
Maintain
- Continuously refine policies regarding Unified Event Management and apply to other processes like Filter and Correlation.
- Implement changes required by the Filter, Correlate and Review processes.
- Ensure that new source components are integrated into the CEA.
- Ensure that the CEA is kept up to date and available.
- Test disaster recovery procedures for the CEA.
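One simple way to approach the first Monitor activity is to treat prolonged silence from a source as a sign that its agent may have stopped. The sketch below assumes the aggregator can report when it last heard from each source; the `silent_sources` function and the source names are hypothetical.

```python
from datetime import datetime, timedelta


def silent_sources(last_seen: dict[str, datetime], now: datetime,
                   max_silence: timedelta) -> list[str]:
    """Return the sources whose most recent event is older than max_silence."""
    return sorted(name for name, seen in last_seen.items()
                  if now - seen > max_silence)


now = datetime(2008, 7, 22, 12, 0)
last_seen = {
    "san-switch-01": now - timedelta(minutes=2),
    "esx-07": now - timedelta(minutes=30),
    "pdu-03": now - timedelta(hours=3),
}
# Sources that have been quiet for more than 15 minutes need a closer look.
suspects = silent_sources(last_seen, now, timedelta(minutes=15))
```

Silence is only a proxy: a healthy but quiet device will also appear in the list, so the threshold would need tuning per source type.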
5.3 Filter
Filtering is a continuous and automated method (with manual over-ride) of parsing all events into categories like Informational, Warning and Critical.
Categorisation
- Define (and constantly refine) the filtering rules and the categories. Starting categories are Informational, Warning and Critical.
- Check that the rules are effective by checking the category contents.
- Check that newly added source components are covered by the existing rules: refine the rules where appropriate.
Filtering
- Automatically filter incoming events into the correct category.
- Allow manual over-ride for event filtering.
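The categorisation and filtering steps above can be sketched as an ordered rule list with a manual over-ride. The three categories come from the text; the keyword rules and the `categorise` function are assumptions for illustration, and a real rule set would be refined continuously as the section describes.

```python
import re
from typing import Optional

CATEGORIES = ("Informational", "Warning", "Critical")

# Hypothetical starting rules; the first matching rule wins.
RULES = [
    (re.compile(r"\b(failed|failure|down|panic)\b", re.IGNORECASE), "Critical"),
    (re.compile(r"\b(degraded|retry|threshold)\b", re.IGNORECASE), "Warning"),
]


def categorise(message: str, override: Optional[str] = None) -> str:
    """Place an event message into a category.

    A manual override, when given, takes precedence over the rules.
    Messages that match no rule default to Informational.
    """
    if override is not None:
        if override not in CATEGORIES:
            raise ValueError(f"unknown category: {override!r}")
        return override
    for pattern, category in RULES:
        if pattern.search(message):
            return category
    return "Informational"


automatic = categorise("port 4 link down")
manual = categorise("port 4 link down", override="Informational")
```

Defaulting unmatched events to Informational keeps them visible for the rule review step rather than discarding them.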
5.4 Correlate
Correlation is the procedure of understanding events and choosing a response, especially across concurrent events from all seven Unified Computing System layers.
Correlate
- Select all events that have one or more data points in common. An example of a data point is time, or hostname, or WWN, or MAC, or IP address.
- Show a trend of events over a time period.
Respond
- Standard technical procedures should be established (through training and online systems) to allow quick look-up for events.
- Escalation procedures should be established to allow one-step response; these should include a “learning system” to repeat learned response processes, preferably codified into an automated system.
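The two Correlate activities can be sketched as a grouping step and a counting step. In this minimal example events are plain dictionaries with assumed keys (`time`, `hostname`, `ip`, `message`); the function names are hypothetical, and only a single shared data point is considered at a time.

```python
from collections import Counter, defaultdict
from datetime import datetime


def correlate(events: list[dict], key: str) -> dict[str, list[dict]]:
    """Group events that share the same value for one data point.

    Only groups of two or more events are returned, since a single
    event has nothing to be correlated with.
    """
    groups = defaultdict(list)
    for event in events:
        value = event.get(key)
        if value is not None:
            groups[value].append(event)
    return {value: members for value, members in groups.items()
            if len(members) > 1}


def hourly_trend(events: list[dict]) -> Counter:
    """Count events per hour to show a trend over a time period."""
    return Counter(
        event["time"].replace(minute=0, second=0, microsecond=0)
        for event in events
    )


events = [
    {"time": datetime(2008, 7, 22, 10, 15), "hostname": "esx-07",
     "ip": "10.0.0.7", "message": "HBA path lost"},
    {"time": datetime(2008, 7, 22, 10, 16), "hostname": "esx-07",
     "message": "VM restarted"},
    {"time": datetime(2008, 7, 22, 11, 0), "hostname": "san-switch-01",
     "message": "port 4 link down"},
]
by_host = correlate(events, "hostname")
trend = hourly_trend(events)
```

Grouping on time would need a window rather than exact equality, which is one reason real correlation engines are considerably more involved than this sketch.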
5.5 Review
All events processed by Unified Event Management, along with their filtering, categorization, correlation and response, should be reviewed to ensure a healthy, fit-for-purpose system.
5.6 Network Operations Center (NOC)
The NOC is a 24×7, three-shift operation that is responsible for Unified Event Management. The NOC should:
- have a team dedicated to the ownership, development, maintenance and operation of these procedures.
- work with Operations Engineering to implement and manage the Centralized Event Aggregator.
- interface with the Service Desk, Incident and Problem Management process for enterprise operations.

Unified Event Management by Steve Chambers is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.

@Steve This is a truly insightful paper. I would urge for a retrospective on if the seven layers could use some abstraction. Lack of a ubiquitous language has been the biggest roadblock in ensuring representations/models are used for effectively communicating event management frameworks.
With Operational patterns, you hit the nail on the head by concepts like filtering, reviewing, correlating and managing. I also think that event lifecycle should make its way in a comprehensive paper like this. I would also urge for adding the dimension of quality of alarming/events which would include accuracy, completeness and actionability of events.
All great points, Robin. Let me get the first cut out then we can work in your ideas? Thanks, Steve
Stevie –
Most excellent! I especially like the layers in section 4 and the chart in 5.1. Keep up the good work! Expect a pingback later.
PS. This needs to be on the main page so it gets put into the RSS feed.
Dave