Last Updated: 2012-10-20 23:32:53 UTC
by Johannes Ullrich (Version: 1)
Back when I started DShield.org, one of the challenges was dealing with variations in log formats. 10+ years later, this problem hasn't really changed, even though there are some promising solutions (which isn't that different from 10+ years ago).
Firewall logs are a pretty simple example. The basic information captured is pretty similar across different firewalls: packet header data. Some log formats are more verbose than others, but the idea is the same, and it is not too hard to come up with a standard to express these logs. For DShield, we used a lowest common denominator approach. It wasn't our goal to collect all the details offered by different firewalls. For an enterprise log management system, however, you may need to preserve this detail, and the simple tab-delimited format we came up with for DShield wouldn't be extensible enough.
One of the logging standards that is gaining some steam is "CEE", or "Common Event Expression". To be successful, a logging standard has to address a number of different problems:
- Log format: This is the basic "syntax" used to express logs. This is actually the easier problem to solve, and the current approach is to use XML. XML isn't exactly efficient, but it is extensible, and there is a rich set of libraries and database technologies to create and parse it. I see it as the "ugly default solution". A more compact binary format may be preferable, but it would have a much higher cost to get started.
- Taxonomy: This is the hard problem: the "magic strings" we assign to different events. For firewall logs, this is usually pretty easy. But think about antivirus! You could log the MD5 hash of the sample that was detected as malicious, but this wouldn't be as meaningful as knowing what malware family the sample belongs to. And there is no agreement as to what constitutes a "malware family" or what to call different families. If you have to correlate logs from different vendors, you will need to translate the name each vendor assigns to a particular piece of malware.
- Vendor Acceptance: There are a lot of great proposals in this space that solve the first two problems. But unless you want to implement it yourself, you need a vendor to support a particular solution. In order for a standard to catch on, there has to be customer demand first. Secondly, the solution has to be economical to implement. It helps if the standard is open and not associated with licensing fees. Above all, the standard needs to be easy to implement.
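To make the taxonomy problem a bit more concrete, here is a rough Python sketch of the kind of translation table you end up maintaining when correlating antivirus logs from different vendors. The vendor names, detection names, and family labels are all made up for illustration; none of them come from a real product or from CEE.

```python
# Hypothetical lookup table: (vendor, vendor-specific detection name) -> shared family label.
# Every name here is a placeholder, not a real detection name.
FAMILY_MAP = {
    ("vendor_a", "W32/ExampleBot.A"): "examplebot",
    ("vendor_b", "Trojan.ExBot!gen1"): "examplebot",
    ("vendor_a", "JS/SampleDropper"): "sampledropper",
}


def normalize_family(vendor: str, detection_name: str) -> str:
    """Return the shared family label, or "unknown" if we have no mapping."""
    return FAMILY_MAP.get((vendor.lower(), detection_name), "unknown")


def same_family(event_a: dict, event_b: dict) -> bool:
    """True only if both events map to the same known family."""
    fam_a = normalize_family(event_a["vendor"], event_a["name"])
    fam_b = normalize_family(event_b["vendor"], event_b["name"])
    return fam_a != "unknown" and fam_a == fam_b
```

The code is trivial; the hard part is agreeing on, and maintaining, the contents of the table.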
So how does CEE solve these issues?
CEE supports two different formats: XML and JSON. XML is the "primary" standard allowing for the most flexibility, but JSON, due to its simple structure, is easier to parse and sufficient in many applications. It is also not terribly hard to convert JSON to XML.
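As a quick illustration of the JSON to XML conversion, the Python sketch below walks a parsed JSON event and emits nested XML elements using only the standard library. The element layout (an `event` root, `item` for list entries) and the sample field names are my own assumptions for the example; this is not the actual CEE XML schema.

```python
import json
import xml.etree.ElementTree as ET


def _append(parent: ET.Element, value) -> None:
    """Recursively attach a parsed JSON value to an XML element."""
    if isinstance(value, dict):
        for key, child_value in value.items():
            # Assumes each key is a valid XML element name.
            _append(ET.SubElement(parent, key), child_value)
    elif isinstance(value, list):
        for item in value:
            # "item" is an arbitrary tag chosen for this sketch.
            _append(ET.SubElement(parent, "item"), item)
    elif isinstance(value, str):
        parent.text = value
    else:
        # Numbers, booleans and null keep their JSON spelling.
        parent.text = json.dumps(value)


def json_event_to_xml(json_text: str, root_tag: str = "event") -> str:
    """Convert one JSON log record into an XML string."""
    root = ET.Element(root_tag)
    _append(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


print(json_event_to_xml('{"action": "block", "src": {"ipv4": "192.0.2.10", "port": 4444}}'))
# <event><action>block</action><src><ipv4>192.0.2.10</ipv4><port>4444</port></src></event>
```

Going the other way is harder, since XML attributes and mixed content have no direct JSON equivalent.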
On taxonomy, CEE doesn't really solve the whole problem, but it starts by defining common labels and data types (like "src.ipv4" for the IPv4 address of a source). In part, CEE refers to other standards like CVE to come up with a vocabulary for identifying events.
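The idea of common labels with fixed data types can be sketched in a few lines of Python. Only the "src.ipv4" label is taken from the text above; the other labels ("dst.ipv4", "src.port", "dst.port") and the validation rules are assumptions made up for this example, not the CEE dictionary.

```python
import ipaddress


def _ipv4(value) -> str:
    # Raises a ValueError subclass if the value is not a valid IPv4 address.
    return str(ipaddress.IPv4Address(str(value)))


def _port(value) -> int:
    port = int(value)
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


# Label -> type checker. Only "src.ipv4" appears in the text; the rest are illustrative.
FIELD_TYPES = {
    "src.ipv4": _ipv4,
    "dst.ipv4": _ipv4,
    "src.port": _port,
    "dst.port": _port,
}


def validate_event(event: dict):
    """Return (clean_fields, errors) for one event keyed by field label."""
    clean, errors = {}, []
    for label, value in event.items():
        check = FIELD_TYPES.get(label)
        if check is None:
            errors.append(f"{label}: unknown label")
            continue
        try:
            clean[label] = check(value)
        except (ValueError, TypeError) as exc:
            errors.append(f"{label}: {exc}")
    return clean, errors
```

With a shared dictionary like this, a consumer can reject or flag malformed fields without knowing which device produced the log.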
I didn't list log transport as a problem above, but it is certainly important to consider how logs are transported. In the Unix world, various versions of syslog have become the de facto standard for log transport. But once you leave Unix-based systems, syslog support is no longer a given. CEE addresses various issues like support for compression and protecting log integrity (which plain old syslog doesn't do well at all).
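To show what compression and integrity protection add on top of plain syslog, here is a minimal Python sketch that compresses a batch of log lines and prepends an HMAC so the receiver can detect tampering. This is a generic technique picked for illustration, not the mechanism CEE specifies; the function names and the shared-key setup are assumptions.

```python
import hashlib
import hmac
import zlib

TAG_LEN = 32  # length of a SHA-256 digest in bytes


def pack_batch(lines: list, key: bytes) -> bytes:
    """Compress a batch of log lines and prepend an HMAC-SHA256 tag."""
    payload = zlib.compress("\n".join(lines).encode("utf-8"))
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def unpack_batch(blob: bytes, key: bytes) -> list:
    """Verify the tag, then decompress and split back into lines."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("log batch failed integrity check")
    return zlib.decompress(payload).decode("utf-8").split("\n")
```

A shared key only shows that the batch came from someone holding the key and was not altered in transit; it does nothing about confidentiality.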
I do think CEE is certainly a standard to watch. Right now, the standard is labeled as "beta". The tricky part will be vendor support. The CEE board does include representatives from a number of important vendors, but I don't see many (any?) log management vendors on the list. Of course, CEE would help the most if the devices generating logs supported it.