Audit Log Analysis

An audit log is a complete historical record of all events that are relevant to a certain object. In this case, we keep an audit log for each target that is managed by the provisioning server.

Problem

The first issue is where to maintain the audit log. One option is to maintain it on the target itself. Alternatively, since the management agent talks to the server, the server could keep the log.

Then there is the question of how to maintain the log. What events should be in it, and what is an event?

Finally, the audit log should be readable and queryable, so people can review it.

The following use cases can be defined:

Context

We basically have two contexts:

Possible solutions

As with all repositories, there should be a single location where the log is edited. In this case, the logical place to do that is on the target itself, since that is where the changes actually occur. In theory the server also knows what changed, but that breaks down if things fail on the target or if other parties start manipulating the life cycle of bundles. The target itself can detect such activities.

The next question is what needs to be logged. And how do we get access to these events?

When storing events, each event can get a unique sequence number. Sequence numbers start with 1 and can be used to determine if you have the complete log.
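The completeness check that sequence numbers make possible can be sketched as follows. This is a minimal illustration, not part of the design itself; the function names `missing_sequence_numbers` and `is_complete` are hypothetical.

```python
def missing_sequence_numbers(seen):
    """Return the sequence numbers absent from 1..max(seen), in ascending order."""
    if not seen:
        return []
    present = set(seen)
    # Numbering starts at 1, so every number up to the highest one must be present.
    return [n for n in range(1, max(present) + 1) if n not in present]


def is_complete(seen):
    """True when no number below the highest one is missing (an empty log counts as complete)."""
    return not missing_sequence_numbers(seen)
```

Note that this can only detect gaps below the highest number seen; it cannot tell whether later events exist that were never received.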

Assuming the target has limited storage, it might not be possible to keep the full log available locally. There are a couple of reasons to replicate this log to a central server:

When replicating, the following scenarios can occur:

  1. The target has lost its whole log and really wants to (re)start from sequence number 1.
  2. The server has lost its whole log and receives a partial log.

Starting with the second scenario, the server always simply collects incoming audit logs, so its memory can be restored from any number of targets or relay servers that report everything they know (again). Hopefully that will lead to a complete log again. If not, there's not much we can do.
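A server that "simply collects" incoming logs could look roughly like the sketch below. It assumes each event arrives as a (sequence number, payload) pair and that `log_key` is whatever value identifies one log; the class and method names are hypothetical. Receiving the same event twice is harmless, which is what allows the server to rebuild its state from repeated reports.

```python
class ServerLogStore:
    """Collects events per log; duplicates are ignored (hypothetical sketch)."""

    def __init__(self):
        # Maps (log_key, sequence_number) to the event payload.
        self._events = {}

    def receive(self, log_key, events):
        """Store every (sequence_number, payload) pair not seen before.

        Returns the number of events that were actually new.
        """
        added = 0
        for seq, payload in events:
            key = (log_key, seq)
            if key not in self._events:
                self._events[key] = payload
                added += 1
        return added

    def sequence_numbers(self, log_key):
        """Return the sorted sequence numbers known for one log."""
        return sorted(seq for (key, seq) in self._events if key == log_key)
```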

The first scenario is potentially more problematic, since the target cannot know for sure which sequence number it had reached when everything was lost. In theory it could ask the (relay) servers, but those might not have been up to date either, so that is not reliable. The only thing the target can do is start a new log at sequence number 1. That means a target can have more than one log, which in turn means we need to be able to identify which log (of each target) we are talking about. Therefore, a new log should be given a unique identifier when it is created. This identifier should not depend on stored information; for example, we could use the current time in milliseconds, which is unlikely to collide for a single target, or a random number.
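As a sketch of how a fresh log could be identified, the example below creates a log identifier without reading any stored state. Combining the clock with random bits is an assumption made here for illustration (the text proposes either one on its own), and the `TargetLog` class and its method names are hypothetical.

```python
import secrets
import time


class TargetLog:
    """A log on a target; a new instance models a log started from scratch."""

    def __init__(self, target_id):
        self.target_id = target_id
        # Neither the clock nor the random bits depend on previously stored data.
        self.log_id = (int(time.time() * 1000) << 32) | secrets.randbits(32)
        self._next_sequence = 1  # every new log starts at sequence number 1

    def next_event_key(self):
        """Return (target_id, log_id, sequence_number) for the next event."""
        key = (self.target_id, self.log_id, self._next_sequence)
        self._next_sequence += 1
        return key
```

With this shape, an event is identified by the combination of target, log and sequence number, so two logs from the same target that both start at 1 do not clash.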

How to find the central server? Use the discovery service!? This is not that big of a deal.

Events should at least contain:

The server will add:

Storage will be resolved differently on the server and the target. On the target, using any kind of database would mean including a considerable library, which makes such solutions impractical there. We might want to consider something like that for the server, though. The options we have are:

How do events get logged?

Implicit algorithms can be built on top of the AuditLog service. What we need to monitor is the life cycle layer, which basically means adding a BundleListener and a FrameworkListener. Those capture all state changes of the framework. Technically we can either add those listeners directly, or use EventAdmin if it is available.

What would be the best way for the target to send audit log updates to the server? I don't think we want the server to poll here, so the target should send updates (periodically). So how does it know what to send?

Discussion

Having two layers for the audit log makes sense:

On the target we should implement a storage solution ourselves, to keep the actual code small. The code should be able to log events quickly (as that will happen far more often than retrieving them).
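One way to keep target-side storage small and writes cheap is an append-only file with one record per line. The sketch below is only an illustration under stated assumptions: the JSON-lines format, the `seq` and `payload` field names, and the `FileLogStore` class are all hypothetical and not prescribed by this design.

```python
import json
import os


class FileLogStore:
    """Append-only event store backed by a single file (hypothetical sketch)."""

    def __init__(self, path):
        self._path = path

    def append(self, seq, payload):
        # Appending one line per event keeps logging cheap; nothing is rewritten.
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"seq": seq, "payload": payload}) + "\n")

    def read(self, first, last):
        """Return events with first <= seq <= last by scanning the whole file."""
        if not os.path.exists(self._path):
            return []
        result = []
        with open(self._path, encoding="utf-8") as f:
            for line in f:
                event = json.loads(line)
                if first <= event["seq"] <= last:
                    result.append(event)
        return result
```

Reads scan the file from the start, which is acceptable here because retrieval happens far less often than logging.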

Communication between the target and server should be initiated by the target. The target can basically send two commands to the server:

  1. My audit log contains sequence numbers 4-8; tell me your numbers. The server then responds (for example) with 1-6, which indicates we need to send 7-8.
  2. Here are events 7-8; can you send me 1-3? The server stores the events it was missing and sends back the events it has (always check that what you get is what you requested).

This is set up this way so that the same commands can also be used by relay servers to replicate logs between server and target.
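The range comparison behind these two commands can be sketched as follows. The sketch assumes each side reports a single contiguous, inclusive range, as in the 4-8 and 1-6 example; the function names `plan_exchange` and `accept_response` are hypothetical.

```python
def plan_exchange(local, remote):
    """Compare two inclusive (low, high) ranges.

    Returns (to_send, to_request): the sequence numbers only we have,
    and the sequence numbers only the other side has.
    """
    local_set = set(range(local[0], local[1] + 1))
    remote_set = set(range(remote[0], remote[1] + 1))
    to_send = sorted(local_set - remote_set)
    to_request = sorted(remote_set - local_set)
    return to_send, to_request


def accept_response(requested, received):
    """Keep only the received (sequence_number, payload) pairs that were requested."""
    wanted = set(requested)
    return [(seq, payload) for seq, payload in received if seq in wanted]
```

For the example above, `plan_exchange((4, 8), (1, 6))` yields `[7, 8]` to send and `[1, 2, 3]` to request. Because the function is symmetric, a relay server can call it in either direction.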

Conclusion