Active Directory security event trending as an NSM component
How to use trending tools, like Cricket, Splunk or Cacti, to visualize the Active Directory event log. Time series trending provides a different perspective from the typical SIEM. It is also useful for capacity planning, security analysis and operations.
First, getting Event log data to a trending tool
- AD Domain Controller global auditing policy enabled (applied on each DC)
- AD Domain Controller reset group policy audit inheritance for Windows 2003 DCs so the audit policies apply down to actual objects. There is a third-party tool available to do this.
- AD Domain Controller auditing policies for specific GPOs enabled
- Exporting Events from Windows to the log server
- Normalizing timestamps on the syslog server
- Generating event counts every X minutes
- Generating host/user/group specific event counts every X minutes
- Collecting the information in the trending tool (Cricket, Cacti, Splunk)
A little magic goes a long way. Let’s lift the covers and visualize security events that would normally end up in event log wasteland or in Yet Another $$ management console.
AD Domain Controller logging security events
Turn on auditing for interesting events. The point is not to graph everything; it is to graph useful data. I have compiled a list of event log event types that would benefit from graphing. Examples of specific EventIDs or error codes are listed at the bottom of the post.
Second, get the data to your syslog server
- The Windows event log of the Domain Controller needs to be exported to a syslog server. Snare is a great open-source tool for just this purpose. Windows Remote Event collection could also be used to import the data into a management server with Snare, but then events can’t be pre-filtered on the AD. It is also possible to use the WMI/DCOM interface, or a Splunk agent if the plan is to use Splunk for collection and processing. Take the transport method into consideration: if the intermediate networks could be compromised, use an IPSec tunnel to carry the syslog data back to the management server.
Third, normalize the syslog data
- On the syslog server, process the incoming logs to make sure they have consistent timestamps.
- Create a script to process incoming data and identify instances of events based on EventID or error code, plus any group object fields (for example, Deletion or Creation) for actions related to an event. You need a bit of regex magic at this point to count and collect the data for the various event log messages.
- If using Cricket or Cacti, create a script that counts the event types and outputs the values. The output could be piped directly to the collection process or sent to a flat file (recommended) for later retrieval.
- If using Splunk, the timestamps should be normalized automatically. The next step is to create a search filter that finds and adds up all the matched events. Create a report with each family of data to display. Look for some Splunk foo in the future to show how to do it.
- Run your trending tool’s collection process as an exec-type script, or have it read the flat file input.
- Apply thresholds, aberrant behaviour detection or other statistical methods to flag problems.
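The counting and flagging steps above can be sketched in Python as follows. This is a minimal sketch, not a finished collector. The line layout (`<ISO 8601 timestamp> <host> ... EventID=<n>`), the function names `bucket_start`, `count_events`, `format_counts` and `flag_aberrant`, and the 5-minute interval are all assumptions made for illustration; the regex would need adapting to whatever Snare or your syslog daemon actually writes. The threshold check is a plain "mean plus k standard deviations" test, which is far simpler than the Holt-Winters aberrant behaviour detection available in RRDtool-based tools.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from statistics import mean, pstdev

# Assumed (hypothetical) line layout: "<ISO 8601 timestamp> <host> ... EventID=<n> ..."
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+.*?EventID=(?P<event_id>\d+)")


def bucket_start(ts, minutes=5):
    """Normalize a timestamp to UTC and round it down to its X-minute bucket."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # Works for intervals that divide 60 evenly (1, 5, 10, 15, ...).
    return dt.replace(minute=dt.minute - dt.minute % minutes, second=0, microsecond=0)


def count_events(lines, minutes=5):
    """Count events per (bucket, EventID); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        try:
            bucket = bucket_start(m["ts"], minutes)
        except ValueError:
            continue  # first field was not an ISO 8601 timestamp
        counts[(bucket, m["event_id"])] += 1
    return counts


def format_counts(counts):
    """Render counts as flat-file lines: '<bucket> <EventID> <count>'."""
    return [f"{bucket.isoformat()} {event_id} {n}"
            for (bucket, event_id), n in sorted(counts.items())]


def flag_aberrant(history, latest, sigmas=3.0):
    """Flag `latest` if it exceeds the historical mean by `sigmas` std deviations."""
    if len(history) < 2:
        return False
    return latest > mean(history) + sigmas * pstdev(history)
```

The lines returned by `format_counts` could be written to the flat file that the Cricket or Cacti exec script reads on each polling cycle, while `flag_aberrant` would be fed the recent counts for a single EventID.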
Visualizing events
To make the most of the data, it needs to be optimized for display. Each family of events can be stacked together so they can be viewed in a single timeline. Data can be presented on the positive and negative axes to differentiate opposing events such as logins and logouts.
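One way to prepare login/logout counts for that kind of mirrored display is sketched below. The function name `signed_series` and the dict-of-counts input are assumptions for illustration only; the actual graphing is left to the trending tool.

```python
def signed_series(logons, logoffs):
    """Return (bucket, +logons, -logoffs) rows so logoffs plot below the zero line.

    Both arguments map a time bucket to an event count; missing buckets count as 0.
    """
    buckets = sorted(set(logons) | set(logoffs))
    return [(b, logons.get(b, 0), -logoffs.get(b, 0)) for b in buckets]


# Hypothetical counts per 5-minute bucket.
rows = signed_series({"10:00": 42, "10:05": 17}, {"10:05": 30, "10:10": 5})
for bucket, up, down in rows:
    print(bucket, up, down)
```

Keeping the two series separate, rather than subtracting one from the other, means a burst of logins is not hidden by a matching burst of logouts in the same interval.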
Windows Audit Event Families of Interest
- Login
- Logout
- Login Types
- Authentication failures
- Kerberos error codes
- Domain logs cleared
- Account creation/deletion from Account Management Audit Policy
- Delegation of admin authority
- File access/changes/creation
- Group Policy changes
- Root write gpLink or gpOptions
- Number of client sessions
- Number of DCs
- Number of GCs
- SyncRequest
- Trust and relationship
- Inbound replication statistics
- Outbound replication statistics
This provides a global view of Active Directory events. The specific codes of each family can be obtained from Microsoft KB articles and from Randy Franklin Smith’s UltimateWindowsSecurity.com website. I have listed a few codes below, but there are many more on Randy’s site. He also has a nifty web interface to navigate the various objects. An amazing site.
Drilling down
Security views can be created based on asset criticality. Select certain hosts, users or groups of hosts to be displayed individually. This way you get both a macro view and a micro view of whatever you (or management) find important.
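The macro and micro views can come out of the same counting pass. The sketch below assumes events have already been reduced to `(name, event_id)` pairs, where `name` is a host or user; the function name `drill_down` and the `watched` set of critical assets are hypothetical.

```python
from collections import Counter, defaultdict


def drill_down(events, watched):
    """Return (global counts per EventID, per-asset counts for watched names)."""
    global_counts = Counter()
    per_asset = defaultdict(Counter)
    for name, event_id in events:
        global_counts[event_id] += 1           # macro view: every event
        if name in watched:
            per_asset[name][event_id] += 1     # micro view: critical assets only
    return global_counts, dict(per_asset)


# Hypothetical sample: dc1 is the only asset singled out for its own graph.
events = [("dc1", "529"), ("ws42", "529"), ("dc1", "644"), ("ws42", "528")]
macro, micro = drill_down(events, watched={"dc1"})
print(macro["529"], micro["dc1"]["529"])
```

Each watched name then becomes its own data source in the trending tool, next to the global one.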
Error codes and EventIDs for the various event families
Kerberos authentication failures
Error code and description
- 6 The username doesn’t exist.
- 12 Workstation restriction; logon time restriction.
- 18 Account disabled, expired, or locked out.
- 23 The user’s password has expired.
- 24 Pre-authentication failed; usually means bad password
- 32 Ticket expired. This is a normal event that gets logged frequently by computer accounts.
- 37 The workstation’s clock is too far out of synchronization with the DC’s clock.
NTLM Error codes
Decimal code, hex code and description
- 3221225572 C0000064 user name does not exist
- 3221225578 C000006A user name is correct but the password is wrong
- 3221226036 C0000234 user is currently locked out
- 3221225586 C0000072 account is currently disabled
- 3221225583 C000006F user tried to logon outside his day of week or time of day restrictions
- 3221225584 C0000070 workstation restriction
- 3221225875 C0000193 account expiration
- 3221225585 C0000071 expired password
- 3221226020 C0000224 user is required to change password at next logon
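The decimal and hex columns are the same 32-bit value written two ways, so a counting script can normalize whichever form appears in a log line. The sketch below uses only the codes listed above; the helper names `parse_code` and `describe`, and the rule that an all-digit string is read as decimal, are assumptions for illustration.

```python
# Codes and descriptions copied from the list above.
NTLM_ERRORS = {
    0xC0000064: "user name does not exist",
    0xC000006A: "user name is correct but the password is wrong",
    0xC0000234: "user is currently locked out",
    0xC0000072: "account is currently disabled",
    0xC000006F: "user tried to logon outside his day of week or time of day restrictions",
    0xC0000070: "workstation restriction",
    0xC0000193: "account expiration",
    0xC0000071: "expired password",
    0xC0000224: "user is required to change password at next logon",
}


def parse_code(text):
    """Accept '3221225572', 'C0000064' or '0xC0000064' and return the integer."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.isdigit():
        return int(text)  # assumption: all-digit strings are decimal
    return int(text, 16)


def describe(text):
    """Return '<8-digit hex> <description>' for a code in either notation."""
    code = parse_code(text)
    return f"{code:08X} {NTLM_ERRORS.get(code, 'unknown code')}"


print(describe("3221225572"))
print(describe("C0000234"))
```

Using the integer as the dictionary key means the decimal and hex forms of one error always land in the same counter.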
Logons should be displayed on the positive axis and logoffs on the negative axis.
- 528 Successful Logon
- 540 Successful Network Logon (Windows 2000, XP, 2003 Only)
- 529 Logon Failure – Unknown user name or bad password
- 530 Logon Failure – Account logon time restriction violation
- 531 Logon Failure – Account currently disabled
- 532 Logon Failure – The specified user account has expired
- 533 Logon Failure – User not allowed to logon at this computer
- 534 Logon Failure – The user has not been granted the requested logon type at this machine
- 535 Logon Failure – The specified account’s password has expired
- 539 Logon Failure – Account locked out
EventID and Description
Other security events from a Domain Controller
- 675 (Audit account logon events): Event 675 on a domain controller indicates a failed initial attempt to log on via Kerberos at a workstation with a domain account. This is usually due to a bad password, but the failure code indicates exactly why authentication failed. See the Kerberos failure codes above.
- 676 or failed 672 (Audit account logon events): Event 676 gets logged for other types of failed authentication. See the Kerberos failure codes above. NOTE: Windows 2003 Server logs a failed event 672 instead of 676.
- 681 or failed 680 (Audit account logon events): Event 681 on a domain controller indicates a failed logon via NTLM with a domain account. The error code indicates exactly why authentication failed. See the NTLM error codes above. NOTE: Windows 2003 Server logs a failed event 680 instead of 681.
- 642 (Audit account management): Event 642 indicates a change to the specified user account, such as a reset password or a disabled account being re-enabled. The event’s description specifies the type of change.
- 632, 636, 660 (Audit account management): All 3 events indicate the specified user was added to the specified group. Group scopes Global, Local and Universal correspond to the 3 event IDs.
- 624 (Audit account management): New user account was created.
- 644 (Audit account management): Specified user account was locked out after repeated logon failures.
- 517 (Audit system events): The specified user cleared the security log.
On log analysis
If you are only doing log visualization as described above, with no log retention or analysis, this would be your next step. What to do with the logs is a wide-open debate. Here is one possible scenario for those starting out.
- Splunk as a log retention, analysis, reporting and alerting tool for both security and operations
- Splunk can be fed operational and/or security events from pretty much anything
- If more security filtering and processing is required, a SIM/SIEM type service could be set up in parallel
- Event visualization is an indicator, to the same extent that an IDS alert is an indication to look deeper; it is never an end in itself.
I also advocate leveraging existing tools for quick wins. It may be that a company wants or needs a really advanced log analysis tool with pre-built logic and expert systems. But as I like to say, one step at a time: selecting tools that may cost hundreds of thousands of dollars over their lifetime should not be done without understanding what’s what. Cheers.