Archive for the ‘NSM’ Category

SANS 2010 Orlando Wrapup

Thursday, March 18th, 2010

Having just come back from the SANS Institute Orlando security conference, I want to share some random thoughts.

  • SANS Instructors and speakers are top rate
  • The invited vendors were pertinent to the conference and added to the experience
  • SANS @ Night and special conferences were very much worth it and provided up to date actionable material.
  • Lots of senior technical personnel from military, industrial, public and commercial interests rounded out the experience.

The standout speakers were Lenny Zeltser, Jason Fossen and Ed Skoudis. All three were superb speakers whose talks brought unique insight and usable knowledge. Eric Conrad was an enjoyable instructor who provided no-nonsense feedback on what worked for him in his various jobs.

Network security monitoring and extrusion detection have really gained ground in the general mindset. As there are no silver bullets, intruders will gain a foothold, and only well-varied system and network indicators can highlight the anomalous behaviour that permits a response. It is also clear that response now needs to be more coordinated and all-encompassing, to lock down a site before advanced intruders react to partial measures and deploy more advanced malware or burrow deeper.

The main attack vectors were directed spear-phishing and combinations of drive-by exploits or business document exploits. For local attacks, any shared layer 2 network is a sitting duck for MiTM attacks, which impress by their breadth, variety and tool automation.

On the SIEM side of things, QRadar, LogRhythm and Splunk all had products that were well received by various network and security administrators.

On the NSM side of things, Sourcefire had a compelling commercial solution that is similar to Sguil but with more setup automation and a more polished approach.

Using black hole routing for network security monitoring - Part2

Thursday, March 18th, 2010

Black hole routing is now your friend. You are forwarding traffic not meant to be routed to a termination point on your network. Let’s look at some advanced ways to get more focused network security data from this traffic. For part 1, skip to the next post.

Advanced network monitoring

  • Forward the black hole routes to a passive Honeypot such as Nepenthes
    • This would entail having either the black hole router or the core router forward the traffic to the Honeypot as the next-hop for all the black holed routes.
    • The honeypot will accept connections and let the operator know what the evildoer was up to, e.g. C2 channels, file transfers, protocol tunnelling, scanning, exploiting, etc.
  • Reconfigure or implement a DNS black list to point the blocked traffic to specific black hole networks. Traffic can be categorized using pre-built filters in the analysis station.
    • Known malware+spyware black list resolves to x.x.x.A
    • Domains, sub-domains or FQDNs that are unrelated to business needs (.tv, .sex) black list resolves to x.x.x.B
    • Domains, sub-domains or FQDNs considered security risks* (.ru, .cn, .ws, .cm, .info, etc.) black list resolves to x.x.x.C. McAfee publishes a list of the riskiest TLDs, Mapping the Mal Web.
    • Blocked web sites due to company policy resolves to x.x.x.D
    • In maintenance or no longer available internal servers resolves to x.x.x.E
    • Other blocked content that is of no interest to network or security analysts resolves to 127.0.0.1
  • Advertise specific black hole routes that target nasty netblocks.
    • If there is a requirement to split the traffic reaching specific routes, convert the link between the core router and the black hole router to an 802.1Q trunk with sub-interfaces. This way you can advertise each group of nasty routes on its own sub-interface. Reporting would know which 802.1Q tag is associated with which group of nasties.
    • The risk with explicitly blocking network blocks or domains is that the user community may not be aware they HAVE been blocked administratively. Ideally a web page or email track back would tell them: hey, the resource you are trying to reach is blocked! Some type of advisory should be published stating that some domains or netblocks may be blocked, with a pointer to check the list or contact the help desk.
  • Graph the traffic hitting each sub-interface or interface on the black-hole router using SNMP and Cricket or Cacti.
  • Run ntop, sancp or argus on the analysis station to get session data.
  • Find a way to generate tickets or events back into analysts’ consoles for interesting traffic hitting the black hole router.

Recap of the benefits

  • Cleaning most of the above classes of traffic will be beneficial for fixing operational problems.
  • Storing this traffic in your network analyzer can serve as an easy to maintain forensic archive for infection vectors.
  • A very cheap alternative to a honeypot.
  • Forensic history of problematic sources in the managed network.
  • Provides a list of workstations and their associated users that are risks to the managed LAN.
  • An easy way to archive the presence of traffic that was marked for deletion by other systems. (Policy based routing, DNS, proxies)
  • Can serve as a source of automated tickets for ops and events for security folks, as what needs to be fixed is always on the managed side of the LAN or WAN.
  • I would love to hear about other benefits

  • Provides a looking glass into misconfigured, unauthorized, unexpected or nonexistent traffic or destinations.
  • Cheap and easy to set up
  • Maintenance and operation can be very automated
  • Integrates with existing processes (ticketing, change management, backups, monitoring, logging)
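
As a rough sketch of the pre-built filters mentioned in the DNS black list item above, the analysis station can map the destination address of black-holed traffic back to the black list category that produced it. Everything specific here is an assumption for illustration: the addresses are stand-ins from the 192.0.2.0/24 documentation range for x.x.x.A through x.x.x.E, and the function name and input format are made up.

```python
from collections import Counter

# Hypothetical sinkhole addresses standing in for x.x.x.A .. x.x.x.E.
SINKHOLE_CATEGORIES = {
    "192.0.2.1": "malware/spyware black list",
    "192.0.2.2": "non-business domains",
    "192.0.2.3": "risky TLDs",
    "192.0.2.4": "blocked by company policy",
    "192.0.2.5": "retired or in-maintenance internal servers",
}


def categorize(flows):
    """Count flows per black list category.

    `flows` is an iterable of (source_ip, destination_ip) pairs, for example
    parsed from the session data collected on the analysis station.
    """
    counts = Counter()
    for _src, dst in flows:
        counts[SINKHOLE_CATEGORIES.get(dst, "uncategorized")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("10.1.1.20", "192.0.2.1"),
        ("10.1.1.20", "192.0.2.1"),
        ("10.1.2.7", "192.0.2.3"),
        ("10.1.3.9", "198.51.100.9"),
    ]
    for category, hits in categorize(sample).most_common():
        print(f"{hits:4d}  {category}")
```

The per-category counts are also a natural data source for the graphs and tickets mentioned earlier in the list.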

Using black hole routing for network security monitoring - Part1

Saturday, March 6th, 2010

The average network administrator already has control over all the tools needed to put effective black hole routing and reporting in place. The prerequisites and simple steps below should get you started.

Executive summary

Static black hole routing and policy based black hole routing are great ways to securely offload traffic from your core routers or firewalls. The traffic that ends up being dropped could involve known source or destination networks that should not be routed, but it could also be unexpected, misconfigured or malicious traffic. Keeping an eye on this traffic could pay off big time. NSM at work! But first, how to do the basic setup.

Prerequisites

  • A spare router running Cisco IOS 12.2.x and up
  • A spare network tap or mirror port on your core router
  • A spare distributed network analyzer or workstation with Wireshark
  • Console port access or out-of-band access to the router via a secure terminal server

An alternate way of processing the incoming traffic directly in a honeypot is described here.

Physical setup

  • Connect the network tap to your core-router
  • Connect your black-hole-router to your network tap.
  • Connect your analyzer to the replication port of your network tap.
  • Connect your router to your out-of-band access

Logical setup on a Cisco IOS router with a test route

  • Apply secure configuration guidelines to protect your black-hole-router. *
  • Configure a point to point routed interface on black-hole-router and your core-router.
  • Create the appropriate routing process
  • Make the black-hole-router a stub that will only advertise static routes
  • Add a redistribute static statement to your routing process
  • Enter the Null0 interface configuration (Null0 always exists on the router, so there is nothing to create)
  • On Null0 configure no ip unreachables
  • Now create a test black hole route: ip route 10.255.255.1 255.255.255.255 Null0
  • Example using EIGRP as the IGP:

    !
    ip classless
    !
    ! Null0 is always up; only suppress ICMP unreachables for dropped packets
    interface Null0
     no ip unreachables
    !
    ! replace x.x.x.x with the network of the point-to-point interface
    router eigrp 1
     redistribute static
     eigrp stub static
     network x.x.x.x
     no auto-summary
    !
    ip route 10.255.255.1 255.255.255.255 Null0
  • Verify the route is showing up in the core router.
  • Send traffic to it from another network location. This traffic should be blackholed on the router. Check with your traffic analyzer, with a debug statement or a logged ACL on your router.

Applying the black hole route on a Cisco IOS router

Here you need to choose what type of black hole route to use. You can inject a default black hole route, specific black hole routes, or both. Your network may be architected so that you already have a black hole route somewhere that could be migrated to your new setup. You may wish to black hole only routes to certain networks (i.e. bogon networks, known “bad net blocks” or AS). You can also use a bit of DNS foo to black hole “bad” domains, sub-domains or FQDNs.

    ! Example using EIGRP as the IGP
    !
    ip route 0.0.0.0 0.0.0.0 Null0
    !

By creating a default black hole route, all traffic destined to networks with no explicit routing entries will now end up on your new black hole router. After applying the route, the first order of business is making sure you have not broken anything. You are doing this work in your lab or at least in a maintenance window, right? ;-) Make sure you can still reach your application servers, your proxies, external NATs, etc. Once you are satisfied, save your configuration on the router.
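
If you go for specific black hole routes instead of (or on top of) the default, typing the static routes by hand gets old quickly. Here is a minimal sketch that turns a list of CIDR prefixes into IOS static routes pointing at Null0. The function name is made up, and the sample prefixes are only the reserved documentation ranges, not a real bogon or bad netblock list; feed it whatever list you trust.

```python
import ipaddress


def null_routes(prefixes):
    """Return one IOS static route statement per CIDR prefix, pointing at Null0."""
    lines = []
    for prefix in prefixes:
        # strict=True rejects prefixes with host bits set, e.g. 10.1.1.1/24
        net = ipaddress.ip_network(prefix, strict=True)
        lines.append(f"ip route {net.network_address} {net.netmask} Null0")
    return lines


if __name__ == "__main__":
    # Example prefixes only (documentation ranges), not a real bogon list.
    for line in null_routes(["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]):
        print(line)
```

Review the generated statements before pasting them into the black-hole-router; a wrong prefix here drops legitimate traffic.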

Now, fire up your network analyzer. Traffic can now be categorized:

  • There should be your typical misconfigured hosts, monitoring servers and deprecated DNS entries sending traffic to networks that do not exist.
  • Workstations trying to update software using non-proxied ports from internet vendor servers.
  • Management/monitoring software scanning ports of IPs for auditing purposes
  • And the famous other category

Yeah, fun with “other”. Non browser based applications trying to send web or ftp traffic without going through the designated proxies. Applications trying to reach internet addresses for which there are no proxy services. Having worked hard at designing your network to make use of NAT and proxies, you get to have a much easier time tracking down the above classes of traffic.
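
A first pass at sorting the black-holed traffic into these classes can be automated. The sketch below is only an illustration under loud assumptions: the monitoring host addresses, the port list and the category names are all made up, and anything it cannot place still lands in “other” for a human to look at.

```python
from collections import Counter

# Hypothetical values; adjust to the local network.
MONITORING_HOSTS = {"10.0.0.50", "10.0.0.51"}
WEB_FTP_PORTS = {21, 80, 443}


def classify(src_ip, dst_port):
    """Put one black-holed flow into one of the rough categories above."""
    if src_ip in MONITORING_HOSTS:
        return "management/monitoring scan"
    if dst_port in WEB_FTP_PORTS:
        return "web or ftp bypassing the proxies"
    return "other"


def summarize(flows):
    """Count flows per category; `flows` holds (src_ip, dst_port) pairs."""
    return Counter(classify(src, port) for src, port in flows)
```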

Benefits of this easy win approach

  • Cleaning most of the above classes of traffic will be beneficial for fixing operational problems.
  • Storing this traffic in your network analyzer can serve as an easy to maintain forensic archive for infection vectors.
  • A very cheap alternative to a honeypot.
  • Forensic history of problematic sources in the managed network.
  • Provides a list of workstations and their associated users that are risks to the managed LAN.
  • An easy way to archive the presence of traffic that was marked for deletion by other systems. (Policy based routing, DNS, proxies)
  • Can serve as a source of automated tickets for ops and events for security folks, as what needs to be fixed is always on the managed side of the LAN or WAN.
  • I would love to hear about other benefits

* - SANS IOS security guidelines or other security focused templates. Any mistake on this router can have nasty consequences. All your routers should be equally secure, audited, logged, change controlled, monitored and backed up.

Active Directory security event trending as an NSM component

Thursday, March 4th, 2010

How to use trending tools, like Cricket, Splunk or Cacti, to visualize the Active Directory event log. Time series trending provides a different perspective from the typical SIEM. It is also useful for capacity planning, security analysis and operations.

First, getting Event log data to a trending tool

  • AD Domain Controller global auditing policy enabled (applied on each DC)
  • AD Domain Controller reset group policy audit inheritance for Windows 2003 DCs so the audit policies apply down to actual objects. There is a third-party tool available to do this.
  • AD Domain Controller auditing policies for specific GPOs enabled
  • Exporting Events from Windows to the log server
  • Normalizing timestamps on the syslog server
  • Generating event counts every X minutes
  • Generating host/user/group specific event counts every X minutes
  • Collecting the information in the trending Tool (Cricket, Cacti, Splunk)

A little magic goes a long way. Let’s lift the covers and visualize security events that would normally end up in event log wasteland or in Yet Another $$ management console.

AD Domain Controller logging security events

Turn on auditing for interesting events. The point is not to graph everything, it is to graph useful data. I have compiled a list of event log event types that would benefit from graphing. Examples of specific EventIDs or Error Codes are listed at the bottom of the post.

Second, get the data to your syslog server

The Windows event log of the Domain Controller needs to be exported to a syslog server. Snare is a great open-source tool for just this purpose. Windows Remote Event collection could also be used to import the data into a management server running Snare, but then events can’t be pre-filtered on the AD. It is also possible to use the WMI/DCOM interface, or a Splunk agent if the plan is to use Splunk to do the collection and processing. The transport method should also be taken into consideration: if the intermediate networks could be compromised, use an IPSec tunnel to transport the syslog data back to the management server.

Third, normalize the syslog data

  • On the syslog server, process the incoming logs to make sure they have consistent timestamps.
  • Create a script to process incoming data to identify instances of events based on EventID or Error Code, in addition to any group object fields (for example Deletion or Creation) for actions related to an event. You need to do a bit of regex magic at this point to count and collect the data for the various event log messages.
  • If using Cricket or Cacti, create a script that counts the event types and outputs the value. The output could be piped directly to the collection process or sent to a flat file (recommended) for later retrieval.
  • If using Splunk, the timestamps should be automatically normalized. The next step would be to create a search filter that will find and add up all the matched events. Create a report with each family of data to display. Look for some Splunk foo in the future to show how to do it.
  • Run your trending tool’s collection process as an exec type script or have it read the flat file input.
  • Apply thresholds, aberrant behaviour detection or other statistical methods to flag problems.
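
The counting step can be sketched in a few lines. The big assumption here is the log format: I am pretending the forwarder writes the event ID into each line as "EventID=<number>", which is probably not what your export tool emits, so adjust the regex to your actual log lines. The family names and the function name are made up; the EventIDs are the logon-related ones listed at the bottom of the post.

```python
import re
from collections import Counter

# Assumption: each log line carries the event ID as "EventID=<number>".
# Adjust the pattern to whatever your export tool actually emits.
EVENT_ID_RE = re.compile(r"\bEventID=(\d+)\b")

# A few event families built from the EventIDs listed at the bottom of the post.
FAMILIES = {
    "logon_success": {528, 540},
    "logon_failure": {529, 530, 531, 532, 533, 534, 535, 539},
    "log_cleared": {517},
}


def count_families(lines):
    """Return a Counter of event family -> number of matching log lines."""
    counts = Counter({family: 0 for family in FAMILIES})
    for line in lines:
        match = EVENT_ID_RE.search(line)
        if not match:
            continue
        event_id = int(match.group(1))
        for family, ids in FAMILIES.items():
            if event_id in ids:
                counts[family] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Mar  4 10:00:01 dc1 Security EventID=528 user=alice",
        "Mar  4 10:00:07 dc1 Security EventID=529 user=bob",
        "Mar  4 10:00:09 dc1 Security EventID=529 user=bob",
    ]
    # Print "name value" pairs, one per line, for a flat file or exec collector.
    for family, value in sorted(count_families(sample).items()):
        print(f"{family} {value}")
```

Run it once per polling interval over the lines received since the last run, and write the output to the flat file your trending tool reads.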

Visualizing events

To make the most of the data, it needs to be optimized for display. Each family of events can be stacked together so they can be viewed in a single timeline. Data can be presented using the negative and positive axis to differentiate events such as logins and logouts.

Windows Audit Event Families of Interest

  • Login
  • Logout
  • Login Types
  • Authentication failures
  • Kerberos error codes
  • Domain logs cleared
  • Account creations/deletion from Account Management Audit Policy
  • Delegation of admin authority
  • File access/changes/creation
  • Group Policy changes
  • Root write gpLink or gpOptions
  • Number of client sessions
  • Number of DCs
  • Number of GCs
  • SyncRequest
  • Trust and relationship
  • Inbound replication statistics
  • Outbound replication statistics

This provides a global view of Active Directory events. The specific codes of each family can be obtained from Microsoft KB articles and also from Randy Franklin Smith’s UltimateWindowsSecurity.com website. I have listed a few codes below, but there are many more on Randy’s web site. He also has a nifty web interface to navigate the various objects. An amazing site.

Drilling down

Security views can be created based on asset criticality. Select certain hosts, users or groups of hosts to be displayed individually. This way you get both a macro view and a micro view of what you (or management) find important.

Error and EventIds for the various event families

Kerberos authentication failures

Error code and description

  • 6 The username doesn’t exist.
  • 12 Workstation restriction; logon time restriction.
  • 18 Account disabled, expired, or locked out.
  • 23 The user’s password has expired.
  • 24 Pre-authentication failed; usually means bad password
  • 32 Ticket expired. This is a normal event that gets frequently logged by computer accounts.
  • 37 The workstation’s clock is too far out of synchronization with the DC’s clock.

NTLM Error codes

    DEC HEX and description

  • 3221225572 C0000064 user name does not exist
  • 3221225578 C000006A user name is correct but the password is wrong
  • 3221226036 C0000234 user is currently locked out
  • 3221225586 C0000072 account is currently disabled
  • 3221225583 C000006F user tried to logon outside his day of week or time of day restrictions
  • 3221225584 C0000070 workstation restriction
  • 3221225875 C0000193 account expiration
  • 3221225585 C0000071 expired password
  • 3221226020 C0000224 user is required to change password at next logon
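
The event log tends to show these codes in decimal while most references list them in hex, so a tiny lookup saves some squinting. A minimal sketch using the codes from the list above; the function name is made up.

```python
# NTLM error codes from the list above, keyed by their hex value.
NTLM_CODES = {
    0xC0000064: "user name does not exist",
    0xC000006A: "user name is correct but the password is wrong",
    0xC0000234: "user is currently locked out",
    0xC0000072: "account is currently disabled",
    0xC000006F: "logon outside day of week or time of day restrictions",
    0xC0000070: "workstation restriction",
    0xC0000193: "account expiration",
    0xC0000071: "expired password",
    0xC0000224: "user is required to change password at next logon",
}


def describe(decimal_code):
    """Return (hex string, description) for a decimal NTLM error code."""
    code = int(decimal_code)
    return format(code, "08X"), NTLM_CODES.get(code, "unknown")


if __name__ == "__main__":
    print(describe(3221225572))
```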

Logons should be displayed on the positive axis and logoffs on the negative axis.

EventID and Description

  • 528 Successful Logon
  • 540 Successful Network Logon (Windows 2000, XP, 2003 Only)
  • 529 Logon Failure - Unknown user name or bad password
  • 530 Logon Failure - Account logon time restriction violation
  • 531 Logon Failure - Account currently disabled
  • 532 Logon Failure - The specified user account has expired
  • 533 Logon Failure - User not allowed to logon at this computer
  • 534 Logon Failure - The user has not been granted the requested logon type at this machine
  • 535 Logon Failure - The specified account’s password has expired
  • 539 Logon Failure - Account locked out

Other security events from a Domain Controller

  • 675 (Audit account logon events): Event 675 on a domain controller indicates a failed initial attempt to logon via Kerberos at a workstation with a domain account, usually due to a bad password, but the failure code indicates exactly why authentication failed. See the Kerberos failure codes above.
  • 676 or Failed 672 (Audit account logon events): Event 676 gets logged for other types of failed authentication. See the Kerberos failure codes above. NOTE: Windows 2003 Server logs a failed event 672 instead of 676.
  • 681 or Failed 680 (Audit account logon events): Event 681 on a domain controller indicates a failed logon via NTLM with a domain account. The error code indicates exactly why authentication failed. See the NTLM error codes above. NOTE: Windows 2003 Server logs a failed event 680 instead of 681.
  • 642 (Audit account management): Event 642 indicates a change to the specified user account, such as a reset password or a disabled account being re-enabled. The event’s description specifies the type of change.
  • 632, 636, 660 (Audit account management): All 3 events indicate the specified user was added to the specified group. Group scopes Global, Local and Universal correspond to the 3 event IDs.
  • 624 (Audit account management): New user account was created.
  • 644 (Audit account management): Specified user account was locked out after repeated logon failures.
  • 517 (Audit system events): The specified user cleared the security log.

On Log analysis

If you are only doing log visualization as described above, with no log retention or analysis, this would be your next step. What to do with the logs is a wide open debate. Here is one possible scenario for those starting out.

  • Splunk as a log retention, analysis, reporting and alerting tool for both security and operations
  • Splunk can be fed operational and/or security events from pretty much anything
  • If more security filtering and processing is required, a SIM/SIEM type service could be set up in parallel

Event visualization is an indicator, to the same extent that an IDS alert is an indication to look deeper; it is never an end in itself. I also advocate leveraging existing tools for quick wins. It may be that a company wants or needs a really advanced log analysis tool with pre-built logic and expert systems. But as I like to say, one step at a time: selecting tools that may cost hundreds of thousands of dollars over their lifetime should not be done without understanding what’s what. Cheers.

DNS event trending as an NSM component

Wednesday, March 3rd, 2010

Making use of time series trending for network security monitoring. A short history with examples.

Time series trending tools like Cricket, Cacti and Torrus focus on performance and availability.

Operational teams use these day in, day out; the tools are there to organize and display reams of data in a quick and painless way. Time series trending is a source of indicators.

Strengths of time series trending

  • detailed baseline
  • displays seasonality
  • highlights anomalies
  • illustrates subtle changes
  • provisioning planning

Strengths of time series trending using RRD databases

  • fast visualization - anyone having had to suffer SQL based trending tools can attest
  • low maintenance
  • graphing flexibility
  • capabilities can be extended
  • no vendor lock-in or annual maintenance fees

Making use of these strengths to identify threats is a secret recipe. Yeah, reaaally.

Here are some DNS based trends that can help quantify, understand and defend against, or clean up after, extrusion attempts or malware/botnet command and control communications.

  • Trending the number of hits against security related blacklisted entries
  • Trending the number of hits against .cn, .ru, etc. specific domains
  • Trending the number of hits against honeypot entries
  • Trending the number of hits by source IP
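
Getting the numbers for trends like these is mostly a counting exercise. Here is a minimal sketch that counts hits against watched TLDs, per TLD and per source IP. It assumes you have already parsed your DNS query log into (source IP, queried name) pairs; that parsing depends entirely on your DNS server and is not shown. The watched TLD set and the function name are just examples.

```python
from collections import Counter

WATCHED_TLDS = {"cn", "ru"}  # example TLDs only; extend as needed


def count_dns_hits(records):
    """Count watched-TLD hits per TLD and per source IP.

    `records` is an iterable of (source_ip, queried_name) pairs.
    """
    by_tld = Counter()
    by_source = Counter()
    for source_ip, name in records:
        # Drop a trailing root dot, then keep the last label as the TLD.
        tld = name.rstrip(".").rsplit(".", 1)[-1].lower()
        if tld in WATCHED_TLDS:
            by_tld[tld] += 1
            by_source[source_ip] += 1
    return by_tld, by_source
```

Each counter value becomes one data point per polling interval in the trending tool.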

The data value can be extended by doing a bit of munging on the interesting bits.

  • Google foo, to automate searches of the IP addresses or DNS names to see if they are related to specific malware or botnets. This will help prioritize the cleaning efforts.
  • Anomaly detection such as Holt-Winters smoothing with confidence intervals to identify anomalies in seasonality. Which, for the non-initiated, means: sudden traffic pattern changes such as traffic going to zero during normal hours or high usage during off-hours. Things that may not trigger a threshold, but that are out of seasonality.
  • The information could also be reported in dashboards on a Splunk server for operational teams or management. Trending data over the long term also unshackles the administrator from defining hard limits in time of day use for what is normal and what is not. You can still define hard limits which can generate security events, but analysis is made much easier by seeing the whole time series.

Once identified, the vector of the outbreak needs to be cleaned before nasty malware can move in (think Zeus variant).
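
For the curious, here is roughly what Holt-Winters smoothing with a confidence band looks like in code. This is a bare-bones additive sketch, not the implementation of any particular trending tool: the smoothing parameters, the band multiplier delta and the min_band floor (there to keep a perfectly regular series from flagging on rounding noise) are all values I picked for illustration.

```python
def holt_winters_anomalies(series, period, alpha=0.5, beta=0.1, gamma=0.3,
                           delta=3.0, min_band=1.0):
    """Return the indexes whose value falls outside the predicted band.

    Additive Holt-Winters: level + trend + seasonal term, with a smoothed
    absolute deviation per seasonal slot used as the confidence band.
    The first `period` samples only initialise the model.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    level = sum(series[:period]) / period
    trend = 0.0
    seasonal = [value - level for value in series[:period]]
    deviation = [0.0] * period
    anomalies = []
    for t in range(period, len(series)):
        slot = t % period
        predicted = level + trend + seasonal[slot]
        error = series[t] - predicted
        # Flag the sample when it leaves the band around the prediction.
        if abs(error) > max(delta * deviation[slot], min_band):
            anomalies.append(t)
        previous_level = level
        level = alpha * (series[t] - seasonal[slot]) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[slot] = gamma * (series[t] - level) + (1 - gamma) * seasonal[slot]
        deviation[slot] = gamma * abs(error) + (1 - gamma) * deviation[slot]
    return anomalies
```

Fed with hits per interval and a period of one day or one week of samples, it flags exactly the kind of out-of-season behaviour described above. Note that a big spike also disturbs the level for a few samples afterwards, so the intervals right after an anomaly may be flagged too.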