
Splunk vs ELK

If you are in IT Operations in any role, you have probably come across either Splunk or ELK, or both. These are two heavyweights in the field of Operational Data Analytics. In this blog post, I’m going to share with you what I feel about these two excellent products based on my years of experience with them.

The problem Splunk and ELK are trying to solve: Log Management

While there are fancier terms such as ‘Operational Data Intelligence’, ‘Operational Big Data Analytics’, ‘Log data analytics platform’, the problem both Splunk and ELK are trying to solve is Log Management. So, what’s the challenge with Log management?

Logs, logs, logs and more logs

The single most important piece of troubleshooting data in any software program is the log generated by the program. If you have ever worked with vendor support for any software product, you have inevitably been asked to provide – you guessed it – log files. Without the log files, they really can’t see what’s going on.

Logs not only contain information about how the software program runs; they may contain data that is valuable to the business as well. Yep, that’s right. For instance, you can retrieve a wealth of data from your web server access logs to find out things like the geographical dispersion of your customer base, the most visited pages on your website, etc.

If you are running only a couple of servers with a few applications on them, accessing and managing your logs is not a problem. But in an enterprise with hundreds or even thousands of servers and applications, this becomes an issue. Specifically,

  1. There are thousands of log files.
  2. The size of these log files runs into gigabytes or even terabytes.
  3. The data in these log files may not be readily readable or searchable (unstructured data).


Both Splunk and ELK attempt to solve the problem of managing ever-growing log data. In essence, they supply a scalable way to collect and index log files and provide a search interface to interact with the data. In addition, they provide a way to secure the data being collected and enable users to create visualizations such as reports, dashboards and even alerts.

Now that you know the problem Splunk and ELK are attempting to solve, let’s compare them and find out how they achieve this. I’m going to compare them in 4 areas:

  1. Technology
  2. Cost
  3. Features
  4. Learning Curve for the operations team

Got it? I can’t wait to share. Let’s dive in.


Technology

Witnessing C++ vs Java has never been more exciting

While Splunk is a single coherent closed-source product, ELK is made up of three open-source products: ElasticSearch, LogStash, and Kibana.

Both Splunk and ELK store data in Indexes. Indexes are the flat files that contain searchable log events.

Both Splunk and ELK employ an agent to collect log file data from the target servers. In Splunk, this agent is the Splunk Universal Forwarder. In ELK, it is Logstash (and, in recent years, Beats). There are other means to get data into the indexes, but the majority of use cases will use the agents.


While Splunk uses a proprietary technology (primarily developed in C++) for its indexing, Elasticsearch is based on Apache Lucene, an open-source search library written fully in Java.

On the search interface side, Splunk employs a Search Head, a Splunk instance with specific functions for searching. ELK uses Kibana, an open-source data visualization platform. When it comes to creating visualizations, in my opinion, Splunk makes Kibana look plain. (Note: it is possible to connect Grafana to ELK to visualize data. Some believe Grafana visualizations are richer than Kibana’s.) With recent versions of Kibana, you also get Timelion, a time series data visualizer.

For querying, Splunk uses its proprietary SPL (Splunk Processing Language), whose syntax resembles SQL statements chained with Unix pipes, while ELK uses Query DSL, with an underlying JSON-formatted syntax.
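To make the contrast concrete, here is roughly what the same question – “how many 404 errors per client IP?” – looks like in each language. (The index and field names, such as `web`, `status` and `clientip`, are my assumptions for a typical web access log, not from either product’s defaults.)

```
# Splunk SPL: pipe the results of a search into commands
index=web status=404 | stats count by clientip

# Elasticsearch Query DSL: a JSON search body with an aggregation
GET /web-*/_search
{
  "query": { "term": { "status": "404" } },
  "aggs": { "per_client": { "terms": { "field": "clientip" } } },
  "size": 0
}
```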

Let me summarize the technical info:

  1. Source model: Splunk is closed source; ELK is open source.
  2. Indexing technology: proprietary (C++) for Splunk; Apache Lucene (Java) for Elasticsearch.
  3. Search interface: Search Head for Splunk; Kibana for ELK.
  4. Query language: SPL for Splunk; Query DSL for ELK.
  5. Collection agent: Universal Forwarder for Splunk; Logstash/Beats for ELK.

In the end

Both Splunk and ELK are fundamentally very sound technically. Though one can argue one way or the other, the longevity of these two products in the marketplace proves that both are indeed superior in their own way. However, Splunk differs in one crucial area: schema on read.

With schema on read, there is minimal processing required before indexing. In fact, you can throw anything at Splunk as long as Splunk can determine the Host, Source (the file the data is coming from) and Source Type (a meta field, determined manually or automatically, that helps Splunk identify the type of the log file). The fields are generally determined ONLY at search time.

However, with ELK, you must provide the field mapping ahead of time (before indexing). One can certainly argue that this is not necessarily bad. But I’m going to leave it up to the community to decide.

Cost

Is Open-Source really free?

Cost of the software: ELK is free. Splunk is not.

Splunk’s license fee is based on the daily log volume being indexed. For example, you may buy a 1TB license, which lets you ingest up to 1TB per day. There is no cost for keeping historic data; only the daily volume counts (the license meter resets at midnight every day). Furthermore, the cost is NOT based on the number of users or the number of CPU cores. You can get either a term license, which you pay for per year, or a perpetual license, which is a one-time fee plus an annual support fee (if any).

I’m unable to give you a dollar figure, as it varies greatly based on the geographic location of your business and, obviously, on the data volume (and the sales team you are working with :-)). But in general, compared to other commercial products in the market (SumoLogic, Loggly, Sematext, etc.), Splunk *may* be on the expensive side. (Again, there are too many variables to give you a black-and-white answer.)

ELK is open source. You pay nothing for using the software.

But, and this is a big but, the cost of ownership is not just the cost of software. Here are other costs to consider.

  1. Cost of Infrastructure. Both Splunk and ELK would require similar hardware infrastructure.
  2. Cost of implementing the solution. This is a big one. For example, when you purchase Splunk, you might get some consulting hours that you can use to implement your solution. With ELK, you are on your own.
  3. Cost of ongoing maintenance: This can also be a big one. Once again, you might get some support hours from Splunk, but with ELK, you are on your own.
  4. Cost of add-ons and plugins: Both Splunk and ELK have plugin/add-on based solutions that extend the functionality. Some are free and some are not. For example, you will have to pay for Shield (ELK security) and ITSI (Splunk IT Service Intelligence).

In the end

Yes, open source is free. But is it free, as in free? The biggest problem you will face, as an evangelist of ELK in your organization, is coming up with a dollar amount for the cost. As for Splunk, you have to be able to convince your organization of the cost. At least in this case, the cost is predictable.

Features

Looking for something? There is an app for it.

Both Splunk and ELK have a myriad of features. When I say feature, it can be any of the following:

  1. Support for a certain type of data input. For example, does it allow data input via HTTP, or a script? So, earlier when I said both Splunk and ELK employ an agent to collect data, I lied. Both products support several other means of getting data in.
  2. Data visualization functionality. For example, does it allow creating custom dashboards, reports, etc.? How feature-rich are they?
  3. Integration with other products/frameworks. For example, can it send/receive data from APM products such as New Relic, Dynatrace or AppDynamics? Can it send/receive data from Hadoop? Both products integrate well with many major platforms.
  4. Security features: Does it support role-based access, two-way SSL or Active Directory integration? With Splunk, security is available out of the box. But with ELK, you pay for Shield (or X-Pack in recent versions).
  5. Data manipulation: How easy is it to modify the data being ingested? Can I mask sensitive information readily? Splunk provides powerful regular-expression based filters to mask or remove data. The same can be achieved with Logstash in the ELK world.
  6. Extensibility: Can we easily extend the product by writing our own solutions?
  7. Metrics store: Indexing text (log files) is one thing, but indexing metrics (numerical data) is another. The performance of indexing and search is dramatically higher on a time series metrics index. Splunk introduced this in version 7.
  8. Agent management: How are you going to manage hundreds or thousands of Beats or Splunk Universal Forwarders? While Ansible or Chef can be used with both products, Splunk has the advantage of letting you manage the Universal Forwarders using its Deployment Server (a Splunk instance with a specific function).
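To illustrate item 5 above, here is a minimal Python sketch of the kind of regex-based masking both products can be configured to do at ingest time (via props.conf rules in Splunk, or a filter in Logstash). The card-number pattern and the replacement format below are my own simplified assumptions, not either product’s built-in rule:

```python
import re

# Pattern for a 16-digit card number written in 4-digit groups
# separated by spaces or dashes (an assumed, simplified format).
CARD = re.compile(r"\b(?:\d{4}[- ]){3}(\d{4})\b")

def mask_card_numbers(line):
    """Replace all but the last 4 digits of a card number with X's."""
    return CARD.sub(r"XXXX-XXXX-XXXX-\1", line)

print(mask_card_numbers("payment ok card=4111-1111-1111-1234 amount=42.00"))
# -> payment ok card=XXXX-XXXX-XXXX-1234 amount=42.00
```

The same idea applied at ingest means the sensitive value never reaches the index at all, which is usually what auditors want.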

In the end

Since both ELK and Splunk have a strong user community and good extensibility, there is no shortage of plugins and add-ons. (In Splunk world, there is the notion of apps).

splunkbase.splunk.com (https://splunkbase.splunk.com/)


Elastic Search Plugins (https://www.elastic.co/guide/en/elasticsearch/plugins/current/index.html)


Learning Curve for the operations team

From 0 to 60mph in 3 seconds. Really?

Both Splunk and ELK have massive product documentation – perhaps too much if you want to get started rapidly. The learning curve for both products is steep.

For both products, a solid understanding of regular expressions (regex), scripting (shell/Python and the like) and TCP/IP is required.

For performing searches, you must learn SPL (Splunk Processing Language) for Splunk and Query DSL for Elasticsearch. SPL is like Unix pipes + SQL, and it has tons of commands you can use. With Query DSL, the queries are formatted as JSON. In both products, the search language can easily overwhelm a new user. Just because of the sheer number of features SPL provides, Splunk can be much more intimidating than ELK. (In fact, there are 143 search commands in SPL in Splunk Enterprise 7.0.1.)

Creating visualizations also requires some learning. Here again, Splunk provides more features and might look more intimidating than Kibana to a new user. Note that you can also use Grafana to connect to ELK to visualize data.

Perhaps the biggest hurdle you will face with Splunk is the server-side configuration and administration. The entire product is configured using a bunch of .conf files. One needs intimate knowledge of the specifications of these configuration files in order to implement and support Splunk. While ELK does require some reading for server-side setup, it’s not nearly as much as Splunk.
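To give you a flavor of those .conf files, here is a minimal, illustrative pair of stanzas (the file path and the source type name are my assumptions, not from any real deployment): inputs.conf tells a forwarder which file to monitor, and props.conf tells Splunk how to parse that source type.

```
# $SPLUNK_HOME/etc/system/local/inputs.conf
[monitor:///var/log/myapp/app.log]
sourcetype = myapp_log
index = main

# $SPLUNK_HOME/etc/system/local/props.conf
[myapp_log]
# Assume the timestamp starts at the beginning of each event line
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
```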

In the end

Splunk does have a steeper learning curve compared to ELK. But whether that is a showstopper for you is something you have to decide. You will have to invest in a few resources with solid Splunk experience if you want to implement and support the solution correctly.

So, there you have it: Splunk vs ELK at a super high level. I haven’t gone deep into the technical aspects, for brevity. But there is plenty of documentation for both Splunk and ELK online. Just get lots of coffee before you begin 🙂

Let me know what you think.

Happy Monitoring !

 

How to use AppDynamics to monitor Server health?

Yes, AppDynamics is awesome for application monitoring – Java heap, deep transaction tracing, tons of out-of-the-box framework monitoring (JDBC, web services, etc.), and the list goes on. But did you know AppDynamics can be used to effectively monitor servers too, whether virtual or physical? When I say server, I mean the host operating system, such as Red Hat Enterprise Linux, Windows 2012 or Solaris. Let me show you how you can do this.

Enter AppDynamics Machine Agent

While Java can be monitored using a Java agent, a server can be monitored using a special type of agent called the Machine Agent. You will have to have a license to run these agents (when you purchase application agents, AppDynamics typically throws in the same number of Machine Agents, so you should be good in terms of additional cost). If you are not sure about your present licensing situation, click on ‘Licensing’ in your Controller UI.

Unlike application agents, which run inside the JVM/CLR, the Machine Agent is a standalone Java program that runs on the host operating system. It collects hardware metrics and sends them to the Controller (once a minute). A user can view these metrics via the Controller UI. Pretty simple, huh?


What is SYN_SENT socket status?

When dealing with Network issues, the command ‘netstat’ can be very handy.

netstat -an

or

netstat -an | grep "remote ip"

shows all the sockets in the system. Each socket has a status. For example, a socket can be in the ‘ESTABLISHED’ status or in the ‘LISTENING’ status.

One important status you might come across is ‘SYN_SENT’. When you see a socket in this status, it most probably indicates a Firewall issue, i.e the remote host you are trying to reach is NOT reachable due to a firewall block.

Note that the SYN_SENT status will not remain for a long time; it only lasts for a couple of seconds. So you have to be quick in running the netstat command (perhaps in another terminal window).

When the client initiates a connection to the server, it first sends a SYN packet. At this point the socket status changes to ‘SYN_SENT’. If the remote server is reachable and listening, the client receives a ‘SYN + ACK’, to which the client replies with an ‘ACK’, thus forming a TCP connection.
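The same handshake can be observed from code. Here is a hedged Python sketch (my own illustration, not from the original post) that distinguishes a completed handshake, an RST reply, and the silent drop that leaves the socket stuck in SYN_SENT:

```python
import socket

def probe(host, port, timeout=3.0):
    """Attempt a TCP connect and report a rough diagnosis.

    'open'     -> SYN, SYN+ACK, ACK completed
    'refused'  -> host answered with RST (reachable, but port closed)
    'filtered' -> no reply at all; the socket sits in SYN_SENT until
                  the timeout, which usually points to a firewall
                  silently dropping the SYN
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "filtered"
    finally:
        s.close()
```

While probe() is blocked in connect(), running `netstat -an | grep SYN_SENT` in another terminal should show the half-open socket.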

Network Security Attacks

There are several types of Network Security Attacks as described below:

  1. SYN Flood

Here the attacker sends a SYN request from a spoofed source address. When the server responds with SYN-ACK, the source never replies, leaving the server hanging with a half-open connection (normally the client would reply with an ACK to complete the three-way handshake). Half-open connections consume resources, eventually degrading the performance of the server.

Cisco routers employ the ‘TCP Intercept’ and ‘CAR (Committed Access Rate)’ features to combat SYN floods. You can also change the default setting for the maximum number of half-open TCP connections.

  2. UDP Flood

Here the server is flooded with UDP requests, degrading the performance of the server.

  3. ICMP Flood

Here the server is flooded with ‘echo’ requests (an ICMP request type), degrading the performance of the server.

    It is best to drop ICMP packets at the router or Firewall.

  4. Smurf

Here the attacker sends ICMP echo request packets to the broadcast address of the target network using a spoofed source address. All the hosts on the network respond with echo replies to the spoofed address, eventually overwhelming the victim and the network.

  5. Fraggle

A flavor of the Smurf attack that uses UDP echo packets (UDP port 7) instead of ICMP packets. Cisco routers can be configured to disable the TCP and UDP small servers to defend against Fraggle.

  6. Bluejacking and bluesnarfing

Here Bluetooth-enabled devices are attacked. In bluejacking, unsolicited messages are sent. In bluesnarfing, personal information such as pictures and contacts, and cell phone information such as serial numbers, are stolen.

Read More

Protecting Wireless Networks using WEP,WPA and WPA2

Wired Equivalent Privacy:

The intention of WEP (Wired Equivalent Privacy) was to provide the same level of security as in Wired Networks. But it fell short greatly.

WEP uses 64-bit or 128-bit keys (each including a 24-bit initialization vector), which are very easy to crack. It uses the RC4 (Rivest Cipher 4) stream cipher.

Two modes:

Open Systems Authentication:

No credentials are needed from the client. After the initial association with the AP (Access Point), WEP encrypts the whole conversation.

Shared Key Authentication:

Requires the client to present credentials to connect to the AP before the encryption begins.

WEP can be enhanced by using SSH or tunneling.

WiFi Protected Access (WPA and WPA2):

WPA uses TKIP (Temporal Key Integrity Protocol), a sequence counter to prevent replay attacks, and a 64-bit message integrity check. It combines a secret root key with an initialization vector.

WPA2 uses AES with CCMP (Counter Mode with Cipher Block Chaining Message Authentication Code Protocol).

Both WPA and WPA2 support several EAP extensions such as EAP-TLS, EAP-TTLS (Tunneled Transport Layer Security) and Protected EAP (PEAPv0, v1).

VPN (Virtual Private Network) Security Protocols

Commonly used VPN security technologies are:

  1. Point-to-Point Tunneling Protocol (PPTP)
  2. Layer 2 Forwarding Protocol (L2F)
  3. Layer 2 Tunneling Protocol (L2TP)
  4. IPsec
  5. SSL

Point-to-Point Tunneling Protocol (PPTP):

  1. Uses PAP, CHAP, EAP
  2. Typically used for dial-up connections on the Windows platform
  3. Operates at the Data Link Layer

Layer 2 Forwarding Protocol (L2F):

  1. Developed by Cisco
  2. Similar to PPTP
  3. Operates at the Data Link Layer

Layer 2 Tunneling Protocol (L2TP)


Remote access security technologies

There are four major remote access security technologies:

  1. RAS (Remote Access Service)
  2. RADIUS
  3. DIAMETER
  4. TACACS

RAS (Remote Access Service):
Uses PPP (Point-to-Point Protocol) to secure dial-in, ISDN and serial links. Uses the following authentication mechanisms:

PAP (Password Authentication Protocol):

  1. Two-way handshake
  2. Sends passwords in clear text
  3. No protection against replay or brute-force attacks

CHAP (Challenge Handshake Authentication Protocol):

  1. Uses a three-way handshake
  2. Both server and client need a shared secret preconfigured
  3. The shared secret is stored in clear text; MS-CHAP allows the shared secret to be stored in encrypted form

EAP (Extensible Authentication Protocol):

  1. Used primarily in wireless networks
  2. Supports various authentication mechanisms like MD5-Challenge, S/Key, generic token cards and digital certificates

 

RADIUS (Remote Authentication Dial-In User Service):
  1. Open standard, UDP-based
  2. Provides authentication, authorization and accounting
  3. The user provides a username/password to a RADIUS client using PAP or CHAP; the RADIUS client encrypts the password and sends it to the RADIUS server for authentication

 

DIAMETER:
  1. Improved version of RADIUS
  2. Uses TCP; supports IPsec and TLS

 

TACACS (Terminal Access Controller Access Control System):
  1. Uses UDP; provides authentication, authorization and accounting
  2. XTACACS is an improved version but is no longer used
  3. TACACS+ is the current version; it uses TCP and supports several authentication mechanisms: PAP, CHAP, MS-CHAP, EAP, Kerberos and token cards

Firewall Classifications and Architectures

Classifications of Firewalls:

  1. Packet Filtering
  2. Circuit Level Gateway
  3. Application Level Gateway

Architectures of Firewall:

  1. Screening Router
  2. Dual Homed Gateway
  3. Screened-Host Gateway
  4. Screened Subnet

 

Packet Filtering

Description: Basic. Operates at the Network or Transport layer. Examines the TCP, IP, ICMP and UDP headers of each packet and routes it based on a firewall ACL.

Advantages:

  1. Inexpensive and fast
  2. Easy to set up
  3. Transparent to users

Disadvantages:

  1. No context-level routing
  2. Can be hit by spoofing
  3. Limited logging
  4. No strong user authentication

Circuit Level Gateway

Description: Operates at the Session layer. Uses state information about established connections. Once the virtual circuit is formed, no packet analysis is done.

Advantages:

  1. Fast
  2. Low maintenance

Disadvantages:

  1. Limited logging
  2. Once a connection is established, no further analysis is done

Application Level Gateway

Description: Operates at the Application layer. Implemented as a proxy server.

Advantages:

  1. Supports strong user authentication
  2. Data is not sent directly to the destination

Disadvantages:

  1. Lower performance, because each packet must be brought all the way up to the Application layer for analysis
  2. High maintenance
 

Screening Router

Description: A basic packet-filtering firewall.

Advantages:

  1. Cheap
  2. Transparent to users

Disadvantages:

  1. Internal network structure is exposed
  2. No user authentication
  3. Single point of failure

Dual Homed Gateway

Description: A bastion host with two network interface cards. It may sit behind an external screening router.

Advantages:

  1. Fails safe: if it fails, no access is allowed
  2. Internal network structure is masked

Disadvantages:

  1. Additional authentication required for users
  2. May slow down performance
  3. May not be available for all services

Screened Host Gateway

Description: An external screening router plus an internal bastion host.

Advantages:

  1. Transparent outbound access with restricted inbound access

Disadvantages:

  1. The screening router can bypass the bastion host
  2. Masking the internal network is difficult

Screened Subnet

Description: Most secure. Forms a DMZ network between the external and internal firewalls.

Advantages:

  1. Transparent and flexible
  2. Internal network is masked

Disadvantages:

  1. Difficult to maintain
  2. Expensive
 

Layer 5, 6 and 7 protocols (higher-level protocols)

Here are the protocols commonly used at the higher levels (5, 6 and 7) of the OSI model:

Layer 5 (Session):

  1. NetBIOS
  2. NFS
  3. RPC
  4. SSH
  5. SIP

Layer 6: (Presentation):

  1. ASCII
  2. EBCDIC
  3. MPEG
  4. JPG
  5. GIF

Layer 7 (Application):

  1. FTP, TFTP
  2. SNMP
  3. SMTP
  4. MIME, S/MIME
  5. HTTP, HTTPS, S-HTTP
  6. POP3, IMAP
  7. PEM
  8. TELNET
  9. S-RPC

IP address classes

IP (Internet Protocol) is a Network Layer (Layer 3) protocol that is considered a ‘routed’ protocol. It addresses network packets so that routing protocols like OSPF, BGP and RIP can correctly route them.

IP defines IP addresses. An IP address is a 32-bit number (4 octets). It comprises a network number and a host number. The higher-order bits define the network number as shown below.

There are 5 classes of IP addresses:

 

 

Class A: leading bit 0; 8 network bits, 24 host bits; 128 (2^7) networks; 16,777,216 (2^24) addresses per network; address range 0.0.0.0 – 127.255.255.255

Class B: leading bits 10; 16 network bits, 16 host bits; 16,384 (2^14) networks; 65,536 (2^16) addresses per network; address range 128.0.0.0 – 191.255.255.255

Class C: leading bits 110; 24 network bits, 8 host bits; 2,097,152 (2^21) networks; 256 (2^8) addresses per network; address range 192.0.0.0 – 223.255.255.255

Class D is defined as multicast. Address range: 224 – 239

Class E is experimental. Address range: 240 – 255
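Since the class is determined entirely by the first octet, the ranges above can be condensed into a few lines of Python (a didactic helper only – classful addressing has long since been replaced by CIDR in practice):

```python
def ip_class(addr):
    """Return the classful-addressing class (A-E) of a dotted-quad IPv4 address."""
    first = int(addr.split(".")[0])
    if first < 128:   # leading bit 0
        return "A"
    if first < 192:   # leading bits 10
        return "B"
    if first < 224:   # leading bits 110
        return "C"
    if first < 240:   # leading bits 1110 (multicast)
        return "D"
    return "E"        # leading bits 1111 (experimental)

print(ip_class("10.1.2.3"), ip_class("172.16.0.1"), ip_class("224.0.0.1"))
# -> A B D
```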

 

127.0.0.0 to 127.255.255.255 is defined as the loopback address range.

 

Also, a range of IP addresses in each class is reserved for private use (i.e. not routable on the internet):

Class A: 10.0.0.0 – 10.255.255.255

Class B: 172.16.0.0 – 172.31.255.255

Class C: 192.168.0.0 – 192.168.255.255
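If you need to check these reserved ranges programmatically, Python’s standard ipaddress module already knows them, so no manual table is required:

```python
import ipaddress

# is_private covers the RFC 1918 ranges listed above (plus a few other
# reserved blocks); is_loopback covers 127.0.0.0/8.
for ip in ["10.8.0.5", "172.31.200.1", "192.168.1.10", "8.8.8.8"]:
    a = ipaddress.ip_address(ip)
    print(f"{ip:>15}  private={a.is_private}  loopback={a.is_loopback}")
```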

 

IPv6 uses 128-bit addresses and was introduced primarily to address the depletion of IPv4 addresses.