
Thursday, October 20, 2016

How to set up a Splunk Search Head Cluster?

If you already know what Splunk is and are interested in setting up your own Search Head Cluster, read on.

For this, the environment will be:

  • 1 Deployer – sends apps/configurations to the search heads
  • 3 Search Heads – for the SHC
  • 1 Indexer – the “search peer” that the SHC will dispatch jobs to
  • 1 Forwarder – for testing data input from the TA/App into the indexer


Sizing-wise, you could make them all VMs. Something reasonably small, like the specs below, works for each system; the Deployer and Forwarder can be much smaller.

  • 4 cores
  • 8GB RAM
  • 60GB disk


Once you have all your machines ready, follow the steps below. These steps assume a Linux-based setup, but you can do this on any other Splunk-supported OS; just change the paths accordingly.

0) If you haven't already, change the default admin password 'changeme' to something else. None of the SHC setup commands will work properly while the admin password is still the default.
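For example, assuming the password is still the default, a command along these lines does the job (the new password here is just a placeholder):

# 'YourNewPassword' is a placeholder - pick your own
/opt/splunk/bin/splunk edit user admin -password YourNewPassword -auth admin:changeme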

1) On the Deployer:
In /opt/splunk/etc/system/local/server.conf, add the following line under the [general] stanza:
pass4SymmKey = yourKey

Replace yourKey with your plaintext key. Don't worry, Splunk will encrypt it after a restart.
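So the relevant part of server.conf on the Deployer ends up looking something like this (yourKey being whatever plaintext key you chose):

[general]
pass4SymmKey = yourKey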

2) Initialize all search head cluster members:
On each SH, run these commands:
/opt/splunk/bin/splunk init shcluster-config -auth admin:splunk -mgmt_uri <mgmt uri of this setup> -replication_port <any unused port, e.g. 20000> -conf_deploy_fetch_url <mgmt uri of deployer> -secret yourKey
/opt/splunk/bin/splunk restart

At this point, each SH where you ran the above commands knows who its deployer is and has the key to authenticate with it.
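As a concrete sketch, with made-up hostnames sh1.example.com (this search head) and deployer.example.com, and 8089 as the default management port, the commands would look like:

/opt/splunk/bin/splunk init shcluster-config -auth admin:yourpassword -mgmt_uri https://sh1.example.com:8089 -replication_port 20000 -conf_deploy_fetch_url https://deployer.example.com:8089 -secret yourKey
/opt/splunk/bin/splunk restart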

3) Bring up the cluster captain:
This step is required only for a search head cluster; you can skip it if you are not setting up SHC.
/opt/splunk/bin/splunk bootstrap shcluster-captain -servers_list "<comma-separated list of mgmt uri of all search heads, including designated captain>" -auth <this setup's username:password>
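For example, with three search heads sh1, sh2 and sh3 (hostnames made up here), you would run this on the member you want as the initial captain:

/opt/splunk/bin/splunk bootstrap shcluster-captain -servers_list "https://sh1.example.com:8089,https://sh2.example.com:8089,https://sh3.example.com:8089" -auth admin:yourpassword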

4) Check search head cluster status:
To check the overall status of your search head cluster, run this command from any of the members:
/opt/splunk/bin/splunk show shcluster-status -auth <this setup's username:password>

5) Deploy the bundle (app):
Place the apps you want to distribute under /opt/splunk/etc/shcluster/apps on the Deployer, then run the following from the Deployer. The -target is the mgmt URI of any one SHC member; the bundle gets pushed to all members.
/opt/splunk/bin/splunk apply shcluster-bundle -target <mgmt uri of any SHC member> -auth <username:password>
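For instance, assuming sh1.example.com (a made-up hostname) is one of the members, the command on the Deployer would look like:

/opt/splunk/bin/splunk apply shcluster-bundle -target https://sh1.example.com:8089 -auth admin:yourpassword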

Your Search Head Cluster setup should be ready and operational now.

What is Splunk?

It's like Google for logs!

When you need to debug some application or system, what do you do? You go through log files. They tell you (almost) everything about what it was trying to do and what actually happened. But what do you do when you need to debug a distributed, cloud-based or microservices-based system? Do you go to each and every machine/app and try to correlate its information with the logs of another machine/app? Do you always have a design where all log lines from all those machines or apps are written to a single log? Usually not.

That's where Splunk is really useful. It's a log processing and analysis product that stores all your logs in an indexed form and provides very fast search.

Events and Indexes

Each entry that gets stored in Splunk is called an 'event', and the logical place where a particular event is stored is called an 'index'. So, when searching, you basically query some index(es) to find some events.
Each indexed event has 4 fields associated with it: time, sourcetype, source and host. The time field indicates when the event happened, sourcetype identifies the structure/format of the event's data, source identifies where the event came from (such as a log file path), and host is the machine where the event was generated. You can search your data using any of these fields.

Apart from these, Splunk extracts many more fields from your data, which can also be used when searching.
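A tiny search sketch using those default fields (the index, sourcetype and host values here are just made-up examples):

index=main sourcetype=access_combined host=webserver01 error earliest=-24h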

The Splunk instance that indexes the data is called an indexer.

They call it SPL

SPL stands for Search Processing Language. It's the language you use to write queries that pull data out of Splunk index(es).

Your search can be a simple term (e.g. a username) to see how frequently it appears in the logs, or something more complex (e.g. events from a particular source, containing this or that, that happened between 1 am and 4:40 am).
The possibilities are endless. At my company we use Splunk to analyze application logs and find out which exceptions occur most often and on which days they peak.
There's a lot more to say about this, and the best way to learn SPL is to go through Splunk's own documentation on it.
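As a rough illustration of that kind of analysis (the index name and search term are hypothetical), a search like this would chart how often a particular exception occurred per day over the last month:

index=app_logs "NullPointerException" earliest=-30d | timechart span=1d count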

Apps, Add-ons and Data Sources

OK, we now have at least some idea of what it does. But how does it get my log data?!
Well, there's no big magic in that. Splunk supports apps, add-ons and other data inputs that bring the data in. You can import log files directly, be it syslog, CSV or JSON. You can also develop Splunk apps or add-ons that make API calls to the outside world and produce data Splunk understands. Splunk can even consume data written to stdout/stderr!
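For example, on a forwarder you could monitor a log file with an inputs.conf stanza along these lines (the file path, index and sourcetype are placeholders):

# /opt/splunkforwarder/etc/system/local/inputs.conf
[monitor:///var/log/myapp/app.log]
index = main
sourcetype = myapp_logs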

There's no database

That brings us to the next point: where does that data eventually go? There is no database to manage. Splunk stores data directly in the file system, which is part of why a Splunk setup is quite fast.

Scalability is easy

If a single Splunk server is not enough, you can simply add another one. The data can be distributed among multiple Splunk instances: you can have one forwarder sending data to multiple indexers, multiple forwarders feeding a single indexer, or a combination of both. You can also distribute your search operations across multiple 'search heads'. The two deployment scenarios known as Indexer Cluster and Search Head Cluster are worth exploring when scalability is important.
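As a sketch of the forwarder-to-multiple-indexers case, an outputs.conf on the forwarder might look like this (the hostnames are placeholders; 9997 is the usual receiving port):

# outputs.conf on the forwarder
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997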