Thursday, October 20, 2016

What is Splunk?

It's like Google for logs!

When you need to debug an application or system, what do you do? You go through the log files. They tell you (almost) everything about what it was trying to do and what happened. But what do you do when you need to debug a distributed, cloud-based, or microservices-based system? Do you go to each machine or app and try to correlate its logs with those of another machine or app? Do you always have a design where all log lines from all those machines or apps are written to a single log? Usually not.

That's where Splunk is really useful. It's a log processing and analysis product that stores all your logs in an indexed manner and provides very fast search.

Events and Indexes

Each entry that gets stored in Splunk is called an 'event', and the logical place where a particular event is stored is called an 'index'. So, when searching, you basically query one or more indexes to find some events.
Each indexed event has four fields associated with it: time, sourcetype, source and host. The time field indicates when the event happened. The sourcetype identifies the data structure of the event, whereas the source identifies where the event came from. The host is the machine on which the event was generated. You can search your data using these fields.

Apart from that, Splunk extracts most of the fields from your data, which can also be used while searching the data.

The Splunk setup that indexes the data is called an indexer.
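To make the event and index vocabulary concrete, here is a toy in-memory model in Python. It is only an illustration of the terms: the Event class, the indexes dictionary and the search() helper are hypothetical names, and none of this reflects Splunk's actual storage format or API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """Toy stand-in for an indexed event with its four default fields."""
    time: datetime      # when the event happened
    sourcetype: str     # data structure of the event
    source: str         # where the event came from, e.g. a file path
    host: str           # machine that generated the event
    raw: str            # the original log line


# Hypothetical index: a name mapped to the events stored in it.
indexes = {
    "main": [
        Event(datetime(2016, 10, 20, 1, 5), "syslog", "/var/log/messages", "web01", "disk full"),
        Event(datetime(2016, 10, 20, 2, 30), "json", "/var/log/app.json", "web01", '{"level": "error"}'),
        Event(datetime(2016, 10, 20, 3, 45), "syslog", "/var/log/messages", "db01", "connection reset"),
    ]
}


def search(index_name, **fields):
    """Return events in the named index whose fields equal the given values."""
    return [
        event
        for event in indexes.get(index_name, [])
        if all(getattr(event, name) == value for name, value in fields.items())
    ]


if __name__ == "__main__":
    for event in search("main", host="web01", sourcetype="syslog"):
        print(event.time, event.raw)
```

Real Splunk searches are written in SPL rather than Python, but the idea of filtering the events of an index by these fields is the same.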

They call it SPL

SPL stands for Search Processing Language. It's the query language you use to grab data out of one or more Splunk indexes.

Your search can be a simple term (e.g. a username) to see how frequently it appears in the logs, or it can be a complex one (e.g. a particular source, a particular event, containing this or that, which happened between 1am and 4:40am).
There are endless possibilities for what you can search with Splunk. We use Splunk in my company to analyze application logs and find out which exceptions occur most often and on which days they peak.
There's a lot to talk about this, and the best way to know about SPL is to go through Splunk's own documentation on SPL.

Apps, Add-ons and Data Sources

OK. We have got at least some idea about what it does. But tell me, how does it get my log data?!
Well, there's no big magic in that. There are apps, add-ons and other data import sources supported in Splunk which you can use to bring in the data. You can import log files, be it syslog, CSV or JSON. You can also develop Splunk apps or add-ons that make API calls to the outside world and produce data that Splunk understands. Splunk is even capable of consuming data written to stdout/stderr!

There's no database

That brings us to the next point: where does that data go eventually? Let me tell you that there is no database to manage. Splunk stores data directly in the file system, and that is part of why a Splunk setup is quite fast.

Scalability is easy

If a single Splunk server is not enough, you can simply add another one. The data can be distributed among multiple Splunk setups. You can have the same forwarder (the component that sends data to indexers) forwarding your data to multiple indexers, multiple forwarders feeding data to a single indexer, or a combination of both. Not only that, you can also distribute your search operations using multiple 'search heads'. Two interesting deployment scenarios, known as Indexer Cluster and Search Head Cluster, are worth exploring when scalability is important.

Wednesday, October 5, 2016

Expand and shrink IPv4 range

A few months ago, I wrote this blog post about chunking an IPv4 range into multiple sub-ranges. Soon thereafter, a requirement arose to expand or shrink a given IPv4 range, and I came up with this code.

There are two functions: one expands a given IPv4 range and the other shrinks it. Let me explain them one by one.

The expand_range() function

The input IPv4 range could be a comma-separated list of IPv4 addresses, a proper range (e.g. 10.10.10.10-10.10.10.155), or a mix of both. The objective of this function is to provide a list of ALL IPv4 addresses that are part of the given range.

This function starts by exploding the input on commas. For each element in the resulting array, it checks whether it's a single IPv4 address or a range containing a dash (-) character.

If it contains a dash, the function explodes it further to get the start and end IPv4 addresses, and converts them to long using ip2long(). Then it simply runs a for loop to generate all the long values between them, inclusive, and adds them to the output array.

If it's a single IPv4 address, that is added as it is to the output array.

Finally, it sorts the output array, removes duplicates using array_unique(), and converts each element back to an IPv4 address using long2ip().

Based on an optional second argument, it returns either an array or a string representation of the expanded IPv4 range.
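The steps above can be sketched as follows. This is a Python re-sketch under assumptions, not the original code: the ip2long() and long2ip() helpers here are Python stand-ins built on the standard ipaddress module, and the name and default of the second argument (as_string) are assumptions, since the post only says that an optional second argument selects the return type.

```python
import ipaddress


def ip2long(ip):
    """Convert a dotted-quad IPv4 string to an integer (stand-in for ip2long())."""
    return int(ipaddress.IPv4Address(ip.strip()))


def long2ip(value):
    """Convert an integer back to a dotted-quad IPv4 string (stand-in for long2ip())."""
    return str(ipaddress.IPv4Address(value))


def expand_range(ip_range, as_string=False):
    """Expand a comma-separated mix of single addresses and start-end ranges."""
    longs = []
    for part in ip_range.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # A proper range: generate every address from start to end, inclusive.
            start, end = part.split("-", 1)
            longs.extend(range(ip2long(start), ip2long(end) + 1))
        else:
            # A single address.
            longs.append(ip2long(part))
    # Remove duplicates, sort numerically, then convert back to dotted-quad form.
    ips = [long2ip(value) for value in sorted(set(longs))]
    return ",".join(ips) if as_string else ips


if __name__ == "__main__":
    print(expand_range("10.10.10.10-10.10.10.12,10.10.10.11,10.0.0.1"))
```

Working on integers until the very end keeps the sort numeric; sorting the dotted-quad strings directly would order them lexicographically instead.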


The shrink_range() function

Here, the input IPv4 range could be either a string representation or an array, similar to the one output by the expand_range() function. The objective of this function is to shorten the given IPv4 range. So, if the input is 10.10.10.10,10.10.10.11,10.10.10.12,10.10.10.13, it should shorten it to 10.10.10.10-10.10.10.13.

The function starts by creating an array out of the given IPv4 addresses. If the first argument is itself an array, it just copies it. It then takes the count() of the array elements so that it can loop over them.

In the loop, it looks at the IPv4 address at the current index as well as the one at the next index. It converts both of them to long and calculates the difference by subtracting the current index's long value from the next index's long value.

If that difference is 1, the next IPv4 address is in sequence with the current one, which also means that we are entering a range that can be shortened. The function then sets a flag to remember that, and starts building the string for this short range.

Otherwise, it's either a standalone IPv4 address or we have reached the end of a short range. So the function checks whether the flag is still ON. If it is, it ends the short range and copies the short range string into the output array. If we aren't building a short range, it simply adds the current IPv4 address to the output array.

Finally, based on an optional second argument, it returns either an array or a string representation of the shrunken IPv4 range.
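A minimal Python sketch of the loop described above is shown here. It is not the original code: the as_string parameter name and its default are assumptions, and the sketch assumes the input addresses are already sorted and free of duplicates, as they would be when they come from expand_range().

```python
import ipaddress


def shrink_range(ips, as_string=False):
    """Collapse runs of consecutive IPv4 addresses into start-end ranges."""
    if isinstance(ips, str):
        # String input: split it into individual addresses.
        ips = [part.strip() for part in ips.split(",") if part.strip()]
    else:
        # Array input: just copy it.
        ips = list(ips)

    longs = [int(ipaddress.IPv4Address(ip)) for ip in ips]
    output = []
    in_range = False   # flag: are we currently building a short range?
    range_start = ""   # first address of the short range being built

    for i, ip in enumerate(ips):
        has_next = i + 1 < len(ips)
        if has_next and longs[i + 1] - longs[i] == 1:
            # The next address is in sequence: start a short range (or keep extending it).
            if not in_range:
                in_range = True
                range_start = ip
        elif in_range:
            # The sequence ends here: close the short range.
            output.append(f"{range_start}-{ip}")
            in_range = False
        else:
            # A standalone address.
            output.append(ip)

    return ",".join(output) if as_string else output


if __name__ == "__main__":
    print(shrink_range("10.10.10.10,10.10.10.11,10.10.10.12,10.10.10.13", as_string=True))
```

Because the last address has no next index to compare against, it always falls into one of the two closing branches, so an open range is never left dangling at the end of the input.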

Monday, May 23, 2016

Where did I clone my local repo from?

If you want to determine the URL from where you cloned your local repository, use this command.

git config --get remote.origin.url