
What’s in a word? (\w regexp shorthand class)

Well, not just letters of the alphabet, it seems.

Take the case of the Logstash grok pattern WORD:

WORD \b\w+\b

But the shorthand character class \w matches [a-zA-Z0-9_]. Notice the digits and the underscore! So WORD does not really match only words.

REALWORD \b[a-zA-Z]+\b

would be better, although I suppose things might be different with Unicode. Log files may well be Unicode-encoded, but frequently the data itself is still effectively ASCII.
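You can see the difference by running both patterns over a log fragment. The sketch below uses plain Ruby regular expressions rather than Logstash itself. The sample string is made up for illustration.

```ruby
# Logstash's WORD pattern versus the stricter REALWORD alternative.
WORD     = /\b\w+\b/
REALWORD = /\b[a-zA-Z]+\b/

# Hypothetical log fragment containing digits and an underscore.
sample = "user_42 logged in from host web01"

p sample.scan(WORD)      # => ["user_42", "logged", "in", "from", "host", "web01"]
p sample.scan(REALWORD)  # => ["logged", "in", "from", "host"]
```

REALWORD does not just trim "user_42" and "web01" down to their letters; it skips them entirely. The letters in those tokens are followed by a \w character (an underscore or a digit), so no word boundary \b exists where the match would need to end.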


ELK and PeopleSoft

I have spent some time looking into Elasticsearch, Logstash and Kibana (ELK) for analysing PeopleSoft web server, application server and process scheduler log files.

Whilst commercial products exist that can be configured to do this, they all seem somewhat overpriced for a relatively common and essentially simple problem: log file shipping, consolidation/aggregation and analysis. This is where ELK steps in, bringing a mix of Java, Ruby and JavaScript to the party.

IMHO, ELK runs best on flavours of Unix: Linux, FreeBSD or even Solaris. For servers running Windows, the most effective approach I have found is to run NXLog as a Windows service. It does some simple pre-processing and ships the logs to a number of Logstash processes on Linux. This reduces the CPU load on the Windows servers so they can get on with their primary functions. Check out NXLog Community Edition for more details.

Determining the parsing rules for the various log file formats is probably the most difficult part. Provided you are reasonably familiar with both the data and regular expression matching, you should have no problem understanding and transforming your data into a format that is easy to visualise in Kibana.

However, once you hit significant data volumes, you really need to look carefully at the system settings for each component. Elasticsearch scales very well, but performs best when given plenty of memory.

Here’s a simple example from an nxlog.conf file on Windows:

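<Input appsrv_logs>
    # "appsrv_logs" is an illustrative instance name.
    # Assumes the xm_charconv, xm_fileop and xm_json extensions, plus an
    # xm_multiline instance named "multiline", are declared elsewhere in nxlog.conf.
    Module im_file
    #SavePos FALSE
    #ReadFromLast FALSE
    File 'I:\LOGS\PROD\APPSRV_*.LOG'
    InputType multiline
    # Normalise the encoding and record which file the event came from.
    Exec convert_fields("AUTO","utf-8");
    Exec $filename = file_basename(file_name());
    Exec $filedir = file_dirname(file_name());
    # Drop noisy events that are of no interest for analysis.
    Exec if $raw_event =~ /(GetCertificate|_dflt|Token authentication succeeded|PSJNI:|GetNextNumberWithGaps|RunAe|Switching to new log file|PublishSubscribe|Token=)/ { drop();};
    # Derive stack and server name from the directory path.
    # NOTE: this regex has only two capture groups, so $3 and $5 are unset here;
    # adapt the pattern to your own directory layout.
    Exec if $filedir =~ /\\(appserv|appserv\\prcs)\\([A-Z0-9\-]+)\\LOGS/ { $stack = $1; $server_name = $2; $server_ip = $3; $domain = $5;};
    Exec $server_ip =~ s/_/./g;
    Exec $host = $server_ip;
    # Extract operator ID and client IP (OPRID@n.n.n.n).
    Exec if $raw_event =~ /([A-Za-z0-9\-_]+)@(\d+\.\d+\.\d+\.\d+)/ { $oprid = $1; $client_ip = $2;};
    # Parse the line header: process name, PID, task number and timestamp.
    Exec if $raw_event =~ /^([A-Za-z_0-9]+)\.(\d+) \((\d+)\) \[(\d{2}.\d{2}.\d{2} \d{2}:\d{2}:\d{2})/ { $server_process = $1; $pid = $2; $task_no = $3; $datestamp = $4; };
    # Remove fields that are not needed downstream.
    Exec delete($EventReceivedTime);
    Exec delete($filedir);
    Exec delete($filename);
    Exec delete($SourceModuleType);
    # Keep only the text after the bracketed header and the "(n)" counter.
    Exec $message = $raw_event;
    Exec $message =~ s/^.*?\]//;
    Exec $message =~ s/^\(\d+\)\s+//;
    Exec to_json();
</Input>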

This is just an example. It shows some reasonable NXLog directives for pre-processing the PeopleSoft Application Server logs into a consistent and usable format. Some of the regular expressions are specific to my use case, but they illustrate some simple techniques you may find useful.
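It can help to replay regular expressions like these outside NXLog before deploying them. Below is a minimal Ruby sketch, not part of the original configuration, that applies the same drop filter, field extraction and message clean-up to a single line and emits JSON. The method name parse_appsrv_line and the sample log line are hypothetical. Ruby's regex engine is not PCRE, so treat this as an approximation of what NXLog will do.

```ruby
require "json"

# Events containing any of these strings are discarded (mirrors the drop() rule).
DROP_RE = /(GetCertificate|_dflt|Token authentication succeeded|PSJNI:|GetNextNumberWithGaps|RunAe|Switching to new log file|PublishSubscribe|Token=)/
# OPRID@client-IP, e.g. "VP1@10.1.2.3".
USER_RE = /([A-Za-z0-9\-_]+)@(\d+\.\d+\.\d+\.\d+)/
# Header "process.pid (task) [timestamp"; \A anchors at string start like PCRE's ^.
HEADER_RE = /\A([A-Za-z_0-9]+)\.(\d+) \((\d+)\) \[(\d{2}.\d{2}.\d{2} \d{2}:\d{2}:\d{2})/

# Hypothetical helper: returns a Hash of fields, or nil if the event is dropped.
def parse_appsrv_line(raw_event)
  return nil if DROP_RE.match?(raw_event)

  event = {}
  if (m = USER_RE.match(raw_event))
    event["oprid"] = m[1]
    event["client_ip"] = m[2]
  end
  if (m = HEADER_RE.match(raw_event))
    event["server_process"] = m[1]
    event["pid"] = m[2]
    event["task_no"] = m[3]
    event["datestamp"] = m[4]
  end
  # Strip everything up to the first "]", then a leading "(n)" counter.
  event["message"] = raw_event.sub(/\A.*?\]/, "").sub(/\A\(\d+\)\s+/, "")
  event
end

# Made-up sample line in the shape the regexes expect.
appsrv_sample = "PSAPPSRV.4242 (17) [07/14/15 09:30:01 VP1@10.1.2.3 ICPanel](3) Returned from service"
event = parse_appsrv_line(appsrv_sample)
puts JSON.generate(event)
```

As in the NXLog version, a field is only set when its pattern matches, so a line without an OPRID@IP token simply produces JSON without oprid and client_ip.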