≡ Menu

dougmcclure.net

thoughts on business, service and technology operations and management in the digital transformation era

Bookmarks for March 25th through June 18th

in General

These are my links for March 25th through June 18th:

  • OpenStack LumberJack – Part 1 rsyslog | Professional OpenStack – Logging for OpenStack has come quite a ways. What I’m going to attempt to do over a few posts, is recreate and expand a bit on what was discussed at this last OpenStack Summit with regard to Log Management and Mining in OpenStack. For now, that means installing rsyslogd and setting it up to accept remote connections.
  • rsyslog.conf file
  • FailoverSyslogServer – rsyslog wiki
  • How to configure failover for rsyslog in Red Hat Enterprise Linux 6? – Red Hat Customer Portal
  • Introducing the Solr Scale Toolkit | SearchHub | Lucene/Solr Open Source Search
  • Highly Available ELK (Elasticsearch, Logstash and Kibana) Setup | Everything Should Be Virtual
  • Logstash configuration dissection
  • Splunk Introduces Splunk Enterprise 6.1 – Enabling the Mission-critical Enterprise Multi-site Clustering: Delivers continuous availability for Splunk Enterprise deployments that span multiple sites, countries or continents by replicating raw and indexed data in a clustered configuration. Search Affinity: Provides a performance increase when using multi-site clustering by routing search and analytics requests to the nearest cluster, increasing performance and decreasing network usage. zLinux Forwarder: Allows for application and platform data from IBM mainframes to be easily collected and indexed by Splunk Enterprise. Data Preview with Structured Inputs: Enables previewing of massive data files to verify alignment of fields and headers before indexing to improve data quality and the time it takes to discover critical insights.
  • Streamlining application logs collection on AWS Elastic Beanstalk with logstash – part 1 | Mob in Tech – However, we like to experiment things, so I decided to try the home made solution for the backend of our new upcoming mobile game. Our backend is a homebrewed Java REST webservices application hosted in an Elastic Beanstalk container, in the us-east-1 region. The final goal is to gather logs from all instances of the Java application into a local (Paris) Elastic Search database, in a scalable manner. In this case, scalable means for us: every single step of the data pipeline has to be horizontally scalable, meaning we can speed up the process by adding additional capacity at each step independently.
  • How to Pre-Process Logs with Logstash: Part III of “Scalable and Robust Logging for Web Applications” ← #workHard / partyHard – This article is an introduction on how to pre-process logs from multiple sources in logstash before storing them in a data store or analyze them in real time. Some common use cases are unifying time formats across different log sources, anonymizing data, extracting only interesting information from the logs as well as tagging and selective distribution.
  • Building an Activity Feed System with Storm – Programming – O’Reilly Media – Problem You want to build an activity stream processing system to filter and aggregate the raw event data generated by the users of your application. Solution Streams are a dominant metaphor for presenting information to users of the modern Internet. Used on sites like Facebook and Twitter and mobile apps like Instagram and Tinder, streams are an elegant tool for giving users a window into the deluge of information generated by the applications they use every day.
  • Wirbelsturm: 1-Click Deployments of Storm and Kafka clusters with Vagrant and Puppet – Michael G. Noll – I am happy to announce the first public release of Wirbelsturm, a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data related infrastructure. Wirbelsturm’s goal is to make tasks such as “I want to deploy a multi-node Storm cluster” simple, easy, and fun. In this post I will introduce you to Wirbelsturm, talk a bit about its history, and show you how to launch a multi-node Storm (or Kafka or …) cluster faster than you can brew an espresso.
  • RapidEngines Application Analytics – We provide the worlds fastest, most flexible and most scalable time series data platform. Delivered as software or a cloud service to help you visualize and detect application performance events before they impact your business.
  • SevOne Acquires Log Analytics Provider RapidEngines | Business Wire – SevOne, the leader of scalable performance monitoring solutions to the world’s most connected companies, today announced it has acquired RapidEngines, a leading provider of highly scalable log analytics software for IT enterprises, service providers and application developers. The acquisition is the first from SevOne since closing the $150M investment from Bain Capital which remains one of the largest venture financings of 2013. SevOne’s large customer base will now have access to RapidEngines’ log analytics software granting users the benefit of automatically collecting and organizing log data to better provide a detailed picture of user and machine behavior.
  • Google Cloud Platform Blog: A New Logs Viewer for Google Cloud Platform – Today we are excited to announce a significantly updated Logs Viewer for App Engine users. Logs from all your instances can be viewed together in near real time, with greatly improved filtering, searching and browsing capabilities. This release includes UI and functional improvements. We’ve added features that simplify navigation and make it easier to find the logs data you’re looking for.
  • About | LOGSEARCH – What started out as an internal development project from within City Index was soon after released as an open source project for all to benefit. City Index realised the potential value of the information available to them in the log files and required a flexible solution to not only view the log files but rather to view and cross analyse them.
  • Approaches to Indexing Multiple Logs File Types in Solr and Setting up a Multi Node, Multi Core Solr Cloud – Apache Solr is a widely used open source search platform that internally uses Apache Lucene based indexing. Solr is very popular and provides a database to store indexed data and is a very high scalable, capable search solution for the enterprise platform. This article provides a basic vision for a single and multi-core approach to indexing and querying multiple log file types in Solr. Solr indexes the log files generated by the servers and allows searching the logs for troubleshooting. It has the capability to scale to work in a multi-node cluster set up in a distributed and fault tolerant manner. These capabilities are collectively called SolrCloud. Solr uses Zookeeper for working in a distributed manner
  • Introducing Morphlines: The Easy Way to Build and Integrate ETL Apps for Hadoop | Cloudera Developer Blog – Morphlines can be seen as an evolution of Unix pipelines where the data model is generalized to work with streams of generic records, including arbitrary binary payloads. A morphline is an efficient way to consume records (e.g. Flume events, HDFS files, RDBMS tables, or Apache Avro objects), turn them into a stream of records, and pipe the stream of records through a set of easily configurable transformations on the way to a target application such as Solr, for example as outlined in the following figure: In this figure, a Flume Source receives syslog events and sends them to a Flume Morphline Sink, which converts each Flume event to a record and pipes it into a readLine command. The readLine command extracts the log line and pipes it into a grok command. The grok command uses regular expression pattern matching to extract some substrings of the line. It pipes the resulting structured record into the loadSolr command. Finally, the loadSolr command loads the record into Solr, typically a SolrCloud. In the process, raw data or semi-structured data is transformed into structured data according to application modelling requirements.
  • Pivotal CF 1.1 Advances Enterprise PaaS with New Capabilities | Pivotal P.O.V. – What’s new in Pivotal CF 1.1:

    Improved app event log aggregation – developers can now go to a unified log stream for full application event visibility (Watch) and drain logs to a 3rd party tool like Splunk for analysis (Watch)

  • elasticsearch-curator 1.0.0 : Python Package Index – Tending your time-series indices in Elasticsearch
0 comments

The initial offering of Netcool Operations Insight (NOI) v1.1 provides a good starting point with some neat event search use cases for front line IT Ops support teams and some basic event analysis support for the Netcool Administrator. The NOI-SCALA integration provides a primary entry point for searching across events via the event list tool menu. In my opinion, the ability to search and interact with events is far more valuable using the search and apps concepts within SCALA. After I talk about some plumbing foundations, I’ll share some of my Event Analysis apps that can get you jump started with event analysis and reduction use cases in no time.

Another area for improvement is with the firehose approach of sending all events by default from the active Netcool integration point towards a single SCALA datasource. What this leads to is a single, all in one datasource in SCALA that can easily grow to be very large in size and lead to slow search performance as well as challenges with keeping the indexed event data pruned when using the SCALA delete utility available today. It’s my best practice to intelligently analyze and route events to unique datasources by key fields such as service or application name, functional technology type or role, etc. so more efficient search and apps can be created.

Another common scenario is when you have established a Netcool historical event archive and you’d like to incorporate some specific historical events within your SCALA environment for analysis. The current NOI solution doesn’t incorporate an easy or flexible way to fold in these historical events within SCALA so more realistic event search and analysis use cases can be developed. I’ll share some thoughts on intelligent event processing and routing for getting the events you need into SCALA for search and analysis in the most efficient way possible.

Fortunately, SCALA comes with the awesome logstash toolkit which can be used to simplify and extend your NOI offering through its wide array of inputs, filters and codecs and our SCALA output plugin! I’ll start this series by updating some past blog posts based on using the latest logstash v1.4 and the SCALA v1.2 release. I’m not sure when the SCALA content team will get around to updating the toolkit to install out of the box support for v1.4 so here’s what you have to do to use this much improved logstash version with SCALA v1.2.

Prepare to use logstash v1.4 and SCALA logstash toolkit

  • Download logstash 1.4.x here (zip, tar.gz, deb, rpm)
  • Unzip into a new directory where you will run it ($LOGSTASH140-DIR)
  • Download SCALA logstash toolkit from here
  • Explode it on your laptop or system
  • From the SCALA logstash toolkit, copy $MYDIR/LogstashIntegrationToolkit_v1.1.0.1/lstoolkit/LogstashLogAnalysis_v1.1.0.1/logstash/outputs/scala_custom_eif.rb and the unity/ directory to your logstash directory $LOGSTASH140-DIR/lib/logstash/outputs/
  • From the SCALA logstash toolkit, copy LogstashIntegrationToolkit_v1.1.0.1\lstoolkit\LogstashLogAnalysis_v1.1.0.1\logstash\outputs\eif.conf to your logstash directory $LOGSTASH140-DIR/lib/logstash/outputs/ and name it something like eif-scala-IPADDR.conf so you know which SCALA server it is configured for (one config file for each unique scala_custom_eif output)

Configure the SCALA logstsah toolit ouput plugin

You can have multiple SCALA logstash outputs enabling you to stream events to multiple SCALA systems. In order to do this, you’ll need multiple EIF configuration files with each one mapped to a specific SCALA system. Name each one uniquely with the SCALA target IP address so you know which one to use.

  • Configure the eif-scala-IPADDR.conf file
  • Set the BufEvtPath to something unique if you have multiple SCALA outputs in your config
  • Set the LogFileName to something unique if you have multiple SCALA outputs in your config
  • Set the ServerLocation to your SCALA destination
  • Set the ServerPort to your SCALA EIF Receiver port (5529 by default)

Build a configuration for logstash and fire it up

I’ve provided a simple logstash config here (MyLogstash.conf) to start with for this blog series. Feel free to use/extend one you may already have by adding the new scala_custom_eif output and the desired ‘scalaFields’ which are at the heart of how this scala_custom_eif output works in conjunction with the SCALA DSV toolkit.

I’ll continue to blog about using the socket gateway with logstash to stream in events to SCALA. From my experience, it’s tremendously easier than setting up the XML GW used with NOI. I’m sure you could also just configure another end point for that XML GW and route events through logstash that way, I’ve just not experimented with that.

I’m starting this series with a simple logstash configuration file which gets us up and running with a simple ‘send all events’ type flow. As I alluded to above, we really don’t want this generic set up. We’ll build upon this by using more of the powerful logstash options to address our use cases.

To start logstash : $LOGSTASH140-DIR/bin/logstash agent -f /opt/logstash-1.4.0/MyLogstash.conf –verbose &

Building a SCALA DSV pack for your events

To use this simple logstash starting config, we’ll need a SCALA DSV pack to match. The SCALA logstash output module will pre-format every field named ‘scalaFields’ into a CSV formatted message bundle and stream it outbound to the SCALA EIF receiver. The SCALA DSV pack simply breaks apart the CSV message and posts fields properly into the index.

  • Create a simple DSV header file (MyEventHeader.csv) made up of the 17 field names from alerts.status that you’re sending across the Socket GW. Reference this post to get help setting up the Netcool Socket GW.
  • From the DSVToolkit directory ($SCALAHOME/unity_content/DSVToolkit_v1.1.0.2), run the following command: python primeProps.py MyEventDSVPack.props 17 -f MyEventHeader.csv Verify that you see “The properties file was successfully edited.” returned.
  • Edit the MyEventDSVPack.props file
  • Set scalaHome to reflect your installation location (default: scalaHome: $HOME/IBM/LogAnalysis)
  • Name the DSV Pack by setting this vield to something more intuitive: aqlModuleName: MyEventDSVPack (default: aqlModuleName: dsv17Column)
  • Change the name of the LastOccurrence field to timestamp < -- this will become the main indexed timestamp within SCALA. If you want to use FirstOccurrence, then name that one timestamp instead
  • Change the dataType for the timestamp field as well as the FirstOccurrence field to: dataType: DATE
  • Add this field to the timestamp and FirstOccurrence field: dateFormat: yyyy-MM-dd’T’HH:mm:ssz < -- this must match what's come from OMNIBus Socket GW and through any logstash processing you may have done
  • Review all of your field sections and determine which fields you’d like to be able to search, sort and filter on. I generally make them all ‘true’ realizing that there could be performance trade offs on this for your search results.
  • Deploy the DSV Pack using this command: python dsvGen.py MyEventDSVPack.props -d -f -u unityadmin -p unityadmin
  • Verify that you see one or more ‘BUILD SUCCESSFUL’ responses in your terminal screen.

Final tasks and seeing results

The final SCALA task is to create a new datasource that uses the new DSV pack we just deployed. The key part of setting this up is to use the same host and path names you set in your logstash config. Create a new SCALA datasource that uses this new MyEventDSVPack Type and Collection. Set the host and path to ‘mynetcoolevents‘ based on our sample config. Name the datasource ‘My Netcool Events’ or similar.

Start everything up (Netcool GW and logstash) and verify events are successfully indexed within SCALA by watching the GenericReceiver.log and/or by running searches within SCALA.

We’ll build on this in my next post by laying down my event analysis apps and then getting into how to intelligently route events to specific datasources within SCALA.

0 comments

Bookmarks for March 3rd through March 25th

in General

These are my links for March 3rd through March 25th:

  • Numenta Releases Grok for IT Analytics on AWS | Numenta – Grok anomaly detection leverages sophisticated machine intelligence algorithms to enable new insights into critical IT systems. Grok automatically learns complex patterns and then highlights unusual behavior. As software topologies and usage patterns change, Grok continuously learns and adapts, eliminating the need for frequent resetting of thresholds. Visualization of Grok output is displayed on a constantly updated mobile device, enabling IT professionals to assess the health of their systems anytime, anywhere. Using Grok, IT operators can better prevent business downtime while reducing false positives.

    Grok is the first commercial application of Numenta’s groundbreaking Cortical Learning Algorithm (CLA), biologically inspired algorithms for machine intelligence. The core CLA technology is ideal for large-scale analysis of continuously streaming datasets and excels at modeling and predicting patterns in data.

    “Grok provides an early warning system to IT professionals to give them real-time insights into their system performance,” said Numenta CEO Donna Dubinsky. “Grok anticipates problems before they happen, reduces false positives, and lowers engineering costs through automated modeling and continuous learning.”

    Grok features include:

    Monitoring of performance and health of AWS environments or other systems
    Automatic modeling to determine normal patterns
    Automatic identification and ranking of unusual patterns
    Continuous learning of new patterns as environments evolve – no need for manual threshold setting
    Notification to user when an anomaly occurs
    Output displayed graphically on an Android mobile device
    Simple setup via a web-based or command-line interface
    Support for AWS auto-scaling groups and logical clusters

  • LMAO if you don’t logstash | by Paul Czarkowski | @pczarkowski
  • Elasticsearch.org Kibana 3.0.0 GA Is Now Available! | Blog | Elasticsearch – Today is a big day for Elasticsearch and the Kibana team. After 5 milestone releases and over 1000 commits, we’re happy to announce the release of Kibana 3.0.0 GA. Over the last year, Kibana has moved from a simple interface to search logs to a fully featured, interactive analysis and dashboard system for any type of data. Everyday, we’re incredibly inspired by the people who tell us they’ve solved major problems, optimized their existing deployments and found insights in places they never imagined.
  • Apache Solr vs ElasticSearch – the Feature Smackdown! – The Feature Smackdown
  • SiLK: enterprise-grade log analysis solution | LucidWorks | LucidWorks – LucidWorks Solr integration with LogStash and Kibana (SiLK) is an enterprise-grade log analysis solution that enables the ad-hoc search and analysis of billions of events and transactions across multiple applications, servers and devices.
  • Advanced Web Analytics for Big Data & Hadoop – Alpine Data Labs
  • wise.io | Machine Learning as a Service & Big Data Analytics – Our state-of-the-art machine learning technology reveals hidden value in your data. Our applications integrate seamlessly into your business.
  • Home | Skytree – Machine Learning on Big Data for Predictive Analytics – Machine Learning is the modern science of discovering patterns and making predictions from complex data.
  • UCI Machine Learning Repository – We currently maintain 273 data sets as a service to the machine learning community. You may view all data sets through our searchable interface. Our old web site is still available, for those who prefer the old format. For a general overview of the Repository, please visit our About page. For information about citing data sets in publications, please read our citation policy. If you wish to donate a data set, please consult our donation policy. For any other questions, feel free to contact the Repository librarians. We have also set up a mirror site for the Repository.
  • Log.io – Real-time log monitoring in your browser
  • doubaokun/node-ab – A command tool to test the performance of HTTP services.
  • zanchin/node-http-perf – Node HTTP Server Performance Tool
  • Uptime by fzaninotto – A remote monitoring application using Node.js, MongoDB, and Twitter Bootstrap.
  • Cloud Foundry and Logstash – Scott Frederick’s humble blog – The Cloud Foundry Loggregator component formats logs according to the syslog standard as defined in RFC5424. The logstash cookbook includes an example configuration for syslog consumption, but that configuration follows an older RFC3164 syslog standard.

    Here is a logstash configuration that works with RFC5424 output, with some additional changes for Cloud Foundry:

  • Using your historical data for analytic usage
  • Using R for Educational Research: An Introductory Workshop to Break the Learning Curve – R_intro_SERA_2012.pdf
  • Welcome to a Little Book of R for Time Series! — Time Series 0.2 documentation – This is a simple introduction to time series analysis using the R statistics software.
  • BestFirstRTutorial.pdf
  • Introducing R
  • Introduction to R Seminar – UCLA Institute for Digital Research and Education
0 comments