
dougmcclure.net

thoughts on business, service and technology operations and management in a big data and analytics world

These are my links for March 25th through June 18th:

  • OpenStack LumberJack – Part 1 rsyslog | Professional OpenStack – Logging for OpenStack has come quite a ways. What I’m going to attempt to do over a few posts, is recreate and expand a bit on what was discussed at this last OpenStack Summit with regard to Log Management and Mining in OpenStack. For now, that means installing rsyslogd and setting it up to accept remote connections.
  • rsyslog.conf file
  • FailoverSyslogServer – rsyslog wiki
  • How to configure failover for rsyslog in Red Hat Enterprise Linux 6? – Red Hat Customer Portal
  • Introducing the Solr Scale Toolkit | SearchHub | Lucene/Solr Open Source Search
  • Highly Available ELK (Elasticsearch, Logstash and Kibana) Setup | Everything Should Be Virtual
  • Logstash configuration dissection
  • Splunk Introduces Splunk Enterprise 6.1 – Enabling the Mission-critical Enterprise. Multi-site Clustering: Delivers continuous availability for Splunk Enterprise deployments that span multiple sites, countries or continents by replicating raw and indexed data in a clustered configuration. Search Affinity: Provides a performance increase when using multi-site clustering by routing search and analytics requests to the nearest cluster, increasing performance and decreasing network usage. zLinux Forwarder: Allows for application and platform data from IBM mainframes to be easily collected and indexed by Splunk Enterprise. Data Preview with Structured Inputs: Enables previewing of massive data files to verify alignment of fields and headers before indexing to improve data quality and the time it takes to discover critical insights.
  • Streamlining application logs collection on AWS Elastic Beanstalk with logstash – part 1 | Mob in Tech – However, we like to experiment things, so I decided to try the home made solution for the backend of our new upcoming mobile game. Our backend is a homebrewed Java REST webservices application hosted in an Elastic Beanstalk container, in the us-east-1 region. The final goal is to gather logs from all instances of the Java application into a local (Paris) Elastic Search database, in a scalable manner. In this case, scalable means for us: every single step of the data pipeline has to be horizontally scalable, meaning we can speed up the process by adding additional capacity at each step independently.
  • How to Pre-Process Logs with Logstash: Part III of “Scalable and Robust Logging for Web Applications” ← #workHard / partyHard – This article is an introduction on how to pre-process logs from multiple sources in logstash before storing them in a data store or analyze them in real time. Some common use cases are unifying time formats across different log sources, anonymizing data, extracting only interesting information from the logs as well as tagging and selective distribution.
  • Building an Activity Feed System with Storm – Programming – O’Reilly Media – Problem You want to build an activity stream processing system to filter and aggregate the raw event data generated by the users of your application. Solution Streams are a dominant metaphor for presenting information to users of the modern Internet. Used on sites like Facebook and Twitter and mobile apps like Instagram and Tinder, streams are an elegant tool for giving users a window into the deluge of information generated by the applications they use every day.
  • Wirbelsturm: 1-Click Deployments of Storm and Kafka clusters with Vagrant and Puppet – Michael G. Noll – I am happy to announce the first public release of Wirbelsturm, a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data related infrastructure. Wirbelsturm’s goal is to make tasks such as “I want to deploy a multi-node Storm cluster” simple, easy, and fun. In this post I will introduce you to Wirbelsturm, talk a bit about its history, and show you how to launch a multi-node Storm (or Kafka or …) cluster faster than you can brew an espresso.
  • RapidEngines Application Analytics – We provide the world’s fastest, most flexible and most scalable time series data platform. Delivered as software or a cloud service to help you visualize and detect application performance events before they impact your business.
  • SevOne Acquires Log Analytics Provider RapidEngines | Business Wire – SevOne, the leader of scalable performance monitoring solutions to the world’s most connected companies, today announced it has acquired RapidEngines, a leading provider of highly scalable log analytics software for IT enterprises, service providers and application developers. The acquisition is the first from SevOne since closing the $150M investment from Bain Capital which remains one of the largest venture financings of 2013. SevOne’s large customer base will now have access to RapidEngines’ log analytics software granting users the benefit of automatically collecting and organizing log data to better provide a detailed picture of user and machine behavior.
  • Google Cloud Platform Blog: A New Logs Viewer for Google Cloud Platform – Today we are excited to announce a significantly updated Logs Viewer for App Engine users. Logs from all your instances can be viewed together in near real time, with greatly improved filtering, searching and browsing capabilities. This release includes UI and functional improvements. We’ve added features that simplify navigation and make it easier to find the logs data you’re looking for.
  • About | LOGSEARCH – What started out as an internal development project from within City Index was soon after released as an open source project for all to benefit. City Index realised the potential value of the information available to them in the log files and required a flexible solution to not only view the log files but rather to view and cross analyse them.
  • Approaches to Indexing Multiple Logs File Types in Solr and Setting up a Multi Node, Multi Core Solr Cloud – Apache Solr is a widely used open source search platform that internally uses Apache Lucene based indexing. Solr is very popular, provides a database to store indexed data, and is a highly scalable, capable search solution for the enterprise platform. This article provides a basic vision for a single and multi-core approach to indexing and querying multiple log file types in Solr. Solr indexes the log files generated by the servers and allows searching the logs for troubleshooting. It has the capability to scale to work in a multi-node cluster set up in a distributed and fault tolerant manner. These capabilities are collectively called SolrCloud. Solr uses Zookeeper for working in a distributed manner.
  • Introducing Morphlines: The Easy Way to Build and Integrate ETL Apps for Hadoop | Cloudera Developer Blog – Morphlines can be seen as an evolution of Unix pipelines where the data model is generalized to work with streams of generic records, including arbitrary binary payloads. A morphline is an efficient way to consume records (e.g. Flume events, HDFS files, RDBMS tables, or Apache Avro objects), turn them into a stream of records, and pipe the stream of records through a set of easily configurable transformations on the way to a target application such as Solr, for example as outlined in the following figure: In this figure, a Flume Source receives syslog events and sends them to a Flume Morphline Sink, which converts each Flume event to a record and pipes it into a readLine command. The readLine command extracts the log line and pipes it into a grok command. The grok command uses regular expression pattern matching to extract some substrings of the line. It pipes the resulting structured record into the loadSolr command. Finally, the loadSolr command loads the record into Solr, typically a SolrCloud. In the process, raw data or semi-structured data is transformed into structured data according to application modelling requirements.
  • Pivotal CF 1.1 Advances Enterprise PaaS with New Capabilities | Pivotal P.O.V. – What’s new in Pivotal CF 1.1:

    Improved app event log aggregation – developers can now go to a unified log stream for full application event visibility (Watch) and drain logs to a 3rd party tool like Splunk for analysis (Watch)

  • elasticsearch-curator 1.0.0 : Python Package Index – Tending your time-series indices in Elasticsearch

The initial offering of Netcool Operations Insight (NOI) v1.1 provides a good starting point with some neat event search use cases for front line IT Ops support teams and some basic event analysis support for the Netcool Administrator. The NOI-SCALA integration provides a primary entry point for searching across events via the event list tool menu. In my opinion, the ability to search and interact with events is far more valuable using the search and apps concepts within SCALA. After I talk about some plumbing foundations, I’ll share some of my Event Analysis apps that can get you jump started with event analysis and reduction use cases in no time.

Another area for improvement is the firehose approach of sending all events by default from the active Netcool integration point to a single SCALA datasource. This leads to a single, all-in-one datasource in SCALA that can grow very large, slowing search performance and making it difficult to keep the indexed event data pruned with the SCALA delete utility available today. It’s my best practice to intelligently analyze and route events to unique datasources by key fields such as service or application name, functional technology type or role, etc., so that more efficient searches and apps can be created.

Another common scenario is when you have established a Netcool historical event archive and you’d like to incorporate specific historical events within your SCALA environment for analysis. The current NOI solution doesn’t offer an easy or flexible way to fold these historical events into SCALA so that more realistic event search and analysis use cases can be developed. I’ll share some thoughts on intelligent event processing and routing for getting the events you need into SCALA for search and analysis in the most efficient way possible.

Fortunately, SCALA comes with the awesome logstash toolkit, which can be used to simplify and extend your NOI offering through logstash’s wide array of inputs, filters and codecs and our SCALA output plugin! I’ll start this series by updating some past blog posts based on the latest logstash v1.4 and the SCALA v1.2 release. I’m not sure when the SCALA content team will get around to updating the toolkit to install out-of-the-box support for v1.4, so here’s what you have to do to use this much-improved logstash version with SCALA v1.2.

Prepare to use logstash v1.4 and SCALA logstash toolkit

  • Download logstash 1.4.x here (zip, tar.gz, deb, rpm)
  • Unzip into a new directory where you will run it ($LOGSTASH140-DIR)
  • Download SCALA logstash toolkit from here
  • Explode it on your laptop or system
  • From the SCALA logstash toolkit, copy $MYDIR/LogstashIntegrationToolkit_v1.1.0.1/lstoolkit/LogstashLogAnalysis_v1.1.0.1/logstash/outputs/scala_custom_eif.rb and the unity/ directory to your logstash directory $LOGSTASH140-DIR/lib/logstash/outputs/
  • From the SCALA logstash toolkit, copy LogstashIntegrationToolkit_v1.1.0.1/lstoolkit/LogstashLogAnalysis_v1.1.0.1/logstash/outputs/eif.conf to your logstash directory $LOGSTASH140-DIR/lib/logstash/outputs/ and name it something like eif-scala-IPADDR.conf so you know which SCALA server it is configured for (one config file for each unique scala_custom_eif output; see the shell sketch after this list)
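
Assuming a tar.gz install and the directory names above, the copy steps look roughly like this. LOGSTASH140_DIR and MYDIR are placeholders for your actual logstash install and toolkit locations (the toolkit ships unity/ under its logstash/ folder, so adjust the source paths if your layout differs):

export LOGSTASH140_DIR=/opt/logstash-1.4.0
export MYDIR=/tmp/scala-toolkit
cp $MYDIR/LogstashIntegrationToolkit_v1.1.0.1/lstoolkit/LogstashLogAnalysis_v1.1.0.1/logstash/outputs/scala_custom_eif.rb $LOGSTASH140_DIR/lib/logstash/outputs/
cp -r $MYDIR/LogstashIntegrationToolkit_v1.1.0.1/lstoolkit/LogstashLogAnalysis_v1.1.0.1/logstash/unity $LOGSTASH140_DIR/lib/logstash/outputs/
cp $MYDIR/LogstashIntegrationToolkit_v1.1.0.1/lstoolkit/LogstashLogAnalysis_v1.1.0.1/logstash/outputs/eif.conf $LOGSTASH140_DIR/lib/logstash/outputs/eif-scala-10.10.10.1.conf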

Configure the SCALA logstash toolkit output plugin

You can have multiple SCALA logstash outputs enabling you to stream events to multiple SCALA systems. In order to do this, you’ll need multiple EIF configuration files with each one mapped to a specific SCALA system. Name each one uniquely with the SCALA target IP address so you know which one to use.

  • Configure the eif-scala-IPADDR.conf file (a minimal sample is shown after this list)
  • Set the BufEvtPath to something unique if you have multiple SCALA outputs in your config
  • Set the LogFileName to something unique if you have multiple SCALA outputs in your config
  • Set the ServerLocation to your SCALA destination
  • Set the ServerPort to your SCALA EIF Receiver port (5529 by default)
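
For reference, here’s a minimal sketch of what an eif-scala-IPADDR.conf can end up looking like, assuming the standard Key=value EIF syntax; all values are placeholders for a SCALA server at 10.10.10.1:

BufEvtPath=/tmp/scala/eif-scala-10.10.10.1.cache
LogFileName=/tmp/scala/eif-scala-10.10.10.1.log
ServerLocation=10.10.10.1
ServerPort=5529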

Build a configuration for logstash and fire it up

I’ve provided a simple logstash config here (MyLogstash.conf) to start with for this blog series. Feel free to use/extend one you may already have by adding the new scala_custom_eif output and the desired ‘scalaFields’ which are at the heart of how this scala_custom_eif output works in conjunction with the SCALA DSV toolkit.
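
If you’re starting from scratch, here’s a stripped-down sketch of the overall shape using logstash v1.4 conditionals. This is not the full MyLogstash.conf from the link above; the port, host/path values and plugin options are placeholders, and the scalaFields mutations are left as a comment since their exact form comes from the SCALA logstash toolkit documentation and your own field list:

input
{
tcp
{
type => "netcool"
port => 1234
}
}

filter
{
if [type] == "netcool"
{
#Set host and path to match the SCALA datasource you'll create later
mutate
{
replace => { "host" => "mynetcoolevents" }
add_field => { "path" => "mynetcoolevents" }
}
#Add the desired 'scalaFields' here per the SCALA logstash toolkit documentation
}
}

output
{
if [type] == "netcool"
{
scala_custom_eif
{
eif_config => "lib/logstash/outputs/eif-scala-10.10.10.1.conf"
debug_log => "/tmp/scala/scala-logstash-10.10.10.1.log"
debug_level => "info"
}
}
}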

I’ll continue to blog about using the socket gateway with logstash to stream events into SCALA. From my experience, it’s tremendously easier than setting up the XML GW used with NOI. I’m sure you could also just configure another endpoint for that XML GW and route events through logstash that way; I’ve just not experimented with that.

I’m starting this series with a simple logstash configuration file which gets us up and running with a simple ‘send all events’ type flow. As I alluded to above, we really don’t want this generic set up. We’ll build upon this by using more of the powerful logstash options to address our use cases.

To start logstash: $LOGSTASH140-DIR/bin/logstash agent -f /opt/logstash-1.4.0/MyLogstash.conf --verbose &

Building a SCALA DSV pack for your events

To use this simple logstash starting config, we’ll need a SCALA DSV pack to match. The SCALA logstash output module will pre-format every field named ‘scalaFields’ into a CSV formatted message bundle and stream it outbound to the SCALA EIF receiver. The SCALA DSV pack simply breaks apart the CSV message and posts fields properly into the index.

  • Create a simple DSV header file (MyEventHeader.csv) made up of the 17 field names from alerts.status that you’re sending across the Socket GW. Reference this post to get help setting up the Netcool Socket GW.
  • From the DSVToolkit directory ($SCALAHOME/unity_content/DSVToolkit_v1.1.0.2), run the following command and verify that you see “The properties file was successfully edited.” returned: python primeProps.py MyEventDSVPack.props 17 -f MyEventHeader.csv (the full command sequence is recapped after this list)
  • Edit the MyEventDSVPack.props file
  • Set scalaHome to reflect your installation location (default: scalaHome: $HOME/IBM/LogAnalysis)
  • Name the DSV Pack by setting this field to something more intuitive: aqlModuleName: MyEventDSVPack (default: aqlModuleName: dsv17Column)
  • Change the name of the LastOccurrence field to timestamp <-- this will become the main indexed timestamp within SCALA. If you want to use FirstOccurrence, then name that one timestamp instead
  • Change the dataType for the timestamp field as well as the FirstOccurrence field to: dataType: DATE
  • Add this field to the timestamp and FirstOccurrence fields: dateFormat: yyyy-MM-dd'T'HH:mm:ssz <-- this must match what comes from the OMNIbus Socket GW and through any logstash processing you may have done
  • Review all of your field sections and determine which fields you’d like to be able to search, sort and filter on. I generally make them all ‘true’ realizing that there could be performance trade offs on this for your search results.
  • Deploy the DSV Pack using this command: python dsvGen.py MyEventDSVPack.props -d -f -u unityadmin -p unityadmin
  • Verify that you see one or more ‘BUILD SUCCESSFUL’ responses in your terminal screen.
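
Putting the toolkit steps together, the full command sequence looks like this (the DSVToolkit path is the default; adjust it to your scalaHome):

cd $SCALAHOME/unity_content/DSVToolkit_v1.1.0.2
python primeProps.py MyEventDSVPack.props 17 -f MyEventHeader.csv
#edit MyEventDSVPack.props as described in the list above, then deploy it:
python dsvGen.py MyEventDSVPack.props -d -f -u unityadmin -p unityadmin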

Final tasks and seeing results

The final SCALA task is to create a new datasource that uses the DSV pack we just deployed. The key part of setting this up is to use the same host and path names you set in your logstash config. Create a new SCALA datasource using the new MyEventDSVPack Type and Collection, set the host and path to ‘mynetcoolevents’ based on our sample config, and name the datasource ‘My Netcool Events’ or similar.

Start everything up (Netcool GW and logstash) and verify events are successfully indexed within SCALA by watching the GenericReceiver.log and/or by running searches within SCALA.

We’ll build on this in my next post by laying down my event analysis apps and then getting into how to intelligently route events to specific datasources within SCALA.


These are my links for March 3rd through March 25th:

  • Numenta Releases Grok for IT Analytics on AWS | Numenta – Grok anomaly detection leverages sophisticated machine intelligence algorithms to enable new insights into critical IT systems. Grok automatically learns complex patterns and then highlights unusual behavior. As software topologies and usage patterns change, Grok continuously learns and adapts, eliminating the need for frequent resetting of thresholds. Visualization of Grok output is displayed on a constantly updated mobile device, enabling IT professionals to assess the health of their systems anytime, anywhere. Using Grok, IT operators can better prevent business downtime while reducing false positives.

    Grok is the first commercial application of Numenta’s groundbreaking Cortical Learning Algorithm (CLA), biologically inspired algorithms for machine intelligence. The core CLA technology is ideal for large-scale analysis of continuously streaming datasets and excels at modeling and predicting patterns in data.

    “Grok provides an early warning system to IT professionals to give them real-time insights into their system performance,” said Numenta CEO Donna Dubinsky. “Grok anticipates problems before they happen, reduces false positives, and lowers engineering costs through automated modeling and continuous learning.”

    Grok features include:

    Monitoring of performance and health of AWS environments or other systems
    Automatic modeling to determine normal patterns
    Automatic identification and ranking of unusual patterns
    Continuous learning of new patterns as environments evolve – no need for manual threshold setting
    Notification to user when an anomaly occurs
    Output displayed graphically on an Android mobile device
    Simple setup via a web-based or command-line interface
    Support for AWS auto-scaling groups and logical clusters

  • LMAO if you don’t logstash | by Paul Czarkowski | @pczarkowski
  • Elasticsearch.org Kibana 3.0.0 GA Is Now Available! | Blog | Elasticsearch – Today is a big day for Elasticsearch and the Kibana team. After 5 milestone releases and over 1000 commits, we’re happy to announce the release of Kibana 3.0.0 GA. Over the last year, Kibana has moved from a simple interface to search logs to a fully featured, interactive analysis and dashboard system for any type of data. Everyday, we’re incredibly inspired by the people who tell us they’ve solved major problems, optimized their existing deployments and found insights in places they never imagined.
  • Apache Solr vs ElasticSearch – the Feature Smackdown! – The Feature Smackdown
  • SiLK: enterprise-grade log analysis solution | LucidWorks | LucidWorks – LucidWorks Solr integration with LogStash and Kibana (SiLK) is an enterprise-grade log analysis solution that enables the ad-hoc search and analysis of billions of events and transactions across multiple applications, servers and devices.
  • Advanced Web Analytics for Big Data & Hadoop – Alpine Data Labs
  • wise.io | Machine Learning as a Service & Big Data Analytics – Our state-of-the-art machine learning technology reveals hidden value in your data. Our applications integrate seamlessly into your business.
  • Home | Skytree – Machine Learning on Big Data for Predictive Analytics – Machine Learning is the modern science of discovering patterns and making predictions from complex data.
  • UCI Machine Learning Repository – We currently maintain 273 data sets as a service to the machine learning community. You may view all data sets through our searchable interface. Our old web site is still available, for those who prefer the old format. For a general overview of the Repository, please visit our About page. For information about citing data sets in publications, please read our citation policy. If you wish to donate a data set, please consult our donation policy. For any other questions, feel free to contact the Repository librarians. We have also set up a mirror site for the Repository.
  • Log.io – Real-time log monitoring in your browser
  • doubaokun/node-ab – A command tool to test the performance of HTTP services.
  • zanchin/node-http-perf – Node HTTP Server Performance Tool
  • Uptime by fzaninotto – A remote monitoring application using Node.js, MongoDB, and Twitter Bootstrap.
  • Cloud Foundry and Logstash – Scott Frederick’s humble blog – The Cloud Foundry Loggregator component formats logs according to the syslog standard as defined in RFC5424. The logstash cookbook includes an example configuration for syslog consumption, but that configuration follows an older RFC3164 syslog standard.

    Here is a logstash configuration that works with RFC5424 output, with some additional changes for Cloud Foundry:

  • Using your historical data for analytic usage
  • Using R for Educational Research: An Introductory Workshop to Break the Learning Curve – R_intro_SERA_2012.pdf
  • Welcome to a Little Book of R for Time Series! — Time Series 0.2 documentation – This is a simple introduction to time series analysis using the R statistics software.
  • BestFirstRTutorial.pdf
  • Introducing R
  • Introduction to R Seminar – UCLA Institute for Digital Research and Education

These are my links for December 11th through March 3rd:

  • Output to Elasticsearch in Logstash format (Kibana-friendly) – In this post you’ll see how you can take your logs with rsyslog and ship them directly to Elasticsearch (running on your own servers, or the one behind Logsene’s Elasticsearch API) in a format that plays nicely with Logstash. So you can use Kibana to search, analyze and make pretty graphs out of them.

    This is especially useful when you have a lot of servers logging [a lot of data] to their syslog daemons and you want a way to search them quickly or do statistics on the logs. You can use rsyslog’s Elasticsearch output to get your logs into Elasticsearch, and Kibana to visualize them. The only challenge is to get your rsyslog configuration right, so your logs end up where Kibana is expecting them. And this is exactly what we’re doing here.

  • GraphLab Notebook | GraphLab – The power of GraphLab with the ease of Python, running in the Cloud.
  • Prelert Introduces Push Button Machine Learning in Anomaly Detective 3.1 – Prelert, the first vendor to package data science into downloadable applications for everyday users, today announced the release of Anomaly Detective 3.1, which introduces the ability to deploy powerful machine learning tools at the push of a button.

    Anomaly Detective is a deeply integrated app for Splunk Enterprise that helps identify and resolve performance and security issues, and their causes, as they develop. It provides a solution to one of the major problems inherent in working with Big Data – gaining valuable insights from otherwise overwhelming volumes of data in real-time.

  • DataLoop.io – Cloud Server Monitoring for DevOps & Operations Teams – Dataloop.IO is a new start-up in the IT Infrastructure Monitoring space, focused on building a new monitoring tool for DevOps/Operations teams that run Cloud services at scale.

    Our Cloud service significantly reduces the time required to setup and deploy your monitoring. It reduces the friction of writing and deploying new monitoring scripts so your team can ensure full coverage regardless of how quickly your environment is changing.

  • Predictions For 2014: Technology Monitoring | Forrester Blogs – Further development of pattern analytics to complement log-file analytics. For the last five years, log-file analytics has been a major focus area in the area of IT operational analytics. During 2014 we expect further development with pattern analytics or features that can make insights based on data in-stream or in-flight on the network.

    Re-emergence of business service management (BSM) features. Increasing technology innovation is leading to greater complexity in business service architecture. This means that any features that simplify the management of complex business services become a must. Hence why we predict the re-emergence of BSM features that will be more successful than previous attempts, as these new BSM approaches will have automated discovery and mapping of technology to business services.

  • Data Mining Map – An Introduction to Data Mining
  • Actian Analytics Platform™ | Accelerating Big Data 2.0™ | Actian – Actian transforms big data into business value for any organization – not just the privileged few. Our next generation Actian Analytics Platform™ software delivers extreme performance, scalability, and agility on off-the-shelf hardware, overcoming key technical and economic barriers to broad adoption of big data, delivering Big Data for the Rest of Us™.
  • Visual Intelligence for your web application – COSCALE – The COSCALE Application Performance Analyzer provides swift and accessible visual intelligence for your web-application through smart correlations of any application and infrastructure metric
  • Qubole | Big Data as a Service – Switch your data infrastructure to auto-pilot using our award-winning, auto-scaling Hadoop cluster, built-in data connectors and an intuitive graphical editor for all things Big Da
  • Altiscale Hadoop as a Service – Altiscale’s offering is ideally suited for today’s data science needs. Features for data science include permanent HDFS volumes, access to the latest tools, resource sharing without conflict, job-level monitoring and support, and pricing plans that eliminate unpleasant surprises.
  • Boundary Surpasses 400% YoY Growth in Processing of Massive IT Operations Performance Analytics in the Cloud – The Boundary service is processing an average of 1.5 trillion application and infrastructure performance metrics per day on behalf of its clients and has computed occasional daily bursts of over 2 trillion metrics.
  • To Log or Not to Log: Proven Best Practices for Instrumentation – Innovation Insights – To log or not to log? This is an age-old question for developers. Logging everything can be great because you have plenty of data to work from when you have a problem. But it’s not so great if you have to grep and inspect it all yourself. In my mind, developers should instead be thinking about logging the right events in the right format for the right consumer.
  • IT Operations Analytics (ITOA) Landscape – Say goodbye to years of chronic IT Operations pains. IT Operations Analytics (ITOA) is here, and gaining strong momentum. You Are a Leader – Seize The Opportunity.
  • Zoomdata – Next Generation Big Data Analytics

    - Built for the Big Data Revolution
    - Connected to the World in Real-Time
    - Designed for the Touch Generation
    - Fuses Data into a Single Experience
    - Easy & Powerful Interface

  • Enterprise Management Services Enterprise Event Management – Enterprise Event Management Trying to manage a modern IT environment without a consolidated view of operations is like trying to drive a car at 100 mph while looking at six different dashboards. The proliferation of development, build, and operations tools has made it increasingly difficult to stay in control of IT and reduce downtime. Too often, developers and administrators have been left with two unappealing alternatives: Either they have to try and write their own event consolidator or they struggle with legacy products from a different era. Boundary is the industry’s leading SaaS-based enterprise event management offering, enabling you to track and optimize your modern, rapidly changing application infrastructures. With Boundary, you can consolidate, standardize, prioritize, enrich and correlate events and notifications from hundreds of systems into a single console.
  • Cubism.js – Cubism.js is a D3 plugin for visualizing time series. Use Cubism to construct better realtime dashboards, pulling data from Graphite, Cube and other sources. Cubism is available under the Apache License on GitHub.
  • Crosslet – Crosslet is a free small (22k without dependencies) JavaScript widget for interactive visualisation and analysis of geostatistical datasets. You can also use it for visualizing and comparing multivariate datasets. It is a combination of three very powerful JavaScript libraries: Leaflet, an elegant and beautiful mapping solution, and Crossfilter, a library for exploring large multivariate datasets in the browser. D3, a data driven way of manipulating objects. Crosslet also supports TopoJSON, a GeoJSON extension that allows to present geometry in a highly compact way. Crosslet is written in CoffeeScript and uses less for styling.
  • Charts, Graphs and Images – CodeProject
  • dc.js – Dimensional Charting Javascript Library – dc.js is a javascript charting library with native crossfilter support and allowing highly efficient exploration on large multi-dimensional dataset (inspired by crossfilter’s demo). It leverages d3 engine to render charts in css friendly svg format. Charts rendered using dc.js are naturally data driven and reactive therefore providing instant feedback on user’s interaction. The main objective of this project is to provide an easy yet powerful javascript library which can be utilized to perform data visualization and analysis in browser as well as on mobile device.
  • SharePoint Development Lab by @avishnyakov » Go Cloud – A better logging for SharePoint Online/Office365/Azure apps – It seems that cloud based products and services have a significant impact on how we design, write, debug, trace and deliver our applications. The way we think about this is not the same anymore; there might be no need to have SharePoint on-premises and SharePoint Online/O365 might be a better choice, there might be no reason to host a web application on dedicated hardware/hosting provider, but Azure could bring more benefits. All these trends cannot be simple ignored, and it is a good thing to see how new services and offerings might be used in you applications.

To catch up, check out part 1, part 2 and part 3.

I wanted to get an up-to-date configuration out based on some recent work for our upcoming Pulse 2014 demo, making use of the latest versions of logstash v1.3.3 and our SCALA v1.2.0 release. Nothing significantly different per se, but the logstash configuration syntax and internal event flow/routing have changed significantly from v1.1.x.

I’ve included an example logstash v1.3.3 configuration file in my git repo here. It should be simple to follow the flow from inputs, filters and outputs. The use of tags and conditionals is key to control filter activation and output routing. It’s very powerful stuff!
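
To illustrate the idea, here’s a minimal sketch of the v1.3 tag/conditional style; this isn’t the repo config, the tag name is a placeholder, and the scala_custom_eif options simply mirror the earlier posts in this series:

filter
{
if [type] == "netcool"
{
mutate
{
add_tag => [ "netcool_event" ]
}
}
}

output
{
if "netcool_event" in [tags]
{
scala_custom_eif
{
eif_config => "logstash/outputs/eif-10.10.10.1.conf"
debug_log => "/tmp/scala/scala-logstash-10.10.10.1.log"
debug_level => "debug"
}
}
}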

I’ll get another post out this week with our next key component being the SCALA DSV pack to consume the events routed via logstash to SCALA.


These are my links for November 19th through December 11th:

  • The Netflix Tech Blog: Announcing Suro: Backbone of Netflix’s Data Pipeline – Suro, which we are proud to announce as our latest offering as part of the NetflixOSS family, serves as the backbone of our data pipeline. It consists of a producer client, a collector server, and plugin framework that allows events to be dynamically filtered and dispatched to multiple consumers.
  • Sensu | An open source monitoring framework – Designed for the Cloud The Cloud introduces new challenges to monitoring tools, Sensu was created with them in mind. Sensu will scale along with the infrastructure that it monitors.
  • datastack.io – data integration as a service – collect data. share insights.data integration as a service * * Kinda Logstash or Heka. But without the pain.
  • Glassbeam Begins Where Splunk Ends – Going Beyond Operational Intelligence with IoT Logs | Glassbeam – Glassbeam SCALAR is a flexible, hyper scale cloud-based platform capable of organizing and analyzing complex log bundles including syslogs, support logs, time series data and unstructured data generated by machines and applications. By creating structure on the fly based on the data and its semantics, Glassbeam’s platform allows traditional BI tools to plug into this parsed multi-structured data so companies can leverage existing BI and analytics investments without having to recreate their reports and dashboards. By mining machine data for product and customer intelligence, Glassbeam goes beyond traditional log management tools to leverage this valuable data across the enterprise. With a focus on providing value to the business user, Glassbeam’s platform and applications enable users to reduce costs, increase revenues and accelerate product time to market. In fact, Enterprise Apps Today’s Drew Robb recognized this critical value proposition naming Glassbeam a hot Big Data startup for analytics, which is attracting interest from investors, partners and customers. Today’s acquisition serves to showcase a market that is heating up, and new requirements around data analytics. But this is only the start and Glassbeam deliberately picks up where Splunk ends. We remain committed to cutting through the clutter and providing a clear view of operational AND business analytics to users across the enterprise.
  • Splunk Buys Cloudmeter to Boost Operational Intelligence Portfolio – The acquisition of Cloudmeter rounds out Splunk’s portfolio with a capability to analyze machine data from a wider range of sources. Financial terms of the deal were not disclosed. The transaction was funded with cash from Splunk’s balance sheet, the company said. Indeed, the addition of Cloudmeter will enhance the ability of Splunk customers to analyze machine data directly from their networks and correlate it with other machine-generated data to gain insights across Splunk’s core use cases in application and infrastructure management, IT operations, security and business analytics.
  • Netuitive Files for Ground-Breaking New Patent in IT Operations Analytics – Press Release – Digital Journal – The patent filing is led by Dr. Elizabeth A. Nichols, Chief Data Scientist for Netuitive, a quantitative analytics expert focused on extending Netuitive’s portfolio of IT Operations Analytics (ITOA) solutions to new applications and services. “Netuitive is committed to delivering industry leading IT Operations Analytics that proactively address business performance,” said Dr. Nichols. “In addition, Netuitive’s research and development is actively focused on new algorithm initiatives that will further advance our abilities to monitor new managed elements associated with next-generation IT architecture and online business applications.”
  • Legume for Logstash – Legume Web Interface for Logstash & Elasticsearch Legume is a zeroconfig web interface run entirely on the client side that allows to browse and search log messages in Elasticsearch indexed by Logstash.
  • Deploying an application to Liberty profile on Cloud Foundry | WASdev – As part of the partnership between Pivotal and IBM we have created the WebSphere Application Server Liberty Buildpack, which enables Cloud Foundry users to easily deploy apps on Liberty profile.
  • IBM’s project Neo takes aim at the data discovery and visualisation market – MWD’s Insights blog – Project Neo is IBM’s answer to data visualisation and discovery for business users. It promises to help those who don’t possess specialist skills or training in analytics, to visually interact with their data and surface interesting trends and patterns by using a more simplistic dashboard interface that helps and guides users in the analysis process. Whereas previous tool incarnations are often predisposed to using data models, scripting or require knowledge of a query language, Project Neo takes a different tack. It aims to bypass this approach by enabling users to ask questions in plain English against a raw dataset (including CSV or Excel files) and return results in the form of interactive visualisations.
  • Machine learning is way easier than it looks | Inside Intercom – Like all of the best frameworks we have for understanding our world, e.g. Newton’s Laws of Motion, Jobs to be Done, Supply & Demand — the best ideas and concepts in machine learning are simple. The majority of literature on machine learning, however, is riddled with complex notation, formulae and superfluous language. It puts walls up around fundamentally simple ideas.

    Let’s take a practical example. Say we wanted to include a “you might also like” section at the bottom of this post. How would we go about that?

  • Where Are My AWS Logs? – Logentries Blog – Over my time at Logentries, we’ve had users contact us about where to find their logs while they were setting up Logentries. As a result, we recently released a feature for Amazon Web Services called the AWS Connector, which automatically discovers your log files across your Linux EC2 instances, no matter how many instances you have. Finding your linux logs however may only be a first step in the process as AWS logs can be all over the map… so to speak…. So where are they located? Here’s where you can start to find some of these.
  • Responsive Log Management… Like Beauty, it’s in the Eye of the Bug-holder | – As a software engineer, I’m responsible for the code I write and responsible for what we ship. But designing, building, and deploying SaaS is a real challenge – it means software developers are now responsible for making sure the live system runs well too. This is a real challenge, but with Loggly I get real-time telemetry on how my code is running, how my systems are behaving – and how well our software meets the need of our customers.
  • Mahout Explained in 5 Minutes or Less – blog.credera.com – In the spectrum of big data tools, Apache Mahout is a machine-learning engine that fits into the data mining category of the big data landscape. It is one of the more interesting tools in the big data toolbox because it allows you to extract actionable tasks from a big data set. What do we mean by actionable tasks? Things such as purchase recommendations based on a similar customer’s buying habits, or determining whether a user comment is spam based on the word clusters it contains.
  • Change management using Evolven’s IT Operations Analytics – TechRepublic – Evolven is designed to track and report change across an array of operating systems, databases, servers, and more to help pinpoint inconsistencies. It can also assist you in preventing issues and determining root causes of problems. Evolven can be helpful with automation—to find out why things didn’t work as expected and what to do next—and can also alert you to suspicious or unauthorized changes in your environment.

    Human and technological policies go hand-in-hand to balance each other and ensure the best possible results. Whereas my last article on the subject referenced the human processes IT departments should follow during change management, I’ll now take a look at technology that can back those processes up by examining what Evolven does and what benefits it can bring

  • Fluentd vs Logstash – Jason Wilder’s Blog – Fluentd and Logstash are two open-source projects that focus on the problem of centralized logs. Both projects address the collection and transport aspect of centralized logging using different approaches.

    This post will walk through a sample deployment to see how each differs from the other. We’ll look at the dependencies, features, deployment architecture and potential issues. The point is not to figure out which one is the best, but rather to see which one would be a better fit for your environment.

  • astanway/crucible · GitHub – Crucible is a refinement and feedback suite for algorithm testing. It was designed to be used to create anomaly detection algorithms, but it is very simple and can probably be extended to work with your particular domain. It evolved out of a need to test and rapidly generate standardized feedback for iterating on anomaly detection algorithms.
  • Now in Public Beta – Log Search & Log Watch | The AppFirst Blog – The decision to open our new log applications to the public was not one taken lightly. Giving our customers the ability to search all of their log files for any keywords is quite taxing on our system, so we had to take several precautions. To ensure the reliability of our entire architecture, we decided to create a separate web server solely responsible for retrieving log data from our persistence storage HBase. By making this an isolated subsystem, we don’t run the risk of a potentially large query bogging everything else down as well.
  • Log Insight: Remote Syslog Architectures | VMware Cloud Management – VMware Blogs – When architecting a syslog solution, it is important to understand the requirements both from a business and a product perspective. I would like to discuss the different remote syslog architectures that are possible when using vCenter Log Insight.
  • Why We Need a New Model for Anomaly Detection: #1 | Metafor Software –

    I’m not talking about anomaly detection in stable enterprise IT environments. Those are doing just fine. Those infrastructures have mature, tested procedures for rolling out software updates and implementing new applications on an infrequent basis (still running FORTRAN written in the 70s, on servers from the 70s, yeah, that’s a thing).

    I’m talking about anomaly detection in the cloud, where the number of virtual machines fluctuates as often as application roll outs. Current solutions for anomaly detection track dozens or even hundreds of metrics per server in an attempt to divine normal performance and spot anomalous behavior. An ideal solution would adapt itself to the quirks of each metric, to different application scenarios, and to machine re-configurations.

    This is a problem that lends itself to machine learning techniques, but it’s still an incredibly difficult problem to solve. Why?

  • Beyond The Pretty Charts – A Report From #devopsdays in Austin | Metafor Software – Don’t just look at timeline charts. We’ve fallen into the trap of looking at all the pretty charts as time series charts. When we do that, we end up missing some important characteristics. For example, a simple histogram of the data, instead of just a time chart, can tell you a lot about anomalies and distribution. Using different kinds of visualization is crucial to giving us a different aspect on our data.
  • Server Anomaly Detection | Predictive IT Analytics | Config Drift Monitoring | Metafor Software – Know about problems before your threshold based monitoring tool does. Get alerted to issues your thresholds will never catch.

    Metafor’s machine learning algorithms alert you to anomalous behavior in your servers, clusters, applications, and KPIs.


If you’d like to catch up, check out the first three posts in this tutorial, starting here.

The easiest way to get started is by having a good understanding of the structure of your Netcool events. With a fairly default deployment we know there are a number of standard alerts.status fields of interest such as first and last occurrence, node, agent, alert group, alert key, manager to name a few. Nearly every customer I have ever worked with has extended their alerts.status schema to accommodate the various probe and gateway level integrations they have as well as to support event enrichment, auto-ticketing, etc.

There’s definitely a level of maturity here that needs to be understood through brief analysis of your events via the AEL. Which slots are you populating with a high degree of completeness? Which ones help you understand the context of an event beyond the node name? Which ones are used to determine if the event is ever acted upon? Which ones will help you assess the event streams, ask questions and take actions on investigating event validity within your environment? Your goal is to ensure you have the best possible set of fields that will enable your event analysis, event analytics and most importantly the decisions, actions and next steps you will be able to take based upon your analysis.

One place you can get a complete snapshot of the alerts.status configuration is the ../omnibus/var/Tivoli_eif.NCOMS.alerts.status.def file. I used this to get the list of all the field names for easy copy and paste when building my socket gateway mapping file.
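
If you just want the field names, something like this rough one-liner can pull them out; it assumes each field is defined on its own line with the name as the first token, so adjust it to your file’s actual layout:

awk '{ print $1 }' ../omnibus/var/Tivoli_eif.NCOMS.alerts.status.def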

With the fields of interest identified, download and install the Netcool/OMNIbus socket gateway in accordance with the install instructions in the docs. If you don’t already own the socket gateway, check with your sales rep. In most cases since you’re using it to route events from one C&SI product to another, there isn’t a charge. But, IANAL and T&C’s change with the wind so check. If you have a problem with this, ping me and I can suggest a number of other alternative approaches.

Once installed, the first configuration activity is to update the gateway’s socket.map file with the fields you’re interested in.

  • Make a backup copy of the original.
  • Remove the default fields you’re not interested in.
  • Add fields you are interested in.
  • Place the fields in a logical order.
  • NOTE: I’m placing the @Identifier first as the socket gateway inserts an event type (INSERT, UPDATE, DELETE) in front of each event it sends across so we don’t want that to mess up any other slot.

This is the socket map I used within our pretty default environment when sending events from ITM, APM/ITCAM, BSM, etc. For a bare bones set up, the ones I’ve highlighted in bold are probably good enough to get started.

CREATE MAPPING StatusMap
(
'' = '@Identifier',
'' = '@LastOccurrence' CONVERT TO DATE,
'' = '@FirstOccurrence' CONVERT TO DATE,
'' = '@Node',

'' = '@NodeAlias',
'' = '@Summary',
'' = '@Severity',
'' = '@Manager',
'' = '@Agent',
'' = '@AlertGroup',
'' = '@AlertKey',
'' = '@Type',
'' = '@Tally',
'' = '@Class',
'' = '@Grade',

'' = '@Location',
'' = '@ITMDisplayItem',
'' = '@ITMEventData',
'' = '@ITMTime',
'' = '@ITMHostname',
'' = '@ITMSitType',
'' = '@ITMThruNode',
'' = '@ITMSitGroup',
'' = '@ITMSitFullName',
'' = '@ITMApplLabel',
'' = '@ITMSitOrigin',
'' = '@CAM_Application_Name',
'' = '@CAM_Transaction_Name',
'' = '@CAM_SubTransaction_Name',
'' = '@CAM_Client_Name',
'' = '@CAM_Server_Name',
'' = '@CAM_Profile_Name',
'' = '@CAM_Response_Time',
'' = '@CAM_Percent_Available',
'' = '@CAM_Expected_Value',
'' = '@CAM_Actual_Value',
'' = '@CAM_Details',
'' = '@CAM_Total_Requests',
'' = '@BSMAccelerator_Service',
'' = '@BSMAccelerator_Function'
);

Next, we need to set up some simple filtering to control the event types we send across the gateway. The socket.reader.tblrep.def is used to define what comes across the socket gateway and what filters we might want to apply. Here are a couple examples I’ve used.

This example only sends INSERTS and UPDATES (not DELETES, as they don’t send across the entire event structure) and filters out all of the internal TBSM events, which are Class 12000.

REPLICATE INSERTS, UPDATES FROM TABLE 'alerts.status'
USING MAP 'StatusMap'
FILTER WITH 'Class !=12000';

This example only sends INSERTS (again skipping DELETES, since they don’t carry the entire event structure) and filters out events with Severity 0, 1 and 2.

REPLICATE INSERTS FROM TABLE 'alerts.status'
USING MAP 'StatusMap'
FILTER WITH 'Severity >=3';

I wasn’t able to work out a more complex filter expression for the additional filtering I would have liked, so these had to do.
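
In principle, since FILTER WITH takes a condition string, combining clauses might look like the untested sketch below; I haven’t verified this against the socket gateway, so treat it as an assumption to experiment with.

REPLICATE INSERTS, UPDATES FROM TABLE 'alerts.status'
USING MAP 'StatusMap'
FILTER WITH 'Severity >= 3 AND Class != 12000';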

Next, the core socket gateway properties need to be configured. Edit the NCO_GATE.props file as follows.

#Update these based on your install preferences
MessageLevel : 'warn'
MessageLog : '$OMNIHOME/log/NCO_GATE.log'
Name : 'NCO_GATE'
PropsFile : '$OMNIHOME/etc/NCO_GATE.props'

#This will be the IP and Port for your logstash installation and the TCP Input you use
Gate.Socket.Host : '10.10.10.1'
Gate.Socket.Port : 1234

#These will create a comma separated (CSV) event format with fields wrapped in " ".
Gate.Socket.EndString : '"'
Gate.Socket.StartString : '"'
Gate.Socket.Separator : ','

#This sets First/Last Occurrence format to mimic ISO8601 format supported by SCALA
Gate.Socket.DateFormat : '%Y-%m-%dT%H:%M:%S%Z'

Here’s how to start the socket gateway for reference later. We’ll need the remote end of the TCP connection to be started up first.

../omnibus/bin/nco_g_socket &

You can check that your gateway is running by running the ps aux | grep nco_g command. To stop the gateway, kill the process.

Check the output file you created on the Logstash server to verify that you’ve captured some events from the gateway. If you see some there, we’re all set for our next activity to set up annotation and indexing of the events in SCALA v1103.
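
Assuming the file output path used in the Logstash config from this series, a quick check looks like this:

tail -f /opt/logstash/raw-events-csv.log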


Now that I’m done with what felt like months of work for our big demo at IBM’s IOD show last week, let me get this series done! Next up we’ll walk through the use of Logstash to serve as the collection and mediation tool for streaming in events from Netcool/OMNIbus and getting them indexed within SCALA v1103. We’re still using Logstash v1.1.13 here. We’ll have support for Logstash v1.2.x in our next release very soon. NOTE: With SCALA v1103 now available, that will be what I mention moving forward.

To catch up, check out part 1 and part 2.

On a separate system if at all possible, prepare for installation of Logstash v1.1.13 and the SCALA Logstash toolkit.

  • Download logstash v1.1.13 from here
  • Create a new directory for the logstash environment. I generally create /opt/logstash.
  • Copy the SCALA Logstash Toolkit to this directory
  • Review the SCALA Logstash Toolkit installation steps
  • Explode the SCALA Logstash Toolkit
  • Copy the logstash-1.1.13-flatjar.jar package to this /opt/logstash/lstoolkit directory
  • Update the install configuration file install-scala-logstash.conf
  • Update the eif.conf file
  • Run the ./install-scala-logstash.sh script.

The lstoolkit directory contains the following files:

/opt/logstash/lstoolkit/
- LogstashLogAnalysis_v1.1.0.0.zip
- install-scala-logstash.conf
- startlogstash-scala.sh
- install-scala-logstash.sh
- logstash-1.1.13-flatjar.jar
- start-logstash.conf
- logstash/

/opt/logstash/lstoolkit/logstash/
- conf/
-- logstash-scala.conf
- outputs/
-- eif-10.10.10.1.conf
-- scala_custom_eif.rb
- unity/

Next, we need to make a few simple changes to the Logstash configuration file to get us up and running. In this simple scenario, the Logstash configuration file should be updated with something similar to this:

input
{
#Create your TCP input which your Netcool/OMNIbus socket gateway will connect to

tcp
{
type=> "netcool"
format=> "plain"
port=> 1234
data_timeout=> -1
}

} #End of Inputs

filter
{
#Use the Mutate filter to set the hostname and log path to anything you want. This is used in the SCALA LogSource definition.

mutate
{
type=> "netcool"
replace=>["@source_host","MYOMNIBUSNAME","@source_path","Netcool"]
}

#Have some events you want to drop out? I used the Grep filter type to filter out some poorly formatted events whose summary message included commas which broke SCALA DSV processing

grep
{
type=> "netcool"
match=>[ "@message",".*WAS_YN_WebAppNoActivity_W.* | .*WAS_YN_WebAppActivity_H.*" ]
negate=> true
}

} #End of Filters

output
{
#Create a simple output file of all your raw CSV delimited events for future use, replay, etc.

file
{
type=> "netcool"
message_format=> "%{@message}"
path=> "/opt/logstash/raw-events-csv.log"
}

#Create one or more outputs to spray events to as many SCALA boxes as you'd like

scala_custom_eif
{
eif_config=> "logstash/outputs/eif-10.10.10.1.conf"
debug_log=> "/tmp/scala/scala-logstash-10.10.10.1.log"
debug_level=> "debug"
}

} #End of Outputs

Note: If you have multiple SCALA systems, you can spray events to each of them by having more than one output stanza for the scala_custom_eif plugin. Each one must have its own unique eif_config and debug_log configurations. I just put in the IP address of my end points to easily identify each one.
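
For example, a second SCALA destination is just another stanza in the output section (the 10.10.10.2 address and paths are placeholders):

scala_custom_eif
{
eif_config=> "logstash/outputs/eif-10.10.10.2.conf"
debug_log=> "/tmp/scala/scala-logstash-10.10.10.2.log"
debug_level=> "debug"
}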

To start up Logstash, use the ./startlogstash-scala.sh script. You may wish to update this to send Logstash to the background when starting up. To stop Logstash, use ps aux | grep logstash and kill the Logstash process.

When we complete the next series of tasks in Netcool/OMNIbus, we can peek at the output file we created via Logstash and see raw CSV events resembling the example below. This is what’s sent across the socket gateway.

INSERT: "WAS_YN_EJBConNoActivity_W:syswasslesNode01:syswassles:KYNS::ITM_EJB_Containers",
2013-09-27T13:46:44EDT,
2013-09-27T13:46:44EDT,
"syswasslesNode01:syswassles:KYNS",
"syswasslesNode01:syswassles:KYNS",
"WAS_YN_EJBConNoActivity_W[(Method_Invocation_Rate=0.000 ) ON syswasslesNode01:syswassles:KYNS (Method_Invocation_Rate=0 )]",
1,
"tivoli_eif probe on systbsmsles",
"ITM",
"ITM_EJB_Containers",
"WAS_YN_EJBConNoActivity_W",
20,
2,
6601,
1,
"",
"",
"~",
"09/27/2013 08:29:45.000",
"sysitm.poc.ibm.com",
"S",
"TEMS",
"",
"WAS_YN_EJBConNoActivity_W",
"",
"syswasslesNode01:syswassles:KYNS",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
0,
"",
""

This is the event passed in from the TCP Input and through the filters to the scala_custom_eif output:

D,
[
2013-09-27T13:46:42.601000#21554
]DEBUG--: scala_custom_eif: Received event: # @data={
"@source"=>"tcp://10.10.10.1:52074/",
"@tags"=>[

],
"@fields"=>{

},
"@timestamp"=>"2013-09-27T17:46:42.588Z",
"@source_host"=>"s3systbsmsles",
"@source_path"=>"Netcool",
"@message"=>"INSERT: \"WAS_YN_EJBConNoActivity_W:syswasslesNode01:syswassles:KYNS::ITM_EJB_Containers\",2013-09-27T13:46:44EDT,2013-09-27T13:46:44EDT,\"syswasslesNode01:syswassles:KYNS\",\"syswasslesNode01:syswassles:KYNS\",\"WAS_YN_EJBConNoActivity_W[(Method_Invocation_Rate=0.000 ) ON syswasslesNode01:syswassles:KYNS (Method_Invocation_Rate=0 )]\",1,\"tivoli_eif probe on systbsmsles\",\"ITM\",\"ITM_EJB_Containers\",\"WAS_YN_EJBConNoActivity_W\",20,2,6601,1,\"\",\"\",\"~\",\"09/27/2013 08:29:45.000\",\"sysitm.poc.ibm.com\",\"S\",\"TEMS\",\"\",\"WAS_YN_EJBConNoActivity_W\",\"\",\"syswasslesNode01:syswassles:KYNS\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",0,\"\",\"\"\n",
"@type"=>"netcool"
}>

This is the event sent out of the scala_custom_eif output in the IBM Event Integration Framework (EIF) format fit for consumption by the SCALA EIF Receiver.

D,
[
2013-09-27T13:46:42.602000#21554
]DEBUG--: scala_custom_eif: Sending tec event: AllRecords;hostname='s3systbsmsles';RemoteHost='';text='INSERT: "WAS_YN_EJBConNoActivity_W:syswasslesNode01:syswassles:KYNS::ITM_EJB_Containers",
2013-09-27T13:46:44EDT,
2013-09-27T13:46:44EDT,
"syswasslesNode01:syswassles:KYNS",
"syswasslesNode01:syswassles:KYNS",
"WAS_YN_EJBConNoActivity_W[(Method_Invocation_Rate=0.000 ) ON syswasslesNode01:syswassles:KYNS (Method_Invocation_Rate=0 )]",
1,
"tivoli_eif probe on systbsmsles",
"ITM",
"ITM_EJB_Containers",
"WAS_YN_EJBConNoActivity_W",
20,
2,
6601,
1,
"",
"",
"~",
"09/27/2013 08:29:45.000",
"sysitm.poc.ibm.com",
"S",
"TEMS",
"",
"WAS_YN_EJBConNoActivity_W",
"",
"syswasslesNode01:syswassles:KYNS",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
0,
"",
""';logpath='Netcool';END

Logstash is far more powerful than what I’ve shown in this very simple example. I’d encourage you to investigate its capabilities further through the website, user group or IRC.

Up next, we’ll walk through the configuration of Netcool/OMNIbus and get our events flowing towards Logstash and SCALA.
