Bookmarks for October 1st through November 18th

by delicious on November 18, 2013

in General

These are my links for October 1st through November 18th:

Subscribing to the WebSphere MQ FTE Transfer log topic (Computing minutiea – notes from a small island.) – A customer was just asking me about the 'transfer log' for WebSphere MQ File Transfer Edition. They had mistakenly taken the term to refer to a log file which contained a record of all transfers. In fact, WMQ FTE provides for auditing by publishing transfer-related information on a well-known topic name, namely "SYSTEM.FTE/Log/".
I explained to them that there are two built-in facilities which take advantage of these publications to provide a full auditing solution.

The WMQ Explorer plug-in for FTE subscribes and presents the information in a tabular report.
The Database logger subscribes and then stores the information in database tables.
Splunk Drives Operational Intelligence with Amazon Web Services – Availability of new Amazon Machine Images (AMIs) for Splunk® Enterprise 6 and Hunk™: Splunk Analytics for Hadoop. The new AMIs further accelerate the speed at which organizations can deploy Splunk software and gain critical visibility into their cloud-based applications and data. Splunk also released the new version of the Splunk App for Amazon Web Services (AWS), which leverages the newly announced AWS CloudTrail, a new service that logs all AWS API calls, to enable organizations to improve monitoring, security and compliance across all applications and infrastructure running in AWS. The Splunk Enterprise AMI and Hunk AMI are available in the AWS Marketplace. The Splunk App for AWS is available on Splunk Apps.
Splunk and Prelert Predict: What’s the Difference? – In conclusion, the Prelert Anomaly Detective is different from Splunk’s ‘predict’ command in the following ways:
Fewer false alerts on data with non-Gaussian profiles;
Easily scales to analyze multiple items simultaneously, even across sourcetypes;
Automatically scores each anomaly based on the severity of the deviations; and
Can be easily operationalized to a Real-Time search (with alerts) without manually having to read/write summary indexes.

Prelert seeks to complement Splunk, as Anomaly Detective extends the capabilities of Splunk’s “predict” by filtering out noise, analyzing multiple items simultaneously, and isolating true anomalies without setting thresholds.
The-Field-Guide-to-Data-Science/ at master · booz-allen-hamilton/The-Field-Guide-to-Data-Science · GitHub – We cannot capture all that is Data Science. Nor can we keep up – the pace at which this field progresses outdates work as fast as it is produced. As a result, we have opened this field guide to the world as a living document to bend and grow with the community, technology, expertise, and evolving techniques. Therefore, if you find the guide to be useful, neat, or even lacking, then we encourage you to add your expertise, including:
Case studies from which you have learned
Citations for journal articles or papers that inspire you
Algorithms and techniques that you love
Your thoughts and comments on other people’s additions
csvfix – CSVfix is a tool for manipulating CSV data – Google Project Hosting – CSVfix is a command-line tool specifically designed to deal with CSV data. With it you can, among other things:
Reorder, remove, split and merge fields
Convert case, trim leading & trailing spaces
Search for specific content using regular expressions
Filter out duplicate data or data on exclusion lists
Enrich with data from other sources
Add sequence numbers and file source information
Split large CSV files into smaller files based on field contents
Perform arithmetic calculations on individual fields
Validate CSV data against a collection of validation rules
Convert between CSV and fixed format, XML, SQL and DSV
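For a rough feel of what a few of these operations involve (trimming spaces, reordering fields, filtering duplicates), here is a minimal sketch using only Python's standard `csv` module. It is not CSVfix and does not use its commands; the `tidy_csv` function, the column order, and the sample data are all made up for illustration.

```python
import csv
import io

def tidy_csv(text, order):
    """Trim whitespace, reorder columns by index, and drop duplicate rows.

    Hypothetical helper for illustration only; not part of CSVfix.
    """
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # Trim each field, then pick the fields in the requested order.
        cleaned = tuple(row[i].strip() for i in order)
        if cleaned in seen:
            continue  # skip rows already emitted
        seen.add(cleaned)
        writer.writerow(cleaned)
    return out.getvalue()

# Made-up sample data: three rows, the third duplicates the first after trimming.
sample = "alice , 30,NY\nbob,25 ,LA\nalice,30,NY\n"
result = tidy_csv(sample, order=[2, 0, 1])
print(result)
```

Duplicates are detected after trimming and reordering, so rows that differ only in stray spaces count as the same row.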
RDataMining.com: R and Data Mining – This website presents documents, examples, tutorials and resources on R and data mining
IBM IT Operations Analytics: Solving the IT Operations Big Data Challenge – YouTube – IBM IT Operations Analytics: Solving the IT Operations Big Data Challenge
IBM IT Operations Analytics: Achieving Actionable Insights from IT Operations Big Data – YouTube – IBM IT Operations Analytics: Achieving Actionable Insights from IT Operations Big Data
IT Operations Analytics: The Magic Inside – Video Blog – YouTube – IT Operations Analytics: The Magic Inside
Agile Insights: Big Data Capture & Software Analytics | New Relic – Software Analytics is about gathering billions and billions of metrics from your live production software, including user clickstreams, mobile activity, end user experiences and transactions, and then making sense of those — providing you with business insights. Software analytics includes Application Performance Management, but extends to User Behavior, Business Transactions, Customer Insights and much, much more.
iis and logstash – IIS grok pattern
Lumberjack – a Light Weight Log Shipper for Logstash | beingasysadmin – Lumberjack is one such input plugin designed for Logstash. Though the plugin is still in beta, I decided to give it a try. We can also use Logstash itself to ship logs to a centralized Logstash server, but the JVM made that difficult on many of my constrained machines. Lumberjack claims to be a lightweight log shipper that uses SSL, and we can add custom fields to each log line we ship.
Machine Learning Platform – Text Analysis Service | Datumbox – Power-up your own Intelligent Applications by using our cutting edge Machine Learning platform. Sign-up today and start building intelligent services with our powerful & easy-to-use API.
BigML – Machine Learning Made Easy – Easily add data-driven decisions and predictive power to your company
Forecasting: principles and practice | OTexts – This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. We don’t attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details. The book is written for three audiences: (1) people finding themselves doing forecasting in business when they may not have had any formal training in the area; (2) undergraduate students studying business; (3) MBA students doing a forecasting elective. We use it ourselves for a second-year subject for students undertaking a Bachelor of Commerce degree at Monash University, Australia.
Is Splunk Cloud a Cop-Out? – “It's a tacit admission by Splunk that Storm isn't really competitive with the new breed of log management SaaS guys like Loggly, SumoLogic, Logentries, etc. They’ve got some work to do with Splunk Cloud, including on the pricing model, but I do expect it to be a formidable offering in the still very nascent log management SaaS space. I'd be careful to sell Splunk short, they're still the team to beat in this space. And there's still a lot of opportunity out there.”
Orange – Data Mining Fruitful & Fun – Open source data visualization and analysis for novice and experts. Data mining through visual programming or Python scripting. Components for machine learning. Add-ons for bioinformatics and text mining. Packed with features for data analytics.
Apache Mahout: Scalable machine learning and data mining – Currently Mahout supports mainly four use cases: Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.
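To make the last use case concrete, here is a minimal sketch of frequent itemset mining restricted to pairs, written in plain Python. It only counts co-occurring pairs in memory and is not Mahout's implementation or API; the `frequent_pairs` function, the `min_support` threshold, and the shopping baskets are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets.

    Hypothetical helper for illustration; a simple pair count, not Mahout's algorithm.
    """
    counts = Counter()
    for basket in baskets:
        # Sort the de-duplicated items so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Made-up shopping carts.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]
pairs = frequent_pairs(baskets, min_support=2)
print(pairs)
```

With these baskets only bread and milk occur together often enough to pass the threshold. Counting every pair this way grows quickly with basket size, which is one reason scalable tools exist for the job.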
Did Splunk Just Surrender on SaaS? | – Today, it appears that Splunk has thrown in the towel on Software as a Service (SaaS) and replaced Splunk Storm with a hosted software model. We were always skeptical that a company with such a phenomenally successful enterprise software business would disrupt its own business with a serious SaaS offering. And with today’s announcement of Splunk Cloud now it seems that the doubts were justified.
Using ElasticSearch And Logstash To Serve Billions Of Searchable Events For Customers | Blog | Elasticsearch – There are quite a few projects and services out there focussed on logging events. We ultimately picked Logstash, a tool for collecting, parsing, mangling and passing on logs.
Internally, the events pushed out via our webhooks are also used in other parts of our system. We currently use Redis for this. Logstash has a Redis input plugin that retrieves log events from a Redis list. After some minor filtering, the events can then be sent out via an output plugin. A very commonly used output plugin is the Elasticsearch plugin.
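The flow described here (read events from a list, filter lightly, send to an output) can be sketched in a few lines of Python. This is only an illustration of the shape of the pipeline: a `deque` stands in for the Redis list and a plain Python list stands in for the Elasticsearch index, and the `ship_events` function and the event fields are hypothetical, not taken from the linked post or from Logstash.

```python
import json
from collections import deque

def ship_events(queue, index, keep):
    """Drain JSON events from a queue, filter them, and append them to an index.

    `queue` stands in for a Redis list and `index` for an Elasticsearch index;
    both are in-memory stand-ins chosen for this sketch.
    """
    shipped = 0
    while queue:
        event = json.loads(queue.popleft())  # input stage: take the oldest event
        if keep(event):                      # filter stage: drop unwanted events
            index.append(event)              # output stage: store the event
            shipped += 1
    return shipped

# Made-up events; the "type" and "id" fields are assumptions.
queue = deque([
    json.dumps({"type": "delivered", "id": 1}),
    json.dumps({"type": "debug", "id": 2}),
    json.dumps({"type": "bounced", "id": 3}),
])
index = []
count = ship_events(queue, index, keep=lambda e: e["type"] != "debug")
print(count, [e["id"] for e in index])
```

In a real deployment the three stages would be configured as Logstash input, filter, and output plugins rather than written by hand.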

A great way to use Elasticsearch’s very rich API is by setting up Kibana, a tool to “make sense of a mountain of logs”. The new incarnation, Kibana 3, is fully client side JavaScript, and will be the default interface for Logstash. Unlike previous versions, it no longer depends on a Logstash-like schema, but is now usable for any Elasticsearch index.
