by delicious
on March 3, 2014
These are my links for December 11th through March 3rd:
- Output to Elasticsearch in Logstash format (Kibana-friendly) – In this post you’ll see how you can take your logs with rsyslog and ship them directly to Elasticsearch (running on your own servers, or the one behind Logsene’s Elasticsearch API) in a format that plays nicely with Logstash. So you can use Kibana to search, analyze and make pretty graphs out of them.
This is especially useful when you have a lot of servers logging [a lot of data] to their syslog daemons and you want a way to search them quickly or do statistics on the logs. You can use rsyslog’s Elasticsearch output to get your logs into Elasticsearch, and Kibana to visualize them. The only challenge is to get your rsyslog configuration right, so your logs end up where Kibana expects them. And this is exactly what we’re doing here (a minimal configuration sketch follows at the end of this list).
- GraphLab Notebook | GraphLab – The power of GraphLab with the ease of Python, running in the Cloud.
- Prelert Introduces Push Button Machine Learning in Anomaly Detective 3.1 – Prelert, the first vendor to package data science into downloadable applications for everyday users, today announced the release of Anomaly Detective 3.1, which introduces the ability to deploy powerful machine learning tools at the push of a button.
Anomaly Detective is a deeply integrated app for Splunk Enterprise that helps identify and resolve performance and security issues, and their causes, as they develop. It provides a solution to one of the major problems inherent in working with Big Data – gaining valuable insights from otherwise overwhelming volumes of data in real-time.
- DataLoop.io – Cloud Server Monitoring for DevOps & Operations Teams – Dataloop.IO is a new start-up in the IT Infrastructure Monitoring space, focused on building a new monitoring tool for DevOps/Operations teams that run Cloud services at scale.
Our Cloud service significantly reduces the time required to set up and deploy your monitoring. It reduces the friction of writing and deploying new monitoring scripts so your team can ensure full coverage regardless of how quickly your environment is changing.
- Predictions For 2014: Technology Monitoring | Forrester Blogs – Further development of pattern analytics to complement log-file analytics. For the last five years, log-file analytics has been a major focus area within IT operational analytics. During 2014 we expect further development of pattern analytics, or features that can draw insights from data in-stream or in-flight on the network.
Re-emergence of business service management (BSM) features. Increasing technology innovation is leading to greater complexity in business service architecture. This means that any features that simplify the management of complex business services become a must. Hence we predict the re-emergence of BSM features that will be more successful than previous attempts, as these new BSM approaches will have automated discovery and mapping of technology to business services.
- Data Mining Map – An Introduction to Data Mining
- Actian Analytics Platform™ | Accelerating Big Data 2.0™ | Actian – Actian transforms big data into business value for any organization – not just the privileged few. Our next generation Actian Analytics Platform™ software delivers extreme performance, scalability, and agility on off-the-shelf hardware, overcoming key technical and economic barriers to broad adoption of big data, delivering Big Data for the Rest of Us™.
- Visual Intelligence for your web application – COSCALE – The COSCALE Application Performance Analyzer provides swift and accessible visual intelligence for your web-application through smart correlations of any application and infrastructure metric
- Qubole | Big Data as a Service – Switch your data infrastructure to auto-pilot using our award-winning, auto-scaling Hadoop cluster, built-in data connectors and an intuitive graphical editor for all things Big Data.
- Altiscale Hadoop as a Service – Altiscale’s offering is ideally suited for today’s data science needs. Features for data science include permanent HDFS volumes, access to the latest tools, resource sharing without conflict, job-level monitoring and support, and pricing plans that eliminate unpleasant surprises.
- Boundary Surpasses 400% YoY Growth in Processing of Massive IT Operations Performance Analytics in the Cloud – The Boundary service is processing an average of 1.5 trillion application and infrastructure performance metrics per day on behalf of its clients and has computed occasional daily bursts of over 2 trillion metrics.
- To Log or Not to Log: Proven Best Practices for Instrumentation – Innovation Insights – To log or not to log? This is an age-old question for developers. Logging everything can be great because you have plenty of data to work from when you have a problem. But it’s not so great if you have to grep and inspect it all yourself. In my mind, developers should instead be thinking about logging the right events in the right format for the right consumer.
- IT Operations Analytics (ITOA) Landscape – Say goodbye to years of chronic IT Operations pains. IT Operations Analytics (ITOA) is here, and gaining strong momentum. You Are a Leader – Seize The Opportunity.
- Zoomdata – Next Generation Big Data Analytics – Built for the Big Data Revolution; Connected to the World in Real-Time; Designed for the Touch Generation; Fuses Data into a Single Experience; Easy & Powerful Interface.
- Enterprise Event Management – Trying to manage a modern IT environment without a consolidated view of operations is like trying to drive a car at 100 mph while looking at six different dashboards. The proliferation of development, build, and operations tools has made it increasingly difficult to stay in control of IT and reduce downtime. Too often, developers and administrators have been left with two unappealing alternatives: either they have to try to write their own event consolidator or they struggle with legacy products from a different era. Boundary is the industry’s leading SaaS-based enterprise event management offering, enabling you to track and optimize your modern, rapidly changing application infrastructures. With Boundary, you can consolidate, standardize, prioritize, enrich and correlate events and notifications from hundreds of systems into a single console.
- Cubism.js – Cubism.js is a D3 plugin for visualizing time series. Use Cubism to construct better realtime dashboards, pulling data from Graphite, Cube and other sources. Cubism is available under the Apache License on GitHub.
- Crosslet – Crosslet is a free, small (22k without dependencies) JavaScript widget for interactive visualisation and analysis of geostatistical datasets. You can also use it for visualizing and comparing multivariate datasets. It is a combination of three very powerful JavaScript libraries: Leaflet, an elegant and beautiful mapping solution; Crossfilter, a library for exploring large multivariate datasets in the browser; and D3, a data-driven way of manipulating objects. Crosslet also supports TopoJSON, a GeoJSON extension that allows geometry to be presented in a highly compact way. Crosslet is written in CoffeeScript and uses Less for styling.
- Charts, Graphs and Images – CodeProject
- dc.js – Dimensional Charting Javascript Library – dc.js is a JavaScript charting library with native crossfilter support, allowing highly efficient exploration of large multi-dimensional datasets (inspired by crossfilter's demo). It leverages the d3 engine to render charts in a CSS-friendly SVG format. Charts rendered using dc.js are naturally data-driven and reactive, providing instant feedback on the user's interaction. The main objective of this project is to provide an easy yet powerful JavaScript library that can be used for data visualization and analysis in the browser as well as on mobile devices.
- SharePoint Development Lab by @avishnyakov » Go Cloud – A better logging for SharePoint Online/Office365/Azure apps – It seems that cloud-based products and services have a significant impact on how we design, write, debug, trace and deliver our applications. The way we think about this is not the same anymore: there might be no need to have SharePoint on-premises, as SharePoint Online/O365 might be a better choice, and there might be no reason to host a web application on dedicated hardware or with a hosting provider, as Azure could bring more benefits. These trends cannot simply be ignored, and it is a good thing to see how new services and offerings might be used in your applications.
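To make the first link above concrete, here is a minimal rsyslog sketch of the approach it describes: a JSON template in the Logstash event format, a Logstash-style daily index name, and the omelasticsearch output. This is a sketch under assumptions, not the linked post's exact template: it assumes rsyslog v6+ with the omelasticsearch module installed and Elasticsearch on localhost:9200, and the field set is a plausible subset.

    module(load="omelasticsearch")   # assumes the rsyslog Elasticsearch output module is installed

    # JSON template mimicking the Logstash event format that Kibana expects
    template(name="logstash-json" type="list") {
        constant(value="{\"@timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
        constant(value="\",\"host\":\"")         property(name="hostname")
        constant(value="\",\"severity\":\"")     property(name="syslogseverity-text")
        constant(value="\",\"program\":\"")      property(name="programname")
        constant(value="\",\"message\":\"")      property(name="msg" format="json")
        constant(value="\"}")
    }

    # Daily index names in the Logstash convention: logstash-YYYY.MM.DD
    template(name="logstash-index" type="list") {
        constant(value="logstash-")
        property(name="timereported" dateFormat="rfc3339" position.from="1" position.to="4")
        constant(value=".")
        property(name="timereported" dateFormat="rfc3339" position.from="6" position.to="7")
        constant(value=".")
        property(name="timereported" dateFormat="rfc3339" position.from="9" position.to="10")
    }

    # Ship everything, in bulk, to Elasticsearch
    *.* action(type="omelasticsearch"
               server="localhost" serverport="9200"
               template="logstash-json"
               searchIndex="logstash-index" dynSearchIndex="on"
               bulkmode="on")

Because the index name and @timestamp field follow the Logstash convention, Kibana can point at these indices with no extra setup.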
by doug
on February 11, 2014
in Analytics, Application Analytics, Event Analytics, Event Management, Events, IBM, IBM Log Analytics, IT Operations Analytics, Log Analytics, Logstash, Netcool, Netcool/OMNIbus, Smart Cloud Analytics
To catch up, check out part 1, part 2 and part 3.
I wanted to get an up-to-date configuration out, based on some recent work for our upcoming Pulse 2014 demo, using the latest versions of logstash (v1.3.3) and our SCALA v1.2.0 release. Nothing is significantly different per se, but the logstash syntax and internal event flow/routing have changed significantly since v1.1.x.
I’ve included an example logstash v1.3.3 configuration file in my git repo here. It should be simple to follow the flow from inputs through filters to outputs. The use of tags and conditionals is key to controlling filter activation and output routing. It’s very powerful stuff! A minimal sketch of the v1.3.x style is below.
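For flavor, here is a sketch of the v1.3.x idiom, not the actual demo config from the repo; the port, tag name and output path are invented:

    input {
      tcp {
        port => 5544          # hypothetical listener port
        type => "syslog"
        tags => ["scala"]     # tag at the source so we can route later
      }
    }

    filter {
      # Conditionals (introduced in v1.2) replace the old grep/tag dance
      if "scala" in [tags] {
        grok {
          match => [ "message", "%{SYSLOGLINE}" ]
        }
      }
    }

    output {
      if "scala" in [tags] {
        file { path => "/tmp/scala_events.dsv" }   # placeholder path
      } else {
        stdout { codec => rubydebug }
      }
    }

The same tag drives both the filter and the output, which is what makes the routing explicit compared to v1.1.x.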
I’ll get another post out this week covering our next key component: the SCALA DSV pack, which consumes the events routed via logstash to SCALA.
by delicious
on December 11, 2013
These are my links for November 19th through December 11th:
- The Netflix Tech Blog: Announcing Suro: Backbone of Netflix’s Data Pipeline – Suro, which we are proud to announce as our latest offering as part of the NetflixOSS family, serves as the backbone of our data pipeline. It consists of a producer client, a collector server, and a plugin framework that allows events to be dynamically filtered and dispatched to multiple consumers.
- Sensu | An open source monitoring framework – Designed for the Cloud. The Cloud introduces new challenges to monitoring tools, and Sensu was created with them in mind. Sensu will scale along with the infrastructure that it monitors.
- datastack.io – data integration as a service – Collect data. Share insights. Kinda like Logstash or Heka, but without the pain.
- Glassbeam Begins Where Splunk Ends – Going Beyond Operational Intelligence with IoT Logs | Glassbeam – Glassbeam SCALAR is a flexible, hyper scale cloud-based platform capable of organizing and analyzing complex log bundles including syslogs, support logs, time series data and unstructured data generated by machines and applications. By creating structure on the fly based on the data and its semantics, Glassbeam’s platform allows traditional BI tools to plug into this parsed multi-structured data so companies can leverage existing BI and analytics investments without having to recreate their reports and dashboards. By mining machine data for product and customer intelligence, Glassbeam goes beyond traditional log management tools to leverage this valuable data across the enterprise. With a focus on providing value to the business user, Glassbeam’s platform and applications enable users to reduce costs, increase revenues and accelerate product time to market. In fact, Enterprise Apps Today’s Drew Robb recognized this critical value proposition naming Glassbeam a hot Big Data startup for analytics, which is attracting interest from investors, partners and customers. Today’s acquisition serves to showcase a market that is heating up, and new requirements around data analytics. But this is only the start and Glassbeam deliberately picks up where Splunk ends. We remain committed to cutting through the clutter and providing a clear view of operational AND business analytics to users across the enterprise.
- Splunk Buys Cloudmeter to Boost Operational Intelligence Portfolio – The acquisition of Cloudmeter rounds out Splunk's portfolio with a capability to analyze machine data from a wider range of sources. Financial terms of the deal were not disclosed. The transaction was funded with cash from Splunk's balance sheet, the company said. Indeed, the addition of Cloudmeter will enhance the ability of Splunk customers to analyze machine data directly from their networks and correlate it with other machine-generated data to gain insights across Splunk's core use cases in application and infrastructure management, IT operations, security and business analytics.
- Netuitive Files for Ground-Breaking New Patent in IT Operations Analytics – Press Release – Digital Journal – The patent filing is led by Dr. Elizabeth A. Nichols, Chief Data Scientist for Netuitive, a quantitative analytics expert focused on extending Netuitive's portfolio of IT Operations Analytics (ITOA) solutions to new applications and services. "Netuitive is committed to delivering industry leading IT Operations Analytics that proactively address business performance," said Dr. Nichols. "In addition, Netuitive's research and development is actively focused on new algorithm initiatives that will further advance our abilities to monitor new managed elements associated with next-generation IT architecture and online business applications."
- Legume for Logstash – Legume is a zero-config web interface for Logstash & Elasticsearch that runs entirely on the client side, letting you browse and search log messages indexed in Elasticsearch by Logstash.
- Deploying an application to Liberty profile on Cloud Foundry | WASdev – As part of the partnership between Pivotal and IBM we have created the WebSphere Application Server Liberty Buildpack, which enables Cloud Foundry users to easily deploy apps on Liberty profile.
- IBM’s project Neo takes aim at the data discovery and visualisation market – MWD’s Insights blog – Project Neo is IBM’s answer to data visualisation and discovery for business users. It promises to help those who don’t possess specialist skills or training in analytics to visually interact with their data and surface interesting trends and patterns, using a simpler dashboard interface that helps and guides users in the analysis process. Whereas previous incarnations of such tools often presuppose data models, scripting, or knowledge of a query language, Project Neo takes a different tack. It aims to bypass this approach by enabling users to ask questions in plain English against a raw dataset (including CSV or Excel files) and returning results in the form of interactive visualisations.
- Machine learning is way easier than it looks | Inside Intercom – Like all of the best frameworks we have for understanding our world, e.g. Newton’s Laws of Motion, Jobs to be Done, Supply & Demand — the best ideas and concepts in machine learning are simple. The majority of literature on machine learning, however, is riddled with complex notation, formulae and superfluous language. It puts walls up around fundamentally simple ideas.
Let’s take a practical example. Say we wanted to include a “you might also like” section at the bottom of this post. How would we go about that? (A toy sketch of one approach appears after this list.)
- Where Are My AWS Logs? – Logentries Blog – Over my time at Logentries, we’ve had users contact us about where to find their logs while they were setting up Logentries. As a result, we recently released a feature for Amazon Web Services called the AWS Connector, which automatically discovers your log files across your Linux EC2 instances, no matter how many instances you have. Finding your Linux logs, however, may only be a first step in the process, as AWS logs can be all over the map… so to speak. So where are they located? Here’s where you can start to find some of these.
- Responsive Log Management… Like Beauty, it’s in the Eye of the Bug-holder – As a software engineer, I’m responsible for the code I write and responsible for what we ship. But designing, building, and deploying SaaS is a real challenge – it means software developers are now responsible for making sure the live system runs well too. This is a real challenge, but with Loggly I get real-time telemetry on how my code is running, how my systems are behaving – and how well our software meets the needs of our customers.
- Mahout Explained in 5 Minutes or Less – blog.credera.com – In the spectrum of big data tools, Apache Mahout is a machine-learning engine that fits into the data mining category of the big data landscape. It is one of the more interesting tools in the big data toolbox because it allows you to extract actionable tasks from a big data set. What do we mean by actionable tasks? Things such as purchase recommendations based on a similar customer’s buying habits, or determining whether a user comment is spam based on the word clusters it contains.
- Change management using Evolven’s IT Operations Analytics – TechRepublic – Evolven is designed to track and report change across an array of operating systems, databases, servers, and more to help pinpoint inconsistencies. It can also assist you in preventing issues and determining root causes of problems. Evolven can be helpful with automation—to find out why things didn’t work as expected and what to do next—and can also alert you to suspicious or unauthorized changes in your environment.
Human and technological policies go hand-in-hand to balance each other and ensure the best possible results. Whereas my last article on the subject referenced the human processes IT departments should follow during change management, I’ll now take a look at technology that can back those processes up by examining what Evolven does and what benefits it can bring.
- Fluentd vs Logstash – Jason Wilder’s Blog – Fluentd and Logstash are two open-source projects that focus on the problem of centralized logs. Both projects address the collection and transport aspect of centralized logging using different approaches.
This post will walk through a sample deployment to see how each differs from the other. We’ll look at the dependencies, features, deployment architecture and potential issues. The point is not to figure out which one is the best, but rather to see which one would be a better fit for your environment.
- astanway/crucible · GitHub – Crucible is a refinement and feedback suite for algorithm testing. It was designed to be used to create anomaly detection algorithms, but it is very simple and can probably be extended to work with your particular domain. It evolved out of a need to test and rapidly generate standardized feedback for iterating on anomaly detection algorithms.
- Now in Public Beta – Log Search & Log Watch | The AppFirst Blog – The decision to open our new log applications to the public was not one taken lightly. Giving our customers the ability to search all of their log files for any keywords is quite taxing on our system, so we had to take several precautions. To ensure the reliability of our entire architecture, we decided to create a separate web server solely responsible for retrieving log data from our persistence storage HBase. By making this an isolated subsystem, we don’t run the risk of a potentially large query bogging everything else down as well.
- Log Insight: Remote Syslog Architectures | VMware Cloud Management – VMware Blogs – When architecting a syslog solution, it is important to understand the requirements both from a business and a product perspective. I would like to discuss the different remote syslog architectures that are possible when using vCenter Log Insight.
- Why We Need a New Model for Anomaly Detection: #1 | Metafor Software –
I’m not talking about anomaly detection in stable enterprise IT environments. Those are doing just fine. Those infrastructures have mature, tested procedures for rolling out software updates and implementing new applications on an infrequent basis (still running FORTRAN written in the 70s, on servers from the 70s, yeah, that’s a thing).
I’m talking about anomaly detection in the cloud, where the number of virtual machines fluctuates as often as application roll outs. Current solutions for anomaly detection track dozens or even hundreds of metrics per server in an attempt to divine normal performance and spot anomalous behavior. An ideal solution would adapt itself to the quirks of each metric, to different application scenarios, and to machine re-configurations.
This is a problem that lends itself to machine learning techniques, but it’s still an incredibly difficult problem to solve. Why?
- Beyond The Pretty Charts – A Report From #devopsdays in Austin | Metafor Software – Don’t just look at timeline charts. We’ve fallen into the trap of looking at all the pretty charts as time series charts. When we do that, we end up missing some important characteristics. For example, a simple histogram of the data, instead of just a time chart, can tell you a lot about anomalies and distribution. Using different kinds of visualization is crucial to giving us a different aspect on our data.
- Server Anomaly Detection | Predictive IT Analytics | Config Drift Monitoring | Metafor Software – Know about problems before your threshold based monitoring tool does. Get alerted to issues your thresholds will never catch.
Metafor’s machine learning algorithms alert you to anomalous behavior in your servers, clusters, applications, and KPIs.
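As a small footnote to the Inside Intercom link above, here is a toy sketch of one way to build a “you might also like” list: score every other post by the overlap of its tags with the current post (Jaccard similarity) and keep the best matches. The post names, tags and cutoff are invented for illustration, and the linked article’s own worked example may take a different route.

    # Toy "you might also like": rank other posts by Jaccard similarity
    # of their tag sets. Posts and tags are invented for illustration.
    posts = {
        "machine-learning-is-easy": {"machine learning", "simplicity"},
        "anomaly-detection-basics": {"machine learning", "monitoring"},
        "devops-dashboards":        {"monitoring", "visualization"},
    }

    def jaccard(a, b):
        """Similarity of two tag sets: |intersection| / |union|."""
        return len(a & b) / len(a | b)

    def also_like(current, n=2):
        # Score every other post against the current one, best first
        scored = sorted(
            ((jaccard(posts[current], tags), title)
             for title, tags in posts.items() if title != current),
            reverse=True,
        )
        # Keep the top n, dropping posts with no tag overlap at all
        return [title for score, title in scored[:n] if score > 0]

    print(also_like("machine-learning-is-easy"))
    # -> ['anomaly-detection-basics']

That is the whole trick: a simple set intersection over union, no complex notation required, which is exactly the point the post is making.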