These are my links for March 25th through June 18th:
- OpenStack LumberJack – Part 1 rsyslog | Professional OpenStack – Logging for OpenStack has come quite a long way. What I’m going to attempt to do over a few posts is recreate and expand a bit on what was discussed at this last OpenStack Summit with regard to Log Management and Mining in OpenStack. For now, that means installing rsyslogd and setting it up to accept remote connections.
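  A minimal sketch of that last step, using the legacy rsyslog directives for accepting remote syslog traffic (port 514 is the conventional default, not taken from the post):

  ```
  # /etc/rsyslog.conf: load the listener modules and accept remote messages
  $ModLoad imudp
  $UDPServerRun 514

  $ModLoad imtcp
  $InputTCPServerRun 514
  ```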
- rsyslog.conf file –
- FailoverSyslogServer – rsyslog wiki –
- How to configure failover for rsyslog in Red Hat Enterprise Linux 6? – Red Hat Customer Portal –
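  Both failover links come down to the same suspended-action pattern; a minimal sketch with placeholder hostnames:

  ```
  # Forward everything to the primary server (@@ = TCP, @ = UDP)
  *.* @@primary.example.com:514
  # Run the next action only if the previous one (the primary) is suspended
  $ActionExecOnlyWhenPreviousIsSuspended on
  & @@secondary.example.com:514
  $ActionExecOnlyWhenPreviousIsSuspended off
  ```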
- Introducing the Solr Scale Toolkit | SearchHub | Lucene/Solr Open Source Search –
- Highly Available ELK (Elasticsearch, Logstash and Kibana) Setup | Everything Should Be Virtual –
- Logstash configuration dissection –
- Splunk Introduces Splunk Enterprise 6.1 – Enabling the Mission-critical Enterprise
  - Multi-site Clustering: Delivers continuous availability for Splunk Enterprise deployments that span multiple sites, countries or continents by replicating raw and indexed data in a clustered configuration.
  - Search Affinity: Provides a performance increase when using multi-site clustering by routing search and analytics requests to the nearest cluster, increasing performance and decreasing network usage.
  - zLinux Forwarder: Allows application and platform data from IBM mainframes to be easily collected and indexed by Splunk Enterprise.
  - Data Preview with Structured Inputs: Enables previewing of massive data files to verify alignment of fields and headers before indexing, improving data quality and reducing the time it takes to discover critical insights.
- Streamlining application logs collection on AWS Elastic Beanstalk with logstash – part 1 | Mob in Tech – However, we like to experiment with things, so I decided to try the homemade solution for the backend of our new upcoming mobile game. Our backend is a homebrewed Java REST webservices application hosted in an Elastic Beanstalk container, in the us-east-1 region. The final goal is to gather logs from all instances of the Java application into a local (Paris) Elasticsearch database, in a scalable manner. In this case, scalable means for us: every single step of the data pipeline has to be horizontally scalable, meaning we can speed up the process by adding capacity at each step independently.
- How to Pre-Process Logs with Logstash: Part III of “Scalable and Robust Logging for Web Applications” ← #workHard / partyHard – This article is an introduction to pre-processing logs from multiple sources in logstash before storing them in a data store or analyzing them in real time. Some common use cases are unifying time formats across different log sources, anonymizing data, extracting only the interesting information from the logs, as well as tagging and selective distribution.
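  To give a flavor of those use cases, a hypothetical logstash 1.x filter block (field names and patterns are invented for illustration):

  ```
  filter {
    # Extract structured fields from an Apache-style access log line
    grok {
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
    # Unify time formats: parse the event's own timestamp into @timestamp
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    # Anonymize data: replace the client IP with a keyed hash
    anonymize {
      fields    => [ "clientip" ]
      algorithm => "SHA1"
      key       => "some-secret-key"
    }
    # Tag events for selective distribution to downstream outputs
    mutate { add_tag => [ "webserver" ] }
  }
  ```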
- Building an Activity Feed System with Storm – Programming – O’Reilly Media – Problem: You want to build an activity stream processing system to filter and aggregate the raw event data generated by the users of your application. Solution: Streams are a dominant metaphor for presenting information to users of the modern Internet. Used on sites like Facebook and Twitter and mobile apps like Instagram and Tinder, streams are an elegant tool for giving users a window into the deluge of information generated by the applications they use every day.
- Wirbelsturm: 1-Click Deployments of Storm and Kafka clusters with Vagrant and Puppet – Michael G. Noll – I am happy to announce the first public release of Wirbelsturm, a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data related infrastructure. Wirbelsturm’s goal is to make tasks such as “I want to deploy a multi-node Storm cluster” simple, easy, and fun. In this post I will introduce you to Wirbelsturm, talk a bit about its history, and show you how to launch a multi-node Storm (or Kafka or …) cluster faster than you can brew an espresso.
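  If memory serves, the quickstart is about as short as the post promises; roughly (check the project README for the actual steps):

  ```
  git clone https://github.com/miguno/wirbelsturm.git
  cd wirbelsturm
  ./bootstrap      # install Vagrant plugin and Puppet module dependencies
  vagrant up       # provision the nodes defined in the cluster config
  ```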
- RapidEngines Application Analytics – We provide the world’s fastest, most flexible and most scalable time series data platform, delivered as software or a cloud service to help you visualize and detect application performance events before they impact your business.
- SevOne Acquires Log Analytics Provider RapidEngines | Business Wire – SevOne, the leader in scalable performance monitoring solutions for the world’s most connected companies, today announced it has acquired RapidEngines, a leading provider of highly scalable log analytics software for IT enterprises, service providers and application developers. The acquisition is the first for SevOne since closing the $150M investment from Bain Capital, which remains one of the largest venture financings of 2013. SevOne’s large customer base will now have access to RapidEngines’ log analytics software, giving users the benefit of automatically collected and organized log data that provides a more detailed picture of user and machine behavior.
- Google Cloud Platform Blog: A New Logs Viewer for Google Cloud Platform – Today we are excited to announce a significantly updated Logs Viewer for App Engine users. Logs from all your instances can be viewed together in near real time, with greatly improved filtering, searching and browsing capabilities. This release includes UI and functional improvements, with features that simplify navigation and make it easier to find the log data you’re looking for.
- About | LOGSEARCH – What started out as an internal development project within City Index was soon after released as an open source project for all to benefit from. City Index realised the potential value of the information in its log files and needed a flexible solution not just to view the log files but to cross-analyse them.
- Approaches to Indexing Multiple Logs File Types in Solr and Setting up a Multi Node, Multi Core Solr Cloud – Apache Solr is a widely used open source search platform that internally uses Apache Lucene based indexing. Solr is very popular, provides a database to store indexed data, and is a highly scalable, capable search solution for the enterprise platform. This article provides a basic vision for a single- and multi-core approach to indexing and querying multiple log file types in Solr. Solr indexes the log files generated by the servers and allows searching the logs for troubleshooting. It has the capability to scale to work in a multi-node cluster set up in a distributed and fault tolerant manner; these capabilities are collectively called SolrCloud. Solr uses ZooKeeper to coordinate its distributed operation.
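  Once logs are indexed, troubleshooting becomes a matter of ordinary Solr queries; for example, against a hypothetical logs collection with level and timestamp fields:

  ```
  # Fetch ERROR-level log entries from the last 24 hours as JSON
  curl "http://localhost:8983/solr/logs/select?q=level:ERROR&fq=timestamp:%5BNOW-1DAY%20TO%20NOW%5D&wt=json"
  ```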
- Introducing Morphlines: The Easy Way to Build and Integrate ETL Apps for Hadoop | Cloudera Developer Blog – Morphlines can be seen as an evolution of Unix pipelines where the data model is generalized to work with streams of generic records, including arbitrary binary payloads. A morphline is an efficient way to consume records (e.g. Flume events, HDFS files, RDBMS tables, or Apache Avro objects), turn them into a stream of records, and pipe that stream through a set of easily configurable transformations on the way to a target application such as Solr. For example, a Flume Source receives syslog events and sends them to a Flume Morphline Sink, which converts each Flume event to a record and pipes it into a readLine command. The readLine command extracts the log line and pipes it into a grok command. The grok command uses regular expression pattern matching to extract some substrings of the line. It pipes the resulting structured record into the loadSolr command. Finally, the loadSolr command loads the record into Solr, typically a SolrCloud. In the process, raw data or semi-structured data is transformed into structured data according to application modelling requirements.
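  A sketch of what that readLine, grok, loadSolr pipeline might look like as a morphline config (the grok expression and import packages are placeholders, not taken from the post):

  ```
  morphlines : [
    {
      id : syslogToSolr
      importCommands : [ "com.cloudera.**", "org.apache.solr.**" ]
      commands : [
        # Pull each raw log line out of the incoming Flume event body
        { readLine { charset : UTF-8 } }
        # Extract substrings of the line via regular expression patterns
        { grok {
            dictionaryFiles : [ "grok-dictionaries" ]
            expressions : { message : """%{SYSLOGBASE} %{GREEDYDATA:msg}""" }
        } }
        # Load the resulting structured record into Solr (typically SolrCloud)
        { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
      ]
    }
  ]
  ```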
- Pivotal CF 1.1 Advances Enterprise PaaS with New Capabilities | Pivotal P.O.V. – What’s new in Pivotal CF 1.1:
  - Improved app event log aggregation – developers can now go to a unified log stream for full application event visibility and drain logs to a 3rd party tool like Splunk for analysis
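  With the cf CLI of that era, the unified stream is tailed and drained roughly like this (app and drain names are made up):

  ```
  # Tail the unified log stream for an app
  cf logs my-app

  # Drain logs to a third-party syslog endpoint such as Splunk
  cf create-user-provided-service my-drain -l syslog://splunk.example.com:514
  cf bind-service my-app my-drain
  ```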
- elasticsearch-curator 1.0.0 : Python Package Index – Tending your time-series indices in Elasticsearch
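  Curator itself is a pip install away; from memory the 1.x CLI took flags along these lines (the exact flags are an assumption, check curator --help):

  ```
  pip install elasticsearch-curator

  # Delete time-series indices older than 30 days (flag syntax assumed from the 1.x docs)
  curator --host localhost -d 30
  ```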