One of the problems with attending sessions after the first day is that they often just add detail to what was announced on the first day. Much of what I would say about the following days was covered in my previous post. The sessions I attended:
What's the Latest in Logstash?
Andrew Cholakian, Software Engineer, Elastic
Suyog Rao, Logstash Team Lead, Elastic
Jordan Sissel, Logstash Creator & Tech Lead, Elastic
This talk surprised me. I really wanted to go to "Get the Lay of the Lucene Land." I went to this talk because we are currently using Logstash on much of our data. While I am interested in Lucene, I would be a newbie with it. The practical side of me won out and I went to Logstash. Key points:
- Persistent Queue to avoid data loss. Will be disabled by default. It will use storage space (disk) which you set. Set checkpoint every so many writes.
- Management and monitoring: APIs developed. Logging can be adjusted through a URL. Integration with X-Pack. Monitoring.
- The Dissect Filter: Dissect does less than Grok, but the benchmark shows it is over twice as fast.
- With the new stats API in Logstash, timings can be pulled in microseconds. Those show it as 5 times faster (the other graph shows end to end).
- Kafka plugin: supports 4 different versions of Kafka. That is pretty amazing because Kafka has been changing a lot.
- Pipeline visualization: Pretty impressive looking.
- LIR. Enabling LIR in core: in progress. Pipeline visualizer: in progress. Java pipeline execution is next. New config languages: planning.
- Centralized management. Elasticsearch as a remote config store. Manage configuration via UI. Group multiple logstash under roles.
- Changes will be able to be rolled back to previous version also.
- Adding Puppet-like ability to Logstash, though Logstash will pull (not be pushed to). After a change, Logstash automatically pulls it down. The refresh interval is set. Version control and configuration management.
- You can change out the Kibana icons with any emoticon you want.
- JDBC Lookup. Can pull data out from DBs. Can store in memory and then do local lookups.
- Creating Custom Plugins: JRuby. They want to expand to other languages like Scala, Clojure, and Java to help expand Logstash plugin development.
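To make the Dissect versus Grok point a little more concrete, here is a rough Python analogy. It is not Logstash code and not how either filter is implemented. It only contrasts the two ideas: splitting on a fixed delimiter and naming the pieces by position, versus matching a regular expression with named groups. The log line, the field names, and the helper functions `dissect` and `grok_like` are all made up for illustration, and the sketch says nothing about actual performance.

```python
import re

# Hypothetical log line and field names, chosen only for illustration.
LINE = "192.168.0.7 GET /index.html 200 1534"
FIELDS = ["client", "verb", "path", "status", "bytes"]


def dissect(line, fields, delimiter=" "):
    """Dissect-style: split on a fixed delimiter, name pieces by position."""
    parts = line.split(delimiter, len(fields) - 1)
    if len(parts) != len(fields):
        return None  # the line does not have the expected shape
    return dict(zip(fields, parts))


# Grok-style: a regular expression with named groups validates each field.
GROK_LIKE = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<verb>[A-Z]+) "
    r"(?P<path>\S+) "
    r"(?P<status>\d{3}) "
    r"(?P<bytes>\d+)"
)


def grok_like(line):
    """Return the named groups if the whole line matches, else None."""
    match = GROK_LIKE.fullmatch(line)
    return match.groupdict() if match else None


dissected = dissect(LINE, FIELDS)
grokked = grok_like(LINE)
```

Both calls give the same fields for this line. The difference is that the regular expression also checks what each field looks like, which is the extra work the delimiter approach skips.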
What's X-citing in X-pack
This one had pretty much the whole A-team of Elastic:
Chris Earle, Monitoring Lead, Elastic
Mark Harwood, Software Engineer, Elastic
Shaunak Kashyap, Software Engineer, Elastic
Jay Modi, Security Engineer, Elastic
Alexander Reelsen, Software Engineer, Elastic
- X-Pack 5.1: search profiler introduced for breaking down search and aggregation timing. Deeper integration.
- Monitoring: Kibana monitoring added advanced node and index views (5.1). Logstash monitoring (5.2), Cgroup (container) metric display for Elasticsearch (5.2).
- Monitoring: proactive, automatic notification of problems via Watcher. Find bottlenecks in your Logstash nodes and plugins. Machine learning integration. Beats integration. Machine learning will be interesting especially in relation to proactive Watcher.
- Security: 5.0: CLI to generate certificates. It is important, even if not exactly exciting. They talked about keys and consistent responses. That is always good.
- 6.0: will require TLS for node to node transport. I would like to hear about performance of the encryption. Passwords will go away from config files. 5.3: a secure storage mechanism for pass phrases. Store encrypted.
- Single sign-on: SAML, Kerberos.
- Reporting: More layout options.
- Additional output formats: CSV export.
- Alerting: at present, execution is on the master node. Moving to distributed watch execution on data nodes (not just the master node).
- Future: the structure of a single watch is too static: in/out -> condition -> action. In the future, keep state between watch executions. Add conditions. Make core execution async.
- JSON body editor for watchers. Add different kind of editors. Example for threshold. Testing it with simulated results.
- Graph: I was amazed at how few people held up their hand when asked "how many people know what Graph is." So we had to listen to what Graph is.
- I did not follow how he got to his conclusion. He started with Graph, ran a python script, then went to graph, showed clustering, and then he switched to time visualization. Concluded it was one guy posting all these certain things to raise his ranking. Pertinent in respect to someone hacking. I did not follow how he did it and will need to watch the video. But he demonstrated integrating activity into other graphics/visualization as part of the URL holding all the information. Tight integration. I like it.
- Future: more details behind connections, more perspectives. That is so needed.
- adjacency_matrix aggregation. Coming in 5.3, which is nice. Graph over time (swim lane visualization).
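For anyone who, like much of the room, has not played with this yet, here is a small Python sketch of the adjacency_matrix idea mentioned above. The first function builds a search body using the aggregation's `filters` form. The second imitates locally what the aggregation reports: how many documents match each named filter and each pair of filters, with empty intersections left out. The field name `accounts`, the group names, the aggregation name `interactions`, and the sample documents are all invented for illustration, and nothing here talks to a real cluster.

```python
from itertools import combinations


def adjacency_matrix_request(groups, field, agg_name="interactions"):
    """Build a search body with one named terms filter per group."""
    filters = {
        name: {"terms": {field: sorted(members)}}
        for name, members in groups.items()
    }
    return {
        "size": 0,
        "aggs": {agg_name: {"adjacency_matrix": {"filters": filters}}},
    }


def local_adjacency_counts(docs, groups, field):
    """Count docs matching each group and each pair of groups (non-empty only)."""
    matches = {
        name: {i for i, doc in enumerate(docs) if set(doc[field]) & members}
        for name, members in groups.items()
    }
    counts = {name: len(ids) for name, ids in sorted(matches.items()) if ids}
    for a, b in combinations(sorted(matches), 2):
        both = matches[a] & matches[b]
        if both:
            counts[f"{a}&{b}"] = len(both)
    return counts


# Hypothetical sample data: each document lists the accounts involved.
docs = [
    {"accounts": ["alice", "bob"]},
    {"accounts": ["alice", "carol"]},
    {"accounts": ["bob", "dave"]},
    {"accounts": ["carol"]},
]
groups = {"grpA": {"alice"}, "grpB": {"bob"}, "grpC": {"carol"}}

body = adjacency_matrix_request(groups, "accounts")
counts = local_adjacency_counts(docs, groups, "accounts")
```

The pair counts are what make the "graph over time" view possible: run the same thing per time bucket and you can watch connections between groups appear and disappear.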
I attended both sessions:
Machine Learning in the Elastic Stack
Sophie Chang, Machine Learning Team Lead, Elastic
Steve Dodson, Machine Learning Tech Lead, Elastic
Machine Learning and Statistical Methods for Time Series Analysis
Steve Dodson, Machine Learning Tech Lead, Elastic
With the Prelert talks, it is tough. Machine learning is very mathematical. Not that easy to make understandable. Even worse for me to write up. Basic ideas:
- How to determine: past behavior, predict future, use predictions to make decisions.
- Create a Job: option: single metric job, multiple metric job, advanced job.
- Aggregation, field, and bucket span. Graphs like a sine wave.
- "Statistical models are created" So probabilistic statistical models.
- Max severity, detector, actual, typical, probability score between 0 and 100. Configure Watcher.
- There will be profiling of users and machines based on data input. I later asked about memory and was assured that they have handled large amounts with no issues. I did not write it down, but believe it was in the hundreds of thousands range.
- Integration on the backend. Advantages of being in the cluster. Close to the data. Efficiency. Can provide context for an anomaly.
- Will be a service in the cloud also.
- Load balancing analysis using persistent tasks. Results written to index.
- Will be in 5.4. May time frame.
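Since I find the math hard to write up, here is a toy Python sketch of the basic loop described above: aggregate a metric into bucket spans, learn what is typical from past behavior, then score new buckets between 0 and 100. This is emphatically not the Prelert model. I am assuming a single Gaussian fitted to the bucket means and an arbitrary mapping from tail probability to score, and every name and number in it (the sine-wave data, the 10-second bucket span, the injected spike) is made up for illustration.

```python
import math
from statistics import mean, stdev


def bucket_means(samples, bucket_span):
    """Aggregate (timestamp, value) samples into fixed-width buckets of means."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts // bucket_span * bucket_span, []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def anomaly_scores(train, actual):
    """Score each actual bucket 0..100 against a Gaussian fitted to train.

    Assumption: score = 10 * -log10(two-sided tail probability), capped at 100.
    """
    typical = mean(train.values())
    spread = stdev(train.values())
    scores = {}
    for start, value in actual.items():
        z = abs(value - typical) / spread
        tail = max(math.erfc(z / math.sqrt(2)), 1e-300)  # avoid log10(0)
        scores[start] = round(min(100.0, 10 * -math.log10(tail)), 1)
    return typical, scores


# Hypothetical metric: a sine wave around 10, one sample per second.
samples = [(t, 10 + math.sin(2 * math.pi * t / 60)) for t in range(600)]
train = bucket_means(samples, bucket_span=10)

# Pretend one later bucket reports a value far from anything seen before.
actual = dict(train)
actual[300] = 25.0

typical, scores = anomaly_scores(train, actual)
```

In this toy data the spiked bucket scores 100 while the ordinary peaks and troughs of the wave stay in single digits. A real model would also have to learn the periodic shape itself, which a single Gaussian does not do.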
I have some images:
The above shows the screen for a single field analysis.
Anomaly timeline screen.
Anomaly detection with timeline displayed.
View of taking unstructured log messages and clustering by similar messages for classification.
Adding a new detector.
Looking at the timeline versus anomaly detection.
Timeline, events, along with severity to detect anomalous activities.
Switching over to profiling an entity. In this case, it is a client. Once more, memory should not be an issue. Hundreds of thousands should be possible. They did discuss a whole mechanism by which work would be distributed across the nodes. In the end, it should not burden your cluster. I state that without seeing it in action. The nice thing with Elastic is that mostly you talk with the engineers, and they tend to be straight with you. Salespeople are wonderful. I am just saying some companies' salespeople will tell you all sorts of things. That is not the MO for Elastic.
Entity profile with creation of a profile based on status code responses for a typical client.
One more talk:
A Standard Query Language for Elasticsearch: Elasticsearch SQL
Costin Leau, Software Engineer, Elastic
Costin Leau did a fine job. While it might have gone in depth and demoed more, the ideas were pretty much covered yesterday. My notes basically stop as I just watched what Costin was doing.
On the final day, I started with a discussion with one of the Elastic engineers on the road map. He did a fine job, but at this point in the conference we had heard what he had to say. Mostly it was an opportunity to ask questions. I asked how many entities Prelert could handle and was told hundreds of thousands (if I recall correctly). I asked whether, once it had learned, Prelert would use Watcher to send out alerts in near real time. I was told it would.
I attended the Security@Slack talk by Nate Brown, Developer at Slack and Ryan Huber, Slack Security. They did a fine job. At this point, we had heard most of what there was to hear about Elastic. It was interesting to hear about Slack and Security.
We ended up talking a bunch with folks. We talked with Thomas Davis, Architect & Project Lead, National Energy Research Scientific Computing Center, and Cary Whitney, Computer Scientist, National Energy Research Scientific Computing Center. They were great. They showed us some of the work they had done and let us view some near-real-time results from their computer room using metrics in Grafana. They did the talk, "The Hotel NERSC Data Collect: Where Data Checks In, But Never Checks Out." The NERSC data collect system is designed to provide access to 30TB of logs and time-series data generated by the supercomputers at Berkeley Lab. We were particularly interested in hearing about their archiving on high disk capacity nodes using generic hardware. We discussed using Elasticsearch as a large, long-term data storage engine, including index allocation tagging, use of index aliases, Curator and scripts to generate snapshots, long-term archiving of these snapshots, and restoration. We also talked about time series data and use of metrics data in addition to event data.
There were also discussions with two gentlemen from the Department of the Interior concerning setting up an Elastic user group, with a particular focus on the issues we face. While we discussed transitioning from Splunk to Elastic, we found a lot of common experiences. Splunk has to make sales. They can be very aggressive. There is no issue there. That is business. They will follow a playbook for those thinking of leaving the flock. It has been interesting.
During a lunch Monday, I happened to sit next to a gentleman whose company was considering transitioning. The companies are different. I attempted to bring the Elastic instructor into the discussion. While the instructor was knowledgeable in Elastic, his experience ended there. Theoretically he could address Splunk's differences, but he had not worked with them. Nor had he worked security operations. In Monday's talk, I told the gentleman he likely should stay with Splunk. I can actually make better arguments than the Splunk salespeople made to us, because it really does depend on how you are using the product.
Part of the discussions we had with the two gentlemen from DOI was about how to assure management of what to expect. Also, how to work the support and funding angle with higher management. Let us face facts, Elastic is relatively new. New makes management, especially non-technical management, nervous. It helps to have the facts. Discuss companies that are using Elastic. How are they using it? What partnerships can be tapped into? All these points. We also prepared them for how Splunk will aggressively try to persuade higher management to stay.
I think many organizations can benefit from having both products. In the end, both products are full text search engines. It really does come down to what you will use the product for and what resources you will be able to put into the product. That answer will vary greatly between organizations. I can argue both sides. That is why having a user group and discussing these issues is a great idea. Hopefully we can share some of the experiences as they develop their environments.
That about covers the sessions I was able to attend. When they post the videos, I will make sure to let you all know.