Our MongoDB Pune User meet-up was a great success! Many of you expressed interest in learning MORE about the “Centralized Logging System Using MongoDB” as introduced by Webonise AVP for Engineering, Vivek Parihar, so we’re detailing everything here! Below is a quick summary:
- The Fundamentals
- What is logging
- Why we need Logging?
- Logging DOs and DON’Ts
- Logs are Streams, Not Files
- Disadvantages of Logging (Blog Exclusive!)
- The Challenges Associated with Logging
- Problems managing Logs for huge INFRA
- The Solution to These Challenges
- What Central Logging System can do for us?
- Central Logging System Architecture
- What Fluentd Is and Why It Is the Best Choice to Complement MongoDB
- Why MongoDB is a good fit
- Disadvantages of MongoDB (Blog Exclusive!)
Logging refers to the act of keeping track of something, in our case the events that occur while an application runs. It is widely considered one of the most important parts of any application. Why do we need logging?
- Logging helps to find and fix bugs. (It is actually extensively used for debugging.)
- Logging helps to diagnose and understand the behavior of an application.
- Logging can tell you all the specifics about what happened, when, where and why.
This is pretty much the same way that we deal with a “living” bug. A small bug may not be easy to spot because, well… it is small, but if you do regular cleaning (or logging) you will spot and eliminate bugs before they turn into an issue. Whereas, if you let the bug “grow”, yes, it’ll be easier to find, but it will take more effort and even complicated machinery (or software) to fix it.
Logging: Dos and Don’ts
- Don’t: Affect the UI. Logging should be FAST, but it should never affect the user.
- Do: Log only useful INFO, never unnecessary babble as presented below:
- Do: Differentiate Log Levels. This will guide you on the next best course of action depending on what your current Log’s level is. Refer to the photo below for the different log levels there are:
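To make the idea of log levels concrete, here is a minimal sketch using Python's standard `logging` module. The logger name `checkout`, the messages, and the in-memory `ListHandler` are our own illustrative assumptions, not something shown in the talk. The logger is set to WARNING, so the DEBUG and INFO "babble" is dropped and only the entries that call for action are kept.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted records in memory so the result is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def build_logger(name, level):
    """Return a logger that keeps only records at `level` or above."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False   # keep the example's output isolated
    logger.handlers.clear()
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


logger, handler = build_logger("checkout", logging.WARNING)

logger.debug("cart contents: %s", ["sku-1", "sku-2"])   # dropped: below WARNING
logger.info("user opened checkout page")                # dropped: below WARNING
logger.warning("payment gateway slow: %d ms", 1800)     # kept
logger.error("payment failed for order %s", "A-1001")   # kept

for line in handler.lines:
    print(line)
```

In a real application the threshold would usually come from configuration, so that a production system can stay at WARNING while a developer machine runs at DEBUG.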
“Logs are streams, not files.” - Adam Wiggins, Heroku Co-Founder
It can be easy to assume that logs are files, but they really are streams or, better yet, time-ordered streams (as introduced by Adam Wiggins in this post). They have no fixed beginning or end. Logs are an ongoing, collated collection of events that can be viewed in real time or at a later date.
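One way to picture "streams, not files" is an application that simply writes one event per line to an output stream and a consumer that reads events one at a time. The sketch below is our own illustration: the function names `emit` and `follow`, the JSON-lines format, and the fake clock are all assumptions. An in-memory `io.StringIO` stands in for standard output so the example is self-contained.

```python
import io
import json
import sys
import time


def emit(event, message, stream=sys.stdout, clock=time.time):
    """Write one event to the stream as a single JSON line."""
    record = {"time": clock(), "event": event, "message": message}
    stream.write(json.dumps(record) + "\n")
    stream.flush()  # hand the event on immediately instead of buffering


def follow(stream):
    """Yield events one at a time, in the order they were written."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


# An in-memory stream and a fake clock keep the example deterministic.
buffer = io.StringIO()
ticks = iter([1.0, 2.0, 3.0])
for name, msg in [("start", "app booted"),
                  ("request", "GET /home"),
                  ("request", "GET /login")]:
    emit(name, msg, stream=buffer, clock=lambda: next(ticks))

buffer.seek(0)
events = list(follow(buffer))
for e in events:
    print(e["time"], e["event"], e["message"])
```

The application never decides where the events end up. Whatever sits at the other end of the stream (a terminal, a collector, a database) can change without touching the application code.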
Disadvantages of Logging
So, why don’t we do this all the time? Logging is really only worthwhile when there is a huge INFRA behind it. It might be overkill for small apps that don’t have a steady stream of traffic and user-related data.
The Challenges Associated with Logging
Logging is not always smooth sailing. Vivek shared that while logging can be highly beneficial, he has also encountered issues during implementation, such as the following:
1. If you have a huge INFRA then identifying and solving errors becomes a nightmare.
2. If your server is down, you can't debug or know what the reason is for the downtime.
3. If somebody breaks or hacks into the server, there’s a possibility that all security logs will be erased. If we have a central log location then we can check who broke into the server, when it occurred and what was stolen from the server.
4. The biggest problem is the ever-increasing size of the logs.
The Solution to These Challenges
What can Central Logging System do for us?
1. Log Collections
All of the logs are in one place. This makes things, like searching through logs and analysis across multiple servers, easier than bouncing around between boxes (greatly simplifying log analysis and correlation tasks).
Scaled-out servers behind load balancers each produce their own log files. This makes it nearly impossible to debug a single action flow that is distributed between servers, unless the logs converge into a single place.
3. High Availability
If a server is down or overloaded and unable to tell you what happened, you can still access its log data because that data is stored centrally within the INFRA.
Local logs from the server may be lost in the event of an intrusion or system failure. By having the logs in a centralized location, you have a much better chance of finding something useful on what happened.
5. Prevent Disk BLOAT
It reduces disk space usage and disk I/O on core servers that should be busy doing something else.
6. Visual Indicators
Abnormal behaviors can be detected faster when we see them in a visual instrument such as a graph, where peak points are easily noticed.
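The benefit of collecting logs in one place is easiest to see with a request that touches more than one server. The sketch below is a hypothetical illustration: the host names, request IDs, and messages are invented, and it assumes that every log line carries a request ID and that server clocks are reasonably synchronized. It merges two per-server logs into one timeline and then traces a single request across both.

```python
import heapq

# Each entry: (timestamp, host, request_id, message). All values are made up.
web1 = [
    (100.0, "web-1", "req-42", "GET /checkout received"),
    (100.9, "web-1", "req-42", "response sent: 500"),
]
web2 = [
    (100.2, "web-2", "req-7", "GET /home received"),
    (100.4, "web-2", "req-42", "inventory lookup ok"),
    (100.7, "web-2", "req-42", "payment service timeout"),
]


def central_stream(*server_logs):
    """Merge per-server logs (each already time-ordered) into one timeline."""
    return heapq.merge(*server_logs, key=lambda entry: entry[0])


def trace(request_id, *server_logs):
    """Return every entry for one request, across all servers, in time order."""
    return [e for e in central_stream(*server_logs) if e[2] == request_id]


for ts, host, rid, msg in trace("req-42", web1, web2):
    print(f"{ts:.1f} {host} {msg}")
```

With the logs still sitting on separate boxes, the same investigation would mean opening each file by hand and lining up the timestamps yourself.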
Central Logging System Architecture
What is Fluentd and Why is it the Best Complement to MongoDB
Fluentd converts everything into JSON format. It’s like syslogd, but uses JSON for log messages. JSON is a lightweight format for exchanging structured messages between systems. MongoDB stores data as JSON-like documents, so Fluentd is considered complementary software that helps MongoDB implement a centralized logging system. During the process of logging, Fluentd acts as a:
Besides being written in Ruby, Fluentd has an HA (High Availability) configuration, is very easy to set up (setup can be done in three easy steps), has superb in-stream processing capability, makes data immediately ready to analyze, and provides various options for reliable and robust transport.
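To give a feel for what "converting everything to JSON" means, here is a small Python sketch that turns one plain-text access log line into a structured event with a tag, a time, and a record, which is the general shape of a Fluentd event. This is not Fluentd's own code or configuration. The log line format, the regular expression, the tag `web.access`, and the function name `to_event` are all assumptions made for illustration.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical line format: host [timestamp] "METHOD path" status
LINE = re.compile(
    r'(?P<host>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)" (?P<code>\d{3})'
)


def to_event(tag, line):
    """Parse one raw log line into a tagged, timestamped JSON-ready dict."""
    match = LINE.match(line)
    if match is None:
        return None  # unparseable lines are skipped in this sketch
    fields = match.groupdict()
    stamp = datetime.strptime(fields.pop("time"), "%Y-%m-%dT%H:%M:%S")
    stamp = stamp.replace(tzinfo=timezone.utc)  # assume the log is in UTC
    fields["code"] = int(fields["code"])
    return {"tag": tag, "time": int(stamp.timestamp()), "record": fields}


raw = '10.0.0.5 [2014-01-15T10:30:00] "GET /login" 200'
event = to_event("web.access", raw)
print(json.dumps(event, sort_keys=True))
```

Once a log line has this shape, it can be stored as a document and queried by field (for example, all events with `code` 500) rather than searched as free text.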
Why MongoDB is a good fit
“Implementing a Centralized Logging System using MongoDB is not new, but not a lot of people know about this application,” Vivek Parihar stated. “A lot of people launch a SaaS service like Papertrail and Loggly which prompts the system to send the data to another INFRA, a security risk which can leave crucial data exposed.”
Logs contain very crucial data (that is, the very same data that reflects every action users perform on the system). Although MongoDB is not more secure per se, its main advantage is that it stores data within your own INFRA, which makes securing that data much easier.
Just like Fluentd, MongoDB is very easy to set up. Logs are converted into JSON format. It is scalable and easy to replicate. It also supports capped collections and tailable cursors.
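Capped collections and tailable cursors are what make MongoDB behave like a rolling log: a capped collection has a fixed size, keeps documents in insertion order, and overwrites the oldest entries when it is full, while a tailable cursor lets a reader wait for new entries, much like `tail -f`. The sketch below is a pure-Python simulation of that behavior, not MongoDB code. The class `CappedLog` and its methods are hypothetical, and it caps by document count for simplicity, whereas a real capped collection is sized in bytes.

```python
from collections import deque


class CappedLog:
    """In-memory stand-in for a capped collection.

    Fixed capacity, insertion order preserved, oldest documents dropped first.
    """

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)
        self._next_id = 0

    def insert(self, doc):
        self._docs.append((self._next_id, doc))
        self._next_id += 1

    def find(self):
        """Return all documents currently retained, oldest first."""
        return [doc for _, doc in self._docs]

    def tail(self, after_id=-1):
        """Return (last_seen_id, docs) for documents inserted after `after_id`."""
        fresh = [(i, d) for i, d in self._docs if i > after_id]
        last = fresh[-1][0] if fresh else after_id
        return last, [d for _, d in fresh]


log = CappedLog(max_docs=3)
for n in range(1, 6):
    log.insert({"msg": f"event {n}"})
print(log.find())          # events 1 and 2 have been overwritten

cursor, batch = log.tail()             # read everything currently retained
log.insert({"msg": "event 6"})
cursor, batch = log.tail(cursor)       # only what arrived since the last read
print(batch)
```

With the real database and the PyMongo driver, the corresponding pieces are `create_collection(..., capped=True, size=...)` and a `find()` call with `cursor_type=pymongo.CursorType.TAILABLE_AWAIT`; check the driver documentation for the options that apply to your version.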
If you are thinking about using MongoDB for Logging for the first time, you should first answer these questions regarding your system:
1. How many inserts per second can it support? This limits the event throughput.
2. How will the system manage the growth of event data, particularly concerning a growth in insert activity?
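Both questions come down to simple arithmetic that is worth doing before choosing hardware. The sketch below is a back-of-the-envelope estimate. The figures (2,000 events per second, 500 bytes per event, 1 TB of storage) are invented for illustration, and the calculation ignores indexes, replication, and storage overhead.

```python
SECONDS_PER_DAY = 86_400


def daily_storage_bytes(events_per_second, avg_event_bytes):
    """Raw bytes of log data written per day at a steady insert rate."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY


def retention_days(capacity_bytes, events_per_second, avg_event_bytes):
    """How many days of logs fit in the given capacity."""
    return capacity_bytes / daily_storage_bytes(events_per_second, avg_event_bytes)


rate = 2_000          # assumed events per second across all servers
size = 500            # assumed average event size in bytes
capacity = 10 ** 12   # assumed 1 TB of storage

per_day = daily_storage_bytes(rate, size)
print(f"{per_day / 10**9:.1f} GB per day")
print(f"{retention_days(capacity, rate, size):.2f} days of retention")
```

With these assumed numbers the system writes 86.4 GB per day, so 1 TB holds a little over eleven days of logs. Doubling the insert rate halves the retention window, which is exactly the growth problem the second question asks about.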
In most cases the best way to increase the capacity of the system is to use an architecture with some sort of partitioning or sharding that distributes writes among a cluster of systems. Sharding is a well-known concept for scaling a system horizontally. It’s like dividing the data into chunks, with each chunk handled by a different machine.
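The idea of distributing writes can be sketched in a few lines. The example below routes each log event to one of four in-memory "shards" by hashing a key built from the event. It is a conceptual illustration only and not how MongoDB implements sharding internally. The shard count, the key made of host and sequence number, and the function name `pick_shard` are all assumptions.

```python
import hashlib


def pick_shard(shard_key, shard_count):
    """Map a shard key to a shard number deterministically via a hash."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


SHARD_COUNT = 4
shards = {i: [] for i in range(SHARD_COUNT)}   # stand-ins for separate servers

for n in range(1000):
    event = {"host": f"web-{n % 50}", "seq": n, "msg": "request handled"}
    key = f'{event["host"]}:{event["seq"]}'    # hypothetical shard key
    shards[pick_shard(key, SHARD_COUNT)].append(event)

print({i: len(docs) for i, docs in shards.items()})
```

The choice of key matters. Hashing spreads inserts evenly so that no single machine absorbs all the writes, whereas a key that grows steadily (such as a raw timestamp used as a range) would tend to send every new event to the same chunk.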
Disadvantages of MongoDB
MongoDB lacks some features of relational databases, such as transactions and joins, but in exchange it gains the ability to scale easily and a flexible schema that is easy to work with in a JSON-like data format.
Can you provide a comparison between Fluentd, Flume and Logstash?
“Basically we utilize Fluentd to collect data from different sources. The main advantage of using Fluentd is its HA (High Availability) configuration. Let’s also not forget about the fact that it’s very easy to set up. Flume and Logstash aren’t easy to set up.” - Vivek Parihar, Webonise AVP-Engineering
We were delighted with the response we received from the audience during the meetup. There were questions about further implementation of this application and even inquiries about a workshop, which we are seriously considering.
Would you also be interested in a workshop on building a centralized logging system using MongoDB? Or, perhaps further information about this topic is what you seek? Let us know in the comments section!