Why not take the world's fastest logger and compare it to G3log? It's a fun comparison and might show some pros and cons of each.
This comparison will show that being the world's fastest logger is not good enough when it comes to worst case latency. G3log wins hands down.
Does it sound like a contradiction? It’s not. Read on …
I don't actually know that spdlog is the fastest logger in the world. It's just way faster than many other loggers I have seen. Let's look at the author Gabi Melman's own comparison, and then apply a reality check to it.
I'm obviously just as biased as Gabi Melman is when it comes to performance testing these loggers. So before you make up your mind, test the loggers for yourself. Try out both the average case and the worst case latency. Play with compiler options, and try writing to an SSD vs. a normal HDD.
Why spdlog is the fastest logger according to Gabi Melman
Spdlog is a header-only library which at its core uses a famous lock-free queue by Dmitry Vyukov. As in many asynchronous communication schemes, the queue, or message-passing protocol, is the key to the speed and the latency, or lack thereof.
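To get a feel for why this queue is fast, and where its limit sits, here is a condensed sketch of the enqueue path of Dmitry Vyukov's bounded MPMC queue. This is my simplification of his published algorithm, not spdlog's actual source, and the member names are mine.

    // Condensed sketch of Dmitry Vyukov's bounded MPMC queue (enqueue only).
    // Simplified from his published algorithm; not spdlog's actual source.
    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    template <typename T>
    class BoundedMpmcQueue {
      struct Cell {
        std::atomic<size_t> sequence;
        T data;
      };
      std::vector<Cell> buffer_;
      size_t mask_;                        // buffer size must be a power of two
      std::atomic<size_t> enqueue_pos_{0};

    public:
      explicit BoundedMpmcQueue(size_t size) : buffer_(size), mask_(size - 1) {
        for (size_t i = 0; i != size; ++i)
          buffer_[i].sequence.store(i, std::memory_order_relaxed);
      }

      // Lock-free: a producer claims a slot with a single CAS, no mutex anywhere.
      bool try_enqueue(T const& item) {
        size_t pos = enqueue_pos_.load(std::memory_order_relaxed);
        for (;;) {
          Cell& cell = buffer_[pos & mask_];
          size_t seq = cell.sequence.load(std::memory_order_acquire);
          intptr_t dif = (intptr_t)seq - (intptr_t)pos;
          if (dif == 0) {  // slot is free, try to claim it
            if (enqueue_pos_.compare_exchange_weak(pos, pos + 1,
                                                   std::memory_order_relaxed)) {
              cell.data = item;
              cell.sequence.store(pos + 1, std::memory_order_release);
              return true;
            }
          } else if (dif < 0) {
            return false;  // queue is full: the fixed size shows itself here
          } else {
            pos = enqueue_pos_.load(std::memory_order_relaxed);  // lost a race
          }
        }
      }
    };

The branch to keep in mind for the rest of this post is the "dif < 0" one: the queue is full, and the producer has to fail, retry, or wait. That is where the worst case latency of a fixed-size queue hides.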
Let's see some numbers: a snapshot from Gabi Melman's own comparison.
Pretty impressive, isn't it? Let's take a look at how it compares as an asynchronous logger. After all, asynchronous loggers are usually preferable since they avoid the high latency you can see with synchronous loggers.
Wow! It completely blew the competition away on the total time to log 1 million log entries. Impressive. It also hurt a little (since I wrote g2log some years ago). Ouch.
Now let’s take Gabi’s testing apart and see if spdlog is really as good as this testing showed. You will see that the test results above are also hiding some ugly facts that might be very, very bad for your system.
Now to the good stuff
In the test for logging 1 million log entries, the queue size that spdlog uses is 1,048,576 (2^20). Spdlog uses a fixed-size queue; with the queue capacity just above the 1 million entries to be logged, the test minimizes queue contention. If you change the queue size to 2^19 (524,288), the contention increases, overall performance gets worse, and the worst case latency becomes terrible.
Even with the decreased queue size, spdlog is fast in the average case. However, that speed comes at a price, and the price is, as you might already have guessed, worst case latency.
Worst Case Latency
Worst case latency is the highest latency (read: wait time) from when the code issues a LOG statement until it is ready for the next code statement. I.e., it is the maximum time it takes for the logger to capture all the log details and put them onto a queue for another thread to deal with.
The worst case latency is almost the sole reason why I decided that Google's glog wasn't good enough. It's almost the sole reason why I deemed all other loggers, at that time, to be lacking, and why I had to create g2log.
Google's glog is what I (see previous link) called a pseudo-asynchronous logger. It uses a buffer so that most log entries are buffered and only once in a while get written to file. This makes glog an efficient logger compared to, for example, boost::log. Of course, once in a while one log entry pays the penalty of writing all of the previously buffered log entries to file. The average speed will be good, at the cost of a bad worst case latency.
When I wrote g2log I made a comparison with Google's glog showing how the worst case latency looks when plotted. Here you can see that same comparison.
As you can see, the two loggers are fairly equal. Actually, at the time of writing, glog (red) seems to be faster for most log entries. Then there's this red fuzz at the bottom, far right. The red fuzz is the log entries where Google's glog paid the price for all the other log entries. There are only a few of them, but they ranged from around 300 milliseconds to closer to a second.
This makes Google's glog unfit for many systems that are safety critical or mission critical. Imagine a missile warning system that has a hiccup of 500 milliseconds. Or think of how such a hiccup can affect software involved in brain surgery, such as a radiation knife, or heart surgery, or an air traffic control system, or an air traffic collision detection and avoidance system. What high-performing system have you used that only cares about the average case and not the worst case? They surely exist, but I have never worked with one in all of my professional C++ career.
For all the systems that I have worked with, an awesome average case is no good if the worst case latency is terrible.
Worst Case Latency: spdlog vs G3log
Let's look at some speed comparisons then, shall we? Let's measure each LOG call and save the results into microsecond buckets.
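As a rough sketch of what that measurement looks like (the names are generic placeholders, and LOG(INFO) stands in for whichever logger's call is under test; in the real tests each thread keeps its own buckets):

    // Rough sketch: time one LOG call, drop the result into a microsecond bucket.
    #include <chrono>
    #include <map>

    std::map<long long, long long> buckets;  // latency in us -> number of calls

    void timed_log(int i) {
      auto start = std::chrono::high_resolution_clock::now();
      LOG(INFO) << "some log entry #" << i;  // the statement being measured
      auto stop = std::chrono::high_resolution_clock::now();
      long long us = std::chrono::duration_cast<std::chrono::microseconds>(
                         stop - start).count();
      ++buckets[us];  // the worst case latency is simply the highest key seen
    }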
A word of comparison caution: depending on compiler options, performance results might differ. I ran with the following compiler options: "-O3 -fPIC -Ofast -m64 -march=native" and "-O3 -march=native". Spdlog seemed to run slightly better on consecutive runs, if the sun was shining and we had meatballs for dinner (if we didn't, then for some reason it ran slower).
Scenario 1: loggers writing a total of 1 million logs over 10 threads
This test is in spdlog's favor as the number of logs is less than the maximum queue size (1,048,576). An atomic counter is used to communicate across threads whether it's OK to write a log entry. Once the 1 million mark is reached the threads will exit.
The test code with atomic counter can be found here: spdlog test code, g3log test code.
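In simplified form, the shared atomic counter pattern in both test programs looks roughly like this (LOG(INFO) is again a stand-in for either logger's call):

    // Sketch: 10 threads share one atomic counter and stop at 1 million total.
    #include <atomic>
    #include <thread>
    #include <vector>

    std::atomic<long> counter{0};
    const long kMaxEntries = 1000000;

    void worker() {
      while (true) {
        long mine = counter.fetch_add(1);  // claim the next entry number
        if (mine >= kMaxEntries) break;    // 1 million reached: exit
        LOG(INFO) << "entry #" << mine;    // generic stand-in for either logger
      }
    }

    int main() {
      std::vector<std::thread> threads;
      for (int i = 0; i < 10; ++i) threads.emplace_back(worker);
      for (auto& t : threads) t.join();
      return 0;
    }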
Results can be found here and in the chart below. Average times were pretty much the same between the loggers (g3log was slightly faster). When playing around with the testing I sometimes got noticeably faster average times for spdlog, but usually with very bad outliers in the graph.
Ouch. Around 10-11 ms worst case latency for spdlog. For some systems this type of hiccup is OK. For other systems a factor-10 performance hit, once in a while, is unacceptable.
Try two: compiler options "-O3 -march=native"
When compiling with less optimization, spdlog turned from staccato speedster into an Elon Musk-style rocket. For the same scenario as above, 10 threads that log in total 1 million log entries, it looks something like this (yet again focusing on the worst case latency): graph as seen below, and also with results here. Spdlog didn't keep this consistent over tests though. At times it reverted back to 12,000 us (or more) for the worst case latency; however, most runs were fairly consistent. G3log was consistent for all runs.
Nicely done, spdlog. It is worth noting that g3log moves a little slower when optimizing less. Spdlog, on the other hand, shows a factor-10 improvement when the optimization changed: worst case latency decreases from around 11,000 us to 1,100 us. Unfortunately this wasn't consistent; as you will see, it sometimes reverted back to the sluggish worst case outliers.
Scenario 2: loggers writing a total of 1 million logs over 100 threads
With this scenario the contention to access the message queue increases, but the test is still in spdlog's favor since it does not exceed the queue capacity.
The spdlog results varied greatly. On most runs spdlog had its worst case latency around 1,000 us (1 ms), but sometimes it increased by a factor of 10+, i.e. 10,000+ us (10 ms). G3log was steady at around 500 to 600 us worst case latency.
This discredits my initial thought that the optimization change had a real effect on spdlog. Spdlog is definitely a bit temperamental. I continued to test this and, lo and behold, on one of my test runs spdlog got less than 100 us worst case latency across all threads. That happened once. Unfortunately I didn't save the result and I have not been able to replicate it.
Scenario 3: 10 threads, each thread writing 1 million log entries
With this scenario the contention for accessing the queue is kept up for a longer time. This scenario is not in spdlog's favor. Make no mistake, however: this is a very challenging test for g3log as well.
The source code (g3log, spdlog) for this test is almost identical to the one given previously. We skip the atomic counter; each thread keeps track of its own count. After a thread has logged 1 million log entries it exits.
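In sketch form, the only change is the loop; no shared state is needed:

    // Sketch: each of the 10 threads logs its own 1 million entries, then exits.
    void worker() {
      for (long i = 0; i < 1000000; ++i) {
        LOG(INFO) << "entry #" << i;  // 10 threads x 1 million = 10 million total
      }
    }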
At this point I felt like mixing it up a bit. I tested both loggers with both compiler options and chose the option that got the best result for each logger. For g3log it was "-O3 -fPIC -Ofast -m64 -march=native"; for spdlog it was once again "-O3 -march=native".
The results showed that the worst case latency was consistently around 10,000 us for spdlog, and between 400 us and 1,000 us for g3log. Results can be found here.
Scenario 4: spdlog only, 10 threads, each thread writing 1 million log entries. Queue size of 2^19 instead of 2^20.
For fun I also ran one test with spdlog's queue size set to 2^19 (524,288), compared to its default of 2^20 (1,048,576).
    int queue_size = 524288;  // 2^19
    spdlog::set_async_mode(queue_size);
In the same test, with extreme contention on the smaller queue, the worst case latency became a huge bottleneck. In a scenario such as this, spdlog might benefit from instead moving to an MPMC dynamically sized lock-free queue, like MoodyCamel's.
I won't plot the result since it would be silly to compare this in a chart with g3log. I'll just put down the worst case latency numbers for you. The average case is still not terrible, but spdlog's worst case latency on some threads is troubling: close to 8 seconds when the queue contention got really bad. Ouch.
This result couldn't be saved in microsecond format to Google Docs due to its size, so I'll just paste the summary of the result below.
Using 10 to log in total 1000000 log entries to noatomic-smallqueue-1-10
0 the worst thread latency was:7900 ms (7900444 us)]
1 the worst thread latency was:8 ms (8026 us)]
2 the worst thread latency was:6489 ms (6489388 us)]
3 the worst thread latency was:4 ms (4271 us)]
4 the worst thread latency was:7898 ms (7898381 us)]
5 the worst thread latency was:9 ms (9893 us)]
6 the worst thread latency was:5047 ms (5047866 us)]
7 the worst thread latency was:6 ms (6219 us)]
8 the worst thread latency was:436 ms (436103 us)]
9 the worst thread latency was:2934 ms (2934326 us)]
Total time :9375 ms (9375954 us)
Average time: 9.37595 us
If you don't use spdlog right, try to optimize the queue size (and do a bad job at it), and some thread runs havoc with log entries, even for a short burst, then you can expect wait times close to 8 seconds. Can you afford that? Can your high-efficiency, life-saving or mission-critical system afford that?
Opinion: Fixed Sized Queue vs Dynamic Sized Queue
When I started testing spdlog, running exactly the tests that Gabi Melman used, the logger sometimes produced fantastic results. I was very pleased to see that, as I think we need more asynchronous loggers. Spdlog is extremely well made. Gabi's C++ skills are top notch and he has created a speed marvel. I was very curious to see how the worst case latency would turn out, and that's why I made this blog entry.
In the end, it mostly boils down to the choice of queue. My strong opinion is that it is rarely a good choice to go with a fixed-size queue if you can avoid it. If you do choose a fixed-size queue, you had better have a good way of dealing with a full queue.
The systems that need fixed-size queues are often embedded systems with scarce memory resources. Such systems are also very sensitive to worst case latency. It is my opinion that it is a bad, bad, bad idea for an embedded system to stall and wait in a contention scenario. It is likely better to pitch all logs until the contention goes down (and then log a warning so that the contention issue is seen).
Of course, fatal logs should never be pitched. These are the last entries that will be seen before the system goes down. That is, if the logger or another part of the system catches the fatal event (g3log does).
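A sketch of what such a pitch-under-contention policy could look like; the queue interface and names here are hypothetical, not g3log's actual API:

    // Sketch: drop non-fatal logs when a fixed-size queue is full, then log
    // one warning once the contention is over. Hypothetical API, not g3log's.
    #include <atomic>
    #include <string>

    struct Entry { bool fatal; std::string text; };

    struct BoundedQueue {                // hypothetical bounded-queue interface
      void push(const Entry& e);         // blocks while the queue is full
      bool try_push(const Entry& e);     // returns false instead of blocking
    };

    BoundedQueue queue;
    std::atomic<long> dropped{0};

    bool send_to_logger(const Entry& e) {
      if (e.fatal) {                 // never pitch fatal logs: they are the last
        queue.push(e);               // entries seen before the system goes down
        return true;
      }
      if (!queue.try_push(e)) {      // queue full: pitch instead of stalling
        ++dropped;
        return false;
      }
      long n = dropped.exchange(0);
      if (n > 0)                     // contention over: leave a trace of the loss
        queue.try_push({false, std::to_string(n) + " log entries were dropped"});
      return true;
    }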
If you are not on an embedded system, then either skip loggers with fixed-size queues or make sure you can handle the potential queue contention if something goes awry.
Of course, system logs should never be written as frequently as I have done in the tests above. If you do, then it is way too much information to be of any use anyhow. Keep your logging to a minimum.
The tests were made on a MacBook: 2.8 GHz Intel Core i7, 16 GB 1600 MHz DDR3, with 1 TB flash disk storage.
c++ --version gave: Apple LLVM version 6.0 (clang-600.0.57)
Spdlog uses a lock-free queue, which is the reason why it is often blazingly fast.
For fun I tried a lock-free queue for g3log that doesn't follow the strict FIFO ordering of the current lock-based one: FIFO ordering is kept only on a per-thread basis. Beware: if this were employed in g3log it could limit the logs you see before a fatal event is caught (the fatal entry is the last one that is seen).
I have tried MoodyCamel's lock-free queue and it worked great with g3log. I used the simplest API for the queue and not the most efficient; my testing skipped thread tokens and bulk enqueue/dequeue. Using the enhanced APIs would complicate things but would also speed it up even more.
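For reference, the difference between the simple API I used and the faster token/bulk API looks roughly like this (check MoodyCamel's documentation for the details):

    #include <string>
    #include "concurrentqueue.h"  // MoodyCamel's moodycamel::ConcurrentQueue

    int main() {
      moodycamel::ConcurrentQueue<std::string> q;

      // The simplest API, which my testing used:
      q.enqueue("a log entry");
      std::string item;
      q.try_dequeue(item);

      // The faster API my testing skipped: per-thread tokens and bulk operations.
      moodycamel::ProducerToken ptok(q);
      q.enqueue(ptok, "another entry");
      moodycamel::ConsumerToken ctok(q);
      std::string items[64];
      size_t n = q.try_dequeue_bulk(ctok, items, 64);  // drain up to 64 at once
      (void)n;
      return 0;
    }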
Using the lock-free queue made g3log overall much, much faster, except in the worst case latency scenario. To me, that's what matters, so I'll stick with Anthony Williams' mutex-protected queue [g3log version, Williams blog] for now.
If you want to test the experimental faster G3log version, you can see the gist here and just go ahead and use it. Beware of the changes to the fatal handling.
Please notice that MoodyCamel's queue is not public domain; see the gist for license details.
Extra: G3log Fatal Handling
G3log detects most fatal crashes (the infamous segmentation fault, to mention one) and logs the fatal event with a stack trace as its last message before it exits (Windows/OSX/Linux). If you decide to go with another asynchronous logger, then feel free to rip out the crash handling in g3log and use it.
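The general shape of that kind of crash handling, independent of logger, is sketched below. This illustrates the technique only; it is not g3log's implementation, and flush_logger_and_wait is a hypothetical placeholder.

    #include <csignal>

    // Hypothetical placeholder: push a fatal entry (ideally with a stack trace)
    // to the background logging thread and block until it has been flushed.
    void flush_logger_and_wait();

    // Note: a real handler must also respect async-signal-safety rules.
    void fatal_handler(int signum) {
      flush_logger_and_wait();       // make sure the last entries reach the file
      std::signal(signum, SIG_DFL);  // restore the default handler...
      std::raise(signum);            // ...and re-raise so the process still dies
    }

    int main() {
      std::signal(SIGSEGV, fatal_handler);  // the infamous segmentation fault
      std::signal(SIGABRT, fatal_handler);
      // ... run the application ...
      return 0;
    }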
My opinion is that any asynchronous logger needs good crash handling, or else the developers will quickly get frustrated with the lack of feedback when the system is suffering.
G3log is unlicensed, i.e. it is published as a public domain dedication, so you can borrow, copy, and use any parts of it with no obligations towards me.
Of course, I really appreciate it if you drop me a line and tell me whether you like it or not, along with suggestions for improvements. You can reach me at Hedstrom at KjellKod dot cc.
My results showed that g3log was consistent overall, with predictable, fairly low worst case latencies. They also showed that spdlog was inconsistent, differing hugely between runs, with comparatively unfavorable worst case latencies.
The test results showed that in completely terrifying nightmare scenarios g3log was still consistent with low worst case latencies, while spdlog exposed itself to many seconds' worth of wait time.
I am obviously just as biased as Gabi Melman is regarding performance testing the loggers. So before you make up your mind, test it for yourself.
Look at the compiler optimizations used, and think hard about the scenarios your logger might run into:
- Is the average speed the most important thing?
- Can you live with the worst case latency that the logger exhibits?
- What are the worst case latencies in the performance scenarios you are likely to encounter? (Test them!)
- What compiler options are you using?
- Can you ever get into a scenario where there is heavy log contention, past a fixed queue size?
- Which is worse in an outlandish, extreme nightmare scenario with logging statements running amok on your system:
1) the queue eating up RAM until the process crashes, while the log statements stay fast, or
2) the queue being fixed size, with log statements that can take many seconds to complete?
3) Also, which one of 1) and 2) are you most likely to detect and correct?
The scary scenarios above are most likely just that: something extreme that might only matter in a philosophical discussion.
My view on the matter is that I would rather pick a very fast asynchronous logger with stable worst case latency even when the contention is high (g3log), than a potentially even faster asynchronous logger with unstable worst case latencies that might peak at seconds of wait time.
The results for all the testing can be found here.