The world’s fastest logger vs G3log

[Shameless Plug Warning] :
You have until August 31st, 2017 to try out NetMon and participate in LogRhythm’s Network security contest. Win up to $18,000 when applying your scripting skills to detect network vulnerabilities.  See https://logrhythm.devpost.com/ for more information.

Why not take the world's fastest logger and compare it to G3log? It's a fun comparison and might show some pros and cons of each.

This comparison will show that being the world’s fastest logger is not good enough when it comes to latency. G3log wins hands down.

Does it sound like a contradiction? It’s not. Read on …


I don’t actually know that spdlog is the fastest logger in the world. It’s just way faster than many other loggers I have seen.  Let’s look at the author, Gabi Melman’s own comparison and then put a reality check on it.

You can find his logger with comparison here and my copy of it here (in case it disappears). Gabi’s actual benchmark code can be found here.

I'm obviously just as biased as Gabi Melman is regarding performance testing the loggers. So before you make up your mind, test these loggers for yourself. Try out both the average case and the worst case latency. Play with compiler options, and try writing to an SSD vs a normal HDD.

Why spdlog is the fastest logger according to Gabi Melman

Spdlog is a header-only library which at its core uses a famous lock-free queue by Dmitry Vyukov. As in many asynchronous communication schemes, the queue or message sending protocol is the key behind the speed and latency, or lack thereof.

Let’s see some numbers; A snapshot from Gabi Melman’s own comparison.

[Chart: Gabi's spdlog synchronous performance comparison]

Pretty impressive, isn't it? Let's take a look at how it compares as an asynchronous logger. After all, asynchronous loggers are usually preferable since they avoid the high latency you can see with synchronous loggers.

[Chart: Gabi's spdlog asynchronous comparison]

Wow! It completely blew the competition away on the total time to log 1 million log entries. Impressive.  It also hurt a little (since I wrote g2log some years ago). Ouch.

Now let's take Gabi's testing apart and see if spdlog is really as good as this testing showed. You will see that the test results above also hide some ugly facts that might be very, very bad for your system.

Now to the good stuff

In the test for logging 1 million log entries, the queue size that spdlog uses is 1,048,576 (2^20). Spdlog uses a fixed-size queue; with the queue able to hold more entries than the test ever logs, queue contention is minimized. If you change the queue size to 2^19 (524,288) the contention will increase, the performance will be worse, and the worst case latency will be terrible.

Even with a decreased queue size, spdlog is fast in the average case. However, that speed comes at a price and that price is, as you might already have guessed, worst case latency.

Worst Case Latency

Worst case latency is the highest latency (read: wait time) from when the code executes a LOG statement until it is ready for the next statement. I.e. it is the maximum time it takes for the logger to capture all the log details and put them onto a queue for another thread to deal with.

The "Worst Case Latency" is almost the sole reason why I decided that Google's glog wasn't good enough. It's almost the sole reason why I deemed all other loggers, at that time, to be lacking and why I had to create g2log.

Google's glog is what I (see previous link) called a pseudo-asynchronous logger. It uses a buffer so that most log entries are only buffered; only once in a while are they written to file. This makes glog an efficient logger compared to, for example, boost::log. Of course, once in a while one log entry pays the penalty for writing all of the previously buffered log entries to file. The average speed will be good at the cost of bad worst case latency.
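To illustrate the mechanism, here is a minimal sketch of the idea (not glog's actual implementation): most calls only append to an in-memory buffer, but the unlucky call that happens to trigger the flush pays for writing everything buffered before it.

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    // Minimal sketch of a "pseudo-asynchronous" logger. NOT glog's real code,
    // just an illustration of why one log call occasionally pays for all the others.
    class BufferedLogger {
     public:
      explicit BufferedLogger(const std::string& path) : out_(path) {}

      void log(const std::string& entry) {
        buffer_.push_back(entry);             // cheap for most calls...
        if (buffer_.size() >= kFlushLimit) {  // ...but this unlucky call
          flush();                            // pays for writing the whole buffer
        }
      }

      void flush() {
        for (const auto& e : buffer_) out_ << e << '\n';
        out_.flush();                         // disk I/O: the expensive part
        buffer_.clear();
      }

     private:
      static constexpr std::size_t kFlushLimit = 1024;  // arbitrary value for the sketch
      std::vector<std::string> buffer_;
      std::ofstream out_;
    };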

When I wrote g2log I made a comparison with Google's glog showing how the worst case latency looks when plotted. Here you can see that same comparison.

[Chart: g2log vs glog, 2 threads, 1 million entries each, zoomed in]

As you see, the two loggers are fairly equal. Actually, at the time of that writing glog (red) seemed to be faster for most log entries. Then there's this red fuzz at the bottom, far right. This red fuzz is the log entries using Google's glog that paid the price for all the other log entries. There are only a few of these, but they ranged from around 300 milliseconds to closer to a second.

This makes Google's glog unfit for many systems that are safety critical or mission critical. Imagine a missile warning system that has a hiccup of 500 milliseconds. Or how such a hiccup can affect software systems involved in brain surgery such as a radiation knife, or heart surgery, or an air traffic control system, or an air traffic collision detection and avoidance system. What system have you worked on that is high performing and only cares about the average case and not the worst case? They sure exist, but I have never worked with one in all of my professional C++ career.

For all the systems that I have worked with, an awesome average case is no good if the worst case latency is terrible.

Worst Case Latency: spdlog vs G3log

Let's look at some speed comparisons then, shall we? Let's measure each LOG call and save the results into microsecond buckets.
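The measurement idea is roughly the following (a simplified sketch of the approach, not the exact benchmark code; the g3log header and LOG macro are assumed, and any logger's log call could be dropped in instead):

    #include <chrono>
    #include <cstdint>
    #include <map>

    #include <g3log/g3log.hpp>  // assumes g3log's current header layout

    // One bucket map per measuring thread in the real test; microseconds -> count.
    using LatencyBuckets = std::map<std::uint64_t, std::uint64_t>;

    void timed_log(int counter, LatencyBuckets& buckets) {
      const auto start = std::chrono::steady_clock::now();
      LOG(INFO) << "benchmark entry #" << counter;  // the call we are measuring
      const auto stop = std::chrono::steady_clock::now();
      const auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
      ++buckets[static_cast<std::uint64_t>(us)];    // worst case latency = the highest key with a count
    }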

A word of comparison caution: depending on compiler options, performance results might differ. I ran with the following compiler options: "-O3 -fPIC -Ofast -m64 -march=native" and "-O3 -march=native". Spdlog seemed to run slightly better on consecutive runs, if the sun was shining and we had meatballs for dinner (if we didn't, then for some reason it ran slower).

Scenario 1: loggers writing a total of 1 million logs over 10 threads

This test is in spdlog's favor as the number of logs is less than the maximum queue size (1,048,576). An atomic counter is used to communicate across threads whether it's OK to write a log entry. Once the 1 million mark is reached the threads will exit.
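The structure of that test is roughly the following (a simplified sketch, not the exact test code linked below; the g3log setup calls follow its typical initialization and the log prefix and directory are placeholders):

    #include <atomic>
    #include <cstddef>
    #include <thread>
    #include <vector>

    #include <g3log/g3log.hpp>
    #include <g3log/logworker.hpp>

    constexpr std::size_t kThreads = 10;
    constexpr std::size_t kTotalLogs = 1000000;
    std::atomic<std::size_t> g_counter{0};

    void worker() {
      while (true) {
        const std::size_t mine = g_counter.fetch_add(1);  // claim the next log slot across all threads
        if (mine >= kTotalLogs) break;                    // 1 million entries in total, then the threads exit
        LOG(INFO) << "benchmark entry #" << mine;         // in the real test this call is timed per thread
      }
    }

    int main() {
      auto logworker = g3::LogWorker::createLogWorker();
      auto sink_handle = logworker->addDefaultLogger("bench", "/tmp");  // prefix and directory are placeholders
      g3::initializeLogging(logworker.get());

      std::vector<std::thread> threads;
      for (std::size_t i = 0; i < kThreads; ++i) threads.emplace_back(worker);
      for (auto& t : threads) t.join();
    }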

The test code with atomic counter can be found here: spdlog test code, g3log test code.
Results can be found here and in the chart below. Average times were pretty much the same between the loggers (g3log was slightly faster). When playing around with the testing I sometimes got noticeably faster average times for spdlog, but usually with very bad outliers in the graph.

Try one: compile options for both loggers “-O3 -fPIC -Ofast -m64 -march=native”. 1 million logs over 10 threads.
[Chart: spdlog vs g3log, atomic counter, "-O3 -fPIC -Ofast -m64 -march=native"]

Ouch. Around 10-11 ms worst case latency for spdlog. For some systems this type of hiccup is OK. For other systems a factor-10 drop in performance, once in a while, is unacceptable.

Try two: compile options "-O3 -march=native"

When compiling with less optimization, spdlog turned from staccato speedster to an Elon Musk style rocket. For the same scenario as above, 10 threads that log in total 1 million log entries, it will then look something like this (yet again focusing on the worst case latency): graph as seen below and also with results here. Spdlog didn't keep this consistent over tests though. At times it reverted back to 12,000 us (or more) for the worst case latency; however, most runs were fairly consistent. G3log was consistent for all runs.
[Chart: spdlog vs g3log, atomic counter, "-O3 -march=native"]

Nicely done spdlog. It is worth noting that g3log is a little slower when optimizing less. Spdlog on the other hand has a factor-10 improvement when the optimization changed: worst case latency decreases from around 11,000 us to 1,100 us. Unfortunately this wasn't consistent; as you will see, it sometimes reverted back to the sluggish worst case performance outliers.

Scenario 2: loggers writing a total of 1 million logs over 100 threads

With this scenario the contention to access the message queue increases, but the test is still in spdlog's favor since it does not exceed the queue capacity.

The spdlog results varied greatly. In most runs spdlog had its worst case latency around 1,000 us (1 ms), but sometimes it increased by a factor of 10+, i.e. 10,000+ us (10 ms). G3log was steady at around 500-600 us worst case latency.

This discredits my initial thought that the optimization change had a real effect on spdlog. Spdlog is definitely a bit temperamental. I continued to test this and, lo and behold, on one of my test runs spdlog got less than 100 us across all threads as worst case latency. That happened once. Unfortunately I didn't save the result and I have not been able to replicate it.

Results can be found here, and chart here.

[Chart: spdlog vs g3log, 100 threads, 1 million logs, atomic counter]

Scenario 3: 10 threads, each thread writing 1 million log entries

With this scenario the contention for access to the queue is kept up for a longer time. This scenario is not in spdlog's favor. Make no mistake however, this is a very challenging test for g3log as well.

The source code (g3log, spdlog) for this test is almost identical to the one given previously. We skip the atomic counter; each thread keeps track of where it's at, as shown in the sketch below. After a thread has logged 1 million log entries it will exit.
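The per-thread variant is, in spirit, a small change from the sketch in Scenario 1 (again a simplified sketch, not the exact linked code): drop the shared atomic counter and let every thread count its own one million entries.

    void worker_per_thread() {
      for (std::size_t i = 0; i < 1000000; ++i) {  // each thread writes its own 1 million entries
        LOG(INFO) << "benchmark entry #" << i;     // 10 threads => 10 million entries in total
      }
    }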

At this point I felt like mixing it up a bit. I tested both loggers with both compiler options and chose the compiler option that got the best result for each logger. For g3log it was "-O3 -fPIC -Ofast -m64 -march=native"; for spdlog it was once again "-O3 -march=native".

The result showed that the worst case latency was consistent for spdlog at around 10,000 us; for g3log it was 400 us to 1,000 us. Results can be found here.

[Chart: spdlog vs g3log, no atomics, 1 million logs per thread]

Scenario 4: spdlog only, 10 threads, each thread writing 1 million log entries, queue size of 2^19 instead of 2^20

For fun I also ran one test with spdlog set up to use a queue size of 2^19 (524,288) instead of its default of 2^20 (1,048,576).

    int queue_size = 524288;  // 2^19, half of spdlog's default of 2^20 (1,048,576)
    spdlog::set_async_mode(queue_size);

In the same test, with extreme contention on the smaller queue, the worst case latency became a huge bottleneck. In a scenario such as this, spdlog might benefit from instead moving to an MPMC dynamically sized lock-free queue, like MoodyCamel's queue.

I won't plot the result since it would be silly to compare this in a chart with g3log. I'll just put down the worst case latency numbers for you. The average case is still not terrible, but spdlog's worst case latency on the threads is troubling: close to 8 seconds of worst case latency when the queue contention got really bad. Ouch.

This result couldn’t be saved in microsecond format to Google docs due to the size so I’ll just paste in the summary of the result below.

Using 10 to log in total 1000000 log entries to noatomic-smallqueue-1-10
0 the worst thread latency was:7900 ms (7900444 us)]
1 the worst thread latency was:8 ms (8026 us)]
2 the worst thread latency was:6489 ms (6489388 us)]
3 the worst thread latency was:4 ms (4271 us)]
4 the worst thread latency was:7898 ms (7898381 us)]
5 the worst thread latency was:9 ms (9893 us)]
6 the worst thread latency was:5047 ms (5047866 us)]
7 the worst thread latency was:6 ms (6219 us)]
8 the worst thread latency was:436 ms (436103 us)]
9 the worst thread latency was:2934 ms (2934326 us)]
Total time :9375 ms (9375954 us)
Average time: 9.37595 us

If you don't use spdlog right and try to tune the queue (and do a bad job at it), and some thread runs amok with log entries, even for a short burst, you can expect wait times close to 8 seconds. Can you afford that? Can your high efficiency, life saving or mission critical system afford that?

Opinion: Fixed Sized Queue vs Dynamic Sized Queue

When I started testing spdlog, running exactly the tests that Gabi Melman used, the logger sometimes produced fantastic results. I was very pleased to see that, as I think we need more asynchronous loggers. Spdlog is extremely well made. Gabi's C++ skills are top notch and he has created a speed marvel. I was very curious to see what the worst case latency results would look like, so that's why I made this blog entry.

In the end, it mostly boils down to the choice of queue. My strong opinion is that it is rarely a good choice to go with a fixed-size queue if you can avoid it. If you choose a fixed-size queue you had better have a good way of dealing with a full queue.

The systems that need fixed-size queues are often embedded systems with scarce memory resources. Such systems are also very sensitive to worst case latency. It is my opinion that it is a bad, bad, bad idea to stall and wait in a contention scenario on embedded systems. It is likely better to pitch all logs until contention goes down (and then log a warning so that the contention issue is seen).

Of course, fatal logs should never be pitched. They are the last entries that will be seen before the system goes down. That is, if the logger, or some other part of the system, catches the fatal event (g3log does).
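To make the opinion concrete, here is a rough sketch of such a "pitch under contention" policy (my own illustration, not code from any of the loggers discussed; the bounded queue interface is hypothetical): non-fatal entries are dropped and counted when the queue is full, a warning is emitted once there is room again, and fatal entries always block until they fit.

    #include <atomic>
    #include <cstddef>
    #include <string>

    // Hypothetical bounded queue interface: any fixed-size queue with a
    // non-blocking try_push and a blocking push would do.
    template <typename T>
    struct BoundedQueue {
      bool try_push(const T& item);  // returns false when the queue is full
      void push(const T& item);      // blocks until there is room
    };

    enum class Level { INFO, WARNING, FATAL };
    struct Entry { Level level; std::string text; };

    class PitchOnContention {
     public:
      explicit PitchOnContention(BoundedQueue<Entry>& queue) : queue_(queue) {}

      void log(Entry entry) {
        if (entry.level == Level::FATAL) {  // fatal entries are never pitched:
          queue_.push(entry);               // they are the last thing seen before the system goes down
          return;
        }
        if (!queue_.try_push(entry)) {      // queue full: pitch the entry instead of stalling the caller
          ++dropped_;
          return;
        }
        const std::size_t lost = dropped_.exchange(0);
        if (lost > 0) {                     // contention eased: make the loss visible
          queue_.try_push({Level::WARNING,
                           "dropped " + std::to_string(lost) + " log entries under queue contention"});
        }
      }

     private:
      BoundedQueue<Entry>& queue_;
      std::atomic<std::size_t> dropped_{0};
    };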

If you are not on an embedded system then either skip loggers with fixed-size queues or make sure you can handle the potential queue contention if something goes awry.

Of course, system logs should never be written as frequently as I have done in the tests above. If you do, then it is way too much information to be of any use anyhow. Keep your logging to a minimum.

Platform

The tests were made on a MacBook, 2.8 GHz Intel Core i7, 16 GB 1600 MHz DDR3, with 1 TB flash disk storage.

c++ --version gave: Apple LLVM version 6.0 (clang-600.0.57)

Extra: G3log-lock-free

Spdlog uses a lock-free queue, which is the reason it is often blazingly fast.

For fun I tried a lock-free queue for g3log that doesn't follow the strict FIFO requirements of the current lock-based one. FIFO ordering is kept only on a per-thread basis. Beware: if this were employed in g3log it could limit the logs you see before a fatal event is caught (the fatal entry is the last one that is seen).

I have tried MoodyCamel's lock-free queue and it worked great with g3log. I used the simplest API for the queue and not the most efficient; my testing skipped thread tokens and bulk enqueue/dequeue. Using those enhanced APIs would complicate things but would also speed it up even more.
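By "the simplest API" I mean roughly this (a sketch; the token-based and bulk calls are exactly what my testing skipped):

    #include <string>

    #include "concurrentqueue.h"  // MoodyCamel's moodycamel::ConcurrentQueue

    moodycamel::ConcurrentQueue<std::string> g_queue;

    void producer_side(const std::string& formatted_entry) {
      g_queue.enqueue(formatted_entry);     // simplest call: no producer token, no bulk enqueue
    }

    void consumer_side() {
      std::string entry;
      while (g_queue.try_dequeue(entry)) {  // simplest call: one element at a time, no bulk dequeue
        // hand 'entry' over to the file sink here
      }
    }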

Using the lock-free queue made it overall much, much faster, except in the worst case latency scenario. To me, that's what matters, so I'll stick with Anthony Williams' mutex-protected queue [g3log version, Williams blog] for now.

If you want to test the experimental faster G3log version, you can see the gist here and just go ahead and use it. Beware of the changes to the fatal handling.

Please notice that MoodyCamel's queue is not public domain; see the gist for license details.

Extra: G3log Fatal Handling

G3log detects most fatal crashes (the infamous segmentation fault, to mention one) and logs the fatal event with a stack trace as its last message before it exits (Windows/OSX/Linux). If you decide to go with another asynchronous logger then feel free to rip out the crash handling in g3log and use it.

My opinion is that any asynchronous logger needs good crash handling, or else the developers will quickly be frustrated by the lack of feedback when the system is suffering.
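As a minimal illustration of the idea (this is not g3log's actual crash handler, which also captures a stack trace and installs its handlers for you): catch the fatal signals so that the last thing the process does is report why it died.

    #include <csignal>
    #include <unistd.h>

    // Minimal illustration only: a fatal-signal handler that pushes out one last
    // message before the process dies. write() is async-signal-safe; most logging
    // calls are not, which is part of what real crash handling has to solve.
    extern "C" void on_fatal_signal(int sig) {
      const char msg[] = "\n*** fatal signal caught, this is the last log message ***\n";
      write(STDERR_FILENO, msg, sizeof(msg) - 1);
      std::signal(sig, SIG_DFL);  // restore the default handler...
      std::raise(sig);            // ...and re-raise so the process terminates as expected
    }

    void install_fatal_signal_handlers() {
      std::signal(SIGSEGV, on_fatal_signal);
      std::signal(SIGABRT, on_fatal_signal);
      std::signal(SIGFPE, on_fatal_signal);
    }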

G3log is unlicensed, i.e. it is published as a public domain dedication, so you can borrow, copy and use any parts of it with no obligations towards me.

Of course, I really appreciate it if you drop me a line and tell me whether you like it or not, and any suggestions for improvements. You can reach me at Hedstrom at KjellKod dot cc.

Conclusion

My results showed that g3log was overall consistent with predictable, fairly low worst case latencies. My results also showed that spdlog was inconsistent, differing hugely between runs, with comparably unfavorable worst case latencies.

The test results showed that in completely terrifying nightmare scenarios g3log was still consistent with low worst case latencies, while spdlog exposed itself with many seconds' worth of wait time.

I am obviously just as biased as Gabi Melman is regarding performance testing the loggers. So before you make up your mind, test them for yourself.

Look at the compiler optimizations used, and think hard about the scenarios your logger might run into.

  • Is the average speed the most important thing?
  • Can you live with the worst case latency that the logger exhibits?
  • What are the worst case latencies in the performance scenarios you are likely to encounter? (Test them!)
  • What compiler options are you using?
  • Can you ever get into a scenario where there is heavy log contention — past a fixed queue size?
  • What is worse in an outlandish, extreme nightmare scenario with logging statements running amok on your system:
    1) the queue eating up RAM until the process crashes, but the log statements are fast, or
    2) the queue is fixed size, but the log statements can take many seconds to complete?
    3) Also, which one of 1) and 2) are you most likely to detect and correct?

The scary scenarios above are most likely just that, something extreme that might only matter in a philosophical discussion.

My view on the matter is that I would rather pick a very fast asynchronous logger with stable worst case latency even when contention is high (g3log), than a potentially even faster asynchronous logger with unstable worst case latencies that might peak at seconds of wait time.

The result for all the testing can be found here.



31 Responses to The world’s fastest logger vs G3log

  1. Ben aldrich says:

    Nice article. I guess we shouldn’t move away from g3logger just yet!

  2. Pingback: Quora

  3. Pingback: Quora

  4. Uli says:

    Detailed article, thanks! Do you have any plans to incorporate the “C++ Format” library? It could be the last reason to prefer spdlog over g3log.

    • kjellkod says:

      I hope improved worst-case performance and not losing logs in case of critical errors (segfaults etc.) are also good arguments to prefer g3log over spdlog 🙂
      Are you looking forward to improved performance or do you just prefer the API of the C++ Format library?

      I have pondered writing a C++ Format-like library which would be as conformant as possible to the printf-type API formatting that already exists within g3log.
      I.e. today you can already do
      LOG(INFO) << "some text"; or you can use the printf-like API: LOG_F(INFO, "some %s", "text");

      However most people are comfortable using the "<<" operators so I have not yet bothered to do so.
      The C++ Format library (https://github.com/cppformat/cppformat) is not public domain so bringing it in would disrupt the licensing.

      • Uli says:

        I have just recently been made aware of the talk “How NOT to Measure Latency” by Gil Tene that gives good reasoning why to prefer worst-case performance measurements over average (or percentile) performance measurements. Until then I was undecided whether your approach or the approach of spdlog is more reasonable. So yes, your arguments are compelling.
        I had the nice API in mind as the speed of g3log already is first-class. I find it sometimes cumbersome to format numbers and I have to log a lot of measurement data. I can only speak for me, but if most people are satisfied with operator<<, as you say, why are C++ Format and spdlog so popular (e.g. forks on github)?

      • kjellkod says:

        The author of spdlog did great PR for his library at reddit. It is also a piece of real nice code.

        If you don't like the streaming API then just use the printf-like API of g3log.

        That API was written exactly for the use case you are mentioning.

  5. patr says:

    Did not look at the design. I assume you are aware of the idea of LMAX disruptor?

    • kjellkod says:

      Thanks @patr. I am, although I have not yet tested it. I could be wrong but I believe LMAX is best when the number of threads is known in advance, right?

      For a logger that is rarely the case. That's why g3log *still* uses a lock-based queue: the potential disadvantages of lock-free queues far outweigh the benefits in the areas where g3log is mostly used.

  6. Rafael Gago says:

    “In a scenario such as this spdlog might benefit from instead moving to a mpmc dynamically sized lock-free queue, like MoodyCamel’s queue.”

    IMO that queue is a prime example of what a lockfree queue shouldn't be: complex, hard to verify, and trying to pack in features, move constructors, exception safety and all that C++ ugliness. Dmitry's queues are light years above in quality.

    In lockfree code, simplicity and ease of review are among the most desirable features. Lockfree code should be kept specialized and feature restricted as it is. The combinatorial explosion when adding features (trying to generalize) is not worth the risk. Think about finding a potential bug in MoodyCamel's queue: it is impossible for a human brain to cover all the interleavings that such code can have, and even by using validating tools it is very time consuming to process all the output they generate.

    The fact that everyone raves about that queue makes me very scared. If a beginner bases their application on that type of complex lockfree code, which they themselves aren't able to verify, what happens if it turns out that the code is _very_ wrong and has _very_ nasty bugs?

    And this is my personal opinion, the author tends to brag a lot on his github with phrases like:

    “This queue not only has less limitations than others (for the most part), but it’s also faster. It’s been fairly well-tested, and offers advanced features like bulk enqueueing/dequeueing (which, with my new design, is much faster than one element at a time, approaching and even surpassing the speed of a non-concurrent queue even under heavy contention).”

    “In short, there was a lock-free queue shaped hole in the C++ open-source universe, and I set out to fill it with the fastest, most complete, and well-tested design and implementation I could. The result is moodycamel::ConcurrentQueue :-)”

    And then you look at the code: it is based on parallel good old SPSC ringbuffers, leaving the queue non-linearizable and hence unusable in contexts where you aren't in control of all the threads. The cases where one is in control of the whole architecture are exactly the ones where one can tailor the most efficient message passing mechanism oneself (e.g. SPSC queues on Thread Local Storage?).

    After the wall of text, the most important thing related to the paragraph above: as that queue isn't linearizable it's unusable for a data logger, because the timestamps can travel back and forth in time in batches (not as individual elements contending for access to an MPSC or mutex-based queue, which, if out of order, show very similar timestamps).

    BTW the fact that it isn’t linearizable adds jitter to the dequeuing too.

    So for usages where one isn't in control of the threads it could be usable for things like a memory allocator, object recycling, etc. In such scenarios it is better to have thread local queues, as jemalloc, tcmalloc, etc. successfully do. Those are exactly the scenarios where you have total control and you can build an efficient message passing mechanism without using a generic "queue" (note the quotes, as a queue without ordering can't be called a queue).

    What spdlog should have done is to have two different MPSC queues, the fast bounded array based queue for the fast path and then to switch to a node based one using dynamic memory when the main queue is full.

    Requiring 1,000,000 entries for a test to show good numbers is questionable too. How many MB are we talking about for just a data logger? This is a hefty memory requirement.

    I know that synthetic tests are synthetic, but isn’t 100 threads totally unrealistic to test? Any sane program will try to use a number around the machine core number.

    • kjellkod says:

      @Rafael very good analysis.

      “After the wall of text, the most important thing related to the paragraph above, as that queue isn’t linearizable it’s unusable for a data logger, as it isn’t linearizable the timestamps can travel in time back and forth in batches”

      I agree that the batch output is annoying but I don’t agree that it makes it unusable. Per thread the time will never go backwards.

      For the record. Obviously I was not impressed by either lock-free queue enough to start using them with g3log. In my opinion the advantages are dwarfed by the disadvantages. I hope I made that point clear in my blog.

      —-
      Let me digress for the moment and explain the motivation for making an asynchronous logger as such a logger will have some interleaving of threads (and possibly time stamps).

      For g3log the threads will interleave up to the point where the entry is inserted into the lock-based queue. If you cannot come to terms with that, then a synchronous logger is obviously better.

      For the systems I have worked on, all my professional life, synchronous loggers were bad. The sometimes extreme latency when accessing the disk makes them unusable for many safety critical or system critical devices.

      Often, for such devices, logging is turned off in production and only used when in development.

      G3log was created to give these devices production-enabled logging: fast *enough* to be usable, but without the extreme latency found in synchronous loggers.

      Asynchronous logging, however, should IMHO be paired with crash handling, as otherwise fatal crashes will leave the developers clueless, potentially forcing a temporary switch to synchronous logging and thereby introducing new timing artifacts that may hide the bug.


      enough self promotion of g3log and back to your comments

      Please note that the test setup was almost identical to the test setup that Gabi Melman used himself. 100 threads and atomic counters were his idea, not mine.

      I made the testing and my blog as a response to his own performance testing and comparison, as I was curious about spdlog and realized that his tests were hiding spdlog's potential for extreme latency.

      I tried to make it clear in the article where I changed the tests outside of the test setup that Gabi made.

      • Rafael Gago says:

        I agree about the extreme latency of a synchronous logger; I formulated my thoughts vaguely, my bad.

        What I meant is that most regular user applications aren't time critical. And if they are properly designed they shouldn't log every single detail of what they do unless they are at the debug/tracing log level. So at the highest error levels (e.g. critical) it may be more important to be sure that the entry gets logged than to be fast. Of course taking care of signals is the right thing to do, but this doesn't cover every situation.

        Anyways, we are speaking about nuances here and I’m sure that we agree on each other.

        As you are aware, no logger mentioned here (even mine that I sent you by PM) is suitable for a critical (hard-realtime) system (robots, industrial control, etc.), as most of them e.g. use the heap at runtime, non-wait-free algorithms (e.g. Dmitry's queues), exceptions, and non-realtime operating systems without strong priority and timing guarantees.

        I forgot to say that you did a beautiful logger. Congrats.

      • kjellkod says:

        Thanks @Rafael 🙂

        I would maybe add that if it is a non-time-critical application then most (not all) reasons for using C++ go away.

      • kjellkod says:

        FYI:

        I totally agree btw regarding the critical hard-realtime systems.

        However, most "hard-realtime" systems are not "that hard". I.e. they will use the heap and can use 3rd party libraries (such as a logger) if it can be proven by extreme testing or similar that they do not break the hard timing requirements.

        Usually those systems are running in “dual mode” so that in case there is a hiccup the twin process can take over.

  7. Rafael Gago says:

    Just as a curiosity, I wrote a small logger one year ago that was way faster than spdlog at the time (and I guess that it still is). The spdlog author took some core ideas into his project at the time (such as using Dmitry's MPMC queue when his project was mutex based). At that time he was playing with the MoodyCamel queue and it was found unsuitable, as I wrote in the previous comment.

    In my case I needed a highly specialized logger to debug communication protocols without affecting the caller threads' timings (microsecond range) too much, and to get the low enqueueing latency I just started removing features that I didn't need for my use case, rendering it useless as a generic logger.

    I removed string formatting from the caller thread (Pantheios "inspired" me), limited it to just accept literals as format strings, no ostreams, etc. Some of these features are exactly what one expects from a C++ logger. There is no jack-of-all-trades in engineering.

    Due to the feature removal one couldn’t get much faster. I stopped improving it when I noticed that the call to get the timestamp was the bottleneck. It doesn’t have latency spikes and can be configured to do dynamic allocations when the bounded queue (fastpath) is full.

    It's funny to make a fast logger, it really is, but reliability, ease of understanding and modification by third parties, and user friendliness are more important. A good logger doesn't need to be crazy fast, it just has to avoid being too slow/intrusive.

    People shouldn’t be using asynchronous logging when a synchronous logger can do the job. One uses synchronous loggers always and asynchronous exceptionally.

    If you want to take a look at the logger I'm talking about I can send you the project in a PM; I just don't find it elegant to post my code here, as it could be interpreted as self-promotion and that is not my intention.

    • kjellkod says:

      Please send me that PM or just drop me an email. Hedstrom at kjellkod dot cc.
      I replied in your first post regarding asynchronous loggers and their use cases.
      To summarize it short
      1) life, safety or system critical that cannot have extreme latency lags –> asynchronous
      2) asynchronous loggers should have fatal crash handling

  8. Rafael Gago says:

    BTW, I noticed that you are sharing a counter between threads on your tests, this would screw-up the performance of a really fast logger due to false sharing/cache line bouncing.

    • kjellkod says:

      That's Gabi's idea, not mine. I just tried to compare "apples to apples".
      In the performance test that comes with g3log there is no such ugly counter.
      Funnily enough, the atomic counter in that test might actually boost spdlog as it would remove some contention from the lock-free queue.

      • Rafael Gago says:

        Sorry, my bad.

        Anyways, I find it very bad, but I honestly think that he just did it without bad intentions. Just lack of knowledge at the time that those tests were written.

  9. Meri-Cloud says:

    Great blog.
    BTW, It’s worth having a look at https://github.com/Iyengar111/NanoLog
    The benchmark there shows NanoLog is 5 to 10 times faster than spdlog!

    • kjellkod says:

      Cool! I love it that we are seeing more async loggers. It seems to be a well made logger for sure

      Sadly the benchmarks show only "average" latency, which for all the time-sensitivity-savvy developers out there is an interesting but also pointless benchmark without worst case latencies.

      • Meri-Cloud says:

        NanoLog takes around 100 nano seconds to log a line.

        Not sure if its possible to accurately measure that kind of latency with standard library functions.

        For example, gettimeofday / clock gettime on linux takes ~40nanos. Just a simple timestamp::now() is 40% of the entire time taken to generate and push the NanoLog into the async ring buffer. The error in measuring individual log line latency is simply too high.

        The benchmark measures the time taken to insert 1000 log lines (around 100microseconds) which is measurable with chrono timestamps.

        If you have an accurate way to measure an operation which takes ~100 nanoseconds, ideas are welcome 🙂

        Thanks.

  10. kjellkod says:

    @Meri-Cloud

    The nano second measurement is to push onto a memory buffer without much or any contention.

    It’s super cool and awesome but also not relevant for time critical applications unless the worst case can be measured. Even if the measuring itself adds overhead this is still vital information.

    Obviously the average number can be discarded during such a measurement. It’s only the worst cases (like for 10 million writes, the worst 100 writes).

    Please remember from my blog article above that the average, although interesting, doesn't tell you how it performs under bad conditions.

    For NanoLog this could be
    1. The queue filling up
    2. Waiting during buffer -> write -> file
    3. Multiple writers during 1-2.

    I think I explained this pretty well in the blog article above. Please take another look at it. For a logging library to be taken seriously this is key information to share.

    • kjellkod says:

      I reached out to the author of the log.

      It is super fast but it also doesn't provide the strict contract that spdlog or g3log does:

      1. There is no queue contention handling to ensure messages are not lost
      2. Log messages can overwrite other log messages.

      So… Fast but no guarantees. I’m sure this is appealing but it also should disqualify the logger for any systems where regulatory or system expectations are that no log entries are lost.

      Say 😉 if the "loose" FIFO requirement were also dropped from this logger then he could get it even faster.

  11. Rui Pacheco says:

    I posted a link to this test on spdlog’s github account and the author responded with tests of his own: https://github.com/gabime/spdlog/issues/293

    • kjellkod says:

      I have no bandwidth to redo the performance tests for some time, but the result is interesting. I have not followed spdlog for a while and maybe the contention issues for the queue are now resolved. I urge people that are choosing between them to look at the performance tests I made, as well as the ones spdlog's author made, and redo them themselves if speed and low worst case latency are important.

      Then of course look at the other qualities of spdlog and g3log and see what best fits your needs.

      G3log is the 2nd generation (it started with g2log) and it’s awesome that now, finally we start to see more async loggers out there.

      spdlog and g3log are well made loggers that have different priorities in mind. I would love to see fatal crash handling dealt with in spdlog too.

      … Now, that said I wonder if I should start drafting g4log for 2017 😉

  12. Karthik says:

    Hi,

    Have a look at https://github.com/Iyengar111/NanoLog#latency-benchmark-of-guaranteed-logger

    Comparison shows it is faster than spdlog and g3log and provides same strong guarantee that log lines will never be dropped.

    Karthik

    • kjellkod says:

      That’s not correct.

      From the author himself

      "In terms of the design, when the ring buffer is full, the producer just overwrites the log line in that slot [i.e. does not wait for consumer thread to pop the item].
      In summary – the old log line will be lost when the buffer is full."

      I.e. logs will be dropped when
      1. The buffer is full
      2. A fatal event happens

      This log speed test is also as flawed as spdlog's test. It pushes LESS than what the buffer holds, so even if the consumer didn't consume A SINGLE log line it would be fine.

      This way the test is rigged to have the absolute minimum amount of queue contention. He's just writing to a memory buffer without even triggering disk I/O or queue issues at all.

      To me this is not a serious benchmark at all

      • Karthik says:

        Please read the link again. NanoLog has 2 policies, guaranteed and non guaranteed. It is very clear the comparison refers to guaranteed logging. No log messages are ever dropped in guaranteed logging.

  13. Kjell H says:

    @Karthik (or @Iyengar111). Great library. What is obvious from spdlog and NanoLog is that with only speed in mind (and not crash safety etc.) the log handling can be far more optimized.

    For example: https://github.com/KjellKod/lock-free-wait-free-circularfifo (SPSC) is 11-14 ns, while on the computer I tested it on the mutex-protected queue was 2-3 us. So just replacing the queue would make a huge difference.

    The other slowdown on g3log is the fact that it is in essence a double-serving queue. It is a command queue and a log queue. The commands are wrapped in a std::function and so is the log lines. Obviously that is a big possible improvement.

    So thanks for the inspiration. When I get time I want to change out the queueing for a dynamic lock-free MPSC and of course to split the command and the log queue. That way I think g3log can retain the crash safe qualities while getting a significant speedup (at least that’s what I see from some quick hack tests).

    — Going forward. I think I also would like to decrease the overall complexity. g2log was made at the infancy of c++11 and made cross-platform. If I stick to *nix, C++ code for (g4log?) then I think the complexity can be greatly reduced.

    Sweet. It’s nice with some competition 🙂
    Keep up the good work.
