The simplest way to increase the performance of a program is to use more efficient libraries. Well, at least in Haskell, where refactoring is easy and strong common abstractions are shared by most of the ecosystem. In this post, I will make trivial changes to the first implementation and measure the associated gains.
I used the RTS option `-s` to measure performance for this section. Here is the output for the original (`parsec`-based) version:
```
 215,523,656,344 bytes allocated in the heap
  33,301,747,648 bytes copied during GC
   7,227,434,968 bytes maximum residency (15 sample(s))
     209,338,048 bytes maximum slop
           14542 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
 Gen  0    417618 colls,     0 par   24.950s  24.940s     0.0001s    0.0016s
 Gen  1        15 colls,     0 par   23.081s  23.115s     1.5410s   10.3409s

 INIT    time    0.000s  (  0.000s elapsed)
 MUT     time   43.614s  ( 43.598s elapsed)
 GC      time   48.031s  ( 48.055s elapsed)
 EXIT    time    4.061s  (  4.087s elapsed)
 Total   time   95.708s  ( 95.741s elapsed)

 %GC     time      50.2%  (50.2% elapsed)

 Alloc rate    4,941,573,670 bytes per MUT second

 Productivity  49.8% of total user, 49.8% of total elapsed
```
The most important metrics here are the total memory use and total time elapsed. I ran the programs several times to warm the file cache and picked one output.
Speed and memory usage were pretty stable, but these are not very precise benchmarks. They should be more than sufficient for this study though!
A faster parser combinator library : attoparsec
`parsec` isn’t fast, and `attoparsec` is much faster.
Off the top of my head, the most notable differences between the two are:
- `parsec` has actual error messages. With `attoparsec` it is almost always something like “satisfy failed”, without the error location. This is the reason I almost always write a new parser with `parsec`, and only switch to `attoparsec` once the bugs have been ironed out. If error messages are paramount for your application, I suggest taking a look at `trifecta` or `megaparsec`.
- `attoparsec` is much faster than `parsec`, and uses a lot less memory. By the way, if you are in the never-ending intermediate phase of Haskell proficiency, I really suggest that you learn about Church-encoding-like transformations. They are at work behind the scenes, and we will make use of them in a subsequent episode.
- `attoparsec` is always backtracking. This means you don’t need to use the `try` combinator.
- With `attoparsec` you can grab consecutive characters efficiently. With `parsec`, you will get a stream of `Char`s that will need to be repacked, whereas `attoparsec` can extract slices of the parsed text.
- `attoparsec` has a modern interface. Code is generally more idiomatic, mostly thanks to the use of the standard typeclasses.
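As a minimal sketch of the slicing point above (this assumes the `attoparsec` package is available): `takeWhile1` hands back a slice of the input `Text`, where the `parsec` equivalent, `many1 (satisfy ...)`, yields a `[Char]` that has to be repacked.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: attoparsec's takeWhile1 returns a slice of the input Text
-- without rebuilding it character by character.
import Data.Attoparsec.Text (parseOnly, takeWhile1)
import qualified Data.Text as T

-- grab everything up to the first space, as a Text slice
field :: T.Text -> Either String T.Text
field = parseOnly (takeWhile1 (/= ' '))

main :: IO ()
main = print (field "drwxr-xr-x 2 root")  -- Right "drwxr-xr-x"
```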
Other alternatives are `trifecta` (quite slow, but with great features and error messages) and `megaparsec` (a fork of `parsec` with a focus on modernizing its interface, and with better error messages). We recently ported a project to `megaparsec` without any hurdle.
There is a questionable choice where a pair of `parseInt`s have been turned into `scientific`. We will see in subsequent episodes that this had a negative performance impact.
On the performance side, this change divided the execution time by 4.88 and the memory usage by 4.61. That is, with my sample output, 19.61s of clock time and 3154MB of memory.
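To illustrate the cheaper alternative, here is a sketch of reading a numeric field directly as an `Int`. I use bytestring's `readInt` for illustration; the post's actual parser is attoparsec-based.

```haskell
-- Sketch: parsing a decimal field straight to Int, with no
-- intermediate Scientific value to convert afterwards.
import qualified Data.ByteString.Char8 as B

parseSize :: B.ByteString -> Maybe Int
parseSize s = fst <$> B.readInt s

main :: IO ()
main = print (parseSize (B.pack "4096"))  -- Just 4096
```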
Reducing the memory footprint : thyme
The change is very localized, but resulted in a small speed boost, and 1GB of memory saved! The reason `thyme` is more memory efficient than `time` is that it stores the `UTCTime` as an `Int64`, instead of the more involved structure that `time` uses.
With this change, the program currently uses 1975MB of memory with the sample input. This sample input only weighs about 120MB for 570k entries, so I expected much better efficiency at that stage. Of course, unevaluated thunks are crippling the program, but this will be handled in the next episode!
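The idea can be sketched like this (the names below are illustrative, not `thyme`'s actual API): a time stored as a single `Int64` is one unboxed machine word, against `time`'s `UTCTime`, which pairs a `Day` with a `DiffTime`.

```haskell
-- Illustrative sketch: a compact Int64-based time representation.
import Data.Int (Int64)

newtype MicroTime = MicroTime Int64  -- microseconds since some epoch
  deriving (Eq, Ord, Show)

-- differences stay plain Int64 arithmetic
diffMicro :: MicroTime -> MicroTime -> Int64
diffMicro (MicroTime a) (MicroTime b) = a - b

main :: IO ()
main = print (diffMicro (MicroTime 10) (MicroTime 3))  -- 7
```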
From Text to ByteString
The input is readable text, so the correct data type to represent it is `Text`. However, as I do not need to perform any text manipulation and am not too concerned about properly handling non-ASCII characters, it might make sense to move to `ByteString`.
The change is trivial, and gives a nice speed boost.
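A minimal sketch of why `ByteString` helps here: `span` returns a zero-copy slice of the underlying buffer instead of a `[Char]` that has to be repacked.

```haskell
-- Sketch: splitting off the first space-separated field as a
-- zero-copy ByteString slice.
import qualified Data.ByteString.Char8 as B

firstField :: B.ByteString -> (B.ByteString, B.ByteString)
firstField = B.span (/= ' ')

main :: IO ()
main = print (firstField (B.pack "1234 /etc/passwd"))
-- ("1234"," /etc/passwd")
```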
In the actual program, I wrote the `UnixFileGen` type so that it was parametrized over the user and group names, file path, and link target fields. That way, the parsing and initial analysis would use `ByteString` for efficiency, and the files flagged as problematic would be converted to `Text` for further processing.
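A hedged sketch of such a parameterized type (the field set below is illustrative, not the full `UnixFileGen` from the post): deriving `Functor` makes the `ByteString`-to-`Text` conversion a single `fmap`.

```haskell
{-# LANGUAGE DeriveFunctor #-}
-- Sketch: one record type, two instantiations. Parsing works on
-- ByteString fields; flagged files are converted to Text.
import qualified Data.ByteString.Char8 as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

data UnixFileGen a = UnixFileGen
  { fileUser  :: a
  , fileGroup :: a
  , filePath  :: a
  } deriving (Show, Functor)

type ParsedFile  = UnixFileGen B.ByteString
type FlaggedFile = UnixFileGen T.Text

toFlagged :: ParsedFile -> FlaggedFile
toFlagged = fmap TE.decodeUtf8

main :: IO ()
main = print (toFlagged (UnixFileGen (B.pack "root") (B.pack "wheel") (B.pack "/etc")))
```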
A note about the original C module
The original C module did the same parsing, but stored all data in an SQLite database for further processing with Ruby. In order to capture the performance difference between my Haskell implementation and the C program, I modified it with two variants:
- The in-memory database version (dubbed `mem`): this would do the parsing, and insert all the files in an in-memory database. This is a bit unfair compared to my program, as I don’t need to serialize to external storage.
- The `discard` version: this would just parse the file and not do anything with the result. This is also a bit unfair, as my program stores the actual results in memory for further analysis.
| Version | Time (s) | Memory (MB) | Gain over first version (CPU / RAM) |
|---|---|---|---|
| attoparsec | 19.609 | 3154 | 4.88x / 4.61x |
| thyme | 18.302 | 1975 | 5.23x / 7.36x |
| bytestring | 13.890 | 1959 | 6.89x / 7.42x |
| original C (mem) | 12 | | |
| original C (discard) | 2.7 | | |
As we can see, with a sensible choice of libraries we approach the execution speed of the in-memory version of the original C program, at the cost of only trivial changes.
In the next episode, I will introduce actual optimisations to the program that will only slightly affect its readability and maintainability.
Can we beat the `discard` version of the C program? Can we optimize much more without sacrificing readability and maintainability? Without rummaging in the Core? All of these questions will be answered in the final episode!