PCAP import - a crash and a flash

A few months ago an Ostinato customer reported the Ostinato GUI crashing when importing a large PCAP file (~600MB).

After a few failed attempts at reproducing the error on Linux, I realized the customer was using Windows and switched to that.

I was able to reproduce the crash with Ostinato on Windows, with intelligent import both enabled and disabled, and set out to debug it.

I started with intelligent import disabled since it would be easier.

There was no error of any kind. The GUI just closed and the process was gone.

To make matters worse, the crash happened at 99% and it took almost 50 minutes to reach that point. It was going to be a pain to debug if every repro took an hour!

So the first order of business was to code up a hack to reduce the import time - while still being able to reproduce the crash. And that’s just what I did.

Phew! At least now I could run iterations without having to wait for an hour each time! Back to debugging. gdb just reported a SEGV and that the program had terminated. I couldn’t even determine where in the code it was crashing.

printf-debugging wouldn’t work because we were importing millions of packets in a loop - not that I didn’t try!

Out of Memory?

I suspected we were running out of memory, but there was enough free memory in the system (32GB RAM) and Task Manager showed a usage of only ~1GB. After a day or two of debugging I was really out of ideas, so I decided to park it for some time and promptly got distracted with other Ostinato bugs and features.

Until 2 weeks ago when another customer reported the same issue. And so I got back to it once again - but still no luck!

Till I had the brainwave of using catch throw in gdb!

It did turn out to be a memory issue as suspected - an exception was being thrown from the C++ new memory allocator. But I still didn’t get it - there’s a lot of memory free in the system and the process is using only about 1GB as per task manager!
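The failure mode is easy to see in isolation. The sketch below is not Ostinato code; `allocationThrows` is a hypothetical helper that asks the global `operator new` for a given number of bytes and reports whether the standard `std::bad_alloc` exception came back. Running something like this under gdb with `catch throw` set should stop at the throw site.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper (not Ostinato code): returns true if a request for
// `bytes` bytes makes the global operator new throw std::bad_alloc.
bool allocationThrows(std::size_t bytes)
{
    try {
        void *p = ::operator new(bytes);  // throws std::bad_alloc on failure
        ::operator delete(p);             // request succeeded, release it
        return false;
    } catch (const std::bad_alloc &) {
        // With `catch throw` set, gdb stops where the exception is thrown,
        // before control ever reaches this handler.
        return true;
    }
}
```

On typical systems an absurdly large request such as `allocationThrows(static_cast<std::size_t>(-1) / 2)` should fail this way, while a small request should not.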

A real head-scratcher.

Virtual memory address space?

After more googling and asking ChatGPT, I got another hint - check the virtual address space of the process! A 32-bit Windows binary has a 4GB virtual address space, of which only 2GB is available to user space by default. The memory use reported by Task Manager was titled “Working set (memory)”, which as per Task Manager itself means “Amount of physical memory currently in use by the process”.
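The arithmetic behind those numbers can be written down as a small sketch. The function names here are made up for illustration, and the half-and-half split is the default for 32-bit Windows processes that is described above, taken as an assumption rather than queried from the OS.

```cpp
#include <cstdint>

// Total virtual address space for a given pointer width (valid for bits < 64).
constexpr std::uint64_t addressSpaceBytes(unsigned pointerBits)
{
    return std::uint64_t{1} << pointerBits;
}

// Assumption: default 32-bit Windows layout, where user mode gets half of
// the address space and the kernel gets the other half.
constexpr std::uint64_t defaultUserSpaceBytes32()
{
    return addressSpaceBytes(32) / 2;
}

// True when this code is compiled as a 32-bit build (4-byte pointers).
constexpr bool is32BitBuild()
{
    return sizeof(void *) == 4;
}

static_assert(addressSpaceBytes(32) == 4ULL * 1024 * 1024 * 1024, "4GB total");
static_assert(defaultUserSpaceBytes32() == 2ULL * 1024 * 1024 * 1024, "2GB user");
```

A check like `is32BitBuild()` is a cheap way to tell at a glance which kind of binary a given toolchain is producing.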

So I ran procexp (Process Explorer) and, to my good luck, it reported the virtual address space usage of the process! I ran the PCAP import again and watched the virtual size slowly increase from the base 1GB as we imported more packets, till it crashed with procexp showing a virtual size of ~2GB!

Ostinato crashes when Virtual Size reaches 2GB

This also explains why I wasn’t able to repro on Linux - my Linux dev env is 64-bit while my Windows Ostinato dev env is 32-bit!

My Windows 10 itself is 64-bit but my Qt/MinGW toolchain is 32-bit for historical reasons. There was never an incentive to upgrade it to 64-bit. Setting up a new development environment with all the dependencies would take a couple of days (or more!), which is why I had never convinced myself to do it before now!

When I explained the problem to the customer, he was happy to switch to Linux to make progress (and he did!). So I bought myself some breathing time. And I decided to do some more experiments to better understand the problem and handle it gracefully.

Try and Recover

Unfortunately all my attempts at using a try…catch block failed to catch the exception and went straight to SEGV (segmentation fault).

I wasn’t sure what was happening. But one thing I did see was that the exception can happen in any thread, not just the one that is doing the pcap import and consuming the bulk of the memory. This is because the app and Qt in other threads could be doing their own heap allocations, which could trigger the same exception.

So unless I can try…catch all the threads, it’s not of any use!

Which brings us to the question - even if we wrapped try...catch around all threads - what do we gain?

To be able to identify the problem and handle it gracefully, try…catch will have to be granular, not at the very top of QThread::run() - the latter has no hope of recovery.

The only way to handle this seemed to be to monitor memory use ourselves in the thread during pcap import and avoid the crash altogether by aborting the import if we reach, say, 90% of the memory available to the process.
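As a rough sketch of that idea (not something Ostinato implements), the import loop could charge each packet against a budget and stop once 90% of it is used. `ImportBudget` is a hypothetical class, and the limit it is given is an assumption the caller has to supply, for example 2GB for a 32-bit Windows build. It only counts what the import thread itself reserves, so allocations made by other threads go unnoticed.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Ostinato): tracks bytes used by an import
// against a caller-supplied limit and refuses requests beyond 90% of it.
class ImportBudget
{
public:
    explicit ImportBudget(std::size_t limitBytes)
        : threshold_(limitBytes / 10 * 9), used_(0) {}

    // Returns true and records the bytes if the request fits under the
    // threshold; returns false (recording nothing) if the import should abort.
    bool tryReserve(std::size_t bytes)
    {
        if (bytes > threshold_ - used_)  // cannot overflow: used_ <= threshold_
            return false;
        used_ += bytes;
        return true;
    }

    std::size_t used() const { return used_; }

private:
    std::size_t threshold_;  // 90% of the assumed limit
    std::size_t used_;       // bytes reserved so far
};
```

An import loop would call `tryReserve()` with each packet's size before storing it and abort with a readable error on the first `false`, instead of waiting for the allocator to fail.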

Given that there is no portable cross-platform way to find out how much free memory is available to a process, this is a fool’s errand for little to no gain. Which leads me to the following conclusion.

We’ll just upgrade Ostinato for Windows from 32-bit to 64-bit going forward!

Hack to Fix for speed

But it wasn’t all in vain.

I cleaned up the hack I wrote to reduce the PCAP import time and converted it into a proper optimization. A 500MB PCAP file with 450,000 packets of average size 1218 bytes is now imported in a flash!

Note this is only for raw PCAP import (i.e. when you uncheck the intelligent import option - which is by default checked when you open a PCAP file in Ostinato). With raw pcap import, packets are just read in as hex dumps into Ostinato.

https://userguide.ostinato.org/images/pcapImportOptions.png

I also reduced the time taken to delete existing streams (when importing a PCAP file in overwrite mode). As a side effect, directly deleting Ostinato streams in the GUI should also be MUCH faster now!

For intelligent PCAP import (aka via PDML) I intend to adopt a different strategy of writing a multi-core standalone converter program for the command-line that can be used by the Ostinato Python API as well.

Stay tuned for that!

Do you have any pet peeves while using Ostinato? Let me know in the comments below!
