WAX node performance, Part II: memory-constrained environment

cc32d9
3 min read · Jun 12, 2021

In my previous post I described the approach that we’re using on our production servers: physical RAM is larger than WAX state size, and the nodeos state is stored in a tmpfs partition.

But WAX RAM consumption is growing quite fast: thousands of new accounts are created daily, and the thriving NFT market is minting new assets all the time. What will happen when the server RAM cannot be extended any further?

I tried to run an experiment, and it went pretty well. I took a physical server with 16GB RAM and an Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz, added a 128GB swap partition, and created a tmpfs partition in /etc/fstab:

tmpfs   /srv/wax01/data/state  tmpfs rw,nodev,nosuid,size=80G,x-systemd.after=zfs-mount.service 0  0
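After adding the entry, the tmpfs can be mounted right away (assuming the mount point directory already exists):

mount /srv/wax01/data/state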

Then I started a WAX node with --data-dir /srv/wax01/data from a recent snapshot.
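The exact command depends on your layout; the snapshot file name below is just a placeholder for whatever recent snapshot you have downloaded:

nodeos --data-dir /srv/wax01/data --snapshot /srv/wax01/snapshots/snapshot-latest.bin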

Current WAX state is about 32GB in size.

At node startup, once the state grew larger than the available RAM, the SSD was loaded at 100% with I/O traffic. This is understandable: all memory pages are freshly written, so it's difficult for the kernel to decide which page should be swapped out, and the result is a lot of reading and writing. The startup took a few minutes longer than on a normal server.

Then it started syncing, still keeping the SSD I/O at a high level, but it caught up with the head block quite quickly. I didn't measure it carefully, but the resyncing speed was around 35 blocks per second.

After catching up, it ran pretty smoothly, and the SSD stayed at less than 4% load, with about 20–30 reads or writes per second on average.
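A simple way to watch the disk load is iostat from the sysstat package; with one-second intervals it shows per-device utilization and read/write rates:

iostat -x 1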

Of course, if you query an account or a table that has not been transacting for a while, the node has to load the data from swap. So "cleos get account" took 0.2s on the first run, and 0.01s on subsequent queries. Also, if you push transactions to such a node, some of them may fail on the first attempt because loading the data takes longer than usual.
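You can reproduce this kind of measurement with the shell's time builtin; the API endpoint and account name here are only examples:

time cleos -u http://127.0.0.1:8888 get account eosio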

Then I used the stress-ng utility to see how the node tolerates reduced RAM availability, like this:

stress-ng --vm-bytes 6G --vm-keep -m 1

With stress-ng occupying 6GB, the node still stayed in sync, but swapping increased dramatically. With stress-ng taking 8GB, nodeos could not keep in sync any more and was constantly falling behind.
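A quick way to watch swap activity during such a test is vmstat; the si and so columns show the amount of memory swapped in and out per second:

vmstat 1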

So, the most active RAM area in a WAX node is about 10GB in size: with 16GB of physical RAM, the node could give away 6GB and still keep up, but not 8GB.

After the stress-ng tests, the node state slowly took over the memory again, and free RAM shrank to a few hundred megabytes within a few hours.

The node was running for about a day, and it never fell behind real time. Our monitoring system did not produce any alarms for it.

So, the outcome of this experiment is that we will be able to tolerate a state size approximately twice the size of physical RAM without too much performance impact. My guess is that even when the state size reaches 64GB, this 16GB server will still be able to keep up, as only a limited amount of RAM is accessed and updated in every block.

This also means that state history nodes and low-traffic API nodes can survive with fewer hardware resources than what you would expect for producer nodes and public API endpoints.

For comparison, the same machine, configured with the state mapped on the SSD, kept its SSD busy at 20–25% during a relatively quiet time on WAX, producing about 600–900 writes per second (which is quite understandable, as in this case every state update needs to be synchronized with the non-volatile storage).
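For reference, that mapped setup is just the stock configuration: the data directory lives directly on the SSD, and config.ini keeps the default database mode (the size value below is only an example):

database-map-mode = mapped
chain-state-db-size-mb = 131072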

If you plan a new server with the state in tmpfs, it makes sense to provision a big enough swap partition from the very beginning.
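Setting up the swap space is straightforward; the device name below is just a placeholder for whatever partition you dedicate to swap:

mkswap /dev/nvme0n1p3
swapon /dev/nvme0n1p3
echo '/dev/nvme0n1p3 none swap sw 0 0' >> /etc/fstab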


cc32d9

Telegram: cc32d9, Discord: cc32d9#8327, EOS account: "cc32dninexxx"