Building a resilient EOSIO API service

EOSIO applications depend on EOSIO API endpoints, where they send transactions and retrieve blockchain information. These endpoints must be reliable and fault-tolerant.

nginx and haproxy are two software products that are often used to build resilient load balancers. My best experience so far comes from using both of them together:

  • nginx is great at SSL offloading and HTTP request routing. It can also serve as a buffer between a slow client connection and a fast-responding server. Let's Encrypt certificates work out of the box with nginx. It also supports request mirroring (more details below).
  • haproxy is great at active monitoring of backend services. It can execute external scripts to check backend health, and it allows backends to be disabled and enabled dynamically, so they can easily be taken out of service for maintenance.

So, the setup that worked best for me has nginx as the front end, proxying HTTP requests to haproxy, which listens on a localhost address. haproxy then distributes the requests to multiple nodeos processes and uses my health check script to verify that the nodes are in sync with real time.
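
To illustrate what "in sync with real time" can mean in practice, here is a minimal sketch of such a check. It is not my actual script: the function names and the 10-second threshold are assumptions made for the example. It relies only on the head_block_time field that the nodeos /v1/chain/get_info call returns, which is a UTC timestamp.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_info(base_url: str, timeout: float = 3.0) -> dict:
    """Fetch /v1/chain/get_info from a nodeos HTTP endpoint."""
    url = base_url + "/v1/chain/get_info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def head_block_lag_seconds(info: dict, now: datetime) -> float:
    """Seconds between `now` and the head block time reported by the node."""
    # head_block_time is UTC, typically without a timezone suffix,
    # e.g. "2019-03-01T12:00:00.500"; a trailing "Z" is tolerated here.
    raw = info["head_block_time"].rstrip("Z")
    head = datetime.fromisoformat(raw).replace(tzinfo=timezone.utc)
    return (now - head).total_seconds()


def is_in_sync(info: dict, now: datetime, max_lag_seconds: float = 10.0) -> bool:
    """True if the head block is at most max_lag_seconds behind `now`.

    The 10-second default is an assumption for this sketch, not a recommendation.
    """
    return head_block_lag_seconds(info, now) <= max_lag_seconds
```

A wrapper script could call fetch_info() against the backend address, evaluate is_in_sync() with datetime.now(timezone.utc), and exit with status 0 when the node is healthy and a non-zero status otherwise, which is how haproxy interprets the result of an external check.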

It is important that all hosts synchronize their time with NTP, since checking that a node is in sync with real time relies on an accurate local clock.

nginx also has a mirror module that can be configured to replicate all push_transaction requests to some other host. There's still a bug in nodeos that is difficult to identify: once every few months, a node can stop forwarding transactions to its p2p neighbors. Mirroring may therefore improve transaction reliability. There is still a small chance that your local node processes the transaction more slowly than the p2p network distributes it, in which case the original request gets a duplicate transaction error.
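
A client behind such a mirroring setup may want to recognize the duplicate transaction error instead of treating it as a hard failure. The sketch below shows one way to do that. The error name "tx_duplicate" and the numeric code 3040008 are taken from my recollection of the nodeos exception definitions and should be verified against the nodeos version you run; the function name is made up for this example.

```python
import json

# Assumed values, to be verified against your nodeos version.
TX_DUPLICATE_NAME = "tx_duplicate"
TX_DUPLICATE_CODE = 3040008


def is_duplicate_transaction(status: int, body: str) -> bool:
    """Return True if an HTTP error response looks like a duplicate transaction error."""
    if status < 400:
        return False  # not an error response at all
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # body is not JSON, so it cannot be classified
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return False
    return (
        error.get("name") == TX_DUPLICATE_NAME
        or error.get("code") == TX_DUPLICATE_CODE
    )
```

When this returns True, the transaction has most likely already reached the node through the p2p network, so the client can look it up by its transaction ID rather than resubmit it.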

I also made a few Nagios plugins that work with Icinga as well. The “watchdoggiee” plugin checks for this bug condition: it sends a transaction through a specified node and checks the result via another API. If the transaction does not propagate, the monitoring system can issue a restart command for the node.
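
The logic of such a propagation check can be sketched as follows. This is a simplified illustration, not the plugin's code: push and lookup are hypothetical callables that you would implement with your own EOSIO client (one sends a transaction through the node under test and returns its ID, the other asks a different API endpoint whether it has seen that ID). The retry count and delay are arbitrary.

```python
import time
from typing import Callable, Tuple


def transaction_propagated(
    push: Callable[[], str],
    lookup: Callable[[str], bool],
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Push a transaction via one node and poll another API until it shows up."""
    tx_id = push()
    for attempt in range(attempts):
        if lookup(tx_id):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before the next poll
    return False


def nagios_result(propagated: bool) -> Tuple[int, str]:
    """Map the outcome to a Nagios plugin exit code (0 = OK, 2 = CRITICAL)."""
    if propagated:
        return 0, "OK - transaction propagated"
    return 2, "CRITICAL - transaction did not propagate"
```

A plugin built on this would print the message and exit with the returned code, and the monitoring system's event handler could then restart the node on a CRITICAL state.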

Telegram: cc32d9, EOS account: "cc32dninexxx"
