EOSIO applications depend on EOSIO API endpoints for sending transactions and retrieving blockchain information. It is important that these API endpoints are reliable and fault-tolerant.
nginx and haproxy are two software products that are often used to build resilient load balancers. So far, my best experience has come from using them both at the same time:
nginx is great at SSL offloading and HTTP request routing. It can also serve as a buffer between a slow client connection and a fast-responding server. Let's Encrypt certificates work out of the box with nginx. It also supports request mirroring (more details below).
haproxy is great at active monitoring of backend services. It allows executing external scripts to check backend health. It also allows backends to be disabled and enabled dynamically, so they can easily be taken out of service for maintenance.
So, the best scenario that worked for me is having nginx as the front end, proxying the HTTP requests to haproxy, which listens on a localhost address. haproxy then distributes the requests to multiple nodeos processes and uses my healthcheck script to verify that the nodes are in sync with real time.
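The following is a minimal Python sketch of the kind of test such a healthcheck could perform. It is not the script referred to above. It assumes that the node answers /v1/chain/get_info with a head_block_time field in UTC without a timezone suffix, and the 10-second threshold, the function names and the plain-HTTP URL are illustrative choices only.

```python
import json
import urllib.request
from datetime import datetime, timezone

MAX_LAG_SECONDS = 10.0  # assumed threshold, tune for your own setup


def head_block_lag(info, now):
    """Return the seconds between `now` and the node's head block time."""
    # Assumption: head_block_time is UTC, e.g. "2019-05-01T12:00:00.500"
    head = datetime.fromisoformat(info["head_block_time"])
    head = head.replace(tzinfo=timezone.utc)
    return (now - head).total_seconds()


def is_in_sync(info, now, max_lag=MAX_LAG_SECONDS):
    """True if the head block is within `max_lag` seconds of `now`."""
    return abs(head_block_lag(info, now)) <= max_lag


def fetch_info(base_url, timeout=3.0):
    """Fetch chain info from a nodeos HTTP API (hypothetical base URL)."""
    url = base_url + "/v1/chain/get_info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_server(host, port):
    """Return a process exit code: 0 if the node is in sync, 1 otherwise."""
    try:
        info = fetch_info(f"http://{host}:{port}")
        ok = is_in_sync(info, datetime.now(timezone.utc))
    except Exception:
        # Any network or parsing failure counts as unhealthy
        ok = False
    return 0 if ok else 1
```

When haproxy runs an external check, it passes the proxy address, proxy port, server address and server port as arguments and treats a zero exit status as healthy, so a wrapper could end with sys.exit(check_server(sys.argv[3], sys.argv[4])).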
It is important that all hosts synchronize their time with NTP, since checking that a node is in sync with real time relies on accurate clocks.
nginx also has a mirror module that can be configured to replicate all push_transaction requests to some other host. There's still a bug in nodeos that is difficult to identify: once every few months, it can stop forwarding transactions to its p2p neighbors. So, such mirroring may improve transaction reliability. Still, there is a small chance that your local node processes the transaction more slowly than the p2p network distributes it, in which case the original request would get a duplicate transaction error.
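A client that pushes through a mirrored endpoint may want to treat that duplicate error as "the network already has this transaction" rather than as a failure. Here is a small Python sketch of that decision. It assumes the nodeos error body carries an error object with code 3040008 and name tx_duplicate; the function names and the outcome labels are hypothetical.

```python
# Assumed nodeos error code for "Duplicate transaction" (tx_duplicate)
TX_DUPLICATE_CODE = 3040008


def is_duplicate_error(body):
    """True if a decoded push_transaction error body reports a duplicate."""
    error = body.get("error") or {}
    return (
        error.get("code") == TX_DUPLICATE_CODE
        or error.get("name") == "tx_duplicate"
    )


def push_outcome(status, body):
    """Classify a push_transaction response by HTTP status and JSON body."""
    if 200 <= status < 300:
        return "accepted"
    if is_duplicate_error(body):
        # The mirrored copy most likely arrived first via the p2p network
        return "already_known"
    return "failed"
```

A caller that receives "already_known" can then look the transaction up by its ID instead of retrying the push.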
I also made a few Nagios plugins, which work with Icinga as well. The “watchdoggiee” plugin checks for this bug condition: it sends a transaction through a specified node and checks the result via another API. If the transaction does not propagate, the monitoring system can issue a restart command for the node.
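The control flow of such a propagation check can be sketched in Python as follows. This is not the plugin's actual code: push and lookup are hypothetical callables that the caller supplies (one sends a test transaction through the node under test and returns its ID, the other asks a second API whether that ID is known), and the retry count and interval are arbitrary. The return codes follow the usual Nagios plugin convention of 0 for OK, 2 for CRITICAL and 3 for UNKNOWN.

```python
import time

NAGIOS_OK, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 2, 3


def wait_for_propagation(lookup, tx_id, attempts=10, interval=1.0,
                         sleep=time.sleep):
    """Poll `lookup(tx_id)` until it returns True or attempts run out."""
    for attempt in range(attempts):
        if lookup(tx_id):
            return True
        if attempt + 1 < attempts:
            sleep(interval)
    return False


def watchdog_check(push, lookup, **kwargs):
    """Push a test transaction and report whether the other API sees it."""
    try:
        tx_id = push()
    except Exception as exc:
        message = f"UNKNOWN - could not push test transaction: {exc}"
        return NAGIOS_UNKNOWN, message
    if wait_for_propagation(lookup, tx_id, **kwargs):
        return NAGIOS_OK, f"OK - transaction {tx_id} seen on second API"
    return NAGIOS_CRITICAL, f"CRITICAL - transaction {tx_id} did not propagate"
```

A plugin wrapper would print the message and exit with the returned code, and the monitoring system could attach a restart handler to the CRITICAL state.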