EOSIO applications depend on EOSIO API endpoints to send transactions and retrieve blockchain information. It is important that these API endpoints are reliable and fault-tolerant.
nginx and haproxy are two software products that are often used to build resilient load balancers. So far, my best experience has come from using both at the same time:
nginx is great at SSL offloading and HTTP request routing. It can also serve as a buffer between a slow client connection and a fast-responding server. Let's Encrypt certificates work out of the box with nginx. It also supports request mirroring (more details below).
haproxy is great at active monitoring of backend services. It can execute external scripts to check backend health, and it allows backends to be disabled and enabled dynamically, so they can easily be taken out of service for maintenance.
So, the scenario that has worked best for me is having nginx as the front end, proxying the HTTP requests to haproxy, which listens on a localhost address. haproxy then distributes the requests to multiple nodeos processes and uses my healthcheck script to verify that the nodes are in sync with real time.
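To illustrate what "in sync with real time" can mean in practice, here is a minimal sketch of such a check. It is not my actual healthcheck script: it assumes the node exposes the standard `/v1/chain/get_info` endpoint, and the function names, the default URL, and the 10-second threshold are hypothetical values chosen for illustration.

```python
import json
import sys
import urllib.request
from datetime import datetime, timezone

# Hypothetical threshold: how far the head block may lag behind wall-clock time.
MAX_LAG_SECONDS = 10.0


def head_block_lag(info, now):
    """Return the seconds between `now` and the node's head block time."""
    # nodeos reports head_block_time in UTC without a timezone suffix,
    # e.g. "2020-01-01T12:00:00.500", so the UTC zone is attached explicitly.
    head = datetime.fromisoformat(info["head_block_time"]).replace(tzinfo=timezone.utc)
    return (now - head).total_seconds()


def is_in_sync(info, now, max_lag=MAX_LAG_SECONDS):
    """True if the head block is within `max_lag` seconds of `now`."""
    # abs() also flags a head block "from the future", which points to clock skew.
    return abs(head_block_lag(info, now)) <= max_lag


def fetch_info(base_url, timeout=3.0):
    """Fetch and decode the get_info response from a nodeos HTTP API."""
    with urllib.request.urlopen(base_url + "/v1/chain/get_info", timeout=timeout) as resp:
        return json.load(resp)


def main(argv):
    """Return 0 if the node is in sync, 1 otherwise (including on any error)."""
    base_url = argv[1] if len(argv) > 1 else "http://127.0.0.1:8888"
    try:
        ok = is_in_sync(fetch_info(base_url), datetime.now(timezone.utc))
    except Exception:
        ok = False
    return 0 if ok else 1
```

A script built on this would end with `sys.exit(main(sys.argv))`, so that the exit status tells the caller whether the node is healthy. How the script is wired into the haproxy configuration is not shown here.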
It is important that all hosts are synchronizing their time with NTP.
nginx also has a mirror module that can be configured to replicate all push_transaction requests to some other host. There's still a bug in nodeos that is difficult to identify: once every few months, it can stop forwarding transactions to its p2p neighbors. Such mirroring may therefore improve transaction reliability. Still, there is a small chance that your local node processes the transaction more slowly than the p2p network distributes it, in which case the original request would get an error because of a duplicate transaction.
I also made a few Nagios plugins that work with Icinga as well, and the “watchdoggiee” plugin checks for this bug condition. It sends a transaction through a specified node and checks the result via another API. If the transaction does not propagate, the monitoring system can issue a restart command for the node.
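The watchdog idea can be sketched roughly as follows. This is not the plugin's actual code: `push_tx` and `tx_seen` are hypothetical callables standing in for "send a transaction through the node under test" and "look the transaction up via an independent API", and the timeout and polling interval are arbitrary. Only the exit codes follow the standard Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
import time

# Standard Nagios/Icinga plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def wait_for_propagation(tx_seen, tx_id, timeout=30.0, interval=2.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll tx_seen(tx_id) until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if tx_seen(tx_id):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def check(push_tx, tx_seen, **kwargs):
    """Send a transaction via the node under test and verify it elsewhere.

    push_tx() is assumed to return the transaction ID; tx_seen(tx_id) is
    assumed to query a different API endpoint and return True once the
    transaction is visible there. Both are hypothetical placeholders.
    """
    tx_id = push_tx()
    if wait_for_propagation(tx_seen, tx_id, **kwargs):
        return OK, "OK: transaction %s propagated" % tx_id
    return CRITICAL, "CRITICAL: transaction %s did not propagate" % tx_id
```

A plugin built this way would print the message and exit with the returned code, leaving the restart decision to the monitoring system's event handler.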