Monday, June 18, 2018

Number of forks causing Ansible playbook to fail

TL;DR - The defaults.forks setting in /etc/ansible/ansible.cfg controls how many hosts a playbook runs against in parallel. If a play with complicated wait_for logic hangs when run against a relatively high number of hosts but works fine against a low number, the forks may be exhausted, leaving the wait_for condition unreachable.

Rationale -

We have an Ansible role that installs a private IPFS cluster, in which we wait_for the bootstrapping peer to have fully started before installing the other peers. The detailed sequence is -


  1. All other peers wait while the bootstrapping peer (in our case, the first one in the cluster) is deployed: its configuration files are generated, its Docker container is started and its service is brought up.
  2. Once its service is up, the bootstrapping peer writes its ID to a file.
  3. The bootstrapping peer's ID file unblocks the other peers: their entry point scripts (which refer to the bootstrapping peer's ID via ipfs bootstrap add) are generated along with their other configuration files, and their containers and services are started.

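The gate in step 3 can be sketched as a wait_for task delegated to the bootstrapping peer. This is a minimal illustration, not our actual role: the file path, group name, and timeout are hypothetical.

```yaml
# Sketch only - path, group name and timeout are assumptions, not the real role.
- name: Wait for the bootstrapping peer to publish its ID file
  wait_for:
    path: /opt/ipfs/bootstrap_peer_id   # hypothetical location of the ID file
    timeout: 300                        # Ansible's default wait_for timeout
  delegate_to: "{{ groups['ipfs'][0] }}"            # check on the bootstrapping peer
  when: inventory_hostname != groups['ipfs'][0]     # every peer except the first
```

Note that this task occupies a fork for as long as it waits, which is exactly what bites us below.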

The default value of defaults.forks is 5. If IPFS needs to run on, say, 7 hosts, chances are that all 5 forks will block on the wait_for step. No fork is then left for the bootstrapping peer to finish its tasks, so wait_for never unblocks; it hits the default timeout of 300 seconds and the playbook fails.

Increasing defaults.forks to 7 solved our problem.
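Concretely, the fix is a one-line change in the config file (or the equivalent command-line flag):

```ini
# /etc/ansible/ansible.cfg
[defaults]
forks = 7
```

The same effect can be had per run with `ansible-playbook --forks 7 site.yml` (the playbook filename here is just an example). Any value of forks at least as large as the number of waiting peers plus one, so the bootstrapping peer always gets a fork, would do.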

P.S. It was only on a second read of the official documentation that I realized exhaustion of defaults.forks was causing our IPFS installation failure. So many thanks to the Ansible team and hey, do read the docs!
