
Manually remove unit from etcd

Long story short, I was playing with Docker and got myself into quite the bind. After deploying a nice and easy test cluster, I decided to have a go with Consul using the “official” Consul on CoreOS setup. Big problem: that Dockerfile expects you to be running “etcd”, while my cluster is running “etcd2”! That means it breaks etcd, fleet, and with them the entire CoreOS system. As a bonus, since CoreOS handles HA, it then proceeds to BREAK THE WHOLE CLUSTER.

So you reboot. Still broken. So, friends, let us now remove that god-awful unit from fleet manually, by way of etcd.

First, let’s stop all the services (run these as root, or prefix each command with sudo):

localhost ~ $ systemctl stop docker
localhost ~ $ systemctl stop fleet
localhost ~ $ systemctl stop etcd

Start etcd2 back up:

localhost ~ $ systemctl start etcd2
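
etcd2 can take a moment to come up, so it may be worth waiting for it before poking at it with etcdctl. Here is a minimal sketch of that idea; `wait_for_unit` is a hypothetical helper name of mine, not something CoreOS ships, and the one-second polling interval is an arbitrary choice.

```shell
# Hypothetical helper: poll systemd until a unit reports "active".
# Usage: wait_for_unit UNIT [TRIES]   (TRIES defaults to 10)
wait_for_unit() {
    unit="$1"
    tries="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        # is-active --quiet prints nothing and exits 0 only when the unit is active
        if systemctl is-active --quiet "$unit"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    # Ran out of attempts without seeing the unit become active
    return 1
}
```

Something like `wait_for_unit etcd2 30 && echo "etcd2 is up"` would then gate the next steps on the service actually running.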

Is it working? On my cluster I had earlier written a test key called “/msg” with a value of “Looking good”. If I get /msg and see that value come back, then I know I’m connected to the central etcd cluster again.

localhost ~ $ etcdctl get /msg
Looking good.
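
If you want that sanity check in script form, it could look roughly like the sketch below. The key `/msg` is the one used above; `check_sentinel` is a hypothetical helper name, and it assumes you pass the exact string you stored in the key.

```shell
# Hypothetical helper: succeed only if KEY exists in etcd and holds EXPECTED.
# Usage: check_sentinel KEY EXPECTED
check_sentinel() {
    key="$1"
    expected="$2"
    # etcdctl get prints the value; a non-zero exit means the key is
    # missing or the cluster could not be reached
    actual="$(etcdctl get "$key" 2>/dev/null)" || return 1
    [ "$actual" = "$expected" ]
}
```

For example, `check_sentinel /msg "Looking good" && echo "connected"` only prints when the key comes back with the stored value.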

Fleet keeps all of its configuration in a hidden node under /_coreos.com/fleet, so we can start there. We need to find and kill the bad unit ([email protected]). The unit itself is kept under unit, and we also need to remove its entries under job.

localhost ~ $ etcdctl ls /_coreos.com/fleet # Fleet's nodes
/_coreos.com/fleet/job
/_coreos.com/fleet/engine
/_coreos.com/fleet/lease
/_coreos.com/fleet/machines
/_coreos.com/fleet/unit
localhost ~ $ etcdctl ls /_coreos.com/fleet/unit
/_coreos.com/fleet/unit/0fc6e0b73dc6b02da675b13023c6587850eb603a
localhost ~ $ etcdctl get /_coreos.com/fleet/unit/0fc6e0b73dc6b02da675b13023c6587850eb603a
{"Raw":"[Unit]\nDescription=Consul...snipped..."}
localhost ~ $ etcdctl rm /_coreos.com/fleet/unit/0fc6e0b73dc6b02da675b13023c6587850eb603a
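
My cluster only had the one unit, so spotting the bad hash was easy. With several units registered you would have to get each one to see which is which. A rough sketch of automating that lookup follows; `find_units` is a hypothetical helper name, and the search pattern is whatever text identifies your bad unit (here the `Description=Consul` line would do).

```shell
# Hypothetical helper: print every key under DIR whose stored unit text
# matches PATTERN (a grep basic regular expression).
# Usage: find_units DIR PATTERN
find_units() {
    dir="$1"
    pattern="$2"
    # etcdctl ls prints one key per line; fleet unit keys are hashes,
    # so splitting on whitespace is safe here
    for key in $(etcdctl ls "$dir"); do
        if etcdctl get "$key" | grep -q "$pattern"; then
            echo "$key"
        fi
    done
}
```

Running `find_units /_coreos.com/fleet/unit Consul` would list the candidate keys, which you can then inspect before passing them to `etcdctl rm`.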

That takes care of the unit. The job, however, is actually a directory, so we need a recursive delete here.

localhost ~ $ etcdctl ls /_coreos.com/fleet/job
/_coreos.com/fleet/job/[email protected]
/_coreos.com/fleet/job/[email protected]
localhost ~ $ etcdctl rm --recursive /_coreos.com/fleet/job/[email protected]
localhost ~ $ etcdctl rm --recursive /_coreos.com/fleet/job/[email protected]
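
If the bad unit was a template with many instances, deleting each job directory by hand gets old quickly. The sketch below loops over the job directory and recursively removes every entry whose key contains a given substring. `remove_jobs` is a hypothetical helper name, and the substring is an assumption you supply yourself; check what it matches with `etcdctl ls` first, since there is no undo.

```shell
# Hypothetical helper: recursively delete every key under DIR whose name
# contains SUBSTRING.
# Usage: remove_jobs DIR SUBSTRING
remove_jobs() {
    dir="$1"
    substring="$2"
    for key in $(etcdctl ls "$dir"); do
        case "$key" in
            *"$substring"*)
                echo "removing $key"
                # Jobs are directories, so a plain rm would fail
                etcdctl rm --recursive "$key"
                ;;
        esac
    done
}
```

On my cluster that would have been something like `remove_jobs /_coreos.com/fleet/job consul`, assuming the job names contain that string.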

Looking good. Now reboot and return to your regularly scheduled containers!
