After deleting all worker nodes, why don't my pods start on new worker nodes?
Applies to: Virtual Private Cloud and Classic infrastructure
You deleted all worker nodes in your cluster so that zero worker nodes exist. Then, you added one or more worker nodes. When you run one of the following commands, depending on your cluster version, several pods for Kubernetes components are stuck in the ContainerCreating status, and the calico-node pods are stuck in the CrashLoopBackOff status.
For 1.29 and later:
kubectl -n calico-system get pods
For 1.28 and earlier:
kubectl -n kube-system get pods
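To list only the affected pods, a quick filter can help (for clusters at version 1.28 and earlier, use the kube-system namespace instead):

kubectl -n calico-system get pods | grep -E 'ContainerCreating|CrashLoopBackOff'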
When you delete all worker nodes in your cluster, no worker node exists for the calico-kube-controllers pod to run on. While the controller pod is down, its data can't be updated to remove the entries for the deleted worker nodes. When the Calico controller pod begins to run again on the new worker nodes, its data still holds the stale entries and is not updated for the new worker nodes, so it does not start the calico-node pods.
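One way to confirm this state is to compare the node entries that Calico tracks with the worker nodes that Kubernetes currently reports; stale entries appear only in the calicoctl output. This check assumes that calicoctl is already configured, as described in step 1 of the following procedure:

calicoctl get nodes
kubectl get nodes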
Delete the existing calico-node worker node entries so that new pods can be created.
Before you begin: Install the Calico CLI.
1. Run the ibmcloud ks cluster config command and copy and paste the output to set the KUBECONFIG environment variable. Include the --admin and --network options with the ibmcloud ks cluster config command. The --admin option downloads the keys to access your infrastructure portfolio and run Calico commands on your worker nodes. The --network option downloads the Calico configuration file to run all Calico commands.

   ibmcloud ks cluster config --cluster <cluster_name_or_ID> --admin --network
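   To confirm that your session is configured, you can run a quick sanity check (illustrative; the exact output varies by cluster):

   kubectl config current-context
   calicoctl get nodes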
2. For the calico-node pods that are stuck in the CrashLoopBackOff status, note the NODE IP addresses.

   For 1.29 and later:
   kubectl -n calico-system get pods -o wide

   For 1.28 and earlier:
   kubectl -n kube-system get pods -o wide

   In this example output, the calico-node pod can't start on worker node 10.176.48.106.

   NAME                                       READY   STATUS              RESTARTS   AGE   IP              NODE            NOMINATED NODE   READINESS GATES
   ...
   calico-kube-controllers-656c5785dd-kc9x2   1/1     Running             0          25h   10.176.48.107   10.176.48.107   <none>           <none>
   calico-node-mkqbx                          0/1     CrashLoopBackOff    1851       25h   10.176.48.106   10.176.48.106   <none>           <none>
   coredns-7b56dd58f7-7gtzr                   0/1     ContainerCreating   0          25h   172.30.99.82    10.176.48.106   <none>           <none>
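   If you prefer to extract the addresses programmatically, the following one-liner is a sketch that assumes the column layout shown in the example above (STATUS in the third column, NODE in the seventh; newer kubectl releases can format the RESTARTS column differently and shift the fields):

   kubectl -n calico-system get pods -o wide | awk '$1 ~ /^calico-node/ && $3 == "CrashLoopBackOff" {print $7}'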
3. Get the IDs of the calico-node worker node entries. Copy the IDs for only the worker node IP addresses that you retrieved in the previous step.

   calicoctl get nodes -o wide
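   To find the entry for one of the IP addresses from step 2, you can filter the output. The address here is the one from the example above; substitute your own:

   calicoctl get nodes -o wide | grep '10.176.48.106'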
4. Use the IDs to delete the worker node entries. After you delete the worker node entries, the Calico controller reschedules the calico-node pods on the new worker nodes.

   calicoctl delete node <node_ID>
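   If several entries are stale, a small loop saves repetition. This is a sketch; the IDs are hypothetical placeholders for the values that you copied in step 3:

   for node_id in 10.176.48.106 10.176.48.108; do
     calicoctl delete node "$node_id"
   done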
5. Verify that the Kubernetes component pods, including the calico-node pods, are now running. It might take a few minutes for the calico-node pods to be scheduled and for new component pods to be created.

   For 1.29 and later:
   kubectl -n calico-system get pods

   For 1.28 and earlier:
   kubectl -n kube-system get pods
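   Instead of polling manually, you can wait for the pods to become ready. This sketch assumes the default k8s-app=calico-node label on the calico-node pods; use the kube-system namespace on clusters at 1.28 and earlier:

   kubectl -n calico-system wait --for=condition=Ready pod -l k8s-app=calico-node --timeout=5m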
To prevent this error in the future, never delete all worker nodes in your cluster. Always run at least one worker node in your cluster, and if you use Ingress to expose apps, run at least two worker nodes per zone.