Debugging the metrics-server
Virtual Private Cloud Classic infrastructure
The following symptoms might indicate a need to adjust the metrics-server resources:
- The metrics-server is restarting frequently.
- Deleting a namespace results in the namespace being stuck in a Terminating state, and kubectl describe namespace includes a condition that reports a metrics API discovery error.
- kubectl top pods, kubectl top nodes, other kubectl commands, or applications that use the Kubernetes API log errors such as:
    The server is currently unable to handle the request (get pods.metrics.k8s.io)
    Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
- HorizontalPodAutoscalers (HPAs) do not scale deployments.
- Running kubectl get apiservices v1beta1.metrics.k8s.io results in a status such as:
    NAME                     SERVICE                      AVAILABLE                      AGE
    v1beta1.metrics.k8s.io   kube-system/metrics-server   False (FailedDiscoveryCheck)   139d
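The APIService symptom can also be checked from a script. The following is a minimal sketch, not an official tool: the function names `metrics_api_status` and `metrics_api_healthy` are hypothetical, and the parsing assumes the default table layout shown in the sample status, where AVAILABLE is the third column.

```shell
# metrics_api_status: read the table printed by
#   kubectl get apiservices v1beta1.metrics.k8s.io
# on stdin and print the first word of the AVAILABLE column for the
# metrics API row. Exits non-zero if the row is not present.
metrics_api_status() {
  awk '$1 == "v1beta1.metrics.k8s.io" { print $3; found = 1 }
       END { if (!found) exit 1 }'
}

# metrics_api_healthy: succeed only when the AVAILABLE column is "True".
metrics_api_healthy() {
  [ "$(metrics_api_status)" = "True" ]
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get apiservices v1beta1.metrics.k8s.io | metrics_api_healthy \
#     || echo "metrics API is unavailable"
```

Reading the table from stdin keeps the check independent of cluster access, so it can be tried against saved command output first.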
Your cluster has a metrics service provided by the metrics-server deployment in the kube-system namespace. The metrics-server resource requests and limits are based on the number of nodes in the cluster and
are optimized for clusters with 30 or fewer pods per worker node. If the memory requests are too low, the metrics-server can fail with out-of-memory errors and respond very slowly. If the CPU requests are too low, it can fail liveness and readiness
probes because of CPU throttling.
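To see whether the metrics-server is restarting often, the RESTARTS column of the pod listing can be filtered. This is a rough sketch under stated assumptions: `restarting_pods` is a hypothetical helper, the default threshold of 3 restarts is arbitrary, and the parsing assumes the default `kubectl get pod` column order (NAME, READY, STATUS, RESTARTS, AGE). The commented commands at the end show one way to inspect the current requests and limits and the last termination reason; they assume that the deployment and its container are both named metrics-server.

```shell
# restarting_pods [THRESHOLD]: read `kubectl get pod` table output on stdin
# and print "NAME RESTARTS" for each pod whose restart count exceeds
# THRESHOLD (default 3, an arbitrary value chosen for illustration).
restarting_pods() {
  threshold="${1:-3}"
  awk -v max="$threshold" 'NR > 1 && ($4 + 0) > max { print $1, $4 + 0 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pod -n kube-system -l k8s-app=metrics-server | restarting_pods 3
#
# Current requests and limits of the metrics-server container:
#   kubectl get deployment metrics-server -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[?(@.name=="metrics-server")].resources}'
#
# Reason for the last container termination (for example, OOMKilled):
#   kubectl get pod -n kube-system -l k8s-app=metrics-server \
#     -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'
```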
Problems with the metrics-server can also be caused by problems in other areas. The metrics API is not available if the control plane cannot communicate with the metrics-server by using Konnectivity. Admission control webhooks
can prevent the control plane from creating pods, including the metrics-server pod.
Follow these steps to troubleshoot.
- Verify that metrics-server pods exist.
    kubectl get pod -n kube-system -l k8s-app=metrics-server
  If no pods are listed, there is likely a problem with an admission-control webhook. See Why do cluster operations fail due to a broken webhook?.
- Verify that the apiserver can connect to the metrics-server.
    kubectl logs POD -n kube-system -c metrics-server --tail 5
  Replace POD with the pod name shown earlier. The content of any logs returned does not matter.
  If you get an error message that contains text such as <workerIP>:10250: getsockopt: connection timed out, see kubectl commands time out.
- If the previous steps do not show a problem, adjust the resources for the metrics-server. See Adjusting cluster metrics provider resources.
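The troubleshooting steps above can be strung together as a small decision helper. This is only an illustrative sketch: `next_step` is a hypothetical function, the messages it prints are shorthand for the topics referenced in the steps, and it matches only the "connection timed out" text quoted above rather than every possible connectivity error.

```shell
# next_step POD_LIST LOG_OUTPUT
#   POD_LIST:   stdout of
#     kubectl get pod -n kube-system -l k8s-app=metrics-server --no-headers
#   LOG_OUTPUT: combined stdout and stderr of
#     kubectl logs POD -n kube-system -c metrics-server --tail 5
# Prints a short hint that names the next area to investigate.
next_step() {
  pod_list="$1"
  log_output="$2"
  if [ -z "$pod_list" ]; then
    # No pods exist: a webhook is likely blocking pod creation.
    echo "check admission control webhooks"
    return
  fi
  case "$log_output" in
    *"connection timed out"*)
      # The apiserver could not reach the worker node.
      echo "check control plane connectivity" ;;
    *)
      # Pods exist and logs are reachable; the log content does not matter.
      echo "adjust metrics-server resources" ;;
  esac
}

# Usage against a live cluster (requires kubectl access):
#   pods=$(kubectl get pod -n kube-system -l k8s-app=metrics-server --no-headers 2>/dev/null)
#   pod=$(printf '%s\n' "$pods" | awk 'NR == 1 { print $1 }')
#   logs=$(kubectl logs "$pod" -n kube-system -c metrics-server --tail 5 2>&1)
#   next_step "$pods" "$logs"
```

Passing the command output as arguments keeps the decision logic separate from cluster access, so each branch can be exercised with saved output.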