With Kubernetes, large and diverse workloads can be handled.
To keep track of all these processes, monitoring is crucial.
Monitoring
To monitor the application you have to collect metrics such as CPU, memory, disk usage and bandwidth on your nodes.
Because Kubernetes is a distributed system, it has to be monitored and traced cluster-wide.
You can use external tools like Prometheus and visualize the metrics with Grafana. But to get started I recommend the Kubernetes dashboard, as it is very easy to set up and gives you a default user interface with the most important metrics.
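For a quick look at CPU and memory usage without any dashboard, kubectl top can help. The following is a minimal sketch: the function name resource_usage is hypothetical, and it assumes that the Metrics Server is installed in the cluster, otherwise kubectl top returns an error.

```shell
# Hypothetical helper; `kubectl top` only works if the Metrics Server is installed.
resource_usage() {
  # CPU and memory usage per node
  kubectl top nodes
  # CPU and memory usage per pod, across all namespaces
  kubectl top pods --all-namespaces
}
```

Call resource_usage in a shell that has access to your cluster.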
Logging
If you have aggregated logs, you can visualize issues and search the logs for them.
In Kubernetes the kubelet writes container logs to local files. With the command
kubectl logs your-pod
you can see these logs.
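A few options of kubectl logs are handy in practice. The sketch below wraps them in small shell functions; the function names are hypothetical and only serve as illustration, and the pod and container names you pass in are your own.

```shell
# Hypothetical helpers around `kubectl logs`; the names are illustrative only.

# Print the last N lines (default 100) of a pod's logs.
recent_logs() {
  kubectl logs "$1" --tail="${2:-100}"
}

# Stream new log lines of one container as they are written.
follow_logs() {
  kubectl logs "$1" -c "$2" --follow
}

# Show the logs of the previous instance of a container that has restarted.
previous_logs() {
  kubectl logs "$1" --previous
}
```

For example, recent_logs your-pod 50 prints the last 50 lines, and previous_logs your-pod is useful when a container keeps crashing and restarting.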
If you want to perform cluster-wide logging, you can use Fluentd to aggregate logs.
Fluentd agents run on each node via a DaemonSet, collect the logs and feed them to an Elasticsearch instance prior to visualization.
Troubleshooting
Errors in the container
If you are not sure where to start, run
kubectl describe pod your-pod
This will report
- the overall status of the pod: running, pending or an error state
- the container configuration
- the container events
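The events can also be listed on their own, which is easier to read when a pod has a long description. A minimal sketch, assuming the pod lives in the current namespace; the function name inspect_pod is hypothetical.

```shell
# Hypothetical helper: describe a pod, then list the events that refer to it.
inspect_pod() {
  kubectl describe pod "$1"
  # Only events whose involved object has the given name, sorted by creation time
  kubectl get events --field-selector "involvedObject.name=$1" --sort-by=.metadata.creationTimestamp
}
```

Call it as inspect_pod your-pod.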
If the pod is already running you can first look at the standard output of the container. One common issue is that there are not enough resources allocated.
kubectl logs your-pod -c your-container
You can look for error messages in the logs.
If there are errors inside a container you can open a shell in the container to see what is going on.
kubectl exec -it your-pod -- /bin/sh
Networking issues
The network is likely the next place where issues arise.
So you can go ahead and check the DNS, firewalls and general connectivity.
TODO: sample for connectivity
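One possible way to test DNS and connectivity from inside the cluster is to start a throwaway pod and run the check there. This is only a sketch under assumptions: the function names, the pod names dns-test and http-test, and the busybox:1.28 image are my choices, not something prescribed by Kubernetes.

```shell
# Hypothetical helpers; pod names and the busybox image tag are assumptions.

# Resolve a DNS name from inside the cluster (default: the API server service).
check_dns() {
  kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup "${1:-kubernetes.default}"
}

# Fetch a URL from inside the cluster with a 5 second timeout.
check_http() {
  kubectl run http-test --rm -it --restart=Never --image=busybox:1.28 -- wget -qO- -T 5 "$1"
}
```

For example, check_dns your-service tests name resolution and check_http http://your-service:80 tests whether the service answers. The pods are removed again after the command finishes because of --rm.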
Security issues
You may want to check your RBAC.
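A quick way to check RBAC is kubectl auth can-i, which asks the API server whether an action is allowed. The wrapper below is a sketch; the function name can_i and its argument order are hypothetical.

```shell
# Hypothetical helper: ask the API server whether an action is allowed.
# Usage: can_i VERB RESOURCE [NAMESPACE] [SUBJECT]
can_i() {
  verb="$1"
  resource="$2"
  namespace="${3:-default}"
  subject="${4:-}"
  if [ -n "$subject" ]; then
    # Impersonate another user or service account (requires impersonation rights)
    kubectl auth can-i "$verb" "$resource" --namespace "$namespace" --as "$subject"
  else
    kubectl auth can-i "$verb" "$resource" --namespace "$namespace"
  fi
}
```

For example, can_i list pods checks your own permissions in the default namespace. To check a service account, pass a subject of the form system:serviceaccount:NAMESPACE:NAME as the fourth argument.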
SELinux and AppArmor are also common sources of issues, especially with network-centric applications.
If you do not know where to start, you can disable security for testing, to narrow down the source of the issue. But make sure you re-enable security afterwards.
Another reason (not only for security issues) could be an update. You can roll back to find out when the issue was introduced.
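For deployments, a rollback can be done with kubectl rollout. A minimal sketch, assuming the workload is a Deployment; the function names are hypothetical.

```shell
# Hypothetical helpers around `kubectl rollout`, assuming a Deployment.

# List the recorded revisions of a deployment.
rollout_history() {
  kubectl rollout history "deployment/$1"
}

# Roll back to the previous revision, or to a specific revision if one is given.
rollback() {
  if [ -n "${2:-}" ]; then
    kubectl rollout undo "deployment/$1" --to-revision="$2"
  else
    kubectl rollout undo "deployment/$1"
  fi
}
```

Run rollout_history your-deployment first to see which revisions exist, then rollback your-deployment or rollback your-deployment 2 and check whether the issue is gone.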
Further reading:
Kubernetes dashboard
Prometheus
Fluentd
Troubleshoot a cluster
Troubleshoot applications
Debug Pods