controller-manager:负责管理集群各种资源,保证资源处于预期的状态。
kube-scheduler:资源调度,负责决定将Pod放到哪个Node上运行。
环境(使用kubeadm安装的k8s集群)
Kubernetes v1.23.8
prometheus operator 0.11.0
报警监控如图
原因
ServiceMonitor 资源对象的声明方式,kube-system 这个命名空间下需要匹配具有 k8s-app=kube-scheduler 这样的 Service,但是在kubeadm安装的中k8s集群没有对应的 Service:
kubectl get svc -n kube-system
处理办法
1.controller-manager和kube-scheduler监听地址(位置一般在/etc/kubernetes/manifests下)
kube-controller-manager.yaml kube-scheduler.yaml
更改--bind-address=127.0.0.1为--bind-address=0.0.0.0
2.创建一个服务并确保它有一个与ServiceMonitor k8s-app: kube-scheduler匹配的标签
3.确保服务与正确的标签正确匹配的 pod(component: kube-scheduler)
可以通过下面的命令查看,具体的pod名称以你安装的为准
kubectl get pods kube-controller-manager-xx-k8s-master-001 -n kube-system -o yaml
4.Service的端口名称必须与ServiceMonitor https-metrics匹配具体也可以通过上面的命令查看健康检查端口
创建对应 Service
[[email protected] ]$cat kube-controller-manager-svc.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
k8s-app: kube-controller-manager
name: kube-controller-manager
namespace: monitoring
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
interval: 30s
port: https-metrics
scheme: https
tlsConfig:
insecureSkipVerify: true
jobLabel: k8s-app
namespaceSelector:
matchNames:
- kube-system
selector:
matchLabels:
k8s-app: kube-controller-manager
---
apiVersion: v1
kind: Service
metadata:
namespace: kube-system
name: kube-controller-manager
labels:
k8s-app: kube-controller-manager
spec:
ports:
- name: https-metrics
port: 10257
selector:
component: kube-controller-manager
[[email protected] ]$cat kube-scheduler-svc.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
k8s-app: kube-scheduler
name: kube-scheduler
namespace: monitoring
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
interval: 30s
port: https-metrics
scheme: https
tlsConfig:
insecureSkipVerify: true
jobLabel: k8s-app
namespaceSelector:
matchNames:
- kube-system
selector:
matchLabels:
k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
namespace: kube-system
name: kube-scheduler
labels:
k8s-app: kube-scheduler
spec:
ports:
- name: https-metrics
port: 10259
selector:
component: kube-scheduler
kubectl apply -f kube-controller-manager-svc.yaml
kubectl apply -f kube-scheduler-svc.yaml
执行成功后报警消失
微信机器人也报警恢复
Grafana也已经出图
具体参数官网相关用户有相关解决方案
https://github.com/prometheus-operator/kube-prometheus/issues/718