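controller-manager: manages the cluster's resources and keeps them in their desired state.
kube-scheduler: handles scheduling; it decides which Node each Pod runs on.
Environment (k8s cluster installed with kubeadm)
Kubernetes v1.23.8
prometheus operator 0.11.0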

The alerts as shown in monitoring:
2022-08-17T05:42:07.png

Cause

As the ServiceMonitor resource is declared, it selects a Service in the kube-system namespace that carries the label k8s-app=kube-scheduler, but a k8s cluster installed with kubeadm has no such Service:

kubectl get svc -n kube-system

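Fix

1. Change the listen address of controller-manager and kube-scheduler (the manifests are usually under /etc/kubernetes/manifests)
kube-controller-manager.yaml kube-scheduler.yaml
Change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
2. Create a Service and make sure it has a label that matches the ServiceMonitor's selector, k8s-app: kube-scheduler
3. Make sure the Service's selector matches the label on the Pod (component: kube-scheduler)
You can check this with the command below; use the Pod name from your own installation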
kubectl get pods kube-controller-manager-xx-k8s-master-001 -n kube-system -o yaml
4. The Service's port name must match the port name used in the ServiceMonitor, https-metrics. The command above also shows the health-check port.
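Step 1 can be scripted. The following is a minimal sketch, not the only way to do it: set_bind_address is a hypothetical helper name, and the backup directory is an assumption. The backup is written outside the manifests directory because the kubelet treats files it finds there as static Pod manifests. The demo runs against a throwaway sample file rather than the real /etc/kubernetes/manifests.

```shell
#!/bin/sh
# Sketch: switch --bind-address from 127.0.0.1 to 0.0.0.0 in a static Pod manifest.
# set_bind_address is a hypothetical helper, not part of kubeadm or kubectl.
set_bind_address() {
  manifest="$1"     # path to kube-scheduler.yaml or kube-controller-manager.yaml
  backup_dir="$2"   # assumed backup location, kept outside the manifests directory

  # Save the original file before touching it.
  cp "$manifest" "$backup_dir/$(basename "$manifest").bak" || return 1

  # Rewrite through a temp file so the script does not rely on GNU-only "sed -i".
  tmp="$(mktemp)" || return 1
  sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$manifest" > "$tmp" \
    && cat "$tmp" > "$manifest"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo on a sample file in a temporary directory.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/manifests" "$demo_dir/backup"
printf '%s\n' \
  'spec:' \
  '  containers:' \
  '  - command:' \
  '    - kube-scheduler' \
  '    - --bind-address=127.0.0.1' \
  > "$demo_dir/manifests/kube-scheduler.yaml"

set_bind_address "$demo_dir/manifests/kube-scheduler.yaml" "$demo_dir/backup"

# Show the resulting flag.
grep -- '--bind-address' "$demo_dir/manifests/kube-scheduler.yaml"
```

On a real control-plane node the call would look like set_bind_address /etc/kubernetes/manifests/kube-scheduler.yaml /root/manifest-backup (the backup path is only an example). The kubelet should then recreate the static Pod with the new flag on its own. Note that 0.0.0.0 makes the metrics port reachable on every interface of the node, so check that this is acceptable in your network.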

Create the corresponding Service

$ cat kube-controller-manager-svc.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  ports:
  - name: https-metrics
    port: 10257
  selector:
    component: kube-controller-manager

$ cat kube-scheduler-svc.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler
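Because the ServiceMonitor refers to the Service port by name, a quick check before applying can catch a mismatch. The following is a rough sketch: check_port_names is a hypothetical helper, and it is a plain-text heuristic that assumes the layout used in the files above (one ServiceMonitor and one Service per file, with the port name on its own line). It is not a YAML parser.

```shell
#!/bin/sh
# Sketch: compare the named port in the ServiceMonitor endpoint with the
# port name in the Service, for a file laid out like the manifests above.
# check_port_names is a hypothetical helper and only a text heuristic.
check_port_names() {
  file="$1"
  # Named port referenced by the ServiceMonitor endpoint.
  # Numeric lines such as "port: 10259" are skipped by the pattern.
  sm_port="$(sed -n 's/^ *port: *\([A-Za-z][A-Za-z0-9-]*\) *$/\1/p' "$file" | head -n 1)"
  # Name of the first entry in the Service's ports list.
  svc_port="$(sed -n 's/^ *- name: *\([A-Za-z][A-Za-z0-9-]*\) *$/\1/p' "$file" | head -n 1)"
  [ -n "$sm_port" ] && [ "$sm_port" = "$svc_port" ]
}

# Demo on a trimmed copy of the manifest structure.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
kind: ServiceMonitor
spec:
  endpoints:
  - interval: 30s
    port: https-metrics
---
kind: Service
spec:
  ports:
  - name: https-metrics
    port: 10259
EOF

if check_port_names "$sample"; then
  echo "port names match"
else
  echo "port names differ"
fi
```

Against the real files this would be called as check_port_names kube-scheduler-svc.yaml and check_port_names kube-controller-manager-svc.yaml.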

kubectl apply -f kube-controller-manager-svc.yaml
kubectl apply -f kube-scheduler-svc.yaml
Once both files were applied successfully, the alerts cleared
2022-08-17T05:54:16.png
The WeChat bot also reported the alerts as resolved
2022-08-17T06:16:45.png
Grafana is now showing graphs as well
2022-08-17T07:19:35.png

For the specific parameters, users in the official repository have posted a solution:
https://github.com/prometheus-operator/kube-prometheus/issues/718

Last modification: September 2, 2022