Resolving KubeSchedulerDown and KubeControllerManagerDown in prometheus operator


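controller-manager: manages the cluster's resources and keeps them in their desired state.
kube-scheduler: handles scheduling, deciding which Node each Pod runs on.
Environment (a k8s cluster installed with kubeadm)
Kubernetes v1.23.8
prometheus operator 0.11.0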

The alerts as they appear in monitoring:
2022-08-17T05:42:07.png

Cause

The ServiceMonitor resources are declared so that they select a Service in the kube-system namespace carrying a label such as k8s-app=kube-scheduler, but a k8s cluster installed with kubeadm has no such Service:

kubectl get svc -n kube-system
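
To narrow the listing above down to the label the ServiceMonitor selects on, a label selector can be passed to kubectl. The following is a minimal sketch; `find_labeled_svc` is a hypothetical helper name introduced here for illustration, and it assumes kubectl is already configured for the cluster.

```shell
# Sketch: list Services in kube-system that carry a given label.
# find_labeled_svc is a hypothetical helper name, not a kubectl feature;
# -l is kubectl's standard label-selector flag.
find_labeled_svc() {
  local label="$1"
  kubectl get svc -n kube-system -l "$label" -o name
}

# Usage:
#   find_labeled_svc k8s-app=kube-scheduler
#   find_labeled_svc k8s-app=kube-controller-manager
# No Service names in the output means the ServiceMonitor has nothing to select.
```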

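Fix

1. Change the listen address of controller-manager and kube-scheduler (the manifests are usually under /etc/kubernetes/manifests)
kube-controller-manager.yaml kube-scheduler.yaml
Change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
2. Create a Service and make sure it carries a label that matches the ServiceMonitor selector, k8s-app: kube-scheduler
3. Make sure the Service's selector matches the label on the correct pod (component: kube-scheduler)
You can check this with the command below; use the pod name from your own installation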
kubectl get pods kube-controller-manager-xx-k8s-master-001 -n kube-system -o yaml
4. The Service's port name must match the port named in the ServiceMonitor, https-metrics. The health-check port can also be read from the output of the command above.
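
Step 1 can be scripted rather than edited by hand. The sketch below is one possible way to do it, assuming the manifest contains the flag literally as `--bind-address=127.0.0.1`; `set_bind_address` is a hypothetical helper name. The temporary file and the backup are written outside the manifests directory, since the kubelet treats files in that directory as static pod definitions.

```shell
# Sketch: switch --bind-address from 127.0.0.1 to 0.0.0.0 in one static pod manifest.
# set_bind_address is a hypothetical helper name, not part of kubeadm or kubectl.
set_bind_address() {
  local manifest="$1"
  local tmp
  tmp="$(mktemp)" || return 1
  # Keep a backup outside the manifests directory.
  cp "$manifest" "${TMPDIR:-/tmp}/$(basename "$manifest").bak" || return 1
  # Rewrite the flag into the temporary file.
  sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$manifest" > "$tmp" || return 1
  # Overwrite the original content; the file keeps its owner and mode.
  cat "$tmp" > "$manifest" && rm -f "$tmp"
}

# Usage on a control-plane node (as root):
#   set_bind_address /etc/kubernetes/manifests/kube-controller-manager.yaml
#   set_bind_address /etc/kubernetes/manifests/kube-scheduler.yaml
```

The kubelet should notice the changed manifests and recreate the two static pods on its own.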

Create the corresponding Services

[root@az-k8s-nginx-001]$cat kube-controller-manager-svc.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  ports:
  - name: https-metrics
    port: 10257
  selector:
    component: kube-controller-manager

[root@az-k8s-nginx-001]$cat kube-scheduler-svc.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler

kubectl apply -f kube-controller-manager-svc.yaml
kubectl apply -f kube-scheduler-svc.yaml
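
Before waiting on the alerts, it can help to confirm that each new Service actually selected a pod. A minimal sketch, assuming the Service names used in the manifests above; `endpoint_ips` is a hypothetical helper name.

```shell
# Sketch: print the pod IPs behind a Service in kube-system.
# endpoint_ips is a hypothetical helper name, not a kubectl feature.
endpoint_ips() {
  local svc="$1"
  kubectl get endpoints "$svc" -n kube-system \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Usage:
#   endpoint_ips kube-scheduler
#   endpoint_ips kube-controller-manager
# An empty result suggests the Service selector (component: ...) matched no pod.
```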
Once both files are applied successfully, the alerts clear.
The WeChat bot also reports the alerts as resolved.
Grafana now renders the graphs as well:
2022-08-17T07:19:35.png

For the specific parameters, users have posted related solutions in the official repository:
https://github.com/prometheus-operator/kube-prometheus/issues/718

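Copyright notice: original article from this site, published by yx on 2022-08-17, 2239 characters in total.
Reposting: unless otherwise noted, articles on this site are released under the CC-4.0 license; please credit the source when reposting.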