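Node affinity scheduling

Types of node affinity

Node affinity comes in two forms: hard affinity, requiredDuringSchedulingIgnoredDuringExecution, and soft affinity, preferredDuringSchedulingIgnoredDuringExecution.

Hard affinity: the Pod is scheduled only onto a node that satisfies the specified conditions; if no node qualifies, the Pod is not scheduled.

Soft affinity: the scheduler prefers nodes that satisfy the conditions, but falls back to other nodes if none qualifies.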
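The two forms can also be combined in one Pod spec: the hard rule narrows the set of candidate nodes, and the soft rule ranks the candidates that remain. The following is a minimal sketch under stated assumptions: the Pod name affinity-demo and the diskType label are illustrative, while kubernetes.io/os is a label the kubelet sets on every node.

```yaml
# Sketch only: affinity-demo and the diskType label are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo
spec:
  affinity:
    nodeAffinity:
      # Hard rule: only Linux nodes are candidates.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
      # Soft rule: among the candidates, prefer nodes labeled diskType=ssd.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: diskType
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx:1.20.1-alpine
```

If no Linux node carries diskType=ssd, the Pod should still be scheduled onto some Linux node; if there is no Linux node at all, it stays Pending.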

Hard node affinity

nginx-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          # Hard rule: only the node whose hostname label is k8s-m2 qualifies
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: "In"
                values:
                  - "k8s-m2"
      containers:
      - name: nginx
        image: nginx:1.20.1-alpine
        resources:
          limits:
            memory: "256Mi"
            cpu: "250m"
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

Create the Deployment

kubectl apply -f nginx-deploy.yaml

Check where the Pods were scheduled

kubectl get pods -l app=nginx -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-59c46cc77d-grwq5   1/1     Running   0          5s    10.244.1.84   k8s-m2   <none>           <none>
nginx-59c46cc77d-hvn6n   1/1     Running   0          5s    10.244.1.85   k8s-m2   <none>           <none>
nginx-59c46cc77d-z8rrn   1/1     Running   0          5s    10.244.1.83   k8s-m2   <none>           <none>

All three Pods were scheduled onto k8s-m2 and nowhere else.
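A hard rule does not have to name a single host. Within nodeSelectorTerms, the terms are ORed, while the matchExpressions inside one term are ANDed, and operators other than In (such as NotIn and Exists) are available. The fragment below is a sketch that belongs under spec.template.spec of a Deployment; the diskType label is an assumption and only exists if you add it to a node yourself.

```yaml
# Fragment of a Pod spec; the diskType label is illustrative.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # Term 1: the node is k8s-m2 ...
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s-m2
      # Term 2: ... OR the node has a diskType label (any value)
      # AND is not k8s-m3.
      - matchExpressions:
        - key: diskType
          operator: Exists
        - key: kubernetes.io/hostname
          operator: NotIn
          values:
          - k8s-m3
```

A node qualifies as soon as it satisfies every expression of at least one term.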

Soft node affinity

nginx-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          # Soft rule: prefer k8s-m2, but allow other nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: kubernetes.io/hostname
                operator: "In"
                values:
                  - "k8s-m2"
            weight: 10 # valid range: 1-100
      containers:
      - name: nginx
        image: nginx:1.20.1-alpine
        resources:
          limits:
            memory: "256Mi"
            cpu: "500m"
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

Create the Deployment

kubectl apply -f nginx-deploy.yaml

Check where the Pods were scheduled

kubectl get po -l app=nginx -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-544888dd59-b58ch   1/1     Running   0          5s    10.244.1.86   k8s-m2   <none>           <none>
nginx-544888dd59-m8t28   1/1     Running   0          5s    10.244.1.87   k8s-m2   <none>           <none>
nginx-544888dd59-x8pmf   1/1     Running   0          5s    10.244.2.69   k8s-m3   <none>           <none>

Depending on the configured weight and the actual state of the nodes, some Pods may still be scheduled onto other nodes.
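To make the role of the weight more concrete, here is a sketch with two weighted preferences. For each node, the scheduler adds up the weights of the preferences that the node satisfies and combines that sum with the scores from its other scoring rules, so a higher weight is a stronger hint rather than a guarantee. The weights 80 and 20 are arbitrary illustrative values.

```yaml
# Fragment of a Pod spec; the weights are illustrative.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # Strong preference for k8s-m2
    - weight: 80
      preference:
        matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s-m2
    # Weaker preference for k8s-m3
    - weight: 20
      preference:
        matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s-m3
```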

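Pod affinity and anti-affinity scheduling

Types of Pod affinity

Pod affinity scheduling is divided into Pod affinity, podAffinity, and Pod anti-affinity, podAntiAffinity.

Each of the two in turn has a hard form, requiredDuringSchedulingIgnoredDuringExecution, and a soft form, preferredDuringSchedulingIgnoredDuringExecution.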
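Both kinds can appear in the same Pod spec. As a sketch, the fragment below requires an nginx Pod to run on a node that already hosts a cache Pod, and asks the scheduler to keep nginx replicas apart where possible. The app: cache label is hypothetical and stands for some other workload in the cluster.

```yaml
# Fragment of a Pod spec; the app: cache label is an assumption.
affinity:
  podAffinity:
    # Hard rule: run on a node that already hosts a Pod labeled app=cache
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app: cache
  podAntiAffinity:
    # Soft rule: avoid nodes that already host a Pod labeled app=nginx
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: nginx
```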

Hard Pod affinity

Example: schedule replicas of the same Pod onto the same node

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAffinity:
          # Hard rule: run on a node that already hosts a Pod labeled app=nginx
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app: nginx
      containers:
      - name: nginx
        image: nginx:1.20.1-alpine
        resources:
          limits:
            memory: "256Mi"
            cpu: "250m"
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

Check where the Pods were scheduled

kubectl get po -l app=nginx -owide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-5dcdcdbd48-dhk4x   1/1     Running   0          6s    10.244.2.76   k8s-m3   <none>           <none>
nginx-5dcdcdbd48-jncps   1/1     Running   0          6s    10.244.2.78   k8s-m3   <none>           <none>
nginx-5dcdcdbd48-x2wlr   1/1     Running   0          6s    10.244.2.77   k8s-m3   <none>           <none>
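The topologyKey decides what "the same place" means. With kubernetes.io/hostname it is a single node; with a zone label it would be any node in the same zone. The sketch below assumes that the nodes carry the well-known topology.kubernetes.io/zone label, which cloud providers usually set automatically but which may be missing on a self-built cluster such as the one used here.

```yaml
# Fragment of a Pod spec; assumes nodes are labeled with
# topology.kubernetes.io/zone.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    # Co-locate with app=nginx Pods at zone level instead of node level
    - topologyKey: topology.kubernetes.io/zone
      labelSelector:
        matchLabels:
          app: nginx
```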

Example: do not allow replicas of the same Pod on the same node

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never run on a node that already hosts a Pod labeled app=nginx
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app: nginx
      containers:
      - name: nginx
        image: nginx:1.20.1-alpine
        resources:
          limits:
            memory: "256Mi"
            cpu: "250m"
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

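Observe

kubectl get po -l app=nginx -owide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-d859b58db-2mxq7   1/1     Running   0          4s    10.244.1.89   k8s-m2   <none>           <none>
nginx-d859b58db-t87qg   1/1     Running   0          4s    10.244.2.79   k8s-m3   <none>           <none>
nginx-d859b58db-tqsdb   0/1     Pending   0          4s    <none>        <none>   <none>           <none>

Two Pods were scheduled onto different nodes. The third stays Pending: the remaining node, k8s-m1, carries a NoSchedule taint by default (a taint, not a label), so nothing is scheduled there.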

Soft Pod affinity

Example: schedule replicas of the same Pod onto the same node whenever possible

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAffinity:
          # Soft rule: prefer a node that already hosts a Pod labeled app=nginx
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                topologyKey: "kubernetes.io/hostname"
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: "In"
                      values:
                        - "nginx"
              weight: 50
      containers:
      - name: nginx
        image: nginx:1.20.1-alpine
        resources:
          limits:
            memory: "256Mi"
            cpu: "250m"
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

Check where the Pods were scheduled

kubectl get po -l app=nginx -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-f744fbb8f-6lx2c   1/1     Running   0          16s   10.244.1.93   k8s-m2   <none>           <none>
nginx-f744fbb8f-cwhn9   1/1     Running   0          20s   10.244.1.91   k8s-m2   <none>           <none>
nginx-f744fbb8f-d9n6w   1/1     Running   0          18s   10.244.1.92   k8s-m2   <none>           <none>

When the node has enough resources, all replicas land on the same node. If you raise the CPU or memory values so that one node can no longer fit every replica, the remaining Pods are scheduled onto other nodes.

Example: avoid placing replicas of the same Pod on the same node whenever possible

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer a node that does not yet host a Pod labeled app=nginx
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                topologyKey: "kubernetes.io/hostname"
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: "In"
                      values:
                        - "nginx"
              weight: 50
      containers:
      - name: nginx
        image: nginx:1.20.1-alpine
        resources:
          limits:
            memory: "256Mi"
            cpu: "250m"
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

Check where the Pods were scheduled

kubectl get po -l app=nginx -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-59bb4cf66c-2546f   1/1     Running   0          4s    10.244.2.83   k8s-m3   <none>           <none>
nginx-59bb4cf66c-pdv2c   1/1     Running   0          4s    10.244.1.94   k8s-m2   <none>           <none>
nginx-59bb4cf66c-w79xj   1/1     Running   0          4s    10.244.2.82   k8s-m3   <none>           <none>

Here k8s-m1 carries a NoSchedule taint, so only two nodes are available. The replica that cannot satisfy the preference is scheduled onto a node that already runs an nginx Pod.

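Taints and tolerations

NoSchedule: Pods that do not tolerate the taint are not scheduled onto a node that has it.

NoExecute: Pods that do not tolerate the taint are not scheduled onto the node, and Pods already running there are evicted.

PreferNoSchedule: the scheduler tries to avoid placing Pods that do not tolerate the taint on the node, but may still do so.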
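A toleration is declared per taint and per effect. The sketch below shows one toleration for each of the three effects. The taint keys and values (dedicated=infra, maintenance) are hypothetical and only stand for taints that an administrator might have placed on nodes.

```yaml
# Fragment of a Pod spec; all keys and values are illustrative.
tolerations:
# Allows scheduling onto nodes tainted dedicated=infra:NoSchedule
- key: dedicated
  operator: Equal
  value: infra
  effect: NoSchedule
# Removes the scheduling penalty of dedicated=infra:PreferNoSchedule
- key: dedicated
  operator: Equal
  value: infra
  effect: PreferNoSchedule
# Allows scheduling onto, and staying on, nodes tainted
# maintenance:NoExecute, whatever the taint's value
- key: maintenance
  operator: Exists
  effect: NoExecute
```

A toleration only permits placement on the tainted node; it does not attract the Pod to it. Attracting the Pod still requires node affinity or a nodeSelector.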

NoSchedule example: tolerating a NoSchedule taint

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app: nginx
      containers:
      - name: nginx
        image: nginx:1.20.1-alpine
        resources:
          limits:
            memory: "256Mi"
            cpu: "250m"
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
      tolerations:
      # Equal with no value matches a taint whose value is empty,
      # such as node-role.kubernetes.io/master:NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Equal
        effect: NoSchedule

Observe

kubectl get po -l app=nginx -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-6587d9989c-9wjqm   1/1     Running   0          7s    10.244.1.95   k8s-m2   <none>           <none>
nginx-6587d9989c-gmrt2   1/1     Running   0          7s    10.244.0.20   k8s-m1   <none>           <none>
nginx-6587d9989c-tdzvh   1/1     Running   0          7s    10.244.2.84   k8s-m3   <none>           <none>

With the earlier hard anti-affinity rule, only two nodes could be used, because k8s-m1 has a NoSchedule taint and accepted no Pods. Now that the Pods tolerate that taint, one of them has been scheduled onto k8s-m1. Many Pods in kube-system tolerate this taint, for example etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, and the network plugin.

Such Pods list a toleration like Tolerations: :NoSchedule op=Exists. Interested readers can explore this on their own.
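As a starting point for that exploration: the entry :NoSchedule op=Exists has an empty key, which corresponds to a toleration without a key and with operator Exists. The sketch below contrasts that form with a keyed Exists toleration; it is a Pod spec fragment, not a copy of any particular kube-system manifest.

```yaml
# Fragment of a Pod spec.
tolerations:
# No key + Exists: tolerates every taint whose effect is NoSchedule
- operator: Exists
  effect: NoSchedule
# Key + Exists: tolerates only the master taint, whatever its value
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
```

Leaving out the effect as well (operator: Exists alone) would tolerate all taints, which is usually appropriate only for node-level system components.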

NoExecute example

When a taint with the NoExecute effect is added to a node, Pods that do not tolerate it are evicted.

kubectl taint nodes k8s-m1 master:NoExecute

Observe the effect

kubectl get po -l app=nginx -owide
NAME                     READY   STATUS        RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
nginx-6587d9989c-9wjqm   1/1     Running       0          8m34s   10.244.1.95   k8s-m2   <none>           <none>
nginx-6587d9989c-cv25q   0/1     Pending       0          3s      <none>        <none>   <none>           <none>
nginx-6587d9989c-gmrt2   0/1     Terminating   0          8m34s   10.244.0.20   k8s-m1   <none>           <none>
nginx-6587d9989c-tdzvh   1/1     Running       0          8m34s   10.244.2.84   k8s-m3   <none>           <none>

The nginx Pod on k8s-m1 is evicted immediately. Its replacement stays Pending, because it does not tolerate the new taint and the hard anti-affinity rule rules out k8s-m2 and k8s-m3, which each already run an nginx Pod.
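Eviction does not have to be immediate. A Pod that tolerates a NoExecute taint can set tolerationSeconds to stay on the node for a limited time after the taint appears. The following sketch uses the master key from the command above; the value of 60 seconds is an arbitrary illustrative choice.

```yaml
# Fragment of a Pod spec; 60 is an illustrative value.
tolerations:
- key: master
  operator: Exists
  effect: NoExecute
  # Stay on the node for 60 seconds after the taint is added,
  # then be evicted
  tolerationSeconds: 60
```

Without tolerationSeconds, a matching NoExecute toleration lets the Pod stay on the node indefinitely.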

Remove the taint (note the trailing hyphen)

kubectl taint node k8s-m1 master:NoExecute-

Checking again, the Pod can be scheduled normally.

Pinning Pods to specific nodes with nodeSelector

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # Only nodes labeled diskType=ssd are eligible
      nodeSelector:
        diskType: ssd
      containers:
      - name: nginx
        image: nginx:1.20.1-alpine
        resources:
          limits:
            memory: "256Mi"
            cpu: "250m"
        ports:
        - containerPort: 80
          name: http
          protocol: TCP

Create the Deployment

kubectl apply -f nginx-deploy.yaml

No node currently has the label diskType=ssd, so there is no node the Pod can be scheduled onto.

kubectl get po -l app=nginx -owide
NAME                    READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-554c6797c-z6plx   0/1     Pending   0          11s   <none>   <none>   <none>           <none>

Label a node

kubectl label node k8s-m2 diskType=ssd

Check the Pod again: it has now been scheduled onto k8s-m2.

kubectl get po -l app=nginx -owide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-554c6797c-z6plx   1/1     Running   0          34s   10.244.1.96   k8s-m2   <none>           <none>
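The same constraint can be written as hard node affinity, which is useful once you need operators or several alternative terms that nodeSelector cannot express. A sketch of the equivalent fragment, to be placed under spec.template.spec in place of nodeSelector:

```yaml
# Fragment of a Pod spec; behaves like nodeSelector diskType: ssd.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: diskType
          operator: In
          values:
          - ssd
```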

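Common use cases

GPU scheduling: run Pods that need a GPU on nodes that have one.

SSD disks: for I/O-intensive workloads such as databases and caches, schedule the Pods onto nodes with SSD disks.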
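For the GPU case, a minimal sketch might look as follows. It rests on several assumptions: the GPU nodes have been labeled accelerator=nvidia-gpu by hand (a hypothetical label), the NVIDIA device plugin is installed so that the nvidia.com/gpu resource is advertised, and the image name is a placeholder.

```yaml
# Sketch only: the Pod name, node label and image are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  # Hypothetical label; add it with kubectl label node
  nodeSelector:
    accelerator: nvidia-gpu
  containers:
  - name: cuda
    # Placeholder image
    image: example.com/cuda-app:latest
    resources:
      limits:
        # Requires the NVIDIA device plugin on the node
        nvidia.com/gpu: 1
```

GPU nodes are often tainted as well, so that ordinary Pods stay off them; in that case the Pod also needs a matching toleration.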