Kubernetes

Official website: http://www.kubernetes.io

Official documentation: https://kubernetes.io/zh/docs/home/

Kubernetes is a container orchestration system for deploying, scaling, and managing containerized applications. Its goal is to make running containerized applications simple and efficient.


Evolution

  • Infrastructure as a Service (IaaS): infrastructure delivered as a service; examples: Alibaba Cloud, Amazon Web Services

  • Platform as a Service (PaaS), examples:

    • Sina SAE, where environments were built by filing tickets to the operations team

    • Apache Mesos, an open-source distributed resource management framework used to build resource pools

    • Docker Swarm, a lightweight container orchestrator with limited features

    • Kubernetes, the leader: full-featured and stable, backed by Google, and evolved from the internal Borg system, rewritten in Go

  • Software as a Service (SaaS), examples: Office 365 and Tencent Docs, which run directly in the browser with nothing to install

Cluster architecture and components

Borg architecture

  • BorgMaster (cluster control) is mainly responsible for distributing requests. Clients can schedule and manage the cluster in three ways: borgcfg configuration files, command-line tools, or a web browser

  • Scheduler: scheduling requests from clients are parsed by this component, and the tasks are stored in Google's Paxos-based key-value store

  • Borglet: continuously reads from the Paxos store, picks up the tasks assigned to it, and runs them; it provides the actual compute capacity and services

Kubernetes architecture

  • Master

    • Handles and distributes requests

    • Scheduler: assigns tasks to nodes, i.e. spreads work across the Nodes, and hands the result to the apiserver, which writes the data into etcd (a key-value store)

    • Controller Manager: runs the routine background tasks of the cluster. Each resource type has its own controller (e.g. Deployment, Service), and the Controller Manager manages these controllers

    • apiserver: the Kubernetes API and the single entry point of the cluster (used by kubectl, the web UI, the scheduler, etcd, the replication controller); it coordinates all components and exposes a RESTful API. Every create/update/delete/watch on object resources goes through the apiserver and is then persisted to etcd.

  • etcd: a distributed key-value store that holds the cluster state, e.g. Pod and Service objects

    • A reliable distributed key-value store written in Go, used to hold critical data and keep the distributed cluster running

    • The v2 API kept data only in memory, while v3 persists it; note that the v2 API is no longer used as of Kubernetes v1.11

    • Uses HTTP and a client/server architecture

    • (etcd architecture diagram omitted)

  • Node

    • The part of the cluster that actually provides compute capacity and runs services; responsible for running Pods and containers

    • kubelet: the Master's agent on each Node. It manages the lifecycle of the containers running on that machine: creating containers, mounting Pod volumes, downloading secrets, reporting container and node status, and so on. The kubelet turns each Pod into a set of containers; differences between container runtimes are hidden behind the CRI (Container Runtime Interface).

    • kube-proxy: writes rules into iptables or IPVS to implement Pod network proxying on the Node, maintaining network rules and layer-4 load balancing

    • A container engine, e.g. Docker or containerd

  • Cluster clients

    • Dashboard: a browser-based (B/S) UI for the cluster

    • kubectl: the command-line tool for managing the cluster

  • CoreDNS: creates A records for the Services in the cluster (name-to-IP resolution)

  • FEDERATION: unified management across multiple Kubernetes clusters

  • PROMETHEUS: monitoring for the cluster

  • ELK: a unified log collection and analysis platform for the cluster

Deploying Kubernetes

The two main ways to deploy Kubernetes for production:

  1. kubeadm: a tool that provides kubeadm init and kubeadm join for bootstrapping a Kubernetes cluster quickly.

    Documentation: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm/

    Pros: fast and convenient (a minimal sketch follows this list)

  2. Binaries: download the official release binaries and deploy each component by hand to assemble the cluster.

    Download: https://github.com/kubernetes/kubernetes/releases

    Pros: you end up with a much better understanding of the deployment
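
A minimal sketch of the kubeadm flow (the IP matches the master used elsewhere in these notes; the pod CIDR, token, and hash are placeholders that kubeadm init prints for you):

# on the master (control-plane) node
kubeadm init --apiserver-advertise-address=192.168.41.10 --pod-network-cidr=10.244.0.0/16

# configure kubectl for the current user, as suggested by the init output
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# on every worker node, using the token and hash printed by kubeadm init
kubeadm join 192.168.41.10:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>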

Recommended server hardware:

(figure: recommended hardware sizing table)

Role of the network plugin

Deploying a network plugin connects Pod-to-Pod and Node-to-Pod traffic so that packets can travel anywhere in the cluster, forming one flat network.

The mainstream network plugins today are:

  1. Flannel, for clusters of up to a few dozen nodes

  2. Calico, for larger clusters

CNI (Container Network Interface) is the interface Kubernetes uses to plug in these third-party network components.
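
For example, Calico can be installed with a single manifest (the URL below is the commonly documented one and may change between releases, so treat it as an assumption):

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
kubectl get pods -n kube-system | grep calico   # wait until the calico Pods are Running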

Configuring command completion
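
A typical setup for kubectl shell completion (assuming bash with the bash-completion package installed; for zsh use kubectl completion zsh instead):

# load completion into the current shell
source <(kubectl completion bash)

# make it permanent
echo 'source <(kubectl completion bash)' >> ~/.bashrc

# optional: also complete the short alias k
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc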

Connecting to a Kubernetes cluster

The kubeconfig file

The file lives at ~/.kube/config. kubectl uses this kubeconfig file to authenticate against the cluster; it can be generated with kubectl config commands. The kubeconfig mainly records the following pieces of information:

  • Cluster information:

    clusters:
    - cluster:
        certificate-authority-data: xxxxx   # cluster CA certificate
        server: https://192.168.41.10:6443  # cluster address (the master node)
      name: kubernetes                      # cluster name
  • Context information (every configured cluster/user pair; production setups may have several clusters):

    contexts:
    - context:
        cluster: kubernetes
        user: kubernetes-admin
      name: kubernetes-admin@kubernetes
  • Current context (which cluster is currently selected)

    current-context: kubernetes-admin@kubernetes
  • Client certificate information

    users:
    - name: kubernetes-admin
      user:
        client-certificate-data: xxxx   # certificate
        client-key-data: xxx            # key

Run a command against a specific kubeconfig:

kubectl --kubeconfig=config get nodes

Without the flag, kubectl reads the kubeconfig from the home directory by default, so move the file there to shorten the command:

mv config ~/.kube
kubectl get nodes

Distribute the kubeconfig to other machines and they can connect to the cluster as well (for example, as sketched below).
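
For example (user and host are placeholders):

scp ~/.kube/config user@other-host:~/.kube/config
ssh user@other-host kubectl get nodes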

Connecting to multiple clusters

Merge the kubeconfigs into one file, keep one context per cluster, and switch between contexts.

The docker-desktop kubeconfig on my Mac:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1ERXhOREF4TkRjek1Wb1hEVE15TURFeE1qQXhORGN6TVZvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBT2VRCmYzeVRYWGlybTIrMDR0NDhZcm5NYUd1ZVNZaWpPaXBkNkIxdE5HdldyQXdqS0xMK2loWG5UcEsvSUF0d0VDVTUKRHE2aFJZMGV0WlNUYXFWbXMxUnREb1BUMU5uR3djR2VXRDFyWUh5akpPeDZQTSt5K3RzcVE2NDFhTVd2c3FZTQpqdDZucmMwUjB2ZU1IY2ZuUWtBYnFLZWNlNFNEbEg2UHlndmx0d0RkOUtyb0tVbDRmRmNGY0Z1dmxzK3hWQ2dFCityMU9wMkp3WE5udXB6U2cwYTZlNm9abmNTL3UvVjhFWkg0RHJxMUN3ZnNUaFlsSlB3Z0NGTnR4SHJpV0xrQWcKb3BCanRmaVZYRDI1c2h6UzQ1N0x3NThUTEFzeTc4bEJaVFZ3TmxwQ1VkSjdxYVJWc3ZwWTQ3VThScEFwQllZYQpmNHQvdEhsSXkxQ0JqRWRad3AwQ0F3RUFBYU5aTUZjd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZIMFYwZ2JtNmt6YkJrRWFNM2s4U1JEd09jQWxNQlVHQTFVZEVRUU8KTUF5Q0NtdDFZbVZ5Ym1WMFpYTXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBRVFTUzJGU0NJZ0V1akZ1amlQWQpWWVU1QTNMRHcvbVBmSUZOWlZtNUhOYlUzZHdxRmVVRUxTaHAzdXpiM0JHbFprVDlRWDBSOEh5VVByWGtrSlJrCk9DYXlBdEg4WWxkTnZVMFFrYlFKNWVRWW1WYU9GK2dxLzRBVWdkRU1FWWdVd2c1TldxQkExSXdHV2VaZk9DM3MKTWRraU1FSG8zYjRBYjd1TDJibHo4aUc3U2NkTi9KRTJTM2FXQ2JjQWVEYitWRzdpS1pGc1JJM0dzMEhNelFiZQpyK3VVdnNHdG5SMnlSZnZrbWpCRTR4VzA0THpVb0tHY1RmcDcrSy9NZ2cyYXQyckVFUnUwK2tCaVFMd2pqWDRjCjRBbi80VUh2aHh6TXhqTkxNQml2bHlDcmg3OW9QUitZMEg5SHpxNmsvVEtuaGs3Ymh4bkpYYVYzbjVCRVNvWDUKMFNrPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://kubernetes.docker.internal:6443
  name: docker-desktop
contexts:
- context:
    cluster: docker-desktop
    user: docker-desktop
  name: docker-desktop
current-context: docker-desktop
kind: Config
preferences: {}
users:
- name: docker-desktop
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1xxx
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNxxx

The kubeconfig from the cluster in my Windows VM:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1ekNDQWMrZ0F3SUJBZxxxx
    server: https://192.168.41.10:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURJVENDQWdtZxxxx
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUxxx

To let kubectl on the Mac reach both clusters, copy the kubeconfig from the Windows VM to the Mac and rename the files:

$ ll
-rw-r--r--  1 yangsx  staff   5.6K  1 14 10:06 config-mac
-rw-r--r--  1 yangsx  staff   5.5K  1 14 09:59 config-win

Set the KUBECONFIG environment variable, which kubectl honours; its value is a colon-separated list of kubeconfig file paths, and this is how multiple kubeconfig files get merged:

KUBECONFIG=config-mac:config-win kubectl config view --flatten > $HOME/.kube/config 

View the merged file with cat ~/.kube/config:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZxxxx
    server: https://kubernetes.docker.internal:6443
  name: docker-desktop
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1ekNDQWMrZ0F3SUJBZ0lCQURBTkJxxx
    server: https://192.168.41.10:6443
  name: kubernetes
contexts:
- context:
    cluster: docker-desktop
    user: docker-desktop
  name: docker-desktop
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: docker-desktop
kind: Config
preferences: {}
users:
- name: docker-desktop
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURRakNDQWlxZ0Fxxxx
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb2dJQkFBS0NBUUVBxxxx
- name: kubernetes-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURJVENDQWdtZ0F3SUJBxxxx
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBMXJBUxxx

Note: cluster, context, and user names must be unique across the files being merged.


Switching contexts

  1. Through the Docker Desktop menu, which shows the current context:

    docker-desktop
  2. Change the context with kubectl:

    # show the current context
    kubectl config current-context
    docker-desktop
    
    # list all contexts
    kubectl config get-contexts
    CURRENT   NAME                          CLUSTER          AUTHINFO           NAMESPACE
    *         docker-desktop                docker-desktop   docker-desktop
              kubernetes-admin@kubernetes   kubernetes       kubernetes-admin
    # switch context
    kubectl config use-context kubernetes-admin@kubernetes
    Switched to context "kubernetes-admin@kubernetes".

Switching the config file (not recommended)

export  KUBECONFIG=$HOME/.kube/rancher-config

Pointing at a config file directly (not recommended)

# production cluster (cloud)
kubectl get pod  --kubeconfig=/root/.kube/aliyun_prod-config
# production IDC cluster
kubectl get pod  --kubeconfig=/root/.kube/vnet_prod-config
# test environment
kubectl get pod  --kubeconfig=/root/.kube/bjcs_test-config

Cluster monitoring

Viewing cluster and component status

Check the status of the master components:

kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}

All components report Healthy here.

Check node status:

kubectl get nodes
NAME         STATUS   ROLES                  AGE     VERSION
k8s-master   Ready    control-plane,master   5h37m   v1.21.0
k8s-node1    Ready    <none>                 5h26m   v1.21.0
k8s-node2    Ready    <none>                 5h26m   v1.21.0

AGE: every resource has this field; it shows how long the resource has existed.
STATUS: the node status; NotReady indicates a problem with the kubelet on that node.

View cluster information:

kubectl cluster-info
Kubernetes control plane is running at https://192.168.41.10:6443   # the apiserver address, i.e. the cluster entry point; connecting to the cluster means connecting to the apiserver here
CoreDNS is running at https://192.168.41.10:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

# for more detailed cluster information:
kubectl cluster-info dump
# returns the detailed cluster state in JSON format

View API resources:

kubectl api-resources
(columns: name, short names, API version, namespaced, kind)
NAME                              SHORTNAMES   APIVERSION                             NAMESPACED   KIND
bindings                                       v1                                     true         Binding
componentstatuses                 cs           v1                                     false        ComponentStatus
....
pods                              po           v1                                     true         Pod
...
services                          svc          v1                                     true         Service

List resources:

kubectl get <resource-type>   # any of the API resources listed above

kubectl get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6b9fbfff44-pstmt   1/1     Running   0          5h35m
calico-node-clq4g                          1/1     Running   0          5h35m

More detail (node, IP, ...):
kubectl get pod -n kube-system -o wide

Output as YAML:
kubectl get pod -n kube-system -o yaml

Describe a resource in detail:

kubectl describe <resource-type> <resource-name>

Viewing cluster resource utilization

Node resource usage:

kubectl top node <node_name>

Pod resource usage:

kubectl top pod <pod_name>

The data flow is:

kubectl => apiserver => metrics-server (Pod) => kubelet on every node (cAdvisor metrics endpoint) => resource utilization of all resources

If you see error: Metrics API not available, you need to install metrics-server: https://github.com/kubernetes-sigs/metrics-server (an install sketch follows).
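
A common way to install it is to apply the released manifest (the URL follows the project's release convention; on clusters with self-signed kubelet certificates the container argument --kubelet-insecure-tls is often added — both are assumptions here):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get pods -n kube-system | grep metrics-server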

After installation the command works:

kubectl top node
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master   589m         29%    1209Mi          63%
k8s-node1    106m         10%    504Mi           57%
k8s-node2    160m         16%    503Mi           57%

Managing component and application logs

  • Kubernetes system component logs

    • Components managed by systemd, such as the kubelet:

      journalctl -u kubelet
    • Or check the system log:

      /var/log/messages
    • Components deployed as Pods:

      kubectl logs kube-proxy-btz4p -n kube-system
      kubectl logs -f kube-proxy-btz4p -n kube-system  # follow in real time
  • Logs of applications deployed inside the cluster

    • Standard output: a container's stdout is written on the host to /var/lib/docker/containers/<container-id>/<container-id>-json.log

    • Log files: e.g. a Java logging framework writing to a specific file. Mount the application's log path out of the container and read it on the host, or use kubectl exec -it <pod_name> -- bash to enter the container and read the file directly.

Approaches to collecting cluster logs

  • For stdout: run a log-collection agent on every Node as a DaemonSet and harvest everything under /var/lib/docker/containers/

  • For log files inside containers: add a sidecar container running a log collector to the Pod and share the log directory through an emptyDir volume so the collector can read the files (a sketch follows this list)

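A minimal sketch of the sidecar approach (image, paths, and the tail-based "collector" are placeholders; a real setup would run filebeat, fluent-bit, or similar):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  volumes:
  - name: app-logs      # shared log directory
    emptyDir: {}
  containers:
  - name: app           # the business container writes a log file
    image: busybox
    command: ["sh", "-c", "while true; do date >> /var/log/app/app.log; sleep 5; done"]
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: log-collector # the sidecar reads the same file through the shared emptyDir
    image: busybox
    command: ["sh", "-c", "touch /var/log/app/app.log; tail -f /var/log/app/app.log"]
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app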

Deploying an application

Imperative (command-line) approach

  1. Deploy an image with a Deployment controller

    kubectl create deployment web --image=nginx
    kubectl get deployment,pods
    NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/web   1/1     1            1           18s
    
    NAME                      READY   STATUS    RESTARTS   AGE
    pod/web-96d5df5c8-4dlx9   1/1     Running   0          18s
  2. Expose the Pods with a Service

    # --target-port: the application's port inside the container; nginx defaults to 80
    # --port: the port used for service-to-service access inside the cluster, reachable via <ClusterIP>:<port>
    # --type=NodePort: also exposes the service outside the cluster; the Service port 8080 is mapped to the same fixed, randomly chosen port on every node, so any <NodeIP>:31052 works
    kubectl expose deployment web --type=NodePort --target-port=80 --port=8080 --name=web
    kubectl get service
    NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    web          NodePort    10.102.83.203   <none>        8080:31052/TCP   7m2s
  3. Access any <NodeIP>:31052 to reach the nginx service

  4. Clean up: kubectl delete deployment web && kubectl delete svc web

Declarative (YAML) approach

  1. Write a YAML manifest

    # API version
    apiVersion: apps/v1
    # resource kind
    kind: Deployment
    # controller metadata; names usually combine project and application name
    metadata:
      name: web
      labels:
        app: web
    # desired state of the controller
    spec:
      # three replicas
      replicas: 3
      # template of the Pods managed by this Deployment
      template:
        metadata:
          name: web
          labels:
            app: web
        # Pod spec
        spec:
          # containers in the Pod
          containers:
            - name: web
              image: nginx
              imagePullPolicy: IfNotPresent
          # Pod restart policy
          restartPolicy: Always
      # the selector picks the Pods this controller manages, by label (here Pods labeled app: web)
      selector:
        matchLabels:
          app: web
    
    ---
    
    # also create a Service
    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      selector: # label selector; it always matches labels, here the Pod's labels
        app: web
      ports:
        - name: http
          port: 8080
          targetPort: 80
          protocol: TCP
      type: NodePort
  2. Apply the YAML file above with kubectl apply -f

    kubectl apply -f web.yaml
    deployment.apps/web created
    
    kubectl get pods,deployments
    NAME                       READY   STATUS    RESTARTS   AGE
    pod/web-849df489b4-97tpv   1/1     Running   0          3m3s
    pod/web-849df489b4-crg4q   1/1     Running   0          3m3s
    pod/web-849df489b4-ms5ds   1/1     Running   0          3m3s
    
    NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/web   3/3     3            3           3m3s
     yangsx@mac  ~/temp  kubectl get pods,deployments,services
    NAME                       READY   STATUS    RESTARTS   AGE
    pod/web-849df489b4-97tpv   1/1     Running   0          3m14s
    pod/web-849df489b4-crg4q   1/1     Running   0          3m14s
    pod/web-849df489b4-ms5ds   1/1     Running   0          3m14s
    
    NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/web   3/3     3            3           3m14s
    
    NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
    service/kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP          11h
    service/web          NodePort    10.103.77.17   <none>        8080:32654/TCP   24s
  3. Clean up with kubectl delete -f <file>

Quickly generating YAML

kubectl create deployment web2 --image=nginx --replicas=3 --dry-run=client -o yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null # can be removed
  labels:
    app: web2
  name: web2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web2
  strategy: {}
  template:
    metadata:
      creationTimestamp: null # can be removed
      labels:
        app: web2
    spec:
      containers:
      - image: nginx
        name: nginx
        resources: {}
status: {} # can be removed

Generating YAML from an existing resource

kubectl get deploy web -o yaml > deployment3.yaml

Inspecting resource APIs

List all API resources

kubectl api-resources

(columns: name, short names, API version, namespaced, kind)
NAME                              SHORTNAMES   APIVERSION                             NAMESPACED   KIND
bindings                                       v1                                     true         Binding
componentstatuses                 cs           v1                                     false        ComponentStatus
configmaps                        cm           v1                                     true         ConfigMap
endpoints                         ep           v1                                     true         Endpoints
events                            ev           v1                                     true         Event
limitranges                       limits       v1                                     true         LimitRange
namespaces                        ns           v1                                     false        Namespace
nodes                             no           v1                                     false        Node
persistentvolumeclaims            pvc          v1                                     true         PersistentVolumeClaim
persistentvolumes                 pv           v1                                     false        PersistentVolume
pods                              po           v1                                     true         Pod
podtemplates                                   v1                                     true         PodTemplate
replicationcontrollers            rc           v1                                     true         ReplicationController
resourcequotas                    quota        v1                                     true         ResourceQuota
secrets                                        v1                                     true         Secret
serviceaccounts                   sa           v1                                     true         ServiceAccount
services                          svc          v1                                     true         Service
mutatingwebhookconfigurations                  admissionregistration.k8s.io/v1        false        MutatingWebhookConfiguration
validatingwebhookconfigurations                admissionregistration.k8s.io/v1        false        ValidatingWebhookConfiguration
customresourcedefinitions         crd,crds     apiextensions.k8s.io/v1                false        CustomResourceDefinition
apiservices                                    apiregistration.k8s.io/v1              false        APIService
controllerrevisions                            apps/v1                                true         ControllerRevision
daemonsets                        ds           apps/v1                                true         DaemonSet
deployments                       deploy       apps/v1                                true         Deployment
replicasets                       rs           apps/v1                                true         ReplicaSet
statefulsets                      sts          apps/v1                                true         StatefulSet
tokenreviews                                   authentication.k8s.io/v1               false        TokenReview
localsubjectaccessreviews                      authorization.k8s.io/v1                true         LocalSubjectAccessReview
selfsubjectaccessreviews                       authorization.k8s.io/v1                false        SelfSubjectAccessReview
selfsubjectrulesreviews                        authorization.k8s.io/v1                false        SelfSubjectRulesReview
subjectaccessreviews                           authorization.k8s.io/v1                false        SubjectAccessReview
horizontalpodautoscalers          hpa          autoscaling/v1                         true         HorizontalPodAutoscaler
cronjobs                          cj           batch/v1                               true         CronJob
jobs                                           batch/v1                               true         Job
certificatesigningrequests        csr          certificates.k8s.io/v1                 false        CertificateSigningRequest
leases                                         coordination.k8s.io/v1                 true         Lease
bgpconfigurations                              crd.projectcalico.org/v1               false        BGPConfiguration
bgppeers                                       crd.projectcalico.org/v1               false        BGPPeer
blockaffinities                                crd.projectcalico.org/v1               false        BlockAffinity
caliconodestatuses                             crd.projectcalico.org/v1               false        CalicoNodeStatus
clusterinformations                            crd.projectcalico.org/v1               false        ClusterInformation
felixconfigurations                            crd.projectcalico.org/v1               false        FelixConfiguration
globalnetworkpolicies                          crd.projectcalico.org/v1               false        GlobalNetworkPolicy
globalnetworksets                              crd.projectcalico.org/v1               false        GlobalNetworkSet
hostendpoints                                  crd.projectcalico.org/v1               false        HostEndpoint
ipamblocks                                     crd.projectcalico.org/v1               false        IPAMBlock
ipamconfigs                                    crd.projectcalico.org/v1               false        IPAMConfig
ipamhandles                                    crd.projectcalico.org/v1               false        IPAMHandle
ippools                                        crd.projectcalico.org/v1               false        IPPool
ipreservations                                 crd.projectcalico.org/v1               false        IPReservation
kubecontrollersconfigurations                  crd.projectcalico.org/v1               false        KubeControllersConfiguration
networkpolicies                                crd.projectcalico.org/v1               true         NetworkPolicy
networksets                                    crd.projectcalico.org/v1               true         NetworkSet
endpointslices                                 discovery.k8s.io/v1                    true         EndpointSlice
events                            ev           events.k8s.io/v1                       true         Event
ingresses                         ing          extensions/v1beta1                     true         Ingress
flowschemas                                    flowcontrol.apiserver.k8s.io/v1beta1   false        FlowSchema
prioritylevelconfigurations                    flowcontrol.apiserver.k8s.io/v1beta1   false        PriorityLevelConfiguration
nodes                                          metrics.k8s.io/v1beta1                 false        NodeMetrics
pods                                           metrics.k8s.io/v1beta1                 true         PodMetrics
ingressclasses                                 networking.k8s.io/v1                   false        IngressClass
ingresses                         ing          networking.k8s.io/v1                   true         Ingress
networkpolicies                   netpol       networking.k8s.io/v1                   true         NetworkPolicy
runtimeclasses                                 node.k8s.io/v1                         false        RuntimeClass
poddisruptionbudgets              pdb          policy/v1                              true         PodDisruptionBudget
podsecuritypolicies               psp          policy/v1beta1                         false        PodSecurityPolicy
clusterrolebindings                            rbac.authorization.k8s.io/v1           false        ClusterRoleBinding
clusterroles                                   rbac.authorization.k8s.io/v1           false        ClusterRole
rolebindings                                   rbac.authorization.k8s.io/v1           true         RoleBinding
roles                                          rbac.authorization.k8s.io/v1           true         Role
priorityclasses                   pc           scheduling.k8s.io/v1                   false        PriorityClass
csidrivers                                     storage.k8s.io/v1                      false        CSIDriver
csinodes                                       storage.k8s.io/v1                      false        CSINode
csistoragecapacities                           storage.k8s.io/v1beta1                 true         CSIStorageCapacity
storageclasses                    sc           storage.k8s.io/v1                      false        StorageClass
volumeattachments                              storage.k8s.io/v1                      false        VolumeAttachment

List all API versions

kubectl api-versions

admissionregistration.k8s.io/v1
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
authorization.k8s.io/v1beta1
autoscaling/v1
autoscaling/v2beta1
autoscaling/v2beta2
batch/v1
batch/v1beta1
certificates.k8s.io/v1
certificates.k8s.io/v1beta1
coordination.k8s.io/v1
coordination.k8s.io/v1beta1
crd.projectcalico.org/v1
discovery.k8s.io/v1
discovery.k8s.io/v1beta1
events.k8s.io/v1
events.k8s.io/v1beta1
extensions/v1beta1
flowcontrol.apiserver.k8s.io/v1beta1
metrics.k8s.io/v1beta1
networking.k8s.io/v1
networking.k8s.io/v1beta1
node.k8s.io/v1
node.k8s.io/v1beta1
policy/v1
policy/v1beta1
rbac.authorization.k8s.io/v1
rbac.authorization.k8s.io/v1beta1
scheduling.k8s.io/v1
scheduling.k8s.io/v1beta1
storage.k8s.io/v1
storage.k8s.io/v1beta1
v1

View a resource's top-level fields

kubectl explain <resource>


kubectl explain pod
KIND:     Pod
VERSION:  v1

DESCRIPTION:
     Pod is a collection of containers that can run on a host. This resource is
     created by clients and scheduled onto hosts.

FIELDS:
   apiVersion <string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind <string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec <Object>
     Specification of the desired behavior of the pod. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status <Object>
     Most recently observed status of the pod. This data may not be up to date.
     Populated by the system. Read-only. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-statu

View all fields of a resource, recursively

 kubectl explain svc --recursive
 
KIND:     Service
VERSION:  v1

DESCRIPTION:
     Service is a named abstraction of software service (for example, mysql)
     consisting of local port (for example 3306) that the proxy listens on, and
     the selector that determines which pods will answer requests sent through
     the proxy.

FIELDS:
   apiVersion <string>
   kind <string>
   metadata <Object>
      annotations <map[string]string>
      clusterName <string>
      creationTimestamp <string>
      deletionGracePeriodSeconds <integer>
      deletionTimestamp <string>
      finalizers <[]string>
      generateName <string>
      generation <integer>
      labels <map[string]string>
      managedFields <[]Object>
         apiVersion <string>
         fieldsType <string>
         fieldsV1 <map[string]>
         manager <string>
         operation <string>
         time <string>
      name <string>
      namespace <string>
      ownerReferences <[]Object>
         apiVersion <string>
         blockOwnerDeletion <boolean>
         controller <boolean>
         kind <string>
         name <string>
         uid <string>
      resourceVersion <string>
      selfLink <string>
      uid <string>
   spec <Object>
      allocateLoadBalancerNodePorts <boolean>
      clusterIP <string>
      clusterIPs <[]string>
      externalIPs <[]string>
      externalName <string>
      externalTrafficPolicy <string>
      healthCheckNodePort <integer>
      internalTrafficPolicy <string>
      ipFamilies <[]string>
      ipFamilyPolicy <string>
      loadBalancerClass <string>
      loadBalancerIP <string>
      loadBalancerSourceRanges <[]string>
      ports <[]Object>
         appProtocol <string>
         name <string>
         nodePort <integer>
         port <integer>
         protocol <string>
         targetPort <string>
      publishNotReadyAddresses <boolean>
      selector <map[string]string>
      sessionAffinity <string>
      sessionAffinityConfig <Object>
         clientIP <Object>
            timeoutSeconds <integer>
      topologyKeys <[]string>
      type <string>
   status <Object>
      conditions <[]Object>
         lastTransitionTime <string>
         message <string>
         observedGeneration <integer>
         reason <string>
         status <string>
         type <string>
      loadBalancer <Object>
         ingress <[]Object>
            hostname <string>
            ip <string>
            ports <[]Object>
               error <string>
               port <integer>
               protocol <string>

View the sub-fields of a specific field

kubectl explain svc.spec.ports
KIND:     Service
VERSION:  v1

RESOURCE: ports <[]Object>

DESCRIPTION:
     The list of ports that are exposed by this service. More info:
     https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies

     ServicePort contains information on service's port.

FIELDS:
   name <string>
     The name of this port within the service. This must be a DNS_LABEL. All
     ports within a ServiceSpec must have unique names. This maps to the 'Name'
     field in EndpointPort objects. Optional if only one ServicePort is defined
     on this service.

   nodePort     <integer>
     The port on each node on which this service is exposed when type=NodePort
     or LoadBalancer. Usually assigned by the system. If specified, it will be
     allocated to the service if unused or else creation of the service will
     fail. Default is to auto-allocate a port if the ServiceType of this Service
     requires one. More info:
     https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport

   port <integer> -required-
     The port that will be exposed by this service.

   protocol     <string>
     The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default
     is TCP.

   targetPort   <string>
     Number or name of the port to access on the pods targeted by the service.
     Number must be in the range 1 to 65535. Name must be an IANA_SVC_NAME. If
     this is a string, it will be looked up as a named port in the target Pod's
     container ports. If this is not specified, the value of the 'port' field is
     used (an identity map). This field is ignored for services with
     clusterIP=None, and should be omitted or set equal to the 'port' field.
     More info:
     https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service

Resources

Resource: in Kubernetes, everything is abstracted as a resource; when a resource is instantiated it becomes an object.

Resource categories

  • Namespace-scoped resources

    • Workload resources

      • Pod

      • ReplicationController (deprecated since v1.11)

      • ReplicaSet

      • Deployment

      • StatefulSet

      • DaemonSet

      • Job

      • CronJob

    • Service discovery / load-balancing resources

      • Service

      • Ingress

    • Configuration and storage resources

      • Volume

      • CSI (Container Storage Interface)

    • Special volume types

      • ConfigMap (configuration data)

      • Secret (sensitive data)

      • DownwardAPI (exposes Pod/environment information to containers)

  • Cluster-scoped resources

    • Namespace

    • Node

    • Role

    • ClusterRole

    • RoleBinding

    • ClusterRoleBinding

  • Metadata resources

    • HPA

    • PodTemplate

    • LimitRange

Deployment

Pods and controllers

Deployment and the other controllers exist mainly to make managing containers in Kubernetes easier. Deployment is the most common workload controller: an abstraction, a higher-level object that deploys and manages Pods. Similar controllers include DaemonSet and StatefulSet.

Main capabilities:

  • Manages Pods and ReplicaSets

  • Supports rollout, replica management, rolling upgrades, and rollback

  • Provides declarative updates

Typical workloads:

  • Websites

  • APIs

  • Microservices

Managing the application lifecycle

A Deployment manages an application across its lifecycle:

Deploy

kubectl apply -f xxx.yaml
kubectl create deployment web --image=nginx:1.15 --replicas=3

Upgrade

kubectl apply -f xxx.yaml
kubectl set image deployment/web nginx=nginx:1.17
kubectl set image deployment/web nginx=nginx:1.17 --record=true  # upgrade and record the command used
kubectl edit deployment/web # edit the live object in the default editor

Rolling upgrade: Kubernetes' default Pod update strategy; new-version Pods gradually replace old-version Pods, giving a zero-downtime release that users do not notice.

The main release strategies are blue-green, grey releases (canary, A/B testing, smoke testing), and rolling updates (stop-the-world waterfall upgrades are obsolete).

What happens during an upgrade

Deploy: the ReplicaSet keeps the replica count at 3
Upgrade:
  a new ReplicaSet is created with 1 replica, and the old RS is scaled down to 2
  the new RS is scaled to 2, the old RS down to 1
  the new RS is scaled to 3, the old RS down to 0

kubectl describe deploy web   shows the event history, so you can watch this happen
kubectl get rs                lists the ReplicaSets; each RS corresponds to one rollout revision
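
The pace of this replacement can be tuned through the Deployment's update strategy; a sketch with illustrative values:

spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra Pod above the desired count during the update
      maxUnavailable: 1  # at most one Pod may be unavailable during the update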

Scaling out and in

Change the replicas value in the YAML and re-apply, or:
kubectl scale deployment web --replicas=10

Scaling up or down simply adjusts the ReplicaSet's replica count.

Rollback (rarely used)

kubectl rollout history deployment web  # show the rollout history of web
# REVISION is the revision number; CHANGE-CAUSE is recorded only when --record=true was used (of limited value)
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
3         <none>
4         kubectl set image deployment/web nginx=nginx:1.17 --record=true

# roll back to the previous revision
kubectl rollout undo deployment web

# roll back to a specific revision
kubectl rollout undo deployment web --to-revision=2

A rollback re-deploys a specific revision via its ReplicaSet; every revision has a corresponding ReplicaSet, and describing the RS shows the revision number and the changes it carries.

Taking the application down

kubectl delete deploy/web
kubectl delete svc/web

ReplicaSet

Purpose:

  1. Manages the number of Pod replicas, constantly reconciling the current count with the desired count

  2. Every Deployment rollout creates a ReplicaSet as a record, which is what makes rollback possible

  3. Rolling upgrades: a new RS is created and gradually replaces the old RS

kubectl get rs # list the ReplicaSet records
kubectl rollout history deployment web # revisions map to ReplicaSet records

Pod

The Pod (think of a pea pod holding containers) is the smallest unit Kubernetes manages; it wraps one or more containers. Some services are tightly coupled and need to share network and storage environments, which is hard to achieve with plain standalone containers, so Kubernetes adds the Pod concept around them.

Characteristics:

  1. A Pod can be thought of as one instance of an application

  2. All containers in a Pod are always scheduled onto the same Node

  3. Containers in a Pod share network and storage resources

Sidecar design pattern (like the passenger sidecar on a motorcycle):

  • A dedicated container in the Pod does auxiliary work for the business container

  • The auxiliary function is decoupled from the main business container, so it can be released independently and reused

  • Examples: log collection, application monitoring

Pod management commands

Note: in practice, Pods are almost never created directly.

Create a Pod:

  1. kubectl apply -f pod.yaml with kind: Pod

  2. kubectl run nginx --image=nginx

List Pods:

  1. kubectl get pods

  2. kubectl get pods -w to watch in real time

  3. kubectl describe pod <pod_name> for detailed information

View logs:

  1. kubectl logs <pod_name> [-c <container>]

  2. kubectl logs <pod_name> [-c <container>] -f to follow

Open a shell in a container:

  1. kubectl exec -it <pod_name> [-c <container>] -- bash

Delete a Pod:

  1. kubectl delete pod <pod_name>

Pod phases

  1. Pending: the Pod has been accepted by Kubernetes but one or more containers have not been created yet; this includes time spent scheduling the Pod and pulling images

  2. Running: the Pod has been bound to a Node and all its containers have been created; at least one container is running, or is starting or restarting

  3. Succeeded: all containers terminated successfully and will not be restarted

  4. Failed: at least one container terminated with a failure (non-zero exit code or killed by the system)

  5. Unknown: the Pod status cannot be obtained, usually because communication with the Pod's host failed

How a Pod gets created

(sequence diagram of the Pod creation flow)

Kubernetes is built on a list-watch controller architecture that decouples the components. Each component watches the resources it is responsible for; when they change, the kube-apiserver notifies the watchers, much like publish/subscribe.

  1. A Pod creation is requested (by a command, or by the Controller Manager, e.g. a Deployment controller). The request reaches the apiserver through the API, and the Pod's configuration is stored in etcd

  2. The scheduler notices a Pod that is not yet bound to a node, selects a suitable node with its scheduling algorithms, marks the Pod with it (e.g. nodeName=node1), responds to the apiserver, and the result is written to etcd

  3. The kubelet learns through the apiserver that a new Pod has been assigned to its node, calls the CRI to create the containers, then reports the container status back to the apiserver, which writes it to etcd

  4. kubectl get asks the apiserver for the Pod list of the current namespace, and the apiserver reads it straight from etcd

Pod lifecycle

After the Pod is created:

  1. The infrastructure container (the pause container) is started first to set up the network and storage shared inside the Pod

  2. One or more init containers then run, chained one after another

    1. The next init container runs only after the previous one finished without error

    2. If an init container fails and the Pod's restart policy is Always, the Pod keeps restarting

    3. Purpose: perform tool and data initialization before the main containers start, so the main container stays lean and safe

    4. Purpose: init containers run in their own Linux namespaces and therefore see a different filesystem than the application containers; they can be given access to Secrets that the main containers do not have

    5. They run before the main containers, so they can be used to delay/block their startup

  3. All main (business) containers are started in parallel

    1. Each main container gets readiness and liveness probes

    2. After the readiness probe succeeds, the container becomes Ready; only then are ports opened and the service exposed

    3. The liveness probe runs for the container's whole life; if it fails, the Pod is restarted according to its restart policy

  4. start/stop lifecycle hooks run

Hands-on: network and volume sharing between containers in a Pod

kubectl create deployment test-pod --dry-run=client --image=nginx -o yaml > test-pod.yaml
vi test-pod.yaml

Edit the manifest as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-pod
  name: test-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-pod
  strategy: {}
  template:
    metadata:
      labels:
        app: test-pod
    spec:
      containers:
      - image: nginx
        name: nginx
        resources: {}
        volumeMounts: # mount volumes into the container
        - name: html  # mount the html volume at /usr/share/nginx/html
          mountPath: /usr/share/nginx/html
      # add a busybox container
      - image: busybox
        name: bs
        # keep the container alive after startup
        command: ["/bin/sh", "-c", "sleep 12h"]
        volumeMounts:
        - name: html # mount the html volume at /data
          mountPath: /data
      # define a volume that holds nginx's html content
      volumes:
      - name: html   # the volume is named html
        emptyDir: {} # the volume type is emptyDir

Apply it:

$ kubectl apply -f test-pod.yaml

# check the created resources
$ kubectl get deploy,pod -o wide | grep test-pod
deployment.apps/test-pod   1/1     1            1           4m39s   nginx,bs     nginx,busybox   app=test-pod
pod/test-pod-5dcdc754cd-2m8kw   2/2     Running   0          4m39s   10.244.36.95   k8s-node1   <none>           <none>

Verify network and volume sharing:

# enter the busybox container and create index.html in the shared directory
$ kubectl exec -it test-pod-5dcdc754cd-2m8kw -c bs -- sh
/ #  cd /data/
/ #  echo "<h1>HelloPod</h1>" > /data/index.html

# from inside busybox, request localhost:80 (shared network namespace)
/ # wget localhost:80
Connecting to localhost:80 (127.0.0.1:80)
saving to 'index.html'
index.html           100% |***********************************|    18  0:00:00 ETA
'index.html' saved

/ # cat index.html
<h1>HelloPod</h1>

Hands-on: every Pod gets a pause container

[root@k8s-node1 ~]# docker ps | grep nginx
9fd7b718d6f7   nginx                                                 "/docker-entrypoint.…"   3 minutes ago   Up 3 minutes             k8s_nginx_nginx_default_9cd01c38-71ee-4743-b9ab-1dde7ab05bc3_0
6ca9803b29bc   registry.aliyuncs.com/google_containers/pause:3.4.1   "/pause"                 3 minutes ago   Up 3 minutes             k8s_POD_nginx_default_9cd01c38-71ee-4743-b9ab-1dde7ab05bc3_0

The nginx Pod has two containers; one of them is the pause container.

Environment variables

When creating a Pod, you can set environment variables for its containers.

Use cases:

  1. Let the container obtain Pod information through environment variables

  2. Let the application change its default behaviour through user-defined variables

Ways to define environment variables:

  1. Custom hard-coded values

  2. Values taken from Pod fields

  3. Values taken from a Secret or ConfigMap

Example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: test-pod
  name: test-pod
spec:
  containers:
  - image: busybox
    name: test-pod
    command: ["sh", "-c", "sleep 12h"]
    resources: {}
    env:
    # values taken from Pod fields
    - name: MY_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: ABC
      value: "123456"
  dnsPolicy: ClusterFirst
  restartPolicy: Always

Create the Pod above:

kubectl apply -f testpod.yaml

Check that the environment variable configuration took effect:

kubectl describe pod test-pod
Name:         test-pod
Namespace:    default
Priority:     0
Node:         k8s-node1/192.168.41.11
Start Time:   Fri, 14 Jan 2022 15:37:28 +0800
Labels:       run=test-pod
Annotations:  cni.projectcalico.org/containerID: 6ec8706992387019b175f4aa38489789379dd5470d2a9e4f9e752cc0c804d32b
              cni.projectcalico.org/podIP: 10.244.36.103/32
              cni.projectcalico.org/podIPs: 10.244.36.103/32
Status:       Running
IP:           10.244.36.103
IPs:
  IP:  10.244.36.103
Containers:
  test-pod:
    Container ID:  docker://30fb7d4bebe0a4fe50cfd752e6f79599f79b59b36240ebced54effeeab634a6e
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:5acba83a746c7608ed544dc1533b87c737a0b0fb730301639a0179f9344b1678
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      sleep 12h
    State:          Running
      Started:      Fri, 14 Jan 2022 15:37:35 +0800
    Ready:          True
    Restart Count:  0
    ####################### environment variables ######################
    Environment:
      MY_NODE_NAME:       (v1:spec.nodeName)
      MY_POD_NAME:       test-pod (v1:metadata.name)
      MY_POD_NAMESPACE:  default (v1:metadata.namespace)
      MY_POD_IP:          (v1:status.podIP)
      ABC:               123456
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nl9t2 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-nl9t2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  2d17h  default-scheduler  Successfully assigned default/test-pod to k8s-node1
  Normal  Pulling    2d17h  kubelet            Pulling image "busybox"
  Normal  Pulled     2d17h  kubelet            Successfully pulled image "busybox" in 3.323362543s
  Normal  Created    2d17h  kubelet            Created container test-pod
  Normal  Started    2d17h  kubelet            Started container test-pod

Enter the container and print the variables:

kubectl exec -it test-pod -- sh
/ # echo $MY_NODE_NAME,$MY_POD_NAME,$MY_POD_NAMESPACE,$MY_POD_IP,$ABC
k8s-node1,test-pod,default,10.244.36.103,123456

Init containers

Init containers do initialization work and exit when done; think of them as one-off tasks.

  1. They support most application-container settings, but not health probes

  2. They run before the application containers

Use cases:

  1. Environment checks: make sure the services the application container depends on are up before starting it

  2. Configuration bootstrap: e.g. preparing configuration files for the application container

Example: download and initialize content

# Pod with an init container
apiVersion: v1
kind: Pod
metadata:
  name: tomcat-initc
  labels:
    app: tomcat-initc
spec:
  # define a volume
  volumes:
    - name: tomcat-initc-volume
      emptyDir: {}
  initContainers:
    # init container that prepares data for tomcat
    - name: init-html
      image: busybox:latest
      imagePullPolicy: IfNotPresent
      # the command could just as well download a war package
      command: ['sh', '-c','mkdir -p /usr/local/tomcat/webapps && echo ''<h1>你好</h1>'' >> /usr/local/tomcat/webapps/index.html']
      volumeMounts: # mount tomcat-initc-volume into the init container and populate it with config and data
        - mountPath: /usr/local/tomcat/webapps/
          name: tomcat-initc-volume
  containers:
    - name: tomcat-initc
      image: tomcat
      imagePullPolicy: IfNotPresent
      volumeMounts: # mount tomcat-initc-volume at the webapps directory
        - mountPath: /usr/local/tomcat/webapps/
          name: tomcat-initc-volume
  restartPolicy: Always

---

apiVersion: v1
kind: Service
metadata:
  name: tomcat-initc-svc
spec:
  selector:
    app: tomcat-initc
  ports:
    - port: 8080
  type: NodePort

Restart policy (restartPolicy)

There are three restart policies, each suited to different workloads:

  1. Always: always restart the container after it exits; this is the Pod default. Suited to long-running services, e.g. nginx, redis, a Java app

  2. OnFailure: restart the container only when it exits abnormally (non-zero exit code). Suited to periodic jobs, e.g. database backups, inspections

  3. Never: never restart the container after it exits. Suited to one-off jobs, e.g. batch computation, offline data processing

Setting the Pod restart policy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 1
  template:
    metadata:
      name: web
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx
          imagePullPolicy: IfNotPresent
      restartPolicy: Always # restart policy
  selector:
    matchLabels:
      app: web

Health checks

Health checking has three kinds of probes, all run after the init containers have succeeded:

  1. startupProbe: startup probe; only after it succeeds does the liveness probe take over. It protects slow-starting containers (some containers take a long time to start, and the startup probe avoids killing them during that window)

  2. livenessProbe: liveness probe; on failure the container is killed and handled according to the Pod's restartPolicy

  3. readinessProbe: readiness probe; checks whether the service is actually working, e.g. the application finished starting. On failure, Kubernetes removes the Pod from the Service endpoints

Each probe supports the following three check methods (a sketch of the other two forms follows this list):

  1. httpGet: send an HTTP request; a status code in the 200-399 range means success

  2. exec: run a shell command; exit code 0 means success

  3. tcpSocket: success if a TCP connection can be established
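
For reference, the exec and tcpSocket forms look roughly like this (the file path and port are illustrative):

# exec probe: the command's exit code decides health
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]
  initialDelaySeconds: 5
  periodSeconds: 10

# tcpSocket probe: healthy if a TCP connection to the port can be opened
readinessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 15
  periodSeconds: 20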

Example: health checks with an HTTP probe

Deploy a Deployment from the following nginx.yaml manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx
          name: nginx
          ports:
            - containerPort: 80
          resources: {}
          # liveness probe
          livenessProbe:
            httpGet:
              port: 80
              path: /
            # seconds after container start before the first probe
            initialDelaySeconds: 20
            # interval between subsequent probes, in seconds
            periodSeconds: 10
          # readiness probe
          readinessProbe:
            httpGet:
              port: 80
              path: /
            initialDelaySeconds: 30
            periodSeconds: 10

---

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
    - port: 80
      name: web
      targetPort: 80
  selector: # selects the Pods
    app: nginx

Check the deployment:

kubectl get deploy,pod,svc
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   0/1     1            0           8s

NAME                         READY   STATUS              RESTARTS   AGE
pod/nginx-5fbd849686-bgvmq   0/1     ContainerCreating   0          8s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        20h
service/nginx        NodePort    10.100.240.34   <none>        80:30063/TCP   8s

Describe the Pod to confirm the livenessProbe and readinessProbe were configured:

kubectl describe pod nginx-5fbd849686-bgvmq
Name:         nginx-5fbd849686-bgvmq
Namespace:    default
Priority:     0
Node:         k8s-node1/192.168.41.11
Start Time:   Fri, 14 Jan 2022 10:30:15 +0800
Labels:       app=nginx
              pod-template-hash=5fbd849686
Annotations:  cni.projectcalico.org/containerID: 6a97b48184cd15527d99cc965f3e7f3445619a084df0e0856005646f5d046ef7
              cni.projectcalico.org/podIP: 10.244.36.102/32
              cni.projectcalico.org/podIPs: 10.244.36.102/32
Status:       Running
IP:           10.244.36.102
IPs:
  IP:           10.244.36.102
Controlled By:  ReplicaSet/nginx-5fbd849686
Containers:
  nginx:
    Container ID:   docker://1b801e6715c6cb6b3714a383d10071fd84292bb6980dc9278a3eb77819ffb8b4
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 14 Jan 2022 10:30:25 +0800
    Ready:          False
    Restart Count:  0
    ######### here: the configured probes
    Liveness:       http-get http://:80/ delay=20s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:80/ delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cspnd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-cspnd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age    From               Message
  ----     ------       ----   ----               -------
  Normal   Scheduled    2d12h  default-scheduler  Successfully assigned default/nginx-5fbd849686-bgvmq to k8s-node1
  Warning  FailedMount  2d12h  kubelet            MountVolume.SetUp failed for volume "kube-api-access-cspnd" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulling      2d12h  kubelet            Pulling image "nginx"
  Normal   Pulled       2d12h  kubelet            Successfully pulled image "nginx" in 5.460878686s
  Normal   Created      2d12h  kubelet            Created container nginx
  Normal   Started      2d12h  kubelet            Started container nginx

Verify that the liveness and readiness probes fire periodically

# check the nginx access log
kubectl logs nginx-5fbd849686-bgvmq
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2022/01/14 02:30:25 [notice] 1#1: using the "epoll" event method
2022/01/14 02:30:25 [notice] 1#1: nginx/1.21.5
2022/01/14 02:30:25 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
2022/01/14 02:30:25 [notice] 1#1: OS: Linux 3.10.0-957.el7.x86_64
2022/01/14 02:30:25 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2022/01/14 02:30:25 [notice] 1#1: start worker processes
2022/01/14 02:30:25 [notice] 1#1: start worker process 31 # startup time
192.168.41.11 - - [14/Jan/2022:02:30:45 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-" # first liveness probe, 20s after start
192.168.41.11 - - [14/Jan/2022:02:30:55 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-" # first readiness probe, 30s after start

192.168.41.11 - - [14/Jan/2022:02:30:55 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-" # second liveness probe

# from here on, one liveness probe and one readiness probe fire every ten seconds
192.168.41.11 - - [14/Jan/2022:02:31:05 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-" 
192.168.41.11 - - [14/Jan/2022:02:31:05 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"

192.168.41.11 - - [14/Jan/2022:02:31:15 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"
192.168.41.11 - - [14/Jan/2022:02:31:15 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"

192.168.41.11 - - [14/Jan/2022:02:31:25 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"
192.168.41.11 - - [14/Jan/2022:02:31:25 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"

192.168.41.11 - - [14/Jan/2022:02:31:35 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"
192.168.41.11 - - [14/Jan/2022:02:31:35 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"

192.168.41.11 - - [14/Jan/2022:02:31:45 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"
192.168.41.11 - - [14/Jan/2022:02:31:45 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"

192.168.41.11 - - [14/Jan/2022:02:31:55 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"
192.168.41.11 - - [14/Jan/2022:02:31:55 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.21" "-"

Verify that a failed liveness probe kills the container and triggers the Pod's restartPolicy

# enter the container and delete index.html so the httpGet probe fails (HTTP 403), simulating a liveness failure:
kubectl exec -it nginx-5fbd849686-bgvmq -- /bin/sh
$ rm -rf /usr/share/nginx/html/index.html
$ exit

Describe the Pod again:

kubectl describe pod nginx-5fbd849686-bgvmq
Name:         nginx-5fbd849686-bgvmq
Namespace:    default
Priority:     0
Node:         k8s-node1/192.168.41.11
Start Time:   Fri, 14 Jan 2022 10:30:15 +0800
Labels:       app=nginx
              pod-template-hash=5fbd849686
Annotations:  cni.projectcalico.org/containerID: 6a97b48184cd15527d99cc965f3e7f3445619a084df0e0856005646f5d046ef7
              cni.projectcalico.org/podIP: 10.244.36.102/32
              cni.projectcalico.org/podIPs: 10.244.36.102/32
Status:       Running
IP:           10.244.36.102
IPs:
  IP:           10.244.36.102
Controlled By:  ReplicaSet/nginx-5fbd849686
Containers:
  nginx:
    Container ID:   docker://5d3c4ca052837601b5ff1bb9f43a6dc1df04f6b4a4c05726edd7a0a928a55577
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 14 Jan 2022 10:41:03 +0800
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 14 Jan 2022 10:30:25 +0800
      Finished:     Fri, 14 Jan 2022 10:40:55 +0800
    Ready:          False
    Restart Count:  1
    Liveness:       http-get http://:80/ delay=20s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:80/ delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cspnd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-cspnd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    2d12h                  default-scheduler  Successfully assigned default/nginx-5fbd849686-bgvmq to k8s-node1
  Warning  FailedMount  2d12h                  kubelet            MountVolume.SetUp failed for volume "kube-api-access-cspnd" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulled       2d12h                  kubelet            Successfully pulled image "nginx" in 5.460878686s
  ############### readiness probe failed, liveness probe failed
  Warning  Unhealthy    2d12h (x3 over 2d12h)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 403
  Warning  Unhealthy    2d12h (x3 over 2d12h)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 403
  ############### the container is about to be restarted
  Normal   Killing      2d12h                  kubelet            Container nginx failed liveness probe, will be restarted
  Normal   Pulling      2d12h (x2 over 2d12h)  kubelet            Pulling image "nginx"
  Normal   Created      2d12h (x2 over 2d12h)  kubelet            Created container nginx
  Normal   Started      2d12h (x2 over 2d12h)  kubelet            Started container nginx
  Normal   Pulled       2d12h                  kubelet            Successfully pulled image "nginx" in 6.968038775s

Restart count:

kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-5fbd849686-bgvmq   1/1     Running   1          12m  # RESTARTS is now 1

Verify that Kubernetes removes the Pod from the Service endpoints

# enter the container and delete index.html so the httpGet probe fails, simulating a readiness failure:
kubectl exec -it nginx-5fbd849686-bgvmq -- /bin/sh
$ rm -rf /usr/share/nginx/html/index.html
$ exit

Before running the commands above, watch the endpoints in another terminal; you will see something like:

kubectl get ep -w
NAME         ENDPOINTS            AGE
kubernetes   192.168.41.10:6443   20h
nginx        10.244.36.102:80     16m 
nginx                             16m # readiness probe failed: the endpoint is removed from the Service
nginx        10.244.36.102:80     17m # nginx recovered: the endpoint is added back

Static Pods

Characteristics:

  1. The Pod is managed directly by the kubelet on a specific node

  2. Controllers cannot be used with it

  3. The Pod name carries the name of the node it runs on

Use cases:

  1. This is how kubeadm-built clusters start the kube system components

  2. Rarely used in day-to-day work

Enable the static Pod path in the kubelet configuration:

vi /var/lib/kubelet/config.yaml
...
staticPodPath: /etc/kubernetes/manifests
...

[root@k8s-master ~]# cd /etc/kubernetes/manifests
[root@k8s-master manifests]# ll
total 16
-rw------- 1 root root 2220 Jan 13 14:06 etcd.yaml
-rw------- 1 root root 3330 Jan 13 14:06 kube-apiserver.yaml
-rw------- 1 root root 2828 Jan 13 19:27 kube-controller-manager.yaml
-rw------- 1 root root 1414 Jan 13 19:26 kube-scheduler.yaml

Put a Pod YAML into this directory and the kubelet creates it automatically; remove the file from the directory and the static Pod is removed. For example:
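
A quick example (file name and image are illustrative):

cat <<EOF > /etc/kubernetes/manifests/static-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-nginx
spec:
  containers:
  - name: nginx
    image: nginx
EOF

# the kubelet creates the Pod on its own; the Pod name is suffixed with the node name,
# e.g. static-nginx-k8s-master
kubectl get pods

# removing the file removes the static Pod
rm /etc/kubernetes/manifests/static-nginx.yaml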

DaemonSet

What it does:

  1. Runs one Pod on every Node

  2. Newly added Nodes automatically get a Pod as well

Use cases:

  1. Network plugins

  2. Monitoring agents

  3. Log agents

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-daemonset
  labels:
    app: daemonset-pod
spec:
  selector:
    matchLabels:
      app: daemonset-pod
  template:
    metadata:
      name: daemonset-pod
      labels:
        app: daemonset-pod
    spec:
      containers:
        - name: daemonset-container
          image: nginx:1.21.1

# kubectl get daemonset   lists all DaemonSets
# or use the short form: kubectl get ds

If scheduling fails, find out why with kubectl describe pod <NAME>; common causes:

  1. Not enough CPU/memory on the nodes

  2. A node taint the Pod does not tolerate

  3. No node matches the node selector/labels

Service

Service的引入主要解决Pod的动态变化(IP每次部署都不同),并提供统一的访问入口:

  1. **服务发现:**防止Pod失联,找到提供同一个服务的Pod

  2. 负载均衡:定义一组Pod的访问策略,并可以避免将流量发送到不可达的Pod上

Service是集群内一组Pod的统一访问代理(入口)。

Pod和Service的关系

  1. Service 通过标签关联一组Pod

  2. Service通过iptables或者ipvs为一组Pod提供负载均衡的能力

定义与创建Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      app: my-nginx
  template:
    metadata:
      labels:
        app: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    app: my-nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    name: http
  selector:
    app: my-nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-nginx
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: nginx.test.com  # 将域名映射到 my-nginx 服务
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service: 
            name: my-nginx  # 将所有请求发送到 my-nginx 服务的 80 端口
            port: 
              number: 80 

创建svc:

kubectl apply -f svc.yaml

查看已经创建的svc:

kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
my-nginx     ClusterIP   10.97.1.73   <none>        80/TCP    61s

type字段

常见的Service类型有三种:

  1. ClusterIP,默认值,分配一个IP地址,即VIP,只能在集群内部访问

    spec:
      ports:
        # service以80端口暴露服务
        - port: 80
          name: web
          targetPort: 80 # 将pod的80端口服务提供给service
      selector: # 关联pod
        app: nginx
    NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    nginx        ClusterIP   10.97.1.73   <none>        80/TCP    61s  
    类型是ClusterIP, 集群IP是 10.97.1.73
  2. NodePort,在每个节点上启用一个端口来暴露服务,可以让其通过任意node+端口来进行外部访问;同时也像ClusterIP一样分配一个集群内部IP供集群内部访问

    spec:
      type: NodePort
      ports:
        # service以80端口暴露服务
        - port: 80
          name: web
          targetPort: 80 # 将pod的80端口服务提供给service
          nodePort: 30001 # 端口范围在 30000 - 32767 之间,如果不写,默认会随机分配一个端口
      selector: # 关联pod
        app: nginx
    kubectl get svc
    NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
    nginx        NodePort    10.97.1.73   <none>        80:30001/TCP   21m
    image-20220119151136177
  3. LoadBalancer,与NodePort基本一致;除此以外,k8s会请求底层云平台(比如阿里云、腾讯云、AWS等)创建负载均衡器,将每个Node([NodeIp]:[NodePort])作为后端添加进去

    image-20220119152054063

service负载均衡实现机制

image-20220119153337811

Service底层实现主要有iptables和ipvs两种网络模式,由每个节点上的kube-proxy负责维护,决定了流量如何转发。
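
下面是一个查看或切换kube-proxy代理模式的示意操作(假设集群由kubeadm部署,kube-proxy以DaemonSet运行在kube-system命名空间):

# 查看当前kube-proxy的模式,mode为空时默认使用iptables
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode

# 将mode改为"ipvs"后,删除kube-proxy Pod使其按新配置重建
kubectl -n kube-system edit configmap kube-proxy
kubectl -n kube-system delete pod -l k8s-app=kube-proxy

# ipvs模式下可在节点上查看转发规则(需要安装ipvsadm)
ipvsadm -Ln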

service DNS名称解析

CoreDNS是一个DNS服务器,k8s默认采用pod的方式部署在集群中。CoreDNS服务监视KubernetesAPI,为每一个Service创建DNS A记录用于域名解析。其格式为 <service-name>.<namespace-name>.svc.cluster.local

CoreDNS Yaml文件可以参考: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/coredns

验证:当创建service时会自动添加一个DNS记录:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  strategy: {}
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: nginx
        name: nginx
        resources: {}
      - image: busybox:1.28.3
        name: busybox
        resources: {}
        command:
        - "sh"
        - "-c"
        - "sleep 12h"

---

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

创建Deployment以及service,进入容器,测试DNS:

$ kubectl get deployment,pod,svc -owide

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS      IMAGES                 SELECTOR
deployment.apps/web   1/1     1            1           20m   nginx,busybox   nginx,busybox:1.28.3   app=web

NAME                       READY   STATUS    RESTARTS   AGE    IP          NODE             NOMINATED NODE   READINESS GATES
pod/web-7dfb85867c-nf5p5   2/2     Running   0          2m5s   10.1.0.26   docker-desktop   <none>           <none>

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE    SELECTOR
#---- 是Kubernetes默认的service,用于让k8s中的pod访问到k8s集群
service/kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP   7d4h   <none>  
service/web          ClusterIP   10.102.2.103   <none>        80/TCP    20m    app=web

kubectl exec -it pod/web-7dfb85867c-nf5p5 -c busybox -- sh
/ # nslookup web.default.svc.cluster.local
Server:    10.96.0.10  # DNS服务地址,也就是kube-system命名空间中kube-dns(CoreDNS)Service的IP地址
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      web.default.svc.cluster.local 
Address 1: 10.102.2.103 web.default.svc.cluster.local # 域名解析记录,正好是service的IP

Ingress

既然有了NodePort,为什么还需要Ingress?

  1. NodePort是基于iptables/ipvs实现的负载均衡器,它是四层转发

  2. 四层转发是指在传输层基于IP和Port的转发方式,这种方式的转发不能满足类似域名分流、重定向之类的需求

  3. 所以引入了Ingress做七层转发(应用层),它可以针对HTTP等应用层协议的内容进行转发,能满足的场景更多

image-20220121144403957
  1. Ingress:k8s中的一个抽象资源,用于给管理员提供一个暴露应用的入口定义方法

  2. Ingress Controller:负责流量路由,根据Ingress生成具体的路由规则,并对Pod进行负载均衡

  3. 外部用户通过Ingress Controller访问服务,由Ingress规则决定访问哪个Service

  4. IngressController内包含一个Service,也可以通过NodePort暴露端口,让用户访问

  5. 然后将流量直接转发到对应的Pod上(注意:只通过Service找到对应的Pod,实际发送并不经过Service,这样更高效)

  6. IngressController只是一个统称,并非某个具体实现,其下有很多具体的实现,比如 Nginx、Kong等

  7. 最主流的实现为kubernetes/ingress-nginx

部署ingress-nginx

  1. 从github中下载yaml配置

    curl -o deploy.yaml https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.1/deploy/static/provider/cloud/deploy.yaml
  2. ingress-nginx相关镜像位于google镜像仓库中,国内网络无法访问;可以从docker hub上寻找相关镜像,修改yaml中的相关镜像地址

  3. 修改用于暴露Ingress-Nginx-Controller的Service的端口暴露方式(ingress controller是pod,负责动态生成nginx配置):

    vi deploy.yaml
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
      labels:
        helm.sh/chart: ingress-nginx-4.0.10
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/version: 1.1.0
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/component: controller
      name: ingress-nginx-controller
      namespace: ingress-nginx
    spec:
      type: NodePort
      ports:
        - name: http
          port: 80
          nodePort: 30080
          protocol: TCP
          targetPort: http
          appProtocol: http
        - name: https
          port: 443
          protocol: TCP
          nodePort: 30443
          targetPort: https
          appProtocol: https
      selector:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/component: controller
  4. 执行部署

    kubectl apply -f deploy.yaml

kubernetes里命名空间删不掉的问题

如果某个命名空间(此例里是ingress-nginx)迟迟删除不掉,状态一直是Terminating,然后在此命名空间里重新创建资源时报如下错误:

Error from server (Forbidden): error when creating "nginx-controller.yaml": roles.rbac.authorization.k8s.io "ingress-nginx-admission" is forbidden: unable to create new content in namespace ingress-nginx because it is being terminated

解决方案:

  1. 在第一个终端里执行: kubectl proxy

  2. 在第二个终端里:kubectl get namespace ingress-nginx -o json > xx.json

  3. 更改json文件:将spec.finalizers字段中的"kubernetes"删除(置为空数组),即由 "finalizers": ["kubernetes"] 改为 "finalizers": []

  4. 最后执行:

    curl -k -H "Content-Type: application/json" -X PUT --data-binary @xx.json \
     http://127.0.0.1:8001/api/v1/namespaces/ingress-nginx/finalize

创建ingress规则(HTTP)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      app: my-nginx
  template:
    metadata:
      labels:
        app: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80 # 注意要指定端口否则ingress无法正常通过pod提供服务
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    app: my-nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    name: http
  selector:
    app: my-nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-nginx
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: nginx.test.com  # 将域名映射到 my-nginx 服务
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service: 
            name: my-nginx  # 将所有请求发送到 my-nginx 服务的 80 端口
            port: 
              number: 80 

查看ingress规则:

$ kubectl get ingress
NAME       CLASS    HOSTS            ADDRESS   PORTS   AGE
my-nginx   <none>   nginx.test.com             80      6s

配置hosts,访问这些地址即可。
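
一个示意性的验证方式(Node IP 192.168.41.11仅为假设值,30080是前文为ingress-nginx-controller配置的NodePort):

# 在本机hosts中将域名指向任意一个Node
echo "192.168.41.11 nginx.test.com" >> /etc/hosts
curl http://nginx.test.com:30080/

# 或者不改hosts,直接携带Host头访问
curl -H "Host: nginx.test.com" http://192.168.41.11:30080/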

创建ingress规则(HTTPS)

  1. 准备域名证书文件(使用阿里云免费证书,或者使用openssl/cfssl创建自签证书)

  2. 将证书文件保存到k8s Secret中

    kubectl create secret tls ingress-yangsx95-com --cert=7182207_ingress.yangsx95.com.pem --key=7182207_ingress.yangsx95.com.key
    kubectl get secret
    NAME                   TYPE                                  DATA   AGE
    default-token-2wnpx    kubernetes.io/service-account-token   3      75m
    ingress-yangsx95-com   kubernetes.io/tls                     2      28s
  3. 使用Ingress规则配置tls

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ingress-yangsx95-com
    spec:
      ingressClassName: nginx # 指定ingress的实现是ingress-nginx
      tls:
      - hosts: 
        - ingress.yangsx95.com
        secretName: ingress-yangsx95-com
      rules:
      - host: "ingress.yangsx95.com"
        http:
          paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: my-nginx
                port:
                  number: 80
  4. 配置hosts文件,访问: https://ingress.yangsx95.com:30443/

工作原理

IngressController通过与k8s API交互,动态感知集群中Ingress规则的变化,然后读取这些规则(规则中写明了哪个域名对应哪个Service),据此生成一段nginx配置,应用到它管理的nginx服务并重新加载生效,以此达到负载均衡配置热更新的效果。

工作流程:

域名+端口 -> Ingress Controller -> Pod
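
可以进入Ingress Controller查看它根据Ingress规则生成的nginx配置,验证上述流程(示意命令,Deployment名称以ingress-nginx官方清单中的ingress-nginx-controller为准):

kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- cat /etc/nginx/nginx.conf | grep nginx.test.com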

StatefulSet(部署有状态应用)

有状态和无状态

Deployment控制器的设计原则:管理的所有Pod一模一样,提供同一个服务,也不考虑在哪台Node运行,可随意扩容缩容。这种应用称为无状态应用,比如web应用程序。

在实际场景中,并不是所有应用都符合这种设计,尤其是分布式应用程序,一般会部署多个实例,不同于web服务这类无状态应用,这些实例之间往往有依赖关系,例如:主从关系、主备关系,这种应用称为有状态应用,比如MySQL集群、Etcd集群。

StatefulSet就是为了解决部署有状态应用而出现的控制器:

  1. 给Pod分配一个唯一稳定的网络标识符(主机名、唯一域名):使用Headless Service来维护网络的身份

    # 这是一个Headless Service
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      ports:
        - port: 80
          name: web
      clusterIP: None # 指定Service不分配ClusterIP,因为无状态应用是通过Service的统一IP来暴露服务的,使用Service的ClusterIP无法区分集群内的多个Pod的不同的角色,故这里指定Service的ClusterIP为None
      selector:
        app: nginx
        
    --- 
    
    # 这是一个StatefulSet的一部分
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: web
    spec:
      serviceName: "nginx"  # 使用serviceName关联无头服务,主要是根据无头服务找到Service关联的那一组Pod
      ....
  2. 稳定唯一的持久存储(唯一的PV和PVC):StatefulSet的存储卷使用VolumeClaimTemplate(卷申请模板)定义,StatefulSet会为每个Pod单独创建一个对应的PVC,再由PVC绑定(或动态供给)各自的PersistentVolume

    # 创建StatefulSet
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: web
    spec:
      serviceName: "nginx" # 指定HeadlessService
      replicas: 2 # 副本数 2
      selector: # 与Pod进行绑定
        matchLabels:
          app: nginx
      template: 
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - name: nginx
              image: nginx:1.21.1
              ports:
                - containerPort: 80
                  name: web
              volumeMounts: 
                - name: www
                  mountPath: /usr/share/nginx/html
      volumeClaimTemplates: # 指定卷申请模板
        - metadata:
            name: www
          spec:
            accessModes: [ "ReadWriteOnce" ] # 访问模式,只可以被一个容器访问
            resources:
              requests:
                storage: 1Gi 

StatefulSet三要素:

  1. 域名

  2. 主机名

  3. 存储(PVC)

部署StatefulSet

# 创建HeadlessService用于发布StatefulSet中Pod的IP和Port
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
    - port: 80
      name: web
  clusterIP: None # 标志此Service为HeadlessService,
  selector:
    app: nginx

---

# 创建StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx" # 指定HeadlessService
  replicas: 2 # 副本数 2
  selector: # 与Pod进行绑定
    matchLabels:
      app: nginx
  template: # 定义Pod模板
    metadata:
      labels:
        app: nginx
    spec:
      containers: # 定义容器
        - name: nginx
          image: nginx:1.21.1
          ports:
            - containerPort: 80
              name: web
          volumeMounts: # 挂载卷,name指定卷名称,mountPath指定要挂载的容器路径
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates: # 定义申领卷,动态方式
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ] # 访问模式,只可以被一个容器访问
        resources:
          requests:
            storage: 1Gi # 申领1g空间

# kubectl get pods -w -l app=nginx  查看StatefulSet的Pod的创建情况
# 参数-w表示watch实时监控 -l表示labels表示根据标签过滤资源

# 顺序创建:StatefulSet拥有多个副本时,会按照顺序创建,web-0处于Running或者Ready状态,web-1才会启动
# 稳定网络标识:使用kubectl exec循环获取hostname:for i in 0 1; do kubectl exec "web-$i" -- sh -c 'hostname'; done
# 稳定的存储:获取web0以及web1的pvc  kubectl get pvc -l app=nginx
# 扩容副本为5:kubectl scale sts web --replicas=5
# 缩容副本为3:kubectl patch sts web -p '{"spec":{"replicas":3}}'
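
StatefulSet的每个Pod会获得形如 <pod-name>.<service-name>.<namespace>.svc.cluster.local 的稳定域名,可以用一个临时busybox Pod验证(示意命令,busybox镜像版本仅作示例):

# 解析无头服务会返回每个Pod的地址;也可以直接解析单个Pod的稳定域名
kubectl run dns-test --rm -it --image=busybox:1.28.3 --restart=Never -- sh -c \
  "nslookup nginx.default.svc.cluster.local; nslookup web-0.nginx.default.svc.cluster.local"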

调度

配置Pod的资源限制

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: testpod
  name: testpod
spec:
  containers:
  - image: nginx
    name: testpod
    resources:
      # 容器最大资源限制 limit
      limits:
        cpu: 200m
        memory: 100Mi
      # 容器使用的最小资源请求 request
      requests:
        cpu: 200m
        memory: 100Mi
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  1. cpu的单位为毫核(m)或者浮点数,比如 500m = 0.5,1000m = 1

  2. 内存的单位为Mi、Gi等,例如 500Mi、1Gi

  3. request代表应用程序启动时需要的资源数量,调度器会寻找满足request要求的节点,如果没有满足条件的节点,Pod就会一直处于Pending状态

  4. limit代表应用程序运行时最多占用的资源数量,这个值对调度机制并不起决定性作用,只要request的值能被满足,容器就会被调度部署

  5. limit可以防止应用程序假死或者超负荷运行导致主机崩溃的情况,可以更合理控制资源

  6. request的值设置的过大会造成资源浪费,被request分配的资源,不管应用程序有没有使用,其他容器都无法再分配使用他们了

如何配置这几个值的大小:

  1. request的值根据应用程序启动并正常提供服务时,大约占用的资源量决定

  2. limit的值不建议超过宿主机的实际物理配置,应预留约20%的资源保证物理机的正常运行

  3. limit的值可以结合request配置:limit不能小于request,一般request比limit小20%~30%

  4. limit的值也可以根据应用的实际压测结果估算
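
对已有的工作负载,也可以不修改YAML,直接用kubectl set resources调整request和limit(示意命令,my-nginx为前文示例中的Deployment,数值仅作演示):

kubectl set resources deployment my-nginx \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=200m,memory=256Mi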

查看pod的资源限制:

 kubectl describe pod testpod
Name:         testpod
Namespace:    default
Priority:     0
Node:         docker-desktop/192.168.65.4
Start Time:   Tue, 18 Jan 2022 15:11:23 +0800
Labels:       run=testpod
Annotations:  <none>
Status:       Running
IP:           10.1.0.10
IPs:
  IP:  10.1.0.10
Containers:
  testpod:
    Container ID:   docker://421614a5c6a4d1de9c472dfccdee9480f28de61e1b2e343162df92dc3097cb87
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 18 Jan 2022 15:12:04 +0800
    Ready:          True
    Restart Count:  0
    ######################## 容器限制
    Limits:
      cpu:     200m
      memory:  100Mi
    Requests:
      cpu:        200m
      memory:     100Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t2fl5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-t2fl5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  2m43s  default-scheduler  Successfully assigned default/testpod to docker-desktop
  Normal  Pulling    2m42s  kubelet            Pulling image "nginx"
  Normal  Pulled     2m3s   kubelet            Successfully pulled image "nginx" in 38.832099142s
  Normal  Created    2m2s   kubelet            Created container testpod
  Normal  Started    2m2s   kubelet            Started container testpod

查看Node信息,看Node上运行的容器的资源限制情况与Node本身的资源情况

kubectl describe node docker-desktop
Name:               docker-desktop
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=docker-desktop
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 14 Jan 2022 09:47:41 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  docker-desktop
  AcquireTime:     <unset>
  RenewTime:       Tue, 18 Jan 2022 15:18:07 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 18 Jan 2022 15:17:46 +0800   Fri, 14 Jan 2022 09:47:41 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 18 Jan 2022 15:17:46 +0800   Fri, 14 Jan 2022 09:47:41 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 18 Jan 2022 15:17:46 +0800   Fri, 14 Jan 2022 09:47:41 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 18 Jan 2022 15:17:46 +0800   Fri, 14 Jan 2022 09:48:12 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.65.4
  Hostname:    docker-desktop
Capacity:
  cpu:                4
  ephemeral-storage:  61255492Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             8126780Ki
  pods:               110
  
###### 该节点总共可分配的资源信息
Allocatable:
  cpu:                4
  ephemeral-storage:  56453061334
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             8024380Ki
  pods:               110
System Info:
  Machine ID:                 72066838-b7d0-4811-9f4d-a82203068bec
  System UUID:                72066838-b7d0-4811-9f4d-a82203068bec
  Boot ID:                    71c72b3c-3da4-41bc-ae54-8db53c078f15
  Kernel Version:             5.10.76-linuxkit
  OS Image:                   Docker Desktop
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  docker://20.10.11
  Kubelet Version:            v1.22.4
  Kube-Proxy Version:         v1.22.4
Non-terminated Pods:          (10 in total)
  Namespace                   Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                      ------------  ----------  ---------------  -------------  ---
  ###################################### cpu和内存资源限制的配置
  default                     testpod                                   200m (5%)     200m (5%)   100Mi (1%)       100Mi (1%)     6m46s
  kube-system                 coredns-78fcd69978-2tb8q                  100m (2%)     0 (0%)      70Mi (0%)        170Mi (2%)     4d5h
  kube-system                 coredns-78fcd69978-8bqxk                  100m (2%)     0 (0%)      70Mi (0%)        170Mi (2%)     4d5h
  kube-system                 etcd-docker-desktop                       100m (2%)     0 (0%)      100Mi (1%)       0 (0%)         4d5h
  kube-system                 kube-apiserver-docker-desktop             250m (6%)     0 (0%)      0 (0%)           0 (0%)         4d5h
  kube-system                 kube-controller-manager-docker-desktop    200m (5%)     0 (0%)      0 (0%)           0 (0%)         4d5h
  kube-system                 kube-proxy-vblmk                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d5h
  kube-system                 kube-scheduler-docker-desktop             100m (2%)     0 (0%)      0 (0%)           0 (0%)         4d5h
  kube-system                 storage-provisioner                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d5h
  kube-system                 vpnkit-controller                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d5h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1050m (26%)  200m (5%)
  memory             340Mi (4%)   440Mi (5%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
  hugepages-32Mi     0 (0%)       0 (0%)
  hugepages-64Ki     0 (0%)       0 (0%)
Events:
  Type    Reason                   Age                From        Message
  ----    ------                   ----               ----        -------
  Normal  Starting                 15m                kube-proxy
  Normal  Starting                 16m                kubelet     Starting kubelet.
  Normal  NodeHasSufficientMemory  16m (x8 over 16m)  kubelet     Node docker-desktop status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    16m (x8 over 16m)  kubelet     Node docker-desktop status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     16m (x7 over 16m)  kubelet     Node docker-desktop status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  16m                kubelet     Updated Node Allocatable limit across pods

将Pod分配给指定节点

nodeName

指定节点名称,用于将Pod调度到指定的Node上,不经过调度器。所有污点、节点亲和都将会失效。

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: kube-01

nodeSelector

用于将Pod调度到匹配Label的Node上,如果没有匹配的标签,调度会失败

作用:

  1. 约束Pod到特定的节点上运行

  2. 完全匹配节点标签

应用场景:

  1. 专用节点:根据业务线将Node分组管理

  2. 配备特殊硬件:部分Node配有SSD硬盘、GPU

示例,确保Pod被分配到具有SSD硬盘的节点上:

  1. 给含有ssd的node,设置一个标签:

    kubectl label node k8s-node1 disktype=ssd
  2. 查看node的标签信息

    kubectl get node k8s-node1 --show-labels
    NAME        STATUS   ROLES    AGE    VERSION   LABELS
    k8s-node1   Ready    <none>   5d3h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux
  3. 创建含有nodeSelect的Pod

    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: null
      labels:
        run: pod1
      name: pod1
    spec:
      containers:
      - image: nginx
        name: pod1
        resources: {}
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      nodeSelector:
        disktype: "ssd"
  4. 验证,确实在node1上

    kubectl get pods -o wide
    NAME   READY   STATUS              RESTARTS   AGE   IP       NODE        NOMINATED NODE   READINESS GATES
    pod1   0/1     ContainerCreating   0          7s    <none>   k8s-node1   <none>           <none>
  5. 如果不再需要该标签,可以移除(在标签键后加一个减号):

    kubectl label node k8s-node1 disktype-
    node/k8s-node1 labeled
    
    kubectl get node k8s-node1 --show-labels
    NAME        STATUS   ROLES    AGE    VERSION   LABELS
    k8s-node1   Ready    <none>   5d3h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux

nodeAffinity

节点亲和类似于nodeSelector,可以根据节点上的标签来约束Pod可以调度在哪些节点上。相比于nodeSelector:

  • 匹配有更多的逻辑组合,不只是字符串的完全相等,支持的操作有:In、NotIn、Exists、DoesNotExist、Gt、Lt

  • 调度分为软策略与硬策略:

    • 硬(required):必须满足,如果不满足则调度失败

    • 软(preferred):尽量满足,如果不满足也继续调度,满足则调度到目标

参考官方文档:将 Pod 分配给节点 | Kubernetes

示例:

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    # 节点亲和
    nodeAffinity:
      # 硬亲和
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # node必须包含key kubernetes.io/e2e-az-name,且值在  e2e-az1,e2e-az2数组中
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      # 软亲和
      preferredDuringSchedulingIgnoredDuringExecution:
      # 权重为1
      - weight: 1
        preference:
          # node最好包含key another-node-label-key,且值为 another-node-label-value
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0

其中,权重值weight的范围为 1~100,权重值越大,这条亲和规则优先级就越高,调度器就会优先选择

污点和污点容忍

Taints:污点,避免Pod调度到特定的Node

Tolerations:污点容忍,允许Pod调度到持有Taints的Node上

应用场景:

  1. 保证master节点安全,在master节点含有污点,防止pod在master节点运行

  2. 专用节点:根据业务将Node分组管理,希望在默认情况下不调度该节点,只有配置了污点容忍才允许分配

  3. 配备特殊硬件:部分Node配有SSD硬盘、GPU,希望在默认情况下不调度该节点,只有配置了污点容忍才允许分配

  4. 基于Taint的驱逐

查看master节点的污点:

kubectl describe node k8s-master
Name:               k8s-master
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-master
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.41.10/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.244.235.192
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 13 Jan 2022 13:56:51 +0800

#################### 不允许调度的污点
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  k8s-master
  AcquireTime:     <unset>
  RenewTime:       Tue, 18 Jan 2022 18:55:44 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 18 Jan 2022 17:26:05 +0800   Tue, 18 Jan 2022 17:26:05 +0800   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Tue, 18 Jan 2022 18:51:21 +0800   Thu, 13 Jan 2022 13:56:47 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 18 Jan 2022 18:51:21 +0800   Thu, 13 Jan 2022 13:56:47 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 18 Jan 2022 18:51:21 +0800   Thu, 13 Jan 2022 13:56:47 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 18 Jan 2022 18:51:21 +0800   Thu, 13 Jan 2022 19:27:20 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.41.10
  Hostname:    k8s-master
Capacity:
  cpu:                2
  ephemeral-storage:  17394Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1863252Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  16415037823
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1760852Ki
  pods:               110
System Info:
  Machine ID:                 752d054974304aa8a04e23779cc60c55
  System UUID:                8CF74D56-3C99-7C12-13A9-B2530762D312
  Boot ID:                    47c1f93c-a422-4c0e-ae7b-959a42d92cbb
  Kernel Version:             3.10.0-957.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.12
  Kubelet Version:            v1.21.0
  Kube-Proxy Version:         v1.21.0
PodCIDR:                      10.244.0.0/24
PodCIDRs:                     10.244.0.0/24
Non-terminated Pods:          (8 in total)
  Namespace                   Name                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                          ------------  ----------  ---------------  -------------  ---
  kube-system                 calico-kube-controllers-6b9fbfff44-pstmt      0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d6h
  kube-system                 calico-node-slrpz                             250m (12%)    0 (0%)      0 (0%)           0 (0%)         5d6h
  kube-system                 etcd-k8s-master                               100m (5%)     0 (0%)      100Mi (5%)       0 (0%)         5d6h
  kube-system                 kube-apiserver-k8s-master                     250m (12%)    0 (0%)      0 (0%)           0 (0%)         5d6h
  kube-system                 kube-controller-manager-k8s-master            200m (10%)    0 (0%)      0 (0%)           0 (0%)         5d
  kube-system                 kube-proxy-65mlq                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d6h
  kube-system                 kube-scheduler-k8s-master                     100m (5%)     0 (0%)      0 (0%)           0 (0%)         5d
  kubernetes-dashboard        dashboard-metrics-scraper-5594697f48-zrzc5    0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d4h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                900m (45%)  0 (0%)
  memory             100Mi (5%)  0 (0%)
  ephemeral-storage  100Mi (0%)  0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

使用污点和污点容忍

  1. 给节点添加污点:

    kubectl taint node [node] key=value:<effect>
    kubectl taint node k8s-node1 gpu=yes:NoSchedule
    
    effect的取值:
    - NoSchedule: 一定不能被调度
    - PreferNoSchedule: 不配置污点容忍也有可能被调度,只是尽量保证不调度
    - NoExecute: 不仅不会调度,还会驱逐Node上已有的Pod
  2. 验证是否正常添加:

    kubectl describe node k8s-node1 | grep Taint
    Taints:             gpu=yes:NoSchedule
  3. 配置污点容忍(pod可以容忍有gpu的节点):

    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: null
      labels:
        run: pod1
      name: pod1
    spec:
      containers:
      - image: nginx
        name: pod1
        resources: {}
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      # 容忍gpu=yes:NoSchedule的污点
      tolerations:
      - key: "gpu"
        value: "yes"
        effect: NoSchedule
        operator: Equal
  4. 删除污点(在后面增加一个减号):

    kubectl taint node k8s-node1 gpu:NoSchedule-

存储

容器中的文件是在磁盘中临时存放的,这给容器中运行比较重要的应用程序带来如下问题:

  1. 当容器升级或者崩溃,kubelet会重建容器,容器内的文件会丢失

  2. 一个pod中运行多个容器需要共享文件

所以Kubernetes需要数据卷(Volume),常用的数据卷有:

  1. 节点本地卷(hostPath,emptyDir)

  2. 网络卷(NFS,Ceph,GlusterFS)

  3. 公有云(AWS,EBS)

  4. K8s资源(configMap,secret)

所有支持的卷类型,可以参考:卷 | Kubernetes

emptyDir 临时数据卷

是一个临时的存储卷,与Pod的生命周期绑定在一起,如果Pod删除了卷也会被删除。主要用于Pod中的多个容器之间数据共享。

apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume  # 挂载卷
  # 定义卷
  volumes:
  - name: cache-volume
    emptyDir: {}

emptyDir实际上是位于宿主机上的一个目录,Pod内的各个容器共享这个宿主机目录。它的位置在:/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<卷名> 中。
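
可以在Pod所在节点上验证这一点(示意命令,<pod-uid>需替换为第一条命令查到的实际值):

# 获取Pod的UID
kubectl get pod test-pd -o jsonpath='{.metadata.uid}'
# 在Pod所在节点上查看对应的emptyDir目录
ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/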

hostPath 节点数据卷

挂载node的文件系统,也就是pod所在的节点上的文件或者目录到pod中的容器。主要应用在Pod中的容器需要访问宿主机的文件的情况,比如DaemonSet。

apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # 宿主上目录位置
      path: /data
      # 此字段为可选
      type: Directory # 如果是文件则是File

注意:当因为某些情况pod被调度到其他节点上时,节点数据卷是不会被迁移过去的。

不安全,不建议使用,建议使用共享存储

NFS 网络数据卷

使用nfs网络数据卷共享存储:

image-20220202001448943

NFS服务端一般是集群外的一台主机,而NFS客户端一般是需要使用共享存储的节点。

部署NFS

centos下准备NFS环境:

# 安装NFS(每个需要共享数据的节点都要安装)
yum install nfs-utils

# 选择需要作为nfs服务端的服务器,编辑nfs exports配置文件
vi /etc/exports
/ifs/kubernetes *(rw,no_root_squash)
# 共享目录为 /ifs/kubernetes 
# * 代表可以连接的nfs客户端的网段,这里是任意网段
# rw 代表可读写
# no_root_squash: 登入 NFS 主机使用分享目录的使用者,如果是 root 的话,那么对于这个分享的目录来说,他就具有 root 的权限!
# root_squash:在登入 NFS 主机使用分享之目录的使用者如果是 root 时,那么这个使用者的权限将被压缩成为匿名使用者,通常他的 UID 与 GID 都会变成 nobody 那个身份

# 创建共享目录
mkdir -p /ifs/kubernetes

# 启动nfs server
systemctl start nfs
systemctl enable nfs

ubuntu下准备NFS环境:

# 安装NFS(每个需要共享数据的节点都要安装)
sudo apt update
sudo apt install nfs-kernel-server
# 查看nfs版本启用状态,-2代表2版本禁用
sudo cat /proc/fs/nfsd/versions
# -2 +3 +4 +4.1 +4.2

# 选择需要作为nfs服务端的服务器,编辑nfs exports配置文件
vi /etc/exports
/ifs/kubernetes *(rw,no_root_squash)

# 创建共享目录
mkdir -p /ifs/kubernetes

# 启动nfs server
sudo /etc/init.d/nfs-kernel-server restart

测试NFS

在任意一个节点上执行:

# 将nfs服务端192.168.121.10上的/ifs/kubernetes目录挂载到本地/mnt/目录
mount -t nfs 192.168.121.10:/ifs/kubernetes /mnt/
# 创建文件查看是否同步

使用NFS网络数据卷

apiVersion: v1
kind: Pod
metadata:
  name: test-busybox
spec:
  containers:
  - image: busybox
    name: busybox
    command: ["/bin/sh", "-c", "sleep 12h"]
    volumeMounts:
    - mountPath: /root
      name: bsroot
  volumes:
  - name: bsroot
    nfs:
      # nfs的远程地址
      server: 192.168.121.10
      # 共享的nfs的路径
      path: /ifs/kubernetes

进入容器查看nfs情况:

kubectl exec -it test-busybox -- sh
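
一个简单的验证思路:在容器的挂载目录里写入文件,再到NFS服务端的共享目录确认文件是否出现(示意命令):

# 在容器内写入测试文件
kubectl exec test-busybox -- sh -c "echo hello-nfs > /root/test.txt"
# 在NFS服务端(192.168.121.10)查看共享目录
cat /ifs/kubernetes/test.txt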

PV和PVC

  • PersistentVolume(PV):对存储资源创建和使用的抽象,使得存储作为集群中的资源管理

  • PersistentVolumeClaim(PVC):让用户不需要关心具体的Volume实现细节

Pod申请PVC作为卷来使用,Kubernetes通过PVC查找绑定的PV,并Mount给Pod。

  1. pvc与pv是一对一的关系,一块存储只能给一个pvc使用

  2. pvc会向上匹配第一个符合要求的pv,如果满足不了,pod处于pending

  3. 存储容量并不能做到有效的限制,它只是一个用于匹配的标识

使用pv和pvc:静态供给

  1. 定义需求卷:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
     name: my-pv
    spec:
      capacity:
        storage: 5Gi
      accessModes:
        - ReadWriteOnce # 读写权限,但同时只能被一个节点写入
      # 数据提供来源 nfs
      nfs:
        path: "/ifs/kubernetes"
        server: 192.168.31.63
  2. 定义卷需求,Pod 使用 PersistentVolumeClaim 来请求物理存储

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources: # 需求资源大小:3gb
        requests:
          storage: 3G
  3. 容器应用使用卷需求

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts: # 挂载www卷
            - mountPath: "/usr/share/nginx/html"
              name: www
      volumes: # 定义www卷,卷使用my-pvc卷需求对象完成
        - name: www
          persistentVolumeClaim:
            claimName: my-pvc # 使用上面定义的PVC

pv的访问模式

AccessMode是用来对PV进行访问模式的设置, 用于描述用户应用对存储资源的访问权限,包含以下几种:

  1. ReadWriteOnce:拥有读写权限,但是只能被单个节点挂载

  2. ReadOnlyMany:只读权限,可以被多个节点挂载

  3. ReadWriteMany:读写权限,可以被多个节点挂载

pv的回收策略

  1. Retain:当将pvc删除时,pv进入Released状态,这个状态下保留数据,需要管理员手动清理数据,默认策略,推荐使用

  2. Recycle:清除pv中的数据,效果相同于执行命令rm -rf /共享目录/*

  3. Delete:与pv相连的后端存储也一并删除
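
回收策略对应PV的spec.persistentVolumeReclaimPolicy字段,也可以对已有PV用patch修改(示意命令,my-pv为前文示例中的PV名称):

# 将my-pv的回收策略改为Retain
kubectl patch pv my-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# 查看RECLAIM POLICY列确认
kubectl get pv my-pv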

pv的状态

一个pv的生命周期中,可能会处于四种不同的状态:

  1. Avaliable:可用状态,还未被任何PVC绑定

  2. Bound:已绑定,表示PV已经被PVC绑定

  3. Released:已释放,表示PVC被删除,但是资源还未被集群重新声明

  4. Failed:失败状态,表示PV的自动回收失败

Storage Class

StorageClass是存储类,是对一类存储资源的分类,不同的StorageClass可能代表不同的存储服务质量等级或者备份策略,比如固态硬盘与机械硬盘、定时备份与不做备份。

官方文档:https://kubernetes.io/zh/docs/concepts/storage/storage-classes/

创建一个StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
# 指定存储制备器,参考官方文档支持的存储制备器
# 这里使用aws云存储
provisioner: kubernetes.io/aws-ebs
# aws云存储所需要的参数,需要参考aws的文档
parameters:
  type: gp2
# StorageClass对应的PV的回收策略:Delete(默认)、Retain
reclaimPolicy: Retain
# 允许卷扩展:允许用户通过编辑相应的 PVC 对象来调整卷大小
allowVolumeExpansion: true
mountOptions:
  - debug
# 指定卷绑定和动态制备 应该发生在什么时候
# Immediate 模式表示一旦创建了 PersistentVolumeClaim 也就完成了卷绑定和动态制备
# WaitForFirstConsumer 模式,直到使用该 PersistentVolumeClaim 的 Pod 被创建才完成了卷绑定和动态制备
volumeBindingMode: Immediate

pv动态供给

允许按需创建PV,不需要运维人员每次手动添加,大大降低了维护成本。pv的动态供给主要由StorageClass对象实现。

动态卷供应 | Kubernetes

image-20220202114401828

PVC存储请求被创建后,将会由对应的StorageClass自动创建一个PV。StorageClass是存储类,是对一类存储资源的分类与抽象。

NFS

Kubernetes 不包含内部 NFS 驱动,需要使用外部驱动为 NFS 创建 StorageClass,例如 nfs-subdir-external-provisioner。

下面以 nfs-subdir-external-provisioner(subdir)为例:

  1. 下载三个主要的yaml文件:nfs-subdir-external-provisioner/deploy at master · kubernetes-sigs/nfs-subdir-external-provisioner (github.com)

    1. rbac.yaml:存储供给程序需要创建PV,需要调用K8s API,因此需要RBAC授权

    2. deployment.yaml:存储供给程序

    3. class.yaml:StorageClass 对象,指定nfs存储供给程序

  2. 默认的pv的删除策略为delete,可以在class.yaml文件中进行更改

  3. 修改deployment.yaml,更改镜像地址,否则无法下载镜像

  4. 修改deployment.yaml,更改NFS服务端信息,IP以及PATH(注意有两个地方)

  5. 依次部署这三个yaml文件

查看已经创建的StorageClass:

kubectl get sc
NAME                  PROVISIONER                                   RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
managed-nfs-storage   k8s-sigs.io/nfs-subdir-external-provisioner   Delete          Immediate           false                  5m14s

使用NFS动态供给:

# 定义卷需求
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: "managed-nfs-storage"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3G
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
      volumeMounts: 
        - mountPath: "/usr/share/nginx/html"
          name: www
  volumes: 
    - name: www
      persistentVolumeClaim:
        claimName: my-pvc 

创建上述PVC,并查看PV,应该有自动创建的PV:

kubectl get pvc,pv
NAME                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
persistentvolumeclaim/my-pvc   Bound    pvc-36c44a7c-0518-4738-be2d-d2432194c7c1   3G         RWO            managed-nfs-storage   6h52m

ConfigMap

image-20220202223227362

ConfigMap用于应用程序的配置存储,Secret则用于存储敏感数据。ConfigMap中的数据有两种形式:

  1. 键值对类型的键

  2. 文件类型的键(键为文件名,值为文件内容)

创建后,其数据将会存储在ETCD中。相应的,Pod也可以通过两种不同的方式获取ConfigMap中的数据到应用程序中:

  1. 变量注入

  2. 数据卷挂载
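
除了下面的YAML方式,也可以直接用kubectl create configmap以键值对或文件两种方式创建(示意命令,game.properties假设为本地已存在的配置文件):

# 从键值对创建
kubectl create configmap demo-config --from-literal=player_initial_lives=3 --from-literal=ui_properties_file_name=user-interface.properties
# 从文件创建,文件名作为键,文件内容作为值
kubectl create configmap game-config --from-file=game.properties
# 查看生成的数据
kubectl get configmap demo-config -o yaml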

创建ConfigMap

# 定义ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: game-demo
data:
  # 类属性键;每一个键都映射到一个简单的值
  player_initial_lives: "3"
  ui_properties_file_name: "user-interface.properties"

  # 类文件键
  game.properties: |
    enemy.types=aliens,monsters
    player.maximum-lives=5
  user-interface.properties: |
    color.good=purple
    color.bad=yellow
    allow.textmode=true
binaryData:
  token: "5aWl5Yip57uZ" #echo -n '奥利给' | base64

使用ConfigMap

apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo-pod
spec:
  containers:
    - name: demo
      image: alpine
      command: ["sleep", "3600"]
      # 定义环境变量
      env:
        - name: PLAYER_INITIAL_LIVES
          valueFrom: # 指定环境变量值来源为configMap
            configMapKeyRef:
              name: game-demo           # 指定ConfigMap的名称
              key: player_initial_lives # 指定需要从ConfigMap中取出值得键
        - name: UI_PROPERTIES_FILE_NAME
          valueFrom:
            configMapKeyRef:
              name: game-demo
              key: ui_properties_file_name
      # 挂载卷,config卷是一个ConfigMap对象
      volumeMounts:
        - name: config
          mountPath: "/config"
          readOnly: true
  # 定义配置卷
  volumes:
    - name: config
      configMap:
        name: game-demo # 提供数据的ConfigMap名称
        items: # ConfigMap的一组键,与容器文件名的映射
          - key: "game.properties"
            path: "game.properties"
          - key: "user-interface.properties"
            path: "user-interface.properties"

Secret

与ConfigMap类似,区别在于Secret主要存储敏感数据,所有数据都要经过base64编码(不加密)。

Secret的创建命令kubectl create secret支持存储创建三种数据类型的Secret:

  1. docker-registry:存储镜像仓库认证信息

  2. generic:密码

  3. tls:存储证书
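
三种类型分别对应如下创建命令(示意命令,其中用户名、密码、仓库地址和证书文件名均为示例值):

# generic:用户名、密码等通用敏感数据
kubectl create secret generic mysecret --from-literal=username=root --from-literal=password=root123
# docker-registry:镜像仓库认证信息
kubectl create secret docker-registry myregistry --docker-server=registry.example.com --docker-username=admin --docker-password=123456
# tls:证书与私钥
kubectl create secret tls mytls --cert=tls.crt --key=tls.key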

generic

创建secret:

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data: # data代表私密数据,需要以base64的方式填入
  username: cm9vdA==
  password: cm9vdDEyMw==

使用secret(环境变量方式):

apiVersion: v1
# 以环境变量方式导入secret到Pod
kind: Pod
metadata:
  name: use-secret-env
spec:
  containers:
    - name: use-secret-env
      imagePullPolicy: IfNotPresent
      image: nginx
      env:
        - name: SECRET_USERNAME
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: username
        - name: SECRET_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: password

使用secret(volume挂载):

apiVersion: v1
# 以volume方式挂载secret到Pod
kind: Pod
metadata:
  name: use-secret-volume
spec:
  containers:
    - name: use-secret-env
      imagePullPolicy: IfNotPresent
      image: nginx
      volumeMounts:
        - name: "foo"
          mountPath: "/etc/foo" # 挂载完毕后,会在此目录下看到两个名称为username和password的文件,文件内容就是具体的secret的值
          readOnly: true
  volumes:
    - name: foo
      secret:
        secretName: mysecret

安全

k8s安全框架主要由下面3个阶段进行控制,每个阶段都支持插件方式,通过API Server配置来启用插件:

  1. Authentication 认证

  2. Authorization 授权

  3. Admission Control 准入控制

kubectl发送指令到API Server时会依次经过这三个步骤进行安全控制,全部通过后才会继续执行后续的操作。
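
可以用kubectl auth can-i快速验证某个身份在授权阶段是否会被放行(示意命令):

# 当前kubeconfig对应的用户能否在default命名空间创建deployment
kubectl auth can-i create deployments -n default
# 模拟某个ServiceAccount的权限
kubectl auth can-i list pods --as=system:serviceaccount:default:default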

image-20220202230537467

Authentication 认证

k8s API Server提供三种客户端身份认证:

  1. HTTPS证书认证:基于CA证书签名的数字证书认证(kubeconfig,kubectl就是使用这种方式)

  2. HTTP Token认证:通过一个Token来识别用户(ServiceAccount,一般提供给程序使用,但也可以提供给kubectl)

  3. HTTP Basic认证:用户名 + 密码认证(1.19版本废弃)

Authorization 授权

基于RBAC完成授权工作。RBAC根据API请求属性,决定允许还是拒绝。

image-20220203160733497
  • 主体(subject)

    • User:用户

    • Group:用户组

    • ServiceAccount:服务账号

  • 角色

    • Role:授权特定命名空间的访问权限

    • ClusterRole:授权所有命名空间的(也就是整个集群)访问权限

  • 角色绑定

    • RoleBinding:将角色绑定到主体

    • ClusterRoleBinding:将集群角色绑定到主体

上图描述了这几个概念之间的关系。

Admission Control 准入控制

Admission Control实际上是一个准入控制器插件列表,发送到 API Server的请求都要经过这个列表中每个准入控制插件的检查,检查不通过则拒绝请求。

启用一个准入控制器:

kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger ...

关闭一个准入控制器:

kube-apiserver --disable-admission-plugins=PodNodeSelector,AlwaysDeny ...

查看默认启用:

# 在 kube-apiserver-k8s-master 这个pod中执行命令 kube-apiserver -h ,查看Admission启用的插件
kubectl exec kube-apiserver-k8s-master -n kube-system -- kube-apiserver -h | grep enable-admission-plugins

示例:配置一个新的kubectl集群客户端

大致步骤:

  1. 用k8s CA(根证书)签发客户端证书

  2. 生成kubeconfig配置文件

  3. 创建RBAC权限策略

  4. 指定kubeconfig文件测试权限

证书链的意思是有一个证书机构A,A生成证书B,B也可以生成证书C,那么A是根证书。操作系统预先安装的一些根证书,都是国际上很有权威的证书机构,比如 verisign 、 ENTRUST 这些公司。

这里k8s集群的根证书位于/etc/kubernetes/pki/ca.crt,可以根据根证书下发子证书。

常见证书文件说明:

  • ca.cer:中间证书和根证书

  • nginx.cn.cer:你申请的ssl证书

  • fullchain.cer:包括了 ca.cer 和 nginx.cn.cer 的全链证书

  • nginx.cn.key:证书的私钥

创建脚本cert.sh,用于生成证书:

# 创建证书配置文件
cat > ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "87600h" 
    },
    "profiles": {
      "kubernetes": {
        "usages": [
            "signing",
            "key encipherment",
            "server auth",
            "client auth"
        ],
        "expiry": "87600h"
      }
    }
  }
}
EOF

# 创建证书请求文件
cat > yangsx-csr.json <<EOF
{
  "CN": "yangsx",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF

# 使用cfssl生成客户端证书
# -ca 指定根证书 -ca-key 指定根证书私钥 -config 指定证书配置文件
cfssl gencert -ca=/etc/kubernetes/pki/ca.crt -ca-key=/etc/kubernetes/pki/ca.key -config=ca-config.json -profile=kubernetes yangsx-csr.json | cfssljson -bare yangsx

执行脚本将会生成:

yangsx-key.pem  证书私钥
yangsx.pem      证书

再创建脚本kubeconfig.sh,使用此脚本创建kubeconfig:

# 添加集群信息到配置文件
# 这里可以修改集群名称、apiserver地址、配置文件名称等
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/pki/ca.crt \
  --embed-certs=true \
  --server=https://192.168.121.10:6443 \
  --kubeconfig=yangsx.kubeconfig

# 添加用户以及客户端认证认证信息到配置文件
kubectl config set-credentials yangsx \
  --client-key=yangsx-key.pem \
  --client-certificate=yangsx.pem \
  --embed-certs=true \
  --kubeconfig=yangsx.kubeconfig

# 设置默认上下文
kubectl config set-context kubernetes \
  --cluster=kubernetes \
  --user=yangsx \
  --kubeconfig=yangsx.kubeconfig

# 设置使用配置
kubectl config use-context kubernetes --kubeconfig=yangsx.kubeconfig

执行完毕后将会生成yangsx.kubeconfig配置文件,然后将文件下发给某个用户,配置给kubectl即可使用。

在未给yangsx这个用户授权之前,任何操作都无法通过API Server的授权检查:

kubectl get pods --kubeconfig=yangsx.kubeconfig
Error from server (Forbidden): pods is forbidden: User "yangsx" cannot list resource "pods" in API group "" in the namespace "default"

我们需要通过创建rbac资源,给指定的用户赋予权限,创建rbac.yaml

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]  # 资源组,可以通过 kubectl api-resources 命令查看,其第三列就代表apiGroup,空字符串代表核心组
  resources: ["pods"] # 核心组下的名称为pods的资源
  verbs: ["get", "watch", "list"] # 对pod可进行的操作

---

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User # 指定角色要绑定的主体的类型为用户
  name: yangsx  # 指定用户名称
  apiGroup: rbac.authorization.k8s.io
roleRef: # 指定将pod-reader这个角色绑定给用户
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

使用kube-admin用户创建上述资源清单中的资源:

kubectl apply -f rbac.yaml

这样就给用户yangsx分配了权限:

root@k8s-master:~# kubectl get pods --kubeconfig=./yangsx.kubeconfig
NAME                                      READY   STATUS    RESTARTS   AGE
my-pod                                    1/1     Running   0          25h

可以查看role、rolebinding的创建情况:

root@k8s-master:~# kubectl get role,rolebinding
NAME                                                                   CREATED AT
role.rbac.authorization.k8s.io/pod-reader                              2022-02-03T14:33:44Z

NAME                                                                          ROLE                                         AGE
rolebinding.rbac.authorization.k8s.io/read-pods                               Role/pod-reader                              2m29s

示例:为一个ServiceAccount分配一个只能创建deployment、daemonset、statefulset的权限

ServiceAccount一般提供给程序使用,但也可以给kubectl使用。

实现方式一,通过命令创建:

# 创建集群角色
kubectl create clusterrole deployment-clusterrole --verb=create --resource=deployments,daemonsets,statefulsets
# 创建服务账号
kubectl create serviceaccount cicd-token -n app-team1
# 将服务账号绑定角色
kubectl create rolebinding cicd-token --serviceaccount=app-team1:cicd-token --clusterrole=deployment-clusterrole -n app-team1
# 测试服务账号权限
kubectl --as=system:serviceaccount:app-team1:cicd-token get pods -n app-team1

实现方式二,通过yaml创建:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cicd-token
  namespace: app-team1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-clusterrole
rules:
- apiGroups: ["apps"]
  resources: ["deployments","daemonsets","statefulsets"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cicd-token
  namespace: app-team1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deployment-clusterrole
subjects:
- kind: ServiceAccount
  name: cicd-token
  namespace: app-team1

网络策略

默认情况下,Kubernetes集群网络没有任何网络限制,Pod可以与任何其他Pod通信。在某些场景下就需要进行网络控制,减少网络攻击面,提高安全性,这就会用到网络策略。网络策略(Network Policy)是一个K8s资源,用于限制Pod的出入流量,提供Pod级别和Namespace级别的网络访问控制。

网络策略的应用场景(偏重多租户下):

  • 应用程序间的访问控制,例如项目A不能访问项目B的Pod

  • 开发环境命名空间不能访问测试环境命名空间Pod

  • 当Pod暴露到外部时,需要做Pod白名单

网络策略的工作流程:

image-20220204105532677
  1. 创建Network Policy资源

  2. Policy Controller监控网络策略,同步并通知节点上的程序

  3. 节点上DaemonSet运行的程序从etcd获取Policy,调用本地Iptables规则

案例:拒绝其他命名空间Pod访问

需求:test命名空间下所有pod可以互相访问,也可以访问其他命名空间Pod,但其他命名空间不能访问test命名空间Pod

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-namespaces
  namespace: test # 指定命名空间
spec:
  podSelector: {} # 网络策略应用的目标pod,未配置代表所有pod
  policyTypes: # 策略类型,指定策略用于入站(Ingress)、出站(Egress)流量
    - Ingress
  ingress:
    - from: # 指定入站的白名单
        - podSelector: {} # 未配置,匹配本命名空间内的所有pod

测试:

kubectl run busybox --image=busybox -n test -- sleep 12h
kubectl run web --image=nginx -n test

# 同命名空间pod可访问测试
kubectl exec busybox -n test -- ping <同命名空间pod IP>

# 非test命名空间pod不可访问test命名空间测试
kubectl exec busybox -- ping <test命名空间pod IP>

案例:同一个命名空间下应用之间限制访问

需求:将test命名空间携带run=web标签的Pod隔离,只允许携带run=client1标签的Pod访问80端口。

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-to-app
  namespace: test
spec:
  podSelector:
    matchLabels:
      run: web # test命名空间携带run=web标签的Pod
  policyTypes:
    - Ingress
  ingress:
    - from: # 指定白名单,只允许携带run=client1标签的Pod
        - podSelector:
            matchLabels:
              run: client1
      ports:
        - protocol: TCP # 访问80端口
          port: 80

测试:

kubectl run web --image=nginx -n test
kubectl run client1 --image=busybox -n test -- sleep 12h
# 可以访问
kubectl exec client1 -n test -- wget <test命名空间pod IP>
# 不能访问
kubectl exec busybox -- wget <test命名空间pod IP>

案例:只允许指定命名空间中的应用访问

需求:只允许dev命名空间中的Pod访问test命名空间中的pod 80端口

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-port-from-namespace
  namespace: test
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from: # 白名单,dev命名空间的pod
        - namespaceSelector: 
            matchLabels:
              name: dev
      ports:
        - protocol: TCP
          port: 80

测试:

# 命名空间打标签:
kubectl label namespace dev name=dev

kubectl run busybox --image=busybox -n dev -- sleep 12h
# 可以访问
kubectl exec busybox -n dev -- wget <test命名空间pod IP>
# 不可以访问
kubectl exec busybox -- wget <test命名空间pod IP>
