1. Environment Preparation
Request three ECS virtual machines on Huawei Cloud; since this is a temporary test environment, pay-per-use billing is recommended. Choose a VPC on the 192.168.0.0/16 network. A Kubernetes v1.23.1 cluster (the latest version at the time of writing) will be deployed on these three nodes. Why three nodes? Because Ceph will later be tried out as the back-end distributed storage, which needs at least three nodes, and each node needs a raw disk as a Ceph data disk; for that reason every ECS gets an extra empty 100 GB EVS block-storage volume. The details are in the table below. Flannel is used as the network plugin.
| Role | OS | Node name | Storage | IP | Docker version | kubelet version | kubeadm version | kubectl version | Network |
|---|---|---|---|---|---|---|---|---|---|
| master | CentOS 7.9 | master | 40G + 100G (data disk) | 192.168.0.11 | Docker 20.10.8 | v1.23.1 | v1.23.1 | v1.23.1 | flannel |
| node | CentOS 7.9 | node1 | 40G + 100G (data disk) | 192.168.0.23 | Docker 20.10.8 | v1.23.1 | v1.23.1 | v1.23.1 | flannel |
| node | CentOS 7.9 | node2 | 40G + 100G (data disk) | 192.168.0.51 | Docker 20.10.8 | v1.23.1 | v1.23.1 | v1.23.1 | flannel |

Table 1: Environment information.
Note: Kubernetes has deprecated the built-in Docker (dockershim) support since v1.20 (it was removed in v1.24). If both docker and containerd are installed on the host, docker is preferred as the container runtime; if only containerd is installed, containerd is used. To keep the learning cost down, Docker is installed here.
2. Configure Security Groups
Whether on Huawei Cloud, Tencent Cloud, Alibaba Cloud, AWS or Azure, a security group is attached by default when a VM is created to provide basic network protection: it controls which ports are opened and reachable. During installation, several Kubernetes components expose network services through Services and Pods, so the corresponding ports must be opened; otherwise those services will fail and the deployment will not succeed.
If you use physical servers or VMware virtual machines instead, this step can be skipped. The ports that need to be opened by default are listed below:
2.1 Inbound rules
Ports to open on the master node:
| Protocol | Direction | Port Range | Purpose |
|---|---|---|---|
| TCP | Inbound | 6443 | kube-apiserver |
| TCP | Inbound | 2379-2380 | etcd API |
| TCP | Inbound | 10250 | kubelet API |
| TCP | Inbound | 10259 | kube-scheduler |
| TCP | Inbound | 10257 | kube-controller-manager |
Ports to open on node1 and node2:
| Protocol | Direction | Port Range | Purpose |
|---|---|---|---|
| TCP | Inbound | 10250 | kubelet API |
| TCP | Inbound | 30000-32767 | NodePort Services |
2.2 Outbound rules
| Protocol | Port | Destination | Policy |
|---|---|---|---|
| ALL | ALL | 0.0.0.0/0 | Allow |
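Once the rules are in place, reachability of the control-plane ports can be spot-checked from a worker node. A minimal sketch using `nc` (the master IP is taken from Table 1; the ports only answer once the corresponding components are actually running):

```bash
# Run from node1/node2 after the control plane is up; nc is provided by the nmap-ncat package on CentOS 7
MASTER_IP=192.168.0.11
for port in 6443 2379 2380 10250; do
  if nc -z -w 2 "$MASTER_IP" "$port"; then
    echo "port $port reachable"
  else
    echo "port $port NOT reachable"
  fi
done
```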
3. Basic Host Configuration
Install and prepare the basic software on all nodes in advance.
3.1 Set host information
```bash
# Set the hostname on each host (takes effect after reboot)
hostnamectl set-hostname master
reboot
hostnamectl set-hostname node1
reboot
hostnamectl set-hostname node2
reboot
# Sync time on all three hosts
systemctl restart chronyd
# Add hosts entries (DNS resolution) on all three hosts
cat >> /etc/hosts << EOF
192.168.0.11 master
192.168.0.23 node1
192.168.0.51 node2
EOF
# Set up passwordless SSH between the three machines: generate a key on the master
# and copy it to the other two, so master can log in to node1 and node2 directly.
# To log in from node1 and node2 as well, run the same commands on those nodes.
ssh-keygen -t rsa
ssh-copy-id root@node1
ssh-copy-id root@node2
# Stop and disable firewalld and iptables
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl stop iptables.service
systemctl disable iptables.service
# Disable SELinux
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# Disable swap
swapoff -a
sed -i 's/.*swap.*/#&/' /etc/fstab
# Configure kernel parameters
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# Load the br_netfilter module and apply the sysctl settings immediately
modprobe br_netfilter
sysctl --system
```
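A quick way to verify that these settings took effect (a sketch; expected results are noted in the comments):

```bash
getenforce                                  # should print Permissive (Disabled after a reboot)
free -h | grep -i swap                      # swap total should show 0B
sysctl net.bridge.bridge-nf-call-iptables   # should print 1
lsmod | grep br_netfilter                   # the module should be loaded
```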
3.2 Switch the yum repositories
```bash
sudo mkdir /etc/yum.repos.d/bak && mv /etc/yum.repos.d/*.repo /etc/yum.repos.d/bak
wget -O /etc/yum.repos.d/CentOS-Base.repo https://repo.huaweicloud.com/repository/conf/CentOS-7-reg.repo
sudo yum clean all
sudo yum makecache fast
```
3.3 Install base packages
```bash
# Install bash completion and the basic dependencies; remove any old Docker packages
sudo yum install -y bash-completion
source /etc/profile.d/bash_completion.sh
sudo yum remove docker docker-common docker-selinux docker-engine
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
```
4. Install Docker
Install Docker and configure a registry mirror on all nodes.
4.1 Install the Docker packages
```bash
# Add the Docker CE yum repository
wget -O /etc/yum.repos.d/docker-ce.repo https://repo.huaweicloud.com/docker-ce/linux/centos/docker-ce.repo
sudo sed -i 's+download.docker.com+repo.huaweicloud.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
sudo yum makecache fast
# Install the latest Docker (20.10.8 at the time of writing). The available versions
# can be listed beforehand with: yum list docker-ce --showduplicates | sort -r
sudo yum install -y docker-ce
sudo systemctl enable docker
sudo systemctl start docker
sudo systemctl status docker
docker --version
```
4.2 Configure a registry mirror and the cgroup driver
```bash
# Configure /etc/docker/daemon.json with a registry mirror (replace it with your own mirror address)
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://e2660ea6dc2b4a16a3ae382f8d227beb.mirror.swr.myhuaweicloud.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# "exec-opts": ["native.cgroupdriver=systemd"] makes Docker use systemd as its cgroup driver;
# otherwise the kubelet may fail to start properly.
# Restart Docker
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl status docker
```
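To confirm that Docker has picked up the systemd cgroup driver, a quick check (sketch):

```bash
docker info --format '{{.CgroupDriver}}'   # should print: systemd
```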
5. Install the Kubernetes Cluster
First install Kubernetes on the master node, then join node1 and node2 to the cluster.
5.1 Install kubeadm
Install the kubeadm, kubelet and kubectl tools on all three nodes.
```bash
# Add the Aliyun Kubernetes yum repo on all nodes
cat >> /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes Repository
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
EOF
sudo yum clean all
sudo yum makecache fast
# Check whether the 1.23.1 packages are available in the repo
yum list kubelet --showduplicates | sort -r
yum list kubectl --showduplicates | sort -r
yum list kubeadm --showduplicates | sort -r
# Install kubeadm; kubectl and kubelet are pulled in automatically as dependencies.
# At the time of writing the latest version is 1.23.1; to pin it explicitly, install
# kubeadm-1.23.1 kubelet-1.23.1 kubectl-1.23.1 instead.
sudo yum install -y kubeadm
sudo systemctl enable kubelet
sudo systemctl start kubelet
kubeadm version
kubectl version
kubelet --version
# Enable kubectl command completion
cd
echo "source <(kubectl completion bash)" >> ~/.bash_profile
source .bash_profile
# On the master, list the images required by kubeadm
sudo kubeadm config images list
------------------------------------------------
# The following images are required
k8s.gcr.io/kube-apiserver:v1.23.1
k8s.gcr.io/kube-controller-manager:v1.23.1
k8s.gcr.io/kube-scheduler:v1.23.1
k8s.gcr.io/kube-proxy:v1.23.1
k8s.gcr.io/pause:3.6
k8s.gcr.io/etcd:3.5.1-0
k8s.gcr.io/coredns/coredns:v1.8.6
-------------------------------------------------
# kubeadm relies on images from k8s.gcr.io, which is blocked in mainland China; the workaround
# is to pull equivalent images from a domestic registry and re-tag them
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.23.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.23.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.23.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.23.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.1-0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.6
# Re-tag the images back to their k8s.gcr.io names
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.23.1 k8s.gcr.io/kube-apiserver:v1.23.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.23.1 k8s.gcr.io/kube-controller-manager:v1.23.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.23.1 k8s.gcr.io/kube-scheduler:v1.23.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.23.1 k8s.gcr.io/kube-proxy:v1.23.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.1-0 k8s.gcr.io/etcd:3.5.1-0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.6 k8s.gcr.io/coredns/coredns:v1.8.6
```
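The pull-and-retag steps above can also be scripted so they are easier to repeat on every node. A minimal sketch, assuming the image list matches the `kubeadm config images list` output shown above:

```bash
#!/bin/bash
# Pull each image from the Aliyun mirror and re-tag it with its k8s.gcr.io name
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
images="kube-apiserver:v1.23.1 kube-controller-manager:v1.23.1 kube-scheduler:v1.23.1 \
kube-proxy:v1.23.1 pause:3.6 etcd:3.5.1-0 coredns:v1.8.6"
for img in $images; do
  docker pull "$MIRROR/$img"
  if [ "$img" = "coredns:v1.8.6" ]; then
    # coredns lives under a sub-path on k8s.gcr.io
    docker tag "$MIRROR/$img" "k8s.gcr.io/coredns/coredns:v1.8.6"
  else
    docker tag "$MIRROR/$img" "k8s.gcr.io/$img"
  fi
done
```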
5.2 Initialize the master node
```bash
# Run kubeadm init on the master. The pod network is 10.244.0.0/16 and the service network
# is 10.1.0.0/16 (both are cluster-internal); the API server address is the master node IP
# allocated from the Huawei Cloud VPC.
kubeadm init \
  --kubernetes-version=1.23.1 \
  --apiserver-advertise-address=192.168.0.11 \
  --service-cidr=10.1.0.0/16 \
  --pod-network-cidr=10.244.0.0/16
```
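The same options can also be expressed as a kubeadm configuration file and passed with `kubeadm init --config`. A minimal sketch (not used in this walkthrough; the field values simply mirror the flags above):

```bash
# Write a kubeadm config equivalent to the flags above, then run: kubeadm init --config kubeadm-config.yaml
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.0.11
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.23.1
networking:
  serviceSubnet: 10.1.0.0/16
  podSubnet: 10.244.0.0/16
EOF
```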
The output of the initialization is shown below:
```
[init] Using Kubernetes version: v1.23.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master] and IPs [10.1.0.1 192.168.0.46]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master] and IPs [192.168.0.46 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master] and IPs [192.168.0.46 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 5.502543 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.23" in namespace kube-system with the configuration for the kubelets in the cluster
NOTE: The "kubelet-config-1.23" naming of the kubelet ConfigMap is deprecated. Once the UnversionedKubeletConfigMap feature gate graduates to Beta the default name will become just "kubelet-config". Kubeadm upgrade will handle this transition transparently.
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node master as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: b2n16t.n6filxh3vc6byr7c
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.46:6443 --token b2n16t.n6filxh3vc6byr7c \
--discovery-token-ca-cert-hash sha256:f4d103707658df3fa7a8dc95a59719f362cd42edb40c8ebc5ae19d53655813d1
```
As the output suggests, copy the admin config into the .kube directory:
```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
Apply the flannel network plugin:
```bash
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
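The `Network` field inside kube-flannel.yml must match the `--pod-network-cidr` passed to kubeadm init (10.244.0.0/16, which is flannel's default). A quick check before applying (sketch):

```bash
# Inspect the net-conf.json ConfigMap embedded in the manifest; "Network" should read 10.244.0.0/16
curl -s https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml | grep -A 5 'net-conf.json'
```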
5.3 Join the nodes to the cluster
Run the `kubeadm join` command printed above on both worker nodes to join them to the cluster.
```bash
[root@node1 ~]# kubeadm join 192.168.0.46:6443 --token b2n16t.n6filxh3vc6byr7c \
> --discovery-token-ca-cert-hash sha256:f4d103707658df3fa7a8dc95a59719f362cd42edb40c8ebc5ae19d53655813d1
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
[root@node1 ~]#
```
To run kubectl commands on the worker nodes, copy the master's kubeconfig into $HOME/.kube on each node.
```bash
# On the worker node, create the directory from which kubectl reads its config and certificates by default
[root@node1 ~]# mkdir -p $HOME/.kube
# On the master, copy the config to node1
[root@master ~]# scp .kube/config root@node1:/root/.kube/
# Nodes, components, pods, etc. can now be inspected from the worker node
[root@node1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 43m v1.23.1
node1 Ready <none> 39m v1.23.1
node2 Ready <none> 39m v1.23.1
[root@node1 ~]# kubectl get ns
NAME STATUS AGE
default Active 43m
kube-node-lease Active 43m
kube-public Active 43m
kube-system Active 43m
[root@node1 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
[root@node1 ~]# kubectl get pods -nkube-system
NAME READY STATUS RESTARTS AGE
coredns-64897985d-bs6b9 1/1 Running 0 43m
coredns-64897985d-s2kml 1/1 Running 0 43m
etcd-master 1/1 Running 0 44m
kube-apiserver-master 1/1 Running 0 44m
kube-controller-manager-master 1/1 Running 0 44m
kube-flannel-ds-8jpd4 1/1 Running 0 39m
kube-flannel-ds-jlfzx 1/1 Running 0 39m
kube-flannel-ds-jztwk 1/1 Running 0 41m
kube-proxy-5lnr9 1/1 Running 0 39m
kube-proxy-thghs 1/1 Running 0 43m
kube-proxy-w7rhv 1/1 Running 0 39m
kube-scheduler-master 1/1 Running 0 44m
[root@node1 ~]#
```
5.4 Install Ceph storage
Ceph is installed through Rook. Ceph requires at least three nodes, and each node needs at least one raw disk; that is why a 100 GB EVS block-storage volume was attached to every ECS when they were created.
```bash
[root@master ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 40G 0 disk
└─vda1 253:1 0 40G 0 part /
vdb 253:16 0 100G 0 disk
[root@master ~]# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
vda
└─vda1 ext4 b64c5c5d-9f6b-4754-9e1e-eaef91437f7a /
vdb
```
For convenience, clone the whole rook project from GitHub first.
```bash
yum install -y git
git clone https://github.com/rook/rook.git
```
Make sure the kernel supports rbd:
```bash
[root@master ~]# uname -r
3.10.0-1160.15.2.el7.x86_64
[root@master ~]# modprobe rbd
[root@master ~]# lsmod |grep rbd
rbd 102400 0
libceph 413696 1 rbd
```
Because only three nodes are deployed while Ceph needs at least three, and pods are not scheduled on the master node by default, remove the master taint in advance so that the Ceph pods can also land on the master.
```bash
[root@master ~]# kubectl taint nodes --all node-role.kubernetes.io/master-
```
Deploy Rook:
```bash
cd /root/rook/deploy/examples
kubectl apply -f crds.yaml -f common.yaml
kubectl apply -f operator.yaml   # if the images cannot be pulled, pull them to each node in advance
kubectl apply -f cluster.yaml
kubectl get pods -n rook-ceph -o wide
```
Several images will fail to pull; fetch them from Aliyun one by one on every node. The images recorded during this deployment:
```bash
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/rook/ceph:master
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.3.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/csi-provisioner:v3.0.0
docker pull quay.io/cephcsi/cephcsi:v3.4.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/csi-attacher:v3.3.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/csi-snapshotter:v4.2.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/csi-resizer:v1.3.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.3.0 k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/csi-provisioner:v3.0.0 k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/csi-attacher:v3.3.0 k8s.gcr.io/sig-storage/csi-attacher:v3.3.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/csi-snapshotter:v4.2.0 k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/csi-resizer:v1.3.0 k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
```
Once the image issues are resolved, check the pod status in the namespace:
```bash
[root@master examples]# kubectl get pods -nrook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-ct842 3/3 Running 0 25m
csi-cephfsplugin-cvb7f 3/3 Running 0 25m
csi-cephfsplugin-j5gbm 3/3 Running 0 25m
csi-cephfsplugin-provisioner-5c8b6d6f4-hhvjq 6/6 Running 0 25m
csi-cephfsplugin-provisioner-5c8b6d6f4-kr4n5 6/6 Running 0 25m
csi-rbdplugin-fcbk9 3/3 Running 0 25m
csi-rbdplugin-fpv8t 3/3 Running 0 25m
csi-rbdplugin-provisioner-8564cfd44-jkqrq 6/6 Running 0 25m
csi-rbdplugin-provisioner-8564cfd44-q8srg 6/6 Running 0 25m
csi-rbdplugin-qtgvt 3/3 Running 0 25m
rook-ceph-crashcollector-master-7bcf565ddc-4mvmk 1/1 Running 0 20m
rook-ceph-crashcollector-node1-7bfc99f96d-2jw4w 1/1 Running 0 20m
rook-ceph-crashcollector-node2-678f85bdf-qw2gq 1/1 Running 0 20m
rook-ceph-mgr-a-574b6956fd-fzt5q 1/1 Running 0 20m
rook-ceph-mon-a-668b48987f-g5zfw 1/1 Running 0 25m
rook-ceph-mon-b-54996b7487-6qscc 1/1 Running 0 24m
rook-ceph-mon-c-6cc5bd5c85-wsrn9 1/1 Running 0 22m
rook-ceph-operator-75dd789779-8kq7z 1/1 Running 0 30m
rook-ceph-osd-0-849c84cc87-bzpf9 1/1 Running 0 20m
rook-ceph-osd-1-77cfc975bb-hbdnn 1/1 Running 0 20m
rook-ceph-osd-2-5c7d59d74d-g67fz 1/1 Running 0 20m
rook-ceph-osd-prepare-master-98nld 0/1 Completed 0 20m
rook-ceph-osd-prepare-node1-nvqvg 0/1 Completed 0 20m
rook-ceph-osd-prepare-node2-x6cnk 0/1 Completed 0 20m
[root@master examples]# kubectl get service -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-cephfsplugin-metrics ClusterIP 10.1.101.105 <none> 8080/TCP,8081/TCP 26m
csi-rbdplugin-metrics ClusterIP 10.1.238.71 <none> 8080/TCP,8081/TCP 26m
rook-ceph-mgr ClusterIP 10.1.98.179 <none> 9283/TCP 21m
rook-ceph-mgr-dashboard ClusterIP 10.1.251.161 <none> 8443/TCP 21m
rook-ceph-mon-a ClusterIP 10.1.0.149 <none> 6789/TCP,3300/TCP 26m
rook-ceph-mon-b ClusterIP 10.1.42.253 <none> 6789/TCP,3300/TCP 25m
rook-ceph-mon-c ClusterIP 10.1.99.90 <none> 6789/TCP,3300/TCP 24m
```
<1>. The normal state of the rook-ceph-osd-prepare pods on the three nodes is Completed.
<2>. If one of them stays Running, or a rook-ceph-osd pod is missing, check the affected node's time synchronization, firewall and memory usage.
<3>. Deploy the Toolbox (see further below).
The dashboard service above is of type ClusterIP and only reachable inside the cluster. To access it from outside, deploy a NodePort-type dashboard service; the rook project already ships one, so it can be used directly.
```bash
[root@master examples]# cd /root/rook/deploy/examples
[root@master examples]#
[root@master examples]# kubectl apply -f dashboard-external-https.yaml
service/rook-ceph-mgr-dashboard-external-https created
[root@master examples]# kubectl get service -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-cephfsplugin-metrics ClusterIP 10.1.101.105 <none> 8080/TCP,8081/TCP 31m
csi-rbdplugin-metrics ClusterIP 10.1.238.71 <none> 8080/TCP,8081/TCP 31m
rook-ceph-mgr ClusterIP 10.1.98.179 <none> 9283/TCP 26m
rook-ceph-mgr-dashboard ClusterIP 10.1.251.161 <none> 8443/TCP 26m
rook-ceph-mgr-dashboard-external-https NodePort 10.1.182.240 <none> 8443:30301/TCP 35s
rook-ceph-mon-a ClusterIP 10.1.0.149 <none> 6789/TCP,3300/TCP 31m
rook-ceph-mon-b ClusterIP 10.1.42.253 <none> 6789/TCP,3300/TCP 30m
rook-ceph-mon-c ClusterIP 10.1.99.90 <none> 6789/TCP,3300/TCP 28m
[root@master examples]#
```
A new NodePort service on port 30301 now exists; use any node's IP to reach it at https://Node-EIP1:30301 and log in with the username and password.
The default dashboard username is admin; retrieve the password with:
```bash
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
```
Deploy the Ceph toolbox: the Ceph cluster started by Rook has Ceph authentication enabled, so logging in to the pods of the Ceph components does not let you query the cluster state or run CLI commands. For that, deploy the Ceph toolbox:
```bash
kubectl apply -f toolbox.yaml
# Check that the toolbox pod is running
kubectl -n rook-ceph get pods -o wide | grep ceph-tools
# Then exec into the pod and run Ceph CLI commands
kubectl -n rook-ceph exec -it rook-ceph-tools-76c7d559b6-8w7bk bash
# Check the cluster status
ceph status
```
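A few other checks worth running inside the toolbox pod once `ceph status` reports a healthy cluster (a sketch of standard Ceph CLI commands):

```bash
ceph osd status      # one OSD per data disk, all of them "up"
ceph osd tree        # OSD-to-host layout
ceph df              # raw and per-pool capacity
rados df             # per-pool object statistics
```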
Rook provides an RBD service; in total, Rook can provision the following three types of storage:
- Block: Create block storage to be consumed by a pod
- Object: Create an object store that is accessible inside or outside the Kubernetes cluster
- Shared File System: Create a file system to be shared across multiple pods
Before provisioning block storage, a StorageClass and a storage pool must be created. Kubernetes needs these two resources to interact with Rook and allocate persistent volumes (PVs).
Providing an rbd block-device service in a Kubernetes cluster involves:
- an rbd-provisioner pod
- a StorageClass for rbd
- a PVC that uses that StorageClass
- a pod that mounts the rbd PVC
Rook already runs its own rbd-provisioner once the Ceph cluster is created, so no separate provisioner needs to be deployed; only the pool and StorageClass are left to create.
1) Create the pool and StorageClass. Looking at the configuration (`vim storageclass.yaml`), the file defines a storage pool named replicapool and a StorageClass named rook-ceph-block; a trimmed sketch follows, and then the file is applied.
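A heavily trimmed sketch of what storageclass.yaml roughly defines (field values abridged from the Rook example; the shipped file additionally sets image features and the CSI provisioner/node secrets, so treat this as illustrative only):

```bash
cat <<'EOF' > /tmp/storageclass-sketch.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF
```

Then apply the file that ships with the rook repository: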
```bash
[root@master ~]# cd rook/deploy/examples/csi/rbd
[root@master rbd]# kubectl apply -f storageclass.yaml
cephblockpool.ceph.rook.io/replicapool created
storageclass.storage.k8s.io/rook-ceph-block created
```
2) Check the created StorageClass:
```bash
[root@master rbd]# kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 2m36s
```
3) Log in to the Ceph dashboard to see the created storage pool.
To consume the storage, follow the official wordpress example: create the classic wordpress and mysql applications, both of which use block volumes provided by Rook.
Look at the yaml files, in particular the PVC definitions and the volume mounts, taking wordpress.yaml and mysql.yaml as examples:
```bash
[root@master ~]# cd rook/deploy/examples/
[root@master examples]# kubectl apply -f wordpress.yaml -f mysql.yaml
service/wordpress created
persistentvolumeclaim/wp-pv-claim created
deployment.apps/wordpress created
service/wordpress-mysql created
persistentvolumeclaim/mysql-pv-claim created
deployment.apps/wordpress-mysql created
[root@master examples]# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
wordpress 0/1 1 0 28s
wordpress-mysql 0/1 1 0 28s
```
Each of the two applications creates a block storage volume and mounts it into its pod. Check the resulting PVCs and PVs:
```bash
[root@master examples]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-cdfbbd11-a22e-4f72-96cd-064e228eb730 20Gi RWO rook-ceph-block 83s
wp-pv-claim Bound pvc-b09ce46e-d00e-4b7d-8303-748bbb7d0944 20Gi RWO rook-ceph-block 83s
[root@master examples]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-b09ce46e-d00e-4b7d-8303-748bbb7d0944 20Gi RWO Delete Bound default/wp-pv-claim rook-ceph-block 86s
pvc-cdfbbd11-a22e-4f72-96cd-064e228eb730 20Gi RWO Delete Bound default/mysql-pv-claim rook-ceph-block 86s
[root@master examples]#
```
The PVs here are created automatically: once a PVC that references a StorageClass is submitted, Kubernetes uses that StorageClass to create a matching PV. This is the Dynamic Provisioning mechanism; PVs can also be pre-created and claimed statically, but dynamic creation is used here.
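For reference, a minimal PVC that triggers this dynamic provisioning against the rook-ceph-block StorageClass might look as follows (a sketch; the name and size are arbitrary):

```bash
# Request a 5Gi RWO volume from the rook-ceph-block StorageClass; a PV is created automatically
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-rbd-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
```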
Log in to the Ceph dashboard to see the created images.
5.5 Install the Kubernetes Dashboard
Fetch the dashboard manifest from GitHub:
```bash
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.4.0/aio/deploy/recommended.yaml
```
For easier testing, change the Service to the NodePort type by adding `type: NodePort` to the Service section of the YAML, as shown below:
```yaml
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  ports:
    - port: 443
      targetPort: 8443
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard
```
By default the `type: NodePort` field is absent and the service type is ClusterIP.
Then deploy the new dashboard version directly:
```bash
[root@master ~]# kubectl apply -f recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
[root@master ~]# kubectl get ns
NAME STATUS AGE
default Active 50m
kube-node-lease Active 50m
kube-public Active 50m
kube-system Active 50m
kubernetes-dashboard Active 11s
rook-ceph Active 46m
[root@master ~]# kubectl get svc -nkubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.1.213.171 <none> 8000/TCP 32s
kubernetes-dashboard NodePort 10.1.221.14 <none> 443:31712/TCP 32s
```
The NodePort here is 31712, so the dashboard can be reached at https://NodeIP:31712 using any node IP. Since the VPC-internal addresses cannot be reached from outside Huawei Cloud, use an external EIP, which is mapped to the internal node IP.
Logging in requires a token or a kubeconfig, created as follows:
```bash
kubectl create serviceaccount dashboard-admin -n kube-system
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
```
An example of the token output:
```bash
[root@master ~]# kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
Name: dashboard-admin-token-thf6q
Namespace: kube-system
Labels: <none>
Annotations: kubernetes.io/service-account.name: dashboard-admin
kubernetes.io/service-account.uid: d6ea3599-19c6-48a9-aa3b-2ec7ce265a24
Type: kubernetes.io/service-account-token
Data
====
token: eyJhbGciOiJSUzI1NiIsImtpZCI6ImJQRzl4aF9wMFdRbWE2blp0b1JvN2dVNWhkRkdZVzRpMndLMnhJbks5S00ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tdGhmNnEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDZlYTM1OTktMTljNi00OGE5LWFhM2ItMmVjN2NlMjY1YTI0Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.PlaEmz10kVjQf1zxUSNfiGytP0Ha6hCLuk2fBFM08owjEaFcIWHdRVRsHL6RO0w0i81YG0Gh6x3zJffy_ojhi_M-bCaPSVubPFrZz-CYO7Uia4fYv1P8f5c6I2X1e_-K2DzCYUlJvI3nzZy-jrFMIz_W19k63rRbxeNrqkdBJpsheWmaT_g8fjAzjtCDEnYUGDDPTVOtEvuhaSC_yci42f7eqTtlR2_QK1Bg2Id0GIEtEXT3xBgaofWuyjJVEex1mc4LImsdzpVFMtmPum9vEoZzxq1EONhOWxaaFIaadstfM-id9vDNlvZ5O2szk5xVtdgryFi72ICX7x5EpPyOqw
ca.crt: 1099 bytes
namespace: 11 bytes
```
You can log in directly with the token above; alternatively, a kubeconfig file can be used to log in to the Dashboard.
Generate the kubeconfig file:
```bash
DASH_TOKEN=$(kubectl get secret -n kube-system dashboard-admin-token-thf6q -o jsonpath={.data.token} | base64 -d)
# dashboard-admin-token-thf6q is the name of the token secret generated above
kubectl config set-cluster kubernetes --server=https://192.168.0.11:6443 --kubeconfig=/root/dashbord-admin.conf
# the server address is the API server address
kubectl config set-credentials dashboard-admin --token=$DASH_TOKEN --kubeconfig=/root/dashbord-admin.conf
kubectl config set-context dashboard-admin@kubernetes --cluster=kubernetes --user=dashboard-admin --kubeconfig=/root/dashbord-admin.conf
kubectl config use-context dashboard-admin@kubernetes --kubeconfig=/root/dashbord-admin.conf
```
The generated dashbord-admin.conf file can then be used to log in to the Dashboard.
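Before uploading it on the Dashboard login page, the generated kubeconfig can be sanity-checked from the command line (sketch):

```bash
# If the token and server address are correct, this lists the cluster nodes
kubectl --kubeconfig=/root/dashbord-admin.conf get nodes
```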
The end.