Deploying Kubernetes with TLS
Kubernetes is an open-source system for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make deploying containerized applications simple and powerful, and it provides mechanisms for application deployment, scheduling, updating and maintenance. A core feature of Kubernetes is that it manages containers autonomously, keeping them running in the state the user declared.
Environment:
CentOS 7.4.1708
docker 18.02.0-ce-rc1
kubernetes v1.9.2
etcd 3.2.15
k8s download: https://github.com/kubernetes/kubernetes/releases
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v192
Basic configuration
Synchronize time (run on all nodes):
# cp -a /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# ntpdate s1a.time.edu.cn
# crontab -l
0 */3 * * * ntpdate s1a.time.edu.cn &> /dev/null
#
Set the hostname on each node:
[root@master1 ~]# hostname master1.example.com
[root@master2 ~]# hostname master2.example.com
[root@master3 ~]# hostname master3.example.com
[root@node1 ~]# hostname node1.example.com
[root@node2 ~]# hostname node2.example.com
#Add these hosts entries on all nodes
# tail -5 /etc/hosts
192.168.1.195 master1.example.com master1
192.168.1.196 master2.example.com master2
192.168.1.197 master3.example.com master3
192.168.1.198 node1.example.com node1
192.168.1.199 node2.example.com node2
#
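The hosts entries above can also be scripted. A minimal idempotent sketch (the function name `add_cluster_hosts` is ours, not from the original; the entries are the five nodes of this environment):

```shell
# Hypothetical helper: append the five cluster entries to a hosts file,
# skipping any entry that is already present. Pass the target file as $1.
add_cluster_hosts() {
  local file=$1 entry
  while read -r entry; do
    grep -qF "$entry" "$file" || echo "$entry" >> "$file"
  done <<'EOF'
192.168.1.195 master1.example.com master1
192.168.1.196 master2.example.com master2
192.168.1.197 master3.example.com master3
192.168.1.198 node1.example.com node1
192.168.1.199 node2.example.com node2
EOF
}
# usage: add_cluster_hosts /etc/hosts
```

Running it a second time adds nothing, so it is safe to include in a provisioning script.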
Disable the firewall and SELinux (run on all nodes):
iptables -F
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
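Note that `setenforce 0` only lasts until the next reboot; to make the change persistent, /etc/selinux/config must be edited as well. A sketch (the function wrapper is ours; it assumes the stock CentOS config file format):

```shell
# Hypothetical wrapper: switch SELINUX=enforcing to disabled in the given
# config file (defaults to /etc/selinux/config) so the setting survives reboots.
disable_selinux_persistent() {
  local cfg=${1:-/etc/selinux/config}
  sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' "$cfg"
}
```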
Environment overview
| Role | IP | Components |
|---|---|---|
| Kube-Master + ETCD1 | 192.168.1.195 | etcd、kube-apiserver、kube-controller-manager、kube-scheduler、Flannel |
| ETCD2 | 192.168.1.196 | etcd |
| ETCD3 | 192.168.1.197 | etcd |
| Kube-Node1 | 192.168.1.198 | kubelet、kube-proxy、docker、Flannel |
| Kube-Node2 | 192.168.1.199 | kubelet、kube-proxy、docker、Flannel |
| Name | Value | Flag | Notes |
|---|---|---|---|
| SERVICE_CIDR | 172.16.0.0/16 | --service-cluster-ip-range | Service network |
| CLUSTER_CIDR | 172.30.0.0/16 | --cluster-cidr | Pod network |
| CLUSTER_KUBERNETES_SVC_IP | 172.16.0.1 | - | kubernetes service IP, the first IP in SERVICE_CIDR |
| CLUSTER_DNS_SVC_IP | 172.16.0.2 | - | cluster DNS service IP, the second IP in SERVICE_CIDR |
| NODE_PORT_RANGE | 8400-9000 | --service-node-port-range | NodePort range |
| CLUSTER_DNS_DOMAIN | cluster.local. | - | cluster DNS domain |
| FLANNEL_ETCD_PREFIX | /kubernetes/network | -etcd-prefix | flanneld network config key prefix |
Install Docker
Install the Docker engine on node1 and node2:
yum remove docker docker-common docker-selinux docker-engine -y
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --enable docker-ce-edge
yum-config-manager --enable docker-ce-test
yum install docker-ce -y
systemctl enable docker
systemctl start docker
Create the CA certificate and key
The Kubernetes components encrypt their communication with TLS certificates. Here we use cfssl, CloudFlare's PKI toolkit, to generate the Certificate Authority (CA) certificate and key files. The CA certificate is self-signed and is used to sign all other TLS certificates created later.
Install CFSSL
cfssl download: https://github.com/cloudflare/cfssl/releases
cfssl R1.2 toolkit (local mirror available):
[root@master1 ~]# wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -O /usr/local/bin/cfssl
[root@master1 ~]# wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -O /usr/local/bin/cfssljson
[root@master1 ~]# wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64 -O /usr/local/bin/cfssl-certinfo
[root@master1 ~]# chmod +x /usr/local/bin/cfssl*
[root@master1 ~]# mkdir ssl && cd ssl
#Test
[root@master1 ssl]# cfssl print-defaults config > config.json
[root@master1 ssl]# cfssl print-defaults csr > csr.json
Create the CA (Certificate Authority)
| File | Purpose |
|---|---|
| ca-config.json | CA configuration (signing profiles) |
| ca-csr.json | CA certificate signing request |
[root@master1 ssl]# cat ca-config.json
- ca-config.json: can define multiple profiles with different expiry times, usages, etc.; a specific profile is selected later when signing certificates;
- signing: the certificate may be used to sign other certificates; the generated ca.pem has CA=TRUE;
- server auth: clients may use this CA to verify certificates presented by servers;
- client auth: servers may use this CA to verify certificates presented by clients;
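The ca-config.json contents are not shown above; a typical file matching the description (the expiry values are assumptions, while the profile name must match the -profile=kubernetes flag used by later cfssl gencert commands) looks like:

```shell
# Assumed ca-config.json: one "kubernetes" profile usable for both server
# and client auth; 87600h (10 years) is a common expiry choice, not mandated.
cat > ca-config.json <<'EOF'
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "87600h"
      }
    }
  }
}
EOF
```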
Create the CA certificate signing request
[root@master1 ssl]# cat ca-csr.json
- "CN": Common Name. kube-apiserver extracts this field from the certificate as the request's user name; browsers use it to verify that a site is legitimate;
- "O": Organization. kube-apiserver extracts this field from the certificate as the group the requesting user belongs to;
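The ca-csr.json contents are likewise elided; a plausible request, mirroring the names block used by the etcd and flanneld CSRs later in this guide (the exact values here are an assumption):

```shell
# Assumed ca-csr.json, consistent with the C/ST/L/O/OU values of the other
# CSRs in this document.
cat > ca-csr.json <<'EOF'
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF
```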
Generate the CA certificate and private key
[root@master1 ssl]# cfssl gencert -initca ca-csr.json | cfssljson -bare ca
Distribute the certificates
[root@master1 ssl]# mkdir -p /etc/kubernetes/ssl
[root@master1 ssl]# ssh root@192.168.1.196 "mkdir -p /etc/kubernetes/ssl"
[root@master1 ssl]# ssh root@192.168.1.197 "mkdir -p /etc/kubernetes/ssl"
[root@master1 ssl]# ssh root@192.168.1.198 "mkdir -p /etc/kubernetes/ssl"
[root@master1 ssl]# ssh root@192.168.1.199 "mkdir -p /etc/kubernetes/ssl"
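The commands above only create the target directories; copying the CA material itself is the natural next step (later cfssl commands reference /etc/kubernetes/ssl/ca.pem, ca-key.pem and ca-config.json on each host). A loop sketch; the function name is ours:

```shell
# Hypothetical helper: create /etc/kubernetes/ssl on each node and copy the
# CA files there. Takes the node IPs as arguments; run from the ssl/ dir.
distribute_ca() {
  local ip
  for ip in "$@"; do
    ssh "root@$ip" "mkdir -p /etc/kubernetes/ssl"
    scp ca.pem ca-key.pem ca-config.json "root@$ip:/etc/kubernetes/ssl/"
  done
}
# usage: distribute_ca 192.168.1.196 192.168.1.197 192.168.1.198 192.168.1.199
```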
Deploy a highly available etcd cluster
SELinux and the firewall must be disabled, and time synchronized with ntpdate.
Kubernetes stores all of its data in etcd; here etcd is installed alongside the Kubernetes masters.
The three etcd members are named etcd1, etcd2 and etcd3.
| Name | IP | Client URL |
|---|---|---|
| etcd1 | 192.168.1.195 | https://192.168.1.195:2379 |
| etcd2 | 192.168.1.196 | https://192.168.1.196:2379 |
| etcd3 | 192.168.1.197 | https://192.168.1.197:2379 |
Install etcd
Install on all three master nodes.
etcd download: https://github.com/coreos/etcd/releases
etcd-v3.2.15 (local mirror available):
[root@master1 ~]# wget https://github.com/coreos/etcd/releases/download/v3.2.15/etcd-v3.2.15-linux-amd64.tar.gz
[root@master1 ~]# tar -zxf etcd-v3.2.15-linux-amd64.tar.gz
[root@master1 ~]# cp -a etcd-v3.2.15-linux-amd64/etcd* /usr/local/bin/
Create the TLS key and certificate
To secure communication, both client traffic (e.g. from etcdctl) to the etcd cluster and peer traffic inside the cluster use TLS.
Create the etcd certificate signing request:
[root@master1 ~]# mkdir -p ssl/
[root@master1 ~]# cd ssl/
[root@master1 ssl]# cat etcd-csr.json
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"192.168.1.195",
"192.168.1.196",
"192.168.1.197"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
[root@master1 ssl]#
- The hosts field lists the IPs of the etcd nodes authorized to use this certificate
Generate the etcd certificate and private key:
[root@master1 ssl]# cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem -ca-key=/etc/kubernetes/ssl/ca-key.pem -config=/etc/kubernetes/ssl/ca-config.json -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
[root@master1 ssl]# ls etcd*
etcd.csr etcd-csr.json etcd-key.pem etcd.pem
[root@master1 ssl]# mkdir -p /etc/etcd/ssl
[root@master1 ssl]# cp etcd*.pem /etc/etcd/ssl/
Create the etcd systemd unit file:
[root@master1 ~]# mkdir -p /var/lib/etcd
[root@master1 ~]# cat etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/local/bin/etcd \
--name=etcd1 \
--cert-file=/etc/etcd/ssl/etcd.pem \
--key-file=/etc/etcd/ssl/etcd-key.pem \
--peer-cert-file=/etc/etcd/ssl/etcd.pem \
--peer-key-file=/etc/etcd/ssl/etcd-key.pem \
--trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--initial-advertise-peer-urls=https://192.168.1.195:2380 \
--listen-peer-urls=https://192.168.1.195:2380 \
--listen-client-urls=https://192.168.1.195:2379,http://127.0.0.1:2379 \
--advertise-client-urls=https://192.168.1.195:2379 \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=etcd1=https://192.168.1.195:2380,etcd2=https://192.168.1.196:2380,etcd3=https://192.168.1.197:2380 \
--initial-cluster-state=new \
--data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[root@master1 ~]#
- The etcd working directory and data directory are /var/lib/etcd; create this directory before starting the service.
- --name is the name of this etcd member; it must be unique within the cluster.
- --initial-advertise-peer-urls, --listen-peer-urls, --listen-client-urls and --advertise-client-urls must be changed to the current node's IP address.
- --initial-cluster lists the peer URLs of every member of the etcd cluster.
- For secure communication, specify etcd's own key pair (cert-file and key-file), the peer key pair and CA certificate (peer-cert-file, peer-key-file, peer-trusted-ca-file), and the client CA certificate (trusted-ca-file).
- When --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list.
[root@master1 ~]# mv etcd.service /etc/systemd/system/
Distribute the etcd certificates:
[root@master1 ~]# ssh root@192.168.1.196 "mkdir -p /var/lib/etcd"
[root@master1 ~]# ssh root@192.168.1.196 "mkdir -p /etc/etcd/ssl"
[root@master1 ~]# scp /etc/etcd/ssl/etcd* root@192.168.1.196:/etc/etcd/ssl/
[root@master1 ~]# ssh root@192.168.1.197 "mkdir -p /var/lib/etcd"
[root@master1 ~]# ssh root@192.168.1.197 "mkdir -p /etc/etcd/ssl"
[root@master1 ~]# scp /etc/etcd/ssl/etcd* root@192.168.1.197:/etc/etcd/ssl/
Distribute etcd.service:
[root@master1 ~]# scp /etc/systemd/system/etcd.service root@192.168.1.196:/etc/systemd/system/
[root@master1 ~]# scp /etc/systemd/system/etcd.service root@192.168.1.197:/etc/systemd/system/
Start the etcd service
Start etcd on each node in turn:
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
- The first etcd process to start will block for a while, waiting for the other members to join the cluster; this is normal.
[root@master1 ~]# export ETCDCTL_API=3
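The health-check command itself is truncated above. A sketch of a v3-API check (note that the v3 flag names --cacert/--cert/--key differ from the v2 --ca-file/--cert-file/--key-file used elsewhere in this guide; the wrapper function is ours):

```shell
# Hypothetical wrapper: build the --endpoints list from member IPs and run
# the etcdctl v3 "endpoint health" check using the etcd certs created above.
etcd_health() {
  local eps
  eps=$(printf 'https://%s:2379,' "$@")
  eps=${eps%,}   # drop the trailing comma
  ETCDCTL_API=3 etcdctl --endpoints="$eps" \
    --cacert=/etc/kubernetes/ssl/ca.pem \
    --cert=/etc/etcd/ssl/etcd.pem \
    --key=/etc/etcd/ssl/etcd-key.pem \
    endpoint health
}
# usage: etcd_health 192.168.1.195 192.168.1.196 192.168.1.197
```

Each healthy member should be reported as "is healthy" once the whole cluster is up.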
Deploy the Flannel network
Flannel is an overlay-network tool designed by the CoreOS team for Kubernetes; its purpose is to give every host running Kubernetes a complete subnet. In short, it makes Docker containers created on different cluster nodes receive cluster-wide unique virtual IP addresses. With the default Docker configuration, the Docker daemon on each node allocates container IPs independently, so containers on different nodes can end up with the same IP. Flannel re-plans IP address usage across all nodes so that containers on different nodes get non-overlapping addresses that "belong to the same internal network", and containers on different nodes can then reach each other directly over those internal IPs.
Flannel maintains a routing table between nodes in etcd.
Reference: http://dockone.io/article/618
flannel download: https://github.com/coreos/flannel/releases
flannel-v0.10.0 (local mirror available)
| Node | IP | Assigned subnet |
|---|---|---|
| node1 | 192.168.1.198 | 172.30.57.0 |
| node2 | 192.168.1.199 | 172.30.41.0 |
Run the following on the server where the cfssl tools are installed.
Create the TLS key and certificate
The etcd cluster enforces mutual TLS authentication, so flanneld must be given a CA certificate and key pair for talking to etcd.
Create the CA configuration file
Create the flanneld certificate signing request:
[root@master1 ssl]# cat flanneld-csr.json
{
"CN": "flanneld",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
[root@master1 ssl]#
- The hosts field is empty
Generate the flanneld certificate and private key:
[root@master1 ssl]# cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem -ca-key=/etc/kubernetes/ssl/ca-key.pem -config=/etc/kubernetes/ssl/ca-config.json -profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld
[root@master1 ssl]# ls flanneld*
flanneld.csr flanneld-csr.json flanneld-key.pem flanneld.pem
[root@master1 ssl]# mkdir -p /etc/flanneld/ssl
[root@master1 ssl]# cp -a flanneld*.pem /etc/flanneld/ssl
Distribute the flanneld certificates:
[root@master1 ssl]# ssh root@192.168.1.198 "mkdir -p /etc/flanneld/ssl"
[root@master1 ssl]# scp /etc/flanneld/ssl/flanneld* root@192.168.1.198:/etc/flanneld/ssl/
[root@master1 ssl]# ssh root@192.168.1.199 "mkdir -p /etc/flanneld/ssl"
[root@master1 ssl]# scp /etc/flanneld/ssl/flanneld* root@192.168.1.199:/etc/flanneld/ssl/
Write the cluster Pod network config into etcd
The configuration is written with the etcd v2 API.
export ETCDCTL_API=3 was run earlier and switches etcdctl to the v3 API, so open a new shell window (or unset the variable) first, otherwise the v2 commands below will fail:
[root@master1 ~]# etcdctl --endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/flanneld/ssl/flanneld.pem --key-file=/etc/flanneld/ssl/flanneld-key.pem set /kubernetes/network/config '{"Network":"172.30.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}'
{"Network":"172.30.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}
[root@master1 ~]#
#Check the Pod network config
[root@master1 ~]# etcdctl --endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/flanneld/ssl/flanneld.pem --key-file=/etc/flanneld/ssl/flanneld-key.pem get /kubernetes/network/config
{"Network":"172.30.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}
[root@master1 ~]#
Install and configure flanneld
Install on node1 and node2; do not install flanneld on the masters:
[root@node1 ~]# wget https://github.com/coreos/flannel/releases/download/v0.10.0/flannel-v0.10.0-linux-amd64.tar.gz
[root@node1 ~]# mkdir flannel
[root@node1 ~]# tar -zxf flannel-v0.10.0-linux-amd64.tar.gz -C flannel
[root@node1 ~]# cp -a flannel/{flanneld,mk-docker-opts.sh} /usr/local/bin/
[root@node1 ~]# cat flanneld.service
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service
[Service]
Type=notify
ExecStart=/usr/local/bin/flanneld \
-etcd-cafile=/etc/kubernetes/ssl/ca.pem \
-etcd-certfile=/etc/flanneld/ssl/flanneld.pem \
-etcd-keyfile=/etc/flanneld/ssl/flanneld-key.pem \
-etcd-endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 \
-etcd-prefix=/kubernetes/network
ExecStartPost=/usr/local/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
Restart=on-failure
[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
[root@node1 ~]#
- The mk-docker-opts.sh script writes the Pod subnet allocated to flanneld into /run/flannel/docker; when Docker starts later, the values in this file are used to configure the docker0 bridge;
- flanneld communicates with other nodes over the interface carrying the system default route. On machines with multiple interfaces (e.g. internal and public networks), use the -iface option to pick the interface (the systemd unit file above does not set it); for example, with Vagrant + VirtualBox you would specify -iface=enp0s8;
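For reference, the file generated by mk-docker-opts.sh with -k DOCKER_NETWORK_OPTIONS looks roughly like the fragment below; the exact values are assumptions, based on node1's 172.30.57.1/24 subnet and flannel's 1450-byte vxlan MTU:

```shell
# Assumed contents of /run/flannel/docker; written here to a local example
# file so it can be inspected without a running flanneld.
cat > flannel-docker.example <<'EOF'
DOCKER_NETWORK_OPTIONS=" --bip=172.30.57.1/24 --ip-masq=true --mtu=1450"
EOF
```

docker.service later sources this file through EnvironmentFile= and expands $DOCKER_NETWORK_OPTIONS in its ExecStart line.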
[root@node1 ~]# cp flanneld.service /etc/systemd/system/
Start flanneld:
systemctl daemon-reload
systemctl enable flanneld
systemctl start flanneld
Integrate Docker with the flannel network
Make the docker0 bridge use an address from the flannel subnet.
docker.service (as installed by yum):
[root@node1 ~]# grep -iv '#' /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
#Add the two parameters below
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS
EnvironmentFile=/run/flannel/docker
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
[root@node1 ~]#
[root@node1 ~]# systemctl restart docker
Check the flanneld service:
[root@node1 ~]# ifconfig flannel.1
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.30.57.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::7890:e0ff:fe20:836e prefixlen 64 scopeid 0x20<link>
ether 7a:90:e0:20:83:6e txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
[root@node1 ~]#
- Install flanneld on the other nodes in the same way.
Check the Pod subnet allocated to each flanneld
Check the cluster Pod network (/16):
[root@master1 ~]# etcdctl --endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/flanneld/ssl/flanneld.pem --key-file=/etc/flanneld/ssl/flanneld-key.pem get /kubernetes/network/config
{"Network":"172.30.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}
[root@master1 ~]#
List the allocated Pod subnets (/24):
[root@master1 ~]# etcdctl --endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/flanneld/ssl/flanneld.pem --key-file=/etc/flanneld/ssl/flanneld-key.pem ls /kubernetes/network/subnets
/kubernetes/network/subnets/172.30.57.0-24
Check the flanneld listen IP and network parameters for a given Pod subnet:
[root@master1 ~]# etcdctl --endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/flanneld/ssl/flanneld.pem --key-file=/etc/flanneld/ssl/flanneld-key.pem get /kubernetes/network/subnets/172.30.57.0-24
{"PublicIP":"192.168.1.198","BackendType":"vxlan","BackendData":{"VtepMAC":"7a:90:e0:20:83:6e"}}
[root@master1 ~]#
Verify that Pod subnets are reachable between nodes
After deploying Flannel on every node, list the allocated Pod subnets (/24):
[root@master1 ~]# etcdctl --endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/flanneld/ssl/flanneld.pem --key-file=/etc/flanneld/ssl/flanneld-key.pem ls /kubernetes/network/subnets
/kubernetes/network/subnets/172.30.41.0-24
/kubernetes/network/subnets/172.30.57.0-24
[root@master1 ~]#
The Pod subnets currently allocated to the two nodes are 172.30.57.0-24 (node1) and 172.30.41.0-24 (node2):
[root@node1 ~]# ip a | egrep -i 'docker0|flannel.1'
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
inet 172.30.57.1/24 brd 172.30.57.255 scope global docker0
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
inet 172.30.57.0/32 scope global flannel.1
[root@node1 ~]#
[root@node2 ~]# ip a | egrep -i 'docker0|flannel.1'
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
inet 172.30.41.0/32 scope global flannel.1
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
inet 172.30.41.1/24 brd 172.30.41.255 scope global docker0
[root@node2 ~]#
On each node, ping the gateway address of both subnets to confirm they are reachable:
[root@node1 ~]# ping 172.30.57.0 -c 4
[root@node1 ~]# ping 172.30.41.0 -c 4
Deploy the k8s master node
The Kubernetes master runs the following components:
- kube-apiserver
- kube-scheduler
- kube-controller-manager
These three components must be deployed on the same machine:
- the functions of kube-scheduler, kube-controller-manager and kube-apiserver are tightly coupled;
- only one kube-scheduler and one kube-controller-manager process may be active at a time; when several run, a leader is chosen by election.
kubernetes release tarball (download script): https://github.com/kubernetes/kubernetes/releases
kubernetes CHANGELOG (server/client downloads): https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md
| Service | Address |
|---|---|
| kube-apiserver | 192.168.1.195:6443 |
| kube-controller-manager | 127.0.0.1:10252 |
| kube-scheduler | 127.0.0.1:10251 |
Baidu Netdisk packages directory, password: nwzk
[root@master1 ~]# wget https://dl.k8s.io/v1.9.2/kubernetes-server-linux-amd64.tar.gz -O kubernetes-server-linux-amd64-v1.9.2.tar.gz
[root@master1 ~]# tar -zxf kubernetes-server-linux-amd64-v1.9.2.tar.gz
[root@master1 ~]# cd kubernetes
[root@master1 kubernetes]# tar -zxf kubernetes-src.tar.gz
[root@master1 kubernetes]# cp -r server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubectl,kube-proxy,kubelet} /usr/local/bin/
Create the kubernetes certificate
Create the kubernetes certificate signing request:
[root@master1 ~]# cd ~/ssl/
[root@master1 ssl]# cat kubernetes-csr.json
{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"192.168.1.195",
"172.16.0.1",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
[root@master1 ssl]#
- 192.168.1.195 is the IP of the master being deployed; 172.16.0.1 is the kubernetes service IP (pre-allocated, normally the first IP of SERVICE_CIDR).
- If the hosts field is non-empty, it must list every IP or domain authorized to use the certificate, which is why the master host IP is included above.
- The service IP that kube-apiserver registers under the name kubernetes (the Service Cluster IP) must also be added; it is generally the first IP of the network given by the kube-apiserver --service-cluster-ip-range option, e.g. "172.16.0.1".
Generate the kubernetes certificate and private key:
[root@master1 ~]# cd ssl/
[root@master1 ssl]# cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem -ca-key=/etc/kubernetes/ssl/ca-key.pem -config=/etc/kubernetes/ssl/ca-config.json -profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes
[root@master1 ssl]# ls kubernetes*
kubernetes.csr kubernetes-csr.json kubernetes-key.pem kubernetes.pem
[root@master1 ssl]# mkdir -p /etc/kubernetes/ssl/
[root@master1 ssl]# mv kubernetes*.pem /etc/kubernetes/ssl/
Configure and start kube-apiserver
Create the client token file used by kube-apiserver
When kubelet first starts, it sends a TLS bootstrapping request to kube-apiserver, which checks whether the token in the request matches its token.csv; if it does, kube-apiserver automatically issues a certificate and key for the kubelet.
#Generate the token used for TLS bootstrapping
[root@master1 ~]# head -c 16 /dev/urandom | od -An -t x | tr -d ' '
6240b18d950d086ff9eb596e215d243f
[root@master1 ssl]# cat token.csv
6240b18d950d086ff9eb596e215d243f,kubelet-bootstrap,10001,"system:kubelet-bootstrap"
[root@master1 ssl]# mv token.csv /etc/kubernetes/
[root@master1 ssl]# cat basic-auth.csv
admin,admin@123,1
readonly,readonly,2
[root@master1 ssl]# mv basic-auth.csv /etc/kubernetes/
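The token generation and the token.csv write-out above can be combined into one small script (the variable name BOOTSTRAP_TOKEN is ours):

```shell
# Generate a 32-hex-char bootstrap token and write token.csv in one step;
# the csv format is token,user,uid,"group", matching the file shown above.
BOOTSTRAP_TOKEN=$(head -c 16 /dev/urandom | od -An -t x | tr -d ' ')
cat > token.csv <<EOF
${BOOTSTRAP_TOKEN},kubelet-bootstrap,10001,"system:kubelet-bootstrap"
EOF
```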
Create the kube-apiserver systemd unit file:
[root@master1 ssl]# cat kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
ExecStart=/usr/local/bin/kube-apiserver \
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota \
--advertise-address=192.168.1.195 \
--bind-address=192.168.1.195 \
--insecure-bind-address=192.168.1.195 \
--authorization-mode=RBAC \
--runtime-config=rbac.authorization.k8s.io/v1alpha1 \
--kubelet-https=true \
--token-auth-file=/etc/kubernetes/token.csv \
--service-cluster-ip-range=172.16.0.0/16 \
--service-node-port-range=8400-9000 \
--tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
--tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
--etcd-cafile=/etc/kubernetes/ssl/ca.pem \
--etcd-certfile=/etc/kubernetes/ssl/kubernetes.pem \
--etcd-keyfile=/etc/kubernetes/ssl/kubernetes-key.pem \
--etcd-servers=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 \
--enable-swagger-ui=true \
--allow-privileged=true \
--apiserver-count=3 \
--audit-log-maxage=30 \
--audit-log-maxbackup=3 \
--audit-log-maxsize=100 \
--audit-log-path=/var/lib/audit.log \
--event-ttl=1h \
--v=2
Restart=on-failure
RestartSec=5
Type=notify
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[root@master1 ssl]#
- --bind-address=192.168.1.195 is the IP of the master being deployed; it must not be 127.0.0.1.
- --insecure-bind-address=192.168.1.195 is the IP of the master being deployed.
- --service-cluster-ip-range specifies the Service Cluster IP range; this network must not be routable.
- --service-node-port-range=8400-9000 specifies the NodePort range.
- --etcd-servers=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 lists the etcd cluster endpoints.
- Since version 1.6, kube-apiserver uses the etcd v3 API and storage format.
- --authorization-mode=RBAC enables RBAC authorization on the secure port and rejects unauthorized requests.
- kube-scheduler and kube-controller-manager are generally deployed on the same machine as kube-apiserver and communicate with it over the insecure port.
- kube-proxy and kubectl pass RBAC authorization through the User and Group specified in their certificates.
- When the kubelet TLS bootstrap mechanism is used, do not set the --kubelet-certificate-authority, --kubelet-client-certificate and --kubelet-client-key options; otherwise kube-apiserver later fails to verify kubelet certificates with "x509: certificate signed by unknown authority".
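Once the service is up, a quick sanity check can be made against the insecure port; kube-apiserver serves /healthz and answers "ok" when healthy (the wrapper function is ours):

```shell
# Hypothetical check: query /healthz on the insecure 8080 port of the
# master (192.168.1.195 in this environment) and print the result.
apiserver_healthz() {
  local host=${1:-192.168.1.195}
  curl -s "http://${host}:8080/healthz"
}
# usage: apiserver_healthz    # expect the output "ok"
```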
[root@master1 ssl]# cp -a kube-apiserver.service /etc/systemd/system/
Start kube-apiserver:
[root@master1 ~]# systemctl daemon-reload
[root@master1 ~]# systemctl enable kube-apiserver
[root@master1 ~]# systemctl start kube-apiserver
[root@master1 ~]# netstat -tunlp |grep kube-apiserve
tcp 0 0 192.168.1.195:6443 0.0.0.0:* LISTEN 19206/kube-apiserve
tcp 0 0 192.168.1.195:8080 0.0.0.0:* LISTEN 19206/kube-apiserve
[root@master1 ~]#
#There must be no errors in the status output
[root@master1 ~]# systemctl status kube-apiserver
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-01-29 10:53:14 CST; 15min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 19206 (kube-apiserver)
CGroup: /system.slice/kube-apiserver.service
└─19206 /usr/local/bin/kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --advertise-address=...
Jan 29 11:08:14 master1.example.com kube-apiserver[19206]: I0129 11:08:14.947346 19206 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta...95:42960]
Jan 29 11:08:14 master1.example.com kube-apiserver[19206]: I0129 11:08:14.947470 19206 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta...95:42960]
Jan 29 11:08:14 master1.example.com kube-apiserver[19206]: I0129 11:08:14.948643 19206 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta...95:42960]
Jan 29 11:08:14 master1.example.com kube-apiserver[19206]: I0129 11:08:14.997905 19206 wrap.go:42] GET /api/v1/services: (1.191044ms) 200 [[kube-apiserver/v1....95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.003220 19206 wrap.go:42] GET /api/v1/services: (1.016741ms) 200 [[kube-apiserver/v1....95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.054109 19206 wrap.go:42] GET /api/v1/namespaces/kube-system: (1.388292ms) 200 [[kube...95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.055377 19206 wrap.go:42] GET /api/v1/namespaces/kube-public: (1.053295ms) 200 [[kube...95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.405477 19206 wrap.go:42] GET /api/v1/namespaces/default: (1.48537ms) 200 [[kube-apis...95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.407091 19206 wrap.go:42] GET /api/v1/namespaces/default/services/kubernetes: (1.0559...95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.408295 19206 wrap.go:42] GET /api/v1/namespaces/default/endpoints/kubernetes: (971.4….195:42960]
Hint: Some lines were ellipsized, use -l to show in full.
[root@master1 ~]#
Configure and start kube-controller-manager
Create the kube-controller-manager systemd unit file:
[root@master1 ssl]# cat kube-controller-manager.service
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
--address=127.0.0.1 \
--master=http://192.168.1.195:8080 \
--allocate-node-cidrs=true \
--service-cluster-ip-range=172.16.0.0/16 \
--cluster-cidr=172.30.0.0/16 \
--cluster-name=kubernetes \
--cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
--cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
--service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \
--root-ca-file=/etc/kubernetes/ssl/ca.pem \
--leader-elect=true \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
[root@master1 ssl]#
- --address must be 127.0.0.1, because the current kube-apiserver expects the scheduler and controller-manager to be on the same machine.
- --master=http://192.168.1.195:8080: communicate with kube-apiserver over the insecure 8080 port.
- --cluster-cidr specifies the CIDR range of cluster Pods; this network must be routable between nodes (flanneld guarantees this).
- --service-cluster-ip-range specifies the CIDR range of cluster Services; this network must not be routable between nodes, and the value must match the kube-apiserver parameter.
- --cluster-signing-* name the certificate and private key used to sign the certificates created for TLS bootstrapping.
- --root-ca-file is used to verify the kube-apiserver certificate; only when it is set is this CA certificate placed into the ServiceAccount of Pod containers.
- --leader-elect=true: when deploying a multi-machine master cluster, elect a single active kube-controller-manager process.
[root@master1 ssl]# cp kube-controller-manager.service /etc/systemd/system/
Start kube-controller-manager:
[root@master1 ~]# systemctl daemon-reload
[root@master1 ~]# systemctl enable kube-controller-manager
[root@master1 ~]# systemctl start kube-controller-manager
[root@master1 ~]# netstat -tunlp | grep -i kube-controll
tcp 0 0 127.0.0.1:10252 0.0.0.0:* LISTEN 19300/kube-controll
[root@master1 ~]#
[root@master1 ~]# systemctl status kube-controller-manager
● kube-controller-manager.service - Kubernetes Controller Manager
Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2018-02-01 14:57:01 CST; 41s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 7798 (kube-controller)
CGroup: /system.slice/kube-controller-manager.service
└─7798 /usr/local/bin/kube-controller-manager --address=127.0.0.1 --master=http://192.168.1.195:8080 --allocate-node-cidrs=true --service-cluster-ip-range=172...
Jan 29 11:10:14 master1.example.com kube-controller-manager[7798]: I0201 14:57:12.832889 7798 controller_utils.go:1019] Waiting for caches to sync for cidrall...ntroller
Jan 29 11:10:14 master1.example.com kube-controller-manager[7798]: I0201 14:57:12.832920 7798 taint_controller.go:181] Starting NoExecuteTaintManager
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:12.932953 7798 controller_utils.go:1026] Caches are synced for cidrallocator controller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:13.549661 7798 resource_quota_controller.go:434] syncing resource quota control... {apps v
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:13.549766 7798 controller_utils.go:1019] Waiting for caches to sync for resourc...ntroller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:13.649852 7798 controller_utils.go:1026] Caches are synced for resource quota controller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:13.696714 7798 garbagecollector.go:182] syncing garbage collector with updated ...onregist
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:15.046578 7798 controller_utils.go:1019] Waiting for caches to sync for garbage...ntroller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:15.146718 7798 controller_utils.go:1026] Caches are synced for garbage collecto...ntroller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:15.146731 7798 garbagecollector.go:219] synced garbage collector
Hint: Some lines were ellipsized, use -l to show in full.
[root@master1 ~]#
Configure and start kube-scheduler
Create the kube-scheduler systemd unit file:
[root@master1 ssl]# cat kube-scheduler.service
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-scheduler \
--address=127.0.0.1 \
--master=http://192.168.1.195:8080 \
--leader-elect=true \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
[root@master1 ssl]#
- --address must be 127.0.0.1, because the current kube-apiserver expects the scheduler and controller-manager to be on the same machine.
- --master=http://192.168.1.195:8080: communicate with kube-apiserver over the insecure 8080 port.
- --leader-elect=true: when deploying a multi-machine master cluster, elect a single active kube-scheduler process.
[root@master1 ssl]# cp -a kube-scheduler.service /etc/systemd/system/
Start kube-scheduler:
[root@master1 ~]# systemctl daemon-reload
[root@master1 ~]# systemctl enable kube-scheduler
[root@master1 ~]# systemctl start kube-scheduler
[root@master1 ~]# systemctl status kube-scheduler
● kube-scheduler.service - Kubernetes Scheduler
Loaded: loaded (/etc/systemd/system/kube-scheduler.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-01-29 11:27:07 CST; 2min 34s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 19360 (kube-scheduler)
CGroup: /system.slice/kube-scheduler.service
└─19360 /usr/local/bin/kube-scheduler --address=127.0.0.1 --master=http://192.168.1.195:8080 --leader-elect=true --v=2
Jan 29 11:27:07 master1.example.com systemd[1]: Starting Kubernetes Scheduler...
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: W0129 11:27:07.324795 19360 server.go:159] WARNING: all flags than --config are deprecated. Please ...ile ASAP.
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: I0129 11:27:07.325298 19360 server.go:551] Version: v1.9.2
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: I0129 11:27:07.325412 19360 factory.go:837] Creating scheduler from algorithm provider 'DefaultProvider'
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: I0129 11:27:07.325419 19360 factory.go:898] Creating scheduler with fit predicates 'map[CheckNodeDi...{} NoDisk
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: I0129 11:27:07.325564 19360 server.go:570] starting healthz server on 127.0.0.1:10251
Jan 29 11:27:08 master1.example.com kube-scheduler[19360]: I0129 11:27:08.126353 19360 controller_utils.go:1019] Waiting for caches to sync for scheduler controller
Jan 29 11:27:08 master1.example.com kube-scheduler[19360]: I0129 11:27:08.226450 19360 controller_utils.go:1026] Caches are synced for scheduler controller
Jan 29 11:27:08 master1.example.com kube-scheduler[19360]: I0129 11:27:08.226468 19360 leaderelection.go:174] attempting to acquire leader lease...
Jan 29 11:27:08 master1.example.com kube-scheduler[19360]: I0129 11:27:08.232415 19360 leaderelection.go:184] successfully acquired lease kube-system/kube-scheduler
Hint: Some lines were ellipsized, use -l to show in full.
[root@master1 ~]#
[root@master1 ~]# netstat -tunlp | grep -i kube-schedule
tcp 0 0 127.0.0.1:10251 0.0.0.0:* LISTEN 19360/kube-schedule
[root@master1 ~]#
Deploy kubectl on client nodes
Kubernetes release tarballs (download scripts): https://github.com/kubernetes/kubernetes/releases
Kubernetes CHANGELOG (server/client binaries): https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md
The packages are also mirrored in the Baidu Netdisk packages directory (password: nwzk)
[root@master1 ~]# wget https://dl.k8s.io/v1.9.2/kubernetes-client-linux-amd64.tar.gz -O kubernetes-client-linux-amd64-v1.9.2.tar.gz
[root@master1 ~]# tar -zxf kubernetes-client-linux-amd64-v1.9.2.tar.gz
[root@master1 ~]# cp -a kubernetes/client/bin/kube* /usr/local/bin/
Create the admin certificate
kubectl talks to kube-apiserver over the secure port, which requires a TLS certificate and key for the client.
Create the admin certificate signing request:
[root@master1 ssl]# cat admin-csr.json
{
"CN": "admin",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "system:masters",
"OU": "System"
}
]
}
kube-apiserver uses RBAC to authorize client requests (from kubelet, kube-proxy, Pods, etc.). It predefines several RoleBindings; for example, cluster-admin binds Group system:masters to Role cluster-admin, which grants access to all kube-apiserver APIs.
- O sets the certificate's Group to system:masters. When kubelet presents this certificate to kube-apiserver, authentication succeeds because the certificate is CA-signed, and because its group is the pre-authorized system:masters it is granted access to all APIs.
- The hosts attribute is an empty list.
Generate the admin certificate and private key:
[root@master1 ssl]# cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem -ca-key=/etc/kubernetes/ssl/ca-key.pem -config=/etc/kubernetes/ssl/ca-config.json -profile=kubernetes admin-csr.json | cfssljson -bare admin
[root@master1 ssl]# ls admin*
admin.csr admin-csr.json admin-key.pem admin.pem
[root@master1 ssl]# mv admin*.pem /etc/kubernetes/ssl/
Create the kubectl kubeconfig file:
# Set cluster parameters
[root@master1 ~]# kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://192.168.1.195:6443
# Set client credentials
[root@master1 ~]# kubectl config set-credentials admin --client-certificate=/etc/kubernetes/ssl/admin.pem --embed-certs=true --client-key=/etc/kubernetes/ssl/admin-key.pem
# Set the context
[root@master1 ~]# kubectl config set-context kubernetes --cluster=kubernetes --user=admin
# Set the default context
[root@master1 ~]# kubectl config use-context kubernetes
[root@master1 ~]# ls ~/.kube/
config
[root@master1 ~]#
- With --embed-certs=true for both the cluster and credential parameters, the contents of the files referenced by certificate-authority, client-certificate, and client-key are embedded into the generated kubeconfig file.
- The admin.pem certificate's O field is system:masters; kube-apiserver's predefined RoleBinding cluster-admin binds Group system:masters to Role cluster-admin, which grants access to all kube-apiserver APIs.
- The generated kubeconfig is saved to the ~/.kube/config file.
Distribute the kubeconfig file
Copy ~/.kube/config into the ~/.kube/ directory on every machine that runs kubectl.
Deploy the kubectl client on other servers
First extract kubernetes-client-linux-amd64-v1.9.2.tar.gz on the target machines, as above.
2
3
4[root@master1 ~]# ssh root@192.168.1.196 "mkdir -p /etc/kubernetes/ssl/"
[root@master1 ~]# scp /etc/kubernetes/ssl/admin* root@192.168.1.196:/etc/kubernetes/ssl/
[root@master1 ~]# ssh root@192.168.1.196 "mkdir -p ~/.kube/"
[root@master1 ~]# scp ~/.kube/config root@192.168.1.196:~/.kube/
[root@master2 ~]# kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
etcd-2 Healthy {"health": "true"}
[root@master2 ~]#
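With several client machines, repeating the mkdir/scp pair gets tedious; the steps above can be sketched as a loop. The node list here is an assumption for this environment, and the commands are echoed as a dry run — drop the `echo` prefixes to actually copy.

```shell
# Hypothetical node list for this environment -- adjust to your hosts.
NODES="192.168.1.196 192.168.1.197 192.168.1.198 192.168.1.199"

for node in $NODES; do
  # Dry run: print the commands instead of running them.
  # Remove `echo` to actually create the directories and copy the kubeconfig.
  echo ssh "root@${node}" "mkdir -p ~/.kube/ /etc/kubernetes/ssl/"
  echo scp ~/.kube/config "root@${node}:~/.kube/"
done
```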
If you hit the error below, check whether kubectl can reach the kube-apiserver:
[root@node1 ~]# kubectl get componentstatuses
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Check cluster status
View the status of the k8s components
The kubectl command (kubernetes-client) must be installed first.
[root@master1 ~]# kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
etcd-2 Healthy {"health": "true"}
[root@master1 ~]#
Or:
[root@master1 ~]# kubectl -s http://192.168.1.195:8080 get componentstatuses
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-2 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
etcd-0 Healthy {"health": "true"}
[root@master1 ~]#
View the k8s svc address (the service virtual IP):
[root@master1 ~]# kubectl get svc kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.16.0.1 <none> 443/TCP 1d
[root@master1 ~]#
Or:
[root@master1 ~]# kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.16.0.1 <none> 443/TCP 1d
[root@master1 ~]#
View cluster info:
[root@master1 ~]# kubectl cluster-info
Kubernetes master is running at https://192.168.1.195:6443
[root@master1 ~]#
Deploy the K8S Node nodes
A kubernetes Node runs the following components:
- flanneld
- docker
- kubelet
- kube-proxy
Install and configure kubelet
At startup, kubelet sends a TLS bootstrapping request to kube-apiserver. The kubelet-bootstrap user from the bootstrap token file must first be bound to the system:node-bootstrapper role so that kubelet is allowed to create certificate signing requests (certificatesigningrequests).
kubelet and kube-proxy are deployed on node1/node2.
The packages are also mirrored in the Baidu Netdisk packages directory (password: nwzk)
[root@node1 ~]# wget https://dl.k8s.io/v1.9.2/kubernetes-server-linux-amd64.tar.gz -O kubernetes-server-linux-amd64-v1.9.2.tar.gz
[root@node1 ~]# tar -zxf kubernetes-server-linux-amd64-v1.9.2.tar.gz
[root@node1 ~]# cp -a kubernetes/server/bin/{kube-proxy,kubelet} /usr/local/bin/
Copy the admin certificates and keys generated on the server that has the cfssl tools (here, master1) to node1:
[root@master1 ~]# ssh root@192.168.1.198 "mkdir -p /etc/kubernetes/ssl/"
[root@master1 ~]# scp /etc/kubernetes/ssl/admin* root@192.168.1.198:/etc/kubernetes/ssl/
[root@master1 ~]# ssh root@192.168.1.198 "mkdir -p ~/.kube/"
[root@master1 ~]# scp ~/.kube/config root@192.168.1.198:~/.kube/
#If the kubectl command is missing, install kubernetes-client-linux-amd64-v1.9.2.tar.gz first
#With multiple nodes, run this command only once
[root@node1 ~]# kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
--user=kubelet-bootstrap is the user name specified in /etc/kubernetes/token.csv on the master, and it is also written into /etc/kubernetes/bootstrap.kubeconfig
If you see the error below, install the kubectl tool first (see the steps above):
[root@node2 ~]# kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[root@node2 ~]#
Create the kubelet bootstrapping kubeconfig file:
# Set cluster parameters
[root@node1 ~]# kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://192.168.1.195:6443 --kubeconfig=bootstrap.kubeconfig
# Set client credentials
[root@node1 ~]# kubectl config set-credentials kubelet-bootstrap --token=6240b18d950d086ff9eb596e215d243f --kubeconfig=bootstrap.kubeconfig
# Set the context
[root@node1 ~]# kubectl config set-context default --cluster=kubernetes --user=kubelet-bootstrap --kubeconfig=bootstrap.kubeconfig
# Set the default context
[root@node1 ~]# kubectl config use-context default --kubeconfig=bootstrap.kubeconfig
[root@node1 ~]# ls bootstrap.kubeconfig
bootstrap.kubeconfig
[root@node1 ~]# mv bootstrap.kubeconfig /etc/kubernetes/
[root@node1 ~]# mkdir -p /var/lib/kubelet
[root@node1 ~]# cat kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet \
--address=192.168.1.198 \
--hostname-override=192.168.1.198 \
--pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest \
--experimental-bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
--require-kubeconfig \
--cert-dir=/etc/kubernetes/ssl \
--cluster-dns=223.5.5.5,223.6.6.6,8.8.8.8 \
--cluster-domain=aliyun.com. \
--hairpin-mode promiscuous-bridge \
--allow-privileged=true \
--serialize-image-pulls=false \
--logtostderr=true \
--v=2
#kubelet cAdvisor listens on port 4194 on all interfaces by default; the iptables rules below restrict access to internal networks
ExecStartPost=/sbin/iptables -A INPUT -s 172.30.0.0/16 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -s 172.16.0.0/16 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -s 192.168.0.0/16 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -p tcp --dport 4194 -j DROP
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
[root@node1 ~]#
- --address must not be set to 127.0.0.1, otherwise Pods calling the kubelet API will fail, because 127.0.0.1 inside a Pod points to the Pod itself rather than the kubelet. Set --address to the IP of the node being deployed.
- If --hostname-override is set, kube-proxy must set the same option, otherwise the Node will not be found. Set --hostname-override to the IP of the node being deployed.
- --cluster-dns is the cluster DNS service IP (pre-allocated from SERVICE_CIDR). This environment uses SERVICE_CIDR 172.16.0.0/16, which can be seen in the master's kube-apiserver configuration.
- --cluster-domain is the cluster DNS domain.
- --experimental-bootstrap-kubeconfig points at the bootstrap kubeconfig file; kubelet uses the user name and token in it to send TLS Bootstrapping requests to kube-apiserver.
- After the administrator approves the CSR, kubelet automatically creates the certificate and key (kubelet-client.crt and kubelet-client.key) in the --cert-dir directory, then writes the file specified by --kubeconfig (the file is created automatically).
- It is recommended to put the kube-apiserver address in the --kubeconfig file. If --api-servers is not given, --require-kubeconfig must be set so the kube-apiserver address is read from the kubeconfig; otherwise kubelet will not find the kube-apiserver after starting (the log reports no API Server found) and kubectl get nodes will not return the Node.
- --cluster-dns specifies the kubedns Service IP (it can be allocated first and assigned when the kubedns service is created later); --cluster-domain specifies the domain suffix. Both parameters must be set together to take effect.
- kubelet cAdvisor listens on port 4194 on all interfaces by default, which is unsafe on machines with a public network; the iptables rules in ExecStartPost only allow internal machines to reach port 4194.
- --cluster-dns: in testing, if no DNS is deployed inside k8s, use an external public DNS address; if the configured DNS does not exist, domain resolution for your workloads will break.
- --cluster-domain: in testing, this DNS suffix had to be a real domain suffix, because k8s writes it into each Pod's /etc/resolv.conf. With the default suffix (cluster.local.) and no matching DNS, lookups time out and seriously affect workloads.
[root@node1 ~]# mv kubelet.service /etc/systemd/system/kubelet.service
Start kubelet
Since v1.8, the system:nodes group must be bound to the system:node clusterrole manually:
#kubelet-node-clusterbinding is an arbitrary name; when several nodes join, each binding needs a distinct name
[root@master1 ~]# kubectl create clusterrolebinding kubelet-node-clusterbinding --clusterrole=system:node --user=system:node:192.168.1.198
#Starting kubelet requires the swap partition to be disabled
[root@node1 ~]# swapoff -a
[root@node1 ~]# systemctl daemon-reload
[root@node1 ~]# systemctl enable kubelet
[root@node1 ~]# systemctl start kubelet
The swap partition must be disabled, otherwise the error below is reported:
Feb 1 16:56:59 k8s-4 kubelet: error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename#011#011#011#011Type#011#011Size#011Used#011Priority /dev/dm-1 partition#0112097148#01116#011-1]
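A one-off `swapoff -a` does not survive a reboot; the swap entry in /etc/fstab also has to be commented out. A minimal sketch, demonstrated on a sample fstab so it is safe to run anywhere — on a real node apply the same sed to /etc/fstab after `swapoff -a`:

```shell
# Sample fstab standing in for /etc/fstab (assumption: CentOS LVM layout).
cat > /tmp/fstab.sample <<'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF

# Comment out every active swap line so swap stays off after reboot.
sed -i 's/^\([^#].*[[:space:]]swap[[:space:]].*\)$/#\1/' /tmp/fstab.sample
cat /tmp/fstab.sample
```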
[root@node1 ~]# systemctl status kubelet
Q&A: the master log reports an RBAC DENY: user "kubelet-bootstrap" groups error
After starting the kubelet service, check the master's kube-apiserver log (/var/log/messages); startup only counts as successful if the error below is absent:
Feb 1 03:41:53 master1 kube-apiserver: I0201 16:41:53.245112 8202 rbac.go:116] RBAC DENY: user "kubelet-bootstrap" groups ["system:kubelet-bootstrap" "system:authenticated"] cannot "create" resource "certificatesigningrequests.certificates.k8s.io/nodeclient" cluster-wide
If the master's /var/log/messages shows the error above while a node starts, the cause is:
Before v1.8, with RBAC enabled, the apiserver bound the system:nodes group to the system:node clusterrole by default. Since v1.8 this binding no longer exists by default and must be created manually; otherwise kubelet reports authentication errors after starting, and kubectl get nodes shows the node never becoming Ready.
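Instead of creating one binding per node user (system:node:192.168.1.198, system:node:192.168.1.199, ...), the whole system:nodes group can be bound once. This is a hedged alternative, not what this walkthrough does; the command is echoed as a dry run — drop the `echo` to apply it on the master:

```shell
# Binding the group system:nodes covers every node that bootstraps later,
# so no per-node clusterrolebinding is needed (dry run: remove `echo`).
echo kubectl create clusterrolebinding kubelet-nodes-group \
  --clusterrole=system:node --group=system:nodes
```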
Default roles and default role bindings
The API Server creates a set of default ClusterRole and ClusterRoleBinding objects. Many of them carry the system: prefix, indicating that these resources are "owned" by core Kubernetes components. Modifying these resources can leave you with a non-functional cluster.
The system:node role defines the kubelets' permissions; modifying it may break the kubelets.
All default ClusterRole and ClusterRoleBinding objects are labeled kubernetes.io/bootstrapping=rbac-defaults.
kubectl get clusterrolebinding and kubectl get clusterrole list the roles and role bindings in the system; kubectl get clusterrolebindings system:node -o yaml or kubectl describe clusterrolebindings system:node shows the details of the system:node role binding.
View the details of the system:node role binding
The system:node role binding has no subjects by default:
[root@master1 ~]# kubectl describe clusterrolebindings system:node
Name: system:node
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate=true
Role:
Kind: ClusterRole
Name: system:node
Subjects:
Kind Name Namespace
---- ---- ---------
[root@master1 ~]#
Grant the system:node ClusterRole cluster-wide to the user system:node:192.168.1.198 or the group system:nodes:
[root@master1 ~]# kubectl describe clusterrolebindings kubelet-node-clusterbinding
Name: kubelet-node-clusterbinding
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: system:node
Subjects:
Kind Name Namespace
---- ---- ---------
User system:node:192.168.1.198
[root@master1 ~]#
Approve the kubelet TLS certificate request
On first start, kubelet sends a certificate signing request to kube-apiserver; the Node only joins the cluster once the request is approved.
View pending CSR requests:
If nothing is returned here, kubelet did not start successfully.
[root@master1 ~]# kubectl get csr
NAME AGE REQUESTOR CONDITION
node-csr-DeFIxWS7IZimyAUaZGtIh8q4sp_CiNHL2bO1cwEm26U 9m kubelet-bootstrap Pending
[root@master1 ~]#
[root@master1 ~]# kubectl get nodes
No resources found.
[root@master1 ~]#
Approve the CSR request:
[root@master1 ~]# kubectl certificate approve node-csr-DeFIxWS7IZimyAUaZGtIh8q4sp_CiNHL2bO1cwEm26U
certificatesigningrequest "node-csr-DeFIxWS7IZimyAUaZGtIh8q4sp_CiNHL2bO1cwEm26U" approved
[root@master1 ~]#
[root@master1 ~]# kubectl get csr
NAME AGE REQUESTOR CONDITION
node-csr-DeFIxWS7IZimyAUaZGtIh8q4sp_CiNHL2bO1cwEm26U 13m kubelet-bootstrap Approved,Issued
[root@master1 ~]#
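With many nodes joining, approving CSRs one by one is tedious. A sketch that approves every Pending request in one pass; it is demonstrated here against canned `kubectl get csr` output (hypothetical CSR names) and echoes the approve commands — on the master, feed it the real command output and drop the `echo`:

```shell
# Canned `kubectl get csr` output (hypothetical CSR names).
csr_output='NAME           AGE  REQUESTOR          CONDITION
node-csr-aaa   9m   kubelet-bootstrap  Pending
node-csr-bbb   13m  kubelet-bootstrap  Approved,Issued'

# Pick the names whose last column is Pending and approve each one.
echo "$csr_output" | awk '$NF == "Pending" {print $1}' |
while read -r csr; do
  echo kubectl certificate approve "$csr"   # drop `echo` on a real cluster
done
```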
The kubelet kubeconfig file and key pair are generated automatically:
[root@node1 ~]# ls -l /etc/kubernetes/kubelet.kubeconfig
-rw------- 1 root root 2280 Jan 31 19:19 /etc/kubernetes/kubelet.kubeconfig
[root@node1 ~]# ls -l /etc/kubernetes/ssl/kubelet*
-rw-r--r-- 1 root root 1046 Jan 31 19:19 /etc/kubernetes/ssl/kubelet-client.crt
-rw------- 1 root root 227 Jan 31 19:18 /etc/kubernetes/ssl/kubelet-client.key
-rw-r--r-- 1 root root 1115 Jan 31 19:15 /etc/kubernetes/ssl/kubelet.crt
-rw------- 1 root root 1675 Jan 31 19:15 /etc/kubernetes/ssl/kubelet.key
[root@node1 ~]#
The node has joined successfully only once its status becomes Ready and the logs are clean:
[root@master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.1.198 Ready <none> 25m v1.9.2
[root@master1 ~]#
Reference: http://blog.csdn.net/zhaihaifei/article/details/79098564
Joining additional Node nodes
[root@master1 ~]# kubectl describe clusterrolebindings kubelet-node199-clusterbinding
[root@master1 ~]# kubectl get nodes
Configure kube-proxy
Create the kube-proxy certificate signing request:
[root@master1 ssl]# cat kube-proxy-csr.json
{
"CN": "system:kube-proxy",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
[root@master1 ssl]#
- CN sets the certificate's User to system:kube-proxy;
- kube-apiserver's predefined RoleBinding system:node-proxier binds User system:kube-proxy to Role system:node-proxier, which grants access to kube-apiserver's Proxy-related APIs;
- the hosts attribute is an empty list.
Generate the kube-proxy client certificate and private key:
[root@master1 ssl]# cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem -ca-key=/etc/kubernetes/ssl/ca-key.pem -config=/etc/kubernetes/ssl/ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy
[root@master1 ssl]# ls kube-proxy*
kube-proxy.csr kube-proxy-csr.json kube-proxy-key.pem kube-proxy.pem
[root@master1 ssl]#
Copy the kube-proxy certificates and keys generated on the server that has the cfssl tools (here, master1) to node1:
[root@master1 ssl]# scp kube-proxy*.pem root@192.168.1.198:/etc/kubernetes/ssl/
Create the kube-proxy kubeconfig file:
# Set cluster parameters
[root@node1 ~]# kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://192.168.1.195:6443 --kubeconfig=kube-proxy.kubeconfig
# Set client credentials
[root@node1 ~]# kubectl config set-credentials kube-proxy --client-certificate=/etc/kubernetes/ssl/kube-proxy.pem --client-key=/etc/kubernetes/ssl/kube-proxy-key.pem --embed-certs=true --kubeconfig=kube-proxy.kubeconfig
# Set the context
[root@node1 ~]# kubectl config set-context default --cluster=kubernetes --user=kube-proxy --kubeconfig=kube-proxy.kubeconfig
# Set the default context
[root@node1 ~]# kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
[root@node1 ~]# mv kube-proxy.kubeconfig /etc/kubernetes/
- With --embed-certs=true for both the cluster and credential parameters, the contents of the files referenced by certificate-authority, client-certificate, and client-key are embedded into the generated kube-proxy.kubeconfig file.
- The kube-proxy.pem certificate's CN is system:kube-proxy; kube-apiserver's predefined RoleBinding system:node-proxier binds User system:kube-proxy to Role system:node-proxier, which grants access to kube-apiserver's Proxy-related APIs.
Create the kube-proxy systemd unit file
Create the working directory:
[root@node1 ~]# mkdir -p /var/lib/kube-proxy
[root@node1 ~]# cat kube-proxy.service
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/usr/local/bin/kube-proxy \
--bind-address=192.168.1.198 \
--hostname-override=192.168.1.198 \
--cluster-cidr=172.30.0.0/16 \
--kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig \
--logtostderr=true \
--v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[root@node1 ~]#
- The --hostname-override value must match kubelet's, otherwise kube-proxy cannot find its Node after starting and creates no iptables rules.
- --cluster-cidr must match kube-controller-manager's --cluster-cidr option, 172.30.0.0/16 here.
- kube-proxy uses --cluster-cidr to distinguish cluster-internal from external traffic; it only SNATs requests to Service IPs when --cluster-cidr or --masquerade-all is specified.
- The config file given by --kubeconfig embeds the kube-apiserver address, user name, certificate, and key used for requests and authentication.
- The predefined RoleBinding system:node-proxier binds User system:kube-proxy to Role system:node-proxier, which grants access to kube-apiserver's Proxy-related APIs.
Start kube-proxy:
[root@node1 ~]# cp kube-proxy.service /etc/systemd/system/
[root@node1 ~]# systemctl daemon-reload
[root@node1 ~]# systemctl enable kube-proxy
[root@node1 ~]# systemctl start kube-proxy
[root@node1 ~]# systemctl status kube-proxy
● kube-proxy.service - Kubernetes Kube-Proxy Server
Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-02-02 14:24:10 CST; 15s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 3660 (kube-proxy)
Memory: 8.7M
CGroup: /system.slice/kube-proxy.service
‣ 3660 /usr/local/bin/kube-proxy --bind-address=192.168.1.198 --hostname-override=192.168.1.198 --cluster-cidr=172.30.0.0/16 --kubeconfig=/etc/kubernetes/kube...
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.223982 3660 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.224981 3660 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.225415 3660 config.go:202] Starting service config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.225427 3660 controller_utils.go:1019] Waiting for caches to sync for service config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.225514 3660 config.go:102] Starting endpoints config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.225523 3660 controller_utils.go:1019] Waiting for caches to sync for endpoints config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.325575 3660 controller_utils.go:1026] Caches are synced for service config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.325624 3660 proxier.go:984] Not syncing iptables until Services and Endpoints have been re...om master
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.326087 3660 controller_utils.go:1026] Caches are synced for endpoints config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.326163 3660 proxier.go:329] Adding new service port "default/kubernetes:https" at 172.16.0.1:443/TCP
Hint: Some lines were ellipsized, use -l to show in full.
[root@node1 ~]#
Verify cluster functionality
Definition file:
nginx-ds.yml
[root@master1 ~]# cat nginx-ds.yml
apiVersion: v1
kind: Service
metadata:
name: nginx-ds
labels:
app: nginx-ds
spec:
type: NodePort
selector:
app: nginx-ds
ports:
- name: http
port: 80
targetPort: 80
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: nginx-ds
labels:
addonmanager.kubernetes.io/mode: Reconcile
spec:
template:
metadata:
labels:
app: nginx-ds
spec:
containers:
- name: my-nginx
image: nginx:1.7.9
ports:
- containerPort: 80
[root@master1 ~]#
[root@master1 ~]# kubectl create -f nginx-ds.yml
Check Pod IP connectivity on each Node
The cluster currently has two Nodes, and nginx-ds.yml uses a DaemonSet, so this Pod is started on every Node.
#Official description of a DaemonSet:
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Reference: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#what-is-a-daemonset
#The two nginx containers have different IP addresses
[root@master1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-ds-9p9pl 1/1 Running 0 4s 172.30.57.2 192.168.1.198
nginx-ds-tvp6b 1/1 Running 0 4s 172.30.41.2 192.168.1.199
[root@master1 ~]#
Check service IP and port reachability:
[root@master1 ~]# kubectl get svc |grep nginx-ds
nginx-ds NodePort 172.16.117.40 <none> 80:8446/TCP 5m
[root@master1 ~]#
- Service IP: 172.16.117.40
- Service port: 80
- NodePort: 8446
Run on every Node, using the CLUSTER-IP of nginx-ds (obtained via kubectl get svc |grep nginx-ds):
[root@node1 ~]# curl 172.16.117.40
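To check the service IP from every Node without logging in one at a time, the curl can be wrapped in an ssh loop. A dry-run sketch (node list and service IP taken from this environment); remove the `echo` to actually run it:

```shell
SVC_IP=172.16.117.40            # CLUSTER-IP of nginx-ds from `kubectl get svc`
NODES="192.168.1.198 192.168.1.199"

for node in $NODES; do
  # Dry run: prints the ssh commands; drop `echo` to execute them.
  echo ssh "root@${node}" "curl -s -o /dev/null -w '%{http_code}\n' http://${SVC_IP}/"
done
```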
Check NodePort reachability of the service
From an external machine, open http://192.168.1.198:8446/ in a browser.
The expected output is the nginx welcome page.
Deploy the kubedns add-on
Official file directory: kubernetes/cluster/addons/dns
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods-dns-config
Predefined system RoleBinding
The predefined RoleBinding system:kube-dns binds the kube-dns ServiceAccount in the kube-system namespace to the system:kube-dns Role, which has access to kube-apiserver's DNS-related APIs:
[root@master1 ~]# kubectl get clusterrolebindings system:kube-dns -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
creationTimestamp: 2018-01-31T11:03:07Z
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:kube-dns
resourceVersion: "86"
selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system%3Akube-dns
uid: 51871810-0676-11e8-8cb0-1e2d0a5bc3f5
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:kube-dns
subjects:
- kind: ServiceAccount
name: kube-dns
namespace: kube-system
[root@master1 ~]#
The Pods defined in kubedns-controller.yaml use the kube-dns ServiceAccount defined in kubedns-sa.yaml, so they have access to kube-apiserver's DNS-related APIs.
Configure the kube-dns service
busybox.yaml
kube-dns.yaml
[root@master1 dns]# pwd
/root/dns
[root@master1 dns]# ls
busybox.yaml kube-dns.yaml
[root@master1 dns]# egrep -i 'clusterIP|image|domain|server=/cluster' kube-dns.yaml
clusterIP: 172.16.0.2
image: registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-kube-dns-amd64:1.14.7
- --domain=cluster.local.
image: registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7
- --server=/cluster.local/127.0.0.1#10053
image: registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-sidecar-amd64:1.14.7
[root@master1 dns]#
- clusterIP must be set to the CLUSTER_DNS_SVC_IP value from the cluster environment variables; this IP must match kubelet's --cluster-dns parameter.
Configure the kube-dns Deployment
- --domain is the value of the CLUSTER_DNS_DOMAIN cluster environment variable.
- It uses the kube-dns ServiceAccount that already has a system RoleBinding, which grants access to kube-apiserver's DNS-related APIs.
Create the DNS service: kubectl create -f kube-dns.yaml creates it, kubectl delete -f kube-dns.yaml removes it.
[root@master1 ~]# kubectl create -f dns/busybox.yaml
[root@master1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
busybox 1/1 Running 0 19s 172.30.57.2 192.168.1.198
[root@master1 ~]#
[root@master1 ~]# kubectl create -f dns/kube-dns.yaml
[root@master1 ~]#
[root@master1 ~]# kubectl get pods -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE
kube-dns-9d8b5fb76-vz6ll 3/3 Running 0 1h 172.30.41.5 192.168.1.199
[root@master1 ~]#
[root@master1 ~]# kubectl get svc -o wide -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-dns ClusterIP 172.16.0.2 <none> 53/UDP,53/TCP 1h k8s-app=kube-dns
[root@master1 ~]#
If something goes wrong, view the error details with:
[root@master1 ~]# kubectl describe pod kube-dns-9d8b5fb76-vz6ll --namespace=kube-system
Create an nginx service to test DNS
nginx-deployment.yaml
nginx-service.yaml
[root@master1 ~]# kubectl create -f nginx-deployment.yaml
[root@master1 ~]# kubectl create -f nginx-service.yaml
[root@master1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-deployment-d8d99448f-rb57v 1/1 Running 0 11m 172.30.41.2 192.168.1.199
[root@master1 ~]#
[root@master1 ~]# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 172.16.0.1 <none> 443/TCP 7d <none>
nginx-service NodePort 172.16.215.237 <none> 88:8527/TCP 2d app=nginx
[root@master1 ~]#
[root@master1 ~]# kubectl exec busybox -it nslookup nginx-service
Server: 172.16.0.2
Address 1: 172.16.0.2 kube-dns.kube-system.svc.cluster.local
Name: nginx-service
Address 1: 172.16.215.237 nginx-service.default.svc.cluster.local
[root@master1 ~]#
Deploy the heapster add-on
heapster release download page: https://github.com/kubernetes/heapster/releases
heapster-v1.5.0.tar.gz
[root@master1 ~]# wget https://codeload.github.com/kubernetes/heapster/tar.gz/v1.5.0 -O heapster-v1.5.0.tar.gz
[root@master1 ~]# tar -zxf heapster-v1.5.0.tar.gz
[root@master1 ~]# cd heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]# ls
grafana.yaml heapster.yaml influxdb.yaml
[root@master1 influxdb]#
Configure rbac
heapster-rbac.yaml
No changes needed:
[root@master1 influxdb]# pwd
/root/heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]# kubectl create -f ../rbac/heapster-rbac.yaml
If heapster-rbac.yaml is not applied, this error appears:
E0518 06:08:09.927460 1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:kube-system:heapster" cannot list nodes at the cluster scope
If a container stays in ContainerCreating when starting and finally reports:
Warning FailedCreatePodSandBox 34s (x11 over 43s) kubelet, 192.168.1.204 Failed create pod sandbox.
Normal SandboxChanged 33s (x11 over 43s) kubelet, 192.168.1.204 Pod sandbox changed, it will be killed and re-created.
Running journalctl --since 01:02:00 -u kubelet shows it is downloading the registry.access.redhat.com/rhel7/pod-infrastructure:latest docker image.
Fix: the kube-node node is missing the registry.access.redhat.com/rhel7/pod-infrastructure:latest docker image.
docker pull registry.access.redhat.com/rhel7/pod-infrastructure:latest
Configure the influxdb Deployment
influxdb.yaml
[root@master1 influxdb]# egrep -i 'image|nodeport' influxdb.yaml
image: lvanneo/heapster-influxdb-amd64:v1.1.1 #changed image
type: NodePort
[root@master1 influxdb]#
[root@master1 influxdb]# pwd
/root/heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]# kubectl create -f influxdb.yaml
View the influxdb database cluster address and port:
[root@master1 influxdb]# kubectl get svc -o wide -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
monitoring-influxdb NodePort 172.16.38.42 <none> 8086:8898/TCP 14s k8s-app=influxdb
[root@master1 influxdb]#
Configure the heapster Deployment
heapster.yaml
[root@master1 influxdb]# egrep -i 'image:|source|sink' heapster.yaml
image: lvanneo/heapster-amd64:v1.3.0-beta.1 #changed image
- --source=kubernetes:https://192.168.1.195:6443 #kube-apiserver authenticated address
- --sink=influxdb:http://172.16.38.42:8086 #influxdb CLUSTER-IP address
[root@master1 influxdb]#
[root@master1 influxdb]# pwd
/root/heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]# kubectl create -f heapster.yaml
View the heapster cluster address and port:
[root@master1 influxdb]# kubectl get svc -o wide -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
heapster ClusterIP 172.16.202.225 <none> 80/TCP 12s k8s-app=heapster
[root@master1 influxdb]#
Configure the grafana Deployment
grafana.yaml
Change the following settings:
[root@master1 influxdb]# egrep -i 'image|value: /|type: nodeport' grafana.yaml
image: lvanneo/heapster-grafana-amd64:v4.0.2 #changed image
# value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy #default
value: / #default
type: NodePort #uncommented
[root@master1 influxdb]#
[root@master1 influxdb]# pwd
/root/heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]# kubectl create -f grafana.yaml
[root@master1 influxdb]# kubectl get svc -o wide -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
monitoring-grafana NodePort 172.16.207.209 <none> 80:8642/TCP 31m k8s-app=grafana
Check the Deployments:
[root@master1 ~]# kubectl get deployments -n kube-system | grep -E 'heapster|monitoring'
heapster 1 1 1 1 25m
monitoring-grafana 1 1 1 1 53s
monitoring-influxdb 1 1 1 1 28m
[root@master1 ~]#
Check the Pods:
[root@master1 ~]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
heapster-5fc8f648dc-pn4lm 1/1 Running 0 1h 172.30.57.4 192.168.1.198
monitoring-grafana-57b8fcd7b4-67r2j 1/1 Running 0 28m 172.30.57.5 192.168.1.198
monitoring-influxdb-68d87d45f5-5d7qs 1/1 Running 0 1h 172.30.41.3 192.168.1.199
[root@master1 ~]#
Check the logs of each container one by one for errors.
Access grafana
Open http://192.168.1.198:8642/
Configure the influxdb datasource.
Deploy the dashboard add-on
kubernetes-dashboard.yaml
Authentication goes through rbac; kind: ServiceAccount references the credential files (/etc/kubernetes/bootstrap.kubeconfig and /etc/kubernetes/kube-proxy.kubeconfig) used to access the API:
[root@master1 ~]# egrep -i 'image|apiserver|heapster-host|nodeport' kubernetes-dashboard.yaml
image: k8scn/kubernetes-dashboard-amd64:v1.8.0
- --apiserver-host=http://192.168.1.195:8080
- --heapster-host=http://172.16.202.225
type: NodePort
#nodePort: 38443 #for kind: Service, nodePort can pin the port
[root@master1 ~]#
[root@master1 ~]# kubectl create -f kubernetes-dashboard.yaml
View which Node kubernetes-dashboard was scheduled to:
[root@master1 ~]# kubectl get pod -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE
kubernetes-dashboard-666fbbf977-v9vsh 1/1 Running 0 49s 172.30.41.4 192.168.1.199
View the NodePort assigned to kubernetes-dashboard:
[root@master1 ~]# kubectl get svc -o wide -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes-dashboard NodePort 172.16.59.24 <none> 443:8847/TCP 39m k8s-app=kubernetes-dashboard
- NodePort 8847 maps to port 80 of the dashboard pod
Check the controller:
[root@master1 ~]# kubectl get deployment kubernetes-dashboard -n kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kubernetes-dashboard 1 1 1 1 15m
[root@master1 ~]#
List the cluster service addresses:
[root@master1 ~]# kubectl cluster-info
Kubernetes master is running at https://192.168.1.195:6443
Heapster is running at https://192.168.1.195:6443/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://192.168.1.195:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
monitoring-grafana is running at https://192.168.1.195:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
monitoring-influxdb is running at https://192.168.1.195:6443/api/v1/namespaces/kube-system/services/monitoring-influxdb/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@master1 ~]#
Access the dashboard
The kubernetes-dashboard service exposes a NodePort, so the dashboard is reachable at http://NodeIP:nodePort
In testing, https://192.168.1.199:8847 opened the k8s dashboard in Firefox, but the 360 browser failed with a 404.
Without the Heapster/influxdb add-ons, the k8s dashboard cannot display CPU, memory, and other metric graphs for Pods and Nodes.
Save the images
Export the images
The images are saved in the Baidu Netdisk packages directory (password: nwzk)
docker save k8scn/kubernetes-dashboard-amd64:v1.8.0 > kubernetes-dashboard-amd64-v1.8.0.tar.gz
docker save registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-sidecar-amd64:1.14.7 > k8s-dns-sidecar-amd64-1.14.7.tar.gz
docker save registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-kube-dns-amd64:1.14.7 > k8s-dns-kube-dns-amd64-1.14.7.tar.gz
docker save registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7 > k8s-dns-dnsmasq-nanny-amd64-1.14.7.tar.gz
docker save lvanneo/heapster-influxdb-amd64:v1.1.1 > heapster-influxdb-amd64-v1.1.1.tar.gz
docker save lvanneo/heapster-grafana-amd64:v4.0.2 > heapster-grafana-amd64-v4.0.2.tar.gz
docker save lvanneo/heapster-amd64:v1.3.0-beta.1 > heapster-amd64-v1.3.0-beta.1.tar.gz
Import an image:
docker load -i kubernetes-dashboard-amd64-v1.8.0.tar.gz
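Importing the images one by one mirrors the export list above; a loop over the saved tarballs is less error-prone. A sketch that creates empty placeholder files (an assumption) so it runs anywhere; on a real node point it at the directory of real tarballs and drop the `echo`:

```shell
# Placeholder tarballs so the loop is demonstrable anywhere (assumption).
mkdir -p /tmp/k8s-images && cd /tmp/k8s-images
touch kubernetes-dashboard-amd64-v1.8.0.tar.gz heapster-amd64-v1.3.0-beta.1.tar.gz

for tarball in ./*.tar.gz; do
  [ -e "$tarball" ] || continue      # skip when the glob matches nothing
  echo docker load -i "$tarball"     # drop `echo` to actually import
done
```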
Reference: https://github.com/opsnull/follow-me-install-kubernetes-cluster
This work is licensed under the Creative Commons Attribution 2.5 China Mainland License. Reposting is welcome, but please credit Jack Wang Blog and keep the reposted article intact. All copyright-related rights are reserved.
This article comes from "Jack Wang Blog": http://www.yfshare.vip/2018/02/23/%E9%83%A8%E7%BD%B2TLS-k8s/