Using GlusterFS for persistent storage

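GlusterFS overview
GlusterFS is the core of Gluster, a scale-out storage solution. It is an open-source distributed file system with strong horizontal scalability: by adding nodes it can grow to several petabytes of capacity and serve thousands of clients. GlusterFS aggregates physically distributed storage resources over TCP/IP or InfiniBand RDMA networks and manages the data under a single global namespace.
GlusterFS features
- Scalability and high performance
  GlusterFS combines these two properties to provide storage that scales from a few terabytes to several petabytes. The scale-out architecture raises capacity and performance simply by adding resources; disk, compute, and I/O resources can each be added independently, and high-speed interconnects such as 10GbE and InfiniBand are supported. The Gluster elastic hash (ElasticHash) removes the need for a metadata server, which eliminates that single point of failure and performance bottleneck and allows truly parallel data access.
- High availability
  GlusterFS can replicate files automatically, as mirrors or as multiple copies, so data stays accessible even after a hardware failure. Self-healing restores data to a correct state; the repair runs incrementally in the background and adds almost no performance load. GlusterFS does not define its own private on-disk format. It stores files on mainstream standard disk file systems (such as EXT3 and ZFS), so the data can be copied and accessed with standard tools.
- Elastic volume management
  Data is stored in logical volumes, which are carved independently out of a virtualized pool of physical storage. Storage servers can be added and removed online without interrupting applications. Logical volumes can grow and shrink across all configured servers, can be migrated between servers to balance capacity, and systems can be added or removed, all online. File system configuration changes can also be applied online in real time, which makes it possible to adapt to changing workloads or to tune performance without downtime.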

Environment

Hostname              IP             Size  Notes
master1.example.com   192.168.1.195  50G   glusterfs
master2.example.com   192.168.1.196  50G   glusterfs
master3.example.com   192.168.1.197  50G   glusterfs
node1.example.com     192.168.1.198  50G   glusterfs
node2.example.com     192.168.1.199  50G   glusterfs

Software versions:
    CentOS 7.4
    GlusterFS v3.12.6
    Kubernetes v1.8.4

Deploy GlusterFS

Run the following steps on every node that will serve GlusterFS storage.
Synchronize the time

[root@master1 ~]# ntpdate time.windows.com

Install the Gluster repository

[root@master1 ~]# yum install -y centos-release-gluster

Install the GlusterFS components

[root@master1 ~]# yum install -y glusterfs glusterfs-server glusterfs-fuse glust

Create the GlusterFS working directory

[root@master1 ~]# mkdir -p /opt/glusterd

Change the glusterd working directory

[root@master1 ~]# sed -i 's/var\/lib/opt/g' /etc/glusterfs/glusterd.vol
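
To see what the sed expression does before it edits the real file, the sketch below applies the same substitution to one sample line. The sample line is an assumption about the stock glusterd.vol; check your own file first, because the expression rewrites every occurrence of var/lib in it.

```shell
# Sample line, assumed to match the default /etc/glusterfs/glusterd.vol.
sample='    option working-directory /var/lib/glusterd'

# Same substitution as the step above, reading stdin instead of editing in place.
rewrite_workdir() {
  sed 's/var\/lib/opt/g'
}

printf '%s\n' "$sample" | rewrite_workdir
# prints:     option working-directory /opt/glusterd
```

The resulting path is the /opt/glusterd directory created in the previous step.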

Start GlusterFS

[root@master1 ~]# systemctl enable glusterd
[root@master1 ~]# systemctl start glusterd

[root@master1 ~]# netstat -tunlp | grep -i glusterd
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      25387/glusterd      
[root@master1 ~]#

Configure GlusterFS

[root@master1 ~]# tail -5 /etc/hosts
192.168.1.195 master1.example.com master1
192.168.1.196 master2.example.com master2
192.168.1.197 master3.example.com master3
192.168.1.198 node1.example.com node1
192.168.1.199 node2.example.com node2
[root@master1 ~]#

Open the port
If a firewall is running, open the glusterd port 24007.

[root@master1 ~]# iptables -I INPUT -p tcp --dport 24007 -j ACCEPT
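
Port 24007 covers only the glusterd management daemon. Clients that mount a volume also connect to the brick processes, which on GlusterFS 3.4 and later are normally assigned ports starting at 49152. A minimal sketch that prints matching iptables rules follows; the helper name and the brick-count argument are illustrative, consecutive port allocation is an assumption, and the rules are only printed, not applied.

```shell
# Print iptables rules for glusterd plus the brick ports (dry run).
# $1: number of bricks hosted on this node. Assumes brick ports are
# allocated consecutively from 49152, the usual default.
gluster_fw_rules() {
  local bricks="$1"
  local last=$((49152 + bricks - 1))
  echo "iptables -I INPUT -p tcp --dport 24007 -j ACCEPT"
  echo "iptables -I INPUT -p tcp --dport 49152:${last} -j ACCEPT"
}

# Each node in this guide hosts one brick.
gluster_fw_rules 1
```

The ports actually in use can be confirmed with gluster volume status once the volume is started.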

Create the storage directory

[root@master1 ~]# mkdir -p /data/gfs_data

Add the nodes to the cluster
The host that runs the commands does not need to probe itself.

[root@master1 ~]# gluster peer probe master2.example.com
[root@master1 ~]# gluster peer probe master3.example.com
[root@master1 ~]# gluster peer probe node1.example.com
[root@master1 ~]# gluster peer probe node2.example.com
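
The four probes above can also be written as a loop. This is a sketch, not part of the original procedure: the probe_peers helper and its runner argument are made up for illustration, and passing echo as the runner prints the commands instead of running them.

```shell
# Peers to probe from master1 (the local host is not listed).
PEERS="master2.example.com master3.example.com node1.example.com node2.example.com"

# $1: command prefix ("echo" for a dry run, "" to execute for real).
# Remaining arguments: hostnames to probe.
probe_peers() {
  local runner="$1"
  shift
  for host in "$@"; do
    $runner gluster peer probe "$host"
  done
}

# Dry run: prints one "gluster peer probe <host>" line per peer.
probe_peers echo $PEERS
```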

Check the cluster status

[root@master1 ~]# gluster peer status
Number of Peers: 4

Hostname: master2.example.com
Uuid: b91ba577-cccd-43cc-918a-44e001f3d7bf
State: Peer in Cluster (Connected)

Hostname: master3.example.com
Uuid: e0652345-9817-4a08-aec0-3673723002ed
State: Peer in Cluster (Connected)

Hostname: node1.example.com
Uuid: 34b3faed-9335-41ca-9ed5-734ed140f6f3
State: Peer in Cluster (Connected)

Hostname: node2.example.com
Uuid: f1ad8a8e-1469-4558-9a35-f07cbbc679f2
State: Peer in Cluster (Connected)
[root@master1 ~]#

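Configure the volume

GlusterFS volume modes:

Distributed (default): DHT. Files are spread across the server nodes according to a hash algorithm.
Replicated: AFR. Created with replica x; each file is copied to x nodes.
Striped: created with stripe x; each file is cut into chunks that are stored across x nodes (similar to RAID 0).
Distributed striped: needs at least 4 servers, for example stripe 2 with 4 server nodes; a combination of DHT and Striped.
Distributed replicated: needs at least 4 servers, for example replica 2 with 4 server nodes; a combination of DHT and AFR.
Striped replicated: needs at least 4 servers, for example stripe 2 replica 2 with 4 server nodes; a combination of Striped and AFR.
All three combined: needs at least 8 servers; stripe 2 replica 2, with every 4 nodes forming one group.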
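
As an illustration of the replica rule in the list above, the sketch below assembles a gluster volume create command and rejects brick counts that are not a multiple of the replica count. The helper name, the volume name demo-volume, and the /data/demo brick paths are hypothetical; the command is only printed.

```shell
# Build (but do not run) a "gluster volume create" command line.
# $1: volume name, $2: replica count (0 means plain distributed),
# remaining arguments: bricks in host:/path form.
build_create_cmd() {
  local vol="$1" replica="$2"
  shift 2
  if [ "$replica" -gt 0 ] && [ $(( $# % replica )) -ne 0 ]; then
    echo "brick count $# is not a multiple of replica $replica" >&2
    return 1
  fi
  if [ "$replica" -gt 0 ]; then
    echo "gluster volume create $vol replica $replica $*"
  else
    echo "gluster volume create $vol $*"
  fi
}

# Four bricks with replica 2 give a distributed replicated volume.
build_create_cmd demo-volume 2 \
  master1.example.com:/data/demo master2.example.com:/data/demo \
  master3.example.com:/data/demo node1.example.com:/data/demo
```

With replica 0 the helper produces the plain distributed form, which is the mode this guide uses for k8s-volume.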

[root@master1 ~]# gluster volume create k8s-volume master1.example.com:/data/gfs_data/ master2.example.com:/data/gfs_data/ master3.example.com:/data/gfs_data/ node1.example.com:/data/gfs_data/ node2.example.com:/data/gfs_data/ force

Check the volume status

[root@master1 ~]# gluster volume info
 
Volume Name: k8s-volume
Type: Distribute
Volume ID: e3bdd47a-d166-4aef-827a-5cde1d6c7d91
Status: Created
Snapshot Count: 0
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: master1.example.com:/data/gfs_data
Brick2: master2.example.com:/data/gfs_data
Brick3: master3.example.com:/data/gfs_data
Brick4: node1.example.com:/data/gfs_data
Brick5: node2.example.com:/data/gfs_data
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
[root@master1 ~]#

Start the distributed volume

[root@master1 ~]# gluster volume start k8s-volume

Mount test

[root@master1 ~]# mount -t glusterfs master1.example.com:k8s-volume /media/

[root@master1 ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
master1.example.com:k8s-volume  250G  163M  250G   1% /media
[root@master1 ~]#
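
The 250G that df reports follows from the volume type: a plain distributed volume adds up its bricks, here 5 bricks of 50G each. With replication, usable space is divided by the replica count. The sketch below is rough arithmetic only; it ignores file system overhead, and the replica 2 case is hypothetical rather than a configuration from this guide.

```shell
# Usable capacity in GB for a volume made of equally sized bricks.
# $1: brick count, $2: brick size in GB, $3: replica count (1 = none).
usable_gb() {
  echo $(( $1 * $2 / $3 ))
}

usable_gb 5 50 1   # distributed volume from this guide: prints 250
usable_gb 4 50 2   # hypothetical distributed replicated volume: prints 100
```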

Configure endpoints

Reference: https://github.com/kubernetes/kubernetes/blob/master/examples/volumes/glusterfs/glusterfs-endpoints.json
glusterfs-endpoints.json

[root@master1 ~]# wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/examples/volumes/glusterfs/glusterfs-endpoints.json

The endpoints name must match the name that the pod and PV definitions reference later (glusterfs-cluster). To use a specific namespace, add "namespace": "name" under metadata and create the namespace first with kubectl create namespace name.

[root@master1 ~]# cat glusterfs-endpoints.json
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "subsets": [
    {
      "addresses": [
        {
          "ip": "192.168.1.195"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    },
    {
      "addresses": [
        {
          "ip": "192.168.1.196"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    },
    {
      "addresses": [
        {
          "ip": "192.168.1.197"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    },
    {
      "addresses": [
        {
          "ip": "192.168.1.198"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    },
    {
      "addresses": [
        {
          "ip": "192.168.1.199"
        }
      ],
      "ports": [
        {
          "port": 1
        }
      ]
    }
  ]
}
[root@master1 ~]#

[root@master1 ~]# kubectl create -f glusterfs-endpoints.json

Check the endpoints

Unless a namespace is specified, all objects are created in the default namespace.

[root@master1 ~]# kubectl get Endpoints --all-namespaces
NAMESPACE           NAME                          ENDPOINTS                                         AGE
default             gfs-cluster                   192.168.2.161:1,192.168.2.162:1,192.168.2.163:1   6d
[root@master1 ~]#
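
The endpoints file repeats one subset per GlusterFS node, so it can be generated from a list of IPs instead of edited by hand. A minimal sketch under a few assumptions: the gen_endpoints helper is made up for illustration, the name glusterfs-cluster is the one the pod and PV definitions in this guide refer to, and the output goes to stdout.

```shell
# Print an Endpoints manifest with one subset per GlusterFS node.
# $1: endpoints name; remaining arguments: node IP addresses.
gen_endpoints() {
  local name="$1"
  shift
  local first=1
  printf '{\n  "kind": "Endpoints",\n  "apiVersion": "v1",\n'
  printf '  "metadata": {"name": "%s"},\n  "subsets": [\n' "$name"
  for ip in "$@"; do
    if [ "$first" -eq 0 ]; then
      printf ',\n'
    fi
    # Port 1 is the placeholder value used in the example file.
    printf '    {"addresses": [{"ip": "%s"}], "ports": [{"port": 1}]}' "$ip"
    first=0
  done
  printf '\n  ]\n}\n'
}

gen_endpoints glusterfs-cluster 192.168.1.195 192.168.1.196 192.168.1.197 \
  192.168.1.198 192.168.1.199
```

Redirect the output to a file and pass it to kubectl create -f in the same way as the downloaded example.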

Create the service

Reference: https://github.com/kubernetes/kubernetes/blob/master/examples/volumes/glusterfs/glusterfs-service.json
glusterfs-service.json

[root@master1 ~]# wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/examples/volumes/glusterfs/glusterfs-service.json

The service uses the same name as the endpoints object. To use a specific namespace, add "namespace": "name" under metadata.

[root@master1 ~]# cat glusterfs-service.json
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "glusterfs-cluster"
  },
  "spec": {
    "ports": [
      {"port": 19999}
    ]
  }
}
[root@master1 ~]#

[root@master1 ~]# kubectl create -f glusterfs-service.json

[root@master1 ~]# kubectl get svc -o wide
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)       AGE       SELECTOR
glusterfs-cluster   ClusterIP   172.16.170.167   <none>        19999/TCP     1m        <none>
[root@master1 ~]#

Create a test pod

Reference: https://github.com/kubernetes/kubernetes/blob/master/examples/volumes/glusterfs/glusterfs-pod.json
glusterfs-pod.json

[root@master1 ~]# wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/examples/volumes/glusterfs/glusterfs-pod.json

In the volume definition, endpoints must name the Endpoints object, and path must be changed to the name given to the GlusterFS volume when it was created (k8s-volume here). To use a specific namespace, add "namespace": "name" under metadata.

[root@master1 ~]# cat glusterfs-pod.json
{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "glusterfs"
    },
    "spec": {
        "containers": [
            {
                "name": "glusterfs",
                "image": "nginx",
                "volumeMounts": [
                    {
                        "mountPath": "/mnt/glusterfs",
                        "name": "glusterfsvol"
                    }
                ]
            }
        ],
        "volumes": [
            {
                "name": "glusterfsvol",
                "glusterfs": {
                    "endpoints": "glusterfs-cluster",
                    "path": "k8s-volume",
                    "readOnly": true
                }
            }
        ]
    }
}
[root@master1 ~]#

[root@master1 ~]# kubectl create -f glusterfs-pod.json

[root@master1 ~]# kubectl get pods -o wide
NAME                               READY     STATUS    RESTARTS   AGE       IP            NODE
glusterfs                          1/1       Running   0          6m        172.30.41.6   192.168.1.199
[root@master1 ~]#

[root@master1 ~]# kubectl exec glusterfs mount | grep gluster
192.168.1.195:k8s-volume on /mnt/glusterfs type fuse.glusterfs (ro,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@master1 ~]#

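GlusterFS persistent volume management

Note: a PV and a PVC bind one to one, and a PV can be bound to only one PVC. When the PVC is deleted, the PV moves to the Released or Failed state, and a newly created PVC cannot bind to that PV.
Once a PV's status changes from Available to any other state (Bound, Released, or Failed), it can no longer be claimed.
Reference: https://www.slahser.com/2017/03/22/k8s%E5%90%8E%E6%97%A5%E8%B0%88-glusterfs%E4%B8%8Epv/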

[root@master1 ~]# kubectl get pv,pvc
No resources found.
[root@master1 ~]#

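PersistentVolume (PV): an abstraction over storage that makes storage a resource of the cluster.
PersistentVolumeClaim (PVC): a request for storage; a PVC consumes PV resources.
The relationship between a PV and a PVC resembles that between a VG and an LV in LVM.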

Storage plugin                    ReadWriteOnce  ReadOnlyMany  ReadWriteMany  Notes
AWSElasticBlockStore              Yes            No            No
AzureFile                         Yes            Yes           Yes
AzureDisk                         Yes            No            No
Cinder (OpenStack Block Storage)  Yes            No            No
FlexVolume                        Yes            Yes           No
Flocker                           Yes            No            No
GCEPersistentDisk                 Yes            Yes           No
PhotonPersistentDisk              Yes            No            No
Quobyte                           Yes            Yes           Yes
PortworxVolume                    Yes            No            Yes
ScaleIO                           Yes            Yes           No
FC (Fibre Channel)                Yes            Yes           No
NFS                               Yes            Yes           Yes
ISCSI                             Yes            Yes           No
RBD (Ceph Block Device)           Yes            Yes           No
CephFS                            Yes            Yes           Yes
Glusterfs                         Yes            Yes           Yes
VsphereVolume                     Yes            No            No
HostPath                          Yes            No            No             Single node only; not usable across nodes

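The three PV access modes

ReadWriteOnce: the most basic mode; the volume can be mounted read-write, but only by a single node
ReadOnlyMany: the volume can be mounted read-only by many nodes
ReadWriteMany: the volume can be mounted read-write by many nodes and shared between them

Reference: https://zhuanlan.zhihu.com/p/29706309

Volume phases: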

Available – a free resource that is not yet bound to a claim
Bound – the volume is bound to a claim
Released – the claim has been deleted, but the resource is not yet reclaimed by the cluster
Failed – the volume has failed its automatic reclamation

Reference: https://www.kubernetes.org.cn/pvpvcstorageclass

Create the PV for GlusterFS
gfs-pv.yml

[root@master1 ~]# cat gfs-pv.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "gluster-pv"
  #PersistentVolumes are cluster-scoped, so no namespace is set here
spec:
  capacity:
    storage: 250Gi                   #size of the gfs volume, as shown in the mount test
  accessModes:
    - ReadWriteMany
  glusterfs:
    endpoints: "glusterfs-cluster"    #Endpoints name, from kubectl get endpoints
    path: "k8s-volume"                #Volume Name, from gluster volume info
    readOnly: false
[root@master1 ~]#

[root@master1 ~]# kubectl create -f gfs-pv.yml

[root@master1 ~]# kubectl get pv,pvc
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
pv/gluster-pv   250Gi      RWX            Retain           Available                                      12m
[root@master1 ~]#

Create the PVC for GlusterFS
gfs-pvc.yml

[root@master1 ~]# cat gfs-pvc.yml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "glusterfs-pvc"
  #namespace: "name"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 250Gi
[root@master1 ~]#

[root@master1 ~]# kubectl create -f gfs-pvc.yml

[root@master1 ~]# kubectl get pv,pvc --all-namespaces
NAMESPACE   NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                       STORAGECLASS   REASON    AGE
            pv/gfs-pv   300Gi      RWX            Retain           Bound     default/gfs-pvc                            7m

NAMESPACE           NAME          STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default   pvc/gfs-pvc   Bound     gfs-pv    300Gi      RWX                           6m
[root@master1 ~]#

Create a pod application that uses the GlusterFS PVC
gfs-pvc-pod.yml
The same kind of pod written in JSON: https://github.com/kubernetes/kubernetes/blob/master/examples/volumes/glusterfs/glusterfs-pod.json

apiVersion: v1
kind: Pod
metadata:
  name: test
  #namespace: "name"                         #if set, must be the namespace that contains glusterfs-pvc
  labels:
    app: nginx                               #matched by the Service selector below
spec:
  containers:
    - name: nginx
      image: nginx:1.13
      volumeMounts:
      - mountPath: "/usr/share/nginx/html/"   #mount the glusterfs volume (k8s-volume) at /usr/share/nginx/html/
        name: html
  volumes:
    - name: html
      persistentVolumeClaim:
        claimName: glusterfs-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: test-nginx
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    nodePort: 29000
  selector:
    app: nginx

[root@master1 ~]# kubectl create -f gfs-pvc-pod.yml

Verification

[root@master1 ~]# kubectl get pods -o wide
NAME                               READY     STATUS    RESTARTS   AGE       IP            NODE
mypod-gfs                          1/1       Running   0          1d        172.30.57.3   192.168.1.198
[root@master1 ~]# kubectl get svc -o wide
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)       AGE       SELECTOR
nginx-service       NodePort    172.16.215.237   <none>        88:8527/TCP   20d       app=nginx
[root@master1 ~]#

The Gluster bricks are stored under /data/gfs_data on each host

[root@master1 ~]# gluster volume info
Volume Name: k8s-volume
...
Brick1: master1.example.com:/data/gfs_data
Brick2: master2.example.com:/data/gfs_data
Brick3: master3.example.com:/data/gfs_data
Brick4: node1.example.com:/data/gfs_data
Brick5: node2.example.com:/data/gfs_data
...
[root@master1 ~]#

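Upload a file to the Gluster storage directory (used for the check that follows). Files should normally be written through a mounted volume, such as the /media mount from the mount test, rather than directly into a brick directory.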

[root@master1 ~]# cd /data/gfs_data/
[root@master1 gfs_data]# ls
index.html
[root@master1 gfs_data]#

Verification passed
The nginx-service application is using the Gluster storage

[root@node1 ~]# docker ps -a
CONTAINER ID        IMAGE      COMMAND                  CREATED          STATUS         PORTS          NAMES
9dd7a0db5281        nginx      "nginx -g 'daemon of…"   41 hours ago     Up 41 hours                   k8s_nginx_mypod-gfs_default_586a45c2-194d-11e8-b996-1e2d0a5bc3f5_0
[root@node1 ~]# docker exec -it 9dd7a0db5281 /bin/bash
root@mypod-gfs:/# df -h
Filesystem                Size  Used Avail Use% Mounted on
192.168.1.195:k8s-volume  250G  163M  250G   1% /usr/share/nginx/html
root@mypod-gfs:/# 
root@mypod-gfs:/# cd /usr/share/nginx/html/
root@mypod-gfs:/usr/share/nginx/html# ls
index.html
root@mypod-gfs:/usr/share/nginx/html#

Deployment form

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: "filebeat-deployment"
  namespace: "name"
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      nodeName: 192.168.2.161
      containers:
      - name: filebeat
        image: yfshare/filebeat:6.2.2
        imagePullPolicy: Always
        ports:
        env:
        volumeMounts:
          - name: filebeat-time
            mountPath: "/etc/localtime"
            readOnly: true
          - name: filebeat-yml
            mountPath: "/etc/filebeat/filebeat.yml"
            readOnly: true
          - name: filebeat-logs
            mountPath: "/var/log/filebeat/"
            readOnly: false
          - name: read-logs
            mountPath: "/data"
            readOnly: false
      volumes:
      - name: filebeat-time
        hostPath:
          path: "/etc/localtime"
      - name: filebeat-yml
        hostPath:
          path: "/root/k8s_elk/filebeat/filebeat.yml"
      - name: filebeat-logs
        hostPath:
          path: "/data/elk/filebeat"
      - name: read-logs
        persistentVolumeClaim:
          claimName: gfs-pvc

Delete a volume (Delete Volume)

Stop the volume

gluster volume stop VOLNAME

Delete the volume

gluster volume delete VOLNAME

Expand a GlusterFS volume (Expanding Volume)

Add the new storage server to the storage pool

gluster peer probe HOSTNAME

Add the brick to the volume

gluster volume add-brick k8s-volume gfs1.example.com:/mnt/gfs_data force

Rebalance the volume

gluster volume rebalance k8s-volume start

Check the rebalance status

gluster volume rebalance k8s-volume status

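Shrink a GlusterFS volume (Shrinking Volume)

Remove the brick from the volume. Note that force removes the brick immediately without migrating the files stored on it, so use it only when that data is not needed.

gluster volume remove-brick k8s-volume gfs1.example.com:/mnt/gfs_data force

Rebalance the volume

gluster volume rebalance k8s-volume start

Check the rebalance status

gluster volume rebalance k8s-volume status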
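
When the files on the brick must be kept, remove-brick is usually run as a start, status, commit sequence rather than with force, so that data is migrated to the remaining bricks before the brick is dropped. A dry-run sketch using this guide's volume and brick names; the shrink_cmds helper is hypothetical and only prints the commands.

```shell
# Print the data-preserving remove-brick sequence (dry run).
# $1: volume name, $2: brick in host:/path form.
shrink_cmds() {
  local vol="$1" brick="$2"
  # Start migrating files off the brick.
  echo "gluster volume remove-brick $vol $brick start"
  # Repeat until the migration is reported as completed.
  echo "gluster volume remove-brick $vol $brick status"
  # Drop the brick from the volume once migration has finished.
  echo "gluster volume remove-brick $vol $brick commit"
}

shrink_cmds k8s-volume gfs1.example.com:/mnt/gfs_data
```

Run the commit step only after status shows that the migration has finished.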

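Online data migration (Migrating Data)

Migrate the data to another brick

gluster volume replace-brick VOLNAME BRICK NEWBRICK start

Check the migration progress

gluster volume replace-brick VOLNAME BRICK NEWBRICK status

Commit the migration

gluster volume replace-brick VOLNAME BRICK NEWBRICK commit

Recent GlusterFS releases accept only the commit force form of replace-brick. If start and status are rejected, use gluster volume replace-brick VOLNAME BRICK NEWBRICK commit force instead.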

Common gluster commands

List all volumes in the cluster

gluster volume list

Show volume information

gluster volume info [all]

Show volume status

gluster volume status [all]
gluster volume status <VOLNAME> [detail | clients | mem | inode | fd]

Reference: http://blog.csdn.net/i_chips/article/details/12656527

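Q&A

Inspect the pod's event log

kubectl describe pods test

If the pod fails to mount the volume, check whether the Kubernetes node supports the glusterfs file system.
Run the command below on the node; if it reports mount: unknown filesystem type 'glusterfs', the node does not support the glusterfs file system.

# mount -t glusterfs gfs1.example.com:k8s_gfs /media/

Fix:

yum install -y glusterfs glusterfs-server glusterfs-fuse glust

If the pod still reports an error, check that the configuration files used to create the GlusterFS endpoints and service are correct.

# Look up the Volume Name
gluster volume info

# When creating test.yml, check that the PV and PVC settings are correct
kubectl create -f test.yml

With a correct configuration, the pod runs normally: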

# kubectl describe pods test

# kubectl get pods -o wide
NAME        READY     STATUS    RESTARTS   AGE       IP             NODE
test        1/1       Running   0          29m       172.20.40.5    192.168.2.11
#

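If an error is still reported, check the Kubernetes node status and the hosts resolution.

# List the Kubernetes nodes and confirm that STATUS is Ready
kubectl get nodes

# Every GlusterFS node name must resolve through /etc/hosts; otherwise the application may not find a GlusterFS node when mounting, and the mount fails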
[root@master1 ~]# tail -5 /etc/hosts
192.168.1.195 master1.example.com master1
192.168.1.196 master2.example.com master2
192.168.1.197 master3.example.com master3
192.168.1.198 node1.example.com node1
192.168.1.199 node2.example.com node2
[root@master1 ~]#
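
The hosts check can be scripted so that it is easy to repeat on every Kubernetes node. A minimal sketch: the missing_hosts helper is made up for illustration, and it only looks for each name as a field on a non-comment line of the given hosts file, without querying DNS.

```shell
# Print every hostname that has no entry in the given hosts file.
# $1: path to a hosts file; remaining arguments: hostnames to look for.
missing_hosts() {
  local file="$1"
  shift
  for h in "$@"; do
    # awk exits 0 only if the name appears as a hostname field
    # on a line that is not a comment.
    awk -v h="$h" '
      $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }
    ' "$file" || echo "$h"
  done
}

missing_hosts /etc/hosts master1.example.com master2.example.com \
  master3.example.com node1.example.com node2.example.com
```

No output means every GlusterFS node name has an entry; any name printed still needs a line in /etc/hosts.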