1. Introduction
1.1 Overview
Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability. "Unified" means that a single Ceph cluster can provide a file system, block storage, and object storage; "distributed" means the cluster can be scaled out dynamically. In the cloud environments of some companies in China, Ceph is commonly used as the sole backend storage for OpenStack to improve data-forwarding efficiency.
1.2 Features
High performance:
- Abandons the traditional centralized metadata-lookup approach and uses the CRUSH algorithm instead, giving balanced data distribution and high parallelism.
- Takes failure-domain isolation into account and supports replica-placement rules for all kinds of workloads, such as cross-datacenter and rack awareness.
- Scales to thousands of storage nodes and supports data volumes from terabytes to petabytes.
High availability:
- The number of replicas is configurable.
- Supports failure-domain separation and strong data consistency.
- Self-heals automatically in a variety of failure scenarios.
- No single point of failure; management is automatic.
High scalability:
- Decentralized design.
- Flexible expansion.
- Capacity and performance grow linearly as nodes are added.
Rich features:
- Supports three storage interfaces: block storage, file storage, and object storage.
- Supports custom interfaces and drivers for multiple languages.
1.3 Core Components
(1) Monitors: the monitors maintain the various maps of the cluster state and provide authentication and logging services. They hold end-to-end information about the monitor nodes, including the Ceph cluster ID and each monitor's hostname, IP, and port, as well as the current map versions and their latest changes. The monitor map can be viewed with "ceph mon dump" (see the sample commands after this list).
(2) MDS (Metadata Server): stores the metadata of the Ceph file system. Note: Ceph block storage and Ceph object storage do not need an MDS.
(3) OSD: the Object Storage Daemon; despite the name, it is not specific to object storage. Each OSD manages a physical disk and stores data as objects on the physical disks of the cluster nodes. OSDs are responsible for storing data and handling replication, recovery, backfilling, and rebalancing; the vast majority of the data-storage work is done by the OSD daemon processes. When building Ceph OSDs, SSDs and the xfs file system are recommended for formatting the partitions. OSDs also send heartbeats to other OSDs and report the results to the monitors.
(4) RADOS: Reliable Autonomic Distributed Object Store. RADOS is the foundation of a Ceph storage cluster. In Ceph, all data is stored as objects, and regardless of the data type, the RADOS object store is responsible for saving those objects. The RADOS layer ensures that data always stays consistent.
(5) librados: the librados library provides an access interface for applications. It is also the native interface on which the block, object, and file storage services are built.
(6) RADOSGW: the gateway interface, which provides the object storage service. It uses librgw and librados to let applications connect to the Ceph object store, and exposes S3- and Swift-compatible RESTful APIs.
(7) RBD: the block device service. RBD images are thin-provisioned, resizable, and striped across multiple OSDs.
(8) CephFS: the Ceph file system, a POSIX-compatible file system built on the native librados interface.
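Once the cluster from section 2 is running, most of these components can be inspected from the command line. The following read-only commands are illustrative only and are not part of the walkthrough below; they assume the admin keyring is available on the node where they are run.
#print the monitor map described above
ceph mon dump
#print the OSD map and the CRUSH tree
ceph osd dump
ceph osd tree
#MDS status (only meaningful once CephFS is in use)
ceph mds stat
#overall and per-pool capacity
ceph df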
1.4 Underlying Logic
1.5 File Storage Workflow
- When a file is uploaded, it is split into objects of equal size (4 MB by default), and each object is assigned an oid (object id).
- Hashing the oid and applying a mask determines which PG (placement group) the object belongs to.
- The CRUSH algorithm then determines which OSDs that PG, and therefore the data, is stored on (this mapping can be inspected with the command shown below).
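On a running cluster, this object-to-PG-to-OSD mapping can be checked with "ceph osd map", the same command used later in section 2.4. A sketch, assuming a pool named mypool containing an object named hosts:
#prints the PG the object hashes into and the acting OSD set, e.g. pg 1.e -> OSDs [1,0,2]
ceph osd map mypool hosts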
2. Installation and Configuration
2.1 Environment Preparation
Role | IP | Services |
---|---|---|
ceph-admin | 10.9.132.128 | ceph-deploy |
mon01 | 10.9.145.154 | osd mon |
mon02 | 10.9.158.133 | osd mon |
mon03 | 10.9.82.6 | osd mon mgr |
stor04 | 10.9.171.105 | osd mgr |
Configure the hostname and the local hosts entries on every node (a sketch follows the entries below):
10.9.132.128 ceph-admin
10.9.145.154 mon01
10.9.158.133 mon02
10.9.82.6 mon03
10.9.171.105 stor04
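A minimal way to apply this on each node, assuming root access; every host sets its own name from the table above:
hostnamectl set-hostname ceph-admin   #use mon01/mon02/mon03/stor04 on the respective nodes
cat >> /etc/hosts <<'EOF'
10.9.132.128 ceph-admin
10.9.145.154 mon01
10.9.158.133 mon02
10.9.82.6 mon03
10.9.171.105 stor04
EOF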
Operations on the ceph-admin node
Download the ceph-release-1-1.el7 yum package
rpm -ivh https://mirrors.aliyun.com/ceph/rpm-mimic/el7/noarch/ceph-release-1-1.el7.noarch.rpm
After installation, the Ceph yum repository file can be seen under /etc/yum.repos.d
[root@ceph-admin yum.repos.d]# ll /etc/yum.repos.d/ceph.repo
-rw-r--r-- 1 root root 535 May 5 2018 /etc/yum.repos.d/ceph.repo
[root@ceph-admin yum.repos.d]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
baseurl=http://download.ceph.com/rpm-mimic/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-mimic/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[ceph-source]
name=Ceph source packages
baseurl=http://download.ceph.com/rpm-mimic/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
Install the EPEL repository
[root@ceph-admin yum.repos.d]# yum -y install epel-release
Create the deployment user and give it passwordless sudo as root. This must be done on every node (one way to create the sudoers file is sketched after the listing below).
useradd cephadm && echo huhuhahei | passwd --stdin cephadm
[root@ceph-admin ~]# cat /etc/sudoers.d/cephadm
cephadm ALL=(root) NOPASSWD: ALL
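One possible way to create that sudoers drop-in on each node, run as root (a sketch, not the only option):
echo 'cephadm ALL=(root) NOPASSWD: ALL' > /etc/sudoers.d/cephadm
chmod 0440 /etc/sudoers.d/cephadm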
Configure passwordless SSH login for cephadm and copy the sudoers file to the other nodes (a sketch of the copy step follows these commands).
ssh-keygen -t rsa -P ""
ssh-copy-id -i .ssh/id_rsa.pub cephadm@mon01
ssh-copy-id -i .ssh/id_rsa.pub cephadm@mon02
ssh-copy-id -i .ssh/id_rsa.pub cephadm@mon03
ssh-copy-id -i .ssh/id_rsa.pub cephadm@stor04
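For the "copy the sudoers file" part of this step, one possible sketch, assuming root SSH access from ceph-admin to the other nodes:
for node in mon01 mon02 mon03 stor04; do
  scp /etc/sudoers.d/cephadm root@${node}:/etc/sudoers.d/cephadm
done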
2.2 Initialize the RADOS Cluster
Install ceph-deploy and create the new cluster definition
[cephadm@ceph-admin ceph-cluster]$ yum -y install ceph-deploy python-setuptools python2-subprocess32
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy new --cluster-network 10.9.0.0/16 --public-network 10.9.0.0/16 mon01
----
[ceph_deploy.new][DEBUG ] Resolving host mon01
[ceph_deploy.new][DEBUG ] Monitor mon01 at 10.9.145.154
[ceph_deploy.new][DEBUG ] Monitor initial members are ['mon01']
[ceph_deploy.new][DEBUG ] Monitor addrs are [u'10.9.145.154']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
#I only have one NIC here, so the public and cluster networks share the same subnet
When initialization completes, the configuration files are generated in the current directory
[cephadm@ceph-admin ceph-cluster]$ ll
total 44
-rw-rw-r-- 1 cephadm cephadm 254 Dec 4 16:56 ceph.conf
-rw-rw-r-- 1 cephadm cephadm 33959 Dec 4 17:04 ceph-deploy-ceph.log
-rw------- 1 cephadm cephadm 73 Dec 4 16:55 ceph.mon.keyring
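#contents of the generated ceph.conf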
[global]
fsid = b5fa13e6-6b7f-4eeb-bac3-a7b6bc535109
public_network = 10.9.0.0/16
cluster_network = 10.9.0.0/16
mon_initial_members = mon01
mon_host = 10.9.145.154
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
Install Ceph on all cluster nodes
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy install mon01 mon02 mon03 stor04
---
[stor04][DEBUG ]
[stor04][DEBUG ] Complete!
[stor04][INFO ] Running command: sudo ceph --version
[stor04][DEBUG ] ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
Initialize the first mon node
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy mon create-initial
---
[ceph_deploy.gatherkeys][INFO ] Storing ceph.client.admin.keyring
[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-mds.keyring
[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-mgr.keyring
[ceph_deploy.gatherkeys][INFO ] keyring 'ceph.mon.keyring' already exists
[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-osd.keyring
[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-rgw.keyring
[ceph_deploy.gatherkeys][INFO ] Destroy temp directory /tmp/tmpXsDRpH
#On mon01 you can see that the ceph-mon process is running
[root@mon01 ~]# ps -ef | grep ceph-mon
ceph 9013 1 0 17:09 ? 00:00:00 /usr/bin/ceph-mon -f --cluster ceph --id mon01 --setuser ceph --setgroup ceph
root 9311 8412 0 17:11 pts/0 00:00:00 grep --color=auto ceph-mon
Copy the Ceph configuration file and admin keyring to the other nodes
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy admin ceph-admin mon01 mon02 mon03 stor04
---
[stor04][DEBUG ] connection detected need for sudo
[stor04][DEBUG ] connected to host: stor04
[stor04][DEBUG ] detect platform information from remote host
[stor04][DEBUG ] detect machine type
[stor04][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
#Check on the other nodes
[root@mon01 ~]# ll /etc/ceph/
total 12
-rw------- 1 root root 151 Dec 4 17:14 ceph.client.admin.keyring
-rw-r--r-- 1 root root 254 Dec 4 17:14 ceph.conf
-rw-r--r-- 1 root root 92 Apr 24 2020 rbdmap
-rw------- 1 root root 0 Dec 4 17:09 tmpVELNIz
#ll shows that ceph.client.admin.keyring has mode 600, so the cephadm user cannot read it; grant it access with an ACL
[root@mon01 ~]# setfacl -m u:cephadm:rw /etc/ceph/ceph.client.admin.keyring
[root@mon01 ~]# ll /etc/ceph/
total 12
-rw-rw----+ 1 root root 151 Dec 4 17:14 ceph.client.admin.keyring
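The same ACL is also needed on ceph-admin and on any other node where the cephadm user will run ceph commands, otherwise the "ceph -s" calls later in this section would be denied; for example:
[cephadm@ceph-admin ceph-cluster]$ sudo setfacl -m u:cephadm:rw /etc/ceph/ceph.client.admin.keyring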
Configure mon03 to run the mgr daemon
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy mgr create mon03
---
[mon03][INFO ] Running command: sudo systemctl enable ceph-mgr@mon03
[mon03][WARNIN] Created symlink from /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@mon03.service to /usr/lib/systemd/system/ceph-mgr@.service.
[mon03][INFO ] Running command: sudo systemctl start ceph-mgr@mon03
[mon03][INFO ] Running command: sudo systemctl enable ceph.target
#Check on the node
[root@mon03 ~]# ps -ef
ceph 8990 1 7 17:19 ? 00:00:00 /usr/bin/ceph-mgr -f --clust
root 9057 8454 0 17:19 pts/0 00:00:00 ps -ef
Check the cluster status
[cephadm@ceph-admin ceph-cluster]$ ceph -s
cluster:
id: b5fa13e6-6b7f-4eeb-bac3-a7b6bc535109
health: HEALTH_WARN
OSD count 0 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum mon01
mgr: mon03(active)
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
Add the remaining mon nodes (adding mon02 is shown here; mon03 is added the same way)
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy mon add mon02
---
[mon03][DEBUG ] ********************************************************************************
[mon03][INFO ] monitor: mon.mon02 is running
#Check the mon quorum status
[cephadm@ceph-admin ceph-cluster]$ ceph quorum_status --format json-pretty
{
"election_epoch": 16,
"quorum": [
0,
1,
2
],
"quorum_names": [
"mon03",
"mon01",
"mon02"
],
"quorum_leader_name": "mon03",
"monmap": {
"epoch": 3,
"fsid": "b5fa13e6-6b7f-4eeb-bac3-a7b6bc535109",
"modified": "2021-12-04 18:06:16.162457",
"created": "2021-12-04 17:09:58.218633",
"features": {
"persistent": [
"kraken",
"luminous",
"mimic",
"osdmap-prune"
],
"optional": []
},
"mons": [
{
"rank": 0, #主节点
"name": "mon03",
"addr": "10.9.82.6:6789/0",
"public_addr": "10.9.82.6:6789/0"
},
{
"rank": 1,
"name": "mon01",
"addr": "10.9.145.154:6789/0",
"public_addr": "10.9.145.154:6789/0"
},
{
"rank": 2,
"name": "mon02",
"addr": "10.9.158.133:6789/0",
"public_addr": "10.9.158.133:6789/0"
}
]
}
}
Add a standby mgr node
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy mgr create stor04
---
[stor04][INFO ] Running command: sudo systemctl enable ceph-mgr@stor04
[stor04][INFO ] Running command: sudo systemctl start ceph-mgr@stor04
[stor04][INFO ] Running command: sudo systemctl enable ceph.target
#Check
[cephadm@ceph-admin ceph-cluster]$ ceph -s
cluster:
id: b5fa13e6-6b7f-4eeb-bac3-a7b6bc535109
health: HEALTH_WARN
too few PGs per OSD (24 < min 30)
clock skew detected on mon.mon01, mon.mon02
services:
mon: 3 daemons, quorum mon03,mon01,mon02
mgr: mon03(active), standbys: stor04 #standby node
osd: 4 osds: 4 up, 4 in
data:
pools: 1 pools, 32 pgs
objects: 0 objects, 0 B
usage: 4.0 GiB used, 76 GiB / 80 GiB avail
pgs: 32 active+clean
2.3 Add OSDs to RADOS
First, check the available disks
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy disk list mon01
---
[mon01][INFO ] Running command: sudo fdisk -l
[mon01][INFO ] Disk /dev/vda: 21.5 GB, 21474836480 bytes, 41943040 sectors
[mon01][INFO ] Disk /dev/vdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Erase the data on the disk
This must be done for every node
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy disk zap mon01 /dev/vdb
---
[mon01][WARNIN] --> Zapping: /dev/vdb
[mon01][WARNIN] --> --destroy was not specified, but zapping a whole device will
[mon01][WARNIN] Running command: /bin/dd if=/dev/zero of=/dev/vdb bs=1M count=10
[mon01][WARNIN] --> Zapping successful for: <Raw Device: /dev/vdb>
##If this step reports the error shown below, you can wipe the device manually on the node with the following dd command, reboot the node, and then retry
[mon01][WARNIN] stderr: wipefs: error: /dev/vdb: probing initialization failed: Device or resource busy
[mon01][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[root@mon01 ~]# dd if=/dev/zero of=/dev/vdb bs=512K count=1
[root@mon01 ~]# reboot
Add an OSD
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy osd create --data /dev/vdb mon01
---
[mon01][INFO ] checking OSD status...
[mon01][DEBUG ] find the location of an executable
[mon01][INFO ] Running command: sudo /bin/ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host mon01 is now ready for osd use.
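Only mon01 is shown above; the remaining OSDs are created the same way (zap the disk first, then create), for example:
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy osd create --data /dev/vdb mon02
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy osd create --data /dev/vdb mon03
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy osd create --data /dev/vdb stor04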
#Check the cluster status; all OSDs have been added successfully
[cephadm@ceph-admin ceph-cluster]$ ceph -s
cluster:
id: b5fa13e6-6b7f-4eeb-bac3-a7b6bc535109
health: HEALTH_OK
services:
mon: 1 daemons, quorum mon01
mgr: mon03(active)
osd: 4 osds: 4 up, 4 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 4.0 GiB used, 76 GiB / 80 GiB avail
pgs:
2.4 Testing
Create a storage pool
[cephadm@ceph-admin ceph-cluster]$ ceph osd pool create mypool 32 32
pool 'mypool' created
#List the pools
[cephadm@ceph-admin ceph-cluster]$ ceph osd pool ls
mypool
#Adjust the pool's replica count
[cephadm@ceph-admin ceph-cluster]$ ceph osd pool get mypool size
size: 3
[cephadm@ceph-admin ceph-cluster]$ ceph osd pool set mypool size 2
set pool 1 size to 2
#Adjust the number of PGs in the pool
[cephadm@ceph-admin ceph-cluster]$ ceph osd pool get mypool pg_num
pg_num: 32
[cephadm@ceph-admin ceph-cluster]$ ceph osd pool set mypool pg_num 64
set pool 1 pg_num to 64
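On this Mimic release, raising pg_num alone does not redistribute data; pgp_num usually has to be raised to match before the new PGs are actually used for placement:
[cephadm@ceph-admin ceph-cluster]$ ceph osd pool set mypool pgp_num 64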
Upload data
[cephadm@ceph-admin ceph-cluster]$ rados put hosts /etc/hosts -p mypool
#List the objects in the pool
[cephadm@ceph-admin ceph-cluster]$ rados ls -p mypool
hosts
#Check the object's PG/OSD mapping
[cephadm@ceph-admin ceph-cluster]$ ceph osd map mypool hosts
osdmap e20 pool 'mypool' (1) object 'hosts' -> pg 1.ea1b298e (1.e) -> up ([1,0,2], p1) acting ([1,0,2], p1)
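You can also read the object back to confirm it is intact (the local output path here is just an example):
[cephadm@ceph-admin ceph-cluster]$ rados get hosts /tmp/hosts.copy -p mypool
[cephadm@ceph-admin ceph-cluster]$ diff /etc/hosts /tmp/hosts.copy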
#Delete the object
[cephadm@ceph-admin ceph-cluster]$ rados rm hosts -p mypool
2.5 OSD Disk Replacement
Remove the failed OSD
#Stop the daemon
sudo systemctl stop ceph-osd@{osd-num}
#Mark the OSD out
ceph osd out {osd-num}
#Remove the OSD from the CRUSH map, the OSD map, and the auth database
ceph osd crush rm osd.{osd-num}
ceph osd rm {osd-num}
ceph auth rm osd.{osd-num}
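After the physical disk has been swapped, the replacement device is brought back in with the same workflow as in section 2.3; the host and device here are just examples:
ceph-deploy disk zap mon01 /dev/vdb
ceph-deploy osd create --data /dev/vdb mon01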
If an OSD disk has bad sectors, you can check the OSD latency with this command
ceph osd perf
osd commit_latency(ms) apply_latency(ms)
4 1 1
0 0 0
2 1 1
3 1 1
1 1 1
2.6 Ceph Data Consistency Checks (Scrub)
By default, the cluster runs a light scrub on every PG once a day and a deep scrub once a week. A check can also be triggered manually from the command line.
#Consistency checks operate on PGs, so a check is issued per PG id; the PG ids can be obtained with ceph pg dump
ceph pg dump
ceph pg scrub 1.19
instructing pg 1.19 on osd.2 to scrub
#Deep scrub
ceph pg deep-scrub 1.19
instructing pg 1.19 on osd.2 to deep-scrub
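If a deep scrub finds problems (the PG shows up as inconsistent in "ceph health detail"), the damaged objects can be listed and a repair triggered; a sketch using the same example PG id:
rados list-inconsistent-obj 1.19 --format=json-pretty
ceph pg repair 1.19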