Ceph RGW Storage Policies

The storage policy feature allows custom placement of user buckets on chosen RADOS pools. A bucket-creation request only needs to specify a location constraint. By default, every bucket shares the common data and index pools, and a pool is the unit of physical isolation for data in Ceph. A custom bucket placement therefore lets us use different pools for data and index, which addresses the problems of an oversized pool, maintenance on a pool, and expansion of a pool.

S3 Bucket Location Constraint
A bucket PUT request can carry a location constraint as follows:

<CreateBucketConfiguration>
  <LocationConstraint>BucketRegion</LocationConstraint>
</CreateBucketConfiguration>

Use Cases for a cluster

  1. A set of graded pools using SSD, HDD, or hybrid storage media. We can match a user's data latency requirements to a suitable pool.

  2. Replicated and erasure-coded (EC) data pools let us efficiently host hot and cold data in the same cluster.

  3. Template-based cluster expansion, where we create new pools for new storage needs.

  4. Physical isolation of critical accounts.


Method to place a bucket

  • Update the zonegroup configuration to add a new bucket placement rule.
  • Commit and update the RGW period.
  • Update the zone configuration to add an entry for the new placement target, with pools for data, index, and extra data.
  • Commit and update the RGW period.
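As a sketch, the steps above map onto `radosgw-admin` commands roughly as follows. The pool, zone, and zonegroup names are taken from the example output later in this document; the exact flags may vary by Ceph release, so verify them against your version before running.

```shell
# Sketch only -- names match the example output in this document.

# 0. Create the dedicated data pool first (PG count is illustrative).
ceph osd pool create .in-chennai-1.rgw.special.storage 64 64

# 1. Add the placement target to the zonegroup, then commit the period.
radosgw-admin zonegroup placement add \
    --rgw-zonegroup=in --placement-id=special-placement
radosgw-admin period update --commit

# 2. Add the matching entry (data, index, extra pools) to the zone,
#    then commit the period again.
radosgw-admin zone placement add \
    --rgw-zone=.in-chennai-1 --placement-id=special-placement \
    --data-pool=.in-chennai-1.rgw.special.storage \
    --index-pool=.in-chennai-1.rgw.buckets.index \
    --data-extra-pool=.in-chennai-1.rgw.buckets.non-ec
radosgw-admin period update --commit
```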

Client code for bucket creation

import boto3

access_key = 'abcde'
secret_key = '12345'
endpoint_url = 'http://10.32.34.157'

conn = boto3.client(service_name='s3',
                    aws_access_key_id=access_key,
                    aws_secret_access_key=secret_key,
                    endpoint_url=endpoint_url)

# An empty region followed by ":special-placement" selects the
# placement target configured in the zonegroup/zone.
conn.create_bucket(Bucket='ourbucket',
                   CreateBucketConfiguration={
                       'LocationConstraint': ':special-placement'})

Output

# radosgw-admin zonegroup get
{
    "id": "696b9fca-bf82-4030-85fd-9b4281241c66",
    "name": "in",
    "api_name": "in",
    "is_master": "true",
    "endpoints": [],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad",
    "zones": [
        {
            "id": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad",
            "name": ".in-chennai-1",
            "endpoints": [],
            "log_meta": "false",
            "log_data": "false",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": []
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        },
        {
            "name": "special-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "5f04c150-b66b-4d05-ba31-38354eaf6ec0"
}
# radosgw-admin zone get
{
    "id": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad",
    "name": ".in-chennai-1",
    "domain_root": ".in-chennai-1.rgw.meta:root",
    "control_pool": ".in-chennai-1.rgw.control",
    "gc_pool": ".in-chennai-1.rgw.log:gc",
    "lc_pool": ".in-chennai-1.rgw.log:lc",
    "log_pool": ".in-chennai-1.rgw.log",
    "intent_log_pool": ".in-chennai-1.rgw.log:intent",
    "usage_log_pool": ".in-chennai-1.rgw.log:usage",
    "reshard_pool": ".in-chennai-1.rgw.log:reshard",
    "user_keys_pool": ".in-chennai-1.rgw.meta:users.keys",
    "user_email_pool": ".in-chennai-1.rgw.meta:users.email",
    "user_swift_pool": ".in-chennai-1.rgw.meta:users.swift",
    "user_uid_pool": ".in-chennai-1.rgw.meta:users.uid",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": ".in-chennai-1.rgw.buckets.index",
                "data_pool": ".in-chennai-1.rgw.buckets.data",
                "data_extra_pool": ".in-chennai-1.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": ""
            }
        },
        {
            "key": "special-placement",
            "val": {
                "index_pool": ".in-chennai-1.rgw.buckets.index",
                "data_pool": ".in-chennai-1.rgw.special.storage",
                "data_extra_pool": ".in-chennai-1.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": ""
            }
        }
    ],
    "metadata_heap": "",
    "tier_config": [],
    "realm_id": "5f04c150-b66b-4d05-ba31-38354eaf6ec0"
}
# radosgw-admin bucket stats --bucket=ourbucket
{
    "bucket": "ourbucket",
    "zonegroup": "696b9fca-bf82-4030-85fd-9b4281241c66",
    "placement_rule": "special-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad.646695.2",
    "marker": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad.646695.2",
    "index_type": "Normal",
    "owner": "rgwadmin",
    "ver": "0#3",
    "master_ver": "0#0",
    "mtime": "2018-06-16 00:24:22.644511",
    "max_marker": "0#",
    "usage": {
        "rgw.main": {
            "size": 7707,
            "size_actual": 8192,
            "size_utilized": 7707,
            "size_kb": 8,
            "size_kb_actual": 8,
            "size_kb_utilized": 8,
            "num_objects": 1
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}

EMC/VMware VAAI (vStorage API for Array Integration)

  • VAAI (vStorage API for Array Integration) is an EMC/VMware API that lets a hardware array expose certain of its features to the hypervisor.
  • The array typically offers hardware acceleration for operations including:
    • Clone Blocks / XCOPY, because an intra-array copy is faster than copying through the host.
    • Zero Blocks / Write Same
    • Block delete
    • Atomic Test and Set (ATS), used on VMFS volumes
    • Thin provisioning
    • Native snapshot support
  • The storage array must support hardware acceleration through VAAI.

Ceph and VAAI

  • An ESXi datastore needs a storage array with an iSCSI interface.
  • All ESXi writes are synchronous writes.
  • Ceph's block device solution is RBD (RADOS Block Device), which can act as the back end to an iSCSI front end.
  • The iSCSI front end could be:
    • TGT (Linux SCSI target framework)
      • Mostly userspace code.
      • Not VAAI compliant.
    • LIO (Linux-IO)
      • VAAI compliant.
    • SCST
    • NFS
  • ESXi operations such as snapshot consolidation, Storage vMotion, and cloning use 64 KB I/Os. Assuming a latency of 4 ms per operation: 1/0.004 = 250 IOPS, and 64 KB × 250 = 16 MB/s.
  • I/O from VMs is passed through to Ceph at whatever size it was submitted.
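The back-of-envelope throughput figure in the list above can be checked with a few lines of Python; the 64 KB I/O size and 4 ms per-operation latency are the assumptions stated in the text.

```python
# Throughput estimate for 64 KB I/Os at 4 ms per operation (queue
# depth 1), as assumed in the bullet above.
io_size_kb = 64
latency_s = 0.004

iops = 1 / latency_s                         # 250 operations/second
throughput_mb_s = iops * io_size_kb / 1000   # 16.0 MB/s

print(f"{iops:.0f} IOPS -> {throughput_mb_s:.0f} MB/s")
# → 250 IOPS -> 16 MB/s
```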

Resources

  1. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003179.html
  2. http://linux-iscsi.org/wiki/VStorage_APIs_for_Array_Integration
  3. http://geekfluent.com/2012/03/26/what-will-vaai-v2-do-for-you-part-2-of-2-nfs/
  4. https://www.spinics.net/lists/ceph-devel/msg29124.html