What is ETag?

A web client (for example, a browser) requests a resource from a web server.
Without caching, every repeated request for the same resource hits the server,
and the server sends the full response with status code 200 each time.

What if the requested data is unchanged for most of the calls? Could the client help the server by telling it which version of the resource it already holds?

  • The purpose of an ETag is to save bandwidth by making use of client-side caching.
  • On the first request for a resource, the client receives a hash of the content in the ETag response header.
  • On the next request, the client sends that hash back in the If-None-Match request header.
  • The server checks whether the requested resource has been modified: it recalculates the hash and compares it against the received hash.
  • If the ETag is the same, the server returns 304 (Not Modified) with no body, and the client reuses its cached copy.
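
The exchange above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular server implements it: the function names make_etag and handle_get are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Hypothetical helper: derive a quoted ETag from a hash of the content.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def handle_get(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET on a resource.

    body is the current content of the resource on the server;
    if_none_match is the ETag the client received earlier, if any.
    """
    etag = make_etag(body)
    if if_none_match == etag:
        # Content unchanged: no body is sent, the client reuses its cache.
        return 304, {"ETag": etag}, b""
    # First request, or the content changed: send the full response.
    return 200, {"ETag": etag}, body
```

A first call such as handle_get(b"hello") returns 200 with the ETag header. Passing that ETag back as if_none_match returns 304 for as long as the content stays the same.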

Written with StackEdit.


Ceph RGW Storage Policies

A storage policy allows custom placement of user buckets on RADOS pools. A bucket creation request only needs to specify the location constraint. Default bucket placement uses common data and index pools, and a pool is the physical isolation unit for data in Ceph. Custom bucket placement lets us use different pools for data and index. It addresses the problems of an oversized pool, maintenance on a pool, and expansion of a pool.

S3 Bucket Location Constraint
A bucket PUT request can carry a location constraint as follows:
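
As a hedged sketch, the request body can be built in Python. The CreateBucketConfiguration and LocationConstraint element names are the standard S3 ones. The helper name create_bucket_body is hypothetical, and the constraint value is assumed to follow the zonegroup api name, colon, placement target form used in the client code in this post.

```python
import xml.etree.ElementTree as ET

# Standard S3 XML namespace.
S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def create_bucket_body(placement_id, zonegroup_api_name=""):
    """Build the XML body of a bucket PUT that names a placement target.

    Hypothetical helper. The constraint is assumed to have the form
    "<zonegroup api name>:<placement target>".
    """
    root = ET.Element("CreateBucketConfiguration", xmlns=S3_NS)
    constraint = ET.SubElement(root, "LocationConstraint")
    constraint.text = f"{zonegroup_api_name}:{placement_id}"
    return ET.tostring(root, encoding="unicode")
```

Calling create_bucket_body("special-placement") yields a body whose LocationConstraint element contains ":special-placement".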


Use Cases for a cluster

  1. A set of graded pools backed by SSD, HDD, or hybrid storage media. We can match user data latency requirements to a suitable pool.

  2. Replicated and EC pools for data help us host hot and cold data efficiently in the same cluster.

  3. Template-based cluster expansion, where we create new pools for new storage needs.

  4. Physical isolation of critical accounts.


Method to place a bucket

  • Update the zonegroup configuration to add a new bucket placement rule.
  • Update and commit the RGW period.
  • Update the zone configuration to add an entry for the new placement, with pools for data, index and extra.
  • Update and commit the RGW period.
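
The four steps above map onto radosgw-admin invocations. The sketch below only builds the argument lists and does not run anything. The function name placement_commands is hypothetical, and the flags are the ones documented for radosgw-admin placement management, so check them against your Ceph release before use.

```python
def placement_commands(zonegroup, zone, placement_id,
                       index_pool, data_pool, data_extra_pool):
    """Return the radosgw-admin commands, in order, as argv lists."""
    commit = ["radosgw-admin", "period", "update", "--commit"]
    return [
        # 1. Add the placement target to the zonegroup.
        ["radosgw-admin", "zonegroup", "placement", "add",
         "--rgw-zonegroup", zonegroup,
         "--placement-id", placement_id],
        # 2. Update and commit the period.
        list(commit),
        # 3. Map the placement target to pools in the zone.
        ["radosgw-admin", "zone", "placement", "add",
         "--rgw-zone", zone,
         "--placement-id", placement_id,
         "--index-pool", index_pool,
         "--data-pool", data_pool,
         "--data-extra-pool", data_extra_pool],
        # 4. Update and commit the period again.
        list(commit),
    ]
```

With the names used later in this post, the call would be placement_commands("in", ".in-chennai-1", "special-placement", ".in-chennai-1.rgw.buckets.index", ".in-chennai-1.rgw.special.storage", ".in-chennai-1.rgw.buckets.non-ec"). Each returned list can be passed to subprocess.run on an admin node.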

Client code for bucket creation

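import boto3

access_key = 'abcde'
secret_key = '12345'
endpoint_url = ''  # set this to the RGW endpoint URL

# Create an S3 client that talks to the RGW endpoint.
conn = boto3.client(service_name='s3',
                    aws_access_key_id=access_key,
                    aws_secret_access_key=secret_key,
                    endpoint_url=endpoint_url)

# The constraint is "<zonegroup api name>:<placement target>";
# the zonegroup part is left empty here.
conn.create_bucket(Bucket='ourbucket',
                   CreateBucketConfiguration=
                    {'LocationConstraint': ":special-placement"})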


# radosgw-admin zonegroup get
{
    "id": "696b9fca-bf82-4030-85fd-9b4281241c66",
    "name": "in",
    "api_name": "in",
    "is_master": "true",
    "endpoints": [],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad",
    "zones": [
        {
            "id": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad",
            "name": ".in-chennai-1",
            "endpoints": [],
            "log_meta": "false",
            "log_data": "false",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": []
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        },
        {
            "name": "special-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "5f04c150-b66b-4d05-ba31-38354eaf6ec0"
}

# radosgw-admin zone get
{
    "id": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad",
    "name": ".in-chennai-1",
    "domain_root": ".in-chennai-1.rgw.meta:root",
    "control_pool": ".in-chennai-1.rgw.control",
    "gc_pool": ".in-chennai-1.rgw.log:gc",
    "lc_pool": ".in-chennai-1.rgw.log:lc",
    "log_pool": ".in-chennai-1.rgw.log",
    "intent_log_pool": ".in-chennai-1.rgw.log:intent",
    "usage_log_pool": ".in-chennai-1.rgw.log:usage",
    "reshard_pool": ".in-chennai-1.rgw.log:reshard",
    "user_keys_pool": ".in-chennai-1.rgw.meta:users.keys",
    "user_email_pool": ".in-chennai-1.rgw.meta:users.email",
    "user_swift_pool": ".in-chennai-1.rgw.meta:users.swift",
    "user_uid_pool": ".in-chennai-1.rgw.meta:users.uid",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": ".in-chennai-1.rgw.buckets.index",
                "data_pool": ".in-chennai-1.rgw.buckets.data",
                "data_extra_pool": ".in-chennai-1.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": ""
            }
        },
        {
            "key": "special-placement",
            "val": {
                "index_pool": ".in-chennai-1.rgw.buckets.index",
                "data_pool": ".in-chennai-1.rgw.special.storage",
                "data_extra_pool": ".in-chennai-1.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": ""
            }
        }
    ],
    "metadata_heap": "",
    "tier_config": [],
    "realm_id": "5f04c150-b66b-4d05-ba31-38354eaf6ec0"
}

# radosgw-admin bucket stats --bucket=ourbucket
{
    "bucket": "ourbucket",
    "zonegroup": "696b9fca-bf82-4030-85fd-9b4281241c66",
    "placement_rule": "special-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad.646695.2",
    "marker": "5221e79f-0ce1-4e9b-82d7-3b0095bba8ad.646695.2",
    "index_type": "Normal",
    "owner": "rgwadmin",
    "ver": "0#3",
    "master_ver": "0#0",
    "mtime": "2018-06-16 00:24:22.644511",
    "max_marker": "0#",
    "usage": {
        "rgw.main": {
            "size": 7707,
            "size_actual": 8192,
            "size_utilized": 7707,
            "size_kb": 8,
            "size_kb_actual": 8,
            "size_kb_utilized": 8,
            "num_objects": 1
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}


Notes on RGW System Object State

The RGW raw object state has the following structure:

// rgw/rgw_rados.h
struct RGWRawObjState {
  rgw_raw_obj obj;
  bool has_attrs{false};
  bool exists{false};
  uint64_t size{0};
  ceph::real_time mtime;
  uint64_t epoch;
  bufferlist obj_tag;
  bool has_data{false};
  bufferlist data;
  bool prefetch_data{false};
  uint64_t pg_ver{0};

  /* important! don't forget to update copy constructor */

  RGWObjVersionTracker objv_tracker;

  map<string, bufferlist> attrset;
  RGWRawObjState() {}
  // ... (rest of the struct, including the copy constructor, not shown)
};


Notes on RGW Request Path

The principal class is RGWOp. It holds the request state and a pointer to the RGWRados store.
An RGW request struct, req_state, has:

  • Ceph context
  • op type info
  • account, bucket info
  • zonegroup name
  • RGWBucketInfo bucket_info
  • RGWUserInfo *user

Op Execution

RGWGetObj::execute() is the primary execution context under the class RGWGetObj. It uses the interfaces of class RGWRados::Object to perform I/O ops. The read op carries various information such as zone id, pg version, mod_ptr, and object size.
Next, RGWRados::Object::prepare( ) is called.


Notes on RGW Manifest

RGW maintains a manifest for each object. The class RGWObjManifest implements the details, including the placement of the object's head and tail.
The manifest is written as an xattr, along with the other attributes, in RGWRados::Object::Write::_do_write_meta( ).

 * Write/overwrite an object to the bucket storage.
 * bucket: the bucket to store the object in
 * obj: the object name/key
 * data: the object contents/value
 * size: the amount of data to write (data must be this long)
 * accounted_size: original size of data before compression, encryption
 * mtime: if non-NULL, writes the given mtime to the bucket storage
 * attrs: all the given attrs are written to bucket storage for the given object
 * exclusive: create object exclusively
 * Returns: 0 on success, -ERR# otherwise.


Notes on Ceph librados Client

Cluster Connection

  • A client is an application that uses librados to connect to a Ceph cluster.

  • It needs a cluster object populated with cluster info (cluster name, info from ceph.conf).

  • The client then calls rados_connect, which populates the cluster handle.

  • A cluster handle can bind to different pools.

Cluster IO context

  • I/O happens on a pool, so the connection needs to bind to a pool.

  • The connection to a pool gives the client an I/O context.

  • The client only specifies an object name/xattr, and librados maps it to a PG and OSD in the cluster.

  • An object write to RADOS requires a key, a value, and the value size.

  • librados::bufferlist is primarily used for storing the object value.
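
The connection and I/O context steps above can be sketched with the Python rados binding, which wraps the same librados calls. This is a minimal sketch under assumptions: the helper names connect and write_and_read are hypothetical, the conffile path is the conventional default, and the rados module (python3-rados) must be installed on a host that can reach the cluster.

```python
def connect(conffile="/etc/ceph/ceph.conf"):
    """Create a cluster handle from ceph.conf and connect it."""
    import rados  # python3-rados binding, imported lazily (not stdlib)
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    return cluster


def write_and_read(cluster, pool, key, value):
    """Write value under key in pool, then read it back.

    cluster is a connected cluster handle, e.g. from connect().
    """
    ioctx = cluster.open_ioctx(pool)  # bind the connection to one pool
    try:
        # The client only names the object; librados picks the PG and OSD.
        ioctx.write_full(key, value)
        return ioctx.read(key, len(value))
    finally:
        ioctx.close()
```

On a host with cluster access, usage would look like: cluster = connect(), then write_and_read(cluster, "mypool", "greeting", b"hello"), then cluster.shutdown(). The pool name "mypool" is a placeholder and the pool must already exist.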

