What is ETag?

A web client (a browser) requests a resource from a web server.
Multiple calls for the same resource hit the server every time, and
the server sends the full response with return code 200.

What if the requested data is unchanged for most of the calls? Could the client somehow help the server by telling it which version of the resource it already has?

ETags:

  • The purpose of an ETag is to save bandwidth and make use of client-side caching.
  • On the first request for a resource, the client receives a hash of its content in the ETag response header.
  • On subsequent calls, the client sends this hash back to the server (in the If-None-Match request header).
  • The server checks whether the requested resource has been modified: it recalculates the hash and compares it against the received one.
  • If the ETag is the same, the server returns 304 (Not Modified) and skips the body.
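A minimal sketch of this exchange as a Go HTTP handler; the /resource path, the SHA-1 hash, and the fixed content are illustrative assumptions, not anything the ETag mechanism mandates:

package main

import (
    "crypto/sha1"
    "fmt"
    "net/http"
)

var content = []byte("hello, world\n")

func handler(w http.ResponseWriter, r *http.Request) {
    // The ETag is just a hash of the current content.
    etag := fmt.Sprintf(`"%x"`, sha1.Sum(content))
    w.Header().Set("ETag", etag)

    // The client echoes the hash back in If-None-Match; if it still
    // matches, the content is unchanged and we skip sending the body.
    if r.Header.Get("If-None-Match") == etag {
        w.WriteHeader(http.StatusNotModified) // 304
        return
    }

    w.WriteHeader(http.StatusOK) // 200, full body
    w.Write(content)
}

func main() {
    http.HandleFunc("/resource", handler)
    http.ListenAndServe(":8080", nil)
}

A second request that carries the previously received ETag in If-None-Match gets the 304 with no body, which is where the bandwidth saving comes from.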


Golang Runtime and Concurrency

  • Golang uses a user-space component (the runtime) linked into the executable.
  • The runtime was originally written in C (recent releases implement it largely in Go and assembly).
  • It implements the scheduler, goroutine management, and OS-thread management.
  • Per Go process there is a maximum limit on OS threads (10,000 by default, adjustable with runtime/debug.SetMaxThreads).
  • The Go runtime schedules N goroutines onto M OS threads.
  • At any instant, a goroutine runs on exactly one thread.
  • A goroutine can get blocked (e.g. on a syscall), and that blocks its OS thread too.
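A tiny sketch of the N-goroutines-on-M-threads idea (the numbers here are arbitrary): GOMAXPROCS caps the threads that execute Go code in parallel, while the runtime multiplexes many more goroutines on top of them.

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func main() {
    runtime.GOMAXPROCS(2) // at most 2 threads run Go code in parallel (M)

    var wg sync.WaitGroup
    for i := 0; i < 100; i++ { // N = 100 goroutines
        wg.Add(1)
        go func() {
            defer wg.Done()
            // A sleeping goroutine is parked by the scheduler; the
            // threads keep running other goroutines in the meantime.
            time.Sleep(10 * time.Millisecond)
        }()
    }

    fmt.Println("goroutines:", runtime.NumGoroutine())
    fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    wg.Wait()
}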


Linux Device Driver Development: Block Device Driver

This is my very first interaction with the Linux kernel at the device driver level. My objective is to develop a very simple block device driver that just forwards I/O requests to a virtual device. This post records my observations while getting started on the problem.

Block vs. Character Device

Linux supports block and character device drivers. Only block devices can host a filesystem. Block devices support random read/write operations. Each block is composed of sectors, usually 512 bytes long and uniquely addressable. A block is a logical entity; filesystems usually use 4096-byte blocks (8 * 512 bytes, i.e. 8 sectors). In the Linux kernel, a block device is represented as a logical entity (essentially just a C structure). So we can export anything as a block device as long as we can facilitate read/write operations at the sector level.

The device driver is the layer that glues the Linux kernel to the device. The kernel receives device-targeted I/O requests from applications. All I/O requests pass through the buffer cache and the I/O scheduler. The latter arranges I/O requests optimally to improve seek time, on the assumption that the requests will run against a rotating disk. In fact, the Linux kernel has several I/O schedulers, so more than one request ordering is possible.

A device driver typically implements a request queue. The Linux I/O scheduler enqueues requests into the driver’s queue; how to serve those requests is the device driver’s headache. The request queue is represented by the request_queue structure, defined in “blkdev.h”. The driver dequeues requests from this queue, sends them to the device, and then acknowledges each request with a completion/error status.

If a device does not need an optimized I/O order, it may opt for direct handling of I/O requests. An excellent example of such a driver is the loopback driver (loop.c, loop.h). It handles struct bio, which stands for block I/O. A bio structure is a scatter-gather list of page-aligned buffers (usually 4 KB). Handling a bio structure is almost the same as handling a struct request.

What are the requirements for my driver?

 

  • Runs on flash storage drives
  • Performs plain I/O forwarding
  • Minimal overhead, minimal code size

In my next post, I will discuss the design of my driver.

MongoDB: Good to know things

  • MongoDB is a NoSQL, free, open-source solution that is highly scalable, highly available, and high performance.
  • The engine is coded in C++.
  • It works in a client-server model.
  • Major components:
    • mongod: the storage server
    • mongos: the sharding (routing) server
    • config server(s):
      • Store the metadata that makes sharding work
      • Each is actually a mongod process
  • MongoDB provides durability for write operations with journaling (write-ahead logging).
  • User data is seen as a database of collections of records (documents).
    • A collection is roughly similar to a table in an RDBMS.
    • A record could be mapped to a row in a table (not exact, but it helps understanding).
  • MongoDB stores data in BSON format (on the wire and on disk).
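As a quick illustration of the client-server model and BSON documents, here is a hedged sketch using the official Go driver (go.mongodb.org/mongo-driver, v1.x API); the database, collection, and field names are made up for the example:

package main

import (
    "context"
    "fmt"
    "log"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    ctx := context.Background()

    // Connect to a local mongod (the storage server).
    client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(ctx)

    // A collection is roughly a table; a document is roughly a row.
    users := client.Database("test").Collection("users")

    // The driver encodes this document as BSON on the wire;
    // mongod stores it as BSON on disk.
    _, err = users.InsertOne(ctx, bson.D{{Key: "name", Value: "alice"}, {Key: "age", Value: 30}})
    if err != nil {
        log.Fatal(err)
    }

    var doc bson.M
    if err := users.FindOne(ctx, bson.D{{Key: "name", Value: "alice"}}).Decode(&doc); err != nil {
        log.Fatal(err)
    }
    fmt.Println(doc)
}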

Rules for abort handling

An “abort” is a special type of error in a system, usually injected by an external actor. In a multi-threaded application, managing abort requests becomes painful. I am sharing a few observations that could reduce mistakes.

  • Implement one single handler for abort requests.
  • Outside the handler, if a thread is about to wait and an abort may arrive in the meantime, the thread should check for a pending abort as its first task.
  • Use locks (or atomics) if abort is signalled through a flag.
  • Never mix the abort path with the regular path of the application; it is not wise to scatter abort-related functionality among other threads.
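These rules map naturally onto Go, which I use here only as an illustration: a single cancellation signal (a context) plays the role of the one abort handler, and the worker checks it before and during every wait instead of scattering abort logic around.

package main

import (
    "context"
    "fmt"
    "time"
)

func worker(ctx context.Context, jobs <-chan int) {
    for {
        // Rule: about to wait, so check for a pending abort first.
        if ctx.Err() != nil {
            fmt.Println("worker: abort already requested, not waiting")
            return
        }
        select {
        case <-ctx.Done(): // the single abort signal
            fmt.Println("worker: aborted while waiting")
            return
        case j := <-jobs:
            fmt.Println("worker: processed job", j)
        }
    }
}

func main() {
    ctx, abort := context.WithCancel(context.Background())
    jobs := make(chan int)

    go worker(ctx, jobs)

    jobs <- 1 // regular path
    abort()   // the one place that requests an abort
    time.Sleep(100 * time.Millisecond)
}

The context replaces a hand-rolled flag-plus-lock: cancellation is already safe to observe from any goroutine.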

A design of abort handling module

Handling the abort of an operation is essential for any piece of software. An abort represents:

  • An error condition
    • Internal errors
    • Subsystem errors
  • A user requested abort

The requirements of abort handling are:

  • Quickness: the ability to respond to an abort promptly
  • Reliability: the measure of an abort being accepted at any phase of software execution
  • Robustness: resources are cleaned up and the system is left in a usable, stable state (no panic, exception, or error)

Assume there are two threads/processes: one performs the job and another accepts requests from the user. An abort can be requested by the user, or the job thread itself may get aborted (by an internal error or an error from another subsystem/layer).

In my experience, there are at least two approaches to abort handling:

  1. Using a check at multiple points in the execution path:

     if (is_aborted) {
         goto exit_fun1;
     }

  2. Treating abort as an event and creating an event handler:

     abort_me()
     {
         enqueue(abort_event);
     }

     abort_handler(event e)
     {
         switch (e.type) {
         case USER_ABORT:
             ...
             break;
         case INTERNAL_ABORT:
             ...
             break;
         default:
             break;
         }
     }

The first approach litters the code with checks. We also need a flag variable to record that an abort occurred, and reading that flag atomically requires a lock. In the long run this makes the code messy, and things become difficult to manage and understand. A sample could look as follows:

a()
{
    if (is_aborted()) return;
    b();
    if (is_aborted()) return;
    c();
    if (is_aborted()) return;
}

b()
{
    if (is_aborted()) return;
    ...
}

The latter approach is based on event handling. It needs more work, but it is cleaner and more intuitive. One possible implementation represents the software as a state machine. In a state machine, all events are managed through at least one queue. The state handler handles an abort event by switching to a dedicated state (named “abort” or “error”); the cleanup and other appropriate actions live in that state’s handler.

state_machine()
{
    e = dequeue_event();

    switch (cur_state) {
    case state_A:
        A_handler(e);
        break;
    case state_B:
        B_handler(e);
        break;
    case state_Abort:
        Abort_handler(e);
        break;
    default:
        assert(0);
    }
}

This model is discrete, predictable, and manageable. We need no lock and no flag checks, and we get better control and visibility of where an abort happened and where it was handled.
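The pseudocode above is C-flavoured; as a rough translation only, the same event-queue idea can be sketched in Go with a channel as the queue and a switch on the current state (the event and state names here are invented for the example):

package main

import "fmt"

type event int

const (
    evWork event = iota
    evUserAbort
    evInternalAbort
)

type state int

const (
    stateRunning state = iota
    stateAborted
)

// stateMachine drains a single event queue. An abort is just another
// event: the handler switches to a dedicated abort state and does the
// cleanup there. No abort flag, no lock around it.
func stateMachine(events <-chan event) {
    cur := stateRunning
    for e := range events {
        switch cur {
        case stateRunning:
            switch e {
            case evWork:
                fmt.Println("running: doing work")
            case evUserAbort, evInternalAbort:
                fmt.Println("running: abort event, cleaning up")
                cur = stateAborted
            }
        case stateAborted:
            fmt.Println("aborted: ignoring event", e)
        }
    }
}

func main() {
    events := make(chan event, 8) // the event queue
    events <- evWork
    events <- evUserAbort
    events <- evWork
    close(events)
    stateMachine(events)
}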