Gluster Developer Guide - Part 1
Hello, this is the first in a series of blog posts that I will be doing as an introduction to Gluster from a developer's perspective.
Some terminology from the Gluster world:
- Node: A server in the Gluster distributed system.
- Storage Pool: A trusted set of nodes whose storage space is pooled together. We use gluster peer commands to add nodes to and delete nodes from the pool. The first gluster peer probe, run from one node with another node's IP, performs a handshake between them and creates a pool of two nodes. Subsequent additions are done by running a peer probe from any node already in the trusted storage pool with the IP of the node to be added.
- Brick: A filesystem partition on one of the nodes of the trusted storage pool. Some Gluster features may require the brick to have special properties; for example, snapshots require it to be an LVM logical volume from a thin pool.
- Volume: The entity in Gluster terminology that combines a number of bricks in a certain configuration to provide a single namespace for files. From the perspective of an end user, it is simply a filesystem.
Say I have two nodes in my setup. I will perform a peer probe of rastarserver2.rastarnet from rastarserver1.rastarnet to create a trusted storage pool of two nodes. I have two partitions on each node, mounted at /gluster/export/brick1 and /gluster/export/brick2. On both nodes, brick1 is 1 TB and brick2 is 2 TB in size.
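The pool setup above boils down to a single probe command run on the first node (these are cluster commands, so they assume a running glusterd on both nodes):

```shell
# From rastarserver1.rastarnet: handshake with the second node
gluster peer probe rastarserver2.rastarnet

# Verify that both nodes are now in the trusted storage pool
gluster pool list
```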
To create a volume in Gluster, we use the gluster volume create command:
gluster volume create xcube \
  rastarserver1.rastarnet:/gluster/export/brick1 \
  rastarserver2.rastarnet:/gluster/export/brick1 \
  rastarserver1.rastarnet:/gluster/export/brick2 \
  rastarserver2.rastarnet:/gluster/export/brick2
Now we have a volume of 4 bricks in the default configuration (distribute only).
A distribute configuration means a file created on volume xcube will reside on one and only one of the 4 bricks that make up the volume. Files are distributed evenly across the given set of bricks; the even distribution is ensured by hashing the file names.
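The idea can be sketched in a few lines of Python. This is a toy illustration, not Gluster's actual implementation: the real distribute (DHT) xlator uses the Davies-Meyer hash and assigns each brick a hash range per directory, but the principle is the same — the file name alone decides the brick, so every client computes the same answer with no central lookup.

```python
import hashlib

# The four bricks of our example volume "xcube".
BRICKS = [
    "rastarserver1.rastarnet:/gluster/export/brick1",
    "rastarserver2.rastarnet:/gluster/export/brick1",
    "rastarserver1.rastarnet:/gluster/export/brick2",
    "rastarserver2.rastarnet:/gluster/export/brick2",
]

def pick_brick(filename: str) -> str:
    # Hash the file NAME (not its contents or path) and map the
    # digest onto one of the bricks.
    digest = hashlib.md5(filename.encode()).hexdigest()
    return BRICKS[int(digest, 16) % len(BRICKS)]

print(pick_brick("notes.txt"))
```

Because the hash is uniform over file names, a large set of files spreads roughly evenly across the bricks.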
The concept of xlators:
If you visualize the whole system, a Gluster volume is a filesystem to the end user and nothing more; underneath, it is composed of many partitions (and hence filesystems). This showcases the core idea of Gluster: a transparent filesystem on top of existing partitions.
In the simplest case an open call entering the Gluster filesystem would end up calling open on one of the underlying filesystems.
In any non-trivial case, however, we do some magic on (that is, transform) the incoming data, the incoming call, or both before we hand the call over to the underlying filesystem.
An xlator (translator) is an entity that defines what transformations it performs as part of which syscalls.
To borrow from the Gluster documentation directly:
- A translator converts requests from users into requests for storage: one to one, one to many, or one to zero (e.g. caching).
- A translator can modify requests on the way through: convert one request type to another (during the request transfer amongst the translators), or modify paths, flags, even data (e.g. encryption).
- Translators can intercept or block requests (e.g. access control).
- Translators can spawn new requests (e.g. pre-fetch).
Hence a Gluster volume is a combination of a number of bricks and a number of xlators arranged in a graph.
The bricks are the leaves of the graph, so the graph has as many leaves as there are bricks in the volume.
A system call that enters the Gluster filesystem enters at the root node of the graph and reaches multiple leaves, exactly one leaf, or none; the return value retraces the same path in reverse.
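This call flow can be sketched as a toy graph in Python. This is an assumption-laden illustration, not Gluster's actual C xlator API: here every inner node fans the call out to all of its children, whereas a real distribute xlator would route an open to exactly one child, as covered above.

```python
class Xlator:
    """A node in a toy xlator graph; leaves stand in for bricks."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def open(self, path):
        # A leaf (brick) answers directly; an inner node passes the
        # call down and the return values retrace the path in reverse.
        if not self.children:
            return [f"{self.name} opened {path}"]
        results = []
        for child in self.children:
            results.extend(child.open(path))
        return results

# Build a tiny graph: fuse -> distribute -> two posix bricks.
brick1 = Xlator("posix-brick1")
brick2 = Xlator("posix-brick2")
dht = Xlator("distribute", [brick1, brick2])
fuse = Xlator("fuse", [dht])

print(fuse.open("/notes.txt"))
```

The call enters at the root (fuse), winds down through the graph, and the per-brick results bubble back up along the same path.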
Once a volume has been created and started, we will have one glusterfsd (gluster filesystem daemon) process running per brick, on that brick's node.
In our example, we have two bricks on each node, so we will have two glusterfsd processes on each node.
The glusterfsd process represents a section of the graph starting from one brick at the bottom as a leaf node up to some node, but not all the way up to the root node.
The sole purpose of the glusterfsd process is to listen for incoming operations on its particular brick. It has a protocol/server xlator at the top, which is an RPC server implementation of the GlusterFS protocol. At the lowest level is the posix xlator, which makes the calls into the underlying filesystem.
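A brick's section of the graph is written out by glusterd as a "volfile". A heavily abridged, hand-written sketch (the real generated file contains many more xlators and options in between) might look like:

```
volume xcube-posix
    type storage/posix
    option directory /gluster/export/brick1
end-volume

volume xcube-server
    type protocol/server
    subvolumes xcube-posix
end-volume
```

Each volume/end-volume stanza is one xlator, and subvolumes wires it to its children, encoding the graph from the posix leaf up to the protocol/server top.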
Once the volume is mounted, we have a glusterfs process (Gluster filesystem process) running.
The glusterfs process represents a section of the graph starting from the root node at the top down to some child node or nodes, but not all the way down to the leaves.
It is the responsibility of the glusterfs process to accept incoming syscalls/API calls from users through a master xlator (fuse, gfapi, or gNFS) and send them to one or more glusterfsd processes through the protocol/client xlator at the bottom, which is an RPC client implementation of the GlusterFS protocol.
Combining the section of the graph from a mount point (the glusterfs process) with all the brick processes (the glusterfsd processes) gives us the complete graph corresponding to the volume we created.
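You can see both halves of the graph in action on our example setup. These commands assume a running cluster and an existing mount point (/mnt/xcube here is an arbitrary choice):

```shell
# Start the volume, then mount it on a client
gluster volume start xcube
mount -t glusterfs rastarserver1.rastarnet:/xcube /mnt/xcube

# On each server node: one glusterfsd per brick (two in our example)
ps aux | grep glusterfsd

# On the client: the glusterfs mount process
ps aux | grep glusterfs
```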