Clustering

[Figure: Sharding with a Baratine Cluster (example-sharding-db.png)]

Overview

Baratine clustering is used to assign servers to applications with the following benefits:

  • Microservices can failover to backup servers for reliability
  • Replication of shared data
  • Partitioning of services across many servers for scalability

There are two cluster processes in Baratine to make this work:

  • Join servers into a cluster with heartbeats between seed servers
  • Allocate servers to applications, with a pod assigned to each application.

Services are deployed to pods, which is Baratine’s terminology for a virtual cluster. A pod is a subset of servers within a physical Baratine cluster.

Within each pod are virtual nodes, which map to physical servers. A service instance is deployed to each virtual node. Baratine will hash the URL and send the request to the right instance. For example, “/hello/abc” might be hashed to node 2 and “/hello/def” to node 3. As a consequence, a service that doesn’t support children (i.e. doesn’t implement @OnLookup) will always hash to the same node.
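As a hedged illustration of a service with children, a parent service might look like the following sketch. The HelloService class, its address, the package names, and the exact @OnLookup signature are assumptions for this example; only the @OnLookup annotation itself comes from the text above.

import io.baratine.service.OnLookup;
import io.baratine.service.Service;

// Sketch only: package names and the @OnLookup signature may differ
// between Baratine versions.
@Service("/hello")
public class HelloService
{
  // Called for child addresses such as "/hello/abc" or "/hello/def".
  // Each child URL is hashed by Baratine, so different children may
  // live on different nodes of the pod.
  @OnLookup
  public HelloChild onLookup(String path)
  {
    return new HelloChild(path);
  }

  public static class HelloChild
  {
    private final String _path;

    HelloChild(String path)
    {
      _path = path;
    }

    public String hello()
    {
      return "hello from " + _path;
    }
  }
}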

Baratine’s cluster concepts break down into the following:

  • seed servers to join all servers into a cluster, kept together by periodic heartbeats
  • pods for application deployment. A pod is a set of servers, a virtual cluster, where an application is deployed.
  • BFS, the Bartender File System, is a distributed filesystem across the cluster for shared configuration and data.
  • Client connections. Java clients can either participate in the cluster like servlet web-apps, or they can act as standalone clients.

Seed Servers

Baratine servers join together into a cluster by contacting seed servers.

Seed servers are either configured in a baratine.cf or specified on the command line. When a server starts, it contacts the seed servers, including itself if the server is a seed server. The seeds join the servers together with periodic heartbeats.

For deployments of three or more servers, it's recommended that at least three be specified as seed servers. For one or two servers, it's recommended that all be specified as seeds.

Cluster Heartbeats

Baratine servers send heartbeat messages to the seed servers periodically. The heartbeat ensures that the cluster knows which servers are alive and that all servers know about the current pod configuration.

baratine.cf

The baratine configuration file baratine.cf specifies the seed servers as well as the default port to be used for non-seed servers. The same configuration is used for all servers. A seed server will recognize itself in the configuration.

In the following example, the first three servers are the seed servers. Any additional server will use 8085 as its default port.

baratine.cf:

cluster {
  server 10.0.0.1 8085;
  server 10.0.0.2 8085;
  server 10.0.0.3 8085;
  server 8085;
}

The server is started with the --conf command line option:

$ baratine start --conf baratine.cf

--seed-server

The seed servers can also be set on the command-line. All servers should use the same set of seed servers, including the seed servers. A seed server must know that it’s a seed server, because it should not be promoted to seed by another server.

$ baratine start --seed-server 10.0.0.1:8085 \
                 --seed-server 10.0.0.2:8085

WEB-INF/baratine.cf - servlet web-app client configuration

When a servlet web-app is a client to a Baratine cluster, the client needs to know about the seed servers. The client uses the same baratine.cf configuration, although it's recommended to use a different default port for clients.

WEB-INF/baratine.cf:

cluster {
  server 10.0.0.1 8085;
  server 10.0.0.2 8085;
  server 10.0.0.3 8085;
  server 8084;
}
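For illustration, a servlet in the web-app might look up a Baratine service like the following sketch. The AuctionServlet class, the ServiceManager package and its current() bootstrap call, and the Auction interface (not shown) are assumptions for this example; the lookup pattern itself mirrors the sharding example at the end of this page.

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch only: the ServiceManager package and current() call are
// assumptions; check the Baratine servlet integration for the exact API.
import io.baratine.service.ServiceManager;

public class AuctionServlet extends HttpServlet
{
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse res)
    throws ServletException, IOException
  {
    // The web-app joins the cluster as a client using the seed servers
    // configured in WEB-INF/baratine.cf.
    ServiceManager manager = ServiceManager.current();

    // Hypothetical service address; Auction is a hypothetical service
    // interface defined elsewhere in the application.
    Auction auction = manager.lookup("pod://auction/auction/2704")
                             .as(Auction.class);

    // ... invoke methods on the auction proxy and write the response
  }
}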

Heartbeat

Servers in a Baratine cluster send periodic heartbeat messages to a seed hub to verify which servers are live. The heartbeats ensure that all servers know which servers are alive and share pod assignment.

The heartbeats also share the pod configuration, which is managed by a root server chosen from the seed servers. The root server is dynamic: if the current root stops, a new root is assigned.

Pod

A Baratine pod is a virtual cluster of servers assigned to an application.

The pod is the basis for failover, replication and scaling.

Pod Types

By default, a pod is of the solo type, where only one node will be active in the pod. A node holds one service instance. The available pod types are:

  • off: the pod is in the off state
  • solo: 1 service instance
  • pair: 2 service instances
  • triad: 3 service instances
  • cluster: m x N service instances, where N is the size of the cluster and m is the number of instances per node

Solo Pod

A solo pod is used for failover of a microservice. In a solo pod, one server is the primary and the other two are backup servers. Solo is the default pod type.

The following configuration file is a full assignment of servers to a solo pod. Server “10.0.0.1” is the primary. While it is active, it will process all requests to its services. If “10.0.0.1” crashes or is taken down for maintenance, the service will failover to “10.0.0.2”.

bfs:///config/pods/00-my-pod.cf:

pod my-pod type="solo" {
  server 10.0.0.1 8085;
  server 10.0.0.2 8085;
  server 10.0.0.3 8085;
}

The configuration file can be uploaded to BFS with the following command:

$ baratine put 00-my-pod.cf /config/pods/00-my-pod.cf

If the servers are not assigned by a configuration file, Baratine will assign servers to the pod automatically.
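The resulting assignment, whether explicit or automatic, can be inspected through BFS (see /proc debugging below). Assuming the configured pod name appears under /proc/pods, the command might look like:

$ baratine cat /proc/pods/my-pod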

[Figure: Microservice failover in a solo pod (example-microservice-failover.png)]

Pair Pod

A pair pod splits services by URL across two servers. An optional third server acts as a failover server. In a pair pod, both servers handle requests, but for different URLs.

The partitioning is by URL. “/auction/2704” might belong to server A, while “/auction/2705” might belong to server B. For services with many URLs like an auction, the load will be evenly split.

[Figure: Pair Pod that Partitions Data (example-sharding-db.png)]

Configuration might look like the following. Servers “10.0.0.1” and “10.0.0.2” split requests by URL. The third server “10.0.0.3” is a backup in case either server fails.

bfs:///config/pods/00-my-pod.cf:

pod my-pod type="pair" {
  server 10.0.0.1 8085;
  server 10.0.0.2 8085;
  server 10.0.0.3 8085;
}

The configuration file can be uploaded to BFS with the following command:

$ baratine put 00-my-pod.cf /config/pods/00-my-pod.cf

When the configuration file is uploaded, Baratine reads it and updates the active pods.

Triad Pod

A triad pod splits services by URL across three servers. In a triad pod, all three servers handle requests, but for different URLs.

The triad pod has the advantage that when one server is taken down for maintenance, the services are still backed up by two servers. When a server in a triad goes down, its load is split between the two remaining servers, so each one's load increases by only 50% instead of doubling as it would in a pair.

The partitioning is by URL. “/auction/2704” might belong to server A, while “/auction/2705” might belong to server C. For services with many URLs like an auction, the load will be evenly split.

Configuration might look like the following. Servers “10.0.0.1”, “10.0.0.2” and “10.0.0.3” split requests by URL.

bfs:///config/pods/00-my-pod.cf:

pod my-pod type="triad" {
  server 10.0.0.1 8085;
  server 10.0.0.2 8085;
  server 10.0.0.3 8085;
}

The configuration file can be uploaded to BFS with the following command:

$ baratine put 00-my-pod.cf /config/pods/00-my-pod.cf

When the configuration file is uploaded, Baratine reads it and updates the active pods.

Cluster Pod

The cluster pod is used when the pod is expected to grow to large numbers of servers. A cluster pod can still have one or two servers when the load is small.

Configuration might look like the following. Servers “10.0.0.1”, “10.0.0.2”, “10.0.0.3”, and “10.0.0.4” split requests by URL.

bfs:///config/pods/00-my-pod.cf:

pod my-pod type="cluster" {
  server 10.0.0.1 8085;
  server 10.0.0.2 8085;
  server 10.0.0.3 8085;
  server 10.0.0.4 8085;
}

Pod Node

Each pod is split into multiple nodes, which serve the URLs for the services assigned to the pod. A solo pod has one node, a pair pod has two, and a triad pod has three.

Service URLs are hashed and assigned to the nodes. For an auction service with multiple instances, “/auction/2704” might be assigned to node 3 while “/auction/2705” might be assigned to node 5.
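The following is a conceptual sketch of that mapping, not Baratine's actual partitioning hash: the service address is hashed and reduced modulo the number of nodes in the pod.

public class NodeHashSketch
{
  // Conceptual only: Baratine's real partitioning hash is internal
  // and may differ from String.hashCode().
  public static int nodeFor(String address, int nodeCount)
  {
    return Math.floorMod(address.hashCode(), nodeCount);
  }

  public static void main(String[] args)
  {
    // "/auction/2704" and "/auction/2705" will typically land on
    // different nodes of the pod
    System.out.println(nodeFor("/auction/2704", 6));
    System.out.println(nodeFor("/auction/2705", 6));
  }
}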

Each node has a primary, secondary, and tertiary server for failover.

Special Pods

Some internal pods are created by Baratine automatically and are used to manage the cluster and its data.

local

The local pod is a special pod local to the server itself. It always consists of the local server.

The local pod is used for local databases.

cluster_root

cluster_root is the heartbeat pod. It is a solo pod with multiple backup servers. The root server is the manager for the cluster. All configuration changes are managed by the root.

cluster_hub

cluster_hub is the basis for distributed storage. BFS, Baratine’s distributed filesystem, uses the cluster_hub to store and replicate its data.

cluster

The cluster pod contains all servers in the cluster.

client

The client pod is a special pod that servlet web-app clients are assigned to. Unlike other servers, clients cannot have applications assigned to them.

Baratine File System

The internal pod configuration is stored in Baratine’s distributed filesystem, BFS. BFS is replicated on the cluster_hub and distributed to all servers in the cluster.

Pod assignment is configured in BFS, in the /config/pods directory. Even the automatic assignment of servers to pods has a file in that directory.

Application deployment also is in BFS. The /config/pods directory configures the application, while the jar files are typically in /usr/lib/pods. All servers have access to all deployed applications. This distribution is used for cross-pod lambda calls.

BFS is accessible from the Baratine command line.

$ baratine put 00-my-pod.cf /config/pods/00-my-pod.cf
$ baratine get /config/pods/00-my-pod.cf 00-my-pod.cf
$ baratine cat /config/pods/00-my-pod.cf
$ baratine ls /config/pods

/proc debugging

The /proc filesystem in BFS shows the current server's runtime data. For clustering, there are two important entries:

  • /proc/servers - shows all servers known to the current server
  • /proc/pods - shows all pods known to the current server

They are accessible through BFS commands:

$ baratine cat /proc/servers
$ baratine cat /proc/pods
$ baratine cat /proc/pods/pod

Failover

[Figure: Microservice failover in a solo pod (example-microservice-failover.png)]

Each pod node has a primary, secondary, and tertiary server. When the primary fails, the secondary server starts its own node and accepts service calls. When the primary is restored, the secondary shuts down and transfers calls back to the primary.

Data Replication and Failover

When an application uses Baratine’s database, replication is automatic. Inserts into the database are copied to the secondary and tertiary servers.

When failover occurs, the secondary server reads its replicated data.

@Journal Replication and Failover

A @Journal annotation on a service sends a copy of the journal to the secondary server for each @Modify call.

When a primary server fails over, the secondary replays all journal entries that were not saved by an @OnSave annotation.
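As a sketch under those assumptions, a journaled service might look like the following. The CounterService class, the Result callback style, and the package names are illustrative assumptions; only the @Journal, @Modify, and @OnSave annotations come from the text above.

import io.baratine.service.Journal;
import io.baratine.service.Modify;
import io.baratine.service.OnSave;
import io.baratine.service.Result;
import io.baratine.service.Service;

// Sketch only: package names and the Result callback style may differ
// between Baratine versions.
@Journal
@Service("/counter")
public class CounterService
{
  private long _count;

  // Each @Modify call is journaled, and a copy of the journal is
  // sent to the secondary server.
  @Modify
  public void increment(Result<Long> result)
  {
    result.ok(++_count);
  }

  public void get(Result<Long> result)
  {
    result.ok(_count);
  }

  // On failover, the secondary replays only the journal entries
  // written after the last save.
  @OnSave
  public void save(Result<Void> result)
  {
    // persist _count here; the choice of store is outside this sketch
    result.ok(null);
  }
}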

Data Partitioning (Sharding)

[Figure: Sharding with a Baratine Pod (example-sharding-db.png)]

Sharding is an extension of pod failover, where services are split across multiple nodes. Each service address belongs to exactly one pod node. Because each service instance is unique in the pod, it can take advantage of owning its data for in-memory operation.

Sharding can be used in two modes:

  • Service addresses are automatically sharded by the “pod:” scheme.
  • The ServiceRef node() method can specify a node for the “pod:” scheme.

Address sharding is more commonly used, because it's simpler and requires no extra code. Clients may not even know that their calls are sharded.

For example, an auction proxy might be created as follows:

Auction auction = manager.lookup("pod://auction/auction/2704")
                         .as(Auction.class);

The auction client does not need to know which server is handling the auction.
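For the second mode, a sketch of pinning a call to a specific node might look like the following; the node(int) signature and its return type are assumptions based on the ServiceRef description above, and manager is the same hypothetical ServiceManager as in the example above.

// Sketch only: node(int) returning a ServiceRef is an assumption.
ServiceRef auctionRef = manager.lookup("pod://auction/auction");

Auction auction = auctionRef.node(2)
                            .as(Auction.class);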