Sunday, August 25, 2019

Messaging patterns in ZeroMQ

As outlined here http://www.aosabook.org/en/zeromq.html, one can look upon all messaging to fall in these categories
  1. Publish/Subscribe
  2. Synchronous Request/Reply
  3. Asynchronous Request/Reply
  4. Push/Pull
  5. Parallelised pipeline 
Refer to https://blog.scottlogic.com/2015/03/20/ZeroMQ-Quick-Intro.html for how this is used . The idea is to have "smart endpoint, dumb network", unlike Kafka which is a "dumb endpoint, smart network" model. The clients essentially connect to the server, and the server communicates to the clients based on the above messaging modes. Note that the clients and server can both be on the same machine (IPC) or even different threads in a process talking via an "endpoint" etc. 

Sunday, August 18, 2019

Twitter Infra

https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html

Periscope infra:

(from https://qr.ae/TWrUa5 ) and a video https://www.youtube.com/watch?v=xjC3ZKYG74g
  1. Wowza Media Systems for streaming
  2. PubNub for the chatroom
  3. Circle CI and Travis CI
  4. Fabric
  5. Iron.io
  6. Algoria for search and indexing
  7. Slack

Tuesday, August 13, 2019

Scaling globally

There are 3 types of scalability issues that need to be addressed to scale to a global scale . Those are
  1. Network scalability & service discovery. 
  2. Compute scalability & virtualization
  3. Storage scalability
One will see that organizations that offer cloud as a service use all three of these scalabilities.

Network Scalability :
(A) Load Balancers
Refer to this blog which points out how modern L4/L7 load balancers work. 
https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236 . I have also seen L3 load balancers used via DNS (e.g. UltraDNS sitebacker pools).
In summary I have seen load balancers of the following types:
  • Proxy based load balancers
  1. L3 load balancing: DNS based load balancing via pools (round-robin) or via mapping changes (Akamai), or via Anycast (See this for how BGP makes this happen: https://www.imperva.com/blog/how-anycast-works/
  2. L4 load balancing via HAProxy (SSL termination via NGINX)
  3. L7 load balancing via HAProxy and a sidecar like Muttley (Uber) , which is essentially based on Healthchecks, Traffic controller rules, and Zookeeper nodes that are maintained at a /zone/service/ level , and updated when a particular service is deployed to a machine.
  • Client side load balancers:
  1. GRPC based load balancing is an example of client-based load balancing.  Refer to https://github.com/grpc/grpc/blob/master/doc/load-balancing.md. (I believe this could be done using something like a muttley sidecar too) 
(B) Service discovery :
When a service is deployed on a machine, it needs to be discoverable. This can be done in the following ways
  1. DNS based service discovery such as Mesos-DNS 
  2. DNS based service discovery using SRV records (See this https://docs.citrix.com/en-us/citrix-adc/13/dns/service-discovery-using-dns-srv-records.html
  3. Zookeeper based service discovery

Storage Scalability:
Refer to http://www.cloudbus.org/reports/DistributedStorageTaxonomy.pdf for a taxonomy of Distributed Storage Systems (DSS)
In summary, Distributed storage can be looked at from different perspectives. If we look at it from the point of view of "functionality" there is the following categorization:

  1. Archival: Provide persistent nonvolatile storage. Achieving reliability, even in the event of failure, supersedes all other objectives and data replication is a key instrument in achieving this
  2. General purpose Filesystem: Persistent nonvolatile  POSIX compliant filesystem e.g. NFS, CODA, xFS, 
  3. Publish/Share: More volatile, think peer-peer
  4. Performance: Operate in parallel over a fast network, typically will stripe data e.g. Zebra, 
  5. Federation middleware: Bring together various filesystems over a single API
  6. Custom: GFS (combination of many of the things above
Example
  1. DHT : Store the keys associated with a node in that node's DNS records (e.g. TXT record) and the node info is obtained via SRV record for that node (refer to : https://labs.spotify.com/2013/02/25/in-praise-of-boring-technology/)
Compute Scalability

There are 4 main categories of cluster workloads (ref: https://eng.uber.com/peloton/)

  1. Stateless jobs
  2. Stateful jobs
  3. Batch jobs
  4. Daemon jobs
The idea is to have these jobs scheduled diversely to a cluster. This is done using the tools such as Borg, YARN (slowly moving to Spark in the industry), Mesos and Kubernetes.