Thursday, December 5, 2019

Types of Models

There are 3 types of models that we build (ref: http://www.problemistics.org/courseware/toolbook/modelling.html)

  1. Iconic
  2. Analog
  3. Symbolic
    1. mathematical
    2. logical 
    3. ad hoc
What was interesting for me here is that, so far, I had looked upon models as icons connected by symbolic logic, e.g. via rules of logical inference (based on set theory).

Monday, December 2, 2019

Interesting Personalities in Distributed Systems and Networking

I grew up not much interested in science and mathematics, but my interest in those fields was piqued by Richard Feynman through the book "Surely You're Joking, Mr. Feynman!". This book made me realize that mathematics and physics can be fun!

Similarly, my interest in signal processing, computational theory and algorithms was guided by my undergraduate teacher, Udayan Kanade. Subsequently I got interested in computer architecture, video encoding, video streaming, algorithms, optimization, game theory, etc.

Over the years, I have been trying to find similar personalities who would get me interested in networking and distributed systems. Here are some colorful personalities in the field of distributed systems:
(1) https://martin.kleppmann.com/ 

Sunday, December 1, 2019

Types of cache

Quick reminder about the 2 types of cache:
(1) Look-aside cache (note how Memcache uses the concept of leases)
(2) Inline/write-through cache (TAO is an example of a read-through, write-through cache)

https://blog.the-pans.com/different-ways-of-caching-in-distributed-system/
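
The difference between the two cache types can be sketched roughly as follows. This is only a minimal illustration: the class names (Database, LookAsideClient, WriteThroughCache) are hypothetical, the "database" is a plain dict, and Memcache leases are not modeled at all.

```python
class Database:
    """Stand-in for the backing store; counts reads so cache hits are visible."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class LookAsideClient:
    """Look-aside: the application talks to the cache and the database itself."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)      # miss: the client fetches from the database
        self.cache[key] = value       # and fills the cache on its own
        return value

    def write(self, key, value):
        self.db.put(key, value)       # write goes to the database
        self.cache.pop(key, None)     # cached copy is invalidated, not updated


class WriteThroughCache:
    """Inline: the application only talks to the cache, which fronts the database."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        if key not in self.cache:     # read-through on a miss
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def write(self, key, value):
        self.db.put(key, value)       # write-through: database first
        self.cache[key] = value       # then the cached copy is updated in place
```

With look-aside, a read after a write costs one database read (the entry was invalidated); with write-through, the same read is served from the cache.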

Wednesday, November 13, 2019

Measures of developer productivity

The most used metrics are

  1. Lines of code (LOC) per unit time (Delorey, Knutson, & Chun, 2007; Maxwell, Van Wassenhove, & Dutta, Oct 1996)
  2. Function points per unit time (Delorey et al., 2007; Maxwell & Forselius, 2000).
  3. Number of diffs landed per unit time (Facebook, Uber)

Tuesday, November 5, 2019

Monday, October 14, 2019

Golang tools

Dependency management tools such as Godep, Sltr, etc.
Go’s templating language
Go’s code generation tools, such as Stringer
Popular Go web frameworks, such as Revel
Router packages, such as Gorilla Mux
Godoc comments

Tuesday, October 8, 2019

Software Engineering Blog posts

Crypto Notes

https://wanguolin.github.io/assets/cryptography_and_network_security.pdf
GF(2^n) and its use in AES: https://engineering.purdue.edu/kak/compsec/NewLectures/Lecture7.pdf (I am also uploading the PDF to my Google Drive, in case it goes missing from the Purdue link in the future)

Kubernetes Notes


  1. Getting Started: https://kubernetes.io/
  2. Cloud Native Computing Foundation:  https://www.cncf.io/
  3. Course to learn Kubernetes: https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS158x+2T2019/course/. As a first step, install Minikube (see the instructions here: https://kubernetes.io/docs/setup/learning-environment/minikube/), as part of which you will need to install kubectl.
Install Go 1.12, then clone the Kubernetes source code from git. We will start navigating the code base from the kubectl command. Use the following to build kubectl, e.g. on my Mac:
$ make WHAT='cmd/kubectl'
+++ [1014 17:55:59] Building go targets for darwin/amd64:
    cmd/kubectl 
which shows up in the output directory as:
./_output/local/go/bin/kubectl
./_output/local/bin/darwin/amd64/kubectl 
To look at the source code, go to the cmd directory. The following files each define a main entry point, which shows the different binaries that Kubernetes is built out of:

  • clicheck/check_cli_conventions.go:func main() {
  • cloud-controller-manager/controller-manager.go:func main() {
  • gendocs/gen_kubectl_docs.go:func main() {
  • genkubedocs/gen_kube_docs.go:func main() {
  • genman/gen_kube_man.go:func main() {
  • genswaggertypedocs/swagger_type_docs.go:func main() {
  • genyaml/gen_kubectl_yaml.go:func main() {
  • hyperkube/main.go:func main() {
  • importverifier/importverifier.go:func main() {
  • kube-apiserver/apiserver.go:func main() {
  • kube-controller-manager/controller-manager.go:func main() {
  • kube-proxy/proxy.go:func main() {
  • kube-scheduler/scheduler.go:func main() {
  • kubeadm/kubeadm.go:func main() {
  • kubectl/kubectl.go:func main() {
  • kubelet/kubelet.go:func main() {
  • kubemark/hollow-node.go:func main() {
  • linkcheck/links.go:func main() {
  • preferredimports/preferredimports.go:func main() {
  • verifydependencies/verifydependencies.go:func main() {
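
A listing like the one above can be reproduced with a small script. This is a rough sketch, assuming it is pointed at the cmd directory of a Kubernetes checkout; the function name find_go_entrypoints is made up for illustration, and it only matches lines that start with "func main()".

```python
import os


def find_go_entrypoints(root):
    """Return relative paths of .go files under root that define func main()."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".go"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as source:
                # A top-level entry point starts at column 0 in gofmt-ed code.
                if any(line.startswith("func main()") for line in source):
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


if __name__ == "__main__":
    # Assumes the current directory is the root of the cloned repository.
    for entry in find_go_entrypoints("cmd"):
        print(entry)
```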

     

Sunday, September 22, 2019

TCP timeouts

https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/amp/

Tuesday, September 17, 2019

Thread local storage

This is a good pictorial representation of how Linux implements thread-local storage, and how it can be accessed using the key passed to pthread_setspecific (each thread uses the key to find its own thread-local storage block): http://weng-blog.com/2016/07/Linux-tls/
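
The same idea (one shared handle, a separate value per thread) can be tried out quickly in Python. This is only an analogy to the pthread key mechanism, not the Linux implementation: threading.local plays the role of the key, and the names tls, results and worker are made up for the example.

```python
import threading

tls = threading.local()   # one shared handle, like a pthread key
results = {}              # what each thread read back from its own slot


def worker(name):
    tls.value = name              # stored in this thread's private block
    results[name] = tls.value     # each thread only sees its own value


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never assigned tls.value, so its slot is still empty.
main_thread_has_value = hasattr(tls, "value")
```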

how to become a good programmer

Embedded C++

RTTI, dynamic memory allocation and exceptions are among the most hotly debated subjects in embedded circles. (ref: http://forums.codeguru.com/showthread.php?539611-I-have-7-days-to-prepare-for-an-Embedded-C-Interview-Any-tips-and-good-links)

RTTI is used by dynamic_cast to check whether a base-class pointer can safely be converted to a derived-class pointer. Look at https://blog.feabhas.com/2013/09/casting-what-could-possibly-go-wrong/

Also look at https://cs.nyu.edu/courses/fall16/CSCI-UA.0470-001/slides/MemoryLayoutMultipleInheritance.pdf for a description of virtual inheritance and the resulting memory layouts.

Saturday, September 14, 2019

ipfs

Aim is to replace HTTP
https://ipfs.io/

Further, when reading about IPFS I came across Merkle trees, which are incidentally also used in Git to reduce the time needed to find out what has changed between 2 branches.

This further took me to this paper : https://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Zero%20Knowledge/Zero-Knowledge_Sets.pdf
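
The Merkle tree idea mentioned above can be sketched in a few lines. This is a simplified binary tree over SHA-256 and not the exact object format used by Git or IPFS; merkle_root is a hypothetical helper name, and duplicating the last node on odd levels is just one common convention.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Hash each block, then hash pairs upwards until one root hash remains."""
    if not blocks:
        return _sha256(b"")
    level = [_sha256(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # odd count: pair the last node with itself
        level = [_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


old = [b"README", b"main.go", b"util.go", b"LICENSE"]
new = [b"README", b"main.go (edited)", b"util.go", b"LICENSE"]

# Equal roots mean identical content; a differing root means something below
# changed, and only the subtrees whose hashes differ need to be inspected.
unchanged = merkle_root(old) == merkle_root(list(old))
changed = merkle_root(old) != merkle_root(new)
```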

Sunday, September 8, 2019

Thrift

Containers from scratch

Things to explore further: https://ericchiang.github.io/post/containers-from-scratch/
(1) chroot, which does not provide private namespaces
(2) creating "namespaces" with unshare
(3) entering a namespace with nsenter
(4) network namespaces can be shared, e.g. across containers
(5) cgroup directories can be created under /sys/fs/cgroup and then configured with appropriate values. cgroups are the kernel's way of providing "controlled isolation"

Saturday, September 7, 2019

Why BGP is a better IGP

https://archive.nanog.org/meetings/nanog55/presentations/Monday/Lapukhov.pdf and the corresponding IETF draft is https://tools.ietf.org/html/draft-ietf-rtgwg-bgp-routing-large-dc-01

Note that normally, when two routers in the same AS exchange BGP routes (iBGP), each router's RIB will show the path marked with an "i", and a path learned this way is not re-advertised to other iBGP peers. If we instead give each rack its own private ASN and run eBGP between them, BGP can still be made to work as the routing protocol inside the data center.

Distributed Systems: Compute Infra.

Tuesday, September 3, 2019

Compute Platform comparison

I liked this comparison of virtualized compute resources vs. CaaS platforms (taken from https://thenewstack.io/container-orchestration-scheduling-herding-computational-cattle/). In general, the modern cloud can be divided into Infrastructure as a Service (IaaS), Containers as a Service (CaaS), and Platform as a Service (PaaS).



Sunday, August 25, 2019

Messaging patterns in ZeroMQ

As outlined here http://www.aosabook.org/en/zeromq.html, all messaging can be viewed as falling into these categories:
  1. Publish/Subscribe
  2. Synchronous Request/Reply
  3. Asynchronous Request/Reply
  4. Push/Pull
  5. Parallelised pipeline 
Refer to https://blog.scottlogic.com/2015/03/20/ZeroMQ-Quick-Intro.html for how this is used. The idea is to have "smart endpoint, dumb network", unlike Kafka, which is a "dumb endpoint, smart network" model. The clients connect to the server, and the server communicates with the clients based on the above messaging patterns. Note that the clients and server can be on the same machine (IPC), or even be different threads in one process talking via an "endpoint".
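
To make the publish/subscribe pattern concrete, here is a toy in-process sketch using only the standard library. It is not the ZeroMQ API: the Publisher and Subscriber classes are hypothetical, and the only behaviour borrowed from ZeroMQ is that a subscriber filters messages by prefix at the endpoint rather than in a broker.

```python
import queue


class Subscriber:
    """Endpoint that keeps its own subscription filters (the "smart endpoint")."""

    def __init__(self):
        self._prefixes = []
        self.inbox = queue.Queue()

    def subscribe(self, prefix):
        self._prefixes.append(prefix)

    def deliver(self, message):
        # Keep the message only if it matches one of our subscribed prefixes.
        if any(message.startswith(prefix) for prefix in self._prefixes):
            self.inbox.put(message)


class Publisher:
    """Fans every message out to all connected subscribers, with no broker."""

    def __init__(self):
        self._subscribers = []

    def connect(self, subscriber):
        self._subscribers.append(subscriber)

    def send(self, message):
        for subscriber in self._subscribers:
            subscriber.deliver(message)


pub = Publisher()
weather = Subscriber()
weather.subscribe("weather.")
everything = Subscriber()
everything.subscribe("")          # the empty prefix matches every message
pub.connect(weather)
pub.connect(everything)

pub.send("weather.london rain")
pub.send("sports.cricket 120/3")
```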

Sunday, August 18, 2019

Twitter Infra

https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html

Periscope infra:

(from https://qr.ae/TWrUa5 ) and a video https://www.youtube.com/watch?v=xjC3ZKYG74g
  1. Wowza Media Systems for streaming
  2. PubNub for the chatroom
  3. Circle CI and Travis CI
  4. Fabric
  5. Iron.io
  6. Algolia for search and indexing
  7. Slack

Tuesday, August 13, 2019

Scaling globally

There are 3 types of scalability issues that need to be addressed to scale globally:
  1. Network scalability & service discovery
  2. Compute scalability & virtualization
  3. Storage scalability
Organizations that offer cloud as a service address all three of these.

Network Scalability:
(A) Load Balancers
Refer to this blog post, which explains how modern L4/L7 load balancers work:
https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236. I have also seen L3 load balancers used via DNS (e.g. UltraDNS sitebacker pools).
In summary, I have seen load balancers of the following types:
  • Proxy-based load balancers:
  1. L3 load balancing: DNS-based load balancing via pools (round-robin), via mapping changes (Akamai), or via anycast (see this for how BGP makes this happen: https://www.imperva.com/blog/how-anycast-works/)
  2. L4 load balancing via HAProxy (SSL termination via NGINX)
  3. L7 load balancing via HAProxy and a sidecar like Muttley (Uber), which is essentially based on health checks, traffic controller rules, and ZooKeeper nodes that are maintained at a /zone/service/ level and updated when a particular service is deployed to a machine.
  • Client-side load balancers:
  1. gRPC load balancing is an example of client-side load balancing. Refer to https://github.com/grpc/grpc/blob/master/doc/load-balancing.md. (I believe this could be done using something like a Muttley sidecar too.)
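
As a rough illustration of the round-robin and health-check ideas in the list above, here is a minimal balancer sketch. The class name RoundRobinBalancer and its methods are hypothetical and do not correspond to HAProxy, Muttley or gRPC APIs; real balancers run the health checks themselves, whereas here they are just method calls.

```python
class RoundRobinBalancer:
    """Cycle through backends in order, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend):
        # In a real system a failed health check would trigger this.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
first_round = [lb.pick() for _ in range(4)]   # wraps around after three picks
lb.mark_down("10.0.0.2:80")
after_failure = [lb.pick() for _ in range(3)]  # the downed backend is skipped
```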
(B) Service discovery:
When a service is deployed on a machine, it needs to be discoverable. This can be done in the following ways:
  1. DNS-based service discovery such as Mesos-DNS
  2. DNS-based service discovery using SRV records (see https://docs.citrix.com/en-us/citrix-adc/13/dns/service-discovery-using-dns-srv-records.html)
  3. ZooKeeper-based service discovery
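
For the SRV-record approach, a client has to choose one target from the records it gets back. The sketch below assumes the records have already been resolved (no DNS query is made) and uses a simplified rule: the lowest priority value wins, and ties are broken randomly in proportion to weight. SrvRecord and pick_srv are hypothetical names, and the host names are made up.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int   # lower value is preferred
    weight: int     # relative share among records of equal priority
    port: int
    target: str


def pick_srv(records, rng=random):
    """Pick one record: lowest priority first, then weighted random choice."""
    if not records:
        raise LookupError("no SRV records to choose from")
    best = min(record.priority for record in records)
    candidates = [record for record in records if record.priority == best]
    weights = [record.weight for record in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)   # all weights zero: choose uniformly
    return rng.choices(candidates, weights=weights, k=1)[0]


records = [
    SrvRecord(10, 60, 8080, "app1.example.internal"),
    SrvRecord(10, 40, 8080, "app2.example.internal"),
    SrvRecord(20, 0, 8080, "backup.example.internal"),   # used only as fallback
]
chosen = pick_srv(records, rng=random.Random(0))
```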

Storage Scalability:
Refer to http://www.cloudbus.org/reports/DistributedStorageTaxonomy.pdf for a taxonomy of Distributed Storage Systems (DSS)
In summary, Distributed storage can be looked at from different perspectives. If we look at it from the point of view of "functionality" there is the following categorization:

  1. Archival: Provide persistent nonvolatile storage. Achieving reliability, even in the event of failure, supersedes all other objectives, and data replication is a key instrument in achieving this.
  2. General purpose filesystem: Persistent nonvolatile POSIX-compliant filesystem, e.g. NFS, CODA, xFS
  3. Publish/Share: More volatile, think peer-to-peer
  4. Performance: Operate in parallel over a fast network, typically striping data, e.g. Zebra
  5. Federation middleware: Bring together various filesystems behind a single API
  6. Custom: GFS (a combination of many of the things above)
Example:
  1. DHT: Store the keys associated with a node in that node's DNS records (e.g. a TXT record); the node info is obtained via the SRV record for that node (refer to: https://labs.spotify.com/2013/02/25/in-praise-of-boring-technology/)
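
The key-to-node lookup at the heart of a DHT can be sketched with a simple hash ring. This is a generic consistent-hashing illustration and not the DNS-backed scheme from the linked post; HashRing and the node names are made up, and SHA-1 is used only to spread keys, not for security.

```python
import bisect
import hashlib


def _ring_position(value: str) -> int:
    """Map a string to a point on the ring."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Each key is owned by the first node clockwise from the key's position."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a ring needs at least one node")
        self._ring = sorted((_ring_position(node), node) for node in nodes)
        self._points = [point for point, _node in self._ring]

    def owner(self, key: str) -> str:
        index = bisect.bisect_right(self._points, _ring_position(key))
        return self._ring[index % len(self._ring)][1]   # wrap past the last node


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = HashRing(nodes)
keys = [f"key-{i}" for i in range(200)]

# Removing one node only moves the keys that node owned; the rest stay put.
smaller = HashRing([n for n in nodes if n != "node-c"])
moved = [k for k in keys if ring.owner(k) != smaller.owner(k)]
```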
Compute Scalability:

There are 4 main categories of cluster workloads (ref: https://eng.uber.com/peloton/)

  1. Stateless jobs
  2. Stateful jobs
  3. Batch jobs
  4. Daemon jobs
The idea is to schedule these diverse jobs onto a shared cluster. This is done using tools such as Borg, YARN (the industry is slowly moving to Spark), Mesos and Kubernetes.

Thursday, May 2, 2019

Turing completeness using Mov

Apparently, Turing completeness can be achieved using a single x86 instruction, MOV:

https://drive.google.com/open?id=1cbnCSdBmkjEGxoScn2VtcR7SiC8hc45x 

Sunday, March 24, 2019

Type Systems in Computer programs

https://en.wikipedia.org/wiki/Type_system has a good survey of the landscape. What I was looking for was duck typing (as happens in Golang, where a type satisfies an interface implicitly, just by having the right methods).
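
A small example of duck typing, written in Python rather than Go: the Protocol class below plays roughly the role of a Go interface, in that Duck and Robot never declare that they implement it. All names here are invented for illustration.

```python
from typing import Protocol


class Quacker(Protocol):
    """Anything with a matching quack() method counts as a Quacker."""

    def quack(self) -> str: ...


class Duck:
    def quack(self) -> str:
        return "Quack"


class Robot:
    # No inheritance from Quacker; having the method is enough.
    def quack(self) -> str:
        return "Beep"


def make_noise(thing: Quacker) -> str:
    return thing.quack()


noises = [make_noise(Duck()), make_noise(Robot())]
```

At runtime Python simply calls the method; a static checker such as mypy uses the Protocol to verify the call sites, which is closer to how Go checks interfaces at compile time.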

Friday, March 15, 2019

An O(ND) Difference Algorithm and Its Variations

https://neil.fraser.name/writing/diff/myers.pdf describes what is considered the best general-purpose diff algorithm. See this: https://github.com/google/diff-match-patch
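
The core of the greedy O(ND) approach can be sketched as below. This version only computes the length D of the shortest edit script (insertions plus deletions) and does not recover the script itself; it is a simplified reading of the paper, not the diff-match-patch implementation, and the function name is made up.

```python
def myers_edit_distance(a, b):
    """Length of the shortest insert/delete script turning sequence a into b."""
    n, m = len(a), len(b)
    # furthest[k] = furthest x reached so far on diagonal k, where k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Extend from the neighbouring diagonal that has got further.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # step down: insert an element of b
            else:
                x = furthest[k - 1] + 1    # step right: delete an element of a
            y = x - k
            # Follow the "snake": matching elements cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


distance = myers_edit_distance("ABCABBA", "CBABAC")
```

The running time depends on the input size N and the size D of the difference, so nearly identical inputs are compared quickly.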