Some Notes: 2019

Thursday, December 5, 2019

Types of Models

There are 3 types of models that we build (ref: http://www.problemistics.org/courseware/toolbook/modelling.html):

Iconic
Analog
Symbolic

mathematical
logical
adhoc

What was interesting for me here is that, so far, I have looked upon models as icons connected by symbolic logic, e.g. via rules of logical inference (based on set theory).

Monday, December 2, 2019

Interesting Personalities in Distributed Systems and Networking

I grew up not much interested in science and mathematics, but my interest in those fields was piqued by a famous personality, Richard Feynman, through his book "Surely You're Joking, Mr. Feynman!". This book somehow made me realize that mathematics and physics can be fun!

Similarly, my interest in the fields of signal processing, computational theory and algorithms was guided by my undergrad teacher (Udayan Kanade), and subsequently I got interested in Computer Architecture, Video Encoding, Video Streaming, Algorithms, Optimization, Game Theory etc.

Over the years, I have been trying to find similar personalities who would get me interested in the fields of Networking and Distributed Systems. Here are some colorful personalities in the field of Distributed Systems:
(1) https://martin.kleppmann.com/

Sunday, December 1, 2019

Types of cache

Quick reminder about the 2 types of cache
(1) Lookaside Cache (note how Memcache uses the concept of Lease)
(2) Inline/Write through cache. (TAO is an example of a read-through, write-through cache)
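
A minimal sketch of the difference between the two, assuming a plain dict stands in for the backing store. The class and method names are hypothetical, and the Memcache lease mechanism is not modeled here.

```python
class LookAsideCache:
    """The application talks to the cache and the store separately."""

    def __init__(self, store):
        self.store = store  # backing store (a dict in this sketch)
        self.cache = {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]   # miss: the caller fetches from the store
        self.cache[key] = value   # and fills the cache itself
        return value

    def write(self, key, value):
        self.store[key] = value
        self.cache.pop(key, None)  # invalidate; the next read refills


class WriteThroughCache:
    """The cache sits inline: reads and writes both go through it."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store[key]  # read-through on a miss
        return self.cache[key]

    def write(self, key, value):
        self.store[key] = value   # write to the store...
        self.cache[key] = value   # ...and keep the cached copy current
```

The visible difference is on the write path: the look-aside version drops the cached entry, while the write-through version updates it in place.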

https://blog.the-pans.com/different-ways-of-caching-in-distributed-system/

Wednesday, November 13, 2019

Measures of developer productivity

The most used metrics are

Lines of code (LOC) per unit time (Delorey, Knutson, & Chun, 2007; Maxwell, Van Wassenhove, & Dutta, Oct 1996)
Function points per unit time (Delorey et al., 2007; Maxwell & Forselius, 2000).
Number of diffs landed per unit time (Facebook, Uber)

Wednesday, November 6, 2019

Cloud Connectivity Patterns

https://blogs.oracle.com/cloud-platform/cloud-to-on-premise-connectivity-patterns-v2

Tuesday, November 5, 2019

Postgres internals

http://www.interdb.jp/pg/

Finding out what the world is working on

Came across this link: https://stackify.com/popular-programming-languages-2018/ which pointed me to these two links:

https://www.tiobe.com/tiobe-index/ the languages that are popular today
https://octoverse.github.com/projects the projects that are popular today.

Monday, October 14, 2019

Golang tools

Dependency management tools such as Godep, Sltr, etc.

Go’s templating language

Go’s code generation tools, such as Stringer

Popular Go web frameworks, such as Revel

Router packages, such as Gorilla Mux

Godoc comments

Wednesday, October 9, 2019

Istio blogs

https://hackernoon.com/distributed-tracing-with-envoy-service-mesh-jaeger-c365b6191592

Tuesday, October 8, 2019

Software Engineering Blog posts

1. Paul Graham: http://www.paulgraham.com/ambitious.html
2. Joel Spolsky: https://www.joelonsoftware.com/

Crypto Notes

https://wanguolin.github.io/assets/cryptography_and_network_security.pdf
GF(2^n) and its use in AES: https://engineering.purdue.edu/kak/compsec/NewLectures/Lecture7.pdf (I am also uploading the PDF to my Google Drive, in case it goes missing in the future from the Purdue link)

Kubernetes Notes

Getting Started: https://kubernetes.io/
Cloud Native Computing Foundation: https://www.cncf.io/
Course to learn Kubernetes: https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS158x+2T2019/course/. As a first step, install Minikube (see instructions here: https://kubernetes.io/docs/setup/learning-environment/minikube/), as a part of which you will need to install kubectl.

Install Go 1.12 (google for it), then clone the Kubernetes source code from git. We will start navigating the code base with the kubectl command. Use the following to build kubectl, e.g. on my Mac:

$ make WHAT='cmd/kubectl'

+++ [1014 17:55:59] Building go targets for darwin/amd64:
cmd/kubectl

which shows up in the output directory as:

./_output/local/go/bin/kubectl
./_output/local/bin/darwin/amd64/kubectl

To look at the source code, go to the directory cmd. You will see the following files that define a main entrypoint, which together show the different binaries that the Kubernetes product is built out of.

clicheck/check_cli_conventions.go:func main() {
cloud-controller-manager/controller-manager.go:func main() {
gendocs/gen_kubectl_docs.go:func main() {
genkubedocs/gen_kube_docs.go:func main() {
genman/gen_kube_man.go:func main() {
genswaggertypedocs/swagger_type_docs.go:func main() {
genyaml/gen_kubectl_yaml.go:func main() {
hyperkube/main.go:func main() {
importverifier/importverifier.go:func main() {
kube-apiserver/apiserver.go:func main() {
kube-controller-manager/controller-manager.go:func main() {
kube-proxy/proxy.go:func main() {
kube-scheduler/scheduler.go:func main() {
kubeadm/kubeadm.go:func main() {
kubectl/kubectl.go:func main() {
kubelet/kubelet.go:func main() {
kubemark/hollow-node.go:func main() {
linkcheck/links.go:func main() {
preferredimports/preferredimports.go:func main() {
verifydependencies/verifydependencies.go:func main() {

Wednesday, October 2, 2019

iptables / netfilter on linux

Have a look at https://danielmiessler.com/study/iptables/ for how iptables works. Importantly, note that it is made up of tables, chains and targets.
The packet flow is pictured at http://xkr47.outerspace.dyndns.org/netfilter/packet_flow/

ref: https://www.youtube.com/watch?v=iP8YWcvKDr0

Language release history

https://golang.org/doc/devel/release.html

Sunday, September 22, 2019

TCP timeouts

https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/amp/

Tuesday, September 17, 2019

Thread local storage

This is a good pictorial representation of how Linux stores thread local storage, and how it can be accessed using the key passed to pthread_setspecific (each thread uses the key to find its own thread local storage block): http://weng-blog.com/2016/07/Linux-tls/

how to become a good programmer

I liked this advice.
Write code 3 times!
(https://www.javaworld.com/article/2072651/becoming-a-great-programmer--use-your-trash-can.html)

Embedded C++

RTTI, dynamic memory allocation and exceptions are among the most hotly debated subjects in embedded circles. (ref: http://forums.codeguru.com/showthread.php?539611-I-have-7-days-to-prepare-for-an-Embedded-C-Interview-Any-tips-and-good-links)

RTTI is used by dynamic_cast to figure out whether a base class pointer can safely be converted to a derived class pointer. Look at https://blog.feabhas.com/2013/09/casting-what-could-possibly-go-wrong/

Also look at this https://cs.nyu.edu/courses/fall16/CSCI-UA.0470-001/slides/MemoryLayoutMultipleInheritance.pdf for description about virtual inheritance and memory layouts therein.

Saturday, September 14, 2019

ipfs

Aim is to replace HTTP
https://ipfs.io/

Further, when reading about IPFS I came across Merkle trees, which are incidentally also used in Git to reduce the time for finding out what has changed between 2 branches.
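
To make the "finding out what has changed" part concrete, here is a small sketch of a generic binary Merkle tree over a list of byte blocks. It is not the object format used by Git or IPFS; the function names and the choice to duplicate the last hash on odd-sized levels are assumptions made for the example.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pad odd levels with the last hash
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(blocks):
    return build_levels(blocks)[-1][0]


def changed_leaves(old_blocks, new_blocks):
    """Indexes of blocks that differ, found by descending only into
    subtrees whose hashes differ (same block count assumed)."""
    if len(old_blocks) != len(new_blocks):
        raise ValueError("this sketch only compares equal-length inputs")
    a, b = build_levels(old_blocks), build_levels(new_blocks)
    suspects = [0] if a[-1][0] != b[-1][0] else []
    for depth in range(len(a) - 2, -1, -1):
        next_suspects = []
        for i in suspects:
            for child in (2 * i, 2 * i + 1):
                if child < len(a[depth]) and a[depth][child] != b[depth][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects
```

If the two roots are equal, nothing below them is inspected, which is where the time saving comes from.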

This further took me to this paper: https://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Zero%20Knowledge/Zero-Knowledge_Sets.pdf

Monday, September 9, 2019

How facebook scaled up memcached

https://www.facebook.com/notes/facebook-engineering/scaling-memcached-at-facebook/39391378919/
https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf

Memcached is available on github at https://github.com/memcached/memcached

Ref: https://medium.com/@SkyscannerEng/journey-to-the-centre-of-memcached-b239076e678a and picture from there:

Another reference is https://medium.com/@Alibaba_Cloud/redis-vs-memcached-in-memory-data-storage-systems-3395279b0941

Sunday, September 8, 2019

Thrift

https://github.com/apache/thrift

Containers from scratch

Things to explore further: https://ericchiang.github.io/post/containers-from-scratch/
(1) chroot, which does not provide private namespaces
(2) creating "namespaces" with unshare
(3) entering a namespace with nsenter
(4) network namespaces can be shared, e.g. across containers.
(5) cgroup directories can be created in /sys/fs/cgroup, and then appropriate values configured. cgroups are a way for the kernel to provide "controlled isolation".

Saturday, September 7, 2019

Why BGP is a better IGP

https://archive.nanog.org/meetings/nanog55/presentations/Monday/Lapukhov.pdf and the corresponding IETF draft (later published as RFC 7938) is https://tools.ietf.org/html/draft-ietf-rtgwg-bgp-routing-large-dc-01

Note that normally, when two routers in the same AS exchange BGP RIBs (iBGP), the learned path shows up in the RIB marked with an "i", and that path is not re-advertised to other iBGP peers (absent route reflectors or confederations). If we instead give each rack its own ASN and run BGP between the racks, it can still be made to work because the ASNs can be taken from the private ASN range.

Distributed Systems: Compute Infra.

Google's paper about Borg: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43438.pdf

https://ericchiang.github.io/post/containers-from-scratch/

Wednesday, September 4, 2019

Job scheduling in Mesos and Kubernetes

https://stackoverflow.com/questions/44130725/differences-in-scheduling-between-mesos-and-kubernetes (which points to https://medium.com/@ArmandGrillet/comparison-of-container-schedulers-c427f4f7421)

and
https://stackoverflow.com/questions/43076831/dcos-cluster-resource-allocation-is-np-hard/43448790#43448790 (which suggests that Mesosphere Marathon uses first-fit bin packing algorithm)
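
For reference, first-fit bin packing itself is simple. The sketch below is the generic one-dimensional version (one resource, identical nodes), not Marathon's actual scheduler code; the function name and the single-resource model are assumptions.

```python
def first_fit(task_demands, node_capacity):
    """Place each task on the first node that still has room for it.

    Returns a list with the node index chosen for each task, in order.
    New nodes are opened only when no existing node can take the task.
    """
    free = []        # remaining capacity of each opened node
    placement = []
    for demand in task_demands:
        if demand > node_capacity:
            raise ValueError(f"task of size {demand} fits on no node")
        for i, remaining in enumerate(free):
            if demand <= remaining:
                free[i] -= demand
                placement.append(i)
                break
        else:
            # No opened node had room: open a new one.
            free.append(node_capacity - demand)
            placement.append(len(free) - 1)
    return placement
```

For example, first_fit([4, 8, 1, 4, 2, 1], 10) returns [0, 1, 0, 0, 1, 0]: two nodes are enough, and both end up completely full.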

Tuesday, September 3, 2019

Compute Platform comparison

I liked this comparison of platforms: virtualized compute resources vs. CaaS (taken from https://thenewstack.io/container-orchestration-scheduling-herding-computational-cattle/). In general, the modern cloud can be divided into Infrastructure as a Service (IaaS), Containers as a Service (CaaS), and Platform as a Service (PaaS).

Sunday, August 25, 2019

Messaging patterns in ZeroMQ

As outlined here http://www.aosabook.org/en/zeromq.html, one can view all messaging as falling into these categories:

Publish/Subscribe
Synchronous Request/Reply
Asynchronous Request/Reply
Push/Pull
Parallelised pipeline

Refer to https://blog.scottlogic.com/2015/03/20/ZeroMQ-Quick-Intro.html for how this is used. The idea is to have "smart endpoint, dumb network", unlike Kafka which is a "dumb endpoint, smart network" model. The clients essentially connect to the server, and the server communicates with the clients based on the above messaging modes. Note that the clients and server can both be on the same machine (IPC), or even be different threads in a process talking via an "endpoint".

Sunday, August 18, 2019

Twitter Infra

https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html

Periscope infra:

(from https://qr.ae/TWrUa5 ) and a video https://www.youtube.com/watch?v=xjC3ZKYG74g

Wowza Media Systems for streaming
PubNub for the chatroom
Circle CI and Travis CI
Fabric
Iron.io
Algolia for search and indexing
Slack

Tuesday, August 13, 2019

Scaling globally

There are 3 types of scalability issues that need to be addressed to scale globally. Those are:

Network scalability & service discovery.
Compute scalability & virtualization
Storage scalability

One will see that organizations that offer cloud as a service address all three of these.

Network Scalability :
(A) Load Balancers

Refer to this blog, which explains how modern L4/L7 load balancers work: https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236

I have also seen L3 load balancers used via DNS (e.g. UltraDNS SiteBacker pools).
In summary, I have seen load balancers of the following types:

Proxy based load balancers

L3 load balancing: DNS based load balancing via pools (round-robin) or via mapping changes (Akamai), or via Anycast (See this for how BGP makes this happen: https://www.imperva.com/blog/how-anycast-works/)
L4 load balancing via HAProxy (SSL termination via NGINX)
L7 load balancing via HAProxy and a sidecar like Muttley (Uber), which is essentially based on health checks, traffic controller rules, and Zookeeper nodes that are maintained at a /zone/service/ level and updated when a particular service is deployed to a machine.

Client side load balancers:

gRPC based load balancing is an example of client-side load balancing. Refer to https://github.com/grpc/grpc/blob/master/doc/load-balancing.md. (I believe this could be done using something like a Muttley sidecar too.)
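
To illustrate what "client-side" means here, a minimal sketch of a round-robin picker that lives inside the client and skips backends it has marked unhealthy. This does not use the gRPC API; the class and method names are hypothetical, and how health information reaches the client is left out.

```python
class RoundRobinPicker:
    """The client holds the backend list and chooses one per request."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        # Try each backend at most once, starting where we left off.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next % len(self._backends)]
            self._next += 1
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

There is no proxy in the data path: each client spreads its own requests, which is the main contrast with the proxy based load balancers listed earlier.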

(B) Service discovery :

When a service is deployed on a machine, it needs to be discoverable. This can be done in the following ways:

DNS based service discovery such as Mesos-DNS
DNS based service discovery using SRV records (See this https://docs.citrix.com/en-us/citrix-adc/13/dns/service-discovery-using-dns-srv-records.html)
Zookeeper based service discovery
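
For the SRV-record flavour, the client typically gets back several (priority, weight, port, target) tuples and has to choose one. The sketch below is a simplified take on that selection: lowest priority value first, then a weighted random choice. The record values are made up, no DNS lookup is performed, and the handling of zero weights is simpler than what RFC 2782 describes.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def pick_srv(records, rng=random):
    """Choose one record: lowest priority value wins, weight breaks ties."""
    if not records:
        raise ValueError("no SRV records to choose from")
    lowest = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == lowest]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)  # all weights zero: uniform choice
    return rng.choices(candidates, weights=weights, k=1)[0]
```

A caller would then connect to the chosen record's target and port, falling back to higher priority values only if that group is unreachable (not shown).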

Storage Scalability:
Refer to http://www.cloudbus.org/reports/DistributedStorageTaxonomy.pdf for a taxonomy of Distributed Storage Systems (DSS)
In summary, Distributed storage can be looked at from different perspectives. If we look at it from the point of view of "functionality" there is the following categorization:

Archival: Provide persistent nonvolatile storage. Achieving reliability, even in the event of failure, supersedes all other objectives and data replication is a key instrument in achieving this
General purpose Filesystem: Persistent nonvolatile POSIX compliant filesystem e.g. NFS, CODA, xFS
Publish/Share: More volatile, think peer-to-peer
Performance: Operate in parallel over a fast network, typically will stripe data e.g. Zebra
Federation middleware: Bring together various filesystems over a single API
Custom: GFS (a combination of many of the things above)

Example

DHT: Store the keys associated with a node in that node's DNS records (e.g. TXT record); the node info is obtained via the SRV record for that node (refer to: https://labs.spotify.com/2013/02/25/in-praise-of-boring-technology/)
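
Once each node's keys are known, finding the node responsible for an item is a ring lookup. The sketch below assumes the keys have already been read out of DNS and parsed into integers; the TXT record format, the node names and the token values are all hypothetical.

```python
import bisect


class Ring:
    """Maps a key token to the node owning the next token clockwise."""

    def __init__(self, node_tokens):
        # node_tokens: {"node-name": [token, ...]}, e.g. parsed from
        # each node's TXT record (the format is an assumption here).
        self._ring = sorted(
            (token, node)
            for node, tokens in node_tokens.items()
            for token in tokens
        )
        if not self._ring:
            raise ValueError("ring needs at least one token")
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key_token):
        # First token >= key_token, wrapping around to the start.
        i = bisect.bisect_left(self._tokens, key_token) % len(self._ring)
        return self._ring[i][1]
```

With node-a at 100, node-b at 200 and node-c at 300, a key token of 150 belongs to node-b, and anything above 300 wraps around to node-a.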

Compute Scalability

There are 4 main categories of cluster workloads (ref: https://eng.uber.com/peloton/)

Stateless jobs
Stateful jobs
Batch jobs
Daemon jobs

The idea is to schedule these diverse jobs onto a shared cluster. This is done using tools such as Borg, YARN (with the industry slowly moving to Spark), Mesos and Kubernetes.

Thursday, May 2, 2019

Turing completeness using Mov

Apparently, Turing completeness can be achieved using a single x86 instruction, MOV:

https://drive.google.com/open?id=1cbnCSdBmkjEGxoScn2VtcR7SiC8hc45x

Sunday, March 24, 2019

Type Systems in Computer programs

https://en.wikipedia.org/wiki/Type_system has a good survey of the landscape. What I was looking for was duck typing (as happens in Golang, where a type satisfies an interface implicitly by having the right methods).
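
A small illustration of the idea, written in Python rather than Go: neither class below declares that it implements anything, yet both are accepted wherever something with a quack method is expected. The class and function names are invented for the example.

```python
from typing import Protocol


class Quacker(Protocol):
    """Anything with a matching quack() method counts as a Quacker."""

    def quack(self) -> str: ...


class Duck:
    def quack(self) -> str:
        return "Quack"


class Robot:
    # No inheritance and no registration: having the method is enough.
    def quack(self) -> str:
        return "Beep"


def make_noise(thing: Quacker) -> str:
    return thing.quack()
```

At runtime Python only cares that the method exists; the Protocol annotation additionally lets a static type checker verify the same structural match ahead of time, which is closer to how Go checks interfaces at compile time.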

Friday, March 15, 2019

An O(ND) Difference Algorithm and Its Variations

https://neil.fraser.name/writing/diff/myers.pdf describes what is widely considered the best general purpose diff algorithm. See this: https://github.com/google/diff-match-patch
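
A compact sketch of the core greedy loop from the paper, computing only the length D of the shortest edit script (insertions plus deletions) rather than the script itself. It is a plain transcription of the basic algorithm under that simplification, not the diff-match-patch implementation, and the function name is made up.

```python
def myers_edit_distance(a, b):
    """Minimum number of insertions and deletions turning a into b."""
    n, m = len(a), len(b)
    # v[k] = furthest x reached so far on diagonal k (where k = x - y).
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Extend from the neighbouring diagonal that got further:
            # coming from k+1 is an insertion, from k-1 a deletion.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake": matching elements cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m
```

The outer loop runs D + 1 times and each pass touches at most 2D + 1 diagonals plus the matches it slides over, which is where the O(ND) in the title comes from.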