Thursday, December 29, 2011

Synchronization and Locks

Definition of Synchronized Processes: When processes change states in a predictable and repeatable pattern, we say that the processes are "syn chronized" or working together i.e. are related to each other lock-step e.g. In Synchronous Dataflow graphs widely used in DSP applications, some processes become active around the same time, work on some data, and then go to sleep, while waking up another different set of processes and this state-change of processes continues in a fixed predictable repeated pattern. See this . One may say that the processes are performing a "dance".

To maintain "synchronization" i.e. allow a set of processes to work together with something else(another group of processes, or some clock etc.) the following ideas are used
  1. Locks: When a set of processes is in active state, the other set is put in an inactive state i.e. even if the scheduler picks up the other process to be scheduled it wont be scheduled as it would not be in an active state i.e. it would be sleeping on a lock. 
  2. Synchronization with a clock: At fixed intervals processes give up execution and change state. This allows us to not use locks. 
One needs to note that where the thing against which synchronization is done is shared across 2 processes, the 2 processes are "isochronous" (as well as synchronous). However synchronous does not necessarily mean isochronous i.e. if isochronous, then synchronous. (isochronous is a subset of synchronous)

The function calls a process makes can be either synchronized calls or asynchronized calls (i.e. process makes a call (i.e. process is the caller), but does not go-to-sleep waiting for response from the callee, instead continues working further)

Thus, the point of the post was to make a note that while Synchronous data-flow may be readily found in DSP like applications (e.g. video encoding), general purpose real time computing may need to follow asynchronous models (processes not working in rhythms) e.g. processing data coming on a n/w interface, or an user moving the mouse. We can include these processes as part of the synchronized data flow graphs, but considering the deadline for such processes the right way to schedule such processes is by keeping them OUT of the synchronization i.e. allowing the scheduling mechanism to schedule such an event "asynchronously" out of the synced up processes.

Basically all i am really doing is making an argument that we need a "scheduler" that is capable of scheduling asynchronous events. This appears kind of obvious in retrospect though still noting it down. 

Wednesday, December 14, 2011

Tuesday, December 6, 2011

Update Problem & HTTP

HTTP is a stateless protocol i.e. the HTTP server is not supposed to keep track of the state of each flow. Now, in this situation, how can we deal with  the "update problem" ?

http://www.w3.org/1999/04/Editing

The above is one way of dealing with it. This may also be called as "Optimistic Concurrency Control" i.e. check for conflicts before commit - dont take locks too early etc.

As i see it, the idea is basically that :
  1. Using HEAD: Check if a document exists/was modified before issuing the PUT request, by using the HEAD request before actually issuing the PUT.
  2. Using Entity Tag : On the server, reject the PUT if the etag of the file from the client being PUT, does not match the etag on the server. 

Thursday, November 10, 2011

Google CLI

Googling over a CLI,  but does not seem to have things like grep etc?

http://goosh.org/


Friday, October 28, 2011

Wednesday, October 5, 2011

Bandwidth, Latency and Throughput

Here is what i understand about these (very basic) things. Imagine that the communication is happening between point A and point B of a wire, or one may even look upon this as points on the road (for maybe easier understanding)
  1. Bandwidth = The MAXIMUM number of bytes(or cars e.g.) that pass a certain mark (say point A) in some unit of time e.g. 1 sec, measured in bits/sec.  E.g. if a road carries at max 50 cars/min, then the bw of the road is 50 cars/min. Note that bandwidth can be varying across different sections of the network pipe. Note that having more "lanes" means having more bandwidth usually. If the road bw is 50 cars/min, but currently there are only 30 cars/min, then we say that the "throughput" of the road is 30 cars/min. 
  2. Delay or Latency = time taken for some bits (or cars) to be retrieved/travel  some "length" of the pipe (e.g. from A to B).  
  3. Bandwidth-Delay product = Imagine some packets/cars are trying to move from A to B. The delay between A and B is say 30 seconds. Now, lets say, we measure at point A, that the bandwidth of the pipe is 50 packets/sec (or say 50 cars/sec). Then, the number of packets that will exist on the wire/road before 1st car/packet reaches point B, is called the Bandwidth Delay product e.g. in our case its 1500 packets(or cars) i.e. 1500 packets will be pumped into the network before 1 packet reaches point B. (Assuming no change in network/road conditions meanwhile) . Note that in this situation, the section between A and B is fully occupied with packets/cars and if we try to additionally put packets, they would be dropped. This is why BDP happens to be the max cwnd (outstanding packets) possible for TCP. 
  4. Flow =  RFC 2722 defines traffic flow as "an artificial logical equivalent to a call or connection".
  5. Flow Rate = Rate at which data is delivered for a particular flow, in bits/sec. Sometimes called "flow bandwidth" or "effective bandwidth". 
  6. Throughput = rate of packet retrieval by the measurement apparatus at destination (or point B in our example) for the flow in question, measured in bps. In case of lossless pipes, throughput is same as flow rate. In case of lossy mediums, throughput can be less than flow rates (as packets may be dropped). Throughput depends on the bandwidth, as well as the latency. People have also been doing experiments as to how "power" affects "throughput" for a given data rate (e.g. in wireless networks). People also measure rate of change of throughput, measured in bps/s. 
  7. Goodput = Application level Throughput.
  8. Makespan = time difference between start of a job and end of it. e.g. when we download something from the internet, the time it takes for the download is the makespan of the schedule that allowed the download. High goodput means low makespan.  Measured in seconds. 
Caching: For a given bandwidth of the pipe/road, if the delay to the destination is smaller, then the first packet reaches faster to point B, and also faster back to point A. Thus, having smaller delays (which means smaller bdp) means that same amount of data is retrieved in smaller time, thus increasing the number of bytes in that time interval (with vs. without cache) i.e. basically less "idle" time in the schedule.

Similarly, If we increase bandwidth, but keep latency the same, we can potentially retrieve more data/sec, thus increasing throughput.


Ping for measuring throughput: I have seen on several pages such as this that using Ping for measuring bandwidth is a fallacy. I think that under conditions where bandwidth is not changing much , we can use the Ping for measuring throughput (i.e. i dont agree with the author of that article that Ping cannot be used for flow bandwidth estimation. Of course one cannot know the bandwidth from flow bandwidth, as flow bandwidth may be affected by routing (priority of the ping packet in the scheduler)).
Some other folks have used Ping with varying packet sizes to see how the packet size can affect ping throughput. Bandwidth Estimation is an active research area e.g. see this and this

Also, here is a list of other performance tools useful for measuring effective bw/ or throughput : http://www.caida.org/tools/taxonomy/perftaxonomy.xml, and 

Wednesday, August 10, 2011

SQL like Query Processing for NoSql db e.g. BigTable and DHT stuff

http://horicky.blogspot.com/2009/11/query-processing-for-nosql-db.html


Here is  a slide deck about Distributed Hash Tables, and here is Wikipedia and here is a link to Cassandra (High level overview of Facebook's distributed storage system) and here is course page about Peer-Peer systems that covers this in some detail.

Also found this for a quick recap of NoSQL architecture and how it can improve performance (and scale) despite some drawbacks of not having all relational database features ---I also found this and this to be useful for understanding some issues about both sides of the argument (i.e. NoSql vs Relational)

Of more interest to me is the role of DHTs in Content based routing. Will update soon.



Tuesday, August 2, 2011

Some OO stuff

Thought that it was nice to make a note of OO stuff that i found interesting. Will keep updating this list with more stuff i come across:

  1. http://javadude.com/articles/passbyvalue.htm: Java Objects: pass by reference, or pass by value?
  2. http://www.phpcompiler.org/articles/virtualinheritance.html : Complications introduced by virtual inheritence in C++
  3. http://www.objectmentor.com/resources/articles/abcpvf.pdf : About Abstract Classes, Factoring, Interfaces

Sunday, July 31, 2011

Linux Kernel Map

Again, most people probably already know about this stuff, listing mainly because i too thought it was nice : http://www.makelinux.net/kernel_map/

Also, http://www.redhat.com/magazine/001nov04/features/vm/ was a nice high level intro to Virtual Memory (and things such as "Buddy Allocator", "Slab Allocator"). Also have a look at this wikipedia entry (http://en.wikipedia.org/wiki/Region-based_memory_management) as a clarification to zone based memory allocators

Monday, May 16, 2011

Performance testing tools

A link to open source software performance testing tools 

Thursday, March 24, 2011

Falling off the math cliff

Just for some fun, a pretty popular picture ...

Tuesday, February 22, 2011

Reading Elf Files

Some stuff to follow up on : Reading elf files using Objdump and Readelf

Tuesday, January 4, 2011

A reading list for Computer Systems

This link has a list of papers that one should master to feel comfortable dealing with advances in computer systems. (imo)

and, this link is a set of lecture notes in Algorithms, and CS Theory, which imo, should not hurt mastering, lack of time not withstanding.