Wednesday, March 28, 2012

Affinity Lattice

Let's say there are 3 bits, e.g. 000. The graph I'm interested in looks like this:

          111 ------\
         /    \      \
      011      101    |
     /    \   /   \   |
    |      001     |  |
    |       |      |  |
    |      000     |  |
     \    /   \   /   |
      010      100    |
         \    /      /
          110 ------/
Note that this is the graph where we go from one node to another if exactly 1 bit changes. What I have drawn above is essentially a cube flattened out. The trouble is that I cannot draw a planar/non-overlapping graph/lattice for 4-dimensional codes. I am curious to find out whether it is even possible to draw a hypercube on paper without overlapping edges. (In fact it is not: the 4-dimensional hypercube has 16 vertices and 32 edges, but a planar bipartite graph on 16 vertices can have at most 2*16 - 4 = 28 edges, so some crossing is unavoidable.)
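The 1-bit-change graph can also be generated programmatically; for n = 3 this reproduces the cube above (a minimal sketch, where I'm calling the graph by its usual name, the n-dimensional hypercube):

```python
# Build the n-bit graph: nodes are n-bit codes, edges connect codes that
# differ in exactly one bit (XOR with a single set bit flips that bit).
def hypercube(n):
    return {v: sorted(v ^ (1 << b) for b in range(n)) for v in range(2 ** n)}

g = hypercube(3)
# 000 is adjacent to 001, 010, 100 -- exactly as in the drawing above.
print([format(u, "03b") for u in g[0b000]])  # ['001', '010', '100']
```

Every node ends up with exactly n neighbors, which is why a flat table (2 neighbors per cell) cannot hold this structure.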

Further, we cannot write this in a row-by-row table, because the geometry of a table allows a cell no more than 2 neighbors, while every node here has 3.

What I am essentially looking for are geometric objects into which these codes can be embedded, and I want to call those geometric objects "Affinity Objects", or perhaps Affinity Lattices.

Let me get back to this once I have dug into the literature about it.

Monday, March 26, 2012

Intro to File Systems

Quick high-level intro to File Systems on Linux. File system design can be motivated by things such as trying to reduce seek time (thereby latency, thereby increasing throughput), reducing the wait time for write operations, and achieving ACID properties (or not).
  1. Berkeley Fast File System: Place "sibling" files in nearby blocks. Boot blocks + superblock + cylinder groups (free-list + inodes(permissions, first 10 data blocks + indirect blocks) + directory blocks + data blocks). The MS-DOS FAT is a simpler version of this. 
  2. Journaling File Systems
    1. Metadata Journaling file systems: Maintain a log/journal of the meta-operations related to allocating inodes, dirs, and blocks, but don't actually store the data being written to blocks in the log/journal. [2] e.g. EXT3FS 
    2. Log Structured File System[1] [2]: See this for a quick high level understanding. To avoid circular logs and fragmentation, "segments"/pools of free segments etc are used based on Garbage Collection techniques. [Wikipedia]
  3. Hierarchical File Systems: [Wikipedia]
  4. Distributed File Systems [Wikipedia]
    1. Chunked file systems e.g. Google File System
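As a back-of-the-envelope on the inode layout mentioned in item 1: with direct plus indirect block pointers, the maximum file size grows geometrically. The geometry below (4 KiB blocks, 4-byte pointers, 10 direct pointers plus one single-, double-, and triple-indirect pointer) is a made-up ext2/FFS-style example, not the layout of any specific file system:

```python
# Max file size reachable through 10 direct pointers plus one single-,
# double-, and triple-indirect block (hypothetical 4 KiB blocks, 4 B pointers).
BLOCK = 4096
PTRS_PER_BLOCK = BLOCK // 4   # 1024 block pointers fit in one indirect block

max_blocks = 10 + PTRS_PER_BLOCK + PTRS_PER_BLOCK**2 + PTRS_PER_BLOCK**3
max_bytes = max_blocks * BLOCK
print(max_blocks, max_bytes)  # 1074791434 blocks, roughly 4 TiB
```

The triple-indirect term dominates, which is why such schemes stay cheap for small files (direct pointers only) yet still reach terabyte-scale files.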
Have a look at this Wikipedia page :http://en.wikipedia.org/wiki/Comparison_of_file_systems

Wednesday, March 14, 2012

Internet Next-Gen

Some Proposals:

  1. Today there is quite a bit of data on the internet that is public. This has in fact been the reason why Google could crawl so much data. The issue is that as content gets locked up inside social networks, the internet quickly comes to resemble human society, i.e. privacy/information sharing/hiding for political agendas may follow. What is needed here is some governing mechanism to allow sites to interoperate and share data for searches. Some data, like Wikipedia, is in everyone's best interest to share. There needs to be thinking about how data is going to be shared across social-networking silos.
  2. Named Data Networking: Routers cache an index to data. The routers' effort goes into finding a path to the router that has access to the content, rather than attempting to connect to some end point; content, rather than the destination IP, is treated as the entity of primary importance. Data can thus be retrieved based on the URL alone (i.e. the URL without the host).

Sunday, March 11, 2012

On Caches

This is quite a vast topic, because so much of the work done in the field of Computer Science is an extension of some sort of caching by some entity. One can in fact look upon a major chunk of the field itself as a search for "better" caching techniques (the meaning of "better" changes with the application).

The simplest way of caching is, of course, to store data in a hash table with the key as the "address" and the value as the data. This kind of caching is called an associative array, or hash-based caching.

However, we can generalize this process a bit by mapping the entire address space into a smaller address space. I will start the discussion with caches as one finds them in Computer Organization books, i.e. the caching of data from slower memories into faster memories. In such caching there are 4 fields of interest (line-number, set, tag, and element-offset; to be discussed soon), and we can arrange these 4 fields in 4! = 24 ways, so there are at least 24 types of caching possible. I say at least 24 because one may use advanced data structures to implement caches, which scales up the total number of cache types possible.

Organization:
See the figures for a conceptual understanding. Caches in reality are not necessarily flat as shown here, but the flat cache as I have drawn it is perhaps the simplest way to explain the idea. I will briefly discuss 2 alternative ways in which the caching of data from CPU addresses can be achieved.

Type 1: Flat Cache/Tags are part of the Cache
See the following picture. Let's say the main memory is 2^A * k bytes. We can envision the memory address being broken up into x, y, z as follows. The "address-chunk" is referred to as an "element" in the literature.

Type 2: Flat Cache/Note the difference in the tag bits.



Storage Benefits:
Caching benefits are achieved by the "tag field" being non-zero (e.g. shown as y > 0 bits). Note that we can potentially store the tags in a separate, faster memory than the cache itself, and this is often done (see this), but for now I am showing them as part of the cache, as the tag is an important part of the caching process.

Search:
The search process for caches is as follows:
  1. Address --> find the cache line, or group of lines (if s > 0) associated with this address
  2. Search "associatively" within this set's tags and look for a match.
If each set contains just 1 line, i.e. s is 0 bits, we say that the cache is "direct mapped"; and when all lines are viewed as part of a single set, we call it a "fully associative" cache. One will also find in the literature the definition that a direct-mapped cache is a set-associative cache with 1 line per set [this]. The terminology makes sense because of the way "associative" searches are done over the tags of the lines in a set, i.e. this is an advancement over the simplistic data structure I have shown: one may do some sort of fast search over the tag section of a set by keeping the tags in a special memory.
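The two-step search above can be made concrete with a small sketch. The bit widths below (64-byte lines, 128 sets, 32-bit addresses) are invented for illustration, not taken from any particular CPU:

```python
# Split an address into (tag, set-index, offset) for a direct-mapped cache
# with 64-byte lines (6 offset bits) and 128 sets (7 index bits).
OFFSET_BITS = 6
INDEX_BITS = 7

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)            # byte within the line
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # which line/set
    tag = addr >> (OFFSET_BITS + INDEX_BITS)            # stored for the match
    return tag, index, offset

print(split_address(0x12345678))
```

Step 1 of the search is just the `index` computation; step 2 compares the stored tag(s) of that set against `tag`.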

Question: What difference do these things (type 1 and type 2) make?
The answer is that in type 1, "close" addresses map to the same line (the higher-order bits of the address decide the cache line); and in type 2, "close" addresses map to different lines (as relatively lower-order bits of the address decide the cache line).

As a result, type 2 is preferred, because caching wants to exploit the principle of locality: we want close addresses to map to different cache lines so that all those lines are in the cache when the processor needs them, effectively saving latency and increasing throughput.

Cache Eviction Logic:
In the above discussion the eviction logic was: in the case of a "direct mapped" cache (each line is a separate set), if the cache line of interest is occupied with a different tag, we kick that data out and load the new data.
If we are talking about a set-associative cache, then we would associatively search for the tag match in all the lines of the "set", and then we need some mechanism to decide which entry to kick out -- this may be LRU, LFU, FIFO, etc. (google for these).
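Of the eviction policies mentioned, LRU is the easiest to sketch; here is a minimal version (capacity plays the role of the number of lines in a set):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU eviction: on overflow, drop the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # insertion order == recency order

    def get(self, key):
        if key not in self.store:
            return None              # miss
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the LRU entry
```

LFU or FIFO would differ only in which entry `put` chooses to discard.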

Cache Coherency:

Distributed Caches & Partitioning: See this

Extensions:
Web Cacheing:
One can in principle extend this logic to caching in general. E.g. suppose we want to cache the content from www.google.com/blah1 and www.google.com/blah2. Might it be nice to use the higher-order part (e.g. the www.google.com) to map to a cache block, or might we be better off using finer granularities (i.e. blah1, blah2) to place the data? I would go with the latter for locality of reference. One can see that there is potential for research here too, i.e. what the performance benefit of each style is, depending on the browsing habits of the user.

Network Buffering:
Often, in networking systems, one may consider the caches to be made up of a pool of buffers separated into zones -- in the above classification scheme, a "zone" corresponds to a set and the buffers in a zone to the cache lines in that set. The eviction scheme is used to figure out which buffer to re-use in a particular zone.

Tuesday, March 6, 2012

Components of the Web

The Web is made up of 3 general components.
  1. Origin Servers
  2. Clients
  3. Proxy: (See this, which I'm reproducing.) There are four kinds of widgets that people might call proxies. Note that a "widget" below may be software, hardware, or a combination.
    1. A router is a widget that sends packets back and forth between multiple network segments (i.e., between ethernet, ADSL, and LocalTalk networks all coming in to the same machine, but possibly on different ports or cards). The widget may choose to filter out some packets and not route them. Important characteristic: multiple networks attached.
    2. A gateway is a widget that translates data from one protocol (like TCP/IP) to another protocol (like AppleTalk) as it flies past. The widget may also choose to not translate some packets, and filter them out. Important characteristic: protocol translation.
    3. A proxy is a widget that receives requests with its left hand (usually from one network), and then launches similar requests with its right hand (usually on a different network). As the results arrive back at the right hand, they are copied over to the still-waiting left hand and returned to the original client. (It acts as a server with its left hand, and as a client with its right hand.) The widget may filter out certain requests or certain responses and not copy them from hand to hand. Important characteristic: acts as both client and server. When a proxy operates at the packet level it is a transparent proxy, and the clients usually don't know that there's a proxy involved.
    4. A cache is a widget that receives requests and either responds directly from its own store of information, or forwards the requests "upstream" to another server. When data comes back from the upstream server, this widget keeps a copy of the data for itself, and also returns the data to the original client. Important characteristic: data stored locally.

Cloud Computing Landscape

http://cloudstack.org/blog.html?start=5 led me to nice overviews of the landscape. Will explore further and add more detailed info, better surveys etc. soon.

Monday, February 27, 2012

Level and Edge Triggered Programs

Level Triggered: The process "looks" at some data structure/state and, if it matches something, notifies the interested processes (i.e. sets some flag somewhere). E.g. time 0: no packets, time 1: packet, time 2: packet will cause 2 level-triggers, at time 1 and time 2.

Edge Triggered: The process "looks" at some data structure/state and sees if it has "changed" since the last time the process looked at it. If it has, it notifies the other interested processes (i.e. sets some flag somewhere). E.g. time 0: no packets, time 1: packet, time 2: packet will cause only 1 edge-trigger, at time 1.

Note that "time" here means time at the abstraction level of software systems, i.e. the times at which a process "looks" at some data structure (i.e. the number of times a CPU instruction touched some abstraction stored in memory, though the process may refer to different memory locations at different instants).
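The packet example above can be simulated directly; this small sketch shows both trigger styles on the same observation history:

```python
# Compare both trigger disciplines on the same sample history.
def level_triggers(samples):
    """Fire at every observation where the condition holds."""
    return [t for t, ready in enumerate(samples) if ready]

def edge_triggers(samples):
    """Fire only when the condition changes from not-ready to ready."""
    fires, prev = [], False
    for t, ready in enumerate(samples):
        if ready and not prev:
            fires.append(t)
        prev = ready
    return fires

samples = [False, True, True]   # time 0: no packet; times 1, 2: packet
print(level_triggers(samples))  # [1, 2] -- two level-triggers
print(edge_triggers(samples))   # [1]    -- a single edge-trigger
```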

I will describe how this works in the context of an edge-triggered vs. a level-triggered epoll system call soon. The idea is that an edge-trigger fires once, which allows us to do the reads etc. in the context of a different thread without having to do fd_clr etc. Coming soon :-)


Ref: [1][2]

Wednesday, February 15, 2012

Thread Pools

As a quick recap, note that the Linux kernel creates an object of a data structure called a "task". Both what we call processes and what we call threads are really tasks "cloned" from an original task. fork() is one kind of cloning, and kthread_create (with pthread_create as the portable POSIX thread-creation API) is another. Multiple threads may share the virtual address space (and thus, in turn, the data sections etc.), but threads mostly have their own stacks.

A task is the unit of scheduling for the Linux scheduler.

Note that a "process" is a task that has a "user-level context". When "user-level threads" are created, they do not call the clone system call and thus simply share the address space of the process. These kinds of user-level threads are sometimes called lightweight processes.

One can, if one wants, instead of creating a new thread every time a client asks the server to perform some task, keep a pool of threads. What this means is that we store each thread's "context" (i.e. registers etc.) in some place in RAM and toggle between these places with a pointer, i.e. all the operating system needs to do to perform a thread switch is change the thread pointer (see this). This is essentially an optimization to save the overhead of repeatedly creating and destroying threads.
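The pool idea can be sketched at user level with a work queue and pre-created threads (a minimal illustration, not how the kernel switches contexts):

```python
import queue
import threading

def worker(tasks, results):
    # Each pooled thread loops forever, pulling jobs instead of being
    # created and destroyed per request.
    while True:
        job = tasks.get()
        if job is None:          # sentinel value: shut this worker down
            break
        results.put(job())

tasks, results = queue.Queue(), queue.Queue()
pool = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for w in pool:
    w.start()

for n in range(10):              # submit 10 jobs to the shared queue
    tasks.put(lambda n=n: n * n)
for _ in pool:                   # one sentinel per worker
    tasks.put(None)
for w in pool:
    w.join()
```

The 4 workers service all 10 requests; only queue operations happen per request, no thread creation.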



Have a look at this link for a quick high-level intro of linux memory management.

Monday, February 13, 2012

On OAuth

I was pointed to this article, which made me read up more on OAuth. Will update more soon -- just making a note.

Thursday, February 9, 2012

Video & Network Bitrates


Video encoding layer:
VBR - bitrate varies, PSNR (quality) roughly constant
CBR - bitrate constant, PSNR varies

Network layer:
CBR - max packets/sec is constant over time
VBR - max packets/sec varies over time

For video over the network, we can stream VBR over VBR, CBR over VBR, CBR over CBR, and VBR over CBR.

Monday, January 23, 2012

Loose Coupling and Tight Coupling

This is an intro to the idea. Basically, the learning was that loosely-coupled systems appear to be easier to maintain and evolve. One should of course not read too much into the definition of "loosely-coupled" -- what one means here is that the "interface" between two components is well-defined and both parties know about it, but neither party knows anything about what happens under the hood on the other side of the interface. Such systems are easier to maintain because the references of one object are not tied to the references of another object; rather, all interaction passes through well-defined interfaces.
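A tiny sketch of the point (the `Store` interface and `MemoryStore` class are invented for illustration): the caller depends only on the agreed interface, never on what is under the hood.

```python
from abc import ABC, abstractmethod

class Store(ABC):
    """The well-defined interface both parties know about."""
    @abstractmethod
    def get(self, key): ...

class MemoryStore(Store):
    """One possible implementation; callers never see these internals."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        self._data[key] = value

def lookup(store: Store, key):
    # Loose coupling: this function works with ANY Store implementation.
    return store.get(key)
```

Swapping `MemoryStore` for, say, a disk- or network-backed implementation requires no change to `lookup` -- that is the maintainability win.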

One may also see that the same idea can be extended to hold true for more general relationships e.g. Loosely-coupled organizations may be better suited for evolution/adaptation  and growth than tightly coupled organizations i.e. organizations that do not play the big-brother may allow each employee to learn/grow as per their personal choice thereby increasing satisfaction levels and motivation (because of having the perception of being in control of one's life).

Further, it may also be that in some enterprises "identity" management (aka "authorization") is tightly coupled while "authentication" is loosely coupled, i.e. one can get in through many doors (the edge is only loosely coupled with the centralized system), but cannot access any content unless the centralized server approves [Ref: [1] [2]].

Friday, January 6, 2012

Controlling the heater via a humidity sensor

A project I have been wanting to design is to use a hygrometer to turn the heater I have in my room on/off. The heater makes the room too dry. The plan is to keep the window open (to allow humidity to enter the room) and switch off the heater when the humidity drops below a certain value. I am not very sure how to get this working, but googling around a bit led me to this:

http://marcusgun.awardspace.com/humidor.html 
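The control logic itself is simple; a hysteresis band avoids rapid on/off toggling near the threshold. The humidity values below (% relative humidity) are invented placeholders, not tuned numbers:

```python
def heater_command(humidity, heater_is_on, low=30.0, high=40.0):
    """Decide the heater state from the hygrometer reading (toy thresholds)."""
    if humidity < low:
        return False        # room too dry: switch the heater off
    if humidity > high:
        return True         # humid enough again: heater may run
    return heater_is_on     # in between: keep current state (hysteresis)
```

A loop would poll the hygrometer, call this, and drive a relay with the result.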


Tuesday, January 3, 2012

Complexity Zoo

Link to a pretty popular place to learn about this.
http://qwiki.stanford.edu/index.php/Complexity_Zoo

Will post stuff as i learn more.

Thursday, December 29, 2011

Synchronization and Locks

Definition of Synchronized Processes: When processes change states in a predictable and repeatable pattern, we say that the processes are "synchronized", or working together, i.e. related to each other in lock-step. E.g. in the Synchronous Dataflow graphs widely used in DSP applications, some processes become active around the same time, work on some data, and then go to sleep while waking up a different set of processes, and this state-changing continues in a fixed, predictable, repeated pattern. See this. One may say that the processes are performing a "dance".

To maintain "synchronization", i.e. to allow a set of processes to work together with something else (another group of processes, some clock, etc.), the following ideas are used:
  1. Locks: When a set of processes is in active state, the other set is put in an inactive state i.e. even if the scheduler picks up the other process to be scheduled it wont be scheduled as it would not be in an active state i.e. it would be sleeping on a lock. 
  2. Synchronization with a clock: At fixed intervals processes give up execution and change state. This allows us to not use locks. 
One needs to note that when the thing against which synchronization is done is a clock shared across the 2 processes, the 2 processes are "isochronous" (as well as synchronous). However, synchronous does not necessarily mean isochronous; rather: if isochronous, then synchronous (isochronous is a subset of synchronous).
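The lock idea in item 1 can be sketched with two threads performing a lock-step "dance": each sleeps on a condition until the other hands control over.

```python
import threading

turn = 0                 # whose turn it is to "dance"
cond = threading.Condition()
trace = []               # the observed state-change pattern

def dancer(my_id, steps):
    global turn
    for _ in range(steps):
        with cond:
            while turn != my_id:
                cond.wait()      # inactive: sleeping on the lock
            trace.append(my_id)  # active: do some work
            turn = 1 - my_id     # wake the other set of processes
            cond.notify_all()

t0 = threading.Thread(target=dancer, args=(0, 3))
t1 = threading.Thread(target=dancer, args=(1, 3))
t0.start(); t1.start(); t0.join(); t1.join()
print(trace)  # [0, 1, 0, 1, 0, 1] -- a fixed, predictable, repeated pattern
```

Even if the scheduler picks the wrong thread, it just goes back to sleep, which is exactly the point of item 1.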

The function calls a process makes can be either synchronous or asynchronous (in an asynchronous call the process makes the call, i.e. the process is the caller, but does not go to sleep waiting for a response from the callee; instead it continues working).

Thus, the point of this post is to note that while synchronous dataflow is readily found in DSP-like applications (e.g. video encoding), general-purpose real-time computing may need to follow asynchronous models (processes not working in rhythms), e.g. processing data arriving on a network interface, or a user moving the mouse. We could include these processes in the synchronized dataflow graphs, but considering the deadlines of such processes, the right way to schedule them is to keep them OUT of the synchronization, i.e. to allow the scheduling mechanism to schedule such an event "asynchronously", outside the synced-up processes.

Basically, all I am really doing is arguing that we need a "scheduler" capable of scheduling asynchronous events. This appears kind of obvious in retrospect, though I am still noting it down.

Wednesday, December 14, 2011

Tuesday, December 6, 2011

Update Problem & HTTP

HTTP is a stateless protocol, i.e. the HTTP server is not supposed to keep track of the state of each flow. In this situation, how can we deal with the "update problem"?

http://www.w3.org/1999/04/Editing

The above is one way of dealing with it. This may also be called "Optimistic Concurrency Control", i.e. check for conflicts before commit; don't take locks too early.

As I see it, the idea is basically:
  1. Using HEAD: Check whether the document exists/was modified before issuing the PUT request, by using a HEAD request before actually issuing the PUT.
  2. Using an Entity Tag: On the server, reject the PUT if the ETag of the file being PUT by the client does not match the ETag on the server. 
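A minimal client-side sketch of the ETag approach in item 2, using only the Python standard library (the URL and tag value are placeholders; a real server would answer 412 Precondition Failed when the tag no longer matches):

```python
import urllib.request

def conditional_put(url, body, etag):
    # If-Match asks the server to apply the PUT only if the resource's
    # current ETag still equals `etag`, i.e. nobody updated it meanwhile.
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("If-Match", etag)
    return req  # pass to urllib.request.urlopen(req) to actually send it

req = conditional_put("http://example.com/doc.txt", b"new text", '"v1"')
```

This is exactly optimistic concurrency: no lock is taken; the conflict check happens at commit time.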

Thursday, November 10, 2011

Google CLI

Googling over a CLI -- but it does not seem to have things like grep etc.?

http://goosh.org/


Friday, October 28, 2011

Wednesday, October 5, 2011

Bandwidth, Latency and Throughput

Here is what I understand about these (very basic) things. Imagine that the communication is happening between points A and B of a wire; one may even look upon these as points on a road (for maybe easier understanding).
  1. Bandwidth = The MAXIMUM number of bytes (or cars, say) that pass a certain mark (say point A) in some unit of time, e.g. 1 sec; measured in bits/sec. E.g. if a road carries at most 50 cars/min, then the bw of the road is 50 cars/min. Note that bandwidth can vary across different sections of the network pipe, and that having more "lanes" usually means more bandwidth. If the road bw is 50 cars/min, but currently there are only 30 cars/min, then we say that the "throughput" of the road is 30 cars/min. 
  2. Delay or Latency = time taken for some bits (or cars) to be retrieved/travel  some "length" of the pipe (e.g. from A to B).  
  3. Bandwidth-Delay product = Imagine some packets/cars trying to move from A to B. The delay between A and B is, say, 30 seconds. Now let's say we measure, at point A, that the bandwidth of the pipe is 50 packets/sec (or 50 cars/sec). Then the number of packets that will exist on the wire/road before the 1st car/packet reaches point B is called the Bandwidth-Delay product; in our case it is 50 * 30 = 1500 packets, i.e. 1500 packets will be pumped into the network before the 1st packet reaches point B (assuming no change in network/road conditions meanwhile). Note that in this situation the section between A and B is fully occupied with packets/cars, and if we try to push in additional packets, they will be dropped. This is why the BDP happens to be the maximum cwnd (outstanding packets) possible for TCP. 
  4. Flow =  RFC 2722 defines traffic flow as "an artificial logical equivalent to a call or connection".
  5. Flow Rate = Rate at which data is delivered for a particular flow, in bits/sec. Sometimes called "flow bandwidth" or "effective bandwidth". 
  6. Throughput = The rate of packet retrieval by the measurement apparatus at the destination (point B in our example) for the flow in question, measured in bps. For lossless pipes, throughput is the same as the flow rate; for lossy mediums, throughput can be less than the flow rate (as packets may be dropped). Throughput depends on the bandwidth as well as the latency. People have also been experimenting with how "power" affects "throughput" for a given data rate (e.g. in wireless networks). People also measure the rate of change of throughput, in bps/s. 
  7. Goodput = Application level Throughput.
  8. Makespan = The time difference between the start of a job and its end, measured in seconds. E.g. when we download something from the internet, the time the download takes is the makespan of the schedule that performed the download. High goodput means low makespan. 
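The arithmetic in item 3 is worth writing out once, since the BDP is what bounds TCP's cwnd:

```python
# Bandwidth-delay product with the numbers from item 3.
bandwidth = 50            # packets/sec, as measured at point A
delay = 30                # seconds for one packet to travel from A to B
bdp = bandwidth * delay
print(bdp)                # 1500 packets "in flight" before the first arrives at B
```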
Caching: For a given bandwidth of the pipe/road, if the delay to the destination is smaller, then the first packet reaches point B faster, and also gets back to point A faster. Thus, having smaller delays (which means a smaller BDP) means that the same amount of data is retrieved in less time, increasing the number of bytes transferred in that interval (with vs. without a cache), i.e. basically less "idle" time in the schedule.

Similarly, If we increase bandwidth, but keep latency the same, we can potentially retrieve more data/sec, thus increasing throughput.


Ping for measuring throughput: I have seen on several pages, such as this one, the claim that using Ping for measuring bandwidth is a fallacy. I think that under conditions where bandwidth is not changing much, we can use Ping for measuring throughput (i.e. I don't agree with the author of that article that Ping cannot be used for flow-bandwidth estimation; of course one cannot know the raw bandwidth from the flow bandwidth, as the flow bandwidth may be affected by routing and by the priority of the ping packet in the scheduler).
Some other folks have used Ping with varying packet sizes to see how the packet size affects ping throughput. Bandwidth estimation is an active research area, e.g. see this and this.

Also, here is a list of other performance tools useful for measuring effective bw or throughput: http://www.caida.org/tools/taxonomy/perftaxonomy.xml

Wednesday, August 10, 2011

SQL like Query Processing for NoSql db e.g. BigTable and DHT stuff

http://horicky.blogspot.com/2009/11/query-processing-for-nosql-db.html


Here is a slide deck about Distributed Hash Tables; here is Wikipedia; here is a link to Cassandra (a high-level overview of Facebook's distributed storage system); and here is a course page about peer-to-peer systems that covers this in some detail.
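The core DHT idea from those references can be sketched in a few lines: hash both nodes and keys onto a ring, and each key belongs to the next node clockwise (a toy, single-process consistent-hashing sketch; node names are invented):

```python
import hashlib
from bisect import bisect

def h(value):
    """Hash a string onto the ring (a large integer space)."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

nodes = ["node-a", "node-b", "node-c"]
ring = sorted((h(n), n) for n in nodes)       # node positions on the ring
points = [p for p, _ in ring]

def owner(key):
    """The node responsible for a key: first node clockwise from h(key)."""
    return ring[bisect(points, h(key)) % len(ring)][1]
```

Adding or removing a node moves only the keys adjacent to it on the ring, which is what makes DHTs attractive for content-based routing.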

Also found this for a quick recap of NoSQL architecture and how it can improve performance (and scale) despite lacking some relational-database features. I also found this and this useful for understanding both sides of the argument (i.e. NoSQL vs. relational).

Of more interest to me is the role of DHTs in Content based routing. Will update soon.



Tuesday, August 2, 2011

Some OO stuff

Thought that it was nice to make a note of OO stuff that i found interesting. Will keep updating this list with more stuff i come across:

  1. http://javadude.com/articles/passbyvalue.htm: Java Objects: pass by reference, or pass by value?
  2. http://www.phpcompiler.org/articles/virtualinheritance.html : Complications introduced by virtual inheritance in C++
  3. http://www.objectmentor.com/resources/articles/abcpvf.pdf : About Abstract Classes, Factoring, Interfaces

Sunday, July 31, 2011

Linux Kernel Map

Again, most people probably already know about this stuff; I'm listing it mainly because I too thought it was nice: http://www.makelinux.net/kernel_map/

Also, http://www.redhat.com/magazine/001nov04/features/vm/ was a nice high-level intro to Virtual Memory (and things such as the "Buddy Allocator" and "Slab Allocator"). Also have a look at this Wikipedia entry (http://en.wikipedia.org/wiki/Region-based_memory_management) as a clarification of zone-based memory allocators.

Monday, May 16, 2011

Performance testing tools

A link to open source software performance testing tools 

Thursday, March 24, 2011

Falling off the math cliff

Just for some fun, a pretty popular picture ...

Tuesday, February 22, 2011

Reading Elf Files

Some stuff to follow up on : Reading elf files using Objdump and Readelf

Tuesday, January 4, 2011

A reading list for Computer Systems

This link has a list of papers that one should master to feel comfortable dealing with advances in computer systems (imo).

and this link is a set of lecture notes on Algorithms and CS Theory, which, imo, should also not hurt mastering, lack of time notwithstanding.

Thursday, November 18, 2010

Trust

As a quick recap:
  1. Authentication, i.e. verifying that someone has the proper credentials to enter the system, can be done via:
    1. Passwords
    2. Public-key encryption. The gatekeeper will request from the person some agreed-upon message that has been encrypted with the private key, and only the public key can decrypt it. This way, the gatekeeper knows that the person has passed the truth-test for the public-key/private-key pairing. Should the gatekeeper "trust" this person? This is a separate issue, and it is handled via "Certificates".
Imagine the following scenario. You meet someone new and they need proof that you have the credentials you claim you do. You can say orally that you have an M.S. degree. Is that good enough? How can they "trust" you? To make them trust you, you present some certificate signed by someone THEY trust, e.g. the university certificate. They can then call up the university and find out whether you have indeed passed. In PKI, the certificate presented helps the new party verify exactly that. However, do they "trust" the university? That is yet another separate question. In the case of Digital Certificates, the following logic applies.
  1. The CA has confirmed that the public key that shows up in the certificate is "trustworthy", and to assert this, the CA signs this public key with its own private key. What if the CA has trusted some fake machine? That then becomes the CA's fault. The question arises: how can a CA trust someone?
  2. Once the user gets a certificate signed by a CA it trusts, the end-user can accept that it is talking to someone trustworthy.  
So, as I see it, "trust" itself seems to rely on out-of-band things like "reputation", "group-think", and other metrics. This is one frequently cited paper. Trust, thus, is a "chain": if anyone in the chain is a fake, the whole trust chain breaks. Thus, one needs constant checks and confirmations that someone is trustworthy.
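The "chain" can be sketched as a walk from a certificate to a trusted root (a toy model with made-up names; real X.509 validation also checks signatures, expiry, revocation, etc.):

```python
# Each cert names its signer; trust holds only if the chain of signers
# ends at a root we already trust, with no unknown links or cycles.
certs = {
    "server": "intermediate-ca",
    "intermediate-ca": "root-ca",
}
trusted_roots = {"root-ca"}

def is_trusted(name, seen=()):
    if name in trusted_roots:
        return True
    signer = certs.get(name)
    if signer is None or signer in seen:   # unknown signer, or a cycle
        return False                       # one broken link breaks the chain
    return is_trusted(signer, seen + (name,))
```

One fake link anywhere (an unknown or untrusted signer) makes `is_trusted` fail for everything below it, which is exactly the chain-breaking property above.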

In other words, the learning is that "trust" has to be earned, and further that trust has to be retained and maintained. Also one must note that "trust" is a human emotion, and we create all these systems because we as humans want to get our work done by trustworthy sources i.e. sources that will help solve our problems and satisfy our desires.

In this sense, trust is an engineering problem. 

Thursday, October 14, 2010

BWT

How do we make data more amenable to Run Length Encoding? Burrows-Wheeler Transform is one way.
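A naive (quadratic) sketch of the transform, enough to see why it helps RLE: sorting all rotations groups similar contexts together, so the last column tends to contain runs of equal characters.

```python
def bwt(s, end="$"):
    """Naive Burrows-Wheeler transform: last column of sorted rotations."""
    s = s + end  # unique terminator so the transform is invertible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

print(bwt("banana"))  # annb$aa -- note the clustered 'a's and 'n's
```

Production implementations use suffix arrays instead of materializing all rotations, but the output is the same.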

Wednesday, October 13, 2010

Complex Logarithms

Following up on my post here, I came across another piece of such fun stuff here. Although it is already typed up on that page, I'm retyping it, with the chance of elaborating a bit more. The argument goes as follows:
  1. We know that e^(i*theta) = cos(theta) + i sin(theta)
  2. now, if theta = pi radians, then, we know that e^(i Pi) = cos(Pi) + i sin (Pi)
  3. Now, cos(Pi rad) = cos (180 deg) = -1, and sin(Pi) = 0
  4. so, e^(i Pi) = -1
  5. now, square on both sides, so we get:
  6. e^(2 i Pi) = 1
  7. now take ln on both sides, so as to get
  8. 2 i Pi = 0
Clearly this is absurd, so something strange must be going on when taking the log of a complex number (step 7). It turns out that, indeed, this is a tricky situation, and it leads us to the topic of "Complex Logarithms". Essentially, what is going on (as Wikipedia explains) is that a logarithm of a complex number has infinitely many values (kind of like "aliasing"): the logarithms of 1 are 2*Pi*i*k for every integer k, and step 7 silently picked the k = 0 value on the right-hand side while the left-hand side corresponds to k = 1. Will revisit this page soon.
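Python's `cmath` makes the branch choice visible: `log` returns only the principal value, which is exactly the step-7 trap.

```python
import cmath

# Step 6 checks out numerically: e^(2*pi*i) == 1 (up to rounding).
lhs = cmath.exp(2j * cmath.pi)
assert abs(lhs - 1) < 1e-12

# Step 7 is where it breaks: cmath.log returns ONE logarithm of 1 (the
# principal value, k = 0)...
assert cmath.log(1) == 0
# ...yet 2*pi*i is also a perfectly valid logarithm of 1, since
# exponentiating it gives 1 back:
assert abs(cmath.exp(2j * cmath.pi) - 1) < 1e-12
```

So "take ln of both sides" quietly equates two different branches of a multivalued function.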

Tuesday, October 5, 2010

Valgrind !

Valgrind - awesome stuff :-)

Need to follow up on this ...

Tuesday, August 10, 2010

Nagle & delayed ACKs

Interesting discussion of how Nagling (the goal is to avoid sending "tinygrams" to the peer; implemented by holding data in the sender buffer (delaying the "send") until an ACK of a previous send is received) and delayed ACKs (on the receiver side, do not immediately send an ACK, but delay it so that the ACK can be piggybacked on something the receiver app wants to send) can interact to cause TCP performance problems. See this and this.

Note of course how the tinygram issue is different from the Silly Window issue: tinygrams are sent while the TCP window (as advertised by the peer) is almost empty, i.e. wide open, whereas silly-window behavior occurs when the advertised TCP window is almost full.

Of course, no big deal in retrospect ... but interesting i thought...


(Another interesting thing to note is that the so-called "congestion window" is a mechanism whereby the sender does not fill up the whole of the receiver's advertised window at once, but fills it up slowly (the so-called cwnd growth function).)

Monday, July 19, 2010

Generating Silence?

What is of interest to me is implementing a device that does the following:
music playing loudly -----> pressure waves ---> mic---> computer
what should happen is:

music playing loudly ---> pressure waves --->
device that emits "negative" waves --> superposition -->
mic -> computer

Clearly this is too ambitious (too big a bite), so maybe I need to make some simplifying assumptions. Let me try to see if I can get some basic stuff working at least! Well, of course, the end goal is to have the "ear" instead of a microphone, but that might actually be beyond me right now. Will update this in the future.
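In the idealized case, the superposition step is just sample-wise addition of the wave and its negation; the hard part of the real project (phase alignment, latency, the room's acoustics) is exactly what this toy ignores:

```python
import math

RATE = 8000                                  # samples/sec (arbitrary choice)
tone = [math.sin(2 * math.pi * 440 * t / RATE) for t in range(64)]  # the "music"
anti = [-s for s in tone]                    # the "negative" wave to emit
mix = [a + b for a, b in zip(tone, anti)]    # superposition at the mic

print(max(abs(s) for s in mix))              # 0.0 -- perfect silence
```

Any timing offset between `tone` and `anti` would leave a residual, which is the actual engineering problem.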

Wednesday, June 30, 2010

A list of interesting products

Some hardware and software products i came across and liked:
  1. Watching TV cheaply: http://www.hulu.com/plus and http://www.roku.com/roku-products + netflix.
  2. Personal Cell Towers and Cell Phone Signal Boosters
  3. Also came across Wolfram Alpha, which offers very interesting statistics, e.g. see "Cisco Vs Akamai Vs Google Vs Apple" here. How cool is that?

Monday, June 7, 2010

Video Streaming

Video streaming (where streaming means that someone is producing data and someone else is consuming it) generally follows one of these two approaches, based on the "subscription" type:
  1. Push model of viewing data - e.g. I "subscribe" to a newspaper, and the newspaper is "pushed" to my house. Similarly, if clients subscribe to some multicast addresses, servers will push data to those multicast addresses using UDP (or RTP). Client-to-server is unicast, but server-to-client is SSM or ASM. These days, apparently, one can even do multicast over RTMP.
  2. Pull model of viewing data - e.g. going to a newspaper stand to buy a paper, i.e. no pre-decided "subscription". Clients ask the server to transmit chunks of data using HTTP, RTSP, or other signaling mechanisms (e.g. RTCP), i.e. one streaming channel and one control channel. These are unicast messages both ways. Note that the "chunk" here may be the entire video file; if one wants "adaptive" rate-controlled streaming, one may use smaller chunks and request more chunks as deemed necessary.
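The pull model's "ask the server for a chunk" maps naturally onto an HTTP Range request. A minimal sketch with Python's standard urllib; the URL and chunk size are hypothetical, and the request is only constructed, never sent:

```python
import urllib.request

# In the pull model the client asks for a specific chunk; over HTTP this is
# typically expressed as a Range request. The URL below is hypothetical; we
# only build the request object here, we do not send it.
def chunk_request(url, start, size):
    req = urllib.request.Request(url)
    req.add_header("Range", "bytes=%d-%d" % (start, start + size - 1))
    return req

req = chunk_request("http://example.com/video.mp4", 0, 1024)
print(req.get_header("Range"))  # bytes=0-1023
```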

Apple HLS may be described as a pull model (here is a sample HLS stream).
IPTV, on the other hand, could be described as using the push model, where clients just ask for the data transmission to start (via an IGMP join or via RTSP), and the server then pushes data to the multicast.

I am not sure whether RTMP is a push or a pull model; I am guessing it is a push model.

Cisco Videoscape/Apple TV/Google TV:
Cisco Videoscape, Apple TV, and Google TV represent the merging of TV and the internet.
Today, "television" (unlike internet video) has been a 'real-time' system, i.e. content is buffered in a small jitter buffer (a few seconds at most) and repair is attempted, but if it happens that there are gaps in the content, one sees the occasional screen glitch. In other words, today's TV content on screen is never frozen with a "buffering..." message.

Thus, if one wants to provide a TV-like experience, one would have to create UDP streaming APIs (or a loss-tolerant TCP, in other words TCP-friendly UDP). Of course, this means that VQE-like solutions can still be useful for these applications. This is just one way; the other, and probably more attractive, method is to use Adaptive Bitrate Streaming (ABR) over TCP, with a pull mechanism based on chunks of the video.
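The ABR decision can be sketched as picking the highest encoded bitrate that fits the recently measured throughput (a toy version; the bitrate ladder here is made up, and real players also consider buffer occupancy, hysteresis, etc.):

```python
# A toy version of the ABR decision: before requesting the next chunk,
# pick the highest encoded bitrate that fits the measured throughput.
# The ladder of bitrates (in kbps) is purely illustrative.
def pick_bitrate(measured_kbps, ladder=(300, 700, 1500, 3000, 6000)):
    candidates = [b for b in ladder if b <= measured_kbps]
    return candidates[-1] if candidates else ladder[0]

print(pick_bitrate(2000))  # 1500
print(pick_bitrate(100))   # 300  (fall back to the lowest rung)
```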

Youtube today uses HTTP pseudo-streaming using lighttpd servers + RTMP (see this and this) + HTML5, and can even work with RTSP (see this video).

Signaling Protocols:
RTSP:
Originally developed as a 'remote control' mechanism; it is a "stateful" protocol. The client has 4 states (init, ready, playing, recording), and the server has the same 4 states. This was a good intro.
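The 4-state client machine described above can be sketched as a transition table keyed by RTSP method (a simplification; RFC 2326 gives the full table):

```python
# A minimal sketch of the RTSP client state machine mentioned above
# (states: init, ready, playing, recording), keyed by the RTSP method
# that triggers each transition. Simplified from RFC 2326.
TRANSITIONS = {
    ("init",      "SETUP"):    "ready",
    ("ready",     "PLAY"):     "playing",
    ("ready",     "RECORD"):   "recording",
    ("playing",   "PAUSE"):    "ready",
    ("recording", "PAUSE"):    "ready",
    ("playing",   "TEARDOWN"): "init",
    ("ready",     "TEARDOWN"): "init",
}

def step(state, method):
    # Unknown (state, method) pairs leave the state unchanged.
    return TRANSITIONS.get((state, method), state)

state = "init"
for method in ("SETUP", "PLAY", "PAUSE", "TEARDOWN"):
    state = step(state, method)
print(state)  # init
```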
HTTP/RTMP/MMS are also used for signaling

Transport protocols:
HTTP (over TCP, rarely UDP), RTP/RTCP over UDP (rarely over TCP), RTMP (/optionally over HTTP) using TCP/UDP, MMS

Anyway this was a nice link comparing the different streaming protocols and their features.

Overview of Adobe Streaming tech

Actionscript vs. Javascript:
http://answers.yahoo.com/question/index?qid=20081212142558AAluL6n

SWF vs. FLV:
While we are on this topic, it is useful to know the difference between a SWF (player/metadata) and an FLV (data) (see this). This is a somewhat basic beginner kind of question. Here is a link to see how Flash Video is distributed.

RTMP:
Note that video streaming can be achieved just by serving SWF and FLV over HTTP - what we call "progressive download" of the FLV.
Adobe created another solution for streaming called RTMP. RTMP is used by the player (running in the browser) to talk to the server and eventually receive the FLV video, with the difference that the client can jump to any location.

Youtube does use RTMP for storing video, via RTMPT (see this). However, while streaming, Youtube uses chunked HTTP transfers.

Apple Streaming tech:

HLS

Tuesday, June 1, 2010

Blog link

http://thedailyreviewer.com/top/rationality

Nice blog I came across. Note, of course, that one should not feel threatened by "rationality" etc.; as I mentioned here, there is more to life than "rationality"/"problem-solving".

Wednesday, May 12, 2010

Computing

Just making a note of the things one might want to consider while working on a career in computing.

  1. http://www2.research.att.com/~dsj/nsflist.html#Intro: A quick high-level overview of the state of the art in the area of computer science
  2. http://en.wikipedia.org/wiki/Gödel_Prize
  3. http://en.wikipedia.org/wiki/Turing_Award
  4. http://en.wikipedia.org/wiki/List_of_computer_science_conferences
All of these serve as a measure of what has received recognition in the recent past (a sign of which areas under investigation have received attention, and probably funding), and are probably a good place to start working, i.e. areas where one might receive support from other interested parties, unlike doing one's own stuff in obscurity.

Monday, April 12, 2010

An imaginary number "j" is NOT sqrt(-1) !

I was shown the following proof a few days back, and it left me puzzled:
1 = 1
also,
-1 = -1
Now let's divide both sides by 1. Thus:
-1/1 = -1/1
thus,
1/-1 = -1/1
Now let's take the square root of both sides (i.e. raise to the power 1/2; when I use the word "sqrt", what I mean is raising to the (1/2) power):
sqrt(1/-1) = sqrt(-1/1)
sqrt(1) /sqrt(-1) = sqrt(-1)/sqrt(1)
Let's call
x = sqrt(1) (which means, x = (1)^(1/2))
y = sqrt(-1), (which means, y = (-1)^(1/2))
then we have:
x/y = y/x
or,
x^2 = y^2, and thus
-1 = 1
Clearly this is absurd, so what went wrong ?

We can see that we are substituting x = 1^(1/2), which is supposed to mean that x^2 = 1; but x^2 = 1 means x could just as well have been -1, i.e. -(1^(1/2)). Similarly, if we substitute y = sqrt(-1), then y^2 = -1, so y = +/- sqrt(-1). Essentially, if we call sqrt(1) "x", we must also understand that -x is equally a square root of 1, and as a result the trouble with this derivation is that after a certain stage it becomes "ambiguous".

On similar lines, if we define 'j', the imaginary number, as +sqrt(-1), we face the following problems:

A) j^2 = j * j = sqrt(-1) * sqrt(-1) = ((-1)^(1/2))^2 = -1
B) j^2 = j * j = sqrt(-1) * sqrt(-1) = sqrt(-1 * -1) = sqrt(1) = +/- 1

i.e. ambiguous, and this is the reason we define j^2 = -1 (and j = +/- ((-1)^(1/2))). In fact, Wikipedia is also careful to define the imaginary number this way (http://en.wikipedia.org/wiki/Imaginary_number), i.e. by this definition j = +/- sqrt(-1) (and not j = +sqrt(-1)); so when someone uses e^(j*theta), it means e^((+/- ((-1)^(1/2))) * theta) - kinda sucks - but we have notation to help us get used to this mess.
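The A-vs-B ambiguity can be checked numerically: the identity sqrt(a)*sqrt(b) = sqrt(a*b) fails once negative numbers are involved, because cmath.sqrt (like any single-valued square root) must pick one branch:

```python
import cmath

# The identity sqrt(a)*sqrt(b) == sqrt(a*b) holds for non-negative reals
# but fails for negative numbers, which is exactly the sleight of hand
# between step A and step B above. cmath.sqrt returns the principal branch.
a = b = -1 + 0j

lhs = cmath.sqrt(a) * cmath.sqrt(b)   # (1j) * (1j) = -1
rhs = cmath.sqrt(a * b)               # sqrt(1)    = +1

print(lhs.real, rhs.real)  # -1.0 1.0
assert not cmath.isclose(lhs, rhs)
```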

Thus we must be careful, and define the imaginary number as an "imaginary" square with area -1.
Imagine a land that uses square coins: if we owe -5 dollars to someone, we really owe 5 "imaginary" square coins. Someone can always argue that these coins are "imaginary" and impossible to "feel" with the senses, so there is no way one can give such a coin. However, we DEFINE a way to end this debt, by saying that giving a "real" coin eliminates the imaginary coins' debt.

Another way of looking at the imaginary number is as a "rotation" in the complex plane (using x + iy). But we sometimes loosely use j = sqrt(-1), and in my opinion that is not right, i.e. it has the potential to cause quite a bit of confusion.

Saturday, March 6, 2010

Now whats with this another blog

Aims at two goals
  1. The other blog is meant to be a "lessons learnt" blog, i.e. I do not intend to write stuff there unless I have spent significant time thinking about it; the half-baked ideas I am hoping to put up here. Why all this fuss about making this stuff public, when I could just as well keep it private? Well, a couple of reasons: making it public increases my stake in this stuff (making me actually revisit these ideas), and putting it online makes it convenient to link to pages (e.g. Wikipedia, other pages, papers, etc.).
  2. Also, another secret goal is to work on mathematics and related issues in computer science, EE, and other areas that I face professionally, i.e. I intend not to put high-level philosophy talk here.
This blog is just my personal blog (in a manner similar to the blogs of others).