Some Notes: 2012

Thursday, December 6, 2012

Monday, November 12, 2012

According to RFC 2616

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. […] The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI. The posted entity is subordinate to that URI in the same way that a file is subordinate to a directory containing it, a news article is subordinate to a newsgroup to which it is posted, or a record is subordinate to a database.

PUT is defined as:

The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI.

See this for a discussion. What eventually i am going with is this understanding:

This is specified in RFC 2616 §9.5ff.POST creates a child resource, so POST to /items creates a resources that lives under the/items resource. Eg. /items/1PUT is for creating or updating something with a known URL.
Therefore PUT is only a candidate for CREATE where you already know the url before the resource is created. Eg. /blogs/nigel/entry/when_to_use_post_vs_put as the title is used as the resource key

Tuesday, October 30, 2012

API vs SDK

http://stackoverflow.com/questions/834763/api-vs-sdk
API could be said to be the "declaration" and the SDK the "declaration + implementation(definition)" of an interface(method/technique to drive/use something)

Note that in OOP, the interface is usually a virtual class (C++), or an interface (Java). People can do 2 things with such a class/interface.

If what the programmer is provided is just an API with no accompanying implementations - he/she can implement the interface aka create a derived class and use the base class' pointer to call the functions defined.
If what the programmer is provided is an SDK, with the interface specified in a virtual class (or an "interface") then the libraries provided will implement the SDK... all the programmer needs to do is to instantiate the object of this interface (as specified in the documentation), and then use it.

Wednesday, October 10, 2012

Gates with dominos

Came across this interesting post : http://mathlesstraveled.com/2012/10/10/making-a-computer-out-of-dominoes/
and this link explaining the AND gate using dominos.

Tuesday, October 9, 2012

Functors/ Function Objects

http://www.experts-exchange.com/articles/849/Function-pointers-vs-Functors.html is the way to handle function pointers in C++ so that the passing function pointer is override-friendly (no casting needed), and the function can maintain state of the object it is in, what one needs is a functor.

http://stackoverflow.com/questions/356950/c-functors-and-their-uses .

Functor is a function which retains state.
http://stackoverflow.com/questions/317450/why-override-operator?lq=1

Saturday, September 8, 2012

Group Keying & Rekeying

http://eprint.iacr.org/2011/575.pdf Follow up

Some InPainting algorithms

http://kedarpatwardhan.org/Research/pdfs/Thesis-kedar.pdf

Quick List of HOT software technologies

Listing some software technologies that are *hot* these days. I have touched upon them in the past in some senses. Just bringing it all together.

Web frameworks for Web services: Goal is to pushing pages and delighting users[1]

Node.js: Event driven server side javascript based "server", to deal with sysadmin capabilities and low level routing [1]
Ruby on Rails: A web framework based on the Ruby Language
Ajax: Method to render only small sections of the page as necessary from the user/client perspective by talking to the server side framework. (Some Ajax frameworks are even embedded as a part of larger frameworks. For example, the jQuery JavaScript Library is included in Ruby on Rails.[Wikipedia])

Clouds & Virtualization Infrastructure:

Amazon's EC2 cloud: A web service interface that allows one to obtain and configure capacity in the cloud managed by Amazon, where new server instances can be boot up in minutes allowing to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use. Amazon EC2 provides developers the tools to build failure resilient applications and isolate themselves from common failure scenarios. Here is the marketplace for the applications written based on the EC2 cloud. Here is a link to AWS developer community resources. This internally leverages virtualization technologies and Big-data technologies.
See this Virtualization Journal for advances

Big Data: Technologies to solve problems in distributed systems and to deal with large data.

Apache Hadoop Project: Java based technology that helps to leverage multiple commodity hardware to deal with big data (however typically for performance, this "commodity hardware" is really high end machines). Hadoop is an java-based open source implementation of the Google MapReduce model (which in turn bears similarities with the MPI Map-Reduce model). Google is thought to implement the map-reduce in C++.

OpenFlow and SDN:

Openflow is a layer-2 protocol to talk to a switch or router so as to get access to the switch or router's forwarding plane and tell the switch to take certain decisions. Openflow is considered to be an enabler of SDN (of course one can always do SDN by SNMP or by CLI. See a past blog)

Android and Apple App development:

Sunday, September 2, 2012

Copy Constructor

http://www.velocityreviews.com/forums/t283016-question-on-copy-constructor.html as to why the argument of the copy constructor needs to be pass-by-reference

(note that there are 2 ways of passing arguments in C++, one is by value (copy constructor is involved), and one is by reference)

See Wikipedia for examples

Saturday, September 1, 2012

Depth from Monocular images

http://ai.stanford.edu/~asaxena/learningdepth/NIPS_LearningDepth.pdf

and as one finds in the paper, a pointer to http://en.wikipedia.org/wiki/Markov_random_field

Monday, August 27, 2012

Video Scrambling method

http://www.cl.cam.ac.uk/~mgk25/nagra.pdf had a nice view into how scrambling could be done (and in turn could be broken). Was good for my own info, so putting it up.

Monday, August 20, 2012

Facebook Database design

http://stackoverflow.com/questions/1009025/facebook-database-design, but of course it may be NoSQL based. (follow up)

Under Gmail's hood

http://www.infoworld.com/d/applications/under-gmail%E2%80%99s-hood-326

Sunday, August 19, 2012

Exception Handling implementation

http://en.wikipedia.org/wiki/Exception_handling#Exception_handling_implementation Intro to how languages implement exception handling.

Thursday, August 16, 2012

Pathfinding algorithms

http://theory.stanford.edu/~amitp/GameProgramming/AStarComparison.html

Friday, August 10, 2012

Linux Timers

http://www.ibm.com/developerworks/linux/library/l-timers-list/ as to how timer-related-callbacks are implemented in Linux. This is something i have used in the past, though came across this page, so making a note for reference.

SSH vs SSL vs IPSec

http://www.k2esec.com/secure-communications/network-security-protocols-ipsec-vs-tlsssl-vs-ssh-part-ii

Tuesday, August 7, 2012

SDN without openflow

http://www.networkworld.com/news/2011/111711-sdn-openflow-sidebar-253230.html

Saturday, August 4, 2012

Spatial hashing

http://conkerjo.wordpress.com/2009/06/13/spatial-hashing-implementation-for-fast-2d-collisions/

Friday, August 3, 2012

Design patterns

I have wondered for a while, why people keep talking about design patterns, and looks like i should have studied this long time back - reason - because if nothing else they are an effective benchmark to differentiate one language from another.

Have a look at this article by Paul Graham about how 2 different 1950 paradigms

Lisp
Fortran

Have been converging over the years.

Also, of course, i do not know all the patterns listed here, but learning them http://en.wikipedia.org/wiki/Software_design_pattern#Classification_and_list. I suspect i have used some of these patterns without knowing their names, though good to know them formally.

Broadly speaking there are 3 types of design patterns [ ref: this ]

Structural: patterns generally deal with relationships between entities, making it easier for these entities to work together. e.g. a Decorator which will add some tags to the URL based on context.
Creational: patterns provide instantiation mechanisms, making it easier to create objects in a way that suits the situation.e.g. Factory, similar to an adaptor/wrapper for "creating" different types of objects. Another example is the Singleton design pattern: here is a quick link for the difference between a singleton pattern and a static class (note that a static class is a concept that is in C#, not in C++). Singleton's lead to objects that can be passed as arguments etc
Behavioral : patterns are used in communications between entities and make it easier and more flexible for these entities to communicate e.g. Strategy pattern, adapter/wrapper

Wednesday, August 1, 2012

Suffix Tree and String matching

Just found out that there is a popular data structure to deal with substring kind of problems, called the "Suffix Tree". (constrast with "Prefix tree", which is used in longest prefix kind of problems)

Suffix tree can be constructed with the simple O($$n^2$$) algo (insert the "n" suffixes of a string with n chars into the tree creating a child wherever necessary) or with more advanced algorithms that can do it in O(n) (http://en.wikipedia.org/wiki/Ukkonen%27s_algorithm). The point is, once we have a suffix tree, we can execute several of these algorithms in linear time.

Longest repeating substring: the node which is farthest away from the tree but which has degree > 1.
Longest substring which is also a palindrome: to do this, we take te original string lets say abcac and concatenate it with itself reversed i.e. abcaccacba and run "longest repeating substring"
Longest common substring: where we concatenate 2 strings first e.g. abcd and bcd so as to get abcdbcd and now search for the node deepest down which has degree > 1.
Shortest substring that occurs only once: to answer this, we find the node with the shortest length that takes us to a leaf node.

The intuition about the degree > 1 is that the fact that there is a branch at this node, means that there are at least 2 different suffixes of the string that have this node's value(i.e. the string formed by the characters as we traverse from root to this node) as its substring. This means that this node's prefix is part of both (or more) different suffixes, and thus is a "common substring".

Ref: http://www.csc.liv.ac.uk/~leszek/COMP526/week5/week5.pdf

Anti-Aliasing

Found this nice note on anti-aliasing. Adding for reference.

Monday, July 30, 2012

Partial Sorting

http://en.wikipedia.org/wiki/Partial_sorting

and its distributed version: http://www.eng.tau.ac.il/distalgs/distalgs%20papers/selection-ppt.pdf

Web Frameworks

http://en.wikipedia.org/wiki/Comparison_of_web_application_frameworks

Further, have a look at these graphs for the usage of a variety of lightweight web frameworks out there.

Java based web frameworks:
I have some experience with a 'heavyweight J2EE' application and was thus interested in knowing the lightweight web frameworks out there. (http://docs.jboss.com/books/lightweight/ch01.html). Also found this http://www.it.uom.gr/teaching/applicationsIT_1/lectures/ which was useful as a intro to Java based web frameworks. Also look at http://www.it.uom.gr/teaching/applicationsIT_2/

Saturday, July 28, 2012

Useful Linux debugging tools

https://oldpapyrus.wordpress.com/tag/linux-debugging-strace-ltrace-vmstat/ which points to

http://www.dedoimedo.com/computers/linux-system-debugging-super.html and
http://www.cyberciti.biz/tips/linux-shared-library-management.html

Query Rewriting using Views

Have a look at this thesis to get an idea as to how a SQL query can be rewritten given that some views (named queries) of the database are already available.

Friday, July 27, 2012

Simpson's Paradox

Have a look at this paradox called Simpson's paradox. Again, one of those things, which are pretty obvious/straightforward/easily-deduced based on evidence, but seems counter-intuitive. The vector diagrams suggest why it works and why we dont catch the problem. (See how the second blue vector is "long" even though it has less rate than the second red vector)

Monday, July 16, 2012

Big Data and Relational Databases

http://dba.stackexchange.com/questions/13931/why-cant-relational-databases-meet-the-scales-of-big-data

Note the discussion about the way the data is structured (whether it facilitates partitioning e.g. )

Tuesday, June 26, 2012

Router Virtualization

Was trying to find out how one can virtualize the cisco router in ESX.
Have a look at this for some motivation behind this.
The following methods are possible today:

Create a virtual Linux box and run Quagga or XORP on it.
http://www.thevarguy.com/2012/07/23/vmware-nicira-vs-cisco-insieme-virtualized-network-war/
Can Openflow be of relevance [see this], and this

Investigating this further. Let me try to find out what is a good solution.

Tuesday, May 22, 2012

Clustering complexity

Here is a paper, which says "Clustering is difficult, only where it does not matter".

Thursday, May 17, 2012

Speech Recognition and Hidden Markov Models

Putting up this link/thesis as a note.

Friday, May 11, 2012

Some Basic PHP Internals

http://stackoverflow.com/questions/971279/how-the-sausage-is-made-tour-of-apache-php-mysql-interaction

The above link led me to this, and from there info about the Zend Engine and PHP request life cycle.

Tuesday, April 24, 2012

Replication in distributed systems

http://en.wikipedia.org/wiki/Replication_(computing)

Note the advantages and disadvantages of synchronous and asynchronous replication methods

Monday, April 23, 2012

Payoffs for Mixed Strategies

https://www2.bc.edu/~sonmezt/E308SL7.pdf

Further, one could apply the ideas presented in the above pdf, to a one-shot simultaneous game. (i.e. a simultaneous game played only once)
Assume a 2-person game where the goal of the game is to win.

Lets assume a one-shot game. In this situation, what if the Payoff Matrix has no obvious Nash Equilibria i.e. at least 1 player finds that its best to change the decision. What is the best strategy in this situation. One way to think is as follows

Assume that the game is a repeated game. Now if one does not want to be exploited, the strategy is to mix up one's decisions. How should this mixing be done ? In order to minimize the payoffs of the other person, it is best to seek the minimum of the maximum payoffs the other player -- why so ? (have a look at the article), and in doing so one finds the mixed strategy.

Once one knows the mixed strategy, choose the decision with the highest yeild in the long-run as the decision for the one-shot game.

Thus, one can think through the decisions for games that have no Nash Equilibria by thinking through a repeated-game and then using that as a decision in a one-shot game. Of course, if the game does have Nash Equilibria then those are the rational choices.

Also, of course that real life interactions are very rarely one-shot i.e. most are repeat-games, and that is why many other factors start coming into the picture. (including ethics and what not). As an example have a look at this which suggests how and why social-diversity may in fact lead to cooperation and altruism.

Stack Machines vs. Register Machines & more

http://www.codeproject.com/Articles/461052/Stack-based-vs-Register-based-Virtual-Machine-Arch

Java VMs are stack-machines, here is one link
and the Android Davlik VM is based on a register-based-architecture.

"The stack machine style is in contrast to register file machines which hold temporary values in a small fast visible array of similar registers, or accumulator machines which have only one visible general-purpose temp register, or memory-to-memory machines which have no visible temp registers."

Here is a comparison (USENIX) of stack vs register machines. To quote from the paper:

"For example, the local variable assignment a = b + c might be translated to stack JVM code as ILOAD c, ILOAD b, IADD, ISTORE a. In a virtual register machine, the same code would be a single instruction IADD a, b, c. Thus, virtual register machines have the potential to signiﬁcantly reduce the number of instruction dispatches."

and also

"In the stack-based JVM, a local variable is accessed using an index, and the operand stack is accessed via the stack pointer. In the register-based JVM both the local variables and operand stack can be considered as virtual registers for the method. There is a simple mapping from stack locations to register numbers, because the height and contents of the JVM operand stack are known at any point in a program"

Here is a comparison (Wikipedia) of different Application Virtual Machines

This is of course not a big deal, i.e. stack machines are just a way of compact code, but under the hood it of course uses registers to execute the code.

Wednesday, April 18, 2012

ARPs and VLANs

Quick note: https://learningnetwork.cisco.com/message/88223
Also, note how inter-VLAN routing is set up : http://www.cisco.com/en/US/docs/switches/lan/catalyst5000/hybrid/routing.html#wp29604

VLAN Hopping:
Making this as a security note.

Also, a quick recap of LAN vs. MAN vs. WAN protocols. Note specifically the different protocols that are used by each kind of network.

Monday, April 16, 2012

Image Processing in Matlab

Found a couple of links on the Mathworks page related to a few (perhaps basic) aspects of image processing.

http://www.mathworks.com/help/toolbox/images/f20-9579.html Link to ideas (with examples) such as image registration, filtering, ROI enhancement, morphological enhancements. etc.
http://www.mathworks.com/help/toolbox/images/f8-20792.html Link to some examples about Color space. I will soon upload a pictorial form of how colors are represented by different types of color spaces. One common type of color space is RGB color space, which one can image as a sphere with 3 "corners" jutting out. The color space of a device is a subset of this color space. There are different ways of looking at it e.g. RGB, Hue/Sat/Value, YUV (e.g. using the Jab notation e.g. 4:2:0, :4:1:1 etc), etc.

Will update as i find out more info/links.

Sunday, April 15, 2012

Linux Networking Stack Performance

Here is a link about the TCP/IP stack on Linux 2.4 and 2.6. The key insights are as follows

Kernel Locking improvements: For SMP, Linux 2.6 still does use the "Big Kernel Lock" to prevent different CPUs from accessing the kernel at the same time , but it has finer granularity.
Efficient copy routines in 2.6
2.6 Kernel uses the O(1) scheduler, and post 2.6.23 uses a O(log n) scheduler . See this and this and this (Info about how sessions are made up of processes and how tty is associated with the session, processes/tasks are made up of threads that share the thread group id, the O(1) algo using 2 runqueues per CPU, task priority, nice values etc.)

Thursday, April 12, 2012

St. Petersburg Paradox

http://en.wikipedia.org/wiki/St._Petersburg_paradox is a 'paradox' where a player pays some fixed amount (say $20) to enter the game, and it appears that the expected gain from the game is infinite dollars. Therefore, the debate is that the player should pay infinite dollars for the entrance (not $20). The counter argument is that the player argues that i've paid $20, but the heads could show up at the first coin flip itself and so i will be left with a loss. Further, even though i may get a lot of money later the probability of that happening is extremely low -- this diminishing probabilities of riches seems to put off most people - yet - mathematically, the "expected" returns are infinite. This is thus argued to be a case for the rejection of mathematical expectation i.e. our psychology goes against the mathematical (long term and statistical certainty) because in real-life we cannot play infinitely long and we do not have infinite (or large) amounts to bet.

Btw, as a tangential (not at all related to betting and expectations but a side-point about the usage of the word 'average') and a somewhat funny point (though unrelated to the expectation issue), is that one can have a population where almost EVERYONE is above average. e.g. if there are just a few people with 1 leg, 1.99 could be the average number of legs in the population which is <2 for almost everyone. Of course this is kind of obvious that we aren't talking about the median, but ive often seen these two things being confused.

Book on Algorithms in Computational Geometry

Book on Algorithms in Computational Geometry e.g. convex hulls, line intersections, triangulations, point location/trapezoidal maps, Voronoi Diagrams etc.

To be honest, this was amongst the more beautiful things i have seen of late

One example of the utility of this is GPSes, which as i understand will use methods to find out "intersections" of layers of roads etc, followed by some kind of routing algorithms (Dijkstra?).

If a region/node has a weight, instead of an edge having a weight and if we want to minimize the weight, one way to do that is to from one node, construct edges to "neighboring" nodes with a weight of the destination node and then use Dijkstra?(confirm)

Sunday, April 8, 2012

Some machine Learning info

I am going to list things i came across while my feeble attempts at learning machine-learning.

Classification:
E.g. To do document classification we perform at least some of the following tasks.

Get some training data, clean and stem the text.
Represent your documents as vectors
Perform feature selection
Select your classifier and train it,
Run the classification on some test data to see how well it performs

References : http://pauldix.blogs.com/pauldix/goruco/goruco_talk.pdf (have a look at this for a quick overview of classification methods and things like Naive Bayes, SVM, etc)

Thursday, April 5, 2012

Some DHCP/IPTables/Squid Fun

This is one example of using IP tables (Wikipedia, Tutorial) on the router to redirect packets to a gateway/proxy (such as Squid), when connected to the wireless on the airport/home etc. The interesting thing (for me) about the post, was the use of DHCP ranges so as to give a different experience of the internet to a certain section of people. Useful to note for future i thought.

A note on IpTables from here:

iptables is built on top of netfilter, the packet alteration framework for Linux 2.4.x and 2.6.x. It is a major rewrite of its predecessor ipchains, and is used to control packet filtering, Network Address Translation (masquerading, portforwarding, transparent proxying), and special effects such as packet mangling.

Advances in Video Coding

A list of recent advances in video coding

Distributed Coding : Wyner-Ziv coding where source statistics are exploited at the decoder allowing relatively lighter encoding phases. H.264 is used as an intraframe encoder and Wyner-Ziv frames are inserted in between intra-frames, where after integer transform of the frame, followed by LDPC coding of the transformed-data after which only parity bits are sent
Texture based methods
Scalable coding: Temporal, Spatial, SNR: e.g. code "enhancement" layers as B frames with a hierarchical structure
Multi-view coding
ROI based coding: Region of interest. See this
Conditional Replenishment : See this

Ref[1]

Further, there are advances known for video streaming such as

Adaptive bitrate coding: Encode the video with multiple bitrates and chunk it.
Byte streams (instead of bitstreams)
Reduce Pre-roll time
Multi-source video streaming over WLANS. ref[2]

High Performance Web Sites

Some links about this:

http://stevesouders.com/hpws/rules.php

Wednesday, April 4, 2012

Book on Cognitive Science

http://www.goertzel.org/books/complex/contents.html

Liked what ive read so far in the book about Cognitive Science, AI, Dynamical Systems overview etc.

Thursday, March 29, 2012

List of important papers in Computer Science

http://en.wikipedia.org/wiki/List_of_important_publications_in_computer_science

Wednesday, March 28, 2012

Affinity Lattice

lets say there are 3 bits 000, the graph im interested in looks like this

111 ------\

/ \ \

011 101 |

/ \ / \ |

| 001 | |

| | | |

| 000 | |

\ / \ / |

010 100 |

\ / /

110 ------/

note that this is the graph where we go from one node to another if 1 bit changes. What i have drawn above is essentially a cube flattened out. The trouble is that i cannot draw a planar/non-overlapping graph/lattice for 4 dimensional codes.I am curious to find out if it is even possible at all to draw a hypercube on paper without overlapping edges.

Further, we cannot write this in a row-by-row table because the geometry of the table does not allow more than 2 neighbors.

What i am essentially looking for are geometric objects into which these codes can be embedded, and i want to call those geometric objects as "Affinity Objects" perhaps Affinity lattices.

Let me get back on this and find out the literature about this

Monday, March 26, 2012

Intro to File Systems

Quick high-level intro to File Systems on Linux. File System design can be motivated by things such as trying to reduce the seek-time (thereby latency, thereby increase throughput) and reduce the wait-time for write operations, and to achieve ACID (or not).

Berkeley Fast File System: Place "sibling" files in nearby blocks. Boot blocks + superblock + cylinder groups (free-list + inodes(permissions, first 10 data blocks + indirect blocks) + directory blocks + data blocks). The MS-DOS FAT is a simpler version of this.
Journaling File Systems:

Metadata Journaling file systems: Maintain a log/Journal of the meta-operations related to allocating inodes, dirs, blocks, but dont actually store the data being written to blocks, into the log/Journal. [2] e.g. EXT3FS
Log Structured File System[1] [2]: See this for a quick high level understanding. To avoid circular logs and fragmentation, "segments"/pools of free segments etc are used based on Garbage Collection techniques. [Wikipedia]

Hierarchical File Systems: [Wikipedia]
Distributed File Systems [Wikipidia]

Chunked file systems e.g. Google File System

Have a look at this Wikipedia page :http://en.wikipedia.org/wiki/Comparison_of_file_systems

Wednesday, March 14, 2012

Internet Next-Gen

Some Proposals:

Today there is quite a bit of data on the internet that is public. This has in fact been the reason why Google could crawl so much of data. The issue is that when content is going to get locked up into social networks, internet quickly approches the society of humans i.e. privacy/information sharing/hiding etc for political agendas may follow. What is needed here is some governing mechanism to allow sites to interoperate and share data for searches. Some data like Wikipedia is in the best-interests of everyone to be shared. There needs to be thinking about how data is going to be shared across social networking silos.
Named Data Networking: Routers cache index to data. Effort made by routers is to find the path to the router that has access to content, rather than to attempt to connect to some end point, and treats content as the entity of primary importance, than treating destination IP as the entity of primary importance. Data can re thus retrieved based on the URL(i.e. URL without the host)

Monday, March 12, 2012

Notes about HTTPS & caches

http://encosia.com/cripple-the-google-cdns-caching-with-a-single-character/

Sunday, March 11, 2012

On Caches

This is quite a vast topic, because so much of what the work that is done in the Computer Science field is an extension of some-sort-of-cacheing by some entities. One can in fact look upon a major chunk of the computer science field itself as a search for "better" cacheing techniques. (the meaning of "better" changes with applications)

The simplest way of cacheing is of course, store data in a hash table with the key as the "address" and the value as the data. This kind of cacheing is called an associative array or hash based cacheing.

However, we can generalize this process a bit by trying to map the entire address space into a smaller address space. I will start off the discussion with the caches as one finds in Computer Organization books i.e. caching of data one finds in slower memories into faster memories. In such types of cacheing, there are 4 things of interest and we can arrange the 4 things (line-number, set, tag, and element-offset; to be discussed soon) in 4! or 24 ways, so there are at least 24 types of caching possible. I am saying at least 24 because one may use advanced data structures to implement caches which will scale up the total types of caches possible.

Organization:
See figures for an conceptual understanding. Caches are in reality not necessarily flat as shown here. The flat caches as ive drawn is perhaps the simplest way to explain it. I will briefly discuss 2 alternative ways in which cacheing of data from CPU addresses can be achieved.

Type 1: Flat Cache/Tags are part of the Cache
See the following picture. Lets say the main memory is 2^A * k bytes. We can envision the memory address to be broken up into x, y, z as follows. The "address-chunk" is referred to as an "element" in literature.

Type 2: Flat Cache/Note the difference in the tag bits.

Storage Benefits:
Cacheing benefits are achieved by the "tag field" being non-zero (e.g. shown as y > 0 bits). Note that we can potentially store the tag in a separate faster-memory than the cache, and this is often done (see this), but i am for now showing it part of the cache, as the tag is an important part/storage of the caching process.

Search:
The search process for caches is as follows:

Address --> find the cache line, or group of lines (if s > 0) associated with this address
Search "associatively" within this set's tags and look for a match.

If number of sets is 1 i.e. s is 0 bits, we say that the cache is "direct mapped"; and when there is only 1 cache line i.e. we view upon all lines as part of 1 set, then we call it "fully associative" cache. One will also find in the literature the definition that a direct-mapped cache is a set-associative cache with 1 line per set [this]. The terminology only makes sense because of the way "Associative" searches are done for the elements part of a "set" i.e. this is an advancement over the simplistic data structure ive shown i.e. one may do some sort of fast-searches for the tag-section in a set by keeping the tags in a special memory.

Question: What difference do these things (type 1 and type2) make ?

The answer is that in type1, "close" addresses map to the same line (the higher order bits in the address, decide the cache-line); and in type 2, "close" addresses map to different lines (as, relatively lower order bits in the address, decide the cache line)

As a result of this, type 2 is preferred, because cacheing wants to rely on principle of locality i.e. we want close addresses to map to different cache lines so that all those lines are in the cache when the processor needs them, effectively saving latency and increasing throughput.

Cache Eviction Logic:

In the above discussion the eviction logic was that, in case of "direct mapped cache" i.e. each line is a separate set, if the cache line one is interested in, is occupied with a different tag, we kick out that data and load new data.

If we are talking about a set-associative cache, then, we would potentially associatively search for a the tag match in all the lines in the "set", and then we have to decide some mechanism to kick out an entry - this may be done as LRU, LFU, FIFO etc (google for this).

Cache Coherency:

http://en.wikipedia.org/wiki/Cache_coherence

Distributed Caches & Partitioning: See this

Extensions:

Web Cacheing:

One can in principle extend this logic to cacheing in general, e.g. suppose we want to cache the content from www. google.com/blah1 and www.google.com/blah2. May it be nice to use higher-order bits (e.g. the www.google.com to map to a cache-block, or may be we better off using finer granularities (i.e. blah1, blah2) to store data. I would go with the latter for locality of reference. One can see that there is potential for research here too i.e. what is the performance benefits of each style depending on the browsing habits of the user.

Network Buffering:

Often, in networking systems, one may consider the caches to be made up of a pool of buffers separated in zones -- and thus in the above classification scheme, networking caches may be envisioned as; "zone" meaning a set, and buffers in a zone meaning the cache-lines in a set. The eviction scheme is used to figure out which buffer to re-use in a particular zone

Tuesday, March 6, 2012

Components of the Web

The Web is made up of 3 general components.

Origin Servers
Clients
Proxy: (See this, which im reproducing ) There are four kinds of widgets that people might call proxies. Note that the term "widget" below may be either software, hardware, or a combination.

A router is a widget that sends packets back and forth between multiple network segments (i..e., between ethernet, ADSL, and LocalTalk networks all coming in to the same machine, but possibly on different ports or cards). The widget may choose to filter out some packets and not route them. Important characteristic: multiple networks attached.
A gateway is widget that translates data from one protocol (like TCP/IP) to another protocol (like AppleTalk) as it flies past. The widget may also choose to not translate some packets, and filter them out.Important characteristic: protocol translation.
A proxy is a widget that receives requests with its left hand (usually from one network), and then launches similar requests with its right hand (usually on a different network). As the results arrive back at the right hand, they are copied over to the still-waiting left hand and returned to the original client. (It acts as a server with its left hand, and as a client with its right hand.) The widget may filter out certain requests or certain responses and not copy them from hand to hand. Important characteristic: acts as both client and server. When a proxy operates at the packet level it is a transparent proxy, and the clients usually don't know that there's a proxy involved.
A cache is a widget that receives requests and either responds directly from it's own store of information, or forwards the requests "upstream" to another server. When data comes back from the upstream server, this widget keeps a copy of the data for itself, and also returns the data to the original client. Important characteristic: data stored locally.

Cloud Computing Landscape

http://cloudstack.org/blog.html?start=5 led to me nice overviews about the landscape. Will explore further, add more detailed info, better surveys etc. soon.

Tuesday, February 28, 2012

MPEG Dash

http://dashpg.com/?page_id=25

Monday, February 27, 2012

Level and Edge Triggered Programs

Level Triggered : The process "looks" at some data-structure/state and if it matches something, notifies the interested processes i.e. set some flag somewhere etc.e.g. time 0: no packets, time 1: packet, time 2: packet will cause only 2 level-triggers at time 1 and time 2.

Edge Triggered: The process "looks"at some data-structure/state and sees if it has "changed" from the last time the process looked at it. If it has, notify other interested processes i.e. set some flag somewhere. e.g. time 0: no packets, time 1: packet, time 2: packet will cause only 1 edge-trigger at time 1.

Note here that "time" here means at the abstraction-level of "Software systems" i.e. the time a process "looks" at some data-structure. (i.e. the number of times a CPU instruction touched some abstraction (which is stored in memory (though the process may refer different memory locations at different instants))

I will describe how it works in the context of an edge-triggered vs a level-triggered epoll system call soon. The idea is that an edge-trigger is triggered once , which allows us to do reads etc in the context of a different thread without having to to fd_clr etc. Coming soon :-)

Ref: [1][2]

Tuesday, February 21, 2012

Postgres Documentation

http://www.postgresql.org/docs/manuals/archive/

Wednesday, February 15, 2012

Thread Pools

As a quick recap, note that the linux kernel creates an object of the data structure called a "task". Both of what we call as processes and threads are really tasks that are "cloned" from this original task. fork() is one kind of cloning and kthread_create (pthread_create, (portable kernel thread create) API) is another kind of cloning. Multiple threads may share the virtual address space (and thus in turn Data sections, etc), but threads mostly have their own stacks.

A task is a unit of scheduling on by the linux scheduler.

Note that "process" is a task that has a "user level context". When "user level threads" are created, they do not call the clone system call and thus share the address space of the process. These kinds of user-level-threads are called lightweight processes.

One can, if one wants, instead of creating new a thread every time the client asks the server to perform some task, we can have a pool of threads -- what this means is that we can store the "context" (i.e. registers etc) in some place in RAM and toggle between these places with a pointer i.e. all the operating system needs to do to perform a thread switch, is to change the thread pointer (see this). This is essentially an optimization to save the overhead of creating and deleting threads.

Have a look at this link for a quick high-level intro of linux memory management.

Monday, February 13, 2012

On OAuth

I was pointed to this article, which made up read up more on OAuth. Will update more soon -- just making a note.

Thursday, February 9, 2012

Video & Network Bitrates

Video encoding layer:
VBR - constant PSNR
CBR - variable PSNR

Network Layer
CBR - max packets/sec is constant over time
VBR - max packets/sec varies over time

For video over network
We can stream VBR over VBR, CBR over VBR, CBR over CBR; and VBR over CBR

Monday, January 23, 2012

Loose Coupling and Tight Coupling

This is an intro to the idea. Basically, the learning was that it appears that loosely-coupled systems are easier to maintain and evolve. One should of course not read too much into the definition of "loosely-coupled" -- what one means here is that the "interface" between two components is well-defined and each party knows about it. However the other party knows nothing about what happens under the hood of the interface. Such systems are easier to maintain because the references of one object are not tied together with the references of another object, but rather they pass through well-defined interfaces.

One may also see that the same idea can be extended to hold true for more general relationships e.g. Loosely-coupled organizations may be better suited for evolution/adaptation and growth than tightly coupled organizations i.e. organizations that do not play the big-brother may allow each employee to learn/grow as per their personal choice thereby increasing satisfaction levels and motivation (because of having the perception of being in control of one's life).

Further, it may also be that in some enterprises "identity" management/aka "authorization" is tightly coupled and "authentication" is loosely coupled i.e. one can get in from many doors (the edge is only loosely coupled with the centralized system), but cannot access any content unless the centralized server approves [Ref: [1] [2]]

Friday, January 6, 2012

Controlling the heater via a humidity sensor

A project i have been wanting to design is to use hygrometer to turn on/off the heater i have in my room. The heater makes the room too dry. The plan is to keep the window open (to allow for humidity to enter the room) and switch off the heater when the humidity drops below a certain value. I am not very sure how to get this working, but googling around a bit led me to this:

http://marcusgun.awardspace.com/humidor.html

Tuesday, January 3, 2012

Complexity Zoo

Link to a pretty popular place to learn about this.
http://qwiki.stanford.edu/index.php/Complexity_Zoo

Will post stuff as i learn more.