Thursday, December 6, 2012
Monday, November 12, 2012
Post Vs Put
According to RFC 2616
The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. […] The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI. The posted entity is subordinate to that URI in the same way that a file is subordinate to a directory containing it, a news article is subordinate to a newsgroup to which it is posted, or a record is subordinate to a database.
PUT is defined as:
The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI.
See this for a discussion. What eventually i am going with is this understanding:
This is specified in RFC 2616 §9.5ff.POST creates a child resource, so POST to/items
creates a resources that lives under the/items
resource. Eg./items/1
PUT is for creating or updating something with a known URL.
Therefore PUT is only a candidate for CREATE where you already know the url before the resource is created. Eg./blogs/nigel/entry/when_to_use_post_vs_put
as the title is used as the resource key
Tuesday, October 30, 2012
API vs SDK
http://stackoverflow.com/questions/834763/api-vs-sdk
API could be said to be the "declaration" and the SDK the "declaration + implementation(definition)" of an interface(method/technique to drive/use something)
Note that in OOP, the interface is usually a virtual class (C++), or an interface (Java). People can do 2 things with such a class/interface.
API could be said to be the "declaration" and the SDK the "declaration + implementation(definition)" of an interface(method/technique to drive/use something)
Note that in OOP, the interface is usually a virtual class (C++), or an interface (Java). People can do 2 things with such a class/interface.
- If what the programmer is provided is just an API with no accompanying implementations - he/she can implement the interface aka create a derived class and use the base class' pointer to call the functions defined.
- If what the programmer is provided is an SDK, with the interface specified in a virtual class (or an "interface") then the libraries provided will implement the SDK... all the programmer needs to do is to instantiate the object of this interface (as specified in the documentation), and then use it.
Wednesday, October 10, 2012
Gates with dominos
Came across this interesting post : http://mathlesstraveled.com/2012/10/10/making-a-computer-out-of-dominoes/
and this link explaining the AND gate using dominos.
and this link explaining the AND gate using dominos.
Tuesday, October 9, 2012
Functors/ Function Objects
http://www.experts-exchange.com/articles/849/Function-pointers-vs-Functors.html is the way to handle function pointers in C++ so that the passing function pointer is override-friendly (no casting needed), and the function can maintain state of the object it is in, what one needs is a functor.
http://stackoverflow.com/questions/356950/c-functors-and-their-uses .
Functor is a function which retains state.
http://stackoverflow.com/questions/317450/why-override-operator?lq=1
http://stackoverflow.com/questions/356950/c-functors-and-their-uses .
Functor is a function which retains state.
http://stackoverflow.com/questions/317450/why-override-operator?lq=1
Saturday, September 8, 2012
Quick List of *HOT* software technologies
Listing some software technologies that are *hot* these days. I have touched upon them in the past in some senses. Just bringing it all together.
Web frameworks for Web services: Goal is to pushing pages and delighting users[1]
Web frameworks for Web services: Goal is to pushing pages and delighting users[1]
- Node.js: Event driven server side javascript based "server", to deal with sysadmin capabilities and low level routing [1]
- Ruby on Rails: A web framework based on the Ruby Language
- Ajax: Method to render only small sections of the page as necessary from the user/client perspective by talking to the server side framework. (Some Ajax frameworks are even embedded as a part of larger frameworks. For example, the jQuery JavaScript Library is included in Ruby on Rails.[Wikipedia])
Clouds & Virtualization Infrastructure:
- Amazon's EC2 cloud: A web service interface that allows one to obtain and configure capacity in the cloud managed by Amazon, where new server instances can be boot up in minutes allowing to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use. Amazon EC2 provides developers the tools to build failure resilient applications and isolate themselves from common failure scenarios. Here is the marketplace for the applications written based on the EC2 cloud. Here is a link to AWS developer community resources. This internally leverages virtualization technologies and Big-data technologies.
- See this Virtualization Journal for advances
Big Data: Technologies to solve problems in distributed systems and to deal with large data.
- Apache Hadoop Project: Java based technology that helps to leverage multiple commodity hardware to deal with big data (however typically for performance, this "commodity hardware" is really high end machines). Hadoop is an java-based open source implementation of the Google MapReduce model (which in turn bears similarities with the MPI Map-Reduce model). Google is thought to implement the map-reduce in C++.
OpenFlow and SDN:
- Openflow is a layer-2 protocol to talk to a switch or router so as to get access to the switch or router's forwarding plane and tell the switch to take certain decisions. Openflow is considered to be an enabler of SDN (of course one can always do SDN by SNMP or by CLI. See a past blog)
Android and Apple App development:
Sunday, September 2, 2012
Copy Constructor
http://www.velocityreviews.com/forums/t283016-question-on-copy-constructor.html as to why the argument of the copy constructor needs to be pass-by-reference
(note that there are 2 ways of passing arguments in C++, one is by value (copy constructor is involved), and one is by reference)
See Wikipedia for examples
(note that there are 2 ways of passing arguments in C++, one is by value (copy constructor is involved), and one is by reference)
See Wikipedia for examples
Saturday, September 1, 2012
Depth from Monocular images
http://ai.stanford.edu/~asaxena/learningdepth/NIPS_LearningDepth.pdf
and as one finds in the paper, a pointer to http://en.wikipedia.org/wiki/Markov_random_field
and as one finds in the paper, a pointer to http://en.wikipedia.org/wiki/Markov_random_field
Monday, August 27, 2012
Video Scrambling method
http://www.cl.cam.ac.uk/~mgk25/nagra.pdf had a nice view into how scrambling could be done (and in turn could be broken). Was good for my own info, so putting it up.
Monday, August 20, 2012
Facebook Database design
http://stackoverflow.com/questions/1009025/facebook-database-design, but of course it may be NoSQL based. (follow up)
Sunday, August 19, 2012
Exception Handling implementation
http://en.wikipedia.org/wiki/Exception_handling#Exception_handling_implementation Intro to how languages implement exception handling.
Thursday, August 16, 2012
Friday, August 10, 2012
Linux Timers
http://www.ibm.com/developerworks/linux/library/l-timers-list/ as to how timer-related-callbacks are implemented in Linux. This is something i have used in the past, though came across this page, so making a note for reference.
Tuesday, August 7, 2012
Saturday, August 4, 2012
Friday, August 3, 2012
Design patterns
I have wondered for a while, why people keep talking about design patterns, and looks like i should have studied this long time back - reason - because if nothing else they are an effective benchmark to differentiate one language from another.
Have a look at this article by Paul Graham about how 2 different 1950 paradigms
Have a look at this article by Paul Graham about how 2 different 1950 paradigms
- Lisp
- Fortran
Also, of course, i do not know all the patterns listed here, but learning them http://en.wikipedia.org/wiki/Software_design_pattern#Classification_and_list. I suspect i have used some of these patterns without knowing their names, though good to know them formally.
Broadly speaking there are 3 types of design patterns [ ref: this ]
Broadly speaking there are 3 types of design patterns [ ref: this ]
- Structural: patterns generally deal with relationships between entities, making it easier for these entities to work together. e.g. a Decorator which will add some tags to the URL based on context.
- Creational: patterns provide instantiation mechanisms, making it easier to create objects in a way that suits the situation.e.g. Factory, similar to an adaptor/wrapper for "creating" different types of objects. Another example is the Singleton design pattern: here is a quick link for the difference between a singleton pattern and a static class (note that a static class is a concept that is in C#, not in C++). Singleton's lead to objects that can be passed as arguments etc
- Behavioral : patterns are used in communications between entities and make it easier and more flexible for these entities to communicate e.g. Strategy pattern, adapter/wrapper
Wednesday, August 1, 2012
Suffix Tree and String matching
Just found out that there is a popular data structure to deal with substring kind of problems, called the "Suffix Tree". (constrast with "Prefix tree", which is used in longest prefix kind of problems)
Suffix tree can be constructed with the simple O($$n^2$$) algo (insert the "n" suffixes of a string with n chars into the tree creating a child wherever necessary) or with more advanced algorithms that can do it in O(n) (http://en.wikipedia.org/wiki/Ukkonen%27s_algorithm). The point is, once we have a suffix tree, we can execute several of these algorithms in linear time.
Ref: http://www.csc.liv.ac.uk/~leszek/COMP526/week5/week5.pdf
Suffix tree can be constructed with the simple O($$n^2$$) algo (insert the "n" suffixes of a string with n chars into the tree creating a child wherever necessary) or with more advanced algorithms that can do it in O(n) (http://en.wikipedia.org/wiki/Ukkonen%27s_algorithm). The point is, once we have a suffix tree, we can execute several of these algorithms in linear time.
- Longest repeating substring: the node which is farthest away from the tree but which has degree > 1.
- Longest substring which is also a palindrome: to do this, we take te original string lets say abcac and concatenate it with itself reversed i.e. abcaccacba and run "longest repeating substring"
- Longest common substring: where we concatenate 2 strings first e.g. abcd and bcd so as to get abcdbcd and now search for the node deepest down which has degree > 1.
- Shortest substring that occurs only once: to answer this, we find the node with the shortest length that takes us to a leaf node.
Ref: http://www.csc.liv.ac.uk/~leszek/COMP526/week5/week5.pdf
Monday, July 30, 2012
Web Frameworks
http://en.wikipedia.org/wiki/Comparison_of_web_application_frameworks
Further, have a look at these graphs for the usage of a variety of lightweight web frameworks out there.
Java based web frameworks:
I have some experience with a 'heavyweight J2EE' application and was thus interested in knowing the lightweight web frameworks out there. (http://docs.jboss.com/books/lightweight/ch01.html). Also found this http://www.it.uom.gr/teaching/applicationsIT_1/lectures/ which was useful as a intro to Java based web frameworks. Also look at http://www.it.uom.gr/teaching/applicationsIT_2/
Further, have a look at these graphs for the usage of a variety of lightweight web frameworks out there.
Java based web frameworks:
I have some experience with a 'heavyweight J2EE' application and was thus interested in knowing the lightweight web frameworks out there. (http://docs.jboss.com/books/lightweight/ch01.html). Also found this http://www.it.uom.gr/teaching/applicationsIT_1/lectures/ which was useful as a intro to Java based web frameworks. Also look at http://www.it.uom.gr/teaching/applicationsIT_2/
Saturday, July 28, 2012
Query Rewriting using Views
Have a look at this thesis to get an idea as to how a SQL query can be rewritten given that some views (named queries) of the database are already available.
Friday, July 27, 2012
Simpson's Paradox
Have a look at this paradox called Simpson's paradox. Again, one of those things, which are pretty obvious/straightforward/easily-deduced based on evidence, but seems counter-intuitive. The vector diagrams suggest why it works and why we dont catch the problem. (See how the second blue vector is "long" even though it has less rate than the second red vector)
Monday, July 16, 2012
Big Data and Relational Databases
http://dba.stackexchange.com/questions/13931/why-cant-relational-databases-meet-the-scales-of-big-data
Note the discussion about the way the data is structured (whether it facilitates partitioning e.g. )
Note the discussion about the way the data is structured (whether it facilitates partitioning e.g. )
Tuesday, June 26, 2012
Router Virtualization
Was trying to find out how one can virtualize the cisco router in ESX.
Have a look at this for some motivation behind this.
The following methods are possible today:
Have a look at this for some motivation behind this.
The following methods are possible today:
- Create a virtual Linux box and run Quagga or XORP on it.
- http://www.thevarguy.com/2012/07/23/vmware-nicira-vs-cisco-insieme-virtualized-network-war/
- Can Openflow be of relevance [see this], and this
Tuesday, May 22, 2012
Clustering complexity
Here is a paper, which says "Clustering is difficult, only where it does not matter".
Thursday, May 17, 2012
Friday, May 11, 2012
Some Basic PHP Internals
http://stackoverflow.com/questions/971279/how-the-sausage-is-made-tour-of-apache-php-mysql-interaction
The above link led me to this, and from there info about the Zend Engine and PHP request life cycle.
The above link led me to this, and from there info about the Zend Engine and PHP request life cycle.
Tuesday, April 24, 2012
Replication in distributed systems
http://en.wikipedia.org/wiki/Replication_(computing)
Note the advantages and disadvantages of synchronous and asynchronous replication methods
Note the advantages and disadvantages of synchronous and asynchronous replication methods
Monday, April 23, 2012
Payoffs for Mixed Strategies
https://www2.bc.edu/~sonmezt/E308SL7.pdf
Further, one could apply the ideas presented in the above pdf, to a one-shot simultaneous game. (i.e. a simultaneous game played only once)
Assume a 2-person game where the goal of the game is to win.
Also, of course that real life interactions are very rarely one-shot i.e. most are repeat-games, and that is why many other factors start coming into the picture. (including ethics and what not). As an example have a look at this which suggests how and why social-diversity may in fact lead to cooperation and altruism.
Further, one could apply the ideas presented in the above pdf, to a one-shot simultaneous game. (i.e. a simultaneous game played only once)
Assume a 2-person game where the goal of the game is to win.
- Lets assume a one-shot game. In this situation, what if the Payoff Matrix has no obvious Nash Equilibria i.e. at least 1 player finds that its best to change the decision. What is the best strategy in this situation. One way to think is as follows
- Assume that the game is a repeated game. Now if one does not want to be exploited, the strategy is to mix up one's decisions. How should this mixing be done ? In order to minimize the payoffs of the other person, it is best to seek the minimum of the maximum payoffs the other player -- why so ? (have a look at the article), and in doing so one finds the mixed strategy.
- Once one knows the mixed strategy, choose the decision with the highest yeild in the long-run as the decision for the one-shot game.
Also, of course that real life interactions are very rarely one-shot i.e. most are repeat-games, and that is why many other factors start coming into the picture. (including ethics and what not). As an example have a look at this which suggests how and why social-diversity may in fact lead to cooperation and altruism.
Stack Machines vs. Register Machines & more
http://www.codeproject.com/Articles/461052/Stack-based-vs-Register-based-Virtual-Machine-Arch
Java VMs are stack-machines, here is one link
and the Android Davlik VM is based on a register-based-architecture.
and also
This is of course not a big deal, i.e. stack machines are just a way of compact code, but under the hood it of course uses registers to execute the code.
Java VMs are stack-machines, here is one link
and the Android Davlik VM is based on a register-based-architecture.
"The stack machine style is in contrast to register file machines which hold temporary values in a small fast visible array of similar registers, or accumulator machines which have only one visible general-purpose temp register, or memory-to-memory machines which have no visible temp registers."Here is a comparison (USENIX) of stack vs register machines. To quote from the paper:
"For example, the local variable assignment a = b + c might be translated to stack JVM code as ILOAD c, ILOAD b, IADD, ISTORE a. In a virtual register machine, the same code would be a single instruction IADD a, b, c. Thus, virtual register machines have the potential to significantly reduce the number of instruction dispatches."
and also
"In the stack-based JVM, a local variable is accessed using an index, and the operand stack is accessed via the stack pointer. In the register-based JVM both the local variables and operand stack can be considered as virtual registers for the method. There is a simple mapping from stack locations to register numbers, because the height and contents of the JVM operand stack are known at any point in a program"Here is a comparison (Wikipedia) of different Application Virtual Machines
This is of course not a big deal, i.e. stack machines are just a way of compact code, but under the hood it of course uses registers to execute the code.
Wednesday, April 18, 2012
ARPs and VLANs
Quick note: https://learningnetwork.cisco.com/message/88223
Also, note how inter-VLAN routing is set up : http://www.cisco.com/en/US/docs/switches/lan/catalyst5000/hybrid/routing.html#wp29604
VLAN Hopping:
Making this as a security note.
Also, a quick recap of LAN vs. MAN vs. WAN protocols. Note specifically the different protocols that are used by each kind of network.
Also, note how inter-VLAN routing is set up : http://www.cisco.com/en/US/docs/switches/lan/catalyst5000/hybrid/routing.html#wp29604
VLAN Hopping:
Making this as a security note.
Also, a quick recap of LAN vs. MAN vs. WAN protocols. Note specifically the different protocols that are used by each kind of network.
Monday, April 16, 2012
Image Processing in Matlab
Found a couple of links on the Mathworks page related to a few (perhaps basic) aspects of image processing.
- http://www.mathworks.com/help/toolbox/images/f20-9579.html Link to ideas (with examples) such as image registration, filtering, ROI enhancement, morphological enhancements. etc.
- http://www.mathworks.com/help/toolbox/images/f8-20792.html Link to some examples about Color space. I will soon upload a pictorial form of how colors are represented by different types of color spaces. One common type of color space is RGB color space, which one can image as a sphere with 3 "corners" jutting out. The color space of a device is a subset of this color space. There are different ways of looking at it e.g. RGB, Hue/Sat/Value, YUV (e.g. using the Jab notation e.g. 4:2:0, :4:1:1 etc), etc.
Sunday, April 15, 2012
Linux Networking Stack Performance
Here is a link about the TCP/IP stack on Linux 2.4 and 2.6. The key insights are as follows
- Kernel Locking improvements: For SMP, Linux 2.6 still does use the "Big Kernel Lock" to prevent different CPUs from accessing the kernel at the same time , but it has finer granularity.
- Efficient copy routines in 2.6
- 2.6 Kernel uses the O(1) scheduler, and post 2.6.23 uses a O(log n) scheduler . See this and this and this (Info about how sessions are made up of processes and how tty is associated with the session, processes/tasks are made up of threads that share the thread group id, the O(1) algo using 2 runqueues per CPU, task priority, nice values etc.)
Thursday, April 12, 2012
St. Petersburg Paradox
http://en.wikipedia.org/wiki/St._Petersburg_paradox is a 'paradox' where a player pays some fixed amount (say $20) to enter the game, and it appears that the expected gain from the game is infinite dollars. Therefore, the debate is that the player should pay infinite dollars for the entrance (not $20). The counter argument is that the player argues that i've paid $20, but the heads could show up at the first coin flip itself and so i will be left with a loss. Further, even though i may get a lot of money later the probability of that happening is extremely low -- this diminishing probabilities of riches seems to put off most people - yet - mathematically, the "expected" returns are infinite. This is thus argued to be a case for the rejection of mathematical expectation i.e. our psychology goes against the mathematical (long term and statistical certainty) because in real-life we cannot play infinitely long and we do not have infinite (or large) amounts to bet.
Btw, as a tangential (not at all related to betting and expectations but a side-point about the usage of the word 'average') and a somewhat funny point (though unrelated to the expectation issue), is that one can have a population where almost EVERYONE is above average. e.g. if there are just a few people with 1 leg, 1.99 could be the average number of legs in the population which is <2 for almost everyone. Of course this is kind of obvious that we aren't talking about the median, but ive often seen these two things being confused.
Btw, as a tangential (not at all related to betting and expectations but a side-point about the usage of the word 'average') and a somewhat funny point (though unrelated to the expectation issue), is that one can have a population where almost EVERYONE is above average. e.g. if there are just a few people with 1 leg, 1.99 could be the average number of legs in the population which is <2 for almost everyone. Of course this is kind of obvious that we aren't talking about the median, but ive often seen these two things being confused.
Book on Algorithms in Computational Geometry
Book on Algorithms in Computational Geometry e.g. convex hulls, line intersections, triangulations, point location/trapezoidal maps, Voronoi Diagrams etc.
To be honest, this was amongst the more beautiful things i have seen of late
One example of the utility of this is GPSes, which as i understand will use methods to find out "intersections" of layers of roads etc, followed by some kind of routing algorithms (Dijkstra?).
If a region/node has a weight, instead of an edge having a weight and if we want to minimize the weight, one way to do that is to from one node, construct edges to "neighboring" nodes with a weight of the destination node and then use Dijkstra?(confirm)
To be honest, this was amongst the more beautiful things i have seen of late
One example of the utility of this is GPSes, which as i understand will use methods to find out "intersections" of layers of roads etc, followed by some kind of routing algorithms (Dijkstra?).
If a region/node has a weight, instead of an edge having a weight and if we want to minimize the weight, one way to do that is to from one node, construct edges to "neighboring" nodes with a weight of the destination node and then use Dijkstra?(confirm)
Sunday, April 8, 2012
Some machine Learning info
I am going to list things i came across while my feeble attempts at learning machine-learning.
Classification:
E.g. To do document classification we perform at least some of the following tasks.
Classification:
E.g. To do document classification we perform at least some of the following tasks.
- Get some training data, clean and stem the text.
- Represent your documents as vectors
- Perform feature selection
- Select your classifier and train it,
- Run the classification on some test data to see how well it performs
Thursday, April 5, 2012
Some DHCP/IPTables/Squid Fun
This is one example of using IP tables (Wikipedia, Tutorial) on the router to redirect packets to a gateway/proxy (such as Squid), when connected to the wireless on the airport/home etc. The interesting thing (for me) about the post, was the use of DHCP ranges so as to give a different experience of the internet to a certain section of people. Useful to note for future i thought.
A note on IpTables from here:
iptables is built on top of netfilter, the packet alteration framework for Linux 2.4.x and 2.6.x. It is a major rewrite of its predecessor ipchains, and is used to control packet filtering, Network Address Translation (masquerading, portforwarding, transparent proxying), and special effects such as packet mangling.
A note on IpTables from here:
iptables is built on top of netfilter, the packet alteration framework for Linux 2.4.x and 2.6.x. It is a major rewrite of its predecessor ipchains, and is used to control packet filtering, Network Address Translation (masquerading, portforwarding, transparent proxying), and special effects such as packet mangling.
Advances in Video Coding
A list of recent advances in video coding
- Distributed Coding : Wyner-Ziv coding where source statistics are exploited at the decoder allowing relatively lighter encoding phases. H.264 is used as an intraframe encoder and Wyner-Ziv frames are inserted in between intra-frames, where after integer transform of the frame, followed by LDPC coding of the transformed-data after which only parity bits are sent
- Texture based methods
- Scalable coding: Temporal, Spatial, SNR: e.g. code "enhancement" layers as B frames with a hierarchical structure
- Multi-view coding
- ROI based coding: Region of interest. See this
- Conditional Replenishment : See this
Ref[1]
Further, there are advances known for video streaming such as
- Adaptive bitrate coding: Encode the video with multiple bitrates and chunk it.
- Byte streams (instead of bitstreams)
- Reduce Pre-roll time
- Multi-source video streaming over WLANS. ref[2]
Wednesday, April 4, 2012
Book on Cognitive Science
http://www.goertzel.org/books/complex/contents.html
Liked what ive read so far in the book about Cognitive Science, AI, Dynamical Systems overview etc.
Liked what ive read so far in the book about Cognitive Science, AI, Dynamical Systems overview etc.
Thursday, March 29, 2012
Wednesday, March 28, 2012
Affinity Lattice
lets say there are 3 bits 000, the graph im interested in looks like this
111 ------\
/ \ \
011 101 |
/ \ / \ |
| 001 | |
| | | |
| 000 | |
\ / \ / |
010 100 |
\ / /
110 ------/
note that this is the graph where we go from one node to another if 1 bit changes. What i have drawn above is essentially a cube flattened out. The trouble is that i cannot draw a planar/non-overlapping graph/lattice for 4 dimensional codes.I am curious to find out if it is even possible at all to draw a hypercube on paper without overlapping edges.
Further, we cannot write this in a row-by-row table because the geometry of the table does not allow more than 2 neighbors.
What i am essentially looking for are geometric objects into which these codes can be embedded, and i want to call those geometric objects as "Affinity Objects" perhaps Affinity lattices.
Let me get back on this and find out the literature about this
Monday, March 26, 2012
Intro to File Systems
Quick high-level intro to File Systems on Linux. File System design can be motivated by things such as trying to reduce the seek-time (thereby latency, thereby increase throughput) and reduce the wait-time for write operations, and to achieve ACID (or not).
- Berkeley Fast File System: Place "sibling" files in nearby blocks. Boot blocks + superblock + cylinder groups (free-list + inodes(permissions, first 10 data blocks + indirect blocks) + directory blocks + data blocks). The MS-DOS FAT is a simpler version of this.
- Journaling File Systems:
- Metadata Journaling file systems: Maintain a log/Journal of the meta-operations related to allocating inodes, dirs, blocks, but dont actually store the data being written to blocks, into the log/Journal. [2] e.g. EXT3FS
- Log Structured File System[1] [2]: See this for a quick high level understanding. To avoid circular logs and fragmentation, "segments"/pools of free segments etc are used based on Garbage Collection techniques. [Wikipedia]
- Hierarchical File Systems: [Wikipedia]
- Distributed File Systems [Wikipidia]
- Chunked file systems e.g. Google File System
Have a look at this Wikipedia page :http://en.wikipedia.org/wiki/Comparison_of_file_systems
Wednesday, March 14, 2012
Internet Next-Gen
Some Proposals:
- Today there is quite a bit of data on the internet that is public. This has in fact been the reason why Google could crawl so much of data. The issue is that when content is going to get locked up into social networks, internet quickly approches the society of humans i.e. privacy/information sharing/hiding etc for political agendas may follow. What is needed here is some governing mechanism to allow sites to interoperate and share data for searches. Some data like Wikipedia is in the best-interests of everyone to be shared. There needs to be thinking about how data is going to be shared across social networking silos.
- Named Data Networking: Routers cache index to data. Effort made by routers is to find the path to the router that has access to content, rather than to attempt to connect to some end point, and treats content as the entity of primary importance, than treating destination IP as the entity of primary importance. Data can re thus retrieved based on the URL(i.e. URL without the host)
Monday, March 12, 2012
Sunday, March 11, 2012
On Caches
This is quite a vast topic, because so much of what the work that is done in the Computer Science field is an extension of some-sort-of-cacheing by some entities. One can in fact look upon a major chunk of the computer science field itself as a search for "better" cacheing techniques. (the meaning of "better" changes with applications)
The simplest way of cacheing is of course, store data in a hash table with the key as the "address" and the value as the data. This kind of cacheing is called an associative array or hash based cacheing.
However, we can generalize this process a bit by trying to map the entire address space into a smaller address space. I will start off the discussion with the caches as one finds in Computer Organization books i.e. caching of data one finds in slower memories into faster memories. In such types of cacheing, there are 4 things of interest and we can arrange the 4 things (line-number, set, tag, and element-offset; to be discussed soon) in 4! or 24 ways, so there are at least 24 types of caching possible. I am saying at least 24 because one may use advanced data structures to implement caches which will scale up the total types of caches possible.
Organization:
See figures for an conceptual understanding. Caches are in reality not necessarily flat as shown here. The flat caches as ive drawn is perhaps the simplest way to explain it. I will briefly discuss 2 alternative ways in which cacheing of data from CPU addresses can be achieved.
Type 1: Flat Cache/Tags are part of the Cache
See the following picture. Lets say the main memory is 2^A * k bytes. We can envision the memory address to be broken up into x, y, z as follows. The "address-chunk" is referred to as an "element" in literature.
Storage Benefits:
Cacheing benefits are achieved by the "tag field" being non-zero (e.g. shown as y > 0 bits). Note that we can potentially store the tag in a separate faster-memory than the cache, and this is often done (see this), but i am for now showing it part of the cache, as the tag is an important part/storage of the caching process.
Search:
The search process for caches is as follows:
The simplest way of cacheing is of course, store data in a hash table with the key as the "address" and the value as the data. This kind of cacheing is called an associative array or hash based cacheing.
However, we can generalize this process a bit by trying to map the entire address space into a smaller address space. I will start off the discussion with the caches as one finds in Computer Organization books i.e. caching of data one finds in slower memories into faster memories. In such types of cacheing, there are 4 things of interest and we can arrange the 4 things (line-number, set, tag, and element-offset; to be discussed soon) in 4! or 24 ways, so there are at least 24 types of caching possible. I am saying at least 24 because one may use advanced data structures to implement caches which will scale up the total types of caches possible.
Organization:
See figures for an conceptual understanding. Caches are in reality not necessarily flat as shown here. The flat caches as ive drawn is perhaps the simplest way to explain it. I will briefly discuss 2 alternative ways in which cacheing of data from CPU addresses can be achieved.
Type 1: Flat Cache/Tags are part of the Cache
See the following picture. Lets say the main memory is 2^A * k bytes. We can envision the memory address to be broken up into x, y, z as follows. The "address-chunk" is referred to as an "element" in literature.
Type 2: Flat Cache/Note the difference in the tag bits.
Storage Benefits:
Cacheing benefits are achieved by the "tag field" being non-zero (e.g. shown as y > 0 bits). Note that we can potentially store the tag in a separate faster-memory than the cache, and this is often done (see this), but i am for now showing it part of the cache, as the tag is an important part/storage of the caching process.
Search:
The search process for caches is as follows:
- Address --> find the cache line, or group of lines (if s > 0) associated with this address
- Search "associatively" within this set's tags and look for a match.
Question: What difference do these things (type 1 and type2) make ?
The answer is that in type1, "close" addresses map to the same line (the higher order bits in the address, decide the cache-line); and in type 2, "close" addresses map to different lines (as, relatively lower order bits in the address, decide the cache line)
As a result of this, type 2 is preferred, because cacheing wants to rely on principle of locality i.e. we want close addresses to map to different cache lines so that all those lines are in the cache when the processor needs them, effectively saving latency and increasing throughput.
Cache Eviction Logic:
In the above discussion the eviction logic was that, in case of "direct mapped cache" i.e. each line is a separate set, if the cache line one is interested in, is occupied with a different tag, we kick out that data and load new data.
If we are talking about a set-associative cache, then, we would potentially associatively search for a the tag match in all the lines in the "set", and then we have to decide some mechanism to kick out an entry - this may be done as LRU, LFU, FIFO etc (google for this).
Cache Coherency:
Distributed Caches & Partitioning: See this
Extensions:
Web Cacheing:
One can in principle extend this logic to cacheing in general, e.g. suppose we want to cache the content from www. google.com/blah1 and www.google.com/blah2. May it be nice to use higher-order bits (e.g. the www.google.com to map to a cache-block, or may be we better off using finer granularities (i.e. blah1, blah2) to store data. I would go with the latter for locality of reference. One can see that there is potential for research here too i.e. what is the performance benefits of each style depending on the browsing habits of the user.
Network Buffering:
Often, in networking systems, one may consider the caches to be made up of a pool of buffers separated in zones -- and thus in the above classification scheme, networking caches may be envisioned as; "zone" meaning a set, and buffers in a zone meaning the cache-lines in a set. The eviction scheme is used to figure out which buffer to re-use in a particular zone
Tuesday, March 6, 2012
Components of the Web
The Web is made up of 3 general components.
- Origin Servers
- Clients
- Proxy: (See this, which im reproducing ) There are four kinds of widgets that people might call proxies. Note that the term "widget" below may be either software, hardware, or a combination.
- A router is a widget that sends packets back and forth between multiple network segments (i..e., between ethernet, ADSL, and LocalTalk networks all coming in to the same machine, but possibly on different ports or cards). The widget may choose to filter out some packets and not route them. Important characteristic: multiple networks attached.
- A gateway is widget that translates data from one protocol (like TCP/IP) to another protocol (like AppleTalk) as it flies past. The widget may also choose to not translate some packets, and filter them out.Important characteristic: protocol translation.
- A proxy is a widget that receives requests with its left hand (usually from one network), and then launches similar requests with its right hand (usually on a different network). As the results arrive back at the right hand, they are copied over to the still-waiting left hand and returned to the original client. (It acts as a server with its left hand, and as a client with its right hand.) The widget may filter out certain requests or certain responses and not copy them from hand to hand. Important characteristic: acts as both client and server. When a proxy operates at the packet level it is a transparent proxy, and the clients usually don't know that there's a proxy involved.
- A cache is a widget that receives requests and either responds directly from it's own store of information, or forwards the requests "upstream" to another server. When data comes back from the upstream server, this widget keeps a copy of the data for itself, and also returns the data to the original client. Important characteristic: data stored locally.
Cloud Computing Landscape
http://cloudstack.org/blog.html?start=5 led to me nice overviews about the landscape. Will explore further, add more detailed info, better surveys etc. soon.
Tuesday, February 28, 2012
Monday, February 27, 2012
Level and Edge Triggered Programs
Level Triggered : The process "looks" at some data-structure/state and if it matches something, notifies the interested processes i.e. set some flag somewhere etc.e.g. time 0: no packets, time 1: packet, time 2: packet will cause only 2 level-triggers at time 1 and time 2.
Edge Triggered: The process "looks"at some data-structure/state and sees if it has "changed" from the last time the process looked at it. If it has, notify other interested processes i.e. set some flag somewhere. e.g. time 0: no packets, time 1: packet, time 2: packet will cause only 1 edge-trigger at time 1.
Note here that "time" here means at the abstraction-level of "Software systems" i.e. the time a process "looks" at some data-structure. (i.e. the number of times a CPU instruction touched some abstraction (which is stored in memory (though the process may refer different memory locations at different instants))
I will describe how it works in the context of an edge-triggered vs a level-triggered epoll system call soon. The idea is that an edge-trigger is triggered once , which allows us to do reads etc in the context of a different thread without having to to fd_clr etc. Coming soon :-)
Ref: [1][2]
Edge Triggered: The process "looks"at some data-structure/state and sees if it has "changed" from the last time the process looked at it. If it has, notify other interested processes i.e. set some flag somewhere. e.g. time 0: no packets, time 1: packet, time 2: packet will cause only 1 edge-trigger at time 1.
Note here that "time" here means at the abstraction-level of "Software systems" i.e. the time a process "looks" at some data-structure. (i.e. the number of times a CPU instruction touched some abstraction (which is stored in memory (though the process may refer different memory locations at different instants))
I will describe how it works in the context of an edge-triggered vs a level-triggered epoll system call soon. The idea is that an edge-trigger is triggered once , which allows us to do reads etc in the context of a different thread without having to to fd_clr etc. Coming soon :-)
Ref: [1][2]
Tuesday, February 21, 2012
Wednesday, February 15, 2012
Thread Pools
As a quick recap, note that the linux kernel creates an object of the data structure called a "task". Both of what we call as processes and threads are really tasks that are "cloned" from this original task. fork() is one kind of cloning and kthread_create (pthread_create, (portable kernel thread create) API) is another kind of cloning. Multiple threads may share the virtual address space (and thus in turn Data sections, etc), but threads mostly have their own stacks.
A task is a unit of scheduling on by the linux scheduler.
Note that "process" is a task that has a "user level context". When "user level threads" are created, they do not call the clone system call and thus share the address space of the process. These kinds of user-level-threads are called lightweight processes.
One can, if one wants, instead of creating new a thread every time the client asks the server to perform some task, we can have a pool of threads -- what this means is that we can store the "context" (i.e. registers etc) in some place in RAM and toggle between these places with a pointer i.e. all the operating system needs to do to perform a thread switch, is to change the thread pointer (see this). This is essentially an optimization to save the overhead of creating and deleting threads.
Have a look at this link for a quick high-level intro of linux memory management.
A task is a unit of scheduling on by the linux scheduler.
Note that "process" is a task that has a "user level context". When "user level threads" are created, they do not call the clone system call and thus share the address space of the process. These kinds of user-level-threads are called lightweight processes.
One can, if one wants, instead of creating new a thread every time the client asks the server to perform some task, we can have a pool of threads -- what this means is that we can store the "context" (i.e. registers etc) in some place in RAM and toggle between these places with a pointer i.e. all the operating system needs to do to perform a thread switch, is to change the thread pointer (see this). This is essentially an optimization to save the overhead of creating and deleting threads.
Have a look at this link for a quick high-level intro of linux memory management.
Monday, February 13, 2012
On OAuth
I was pointed to this article, which made up read up more on OAuth. Will update more soon -- just making a note.
Thursday, February 9, 2012
Video & Network Bitrates
Video encoding layer:
VBR - constant PSNR
CBR - variable PSNR
Network Layer
CBR - max packets/sec is constant over time
VBR - max packets/sec varies over time
For video over network
We can stream VBR over VBR, CBR over VBR, CBR over CBR; and VBR over CBR
Monday, January 23, 2012
Loose Coupling and Tight Coupling
This is an intro to the idea. Basically, the learning was that it appears that loosely-coupled systems are easier to maintain and evolve. One should of course not read too much into the definition of "loosely-coupled" -- what one means here is that the "interface" between two components is well-defined and each party knows about it. However the other party knows nothing about what happens under the hood of the interface. Such systems are easier to maintain because the references of one object are not tied together with the references of another object, but rather they pass through well-defined interfaces.
One may also see that the same idea can be extended to hold true for more general relationships e.g. Loosely-coupled organizations may be better suited for evolution/adaptation and growth than tightly coupled organizations i.e. organizations that do not play the big-brother may allow each employee to learn/grow as per their personal choice thereby increasing satisfaction levels and motivation (because of having the perception of being in control of one's life).
Further, it may also be that in some enterprises "identity" management/aka "authorization" is tightly coupled and "authentication" is loosely coupled i.e. one can get in from many doors (the edge is only loosely coupled with the centralized system), but cannot access any content unless the centralized server approves [Ref: [1] [2]]
One may also see that the same idea can be extended to hold true for more general relationships e.g. Loosely-coupled organizations may be better suited for evolution/adaptation and growth than tightly coupled organizations i.e. organizations that do not play the big-brother may allow each employee to learn/grow as per their personal choice thereby increasing satisfaction levels and motivation (because of having the perception of being in control of one's life).
Further, it may also be that in some enterprises "identity" management/aka "authorization" is tightly coupled and "authentication" is loosely coupled i.e. one can get in from many doors (the edge is only loosely coupled with the centralized system), but cannot access any content unless the centralized server approves [Ref: [1] [2]]
Friday, January 6, 2012
Controlling the heater via a humidity sensor
A project i have been wanting to design is to use hygrometer to turn on/off the heater i have in my room. The heater makes the room too dry. The plan is to keep the window open (to allow for humidity to enter the room) and switch off the heater when the humidity drops below a certain value. I am not very sure how to get this working, but googling around a bit led me to this:
http://marcusgun.awardspace.com/humidor.html
http://marcusgun.awardspace.com/humidor.html
Tuesday, January 3, 2012
Complexity Zoo
Link to a pretty popular place to learn about this.
http://qwiki.stanford.edu/index.php/Complexity_Zoo
Will post stuff as i learn more.
http://qwiki.stanford.edu/index.php/Complexity_Zoo
Will post stuff as i learn more.
Subscribe to:
Posts (Atom)