Thursday, December 27, 2012

Instagram - Architecture


Simple "KISS" rules to make one billion dollars 

1. Instagram, a photo-sharing website, took 8 weeks to build and one year to get up and running.
2. It was a small start-up (3 engineers) that relied on many proven technologies.
3. They used the canonical early start-up model.
4. A service-oriented architecture is used.
5. They sold the company to Facebook for one billion USD.


Infrastructure

1. Amazon Web Services as the cloud (EC2)

  • EC2 stands for "Elastic Compute Cloud":
  • You can buy computing power by the hour
  • You can buy bandwidth by the GB
  • You can buy space by GB/month
  • You can add a load balancer or firewall with a click of the mouse


2. Linux




3. Elastic Load Balancer (ELB) with 3 Nginx instances sitting behind it. Here, 100 EC2 instances are used.


3. a Nginx

Nginx (pronounced "Engine-X") is an open source web server and a reverse proxy server for HTTP, SMTP, POP3 and IMAP protocols, with a strong focus on high concurrency, performance and low memory usage.

Aimed at solving the C10K problem of 10,000 simultaneous connections, nginx was written with a different architecture in mind, one that is much more suitable for nonlinear scalability in both the number of simultaneous connections and requests per second. nginx is event-based, so it does not follow Apache's style of spawning new processes or threads for each web page request. The end result is that even as load increases, memory and CPU usage remain manageable. nginx can now handle tens of thousands of concurrent connections on a server with typical hardware (Ref: 3).
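To make the event-based idea concrete, here is a minimal sketch in Python. It is not how nginx is implemented; it only shows one thread serving several connections by reacting to readiness events rather than spawning a process or thread per connection. The function names and the use of in-process socket pairs in place of real clients are assumptions made for the illustration.

```python
import selectors
import socket


def serve_ready_connections(sel, expected):
    """Handle sockets as they become readable, all in a single thread."""
    handled = 0
    while handled < expected:
        ready = sel.select(timeout=1.0)
        if not ready:
            break  # nothing became readable within the timeout
        for key, _mask in ready:
            conn = key.fileobj
            data = conn.recv(1024)
            if data:
                conn.sendall(data.upper())  # trivial "response"
            else:
                sel.unregister(conn)  # peer closed the connection
                conn.close()
            handled += 1
    return handled


def demo():
    """Three fake clients served by one loop, with no thread per client."""
    sel = selectors.DefaultSelector()
    pairs = []
    for i in range(3):
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
        client.sendall("hello {}".format(i).encode())
        pairs.append((client, server_side))
    serve_ready_connections(sel, expected=3)
    replies = [client.recv(1024) for client, _ in pairs]
    sel.close()
    for client, server_side in pairs:
        client.close()
        server_side.close()
    return replies
```

Calling `demo()` returns the three upper-cased replies. The point is that memory use grows with the number of registered sockets, not with a number of threads.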

3. b 100 EC2 instances with load balancer

In this diagram, two web browsers are requesting 3 different web sites. Assuming there are three instances, ELB distributes the load across them (this is a simple form of ELB; there may be many variations).
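The simplest way to picture load distribution is a round-robin rotation over the instances. The sketch below assumes round robin purely for illustration; the source does not say which algorithm ELB uses, and the class and instance names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy balancer: hands each request to the next instance in turn."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._pool = cycle(instances)

    def route(self, request):
        # Return the chosen instance together with the request it will serve.
        return next(self._pool), request


balancer = RoundRobinBalancer(["instance-a", "instance-b", "instance-c"])
assignments = [balancer.route(site)[0] for site in ["site1", "site2", "site3", "site1"]]
```

With three instances and four requests, the fourth request wraps around to the first instance again.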

More on ELB



4. PostgreSQL


PostgreSQL is an object-relational database system that has the features of traditional proprietary database systems with enhancements to be found in next-generation DBMS systems.

5. Gunicorn as their WSGI server.

Gunicorn ("Green Unicorn") is a Python WSGI HTTP server ported from Ruby's Unicorn project. Unicorn is an HTTP server for Rack applications designed to only serve fast clients on low-latency, high-bandwidth connections and take advantage of features in Unix/Unix-like kernels.
The idea behind this model is explained with a checkout-line analogy (Twitter page): a solution is to stop being a supermarket and start being Fry's. When you check out at Fry's you wait in one long line. In front are thirty cash registers handling one person at a time. When a cashier finishes with a customer, they turn on a light above the register to signal they're ready to handle the next one. It's counterintuitive, but one long line can be more efficient than many short lines.
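The checkout analogy can be checked with a small, deliberately simplified simulation. It assumes all customers arrive at time zero and compares the total waiting time for one shared line against customers dealt out to separate lines in turn. The function names and the sample durations are made up for the illustration; this is a toy model, not a description of Gunicorn's internals.

```python
import heapq


def shared_line_wait(durations, cashiers):
    """One line: each customer goes to whichever cashier frees up first."""
    free_at = [0] * cashiers  # min-heap of times at which cashiers become free
    total_wait = 0
    for service_time in durations:
        start = heapq.heappop(free_at)
        total_wait += start
        heapq.heappush(free_at, start + service_time)
    return total_wait


def separate_lines_wait(durations, cashiers):
    """Many lines: customers are assigned to lines in turn, then stay put."""
    free_at = [0] * cashiers
    total_wait = 0
    for index, service_time in enumerate(durations):
        lane = index % cashiers
        total_wait += free_at[lane]
        free_at[lane] += service_time
    return total_wait


durations = [5, 1, 1, 1]  # one slow customer followed by three fast ones
shared = shared_line_wait(durations, cashiers=2)
separate = separate_lines_wait(durations, cashiers=2)
```

Here the shared line gives a total wait of 3 time units against 6 for separate lines, because nobody gets stuck behind the slow customer while another cashier sits idle.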



6. 25+ Django app servers

App servers are used for horizontal scaling.

7. Deploying Django with gunicorn and nginx 
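As a rough sketch of what Gunicorn actually serves: any WSGI callable will do, and a Django project exposes one in its `wsgi.py`. The stand-alone callable below is a hypothetical stand-in for that object, not Instagram's code.

```python
def application(environ, start_response):
    """Minimal WSGI callable; Django's wsgi.py provides an equivalent object."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app):
    """Invoke a WSGI app directly and capture the status and body."""
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    body = b"".join(app({}, start_response))
    return captured["status"], body
```

Assuming the file is saved as `myapp.py` (a hypothetical name), it could be started with `gunicorn --workers 4 --bind 127.0.0.1:8000 myapp:application`, and nginx would then be configured to proxy incoming requests to that address.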

8. Amazon S3 (for photo storage)


Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.

Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.


9. Route 53

Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service.

Courtesy : Mark J



10. Amazon CloudFront CDN (Content Delivery Network)


A typical website generally contains a mix of static content and dynamic content. Static content includes images or style sheets; dynamic or application generated content includes elements of your site that are personalized to each viewer. Amazon CloudFront can help improve performance of your entire website in the following ways:

Amazon CloudFront can cache static content at each edge location. This means that your popular static content (e.g., your site’s logo, navigational images, cascading style sheets, JavaScript code, etc.) will be available at a nearby edge location for browsers to download with low latency and improved performance for viewers. Caching popular static content with Amazon CloudFront also helps you offload requests for such files from your origin server: CloudFront serves the cached copy when available and only makes a request to your origin server if the edge location receiving the browser’s request does not have a copy of the file.
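The edge-location behaviour described above reduces to "serve the cached copy if there is one, otherwise ask the origin once and keep the result". A minimal sketch, with a hypothetical `EdgeCache` class and an in-memory dictionary standing in for the origin server:

```python
class EdgeCache:
    """Toy edge location: caches files fetched from an origin callable."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_requests = 0  # how often the origin had to be contacted

    def get(self, path):
        if path not in self._store:
            # Cache miss: go back to the origin and remember the answer.
            self.origin_requests += 1
            self._store[path] = self._fetch(path)
        return self._store[path]


origin_files = {"/logo.png": b"PNG-bytes", "/site.css": b"body {}"}
edge = EdgeCache(origin_files.__getitem__)
first = edge.get("/logo.png")
second = edge.get("/logo.png")  # served from the edge, origin not contacted
```

After the two calls, `edge.origin_requests` is 1, which is the offloading effect the paragraph describes. A real CDN also expires and invalidates entries, which this sketch leaves out.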

12. Redis

Redis (REmote DIctionary Server) is an in-memory key-value data store that also supports disk storage for persistence.
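The combination of "in memory" and "persisted to disk" can be illustrated with a toy store that keeps everything in a dictionary and writes a snapshot file on request. This is only a sketch of the idea: the class name, the JSON file format and the file name are assumptions and have nothing to do with Redis's real on-disk formats or protocol.

```python
import json
import os
import tempfile


class TinyKeyValueStore:
    """In-memory dict with explicit snapshots to a JSON file."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                self._data = json.load(handle)  # reload the last snapshot

    def set(self, key, value):
        self._data[key] = value  # memory only until save() is called

    def get(self, key, default=None):
        return self._data.get(key, default)

    def save(self):
        # Write to a temporary file first so a crash cannot corrupt the snapshot.
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(self._data, handle)
        os.replace(tmp_path, self._path)


def demo_roundtrip():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "snapshot.json")
        store = TinyKeyValueStore(path)
        store.set("photo:1:likes", 42)
        store.save()
        reopened = TinyKeyValueStore(path)  # simulates a restart
        return reopened.get("photo:1:likes")
```

Reads and writes never touch the disk, which is where the speed comes from; only `save()` does, so anything written after the last snapshot would be lost on a crash.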


13. Memcached 

In computing, memcached is a general-purpose distributed memory caching system. It was originally developed by Danga Interactive for LiveJournal, but is now used by many other sites. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached runs on Unix, Linux, Windows and Mac OS X and is distributed under a permissive free software license.
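The usual way to put such a cache in front of a database is the "cache-aside" pattern: look in the cache first, fall back to the database on a miss, then store the result with an expiry time. The sketch below uses a plain in-process dictionary instead of a real memcached client, and all names (`TTLCache`, `cached_lookup`, the fake database function) are hypothetical.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # stale entry, treat as a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def cached_lookup(key, cache, load_from_db):
    """Cache-aside read: only hit the database when the cache has no answer."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value


# Demonstration with a fake clock and a fake database that counts its reads.
now = [0.0]
db_reads = []


def fake_db(key):
    db_reads.append(key)
    return "profile for " + key


cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cached_lookup("user:7", cache, fake_db)
cached_lookup("user:7", cache, fake_db)  # answered from the cache
reads_before_expiry = len(db_reads)
now[0] = 61.0                            # the entry has expired by now
cached_lookup("user:7", cache, fake_db)
reads_after_expiry = len(db_reads)
```

The database is read once for the first two lookups and once more after the entry expires, which is the reduction in external reads the paragraph refers to.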



14. Apache Solr

Apache Solr is a fast open-source Java search server.
Solr enables you to easily create search engines that search websites, databases and files.



15. Fabric

Fabric is used to execute commands in parallel on all machines. A deploy takes only seconds.
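Running the same command on every machine at once is what keeps deploys short. The sketch below shows that fan-out with Python's standard thread pool; it does not use Fabric's API, and the host names and the `fake_deploy` task are hypothetical placeholders for a real remote command.

```python
from concurrent.futures import ThreadPoolExecutor


def run_everywhere(hosts, task, max_workers=10):
    """Run task(host) for every host concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(task, hosts))  # map preserves host order
    return dict(zip(hosts, results))


def fake_deploy(host):
    # Stand-in for a remote command; a real task would connect over SSH.
    return "deployed to " + host


hosts = ["app1.example.com", "app2.example.com", "app3.example.com"]
report = run_everywhere(hosts, fake_deploy)
```

Because the hosts are handled concurrently, the total time is roughly that of the slowest machine and not the sum over all of them.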

PostgreSQL (users, photo metadata, tags, etc) runs on 12 Quadruple Extra-Large memory instances.

16. Twisted

Twisted is an event-driven networking engine written in Python and licensed under the open source MIT license.

Final architecture from Rama's blog (How to make a billion dollars in a couple of years):

[Figure: Slide21, final architecture]


Major References:

1. Instagram blog on Tumblr
2. High Scalability site
3. Nginx
4. Billion dollars in few years
5. PostgreSQL when it’s not your job
