Sunday, December 9, 2012

Basic Facebook Architecture

Getting into Facebook Architecture

Social networking is the art of connecting with those who share common interests. 
Facebook Use Case:

1. Social Graph
2. Friend connections
3. Friends-of-friends connections
4. Friends can comment on, read, and like your posts
5. You can update your status and upload pictures and videos
6. You can post your details
7. Each of your posts is tracked on your timeline
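
The first three use cases (social graph, friend connections, friends-of-friends connections) can be pictured with a small in-memory sketch. This is only an illustration: the class name `SocialGraph`, its methods, and the sample users are made up for this post and say nothing about how Facebook actually stores the graph.

```python
from collections import defaultdict


class SocialGraph:
    """Toy undirected friendship graph stored as adjacency sets."""

    def __init__(self):
        self._friends = defaultdict(set)

    def add_friendship(self, a, b):
        # Friendship is mutual, so record the edge in both directions.
        self._friends[a].add(b)
        self._friends[b].add(a)

    def friends(self, user):
        # Return a copy so callers cannot modify the graph by accident.
        return set(self._friends.get(user, set()))

    def friends_of_friends(self, user):
        # People two hops away: friends of my friends,
        # minus me and minus anyone I am already friends with.
        direct = self.friends(user)
        result = set()
        for friend in direct:
            result |= self._friends[friend]
        return result - direct - {user}


graph = SocialGraph()
graph.add_friendship("alice", "bob")
graph.add_friendship("bob", "carol")
graph.add_friendship("carol", "dave")

print(graph.friends("alice"))             # alice's direct friends
print(graph.friends_of_friends("alice"))  # people alice could be introduced to
```

Adjacency sets keep the friend lookup and the two-hop lookup simple. A real social graph at this scale would have to be spread over many machines.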

Volumes of data

1. Terabytes to petabytes of data are stored.
2. More than 350 million users use the site.

High-level architecture

Key components

1. Front end: Servers run on LAMP (Linux, Apache, MySQL, and PHP)
a. Linux and Apache: FB runs on Linux with Apache web servers.
b. MySQL: Used mainly as a key-value store. Data is distributed randomly across a large set of servers.
c. PHP: The web programming language.
d. Memcache: A memory caching system that speeds up dynamic, database-driven websites (like Facebook) by caching data and objects in RAM to reduce read time.
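
The way a RAM cache sits in front of the database can be sketched with the common "cache-aside" pattern. This is a minimal sketch under assumptions: two plain Python dicts stand in for Memcache and MySQL, and `CacheAsideStore` and its counter `db_reads` are hypothetical names used only for illustration, not Facebook's code.

```python
class CacheAsideStore:
    """Read-through sketch: check the RAM cache first, fall back to the database."""

    def __init__(self, database):
        self.database = database  # stand-in for MySQL (a plain dict)
        self.cache = {}           # stand-in for Memcache (RAM only)
        self.db_reads = 0         # counts the slow reads, for demonstration

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database access
        self.db_reads += 1                # cache miss: go to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value       # remember it for the next reader
        return value

    def set(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)         # invalidate the stale cached copy


store = CacheAsideStore({"user:1": "Alice"})
store.get("user:1")     # first read goes to the database
store.get("user:1")     # second read is served from RAM
print(store.db_reads)   # number of database reads so far
```

Invalidating on write keeps the sketch short. Other strategies, such as updating the cache in place, are possible as well.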

Architecture model 

1. Create a service where needed
2. Create a framework/tool set for easier creation of services
3. Use the right programming language for the task

Mike Schroepfer's architecture diagram

  • PHP web servers collect data from multiple dedicated web services. Other services are written in Python, C++, Erlang, and so on. "Thrift" is used for communication between all these services.
  • A load balancer distributes the incoming web page requests.
The diagram below shows the Facebook architecture.
Aditi Technology's slide pack

Open Source Model 

BigPipe is a fundamental redesign of the dynamic web page serving system. The general idea is to decompose web pages into small chunks called pagelets and pipeline them through several execution stages inside web servers and browsers.
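
The pagelet idea can be sketched with a Python generator: the server sends the page skeleton first and then one chunk per pagelet, so the browser can start work before the whole page is built. This is only a rough sketch. The function `render_page`, the sample pagelets, and the client-side hook name `onPageletArrive` are assumptions made for this example, not BigPipe's real interface.

```python
import json


def render_page(pagelets):
    """Yield HTML chunks: the skeleton first, then one chunk per pagelet."""
    # The skeleton holds an empty placeholder for every pagelet.
    placeholders = "".join(f'<div id="{name}"></div>' for name in pagelets)
    yield f"<html><body>{placeholders}"
    for name, build in pagelets.items():
        # Each pagelet is built and flushed on its own, so a slow
        # pagelet does not hold back the ones before it.
        payload = json.dumps({"id": name, "content": build()})
        yield f"<script>onPageletArrive({payload});</script>"
    yield "</body></html>"


pagelets = {
    "newsfeed": lambda: "<p>Feed</p>",
    "chat": lambda: "<p>Chat</p>",
}
chunks = list(render_page(pagelets))
for chunk in chunks:
    print(chunk)
```

In a real web server each yielded chunk would be flushed to the browser immediately. Here they are collected in a list to keep the example self-contained.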


Thrift (protocol)

Thrift is a lightweight remote procedure call framework for scalable cross-language services development. Thrift supports C++, PHP, Python, Perl, Java, Ruby, Erlang, and others. It is quick, saves development time, and provides a division of labor between work on high-performance servers and work on applications.
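
The core idea of a remote procedure call framework can be shown with a toy example: the caller packs a method name and its arguments into a language-neutral message, and the server unpacks it and dispatches to a handler. Please note this is not the Thrift API. JSON stands in for Thrift's wire protocols, and `encode_call`, `Server`, and the `add` method are hypothetical names invented for this sketch.

```python
import json


def encode_call(method, **args):
    # Language-neutral request: any language with a JSON library
    # could produce or consume these bytes.
    return json.dumps({"method": method, "args": args}).encode("utf-8")


class Server:
    """Toy dispatcher that maps method names to handler functions."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, func):
        self._handlers[name] = func

    def handle(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        handler = self._handlers.get(request["method"])
        if handler is None:
            return json.dumps({"error": "unknown method"}).encode("utf-8")
        result = handler(**request["args"])
        return json.dumps({"result": result}).encode("utf-8")


server = Server()
server.register("add", lambda a, b: a + b)

# In a real deployment these bytes would travel over the network,
# possibly between programs written in different languages.
reply = json.loads(server.handle(encode_call("add", a=2, b=3)))
print(reply)
```

With real Thrift the service interface is described once in an interface definition file, and the client and server code for each language is generated from it.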

Scribe (log server)

Scribe is a server for aggregating log data streamed in real-time from many other servers. It is a scalable framework useful for logging a wide array of data. It is built on top of Thrift.
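
Log aggregation can be pictured as many servers sending small batches of entries to one collector. The sketch below assumes each entry is a (category, message) pair. `LogAggregator` and the sample log lines are made up for illustration, and this is not Scribe's actual interface.

```python
from collections import defaultdict


class LogAggregator:
    """Toy collector that groups incoming log messages by category."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def log(self, entries):
        # Each sending server delivers a batch of (category, message) pairs.
        for category, message in entries:
            self._by_category[category].append(message)

    def messages(self, category):
        # Return a copy so callers cannot change the stored log.
        return list(self._by_category.get(category, []))


aggregator = LogAggregator()
aggregator.log([("web", "GET /home 200")])                               # from server 1
aggregator.log([("web", "GET /photo 404"), ("ads", "impression 17")])   # from server 2

print(aggregator.messages("web"))
print(aggregator.messages("ads"))
```

Grouping by category lets each kind of log be stored or processed separately downstream.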

Cassandra (database)
Cassandra was used earlier; it is now being replaced by HBase.

Apache Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure.
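
One way to see how data can be spread over many servers with no single point of failure is a hash ring with replication: every key is placed on a ring by hashing, and the next few nodes clockwise keep a copy. This is a simplified sketch under assumptions. The `Ring` class, the node names, and the use of MD5 are illustrative choices for this post, not Cassandra's actual partitioner or code.

```python
import bisect
import hashlib


def token(value):
    # Stable hash mapped to an integer position on the ring.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: each key is stored on `replicas` consecutive nodes."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, key):
        # Find the first node clockwise from the key's position,
        # then take the following nodes as additional replicas.
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def live_replicas(self, key, down=()):
        # Replicas that can still serve the key when some nodes are down.
        return [n for n in self.replicas_for(key) if n not in down]


ring = Ring(["node-a", "node-b", "node-c"], replicas=2)
owners = ring.replicas_for("user:42")
print(owners)
print(ring.live_replicas("user:42", down={owners[0]}))
```

Because each key lives on more than one node, losing any single node still leaves a copy that can answer requests.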

HipHop (source code transformer)

HipHop for PHP is a source code transformer for PHP script code. HipHop programmatically transforms PHP source code into C++ and then uses g++ to compile it to machine code. HipHop includes a code transformer, a reimplementation of PHP's runtime system, and a rewrite of many common PHP extensions to take advantage of these performance optimizations.

HBase (database)
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

Haystack (photo storage)
Haystack is used for storing the pictures.

The high-level Facebook architecture, in terms of LAMP and other protocols, is explained in the following video (OSCON 2010: David Recordon, "Today's LAMP Stack").

Many thanks for the many views and the support. Please comment.


1. Haystack: Picture usage and architecture
2. Clearcloud site: Source of most of the content and of the idea for this blog post
3. Seattle conference, Mike Schroepfer: Presentation by a Facebook VP; it is not in English, though
4. Apache Thrift: Thrift site
5. Facebook Engineering page: Day-to-day changes from the developers
6. Aditi Technology blogs: Nice InfoQ sites, nuts and bolts
7. Glenn's site: Good blog
8. Facebook: Science and the Social Graph: 2009 InfoQ presentation by Aditya Agarwal
