Scaling an Internet architecture

Any great website eventually runs out of capacity. Horizontal scalability with load balancers between the tiers offers redundancy -- keeping the site up and running.

Hi, I'm Ted Cahall, Chief Information Officer and Senior Vice President at CNET Networks, and today I'd like to talk about scaling an Internet architecture.

One of the big problems with building a great Internet site is that eventually you run out of capacity, and most people start their site out on one single box. In this case I'll talk about an open source solution, or at least a free solution, where all the parts are available on the Internet; you download them, and all you have to do is buy the actual piece of hardware. Let's say we're using a version of Linux, which is free, and on that version of Linux we've put an Apache web server, which will serve the static pages off a disk, and then we're using a Java Virtual Machine running some type of J2EE container for servlets, on top of a MySQL database, to serve the pages. Eventually what'll happen is we get enough users, enough people that want to come to our site, and we run out of capacity.

So how do we address that, how do we figure out what ran out of capacity, and how do we take it to the next level? A common way to do that is to take this same site and build it out, but move each one of these processes onto a separate machine, so that we'll be able to scale those machines independently. That would look something along these lines: we'd make an Apache tier, which is our web server. We then make that connect down to a Java tier, which would be our application level. It could be PHP if we wanted, which would stay up in the Apache tier, but this is more of a classical three-tier physical architecture. Then we go down to a data tier, where we're holding our data; for this we chose a MySQL database. We've now got three independent tiers, which we can start to scale, or at least we can add different types of horsepower in terms of vertical scalability by adding CPUs, adding more RAM, adding faster disks, etc. But eventually even this scalability method falls down, and we need to further scale the system.

The type of scalability I'll describe to you now is called horizontal scalability. Horizontal scalability means I'll take each one of these tiers and I'll move them out horizontally and scale them. Now to do that, I need a piece of hardware called a load balancer. So we'll look at this infrastructure: we've got the Internet coming in, and the first thing it hits is this box called a load balancer. The load balancer has these virtual IPs (VIPs), each of which is one IP address that fans out to multiple IP addresses in the tier below it. In this case that would be multiple Apache boxes, which are the web servers. Let's say we took three initially, so we have three Apache boxes, and the load balancer, although it looks like one IP address to the Internet, is distributing the load across these three Apache boxes.
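The fan-out just described, one virtual IP spreading requests across several boxes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VirtualIP` class, the addresses, and the round-robin policy are all hypothetical choices made for the example; the talk does not say which distribution policy the load balancer uses.

```python
from itertools import cycle


class VirtualIP:
    """One public address that fans requests out to a pool of backend addresses."""

    def __init__(self, address, backends):
        self.address = address
        self.backends = list(backends)
        # Assumption: plain round-robin. Real load balancers offer other policies too.
        self._rotation = cycle(self.backends)

    def route(self, request):
        """Pick the next backend in the rotation and pair it with the request."""
        return next(self._rotation), request


# Hypothetical addresses: one VIP facing the Internet, three Apache boxes behind it.
apache_vip = VirtualIP("203.0.113.10", ["10.0.1.1", "10.0.1.2", "10.0.1.3"])

# Six incoming requests are spread evenly over the three boxes.
chosen = [apache_vip.route(f"GET /page{i}")[0] for i in range(6)]
print(chosen)
```

From the outside, every request is addressed to the single VIP; only the load balancer knows which of the three boxes actually served it.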

Now the Apache boxes will serve the static content off their local drives, which we're pushing out with rsync or some other method, but eventually we want some dynamic content, and that dynamic content comes through the Java Virtual Machine with a J2EE servlet container on it. So we know we need another load balancer, because we'll need to scale. We've scaled the Apache tier horizontally; we now need to scale the Java and J2EE tier horizontally.

So let's say we'll start again with three boxes. We put the Java tier in, and the boxes are connected to the load balancer, and the Apache tier connects back into the load balancer to move towards the Java tier. Now we've got the Java tier scaled out horizontally, as you can see, and the question is how do we get the data tier scaled horizontally. Well, that's a little trickier, but again we connect back to a load balancer. This picture is shown more logically than physically; this may well just be three different VIPs in the very same load balancer, but I'm trying to give you a logical representation of how all this is connected. And we say we have three MySQL replica slaves that hold this data. The slave term is a MySQL term; I'm not just using that. It's not some crazy computer science term, it's really a vendor term. So we've now got it to where the Java Virtual Machines can pull the data up from MySQL.
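The point that the three logical load balancers may be three VIPs on one physical box can be illustrated with a short Python sketch. Everything named here (the `LoadBalancer` class, the VIP names, the box names) is hypothetical and chosen only for the example, and round-robin is again an assumed policy.

```python
class LoadBalancer:
    """A single balancer hosting several VIPs, one per tier."""

    def __init__(self):
        self.pools = {}      # VIP name -> list of backend names
        self.positions = {}  # VIP name -> count of picks made so far

    def add_vip(self, vip, backends):
        self.pools[vip] = list(backends)
        self.positions[vip] = 0

    def pick(self, vip):
        """Return the next backend behind the given VIP, round-robin."""
        pool = self.pools[vip]
        backend = pool[self.positions[vip] % len(pool)]
        self.positions[vip] += 1
        return backend


# One physical balancer, three VIPs: one in front of each tier.
balancer = LoadBalancer()
balancer.add_vip("web-vip", ["apache1", "apache2", "apache3"])
balancer.add_vip("app-vip", ["java1", "java2", "java3"])
balancer.add_vip("db-vip", ["mysql-replica1", "mysql-replica2", "mysql-replica3"])


def handle_request(lb):
    """Trace the boxes one dynamic request touches on its way down the tiers."""
    return [lb.pick("web-vip"), lb.pick("app-vip"), lb.pick("db-vip")]


first_path = handle_request(balancer)
second_path = handle_request(balancer)
print(first_path)
print(second_path)
```

Each tier only ever talks to the VIP of the tier below it, so boxes can be added to any pool without the tier above needing to know.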

The question now is how do we update all three of these databases. Well, MySQL itself has a master-slave relationship: the machine that you write to is always referred to as the master, and through its infrastructure it automatically updates these slaves through a process called replication. So all of the writes of the data, through the tools that maybe some of the tech producers or some of the business people are using, go into this master. Maybe there are some feeds coming from some merchants, and these feeds also go into this master database. MySQL is replicating that up to the slaves, the load balancer is distributing the load to the MySQL databases, the Java tier is able to pull that, and we've now got not only horizontal scalability but redundancy. If any of these boxes were to fail, our site will still be up, as the load balancer will take the failed box out of rotation, continue to distribute the load to the boxes that are still working, and keep our site completely up and functional.
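The write path and the failover behavior described above can be simulated with a small Python sketch. It is a toy model under stated assumptions, not how MySQL implements replication: real replication is asynchronous and log based, whereas here the hypothetical `Master.write` copies each row to the replicas immediately, and the `healthy` flag stands in for whatever health check a real load balancer runs. All class and key names are made up for the example.

```python
class Database:
    """A toy database box: a name, some rows, and a health flag."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.healthy = True


class Master(Database):
    """The one box that accepts writes and copies them to the replicas."""

    def __init__(self, name, replicas):
        super().__init__(name)
        self.replicas = replicas

    def write(self, key, value):
        self.rows[key] = value
        for replica in self.replicas:
            # Simplification: copied synchronously, unlike real MySQL replication.
            replica.rows[key] = value


class ReadBalancer:
    """Spreads reads over the replicas, skipping any box marked unhealthy."""

    def __init__(self, replicas):
        self.replicas = replicas
        self._position = 0

    def read(self, key):
        # Try each replica at most once, starting from the current position.
        for _ in range(len(self.replicas)):
            replica = self.replicas[self._position % len(self.replicas)]
            self._position += 1
            if replica.healthy:
                return replica.name, replica.rows.get(key)
        raise RuntimeError("no healthy replicas in rotation")


replicas = [Database("replica1"), Database("replica2"), Database("replica3")]
master = Master("master", replicas)
reads = ReadBalancer(replicas)

master.write("product:42", "laptop")  # tools and feeds write only to the master
replicas[1].healthy = False           # simulate one replica failing

served = [reads.read("product:42") for _ in range(4)]
print(served)
```

With the second replica out of rotation, reads keep succeeding from the other two boxes, which is the redundancy the talk describes for the replica tier.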

So, the critical component in scaling out an Internet architecture is to have load balancers between the tiers that are scaled out horizontally.