
Writing easily scalable web (PHP) applications
2012-06-10


"A cloud is made of billows upon billows upon billows that look like clouds. As you come closer to a cloud you don't get something smooth but irregularities at a smaller scale." - Benoît Mandelbrot

I recently started working as a sysadmin for a managed hosting company whose clients' sites get huge amounts of traffic. To handle that traffic the sites have to be scaled, which is often harder than it needs to be: most of them take a lot of work because they were never designed to scale. Having written a fair amount of PHP over the years, I can see that most of the applications we deal with are a mess for exactly that reason. I figured I should share some of my newly found wisdom with the good people of the internet, and possibly make my own life easier in the long run if sites written with it end up on our servers.

If you follow these simple guidelines, your site should scale to millions of hits per day without making your code much more complicated than it already is.

separate read-write (RW) and read-only (RO) resources

When scaling database or caching services, one of the most common approaches is to set up a master with several slaves. The slaves can serve all the data you need, but they can't accept writes; every write has to go to the master. To take advantage of that, your application needs to keep its writing queries separate from its reading queries wherever possible.

For now, all you need to do is point the RO handle at the RW connection, so that when the day comes, scaling your MySQL reads only takes one extra connect call.
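$mysqlRW = mysqli_connect();   // read-write connection to the master (settings from php.ini)
$mysqlRO = $mysqlRW;           // read-only handle; the same connection until a slave exists
// ...
mysqli_query($mysqlRW, 'INSERT INTO my_table (my_column) VALUES (1)');   // writes go to RW
// ...
mysqli_query($mysqlRO, 'SELECT * FROM my_table');                        // reads go to RO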
Nothing stops you from reading through the master, but you should avoid it. In some cases you need your data to be as fresh as possible (the slaves' data lags a bit behind the master), so you have to read through the RW resource. Such cases are pretty rare, though, and can usually be avoided with caching, for example by updating the cached copy at the same time you write to the master.

Don't worry about load balancing your slaves. That is much more easily and safely done by admins later on. A single RO resource is all you need.

use proper caching

In-memory key-value stores are indispensable when you need frequent, large-scale access to the same data: sessions, counters, templates and so on. Redis with the phpredis extension is my personal favorite for PHP applications. Implementing your own caching system is rarely a good idea, especially if you plan on using regular files as storage.

Separating RW and RO resources applies here too.
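A minimal cache-aside sketch of the above, assuming phpredis `Redis` objects (or anything with the same `get`/`setex` methods). Reads go through a read-only cache handle and writes through a read-write one, mirroring the MySQL split. The function name `cacheAside`, the `ArrayCache` stand-in and the `user:1` key format are hypothetical, chosen so the example runs without a Redis server.

```php
<?php
// Cache-aside: try the cache first, fall back to the loader on a miss,
// then store the result. $cacheRO / $cacheRW are assumed to be phpredis
// Redis instances (hypothetical wiring): reads from a slave, writes to the master.
function cacheAside($cacheRO, $cacheRW, string $key, int $ttl, callable $load)
{
    $cached = $cacheRO->get($key);          // phpredis returns false on a miss
    if ($cached !== false) {
        return unserialize($cached);
    }
    $value = $load();                       // e.g. run the expensive SQL query
    $cacheRW->setex($key, $ttl, serialize($value));
    return $value;
}

// Hypothetical in-memory stand-in so the sketch runs without Redis.
// It ignores the TTL.
class ArrayCache
{
    private array $data = [];

    public function get(string $key)
    {
        return $this->data[$key] ?? false;
    }

    public function setex(string $key, int $ttl, string $value): bool
    {
        $this->data[$key] = $value;
        return true;
    }
}

// With phpredis this would be something like:
//   $redisRW = new Redis(); $redisRW->connect('127.0.0.1', 6379);
//   $redisRO = $redisRW;   // until a Redis slave exists
$cache = new ArrayCache();
$calls = 0;
$loader = function () use (&$calls) {
    $calls++;                               // counts how often the "database" is hit
    return ['id' => 1, 'name' => 'example'];
};
$first  = cacheAside($cache, $cache, 'user:1', 300, $loader);
$second = cacheAside($cache, $cache, 'user:1', 300, $loader);  // served from cache
```

In a real setup both handles would start out as the same `Redis` connection, just like the MySQL example; only the stand-in class goes away.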

don't rely on files

One of the advantages of Redis and similar technologies is that multiple web servers can access the same data, so you can have several servers doing the same kind of work. If your application relies on local files for data (not counting your .php files), it is much more difficult to scale and load balance the PHP service itself. It also makes it harder to deploy a redundant server that takes over when your main server goes down.
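A very common file dependency is one PHP gives you by default: sessions are stored in local files. Below is a sketch of a session handler that keeps them in a shared key-value store instead, assuming PHP 8 and a phpredis `Redis` object as `$store` (only `get`, `setex` and `del` are used). The class name and the `session:` key prefix are hypothetical; phpredis also ships its own handler (`session.save_handler = redis`), which is usually the simpler choice.

```php
<?php
// Sessions in a shared key-value store, so any web server can handle any
// request. $store is assumed to be a phpredis Redis instance (hypothetical wiring).
class KeyValueSessionHandler implements SessionHandlerInterface
{
    public function __construct(private $store, private int $ttl = 1440) {}

    public function open(string $path, string $name): bool { return true; }

    public function close(): bool { return true; }

    public function read(string $id): string|false
    {
        $data = $this->store->get('session:' . $id);
        return $data === false ? '' : $data;   // empty string means a new session
    }

    public function write(string $id, string $data): bool
    {
        // The TTL doubles as session expiry.
        return (bool) $this->store->setex('session:' . $id, $this->ttl, $data);
    }

    public function destroy(string $id): bool
    {
        $this->store->del('session:' . $id);
        return true;
    }

    // Expiry is handled by the store's TTL, so there is nothing to collect.
    public function gc(int $max_lifetime): int|false { return 0; }
}

// Usage (assumes the phpredis extension and a local Redis server):
//   $redis = new Redis();
//   $redis->connect('127.0.0.1', 6379);
//   session_set_save_handler(new KeyValueSessionHandler($redis), true);
//   session_start();
```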

try to make your pages compatible with reverse proxies

If every visitor who hasn't logged in gets identical output, a reverse proxy (such as Varnish) can serve them a cached copy of your pages instead of having your servers work their asses off producing the same result every time. That is especially handy if most of your traffic comes from such visitors.
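One way to make this work is to decide the caching header in a single place, based on whether the visitor is logged in. The sketch below assumes that a session cookie means "logged in" and uses an arbitrary 60-second lifetime; the function name is hypothetical. It only checks for the cookie rather than calling `session_start()`, since proxies such as Varnish typically won't cache a response that sets a cookie.

```php
<?php
// Anonymous pages are marked public so a shared cache (reverse proxy) may
// store them; personalised pages are marked private.
function cacheControlFor(bool $loggedIn, int $maxAge = 60): string
{
    return $loggedIn
        ? 'Cache-Control: private, no-cache'
        : 'Cache-Control: public, max-age=' . $maxAge;
}

// Hypothetical login check: a session cookie was sent with the request.
// No session is started here, so anonymous visitors get no Set-Cookie.
$loggedIn = isset($_COOKIE[session_name()]);
header(cacheControlFor($loggedIn));   // must be sent before any output
```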

have a DBA go through your database design and queries

This is really important. Most database performance problems come from poorly designed databases and badly written queries. Having a DBA handle this before you write your application can save you a lot of trouble later on and can even speed up your coding, since you won't get lost in your database design and queries. Database servers are not cheap either, so a decent design can save you a lot of money. Queries can be fixed later, but a bad database design often takes a serious amount of rewriting, plus downtime to migrate the data to the new design.

make sure your code works with PHP caching

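An opcode cache keeps your compiled PHP scripts in memory, so they don't have to be parsed and compiled on every request. My personal favorite here is APC, but there are plenty of other solutions out there. Opcode caching can roughly double the execution speed of your PHP code, which means you need less computing power (money) to run your application.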

not as important, but still helpful

separate all database access

If all of your MySQL-related code is grouped together (functions in separate files, or classes), you can change the way you access your data without much hassle: add caching around your data requests, replace MySQL with PostgreSQL, make sure everything is escaped (to avoid SQL injection), shard your tables, and so on.
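A sketch of what grouping database access can look like, using PDO prepared statements (so escaping is handled by parameter binding in one place) and separate RW/RO handles. The `ArticleRepository` class and the `articles` table are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own (it needs the pdo_sqlite extension).

```php
<?php
// All SQL lives in this one class: writes use the RW handle, reads the RO
// handle, and values are always bound rather than concatenated into SQL.
class ArticleRepository
{
    public function __construct(private PDO $rw, private PDO $ro) {}

    public function add(string $title): int
    {
        $stmt = $this->rw->prepare('INSERT INTO articles (title) VALUES (?)');
        $stmt->execute([$title]);
        return (int) $this->rw->lastInsertId();
    }

    public function find(int $id): ?array
    {
        $stmt = $this->ro->prepare('SELECT id, title FROM articles WHERE id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : $row;   // null when no such article
    }
}

// In-memory SQLite standing in for MySQL; in production $rw and $ro would
// be two PDO('mysql:...') connections (master and slave).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)');

$repo = new ArticleRepository($db, $db);
$id = $repo->add("O'Reilly");   // the quote is bound safely, no manual escaping
$article = $repo->find($id);
```

Because callers only see `add()` and `find()`, wrapping `find()` in a cache, pointing `$ro` at a slave or moving to PostgreSQL (a `pgsql:` DSN and possibly some SQL tweaks) all happen inside this one class.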

try not to use server-specific technologies(extensions)

Basically, don't bind your application to Apache (for example by relying on .htaccess files or Apache-only functions) if you don't have to. There are other web servers, such as nginx, which can do wonders for your performance when the time comes.
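An example of a server-specific call that is easy to avoid is `apache_request_headers()`, which historically was only available under Apache. The same information is in `$_SERVER` under any web server, as `HTTP_*` entries (plus `CONTENT_TYPE` and `CONTENT_LENGTH`, which have no prefix). A sketch, with a hypothetical function name:

```php
<?php
// Rebuild request headers from $_SERVER-style keys instead of relying on
// apache_request_headers().
function requestHeaders(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        if (strncmp($key, 'HTTP_', 5) === 0) {
            $name = substr($key, 5);
        } elseif ($key === 'CONTENT_TYPE' || $key === 'CONTENT_LENGTH') {
            $name = $key;
        } else {
            continue;   // not a request header (e.g. REMOTE_ADDR)
        }
        // ACCEPT_LANGUAGE -> Accept-Language
        $name = str_replace(' ', '-', ucwords(strtolower(str_replace('_', ' ', $name))));
        $headers[$name] = $value;
    }
    return $headers;
}

$headers = requestHeaders($_SERVER);
```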

use a simple templating engine

Separating your PHP and HTML code is very helpful. It makes both a lot more readable, which is great on its own. As far as scalability is concerned, templates let you easily move your static content (images, CSS, JS) to another server or a CDN, reducing the load on your PHP server. Even something as dumb as nanotemp, which I use for almost everything, will do the trick in most situations.
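A minimal sketch of the idea, not nanotemp itself (its API isn't covered here): `{{name}}` placeholders are replaced with HTML-escaped values, and every static file URL is built from one base, so moving images, CSS and JS to a CDN means changing a single constant. `STATIC_BASE`, `asset()`, `render()` and the placeholder syntax are all hypothetical.

```php
<?php
// One place decides where static files live; later this could become
// e.g. 'https://cdn.example.com'.
const STATIC_BASE = '/static';

function asset(string $path): string
{
    return rtrim(STATIC_BASE, '/') . '/' . ltrim($path, '/');
}

// Replace {{name}} placeholders with HTML-escaped values.
function render(string $template, array $vars): string
{
    $pairs = [];
    foreach ($vars as $name => $value) {
        $pairs['{{' . $name . '}}'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    return strtr($template, $pairs);
}

$html = render(
    '<link rel="stylesheet" href="{{css}}"><h1>{{title}}</h1>',
    ['css' => asset('css/site.css'), 'title' => 'Fish & Chips']
);
```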

what not to do

  • Don't get bogged down in micro-optimization. It is usually not worth the effort, especially if your site is not getting nearly as much traffic as it can handle and you have more important work to do. Throwing hardware at a software problem is something most of us hate to do, but in most cases your peak traffic fluctuates by far more than micro-optimization can save, so you'll end up buying or renting more hardware anyway.
  • Don't bother working around bad server configurations or missing services. Installing and configuring services is not your application's job. All load balancing, monitoring and trending should be done on the servers, so you should not have to code any of that.
  • Don't rely on obscure services. Those are usually poorly maintained and can cause a lot of problems in the long run. Also, a sysadmin will be more likely to misconfigure services they have no experience with.
  • Don't stress test your setup before you know what your traffic looks like. Testing whether your application can handle 100 writes per second is useless if most of your traffic turns out to be reads.
  • Don't rush to judgment. If your site is slow or overloaded, make sure you find the real reason behind it. Handle one bottleneck at a time, and don't spend money on hardware before you're sure you need it. Most bottlenecks can be resolved by reconfiguring your services or moving them from one server to another.