January 27, 2014

Avoiding the Database bloat in your WordPress Portal

A healthy database means a healthy Wordpress portal.

The WordPress Architecture has always been criticised for its considerable reliance on the underlying database.

This dependancy gives WordPress tremendous flexibility allowing it to be installed in a variety of hosting environments however, at the same time it creates unique scalability challenges especially for large complex setups where a considerable number of plugins are involved.

We have been brought in on a number of occasions to help resolve various problems with large scale  Wordpress portals and in usually all the cases its the database which is dragging the entire site down.

Here are the standard performance tuning techniques that people usually deploy once the site is in trouble. Most of these are knee jerk reactions that people take on immediately to mitigate the issue given that their site is now down or incredibly slow and its hurting them financially.

Hardware Based Solutions

The premise being that my hosting provider sucks because they have too many hosts on one machine and I need more ‘Power’ to run my portal now. Hence people usually switch providers in the following order:

  • Go from Shared Hosting package to a VPS
  • Increase the power of the VPS or setup multiple VPS instances
  • Move to a Dedicated Server with an exorbitantly large number of CPU Cores and insane amount of Ram.

Software Based Solutions

Deploy Performance Plugins

There are a ton out there. Your mileage my vary and they usually implement one or all of the following:

      • WordPress Object Caching
      • Page Caching
      • OP Code Caching
      • CDN Configuration
      • Content Compression
      • Expiry Header manipulation
      • CSS and JS Minification – Often breaks the site

Other more Advanced options include:

  • Switching from Apache to Nginx or LiteSpeed
  • Switching to PHP-FPM and talking to PHP via fast-cgi
  • Setup front end caching by using Varnish
  • MySQL Configuration file (my.cnf) optimizations that are usually copy/pasted from the Internet.
  • Moving from MySQL to Percona
  • Mod pagespeed by Google

Depending on the situation the above mentioned solutions may mitigate the issue you are having until your traffic or data increases or may not solve the problem at all.

Usually once the above mentioned solutions have been exhausted without the desired result frustration starts to set in and one starts wondering how the other sites on WordPress work so well and yours does not.

It’s time for a Scientific Approach

I think it is best to realize at this point that you first have to measure and isolate which layer in the application stack is responsible for giving you sleepless nights. A good tool for measuring your application’s performance is NewRelic. Here is a sample screenshot from the NewRelic Console for one of our clients’ WordPress portals:

Screen Shot 2014-01-27 at 4.56.39 pm

As can be seen the majority of the time (over 90%) is being taken up by the database. Optimizing any other layer will only lead to marginal benefits at best. However, if the database can be optimized we can realize significant performance gains. Armed with this information we can now dig deeper into the underlying queries and the scripts that are executing them.

Here is a snapshot of the queries that are currently taking the most time. You can also get these details from the MySQL Slow Query Log.

Screen Shot 2014-01-27 at 7.25.40 pm

More Details including the scripts that execute the queries:

SQL Query Details

More details about the time spent during script execution:

Script Execution Time Breakdown

It is clear from these reports that queries that involve the wp_postmeta table seem to be the slowest. Why is that? What is inside the wp_postmeta table that is slowing down the response. A quick count of the two tables shows the following numbers:

mysql> select count(*) from wp_posts;
| count(*) |
|   223572 |
1 row in set (0.20 sec)

mysql> select count(*) from wp_postmeta;
| count(*) |
|  2018257 |
1 row in set (2.44 sec)

So the wp_postmeta table is about 9 times larger than the wp_posts table! Putting it another way – On average every post inserted into the database leads to an additional nine rows in the post_meta table. Suddenly now it starts making sense why the post_meta related queries take their sweet time in responding to the requests. How do we fix this?

Solutions to the Table Bloat

Well not all bloats are created equal. We need to dig deeper and figure out what kind of extra information is being added by each post.

In this particular case we found:

  1. 6 hidden fields added by a sitemap plugin for every post since it was installed
  2. 10 custom fields of which 4 to 5 were inserted with every post
  3. 4 hidden fields added by an SEO plugin for every post since it was installed

We implemented different solutions to solve different problems. Here is what we did:

  1. We moved the custom fields to a different table and setup the columns for each custom field
  2. We modified the sitemap plugin so that it would generate the sitemap without the need to store additional information against each post
  3. We modified the SEO plugin to store all the extra information in its own tables
  4. We reviewed the theme carefully and optimized queries by replacing functions and classes and found alternatives to achieving the same functionality without the costly queries that required complex joins.
  5. We added more intelligence to the theme files so that it would rely on cached results and would not query the database every time a page was opened.
  6. The database results are cached in Redis and are updated when a new post is added.
  7. The cache periodically rebuilds itself as well and serves stale data until it has been updated.

 Moral of the Story

Well there are several:

  1. Use the right tools to identify the bottleneck in the Application Stack. For WordPress in most cases it is the DB.
  2. Not all plugins are created equal. Double check plugins when you install them and see where and how they store information. Plan and project accordingly.
  3. Only install the plugins you really need. The more plugins you add the slower your site will become.
  4. Not all themes are created equal. Check to see how they leverage WordPress Database functions. A good theme programmer understands and appreciates the WordPress ORM’s strengths and weaknesses.
  5. Not all queries can be optimized especially those that are generated from the WordPress core. Try and figure out a method that can be used to cache the results if possible.