Thinking: Our Blog

If you look closely, you can actually see the superhero capes on the backs of our developers. We think they’re that amazing – and so do our clients. Being amazing (read: staying ever-ahead of the web strategy and development curve) is a commitment we’ve made to our clients, and to ourselves. As part of that commitment, we recognize how important it is to physically place ourselves at the center of the action, too, when it comes to the industry.
We’re excited to announce the opening of Oomph Boston, located in the heart of Harvard Square. One Mifflin Place in Cambridge is already buzzing with a team of talent headed by new Creative Director Patrick Richardson.
This second location – an extension of our home office in Providence – allows us to be closer to many of our core clients, like NESN and Boston College. Plus, with so many distinguished colleges and universities in our new backyard, our Boston office gives us access to the brightest minds in the field as we grow our company. Today, the energy happening around web development is intensifying wicked fast, and Boston is one of the hubs leading the charge. By basing a team in Boston, we’ve been able to readily share knowledge and demonstrate leadership in the WordPress, Drupal and PHP communities in and around the region.
Our new presence in Boston is very exciting. We are fortunate to be growing and expanding, and this milestone is one we’ve been anxiously awaiting. The Boston tech scene is hot and Oomph is ready to dive right in!
- Chris Murray, President
As we expand our space, we plan to expand our team as well. Oomph is currently hiring outstanding developers (read: superheroes) to join our team and continue to make the exclamation point in our logo well earned.
This spring, I had the privilege to attend WordPress.com VIP’s Intensive VIP Developer Training Workshop in Napa Valley. The event was hosted by Automattic, the company behind WordPress.com, and took place at The Carneros Inn, a beautiful and intimate resort situated in the heart of the winegrowing region. Representatives from big media, development shops, and freelancers descended on this idyllic setting for two and a half days of presentations, networking events and training all concentrated on working within the enterprise-centric WordPress.com VIP platform. In short, a developer’s dream come true.

Upon arriving, I checked in and had a few hours to stroll the grounds and mingle with other attendees – quickly realizing I was among some of the brightest minds in the WordPress community. Throughout the afternoon, developers from companies such as Time Magazine, Dow Jones, O’Reilly Media and NBC Sports began filtering onto the property.

The event officially kicked off a few hours later with Matt Mullenweg’s inspirational keynote address, which left me with the overwhelming awareness that I was involved in something very special. Over the last few years, WordPress has achieved an impressive following and some serious momentum. The platform has evolved from blog management to content management, and it’s quickly gaining traction with developers as an application framework. What was once a simple blogging platform now powers 16% of the Internet. Toni Schneider, CEO of Automattic, followed Matt’s address with a real-time map illustrating some of the remarkable usage statistics related to WordPress, and then concluded with his exciting vision for the future of the platform.
The next two days were all about the training. Topics included caching, taxonomies, A/B testing, coding standards, advanced theming, the rewrite API, and other intensive subjects relevant to working with VIP. These sessions were augmented by interactive examples presented on a Debian server virtual machine that allowed everyone to hack along and directly attack the code with the very people who helped create it. Mo Jangda introduced a tool called the VIP Code Scanner, which I highly recommend to anyone working on the platform and looking for a useful way to audit their code. Michael Fields of The Theme Team shared his expertise, including advice on leveraging the _s starter theme, which gives theme designers a “1000-hour head start.” During one of my favorite talks, Mike Adams’ session on XML-RPC, the Automattic Social Team announced the new RESTful API now available on WordPress.com.
Direct access to the Automatticians along with members of the VIP community created both a great social dynamic and an opportunity to learn new strategies and discuss concepts used by other developers. The relaxing atmosphere of wine country, the rigorous daily seminars and the nightly networking opportunities together resulted in an amazing educational event. The intimate and interactive setting contributed greatly to making the experience wholly satisfying. I would like to extend a special thanks to my employer, Oomph, for making my attendance at this incredible training possible. I hope my growth as a developer on the highly successful VIP platform brings more opportunities as gratifying as this one.
Back in November, I presented a talk at the Boston WordPress Meetup entitled Caching, Scaling, and What I’ve Learned Programming for WordPress.com VIP. Our experience with WordPress.com VIP provides unique insight into what it takes to operate high-traffic WordPress sites, but what we’ve learned is also applicable to WordPress-powered sites in general.
After providing a brief overview of different caching methods, I spoke in depth about using fragment caching to improve performance and provided numerous examples taken from client projects. Fragment caching is the practice of abstracting elements of a site, such as recent posts or navigation menus, and saving them for reuse. It could be that a specific element is complex to generate, or that it appears on numerous pages across a site; in either case, avoiding the need to recreate these elements can provide significant performance benefits.
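The following is a minimal fragment-caching sketch in shell, and it rests on a few assumptions. A temporary directory stands in for a real cache backend. In WordPress that job usually falls to the object cache (wp_cache_get() and wp_cache_set()) or to the Transients API. The names get_fragment and recent_posts are hypothetical. The sketch rebuilds a fragment only when its cached copy is missing or older than a time-to-live.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: a temp directory plays the role of the cache backend.
cache_dir=$(mktemp -d)

# get_fragment KEY TTL_SECONDS COMMAND [ARGS...]
# Prints the cached copy of KEY if it is younger than TTL_SECONDS; otherwise
# runs COMMAND, stores its output under KEY, and prints it.
get_fragment() {
  local key="$1" ttl="$2"
  shift 2
  local file="$cache_dir/$key" now
  now=$(date +%s)
  if [ -f "$file" ] && [ -f "$file.time" ] &&
     [ $(( now - $(cat "$file.time") )) -lt "$ttl" ]; then
    cat "$file"              # cache hit: skip the expensive work
    return 0
  fi
  "$@" > "$file"             # cache miss: regenerate the fragment
  echo "$now" > "$file.time"
  cat "$file"
}

# Hypothetical expensive fragment, e.g. a "recent posts" list.
recent_posts() {
  echo "<ul><li>First post</li><li>Second post</li></ul>"
}

get_fragment recent-posts 300 recent_posts
```

If you call get_fragment again within 300 seconds, it returns the stored markup without running recent_posts. The TTL is the main design choice. A longer TTL saves more work, but the fragment can go stale for longer.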
The slides I used and a video of the presentation are embedded below.
Oomph is on the lookout for a talented Creative Director to join our team full-time in either the Providence, RI or Cambridge, MA office. We had a terrific 2011 and we are searching for someone to lead the creative team through our next stage of growth.
Key responsibilities will include active contributions to Oomph’s brand, marketing, and creative culture, instilling engaging design as a core philosophy across the entire company, and developing creative approaches for all aspects of client engagements.
An ideal candidate will have experience leading small creative teams in an agency environment, a desire to work on creative solutions for sites and apps with massive audiences, and a passion and drive to create “wow” design and deliverables.
Interested? Learn more about the position.
One of our most recent projects was to create a WordPress theme that offered versions in six different languages (two of which were UK vs. US English dialects). The theme’s main feature was the ability to switch easily between languages, translating not only static screen elements, but also dynamic elements that come from the database. Oomph developed a theme that offers all the functionality necessary for these requirements, but in the development process, a flaw in the WordPress taxonomy system was revealed. The succinct definition of this problem is this:
Identical terms that appear in two different taxonomies are still the same term.
Yet they are treated by the WordPress interface as if they are not.
The Problem
For a simple example, suppose we had two taxonomies, fruits and colors. In the fruits taxonomy, we had the following terms:
Apple
Orange
Banana
And in colors, the following:
Red
Orange
Yellow
Suppose we wanted to change all of the colors to use lower-case instead:
red
orange
yellow
We now see that the change to the term “Orange” is also reflected in the fruits taxonomy:
Apple
orange
Banana
This is obviously not what we want. Conceptually, Orange the fruit and Orange the color are distinct entities, but to WordPress, they are the same thing.
The Project
The problem above was fully fleshed out while developing one of our most recent and most challenging projects. This project required us to be able to translate everything seen on the screen by an end user into 5 distinct languages: English, French, Italian, German and Dutch, with English further split into 2 variants, UK and US, for a total of 6 different translations for each string seen by the user. Besides regular translatable strings that were coded into the theme using the WordPress I18n API, we also had to manage translations for certain taxonomies, such as category.
Since these strings would be coming from the database, translating them through .PO/.MO files with the WordPress I18n API would not have been the right approach, because we would then have been passing variable values into the __() suite of functions. This poses a problem, since the scripts we use to generate the .POT files for translation would not know which strings actually need translating.
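The extraction problem can be shown with a deliberately simplified stand-in for a POT extractor such as xgettext. Like the real tools, it can only collect string literals passed to __(). A translatable string that arrives in a PHP variable therefore never reaches the .POT file. The sample PHP and the extract_literals helper are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical theme code: one literal string, one string held in a variable.
php_source=$(cat <<'EOF'
<?php
echo __( 'Read more', 'theme' );
echo __( $term->name, 'theme' );
EOF
)

# extract_literals: read PHP on stdin and print the single-quoted literal
# passed as the first argument of each __() call. Calls whose first argument
# is a variable are skipped, since a static scanner cannot know what the
# variable will contain at runtime.
extract_literals() {
  grep -o "__( *'[^']*'" | sed "s/^__( *'//; s/'\$//"
}

printf '%s\n' "$php_source" | extract_literals   # prints only: Read more
```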
Another downside of using PO files for translating these strings would be that there would be no separation of language category taxonomies. Per-language category terms would have to mirror each other at all times, and there would be no way to use a foreign-language term that does not appear in the base (untranslated) taxonomy.
In order to achieve translated taxonomies, we first created a taxonomy, “languages”, to store each of the languages used within our system, with the terms:
en-uk-language - English (UK)
en-us-language - English (US)
fr-fr-language - French
nl-nl-language - Dutch
de-de-language - German
it-it-language - Italian
The -language suffix on these term slugs was a necessity because of slug collisions with another taxonomy.
The terms from this taxonomy were in turn used to define a taxonomy for each one of the taxonomies we needed to translate:
category_en-uk-language
category_fr-fr-language
...
And so forth. For each term used in one of these translated taxonomies, the helpful “description” field is used to link it back to its untranslated base term.
As an aside, hijacking the “description” field as a substitute for proper taxonomy metadata is a time-honored WordPress hack used by many themes. The alternative would be to store taxonomy metadata in the options table, but this method does not scale as well. It bloats the options table and would require two database reads to pull metadata out for every term. Another alternative would be to use a custom table, but that approach has three drawbacks: it is simply not allowed in the WordPress.com/WordPress.com VIP environment, it adds extra complexity, and it would still require extra database reads to pull the taxonomy metadata.
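The next sketch roughly illustrates this arrangement. It derives one translated category taxonomy per language term and follows a translated term back to its base term. For illustration only, it assumes the description field stores the base term’s slug. The post does not show the exact format. The cvs/resumes slugs and the base_term_for helper are hypothetical, and the associative array needs bash 4 or later.

```bash
#!/usr/bin/env bash
# Language term slugs, as listed in the "languages" taxonomy above.
languages=(en-uk-language en-us-language fr-fr-language nl-nl-language de-de-language it-it-language)

# One translated taxonomy per language: category_<language slug>.
for lang in "${languages[@]}"; do
  echo "category_$lang"
done

# Assumption: a translated term's description holds its base term's slug.
# Keys are "taxonomy|slug"; these slugs are invented for the example.
declare -A description=(
  ["category_fr-fr-language|cvs"]="resumes"
  ["category_en-uk-language|cvs"]="resumes"
)

# base_term_for TAXONOMY SLUG: print the base term slug stored in the description.
base_term_for() {
  local key="$1|$2"
  echo "${description[$key]}"
}

base_term_for category_fr-fr-language cvs   # prints: resumes
```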

In (US) English, we might have a category structure akin to the following:
Job Market
-- Pay
Advice
-- Insight
-- Resumés
Which in French, for example, would look like:
Marché de l'emploi
-- Salaires
Conseils
-- Aperçu
-- CV's
The UK English version might also differ slightly from our base theme language of US English:
Job Market
-- Salaries
Advice
-- Insight
-- CV's
Looking at our category structure above, we see that both UK English and US English are using “Job Market” as a category, but UK English and French both use the term “CV’s” instead of the US English “Resumés.”
Suppose we wanted to change the UK English category to read “Job market” instead, with the second word in lower-case. We would make this change using the edit terms screen in the WordPress back-end. Now we look at the US English version of “category”:
Job market
-- Pay
Advice
-- Insight
-- Resumés
Whoa! Back the truck up! That change also got applied to the “Job Market” term in the US English category. What happened here?! It seems that even though the “Job Market” category appears twice in separate category taxonomies, changes to one will affect the other. We’d encounter the same problem if we tried to change French “CV’s” into “CVs”, without an apostrophe: the change would also propagate to the UK English category name. This is definitely not what we want.
Problem Details
The complication here arises from the WordPress taxonomy data schema. There are two tables in the WordPress schema that are responsible for storing taxonomy data:
wp_terms and wp_term_taxonomy
A third table, wp_term_relationships, links taxonomy terms with the posts to which they are attached, but I don’t necessarily consider it part of the taxonomy data storage system.
The fields of interest are:
wp_term_taxonomy.taxonomy
wp_term_taxonomy.parent
wp_term_taxonomy.description
wp_terms.name
wp_terms.slug
and finally, the relating field: wp_term_taxonomy.term_id = wp_terms.term_id
This is a “normalized” schema, in the sense that objects that are ostensibly the same appear only once: a distinct term will only ever appear once in the wp_terms table, keyed by its slug.
Here is where we get to the core of the above-stated problem:
Identical terms that appear in two different taxonomies are still the same term.
Yet they are treated by the WordPress interface as if they are not.
This is because WordPress is doing everything it can to keep the terms/term_taxonomy tables normalized. If we have a term, “Job Market”, and it appears in both UK English Category and US English Category, then any changes to one will necessarily be made to the other, since they are sharing the same entry in wp_terms.
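A toy model makes the shared-row behavior easy to see. The sketch below mimics wp_terms and wp_term_taxonomy with invented IDs and needs bash 4 or later for associative arrays. Two term_taxonomy rows in different category taxonomies point at the same term_id. Renaming the term through one of them therefore changes what both display. rename_via_taxonomy is a hypothetical helper, not a WordPress function.

```bash
#!/usr/bin/env bash
# wp_terms: term_id -> name (one shared row per distinct term). IDs are invented.
declare -A term_name=( [7]="Job Market" )

# wp_term_taxonomy: term_taxonomy_id -> taxonomy, and term_taxonomy_id -> term_id.
declare -A tt_taxonomy=( [1]="category_en-us-language" [2]="category_en-uk-language" )
declare -A tt_term_id=( [1]=7 [2]=7 )

# Edit a term "in" one taxonomy. As in the real schema, the name lives in
# the shared wp_terms row, so that shared row is what gets updated.
rename_via_taxonomy() {
  local tt_id="$1" new_name="$2"
  local id="${tt_term_id[$tt_id]}"
  term_name[$id]="$new_name"
}

rename_via_taxonomy 2 "Job market"   # change the UK English category only...

for tt_id in 1 2; do                 # ...yet both categories now show the new name
  id="${tt_term_id[$tt_id]}"
  echo "${tt_taxonomy[$tt_id]}: ${term_name[$id]}"
done
```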
Why is this a problem? For me, it boils down to the following issues:
- Terms in two different taxonomies are very rarely conceptually the same thing, even if they have the same text.
- The WordPress administrative interface implies that a term in two or more taxonomies is NOT the same thing (it appears in different locations, after all).
- From a database engineering standpoint, it’s an extra, unnecessary table in the schema.
Why is it designed this way? As with many quirks in today’s software, it boils down to “legacy code.” WordPress grew organically over time. Decisions made to add features seemed reasonable given the problem being solved at the time, but they did not fully anticipate future implications. WordPress was originally devised as blogging software, but it is increasingly used for general CMS purposes. In a general-purpose CMS, complex taxonomy schemas such as the one we used for this recent project are needed to store data in distinct buckets, which might have complex and overlapping domains. But suppose we could fix this? What could we do to address this issue, and maybe even simplify code and user experience in the process?
wp_taxonomy
The wp_taxonomy table is my proposed solution to this problem. It would consolidate all of the information in wp_terms with wp_term_taxonomy:
taxonomy_id
taxonomy
name
slug
description
parent
count
term_group
Notice that this table has only two more fields than wp_term_taxonomy. Since linking to a separate terms table is no longer required, we can completely drop term_id. In its place we add the two relevant fields from the terms table, name and slug, along with the oh-so-mysterious-and-little-used term_group from wp_terms. Uniqueness on this table would then be enforced on the pair (taxonomy, slug). A schema like this would result in one fewer database table, one fewer field, and a lot more flexibility, since each taxonomy would now create a unique bucket of terms.
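Under the same toy-model assumptions (bash 4 or later), the proposed table can be sketched as a single map keyed on the (taxonomy, slug) pair. insert_term is a hypothetical helper that enforces the uniqueness rule described above. The point is only that the same slug in two taxonomies becomes two independent rows.

```bash
#!/usr/bin/env bash
# Toy model of the proposed wp_taxonomy table: one row per (taxonomy, slug).
declare -A tax_name=()

# insert_term TAXONOMY SLUG NAME: reject a duplicate (taxonomy, slug) pair.
insert_term() {
  local key="$1|$2"
  if [ -n "${tax_name[$key]+set}" ]; then
    echo "duplicate (taxonomy, slug): $key" >&2
    return 1
  fi
  tax_name[$key]="$3"
}

us_key="category_en-us-language|job-market"
uk_key="category_en-uk-language|job-market"

# The same slug in two taxonomies is allowed: these are two separate rows.
insert_term category_en-us-language job-market "Job Market"
insert_term category_en-uk-language job-market "Job Market"

# Renaming the UK English row leaves the US English row alone.
tax_name[$uk_key]="Job market"

echo "US: ${tax_name[$us_key]}"   # US: Job Market
echo "UK: ${tax_name[$uk_key]}"   # UK: Job market
```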
How much work would it take to adapt the API to use this proposed schema? It’s hard to say. A brief run-through of the functions that touch the database in wp-includes/taxonomy.php shows that at least the following functions would need to be changed:
get_objects_in_term()
WP_Tax_Query::get_sql()
WP_Tax_Query::transform_query()
get_term()
get_term_by()
get_terms()
term_exists()
wp_delete_object_term_relationships() (likely obviated entirely by the proposed schema)
wp_delete_term()
wp_get_object_terms()
wp_insert_term()
wp_set_object_terms()
wp_unique_term_slug()
wp_update_term()
wp_update_term_count_now()
clean_term_cache()
_pad_term_counts()
_update_post_term_count()
This is most definitely not a complete list of all the functions that would require updating in order to work with this proposed schema. There are also all of the functions that hook into the various term-related actions and filters, as well as all of the back-end wp-admin code that would need modification. The above represents only the minimal set of core taxonomy API functions that would need updating, but I think it’s a good start.
If only it were so easy!
In an ideal world, where WordPress wasn’t already the beast that it has grown to be, with millions of sites depending on it, including many of the big players (14.7% of the top million sites!), such a change might be a reasonable undertaking. But WordPress is a giant system with a huge community (some of which I’m sure will have something to say about my proposal!) and a monstrous ecosystem. While my proposed changes would (I believe) generally result in simpler code and a more predictable taxonomy system, it’s simply not feasible to make such a drastic change to core without a completely separate code fork.
As millions upon millions of sites are built upon WordPress these days, making dramatic changes to the database schema is a daunting task that would likely break thousands of sites, even if the API managed to stay the same. This is because there is still plenty of theme code out there that reaches directly into the two extant taxonomy data tables, and this would be the code that would most definitely be broken by any change. Perhaps a forked version of WordPress would be needed as a proof-of-concept, but even then, we’re looking at a development cycle that would span years to get such a major initiative off the ground.
Maybe your site is already built on Drupal and you’re considering an upgrade. Maybe you are considering Drupal for the first time. You’ve done some research and hit your first wall – What version of Drupal should you consider?
You’re not alone. Visiting Drupal.org, the choice looks simple. Drupal 7. There are banners and buttons and headlines that all direct you to choose version 7 for your site. But wait – you were on a message board, or Google, or somewhere and saw someone mention version 8.
Well, why would you choose version 7 if 8 is right around the corner? Even the download page for Drupal 6 and 7 directs people to the Drupal 8 initiatives group page. Talk about confusing for anyone up against this decision.
The fact is, Drupal 8 is a long way off – at least 18 months, actually. Version 8 is in development, yes, but there’s a long road ahead before a release will be available for your web site. Drupal 7, which had its first release in January 2011, hasn’t even reached its adoption peak yet, for a number of reasons:
- Many of the contributed modules that are available for Drupal 6 still aren’t ready for Drupal 7
- The architecture changed in Drupal 7 and some functionality provided by contributed modules has been included in core Drupal. Some related modules may not work with Drupal 7.
- It is easier and more cost effective to develop for Drupal 6 because there are established development recipes for common feature requests
- Drupal 6 is still supported and works great
- Some argue there’s a steeper developer learning curve for Drupal 7 over Drupal 6
So, with Drupal 8 not even an option at this point (despite what rumors are floating around the internet), why on Earth would you choose an older version of Drupal?
For new sites, use Drupal 7
You should consider choosing Drupal 6 if there’s a key piece of functionality you need that isn’t available for Drupal 7, and it would cost more to build it than to use something already available for Drupal 6 (of course it’d be great if you could help port it to Drupal 7 and contribute it to the community!).
If you’re building your first Drupal site and you’ve already mapped out the site requirements and everything you need can be accomplished with Drupal 7, there’s no question – you should use Drupal 7. You’ll have the most up-to-date version of Drupal that is available and you won’t have to think about major upgrades for quite a while, probably years.
What it means to upgrade
If you have an existing Drupal 6 site and want to upgrade to Drupal 7, this part’s for you.
It’s important to know, up front, what you’re getting into when upgrading. Upgrading between major Drupal versions is more complicated than the incremental maintenance updates to modules and Drupal core that you probably already do. The underlying data structures change and in most cases there’s some development effort required to make your existing site’s theme compatible with the new version.
Depending on the level of customization on your existing site, this can take a long time. Not only that, but if some data doesn’t update smoothly it may need to be migrated by hand. Have a qualified development team do an assessment of your current site to determine how difficult an upgrade will be.
Does your current D6 site let you upload files or images using CCK’s filefield module? What about other fields like user references or node references? This is a great example of the difficulties inherent in major version changes. To migrate this old field data to D7, you need to download and install CCK’s Drupal 7 dev branch, which you’ll remove afterward. If you skip this step, your content will be missing that field data. Depending on what modules are enabled on your site, you may have to overcome these types of upgrade challenges for each of them.
Is it time for a new look?
If you’re already planning to update the look of your site, it could be the perfect opportunity to upgrade your Drupal 6 site to Drupal 7. Even if you’re not planning on adding new bells and whistles to your site, a redesign will require a new site theme to be developed. Depending on the complexity of your existing site it might make more sense to develop the new theme for Drupal 7’s framework instead of Drupal 6. Just check with your development team to make sure that all of your existing functionality will carry over after the upgrade.
Conclusion
New Drupal sites should be built with Drupal 7, unless a piece of custom functionality makes it cost prohibitive. Site owners wanting to upgrade from D6 to D7 in preparation for Drupal 8 might want to wait – Drupal 6 hasn’t gone stale quite yet. Finally, site owners that are ready for a design refresh should consider Drupal 7, but get an expert assessment to find out if it makes sense to upgrade.
WordPress import files can often be ungainly and hard to work with due to the various limitations that are necessarily attached to the WordPress import tool. These include PHP’s upload_max_filesize, post_max_size and memory_limit settings, or a limit built into the web server itself.
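Before deciding to split a file, it can help to compare its size against these limits. PHP writes them in shorthand byte notation, such as 2M or 1G, so the sketch below converts that notation to bytes first. php_bytes and fits_upload_limit are hypothetical helpers. Reading the actual limit values from your own php.ini is left to you.

```bash
#!/usr/bin/env bash
# php_bytes VALUE: convert a PHP shorthand byte value such as 512, 64K, 2M
# or 1G (the notation used by upload_max_filesize, post_max_size and
# memory_limit) to bytes. PHP treats K, M and G as powers of 1024.
# The special value -1 (no limit, allowed for memory_limit) is not handled.
php_bytes() {
  local value="$1"
  local number="${value%[KkMmGg]}"
  case "$value" in
    *[Kk]) echo $(( number * 1024 )) ;;
    *[Mm]) echo $(( number * 1024 * 1024 )) ;;
    *[Gg]) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)     echo "$value" ;;
  esac
}

# fits_upload_limit FILE LIMIT: succeed if FILE is no larger than LIMIT.
fits_upload_limit() {
  local size
  size=$(( $(wc -c < "$1") ))   # arithmetic expansion strips any padding wc adds
  [ "$size" -le "$(php_bytes "$2")" ]
}

# Example (hypothetical file name and limit):
# fits_upload_limit articles.xml 8M || echo "articles.xml needs splitting"
```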
Sometimes it’s easier to work with smaller files, whether for testing or for importing small batches. I recently encountered that need, and my previous solution involved copying and pasting chunks of the XML WXR file from one place to another. This became impractical when I faced a nearly 200MB file that would cause my text editor to choke.
To address this, I developed the following set of shell scripts to work with these WXR files and break them into pages of posts in separate files. It requires an XSLT 2.0 processor, because it uses the xsl:result-document element. I used Saxonica’s Saxon Java class wrapper, which provides a handy command-line interface to the Saxon libraries.
The xsl stylesheet
The goal of this stylesheet is to break apart the WXR file’s item elements (each of which represents a single WordPress post) into multiple files, while still preserving the WordPress meta data in each output file.
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="size" />
<xsl:param name="page" />
<xsl:param name="output" />
<xsl:template match="/rss">
<xsl:result-document method="xml" href="{$output}_{$page * $size}-{($page + 1) * $size - 1}.xml">
<rss version="2.0" xmlns:excerpt="http://wordpress.org/export/1.1/excerpt/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.1/">
<channel>
<xsl:for-each select="channel/*[local-name() != 'item']">
<xsl:copy-of select="." />
</xsl:for-each>
<xsl:for-each select="channel/item">
<xsl:if test="position() > $page * $size and position() &lt;= ($page + 1) * $size">
<xsl:copy-of select="." />
</xsl:if>
</xsl:for-each>
</channel>
</rss>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
This stylesheet takes three parameters:
- page: the page number
- size: the page size, in number of “item” elements
- output: the prefix for the resulting output file names

It will emit a file named $output_$start-$end.xml. You may note that this stylesheet can only handle one page of posts at a time due to the lack of for or while loops in the XSLT language (at least without language-paradigm-breaking hackery). This also enables the output to be controlled fully from the calling program, which for this purpose will just be the shell.
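XPath’s position() counts from 1, while the output file names count posts from 0, so it is easy to be off by one here. The sketch below spells out the arithmetic under that reading. page_file_name reproduces the stylesheet’s href expression. position_window gives the 1-based range of item positions that a page should copy. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# page_file_name OUTPUT PAGE SIZE: the file name for a page, using 0-based post indexes.
page_file_name() {
  local output="$1" page="$2" size="$3"
  echo "${output}_$(( page * size ))-$(( (page + 1) * size - 1 )).xml"
}

# position_window PAGE SIZE: the 1-based position() values that page should copy.
position_window() {
  local page="$1" size="$2"
  echo "$(( page * size + 1 ))..$(( (page + 1) * size ))"
}

page_file_name file 0 2000   # file_0-1999.xml
position_window 0 2000       # 1..2000
```

In other words, page p should keep items whose position() is greater than p * size and at most (p + 1) * size.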
Using the XSLT Stylesheet
The basic functionality of this stylesheet allows me to create a new WXR import file with a range of posts contained in the original. In this example, I’m copying the first 2,000 posts from the import file. After it completes, the posts will be saved into file_0-1999.xml.
$ java -Xmx512m -jar ~/saxonhe9-2-0-5j/saxon9he.jar -xsl:split.xsl articles.xml page=0 size=2000 output=file
I keep my Saxon JAR file in ~/saxonhe9-2-0-5j/saxon9he.jar, but you’ll likely have it somewhere else.
The -Xmx512m parameter tells the Java VM to set the maximum heap size to 512 MB. You may need to adjust this parameter according to the size of your input file.
Doin’ it all!
Now that we have the basic tool for pulling a single page out of our source XML file, we can use a little bit of shell scripting to get all of the posts into separate files.
#!/bin/bash

# filename: required
file=$1

# output file prefix: required
outfile=$2

if [ -z "$file" ] || [ ! -f "$file" ] || [ -z "$outfile" ]; then
  echo "Usage: $0 filename outfile [pagesize] [start] [limit]"
  exit 1
fi

# page size: defaults to 2000
pagesize=${3:-2000}

# start post: defaults to 0 (first post); should be a multiple of pagesize
start=${4:-0}

# limit: defaults to the number of <item> lines in the input file
limit=${5:-$(grep -c '<item>' "$file")}

# round up so that a final, partial page is counted too
pages=$(( (limit - start + pagesize - 1) / pagesize ))
echo "Splitting $file into $pages pages of size $pagesize between posts $start and $limit"

i=$start
# -lt rather than -le: with -le, a limit that is an exact multiple of pagesize produces an extra, empty page
while [ "$i" -lt "$limit" ]; do
  echo "Generating page $((i / pagesize)): posts $i through $((i + pagesize - 1)).."
  java -Xmx2000m -jar ~/saxonhe9-2-0-5j/saxon9he.jar -xsl:split.xsl "$file" page=$((i / pagesize)) size="$pagesize" output="$outfile"
  i=$((i + pagesize))
done
Save the above as split.sh, and the XSLT file as split.xsl in the same directory. Also, make sure the path to your Saxon JAR file in the java command is correct. Pulling this all together, we can take a large WXR input file and slice and dice it as we see fit:
[Meerkat ~/Oomph/]$ sh split.sh Articles.xml Articles 1500
Splitting Articles.xml into 17 pages of size 1500 between posts 0 and 24065
Generating page 0: posts 0 through 1499..
Generating page 1: posts 1500 through 2999..
Generating page 2: posts 3000 through 4499..
Generating page 3: posts 4500 through 5999..
Generating page 4: posts 6000 through 7499..
...
We now have 17 files: 16 full pages of 1,500 articles each, plus a final page holding the remainder. They are stored as
Articles_0-1499.xml
Articles_1500-2999.xml
Articles_3000-4499.xml
… And so forth.
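A quick sanity check after splitting is to count the items in each output file. The sketch below uses the same grep-based count that split.sh relies on, which counts lines containing <item>. The Articles_ prefix matches the example above; change it to whatever prefix you pass as outfile. count_items is a hypothetical helper.

```bash
#!/usr/bin/env bash
shopt -s nullglob   # a pattern that matches no files expands to nothing

# count_items FILE...: print each file name with its number of <item> lines.
count_items() {
  local f
  for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(grep -c '<item>' "$f")"
  done
}

count_items Articles_*.xml
```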
Now you can import each of these files individually without choking your WordPress importer! I hope that some of you will find this useful. Keep in mind that the XSL stylesheet above could easily be adapted to work with other large XML data files, too. It would be just a matter of changing the element selectors that you wish to break apart.