Thinking: Our Blog
WordPress import files can often be ungainly and hard to work with because of the limits attached to the WordPress import tool, such as PHP’s upload_max_filesize, post_max_size and memory_limit settings, or a limit built into the web server itself.
Sometimes it’s easier to work with smaller files, whether for testing or for importing small batches. I recently ran into that need, and my previous solution involved copying and pasting chunks of the XML WXR file from one place to another. This became impractical when I faced a nearly 200MB file that made my text editor choke.
To address this, I developed the following set of shell scripts to work with these WXR files and break them into pages of posts in separate files. They require an XSLT 2.0 processor, because the stylesheet uses the xsl:result-document element. I used Saxonica’s Saxon, which provides a handy command-line interface to the Saxon libraries.
The xsl stylesheet
The goal of this stylesheet is to break apart the WXR file’s item elements (each of which represents a single WordPress post) into multiple files, while still preserving the WordPress meta data in each output file.
[sourcecode language="xml" wraplines="false"]
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="size" />
<xsl:param name="page" />
<xsl:param name="output" />
<xsl:template match="/rss">
<xsl:result-document method="xml" href="{$output}_{$page * $size}-{($page + 1) * $size - 1}.xml">
<rss version="2.0" xmlns:excerpt="http://wordpress.org/export/1.1/excerpt/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.1/">
<channel>
<xsl:for-each select="channel/*[local-name() != 'item']">
<xsl:copy-of select="." />
</xsl:for-each>
<xsl:for-each select="channel/item">
<xsl:if test="position() &gt; $page * $size and position() &lt;= ($page + 1) * $size">
<xsl:copy-of select="." />
</xsl:if>
</xsl:for-each>
</channel>
</rss>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
[/sourcecode]
This stylesheet takes three parameters: page, the page number; size, the page size in number of “item” elements; and output, the filename prefix for the resulting output files. It will emit a file with the name $output_$start-$end.xml. You may note that this stylesheet only handles one page of posts at a time, because XSLT has no conventional for or while loops (at least without language-paradigm-breaking hackery). This also lets the calling program, which for this purpose is just the shell, control the output fully.
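To make the naming scheme concrete, here is a minimal sketch of the same arithmetic in the shell. The function name page_filename is hypothetical and not part of the scripts in this post; it only mirrors the href expression in the stylesheet.

```shell
#!/bin/bash
# Hypothetical helper: prints the file name the stylesheet writes for a
# given output prefix, page number and page size.
page_filename() {
    pf_prefix=$1 pf_page=$2 pf_size=$3
    pf_start=$(( pf_page * pf_size ))
    pf_end=$(( (pf_page + 1) * pf_size - 1 ))
    printf '%s_%d-%d.xml\n' "$pf_prefix" "$pf_start" "$pf_end"
}

page_filename Articles 0 1500   # prints Articles_0-1499.xml
page_filename Articles 1 1500   # prints Articles_1500-2999.xml
```

This is handy for predicting which file a given page will land in before running the transform.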
Using the XSLT Stylesheet
The basic functionality of this stylesheet allows me to create a new WXR import file with a range of posts contained in the original. In this example, I’m copying the first 2,000 posts from the import file. After it completes, the posts will be saved into file_0-1999.xml.
[sourcecode language="bash" wraplines="false"]
$ java -Xmx512m -jar ~/saxonhe9-2-0-5j/saxon9he.jar -xsl:split.xsl articles.xml page=0 size=2000 output=file
[/sourcecode]
I keep my Saxon JAR file in ~/saxonhe9-2-0-5j/saxon9he.jar, but you’ll likely have it somewhere else.
The -Xmx512m parameter tells the Java VM to set the maximum heap size to 512 MB. You may need to adjust this parameter according to the size of your input file.
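If you would rather not guess the heap size, a small helper can derive one from the input file. This is only a sketch: the ratio of roughly 10 MB of heap per MB of input and the 512 MB floor are my own assumptions, not Saxon recommendations, and both function names are hypothetical.

```shell
#!/bin/bash
# Assumed rule of thumb (not a Saxon recommendation): about 10 MB of heap
# per MB of input, and never less than 512 MB.
heap_mb_for_size_mb() {
    heap_mb=$(( $1 * 10 ))
    if [ "$heap_mb" -lt 512 ]; then
        heap_mb=512
    fi
    echo "$heap_mb"
}

# Rounds the file size up to whole megabytes, then applies the rule above.
suggest_heap_mb() {
    size_bytes=$(wc -c < "$1")
    heap_mb_for_size_mb $(( (size_bytes + 1048575) / 1048576 ))
}

heap_mb_for_size_mb 200   # prints 2000

# Possible use with the command above:
#   java -Xmx"$(suggest_heap_mb articles.xml)"m -jar ~/saxonhe9-2-0-5j/saxon9he.jar ...
```

Treat the result as a starting point and raise it if Java still runs out of memory.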
Doin’ it all!
Now that we have the basic tool for pulling a single page out of our source XML file, we can use a little bit of shell scripting to get all of the posts into separate files.
[sourcecode language="bash" highlight="29" wraplines="false"]
#!/bin/bash
# filename: required
file=$1
# output file prefix: required
outfile=$2
if [ "$file" = "" ] || [ ! -f "$file" ] || [ "$outfile" = "" ]; then
echo "Usage: $0 filename outfile [pagesize] [start] [limit]"
exit 1
fi
# page size: defaults to 2000
[ "$3" != "" ] && pagesize=$3 || pagesize=2000
# start post: defaults to 0 (first post)
[ "$4" != "" ] && start=$4 || start=0
# limit: defaults to # of posts in input file
[ "$5" != "" ] && limit=$5 || limit=$(grep -c '<item>' "$file")
echo "Splitting $file into" `echo "($limit-$start)/$pagesize" | bc` "pages of size $pagesize between posts $start and $limit";
i=$start
while [ "$i" -le "$limit" ]; do
echo "Generating page $((i/pagesize)): posts $((i)) through $((i+pagesize))..";
java -Xmx2000m -jar ~/saxonhe9-2-0-5j/saxon9he.jar -xsl:split.xsl "$file" page=$((i/pagesize)) size="$pagesize" output="$outfile"
i=$((i+pagesize))
done
[/sourcecode]
Save the above as split.sh, and the XSLT file as split.xsl in the same directory. Also, make sure the path to your Saxon JAR file on line 29 (the java invocation) is correct. Pulling this all together, we can take a large WXR input file and slice and dice it as we see fit:
[sourcecode language="bash" gutter="false"]
[Meerkat ~/Oomph/]$ sh split.sh Articles.xml Articles 1500
Splitting Articles.xml into 16 pages of size 1500 between posts 0 and 24065
Generating page 0: posts 0 through 1500..
Generating page 1: posts 1500 through 3000..
Generating page 2: posts 3000 through 4500..
Generating page 3: posts 4500 through 6000..
Generating page 4: posts 6000 through 7500..
…
[/sourcecode]
We now have 16 full files of 1500 articles each, plus a final shorter file with the remainder, stored as
Articles_0-1499.xml
Articles_1500-2999.xml
Articles_3000-4499.xml
… And so forth.
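A note on the page count in the script’s first line of output: it comes from bc’s integer division, which truncates, so a trailing partial page is not counted there. A minimal sketch of a ceiling division that does count it (the function name page_count is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: number of pages needed to cover the posts between
# start and limit, counting a final partial page.
page_count() {
    pc_start=$1 pc_limit=$2 pc_size=$3
    echo $(( (pc_limit - pc_start + pc_size - 1) / pc_size ))
}

page_count 0 24065 1500   # prints 17
```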
Now you can import each of these files individually without choking your WordPress importer! I hope that some of you will find this useful. Keep in mind that the XSL stylesheet above could easily be adapted to work with other large XML data files, too. It would be just a matter of changing the element selectors that you wish to break apart.
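Before importing, it can be worth checking that no posts went missing in the split. The sketch below compares item counts between the original and the pieces, using the same grep heuristic as split.sh (one opening item tag per line); the function names are hypothetical.

```shell
#!/bin/bash
# Counts lines containing an opening <item> tag across the given files,
# the same heuristic split.sh uses to find the number of posts.
count_items() {
    cat "$@" | grep -c '<item>'
}

# Compares the item count of the original file (first argument) with the
# combined count of the split files (remaining arguments).
verify_split() {
    vs_original=$1
    shift
    vs_expected=$(count_items "$vs_original")
    vs_actual=$(count_items "$@")
    if [ "$vs_expected" -eq "$vs_actual" ]; then
        echo "OK: $vs_actual items"
    else
        echo "MISMATCH: original has $vs_expected items, pieces have $vs_actual" >&2
        return 1
    fi
}

# Example: verify_split Articles.xml Articles_*.xml
```

If the export puts several item elements on one line, the counts will be off and a real XML parser would be the safer tool.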
For the number of users out in the world using WordPress, it is amazing to me that there is no great tutorial to point our clients to about the WordPress Editor window. This post hopes to rectify that.
The Editor window changes and updates just like the rest of WordPress, but has remained pretty consistent for the past few major versions. We’ll be sure to update this post with the newest tools as they are released.
First: Whizzy what?
WordPress uses a JavaScript plugin called TinyMCE for its WYSIWYG editor. WYSIWYG is an acronym for “What you see is what you get” and it’s the best way to craft content before saving your post. The editor is pretty powerful, with many buttons and options for writing content. Most of the options are familiar ideas, similar to options in a word processor – bold, italic, bulleted list style, etc… When designers like myself start to throw out terms like “H1” and “blockquote”, though, most people’s eyes glaze over, but that is how these options are stored as HTML elements. Luckily, the Editor does the grunt work for you.
Second: Some definitions and pictures
The “WordPress Editor” window is the main focus of any page that produces content in the WordPress admin section. Most people are familiar with it for posts, and it looks like this:

A good theme will have some styles set up so that content in this window more closely resembles how it will look in your theme – colors, font, heading styles, etc… This helps you craft content that will look great before you even hit the “Save Draft” button and preview it.
No Kitchen Sink
When you first use WordPress, the “kitchen sink” is off. So the Editor window will most likely look like this, with the following options:

The options explained:
Upload/Insert: This area is a container for all of the media options. Sometimes, plug-ins like the Next Gen Media gallery or Poll Daddy will insert an icon in this area for accessing shortcodes for those items. Clicking on any of these will usually open a modal window above the content for managing photos and the like.
Content View Switcher: The Editor window can function as a Visual Editor (WYSIWYG), or as an HTML editor for those more adventurous. This post will concentrate on options in the Visual Editor.
Bold: Highlighted text will become bold when this is clicked. In HTML, this uses a <strong> tag.
Italic: Highlighted text will become italic when this is clicked. In HTML, this uses the <em> tag. Can be used in combination with the bold button.
Strikethrough: Highlighted text will appear struck, with a line through the center. In HTML, this uses the <del> tag. Used to indicate text for deletion or removal, or to indicate that a change in the text has taken place.
Bulleted List: Highlighted text will be formatted like a bulleted list. Depending on your theme’s style, the bullets may be round or square. Lists can be nested – a bulleted list may have a numbered list inside of it. In HTML, this uses a set of <ul> and <li> tags for the “unordered list” and the “list items”.
Numbered List: Similar to above, but with numbers. In HTML, this uses the <ol> (ordered list) tag in combination with the <li> list item.
Blockquote: Highlighted text will be indicated as a blockquote, which typically means that a whole passage has been quoted from another source. The style of it will vary from theme to theme, but most of the time italic text is used, it is indented, and may have quotes around it automatically. The HTML element is <blockquote>.
Left, Center and Right Align: These buttons will align highlighted text. Most themes align text to the left by default and this is how an author can break out of that mold. An author need not highlight a whole paragraph, as this style will be applied from one full return to another. In HTML, since there are no native tags for alignment, TinyMCE adds a <p> tag around the paragraph with style="text-align: right;" applied to it.
Link (chain icon): The link icon is available for clicking only when text is highlighted. Highlighting text and clicking this button will open a small modal window where an author can enter the destination URL, choose whether or not to open it in a new window, or choose to link to another page on their own site. In HTML, the tag used is <a href="http://example.com">Link Text</a>.
Unlink (broken chain): To remove a link, an author can highlight the whole link or simply place the cursor within the link and click this button.
The “More” break: WordPress has an option of adding this physical break to the post – breaking it into two sections, the teaser and the body. If your theme displays the entire post by default on category landing pages, the “more” break is a way to show only some of the content, forcing users to click a “Read More” link to see the rest of the body of the post. Upon clicking the “Read More” link, the user will be brought directly to an anchor in the text where the post continues, so they do not have to read the same teaser again.
If your theme’s landing pages use excerpts instead, this “more” break may not have the intended effect.
Spell Check: This button is a drop down that will change the preferred language of the editor’s spell check dictionary.
Full Screen: Toggles a full screen view of the editor so authors can concentrate on composing their post. Useful, but the buttons for styling your post will be limited in this view to bold, italic, bullets, numbers, blockquote, insert media, insert link, unlink and help. Learn the “Hotkeys” to access more style options (see the Help modal window for a list of Hotkeys and how to use them).
And finally, Toggle Kitchen Sink: This button simply makes another row of options available for styling your post. Those options are explained next.
Kitchen Sink!
With the kitchen sink option on (and WordPress remembers if you prefer it on), a whole slew of additional options are present.

Style drop down: (graphic below) Within this drop down are styles intended for block-level elements – paragraph, address, preformatted, heading 1, heading 2, etc…
The best way to think about inline vs. block-level may be this: a bold tag is inline, because you can bold a portion of a sentence; a Heading 1 is block-level because it will affect all the text in a block, from one hard return to another.
Underline: Highlighted text will have an underline applied to it. In HTML, the tag is <ins> (for insert). Your theme may apply an underline to links, so be sure to use this tag when appropriate, and don’t fool your readers into thinking something might be clickable when it is not.
Justify: Another alignment option. This one is by itself because authors should use it carefully. Not all browsers support the justify feature, and since browsers do not hyphenate text, this style may create “holes” in your paragraphs when the spacing between words needs to be very large.
Text color: Highlighted text will turn a variety of colors by using this button. When clicked, a standard palette of colors will appear for you to choose from, and a limitless palette is shown when the “More Colors” option is clicked.
Paste as Text: THIS BUTTON IS AWESOME! Very useful for authors who cut and paste text from other sources. Ever copy text from another website, and all the styles come with it? Soon you have a mish mash of styles in your post. You can get frustrated scraping the style tags in HTML view, or you can paste with this button to begin with. When clicked, a new modal pops up with its own text area. Text pasted into this area is converted to “plain text”… nothing but the facts, ma’am. This allows for much easier styling and integration into your content.
Paste from Word: THIS BUTTON IS ALSO AWESOME! Ever cut and paste content from Microsoft Word, and all of a sudden your post looks funny? That’s because a bunch of styles – and, frankly, gobbledygook – comes along with the content from Word. To paste the text in as plain text, click this button and paste your content into that window first. If you know the content is coming from Word, this option is better than the Paste as Text button, as it specifically removes tags that Microsoft Word generates.
Remove Formatting: Highlighted text will have styles removed. While this icon is an eraser, it must be noted that it does not always remove every style. It does a better job removing styles that the editor has added in already. It does not consistently remove styles from text that has been cut and pasted from other sources.
Insert Special: This drop down list helps to insert special characters that are hard to access unless you know the special keystrokes.
Remove / Add Indent: A highlighted block-level element will be indented or un-indented with these buttons. Since there is no HTML element for this, TinyMCE adds style="margin-left: 30px;" to the element. 30px is the default indent increment.
Redo / Undo: Simply keeps track of changes and allows the author to undo or redo a set of changes. I’m honestly not sure how many changes it will keep in memory before they get lost.
Help: A simple modal with Basic and Advanced tips will pop up when this is clicked. The most interesting to me is the table of “Hotkeys” available to authors. Did you know command 1 will make a selection take on the Heading 1 style?
And one more
The last thing I want to review is the contents of the Style drop down.

As briefly mentioned before, the style drop down menu contains a bunch of standard block-level elements. This means that a whole paragraph will get the style, not just a selected portion of a paragraph. The styles are:
Paragraph: Used by default, but useful if you have chosen another style and want to switch back to the default paragraph style. In HTML, this is the <p>…</p> set of tags.
Address: An interesting tag, address is usually italic for some reason. I wonder if blocks using this tag have special weight with search engines, but very little data is available to back up that hunch. Personally, I rarely use it.
Preformatted: The <pre> tag is a tough cookie. It is intended to display text with white-space preserved, meaning that breaks in the text will be exactly as written. To me, this means trouble, as if there are no breaks in the text inside of this tag, then there are no breaks on the front-end as well, and that can lead to some goofy looking posts. A good theme will take into account the intention of a <pre> tag, but ensure that the display will not break the layout. Again, rarely used… most people use it to display chunks of code.
Heading 1, 2, 3, etc… These are great and every author needs to know how to use them. The concept is simple – headline styles, with more size or boldness given to the lower numbers. But search engines use these tags to determine where the important phrases are in the content as well (and in the page in general) so they should be used not only because they help organize your story, but because they also give your content extra weight.
That’s all folks!
Thanks for reading, and I hope this helps. There are a lot of options packed into this little Editor window, so take advantage of the array of style options WordPress gives the author. Happy blogging!
As designers get used to all the new whizz-bang inherent in HTML5 and CSS3, every now and then we get pulled back into the world of basic HTML rendering à la 1993 when we have to design e-newsletters for desktop and web-based mail clients.
I recently had to create some templates for Constant Contact and I thought, “Hey, this should be easy. Can’t do anything tricky, so, keep the design simple, use a tried-and-true table for the layout, and voilà, beautiful emails”. While it’s true that you can’t do anything tricky, there is so much more to consider, and it’s all pretty annoying if you are used to designing for the web. I enjoy designing for IE6 slightly better than designing email templates.
With that frustration in mind, here are some tips I ran across that might be useful for you, but will be very useful for me as I know I will forget them all just in time to design a new set of templates:
Design Specifically – Everything needs a class
The most annoying aspect of Constant Contact for a web designer who likes their code clean and nested is that IDs are NOT supported, and neither are styles on HTML elements. So the CSS selector body p is a no-no, and so is a rule that targets a plain h1. Instead, you must define a .header1 class and apply it to an h1, like this: <h1 class="header1">.
Ridiculous? Maybe… but here’s why. When Constant Contact assembles your email, it takes all these rules and spells them out explicitly right in the element. So while you may define a normal style, it uses it as a reference and spits out an inline style tag on the element itself. So this:
[code lang="html" light="true" wraplines="true"]
.header1 { font-weight: bold; font-size: 20px; color: #333333; }
<h1 class="header1">A Sample Header</h1>[/code]
Becomes:
[code lang="html" light="true" wraplines="true"]
<h1 size="20" color="#333333" style="font-weight: bold; font-size: 20px; color: #333333;">A Sample Header</h1>[/code]
It does this so it can cover the widest array of email clients out there. And it would be maddening for us to try and code this way with all those inline styles. So while the idea of using a stylesheet is like the web, the way Constant Contact uses the stylesheet is not like the web at all.
Design like it’s 1993
We all know KISS, but when I say simple, I mean REAL simple… 1993, beginnings-of-the-internet simple. Take the <center> tag, for example. The good ole margin: 0 auto; won’t work consistently enough, so break out the dusty <center> tag instead.
Use tables for layout. I know, I know, that’s SO 1993, but I’m serious… trying to consistently float divs and clear floats will drive you mad.
Also, forget bit-saving CSS shorthand. You’re better off using the full six-character hex values for colors when you normally might use three. Four values for padding and margin seem to be well supported, but use the long-form properties for font-size, font-weight, font-family, and background. Actually, forget background images altogether. Many email clients won’t load them.
When using images, style the container like it’s text
This one may not be very intuitive, but let me explain. I’ve got an image, and it’s important – it’s the logo. I know some email clients won’t load the image by default (looking at you, Gmail), so when the image doesn’t load, I want the contents of the alt attribute to display instead, and I want it to look good.
To do this, I simply made sure the container that the image is placed inside has some fallback styles for text. So while the alt text disappears when the image loads, if the user never loads the image, it still looks nice and we don’t lose important information, like the name of the company. Here’s what I’m talking about:

Before images load...

With images loaded.
Use Anchor (Jump) links Carefully
This one particularly bugged me, and took a little time to figure out. The client wanted a Table of Contents with simple anchor links to make the email jump down to the proper element. I knew that support for hrefs that jump down to an element with an ID would be spotty, but a simple <a name="anchor"> would work, right?
Silly me, what was I thinking?
Besides the fact that even these basic elements have spotty support (see this article from Campaign Monitor), there is also a problem of styling and the way Constant Contact handles empty HTML elements.
In order to appease the greatest number of email clients, my code for anchor links was <a name="anchorname" id="anchorname"></a>. But there was a simple problem with that. Since it was empty, Constant Contact turned it into a self-closing xHTML element, which looks like this in the source of the email: <a name="anchorname" id="anchorname" />. The problem is, most email clients won’t recognize an anchor tag that self-closes, so my email had open anchors everywhere, which turned my text default link blue.
The solution is god-awful, and I would never allow my web pages to look like this, but this is what I had to do for it to work and look good. I had to wrap the headline text in the anchor so it was no longer empty, and add a style class to handle these links, because even though there is no HREF declaration, email clients will still treat the anchor like a link and turn it default link blue:
[code lang="html" light="true" wraplines="true"]
<h3 class="header3"><a name="anchorname" id="anchorname" class="anchor">Headline Text</a></h3>
[/code]
Send us Your Tricks!
Designing emails is frustrating enough… and this list is by no means exhaustive. What issues have you encountered? Send them over here and we’ll keep them compiled for you!
In Conclusion
I hope you never have to design email templates, because they really can be frustrating. All the efficiencies you’ve learned while designing for the web get thrown out the window. Still, email is an effective (and cost-effective) form of communication, so it won’t go away. It can only get easier as older mail clients are slowly phased out, but the fight to eradicate IE6 is nothing compared to the fight to eradicate Outlook (Outlook has no support for simple styles like padding and float).
In short, stay sane, stay calm, and use Google to help you figure out the baffling problems of designing email templates. Keep the design simple and the message short.
And good luck… you’ll need it.
WordCamp Boston is just around the corner – July 23 & 24 at Boston University – and Oomph is proud to be playing a big part in the gathering.
For the second year in a row, our very own Erick Hitter has been working hard behind the scenes as one of the core organizers, contributing to the event website and pulling together a top-notch list of corporate sponsors. We’re proud to say that we made Erick’s job a little easier this year by getting Oomph on the corporate sponsor list.
I am also very excited to be speaking this year. At 3:30 on Saturday I’ll be leading a talk on the WordPress VIP platform:
Enterprise Publishing on WordPress.com VIP
This talk is designed for publishers, in-house developers, and consultants interested in learning about the WordPress.com VIP platform.
We will provide an overview of where WordPress.com VIP fits within the overall WordPress ecosystem and what types of publishers are well suited for the platform. We’ll cover the key benefits for publishers and developers, compare a typical self-hosted WordPress implementation to one on WordPress.com VIP, and take a surface-level look at some of the technical nuances for running a site in this incredibly flexible yet controlled environment.
Whether you are considering a migration to WordPress, exploring a switch to enterprise-level hosting, or just curious to learn more about what this VIP thing is all about, this talk should be a valuable resource.
Please stop by and say hi and keep an eye out for other attendees from the Oomph team.

It’s been three years since The Council of PR Firms has engaged in a major upgrade of their website. Previously, their web properties existed on a few different platforms, and now the website has been fully integrated into the WordPress system. A stronger look accompanies the streamlined navigation and back-end controls – a condensed bold-faced font, refined color palette and section icon system.
New Features:
- A “Quick Finder” for Find-a-Firm was added to the homepage, making a useful feature for potential PR clients even more accessible.
- The popular Firm Voice blog, previously a stand-alone blog property, was brought back into the main site, which helps with SEO.
- Twitter was integrated into the site more visibly.
- A new section was added – Inside PR – which aims to help promote some of the hot button topics in the PR Industry today.
- Older content from the previous site has been better organized into discrete sections, making it easier for PR professionals to find resources that may help their business, or for students to find resources that will help inform their after-school job search.
- A new navigational element called “Tools” was added to the footer, which brings attention to some of the other Council stand-alone web properties or sub-sites.
See the detailed write-up and full portfolio piece here: http://www.thinkoomph.com/portfolio/prfirm/
It is not often that web designers have to accommodate browsers that are considered “best of breed”. Maybe we are snobby when most of us design for Safari or Firefox and then “fix” things in other browsers, but everyone has their preferred way of working. Internet Explorer is usually stricter in its interpretations of CSS, which makes some use it as a standard and others wince, but I digress…
So it was with extra frustration that I had to investigate a rendering bug in Safari / WebKit. Granted, it was happening only on an older version, Safari 4, and Safari 5 has been out for more than a few months now. But still, when you think Safari or Firefox are better browsers, it’s a wake-up call to realize that, like any browser, they have funny bugs and issues.



