Notes on a Massive WordPress Migration

At the UC Berkeley Graduate School of Journalism, we use WordPress heavily as a content management system for student and organization publications, knowledge bases, student handbooks, podcast publishing systems, online magazines, etc. Over the past couple of years, I’ve found again and again that WP is not only up to the task of serving as far more than a blogging platform, it’s a great content management system for many types of sites, once you learn a few tricks.

Just wrapped up a marathon coding session, converting one of the J-School’s most popular sites, China Digital Times (CDT), from Movable Type to WordPress. We launched the new site (with a new design by Devigal) a few days ago. This was by far the most complex WordPress installation I’ve worked on, involving around 16,000 posts and 6,000 tags. As with every site launch, I learned a few things in the process. Thought I’d post some notes here for the sake of others going through a similar process.

Nutshell version: Though SixApart (who make Movable Type) claim that their static page generation approach is great for high-performance sites, we’ve reduced the time it takes to publish a new article from almost 15 minutes to a few seconds by moving from Movable Type to WordPress.

DEFCON 1 (Why Migrate?)

Movable Type is a Perl/CGI-based system that transforms database content into static HTML. The idea is that generating HTML is the path to optimal site performance, and that’s true… to a point. The problem with this approach is two-fold. First, every time something changes (like when a comment is added or a tag changes), all affected HTML pages have to be rebuilt – a very CPU-expensive process.

Second, anything interactive still has to go through CGI. That downside became apparent when comment spam hit the blog world, and every attempt by a spambot meant a call to a resource-expensive CGI script. Many techniques arose to deal with that problem, but the bottom line was that MT went from being resource-friendly to being a resource hog overnight.

The problem for China Digital Times went DEFCON 1 when the editors fell in love with tagging. They found a nice tagging plugin for MT and set to work tagging the hell out of every piece of content on the site. As the pool of tags grew, the number of database queries and page rebuilds grew as well. Posting time went from a few seconds (initially) to 15 minutes — to publish a single story. Our server was brought to its knees by the load, and we eventually had to move CDT to a dedicated server so it wouldn’t affect other J-School sites. Ultimately, we also had to disable comments and search, as the cost of the relentless requests on perl/CGI processes was too great.

Finally, working on MT sites (from a developer’s perspective) is a major pain. Templates don’t inherit neatly, and are stored in the database (so you can’t edit them directly). Rebuilding pages is a major pain. There are far fewer plugins and themes available to extend the platform, and most plugins require template modification.

For reasons like these, we’ve migrated most of our J-School MT sites to WordPress over the past year. CDT was the last big one to be moved.

Templates

Since we were provided with a static HTML+CSS template by the designer, job #1 was to convert it into a native WordPress theme. For the most part, this is a straightforward task, which I accomplished by grabbing code chunks from existing themes and sites I had built in the past. I had previously created a few WordPress themes from scratch and have manipulated a whole lot more, so this wasn’t a huge deal.

Export/Import

Movable Type can export content to its own native format, and WordPress can import MT export files. So it should be simple to move content over, right? Not so fast.

First, there was the issue of all the Chinese text in the MT site. Though we tried every text encoding setting in the database, in the export template, and in Apache, no amount of cajoling would get us an MT export file with all of the Chinese text intact. I even saw bizarre situations where the first 10,000 or so posts would export fine, but everything thereafter would come out garbage. Because there wasn’t a ton of Chinese text in the system, we decided to bite the bullet and re-do the broken Chinese text after the fact (we’re still working on that). Definitely painful.

Also, MT 3.x doesn’t have a concept of “Pages” similar to WordPress. The exporter wasn’t going to help us with that, so we had to re-create sections like Sponsors by creating Pages in WP and pasting the old data in manually.

Keywords to Tags

WordPress lets users create arbitrary sets of “Custom Fields” — metadata — that can be attached to any post. Movable Type only has a single “Keywords” field. CDT had been using the MT keywords field to store their tags. Obviously, I needed these tags to be migrated, but the slippage between the two systems meant the WordPress importer completely ignored the MT keywords field. Searched high and low, but couldn’t find a workable existing solution, and realized it was time to modify WordPress’ native MT import module.

After some fun hacking, was able to transform MT keywords into WP tags during import. I submitted my changes to the importer to the WordPress project for inclusion in future releases, but so far no word on whether it will make it into the next version. In case it never gets adopted, grab my patch from Trac.

Because MT only has a single Keywords field for metadata, CDT editors had taken to overstuffing other fields. For example, if they were quoting another publication, they’d put the original author name and publication into the post title. Since WordPress lets you use an arbitrary number of metadata fields, I put them to good use with a new system of pre-defined meta fields, which are read by a custom function I stuck in the new theme’s functions.php file.

[Image: WP meta fields]

These fields are then pulled out with the get_post_meta() function at display time, rendering the meta blocks at the end of each story.
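For anyone wanting to try the same thing, here is a rough sketch of what such a functions.php helper might look like. The field keys, labels, CSS class, and the function name cdt_render_meta_block() are all hypothetical (CDT’s real field names aren’t shown in this post). The helper takes a lookup callable so it can run outside WordPress; in a theme, the lookup would simply wrap get_post_meta().

```php
<?php
// Hypothetical field keys and labels; the real CDT field names are not shown in this post.
const CDT_META_FIELDS = [
    'original_author'      => 'Original author',
    'original_publication' => 'Original publication',
];

/**
 * Build the meta block shown at the end of a story.
 * $lookup receives a field key and returns its value ('' when unset).
 */
function cdt_render_meta_block(callable $lookup): string
{
    $items = '';
    foreach (CDT_META_FIELDS as $key => $label) {
        $value = trim((string) $lookup($key));
        if ($value === '') {
            continue; // skip fields the editor left blank
        }
        $items .= '<li><strong>' . $label . ':</strong> '
            . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    // Emit nothing at all when no field is filled in.
    return $items === '' ? '' : "<ul class=\"story-meta\">\n" . $items . "</ul>\n";
}

// Inside a WordPress template, the lookup would wrap get_post_meta():
// echo cdt_render_meta_block(function ($key) use ($post) {
//     return get_post_meta($post->ID, $key, true);
// });
```

Keeping the list of fields in one place means a new field only has to be added once to show up under every story.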

Upload File Locations

Some of CDT’s 60 plus authors were using MT through the web, while others were using desktop clients like Ecto. Movable Type’s file upload function lets users upload to any location under the document root. While some of the authors ended up uploading files into neatly named folders, I found several thousand JPGs and GIFs sitting in the docroot – an absolute mess, with no rhyme or reason. The only technical solution to prevent this would have been to restrict permissions on the docroot, but that couldn’t work because MT had to write its index files there.

WordPress, in contrast, creates a date-based upload tree, and never gives the user the opportunity to specify the destination. While this may seem a bit restrictive for power users, the vast majority of users of any CMS are not geeks – and even geeks appreciate having this kind of thing taken care of for them. This kind of housekeeping needs to be managed by the system.

I’d been meaning to clean up the mess for the past couple of years, but MT made it impossible. Every time I’d clean things up, the authors and editors would start filling up the docroot again. User education was not enough. This problem had to be solved with technology.

My solution during migration was to gather up all of the old files into /wp-content/uploads/mt-old, then go through a lengthy process of regex-based search/replace operations in the MT export file until all image references pointed to the new holding bin (without affecting hrefs that shouldn’t be).
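To give a flavor of those search/replace passes, here is a minimal sketch in PHP. It assumes the stray images were referenced either as root-relative paths or as absolute chinadigitaltimes.net URLs pointing directly at the docroot; the pattern and the function name cdt_rewrite_image_paths() are illustrative, not the expressions I actually ran. It only touches src attributes on img tags, so hrefs are left alone.

```php
<?php
/**
 * Point <img src> references to docroot-level image files at the mt-old
 * holding directory. Hrefs and images already in subfolders are left alone.
 * Assumption: stray files were referenced as /name.jpg or as
 * http://chinadigitaltimes.net/name.jpg.
 */
function cdt_rewrite_image_paths(string $export): string
{
    $pattern = '#(<img\b[^>]*\bsrc=["\'])'                        // up to the opening quote
             . '(?:https?://(?:www\.)?chinadigitaltimes\.net)?/'  // optional host, then docroot
             . '([^/"\']+\.(?:jpe?g|gif|png))'                    // a file directly in the docroot
             . '(["\'])#i';                                       // closing quote
    return preg_replace($pattern, '$1/wp-content/uploads/mt-old/$2$3', $export);
}
```

Run something like this against a copy of the export file and diff the result before importing; a pattern that looks safe can still catch things you didn’t expect.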

Going forward, newly added files will land in the date-based folders; all of the old content will continue to live in the mt-old directory. After years of file upload chaos, CDT’s static file data is finally under control.

Massaging the Export

The export data needed quite a bit of additional massaging. Content pasted out of Word ended up with ANSI characters where smart quotes, ellipses, and em dashes should have been. Most of the Chinese text was garbled. More search/replace-fu.
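As an illustration of the Word cleanup, here is a sketch that assumes the stray “ANSI” characters are single Windows-1252 bytes (0x85, 0x91 to 0x94, 0x96, 0x97) sitting inside otherwise UTF-8 text. That is an assumption about the damage, not a description of the exact passes I ran, and the function name is hypothetical. Well-formed multibyte UTF-8 sequences are matched first and passed through, so intact Chinese text isn’t touched.

```php
<?php
// Windows-1252 punctuation bytes mapped to their UTF-8 equivalents.
const CP1252_PUNCTUATION = [
    "\x85" => "\u{2026}", // ellipsis
    "\x91" => "\u{2018}", // left single quote
    "\x92" => "\u{2019}", // right single quote / apostrophe
    "\x93" => "\u{201C}", // left double quote
    "\x94" => "\u{201D}", // right double quote
    "\x96" => "\u{2013}", // en dash
    "\x97" => "\u{2014}", // em dash
];

function cdt_fix_word_punctuation(string $text): string
{
    // Alternatives 1-3 consume well-formed multibyte UTF-8 sequences so their
    // continuation bytes are never mistaken for stray Windows-1252 bytes.
    // Alternative 4 captures a lone byte in the 0x80-0x9F range.
    $pattern = '/[\xC2-\xDF][\x80-\xBF]'
             . '|[\xE0-\xEF][\x80-\xBF]{2}'
             . '|[\xF0-\xF4][\x80-\xBF]{3}'
             . '|([\x80-\x9F])/';
    return preg_replace_callback($pattern, function ($m) {
        if (!isset($m[1])) {
            return $m[0]; // valid UTF-8 sequence: keep as-is
        }
        // Lone byte: replace if we know it, otherwise leave it for manual review.
        return CP1252_PUNCTUATION[$m[1]] ?? $m[0];
    }, $text);
}
```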

Plugins

The WordPress team works hard to keep the core lightweight, and to not bloat it with features most people won’t use. Its excellent plugin architecture has given rise to a thriving cottage industry of plugin authors, who have been able to extend the platform in a thousand directions. Every WP install beyond the basic blog relies on a bunch of them; here are the plugins used by the new CDT.

Akismet: The only plugin bundled with WordPress, Akismet is absolutely essential for fighting comment spam. In combination with comment moderation, Akismet keeps blogs spam-free with zero effort by leveraging the hive-mind of the blogging collective. Just mark a comment as spam and Akismet will tell the rest of the blogosphere to block it. Meanwhile, you benefit from the “this is spam” identifications of a world full of proactive bloggers.

CDT Docs: A super-simple plugin I wrote for another site, Docs lets me provide documentation on all aspects of using the site and its features, as well as its workflow, to authors and editors.

Flexible Upload: Extends the native WP file upload mechanism with a more robust version that allows for image alignment and resizing, watermarking, captioning, etc.

Get Recent Comments: A popular plugin that lets you display a linked list of stories to which comments have recently been added. Essential for letting readers see at a glance where discussion activity is happening on the site.

Gregarious: Dynamite plugin for tapping into social networking sites. One plugin lets readers submit a post to Digg, del.icio.us, reddit, Technorati, and more. Also provides “E-mail this story” functionality.

Improved Include Page: Occasionally, you’ve got some template or sidebar element to which you want to give site administrators access while protecting them from editing template code directly. iinclude_page() lets you create a WordPress page and have its contents inserted at any place in a template. On CDT, we use it to let the editors manipulate the Bookshelf sidebar graphically, without mucking around in code.

Search and Replace: Probably won’t keep this in place forever, but it was invaluable for cleaning up paths and broken bits embedded in post data after I had imported data and could no longer work on the export file directly. A lifesaver. After using it for a bit, your first thought is “Why in the world isn’t this included in core?” Your second thought is, “Oh, that’s why. Thank god this is a plugin and not part of core.”

SA Contact Form: Provides a handy dandy, spam-proof form so readers can contact site authors.

SimpleTags: CDT relies on tagging heavily. While WordPress 2.3 includes tagging support in core, there are a million ways sites want their users to interact with tags. SimpleTags provides half a dozen functions for building tag clouds, tag lists, finding related stories based on tag similarities, etc. — all with customizable parameters and output options. Absolutely brilliant plugin. You’ll find tag references all over CDT, and all of them are generated with SimpleTags.

Immediately after launch, we did hit one stumbling block: Generating the three-pane tag cloud on the left that lets readers view the “Hot 10” (most popular tags this week), “Top 50” (most popular tags in the past year) and “CDT Picks” (editor’s choice) required too much server CPU to generate when included on every page. Uh-oh. Tag handling was killing us on the old site – would it kill us on the new site as well? Even though our caching mechanism should have protected against this, the loads it created when the cache was empty were too great. Realized this was the kind of thing that could be built as a static file once an hour and included on every page instead. Switching to that approach improved performance immensely, and brought site performance back to static site efficiency.

So how do you generate static files from WordPress? Here’s how I did it: 1) Create a new WordPress template in your theme dir that includes only the code you need. At the top of that file, insert a PHP comment to name the template. 2) Create a new, blank Page and connect it to the newly created template. 3) Create a cron job to access the new page URL via wget with the “-O” option to output to a file on disk. 4) Use a simple PHP include() to hoover that file into the template where you need it.
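The template side of that last step can be sketched roughly as follows. The file path, the example URL in the cron comment, and the function name cdt_static_fragment() are hypothetical. This version also differs from a bare include() in two ways that are my additions here, not part of the setup described above: it reads the file with file_get_contents() so the fragment is never executed as PHP, and it returns an empty string if the file is missing or stale.

```php
<?php
// Hypothetical crontab entry that builds the fragment once an hour:
//   0 * * * * wget -q -O /path/to/static/tagcloud.html http://example.com/tagcloud-static/

/**
 * Return the contents of a cron-generated HTML fragment, or '' if the file
 * is missing, unreadable, or older than $maxAge seconds (for example,
 * because the cron job has stopped running).
 */
function cdt_static_fragment(string $path, int $maxAge = 7200): string
{
    clearstatcache(true, $path); // make sure we see the file's current state
    if (!is_readable($path)) {
        return '';
    }
    if (time() - filemtime($path) > $maxAge) {
        return '';
    }
    $html = file_get_contents($path);
    return $html === false ? '' : $html;
}

// In a template (path is hypothetical):
// echo cdt_static_fragment('/path/to/static/tagcloud.html');
```

One thing to watch: wget -O truncates the output file as soon as it starts, so a page view that lands mid-fetch can see an empty or partial fragment. Writing to a temporary file and renaming it into place afterwards is one way around that.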

SimpleTags also let us easily generate the killer Top 500 tag cloud on the site.

Weasel’s HTML Bios: All WP authors get a little bio area to play with, and it’s easy to have these bios appear on WP author pages. Annoyingly, WP strips all HTML tags from the author bio field. Weasel’s plugin makes it stop doing that.

WP Super Cache: Where it took as long as 15 minutes to post a single entry on the old MT-based CDT, posting is now immediate, with no page rebuilding involved. But every fully dynamic site can benefit from server-side caching, so you get the behavior of a dynamic site with the performance of a static site. This is especially critical for those Slashdot days, when traffic goes through the roof. WP Super Cache is a fork / improvement on the original WP Cache, and works exactly as advertised. Load averages on the CDT server have been lower than ever (see below). It does have a few bugs (I can’t seem to disable it entirely when needed for testing), but is otherwise working smoothly.

Functions

Whenever I find my WordPress templates duplicating code, I lean on one of my favorite WP tricks — create a file called functions.php in your theme directory. Create PHP functions within it, then invoke them by name from within normal template files. For simple tasks (such as spitting out the metadata that appears at the bottom of every CDT story), using functions.php is the quickest path — and a heck of a lot easier than writing plugins.

RSS Aggregation

If you click the Headline Feeds link in the left sidebar, you’ll see embedded RSS feeds from Yahoo! News and China National News. These are generated with WordPress’ integrated RSS parser, the fetch_rss() function (the Codex notes on wp_rss vs. fetch_rss are mine).

FeedBlitz Mailings

CDT sends out daily digests of new posts to a few thousand readers worldwide every day. When the site was on Movable Type, it used MTBlogMail for years (a script/package I wrote for the purpose back in 2003). It’s worked well, but we realized we could avoid some of the headaches of running a high-traffic mailing list by utilizing an external service. So rather than use my WordPress version of the same script, WP-Digest, we switched to the free (with ads) FeedBlitz service, which has worked famously.

Day Picker

One of the more unusual requests the CDT team had was for a “day of week picker” embedded in the homepage:

[Image: day-of-week picker]

I looked high and low, but couldn’t find existing code or a plugin to accomplish this, so decided to build a custom function. The reason this kind of thing is hard to find, I realized, is because the UI is inherently weird. We’re used to date-oriented widgets counting forward in time, but of course we don’t have content for the future. If today is Wednesday, then, reading from left to right, the first half of the week refers to the past few days. But to make the days after today work, they’ll need to point to days from the previous week. Weird UI, weird code. I solved the first part of it, but haven’t yet solved the 2nd (plan to jump on that soon). Here’s what I came up with (again, in functions.php):

function get_week_days() {
   // Build list items linking to the daily archives for the past seven
   // days, oldest first, counting back from now. There are 86400 seconds in a day.
   // Could also use something like get_day_link('2007', '12', '07')
   // http://codex.wordpress.org/Template_Tags/get_day_link

   $weeklist = ''; // initialize so the first .= doesn't raise a notice
   $now = time();  // mktime() with no arguments is deprecated; time() is equivalent
   $counter = 6;
   while ( $counter >= 0 ) {
     $timestamp = $now - $counter * 86400;
     $agodate = date('Y/m/d', $timestamp);
     $agoday = date('D', $timestamp);
     $weeklist .= "<li><a href=\"/$agodate\">$agoday</a></li>\n";
     $counter -= 1;
   }
   return $weeklist;
}
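For the unsolved second part, one possible approach (a sketch under my own reading of the requirement, not code running on CDT) is to keep the days in a fixed Monday-to-Sunday order and, for each weekday, link to its most recent occurrence. Days up to today then point at this week, and days after today fall back to last week. The function name get_fixed_week_days() is hypothetical, and it takes the current date as a parameter so it can be checked against a known date.

```php
<?php
/**
 * Build a Monday-to-Sunday list of day links. Days up to and including
 * $today link to this week's archive; days after $today link to the same
 * weekday in the previous week, since there is no content for the future.
 */
function get_fixed_week_days(DateTimeImmutable $today): string
{
    $todayIndex = (int) $today->format('N'); // 1 = Monday ... 7 = Sunday
    $weeklist = '';
    for ($day = 1; $day <= 7; $day++) {
        // Number of days back to the most recent occurrence of weekday $day.
        $daysAgo = ($todayIndex - $day + 7) % 7;
        $date = $today->modify("-{$daysAgo} days");
        $weeklist .= '<li><a href="/' . $date->format('Y/m/d') . '">'
            . $date->format('D') . "</a></li>\n";
    }
    return $weeklist;
}

// Usage: echo get_fixed_week_days(new DateTimeImmutable('today'));
```

Working with DateTimeImmutable instead of subtracting multiples of 86400 also sidesteps the odd off-by-one around daylight saving changes.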

Menu

While I love using the wp_list_pages() function in combination with the amazing MyPageOrder plugin (which lets non-technical site editors control the order of Page hierarchies, and thus the hierarchy of their menuing systems), the CDT menus were too customized to be generated automatically, and needed to exclude a lot of stuff and specialized categories. The menus are a combination of static lists and child category listings, e.g.:

wp_list_categories('title_li=&child_of=5806');

which simply builds a list of all categories that are children of category 5806 (News Focus, in this case).

Handling Old Links

There’s some awesome magic in this section. Geeks only.

One of the big challenges facing anyone planning to migrate from one content management system to another is in not breaking existing links. Not only do thousands of external sites link to specific pieces of CDT content, but hundreds of articles on the site link to other articles on the site. Breaking links and “starting fresh” would not be OK.

In the past when I’ve converted Movable Type sites to WordPress, I’ve created an MT template that generated an apache .htaccess file, redirecting every old URL to the equivalent new URL. It’s an interesting trick, but results in a huge .htaccess that must be parsed on every page request. In the case of CDT, that would have meant a 16,000 line .htaccess, which would have been a serious performance drag. And even then, it would catch only post URLs, not URLs for author pages, category pages, etc.

We needed to do better. I wanted to create a few Apache rewrite rules that would handle everything, but it seemed impossible. The story headlines were staying the same, so the slugs should have stayed the same as well. But for years, the MT site had truncated the headlines into shorter slugs. For example:

OLD
http://chinadigitaltimes.net/2003/10/copyright_the_o.php

NEW
http://chinadigitaltimes.net/2003/10/copyright-the-official-line/

In other words, no amount of parsing the old URLs through Apache would give me the new ones. Then, on a whim, I pasted one of the old URLs into the site. Expecting a 404, I was amazed to find that the right content appeared automagically. Turns out, this capability is part of the amazing work Mark Jaquith has done in 2.3.x with Canonical URLs. With this working, I was able to write a single line into .htaccess that handled nearly every URL on the old site:

RedirectMatch Permanent /(.*)/(.*)/(.*)\.php$
http://chinadigitaltimes.net/$1/$2/$3

Note: The method above will break some plugins, including Flexible Upload (took me forever to figure out why Flexible Upload refused to work). Rather than match everything in the first two path segments like that, dial it in to match exactly what you need. In this case, year/month. So my final solution for this pattern was:

RedirectMatch Permanent /([0-9]*[0-9]*[0-9]*[0-9]*)/([0-9]*[0-9]*)/(.*)\.php$
http://chinadigitaltimes.net/$1/$2/$3

(should be on a single line). A separate but similar issue came up with matching the old author and tag page URLs to new ones. Our MT setup had been using underscores everywhere; the new site was using hyphens. These two lines took care of all old author and tag URLs (again, each of these rules should be on one line):

RedirectMatch Permanent ^/author/([^_]+)_([^_]+)$
http://chinadigitaltimes.net/author/$1-$2

RedirectMatch Permanent ^/tag/(.*)\+(.*)$
http://chinadigitaltimes.net/tag/$1-$2

My actual rules are a bit more complex, as they account for the possibility of tags and authors consisting of 1,2,3 or 4 words. Ask me if you’d like a copy.
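If it helps to see the mapping spelled out, here is the same idea as a small PHP function rather than Apache config. It is an illustration only: the function name cdt_map_old_path() and the sample author and tag names are hypothetical, the exact-length {4} and {2} quantifiers are tighter than the patterns shown above, and replacing every underscore or plus sign is a simplification of my real multi-word rules. Note that it only strips the .php extension from post URLs; resolving the truncated slug to the full one is still left to WordPress’ canonical redirect.

```php
<?php
/**
 * Map an old Movable Type path to its WordPress equivalent, or return null
 * when the path should be left alone (e.g. /wp-admin/upload.php).
 */
function cdt_map_old_path(string $path): ?string
{
    // Post permalinks: /yyyy/mm/slug.php -> /yyyy/mm/slug
    if (preg_match('#^/([0-9]{4})/([0-9]{2})/([^/]+)\.php$#', $path, $m)) {
        return "/{$m[1]}/{$m[2]}/{$m[3]}";
    }
    // Author pages: any number of underscores become hyphens.
    if (preg_match('#^/author/([^/]+)$#', $path, $m) && strpos($m[1], '_') !== false) {
        return '/author/' . str_replace('_', '-', $m[1]);
    }
    // Tag pages: plus signs between words become hyphens.
    if (preg_match('#^/tag/([^/]+)$#', $path, $m) && strpos($m[1], '+') !== false) {
        return '/tag/' . str_replace('+', '-', $m[1]);
    }
    return null; // nothing to redirect
}

// cdt_map_old_path('/2003/10/copyright_the_o.php') gives '/2003/10/copyright_the_o'
// cdt_map_old_path('/wp-admin/upload.php') gives null, so admin URLs are untouched
```

Returning null for anything that doesn’t match is the important part: it is what keeps admin and plugin URLs ending in .php out of the redirect.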

Amazingly, all internal and external URLs on the old site continue to work. I heart WordPress for making this potential show-stopper a walk in the park.

Performance Results

Can you improve performance when moving from a statically generated site to a dynamic environment? You can if the conditions are right. In the case of CDT, publishing times were a nightmare with Movable Type. Search performance was horrible, and the comment spam problem caused such a drag on the server that we’d had to disable commenting altogether. Now, with the site fully tag-enabled, searchable and comment-able, loads are down dramatically and publishing times have dropped dramatically.

A typical day in load averages on Movable Type:

[Chart: server load averages on Movable Type]

A typical day in load averages on WordPress:

[Chart: server load averages on WordPress]

(sorry the first chart is cramped – my load display utility chokes when loads get really high).

Big picture: We’ve launched with a lovely new design, reduced story publishing times by orders of magnitude, been able to re-enable a bunch of features we’d previously had to disable for load reasons, and added new features that were never possible before. The team of authors and editors is in heaven, and I’m considering bringing the site back onto the main J-School server. It’s been a good week.

63 Replies to “Notes on a Massive WordPress Migration”

  1. Amazing!
    I did kinda move from typo3 to wordpress with primariamedias.ro (~700 posts and ~300 pages).
    I also used a lot of regex for file path manipulation. I managed the conversion by exporting all the post data into RSS/XML. One of the major problems was localization: bbPress and WordPress do not understand each other when it comes to localizing an integrated version of the two (again, the WP team didn’t pay enough attention to this).

    Of course, there’s a lot to thank wp team for the plugins system! Anyway, you did really amazing work!!!

    Good luck!

  2. I used to love MT, but man did they drop the ball. We’ve been migrating all the MT installs to WordPress over here too.

    It’s nice not worrying about your mt-search.cgi, mt-comments.cgi, and mt-tb.cgi scripts consuming obscene amounts of system resources.

  3. Thanks for the deep info on the migration, the list of plug-ins used and the special requests of the team! I’d like to give massive kudos for the nerdy details as I am myself a network admin/minor web techie. Cheers!

  4. PJ – Dropped the ball… sort of. But the fundamental problem is the basic CGI architecture. There’s no good way to fix that without throwing it all away and re-building from scratch in another language.

    Where do you work again?

  5. “There’s no good way to fix that without throwing it all away and re-building from scratch in another language.”

    FastCGI addresses a lot of the shortcomings of CGI. Although it’s not perfect, it certainly rises to the level of a “good fix.”

    Paired with a FastCGI-aware httpd (e.g. Lighttpd) and CGI is quite smooth.

  6. Indeed, SixApart recommends moving to FastCGI for sites suffering from performance problems. But that’s just it – you need to tweak your server setup to match the platform, which is annoying. And some (most?) web hosts may not even allow FCGI at all.

  7. Ha! Pretty cool that your MT permalinks worked due to that canonical redirect code. Believe it or not, that wasn’t the actual purpose of that code — it was meant to handle things like plain text e-mail mangling WP URLs (e.g. wrapping in the middle of the post slug). Bonus!

  8. I would be very interested in the “CDT Docs” plugin you mention. Is it available somewhere? Documenting the way to use the site would be a must for the site I maintain.
    Thank you in advance!

  9. @Mark Jaquith – Intentional or no, you’re my hero. Seriously. The partial URL handling was such an unexpected bonus for the migration project. Thanks.

    @Yves – Sure thing – I’ve just bundled it up and put it on my site. Download here. It’s very simple, and very crude. An ideal version would let you use the native WP post editing screen to save documentation info in the db. This is just raw HTML. But gets the job done! Hope you get some mileage out of it.

  10. By the way, since we’ve got so many WP gurus watching this thread, let me throw out a support question: I can’t get the Visual Editor to work at all on the site, even with all plugins disabled. I also can’t get the Flexible Upload plugin to work, even with all other plugins disabled – it displays garbage – and its author has not been able to solve the problem. Finally, the Options | Reading option to enable/disable gzip support won’t stay checked – it just reverts immediately back to unchecked. I have a hunch these three are related, but don’t know how, or where to begin on a fix.

    If you have a clue what might be going on, or can lend a hand, I’d really appreciate it.

  11. I’ve found the visual editor (ie: TinyMCE) to be buggy over the years. In my experience, it’ll work fine in one browser and not another. Bug reports are filed, but because the behavior is so random or can’t be replicated, the problem goes unsolved. Have you tried a wide variety of browsers?

    Regarding Flexible Upload, it’s been a mixed bag for me ever since upgrading to 2.3. There is an images problem for some users with WP 2.3, so maybe that is connected. On my Powerbook/Camino combo, I had the plugin working fine, but on a friend’s XP/IE combo, it didn’t work.

  12. Well, I tracked down the problem today. One of the RedirectMatch lines in my .htaccess file, essential for maintaining old links, was also breaking the Flexible Upload plugin. So I’m working on a way to rewrite that line so it doesn’t affect wp-admin URLs. Definitely not browser-related – this is all server side.

  13. By the way, putting all your redirects in the Apache conf file is going to consume much less system resources than using the .htaccess file for your redirects.

  14. Thanks for taking the time to share this, Scot. Six Apart should really pay attention. It was years ago I followed you into MT world. Then you jumped ship, and this one post really captures so many reasons why. Now if you’ll just post a similar Joomla -> WordPress summary. :-)

  15. Jeb – You followed me into the MT world? Yikes, that’s a weird thing to learn (I feel responsible). I should make clear a few things:

    – MT has its strengths
    – MT really works for people in a position to give it the extra TLC it needs (willing to alter the server environment, etc.)
    – Many of the problems I describe above could have been solved while staying on MT, if I had felt that was the right thing to do.

    As for Joomla –> WordPress: While I did do a conversion like that for the Crestmont School site, they’re such radically different systems that I can’t really speak to it. That conversion was a matter of pasting old content into the new site manually (it was small enough for that to be easy).

    While it’s true that many small sites live in overblown CMSs when they don’t need to be, and would be better served by WP, that’s not true of all sites. There’s a certain level of complexity in terms of hierarchy and layout where WP/MT no longer become the right tools for the job. Then it becomes a question of Joomla vs. Drupal vs. 600 other possible CMSs…

  16. Thanks for sharing this; I saw your email today on the uwebd list. We used to run some MT installations here at UT Law, but like you said comment spam got the best of them a few years back and then they moved to a commercial license, and we eventually migrated everyone away from MT.

    I use WordPress for my personal projects, and love it. Have you ever used WordPress MU as a CMS? I’m considering evaluating it, as it seems like it would be ideal for keeping down the insanity of having multiple WP’s set up all over the place while still being pretty flexible. I’d love to hear some downsides of using MU versus plain-old WP if you know of any!

  17. I’ve deployed WPMU as a multi-blog, but not as a CMS per se. I think it could be a great solution though, and is probably very much underexplored.

    As for handling lots of scattered WP installations, I check them all out with svn then just keep a record as an array in a script I wrote. When upgrade time comes, I’m able to upgrade 50+ installations in about three minutes flat. So it’s really no hassle as long as you stick to the system.

  18. It’s really not fair to blame Movable Type as a CMS for the performance problems caused by a plugin. As to the claim of 15 minutes to publish a single new blog post, I would have to see that to believe it. In any decently designed template set, an update shouldn’t take longer than a matter of 10-20 seconds for something as simple as posting a new blog post. It sounds to me like you built up a lot of dependencies that had to be met with each rebuild.

  19. MikeT: To be clear – I definitely could have improved the situation by optimizing the MT setup, having certain templates generated statically by cron, or as PHP includes, etc. But with all of MT’s hassles and shortcomings, it didn’t seem worth it. We would have migrated to WP even if we hadn’t been facing this particular issue. As for 15 minutes… yep, you had to see it to believe it. It just got progressively worse over the course of two years as the number of tags and tagged entries grew, until the situation was unbearable.

  20. Pingback: botheredByBees
  21. shacker!!!

    how funny to find you by accident as i ponder my static > WordPress image migration problem. good to see you in the mix. And I hope you and the family are more fabulous than ever.

    I have an unholy WordPress crush at the moment. It gets worse every week. I’m sure my friends are plotting an intervention!

    I’m in the middle of setting up WordPress as a CMS for the wads of content in our two main online exhibitions at NSW Migration Heritage Centre. At the moment it’s static and I’m hand sorting all pages. Arggg!

    – Annette from the old days

  22. Annette! How great to hear from you. WordPress is an addiction, no doubt. Is there any problem it can’t solve? I’m having trouble finding examples. :) Looks like you’ve mastered it quickly. Let me know if you have any Qs!

    Antiweb misses you.

  23. Thanks for the offer – I may well have questions. See how I go with my plan to move over all the images.

    As for the addiction… it started when I realised WordPress would make a nice CMS for simple sites (mid last year). It got worse when I realised the power of Custom Field GUI / Fresh Post. Then – just when I thought I might have a chance of resuming a normal life – along comes 2.5! I’m doomed!

    I think I’ve spawned about 10 WP installs in the last month! Nothing much live yet, but I’m frantically moving old clients over so they can look after their own stuff.

    I moved the ‘band’ section of http://songpod.com.au over to WP last year, but I think the time has come for me to move the whole site over.

    I was just thinking of Antiweb last night… lurking is probably an understatement at this point. Is there a word for oblivious, automated lurking? The emails go straight into a folder that I never see. Terrible!

  24. I’ve deployed WPMU as a multi-blog, but not as a CMS per se. I think it could be a great solution though, and is probably very much underexplored.

    As for handling lots of scattered WP installations, I check them all out with svn then just keep a record as an array in a script I wrote. When upgrade time comes, I’m able to upgrade 50+ installations in about three minutes flat. So it’s really no hassle as long as you stick to the system.

  25. Rusya – Your svn checkout array sounds very similar to a script of mine called WP-Mass-Upgrade, which is built expressly for this purpose! And yes, it feels great to be able to upgrade 50+ installs in a few minutes. A far cry from the all-manual days of yore…

  26. I think I’ve spawned about 10 WP installs in the last month! Nothing much live yet, but I’m frantically moving old clients over so they can look after their own stuff.

  27. Hey, I think PHP uses perl-style regexes – so each “[0-9]*” says “Match zero or more [0-9] characters.” Several in a row doesn’t do anything different than a single one – it’s matching any number of digits for year and any number of digits for month. For /yyyy/mm/ I think you want “/([0-9][0-9][0-9][0-9])/([0-9][0-9])/” (or, for bonus points, “/([0-9]{4})/([0-9]{2})/”).

    But consider that what you have is working before making any changes…

  28. Thanks for the write-up. I was especially interested in the description of the rule-based redirects so you were able to avoid a large list of urls.

  29. I would love a copy of your htaccess redirect rules … so that I can improve my methods.

    RedirectMatch Permanent ^/tag/(.*)\+(.*)$

    http://chinadigitaltimes.net/tag/$1-$2

    My actual rules are a bit more complex, as they account for the possibility of tags and authors consisting of 1,2,3 or 4 words. Ask me if you’d like a copy.

    Could you email me a copy please? Thanks so much.

    I typically hard code 301’s for my biggest inbound link pages to leave nothing to chance, what do you think on that?

  30. Data – Sent you a copy of the full .htaccess, hope you find it useful.

    As for hard-coding rather than using .htaccess, I don’t think it has anything to do with “leaving things to chance” but it should have a small performance benefit.

  31. I recently moved a MT blog to WordPress. I used the wordpress plugin “redirection” to handle all of the 301 directs which made it very easy. I was also able to redirect users trying to hit the old MT Search engine, capture and forward their search request to the new wordpress search engine. I have a post with some sample regular expressions up here:

    http://www.iainlbc.com/2010/04/forward-old-movable-type-search-traffic-to-wordpress-search-engine-using-redirection-plugin/

    Cheers and nice post Scot

  32. Laurentiu – If you’re managing 10 installations, be sure and check out WP-Create and WP-Mass-Upgrade. You’ll save yourself a lot of management hassle.

