After Twittering for a few months, I started to feel uncomfortable about not owning my data, and wanted an automated way to store a copy of each Tweet for posterity. Another installation of WordPress would be perfect as a Twitter backup repository. (Alternatively, you could copy all of your tweets to a dedicated category within your main WP installation, but I chose a separate install, since I wasn’t looking for integration with my main blog.)
There were really two problems to solve:
1) Have new Tweets automatically hoovered into the WP backing store.
2) Get all of my older Tweets ported into the system as well.
Here’s the resulting site. It’s not really intended for public viewing – I don’t care if people browse it, but it’s really just a backup system in the form of a WordPress site.
Part 1 is pretty easy; Part 2 was more complicated. Here are recipes for both procedures.
Store new Tweets in a WordPress blog
Obviously, install a fresh copy of WordPress.
Next, install the TwitterTools plugin. Set it to “Create a blog post from each of your tweets” and not much else. By default, it will check for new Tweets every 15 minutes.
Cake – That’s all there is to it! Set it and forget it.
Get all of your old Tweets into WordPress
This part is quite a bit more complex, but you should be able to get through it in 15-30 minutes. The challenge here is that Twitter does not offer an export function, and the API only lets you grab the most recent 20 or so entries. There is, however, a web-based tool called TweetDumpr that gets around the API limit, presumably by surfing Twitter’s “Older” links and scraping the content.
Obtain a dump of your entire Twitter timeline from TweetDumpr. It will arrive in CSV format. Note that TweetDumpr won’t give you your entire timeline if you’ve been Twittering for a while – it can hoover out more tweets than the official API will give you access to, but will only go back in time as far as Twitter’s “Older” option allows. So I’ve got a big gap in my repository. You may be luckier than I was.
You can’t import a CSV file directly into WordPress, so you’re going to need to convert the exported data to XML, then massage and tweak it to match the XML schema of a native WordPress XML export/import file.
Open the CSV file in Excel and pull down File | Save as. Choose the format “Excel 2004 XML Spreadsheet” and save with a different name.
Now you need a valid WordPress XML export file to compare to. The idea is that you’re going to open the two files side by side in a text editor and, with a bit of search/replace-fu, make the Excel XML export assume the same shape and form of a WordPress XML file. Go to your WordPress back-end and click Manage | Export to download a sample export file.
Open the two files in a decent programmer’s text editor (I like TextMate) – any editor that can handle regular expressions will do. The goal is to give the Excel-generated XML file all of the critical elements of a WordPress XML file. Be careful when editing the XML — if you fail to close a container properly, the process will break.
Here’s the basic editing recipe – your mileage may vary. Note that we don’t need to re-create ALL fields from the example file – we just need to make sure the data we actually want to use is in the right format. Our resulting file is going to be a lot simpler looking than the WP sample file you downloaded.
– Delete the top section of the XML – everything from the 1st line down to the Worksheet line.
– From the WP export file, copy everything from the 1st line down to the first channel line. Paste that into the top of the Excel XML file.
– Delete the Table line from the Excel XML file.
– Go to the bottom of the Excel XML file and delete everything from the Worksheet line to the end of the file. Now paste the last two lines of the WP export file at the end of the Excel file.
– Now for search/replace-fu. In the Excel XML file, perform these replacements. Note that WP has a concept of post titles, while Twitter does not, which is why we cram “Untitled” into each title field. We’re also assuming that none of your tweets start with the string “2008-“.
- Replace
<Row>
with
<item><title>Untitled</title><wp:status>publish</wp:status>
- Replace
</Row>
with
</item>
- Replace
<Data ss:Type="String">2008-
with
<Data ss:Type="DateString">2008-
(If you have any posts from 2007, repeat this step for 2007.)
- Replace
+00:00</Data></Cell>
with
+00:00</DateData></Cell>
- Replace
<Cell><Data ss:Type="String">
with
<content:encoded><![CDATA[
- Replace
</Data></Cell>
with
]]></content:encoded>
- Replace
<Cell><Data ss:Type="DateString">
with
<wp:post_date>
- Replace
</DateData></Cell>
with
</wp:post_date>
- Replace
+00:00
with nothing (leave the replacement field empty)
Now for the fancy regex. We need to replace the capital T in the middle of the datestamps with a space. We only want to match a T that’s surrounded by a digit on either side – otherwise we’ll match all capital Ts in your Twitter history. Be sure you’re doing a regular expression search for this step (look for a regex checkbox in your search/replace dialog):
- Replace
(\d)T(\d)
with
$1 $2
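If you'd rather not click through a search/replace dialog a dozen times, the replacements above can be sketched as a small Python script. This is only a sketch: the function name excel_rows_to_wp_items and the years parameter are made up for illustration, and it assumes the Excel XML has the tags butted up against each other exactly as in the search strings above. You still need to do the header and footer surgery by hand first.

```python
import re


def excel_rows_to_wp_items(xml_text, years=("2008",)):
    """Apply the recipe's search/replace steps, in order, to Excel XML rows.

    `years` is a hypothetical convenience parameter: pass ("2008", "2007")
    to repeat the date-marking step for 2007 as the recipe suggests.
    """
    # Each <Row> becomes a WordPress <item> with a placeholder title.
    xml_text = xml_text.replace(
        "<Row>", "<item><title>Untitled</title><wp:status>publish</wp:status>"
    )
    xml_text = xml_text.replace("</Row>", "</item>")

    # Mark date cells so they can be told apart from tweet-text cells.
    for year in years:
        xml_text = xml_text.replace(
            '<Data ss:Type="String">' + year + "-",
            '<Data ss:Type="DateString">' + year + "-",
        )
    xml_text = xml_text.replace("+00:00</Data></Cell>", "+00:00</DateData></Cell>")

    # Remaining String cells hold the tweet text.
    xml_text = xml_text.replace(
        '<Cell><Data ss:Type="String">', "<content:encoded><![CDATA["
    )
    xml_text = xml_text.replace("</Data></Cell>", "]]></content:encoded>")

    # Date cells become the post date, minus the timezone offset.
    xml_text = xml_text.replace('<Cell><Data ss:Type="DateString">', "<wp:post_date>")
    xml_text = xml_text.replace("</DateData></Cell>", "</wp:post_date>")
    xml_text = xml_text.replace("+00:00", "")

    # Regex step: a capital T between two digits becomes a space.
    return re.sub(r"(\d)T(\d)", r"\1 \2", xml_text)
```

Read the half-edited Excel file into a string, pass it through the function, and write the result back out. As with the manual version, a tweet that happens to start with the year string will be mistaken for a date.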
Your modified file should now be ready to import into WordPress. In WP, go to Manage | Import and select the WordPress import type. Navigate to your modified file and give it a shot.
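If Excel isn't handy, a different route (not the one I used) is to skip the spreadsheet step and build the item elements straight from the CSV. Here's a rough Python sketch under some loud assumptions: that the first CSV column is the timestamp and the second is the tweet text, and that timestamps look like 2008-03-01T12:34:56+00:00. The names cdata and csv_to_wp_items are invented for this example, and you would still have to paste the output between the header and footer lines copied from a real WP export file.

```python
import csv
import io
import re

# Assumed timestamp shape at the start of column one, e.g. 2008-03-01T12:34:56+00:00
STAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def cdata(text):
    # "]]>" cannot appear inside a CDATA section, so split it across two sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def csv_to_wp_items(csv_text):
    """Turn CSV rows of (timestamp, tweet text) into WordPress <item> elements."""
    items = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines, header rows, and anything without a leading timestamp.
        if len(row) < 2 or not STAMP.match(row[0]):
            continue
        # Keep the first 19 characters and swap the T for a space:
        # 2008-03-01T12:34:56+00:00 -> 2008-03-01 12:34:56
        post_date = row[0][:19].replace("T", " ")
        items.append(
            "<item><title>Untitled</title><wp:status>publish</wp:status>"
            "<wp:post_date>" + post_date + "</wp:post_date>"
            "<content:encoded>" + cdata(row[1]) + "</content:encoded></item>"
        )
    return "\n".join(items)
```

The csv module takes care of quoted fields, so tweets containing commas survive intact. Check the column order against your own dump before trusting the output.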
I host and backup MT, Gallery, Wiki content for my extended family because I want to know we have it as long as we want. But I often wonder should we all just hand it over to Google (Flickr, Blogger, et al) and being a mini data center. My Google Docs collection has grown meaningful enough that I want to back it all up… I guess with offline. And then my 3 years of gmail… imap it all off?
What is the breakdown of your twitter input? Ex. 40% home Mac, 30% work Mac, 10% iPhone, etc…
that would be.. “stop being a mini data center”
Jeb, I’m with you. See also Web 2.0 Is Sharecropping.
Before I got the iPhone I’d say my Twitter breakdown was 75% home, 25% work (all via Twitterific). Now I’d say it’s close to an even split, though if really busy at work I’ll shut down Twitterific entirely, like I did today.
You?
i’d been thinking about this too, sort of. i wanted to funnel my twitter, flickr, and blog into one social stream on my site. i wrote a little script for that and you can check it out in action.
http://machine501.com/
Nice aggregator, Robert!
Thanks for this post! When I use twitter tools though, it creates the blog post fine – but it populates the header and the body with the same content.
I don’t see a configuration option to remove the header or the body… but I see that your site is much more elegant! How’d you do it? :)
Shripiya – Do you mean the title and the body? There is no concept of a title in a Twitter post. So you need to alter your blog template to simply not show the title – just show metadata for the post, as you see at birdhouse.org/tweets.
Well, the issue is that I am importing into my existing blog and it populates the tweet in both the title and the body – ick! So, I guess I need to figure out how to set up a separate template for just that category…?
Ah, an existing blog… I see the problem. I’d ask at the Twitter Tools site, see what they say.
nice post indeed… but how can you export all your posts from the blog to twitter? i guess this is the hard part… i mean all the posts… the old posts… not only the new posts.
bazha – It doesn’t make sense to post blog entries to twitter since blogs are unlimited length and twitter is limited to 140 chars.
yeah, you can post an excerpt… but now i found twitterfeed..
or you can use twitter updater plugin… for the new posts…
re: “By default, it will check for new Tweets every 15 minutes.”
Not understanding what triggered this 15-minute update, I thought I had to set up a cron job to hit my wp-twitter archive home page to force the Tweet update. Later I discovered (and this is not explained anywhere I could find) that the server of Alex King (the TwitterTools author) is the one that is busy doing this for all of us. So we are registering our WP sites when we enable the plug-in, and his server is hitting ours every 15 minutes to force an update.
Thanks for clarifying that Jeb.
This part is quite a bit more complex, but you should be able to get through it in 15-30 minutes. The challenge here is that Twitter does not offer an export function, and the API only lets you grab the most recent 20 or so entries. There is, however, a web-based tool called TweetDumpr that gets around the API limit, presumably by surfing Twitter’s “Older” links and scraping the content.
You can use the API to step back through a stream 20 tweets at a time – taking “gulps” and processing them. I wrote a python script that deletes a large backlog of old Favorites using this method. Unfortunately it relies on the old password login style which Twitter no longer supports, but you can still see the method at work there.
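The "gulp" approach can be sketched in Python roughly like this. It is not the script mentioned above and it makes no real Twitter calls: fetch_page is a hypothetical callable you would supply yourself, taking a page number and a page size and returning a list of tweets, with an empty list once the history runs out.

```python
def iter_all_pages(fetch_page, page_size=20):
    """Yield every item by requesting successive pages until one comes back empty.

    `fetch_page(page, count)` is a placeholder for whatever API call you use.
    """
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            break  # an empty gulp means we've reached the end of the stream
        yield from batch
        page += 1


# Stand-in for a real API: an in-memory "timeline" of 45 fake tweets.
fake_timeline = ["tweet %d" % n for n in range(45)]


def fake_fetch(page, count):
    start = (page - 1) * count
    return fake_timeline[start:start + count]


all_tweets = list(iter_all_pages(fake_fetch))
```

Because it is a generator, each gulp can be processed (saved, deleted, whatever) before the next one is requested.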