Loose notes from SXSW 2008 session “Scalability Boot Camp” with:
Blaine Cook, Architect, Twitter Inc
Jakob Heuser, Architect, Gaia Interactive
Alan Kasindorf, MySQL DBA, SixApart
Sandy Jen, Co-founder, Meebo
Kerry Miller, Writer, passiveaggressivenotes.com
Good tips from diverse perspectives. Everyone on the panel admitted to having made huge scaling mistakes in the past, and to having learned critical lessons from real-world usage patterns.
Don’t let your filesystem be a hero. Distribute data. Let someone else host files for you.
Everyone screws up their db.
Optimize your schema, optimize your queries, add caching.
A few bad queries that take several seconds each can really tangle things up.
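Use the EXPLAIN statement to introspect the db: it shows how a query will be executed (which indexes it uses, roughly how many rows it examines).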
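A rough sketch of what that introspection looks like. The panel was talking about MySQL's EXPLAIN; to keep the example runnable anywhere, this swaps in SQLite's `EXPLAIN QUERY PLAN` from Python's standard library. The `users` table, the `email` column, and the index name are all made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch runs without a server.
# Table, column, and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


query = "SELECT id FROM users WHERE email = 'a@example.com'"

before = plan(query)  # no index on email yet, so expect a full scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # the planner should now search via the index

print("before:", before)
print("after: ", after)
```

The idea is the same in MySQL: run the slow query through the planner, look for full scans, then add an index or rewrite the query and check again.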
Caching layer: Memcache. USE IT!
HiveDB: horizontal data partitioning.
CouchDB / Hypertable: alternate ways of storing data.
Get a MySQL consultant. Small companies don’t need a DBA. A few hours of work will get you off the ground.
One of the best ways to scale your site is to fake it. Push work the user would otherwise wait on to the back-end, so it happens without them.
Smarter, not stronger. The simplest ideas really are quite simple.
- Consistent for current user, not everyone
- Design code for parallel steps
- Cronjobs
Things to Google (tools they use):
- Starling
- Gearman
- TheSchwartz
All these techs are built to be asynchronous.
An amazing amount of your app can be asynchronous too. When you “poke” someone on FB, you just need to return a page to the user immediately. The user shouldn’t have to wait until the actual poke has occurred.
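The "poke" example can be sketched with a plain in-process queue. This is only an illustration of the enqueue-and-return idea, using Python's standard `queue` and `threading` modules; it does not use the Starling, Gearman, or TheSchwartz APIs, and `handle_poke` and `deliver` are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()
delivered = []  # stands in for whatever the slow back-end work produces


def deliver(sender, recipient):
    """Hypothetical slow step (notifications, db writes, and so on)."""
    delivered.append((sender, recipient))


def worker():
    """Background worker: pull jobs off the queue until told to stop."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut down
            jobs.task_done()
            break
        deliver(*job)
        jobs.task_done()


def handle_poke(sender, recipient):
    """Request handler: enqueue the work and answer immediately."""
    jobs.put((sender, recipient))
    return "Poke sent!"


t = threading.Thread(target=worker, daemon=True)
t.start()

response = handle_poke("alice", "bob")  # returns before delivery happens
print(response)

# Demo only: wait for the queue to drain so the result can be inspected.
jobs.join()
jobs.put(None)
t.join()
print(delivered)
```

In production the queue would be a separate service and the workers separate processes, so a slow or failed delivery never holds up the page the user is waiting for.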
Set reasonable limits. Flickr has a limit of 5000 friends. Developers have better things to do than sit around supporting users with 10000 friends.
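A limit like this is usually just a guard at the point where the list grows. A minimal sketch, using the 5000 figure quoted above; the function and exception names are invented for the example.

```python
MAX_FRIENDS = 5000  # the Flickr figure quoted in the session


class FriendLimitReached(Exception):
    """Raised when a user tries to go past the cap."""


def add_friend(friends, new_friend, limit=MAX_FRIENDS):
    """Add new_friend to the set unless the cap has been reached."""
    if new_friend in friends:
        return friends  # already a friend: nothing to do
    if len(friends) >= limit:
        raise FriendLimitReached("limit of %d friends reached" % limit)
    friends.add(new_friend)
    return friends


# Small limit so the behaviour is easy to see.
friends = set()
add_friend(friends, "ann", limit=2)
add_friend(friends, "bob", limit=2)
try:
    add_friend(friends, "cat", limit=2)
    rejected = False
except FriendLimitReached:
    rejected = True
print(sorted(friends), rejected)
```

The cap keeps every query and page that touches the friend list bounded, which is the point of the Flickr example.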
You scale because you grow. Launch slowly, if you can. Incremental feature additions. If you add a new button, let it sit quietly on the site for a while to see how it does – don’t trumpet its existence on your homepage right off the bat.
Frameworks such as Rails don’t always scale easily/neatly.
Languages and frameworks don’t scale; architectures do.
Everything is different. Flickr, LiveJournal, Twitter all look similar, but on the back-end are completely different. Twitter is all about low latency. LJ and Flickr are more about advanced user interaction.
It’s easy to scale in the wrong place. Sometimes there’s no way to know without trying, especially if what you’re doing is unique. Meebo had to get *instant* exactly right. That’s priority #1.
Meebo: Be honest with your users. “We’re learning as we go – please help us out.” rather than “We’re all-knowing and never make mistakes.” It will go over much better.
Business Continuity Plan – Once you cross a given threshold, you can’t do any more business. Understand what volume your current architecture can handle.
Unit testing is good. You can try applying memcached to, say, just one module, or to one module at a time.
It’s very very hard to simulate real-world traffic loads. This is why premature optimization doesn’t work. You have to watch and see how your site really behaves.
Single points of failure – you don’t find them, they find you.
If you’re load balancing, you can try rolling a new feature out on one server but not on another, then A/B the loads.
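One way to read "A/B the loads": collect the same load metric from the server with the feature and the one without, then compare. A small sketch, assuming you already have load samples from each server; the numbers below are invented purely to show the arithmetic.

```python
from statistics import mean


def relative_change(control, treatment):
    """Relative change in mean load of the treatment server vs control."""
    base = mean(control)
    return (mean(treatment) - base) / base


# Made-up load samples (for example, CPU utilisation as a fraction).
control = [0.40, 0.42, 0.38, 0.40]    # server without the new feature
treatment = [0.50, 0.52, 0.48, 0.50]  # server with the new feature

change = relative_change(control, treatment)
print("load change: %+.0f%%" % (change * 100))
```

This only means something if the load balancer sends comparable traffic to both servers, so compare over the same time window.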
If your servers aren’t fast enough, get more. Optimize them later.