Django: Testing for Missing Migrations

When adding or altering model schemas in Django, developers typically generate and commit accompanying migration files. But, counterintuitively, Django wants to track all model changes in migration files, even if they don’t result in database schema changes. Those can be easy to miss.

Regardless of the reason, you never want your repo to be in a state where migrations are detected as needed, but accompanying migration files aren’t committed.

Writing a pytest test that can be triggered on GitHub, Circle, Travis, Jenkins, or whatever you use turns out to be trivial, but I couldn’t find documentation or examples on the interwebs, so I’m posting this here for posterity:

import pytest

from django.core.management import call_command

@pytest.mark.django_db
def test_for_missing_migrations():
    """ If no migrations are detected as needed, `result`
    will be `None`. In all other cases, the call will fail,
    alerting your team that someone is trying to make a
    change that requires a migration and that migration is
    absent.
    """

    result = call_command("makemigrations", check=True, dry_run=True)
    assert not result

It really is that simple! If no migrations are needed, result will be None. If migrations are needed, the command exits with a non-zero status instead of returning, which fails the test. That means someone should run ./manage.py makemigrations to see what’s missing, and commit the results.

New features in django-todo

If you haven’t checked out django-todo for a while, the project has been super-active lately! In the past few months it’s gained support for file attachments, batch-task-import via CSV, and a fully integrated email tracker. Now at version 2.4.6 on PyPI, or check out the live demo site.

I started this project as a small demo more than ten years ago, and it’s evolved into a piece of staple software in the Django ecosystem. Proud of what it’s become!

Announcing django-todo 2.0

django-todo is a pluggable, multi-user, multi-group, multi-list todo and ticketing system – a reusable app designed to be dropped into any existing Django project. Users can create tasks for themselves or for others, or create “assigned tasks” that will be filed into a specific list (public tickets).

That was the original project description, and it hasn’t changed in 10+ years.

When I first created django-todo, it was a simple “let’s learn Django” project I gave to myself. I open sourced it, it’s been relatively successful, and the project has received numerous contributions over the years (grateful!). When I heard that it wasn’t compatible with Django 2.0, I looked back on that old code and realized it was time for a major refactor/upgrade. I’ve been working on the update for the past couple of months (evenings only).

Virtually every module and template has been refactored, much more in line with current best practices. The update started small, but by the end, I had made 75 commits and written the first suite of working tests (finally!). And I adopted Bootstrap as the default layout engine. And finally got around to creating a live demo site for the project.

django-todo 2.0 requires Django 2.0 and Python 3.x – no apologies. Unfortunately, this is a backwards-incompatible update (you’ll need to migrate old data manually, if you have any).

Hope it’s useful to a few teams or individuals out there. Contributions still very much welcome.

View 403, 404, 500 with media in Django DEBUG mode

When working with Django in DEBUG mode, it can be tough to see your 403, 404, and 500 views, since they raise visible stack traces instead of the UX the end user will see. But if you turn DEBUG off, runserver’s local media serving is disabled because it’s designed to work only with DEBUG = True. The solution is scattered throughout the Django docs, and I couldn’t find it compiled into one compact code block anywhere – just reference the handling functions directly from the end of your urls.py:

if settings.DEBUG:
    from django.views.defaults import server_error, page_not_found, permission_denied
    urlpatterns += [
        url(r'^500/$', server_error),
        url(r'^403/$', permission_denied, kwargs={'exception': Exception("Permission Denied")}),
        url(r'^404/$', page_not_found, kwargs={'exception': Exception("Page not Found")}),
    ]

And voila, your 403.html, 404.html, and 500.html templates will be displayed in full glory for developers.

Trix.py – Metadata/Converter for Hunter’s Trix

Hunter’s Trix is an incredible (and very large) collection of “matrix” recordings of some of the best Grateful Dead shows. The series is produced and mixed by Jubal Hunter Seamons and includes CD cover artwork for each volume/show.

A “matrix” involves taking a high-quality soundboard recording and merging (matrixing) it with one or more audience recordings (Auds) of the same show. The resulting matrix brings you the maximum fidelity of the soundboard source and the ambience/electricity of being in the audience at the same time.

There are more than 100 Hunter matrixes being traded as legal torrents on etree.org.

Unfortunately, there are two problems: 1) They’re all in FLAC format, instead of Apple Lossless (ALAC). Since most people use iTunes, this means most people must go through a manual transcoding process; 2) The first 94 shows are missing embedded metadata and cover art (the cover art is beautiful). I’m obsessive about having perfect metadata and cover art in every single track in my collection, which meant manually copying and pasting metadata (including track and disc numbers, show dates and venues, track and album titles, etc.) from text files in the download directory into individual track files. It was taking 20+ minutes to process each album. So I decided to automate the process with this Python script.

I had originally planned to share the completed ALAC versions of the collection back to the community, but Hunter talked me out of it. So I’m doing the next best thing here and sharing the conversion script. With everything installed and working, I was able to cut the processing time down from ~20 minutes per recording to 1 minute. The final results are added to your iTunes collection automagically.

Git it here: https://github.com/shacker/trix

One Codebase, Endless Possibilities: Real HTML5 Hacking

Loose notes from SXSW 2011 session: One Codebase, Endless Possibilities: Real HTML5 Hacking

HTML5 is no question the “buzzword du jour” in tech nowadays, but looking past the vernacular cruft one will discover that the HTML5 technology STACK is actually an incredibly powerful & useful framework for apps well beyond the traditional web browser. Massive companies like Google and Hewlett Packard are placing huge bets on the future of “HTML5 App development”. From HP/Palm’s WebOS to be used in their mobility products to Google’s Chrome OS, HTML5 is not simply another buzzword that can be treated as a mere passing trend, but should actually be taken seriously for app development. But what makes up the HTML5 stack and how will it truly be the future of software? What are the benefits & risks associated with using the HTML5 stack? Prove to me it works. All of these questions & demands will be answered & showcased in the presentation, including important issues such as:

- What constitutes the HTML5 stack
- Benefits of using the HTML5 stack
- Use a single codebase
- Rapidly prototype an app targeting multiple devices, including iPhone, iPad, Android devices, Chrome OS devices, mobile WebKit browsers, and desktop browsers
- Target thousands of developers for extensibility & community development

Python Gift Circle

Holiday Python geekiness…

If your family (or classroom or workplace) does “gift circles,” where everyone buys a gift for exactly one other person in the group, you could do (and probably already do do) the old “pull a name out of a hat” thing. But that takes setup time: writing down names, cutting them out, finding a hat, passing it around… shouldn’t this process be automated? Here’s a little Python script to get it done quick.

On my MacBook, the script runs for ten people in 27 milliseconds – think of all the egg nog you could drink in the time you save!

Populate the “givers” list with real names and run ./gift-circle.py.

Update: This script is now available at GitHub.

#!/usr/bin/env python3
'''
Gift exchange randomizer in Python.
Step through a list of people and, for each member of that list,
select someone else to be a recipient of their gift. That recipient:

    A) Must not be themselves (no self-gifting)
    B) Must not already have been assigned as a recipient

Due to randomization, we can't prevent the possibility that 7/8 of people
will all give to each other, leaving the 8th to give to themselves. Therefore
we keep running the function until we get full distribution.
'''
import random


def give():
    lines = []  # One result line per giver

    givers = ['Leslie', 'Jamie', 'Avis', 'Jim', 'Amy', 'Scot', 'Mike', 'Miles', 'Buford', 'Momo']
    recipients = list(givers)  # Make a copy

    for idx, giver in enumerate(givers, start=1):

        # Grab random person from the remaining recipients
        recipient = random.choice(recipients)

        # If we randomly chose the same recipient and giver (which is certain
        # to happen when the giver is the only un-gifted person left),
        # abandon this attempt so the caller can try again.
        if recipient == giver:
            return False

        # Remove this recipient from the pool and record the pairing
        recipients.remove(recipient)
        lines.append("{idx}: {giver} gives to {recipient}".format(
            idx=idx, giver=giver, recipient=recipient
        ))
    return "\n".join(lines)


# Keep trying until we get through the set with no failures
results = give()
while not results:
    results = give()

print(results)

Output looks like this:

1: Leslie gives to Amy
2: Jamie gives to Leslie
3: Avis gives to Momo
4: Jim gives to Jamie
5: Amy gives to Buford
6: Scot gives to Mike
7: Mike gives to Scot
8: Miles gives to Avis
9: Buford gives to Jim
10: Momo gives to Miles
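
The retry loop works fine for small groups. As an alternative sketch (not the approach the script above takes), you can shuffle the list once and have each person give to the next person in the shuffled order, wrapping around at the end. With two or more distinct names this can never produce a self-gift, so no retries are needed. The trade-off is that it only ever produces one big circle, never smaller sub-circles like Scot and Mike swapping with each other in the output above. The function name `gift_circle` is made up for illustration.

```python
import random


def gift_circle(people):
    """Return a dict mapping each giver to a recipient.

    Assumes the names are distinct and that there are at least two.
    """
    if len(people) < 2:
        raise ValueError("Need at least two people")
    order = list(people)
    random.shuffle(order)
    # Each person gives to the next one in the shuffled order;
    # the last person wraps around to the first.
    return {giver: order[(i + 1) % len(order)] for i, giver in enumerate(order)}


if __name__ == "__main__":
    names = ['Leslie', 'Jamie', 'Avis', 'Jim', 'Amy']
    for idx, (giver, recipient) in enumerate(gift_circle(names).items(), start=1):
        print("{}: {} gives to {}".format(idx, giver, recipient))
```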

Happy holidays, you big nerd!

Shorter URLs with Base62 in Django

Update, 4/2017: See this StackOverflow answer for a different (and probably shorter) approach to this problem.

URL shorteners have become a hot commodity in the age of Twitter, where every byte counts. Shorteners have their uses, but they can also be potentially dangerous, since they mask the true destination of a link from users until it’s too late (shorteners are a malware installer’s wet dream). In addition, they work almost as a second layer of DNS on top of the internet, and a fragile one at that – if a shortening company goes out of business, all the links they handle could potentially break.

On bucketlist.org, a Django site that lets users catalog life goals, I’ve been using numerical IDs in URLs. As the number of items stored started to rise, I watched my URLs getting longer. Thinking optimistically about a hypothetical future with tens of millions of records to serve, and inspired by the URL structure at the Django-powered photo-sharing site Instagr.am, I decided to do some trimming now, while the site’s still young. Rather than rely on a shortening service, I decided to switch to a native Base 62 URL schema, with goal page URIs consisting of characters from this set:

BASE62 = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

rather than just the digits 0-9. The compression is significant. Car license plates use just seven characters and no lower-case letters (base 36), and are able to represent tens of millions of cars without exhausting the character space. With base 62, the namespace is far larger. Here are some sample encodings – watch as the number of characters saved increases as the length of the encoded number rises:

Numeric       Base 62
1             b
22            w
333           fx
4444          bjG
55555         o2d
666666        cN0G
7777777       6Dwb
88888888      gaYdK
999999999     bfFTGp
1234567890    bv8h5u

I was able to find several Django-based URL shortening apps, but I didn’t want redirection – I wanted native Base62 URLs. Fortunately, it wasn’t hard to roll up a system from scratch. I started by finding a Python function to do the basic encoding – this one did the trick. I saved that in a utils.py in my app’s directory.
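
The linked function isn’t reproduced here. A minimal sketch with the same call shape used later in this post, `baseconvert(number, fromdigits, todigits)`, might look like the following. Treat the body as an assumption rather than the original code: `BASE10` is assumed to be the string of decimal digits, and only non-negative integers are handled.

```python
BASE10 = "0123456789"
BASE62 = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def baseconvert(number, fromdigits, todigits):
    """Convert a non-negative number, given as a string of `fromdigits`
    characters, into a string of `todigits` characters."""
    # Decode the input string into an ordinary integer
    value = 0
    for char in str(number):
        value = value * len(fromdigits) + fromdigits.index(char)

    # Zero is represented by the first digit of the target alphabet
    if value == 0:
        return todigits[0]

    # Peel off the least significant digit until nothing is left
    result = ""
    while value > 0:
        value, remainder = divmod(value, len(todigits))
        result = todigits[remainder] + result
    return result


print(baseconvert("333", BASE10, BASE62))   # fx
print(baseconvert("fx", BASE62, BASE10))    # 333
```

Because the alphabets are passed in, the same function decodes as well as encodes, which is handy for sanity-checking the sample table above.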

Of course we need a new field to store the hashed strings in – I created a 5-character varchar called “urlhash” … but there’s a catch – we’ll come back to this.
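
For a sense of how much room a 5-character field gives you, the arithmetic is simple: each extra character multiplies the number of available values by the size of the alphabet. A quick sketch (the helper name `capacity` is made up here):

```python
def capacity(alphabet_size, length):
    """Number of distinct strings of exactly `length` characters."""
    return alphabet_size ** length


# Five base 62 characters versus five decimal digits
print(capacity(62, 5))   # 916132832
print(capacity(10, 5))   # 100000
```

So any ID below 62**5 (about 916 million) fits in five characters, where plain digits would run out at 100,000.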

The best place to call the function is from the Item model’s save() method. Any time an Item is saved, we grab the record ID, encode it, and store the return value in urlhash. By putting it in the save() method, we know we’ll never end up with an empty urlhash field, even if the item gets stored in an unpredictable way. Site users can either create new items or copy items from other people’s lists into their own, for example, and there may be other ways in the future. We don’t want to have to remember to call the baseconvert() function from everywhere when a single place will do. Keep it DRY!

Generating hashes

So in models.py:

from bucket.utils import BASE10, BASE62, baseconvert

...

def save(self, *args, **kwargs):

    # Do a bunch of stuff not relevant here...

    # Initial save so the record gets an ID returned from the db
    super(Item, self).save(*args, **kwargs)

    # Encode the ID once, then save again to store the result
    if not self.urlhash:
        self.urlhash = baseconvert(str(self.id), BASE10, BASE62)
        self.save()

Now create a new record in the usual way and verify that it always gets an accompanying urlhash stored. We also need to back-fill all the existing records. Easy enough via python manage.py shell:

from bucket.models import Item
from bucket.utils import BASE10, BASE62, baseconvert

items = Item.objects.all()
for i in items:
    print(i.id)
    i.urlhash = baseconvert(str(i.id), BASE10, BASE62)
    print(i.urlhash)
    print()
    i.save()

Examine your database to make sure all fields have been populated.

About that MySQL snag

About that “snag” I mentioned earlier: The hashes will have been stored with mixed-case letters (and numbers), and they’re guaranteed to be unique if the IDs you generated them from were. But if you have two records in your table with urlhashes ‘U3b’ and ‘U3B’, and you do a Django query like:


urlhash = 'U3b'
item = Item.objects.get(urlhash__exact=urlhash)

Django complains that it finds two records rather than one. That’s because the default collation for MySQL tables is case-insensitive, even when specifying case-sensitive queries with Django! This issue is described in the Django documentation and there’s nothing Django can do about it – you need to change the collation of the urlhash column to utf8_bin. You can do this easily with a good database GUI, or with a query similar to this:

ALTER TABLE `db_name`.`db_table_name` CHANGE COLUMN `urlhash` `urlhash` VARCHAR(5) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL COMMENT '' AFTER `id`;

or, if you’re creating the column fresh on an existing table:

ALTER TABLE `bucket_item` ADD `urlhash` VARCHAR( 5 ) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL AFTER `id` , ADD INDEX ( `urlhash` )

Season to taste. It’s important to get that index in there for performance reasons, since this will be your primary lookup field from now on.
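
To see why the collation matters, here is a small illustration that uses SQLite (from Python’s standard library) as a stand-in for MySQL. This is an analogy, not MySQL itself: SQLite’s `NOCASE` collation plays the role of MySQL’s case-insensitive default, and `BINARY` plays the role of `utf8_bin`. The table names `loose` and `strict` and the helper `count_matches` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOCASE stands in for MySQL's case-insensitive default collation;
# BINARY stands in for utf8_bin.
conn.execute("CREATE TABLE loose (urlhash TEXT COLLATE NOCASE)")
conn.execute("CREATE TABLE strict (urlhash TEXT COLLATE BINARY)")
for table in ("loose", "strict"):
    conn.executemany(
        "INSERT INTO {} (urlhash) VALUES (?)".format(table),
        [("U3b",), ("U3B",)],
    )


def count_matches(table, urlhash):
    """Count rows in `table` whose urlhash equals the given value."""
    query = "SELECT COUNT(*) FROM {} WHERE urlhash = ?".format(table)
    return conn.execute(query, (urlhash,)).fetchone()[0]


print(count_matches("loose", "U3b"))    # 2
print(count_matches("strict", "U3b"))   # 1
```

The case-insensitive column returns both rows for a single hash, which is the same situation that makes a `get()` lookup fail with multiple results.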

Tweak URL patterns and views

Since the goal is to keep URLs as short as possible, you have two options. You could put a one-character preface on the URL to prevent it from matching other word-like URL strings, like:

foo.org/i/B3j

but I wanted the shortest URLs possible, with no preface, just:

foo.org/B3j

Since I have lots of other word-like URLs, and can’t know in advance how many characters the url hashes will be, I simply moved the regex to the very last position in urls.py – this becomes the last pattern matched before handing over to 404.

url(r'^(?P<urlhash>\w+)/$', 'bucket.views.item_view', name="item_view"),

Unfortunately, I quickly discovered that this removed the site’s ability to use Flat Pages, which rely on the same fall-through mechanism, so I switched to the “/i/B3j” technique instead.

url(r'^i/(?P<urlhash>\w+)/$', 'bucket.views.item_view', name="item_view"),

Now we need to tweak the view that handles the item details a bit, to query for the urlhash rather than the record ID:


from django.shortcuts import get_object_or_404
...

def item_view(request, urlhash):
    item = get_object_or_404(Item, urlhash=urlhash)
    ...

It’s important to use get_object_or_404 here rather than objects.get(). That way we can still return 404 if someone types in a word-like URL string that the regex in urls.py can’t catch due to its open-endedness. Note also that we didn’t specify urlhash__exact=urlhash — case-sensitive lookups are the default in Django queries, and there’s no need to specify the default.

If you’ve been using something like {% url item_view item.id %} in your templates, you’ll obviously need to change all instances of that to {% url item_view item.urlhash %} (you may have to make similar changes in your view code if you’ve been using reverses with HttpResponseRedirect).

Handling the old URLs

Of course we still want to handle all of those old incoming links to the numeric URLs. We just need a variant of the original ID-matching pattern:

url(r'^(?P<item_id>\d+)/$', 'bucket.views.item_view_redirect', name="item_view_numeric"),

which points to a simple view item_view_redirect that does the redirection:


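from django.core.urlresolvers import reverse  # lives in django.urls on Django 1.10+
from django.http import HttpResponseRedirect

def item_view_redirect(request, item_id):
    '''
    Handle old numeric URLs by redirecting to new hashed versions
    '''
    item = get_object_or_404(Item, id=item_id)
    return HttpResponseRedirect(reverse('item_view', args=[item.urlhash]))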

Bingo – all newly created items get the new, permanently shortened URLs, and all old incoming links are handled transparently.
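
As a quick sanity check that the two patterns don’t step on each other, the regexes can be exercised with Python’s `re` module outside of Django. This is only a sketch: the `route` helper is hypothetical and is not how Django’s resolver works internally, and the `item_id` group name is taken from the `item_view_redirect` signature.

```python
import re

# The two patterns from urls.py (Django matches against the path
# without its leading slash)
HASH_PATTERN = re.compile(r'^i/(?P<urlhash>\w+)/$')
NUMERIC_PATTERN = re.compile(r'^(?P<item_id>\d+)/$')


def route(path):
    """Return (view_name, captured_value) for a path, or None if no match."""
    match = HASH_PATTERN.match(path)
    if match:
        return ("item_view", match.group("urlhash"))
    match = NUMERIC_PATTERN.match(path)
    if match:
        return ("item_view_redirect", match.group("item_id"))
    return None


print(route("i/B3j/"))   # ('item_view', 'B3j')
print(route("1234/"))    # ('item_view_redirect', '1234')
print(route("about/"))   # None
```

With the “i/” prefix in place, a word-like path such as “about/” matches neither pattern, so it can fall through to Flat Pages or a 404.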

Loose Notes from Djangocon 2010

It’s been inspiring to watch the growth of the Django developer community, and the increasing traction the platform is getting from high-profile sites. NASA, The Onion, Washington Post, Mozilla, PBS, and many other prominent organizations are discovering the power of deploying on a pure Python framework, rather than on an opinionated CMS written in PHP that gets in your way as much as it helps. I was lucky to attend the first Djangocon at Google headquarters a couple of years ago, and lucky again to be able to attend the conference in Portland, OR this year.

Three solid days of panels on topics that ran the gamut from low-level, detail-oriented sessions like tips on working with forms to high-level recommendations from experts on things like scaling to high-traffic situations, automating the deployment process, and what could be done better. As with any conference, 3/4 of the value is in the panels, and the other 1/4 is in the networking – meeting and talking with people working with the same toolchains, exchanging tips and helping one another. I learned a ton this year. There were surprises, too – from everyone getting their own pony in their shwag bag to the visit from Oregon congressman David Wu, to the realization that I wasn’t the most junior developer in the room, to the discovery that you could get full to the point of bursting at a vegan restaurant.

About that pony: It all started during a discussion on what features should go into the next version of Django, when someone said “I want a pony!” The feature under discussion was delivered, and the person got their pony. That led to the creation of playful sites like djangopony.com and My Little Django. Hilarious at the time, but honestly, I think the meme has played itself out, and may have just jumped the shark with everyone getting their own pony this year. I love the pony, and I love my new Pinky Pie, but I’m ready for the meme to go away now.

While most sessions were highly technical, one of the highlights was the keynote presentation by Eric Florenzano of the Djangodose podcast, “Why Django Sucks (And How We Can Fix It)” (video | slides). The talk generated some controversy, but that’s healthy and good. It was refreshing for its honesty, and it was forthcoming with actual proposed solutions on most points. Django appeals to enterprise in part because it takes a conservative approach toward change, but the community around the platform must stay on its toes to remain competitive and forward-thinking.

Newly launched: whydjango.com – to become a collection of case studies explaining why Django is a good fit for organizations and enterprises. I plan to submit case studies for the Graduate School of Journalism and the Knight Digital Media Center soon.

Took copious notes at most of the sessions, but have only edited them lightly – apologies for typos and incomplete sentences. And sorry this is so long! (I didn’t have time to make it shorter). Downloadable slides from many of the talks are available here. And of course I only attended half of the sessions by definition. Full list of sessions here. Want to watch the whole thing? Videos of the sessions are already up!

Batch-deleting Twitter favorites

Update: This script broke when Twitter started requiring all API interactions to use OAuth. The script would have to be made far more complex to support this change, and I have no plans to do so. My current advice: Leave your favorites in place. They don’t hurt anything. Think of them like Facebook “Like”s – you wouldn’t try to delete those, would you?

I read Twitter primarily on the iPhone, and find tons of great links I want to read in a proper browser later on (I personally find reading most web sites on an iPhone to be more hassle than it’s worth). Perfect solution: Side-swipe an item in Tweetie and tap the star icon to mark it as a favorite. Later, visit the Favorites section at twitter.com to follow up.

Unfortunately, over the past couple of years I’ve favorited way more things than I’ll ever have time to read. As of now, I’ve got 1600 favorites waiting to be read. Ain’t never gonna happen. I declare Twitter Favorite bankruptcy! Needed a way to batch-unfavorite the whole collection, and twitter.com doesn’t provide a tool for that.

Ended up writing a script on top of the Tweepy library to get the job done:

Twitter Favorites Bankruptcy