Like most devs, I often need to generate tons of fake data to fill out models – addresses, usernames, “news” content, locations, IP addresses, you name it. Data-mocking libraries are invaluable for creating data factories for use in unit tests, and for populating a site under development with realistic content. For Python, I use the Faker library in conjunction with FactoryBoy.
Recently I was working on a site that required generating a lot of maps, and realized that randomly chosen geographical coordinates most often landed in the ocean (since Earth's surface is mostly water). The right way to solve this was to include a hash of locations known to exist on land and then pull randomly from those, so I started a pull request, which evolved over the past few weeks into a whole new geo module for the lib.
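To make the idea concrete, here's a minimal sketch of the two approaches side by side. It is not the code from the pull request: `LAND_COORDS`, `random_point_anywhere`, and `random_point_on_land` are names I made up for illustration, and the four sample locations are just a stand-in for a real dataset.

```python
import random

# Hypothetical sample of locations known to be on land:
# (latitude, longitude, place name). A real dataset would be much larger.
LAND_COORDS = (
    ("48.85341", "2.3488", "Paris"),
    ("-1.28333", "36.81667", "Nairobi"),
    ("35.6895", "139.69171", "Tokyo"),
    ("-33.45694", "-70.64827", "Santiago"),
)


def random_point_anywhere(rng):
    """Naive approach: any latitude/longitude pair, usually over water."""
    return (rng.uniform(-90, 90), rng.uniform(-180, 180))


def random_point_on_land(rng):
    """Pick one of the known land locations instead of inventing a point."""
    return rng.choice(LAND_COORDS)


rng = random.Random()
print(random_point_anywhere(rng))  # could be anywhere, often ocean
print(random_point_on_land(rng))   # always one of the known entries
```

The trade-off is that you only ever get points from the list, so the list needs to be big enough that the results don't look repetitive.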
I started with the open source data sets at GeoNames, converted one of their databases into a Python tuple, then extracted every fifth entry to keep the dataset down to a manageable size. Pretty certain I’ve got every country on earth represented, but let me know if you find any missing.
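Keeping every fifth entry is a one-line slice in Python. The sketch below uses made-up records (not real GeoNames rows) and a hypothetical `missing_countries` helper to show one way to check whether thinning the data dropped any country entirely.

```python
def thin(records, step=5):
    """Keep every `step`-th record, starting with the first."""
    return tuple(records[::step])


def missing_countries(full, sampled):
    """Country codes present in the full data but absent after thinning."""
    return {rec[3] for rec in full} - {rec[3] for rec in sampled}


# Made-up records: (latitude, longitude, place name, country code).
# 23 entries cycling through three placeholder country codes.
codes = ("AA", "BB", "CC")
records = [
    (f"{i}.0", f"{-i}.0", f"place-{i}", codes[i % 3]) for i in range(23)
]

sampled = thin(records)
print(len(records), "->", len(sampled))
print("countries lost:", missing_countries(records, sampled))
```

A country with only a handful of entries can easily fall between the slice steps, so a check like this is a cheap way to confirm nothing disappeared.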
I ended up bringing geographic features from other Faker modules into the new geo module, and added the ability to specify the country you want random land coords for. As of this morning, the pull request is merged!
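Restricting results to one country is just a filter on top of the same lookup. Here's a rough sketch of how that could work, assuming each entry carries a two-letter country code; `land_coords_for` and the sample data are hypothetical and not Faker's actual API, so check the geo provider docs for the real method names.

```python
import random

# Hypothetical sample: (latitude, longitude, place name, country code).
LAND_COORDS = (
    ("48.85341", "2.3488", "Paris", "FR"),
    ("43.29695", "5.38107", "Marseille", "FR"),
    ("-1.28333", "36.81667", "Nairobi", "KE"),
    ("35.6895", "139.69171", "Tokyo", "JP"),
)


def land_coords_for(country_code, rng=random):
    """Return a random known land location in the given country.

    Returns None when no entry matches the country code.
    """
    matches = [loc for loc in LAND_COORDS if loc[3] == country_code]
    if not matches:
        return None
    return rng.choice(matches)


print(land_coords_for("FR"))  # one of the two French entries
print(land_coords_for("ZZ"))  # no such country in the sample
```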