
Writing Throwaway Code, Part I: A file at a time using Ruby and Bundler inline


Do you write much throwaway code? I do. I often find myself thinking in code for a bit to try out a new API or dig out some data in or across systems. This isn’t necessarily code I’ll ever share and I may only use it for the next hour or the next few weeks. This is whiteboard, ad-hoc, workbench, junkbox stuff. Sometimes it does useful work, sometimes the use is just from writing it.

My key for throwaway code is to keep it to a single source file, if possible (with the occasional .env thrown in for security, covered below). If the task is more complicated than a single file can hold, it might not actually be throwaway code.

Since we often work with Rails applications, I tend to write this throwaway code in Ruby. It’s an opportunity to sneak ahead and try the latest Ruby releases, new language features, or new gems. I’ve found Ruby to be nice for this: it’s interpreted and scriptable, and its ecosystem has lots of libraries packaged up as gems on RubyGems. Leveraging pre-existing libraries is exactly how I fit functionality into one file.

To manage these dependencies per project I use the Bundler gem. But wait: is a single file with dependencies to manage a project? Dependency managers usually pull in additional package manifests and lockfiles, right? Thankfully, Bundler provides a really nice inline feature that dispenses with the separate Gemfile and lockfile and does just what we need here: it keeps everything in one file.

I’d love to show you some actual throwaway code but unfortunately, it’s often entangled with customers’ proprietary information. So, I’ve had to contrive something. Let’s, um, pull in a list of all US postal codes and play around.

First, of course, I had to find a list of US postal codes. After some Google and GitHub searches, I learned that most libraries actually source their info from the GeoNames data set. So I’m going to the source here as well.

A plan comes together:

Next is Googling about how to crack into a zip file ([stream unzip ruby]). Searching leads to a nice Stack Overflow answer which in turn becomes the first pass of my one-off file.

When it’s time to write the actual code, I begin with what have become customary initial steps:

  • I ensure that I have the latest Ruby installed (using asdf) because single files are a good chance to play with the latest tools.
  • I run gem install bundler.
  • While there’s just one file, I still make a directory for this file to live in. So, mkdir ~/Projects/junk/fetchpostalcodes. Why create a special directory? A containing directory for the file allows me to add source control later, and provides a place to throw related data files. On some projects, I create a junk directory and these one-offs accumulate there.

Finally, I create the fetchpostalcodes.rb file.

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'rubyzip', require: 'zip'
  gem 'pry'
end

require 'open-uri'

def get_zip_codes
  codes = []
  URI.open('https://download.geonames.org/export/zip/US.zip') do |content|
    Zip::File.open_buffer(content) do |zip|
      zip.each do |entry|
        if entry.name == 'US.txt'
          entry.get_input_stream do |is|
            is.each_line { codes << _1 }
          end
        end
      end
    end
  end

  codes
end

codes = get_zip_codes

puts codes.map { _1.split("\t")[4] }.uniq.sort.join(", ")

# google [stream unzip ruby]
# https://stackoverflow.com/questions/33173266/ruby-download-zip-file-and-extract

# https://download.geonames.org/export/zip/
# country code      : iso country code, 2 characters
# postal code       : varchar(20)
# place name        : varchar(180)
# admin name1       : 1. order subdivision (state) varchar(100)
# admin code1       : 1. order subdivision (state) varchar(20)
# admin name2       : 2. order subdivision (county/province) varchar(100)
# admin code2       : 2. order subdivision (county/province) varchar(20)
# admin name3       : 3. order subdivision (community) varchar(100)
# admin code3       : 3. order subdivision (community) varchar(20)
# latitude          : estimated latitude (wgs84)
# longitude         : estimated longitude (wgs84)
# accuracy          : accuracy of lat/lng from 1=estimated, 4=geonameid, 6=centroid of addresses or shape
# US      99553   Akutan  Alaska  AK      Aleutians East  013                     54.143  -165.7854       1

You’ll notice I’m throwing a bunch of comments at the bottom. This is quick throwaway code and keeping notes in the file itself means less switching.

I’m including Pry so it’s easy to drop a REPL sandbox into my code and play around. For example, when putting this together I wasn’t sure what kind of object zip.each do |entry| would yield, so I threw in a binding.pry to see what was up.

    Zip::File.open_buffer(content) do |zip|
      zip.each do |entry|
        binding.pry
        if entry.name == 'US.txt'

That gives me a breakpoint to jump into the running system to play.

$ ruby fetchpostalcodes.rb 

From: /Users/darrend/Projects/journal/journal-single-file-ruby/fetchpostalcodes/fetchpostalcodes.rb:17 Object#get_zip_codes:

    16:       zip.each do |entry|
 => 17:         binding.pry
    18:         if entry.name == 'US.txt'

[1] pry(main)> entry.class
=> Zip::Entry

Now, I can research the rubyzip docs and figure out how to use Zip::Entry. Later, I can throw in a binding.pry after the codes = get_zip_codes line. This allows me to inspect the actual data I’m receiving.

[1] pry(main)> codes.first
=> "US\t99553\tAkutan\tAlaska\tAK\tAleutians East\t013\t\t\t54.143\t-165.7854\t1\n"
[2] pry(main)> codes.last
=> "US\t96863\tFPO AA\t\t\t\t\t\t\t21.4505\t-157.768\t4\n"
[3] pry(main)> puts codes.first
US      99553   Akutan  Alaska  AK      Aleutians East  013                     54.143  -165.7854       1

As you can see, this is a very iterative process for me. I’ll figure out one step, then move to the next. It’s handy to pin results in place, both to speed up each run and to avoid hammering resources like databases or APIs, or the geonames.org website in this case. In other words, cache results. I’ve found that the Lightly gem provides a simple, quick local caching solution. Adding Lightly to our one-off code is easy: add the gem dependency, and then use it.

gemfile do
  source 'https://rubygems.org'
  gem 'rubyzip', require: 'zip'
  gem 'pry'
  gem 'lightly'
end

# replaces the earlier codes = get_zip_codes line
codes = Lightly.get("get_zip_codes") { get_zip_codes }

It speeds up our code on subsequent runs nicely as well.

$ time ruby fetchpostalcodes.rb
, AK, AL, AR, AZ, CA, CO, CT, DC, DE, FL, GA...

real    0m2.699s
user    0m0.917s
sys     0m0.424s

$ time ruby fetchpostalcodes.rb 
, AK, AL, AR, AZ, CA, CO, CT, DC, DE, FL, GA...

real    0m0.750s
user    0m0.506s
sys     0m0.269s
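
For a feel of what a cache like this is doing, here is a minimal hand-rolled sketch that uses only the standard library. It is not how Lightly is implemented; the cached helper, the cache directory, and the slow_fetch stand-in for get_zip_codes are all hypothetical names made up for illustration.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, not part of Lightly: return the value stored on disk
# for key if there is one, otherwise run the block and store its result.
def cached(key, dir: File.join(Dir.tmpdir, 'throwaway-cache'))
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{key}.cache")
  return Marshal.load(File.binread(path)) if File.exist?(path)

  yield.tap { |value| File.binwrite(path, Marshal.dump(value)) }
end

calls = 0
# Stands in for the slow get_zip_codes download.
slow_fetch = -> { calls += 1; %w[AK AL AR] }

cache_dir = Dir.mktmpdir # fresh directory so the first call is always a miss
first  = cached('get_zip_codes', dir: cache_dir) { slow_fetch.call }
second = cached('get_zip_codes', dir: cache_dir) { slow_fetch.call }

puts "block ran #{calls} time(s)"
```

The second call never runs the block, which is the same effect that shows up in the timings above.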

Now I have an array of tab-separated values. How many? Hmm, one second…

puts "#{codes.size} entries to be exact"

$ ruby fetchpostalcodes.rb
, AK, AL, AR, AZ, CA, CO, CT, DC, DE, FL, GA...
41483 entries to be exact

Apparently there are 41,483 entries. Maybe I can turn these into structs for further filtering, or include the sqlite3 and sequel gems and push them into a database. All easy enough.
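
As a rough sketch of the struct idea, something like the following could work. The field names follow the GeoNames column list quoted in the file comments; the PostalCode struct and parse_postal_code helper are hypothetical names, and the two sample lines are copied from the Pry output above so the snippet runs without a download.

```ruby
# Hypothetical struct: one member per column in the GeoNames readme.
PostalCode = Struct.new(
  :country_code, :postal_code, :place_name,
  :admin_name1, :admin_code1, :admin_name2, :admin_code2,
  :admin_name3, :admin_code3, :latitude, :longitude, :accuracy
)

def parse_postal_code(line)
  # chomp drops the trailing newline; the -1 limit keeps trailing empty fields
  PostalCode.new(*line.chomp.split("\t", -1))
end

sample = [
  "US\t99553\tAkutan\tAlaska\tAK\tAleutians East\t013\t\t\t54.143\t-165.7854\t1\n",
  "US\t96863\tFPO AA\t\t\t\t\t\t\t21.4505\t-157.768\t4\n"
]

records = sample.map { parse_postal_code(_1) }
alaska  = records.select { _1.admin_code1 == "AK" }

puts alaska.map(&:place_name).join(", ")
```

With the real data, records would come from codes.map { parse_postal_code(_1) } instead of the sample array.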

Actually, what’s that blank at the beginning of the list? I’ll add a pry and see.

$ ruby fetchpostalcodes.rb
From: /Users/darrend/Projects/journal/journal-single-file-ruby/fetchpostalcodes/fetchpostalcodes.rb:32 :

    31: puts "#{codes.size} entries to be exact"
 => 32: binding.pry

[1] pry(main)> codes.map { _1.split("\t")[4] }.uniq.sort.first
=> ""
[2] pry(main)> codes.map { _1.split("\t") }.select { _1[4] == ""}.first
=> ["US", "09001", "APO AA", "", "", "", "", "", "", "38.1105", "15.6613", "\n"]
[3] pry(main)> codes.map { _1.split("\t") }.select { _1[4] == ""}.last
=> ["US", "96863", "FPO AA", "", "", "", "", "", "", "21.4505", "-157.768", "4\n"]

That’s interesting, I wonder what those are. Military bases? I’m off to Google…
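
Before heading to Google, a quick tally of place names among the blank-state rows is another way to poke at them. This is only a sketch: the sample array holds the three lines seen in the Pry sessions above, rebuilt as tab-separated strings, rather than the full download.

```ruby
sample = [
  "US\t99553\tAkutan\tAlaska\tAK\tAleutians East\t013\t\t\t54.143\t-165.7854\t1\n",
  "US\t09001\tAPO AA\t\t\t\t\t\t\t38.1105\t15.6613\t\n",
  "US\t96863\tFPO AA\t\t\t\t\t\t\t21.4505\t-157.768\t4\n"
]

# The -1 limit keeps trailing empty fields after chomp removes the newline.
rows = sample.map { _1.chomp.split("\t", -1) }

# Column 4 is admin code1 (the state abbreviation); column 2 is the place name.
blank_state = rows.select { _1[4].empty? }
place_counts = blank_state.map { _1[2] }.tally

puts place_counts
```

Against the real codes array, the same tally would show which place names account for the blank entry at the start of the state list.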

I’ve most often used this scheme to generate cross-system reports that use a mix of databases. For example, I have a legacy system using MS SQL Server and a newer replacement system using MySQL. I need to compare data in the two systems as we work on migrations. A single file using the following collection of gems makes tooling together some temporary reporting very easy.

  • mysql2 for, well, MySQL access
  • tiny_tds for MS SQL Server access
  • sequel as a database query and light ORM layer
  • net-ssh-gateway to proxy through bastion hosts to reach the databases
  • tabulo to format the results as a Markdown table
  • pry for interactive REPL debugging
  • dotenv to allow for easy environment configuration and to keep my source clean of secrets

That last gem — dotenv — is one of the few situations where I veer off from the single-file concept. There’s a chance I’ll want to share this source or even put it into version control, so I don’t want to pollute that single source with hardcoded passwords or sensitive client information. Those details get herded into an unversioned .env file.

Final Thoughts

Using one file as an interactive playground is a great way to focus on a task at hand, while freeing yourself from the weight of an entire project. No worries about keeping the code too clean here. This is about getting something done, trying out new libraries and new techniques. The lack of friction and the immediate response makes figuring out a solution quick and fun.

Read the next article in this series: Writing Throwaway code, Part II: Moving to interactive notebooks in Ruby and Visual Studio Code
