Correction: I do know what subsets are, I swear

A few months ago, I wrote a post entitled “A nice way to generate all subsets within python“, and you know what? There were some issues with that post. They were pointed out to me by one Owen Jacobson, and I feel like a dork, but not the good kind – more like a fail-dork, or something. Maybe the word for that is fdork.

Anyway, the issues, in Owen’s words, are:

1. from imports in functions have gone away in Python 3; I’d put those at module scope, instead. The other alternative is to use import collections as c within the function and to qualify references, as with c.Iterable.

(I know, imports within functions are an attractive idea, but the resulting semantics are very odd. The first call to a function might, or might not, alter the interpreter’s global state depending on what other imports have already run, and might, or might not, take significantly longer than usual depending on how quickly imports are resolved! Not fun, as I learned to my detriment.)

2. This seems like a great place to use generators and the yield expression. It’d put the decision about whether or not to hold n! permutations in memory or simultaneously or not out in the caller’s context, where there’s more information to make that decision against, and it’d replace four lines of code with one: return c.permute(args).

3. Neither function actually returns all of the subsets of the set formed by their respective argument lists; they return all permutations, instead.

I would like to briefly respond, and by that, I mean own up.

To the first point: neat! This makes a lot of sense – I have been staying away from reading about Python 3 at all, which is not very wise.

To the second point: that’s really smart. It didn’t even occur to me to write this as a generator, partly because I’m not super familiar with the nitty-gritty of generators in python, but upon seeing this I am all “ohhhh, yeah”.

Finally, to the third point: Owen is right; the code I wrote totally doesn’t produce all subsets, so what gives?!

I believe that my initial thought was to use the ‘all_subsets’ function (which should have been named ‘all_permutations’ from the start) to generate all permutations of an n-bit string, which I would then use to generate all subsets. In the past, I’ve made much uglier approaches to generating every bitstring — particularly by calculating each successive permutation manually. This felt like a nice shortcut! The problem is that I only implemented half of it, and then forgot what my original thought was. And then forgot to actually think about what I was saying.

Lessons learned:

1. Make sure you’ve finished what you set out to do before blogging about it, or at least, make sure your posts and reality are approximately congruent.

2. Even if a technology’s adoption (e.g. Python 3) feels infinitely distant, it isn’t too soon to become acquainted with it and to think about writing code which will be less broken when I have to update it. New technology usually exists to solve problems, and in that respect, often points to smart idioms which I can adopt now.

3. Generators are very sensible things to use in utility functions that produce more than a single piece of data. They allow the program to move along faster, and reduce immediate storage requirements, placing the decision in the hands of the function’s user whether or not holding their full result in memory is necessary. I should read up a bit more about how to use them in Python.

4. Owen’s pretty sharp. Thanks for the awesome response 🙂

Possible Parallel Projects

Tuesday, November 1st: 12:01 PM

I am considering two possibilities for my project: an n-dimensional parallel prefix scan OR a ruby library for pilot for my CIS 3090 project… I’ll list some pros and cons!

n-dimensional parallel prefix scan:
pros:

  • may be less work
  • would be super cool
  • could be really useful
  • teaches me some math
  • teaches me some algorithms

cons:

  • really hard; I might not be able to do this
  • could be a lot of “hidden work”
  • risk doing this “wrong”
  • might just not be very fast!

2. Ruby port of pilot library:
pros:

  • have to get to know pilot intimately
  • easy to incrementalize
  • open source project to maintain
  • get way better at ruby

cons:

  • have to get to know pilot intimately
  • probably a ridiculous amount of work
  • not very “original”

Hmmmm…