My take-away from TEDxGuelphU

Water is amazing. Artists can have some great insights to share. Cartwheels fill time like nothing else. Dancing is a great equalizer. Straws are weak (in the face of elastics). Hagfish slime is cool. It’s worth thinking about how to change the world. Science, economics, happiness, conflict, the beautiful world itself, and money all have tremendous and difficult-to-understand roles to play in accomplishing complex positive changes, but through our connected nature and a collective sense of shared responsibility, we might just have a chance.

I extend my thanks to all you speakers, guests, committee members, tech crew members, hospitality staff, and everyone else who helped make sharing ideas possible. Good show. Now let’s not relax too long, because the world won’t wait around.

Correction: I do know what subsets are, I swear

A few months ago, I wrote a post entitled “A nice way to generate all subsets within python”, and you know what? There were some issues with that post. They were pointed out to me by one Owen Jacobson, and I feel like a dork, but not the good kind – more like a fail-dork, or something. Maybe the word for that is fdork.

Anyway, the issues, in Owen’s words, are:

1. from imports in functions have gone away in Python 3; I’d put those at module scope, instead. The other alternative is to use import collections as c within the function and to qualify references, as with c.Iterable.

(I know, imports within functions are an attractive idea, but the resulting semantics are very odd. The first call to a function might, or might not, alter the interpreter’s global state depending on what other imports have already run, and might, or might not, take significantly longer than usual depending on how quickly imports are resolved! Not fun, as I learned to my detriment.)

2. This seems like a great place to use generators and the yield expression. It’d put the decision about whether or not to hold all n! permutations in memory simultaneously out in the caller’s context, where there’s more information to make that decision against, and it’d replace four lines of code with one: return c.permute(args).

3. Neither function actually returns all of the subsets of the set formed by their respective argument lists; they return all permutations, instead.

I would like to briefly respond, and by that, I mean own up.

To the first point: neat! This makes a lot of sense – I have been staying away from reading about Python 3 at all, which is not very wise.
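A minimal sketch of what that module-scope style looks like (with one wrinkle worth knowing: in modern Python, Iterable lives in collections.abc rather than collections):

```python
# Module-scope import: resolved once, at load time, so calling the
# function never triggers import machinery or touches global state.
from collections.abc import Iterable

def is_iterable(obj):
    # The ABCs moved to collections.abc in Python 3.3; the bare
    # collections.Iterable alias was removed entirely in 3.10.
    return isinstance(obj, Iterable)
```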

To the second point: that’s really smart. It didn’t even occur to me to write this as a generator, partly because I’m not super familiar with the nitty-gritty of generators in python, but upon seeing this I am all “ohhhh, yeah”.
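For the record, the itertools spelling is permutations rather than permute, and it already returns a lazy iterator, so the one-liner might look like:

```python
import itertools

def all_permutations(args):
    # itertools.permutations yields one tuple at a time, so the caller
    # decides whether all n! results ever live in memory at once.
    return itertools.permutations(args)
```

Wrapping it in list() materializes everything at once, while a for loop over it holds only one tuple at a time – exactly the caller-side choice Owen described.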

Finally, to the third point: Owen is right; the code I wrote totally doesn’t produce all subsets, so what gives?!

I believe that my initial thought was to use the ‘all_subsets’ function (which should have been named ‘all_permutations’ from the start) to generate all permutations of an n-bit string, which I would then use to generate all subsets. In the past, I’ve taken much uglier approaches to generating every bitstring — particularly calculating each successive bitstring manually. This felt like a nice shortcut! The problem is that I only implemented half of it, and then forgot what my original thought was. And then forgot to actually think about what I was saying.
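For completeness, the bitstring idea does finish into a working subsets function. A sketch of what I presumably meant to write, treating each integer from 0 to 2^n − 1 as a membership mask (and written as a generator, per point two):

```python
def all_subsets(items):
    items = list(items)
    n = len(items)
    # Each integer in [0, 2**n) is an n-bit string: bit i set means
    # items[i] belongs to this subset, so the loop covers every subset
    # exactly once.
    for mask in range(2 ** n):
        yield tuple(items[i] for i in range(n) if mask & (1 << i))
```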

Lessons learned:

1. Make sure you’ve finished what you set out to do before blogging about it, or at least, make sure your posts and reality are approximately congruent.

2. Even if a technology’s adoption (e.g. Python 3) feels infinitely distant, it isn’t too soon to become acquainted with it and to think about writing code which will be less broken when I have to update it. New technology usually exists to solve problems, and in that respect, often points to smart idioms which I can adopt now.

3. Generators are very sensible things to use in utility functions that produce more than a single piece of data. They let the program start consuming results sooner and reduce immediate storage requirements, placing the decision about whether the full result ever needs to be held in memory in the hands of the function’s caller. I should read up a bit more on how to use them in Python.

4. Owen’s pretty sharp. Thanks for the awesome response 🙂

Possible Parallel Projects

Tuesday, November 1st: 12:01 PM

I am considering two possibilities for my CIS 3090 project: an n-dimensional parallel prefix scan OR a Ruby library for Pilot… I’ll list some pros and cons!

1. n-dimensional parallel prefix scan:
pros:

  • may be less work
  • would be super cool
  • could be really useful
  • teaches me some math
  • teaches me some algorithms

cons:

  • really hard; I might not be able to do this
  • could be a lot of “hidden work”
  • risk doing this “wrong”
  • might just not be very fast!

2. Ruby port of the Pilot library:
pros:

  • have to get to know pilot intimately
  • easy to incrementalize
  • open source project to maintain
  • get way better at ruby

cons:

  • have to get to know pilot intimately
  • probably a ridiculous amount of work
  • not very “original”

Hmmmm…
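For anyone unfamiliar with the first option: a prefix scan takes a sequence and an associative operator and produces every running partial result. A minimal sequential sketch (the function name is mine; the parallel algorithms compute the same thing in O(log n) steps by combining partials in a tree):

```python
def inclusive_scan(xs, op=lambda a, b: a + b):
    # result[i] = xs[0] op xs[1] op ... op xs[i]
    result = []
    acc = None
    for x in xs:
        acc = x if acc is None else op(acc, x)
        result.append(acc)
    return result
```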

A nice way to generate all subsets within python.

Disclaimer: this post has some lousy code and a serious factual error. Throughout, I am speaking about a method of generating all permutations of a set, while saying that I’m speaking about generating all subsets of a set! I have written a newer post which integrates content from a comment by Owen Jacobson, along with a short response explaining just what the heck I was thinking. This post has been left intact aside from this disclaimer for the sake of posterity!

Recently on hackernews, I saw this amazing bunch of vimfu (learned some nifty tricks from it too!), and so I went to check out Jim Dennis’ other answers. Nothing quite so popular, but some worthy reads about “what qualifies a person as a kernel developer” and “can I run a .so compiled in red hat under ubuntu?” – the one that inspired this post, though, is the classic “how to generate all subsets of a set?”

Jim suggests using python’s itertools module (in addition to bitmasking on the subset-representation), and I’ve never had cause to look at itertools before. Boy, how I wish I had! I quickly whipped up the following (after reading about argument packing/unpacking):
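The snippet itself was presumably something along these lines, reconstructed from the description that follows (with the original subsets-versus-permutations bug left intact):

```python
from itertools import permutations

def all_subsets(*args):
    # Argument packing: all_subsets(1, 2, 3) arrives as args == (1, 2, 3);
    # a lone iterable argument (a list, string, dict, ...) is unpacked
    # so that all_subsets([1, 2, 3]) behaves the same way.
    if len(args) == 1:
        args = tuple(args[0])  # raises TypeError for a lone non-iterable
    # Despite the name, these are permutations, not subsets -- the very
    # bug the disclaimer above owns up to.
    return list(permutations(args))
```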

There are probably reasons why it’s ugly, and certainly ways to do it better. What I like about it, though, is that it handles pretty much anything you throw at it – aside from a single non-iterable object, I suppose! So calling “all_subsets(1,2,3,4)”, or “all_subsets(‘funny’)”, or “all_subsets([1,2,3,4])”, or “all_subsets({‘a’:1,’b’:2})” all work and spit out a list of tuples of the permutations.

A possible extension would be to convert back into the form originally provided – but I think that a standard form of return may be more useful. Regardless, this is a neat tool that I will keep in my box.

Edit: I made a modification to the above, so that it can work with any iterable or non-iterable object:
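That modified version was presumably a matter of guarding the unpacking step, along these lines:

```python
from itertools import permutations

def all_subsets(*args):
    # As before, a lone argument is unpacked if it is iterable; the
    # try/except lets a single non-iterable like all_subsets(1) through.
    if len(args) == 1:
        try:
            args = tuple(args[0])
        except TypeError:
            pass  # not iterable: keep the one-element args tuple as-is
    return list(permutations(args))
```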

Wyatt is an Intern: Part 2

I am one month into my internship, and I’ve realized that I’m enjoying it.

Alright, I’ve known all along that I’m enjoying this. But it feels like a considered judgement now. This isn’t the awe of new things, like “wow, I enjoy working” or “ooh, pretty city” – it’s a genuine appreciation of having a good thing going on.

I work on WebKit. That’s pretty cool. If you already know all about how WebKit works, you can go find out what stuff I’ve patched by looking around in the bugzilla. If you don’t know how that stuff works, then I don’t expect you to care – it’s not very important anyway. I’ve only done test conversions so far, and a few other small things. I’m taking some bigger steps now, as I try to make a few actual code changes in the editing section – a cool 33 thousand lines of code across 60 files. Ramping up to speed on that is what this post is about.

My approach to this is holistic, which might be stupid of me. My fundamental assumption is that the better a programmer understands the system they are working within, the better a job they are able to do. I suppose that for an ideally modular system in a perfect world that might not hold, but I think it does here. I invite debate on the topic, but ask that you bring a) evidence which suggests I am wrong or b) viable alternatives to what I’ve said. If we simply disagree, we won’t be productive 😛

So, the “holistic approach”. I’ll define that as something akin to top-down, or biggest items first. Within the editing code, there’s a class ‘Editor’. It’s a pretty sizeable class – between its headers and implementation, it’s 3-4 thousand lines long. There are a lot of other, smaller classes, too. What my approach means is that I’ll read Editor’s high-level definitions, then all of its methods, and as I find things which I don’t understand, branch into them and do the same. This search is a sort of optimised depth-first search — only when something seems vital, common, and confusing will I break my task and go investigate it.

In this way, you can get a large chunk of understanding in a go. I spent 10 hours or so at the end of last week reading Editor.h/Editor.cpp and branching into some other important-seeming code like Page, Frame, Node, EditorClient, and others. It seems to have gone pretty well. What seemed at first like a very confusing and scary bunch of code is now beginning to fit a mental model: the Editor has a Page and a Frame and manipulates Nodes, and has all of the “policy information” about editing actions. The editing itself is done by the EditorClient, which is implemented separately by each of the different platforms that have ported WebKit. This large-scale picture makes the task feel much less daunting. I’ll draw an analogy.

Imagine you have a large puzzle with small pieces. The biggest clue for solving it is looking at the picture on the box. With that frame of reference, you can identify the large objects you have to work with. That’s the first step in the recursion: you broke one big problem down into a group of smaller problems. Now the work is to treat each of these smaller problems in the same way: find a key feature or element, and identify subelements in relation to one another. Eventually, you’ll be working at the granularity of single puzzle pieces. Stepping back to code — you’ll eventually reach the granularity of single lines.

I’d like to hear people’s strategies for understanding large, complex, and at times roughly defined systems. I’m very interested in education, and at a certain level of abstraction, I think that all attempts to learn things are attempts to construct mental models which help us navigate complex systems. So I implore you – share your insights 🙂