conventions are a flawed solution to the code-style problem
In my personal code, I like to write names_like_this, and Types_Like_This, and CONSTANTS, and I indent by 2 spaces. I put {‘s on new lines for functions, but not for control structures. And whenever I have 2+ single-line ifs, I’m unlikely to bother writing {}’s out.
No programmer needs to be told about the nature of the debates on style in code. To any non-technical person reading this: imagine multiple Sheldon Coopers arguing about an issue more esoteric and asinine than he normally bothers with. Except that for us, it does tend to matter, and here’s two reasons why:
If there isn’t a consistent set of rules about code style, things will be harder to read for everyone, and there will be more mistakes. Picture collaboratively written prose where the authors insist on spelling words in their own unique ways. That’s why we need to choose a single style in the first place.
If the style is not the one you naturally think in, you will face at least some period of difficulty reading and writing it. Imagine if we decided english would be written with capitalization inverted from how it is today, and instead of spaces, we’d use ^ marks. tHINGS^WOULD^LOOK^LIKE^THIS,^AND^PEOPLE^WOULD^RIGHTLY^COMPLAIN. It’s jarring and uncomfortable — but you would eventually get used to it.
The mental and temporal cost of choosing a specific style for a new project, imposing a style on written code, or changing styles in an existing project is huge. Builds break, people hunt for inconsistencies for days, people write code and then groan and rewrite code, people use complicated regular expressions that become time-sinks. Code linters tell us about our follies: but we have to fix them. Each wasted second is small, but it’s a break in focus, and it’s huge across a hundred developers over a year.
I propose a Code Style Switcher
I followed the following line of reasoning:
- We can identify elements of code style
- We can specify a set of code style conventions
- We can formalize a set of conventions
- We can automatically detect failures against code conventions
- We can recommend style-fixes <- this is about where existing technology ends
- We can automatically apply style-fixes
- We can read a file in and spit it back out semantically identical, but formatted in a different style
- We can make a system where everyone writes and sees their own style
From there, I thought of a few other cool things — given a code corpus:
- We can detect code conventions
- We can detect personal code conventions
- We can detect ‘problem spots’ where inconsistencies arise
- We can calculate a minimum-effort code convention for a team, if we want to
The key idea is that every developer could write their own code, in a way that makes sense to them. Every developer would see code that makes sense to them. There would be no build-breaks based upon lint failures, no “boyscouting” to bring non-conformant files you touch up-to-snuff. And no fussing in code reviews about bracket placement or which character is used for whitespace (and the quantities of chosen).
This is a big problem, and it’s silly that we haven’t universally solved it, when it’s exactly the sort of problem computers are good at solving. (it’s basically machine learning search and replace).
Implementation
A proof of concept would likely operate on a very small scale with specific configurable elements — e.g. {} placement and indentation. A suitable expansion from that point would be to deal with naming conventions (camelCase versus underscores, _names, TypeNames, etc). From there, the horizon point would be an extensible style language capable of defining a style for any parseable token or combination of successive tokens for a given language, as well as the ability to exclude certain tokens from being processed.
Eventually, the switcher would have to be set up as a plugin for an editor, to allow reading/displaying and writing files in a given style. I’ll likely make a stylesheet generator at some point that shows you “your style” given a piece of code and identifies inconsistencies. The underlying implementation will likely be a c library for the sake of portability, but that may come after prototypes written in higher level languages more suited to string processing or with nice native data structures (perl, python, ruby, or js)
Code would likely be stored on the machine in whatever style the author defined for it, but when committed to a repository, it would likely be wise to somehow reconcile the styles. That could be accomplished with post-commit or post-push scripts, but I’d like to think about that problem more. It’s a long ways down the road. I also need to think more about how to get from detecting a failure to applying a fix. The possibility of semantic breaks is very real, and determining “the safe route” may not be tractable for me.
Anyway, now I’ve just got to do it! I’ll add more posts as I make progress — remember though, I’m pretty busy 🙂
Post-script: I hardly even noticed, but I’m on day 18 now. I need to get a “next thing” figured out fast — at the top of my head are speed-reading, reading philosophy and learning math. I do intend to continue blogging every day to increasing my luck surface-area, also to help in attaining/retaining a producer’s spirit.