Failures and errata could be thought of as fixtures in the Computer Science landscape. There is no standards body capable of eliminating errors from computer code altogether, no set of tools or mathematical proofs that can guarantee that all of our code is perfect. Software is buggy; that’s just the nature of the beast. We’ll find problems and fix them, but there will always be more. Right?
Yes.
Then why are you writing a blog?
Thus, my question is: can we reduce the number of errors we produce? This is a matter that runs exceptionally close to the heart of Computer Science. In my mind, the best definition of computer science comes from the landmark work of Abelson and Sussman, the Structure and Interpretation of Computer Programs. Sussman suggests that Computer Science as a field is about formalizing notions of process. Just as geometry began with people believing it was merely a tool for describing properties of the earth, and only later came to be understood as the formalization of certain mathematical systems, Computer Science is widely held to be about a certain sort of machine when it is in fact about formalizing the concept of a process. This is really pretty cool.
One might think about information as coming in two forms: code and data. Data is just simple values, like the number 5, the word “Hello”, and so on; it is acted upon by code. Code is the stuff that describes what we do with the data and how we do it. Take, for example, the idea of squaring a number:
int square(int x) { return x * x; }
We have a variable ‘x’, which is a piece of integer data, acted upon by a function ‘square’, which is code. Both are information. It is easy to verify the validity of ‘x’, and just as easy to verify the validity of the function’s result, since both are data. What is difficult is verifying the validity of the code itself. In fact, it has been proven that there can be no general algorithm which, given an arbitrary program and arbitrary input, can determine whether that program will ever finish running. This is known as the ‘halting problem’, and it is a foundational result in the science of computation; closely related results show that deciding whether a program produces a correct answer is just as hopeless in general. The search for errors in our code is, at bottom, an instance of this task, and the impossibility of solving it in general helps explain why finding ways to reduce the number of errors we produce is so central to the field.
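To make the impossibility a bit more concrete, here is a minimal sketch in C of the classic diagonal construction. The names halts() and trouble() are hypothetical, and the oracle is a deliberately naive stub, since a correct general-purpose one cannot exist; the point is only that any fixed answer it gives is contradicted by trouble() itself.

#include <stdbool.h>
#include <stdio.h>

typedef void (*prog)(void);

/* Hypothetical oracle: would p() eventually finish?  The halting problem
   says no correct, fully general version of this can be written; this stub
   is deliberately naive so the sketch compiles. */
static bool halts(prog p) {
    (void)p;
    return true;   /* any fixed strategy is wrong for some program */
}

/* Diagonal construction: do the opposite of whatever the oracle predicts
   about this very function. */
static void trouble(void) {
    if (halts(trouble)) {
        for (;;) { }   /* predicted to halt, so loop forever */
    }
    /* predicted to loop forever, so return at once */
}

int main(void) {
    /* Either answer halts(trouble) could give is contradicted by trouble()
       itself, which is the heart of the undecidability argument.  We print
       the (necessarily wrong) prediction rather than actually run trouble(). */
    printf("oracle says trouble() halts: %d\n", halts(trouble));
    return 0;
}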
Present work
Some work has been done (obviously; as I said, this stuff is foundational and central to the field) on ways to prove programs correct and on ways to reduce the number of errors programmers can introduce. One increasingly common methodology is Test-Driven Development: before you write the code, you produce a full set of tests against a rigid specification, and then forge code that makes those tests pass. Provided the tests cover the entire space of possible inputs and expected outputs, a program can be functionally shown to be correct. Full coverage is rarely achievable, although test-generation software can be highly automated and sweep whole ranges of inputs and acceptable outputs, which makes it feasible for some problems. In most cases a programmer (or team) cannot manage that level of coverage, but the methodology remains useful: tests do not force your code to be written correctly, but they do force you to fix it before it is put into real-world use. A serious issue with the methodology is that many people do not follow it fully, producing low test coverage, and that the tests themselves are typically made of code. There could, therefore, be errors in the tests themselves, or in the software which runs them.
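As a tiny, hand-wavy illustration (reusing the square function from above and a plain assert rather than any particular testing framework), the tests are written first, against the specification, and the implementation only counts as done once they all pass:

#include <assert.h>

int square(int x) { return x * x; }

/* Written before (or alongside) square() itself: the implementation is
   reworked until every assertion holds. */
int main(void) {
    assert(square(0) == 0);
    assert(square(3) == 9);
    assert(square(-4) == 16);   /* negative inputs must still square to a positive value */
    return 0;
}

Of course, even these three assertions come nowhere near covering the full space of integer inputs, which is exactly the coverage problem described above.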
Several other approaches can be used in tandem with test-driven development. Many people use smart pointers or garbage collection to let the computer handle one of the hardest problems in development: memory management. These systems can eliminate the problem of leaked memory, though they typically run less efficiently than programs which manage memory by hand. There are also languages which lack a ‘null’ type, or which at least modify the concept of ‘nullness’ to eliminate certain sorts of errors. Other options include runtime analysis and debugging tools such as Electric Fence, Valgrind, and GDB, which let a programmer observe a program as it runs, and profiling and code analysis tools which can report common sources of errors or deviations from known error-reducing forms. An example of such a form is writing ‘literal == variable’ inside conditional statements, so that the compiler reports an error when you type ‘=’ instead of ‘==’. These are all simply aids, but used in tandem (alongside pair programming, adequate code review, sufficient design time, and thorough specification), the quantity of errors in the code we produce would certainly plummet, perhaps even approaching the ideal.
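To make the ‘literal == variable’ form concrete, here is a small sketch (not taken from any particular codebase); the deliberately broken variants are left in comments so the file still compiles:

#include <stdio.h>

int main(void) {
    int x = 3;

    /* if (x = 5) { ... }  -- a one-character typo: this assigns 5 to x and
       the condition is always true, yet the compiler accepts it quietly.   */
    /* if (5 = x) { ... }  -- the same typo with the literal first is
       rejected at compile time instead of slipping through.                */

    if (5 == x) {   /* the ‘literal == variable’ form */
        printf("x is five\n");
    } else {
        printf("x is not five\n");
    }
    return 0;
}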
There is also fascinating work being done on formal verification of the correctness of particular programs. One team implemented a high-level prototype of an OS kernel in the functional language Haskell, mathematically proved it free of certain classes of error, and then translated the code into C while preserving the properties that allowed the proof to hold.
Future work
Expanding that sort of proven-correct code would be really cool. What is likely to come into existence within our lifetimes, as computers pervade every aspect of society, is a much more formal level of code review. Bugs will only be capable of causing more damage as time passes, and liability must be established. Does liability lie with the company that wrote the code? Does it rest on the individual or team who wrote specific lines? What about the architect who originally conceived of the system, or the people running it who did not understand its use? These are all difficult questions, and one of the best solutions would be to set up a formal, official system which defines the answers. Just as engineers must take ethical and safety training before they are allowed to create systems for public use, and must sign off on blueprints and designs before anything is physically constructed, such a system will have to exist for programmers.
When software designers envisage a system, they will have to research everything that could plausibly go wrong and build in every bit of fault tolerance possible. The closest achievable thing to a formal proof of correctness will have to be produced. Programmers will have to be held individually accountable for the code they produce, and team leads for the code produced under them, all the way up the ladder. This matters because one person’s code is rarely a problem on its own; it is code interacting with other code that tends to cause issues.
Whatever body is formed to establish these standards, oversee the regulation of software production houses, and enforce personal liability will have to be multinational and recognized by the governments of the world; perhaps it will need to sit within something like the United Nations. There are myriad issues with this idea, but those problems can and will be solved, and it would be better for people to lose a little money and time investing in software that is right by default than to preserve our current notion of competition and let what will eventually be billions of people die because of typographical errors in software.
Hypercomputers and the distant future
There is a class of computer beyond the Turing machine, known as the hypercomputer. Hypercomputers operate super-recursively, performing supertasks, and such systems could in principle produce answers to problems that are not currently computable, including (in some formulations) the halting problem itself. If such systems could be built, then determining the correctness of code would be far easier, possibly even trivial. Many theoretical notions of hypercomputers exist, but very few are even remotely physically possible. A particularly interesting realization is that it is an open question whether any naturally occurring processes are capable of acting as hypercomputers, and in particular (and intriguingly) whether the human brain can solve these sorts of super-Turing problems. If we are capable of determining, in general, the correctness of a program (or whether it will halt), then a strong AI might be able to do so as well.
In some distant time, we may have learned a great deal more about the notions of process that we have only begun to understand in the last century or so, much as our understanding of mathematics and geometry deepened over time. In that future, it is possible that notions of process, and the languages used to express them, will have evolved to the point where we can easily determine the correctness of an already-written program. This is plausible because we can already determine the correctness of particular programs today; it is just an extremely time-consuming and difficult process, prohibitively expensive to undertake regularly.
Hopefully, the world will stop being so wrong some day soon.
I’m intrigued by the idea of eventually being able to reasonably and fully test programs. It’ll be an amazing thing.
And, really, love the shipment of fail.