C vs Memory

No time for garbage collection

Posted on by

There’s a false belief among young programmers that C is a bad language. I’m writing today to dispel one aspect of that notion: the misbelief that managing memory is hard.

Conventional “wisdom” dictates that modern languages with automatic garbage collection are a necessity. I’ve heard it said that superhuman mental abilities are needed to avoid forgetting to free and reuse free memory, thereby preventing memory leakage.

This view is untrue. The skills needed to manage memory are the same skills every programmer must learn when starting out. Now don’t get me wrong, I enjoy some languages with garbage collection. Go, Limbo, and Oberon come to mind. And it’s not like garbage collection was unheard of when Ken Thompson and Dennis Ritchie invented C. John McCarthy invented Garbage in the late 50s for his LISP programming language. The creators behind COBOL, FORTRAN, ALGOL, bliss, Pascal, BCPL, BASIC, PL/I, RPG, and many more left out garbage collection too because all in all, it’s an expensive solution to a non-problem.

First, I’ll speak about how expensive garbage collection is. There are various algorithms for it, but all of them keep track of two things: where’s my memory, and where’s it being used. The pointer variable has to be known by the system. The system also has to know when the scope of the pointer is exited - when the function to which that pointer is a local variable to, returns, for example.

You may think that when the pointer is no longer used, the memory can’t be either, but you’d be wrong. The function may have set a global variable to the pointer’s value or called another function that set a global variable. The function could have even set a dynamic structure with a pointer that is now pointing to the structure we allocated.

Things are getting sticky, eh?

Not all pointers are local variables in functions. Clearly, some of them are in data structures which are themselves dynamically allocated. They may even point to themselves. Then how does the system keep track of those pointers?

When the system allocates a new structure, it must know the layout of that structure and the location of pointers within. It must then add new members to the structure that keep track of those pointers.

We can’t free memory when the pointer variable simply goes out of scope. We can’t even keep track of the number of references, decrementing them when pointers go out of scope. Plus, structures can point to themselves, so a reference count would never go to zero anyway.

Then, how do we free memory?

With garbage collection, code is executed to free memory when there is not much left. The system kind of stops and takes inventory, frees all the unused storage, and then restarts.

There are a number of different garbage collection algorithms. A common one is “mark and sweep.” All dynamically allocated memory has a hidden variable that is cleared in the beginning of the mark and sweep. Then, using all currently alive pointers, both global and local, the system sets the bits of every dynamically allocated structure that can be referenced from those global and local pointers.

The mark and sweep crawls down any linked list and visits all the branches of any tree structures. Then the system looks at all the allocated memory and sees if the mark is set or cleared. If it’s cleared, that memory is freed.

All of this causes a noticeable pause; annoying in the best of situations. In the case of some real-time programs and operating systems, this pause is a deal breaker. Systems can’t just stop responding for a few seconds.

Oodles of code and time go into providing garbage collection, but it’s a problem that can be avoided with solid coding style.

How do we deal with allocating memory in C? Usually we use the function “malloc” to allocate storage and the function “free” to return it. The original Unix systems (and the Plan 9 system Coraid currently uses) actually use the “brk” and “sbrk” system calls to get memory from the kernel, so we can write our own allocators instead of using “malloc” and friends.

In software design, there is a natural place to allocate memory and a natural place to free it, often in the same function. The memory is allocated early on, and freed when all the work is done. A function that creates a tree node, for example, will allocate the memory. Later, there will be a function to free the entire tree. It’s all pretty simple.

The key to managing memory is well-organized code. A great deal of software programming is managing data and the operations on that data. If knowing when to free memory is hard because of disorganized code, there will surely be bigger problems that have nothing to do with managing memory.

In my 40+ years of coding, I’ve seen a lot of messy code. Usually this was the result of converting code from one form to another until the desired results were reached. This is stream of consciousness software development, and while I write with a stream of consciousness, it’s not how I program.

The best way to design systems is to make a layer of primitives to support the real task at hand. Then write that task. This is true of any language.

C doesn’t have garbage control because we don’t need it. Malloc’ing and freeing memory is just too simple to bother with the extra code, data space, and unexpected delays. It’s just easy.

About the Author

Brantley CoileInventor, coder, and entrepreneur, Brantley Coile invented Stateful packet inspection, network address translation, and Web load balancing used in the Cisco LocalDirector. He went on to create the Coraid line of storage appliances, a product he continues to improve today.

Sign up to have interesting musings delivered direct to your inbox.