Softpanorama

May the source be with you, but remember the KISS principle ;-)
Home Switchboard Unix Administration Red Hat TCP/IP Networks Neoliberalism Toxic Managers
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Memory allocation functions

  1. Basic Ideas
  2. Programming Examples
  3. De-allocation and memory leakage
  4. Debugging
  5. Meaning of Addresses
  1. Basic Ideas

    ANSI-C defines a group of four functions to allocate and de-allocate blocks of memory on request. These are to be found in the standard libraries, the prototypes are all in <stdlib.h>. They are very widely used by applications either directly or indrectly to create buffers for objects of inherently unpredictable sizes.

    The functions are.

    Function Operation
    void *calloc(size_t n, size_t s) Allocates space for n objects of size s. The space is initialised to all bits zero.
    void free(void *p) De-allocates the space pointed to by p. This must have a value returned by a previous call to calloc, malloc or realloc.
    void *malloc(size_t s) Allocates a space of size s. The initial value is indeterminate.
    void *realloc(void *p, size_t s) Changes the size of space indicated by p by the amount s. If the new area is larger the contents of the new portion are indeterminate.

    All these functions collectively administer a set of memory blocks sometimes called the arena or heap. Internally there is information specifying how large each block is. Freeing blocks may result in amalgamation of smaller blocks. A request for a larger block may result in an underlying software request for allocation of more real memory to the process by the host operating system.

    If a program is using these calls it is important to ensure that code never writes beyond the end of a memory block, this will overwrite other blocks and may corrupt the arena administration data, it will almost certainly result in an obscure program crash. It also important to ensure that all memory blocks obtained using one of the "alloc" calls is released via a "free" call. Failure to do this will result in the program gradually (or rapidly) accumulating an ever larger amount of memory until the system runs out, this is known as a memory leak.

    Under Solaris 2, confusingly, there are several sets of the basic functions, which you get depends on which library you link against. They are the bsdmalloc routines, the malloc(3C) functions and the malloc(3X) functions. They are all ANSI compliant, the 3X routines are space efficient, the bsdmalloc routines are fast and the 3C routines are a compromise. Each group includes some extra non-ANSI functionality that may be of interest.

    The 3C group is the default.

     

  2. Programming Examples

    Here's an example of a program that creates an array and fills it with the character 'x'. The program also reports how much CPU time the various operations took.

    #include        <stdlib.h>
    #include        <time.h>
    main()
    {
            int     i,size;
            char    *base;
            clock_t start,end;
            printf("Enter size of array to create ");
            scanf("%d",&size);
            start = clock();
            base = (char *)calloc(size,sizeof (char));
            end = clock();
            if(base != NULL)
            {
                    printf("Array of size %d created OK at address %p\n",size,base);
                    printf("Allocation took %d milliseconds\n",
                            (int)((end-start)*1E3/CLOCKS_PER_SEC));
            }
            else
            {
                    printf("Not enough memory\n");
                    exit(1);
            }
            start = clock();
            for(i=0;i<size;i++)
                    base[i] = 'x';
            end = clock();
            printf("Filling the array took %d milliseconds\n",
                            (int)((end-start)*1E3/CLOCKS_PER_SEC));
            
    }
    

    There are several interesting points to note.

    Here are some results from running the program to create arrays of 1, 10, 100 and 1000 Megabytes. You may find it interesting to repeat these tests on different systems.

    bash$ a.out
    Enter size of array to create 1000000
    Array of size 1000000 created OK at address 20be0
    Allocation took 70 milliseconds
    Filling the array took 130 milliseconds
    bash$ a.out
    Enter size of array to create 10000000
    Array of size 10000000 created OK at address 20be0
    Allocation took 700 milliseconds
    Filling the array took 1390 milliseconds
    bash$ a.out
    Enter size of array to create 100000000
    Array of size 100000000 created OK at address 20be0
    Allocation took 7020 milliseconds
    Filling the array took 13900 milliseconds
    Enter size of array to create 1000000000
    Not enough memory
    
    The overheads involved in creation of the arrays should be noted as should the time taken to fill the arrays.

    It is tempting when using calloc() to create instances of structure one by one as required by the program logic, this may not be wise as the overheads involved are significant. A useful alternative is to write some sort of wrapper function that maintains a "pool" of such structures and calls calloc() only when the "pool" is empty. Here's some typical code.

    #define	POOLSIZE	100
    
    
    struct	x *new()
    {
    	static	int	nfree = 0;	
    	static	struct	x *base;
    	if(!nfree)
    	{
    		base = (struct x *)calloc(POOLSIZE,sizeof(struct x));
    		if(base == NULL) return NULL;
    		nfree = POOLSIZE;
    	}
    	return base + POOLSIZE - nfre++;
    }
    

    A routine that creates new instances of memory consuming objects on demand in this fashion is often called a constructor.

    A common practice is to use structures to store information about objects, this will typically include descriptive text. The descriptive text needs to be stored somewhere and it is very common practice to calloc() a suitable area of memory. Here's a modified version of the "new" function shown above that does this. This time we're not using a pool of structures but allocating them one by one, this may be worth the overheads if they are going to be subsequently released.

    struct	x
    {
    	char	*text;
    	int	value;
    };
    
    
    struct	x *new(char *s)
    {
    	static	int	nfree = 0;
    	static	struct	x *base,*xptr;
    	char	*sptr;
    	sptr = (char *)calloc(strlen(s)+1,sizeof(char));
    	if(sptr == NULL) return NULL;
    	strcpy(sptr,s);
    	xptr = (struct x *)calloc(1,sizeof(struct x));
    	if(xptr == NULL)
    	{
    		free(sptr);
    		return NULL;
    	}
    	xptr -> text = sptr;
    	return xptr;
    }
    

    Again there are several interesting points here.

  3. Releasing memory space, leaks and garbage collection

    If instances of struct x are being created by the code shown in the previous section, it is very tempting to dispose of them thus.

    	struct	x *instance;
    
    	instance = new("hello world");
    
    	free(instance);
    

    This is wrong as, although it does release the memory occupied by the instance of a struct x it does not release the memory used to store a copy of the associated text. The address of this memory location is lost and it is no longer accessible, however the memory management routines have no way of knowing this. This is a classic example of a memory leak. The programmer should have written :-

    	free(instance->text);
    	free(instance);
    

    It is probably better practice to code a specific destructor function to go with the constructor function here called new(). This does not, of course, deal with the type of memory leak caused by creating an instance of something to deal with an input or some part of processing an input and then not destroying it, perhaps because the processing went through some unusual path through the program. Some C++ implementations, apparently, automatically destroy objects when they pass out of scope, this is of limited use if a pointer to an object is being used.

    In some programming languages, such as Java, the memory management facilities include a facility known as garbage collection. This requires that the memory management facilities periodically examine the whole data space of the program looking for pointers into the arena and checking that all objects in the arena are actually targets of pointers. This further requires that the language supports comparatively strong typing so that such pointers can be readily identified. The overheads involved are significant although Java implementations multi-thread garbage collection with normal activity so garbage collection can proceed while waiting for input. The normal memory management functions supplied with C compilers do not provide garbage collection.

  4. Debugging

    Debugging a program that uses calloc() and free() extensively can be difficult. There are two types of error that need to be tracked down.

    1. Those caused by over-writing the bounds of an allocated memory block. This is almost identical to the well-known C problem of "running off" the end of an array. The resultant damage to the arena will often cause an almost immediate crash.
    2. Memory leaks. This is a longer term problem that simple testing with a small amount of data is unlikely to reveal. Testing against large volumes of data is important.

    Under Solaris 2, dbx has some facilities to deal with such problems although they require linkage against a special set of allocation routines. They can be used periodically to display information about the arena. Solaris also provides a library function mallinfo that provides access to the arena information, this can be wrapped in a simple print function and included at suitable points in the program, or used in conjunction with assertions about the various values (i.e. that the values at the end of execution of a piece of code are the same as those at the start). Here's an example of the sort of code that might be used.

    	struct	mallinfo old,new;
    		.
    		.
    	old = mallinfo();
    		.
    		.
    		.
    	new = mallinfo();
    	assert(old.uordblks == new.uordblks);
    		.
    

    If the number of uordblks has changed, probably indicating that some memory has been allocated but not released the failed assertion will cause a core dump which can then be examined using the debugger. The usefulness of this approach is slightly restricted by lack of information about the significance of the various items of arena information reported by mallinfo().

  5. Addresses

    The addresses returned by calloc() are fairly arbitrary. If two, or more, objects are created by successive calls to calloc() they will not, in general, occupy successive locations as this example shows.

    #include	<time.h>
    main()
    {
    	struct	tm *t[3];
    	int	i,s = sizeof (struct tm);
    	for(i=0;i<3;i++)
    	{
    		t[i] = (struct tm*)calloc(1,s);
    	}
    	printf("Size of struct tm = %x\n",s);
    	for(i=0;i<3;i++)
    		printf("Instance %d base address %x\n",
    					i,t[i]);
    }
    

    The program allocates memory for three instances of a struct tm and reports the addresses. Here is the output.

    bash$ a.out
    Size of struct tm = 24
    Instance 0 base address 209e8
    Instance 1 base address 20a18
    Instance 2 base address 20a48
    

    This shows that the size of a struct tm is 24x whereas the difference between the addresses of the successive locations is 30x, this suggests an overhead of 1210 bytes per allocated piece of memory.

    It is also important to realise that the addresses associated with allocated memory objects are absolute within the program's address space. This means that the following practices are likely to cause severe problems.