Softpanorama

May the source be with you, but remember the KISS principle ;-)
Home Switchboard Unix Administration Red Hat TCP/IP Networks Neoliberalism Toxic Managers
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

A symbol table

News Lexical analysis Best books Recommended Links Searching Algorithms   Humor Etc

A symbol table is a data structure used by a compiler to keep track of scope/ binding information about names. Compiler can use a single symbol table or multiple symbol tables. The latter is an option if compiler is written in the language with dynamic storage allocation.

This information in symbol table is created by lexical analyser and later enhanced by parser and code generator. is used in the source program to identify the various program elements, like variables, constants, procedures, and the labels of statements.

During lexical phase the symbol table is searched every time a name is encountered in the source text. When a new name discovered a new entry is added. If new information about an existing name is discovered, the content of the corresponding record in the symbol table is updated.  Therefore, a symbol table must have an efficient lookup mechanism for accessing the information about identifies and other elements of the program.  Additions to the symbol table is just a small fraction of lookups.

Each entry in a symbol table can be implemented as a record that consists of several fields.

In block structured languages the information about a name depends on the block in which the name is encounted. For example, in C , the same name can be used as a variable name and as a member name of a structure, both in the same block.

That means that the record created by lexical analyzer can later be split into multiple records when the declarations are processed.

One solution is to delay creation of records till the when the identifier syntactic role is discovered. In such cases, the lexical analyzer only returns the string representing the name to the parser, rather than a pointer to the symbol table record.

If there is a upper bound on the length of the identifiers (say 32 characters look reasonable), then the name can be stored in the symbol table record itself. But if there is no such limit, then an indirect scheme of storing name is used. A separate array of characters, called a "string table," is used to store the name, and a pointer to the name is kept in the symbol table record. This idea was pioneered by XPL.

The information about the address assigned to the identifier is also kept in the symbol table. If the compiler is going to be generating assembly code, then the assembler takes care of the storage allocation. If not then the compiler should have appropriate capabilities.

The unsorted list

An unsorted linear list of records is the easiest way to implement a symbol table. The new names are added to the table in the order that they arrive. Whenever a new name is to be added to the table, the table is first searched linearly or sequentially to check whether or not the name is already present in the table. If the name is not present, then the record for new name is created and added. As the same name can often is used again (as in i=i+5)  it makes sense to search the table from the last record, not the first.

You can also use heuristics optimization by inserting the last found record at the very top of the table. This way frequently used identifiers will be clustered near the top.

Trees Structures

A search tree is a more efficient approach to symbol table organization. They are well covered in Knuth.

Hash Tables

This is the most efficient approach to building symbol table. Well covered in Knuth.

In memory database

You can use various in-memory databases as modern computer provide ample amount of memory.

Relational database

If  you expect that amount of information stored will be enormous then regular database such as MySQL is Berkeley or even MySQL might be OK too. Generally this is an overkill but in experimental compilers it provides important advantages die to better visibility and ability to manipulate the database using SQL.  That greatly simplifies debugging.

Scope Rules

These scope rules require a more complicated symbol table organization than simply a list of associations between names and attributes. That's why many experimental languages and also in the past such languages as Perl use flat "global" namespace. The simplest approach of to use multiple symbol tables, one for each active block, such as the block that the compiler is currently in.

The other is to use block prefix to denote the "real" name of the variable.

You can also use nesting depth as part of the name. In this case a triple {procedure name, nesting depth, variable name} uniquely identifies all variables.


Top Visited
Switchboard
Latest
Past week
Past month

NEWS CONTENTS

Old News

[Dec 28, 2010] 6.1 Symbol Tables

After ASTs have been constructed, the compiler must check whether the input program is type-correct. This is called type checking and is part of the semantic analysis. During type checking, a compiler checks whether the use of names (such as variables, functions, type names) is consistent with their definition in the program. For example, if a variable x has been defined to be of type int, then x+1 is correct since it adds two integers while x[1] is wrong. When the type checker detects an inconsistency, it reports an error (such as ``Error: an integer was expected"). Another example of an inconsistency is calling a function with fewer or more parameters or passing parameters of the wrong type.

Consequently, it is necessary to remember declarations so that we can detect inconsistencies and misuses during type checking. This is the task of a symbol table. Note that a symbol table is a compile-time data structure. It's not used during run time by statically typed languages. Formally, a symbol table maps names into declarations (called attributes), such as mapping the variable name x to its type int. More specifically, a symbol table stores:

One convenient data structure for symbol tables is a hash table. One organization of a hash table that resolves conflicts is chaining. The idea is that we have list elements of type:

class Symbol {
    public String key;
    public Object binding; 
    public Symbol next;
    public Symbol ( String k, Object v, Symbol r ) { key=k; binding=v; next=r; }
}
to link key-binding mappings. Here a binding is any Object, but it can be more specific (eg, a Type class to represent a type or a Declaration class, as we will see below). The hash table is a vector Symbol[] of size SIZE, where SIZE is a prime number large enough to have good performance for medium size programs (eg, SIZE=109). The hash function must map any key (ie. any string) into a bucket (ie. into a number between 0 and SIZE-1). A typical hash function is, h(key) = num(key) mod SIZE, where num converts a key to an integer and mod is the modulo operator. In the beginning, every bucket is null. When we insert a new mapping (which is a pair of key-binding), we calculate the bucket location by applying the hash function over the key, we insert the key-binding pair into a Symbol object, and we insert the object at the beginning of the bucket list. When we want to find the binding of a name, we calculate the bucket location using the hash function, and we search the element list of the bucket from the beginning to its end until we find the first element with the requested name.

Consider the following Java program:

1) { 
2)   int a;
3)   {
4)     int a;
5)     a = 1;
6)   };
7)   a = 2;
8) };
The statement a = 1 refers to the second integer a, while the statement a = 2 refers to the first. This is called a nested scope in programming languages. We need to modify the symbol table to handle structures and we need to implement the following operations for a symbol table:

We have already seen insert and lookup. When we have a new block (ie, when we encounter the token {), we begin a new scope. When we exit a block (ie. when we encounter the token }) we remove the scope (this is the end_scope). When we remove a scope, we remove all declarations inside this scope. So basically, scopes behave like stacks. One way to implement these functions is to use a stack of numbers (from 0 to SIZE) that refer to bucket numbers. When we begin a new scope we push a special marker to the stack (eg, -1). When we insert a new declaration in the hash table using insert, we also push the bucket number to the stack. When we end a scope, we pop the stack until and including the first -1 marker. For each bucket number we pop out from the stack, we remove the head of the binding list of the indicated bucket number. For example, for the previous C program, we have the following sequence of commands for each line in the source program (we assume that the hash key for a is 12):

1) push(-1)
2) insert the binding from a to int into the beginning of the list table[12]
   push(12)
3) push(-1)
4) insert the binding from a to int into the beginning of the list table[12]
   push(12)
6) pop()
   remove the head of table[12]
   pop()
7) pop()
   remove the head of table[12]
   pop()
Recall that when we search for a declaration using lookup, we search the bucket list from the beginning to the end, so that if we have multiple declarations with the same name, the declaration in the innermost scope overrides the declaration in the outer scope.

The textbook makes an improvement to the above data structure for symbol tables by storing all keys (strings) into another data structure and by using pointers instead of strings for keys in the hash table. This new data structure implements a set of strings (ie. no string appears more than once). This data structure too can be implemented using a hash table. The symbol table itself uses the physical address of a string in memory as the hash key to locate the bucket of the binding. Also the key component of element is a pointer to the string. The idea is that not only we save space, since we store a name once regardless of the number of its occurrences in a program, but we can also calculate the bucket location very fast.

Recommended Links

Symbol table - Wikipedia, the free encyclopedia

Symbol Tables Symbol Table Information Symbol table organization ...

Evaluate Cadaver Symbol Table-Binding Implementation



Etc

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism  : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda  : SE quotes : Language Design and Programming Quotes : Random IT-related quotesSomerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose BierceBernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 :  Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method  : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds  : Larry Wall  : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOSProgramming Languages History : PL/1 : Simula 67 : C : History of GCC developmentScripting Languages : Perl history   : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-MonthHow to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to to buy a cup of coffee for authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: March 12, 2019