Compiling Interpreters: Lexical Scope

Linked Lists Are Actually Pretty Cool

May 03, 2024

Consider a program with nested scopes

int x = 5;
if ( condition ) {
   double y = 7;
} else {
   float z = 8.0;
}

Whether the program is interpreted or compiled, the language tool will need to use a data structure to represent lexical scope. Scope is a term used to represent a branching tree of variable names, types, values, and any other information associated with the variable name. Despite the fact that almost all languages provide a concept of lexical scope, the implementation and choices of data structures vary widely from language to language and even implementation to implementation. This is a very deep subject with a lot of room for discussion.

What Do Scopes Do?

In the above program there are three bindings: x, y, and z. However, there is one complicating factor: y and z exist in different scopes. Due to the if conditional branch, only y or z will become bound, not both. Permitting inner scopes to use unbound variables would probably be considered an error in most languages. To correctly implement a lexical scope there are three operations that typically must be performed throughout compilation or interpretation.

Insert: Bind types, values, and other information to a variable name.

Lookup: Find the types, values, and other information associated with a variable name.

Clone: Create a functionally identical copy of an existing lexical scope.

What are Some Common Data Structure Options?

Different languages take different approaches to lexical scope. Different data structures will have vastly different performance characteristics.

Linked List

Insert - O(1)

Lookup - O(n)

Clone - O(1)

Memory Used - O(n)

Hashtable

Insert - O(1)

Lookup - O(1)

Clone - O(n)

Memory Used - O(n^2)

Analyzing the Hashtable Option

Hashtables initially look appealing, however the polynomial memory usage is often a deal breaker. For each branch it is necessary to create a new deep copy of the scope, which leads to a worst case O(n^2) memory usage. However, in practice it is sometimes possible to delete keys after exiting a scope, which permits reuse of existing scopes. Such optimizations can reduce memory usage from polynomial to linear.

Analyzing the Linked List Option

Linked Lists are probably the simplest option. With no optimizations, copy and insert are constant. Linear lookup is really not that bad either. The key to linked lists efficiency is the always looking backwards nature of scope. Older bindings can always be considered immutable which leads to trivial sharing of data. Inner scopes can share only the truly shared data with no need to perform a deep copy of any scope. This means that tail references can always be shared and “copy” is a constant operation.

Summary

There exist a wider variety of data structures used in practice, however immutable data vs mutable data is a big philosophical difference in the implementation of lexical scope. Consider your favorite language and implementation, does it use mutable or immutable scope?

Andrew’s Substack

Discussion about this post