Referenced Before Assignment Local Variable In C++

In computer programming, the scope of a name binding – an association of a name to an entity, such as a variable – is the region of a computer program where the binding is valid: where the name can be used to refer to the entity. Such a region is referred to as a scope block. In other parts of the program the name may refer to a different entity (it may have a different binding), or to nothing at all (it may be unbound).

The scope of a binding is also known as the visibility of an entity, particularly in older or more technical literature – this is from the perspective of the referenced entity, not the referencing name. A scope is a part of a program that is or can be the scope for a set of bindings – a precise definition is tricky, but in casual use and in practice largely corresponds to a block, a function, or a file, depending on language and type of entity. The term "scope" is also used to refer to the set of all entities that are visible or names that are valid within a portion of the program or at a given point in a program, which is more correctly referred to as context or environment.[a]

Strictly speaking[b] and in practice for most programming languages, "part of a program" refers to "portion of the source code (area of text)", and is known as lexical scope. In some languages, however, "part of a program" refers to "portion of run time (time period during execution)", and is known as dynamic scope. Both of these terms are somewhat misleading – they misuse technical terms, as discussed in the definition – but the distinction itself is accurate and precise, and these are the standard respective terms. Lexical scope is the main focus of this article, with dynamic scope understood by contrast with lexical scope.

In most cases, name resolution based on lexical scope is straightforward to use and to implement, as in use one can simply read backwards in the source code to determine to which entity a name refers, and in implementation one can simply maintain a list of names and contexts when compiling or interpreting a program. Basic difficulties arise in name masking, forward declarations, and hoisting, while considerably subtler ones arise with non-local variables, particularly in closures.

Definition

The strict definition of the (lexical) "scope" of a name (identifier) is unambiguous – it is "the portion of source code in which a binding of a name with an entity applies" – and is virtually unchanged from its 1960 definition in the specification of ALGOL 60. Representative language specifications follow.

ALGOL 60 (1960)[1]
The following kinds of quantities are distinguished: simple variables, arrays, labels, switches, and procedures. The scope of a quantity is the set of statements and expressions in which the declaration of the identifier associated with that quantity is valid.
C (2007)[2]
An identifier can denote an object; a function; a tag or a member of a structure, union, or enumeration; a typedef name; a label name; a macro name; or a macro parameter. The same identifier can denote different entities at different points in the program. [...] For each different entity that an identifier designates, the identifier is visible (i.e., can be used) only within a region of program text called its scope.
Go (2013)[3]
A declaration binds a non-blank identifier to a constant, type, variable, function, label, or package. [...] The scope of a declared identifier is the extent of source text in which the identifier denotes the specified constant, type, variable, function, label, or package.

Most commonly "scope" refers to when a given name can refer to a given variable – when a declaration has effect – but can also apply to other entities, such as functions, types, classes, labels, constants, and enumerations.

Lexical scope vs. dynamic scope

A fundamental distinction in scoping is what "part of a program" means. In languages with lexical scope (also called static scope), name resolution depends on the location in the source code and the lexical context, which is defined by where the named variable or function is defined. In contrast, in languages with dynamic scope, name resolution depends upon the program state when the name is encountered, which is determined by the execution context or calling context. In practice, with lexical scope a variable's definition is resolved by searching its containing block or function, then, if that fails, searching the outer containing block, and so on, whereas with dynamic scope the calling function is searched, then the function which called that calling function, and so on, progressing up the call stack.[4] In both rules, a local definition of the variable is looked for first.

Most modern languages use lexical scoping for variables and functions, though dynamic scoping is used in some languages, notably some dialects of Lisp, some "scripting" languages like Perl, and some template languages.[c] Even in lexically scoped languages, scope for closures can be confusing to the uninitiated, as these depend on the lexical context where the closure is defined, not where it is called.

Lexical resolution can be determined at compile time, and is also known as early binding, while dynamic resolution can in general only be determined at run time, and thus is known as late binding.

Related concepts

In object-oriented programming, dynamic dispatch selects an object method at runtime, though whether the actual name binding is done at compile time or run time depends on the language. De facto dynamic scoping is common in macro languages, which do not directly do name resolution, but instead expand in place.

Some programming frameworks like AngularJS use the term "scope" to mean something entirely different than how it is used in this article. In those frameworks the scope is just an object of the programming language that they use (JavaScript in case of AngularJS) that is used in certain ways by the framework to emulate dynamic scope in a language that uses lexical scope for its variables. Those AngularJS scopes can themselves be in scope or out of scope (using the usual meaning of the term) in any given part of the program, following the usual rules of variable scope of the language like any other object, and using their own inheritance and transclusion rules. In the context of AngularJS, sometimes the term "$scope" (with a dollar sign) is used to avoid confusion, but using the dollar sign in variable names is often discouraged by the style guides.[5]

Use

Scope is an important component of name resolution,[d] which is in turn fundamental to language semantics. Name resolution (including scope) varies between programming languages, and within a programming language, varies by type of entity; the rules for scope are called scope rules or scoping rules. Together with namespaces, scoping rules are crucial in modular programming, so a change in one part of the program does not break an unrelated part.

Overview

See also: Variable (programming) § Scope and extent

When discussing scope, there are three basic concepts: scope, extent, and context. "Scope" and "context" in particular are frequently confused: scope is a property of an identifier, and is fixed, while context is a property of a program, which varies by position. More precisely, context is a property of a position in the program, either a position in the source code (lexical context) or a point during run time (execution context, runtime context, or calling context). Execution context consists of lexical context (at the current execution point) plus additional runtime state such as the call stack.[e] Thus, when the execution point of a program is in a variable name's scope, the "variable (name) is in context" (meaning "in the context at this point"), and when the execution point "exits a variable (name)'s scope", such as by returning from a function, "the variable (name) goes out of context".[f] Narrowly speaking, during execution a program enters and exits various scopes, and at a point in execution identifiers are "in context" or "not in context", hence identifiers "come into context" or "go out of context" as the program enters or exits the scope – however in practice usage is much looser.

Scope is a source-code level concept, and a property of identifiers, particularly variable or function names – identifiers in the source code are references to entities in the program – and is part of the behavior of a compiler or interpreter of a language. As such, issues of scope are similar to pointers, which are a type of reference used in programs more generally. Using the value of a variable when the name is in context but the variable is uninitialized is analogous to dereferencing (accessing the value of) a wild pointer, as it is undefined. However, as variables are not destroyed until they go out of context, the analog of a dangling pointer does not exist.

For entities such as variables, scope is a subset of lifetime (also known as extent) – a name can only refer to a variable that exists (possibly with undefined value), but variables that exist are not necessarily visible: a variable may exist but be inaccessible (the value is stored but not referred to within a given context), or accessible but not via the given name, in which case it is out of context (the program is "out of the scope of the name"). In other cases "lifetime" is irrelevant – a label (named position in the source code) has lifetime identical with the program (for statically compiled languages), but may be in or out of context at a given point in the program, and likewise for static variables – a static global variable is in context for the entire program, while a static local variable is only in context within a function or other local context, but both have lifetime of the entire run of the program.

Determining which entity an identifier refers to is known as name resolution or name binding (particularly in object-oriented programming), and varies between languages. Given an identifier, the language (properly, the compiler or interpreter) checks all entities that are in context for matches; in case of ambiguity (two entities with the same name, such as a global and local variable with the same name), the name resolution rules are used to distinguish them. Most frequently, name resolution relies on an "inner-to-outer" rule, such as the Python LEGB (Local, Enclosing, Global, Built-in) rule: names implicitly resolve to the narrowest relevant context. In some cases name resolution can be explicitly specified, such as by the global and nonlocal keywords in Python; in other cases the default rules cannot be overridden.
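As a rough illustration of inner-to-outer resolution, the following Python sketch shows the local, enclosing, and global levels of the LEGB rule, plus the explicit overrides provided by the nonlocal and global keywords. All function and variable names (outer, inner, reader, rebinder, set_global, result, global_after) are invented for this example.

```python
x = "global"              # G: module-level (global) binding


def outer():
    x = "enclosing"       # E: binding in the enclosing function

    def inner():
        x = "local"       # L: binding local to inner; masks the outer ones
        return x

    def reader():
        return x          # no local x, so outer's binding is found

    def rebinder():
        nonlocal x        # assignments now target outer's x
        x = "rebound"

    before = reader()
    rebinder()
    return inner(), before, reader()


def set_global():
    global x              # assignments now target the module-level x
    x = "changed"


result = outer()          # ("local", "enclosing", "rebound")
set_global()
global_after = x          # "changed"
```

Without the nonlocal and global declarations, the assignments in rebinder and set_global would each create a new local variable instead of rebinding the outer one.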

When two identical identifiers are in context at the same time, referring to different entities, one says that name masking is occurring, where the higher-priority name (usually innermost) is "masking" the lower-priority name. At the level of variables, this is known as variable shadowing. Due to the potential for logic errors from masking, some languages disallow or discourage masking, raising an error or warning at compile time or run time.

Various programming languages have various different scoping rules for different kinds of declarations and identifiers. Such scoping rules have a large effect on language semantics and, consequently, on the behavior and correctness of programs. In languages like C++, accessing an unbound variable does not have well-defined semantics and may result in undefined behavior, similar to referring to a dangling pointer; and declarations or identifiers used outside their scope will generate syntax errors.

Scopes are frequently tied to other language constructs and determined implicitly, but many languages also offer constructs specifically for controlling scope.

Levels of scope

Scope can vary from as little as a single expression to as much as the entire program, with many possible gradations in between. The simplest scoping rule is global scope – all entities are visible throughout the entire program. The most basic modular scoping rule is two-level scoping, with a global scope anywhere in the program, and local scope within a function. More sophisticated modular programming allows a separate module scope, where names are visible within the module (private to the module) but not visible outside it. Within a function, some languages, such as C, allow block scope to restrict scope to a subset of a function; others, notably functional languages, allow expression scope, to restrict scope to a single expression. Other scopes include file scope (notably in C), which functions similarly to module scope, and block scope outside of functions (notably in Perl).

A subtle issue is exactly when a scope begins and ends. In some languages, such as in C, a scope starts at declaration, and thus different names declared within a given block can have different scopes. This requires declaring functions before use, though not necessarily defining them, and requires forward declaration in some cases, notably for mutual recursion. In other languages, such as JavaScript or Python, a name's scope begins at the start of the relevant block (such as the start of a function), regardless of where it is defined, and all names within a given block have the same scope; in JavaScript this is known as variable hoisting. However, when the name is bound to a value varies, and behavior of in-context names that have undefined value differs: in Python use of undefined variables yields a runtime error, while in JavaScript undefined variables are usable (with undefined value), but function declarations are also hoisted to the top of the containing function and usable throughout the function.
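The Python behavior described here can be sketched as follows. Because an assignment anywhere in a function body makes the name local to the whole function, reading it before the assignment fails at run time with UnboundLocalError ("referenced before assignment"). The names counter, read_only, and read_then_assign are hypothetical and chosen only for illustration.

```python
counter = 10


def read_only():
    # No assignment to counter in this function, so the global is used.
    return counter


def read_then_assign():
    try:
        value = counter      # counter is local to the whole body, but not yet bound
    except UnboundLocalError as exc:
        return type(exc).__name__
    counter = value + 1      # this assignment is what makes counter local
    return counter


outcome_read_only = read_only()
outcome_read_then_assign = read_then_assign()
```

Here read_only returns the global value, while read_then_assign reports the error even though a global counter exists, because the local name masks it from the start of the function.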

Expression scope

Many languages, especially functional languages, offer a feature called let-expressions, which allow a declaration's scope to be a single expression. This is convenient if, for example, an intermediate value is needed for a computation. For example, in Standard ML, if f() returns 12, then let val x = f() in x * x end is an expression that evaluates to 144, using a temporary variable named x to avoid calling f() twice. Some languages with block scope approximate this functionality by offering syntax for a block to be embedded into an expression; for example, the aforementioned Standard ML expression could be written in Perl as do { my $x = f(); $x * $x }, or in GNU C as ({ int x = f(); x * x; }).

In Python, auxiliary variables in generator expressions and list comprehensions (in Python 3) have expression scope.
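A small sketch of this, assuming Python 3: the comprehension variable is not visible after the expression, whereas an ordinary for loop variable remains bound in the enclosing function. The function name comprehension_scope and the variable names k and j are made up for the example.

```python
def comprehension_scope():
    squares = [k * k for k in range(4)]   # k exists only inside the comprehension
    for j in range(4):                    # j is an ordinary function-level variable
        pass
    visible = set(locals())               # names bound in this function right now
    return squares, "k" in visible, "j" in visible


squares, k_leaked, j_leaked = comprehension_scope()
```

After the call, k_leaked is False and j_leaked is True.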

In C, variable names in a function prototype have expression scope, known in this context as function prototype scope. As the variable names in the prototype are not referred to (they may be different in the actual definition) – they are just dummies – these are often omitted, though they may be used for generating documentation, for instance.

Block scope

Many, but not all, block-structured programming languages allow scope to be restricted to a block, which is known as block scope. This began with ALGOL 60, where "[e]very declaration ... is valid only for that block.",[6] and today is particularly associated with languages in the Pascal and C families and traditions. Most often this block is contained within a function, thus restricting the scope to a part of a function, but in some cases, such as Perl, the block may not be within a function.

    unsigned int sum_of_squares(const unsigned int N) {
        unsigned int ret = 0;
        for (unsigned int n = 1; n <= N; n++) {
            const unsigned int n_squared = n * n;
            ret += n_squared;
        }
        return ret;
    }

A representative example of the use of block scope is the C code shown here, where two variables are scoped to the loop: the loop variable n, which is initialized once and incremented on each iteration of the loop, and the auxiliary variable n_squared, which is initialized at each iteration. The purpose is to avoid adding variables to the function scope that are only relevant to a particular block – for example, this prevents errors where the generic loop variable i has accidentally already been set to another value. In this example the expression n * n would generally not be assigned to an auxiliary variable, and the body of the loop would simply be written ret += n * n, but in more complicated examples auxiliary variables are useful.

Blocks are primarily used for control flow, such as with if, while, and for loops, and in these cases block scope means the scope of variable depends on the structure of a function's flow of execution. However, languages with block scope typically also allow the use of "naked" blocks, whose sole purpose is to allow fine-grained control of variable scope. For example, an auxiliary variable may be defined in a block, then used (say, added to a variable with function scope) and discarded when the block ends, or a while loop might be enclosed in a block that initializes variables used inside the loop that should only be initialized once.

A subtlety of several programming languages, such as Algol 68 and C (demonstrated in this example and standardized since C99), is that block-scope variables can be declared not only within the body of the block, but also within the control statement, if any. This is analogous to function parameters, which are declared in the function declaration (before the block of the function body starts), and in scope for the whole function body. This is primarily used in for loops, which have an initialization statement separate from the loop condition, unlike while loops, and is a common idiom.

Block scope can be used for shadowing. In this example, inside the block the auxiliary variable could also have been called N, shadowing the parameter name, but this is considered poor style due to the potential for errors. Furthermore, some descendants of C, such as Java and C#, despite having support for block scope (in that a local variable can be made to go out of scope before the end of a function), do not allow one local variable to hide another. In such languages, the attempted declaration of the second N would result in a syntax error, and one of the variables would have to be renamed.

If a block is used to set the value of a variable, block scope requires that the variable be declared outside of the block. This complicates the use of conditional statements with single assignment. For example, in Python, which does not use block scope, one may initialize a variable as such:

    if c:
        a = 'foo'
    else:
        a = ''

where a is accessible after the if statement.

In Perl, which has block scope, this instead requires declaring the variable prior to the block:

    my $a;
    if (c) {
        $a = 'foo';
    } else {
        $a = '';
    }

Often this is instead rewritten using multiple assignment, initializing the variable to a default value. In Python (where it is not necessary) this would be:

    a = ''
    if c:
        a = 'foo'

while in Perl this would be:

    my $a = '';
    if (c) {
        $a = 'foo';
    }

In case of a single variable assignment, an alternative is to use the ternary operator to avoid a block, but this is not in general possible for multiple variable assignments, and is difficult to read for complex logic.

This is a more significant issue in C, notably for string assignment, as string initialization can automatically allocate memory, while string assignment to an already initialized variable requires allocating memory, a string copy, and checking that these are successful.

    sub increment_counter() {
        my $counter = 0;
        return sub () {
            return ++$counter;
        }
    }

Some languages allow the concept of block scope to be applied, to varying extents, outside of a function. For example, in the Perl snippet above, $counter is a variable name with block scope (due to the use of the my keyword), while increment_counter is a function name with global scope. Each call to the subroutine returned by increment_counter will increase the value of its $counter by one, and return the new value. Code outside of this block can call increment_counter, but cannot otherwise obtain or alter the value of $counter. This idiom allows one to define closures in Perl.

Function scope

Most of the commonly used programming languages offer a way to create a local variable in a function or subroutine: a variable whose scope ends (that goes out of context) when the function returns. In most cases the lifetime of the variable is the duration of the function call – it is an automatic variable, created when the function starts (or the variable is declared), destroyed when the function returns – while the scope of the variable is within the function, though the meaning of "within" depends on whether scoping is lexical or dynamic. However, some languages, such as C, also provide for static local variables, where the lifetime of the variable is the entire lifetime of the program, but the variable is only in context when inside the function. In the case of static local variables, the variable is created when the program initializes, and destroyed only when the program terminates, as with a static global variable, but is only in context within a function, like an automatic local variable.

Importantly, in lexical scoping a variable with function scope has scope only within the lexical context of the function: it moves out of context when another function is called within the function, and moves back into context when the function returns – called functions have no access to the local variables of calling functions, and local variables are only in context within the body of the function in which they are declared. By contrast, in dynamic scoping, the scope extends to the runtime context of the function: local variables stay in context when another function is called, only moving out of context when the defining function ends, and thus local variables are in context of the function in which they are defined and all called functions. In languages with lexical scoping and nested functions, local variables are in context for nested functions, since these are within the same lexical context, but not for other functions that are not lexically nested. A local variable of an enclosing function is known as a non-local variable for the nested function. Function scope is also applicable to anonymous functions.
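The difference between a called function and a lexically nested function can be sketched in Python, which is lexically scoped. The names helper, caller, nested, and secret are hypothetical; the sketch assumes no global named secret exists.

```python
def helper():
    # helper is defined outside caller, so caller's locals are not in context here.
    try:
        return secret            # looked up as a global name, which does not exist
    except NameError:
        return "not visible"


def caller():
    secret = "caller's local"

    def nested():
        return secret            # non-local variable: nested lies inside caller's text

    return helper(), nested()


from_helper, from_nested = caller()
```

Under dynamic scoping, helper would instead have found the caller's binding of secret, since it runs while caller is still executing.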

    def square(n):
        return n * n

    def sum_of_squares(n):
        total = 0
        i = 0
        while i <= n:
            total += square(i)
            i += 1
        return total

For example, in the snippet of Python code above, two functions are defined: square and sum_of_squares. square computes the square of a number; sum_of_squares computes the sum of all squares up to a number. (For example, square(4) is 4² = 16, and sum_of_squares(4) is 0² + 1² + 2² + 3² + 4² = 30.)

Each of these functions has a variable named n that represents the argument to the function. These two variables are completely separate and unrelated, despite having the same name, because they are lexically scoped local variables with function scope: each one's scope is its own, lexically separate function, and thus they do not overlap. Therefore, sum_of_squares can call square without its own n being altered. Similarly, sum_of_squares has variables named total and i; these variables, because of their limited scope, will not interfere with any variables named total or i that might belong to any other function. In other words, there is no risk of a name collision between these identifiers and any unrelated identifiers, even if they are identical.

Note also that no name masking is occurring: only one variable named n is in context at any given time, as the scopes do not overlap. By contrast, were a similar fragment to be written in a language with dynamic scope, the n in the calling function would remain in context in the called function – the scopes would overlap – and would be masked ("shadowed") by the new n in the called function.

Function scope is significantly more complicated if functions are first-class objects and can be created locally to a function and then returned. In this case any variables in the nested function that are not local to it (unbound variables in the function definition, that resolve to variables in an enclosing context) create a closure, as not only the function itself, but also its environment (of variables) must be returned, and then potentially called in a different context. This requires significantly more support from the compiler, and can complicate program analysis.
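A minimal Python sketch of such a closure follows. The names make_counter, increment, count, first, and second are invented for illustration. Each call to make_counter creates a fresh environment that outlives the call because the returned function still refers to it.

```python
def make_counter():
    count = 0                  # local to this particular call of make_counter

    def increment():
        nonlocal count         # refers to the enclosing call's count
        count += 1
        return count

    return increment           # the function is returned together with its environment


first = make_counter()
second = make_counter()

calls = [first(), first(), second()]   # the two counters do not share state
```

The list calls ends up as [1, 2, 1]: count is no longer in scope for any code outside increment, yet its lifetime continues for as long as the closure exists.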

File scope

A scoping rule largely particular to C (and C++) is file scope, where scope of variables and functions declared at the top level of a file (not within any function) is for the entire file – or rather for C, from the declaration until the end of the source file, or more precisely translation unit (internal linking). This can be seen as a form of module scope, where modules are identified with files, and in more modern languages is replaced by an explicit module scope. Due to the presence of include statements, which add variables and functions to the internal context and may themselves call further include statements, it can be difficult to determine what is in context in the body of a file.

In the C code snippet above, the function name sum_of_squares has file scope.

Module scope

In modular programming, the scope of a name can be an entire module, however it may be structured across various files. In this paradigm, modules are the basic unit of a complex program, as they allow information hiding and exposing a limited interface. Module scope was pioneered in the Modula family of languages, and Python (which was influenced by Modula) is a representative contemporary example.

In some object-oriented programming languages that lack direct support for modules, such as C++, a similar structure is instead provided by the class hierarchy, where classes are the basic unit of the program, and a class can have private methods. This is properly understood in the context of dynamic dispatch rather than name resolution and scope, though they often play analogous roles. In some cases both these facilities are available, such as in Python, which has both modules and classes, and code organization (as a module-level function or a conventionally private method) is a choice of the programmer.

Global scope

A declaration has global scope if it has effect throughout an entire program. Variable names with global scope — called global variables — are frequently considered bad practice, at least in some languages, due to the possibility of name collisions and unintentional masking, together with poor modularity, and function scope or block scope are considered preferable. However, global scope is typically used (depending on the language) for various other sorts of identifiers, such as names of functions, and names of classes and other data types. In these cases mechanisms such as namespaces are used to avoid collisions.

Lexical scoping vs. dynamic scoping

The use of local variables — of variable names with limited scope, that only exist within a specific function — helps avoid the risk of a name collision between two identically named variables. However, there are two very different approaches to answering this question: What does it mean to be "within" a function?

In lexical scoping (or lexical scope; also called static scoping or static scope), if a variable name's scope is a certain function, then its scope is the program text of the function definition: within that text, the variable name exists, and is bound to the variable's value, but outside that text, the variable name does not exist. By contrast, in dynamic scoping (or dynamic scope), if a variable name's scope is a certain function, then its scope is the time-period during which the function is executing: while the function is running, the variable name exists, and is bound to its value, but after the function returns, the variable name does not exist. This means that if function f invokes a separately defined function g, then under lexical scoping, function g does not have access to f's local variables (assuming the text of g is not inside the text of f), while under dynamic scoping, function g does have access to f's local variables (since g is invoked during the invocation of f).

    $ x=1
    $ function g () { echo $x ; x=2 ; }
    $ function f () { local x=3 ; g ; }
    $ f # does this print 1, or 3?
    3
    $ echo $x # does this print 1, or 2?
    1

Consider, for example, the program above. The first line, x=1, creates a global variable x and initializes it to 1. The second line, function g () { echo $x ; x=2 ; }, defines a function g that prints out ("echoes") the current value of x, and then sets x to 2 (overwriting the previous value). The third line, function f () { local x=3 ; g ; }, defines a function f that creates a local variable x (hiding the identically named global variable) and initializes it to 3, and then calls g. The fourth line, f, calls f. The fifth line, echo $x, prints out the current value of x.

So, what exactly does this program print? It depends on the scoping rules. If the language of this program is one that uses lexical scoping, then g prints and modifies the global variable x (because g is defined outside f), so the program prints 1 and then 2. By contrast, if this language uses dynamic scoping, then g prints and modifies f's local variable x (because g is called from within f), so the program prints 3 and then 1. (As it happens, the language of the program is Bash, which uses dynamic scoping; so the program prints 3 and then 1.)
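For comparison, here is a rough translation of the same program into Python, which is lexically scoped. This is only a sketch: the global declaration is needed so that g assigns to the module-level x, and the names seen, printed_by_f, and x_afterwards are introduced here to capture what the shell session would have printed.

```python
x = 1


def g():
    global x        # g is defined at module level, so x means the global x
    seen = x        # the value the shell version would echo
    x = 2
    return seen


def f():
    x = 3           # local to f; not visible inside g under lexical scoping
    return g()


printed_by_f = f()  # 1, not 3: g never sees f's local x
x_afterwards = x    # 2: g modified the global x
```

This matches the lexical-scoping outcome described above (1 and then 2), the opposite of what Bash prints.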

Lexical scoping

With lexical scope, a name always refers to its (more or less) local lexical environment. This is a property of the program text and is made independent of the runtime call stack by the language implementation. Because this matching only requires analysis of the static program text, this type of scoping is also called static scoping. Lexical scoping is standard in all ALGOL-based languages such as Pascal, Modula-2 and Ada, as well as in modern functional languages such as ML and Haskell. It is also used in the C language and its syntactic and semantic relatives, although with different kinds of limitations. Static scoping allows the programmer to reason about object references such as parameters, variables, constants, types, functions, etc. as simple name substitutions. This makes it much easier to make modular code and reason about it, since the local naming structure can be understood in isolation. In contrast, dynamic scope forces the programmer to anticipate all possible dynamic contexts in which the module's code may be invoked.

    program A;
    var I: integer;
        K: char;

        procedure B;
        var K: real;
            L: integer;

            procedure C;
            var M: real;
            begin
                (*scope A+B+C*)
            end;

            (*scope A+B*)
        end;

        (*scope A*)
    end.

For example, consider the Pascal program fragment above. The variable I is visible at all points, because it is never hidden by another variable of the same name. The char variable K is visible only in the main program because it is hidden by the real variable K, which is visible in procedures B and C only. Variable L is also visible only in procedures B and C, but it does not hide any other variable. Variable M is only visible in procedure C and therefore not accessible either from procedure B or the main program. Also, procedure C is visible only in procedure B and can therefore not be called from the main program.

There could have been another procedure C declared in the program outside of procedure B. The place in the program where "C" is mentioned then determines which of the two procedures named C it represents, which is precisely analogous to the scope of variables.

Correct implementation of static scope in languages with first-class nested functions is not trivial, as it requires each function value to carry with it a record of the values of the variables that it depends on (the pair of the function and this environment is called a closure). Depending on implementation and computer architecture, variable lookup may become slightly inefficient when very deeply lexically nested functions are used, although there are well-known techniques to mitigate this.[7][8] Also, for nested functions that only refer to their own arguments and (immediately) local variables, all relative locations can be known at compile time. No overhead at all is therefore incurred when using that type of nested function. The same applies to particular parts of a program where nested functions are not used, and, naturally, to programs written in a language where nested functions are not available (such as in the C language).

History

Lexical scoping was used for ALGOL and has been picked up in most other languages since then.[4] Deep binding, which approximates static (lexical) scoping, was introduced in LISP 1.5 (via the Funarg device developed by Steve Russell, working under John McCarthy). The original Lisp interpreter (1960) and most early Lisps used dynamic scoping, but descendants of dynamically scoped languages often adopt static scoping; Common Lisp and Scheme (with SRFI 15) have both dynamic and static scoping. Perl is another language with dynamic scoping that added static scoping afterwards. Languages like Pascal and C have always had lexical scoping, since they are both influenced by the ideas that went into ALGOL 60 (although C did not include lexically nested functions).

The term "lexical scope" dates at least to 1967,[9] while the term "lexical scoping" dates at least to 1970, where it was used in Project MAC to describe the scoping rules of the Lisp dialect MDL (then known as "Muddle").[10]

Dynamic scoping

With dynamic scope, a name refers to the binding created most recently during execution, rather than to the one that lexically encloses its use; this is uncommon in modern languages.[4] In technical terms, this means that each identifier has a global stack of bindings. Introducing a local variable pushes a binding onto the global stack for its name (which may have been empty), and that binding is popped off when the control flow leaves the scope. Evaluating the name in any context always yields the top binding. Note that this cannot be done at compile time, because the binding stack only exists at run time, which is why this type of scoping is called dynamic scoping.
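The stack-of-bindings model can be sketched in a few lines of Python. This is only an emulation under stated assumptions: Python itself is lexically scoped, and the helpers bind and lookup below are hypothetical names, not part of any library.

```python
from collections import defaultdict
from contextlib import contextmanager

# One global stack of bindings per identifier.
_bindings = defaultdict(list)

@contextmanager
def bind(name, value):
    """Push a binding on entry to a scope and pop it when control leaves."""
    _bindings[name].append(value)
    try:
        yield
    finally:
        _bindings[name].pop()

def lookup(name):
    """Evaluating a name always yields the top binding on its stack."""
    stack = _bindings[name]
    if not stack:
        raise NameError(f"{name!r} is unbound")
    return stack[-1]

def show():
    # show() introduces no binding itself; it sees whatever is on top.
    return lookup("x")

def caller():
    with bind("x", "bound by caller"):
        return show()

with bind("x", "outer"):
    inner_result = caller()  # the caller's binding is on top
    outer_result = show()    # the caller's binding has been popped again
```

Note that show() returns different values depending on who called it, which is exactly what cannot be determined by reading its source text alone.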

Generally, certain blocks are defined to create bindings whose lifetime is the execution time of the block; this adds some features of static scoping to the dynamic scoping process. However, since a section of code can be called from many different locations and situations, it can be difficult to determine at the outset what bindings will apply when a variable is used (or if one exists at all). This can be beneficial; application of the principle of least knowledge suggests that code avoid depending on the reasons for (or circumstances of) a variable's value, but simply use the value according to the variable's definition. This narrow interpretation of shared data can provide a very flexible system for adapting the behavior of a function to the current state (or policy) of the system. However, this benefit relies on careful documentation of all variables used this way as well as on careful avoidance of assumptions about a variable's behavior, and does not provide any mechanism to detect interference between different parts of a program. Dynamic scoping also voids all the benefits of referential transparency. As such, dynamic scoping can be dangerous and few modern languages use it. Some languages, like Perl and Common Lisp, allow the programmer to choose static or dynamic scoping when defining or redefining a variable. Examples of languages that use dynamic scoping include Logo, Emacs Lisp, and the shell languages bash, dash, and PowerShell.

Dynamic scoping is fairly easy to implement. To find an identifier's value, the program could traverse the runtime stack, checking each activation record (each function's stack frame) for a value for the identifier. In practice, this is made more efficient via the use of an association list, which is a stack of name/value pairs. Pairs are pushed onto this stack whenever declarations are made, and popped whenever variables go out of scope. Shallow binding is an alternative strategy that is considerably faster, making use of a central reference table, which associates each name with its own stack of meanings. This avoids a linear search during run-time to find a particular name, but care should be taken to properly maintain this table. Note that both of these strategies assume a last-in-first-out (LIFO) ordering to bindings for any one variable; in practice all bindings are so ordered.
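The association-list strategy can be illustrated with a short Python sketch. The function names push_binding, pop_binding and deep_lookup are hypothetical, and a real implementation would tie pushes and pops to scope entry and exit rather than calling them by hand:

```python
# A single association list: a stack of (name, value) pairs.
alist = []

def push_binding(name, value):
    """Called when a declaration is made."""
    alist.append((name, value))

def pop_binding():
    """Called when the most recently declared variable goes out of scope."""
    alist.pop()

def deep_lookup(name):
    """Linear search from the most recent pair backwards."""
    for bound_name, value in reversed(alist):
        if bound_name == name:
            return value
    raise NameError(name)

push_binding("x", 1)
push_binding("y", 2)
push_binding("x", 3)        # shadows the earlier binding of x
first = deep_lookup("x")
pop_binding()               # the inner x goes out of scope
second = deep_lookup("x")
```

The linear search in deep_lookup is the cost that a central reference table with one stack per name avoids.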

An even simpler implementation is the representation of dynamic variables with simple global variables. The local binding is performed by saving the original value in an anonymous location on the stack that is invisible to the program. When that binding scope terminates, the original value is restored from this location. In fact, dynamic scope originated in this manner. Early implementations of Lisp used this obvious strategy for implementing local variables, and the practice survives in some dialects which are still in use, such as GNU Emacs Lisp. Lexical scope was introduced into Lisp later. This is equivalent to the above shallow binding scheme, except that the central reference table is simply the global variable binding environment, in which the current meaning of the variable is its global value. Maintaining global variables isn't complex. For instance, a symbol object can have a dedicated slot for its global value.

Dynamic scoping provides an excellent abstraction for thread local storage, but if it is used that way it cannot be based on saving and restoring a global variable. A possible implementation strategy is for each variable to have a thread-local key. When the variable is accessed, the thread-local key is used to access the thread-local memory location (by code generated by the compiler, which knows which variables are dynamic and which are lexical). If the thread-local key does not exist for the calling thread, then the global location is used. When a variable is locally bound, the prior value is stored in a hidden location on the stack. The thread-local storage is created under the variable's key, and the new value is stored there. Further nested overrides of the variable within that thread simply save and restore this thread-local location. When the initial, outer-most override's scope terminates, the thread-local key is deleted, exposing the global version of the variable once again to that thread.
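A rough Python sketch of this strategy follows. It is an illustration only: the class DynamicVar and its methods are hypothetical, and threading.local stands in for the per-variable thread-local key described above.

```python
import threading
from contextlib import contextmanager

_MISSING = object()

class DynamicVar:
    """A dynamic variable with a global value and per-thread overrides."""

    def __init__(self, global_value):
        self._global = global_value
        self._local = threading.local()  # plays the role of the thread-local key

    def get(self):
        # Use the global location if the calling thread has no override.
        return getattr(self._local, "value", self._global)

    @contextmanager
    def bind(self, value):
        saved = getattr(self._local, "value", _MISSING)  # hidden saved value
        self._local.value = value
        try:
            yield
        finally:
            if saved is _MISSING:
                del self._local.value      # outermost override ends
            else:
                self._local.value = saved  # restore the nested override

verbosity = DynamicVar("quiet")
seen = {}

def worker():
    seen["worker_before"] = verbosity.get()
    with verbosity.bind("debug"):
        seen["worker_inside"] = verbosity.get()

with verbosity.bind("info"):
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    seen["main_inside"] = verbosity.get()
seen["main_after"] = verbosity.get()
```

The override made by the main thread is invisible to the worker, and each thread sees the global value again once its outermost override ends.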

Macro expansion

In modern languages, macro expansion in a preprocessor is a key example of de facto dynamic scope. The macro language itself only transforms the source code, without resolving names, but since the expansion is done in place, when the names in the expanded text are then resolved (notably free variables), they are resolved based on where they are expanded (loosely "called"), as if dynamic scoping were occurring.

The C preprocessor, used for macro expansion, has de facto dynamic scope, as it does not do name resolution by itself. For example, the macro:

    #define ADD_A(x) x + a

will expand to add a to the passed variable, with this identifier only later resolved by the compiler based on where the macro is "called" (properly, expanded); it is thus effectively in dynamic scope, independent of where the macro is defined. Properly, the C preprocessor only does lexical analysis, expanding the macro during the tokenization stage, but not parsing into a syntax tree or doing name resolution.

For example, in the following code, the a in the macro is resolved (after expansion) to the local variable at the expansion site:

    #define ADD_A(x) x + a

    void add_one(int *x) {
      const int a = 1;
      *x = ADD_A(*x);
    }

    void add_two(int *x) {
      const int a = 2;
      *x = ADD_A(*x);
    }

Qualified identifiers

As we have seen, one of the key reasons for scope is that it helps prevent name collisions, by allowing identical identifiers to refer to distinct things, with the restriction that the identifiers must have separate scopes. Sometimes this restriction is inconvenient; when many different things need to be accessible throughout a program, they generally all need identifiers with global scope, so different techniques are required to avoid name collisions.

To address this, many languages offer mechanisms for organizing global identifiers. The details of these mechanisms, and the terms used, depend on the language; but the general idea is that a group of identifiers can itself be given a name — a prefix — and, when necessary, an entity can be referred to by a qualified identifier consisting of the identifier plus the prefix. Normally such identifiers will have, in a sense, two sets of scopes: a scope (usually the global scope) in which the qualified identifier is visible, and one or more narrower scopes in which the unqualified identifier (without the prefix) is visible as well. And normally these groups can themselves be organized into groups; that is, they can be nested.
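As a small illustration in Python, where modules act as the named groups, the same unqualified identifier can live in two groups and be told apart by its prefix. The function hypotenuse is a hypothetical example name; math and cmath are standard library modules.

```python
import math
import cmath

# The identifier sqrt exists in two groups; the prefix disambiguates.
real_root = math.sqrt(4.0)
complex_root = cmath.sqrt(-4.0)

def hypotenuse(a, b):
    # Within this narrower scope the unqualified name is made visible too.
    from math import sqrt
    return sqrt(a * a + b * b)

print(real_root, complex_root, hypotenuse(3.0, 4.0))
```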

Although many languages support this concept, the details vary greatly. Some languages have mechanisms, such as namespaces in C++ and C#, that serve almost exclusively to enable global identifiers to be organized into groups. Other languages have mechanisms, such as packages in Ada and structures in Standard ML, that combine this with the additional purpose of allowing some identifiers to be visible only to other members of their group. And object-oriented languages often allow classes or singleton objects to fulfill this purpose (whether or not they also have a mechanism for which this is the primary purpose). Furthermore, languages often meld these approaches; for example, Perl's packages are largely similar to C++'s namespaces, but optionally double as classes for object-oriented programming; and Java organizes its variables and functions into classes, but then organizes those classes into Ada-like packages.

By language

Scoping rules for representative languages follow.

C

In C, scope is traditionally known as linkage or visibility, particularly for variables. C is a lexically scoped language with global scope (known as external linkage), a form of module scope or file scope (known as internal linkage), and local scope (within a function); within a function scopes can further be nested via block scope. However, standard C does not support nested functions.

The lifetime and visibility of a variable are determined by its storage class. There are three types of lifetimes in C: static (program execution), automatic (block execution, allocated on the stack), and manual (allocated on the heap). Only static and automatic are supported for variables and handled by the compiler, while manually allocated memory must be tracked manually across different variables. There are three levels of visibility in C: external linkage (global), internal linkage (roughly file), and block scope (which includes functions); block scopes can be nested, and different levels of internal linkage are possible by use of includes. Internal linkage in C is visibility at the translation unit level, namely a source file after being processed by the C preprocessor, notably including all relevant includes.

C programs are compiled as separate object files, which are then linked into an executable or library via a linker. Thus name resolution is split across the compiler, which resolves names within a translation unit (more loosely, "compilation unit", but this is properly a different concept), and the linker, which resolves names across translation units; see linkage for further discussion.

In C, variables with block scope enter scope when they are declared (not at the top of the block), move out of scope if any (non-nested) function is called within the block, move back into scope when the function returns, and move out of scope at the end of the block. In the case of automatic local variables, they are also allocated on declaration and deallocated at the end of the block, while for static local variables, they are allocated at program initialization and deallocated at program termination.

The following program demonstrates a variable with block scope coming into scope partway through the block, then exiting scope (and in fact being deallocated) when the block ends:

    #include <stdio.h>

    int main(void) {
      char x = 'm';
      printf("%c\n", x);
      {
        printf("%c\n", x);
        char x = 'b';
        printf("%c\n", x);
      }
      printf("%c\n", x);
    }

The program outputs:

    m
    m
    b
    m

There are other levels of scope in C.[12] Variable names used in a function prototype have function prototype visibility, and exit scope at the end of the function prototype. Since the name is not used, this is not useful for compilation, but may be useful for documentation. Label names for goto statements have function scope, while case label names for switch statements have block scope (the block of the switch).

C++

All variables used in a program must be declared, with a type specifier, at an earlier point in the code than their first use. A variable can have either global or local scope. A global variable is one declared in the main body of the source code, outside all functions, while a local variable is one declared within the body of a function or a block.

Modern versions allow nested lexical scoping.

Go

Go is lexically scoped using blocks.[3]

Java

Java is lexically scoped.

A Java class can contain three types of variables:[13]

Local variables are defined inside a method, or a particular block. These variables are local to where they were defined and lower levels. For example, a loop inside a method can use that method's local variables, but not the other way around. The loop's variables (local to that loop) are destroyed as soon as the loop ends.

Member variables, also called fields, are variables declared within the class, outside of any method. By default, these variables are available for all methods within that class and also for all classes in the package.

Parameters are variables in method declarations.

In general, a set of brackets defines a particular scope, but variables at top level within a class can differ in their behavior depending on the modifier keywords used in their definition. The following table shows the access to members permitted by each modifier.[14]

    Modifier       Class  Package  Subclass  World
    public         Yes    Yes      Yes       Yes
    protected      Yes    Yes      Yes       No
    (no modifier)  Yes    Yes      No        No
    private        Yes    No       No        No

JavaScript

JavaScript has simple scoping rules,[15] but variable initialization and name resolution rules can cause problems, and the widespread use of closures for callbacks means the lexical environment of a function when defined (which is used for name resolution) can be very different from the lexical environment when it is called (which is irrelevant for name resolution). JavaScript objects have name resolution for properties, but this is a separate topic.

JavaScript has lexical scoping[16] nested at the function level, with the global scope being the outermost scope. This scoping is used for both variables and for functions (meaning function declarations, as opposed to variables of function type).[17] Block scoping with the let and const keywords is standard since ECMAScript 6. Block scoping can also be produced by wrapping the entire block in a function and then executing it; this is known as the immediately-invoked function expression (IIFE) pattern.

While JavaScript scoping is simple – lexical, function-level – the associated initialization and name resolution rules are a cause of confusion. Firstly, assignment to a name not in scope defaults to creating a new global variable, not a local one. Secondly, to create a new local variable one must use the var keyword; the variable is then created at the top of the function, with value undefined, and the variable is assigned its value when the assignment expression is reached:

A variable with an Initialiser is assigned the value of its AssignmentExpression when the VariableStatement is executed, not when the variable is created.[18]

This is known as variable hoisting[19] – the declaration, but not the initialization, is hoisted to the top of the function. Thirdly, accessing variables before initialization yields undefined, rather than a syntax error. Fourthly, for function declarations, the declaration and the initialization are both hoisted to the top of the function, unlike for variable initialization. For example, the following code produces a dialog with output undefined, as the local variable declaration is hoisted, shadowing the global variable, but the initialization is not, so the variable is undefined when used:

    a = 1;
    function f() {
      alert(a);
      var a = 2;
    }
    f();

Further, as functions are first-class objects in JavaScript and are frequently assigned as callbacks or returned from functions, when a function is executed, the name resolution depends on where it was originally defined (the lexical environment of the definition), not the lexical environment or execution environment where it is called. The nested scopes of a particular function (from most global to most local) in JavaScript, particularly of a closure, used as a callback, are sometimes referred to as the scope chain, by analogy with the prototype chain of an object.

Closures can be produced in JavaScript by using nested functions, as functions are first-class objects.[20] Returning a nested function from an enclosing function includes the local variables of the enclosing function as the (non-local) lexical environment of the returned function, yielding a closure. For example:

    function newCounter() {
      // return a counter that is incremented on call (starting at 0)
      // and which returns its new value
      var a = 0;
      var b = function() { a++; return a; };
      return b;
    }
    c = newCounter();
    alert(c() + ' ' + c());  // outputs "1 2"

Closures are frequently used in JavaScript, due to being used for callbacks. Indeed, any hooking of a function in the local environment as a callback or returning it from a function creates a closure if there are any unbound variables in the function body (with the environment of the closure based on the nested scopes of the current lexical environment, or "scope chain"); this may be accidental. When creating a callback based on parameters, the parameters must be stored in a closure, otherwise it will accidentally create a closure that refers to the variables in the enclosing environment, which may change.[21]

Name resolution of properties of JavaScript objects is based on inheritance in the prototype tree – a path to the root in the tree is called a prototype chain – and is separate from name resolution of variables and functions.

Lisp

Lisp dialects have various rules for scoping. The original Lisp used dynamic scoping; it was Scheme that introduced static (lexical) scoping to the Lisp family. Common Lisp adopted lexical scoping from Scheme, as did Clojure, but some other dialects of Lisp, like Emacs Lisp, still use dynamic scoping.

Python

For variables, Python has function scope, module scope, and global scope. Names enter scope at the start of a context (function, module, or globally), and exit scope when a non-nested function is called or the context ends. If a name is used prior to variable initialization, this raises a runtime exception. If a variable is simply accessed (not assigned to) in a context, name resolution follows the LEGB rule (Local, Enclosing, Global, Built-in). However, if a variable is assigned to, it defaults to creating a local variable, which is in scope for the entire context. Both these rules can be overridden with a global or (in Python 3) nonlocal declaration prior to use, which allows accessing global variables even if there is an intervening nonlocal variable, and assigning to global or nonlocal variables.
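The LEGB lookup order and the effect of assignment can be seen together in a short sketch. All function names here are hypothetical and chosen only for illustration:

```python
x = "global"

def outer():
    x = "enclosing"

    def inner_reads():
        return x          # no local x: found in the Enclosing scope

    def inner_assigns():
        x = "local"       # assignment creates a new Local x
        return x

    return inner_reads(), inner_assigns(), x

def reads_global():
    return x              # no local or enclosing x: found in the Global scope

def reads_builtin():
    return len            # not local, enclosing or global: found in Built-ins

results = outer()
print(results, reads_global(), reads_builtin())
```

The third element of results shows that the assignment inside inner_assigns did not affect the x of outer.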

As a simple example, a function resolves a variable to the global scope:

    >>> def f():
    ...     print(x)
    ...
    >>> x = 'global'
    >>> f()
    global

Note that x is initialized before f is called, so no error is raised, even though it is declared after f is declared. Lexically this is a forward reference, which is allowed in Python.

Here assignment creates a new local variable, which does not change the value of the global variable:

    >>> def f():
    ...     x = 'f'
    ...     print(x)
    ...
    >>> x = 'global'
    >>> print(x)
    global
    >>> f()
    f
    >>> print(x)
    global

Assignment to a variable within a function causes it to be declared local to the function (hence the local variable is in scope for the entire function), and thus using it prior to this assignment raises an error. This differs from C, where the local variable is only in scope from its declaration, not for the entire function. This code raises an error:

    >>> def f():
    ...     print(x)
    ...     x = 'f'
    ...
    >>> x = 'global'
    >>> f()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in f
    UnboundLocalError: local variable 'x' referenced before assignment

The default name resolution rules can be overridden with the global or (in Python 3) nonlocal keywords. A global declaration in a function means that the declared name resolves to the global variable. It thus can be accessed (provided it has already been initialized), and assignment assigns to the global variable, rather than declaring a new local variable. Note that no global declaration is needed in a function that only reads the variable: since it does not assign to the variable, the name defaults to resolving to the global variable.
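A minimal sketch of the global override, with hypothetical function names f and g, might look as follows:

```python
x = "global"

def f():
    return x        # read only: resolves to the global x by default

def g():
    global x        # x now refers to the module-level variable
    before = x      # allowed, since the global is already initialized
    x = "g"         # rebinds the global; no new local is created
    return before

first = f()
seen_in_g = g()
second = f()
print(first, seen_in_g, second)
```

After g has run, f observes the new value, which shows that the assignment in g reached the global variable.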

If you're closely following the Python tag on StackOverflow, you'll notice that the same question comes up at least once a week. The question goes on like this:

    x = 10

    def foo():
        x += 1
        print(x)

    foo()

Why does this, when run, result in the following error?

    Traceback (most recent call last):
      File "unboundlocalerror.py", line 8, in <module>
        foo()
      File "unboundlocalerror.py", line 4, in foo
        x += 1
    UnboundLocalError: local variable 'x' referenced before assignment

There are a few variations on this question, with the same core hiding underneath. Here's one:

    lst = [1, 2, 3]

    def foo():
        lst.append(5)   # OK
        #lst += [5]     # ERROR here

    foo()
    print(lst)

Running the lst.append(5) statement successfully appends 5 to the list. However, substitute lst += [5] for it, and it raises UnboundLocalError, although at first sight it should accomplish the same.
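To see both variants side by side without editing comments, here is a small sketch of my own (the function names append_ok and augmented_fails are made up for this illustration) that catches the exception instead of crashing:

```python
lst = [1, 2, 3]

def append_ok():
    lst.append(5)   # only reads the name lst, then mutates the object

def augmented_fails():
    lst += [5]      # an assignment to lst, so lst is local to this function

append_ok()

try:
    augmented_fails()
    error = None
except UnboundLocalError as exc:
    error = exc

print(lst, type(error).__name__)
```

The list ends up with a single 5 appended, and the second function fails before it can touch it.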

Although this exact question is answered in Python's official FAQ, I decided to write this article with the intent of giving a deeper explanation. It will start with a basic FAQ-level answer, which should satisfy one only wanting to know how to "solve the damn problem and move on". Then, I will dive deeper, looking at the formal definition of Python to understand what's going on. Finally, I'll take a look at what happens behind the scenes in the implementation of CPython to cause this behavior.

The simple answer

As mentioned above, this problem is covered in the Python FAQ. For completeness, I want to explain it here as well, quoting the FAQ when necessary.

Let's take the first code snippet again:

    x = 10

    def foo():
        x += 1
        print(x)

    foo()

So where does the exception come from? Quoting the FAQ:

This is because when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope.

But x += 1 is similar to x = x + 1, so it should first read x, perform the addition and then assign back to x. As mentioned in the quote above, Python considers x a variable local to foo, so we have a problem - a variable is read (referenced) before it's been assigned. Python raises the UnboundLocalError exception in this case [1].

So what do we do about this? The solution is very simple - Python has the global statement just for this purpose:

    x = 10

    def foo():
        global x
        x += 1
        print(x)

    foo()

This prints 11, without any errors. The global statement tells Python that inside foo, x refers to the global variable x, even if it's assigned in foo.

Actually, there is another variation on the question, for which the answer is a bit different. Consider this code:

    def external():
        x = 10
        def internal():
            x += 1
            print(x)
        internal()

    external()

This kind of code may come up if you're into closures and other techniques that use Python's lexical scoping rules. The error this generates is the familiar UnboundLocalError. However, applying the "global fix":

    def external():
        x = 10
        def internal():
            global x
            x += 1
            print(x)
        internal()

    external()

Doesn't help - another error is generated: a NameError. Python is right here - after all, there's no global variable named x, there's only an x in external. It may be not local to internal, but it's not global. So what can you do in this situation? If you're using Python 3, you have the nonlocal keyword. Replacing global by nonlocal in the last snippet makes everything work as expected. nonlocal is a new statement in Python 3, and there is no equivalent in Python 2 [2].
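For reference, this is roughly what the nonlocal version looks like. It is a sketch for Python 3 only, and I've changed the functions to return values instead of printing so the effect on the enclosing x is easy to check:

```python
def external():
    x = 10

    def internal():
        nonlocal x    # x refers to the variable in the enclosing function
        x += 1
        return x

    inner_value = internal()
    return inner_value, x   # the enclosing x was updated as well

result = external()
print(result)
```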

The formal answer

Assignments in Python are used to bind names to values and to modify attributes or items of mutable objects. I could find two places in the Python (2.x) documentation where it's defined how an assignment to a local variable works.

One is section 6.2 "Assignment statements" in the Simple Statements chapter of the language reference:

Assignment of an object to a single target is recursively defined as follows. If the target is an identifier (name):

  • If the name does not occur in a global statement in the current code block: the name is bound to the object in the current local namespace.
  • Otherwise: the name is bound to the object in the current global namespace.

Another is section 4.1 "Naming and binding" of the Execution model chapter:

If a name is bound in a block, it is a local variable of that block.

[...]

When a name is used in a code block, it is resolved using the nearest enclosing scope. [...] If the name refers to a local variable that has not been bound, a UnboundLocalError exception is raised.

This is all clear, but still, another small doubt remains. All these rules apply to plain assignments of the form name = value, which clearly bind a name to a value. But the code snippets we're having a problem with here have the += assignment. Shouldn't that just modify the bound value, without re-binding it?

Well, no. += and its cousins (-=, *=, etc.) are what Python calls "augmented assignment statements" [emphasis mine]:

An augmented assignment evaluates the target (which, unlike normal assignment statements, cannot be an unpacking) and the expression list, performs the binary operation specific to the type of assignment on the two operands, and assigns the result to the original target. The target is only evaluated once.

An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.

With the exception of assigning to tuples and multiple targets in a single statement, the assignment done by augmented assignment statements is handled the same way as normal assignments. Similarly, with the exception of the possible in-place behavior, the binary operation performed by augmented assignment is the same as the normal binary operations.

So when earlier I said that x += 1 is similar to x = x + 1, I wasn't telling the whole truth, but it was accurate with respect to binding. Apart from possible optimization, += counts exactly as = when binding is considered. If you think carefully about it, it's unavoidable, because some types Python works with are immutable. Consider strings, for example:

    x = "abc"
    x += "def"

The first line binds x to the value "abc". The second line doesn't modify the value "abc" to be "abcdef". Strings are immutable in Python. Rather, it creates the new value "abcdef" somewhere in memory, and re-binds x to it. This can be seen clearly when examining the object ID for x before and after the +=:

    >>> x = "abc"
    >>> id(x)
    11173824
    >>> x += "def"
    >>> id(x)
    32831648
    >>> x
    'abcdef'

Note that some types in Python are mutable. For example, lists can actually be modified in-place:

    >>> y = [1, 2]
    >>> id(y)
    32413376
    >>> y += [2, 3]
    >>> id(y)
    32413376
    >>> y
    [1, 2, 2, 3]

id(y) didn't change after +=, because the object y referenced was just modified. Still, Python re-bound y to the same object [3].
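One way to convince yourself that the re-binding step really happens, even for mutable objects, is to put a list inside a tuple. This is a sketch of my own rather than something from the documentation quoted above; the name holder is arbitrary:

```python
holder = ([1, 2],)        # an immutable tuple containing a mutable list

try:
    holder[0] += [3]      # the in-place extend works, then the re-binding fails
    error = None
except TypeError as exc:
    error = exc

contents = holder[0]
print(contents, type(error).__name__)
```

The list gets extended in place, and only afterwards does the attempt to assign the result back into the tuple raise the exception. So an augmented assignment is always a modification followed by a binding, never just the modification.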

The "too much information" answer

This section is of interest only to those curious about the implementation internals of Python itself.

One of the stages in the compilation of Python into bytecode is building the symbol table [4]. An important goal of building the symbol table is for Python to be able to mark the scope of variables it encounters - which variables are local to functions, which are global, which are free (lexically bound) and so on.

When the symbol table code sees a variable is assigned in a function, it marks it as local. Note that it doesn't matter if the assignment was done before usage, after usage, or maybe not actually executed due to a condition in code like this:

    x = 10

    def foo():
        if something_false_at_runtime:
            x = 20
        print(x)

We can use the symtable module to examine the symbol table information gathered on some Python code during compilation:

    import symtable

    code = '''
    x = 10
    def foo():
        x += 1
        print(x)
    '''

    table = symtable.symtable(code, '<string>', 'exec')

    foo_namespace = table.lookup('foo').get_namespace()
    sym_x = foo_namespace.lookup('x')

    print(sym_x.get_name())
    print(sym_x.is_local())

This prints:

    x
    True

So we see that x was marked as local in foo. Marking variables as local turns out to be important for optimization in the bytecode, since the compiler can generate a special instruction for it that's very fast to execute. There's an excellent article explaining this topic in depth; I'll just focus on the outcome.

In the CPython compiler, the function that handles variable name references queries the symbol table for the scope of the name in order to generate the correct opcode. For our x, this returns a bitfield with the local flag in it. Having seen that, the compiler generates a LOAD_FAST. We can see this in the disassembly of foo:

     35           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_FAST               0 (x)

     36          10 LOAD_GLOBAL              0 (print)
                 13 LOAD_FAST                0 (x)
                 16 CALL_FUNCTION            1
                 19 POP_TOP
                 20 LOAD_CONST               0 (None)
                 23 RETURN_VALUE

The first block of instructions shows what x += 1 was compiled to. You will note that already here (before it's actually assigned), LOAD_FAST is used to retrieve the value of x.

This LOAD_FAST is the instruction that will cause the UnboundLocalError exception to be raised at runtime, because it is actually executed before any STORE_FAST is done for x. The gory details are in the bytecode interpreter code:

    TARGET(LOAD_FAST)
        x = GETLOCAL(oparg);
        if (x != NULL) {
            Py_INCREF(x);
            PUSH(x);
            FAST_DISPATCH();
        }
        format_exc_check_arg(PyExc_UnboundLocalError,
            UNBOUNDLOCAL_ERROR_MSG,
            PyTuple_GetItem(co->co_varnames, oparg));
        break;

Ignoring the macro-fu for the moment, what this basically says is that once LOAD_FAST is seen, the value of x is obtained from an indexed array of objects [5]. If no STORE_FAST was done before, this value is still NULL, the if branch is not taken [6] and the exception is raised.
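If you want to check this on your own interpreter, the dis module gives programmatic access to the instructions. This is a sketch that assumes CPython; opcode names are version-specific (newer releases may emit variants such as LOAD_FAST_CHECK), so it only looks at name prefixes:

```python
import dis

def foo():
    x += 1      # x is assigned in foo, so the compiler treats it as local
    return x

instructions = list(dis.get_instructions(foo))
opnames = [ins.opname for ins in instructions]

# x is read through a "fast" local slot, never through a global lookup.
uses_fast_load = any(name.startswith("LOAD_FAST") for name in opnames)
loads_x_as_global = any(
    ins.opname == "LOAD_GLOBAL" and ins.argval == "x" for ins in instructions
)
local_names = foo.__code__.co_varnames

print(uses_fast_load, loads_x_as_global, local_names)
```

Note that foo is never called here; inspecting the compiled code is enough to see how x was classified.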

You may wonder why Python waits until runtime to raise this exception, instead of detecting it in the compiler. The reason is this code:

    x = 10

    def foo():
        if something_true():
            x = 1
        x += 1
        print(x)

Suppose something_true is a function that returns True, possibly due to some user input. In this case, x = 1 binds x locally, so the reference to it in x += 1 is no longer unbound. This code will then run without exceptions. Of course if something_true actually turns out to return False, the exception will be raised. Python has no way to resolve this at compile time, so the error detection is postponed to runtime.
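Here is a self-contained variation you can run to see both outcomes. The bind_first parameter is my own stand-in for the runtime condition, and the function returns x instead of printing it:

```python
x = 10

def foo(bind_first):
    if bind_first:      # stands in for a condition known only at run time
        x = 1
    x += 1              # x is local either way; bound only if the branch ran
    return x

ok = foo(True)

try:
    foo(False)
    error = None
except UnboundLocalError as exc:
    error = exc

print(ok, type(error).__name__)
```

The same function body succeeds or fails depending purely on its input, and the global x is never consulted in either case.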

