Chronos is one of the many Smalltalk-related blogs syndicated on Planet Smalltalk
χρόνος

Discussion of the Essence# programming language, and related issues and technologies.

Blog Timezone: America/Los_Angeles [Winter: -0800 hhmm | Summer: -0700 hhmm] 
Your local time:  

2007-09-23

The Semantics of "Variables"

Depending on your terminological tradition, objects have "instance variables," "members," "fields," "slots," "associations" or "references." But what matters isn't the terminology, but the semantics and mathematics.

In terms of structure and mathematics, objects are nodes (vetices, points) in a directed graph, and the references objects make to each other are directed arcs (edges) in the graph. The transitive closure of all the objects reachable by following the references from some root object is commonly called the object graph of the root object. Any object can be considered as the "root" of an object graph—although each object considered as a root node results in a different object graph.

This is one reason you typically don't find either a tree data structure or a graph data structure provided as a Collection class in a typical class library: An object already is a tree or graph node, and by defining the object's instance variables and methods, its class defines the name and semantics of the "arcs" that can be navigated from a node of that type.

Objects are graph-theoretic, in the same sense that relational databases are set-theoretic. Set theory is a subset of graph theory—which is why object-relational mapping is non-trivial.

But what is the semantics of a variable? In other words, what is the meaning of the fact that a particular variable (or any reference by name to an object) has any particular value?

Predicates

A named object reference, just like an arc in a graph, represents (or "models") an arity-2 predicate, where the two nodes are the arguments of the predicate. The existence of the arc connecting the two graph nodes is, by convention, interpreted as an assertion that the predicate represented ("modelled") by the arc that connects the two nodes is true.

For example, a variable named "parent" would normally be used to assert that whatever object is the value of that variable is the "parent" of some other object (or represents/models the parent of whatever entity is being modelled by that object). More formally, if there is an object A and an object B, and object A has an instance variable named "parent" whose value is object B, then that situation is normally intended to represent (model) the fact that the predicate hasParent(objectA, objectB) is true.

The name of a variable (or of any named reference to a value) should indicate the semantic role played by the variable—in other words, it should name the predicate modelled by the reference. Programmers often fail to do this, preferring instead to give names to variables based on the type of the values to be held by the variables. But that's a fundamental mistake: It's much easier to infer the type of a value from its semantic role, than it is to infer the semantic role from the type. And the semantic role played by variables is far more important to the meaning of code than is the type of their values.

Language designers typically have been making an analogous mistake, focusing consiberable energy and effort on type constraint systems for variables, while almost completely ignoring the core semantics of variables (or other named references to values.) Fortunately, that is beginning to change.

In addition to the semantic role played by a variable, the following aspects of (or perspectives on) the semantics of a reference to an object (value) are also of fundamental importance:

  1. Cardinality: The minimum and maximum number of different individuals that can validly be members of the arity-2 relation defined by the arity-2 predicate represented by an instance variable.
  2. Symmetry: Is the arity-2 predicate represented by an instance variable symmetric? If hasSibling(A, B), then is it necessarily true that hasSibling(B, A)?
  3. Transitivity: Is the arity-2 predicate represented by an instance variable transitive? If hasAncestor(A, B) and hasAncestor(B, C), then is it necessarily true that hasAncestor(A, C)?
  4. Equivalence Semantics: What matters about the object referenced by a variable—it's identity as an object, or the value it represents?
  5. Modality Semantics: Does the referenced value serve to identify the referrer, or is it simply an accidental (transient, variable) property of the referrer?
The remainder this post deals with the equivalence and modality semantics of inter-object references. For a more complete discussion of the cardinality, symmetry and transitivity of inter-object references, check out RDF and OWL.

Equivalence Semantics

The tradition in object modelling is to classify the properties (instance variables) of an object as being either attributes or associations. The usual explanation of the difference is that an instance variable whose type is one of the primitive types is an "attribute," but that an instance variable whose type is one of the object types is an "association." That explanation is misleading at best, and totally wrong at worst.

To see the problem with the conventional explanation, consider an instance variable whose type is java.lang.String. A Java String is an object, not a primitive type. But there can be no question that String-valued instance variables in Java are attributive, not associative. As another example, consider any instance variable of any Smalltalk object. In Smalltalk, all values are objects—there are no primitive types—and so all instance variables have objects as their values. The idea that the instance variables of a Smalltalk object can't be attributive, just because Smalltalk has no primitive types, is absurd on its face: Whether an instance variable is an attributive reference or an associative reference should not depend on which programming language one uses.

The valid explanation of the difference between an attribute and an association is that the equivalence semantics of the two are different. An attribute references the value of an object. An association references the identity of an object. The distinction between the value and the identity of an object only matters if the object is mutable. If an object is immutable, then its value can't be modified. Since an attribute references the value of an object, any change to the value of the referenced object violates the semantics of the attributive reference.

Primitive values (the values of primitive types) are immutable. That's why they are commonly used as the values of attributes. But any object that is immutable can easily serve as the value of an attributive reference—which is one reason that Smalltalk objects can, in fact, have instance variables that are attributes, even though all values are objects.

It can sometimes be the case that an instance variable needs to be an attributive reference, but there is no type or class available whose instances can be the value of such a variable, and which are (or can be made) immutable. Such cases require either that a different class or type be developed whose instances are immutable, or that the programmer take special care to prevent the objects that are referenced attributively from having their values modified in a way that (silently) breaks the intended equivalence semantics of any attributive references to those objects.

One commonly used technique for "safely" using a mutable object as the value of an attributive reference is to use an accessor method that returns a copy of the attributively-referenced mutable object, instead of answering the object itself. Another is to have "object copy" logic that substitutes a copy of the original (mutable) object as the value of any attributive instance variable in any copy of the original object that is created.

Put another way: Knowing both a) the equivalence semantics of a variable and b) the mutability of the objects that will be the values of an instance variable can be quite useful for safely and correctly implementing the code of a class (or generating safe and correct code for it from a model.)

Modality Semantics

The modality semantics of an instance variable deals with the scope or context of the predicate represented by the variable. Does the predicate assert a necessary and universal truth, or does it represent an accidental, transient truth that may be valid only in a limited scope or context? Although the modality of a predicate can vary along a variety of perspectives, so far I've found it useful to classify instance variables into four different modality categories:
  • axiomatic variables
  • identity variables
  • state variables, and
  • transient variables
[Note: Both the names and the definitions of the terms "identity variable" and "state variable" were introduced to me by Bobby Woolf, in a conversation we had at some point in the mid-1990s.]

The predicate represented by an axiomatic variable asserts a universal and necessary truth—in other words, it asserts an axiom.

If the instance variable "guid" represents the assertion that the predicate "hasGuid(referrer, aValue)" must necessarily be true, for all time and in all contexts, then "guid" is an axiomatic variable, and the association between the referrer and the referenced value is axiomatic.

The predicate represented by an identity variable asserts a universal and necessary truth, and its value must be unique with respect to either a) any other object of the same class, or b) any entity in the real world modelled by one or more objects in the program or database. The latter case deals with situations where different objects of different types model different aspects of the same entity in the "real world." Note that all identity variables also qualify as axiomatic variables, but not all axiomatic variables qualify as identity variables.

If the instance variable "guid" represents the assertion that the predicate "hasGuid(referrer, aUniqueValue)" must necessarily be true, for all time and in all contexts, and if the value of the variable must be "unique" (as defined above,) then "guid" is an identity variable, and the association between the referrer and the referenced value is an axiomatic identity relationship.

Note that there are cases where a variable by itself is not an identity variable, but it may participate in an identity relationship along with two or more other variables. In RDBMS terminology, such variables are elements of a multi-part key.

The predicate represented by a state variable asserts a non-universal, accidental truth which is not derivable from other predicates that have been asserted.

If the instance variable "location" represents the assertion that the predicate "hasLocation(referrer, aLocation)" happens to be true, as of a particular moment and/or in a particular context, and if that fact is not derivable from other asserted facts, then "location" is a state variable.

The predicate represented by a transient variable asserts a fact which is derivable from other predicates that have been asserted.

If the instance variable "area" represents the assertion that the predicate "hasArea(referrer, anArea)" is true, but that fact is derivable from other asserted facts (e.g., the width, height and shape of the referrer,) then "area" is a transient variable.

Knowing the modality category of a variable aids in making optimal design decisions when implementing a class (or generating code for a class):
  • The binding of an axiomatic variable to its value should normally be immutable. If mutator methods are provided for such variables, they should not be useable once the instance has been properly initialized.
  • The binding of a state variable to its value should be mutable. It will probably be necessary to define mutator methods for such instance variables.
  • The values of a transient variable can be set to nil at any time, without any loss of information. No mutator methods should be provided for such variables; their value should be lazily computed as necessary.

Conclusion

If you design or architect software, you need to know all of the above concepts. So do those who design software tests, programming languages, or object modelling languages/tools. The above concepts are also vital to the domain of model-driven architecture and model-driven design.


No comments: