Woodenman language requirements
A. MAJOR CHARACTERISTICS
The language should be extensible so that the user of the language can extend the apparent set of data types and operations available to his programs by means of specifications made within his programs. The number of specialized capabilities needed for a common language is large and diverse. In many cases, there is no consensus as to the form these capabilities should take in a programming language. The operational requirements dictating specific specialized language capabilities are volatile and future needs cannot always be foreseen. No language can make available all the features useful to the broad spectrum of military applications, anticipate future applications and requirements, or even provide a universally "best" capability in support of a single application area. A common language must have capability for growth. It should contain all the power necessary to satisfy all the applications and the ability to specialize that power to the particular application task. An extensible language will make it possible to add new application-oriented features and to add new programming techniques and mechanisms to the language using descriptions written entirely within the language. Extensions should have the appearance and costs of features which are built into the language while actually being only catalogued accessible application packages. A static programming language cannot be all things to all people, but an extensible language can be adapted to meet changing requirements in a variety of areas.
The source language should contain a simple clearly identified kernel which houses all the power of the language.
The capabilities available in an extensible language can be partitioned into two groups, those which are definable by extension and those which provide an essential primitive capability of the language. The smaller and simpler the kernel, the easier the language will be to learn and use. If the kernel is clearly delineated and language features not in the kernel are defined in terms of the kernel, then only the kernel language need be implemented to make the full source language capability available. The kernel language should be simple in the sense that it is small and each feature provides a single unique capability not duplicated in other kernel features. Kernel features should provide relatively low level general purpose capabilities not yet specialized to particular applications.
A variety of application-oriented extensions should be provided with the language.
An extensible kernel language alone is not sufficient for a common language. Even though in theory such a language provides the necessary power and the capability for extension to special applications, the users of the language cannot be expected to become language designers or to divert project funds to develop the required extensions to make the language useful.
Source language structures not in kernel language should be maintained in a compile-time accessible library of extensions. The library should be capable of holding anything definable in the source language.
In an extensible language with a simple kernel the usefulness of the language derives primarily from the existence and accessibility of specialized application-oriented extensions. Whether an extension library should contain source or object code is a question of implementation efficiency and should not be determined by the definition of the source language. It should be remembered, however, that interfaces cannot be validated at program assembly time without some equivalent of their source language interface specifications, that object modules are machine dependent and therefore not portable, that source code is often more compact than object code, and that compilers for simple languages can often compile faster than a loader can load from relocatable object programs. There is no reason why routines written in other programming languages should not be accessible through the library, providing they conform to the object language interface conventions.
The language should be typed. The type or mode of all variables, components of composite data structures, expressions, operations, and parameters should be determinable at compile time and unalterable at run-time. The language should require that the type of each variable, component of composite data structures, and formal parameter be explicitly specified in source programs.
By the type of a data object is meant the set of objects themselves, the essential properties of those objects and the set of operations which give access to and take advantage of those properties. The author of any correct program in any programming language must, of course, know the type of all data and variables used in his programs. If the program is to be maintainable, modifiable and comprehensible by someone other than its author, then the types of variables, operations, and expressions should be easily determined from the source program. Type specifications in programs also provide the redundancy necessary to automatically verify that the programmer has adhered to his own type conventions.
The source language should provide access to machine dependent hardware facilities through encapsulated machine language insertions.
Machine language insertions are necessary for interfacing special purpose devices, for accessing special hardware capabilities, and for certain code optimizations on time critical paths. The language should, however, be so designed that there is little need or incentive for its users to enter the machine language level. The machine language insertions should be encapsulated so they can be easily recognized when moving to another object machine and so the full security of procedure calls can be provided at their invocation.
B. SOURCE LANGUAGE CHARACTERISTICS
Neither the language definition nor the translator should limit the size of program components.
This is an example of the principle that a programming language should not impose arbitrary rules and restrictions which must be learned and dealt with by the programmer. Neither the language nor the translator should limit the maximum array dimensions, the length of identifiers, the maximum number of parenthesis levels, the size of data structures, or the number of identifiers. Program components which affect the object representation of programs will, of course, have limits imposed by the object machine. The translator should report when the program exceeds the resources of the intended object machine but should not build in arbitrary limits of its own.
Each data structure and operation of the Kernel language should provide a single capability which is composable and has a straightforward implementation in the object code of conventional architecture machines.
Kernel language data and operations should be simple and provide a single capability so that their use does not impose costs for unwanted capability. They should be composable so they can be used as building blocks for more specialized capabilities. They should be compatible with object machines so that they have low cost implementation.
There should be no defaults in programs which affect the program logic. Decisions which affect program logic should either be made irrevocably at the time the language is designed or made explicit in each program.
The only alternative is implementation dependent defaults with the translator determining the meaning of programs. What a program will do should be determinable from the program and the defining documentation for the programming language. Omission of any selection which affects the program logic should be treated as an error by the translator.
Defaults should be provided for special capabilities affecting object representation and other properties which the average programmer does not know or care about. Such defaults should always mean that the programmer does not care which choice is made.
The language should be oriented to provide a high degree of management control and visibility to programs and toward self documenting programs with the programmer required to make his decisions explicit. On the other hand, the programmer should not be forced to overspecify his program and thereby cloud its logic, unnecessarily eliminate opportunities for optimization, and misrepresent arbitrary choices as essential to the program logic. Defaults should be allowed, in fact encouraged, in don't care situations.
No language defined symbols appearing in the same program should have essentially different meanings.
This contributes to the clarity and uniformity of programs, protects against psychological ambiguity and avoids some error prone features of extant languages. In particular, this would exclude the use of = to imply both assignment and equality, would exclude conventions implying that parenthesized parameters have special semantics (as with PL/I subroutines), and would exclude the use of a colon to both declare a label and separate input and output parameters (as in Jovial). It would not, however, require different operator symbols for integer, real or even matrix arithmetic, since these are, in fact, uses of the same abstract operations.
There should be source language capability for specifying the intended object environment.
When a language has different host and object machines and when its compiler can produce code for several object machines or several configurations of a given object machine, the programmer should be able to document and to specify the intended object machine configuration within the source language program. The object environment specification should include the correct computer model, the memory size, any special hardware options, the operating system if present, special object site conventions, and the peripheral configurations. These specifications might be simply a list of identifiers and would probably be canned as library elements when several programs are being developed for the same object machine.
The source language should permit inclusion of assertions, assumptions, axiomatic definitions of data types, and units of measures in programs. Because there is currently no best notation for these purposes the language should not impose any particular syntax for their use.
There are many opinions on the desirability, usefulness, and proper form of each of these specifications. It is clear that better program documentation is needed and that specifications of these kinds may help. Specifications also introduce the possibility of automated testing, formal program proofs, and dimensional analysis. The language should not prohibit inclusion of these forms of specification but neither should any particular form be imposed for their use, or translators required to take special action on them. The presence or absence of assertions, assumptions, axiomatic definitions, units of measure or comments in source language programs should not affect the translator's ability to translate the program and generate object code.
B.1 DATA TYPES
The use of defined types should be indistinguishable from built-in types.
There should be no special cases, ad hoc, or inconsistent rules to interfere and complicate learning, using and implementing the language. If built-in features and user defined extensions are treated in the same way throughout the language so that the kernel language, standard application-oriented extensions, extensions and application programs are treated in a uniform manner by the user and by the translator then these distinctions will grow dim to everyone's advantage. When the language contains all the essential power, when few can tell the difference between the kernel language and the extensions, and when extensions to the source language do not impact the compiler and its standardization, then there is no incentive to proliferate languages.
The language should provide data types for integer, real, Boolean, character, array (i.e., composite data structures with indexable components of homogeneous type), and record (i.e., composite data structures with labeled components of heterogeneous type) types.
These are the common data types of most programming languages and object machines and are sufficient to mechanize any other desired type.
The language should provide a pointer mechanism which can be used to point within specified composite data structures to build data with shared and/or recursive substructure; but variables and expressions of pointer type are not desired.
The need for pointers is obvious in building data structures with shared or recursive substructures; such as, directed graphs, stacks, queues, and list structures. Unfortunately, providing pointers as absolute address data types produces a gap in security mechanisms and encourages the development of ad hoc data structures incapable of comprehension or proof. The desired pointer capability is that required to build a data structure containing fields which need not be collocated with the structure. That is, some fields are indirectly named rather than being allocated within the structure itself. There is no requirement for pointer variables, for pointers to data of unknown type, nor for pointers to variables.
Two types of reals should be provided: normalized and floating point numbers and fixed point numbers in the interval -1 to 1. Scale factor management for fixed point numbers should be the responsibility of the user.
Many small machines do not have floating point hardware and some applications require greater precision than can be obtained from the floating point hardware of their object machines. Both floating point and fixed point arithmetic should be provided, but scale management for fixed point should be left to the programmer and no special effort should be made to encourage the use of fixed point.
The source language should require global specification of the precision for real numbers. This specification should be interpreted as the maximum precision required by the program logic and the minimum precision to be supported by the object code.
Machine independence, in the use of real numbers, can be achieved only if the user can place constraints on the translator and object machine without forcing a specific mechanization of the arithmetic. Precision specifications, as the maximum required by the program and the minimum to be implemented by the object code, provide all the power and guarantees needed by the programmer without unnecessarily imposing on the object machine. Precision specifications do not change the type of reals or the set of applicable operations.
The character set and collating sequence for character data should be specifiable within user designated program scopes.
The character set to be used in data is often determined by the object machine and its peripheral devices. In some cases, several character sets may be required in the same program. The user should be able to define the desired character set within his program, and should be able to convert between character sets. The definitions of the most common character sets (including ASCII) might be made available in the standard library.
The language should require user specification of the number of dimensions, the range of subscript values for each dimension, and the type of each array. The number of dimensions and type should be determined at compile time.
This allows static arrays (which can be allocated at compile or load time) and automatic arrays (which can be allocated at scope entry). These are sufficient to permit allocation of space pools for management of more complex data structures including dynamic arrays. The range of subscript values for any given dimension should be a contiguous subsequence of some enumeration type. It has been suggested that the lower bound on array subscripts (i.e., the array origin) be fixed by the language definition at 0 or 1. Certainly the origin should be determinable at compile time, but limiting the origin to 0 or 1 would be an arbitrary special case decision to aid the compiler writer at the expense of application programs. The run time costs of implementing origin 1 are no more than for any other nonzero origin known at compile time. Most programmers are not used to origin 0 and find it inconvenient or unnatural.
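The idea of a subscript range with an arbitrary origin can be sketched in Python, which itself fixes the origin at 0. This is only an illustration under stated assumptions: the class name `BoundedArray` and its bounds-checking behaviour are hypothetical and are not part of the requirement.

```python
class BoundedArray:
    """One-dimensional array with subscripts in the inclusive range lower..upper."""

    def __init__(self, lower, upper, fill):
        if upper < lower:
            raise ValueError("upper bound must not be below lower bound")
        self.lower = lower
        self.upper = upper
        # Storage is an ordinary zero-based list; the origin is only an offset.
        self._items = [fill] * (upper - lower + 1)

    def _offset(self, index):
        # Reject subscripts outside the declared range instead of wrapping.
        if not self.lower <= index <= self.upper:
            raise IndexError(f"subscript {index} outside {self.lower}..{self.upper}")
        return index - self.lower

    def __getitem__(self, index):
        return self._items[self._offset(index)]

    def __setitem__(self, index, value):
        self._items[self._offset(index)] = value


# Hypothetical use: readings indexed by year, origin 1970.
temps = BoundedArray(1970, 1975, 0.0)
temps[1970] = 11.5
temps[1975] = 12.25
```

The origin costs one subtraction per access, which a compiler could fold into the base address when the bound is known at compile time.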
The language should permit records to have alternative structures, each of which is fixed at compile time. The name and type of each record component should be specified by the user at compile time.
This provides all that is safe to use in CMS-2 and JOVIAL OVERLAY and in FORTRAN EQUIVALENCE. It permits hierarchically structured data of heterogeneous type, permits records to have alternative structures as long as each structure is fixed at compile time and the choice is fully discriminated at run time, but does not permit arbitrary references to memory through renaming nor does it permit dropping type checking to handle overlaid structures.
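A record with alternative structures that is fully discriminated at run time might look like the following Python sketch. The type names `Circle`, `Rectangle` and `Shape` are invented for illustration; the class of each value plays the role of the discriminant.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float


# The alternatives are fixed when the program is written. A value is always
# exactly one of them, and its class acts as the run-time discriminant.
Shape = Union[Circle, Rectangle]


def area(shape: Shape) -> float:
    # Each branch touches only the fields of the alternative it has checked,
    # so no field is ever read through the wrong layout.
    if isinstance(shape, Circle):
        return math.pi * shape.radius * shape.radius
    if isinstance(shape, Rectangle):
        return shape.width * shape.height
    raise TypeError("not a Shape alternative")
```

Unlike an overlay, there is no way here to store a `Circle` and read it back as a `Rectangle`.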
The user should be able to specify whether composite data structures are to be packed for maximum storage utilization or unpacked to minimize access time. Packed data should have uniform field sizes independent of the object machine.
Data can be placed one item per machine word (or half word, or double word) for inexpensive access or it can be packed to maximal density to conserve storage space. The user should be permitted to specify which if it is important to his program. If he does not specify then the packing should be optimal as determined by the compiler; neither choice should be a default. Dense data is required when dealing with large data files which also must be transferred among different machines. If field sizes are determined directly from the description of the data then there will be a machine independent bit equivalent form for transferring data (e.g., the COBOL data description for records).
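As a rough illustration of the packed versus unpacked distinction, Python's standard `struct` module can lay out the same record either with fixed, machine independent field sizes or with the host machine's native sizes and alignment. The record layout (a flag, a count and an identifier) is a made-up example.

```python
import struct

# Hypothetical record: a 1-byte flag, a 2-byte count, and a 4-byte identifier.
# "<" selects little-endian byte order with standard sizes and no padding,
# so the layout is the same on every machine.
PACKED = struct.Struct("<BHI")

# "@" selects the host's native sizes and alignment; padding may be inserted,
# so the size can differ from one machine to another.
NATIVE = struct.Struct("@BHI")

record = PACKED.pack(1, 500, 70000)
flag, count, ident = PACKED.unpack(record)
```

The packed form is the one suited to transferring files between machines; the native form trades space for cheaper access on the host.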
Assignment and access operations should be automatically defined for all data types. The assignment operation should permit any value of a given type to be assigned to a variable, array or record component of that type.
Variables should be available for all data types. Variables are useful only when there exist corresponding access and assignment operations. Because no special semantics is required as a function of the type for reference and assignment, they can be defined automatically.
The source language should have built-in equivalence and nonequivalence operations which can be used to compare any two data objects (regardless of type compatibility) for identity.
Equivalence is an essential universal operation which should not be subject to restriction on its use. Proper semantic interpretation of equivalence requires that operands of disjoint types never be equivalent. Consequently, its usefulness at run time is restricted to data of the same type or of types with nonempty intersections. In any case, the test should be for logical identity. The use of equivalence is not recommended for real numbers but resolution of what equivalence means for imprecise quantities is a problem of numerical analysis not language design.
Relational operations should be automatically defined for numeric data and all types defined by enumeration. Numbers and types defined by enumeration have an obvious ordering which should be available through relational operations. The same mechanism might be used for the character set collating sequence (i.e., define character set as an enumeration of characters).
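A small Python sketch of both points follows: an enumeration whose declaration order supplies its ordering, and a collating sequence defined simply as an enumeration of characters. The names `Priority`, `COLLATING` and `collate_key` are hypothetical. `IntEnum` is used for brevity; it also permits integer arithmetic on members, which goes beyond what the requirement asks for.

```python
from enum import IntEnum


class Priority(IntEnum):
    # The declared values give the ordering LOW < ROUTINE < URGENT.
    LOW = 1
    ROUTINE = 2
    URGENT = 3


# A hypothetical collating sequence: the position of each character in this
# string is its rank, exactly as for the members of an enumeration.
COLLATING = "AaBbCc"


def collate_key(ch):
    return COLLATING.index(ch)


ordered = sorted("cBaA", key=collate_key)
```

With this sequence `ordered` places "A" before "a" and both before "B", which differs from the ASCII ordering of the same characters.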
The built-in operations for numbers should include: addition, subtraction, multiplication, division (with a real result) and modulo division.
These are the most widely used numeric operations and are available as hardware operations in most machines.
No arithmetic operation which is within the precision or range specifications of the program should ever truncate the most significant digits of a numeric quantity; truncation and rounding should always be on the least significant digits.
This requirement seems obvious, particularly for floating point numbers, and yet many of our existing languages truncate the most significant mantissa digits in some mixed and floating point operations. The language should adhere to the "law of least astonishment".
The built-in Boolean operations should include and, or and xor. The operations and and or on scalars should be evaluated in short circuit mode.
Short circuit mode means that and and or are in fact control operations which do not evaluate their second argument if the value of the first argument is false or true, respectively. Short circuit evaluation has no disadvantages over the corresponding computational operations and sometimes produces faster executing code, particularly in languages where the user can rely on the short circuit execution.
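Python's `and` and `or` happen to behave in the short circuit mode described here, so the effect can be observed directly. In this sketch the helper `check` and the list `log` are illustrative names; `check` records each operand that is actually evaluated.

```python
def check(log, name, result):
    # Record that this operand was evaluated, then return its value.
    log.append(name)
    return result


log = []

# "and" stops at the first false operand; operand "b" is never evaluated.
r1 = check(log, "a", False) and check(log, "b", True)

# "or" stops at the first true operand; operand "d" is never evaluated.
r2 = check(log, "c", True) or check(log, "d", False)

# A common reliance on short circuit mode: guard a subscript with a bounds test.
items = []
first_is_zero = len(items) > 0 and items[0] == 0
```

The last line is safe only because the second operand is skipped when the list is empty; with a purely computational `and` it would be a subscript error.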
The source language should permit scalar operations to be applied to conformable arrays and records to indicate component by component operations.
Conformability should require exactly the same number of components and one for one compatibility in type. For arrays, correspondence should be by position in similarly shaped arrays. For records, correspondence should be by component name. In many situations component by component operations are done on array and record elements. In fact, a primary reason for having arrays is to permit large numbers of similarly treated objects to have a uniform notation. The COBOL language is built around the idea of operations on corresponding components of records. Component by component operations available directly in the source language hide the details of the sequencing and thereby simplify the program and make more optimizations available. In addition they permit simultaneous execution on machines with parallel processing hardware. Component by component operations should be available for built-in composite data structures which are used to define application oriented structures, but that capability should not be automatically inherited by defined data structures. A matrix might be defined using arrays, but it should not inherit the array operations automatically. Multiplication for matrices would for example be unnatural, confusing and inconvenient if the product operator for matrices were interpreted as a component by component operation instead of cross product. Component by component operations will also allow operations on character strings represented as vectors of characters and efficient Boolean vector operations.
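The conformability rule can be sketched in Python with lists standing in for arrays and dictionaries standing in for records. The functions `conformable` and `componentwise` are hypothetical helpers; for brevity the sketch checks only the number and names of components, not the type of each component.

```python
import operator


def conformable(a, b):
    # Records (dicts) correspond by component name, arrays (lists) by position.
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys()
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b)
    return False


def componentwise(op, a, b):
    # Apply a scalar operation to each pair of corresponding components.
    if not conformable(a, b):
        raise ValueError("operands are not conformable")
    if isinstance(a, dict):
        return {name: op(a[name], b[name]) for name in a}
    return [op(x, y) for x, y in zip(a, b)]


array_sum = componentwise(operator.add, [1, 2, 3], [10, 20, 30])
record_sum = componentwise(operator.add, {"x": 1, "y": 2}, {"y": 5, "x": 4})
```

Because the caller never writes the loop, the order in which components are processed is left open, which is what makes parallel execution permissible.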
Explicit type conversion operations should not be required for floating point arithmetic with integer or fixed point arguments, nor for conversion between numeric ranges.
An explicit integer to floating point operation is not required because within the specified real precision any range of integers is a subset of the same range of reals. Similarly the possible fixed point values will always be a subset of the floating point values of the same precision. Because ranges do not form closed systems, range validation is not possible at compile time (e.g., I = I + 1 may be a range error). At best, the compiler might point out likely range errors.
B.3 VARIABLES, LITERALS, AND CONSTANTS
The user should have the ability to associate constant values with identifiers.
The use of identifiers to represent literal values has often made programs more readable, more easily modifiable and less prone to error when the value of a constant must be changed. Associating constant values with an identifier is preferable to assigning the value to a variable because it is then clearly marked in the program as a constant, can be checked for unintentional changes, and often can have a more efficient object representation.
The language should provide a syntax and a consistent interpretation for numeric literals. Numeric literals should have the same value (within the specified precision) in both programs and data.
The point here, and one that should be obvious to any programmer who must use numeric data, is that regardless of the source of the data and regardless of the object machine the value of constants should be the same. For integers it should be exact and for reals it should be the same within the specified precision. Compiler writers however would disagree. They object to this requirement on two grounds: that it is too costly if the host and object machines are different and that it is unnecessary if they are the same. In fact, all costs are at compile-time and must be insignificant compared to the lifetime costs resulting from object code containing the wrong constant values. As for being unnecessary, there have been all too many cases of different values from program and data literals on the same machine because the compile-time and run-time conversion packages were different and imprecise.
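The property asked for here is easy to state as a check. The following Python sketch compares a value written as a literal in the program with the same digits read as data at run time; the variable names are illustrative only. It holds in CPython because both conversions round to the nearest representable value.

```python
# The same digits, once as a literal in the program text and once as input data.
from_program = 0.1
from_data = float("0.1")

# Integers should agree exactly; reals should agree within the stated precision.
int_program = 1000000007
int_data = int("1000000007")

same_real = from_program == from_data
same_int = int_program == int_data
```

A translator whose compile-time and run-time conversion routines differed could fail the first comparison even when host and object machine are the same.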
The language should permit the user to specify the initial values of individual variables at the time of their allocation. There should be no default initial values. It should be considered an error if a variable is accessed before it obtains an initial value.
The ability to initialize variables at the time of their allocation will contribute to program clarity, but a requirement to do so would be an arbitrary and sometimes costly decision to the user. Default initial values, on the other hand, contribute to neither program clarity nor correctness and can be even more costly at run-time. Every variable must be initialized before it is accessed or its value will be unpredictable garbage with no chance for program correctness. The translator should treat any access to a variable before it has been assigned as an error. Whether a variable will be assigned a value is in general unsolvable at compile time, but in those cases in which it is not easily determined by the translator, it will not be easily determined by the programmer and those who must maintain the program and should, therefore, be considered an error.
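The rule that access before assignment is an error can be modelled at run time, as in the Python sketch below. The names `Variable` and `Uninitialized` are hypothetical; a translator meeting the requirement would aim to report such errors at compile time wherever it can.

```python
class Uninitialized(Exception):
    """Raised when a variable is read before it has been given a value."""


class Variable:
    # A private sentinel distinguishes "never assigned" from any real value,
    # so there is no default initial value that could be read by mistake.
    _UNSET = object()

    def __init__(self, name):
        self.name = name
        self._value = self._UNSET

    def assign(self, value):
        self._value = value

    def access(self):
        if self._value is self._UNSET:
            raise Uninitialized(f"{self.name} accessed before assignment")
        return self._value


speed = Variable("speed")
try:
    speed.access()
    caught = False
except Uninitialized:
    caught = True
speed.assign(30)
```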
The source language should require its users to individually specify the range of values for integer variables. These specifications should be interpreted as an upper bound on the range of values which will be assigned to a variable and a lower bound on the range which must be supported by the object code. Range specifications should not be interpreted as defining new types.
Range specifications are a special form of assertion. They aid in understanding and determining the correctness of programs. They can also be used as additional information by the compiler in deciding what storage and allocation to use (e.g., half words may be more efficient for integers in the range 0 to 1000). Range specifications also offer the opportunity for the translator to automatically insert range tests for run-time or debug-time validation of the program logic. With variable ranges specified in the program, it becomes possible to perform many subscript bounds checks at compile-time. These bounds checks, however, will be only as valid as the range specifications which cannot, in general, be validated at compile-time.
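An automatically inserted range test might behave like the following Python sketch, in which every assignment is checked against the declared range. The class `RangedInt` is a hypothetical stand-in for a declared integer variable; the range is a property of the variable, not a new type.

```python
class RangedInt:
    """Integer variable with a declared inclusive range, checked on assignment."""

    def __init__(self, low, high, value):
        self.low = low
        self.high = high
        self.set(value)

    def set(self, value):
        # This is the range test a translator could insert for debugging.
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} outside declared range {self.low}..{self.high}")
        self.value = value


# Declared range 0 to 1000, currently holding the largest permitted value.
i = RangedInt(0, 1000, 1000)
try:
    i.set(i.value + 1)  # the statement I = I + 1 leaves the range here
    range_error = False
except ValueError:
    range_error = True
```

The failing statement is an ordinary increment, which illustrates why such errors cannot in general be found at compile time.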
The range of values which can be associated with a variable, array, or record component may be any built-in type, any defined type or a subset of any enumeration type.
B.4 EXTENSION FACILITIES
There should be no default declarations. Each program element should be defined in the kernel language, in a library extension, or in the program.
As programmers, we should not expect the translator to write our programs for us. If we somehow know that the translator's default convention is compatible with our needs for the case at hand, we should still document the choice so others can understand and maintain our programs. Neither should we be able to delay definitions (possibly forget them) until they cause trouble in the operational system.
The user should be able, within the source language, to extend existing operations to new data types.
When an operation is an abstraction of an existing operation for a new type or is a generalization of an existing operation, it is inconvenient, confusing and misleading to use any but the existing operator symbol or name.
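Extending an existing operator to a new type can be sketched in Python, where defining `__add__` makes the ordinary `+` symbol apply to a user defined type. The type `Vector2` is a made-up example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector2:
    x: float
    y: float

    def __add__(self, other):
        # Extend the existing "+" to the new type instead of inventing a name.
        if not isinstance(other, Vector2):
            return NotImplemented
        return Vector2(self.x + other.x, self.y + other.y)


total = Vector2(1.0, 2.0) + Vector2(3.0, 4.0)
```

The use of the defined type then reads the same as the use of a built-in numeric type.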
Type definitions in the source language should include, as a unit, both the class of data objects comprising the type and the set of operations applicable to that class.
Types define abstract data objects with special properties. The data objects are given a representation in terms of existing data structures, but they are of little value until operators are available to take advantage of their special properties. When we obtain access to a type, we need its operations as well as its data. Numeric data is needed in many applications but is of no value to any without arithmetic operations. Neither should a defined type automatically inherit the operations of the data with which it is represented.
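The point that a defined type carries its own operations, and does not inherit those of its representation, can be illustrated with a matrix represented by nested lists. In this Python sketch the class `Matrix` is hypothetical and handles only non-empty rectangular matrices; `*` means matrix product, and list operations such as `+` (concatenation) are deliberately not available.

```python
class Matrix:
    def __init__(self, rows):
        # The representation is a list of lists, hidden behind the type.
        self._rows = [list(r) for r in rows]

    def __mul__(self, other):
        if not isinstance(other, Matrix):
            return NotImplemented
        if len(self._rows[0]) != len(other._rows):
            raise ValueError("inner dimensions do not match")
        cols = list(zip(*other._rows))
        # Row-by-column product, not a component by component product.
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self._rows]
        )

    def __eq__(self, other):
        return isinstance(other, Matrix) and self._rows == other._rows


product = Matrix([[1, 2], [3, 4]]) * Matrix([[0, 1], [1, 0]])
```

Had `Matrix` inherited from `list`, the expression `m + m` would silently concatenate rows; here it is simply undefined.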
The data objects comprising a defined type should be definable by enumeration of their literal names, as Cartesian products of existing types (i.e., as array and record classes), by discriminated union (i.e., as the union of disjoint types) and as the power set of an enumeration type.
This list comprises the currently known set of useful definitional mechanisms for data types which do not require run-time support, such as, garbage collection and dynamic storage allocation. These mechanisms are sufficient to define data sequences, recursive data structures, and efficient sparse data structures.
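Of the mechanisms listed, the power set of an enumeration is perhaps the least familiar. One way to picture it is Python's `enum.Flag`, where each member is one bit and a value of the type is any subset of the members, held as a bit mask that needs no dynamic storage. The type `Sensor` and its members are invented for illustration.

```python
from enum import Flag, auto


class Sensor(Flag):
    # Each member of the underlying enumeration occupies one bit.
    RADAR = auto()
    SONAR = auto()
    INFRARED = auto()


# A value of the power set type is a set of members; union is bitwise "or".
active = Sensor.RADAR | Sensor.SONAR
everything = active | Sensor.INFRARED
```

Membership, union and intersection then map onto single machine instructions on a conventional architecture.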
Type definition by free union and subsetting is not desired.
Free union adds no new power not provided by discriminated union but does require giving up the security of types in return for programmer freedom. Range or subset specifications on variables are useful documentation and debugging aids but should not be construed as types. Subsets do not introduce new properties or operations not available to the superset and often do not form a closed system under the superset operations. Unlike types, membership in subsets can be determined only at run time.
The source language should permit user specification of the axiomatic properties of a defined type independent of the particular mechanization used to implement those properties.
Programming languages require specification of not only the effect of programs, routines, and expressions but how those actions are to take place. Often decisions are made arbitrarily and are nonconsequential when made but are not identified as such. If there is no note made of which decisions were intended and which are arbitrary, the program will grow to rely on the arbitrary decisions and neither the translator nor the programmer will be able to predict the consequences when a better choice is found.
When defining a type the user should be able to specify the initialization procedure for the type and the actions to be taken at the time of allocation and deallocation of variables of that type.
It is often necessary to do bookkeeping or to take other special action when variables of a given type are allocated or deallocated. The language should not limit the class of definable types by withholding the ability to define those actions. Initialization might take place once when the type is allocated (i.e., in its allocation scope) and would be used to set up the procedures and initialize the variables which are local to the type definition.
The language should allow the user to distinguish between scope of allocation and scope of access.
The scope of allocation of a program structure is that region of the program for which the object representation of the structure should be present. The allocation scope defines the program scope for which own variables of the structure must be maintained and identifies the time for initialization of the structure. The access scope defines the regions of the program in which the allocated structure is accessible to the program. In some cases the user may desire that each use of a defined program structure be independent (i.e., the allocation and accessing scopes would be identical.) In other cases, the various accessing scopes might share a common allocation of the structure.
The ability to limit the scope of access for separately defined structures should be available to both the designer and the user of the structure.
Limited access specified in a type definition is necessary to guarantee that changes to data representations and to management routines which purportedly do not affect the calling programs, are in fact safe. By rigorously controlling the set of operations applicable to a defined type, the type definition guarantees that no external use of the type can accidentally or intentionally use hidden nonessential properties of the type.
Limited access on the call side provides a high degree of security and eliminates nonessential naming conflicts without limiting the degree of accessibility which can be built into programs. The alternative notion, that all declarations which are external to a program should have the same scope, is inconvenient and costly in creating large systems which are composed from many subsystems because it forces global access scopes and the attendant naming conflicts on subsystems not using the defined items.
The scope of identifiers should be wholly determined at compile time. Identifiers should be introduced at the beginning of their scope and multiple use of identifiers should not be allowed in the same scope except for embedded blocks in which case the innermost identifier should apply.
The language should use conventional scope rules while making declarations and other definitions of identifiers easy to recognize and avoiding errors and ambiguities from multiple use of identifiers in a single scope.
There should be no order dependent side effects in expressions.
This is a semantic restriction saying that the effect of evaluating an expression (at least from the point of view of the caller) should be independent of the order in which the arguments to the expression are evaluated. This is less restrictive to the compiler and the generation of efficient object code than is a straight left-to-right or other language imposed operand order execution rule. It is less restrictive to the programmer than a strict no side effect rule. It would, for example, allow imbedded assignments within expressions providing they do not assign to variables used elsewhere in the expression.
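The distinction can be illustrated with a small Python sketch. The helper names make_counter, order_dependent and order_independent are hypothetical. Each function evaluates the two operands of a subtraction in either order; only the first one gives a result that depends on that order, because both operands update the same hidden state.

```python
def make_counter():
    """Return a function with a hidden side effect on its own state."""
    state = {"n": 0}

    def bump():
        state["n"] += 1
        return state["n"]

    return bump


def order_dependent(left_first):
    # Both operands call the same counter, so the value of a - b depends
    # on which operand is evaluated first.
    bump = make_counter()
    if left_first:
        a = bump()
        b = bump()
    else:
        b = bump()
        a = bump()
    return a - b


def order_independent(left_first):
    # Each operand updates its own private counter, so neither side effect
    # is visible to the other operand.
    bump_a, bump_b = make_counter(), make_counter()
    if left_first:
        a = bump_a()
        b = bump_b()
    else:
        b = bump_b()
        a = bump_a()
    return a - b
```

Under the requirement above, an expression like the second would be acceptable while one like the first would not.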
The order of execution of operations within an expression should be obvious to the reader. There should be few levels of operator hierarchy and they should be widely recognized.
Care must be taken to ensure that the execution order of operators within expressions is not psychologically ambiguous, that is, to guarantee that the order implemented by the language is the same as that intended by the programmer and understood by those reading the program. This kind of problem can be minimized by having few precedence levels, by allowing explicit parentheses to specify the intended execution order, and by requiring explicit parentheses in sequences of non-associative operators at the same precedence level (e.g., x/y/z should not be allowed without parentheses). If user defined in-fix operators are permitted, explicit parentheses should be required for their use.
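A short numeric illustration of why x/y/z is singled out, written in Python with arbitrarily chosen values: the two possible groupings give different results, and a reader has to know the language's rule to tell which one an unparenthesized expression means.

```python
x, y, z = 8.0, 4.0, 2.0

left_grouped = (x / y) / z    # 1.0
right_grouped = x / (y / z)   # 4.0

# Python happens to group division left to right, so this equals
# left_grouped; the requirement above would reject it as written.
unparenthesized = x / y / z
```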
Expressions of a given type should be permitted anywhere in source programs where constants or references to variables of that type are allowed.
This is just a special case of not imposing arbitrary restrictions and special case rules on the user of the source language. Special mention is made here only because so many languages do restrict the form of expressions. FORTRAN, for example, has a list of seven different syntactic forms for subscript expressions but does not permit all forms of arithmetic expressions.
Constant expressions in programs should be evaluated at compile or load time.
The ability to write constant expressions in programs has proven valuable in languages with this capability, particularly with regard to program readability and in avoiding programmer error in externally evaluating and transcribing constant expressions. They are most often used in declarations. There is no need, however, that constant expressions impose run-time costs for their evaluation. They can be evaluated once at compile time or if this is inconvenient because of incompatibilities between the host and object machines, the compiler can generate code for their evaluation at load time. In any case, the resulting value should be the same (at least within the stated precision) regardless of the object machine.
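As a hedged illustration, the Python fragment below writes a constant as an expression for readability and then compiles, without running, the same expression to look at the constants stored in the compiled code. Whether the folded value appears there is a property of the particular translator (current CPython releases do fold such expressions), so the flag is computed rather than assumed. The names SECONDS_PER_DAY, code and folded_at_compile_time are illustrative only.

```python
# Written as an expression so the reader can see where the value comes from.
SECONDS_PER_DAY = 60 * 60 * 24

# Compile the expression without executing it and inspect its constants.
code = compile("60 * 60 * 24", "<constant-expression>", "eval")

# True when the translator evaluated the product during compilation.
folded_at_compile_time = 86400 in code.co_consts
```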
B.7 CONTROL STRUCTURES
The language should provide structured control mechanisms for sequential, conditional, iterative, recursive, pseudo parallel processing, exception handling and asynchronous interrupt handling.
These mechanisms provide a spanning set of control structures. Adding further kinds would be redundant; omitting any of these will leave a gap in the classes of programs which can be written without resorting to machine level primitives. Which operations are most appropriate in several of these areas is an open question. For the present, the choice should be a complete set of composable control primitives each of which is easily mapped onto object machines and which does not impose run-time charges for unused or unneeded generality.
The source language should provide a "go to" operation applicable to program labels within its most local scope of definition.
The go to is a machine level capability which is still needed to fill in any gaps which might remain in the choice of structured control primitives, to provide compatibility for transliterating programs written in older languages, and because of the wide familiarity of current practitioners with its use. The language should not, however, impose unnecessary costs for its presence. The go to should be limited to explicitly specified program labels within the most local scope of definition. The go to should not be used to exit procedures or scope blocks. Neither should the language provide specialized facilities which encourage its use in dangerous and confusing ways. Switches, designational expressions, label variables, label parameters and numeric labels are all undesirable.
The conditional control structures should be fully partitioned and should permit selection among alternative computations based on the value of a Boolean expression, on the subtype of a value from a discriminated union, or on a computed choice among labeled alternatives.
The conditional control operations should be fully partitioned so that choice is clear and explicit in each case. There should be some general form of conditional which allows an arbitrary computation to determine the label chosen (e.g., Zahn's device provides a good solution to the general problem). Special cases are also needed for the more common cases of the Boolean expression (e.g., if then else) and for value or type discrimination (e.g., case on one of a set of values or subtype of a union).
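The two special cases can be sketched in Python as follows. Circle, Rectangle, Shape, area and classify are hypothetical names introduced for the example. The first function discriminates on the subtype of a value from a union and handles every alternative explicitly; the second makes a computed choice among labeled alternatives.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Circle:
    radius: float


@dataclass
class Rectangle:
    width: float
    height: float


Shape = Union[Circle, Rectangle]  # hypothetical discriminated union


def area(shape: Shape) -> float:
    # Discrimination on the subtype of the union value. Every alternative
    # is named, and anything outside the union is rejected.
    if isinstance(shape, Circle):
        return math.pi * shape.radius ** 2
    elif isinstance(shape, Rectangle):
        return shape.width * shape.height
    else:
        raise TypeError("value is not a Shape")


def classify(n: int) -> str:
    # Computed choice among labeled alternatives, with an explicit default.
    labels = {0: "zero", 1: "one"}
    return labels.get(n, "many")
```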
The iterative control structure should permit the termination condition to appear anywhere in the loop, should permit control variables to be local to the iterative control, and should not impose excessive overhead in clarity or run time execution costs for common special case termination conditions (e.g., fixed number of iterations or elements of an array exhausted).
In its most general form, a programmed loop is executed repetitively until some computed predicate becomes true. There may be more than one terminating predicate, and they might appear anywhere in the loop. Specialized control structures (e.g., while do) have been used for the common situation in which the termination condition precedes each iteration. The most common case is termination after a fixed number of iterations and a specialized control structure should be provided for that purpose (e.g., FORTRAN DO or Algol for). A problem which arises in many programming languages is that loop control variables are global to the iterative control and thus will have a value after loop termination but that value is usually an accident of the implementation. Specifying the meaning of control variables after loop termination in the language definition resolves the ambiguity but must be an arbitrary decision which will not aid program clarity or correctness, and will interfere with the generation of efficient object code. Loop control variables are, by definition, variables used to control the repetitive execution of a programmed loop and, as such, have, and should have, meaning only during loop executions.
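A minimal Python sketch of the two loop forms discussed here, using the hypothetical names sum_until_sentinel and sum_first. The first loop places its termination test in the middle of the body; the second is the fixed-count special case. Python itself does not confine a loop control variable to the loop, so its use only inside the loop is a convention in this sketch rather than something the language enforces.

```python
def sum_until_sentinel(values, sentinel=0):
    """Add up items until the sentinel is read; the test sits mid-loop."""
    total = 0
    source = iter(values)
    while True:
        item = next(source, sentinel)  # first half: obtain the next item
        if item == sentinel:           # termination test inside the body
            break
        total += item                  # second half: process the item
    return total


def sum_first(values, count):
    """Common special case: a fixed number of iterations."""
    total = 0
    for i in range(count):  # i is used only inside the loop
        total += values[i]
    return total
```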
There should be no source language distinctions between recursive and nonrecursive procedures.
Recursion is desirable in many applications because it is a neat and elegant concept which can shorten and clarify programs and simplify proof procedures. Recursion is required in order to avoid unnecessarily opaque, complex and confusing programs when operating on recursive data structures. If recursive and nonrecursive procedures are marked in the source language, that specification represents just one more special case to be learned and dealt with by the user. The objections to recursion come from a feeling that recursion requires greater run-time costs in time and space than do nonrecursive procedures. In fact, recursion and iteration have the same costs in many cases, and stack allocation of procedure bodies can save space at run-time. The problem has been that recursion has, for the most part, been implemented only in environments in which run-time efficiency has not been of great importance and, therefore, has been implemented in a straightforward, inefficient manner in which the user pays the full cost of the worst case recursion for all procedures. As with any other feature, procedures should be implemented in the most efficient manner consistent with their use. In particular, if there are costs inherent in the use of recursion, they should not be charged to non-recursive procedures. Optimizations and special case processing should, however, be the responsibility of the translator and not the user.
The pseudo parallel processing capabilities should include the ability to create, pass control among, and terminate processes.
The particular form of parallel processing, interleaved execution, or coroutine to be used should be left to the user. The kernel language should, however, provide a low level capability for creating processes, passing control among them and terminating them so the user can build his own form of parallel or coroutine processing. Creation of processes, of necessity, requires dynamic storage allocation. The kernel capability should be such that only parallel processes and coroutines pay that price and it is desirable that the user be able to specify the allocation scheme.
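The three low level operations (create, pass control, terminate) can be sketched with Python generators, which are one existing form of coroutine. The names worker, run_round_robin and demo are hypothetical, and the round-robin policy is only one of many schedulers a user might build on such primitives.

```python
def worker(name, steps, log):
    """A process body; each yield hands control back to the scheduler."""
    for step in range(steps):
        log.append((name, step))
        yield


def run_round_robin(processes):
    """Pass control among processes in turn until all have terminated."""
    ready = list(processes)
    while ready:
        process = ready.pop(0)
        try:
            next(process)          # transfer control to the process
            ready.append(process)  # it yielded, so it is still alive
        except StopIteration:
            pass                   # it has terminated; drop it


def demo():
    log = []
    a = worker("a", 2, log)  # create
    b = worker("b", 3, log)  # create
    c = worker("c", 5, log)  # create
    c.close()                # terminate before it ever runs
    run_round_robin([a, b, c])
    return log
```

Only code that creates a worker pays for the generator object; ordinary procedure calls in the same program are unaffected.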
The exception handling control structure should permit the user to cause transfer of control and data for any error or exception condition which might occur in his program.
It is essential in many applications that there be no program halts beyond the user's control. The user must be able to specify the action to be taken on any exception condition which might occur within his program. The exception handling mechanism should be parameterized so data can be passed to the recovery point. Exception situations might include arithmetic overflow, exhaustion of available space, and hardware errors.
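A hedged sketch of a parameterized exception in Python follows. SpaceExhausted, allocate and allocate_or_shrink are hypothetical names. The exception carries data (the amount requested and the amount available) to the recovery point, and the user's handler, not the run-time system, decides what to do with it.

```python
class SpaceExhausted(Exception):
    """Hypothetical exception that carries data to the recovery point."""

    def __init__(self, requested, available):
        super().__init__(f"requested {requested}, only {available} available")
        self.requested = requested
        self.available = available


def allocate(pool_size, requested):
    # Signal the exception condition instead of halting the program.
    if requested > pool_size:
        raise SpaceExhausted(requested, pool_size)
    return [0] * requested


def allocate_or_shrink(pool_size, requested):
    # Recovery point: retry with whatever space the exception reports.
    try:
        return allocate(pool_size, requested)
    except SpaceExhausted as problem:
        return allocate(pool_size, problem.available)
```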
There should be a source language capability for handling asynchronous hardware interrupts in a recoverable manner.
One cannot write programs such as operating systems, executives and monitors which service hardware interrupts without access to the interrupt system. Minimally there must be an ability in the source language to specify the interrupt processing routine, to dynamically determine what interrupt has occurred, and to return to the interrupted program. These capabilities can be provided in a machine independent form, but the set of available interrupts must be machine dependent. There should be no source language distinction between true hardware interrupts and those synthesized by an operating system, language extensions, or the user program.
There should be a consistent set of rules applicable to all parameters, whether they be for procedures, for types, for exception handling, for parallel processes, for declarations, or for built-in operators. There should be special operations (e.g., array substructuring) applicable only to parameters.
Uniformity and consistency contribute to ease of learning, implementing and using a language; allow the user to concentrate on the programming task instead of the language; and lead to more readable, understandable, and predictable programs.
Formal and actual parameters should always agree in type. The size and subscript range for array parameters need not be determinable at compile time, but can themselves be passed as part of the parameter.
Type transfers hidden in procedure calls with incompatible formal and actual parameters, whether intentional or accidental, have long been a source of program errors and of difficult-to-maintain programs.
There should be only two classes of formal parameter data: those which act as constants representing the actual parameter value at the time of call, and those which rename the actual parameter which must be a variable. In addition, there should be a formal parameter class for specifying the control action when exception conditions occur, and a class for parameters processed entirely at compile time.
The two data parameter classes are often called call by value and call by reference, respectively. They are the only two widely used parameter passing mechanisms and the many alternatives (at least 9 have been suggested) add complexity and cost to a language without increasing the clarity or power. A language with exception handling capability must have a way to pass control and related data through procedure call interfaces. Actual exception handling control parameters should be optional (i.e., only specified when needed). Compile time parameters are needed in extensible languages to permit specification of generic procedures and data structures such as stacks, and queues without repeating the definition for each element type.
There should be provision for variable numbers of parameters, but in such cases all but a constant number of them must be of the same type and probably treated as an array on the formal parameter side.
There are many useful purposes for procedures with variable numbers of arguments. These include what are usually called intrinsic functions such as print, generalizations of operations which are both commutative and associative such as max and min, and for repetitive application of the same binary operation such as the Lisp list operation. The use of variable number of argument operations need not and should not cause relaxation of any compile-time checks, require use of multiple entry procedures, allow the number of actual parameters to vary at run-time, nor require special calling mechanisms. If the parameters which can vary are limited to a program specified type treated as any other argument on the call side and as elements of an array within the procedure definition, full type checking can be done at compile time. There is no reason to prohibit in line expansion, and there is no prohibition on writing special procedures for some fixed number of parameters.
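As an illustration, Python's variable-length parameter form follows roughly this shape: a constant number of ordinary parameters, then a variable part of a single declared type that arrives inside the procedure as a sequence. The function name maximum is hypothetical. Python does not enforce the annotations at run time, so the compile-time checking described above would have to come from a separate static checker.

```python
def maximum(first: int, *rest: int) -> int:
    """Take at least one argument; the variable part arrives as a tuple."""
    best = first
    for value in rest:  # rest behaves like an array of same-typed items
        if value > best:
            best = value
    return best
```

On the call side the variable arguments are written like any others, for example maximum(3, 9, 4).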
B.9 STANDARD EXTENSIONS
All run time overhead in programs should be avoidable. Language features which require run time support should be provided as extensions which are brought in only when used.
Language features such as automatic and dynamic array allocation, process scheduling, file management, and I/O processing require run-time support software. These features should be provided as extensions and not as part of the kernel language so that the user can assess the costs and can write his own specialized extensions for these purposes when the standard extensions are not compatible with his requirements. Neither should there be any automatic movement of programs or data between main store and backing store unless the user can bring that movement under his control. In no case should the user have to pay space or time for support packages he does not use.
The source language should contain standard machine independent interfaces to machine dependent capabilities, including peripheral equipment and special hardware.
The convenience, ease of use and savings in production and maintenance costs resulting from using high order languages come from being able to use specialized capabilities without building them from scratch. Thus, it is essential that high-level capabilities be supplied with the language.
There is currently little agreement on standard operating system, I/O, or file system interfaces. This does not preclude support of one or more forms for the near term. If these interfaces are supported as standard extensions and not built into the kernel language, they can be supplanted as better forms are recognized.
There should be a standard data base interface. It should be semantically compatible with systems generated using data base languages, syntactically consistent with the remainder of the common language, and provided as an extension.
The use of large data bases and logical files is essential to many DoD computer applications. Any selected common language must be capable of interfacing with data base systems; and, because standards are limited and there is ongoing research in this area, the data base interface should be definable as an extension which can grow at the user level without inventing a new language.
The language should give access to real time clocks in a machine independent form. Operations on real time clocks should include reading the time of day, waiting for a specified time, and interruption after a specified time. The same capabilities, operations and notations available for real time should be available for virtual time.
Real time capability is essential to many DoD applications. The source language should provide a machine independent form of access and operation to real time clocks. This should be provided as an extension to avoid the cost when the capability is not needed, to keep the kernel language simple and uncluttered by features for particular applications, and to ensure that, when necessary, the user can define his own specialized real time facility. Virtual time is very helpful in discrete simulation problems and is conceptually similar to real time capability. There is no reason why they should not be treated in a consistent manner.
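A minimal sketch of treating real and virtual time consistently, in Python. RealClock, VirtualClock and run_steps are hypothetical names, and only two of the operations mentioned above (reading the time and waiting) are shown. Both clocks offer the same operations, so the same procedure runs against either one.

```python
import time


class RealClock:
    """Reads the host's wall clock; waiting really blocks."""

    def now(self):
        return time.time()

    def wait(self, seconds):
        time.sleep(seconds)


class VirtualClock:
    """Simulated time: waiting advances a counter instead of blocking."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def wait(self, seconds):
        self._now += seconds


def run_steps(clock, steps, interval):
    """Same code for real and virtual time; returns elapsed clock time."""
    started = clock.now()
    for _ in range(steps):
        clock.wait(interval)
    return clock.now() - started
```

A discrete simulation could call run_steps(VirtualClock(), 4, 2.5) and see ten units of simulated time pass without any real delay.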
The source language should be free format, should allow the use of mnemonically significant identifiers, should be based on conventional forms, should be simple, uniform and probably LR(1), should not provide special notations for rare cases, and should not permit abbreviation of identifiers or key words.
Clarity and readability of programs should be the primary criteria for selecting a syntax. Each of the above points can contribute to program clarity. The use of free format, mnemonic identifiers and conventional forms allows the programmer to use notations which have their familiar meanings, to put down his ideas and intentions in the order and form that humans think about them, and to transfer skills he already has to the solution of the problem at hand. A simple uniform language reduces the number of cases which must be dealt with by anyone using the language; if programs are difficult for the translator to parse, they will be difficult for people. Similar things should use the same notations with the special case processing reserved for the translator and object machine. The purpose of mnemonic identifiers and key words is to be informative and increase the distance between lexical units of programs. The use of abbreviation eliminates these advantages for a questionable increase in coding ease.
The user should not be able to modify the source language syntax. Specifically, he should not be able to modify operator hierarchies, introduce new precedence rules or define new key word forms.
If the user can change the syntax of the language then he can change the basic character and understanding of the language. The distinction between semantic extensions and syntactic extensions is similar to that between being able to coin new words in English or being able to move to another natural language. Coining words requires learning those new meanings before they can be used but at the same time increases the power of the language for some application area. Changing the grammar (e.g., using French), however, undermines the basic understanding of the language itself, changes the mode of expression, and removes the commonalities which obtain between various specializations of the language. Growth of a language through definition of new data and operations and the introduction of new words and symbols to identify them is desirable but there should be no provision for changing the structure of the language. The language should, of course, provide sufficiently general forms that they can be adapted to new, possibly unforeseen situations. Neither does this preclude associating new meanings with existing in-fix operators nor defining new in-fix operators without precedence rules.
The syntax of source language programs should be composable from a character set suitable for publication purposes, but no feature of the language should be inaccessible using the 64 character ASCII subset.
A common language should use notations and a character set convenient for communicating algorithms, programs, and programming techniques among its users. On the other hand, the language should not require special equipment (e.g., card readers and printers) for its use. The use of the 64 character ASCII subset will make the language compatible with the international standard seven level subset, ISO-7, and with the Federal information processing standard 64 character set, FIPS-1, which has been adopted by the U.S.A. Standard Code for Information Interchange (USASCII).
The language definition should provide the formation rules for identifiers and literals. These should include a language defined break character for use internal to identifiers and literals.
Lexical units of the language should be defined in a simple, uniform and easily understood manner. The most desirable break character is the space. A literal break character contributes to the readability of programs and makes the entry of long literals less error prone. With a space as a break character one can enter multipart identifiers such as REAL TIME CLOCK or long literals such as 3.14159 26535 84. The break can also be used to guarantee that missing quote brackets on character literals do not cause errors which propagate beyond the next end-of-line. The language might require separate quoting of each line of a long literal:
"This is a Long" "literal string".
There should be no continuation of lexical units across lines.
Many elementary input errors arise at the ends of lines. Programs are input on line-oriented media, but the concept of end-of-line is foreign to free format text. Most of the error prone aspects of end-of-line can be eliminated by prohibiting lexical units from continuing over lines. This has the sometimes undesirable effect of limiting identifiers and literals to the length of lines unless spaces and end-of-lines are permitted to break identifiers and literals into multiple lexical units.
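For comparison, Python offers a close analogue of both conventions, though with an underscore rather than a space as the numeric break character. The names PI_APPROX and MESSAGE are illustrative. Digits in a numeric literal may be grouped with underscores, and a long string is written as separately quoted pieces, one per line, which the translator joins into one literal.

```python
# Underscore as a break character inside a numeric literal.
PI_APPROX = 3.14159_26535_84

# Each line of the long literal is quoted separately, so a missing quote
# cannot swallow text beyond the end of its own line.
MESSAGE = ("This is a long "
           "literal string")
```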
Key words should be reserved, should be few in number, should be informative, and should not be usable in place of an identifier.
By key words of the language are meant those symbols and symbol strings which have special meaning in the syntax of programs. They introduce special syntactic forms such as are used for control structures and declarations, or they are used as in-fix operators, or as some form of parenthesis. Key words should be reserved, that is, unusable as identifiers, to avoid confusion and ambiguity. Key words should be few in number because each new key word introduces another case in the parsing rules and, thereby, adds to the complexity of the language, and because large numbers of key words inconvenience and complicate the programmer's task of choosing informative identifiers. It is more important that key words be informative than that they be short but cryptic. A major exception is the key word introducing a comment; it is the comment and not its key word which should do the informing. Comments should begin with a single special character which will encourage their use and not take the space needed for the comment. Finally, there should be no place in a source language program in which a key word can be used in place of an identifier. That is, functional form operations and special data items built into the language or accessible as a standard extension should not be treated as key words, but should be treated as any other identifier.
The source language should have a single uniform comment convention. Comments should be easily distinguishable from code, should be introduced by a single language defined character, should permit any combination of characters to appear, and should be able to appear anywhere in programs. Comments should not prohibit automatic reformatting of programs, and should not permit errors in missing comment brackets to propagate beyond the next end-of-line.
These are all obvious points which will encourage the use of comments in programs and avoid their error prone features in some existing languages. Comments anywhere in a program should not be taken to mean that they can appear internal to a lexical unit such as an identifier, key word, or between the opening and closing brackets of a character string. One comment convention which nearly meets these criteria is to have a special quote character begin each comment, with either the quote or an end-of-line ending the comment.
The language should not permit unmatched parentheses.
Some programming languages permit closing parentheses to be omitted. If, for example, a program contained more BEGINs than ENDs, the translator might insert enough ENDs at the end of the program to make up the difference. This makes programs easier to write because it sometimes saves writing several ENDs at the end of programs and because it eliminates all syntax errors for missing ENDs. Failure to require proper parenthesis matching, however, makes it more difficult to write correct programs. Good programming practice requires that matching parentheses be included in programs whether or not they are required by the language. Unfortunately, if they are not required by the language then there can be no syntax check to discover when errors are made. The language should require full parenthesis matching. This does not preclude syntactic features such as case x of s1, s2 ... sn end case in which end is paired with a key word other than begin.
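The kind of syntax check that full matching makes possible can be sketched in a few lines of Python. The function check_blocks and its token format (a list of words in which only BEGIN and END are significant) are assumptions made for the example, not part of any particular translator.

```python
def check_blocks(tokens):
    """Return None if BEGIN/END tokens match, else a description of the error."""
    depth = 0
    for position, token in enumerate(tokens):
        if token == "BEGIN":
            depth += 1
        elif token == "END":
            if depth == 0:
                return f"unmatched END at token {position}"
            depth -= 1
    if depth > 0:
        # A translator that silently supplied the missing ENDs would hide
        # this error instead of reporting it.
        return f"{depth} BEGIN(s) without a matching END"
    return None
```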
C. COMPILE TIME CAPABILITIES
The library of extensions should be organized as a collection of specialized compools giving the user access to all definitions related to a given application or specialized capability.
Compools have proven very useful in organizing and controlling shared data structures. A similar mechanism should be employed to manage and control access to related library definitions. The content of both library extensions and type definitions are related objects definable in the language. There is little reason to distinguish these two kinds of program modules and a language which merges the two will be simpler, easier to learn and easier to use. These same modules might also act as parallel and co-routine templates.
The translator should provide a variety of useful options to aid generation, test, documentation and modification of programs.
The translator should have special capabilities to aid the programmer. The "best" set of capabilities and their proper form is not currently known. Since nonstandard choices of translator options will not adversely affect software commonality, the language definition should not dictate any arbitrary choice. Instead the development of new translator aids should be encouraged within the constraint of implementing the source language as defined. Some of the translator options which have been suggested and may be useful include the following. Code might be compiled for assertions which would give run-time warnings when the value of the assertion predicate is false. Dimensional analysis might be done on units of measure specifications. Special optimizations might be invoked. There might be capability for timing analysis and gathering run-time statistics. There might be translator supplied feedback to provide management visibility regarding progress and conformity with local conventions. The user might be able to inhibit code generation. The translator might provide a listing of the number of instructions generated against corresponding source inputs and/or an estimate of their execution times. It might provide a variety of listing options including cross-reference lists.
The language should support the integration of separately written modules into an operational program.
This is required to permit use of extension and subroutine libraries and for the integration of large system programs. The user should be able to cause anything in the library to be inserted into his program.
The source language should permit the use of conditional statements (e.g., case statements) dependent on the object environment. In such cases, the conditional should be evaluated at compile-time and object code produced only for the selected path.
This capability permits the writing of procedures with a standard source language interface, but different object representations as a function of the object machine and configuration. With the exception of permitting program reference to the environment specification it is just a special case of evaluation of constant expressions at compile time.
D. OBJECT REPRESENTATION
The translator should not impose run-time cost for unused generality. A primary goal of any translator should be the generation of efficient object code.
The source language, both kernel and extensions, will contain capabilities which are not needed by everyone or, at least, not by everyone all the time. When a program does not use a feature or capability, that program should pay no penalty for the capability being in the language. That the penalties can be avoided for library extension capabilities is obvious, since they need not be brought in at all. Other features may generate special object code when their full generality is not required. Parameter passing for single arguments might, for example, be implemented much less expensively at run time than is the general case.
The user should be able to specify that a particular call on a procedure is to be implemented as an open routine.
The use of inline open procedures can reduce the run-time execution costs significantly in some cases. There are the obvious advantages in eliminating the parameter passing, in avoiding the saving of return marks, and in not having to pass to and from the routine. A less obvious, but often more important, advantage in saving run-time costs is the ability to execute constant portions of routines at compile time and thereby eliminate time and space for those portions of the procedure body at run time. Open routine capability is especially important for machine language insertion.
Any optimizations performed by the translator should not change the effect of the program.
More simply, the translator cannot give up program reliability and correctness, regardless of the excuse. It should be noted that for most programming languages there are few known safe optimizations and many unsafe ones. The number of applicable safe optimizations can be increased by making more information available to the compiler and by choosing language constructs which allow safe optimizations. This requirement allows optimization by code motion providing that motion does not change the effect of the program.
E. THE TRANSLATOR
No implementation of the language should contain source language features which are not defined in the "standard" language.
This guarantees that use of programs and software subsystems will not be restricted to a particular site by virtue of using their unique version of the language. It also represents a commitment to freezing the source language, inhibiting innovations and growth of the form of the source language, and confining the source language to the current state of the art in return for stability, wider applicability of software tools, reusable software, greater software visibility, and increased payoff for tool-building efforts.
Every translator for the language should implement the entire language. There should be no subset implementations.
If individual compilers implement only a subset of the language, then there is no chance for software commonality. If a translator does not implement the entire language, it cannot give its users access to standard supported libraries or to application programs implemented on some other translator. Requiring that the full language be implemented will be expensive only if the language is large, complex, and nonuniform. The intended source language product from this effort is a small simple uniform kernel language with the specialized features, support packages and complex features relegated to library routines not requiring direct translator support. If simple low cost translators are not feasible for the selected language, then the language is too large and complex to be standardized and the goal of language commonality will not be achievable. The effort should be terminated.
The translator should not impose compile-time costs for unused generality. A primary goal of any translator should be low cost translation.
The user should have control over the level of optimization applied to his programs. He should have control over the costs and benefits he obtains from the translator. Optimization is unimportant to some programs and only sometimes important in the development of any program.
Translators should be able to produce code for a variety of object machines. The machine independent parts of translators should be built independent of the code generators.
There is currently no common widely used computer in the DoD. There are at least 250 different models of commercial machines in use in DoD with many more home grown varieties. A common language must be applicable to a wide variety of models and sizes of machines. Translators should be written so that they can produce object code for several machines. This reduces the proliferation of translators and makes the full power of an existing translator available at the cost of producing an additional code generator.
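The separation of the machine independent part from the code generators might be organized as in the following minimal sketch. The two target machines, their instruction mnemonics, and all function names are hypothetical, and the source language is reduced to flat sums for brevity.

```python
def front_end(source):
    """Machine-independent part: parse a flat sum such as 'a + b + 3' into operands."""
    operands = [token.strip() for token in source.split("+")]
    if not all(operands):
        raise ValueError("missing operand in: " + source)
    return operands

def gen_stack(operands):
    """Code generator for a hypothetical stack machine."""
    code = ["PUSH " + operands[0]]
    for operand in operands[1:]:
        code += ["PUSH " + operand, "ADD"]
    return code

def gen_accumulator(operands):
    """Code generator for a hypothetical single-accumulator machine."""
    code = ["LOAD " + operands[0]]
    for operand in operands[1:]:
        code.append("ADD " + operand)
    return code

# Supporting a new object machine means adding one entry here.
CODE_GENERATORS = {"stack": gen_stack, "accumulator": gen_accumulator}

def translate(source, target):
    return CODE_GENERATORS[target](front_end(source))

print(translate("a + b + 3", "stack"))        # ['PUSH a', 'PUSH b', 'ADD', 'PUSH 3', 'ADD']
print(translate("a + b + 3", "accumulator"))  # ['LOAD a', 'ADD b', 'ADD 3']
```

The front end is shared unchanged between targets, so the cost of reaching an additional machine is the cost of one more code generator.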
The translator need not be able to run on all the object machines. Self-hosting is not required.
This follows from having an operational environment which includes many small machines which are unable to support the design, documentation, test, and debugging aids necessary for the development of timely, reliable or efficient software. It also follows from the need to avoid penalizing large machine users for the restrictions of small machines when a common language is used. It is desirable that the translator be able to run on a variety of machines, but this should not be used as an excuse to eliminate needed source language capabilities.
The translator should do full syntax checking, should check all operations and parameters for type compatibility and should verify that any other semantic restrictions on the source language are met.
The purpose of source language redundancy and avoidance of error prone language features is security. The price is paid in programmer inconvenience in having to specify his intent in greater detail. The payoff comes when the translator checks that the source language is internally consistent and adheres to its authors' stated intentions. There is a clear trade-off between security and programming ease; surveys conducted in the services show that the programmers as well as managers will opt for security over ease when given the choice. The same choice is dictated by the need for well documented modifiable software.
The translator should produce compile time explanatory diagnostic messages. These should include error messages and warnings.
The translator should attempt to provide the maximal useful feedback to the user. Diagnostic messages should not be coded but should be explanatory and in source language terms. Translators should continue checking after one error has been found but should be careful not to generate erroneous messages because of translator confusion. Warnings should be generated when a source language construct is exceptionally expensive or impossible to implement on the specified object machine. The set of diagnostic messages should be determined by the translator as a function of its environment and translation method and is not specified in the language definition, although the language definition might provide guidelines.
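A checker that behaves this way might look like the following minimal sketch. The statement representation, the type names, and the message wording are assumptions made for illustration and are not prescribed by this document.

```python
def check(declarations, assignments):
    """Check assignments for type compatibility and report every problem found.

    declarations: dict mapping a variable name to its declared type name.
    assignments:  list of (line, target, source) meaning 'target := source'.
    """
    diagnostics = []
    for line, target, source in assignments:
        missing = [name for name in (target, source) if name not in declarations]
        if missing:
            for name in missing:
                diagnostics.append(f"line {line}: error: '{name}' is not declared")
            # Skip the type check so one mistake does not produce a second, spurious message.
            continue
        target_type, source_type = declarations[target], declarations[source]
        if target_type != source_type:
            diagnostics.append(
                f"line {line}: error: cannot assign '{source}' of type {source_type} "
                f"to '{target}' of type {target_type}"
            )
    return diagnostics

declarations = {"count": "integer", "ratio": "real", "flag": "boolean"}
assignments = [(1, "count", "ratio"), (2, "flag", "flag"), (3, "total", "count"), (4, "flag", "count")]
for message in check(declarations, assignments):
    print(message)
```

Checking continues past the first error, every message is phrased in source language terms rather than as a numeric code, and the undeclared name on line 3 yields a single message instead of a cascade.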
The translator should be amenable to change.
The adoption of a common language should be a commitment to the current state of the art for programming language design for some duration. It should not, however, prevent access to new software and hardware technology, new techniques and new management strategies which will not impact source language design. In particular, innovation should be encouraged in the development of translators for a common language provided they implement exactly the source language as defined. Translators, like all computer programs, should be written in expectation of change; they should be well documented and easily modified.
It is desirable that translators for a common language be written in their own source language.
The existence of at least one such translator assures that the language is rich enough to perform a useful and demanding programming task. If the language is well defined and uniform in structure, a self description will contribute to user understanding of the language. The existence of translators written in their own source language also makes the compiler automatically available on any of their object machines (assuming sufficient hardware resources are available).
F. DEBUGGING FACILITIES
There should be effective debugging facilities associated with the language. The particular function or form of these facilities should not be dictated.
Software tools and aids are needed for the effective use of any programming language. Particularly important are debugging aids. There are, however, no recognized standards for debugging systems, nor should there be. Although debugging facilities must be available or made available for any selected language, neither the language definition, the selection process, nor a standards group should dictate the particular debugging facilities or their form. Whatever facilities are built should, however, be widely available. This allows research and development to continue and does not impede transfer of new debugging technology.
Some debugging facilities suggested to this effort have been post mortem analysis including frequency information for statements obeyed and reports of abnormal termination in source language terms. Controlled snapshots and some form of tracing might be useful. There might be facilities for breakpoints, binary dumps with restart, traceback from errors, diagnostics in source language terms, interactive debugging, and filtered debugging data.
G. LANGUAGE DEFINITION, STANDARDS AND CONTROL
The semantics of the language should be defined unambiguously and clearly. To the extent a formal definition assists in attaining these objectives, the language's semantics should be specified formally.
A complete and unambiguous definition of a common language is essential. Otherwise each translator will resolve the ambiguities and fill in the gaps in its own unique way. There are currently a variety of methods for formal specification of programming language semantics, but it remains a major effort to produce a rigorous formal description, and the resulting products are of questionable practical value. The real value in attempting a formal definition is that it uncovers the incomplete and ambiguous specifications. An attempt should be made to provide a formal definition of any language selected, but success in that effort should not be requisite to its selection.
The user documentation of the language should be complete and tutorial in nature. The source language syntax should be given in BNF or some other easily understood formal metalanguage with the corresponding semantics given in English with examples.
The language should be intuitively correct and easily learned and understood by its potential users. A successful example of a language description of this type is the Algol-60 report.
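As a small illustration of this style of description, the sketch below pairs a BNF syntax with English semantics and an example, followed by a recognizer that enforces it. The grammar and the function name evaluate_sum are hypothetical and are not drawn from any candidate language.

```python
import re

# Hypothetical syntax, in BNF:
#   <sum>     ::= <integer> | <sum> "+" <integer>
#   <integer> ::= <digit> | <integer> <digit>
#   <digit>   ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
#
# Semantics, in English: the value of a <sum> is the arithmetic sum of its
# <integer>s, each read in decimal. Example: "12+30" has the value 42.

def evaluate_sum(text):
    """Reject text that is not a <sum>; otherwise return its value."""
    if not re.fullmatch(r"[0-9]+(\+[0-9]+)*", text):
        raise SyntaxError(f"'{text}' is not a <sum>")
    return sum(int(part) for part in text.split("+"))

print(evaluate_sum("12+30"))  # 42
```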
There should be a control agent to ensure that there is only one version of the source language and that implementations of the language conform to that standard. Without controls a hopefully common language will become another umbrella under which new languages will proliferate while retaining the same name.
There should be identifiable support agent(s) responsible for maintaining the translators, the design, development, debugging and maintenance aids, and the support and application libraries for the common language.
Language commonality is an essential step in achieving software commonality, but the real benefits accrue when projects and contractors can draw on existing software with assurance that it will be supported, when systems can build from off the shelf components or at least with common goals, and when funds can be spent to expand existing capabilities rather than building from scratch. Support of common widely used tools and aids must be provided independent of projects and their individual funding if common software is to be widely used.
Library extension facilities should be given the same kind of control and support as the kernel language.
In any given application of an extensible language, three levels of the system must be learned and used: the kernel language, the standard extensions used in that application area, and the local application programs. The project must be responsible for the local application programs and local extensions, but not for the language and its standard extensions, which are used by many projects and sites. If the local project or site is responsible, then they will be responsible only for their own project and site unique language and extensions and there will be no common extensions.