Lee Spector, Chris
Perry, Jon Klein,
and Maarten Keijzer
School of Cognitive Science
Hampshire College
November, 2003 - September, 2004 (See the document version history at the end of this document. This document is a successor to the Push 2.0 Programming Language Description, from which it borrows large chunks of text.)
Overview
Push Concepts
Simple Examples
Configuration
Random Code Generation
Implementation Notes
Push3 vs. Push2
Push2 vs. Push1
Under Discussion
Type/Instruction Catalog
Acknowledgments
Document Version History
Push is a programming language intended primarily for use in evolutionary computation systems (such as genetic programming systems), as the language in which evolving programs are expressed. Push has an unusually simple syntax, which facilitates the development (or evolution) of mutation and recombination operators that generate and manipulate programs. Despite this simple syntax, Push provides more expressive power than most other program representations that are used for program evolution. Push programs can process multiple data types (without the syntax restrictions that usually accompany this capability), and they can express and make use of arbitrary control structures (e.g. recursive subroutines and macros) through the explicit manipulation of their own code (via a "CODE" data type). This allows Push to support the automatic evolution of modular program architectures in a particularly simple way, even when it is employed in an otherwise ordinary genetic programming system (such as PushGP, which is a "generic" GP system except that it evolves Push programs rather than Lisp-style program trees). Push can also support entirely new evolutionary computation paradigms such as "autoconstructive evolution," in which genetic operators and other components of the evolutionary system themselves evolve (as in the Pushpop and SwarmEvolve2 systems).
This document describes version 3.0 of the Push programming language (a.k.a "Push3"), which shares general features with the first and second versions of Push (a.k.a "Push1" and "Push2") although several details have changed with each new version. Although it is based on Push1, a good introduction to the basic principles of Push and its use for evolutionary computation is:
Spector, L., and A. Robinson. 2002. Genetic Programming and Autoconstructive Evolution with the Push Programming Language. In Genetic Programming and Evolvable Machines, Vol. 3, No. 1, pp. 7-40. (http://hampshire.edu/lspector/pubs/push-gpem-final.pdf, pdf 216KB)
A more recent discussion, based on Push2, is in:
Spector, L. 2004. Automatic Quantum Computer Programming: A Genetic Programming Approach. Boston, MA: Kluwer Academic Publishers. (Note that the chapters on Push can be read independently of the chapters on quantum computing. More information on the book is available from http://hampshire.edu/lspector/aqcp/.)
The present document is a self-contained specification for Push3 that also briefly describes differences between Push1, Push2, and Push3. That is, one should be able to use Push3 or even to re-implement Push3 using this document alone. But it does not address the motivations behind the design -- it does not discuss why one might want to use Push in the first place. For such discussions please consult the references cited above, and/or the additional publications listed at http://hampshire.edu/lspector/push.html.
Freely available Push implementations are listed at http://hampshire.edu/lspector/push.html. A mailing list for Push-related discussions can be accessed via http://lists.hampshire.edu/mailman/listinfo/push.
Push achieves its combination of syntactic simplicity and semantic power through the use of a stack-based execution architecture that includes a stack for each data type. A CODE data type, with its own stack and an associated set of code-manipulation instructions, provides many of the more interesting features of the language. Push instructions, like instructions in all stack-based languages, take any arguments that they require and leave any results that they produce on data stacks. To provide for "stack safe" execution of arbitrary code Push adopts the convention, used widely in stack-based genetic programming, that instructions requiring arguments that are not available (because the relevant stacks are empty) become NOOPs; that is, they do nothing. Because Push's stacks are typed, instructions will always receive arguments and produce results of the appropriate types (if they do anything at all), regardless of the contexts in which they occur.
The syntax of Push is simply this:
program ::= instruction | literal | ( program* )
In other words:
Some implementations may require spaces around parentheses. Parenthesized sequences are also referred to as "lists," and Push programs can in fact be treated as list data structures. Literals are constants such as "3" (an integer constant) and "3.14" (a floating point number constant) and "TRUE" (a Boolean constant). Instruction names are not case sensitive and they can include the "." character. Instruction names generally start with the name of the type that they primarily manipulate, followed by a "."; for example, INTEGER.+ is the instruction for adding two integers, and BOOLEAN.DUP is the instruction for duplicating the value on the top of the Boolean stack. In some cases, when an instruction interacts with data of multiple types, it is not obvious to which type the instruction "belongs"; in these cases the instruction prefix is usually taken from the type of the primary result of the instruction (if any).
Execution of a Push program involves the recursive application of the following procedure:
To execute program P:
If P is a single instruction then execute it.
Else if P is a literal then push it onto the appropriate stack.
Else (P must be a list) sequentially execute each of the
Push programs in P.
The recursive executions implied in the final line of this procedure can be implemented in at least two ways. The simplest technique, which was used in versions of Push prior to Push3, is to rely on the support for recursion in the language within which Push is implemented; that is, one simply writes the procedure as specified, with a sequence of recursive calls on the final line. One of the primary innovations in Push3 is to manage the recursion in Push itself, using a new EXEC (execution) stack (this idea was developed by Maarten Keijzer). In Push3 the procedure above is recast as follows:
To execute program P:
Push P onto the EXEC stack
LOOP until the EXEC stack is empty:
If the first item on the EXEC stack is a single instruction
then pop it and execute it.
Else if the first item on the EXEC stack is a literal
then pop it and push it onto the appropriate stack.
Else (the first item must be a list) pop it and push all of the
items that it contains back onto the EXEC stack individually,
in reverse order (so that the item that was first in the list
ends up on top).
In a sense this is merely an alternative implementation, with no semantic significance; it produces the same results as the simpler recursive implementation UNLESS we do new things to take advantage of the EXEC stack. One new capability provided by the EXEC stack results from the fact that the complete state of the Push interpreter now resides in the stack contents and the values of a few global variables (which store interpreter parameter and name bindings -- see below). This means that it becomes trivial to make a Push interpreter "reentrant" in the sense that it can be suspended and re-started at any time during a computation. This can be useful, for example, when a Push interpreter is embedded in an environment in which other processes (perhaps including other Push interpreters) must also be given CPU time. Such a capability could also be provided in other ways -- for example by dedicating a separate operating system process to each Push interpreter -- but the EXEC stack mechanism makes this easier and more efficient.
Other new capabilities provided by the EXEC stack result from explicit manipulation of the code that it contains. This allows for particularly efficient implementation of certain control structures including combinators. A simple example of this new functionality is given in the Simple Examples section below.
A top-level call to the Push interpreter may be provided with a second program, called the "configuration code," that can be used to preload values onto stacks and to set interpreter parameter values. In addition, the main program passed to the top-level call will itself normally be pushed onto the CODE stack before execution; this convention simplifies the expression of some recursive programs (see below for an example). If this behavior is not desired then one can turn it off by setting the TOP-LEVEL-PUSH-CODE parameter to FALSE.
The standard NAME data type provides for symbolic names that can be bound to values (thereby acting as variables and defined instructions) using DEFINE instructions. Any identifiers that do not represent known Push instructions or literals of other type (e.g. TRUE and FALSE) are recognized as NAMEs. If a name has not previously been given a value then it is pushed onto the NAME stack when it is encountered. A subsequent call to a DEFINE instruction (such as INTEGER.DEFINE or CODE.DEFINE) will bind the name on top of the NAME stack to the top value of the designated stack. From that point forward the name will act as an instruction which, when executed, will push the bound value onto the EXEC stack. For most types this will result in the value ending up, after the next execution cycle, back on the stack from which it originally came (since the execution of a literal just pushes it onto the appropriate stack). If the bound value is code, however, this will result in the bound value being EXECUTED -- so the name will act as a newly defined instruction.
A NAME.QUOTE instruction is provided to move names that already have definitions back onto the NAME stack, presumably for the sake of re-definition. When NAME.QUOTE is executed a flag is set that causes the next encountered name to be pushed onto the NAME stack, whether or not it has previously been bound (and the flag is then cleared, whether or not the name was previously bound). The CODE.QUOTE instruction is similar in some respects; it causes the next encountered piece of code, whatever it is, to be pushed onto the CODE stack rather than being executed. This is useful for moving code onto the CODE stack for manipulation and/or execution by other instructions. CODE.QUOTE can be implemented with a flag, similarly to NAME.QUOTE, but the existence of the EXEC stack permits a simpler implementation: simply move the top item of the EXEC stack to the CODE stack. Push includes a full suite of list-manipulation instructions that can be used to modify code in arbitrary ways, along with execution instructions (such as CODE.DO and CODE.DO*TIMES) that can be used to execute the modified code.
A Push interpreter contains a random code generator that can be used to produce random programs or program fragments; the algorithm for this is provided in the Random Code Generation section below. The random code generator can be called from outside the interpreter (e.g. to create or mutate programs in a genetic programming system) or from a standard CODE.RAND instruction (which is analogous to RAND instructions available for other types). An "ephemeral random constant" mechanism allows randomly-generated code to include newly-generated literals of various types.
Execution safety is an essential feature of Push, in the sense that any syntactically correct program should execute without crashing or signaling an interrupt to the calling program. This is because Push is intended for use in evolutionary computing systems, which often require that bizarre programs (for example those that result from random mutations) be interpreted without interrupting the evolutionary process. The "stack safety" convention described above (that is, the convention that any instruction that finds insufficient arguments on the stacks acts as a NOOP) is one component of this feature. In addition, all instructions are written in ways that are internally safe; they have well defined behavior for all predictable inputs, and they typically NOOP in predictable "exceptional" situations (like division by zero). Additional safety concerns derive from the availability of explicit code manipulation and recursive execution instructions, which can in some cases produce exponential code growth or non-terminating programs. In response to these concerns a Push interpreter must enforce two limits:
The convention regarding the order of arguments for instructions that are normally rendered as infix operators is that the argument on the top of the stack is treated as the right-hand argument and the argument second-from the top is treated as the left-hand argument. This means that the linear representation of an expression containing one of these instructions looks like the normal infix expression, except that the instruction is moved to the end. For example, we divide 3.14 by 1.23 using "( 3.14 1.23 FLOAT./)" and we subtract 2 from 23 using "( 23 2 -)".
While Push's stacks are generally treated as genuine stacks---that is, instructions take their arguments from the tops of the stacks and push their results onto the tops of the stacks---a few instructions (like YANK and SHOVE) do allow direct access to "deep" stack elements by means of integer indices. To this extent the stacks can be used as general, random access memory structures. This is one of the features that ensures the Turing-completeness of Push (another being the arbitrary name/value bindings supported by the NAME data type and DEFINE methods; see below).
This section contains just a few simple examples, to give the reader a
feel for the
language and to demonstrate some of its unique features.
First, some simple arithmetic and logic:
( 2 3 INTEGER.* 4.1 5.2 FLOAT.+ TRUE FALSE BOOLEAN.OR )
Execution of this code leaves the relevant stacks in the following states:
BOOLEAN STACK: ( TRUE )
CODE STACK: ( ( 2 3 INTEGER.* 4.1 5.2 FLOAT.+ TRUE FALSE BOOLEAN.OR ) )
FLOAT STACK: ( 9.3 )
INTEGER STACK: ( 6 )
Next, some more "scrambled-looking" arithmetic:
( 5 1.23 INTEGER.+ ( 4 ) INTEGER.- 5.67 FLOAT.* )
Execution of this code leaves the relevant stacks in the following states:
CODE STACK: ( ( 5 1.23 INTEGER.+ ( 4 ) INTEGER.- 5.67 FLOAT.* ) )
FLOAT STACK: ( 6.9741 )
INTEGER STACK: ( 1 )
A few points to note about this example:
Here is a tiny program that adds an integer pre-loaded onto the stack to itself:
( INTEGER.DUP INTEGER.+ )
When run with 5 pre-loaded onto the INTEGER stack, for example, this leaves 10 on top of the stack. The following does the same thing in a slightly more complicated way, pushing code onto the CODE stack and then executing it:
( CODE.QUOTE ( INTEGER.DUP INTEGER.+ ) CODE.DO )
This can be converted into a new "DOUBLE" instruction as follows:
( DOUBLE CODE.QUOTE ( INTEGER.DUP INTEGER.+ ) CODE.DEFINE )
or equivalently:
( CODE.QUOTE ( INTEGER.DUP INTEGER.+ ) DOUBLE CODE.DEFINE )
Or even more concisely, using the EXEC stack (more explanation of which is provided below), as:
( DOUBLE EXEC.DEFINE ( INTEGER.DUP INTEGER.+ ) )
After executing any of these definitions the name DOUBLE will act as an instruction that doubles the number on top of the INTEGER stack.
The following more complicated example computes the factorial of an integer pre-loaded onto the INTEGER stack. This example makes use of the fact that top-level calls to the interpreter normally push the executed code onto the CODE stack before execution:
( CODE.QUOTE ( INTEGER.POP 1 )
CODE.QUOTE ( CODE.DUP INTEGER.DUP 1 INTEGER.- CODE.DO INTEGER.* )
INTEGER.DUP 2 INTEGER.< CODE.IF )
This works by first pushing two pieces of code (for the base case and recursive case of the recursive factorial algorithm, respectively) onto the CODE stack; these are pushed on top of the code for the full program, which is pre-loaded onto the CODE stack by the top-level call to the interpreter. The subsequent code compares the provided integer with 2 and, depending on the result of this, executes one of the pushed pieces of code (and discards the other; see the documentation for CODE.IF in the Type/Instruction Catalog below for details). In the base case this will produce an answer of 1, while in the recursive case it will recursively compute the factorial of one less than the provided number, and multiply that result by the provided number. When called with 5 pre-loaded on the INTEGER stack this leaves the relevant stacks in the following states:
CODE STACK: (( CODE.QUOTE ( INTEGER.POP 1 )
CODE.QUOTE ( CODE.DUP INTEGER.DUP 1 INTEGER.- CODE.DO INTEGER.* )
INTEGER.DUP 2 INTEGER.< CODE.IF ))
BOOLEAN STACK: ( )
INTEGER STACK: (120)
This example is interesting because it demonstrates the use of the code stack for recursion, but there are much simpler ways to calculate the factorial function in Push. For example the following code uses an iteration instruction to compute the factorial of a number pre-loaded onto the INTEGER stack:
( 1 INTEGER.MAX CODE.QUOTE INTEGER.* 1 CODE.DO*RANGE)
The initial "1 INTEGER.MAX" is necessary only for the special case of an input of zero. And an even more parsimonious factorial function can be written using an iteration instruction that operates on the EXEC stack:
( 1 INTEGER.MAX 1 EXEC.DO*RANGE INTEGER.* )
Instructions that manipulate the EXEC stack appear to take their arguments "from the right," rather than from the left as all other Push instructions. This is because execution procedure pushes all of the items in a list onto the EXEC stack before executing any of them, and because the items to the right will be further down in the stack. So given a program like:
( A B C )
C will be on top of the EXEC stack when B is being executed. So if B accesses the EXEC stack it will see C, even though C is to its right. While this may at first be confusing, it allows EXEC instructions to manipulate code without quotation, producing particularly concise code. For example, consider the following program, which is intended to be executed with at least two integers on the INTEGER stack and at least two floating point numbers on the FLOAT stack:
( INTEGER.= CODE.QUOTE FLOAT.* CODE.QUOTE FLOAT./ CODE.IF )
The CODE.IF instruction takes two items off of the CODE stack and executes one or the other of them depending on what's on top of the BOOLEAN stack. In this case it will execute FLOAT.*, multiplying the top two floating point numbers, if the top two integers were equal, and it will execute FLOAT./, dividing the top two floating point numbers, otherwise. With EXEC.IF, which works analogously but using the EXEC stack rather than the CODE stack, this can be expressed more concisely as:
( INTEGER.= EXEC.IF FLOAT.* FLOAT./)
The EXEC stack also permits the concise expression of recursive control structures by means of "combinators." For example, the "Y" combinator, implemented in Push as EXEC.Y, inserts, beneath the top item of the EXEC stack, a second copy of that top item wrapped in another call to EXEC.Y. The resulting recursive calling sequence might later be terminated with the EXEC.K combinator or with EXEC.POP or EXEC.FLUSH. Consider the following implementation of a "while" loop:
( EXEC.Y ( <BODY/CONDITION> EXEC.IF ( ) EXEC.POP ) )
Execution of EXEC.Y inserts a copy of the entire expression (including EXEC.Y) beneath the remainder of the expression on the EXEC stack. The <BODY/CONDITION> code is then executed. If this leaves TRUE on top of the BOOLEAN stack then EXEC.IF leaves the following empty list on top of the EXEC stack and discards the call to EXEC.POP. After the execution of the empty list (which has no effect) the full original expression will again rise to the top of the EXEC stack, starting the next recursive call. If, on the other hand, the execution of the <BODY/CONDITION> leaves FALSE on top of the BOOLEAN stack, then EXEC.IF will discard the following empty list and instead execute the EXEC.POP. The call to EXEC.POP will remove the recursive call and thereby terminate the recursion. As a concrete example, consider:
( ARG FLOAT.DEFINE |
Define name ARG to store the top float. |
When run with an integer I pre-loaded on the INTEGER stack and floating point number F pre-loaded on the FLOAT stack, this will compute F raised to the I power.
Many other control structures can be implemented with combinators and other EXEC instructions, and the control structures can be added to the language as instructions using CODE.DEFINE and related instructions. The user is warned, however, that the use of arbitrary stack-manipulation instructions (such as ROT, YANK, and SHOVE) on the EXEC stack can create "execution spaghetti" -- that is, execution sequences that are very difficult to understand, and possibly difficult to edit/mutate without unexpected consequences. These instructions are provided for the EXEC type, as for all other types, but it might be prudent to use them sparingly. Manipulation of code on the CODE stack may also produce code that is difficult to understand, but at least it does so prior to, rather than during, its own execution. Whether evolution will make good use of EXEC stack manipulations is an open question.
A Push interpreter is configured by setting the values of interpreter parameters, including lists of the types and instructions that can appear in randomly generated code. This information can be specified either in a configuration file, the format of which is specified below, or in code that is passed to the interpreter as "configuration code." Configuration files may not be supported by all implementations, as the configuration code mechanism is functionally equivalent but simpler to implement.
A Push3 configuration file is a plain text file. Any line beginning with "#" is a comment and is ignored. Actual configuration lines come in three forms:
A parameter-setting line lists an implemented parameter (see below) and a value for that parameter. A type line consists of the word "type" followed by the name of an implemented type, and it has the effect of making the named type available in the interpreter; one effect of this is that the ephemeral random constant generator of the given type will be "turned on," allowing for constants of the type to be included in randomly-generated code. An instruction line consists of the word "instruction" followed by the name of an instruction; the effect of such a line is to allow the named instruction to appear in randomly-generated code.
Ideally an implementation should warn the user if the configuration "turns on" instructions that access stacks of "turned off" types, or if there are other apparent inconsistencies in the configuration. But it is not necessary that implementations do this, and there is no standardized set of inconsistencies that must be reported.
The following is a fragment of a valid configuration file:
## PARAMETER SETTINGS
MAX-RANDOM-FLOAT 1.0
MIN-RANDOM-FLOAT -1.0
MAX-RANDOM-INTEGER 10
MIN-RANDOM-INTEGER -10
EVALPUSH-LIMIT 1000
NEW-ERC-NAME-PROBABILITY 0.001
MAX-POINTS-IN-RANDOM-EXPRESSIONS 25
MAX-POINTS-IN-PROGRAM 100
## TYPES
type FLOAT
type NAME
type CODE
type BOOLEAN
type INTEGER
## INSTRUCTIONS
instruction INTEGER.FROMBOOLEAN
instruction INTEGER.FROMFLOAT
instruction INTEGER.>
instruction INTEGER.<
A Push implementation that supports configuration files should also provide a way to generate a complete configuration file, containing type and instruction lines for all implemented types and instructions. A reasonable way to configure such a Push interpreter is to start with such a complete configuration file and to comment out lines to turn off types and instructions. Note, however, that many instructions may depend on multiple types; for example, many code-manipulation and stack-manipulation instructions use integer indices and therefore require the INTEGER type, many comparison instructions require the BOOLEAN type for depositing their results, and all DEFINE instructions require the NAME type. Ideally, implementations should provide warnings when such dependencies are not satisfied by a configuration, but they need not do so; the person specifying the configuration should be sufficiently familiar with the instruction set to ensure that necessary types are "turned on."
The following parameters should be supported in configuration files:
The alternative mechanism for interpreter configuration is to pass "configuration code" as a second argument to the top level call to the interpreter. This code is simply Push code which may contain, among other things, calls to parameter-setting instructions. All of the parameters listed above have corresponding parameter-setting Push instructions, which are named with an "ENV." prefix. So, for example, the MIN-RANDOM-INTEGER parameter can be set to 100 using configuration code that includes "100 ENV.MIN-RANDOM-INTEGER". The "ENV" prefix here stands for "Environment," and it is inspired by a set of possible extensions, still under discussion, in which there would be a full-fledged environment type and an environment stack. The inclusion of parameter-setting instructions in randomly generated code is possible but probably not advisable.
The lists of "turned on" types and instructions are specified in configuration code using the ENV.TYPES and ENV.INSTRUCTIONS instructions, each of which takes an argument, which should be a list, from the CODE stack.
The following is an example piece of configuration code:
( 150 ENV.EVALPUSH-LIMIT
100.0 ENV.MAX-RANDOM-FLOAT
PI 3.141592 FLOAT.DEFINE
CODE.QUOTE ( FLOAT./ FLOAT.* FLOAT.- FLOAT.+ ) ENV.INSTRUCTIONS
CODE.QUOTE ( FLOAT ) ENV.TYPES )
This configuration code sets values for two parameters (EVALPUSH-LIMIT and MAX-RANDOM-FLOAT), leaving all other parameters at their default values (some of which may be implementation specific). It then defines an instruction called PI that will push 3.141592. The final two lines specify the instructions and types that can appear in random code; the specification here is for a minimal floating-point-only arithmetic configuration.
Several algorithms for the generation of random code have been described in the genetic programming literature. Code generation is less complicated for Push programs than it is for Lisp-style code trees, since in Push one doesn't have to worry about function "arity" or about function vs. argument positions when generating code. So it is easier, for example, to generate programs with predictable size and shape distributions.
The following is the standard Push random code generation algorithm, which is used for the CODE.RAND instruction. It may also be useful for the initialization of programs in evolutionary computation systems, and it is used for this purpose in PushGP. It produces a uniform distribution of sizes and what seems to be a reasonable distribution of shapes, in a reasonable amount of time.
Note that the instruction set referred to in RANDOM-CODE-WITH-SIZE should include any NAMEs that have been bound with a DEFINE instruction.
Function RANDOM-CODE (input: MAX-POINTS)
Set ACTUAL-POINTS to a number between 1 and MAX-POINTS,
chosen randomly with a uniform distribution.
Return the result of RANDOM-CODE-WITH-SIZE called with input
ACTUAL-POINTS.
End
Function RANDOM-CODE-WITH-SIZE (input: POINTS)
If POINTS is 1 then choose a random element of the instruction
set. If this is an ephemeral random constant then return a
randomly-chosen value of the appropriate type; otherwise
return the chosen element.
Otherwise set SIZES-THIS-LEVEL to the result of DECOMPOSE
called with both inputs (POINTS - 1). Return a list
containing the results, in random order, of
RANDOM-CODE-WITH-SIZE called with all inputs in
SIZES-THIS-LEVEL.
End
Function DECOMPOSE (inputs: NUMBER, MAX-PARTS)
If NUMBER is 1 or MAX-PARTS is 1 then return a list
containing NUMBER
Otherwise set THIS-PART to be a random number between 1 and
(NUMBER - 1). Return a list containing THIS-PART and
all of the items in the result of DECOMPOSE with inputs
(NUMBER - THIS-PART) and (MAX-PARTS - 1)
End
This section describes a few of the features of some current implementations and their interfaces. They can also be interpreted as suggestions for anyone building their own Push implementations. These are not features of the language per se, but they may help one to ensure that two implementations are consistent with one another. Implementation-SPECIFIC installation and usage notes should accompany each implementation.
Test suite scheme: A more-or-less standardized "test suite" scheme is intended to help ensure that a Push implementation behaves like other implementations. The scheme is to process three files (a configuration file, a file containing a list of literals for initializing the stacks, and a file containing a program) and to produce an output file that contains a list of literals which, if read back in to the interpreter (that is, if "executed" as a program), would re-create the stacks as they were at the end of the computation. The values in the output file are listed type by type, following the order in which types are declared in the configuration file. Each implementation should also provide some mechanism for comparing its output files with those of other implementations (for example by reading the files and comparing stack states, or by comparing the files in a way that disregards insignificant white space). For systems that use configuration code, rather than configuration files, this scheme would have to be altered somewhat.
Templates: It often makes sense to create slightly different versions of the same basic instruction for multiple types. For example, many of the standard stack-manipulation instructions (e.g. DUP, POP, etc.) make sense for all types (and depending on the way that the interpreter is written it may be possible to use identical method implementations for all of them). In Push1 an instruction overloading mechanism, using a TYPE type in the language itself, allowed one to exploit these commonalities (and also affected the semantics of the language in complex ways). In Push2 and Push3 the TYPE type was dropped and all instruction names refer to single methods; instruction names often include type names but this is just a convention. To recapture the software engineering benefits of the overloading mechanism (e.g. code reuse) an implementation should provide a template mechanism for type and instruction definitions.
Libraries: The configuration mechanism is intended to simplify the integration of new types/instructions into an interpreter. Any implementation of the language should provide a clear API for including libraries of types/instructions and for building/loading configuration files and/or configuration code.
Client-provided instructions with call-backs: Push interpreters are often imbedded within "client" programs that invoke interpreters on various inputs and do various things with the outputs. The integration of the interpreter into the client environment should be as tight as possible. Minimally, the client should be able to process a configuration file and/or configuration code, push/pop onto/from stacks, and invoke the interpreter on a piece of code. Ideally, the client should also be able to install new instructions that interact both with the client's environment and the interpreter state. For example, we use our C++ interpreter as a plug-in to the BREVE simulation environment (http://www.spiderland.org/breve), and we provide a way for the BREVE user to write instructions in BREVE's scripting language (which may, among other things, manipulate the interpreter's stacks) and add them to an embedded interpreter.
Push3 differs from Push2 mainly in the following ways:
"Big picture" changes:
Push2 | Push3 | |
Definition |
TIMES2 |
If TIMES2 is known to have no prior binding: TIMES2 or TIMES2 If TIMES2 may have a prior binding: NAME.QUOTE TIMES2 or NAME.QUOTE TIMES2 |
Invocation |
TIMES2 CODE.GET CODE.DO |
TIMES2 |
Additional changes:
Push2 differs from Push1 mainly in the following ways:
Changes to the language per se:
Refinements to our implementations and their interfaces:
Following are a couple of items that are currently under discussion for possible inclusion in future versions of Push:
Rejected alternative names for recent versions of Push:
The following are descriptions of "standard" types and instructions, in the sense that any Push3 implementation that provides these types/instructions should implement them in ways that conform to these descriptions. However, some implementations may not implement all of these types or instructions, and some implementations may implement more; use your implementation's configuration mechanism (described briefly above) to configure your implementation appropriately.
Unless otherwise noted all instructions POP all arguments that they consult. For example, the description of INTEGER.= states that it "Pushes TRUE onto the BOOLEAN stack if the top two INTEGERs are equal, or FALSE otherwise." In this case two items (the two that are compared) are popped from the INTEGER stack by the INTEGER.= instruction. In addition, unless otherwise noted all results are pushed after the arguments are popped. So for example BOOLEAN.= pushes the result of the comparison onto the BOOLEAN stack after popping the two values to be compared from the BOOLEAN stack. If any needed argument is not available then the instruction acts as a NOOP; that is, it does nothing, and leaves all stacks in the states that they were in prior to the call.
Indexing into stacks, e.g. for SHOVE, YANK, and YANKDUP instructions, is zero-based; that is, the top of the stack has index 0. Negative indices into stacks are interpreted as 0 (the stack top), and indices that exceed the stack depth are interpreted as the highest meaningful value (e.g. the stack bottom for YANK, or one beyond the stack bottom for SHOVE).
Type | BOOLEAN |
Description | For use in comparisons, logic, etc. Literals are TRUE and FALSE. Required for use of various comparison operators (e.g. INTEGER.=) and CODE.IF. |
Instructions |
|
Type | CODE |
Description | For explicit code manipulation and execution. May also be used as a general list data type. This type must always be present, as the top level interpreter will push any code to be executed on the CODE stack prior to execution. However, one may turn off all CODE instructions if code manipulation is not needed. |
Instructions |
|
Type | EXEC |
Description | Code queued for execution. The EXEC stack maintains the execution state of the Push interpreter. Instructions that specifically manipulate the EXEC stack can be used to implement various kinds of control structures. The CODE stack can also be used in this way, but manipulations to the EXEC stack are "live" in the sense that they are manipulating the actual execution state of the interpreter, not just code that might later be executed. |
Instructions |
|
Type | FLOAT |
Description | Floating-point numbers (that is, numbers with decimal points). |
Instructions |
|
Type | INTEGER |
Description | Integer numbers (that is, numbers without decimal points). |
Instructions |
|
Type | NAME |
Description | For creating bindings between symbolic identifiers and values of various types; that is, for implementing (global) variables and defined instructions. Bindings are created with DEFINE instructions. Any identifier that is not a known Push instruction or a known literal of any other type is considered a NAME and will be pushed onto the NAME stack when encountered, unless it has a definition (in which case its associated value will be pushed on the EXEC stack when it is encountered. The NAME.QUOTE instruction can be used to get a name that already has a definition onto the NAME stack. |
Instructions |
|
This material is based upon work supported by the National Science Foundation under Grant No. 0308540 and Grant No. 0216344, and by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30502-00-2-0611.
The first version of this document was created in November, 2003.
YYYYMMDD
20031229: - Created this version history.
- Addition re: use of names for code variables.
- Additions related to instruction constants.
- Addition of RANDOM-SEED parameter.
20040109: - Tweaked descriptions of *.GET instructions (thanks Maarten Keijzer)
20040112: - Added Random Code Generation section.
- Changed specified behavior of division/modulus by zero to
be NOOP rather than pushing zero. (& related text changes).
- Change specified behavior of *.GET with unbound NAME to be NOOP.
- Added "Under Discussion" section.
20040113: - More rejected names for Push2.
20040412: - Added CODE.FROMBOOLEAN, CODE.FROMINTEGER, CODE.FROMFLOAT
CODE.FROMNAME, CODE.DO*COUNT, CODE.DO*TIMES.
20040730: - Major revisions for Push2->Push3 transition.
- Maarten Keijzer added as co-author.
20040731: - Fixed typos, added "Pushkin," other minor fixes.
- Fixed factorial examples for input of zero.
20040802: - Added correct attribution to Christophe Mckeon.
20040829: - Fixed bug in EXEC.IF example, thanks to Christophe Mckeon.
20040901: - Removed duplicate entry for BOOLEAN.POP, thanks to Christophe Mckeon.
20040910: - Cosmetic changes for publication as a Hampshire College Cognitive
Science Technical Report.
[end]