Thursday, April 16, 2015

Advanced Software Language Design Concepts

Minimal Specification

A minimally specified program is the idea of describing exactly what is needed to accomplish an algorithm and nothing else.  For example, extraneous statements are often added to software and include inefficiencies (conceptual mistakes), debug or logging.  These should be indicated as extraneous within the language.

As all of these statements are essentially commentary, let us propose "/:" to prefix in inessential line, "//" to prefix a traditional comment, and "/?" to prefix a documentation comment.  We'll prefix use "/*" rather than "/" to specify multi-line.

// Let's log now...
/: log(INFO,"This is a log message");

/*: log(INFO,"Contents of list");
for l in list.items()

But extra statements do not constitute the entirety of unnecessary information.  What about statement ordering?  Rather than specify unnecessary order, let's specify different syntax for lexical scoping rules that allow different ordering:
[] = any order
() = specific order
{} = only one of

So for example:

Point add(Point a, Point b)
    x = a.x+b.x;
    y = a.y+ b.y;
  return Point(x,y);

This is a very succinct way increase parallelism in software.  A clever compiler can use this information to reorder instructions for optimization, spawn threads or even start "micro-threads" (a short simultaneous execution on 2 processors of a multi-core machine which share the same stack before the moment of separation).

If the concept of minimal specification is applied throughout the language, there are quite a few other interesting language ideas that emerge.

Syntatic Specifications


Interfaces exist in one form or another in many programming languages.  However, the related type systems suffer from a lack of flexibility that causes them to be less than fully utilized.

Type specifications should be parametric.  That is, be able to specify multiple types simultaneously:

type Point({int, float, double} ElemType ) =
  ElemType x,
  ElemType y

(we don't need template <> notation, types are parametric)

You could quickly define a grouping of types (remember that {} means "one of"):

type Number  = {int, float, double}
 In cases where the constituent types do not implement the same interface (do not have the same member operators), the operators available to Number is the intersection of the operators available in its constituent types.

Aside: This is very different than the following 3-tuple:
type triple = (int, float, double)

Let's define a keyword: the "any" type means any type!

Let's specify the addition function, where the parameters can be heterogeneous Point types:
Point Add( Point a, Point b);

Let's specify the addition function where all objects must be the same fully-realized type:
ParamType Add( (ParamType = Point) a, ParamType b);

Interface Reductions

Languages today almost exclusively allow programmers to add to the existing symbol table.  The only notable exception is the use of public, private, and protected in C++ and other object-oriented languages.

However, these "canned" namespaces are based on program structure assumptions that miss the complexity of modern software development.  For example, an API may have multiple levels of interface, depending the application programmer's chosen power/complexity trade off.  The implementation of the API may have specific functions needed to interface with an optional component.  These functions and related member variables, could be removed during compilation if the other component is not part of the build, resulting in space and time efficiencies.  The implementation may have a debugging interface...

Instead, let us define interface groups and allow classes to include specific prototypes and interfaces into the group:

interface group API;
interface group GUI;
interface group data;

A module can choose what interface groups to present to other software layers.  It can combine pieces of other interface groups into an new group and present that.  This has the effect of reducing the namespace. 

Given an extremely flexible syntax parser, you should be able to specify most modern languages in a single language.

Semantic Specifications

Interfaces constitute syntatic specifications.  What about semantics?  A semantic specification defines how an object should behave.  Today we get away with concepts like "assert" and "unit test"; but there is no formal specification of semantics.  Without a formal specification engineers cannot write adhering implementations or formal proofs and compilers cannot apply logical reasoning for optimization.

  For example:

  semantic stack(any a) = assert(a == a.push(x).pop())

  semantic queue(any b) =
    any (x,y,first,second);
    a.push_back(x),     a.push_back(y),
    first = a.pop(),
    second = a.pop(),
    assert(x == first),
    assert(y == second)

An interface actually consists of both syntax (interface) and test (semantic) specifications:

type List(any T) =
  def add(T a) {...},
  def remove(T a) {...},
  def push(T a) {...},
  def T pop() {...},

  semantic(List a, assert(a == a.add(x).remove(x))),
  implements semantics stack;
  implements semantics queue;

Performance Specifications

Performance specification is an important part of the semantic specifications from a practical perspective, although it is (generally) not part of the minimal specification (so we'll use the /: prefix).

Why is performance specification important? A programmer is confronted with multiple implementations of an interface (say a List, or Map).  To pick the optimal implementation he must match the usage patterns in his code with the implementation that implements those member functions most efficiently.  To do so correctly, he needs classes and member functions to be annotated with performance specification.

type MyList(any T) =
   int length;
   def push(T a) {...}, //: O(p=1,m=1)
   def find(T a) {...}, //: O(p=length/2, m=1)
   def quickSort(T a) {...}, //: O(p=length*log(length), m=1)
   MyList(T) clone() {...}, //: O(p=length, m=length)

Note, given these performance specifications it may be possible for the profiler to feed back data into the compiler to recommend the best implementation.

Computer-Assisted Development

Integrated IDE

The language should not be defined solely in ASCII format.  Today's computers are fully capable of displaying binary data (pictures, music, etc) in human-consumable format inside the context of a traditional program editor and so languages should allow this data to be included.

var Image SplashImage = [[actual image here]]

Computer Annotated Source

Continuing the philosophy of minimal specification let us NOT specify the specific list required for this task.  Let us just specify that it must be an object with a list and GUI interface:

var (List, GUI) choices,

choices.push_back("last choice"),

The compiler can choose any object that provides both the List and GUI interfaces.  During profiling execution, the compiler keeps track of how often each API was called.  Although this is not the case in the above example, let us imagine that the push_back() function was called repeatedly in a performance sensitive area.

After execution, the system notices this and chooses an implementation of choices that optimizes the push and push_back functions based on the performance annotations that are part of each classes' definition (see above).  It annotates the source code to this effect, using the "inessential" marker "/" with the computer-can-change annotation "|":

var (List, GUI) choices,  //| instantiate DoublyLinkedList(string)

choices.push_back("last choice"),

If the programmer wants to override this choice or stop it from ever changing he can remove the computer-can-change annotation:

var (List, GUI) choices,  /: instantiate MyDoublyLinkedList(string)

Or of course, using the traditional method:
var MyDoublyLinkedList(string) choices;

Compiler Interpreter Equivalence