Johann Joss's ideas about improvements to computer languages:
In development by Mr. J. Joss ...
Use mathematical notation
Algol introduced different styles of representation. Today we have the hardware
that would allow us to use the reference language as the hardware representation,
but we are stuck with ASCII or the ASR 32 teletype.
We do not even have the mathematical
symbols for >=, <=, unequal, logical and, and logical or!
What many people lose sight of is that a programming language is a description of
an algorithm. Prof. Rutishauser once said that an informal description is not an
algorithm. Informally you can say: and then one iterates until eps is small enough.
In an Algol program, you have to say what "small enough" means and you have to
guarantee that this "small enough" is reached.
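As a small illustration (my own sketch, not an example from the lecture; the tolerance eps and the iteration limit are chosen by the caller):

    #include <math.h>

    /* Sketch: Newton's method for sqrt(a), a > 0.  "Small enough" is made
       precise by the relative tolerance eps, and max_iter guarantees that the
       procedure terminates even if the tolerance is never reached. */
    double newton_sqrt(double a, double eps, int max_iter)
    {
        double x = (a > 1.0) ? a : 1.0;               /* crude starting value */
        for (int i = 0; i < max_iter; i++) {
            double next = 0.5 * (x + a / x);
            if (fabs(next - x) <= eps * fabs(next))   /* "small enough", precisely defined */
                return next;
            x = next;
        }
        return x;                    /* tolerance not reached within max_iter */
    }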
In my opinion, programming languages should evolve towards easier reading.
We should be allowed to write (in TeX notation) \frac{a+b}{c+d} instead of (a+b)/(c+d).
Of course, the printed output and the IDE should display it like processed TeX.
New floating point standard:
I consider the IEEE floating point standard a catastrophe. Using guard digits just makes
a program very difficult to control.
Fortunately, the GNU compiler collection corrects this. Tests were run on a PC using
C and Fortran.
Guard digits are used unless one switches on optimization or debug mode. In these modes
everything is predictable; no guard digits are used.
Guard digits amount to mixed-precision arithmetic. It is difficult enough to really understand fixed
precision. Mixed precision just adds more complexity to an already difficult situation.
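A small test of this kind might look as follows (a sketch; whether both comparisons print 1 depends on the target, x87 vs. SSE, on optimization, and on options such as -ffloat-store):

    #include <stdio.h>

    int main(void)
    {
        double a = 1.0, b = 1e-17;       /* b is smaller than half an ulp of 1.0 */
        double s = a + b;                /* may be computed and kept in a wider register */
        volatile double t = a + b;       /* the store forces rounding to 64-bit double */
        /* With extra precision in registers the two lines below can disagree;
           with strict double arithmetic both print 1. */
        printf("s == t     : %d\n", s == t);
        printf("s == a + b : %d\n", s == a + b);
        return 0;
    }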
Partial underflow: the number range in which partial underflow works is very small.
Say you have a decimal exponent range of +-300 and 14-digit precision arithmetic. Partial
underflow may help down to about -314, which is not really worth having. An algorithm which
makes good use of partial underflow is difficult to find, if it exists at all.
Squelching partial underflow to 0 makes it much easier to detect and guard against.
Implementing partial underflow in software is extremely dangerous. A program which
runs into that range almost stops. In a real-time situation this becomes a disaster.
If it is implemented in silicon, it is a waste of silicon and energy, which could be put to better use.
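For IEEE double precision the numbers involved look like this (a sketch using the constants from float.h; DBL_TRUE_MIN is only available from C11 on):

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        printf("smallest normal double:    %g\n", DBL_MIN);        /* about 2.2e-308 */
    #ifdef DBL_TRUE_MIN
        printf("smallest subnormal double: %g\n", DBL_TRUE_MIN);   /* about 4.9e-324 */
    #endif
        /* The partial-underflow band covers only the range between these two
           values; below it everything is 0 in any case. */
        double x = DBL_MIN / 16.0;       /* a subnormal value, with reduced precision */
        printf("DBL_MIN / 16 = %g\n", x);
        return 0;
    }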
When I learned programming, I got the impression that Algol real numbers are the mathematical
real numbers plus some dirt.
In a lecture, Prof. Rutishauser presented a different view: there is the set of machine numbers M.
They form a subset of the real numbers. The machine operations are a mapping M x M -> M.
This mapping should conform to certain rules (axioms), so that theorems about algorithms can be proven.
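A typical example of such an axiom, written in TeX notation (this is the standard model of floating point arithmetic; the formulation is mine, not a quotation from the lecture):

    fl(a \circ b) = (a \circ b)(1 + \delta), \qquad |\delta| \le u, \qquad \circ \in \{+, -, \times, /\},

where u is the unit roundoff of the machine numbers M. From rules of this kind one can prove error bounds for complete algorithms.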
The language designer has no business defining the machine operations, but he has to present
a well-defined interface to the underlying hardware. If he tries otherwise, the resulting programs
become so slow that they are almost useless (at least for a very large class of problems).
This is often not understood at all.
Hardware data type for array addresses:
A big question is whether a compiler should check the bounds of an
index into an array or not. Doing so slows down the program; not doing
it may lead to difficult-to-find or even dangerous errors.
I cannot understand why the hardware does not help here.
Here is my proposal:
Have a hardware datatype for addresses. This datatype contains three addresses:
the address of the element with index 0, and the first and the last address of the
memory area. Address arithmetic would be done with special address-manipulation
instructions, and every indirect reference would be checked by the hardware.
This would not slow down execution, because for a read access the value could
be given to the processor immediately and the checking done during the execution.
If the check is violated, an interrupt would be generated.
Of course, this interrupt would come a bit late, and the program counter would no
longer point to the offending instruction. But this is not a real limitation, because
optimising compilers shuffle the machine instructions around anyhow.
In any case it would be a big help in finding and removing otherwise difficult-to-track errors.
It would also stop all buffer overflow attacks and so make our computers much safer.
When memory is freed by a program, the pointer would be made invalid.
The compiler
could use such pointers only by indirect reference, so that there are no copies of the pointer-type
variable, but only references to it. This would slow down execution, and it would be up to
the compiler writer (and to the user, if different compiler options were made available) to decide whether to do so.
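In software, the proposed datatype would correspond to a descriptor roughly like the following (a sketch of mine; the names are invented, and in the proposal the check would of course be done by the hardware on every indirect reference, not by a function call):

    #include <stdio.h>
    #include <stdlib.h>

    /* The three addresses of the proposal:
       base  - address of the element with index 0
       first - first valid address of the memory area
       last  - last valid address of the memory area  */
    struct desc {
        double *base;
        double *first;
        double *last;
    };

    /* Create a descriptor for n > 0 elements (error handling omitted). */
    struct desc make_desc(size_t n)
    {
        struct desc d;
        d.first = malloc(n * sizeof *d.first);
        d.last  = d.first + n - 1;
        d.base  = d.first;               /* index 0 at the start of the area */
        return d;
    }

    /* What the hardware would do on every indirect reference. */
    double *checked(struct desc *d, long i)
    {
        if (d->first == NULL)            /* descriptor has been invalidated */
            goto trap;
        double *p = d->base + i;
        if (p < d->first || p > d->last) /* outside the memory area */
            goto trap;
        return p;
    trap:
        fprintf(stderr, "bounds violation at index %ld\n", i);
        abort();                         /* the "interrupt" of the proposal */
    }

    /* Freeing the memory invalidates the descriptor. */
    void release(struct desc *d)
    {
        free(d->first);
        d->base = d->first = d->last = NULL;   /* every further access now traps */
    }

A hardware implementation would overlap the check with the memory access, which is why it would not slow down execution.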
Just as a remark: the instruction set of the PDP-10 computer had a hardware datatype
called a byte pointer. This was heavily used in the operating system (in those days operating
systems were usually written in assembly language) for accessing bit fields in system tables.
It was also used in I/O. One only had to redefine the byte pointer in order to make
a program work on full words (36 bits), 7-bit characters (normal ASCII) or 6-bit characters
(used e.g. for file names). In those days memory was very valuable, so 6-bit characters
made sense.
Comments about string implementation
The standard C string library is inherently unsafe and requires great care from the
programmer. Historically this is very understandable. When C was invented,
computers had very limited memory and were slow by today's standards. One was
happy if the task could be done at all. Also, the programs were small and therefore less
error-prone.
C uses 0-terminated strings. This has often been criticised, because e.g. determining
the length requires scanning the string. But everybody who programmed in assembly language in
the 1960s and 70s found 0-terminated strings very easy to work with. No
registers are wasted on counters and/or limits. This made the programs much
smaller and usually also much faster. The biggest drawback is that one character,
the zero, is no longer available. Usually this is not a real problem.
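The two representations side by side (a sketch; the counted form is only an illustration, not a proposal for a concrete library):

    #include <stddef.h>
    #include <string.h>

    /* 0-terminated: the length is not stored, so strlen() has to scan the bytes. */
    size_t len_scan(const char *s)
    {
        return strlen(s);
    }

    /* Counted: one extra word stores the length, so no scan is needed and no
       character value is reserved, at the price of carrying the count around. */
    struct counted_string {
        size_t len;
        char  *data;        /* may contain any byte values, including 0 */
    };

    size_t len_stored(const struct counted_string *s)
    {
        return s->len;
    }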
Today, the size of programs is much less of a problem. I cannot understand why the
standard C string library, with its danger of buffer overflow, has not been deprecated and
even forbidden for web applications.
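The classical danger and a bounded alternative (a sketch; snprintf is standard since C99, while safer replacements such as strlcpy are not available everywhere):

    #include <stdio.h>

    /* Hypothetical helper, only to show the contrast. */
    void copy_name(char *dst, size_t dstsize, const char *src)
    {
        /* Unsafe: strcpy(dst, src) writes past the end of dst whenever src is
           longer than the buffer; this is exactly the overflow attacks exploit. */

        /* Bounded: writes at most dstsize bytes and always 0-terminates. */
        snprintf(dst, dstsize, "%s", src);
    }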
goto and if-then-else comments
The goto statement is quite controversial. Many damn it. There are even some who
declare goto-free programs to be structured.
The problem with the goto statement is not the goto but the label. When I see a goto
in a program, I know immediately what it means. When I see a label, I have to search
through the whole block to see where all the gotos are.
I guess that most of the aversion to the goto is in reality a fight against Fortran.
Most of the gotos in a Fortran program are in fact if-then-else or if-then.
An if-then-else is much preferable to a goto because it communicates the intentions
of the programmer much better.
The Fortran style also invites a bad layout of the program. If you use if-then-else, you find
the then-part and the else-part together. In many Fortran programs you find a structure
like the following: at a certain point a special condition has to be handled. At this point
an if with a goto is inserted and the normal flow continues. Later the special handling
is programmed, and it is terminated by a jump back to the main sequence. Very often
the main code makes some assumptions which are no longer valid in the special case, or
there is not even a good point to continue at. This makes the programs difficult to read and
error-prone.
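The pattern described above, written once with gotos and once with if-then-else (a C sketch; normal_path, handle_special and continue_common are placeholder names):

    /* Placeholders for the pieces of the computation (invented names). */
    static void normal_path(void)     { /* the normal processing */ }
    static void handle_special(void)  { /* the special handling  */ }
    static void continue_common(void) { /* code common to both   */ }

    /* Fortran style: an if with a goto, the special handling placed further
       down, terminated by a jump back into the main sequence. */
    void fortran_style(int special)
    {
        if (special) goto special_case;
        normal_path();
    back:
        continue_common();
        return;
    special_case:
        handle_special();
        goto back;
    }

    /* The same logic with if-then-else: both branches stand together. */
    void structured_style(int special)
    {
        if (special)
            handle_special();
        else
            normal_path();
        continue_common();
    }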
Basically, there are two types of gotos: forward gotos and backward gotos. In the theory
of computation, perhaps the most difficult problem is to determine whether an algorithm
terminates. Forward gotos never create loops by themselves and are for this reason
quite harmless. Backward gotos, on the other hand, can create loops and need special attention.
One place where gotos are very handy is error handling. If a procedure finds
some error condition, it very often has to do some cleanup before returning. These
error conditions can occur at various points in the procedure. For this, gotos come in very handy.
This technique is used extensively in the Linux kernel. Some declare it bad style.
I consider it good style and the critics as not understanding the problem.
Many goto-free programs contain cleanup code spread out through the whole procedure,
followed by a return statement. This makes maintenance of the program very hard and
error-prone. When a new cleanup is needed, the programmer has to scan the whole
procedure for the points where it has to be added. It also leads to duplicated cleanup
code.
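The cleanup pattern looks roughly like this (a C sketch in the style used in the Linux kernel; the file and buffer here are only placeholders for whatever resources a procedure acquires):

    #include <stdio.h>
    #include <stdlib.h>

    int process_file(const char *path)
    {
        int ret = -1;                     /* assume failure until proven otherwise */
        char *buf;
        FILE *f = fopen(path, "r");
        if (f == NULL)
            goto out;                     /* nothing acquired yet */

        buf = malloc(4096);
        if (buf == NULL)
            goto out_close;               /* only the file has to be closed */

        if (fread(buf, 1, 4096, f) == 0)
            goto out_free;                /* both resources have to be released */

        ret = 0;                          /* success; fall through to the cleanup */

    out_free:
        free(buf);
    out_close:
        fclose(f);
    out:
        return ret;
    }

Every new resource adds one label, and every error path simply jumps to the first cleanup step that applies.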
Another place where I prefer gotos is when user input is checked and, if found
incorrect, an error message is displayed and a repeat of the input is requested.
I have some mixed feelings about this. There is usually only one reference to the
label; the label, a global item, is used for something purely local; and it is a backward goto.
The alternative solution is a loop. I consider this extremely ugly. It is syntactically
a loop, but the loop is usually executed just once, so semantically it is not a loop.
It also misleads the compiler. The compiler may try to move invariant code out of a
loop which is only executed once. Also, many programmers use an auxiliary variable
for the termination of the loop, like: ok := false; while (!ok) ...
The ok is an even more global object than the label. Also, ok := false is just not really correct: nothing is "not ok" at that point.
This just adds more ugly parts to an already ugly concept.
I am not really happy with this use of gotos, but at the moment I do not know a better alternative.
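The two variants side by side (a sketch; read_value and valid are placeholders for the actual input and check):

    #include <stdio.h>
    #include <stdlib.h>

    /* Placeholder: read one integer per input line (on end of input, give up). */
    static int read_value(void)
    {
        char line[128];
        int v = -1;
        if (fgets(line, sizeof line, stdin) == NULL)
            exit(1);
        if (sscanf(line, "%d", &v) != 1)
            v = -1;
        return v;
    }

    static int valid(int v) { return v >= 0; }   /* placeholder validity check */

    /* With a backward goto. */
    int ask_with_goto(void)
    {
        int v;
    again:
        v = read_value();
        if (!valid(v)) {
            printf("invalid input, please try again\n");
            goto again;
        }
        return v;
    }

    /* With an auxiliary flag; note that ok = 0 ("not ok") is set before
       anything has actually gone wrong. */
    int ask_with_flag(void)
    {
        int v = 0;
        int ok = 0;
        while (!ok) {
            v = read_value();
            ok = valid(v);
            if (!ok)
                printf("invalid input, please try again\n");
        }
        return v;
    }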
own as a static variable declaration
One cumbersome concept of Algol is the own variable. The big problem there is its initialization.
In modern object-oriented programming, this concept is called a static variable.
It is still needed, but still cumbersome.
It is a situation of the kind Dirac described: "Some new ideas are here needed."
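In C the modern form looks like this (a sketch; the counter is only an illustration):

    #include <stdio.h>

    /* The modern form of an "own" variable: it keeps its value between calls,
       and the language defines that the initialization happens exactly once,
       before the first call. */
    int next_id(void)
    {
        static int id = 0;
        return ++id;
    }

    int main(void)
    {
        printf("%d\n", next_id());   /* prints 1 */
        printf("%d\n", next_id());   /* prints 2 */
        printf("%d\n", next_id());   /* prints 3 */
        return 0;
    }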
The daughter of Prof. Rutishauser, one of the founders of Algol, is working on a
biography of her father. I may get some useful data from there.