
Kolmogorov Quotient


Let's say you want to explore the notion of quantifying the amount of elegance a programming language provides. That is, the degree to which a high-level language is able to reduce complexity via its language constructs.

We'll define this idea of "simplification" as a factor of text-wise reduction (fewer characters needed to express a complex concept, à la Algorithmic Information Theory) and a second, less easily quantified concept: maintainability.

Fleshing out this latter concept: it has to do with how easily one can establish programmer consensus for a given task; i.e. how many programmers of the language would write it back the same way you expressed it, or would otherwise agree on the best of several implementations of the same problem.

I will define the Kolmogorov Quotient so that higher numbers for a given language denote a reduction in the complexity of solving the problem in the given language.

The metric for "text-wise reduction" should incorporate a constant factor based on (non-identifier) symbols used in the language and source text. These factors will be the same across all languages implemented (i.e. designed) for a given architecture (e.g. VonNeumannArchitecture vs. Symbolics) and will be a measure of the significance of the symbol; i.e. the topology of the language tokens. (These terms, alas, will also need to be fleshed out and defined.)
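As one illustration of what counting non-identifier symbols in source text might look like, here is a minimal sketch using Python's tokenize module on Python source. The uniform weight of 1 per symbol is an assumption; the proposal above leaves the actual per-symbol weights (the "significance" of each symbol) undefined:

```python
import io
import token
import tokenize

def symbol_count(source: str) -> int:
    """Count non-identifier symbol tokens (operators and punctuation)
    in Python source text. Assumption: every symbol weighs 1; the text
    above calls for weights based on each symbol's significance."""
    return sum(
        1
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == token.OP  # operators/punctuation, not identifiers
    )

print(symbol_count("x = (a + b) * 2\n"))  # -> 5: =, (, +, ), *
```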

Once the basic premise and methodology above are agreed to, it is only a matter of a rough constant of difference for any specific implementation/architecture. That is, as long as the architecture is the same across all measurements, the number should be valid and comparable between languages.

But it could be implemented something like this: pick a language "close to the machine", like C or assembly, and measure the number of bytes of machine code it takes to implement a standard suite(*) of common, non-threaded programming tasks (base_language_count). Then code the exact same functionality in the language you want to measure (without using external libraries) and count the number of bytes of source code (test_language_count).

KQuotient = (2/#_of_equivalent_programs) * (base_language_count / test_language_count)

Since time and space are of equal importance (or "tradeoff value"), the equation must deal with two possible implementation ideals in the base language: one optimized for speed and one optimized for size. That means #_of_equivalent_programs should be equal to 2.

Also, base_language_count should probably be measured in bytes of generated machine code, while test_language_count is the number of source-text characters, probably also counted in bytes.
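To make the measurement concrete, here is a minimal sketch in Python, assuming the base-language programs have already been compiled to binaries on disk. The file names are hypothetical, and since the text doesn't say how the two base-language ideals combine into base_language_count, this sketch averages their sizes as one plausible reading:

```python
import os

def kolmogorov_quotient(base_binaries, test_source):
    """KQuotient = (2 / #_of_equivalent_programs)
                 * (base_language_count / test_language_count)

    base_binaries: paths to the equivalent base-language machine-code
    programs (two ideals: time-optimized and space-optimized).
    test_source: path to the test-language source file.
    """
    n_equivalent = len(base_binaries)
    # Assumption: average the machine-code byte counts of the base
    # programs to get a single base_language_count.
    base_count = sum(os.path.getsize(p) for p in base_binaries) / n_equivalent
    test_count = os.path.getsize(test_source)
    return (2 / n_equivalent) * (base_count / test_count)

# Hypothetical example: two C builds of a suite task (gcc -O3 and -Os)
# versus a Python solution of the same task.
print(kolmogorov_quotient(["task_O3.bin", "task_Os.bin"], "task.py"))
```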

One should also record, for discussion, the number of "equal programs": the number of programs that fluent programmers of the language agree are the best and equivalent solutions to the problem. (Further: "equivalent solutions" are those whose output is the same for every input.) Languages which have a large number of equal programs say something interesting about either the programming language or the types of programmers the language attracts. What it says, I don't know.... :^)
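Since "output is the same for every input" can't be tested exhaustively (program equivalence is undecidable in general), any practical check has to sample a finite input set. A minimal sketch, assuming each candidate is an executable that reads stdin and writes stdout; the program paths and inputs are hypothetical:

```python
import subprocess

def equivalent(programs, inputs):
    """Approximate the 'same output for every input' test over a finite
    sample: run each candidate program on each input and compare stdout."""
    for data in inputs:
        outputs = {
            subprocess.run(prog, input=data, capture_output=True,
                           text=True).stdout
            for prog in programs
        }
        if len(outputs) > 1:  # at least two candidates disagree here
            return False
    return True

# Hypothetical candidates solving the same task from the suite:
print(equivalent([["./sort_a"], ["./sort_b"]], ["3 1 2\n", "\n", "5\n"]))
```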

The ratio base_language_count / test_language_count should always be greater than 1.0. I suppose one could game the metric by designing a language with single-letter keywords, but those shouldn't count.

(*) "standard suite of common programming tasks...": I see two main categories:

  • Data-processing suite, limited to simple text I/O (computation towards the machine)
  • GUI suite (computation towards the user)

The purpose of this idea is to end the tiring language wars about which language is best. By giving a quantitative metric, people can at least argue better.