

Knowledge Measurement


Abstract:

Digital computers can be used as intelligent 'message meters' for the measurement of information, knowledge, noise, and redundancy. All of these forms have to be measured in bits. Computer programs for the detection of redundancy or the elimination of noise exist and can be improved. Knowledge has to be seen as an extremely efficient form of information compression. It is understood as information that can be explained, which means that we are able to anticipate or even predict many occurrences. Knowledge measurement is economically important for the digital library and for the scientific community, and also for an improved estimation of intelligence in computers and living creatures.

In the late winter of 1943/44 in Princeton (N.J.), N. Wiener and J. v. Neumann arranged a meeting with other mathematicians such as J. W. Tukey, as well as engineers and physiologists. They found it convenient to measure information in terms of numbers of yeses and noes and to call this unit of information a bit. During World War II this knowledge was sometimes top secret. But in 1949 C. Shannon and W. Weaver published "The Mathematical Theory of Communication", which catalysed the thinking of numerous scientists. The sciences and the humanities found in the interdependence of entropy and information a new homologous foundation.
The attempt to replace the term bit with the term shannon, used by the International Organization for Standardization (ISO) in 1975 to make the difference between a binary digit and a binary element clear, was in retrospect not very successful. In the following years cybernetics, information theory, and the construction of digital computers changed the sciences and, later on, the humanities.

Nearly twenty years later (10 January 1963), in the report of the President's Science Advisory Committee entitled "Science, Government, and Information", commissioned by J. F. Kennedy, thirteen American scientists and specialists under the chairmanship of Alvin M. Weinberg of Oak Ridge National Laboratory estimated the amount of information in the Library of Congress at roughly 10^13 bits.

This was a revolutionary view with tremendous consequences for many scientific libraries and online documentation centres around the world. Databases such as Chemical Abstracts, Biological Abstracts, ERIC, MEDLARS, SciSearch, etc. grew up and made important parts of the sciences retrievable.

But this estimation of information content was regrettably wrong:
1. Only the number of characters in the textual information was counted. Graphics, pictures, movies, sound recordings, and the chemical and physical characteristics of the different media were left unconsidered, so some billions of bits of information were disregarded. The reason for this limited way of thinking was simple and is well known: the third-generation computers of that time were beginning to change their function from mathematical instruments into storage and retrieval systems for ASCII characters.
2. There is not only information in the libraries (Kemp, D. A. 1976). The estimated 10^13 bits also contain redundancy, noise, and knowledge. Even under simple conditions the value can be reduced by a factor of ten or a hundred. The term information was, and still is, often and easily confused with the information-theoretic message.
So we have to recognise that Weinberg and his colleagues estimated only the textual messages in the Library of Congress, by asking for the storage capacity they would require in a digital computer.
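To see how such an order of magnitude can arise from a pure character count, the following small Python sketch recomputes it with assumed round figures; the number of volumes, characters per volume, and bits per character are illustrative assumptions, not the values used in the Weinberg report.

# Illustrative back-of-envelope reconstruction of a ~10^13 bit estimate.
# All figures are assumptions for illustration only.

volumes = 1.5e7          # assumed number of volumes in a very large library
chars_per_volume = 1e6   # assumed average number of characters per volume
bits_per_char = 1        # roughly the entropy of English text per character

total_bits = volumes * chars_per_volume * bits_per_char
print(f"{total_bits:.1e} bit")   # -> 1.5e+13 bit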

Knowledge can be based on experience, logic, or causality (Umstätter, W. 1992). It is a special form of quality control (Ewert, G. and Umstätter, W. 1997). All types of messages are measured in bits - information, knowledge, noise, and redundancy. The digital computer is the 'message meter' par excellence. Using this measuring instrument in the right way, we are also able to distinguish the different forms of messages, in all scientific areas, stored in our libraries.
Classic redundancy can be measured only by comparing the message contents in memory. The most common form is the complete repetition of information, which is part of the different types of error-correcting codes, in contrast to the error-detecting codes often used by data transmission systems; there, a repetition is requested only if a transmission error is registered. Our natural language is a mixture of the different types of redundancy, which we can characterise as a posteriori. It is often supposed that redundancy is information in excess, but it is very important for trustworthiness, for security in archiving, for high-speed retrieval, and for other information-logistic considerations. A systematic construction of redundancy are the well-known check bits in Hamming's code.
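As a minimal sketch of such systematically constructed redundancy, the following Python fragment implements the classic Hamming (7,4) code: three check bits protect four data bits and allow the receiver to correct a single transmission error. The function names are only illustrative.

# Hamming (7,4): systematic redundancy in the form of three check bits.

def hamming74_encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """c: 7-bit codeword, possibly with one flipped bit -> corrected data bits."""
    p1, p2, d1, p3, d2, d3, d4 = c
    s1 = p1 ^ d1 ^ d2 ^ d4           # parity checks
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the erroneous bit, 0 if none
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1         # correct the single-bit error
    return [c[2], c[4], c[5], c[6]]

codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1                     # simulate a transmission error
assert hamming74_correct(codeword) == [1, 0, 1, 1]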
Another, very special form of redundancy is knowledge, which can be characterised as a priori. Conditional probabilities such as Markov processes, rules, structures, or laws can be discovered and used for more or less vague predictions, for hypotheses, for inter- or extrapolations, and for real theories. At best, the receiver is able to construct empirically an internal model (Umstätter, W. and Rehm, M. 1981) which can simulate its environment in an abstracted form, as living systems have done using the biogenetic evolutionary strategy (Umstätter, W. 1981; Umstätter, W. and Rehm, M. 1984).
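A small sketch may illustrate how such conditional probabilities can be extracted and used for prediction: a first-order Markov model, learned from an arbitrary character stream, already anticipates the most probable successor of each symbol. The training sentence and the function names are illustrative assumptions.

from collections import Counter, defaultdict

def train_markov(text):
    """Count first-order transitions between adjacent symbols."""
    transitions = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        transitions[a][b] += 1
    return transitions

def predict_next(transitions, symbol):
    """Return the most probable successor of `symbol`, or None if unseen."""
    followers = transitions.get(symbol)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

model = train_markov("the theory of the measurement of the message")
print(predict_next(model, "t"))   # 'h' - in this stream 't' is almost always followed by 'h'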

Based on information theory, knowledge is measurable by comparing predicted information with the real incoming information from a dedicated sender. It can be measured bit by bit. A small share of correct predictions can be obtained by chance; this share should be estimated and subtracted from the real knowledge. Chance itself is nevertheless very important for learning systems and especially for evolutionary strategies. In contrast to the a posteriori and the a priori redundancy, real noise is not predictable.
The knowledge stored in a computer and used for knowledge measurement can be the knowledge of an expert system, but also simulated special human knowledge.
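A hedged sketch of such a measurement could look as follows: the predictions of the receiver are compared symbol by symbol with the actually arriving message, and the share of hits to be expected from pure guessing is subtracted. The alphabet size and the correction formula are simplifying assumptions for illustration, not the author's own procedure.

def measured_knowledge(predicted, received, alphabet_size):
    """Share of correct predictions, corrected for hits expected by chance."""
    hits = sum(p == r for p, r in zip(predicted, received))
    hit_rate = hits / len(received)
    chance_rate = 1 / alphabet_size           # expected hit rate of pure guessing
    return max(0.0, (hit_rate - chance_rate) / (1 - chance_rate))

received  = "ABABABABAB"
predicted = "ABABABABBA"                      # 8 of 10 symbols anticipated correctly
print(measured_knowledge(predicted, received, alphabet_size=2))   # -> 0.6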
Information theory is an extremely objective explanation of the totally subjective communication between a determined source and an appointed receiver. Both of them use identical signs or codes; otherwise they do not communicate. If they use only similar codes, the receiver has to interpret the arriving messages. This model of communication is so basic and so exact that it can be used for all "communications" between atoms, planets, individuals, or whole societies. Its mathematical content is limited only by the three elements of this ubiquitous model. The same is valid for knowledge measurement. In this respect, all theories exist within certain domains, that is, between the definition of a dedicated source, the transmitting channel, and its receiver.

Knowledge is anticipative and probabilistic. Its value is high if most of its predictions, with regard to the information channel, are correct - in contrast to information, which is possibilistic, because the most improbable characters have, by definition, the highest information content in a message.
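This relation between probability and information content can be made concrete with Shannon's formula I(x) = -log2 p(x); the probabilities below are an assumed toy distribution.

from math import log2

# The most improbable symbol carries, by definition, the highest information content.
probabilities = {"e": 0.5, "t": 0.25, "q": 0.01}
for symbol, p in probabilities.items():
    print(f"p({symbol}) = {p:<5} -> I = {-log2(p):.2f} bit")
# the rare 'q' contributes about 6.6 bit, the frequent 'e' only 1 bit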

One of the greatest problems for the message meters is the reduction of noise without loss of information. It is a matter of fact that we can tolerate variations - that is, copies with some differences, mutations, or little defects - and classify them as redundant. At a higher resolution, however, the differences must be distinguished. In other words, the precision of our knowledge has to be taken into account. If enough mistakes are tolerated, a statement is undoubtedly right - but in the living world, our environment often is not so tolerant.

As an example, assume that a receiver is trying to predict incoming data that will inform him about a simple linear development. Tolerating a deviation of 10 percent, all predictions may be correct, and the knowledge measurement yields 100 %. Tightening the precision of the categorisation and reducing the tolerance to only 1 percent, most of the predictions will be wrong, and the measured knowledge is perhaps only 20 %. But this is a typical problem of the information-noise-redundancy relation, well known in information theory. With growing noise, more and more redundancy is necessary for sufficient reliability. Another form of noise are signals which can be detected but not decoded. All data without information or redundancy content have to be categorised under the term noise.
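This example can be simulated directly; in the following sketch the noise level, the sample size, and the linear model are illustrative assumptions, so the exact percentages will differ from the figures named above.

import random

random.seed(0)

def knowledge_share(tolerance, noise=0.05, n=1000):
    """Share of predictions of a linear development that fall within the tolerance."""
    hits = 0
    for x in range(1, n + 1):
        true_value = 2 * x + noise * x * random.uniform(-1, 1)   # noisy linear data
        predicted = 2 * x                                        # the receiver's model
        if abs(predicted - true_value) <= tolerance * abs(true_value):
            hits += 1
    return hits / n

print(knowledge_share(tolerance=0.10))   # -> 1.0 : every prediction is within 10 %
print(knowledge_share(tolerance=0.01))   # -> a noticeably smaller share is within 1 %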
Knowledge measurement is undoubtedly an urgent desideratum in the field of knowledge management and in education, and most probably it will be of extremely high importance for the scientific community. Especially for international knowledge transfer and for estimating the financial value of libraries, the quantification of knowledge is essential.

The typical way of testing knowledge in education has long been the comparison of information requested after a course in exams. But many questions require only the repetition of facts committed to memory - that is, the recall of data, like retrieval from a database. Exams of this form are very poor indicators of real knowledge; they are appropriate only for testing rote learning. In contrast, knowledge is characterised fundamentally by the ability to explain memorised facts or to know the reasons behind functionalities.

Another question of interest pertains to the very nature of the knowledge creation process. One of the most interesting methods is probably the biogenetic evolutionary strategy, which has extracted knowledge out of the particular surrounding world - knowledge that keeps the individuals alive for a long time. This knowledge is organised in the internal model, evolving in ontogeny and repeating an abstracted phylogeny. Two types of knowledge arose in this way: empiricism with an inductive element, and reasoned synthesis with a deductive element.

It was shown in 1981 by Umstätter, W. and Rehm, M. that the world's libraries contain hidden knowledge about the internal models of living systems, developed by the biogenetic evolutionary strategy. The consequence of this consideration was that this strategy leads to a characteristic we have to call knowledge, with the admirably anticipative ability of living systems (Umstätter, W. 1981). It is highly impressive to see that the information, knowledge, noise, and redundancy stored in the DNA range roughly from 1 mm to 1 m of molecular length. This is a very small range compared with the difference in morphological and physiological complexity between a bacterium and Homo sapiens (Umstätter, W. 1990). That can only be achieved by extremely efficient information compression.

To make it clear: knowledge measurement should not be confused with the creation of knowledge. It is a matter of quantity, not quality. But it is helpful to consider the consequences for the construction of knowledge bases. The estimated results are not absolute values; they are the outcome of a purification, like yields in chemistry. Following Occam's razor, we try to minimise the bits per theory and to maximise the correct predictions. Information is infinite and its measurement is scaled logarithmically (Umstätter, W. 1992a). Whether knowledge is finite or infinite is still an open question. In its tendency it seems to be deterministic. But it is not, because the determinism depends on the information (Rapoport, A. 1953), the noise, and the redundancy obtained by the receiver. This is a strictly probabilistic, information-theoretic basis.

I_b = H_max = maximised information content
I_b = M_b - (N_b + R_b)

where M_b is the total message content, N_b the noise, and R_b the redundancy, all measured in bits.

To obtain purified information, noise has to be eliminated by a noise filter. We use such filters, for example, in the so-called data "compression" of pictures, in which a nearly white area is abstracted to a plain white area; in reality this is a data abstraction. For redundancy extraction we use real data compression, summing up all forms of repetition without any loss of information. And knowledge means finding a real information compression.
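Under strongly simplifying assumptions, these operations can be sketched with ordinary tools: lossless compression (here zlib) estimates how much of a message is redundancy, and a crude 'noise filter' that flattens a nearly white area shows how data abstraction shortens the description further. The threshold and the toy data are only illustrative.

import random
import zlib

random.seed(0)

def bits(data: bytes) -> int:
    return 8 * len(data)

# one "picture" row: a nearly white area (values close to 255) with sensor noise
row = bytes(random.randint(250, 255) for _ in range(1024))

raw_bits        = bits(row)
compressed_bits = bits(zlib.compress(row, 9))                  # redundancy removed
filtered        = bytes(255 if b >= 250 else b for b in row)   # noise filter: flatten to white
filtered_bits   = bits(zlib.compress(filtered, 9))

print(raw_bits, compressed_bits, filtered_bits)
# raw >> compressed: the limited alphabet makes much of the 8 bit per byte redundant
# compressed > filtered: once the noise is abstracted away, even less has to be transmitted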

Walther Umstätter

Ewert, G. and Umstätter, W.: Lehrbuch der Bibliotheksverwaltung. Hiersemann Verlag, Stuttgart (1997).
Kemp, D. A.: The Nature of Knowledge - an introduction for librarians. Clive Bingley, London (1976).
Rapoport, A.: Operational Philosophy - Integrating Knowledge and Action. Harper and Brothers, New York (1953).
Umstätter, W. and Rehm, M.: Einführung in die Literaturdokumentation und Informationsvermittlung. Saur Verlag, München (1981).
Umstätter, W.: Kann die Evolution in die Zukunft sehen? Umschau 81 (17) 534-535 (1981).
Umstätter, W. and Rehm, M.: Bibliothek und Evolution. Nachr. f. Dok. 35 (6) 237-249 (1984).
Umstätter, W.: Die Wissenschaftlichkeit im Darwinismus. Naturw. Rundsch. 21 (9) Beil.: Biologie Heute 4-6 (1990).
Umstätter, W.: Die evolutionsstrategische Entstehung von Wissen. In: Fortschritte in der Wissensorganisation, Band 2. Hrsg.: Deutsche Sektion der Internationalen Gesellschaft für Wissensorganisation e.V., S. 1-11, Indeks Verlag (1992).
Umstätter, W.: Die Skalierung von Information, Wissen und Literatur. Nachr. f. Dok. 43 (4) 227-242 (1992a).


Last update: 1. September 1998 © by Walther Umstaetter