Abstract:
Digital computers can be used as intelligent 'message meters'
for measuring information, knowledge, noise, and redundancy. All of these
forms are measured in bits. Computer programs for detecting
redundancy or eliminating noise already exist and can be improved. Knowledge
has to be seen as an extremely efficient compression of information. It is
understood as information that can be explained, which means that we are
able to anticipate or even predict many occurrences. Knowledge measurement
is economically important for the digital library, for the scientific community,
and also for a better estimation of intelligence in computers and
living creatures.
In the late winter of 1943/44, N. Wiener and J. v. Neumann arranged a meeting
in Princeton (N.J.) with other mathematicians such as J. W. Tukey, as well as
engineers and physiologists. They found it convenient to measure information
in terms of numbers of yeses and noes and to call this unit of information
a bit. During World War II this knowledge was sometimes top secret. But
in 1949 C. Shannon and W. Weaver published "The Mathematical Theory of
Communication", which catalysed the thinking of numerous scientists. The sciences
and humanities found in the interdependence of entropy and information a
new common foundation.
The attempt to replace the term bit with the term shannon, used by the
International Organization for Standardization (ISO) in 1975 to make clear
the difference between a binary digit and a binary element, was in retrospect
not very successful. In the following years cybernetics, information theory,
and the construction of digital computers changed the sciences
and later the humanities.
Nearly twenty years later, on 10 January 1963, the "Report of the President's Science Advisory Committee USA" appeared under the title "Science, Government, and Information", commissioned by J. F. Kennedy. In it, thirteen American scientists and specialists, chaired by Alvin M. Weinberg of Oak Ridge National Laboratory, estimated the amount of information in the Library of Congress at roughly 10^13 bits.
This was a revolutionary view with tremendous consequences for many scientific libraries and online documentation centres around the world. Databases such as Chemical Abstracts, Biological Abstracts, ERIC, MEDLARS, and SciSearch grew up and made important parts of the sciences retrievable.
But this estimate of the information content was regrettably wrong:
1. Only the number of characters in the textual information was calculated.
Graphics, pictures, movies, sound recordings, and the chemical and physical
characteristics of the different media were left unconsidered. Information
amounting to some billions of bits was thus disregarded. The reason for this
limited way of thinking is simple and well known: the third-generation computers
of that time were beginning to change their function from mathematical instruments
to storage and retrieval systems for ASCII characters.
2. Libraries do not contain only information (Kemp, D. A. 1976). The estimated
10^13 bits also include redundancy, noise, and knowledge. Under simple conditions
the value can be reduced by a factor of ten or a hundred. The term information
was, and still is, often and easily confused with the information-theoretic message.
So we have to recognise that Weinberg and his colleagues estimated only the textual
messages in the Library of Congress, by asking for the storage
capacity needed in a digital computer.
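The difference between the storage capacity a message needs and its non-redundant content can be illustrated with a general-purpose compressor. The following is only a minimal sketch: the function name `storage_vs_compressed_bits` and the sample text are illustrative assumptions, and the compressed size is merely a rough upper bound on the non-redundant part. It does not separate information from noise or knowledge.

```python
import zlib


def storage_vs_compressed_bits(text):
    """Return (raw_bits, compressed_bits) for a text.

    raw_bits is the storage capacity needed for the encoded characters;
    compressed_bits is the size after lossless zlib compression.
    """
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw, 9)  # level 9 = strongest compression
    return len(raw) * 8, len(compressed) * 8


# Hypothetical, highly repetitive sample message.
sample = "knowledge is compressed information. " * 50
raw_bits, compressed_bits = storage_vs_compressed_bits(sample)
print(raw_bits, compressed_bits, round(raw_bits / compressed_bits, 1))
```

For highly repetitive text the ratio is large; for text with little repetition it stays close to 1.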
Knowledge can be based on experience, logic, or causality
(Umstätter, W. 1992). It is a special form of quality
control (Ewert, G. and Umstätter, W. 1997). All types
of messages are measured in bits: information, knowledge, noise, and redundancy.
The digital computer is the 'message meter' par excellence. Using
this measuring instrument in the right way, we are also able to distinguish
the different forms of messages in all scientific areas that are stored in our libraries.
Classic redundancy can be measured only by comparing the message contents
in memory. The most common form is the complete repetition of information,
which is part of various error correction codes, in contrast
to the error detection codes often used by data transmission systems.
In the latter case, a repetition is requested only if a transmission error
is detected. Natural language is a mixture of the different types of
redundancy, which we can characterise as a posteriori. Redundancy is often
taken to be information in excess. But it is very important for
trustworthiness, for security in archiving, for high-speed retrieval,
and for other considerations of information logistics. The well-known check
bits in Hamming's code are a systematically constructed form of redundancy.
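As a sketch of such systematically constructed redundancy, the following Python code implements the standard Hamming(7,4) code: three check bits are added to four data bits, so that any single flipped bit can be located and corrected. The function names are illustrative assumptions.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword with 3 check bits.

    Codeword layout (positions 1..7): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit; return (data_bits, error_position).

    error_position is 0 if no error was found, otherwise the 1-based
    position of the corrected bit.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # syndrome read as binary number
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # simulate a transmission error at position 5
print(hamming74_decode(codeword))  # ([1, 0, 1, 1], 5)
```

The three check bits carry no new information about the data, yet they are what makes the correction possible.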
Another very special form of redundancy is knowledge, which can be characterised
as a priori. Conditional probabilities as in Markov processes, rules, structures,
or laws can be discovered and used for more or less vague predictions, for
hypotheses, for interpolations or extrapolations, and for real theories. At best,
the receiver is able to construct empirically an internal model
(Umstätter, W. and Rehm, M. 1981) which can simulate its environment
in an abstracted form, as living systems have done using the biogenetic
evolutionary strategy (Umstätter, W. 1981; Umstätter, W. and Rehm, M. 1984).
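How conditional probabilities can serve prediction may be sketched with a first-order Markov model: it counts which symbol followed which in an observed sequence and predicts the most frequent successor. This is only an illustration under simple assumptions. The names `train_markov` and `predict_next` and the sample sequence are hypothetical and not taken from the cited works.

```python
from collections import Counter, defaultdict


def train_markov(sequence):
    """Count, for every symbol, how often each successor followed it."""
    transitions = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        transitions[current][following] += 1
    return transitions


def predict_next(transitions, current):
    """Predict the most frequent successor of `current`, or None if unseen."""
    followers = transitions.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_markov("abababac")
print(predict_next(model, "a"))  # b  ("a" was followed by "b" three times, by "c" once)
print(predict_next(model, "z"))  # None (no experience with "z")
```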
On the basis of information theory, knowledge is measurable by comparing the
predicted information with the information actually arriving from a dedicated
sender. It can be measured bit by bit. A small share of correct predictions
can be obtained by chance. This share should be estimated and subtracted from
the measured value to obtain the real knowledge. The chance component is very
important for learning systems and especially for evolutionary strategies.
In contrast to a posteriori and a priori redundancy, real noise is not predictable.
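The bit-by-bit comparison with a correction for chance can be sketched as follows. The function name `measure_knowledge` and the default chance rate of 0.5 (equiprobable binary symbols) are assumptions made for this illustration. For other sources the chance rate would have to be estimated separately.

```python
def measure_knowledge(predicted, actual, chance_rate=0.5):
    """Compare predicted with actually received symbols.

    Returns (hits, chance_hits, corrected_hits), where chance_hits is the
    number of correct predictions expected by pure guessing and
    corrected_hits = hits - chance_hits.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    chance_hits = chance_rate * len(actual)
    return hits, chance_hits, hits - chance_hits


received = [1, 0, 1, 1, 0, 0, 1, 0]
forecast = [1, 0, 1, 1, 0, 1, 1, 1]
print(measure_knowledge(forecast, received))  # (6, 4.0, 2.0)
```

A receiver that only guesses ends up near zero after the correction, however many of its guesses happen to be right.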
Knowledge stored in a computer and used for knowledge measurement can be the
knowledge of an expert system, but also simulated special human knowledge.
Information theory is an extremely objective explanation of the totally subjective
communication between a determined source and the appointed receiver. Both
use identical signs or codes; otherwise they do not communicate.
If the codes are only similar, the receiver has to interpret the arriving messages.
This model of communication is so basic and so exact that it can be used
for all "communications" between atoms, planets, individuals, or whole societies.
Its mathematical content is limited only by the three elements of this ubiquitous
model. The same holds for knowledge measurement. To that extent, all theories
exist within certain domains, that is, within the definition of
a dedicated source, the transmitting channel, and its receiver.
Knowledge is anticipative and probabilistic. Its value is high if most of its predictions, with regard to the information channel, are correct. Information, in contrast, is possibilistic, because the most improbable characters have by definition the highest information content in a message.
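The statement about improbable characters follows from Shannon's definition of the information content of a symbol with probability p, namely -log2(p) bits. A minimal sketch, in which the function name `char_information` and the sample message are illustrative assumptions and the probabilities are simply estimated from the relative frequencies within the message itself:

```python
import math
from collections import Counter


def char_information(message):
    """Return the information content in bits, -log2(p), of each character.

    p is estimated as the relative frequency of the character in `message`.
    """
    counts = Counter(message)
    total = len(message)
    return {ch: -math.log2(n / total) for ch, n in counts.items()}


bits = char_information("aaaaaaab")
print(bits["b"])            # 3.0  (p = 1/8, the rare character)
print(round(bits["a"], 3))  # 0.193 (p = 7/8, the frequent character)
```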
One of the greatest problems for message meters is the reduction of noise without loss of information. It is a matter of fact that we can tolerate variations, that is, copies with some differences, mutations, or small defects, and classify them as redundant. At a higher resolution the differences must be distinguished. In other words, the precision of our knowledge has to be taken into account. If mistakes are tolerated, a statement is undoubtedly right, but in the living world our environment is often not so tolerant.
As an example, assume that a receiver is trying to predict incoming
data that will inform it about a simple linear development. With a tolerated
deviation of 10 percent, all predictions may be correct, and the knowledge
measurement yields 100 %. If the categorisation is made more precise
and the tolerance is reduced to only 1 percent, most of the predictions
will be wrong, and the knowledge is perhaps only 20 %. But this is a typical
problem of the relation between information, noise, and redundancy, well known in
information theory. With growing noise, more and more redundancy is necessary
for sufficient reliability. Another form of noise consists of signals that can
be detected but not decoded. All data without information or redundancy content
have to be categorised as noise.
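The dependence of the measured value on the tolerated deviation can be reproduced with a few made-up numbers. In this sketch the function name `prediction_score`, the data, and the choice to measure the deviation relative to the actual value are all assumptions for illustration. The data were chosen so that the result mirrors the 100 % and 20 % of the example.

```python
def prediction_score(predicted, actual, tolerance):
    """Fraction of predictions whose relative deviation from the actual
    value does not exceed `tolerance` (0.10 means 10 percent)."""
    hits = sum(
        abs(p - a) <= tolerance * abs(a) for p, a in zip(predicted, actual)
    )
    return hits / len(actual)


# Hypothetical linear development (predicted) and noisy incoming data (actual).
predicted = [10, 20, 30, 40, 50]
actual = [10.05, 21.0, 28.5, 42.0, 47.5]

print(prediction_score(predicted, actual, 0.10))  # 1.0
print(prediction_score(predicted, actual, 0.01))  # 0.2
```

The predictions are identical in both calls. Only the resolution of the comparison changes.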
Knowledge measurement is undoubtedly an urgent desideratum in the field of
knowledge management and in education, and most probably it will be of extremely
high importance for the scientific community. Quantification of knowledge
is essential especially for international knowledge transfer and for
estimating the financial value of libraries.
The typical way of testing knowledge in education has long been to compare the information requested in exams after a course. But many questions require only the repetition of facts committed to memory, that is, the recall of data, like retrieval from a database. Exams of this form are very poor indicators of real knowledge. They are appropriate only for testing rote learning. In contrast, knowledge is characterised fundamentally by the ability to explain memorised facts or to know the reason why something functions.
Another question of interest pertains to the very nature of the knowledge creation process. One of the most interesting methods is probably the biogenetic evolutionary strategy, which has extracted knowledge from the particular surrounding world and so kept individuals alive for a long time. This knowledge is organised in the internal model, which evolves in ontogeny, repeating an abstracted phylogeny. Two types of knowledge arose in this way: empiricism, with an inductive element, and reasoned synthesis, with a deductive element.
Umstätter, W. and Rehm, M. showed in 1981 that the libraries of the world contain hidden knowledge about the internal models in living systems, developed by the biogenetic evolutionary strategy. The consequence of this consideration was that this strategy leads to a characteristic that we have to call knowledge, with the admirable anticipative ability of living systems (Umstätter, W. 1981). It is highly impressive that the information, knowledge, noise, and redundancy stored in DNA range roughly from 1 mm to 1 m of molecular length. This is a very small range compared with the difference in morphological and physiological complexity between a bacterium and Homo sapiens (Umstätter, W. 1990). That is possible only through extremely intensive information compression.
To make it clear: knowledge measurement should not be confused with the creation
of knowledge. It is a matter of quantity, not quality. But it is helpful to
consider the consequences for the construction of knowledge bases. The estimated
results are not absolute values. They are the outcome of a purification, like
yields in chemistry. According to Occam's Razor, we try to minimise the bits
per theory and to maximise the correct predictions. Information is infinite
and its measurement is scaled logarithmically
(Umstätter, W. 1992). Whether knowledge is finite or infinite
is still an open question.
In its tendency it seems to be deterministic. But it is not, because the
determinism depends on the information (Rapoport, A.
1953), the noise, and the redundancy obtained by the receiver.
This is a strictly probabilistic, information-theoretic basis.
To get purified information, noise has to be eliminated by a noise filter. We use such filters, for example, in the so-called data "compression" of pictures, in which a nearly white area is abstracted to a plain white area. In reality, this is a data abstraction. For redundancy extraction we use real data compression, which sums up all forms of repetition without any loss of information. And knowledge means finding a real information compression.
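The two steps can be told apart in a small sketch. The first function abstracts nearly white grey values to plain white, which is lossy. The second sums up repetitions by run-length encoding, which is lossless and can be reversed exactly. The function names, the threshold of 250, and the sample pixel row are assumptions chosen for illustration. Real image formats use far more elaborate methods.

```python
def abstract_near_white(pixels, threshold=250):
    """Lossy data abstraction: map grey values >= threshold to white (255)."""
    return [255 if p >= threshold else p for p in pixels]


def run_length_encode(values):
    """Lossless compression: sum up repetitions as (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    """Restore the exact original sequence from (value, count) pairs."""
    return [v for v, n in runs for _ in range(n)]


row = [255, 254, 253, 255, 251, 10, 10, 252]  # hypothetical pixel row
filtered = abstract_near_white(row)
encoded = run_length_encode(filtered)
print(encoded)  # [(255, 5), (10, 2), (255, 1)]
print(run_length_decode(encoded) == filtered)  # True: nothing lost in this step
print(filtered == row)  # False: the filter step discarded the small differences
```

Without the filter, the same row needs seven runs instead of three, because the small variations prevent the repetitions from being summed up.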
Ewert, G. and Umstätter, W.: Lehrbuch der Bibliotheksverwaltung. Hiersemann Verl., Stuttgart (1997).
Kemp, D. A.: The Nature of Knowledge - an introduction for librarians. Clive Bingley, London (1976).
Rapoport, A.: Operational Philosophy - Integrating Knowledge and Action. Harper and Brothers, New York (1953).
Umstätter, W. and Rehm, M.: Einführung in die Literaturdokumentation und Informationsvermittlung. Saur Verl., München (1981).
Umstätter, W.: Kann die Evolution in die Zukunft sehen? Umschau 81 (17) 534-535 (1981).
Umstätter, W. and Rehm, M.: Bibliothek und Evolution. Nachr. f. Dok. 35 (6) 237-249 (1984).
Umstätter, W.: Die Wissenschaftlichkeit im Darwinismus. Naturw. Rundsch. 21 (9) Beil.: Biologie Heute 4-6 (1990).
Umstätter, W.: Die evolutionsstrategische Entstehung von Wissen. In: Fortschritte in der Wissensorganisation Band 2. Hrsg. Deutsche Sektion der Internationalen Gesellschaft für Wissensorganisation e.V. S. 1-11, Indeks Verlag (1992).
Umstätter, W.: Die Skalierung von Information, Wissen und Literatur. Nachr. f. Dok. 43 (4) 227-242 (1992a).