Among the teases of NG1 was ‘computable meaning’. On Google the phrase barely registers so there is little risk in giving it the following definition: A comprehensive digital representation that is computed from a piece of natural language and can then be computed with, for example to generate further pieces of natural language.
In principle, what is needed is to mimic electronically the mental network. If that includes language knowledge as well as all other sorts of knowledge, then the job is done – since computable meaning only has to be meaningful to a computer, not to a human.
There is a good prospect of computers acquiring knowledge as if via human sensory channels. But other things look intractable – emotions, instincts etc, and connections between these and sense-derived material. Without these aspects of human cognition, computable meaning is a non-starter.
This piece proposes an alternative that could unlock the potential of computable meaning for computational linguistics at least, improving established applications and adding new ones. It would however be restricted to those things, being unable to mimic cognition per se.
NG uses the simple idea of a network that accommodates all knowledge and thereby accounts for everything in language. The three-way grouping of concepts has been obsessively pursued in LanguidSlog. The most important conclusion is that there is no ghost-in-the-machine, only nodes and links and activation levels. In principle then, the computing task is feasible because those building-blocks are easy to simulate.
There are snags of course. One is that the numbers of building-blocks are huge. Another is that the compounding of three-way groups is complex, with any one part of the network typically participating in many concepts. A compensating strength of the computer is that it can process a prodigious amount of language data very quickly.
But the major obstacle is that there can be no parallel to the language acquisition processes outlined in LS39 and 40. The crucial thing in human acquisition is the interplay between language (the code) and direct inputs to cognition (the meaning). For building an artificial network in a computer, there can only be language.
Let’s assume there is an NG-based machine parser. The parser uses a database of the junctions that can occur in the language – PARENT WORD / SYNTACTIC RELATION / DEPENDENT WORD. Initially the data is gathered manually but eventually a critical mass is achieved. From this point, further data is gathered algorithmically. It’s possible to gather data in quantities limited only by the corpora and the computing power used.
Although it is not possible for the computer to acquire knowledge in the form WORD / MEANS / CONCEPT, it can acquire knowledge as WORD / CO-OCCURS WITH / PATTERN ACROSS ALL WORDS.
The pattern is not, in any familiar sense, the meaning of the word; but it is arguably something to do with the meaning because the word cannot co-occur promiscuously without regard to its meaning. Zooming out, the picture is of everything defined, in some sense, in terms of everything else – which is just what must occur in the mental network.
This doesn’t seem such a wild idea when one considers how much of knowledge in the mind consists in concepts acquired solely from encounters with words in context. Philosophy is perhaps the best (worst?) example, with so much of the effort being to establish the meaning of crucial words by placing them appropriately.
The problem is that those contexts also contain crucial words which need similarly careful treatment. But language still works even in such esoteric material. In acquiring a P / C / M word (see LS39) for a difficult word, the M concept must be realised in a sub-network involving lots of language, not the sort of straightforward propagation of the ‘basics’ from the senses and emotions as implied in LS10. If language works for humans with some words defined only by other words, perhaps it can work well enough for computers with all words defined in that way.
Junction occurrences as a matrix
The ‘pattern across all words’ can be visualised as a row or column in a matrix populated by the occurrences of particular groupings. The matrix needs a row for every word-as-dependent and a column for every word-as-parent – somewhere in the region of 10^6 by 10^6 cells.
There is also a third dimension to cover the particular syntactic relation, of which there are only a dozen or so possibilities. This is because one pairing of word-as-dependent and word-as-parent can occur with more than one relation, e.g. coach + bolts, which could be a noun compound or a subject–verb pairing.
For simplicity the following discussion assumes two dimensions only. Furthermore, ‘row’ for a word should be read as ‘row or column’, whichever applies.
Processing a very large corpus of language populates the matrix such that a typical row has a mixture of occupied and null cells. ‘Occupied’ and ‘null’ could be 1 and 0; or a cell could indicate the number of occurrences of that junction. The row provides the pattern, probably unique amongst the 10^6 rows, for the word.
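The matrix-building step can be sketched in a few lines. This is a toy illustration only: the junction triples, relation names and words below are invented, and a real 10^6 by 10^6 matrix would have to be held sparsely – only the occupied cells are stored, with the relation in the key supplying the third dimension.

```python
from collections import Counter

# Hypothetical junction triples as a parser might gather them:
# (dependent word, syntactic relation, parent word).
triples = [
    ("robin", "subject", "fly"),
    ("robin", "subject", "tweet"),
    ("penguin", "subject", "swim"),
    ("bird", "subject", "fly"),
    ("bird", "subject", "tweet"),
]

# Store only occupied cells; each cell holds its occurrence count.
matrix = Counter(triples)

def row(word):
    """The 'pattern across all words' for a word-as-dependent:
    its occupied cells and their occurrence counts."""
    return {(rel, parent): n
            for (dep, rel, parent), n in matrix.items() if dep == word}

print(row("robin"))   # {('subject', 'fly'): 1, ('subject', 'tweet'): 1}
```

Scaled up, the same scheme works with occurrence counts accumulating over billions of triples; the `row` lookup is what delivers a word’s pattern.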
Matrix ==> network?
A matrix is a quite different structure from a relational network. Can network linkage in the brain be inferred from the matrix? The answer is almost certainly YES, at least for those parts of the network that can be represented in language. Whether the derived network would show, for example, recognisable taxonomies can only be determined by experiment.
Inferring the mental network would rely on the idea that, although two rows in the matrix have different patterns, there may be some overlap between the patterns. Overlap certainly represents some commonality of syntactic behaviour and presumably some commonality of semantic behaviour too. In general, the rows for two words may have some overlapping cells simply from belonging to the same syntactic category; they may then have further overlapping cells from semantic commonality, e.g. both denote birds; and each will have some cells which do not overlap the other’s, being semantically distinct, e.g. one is robin and the other penguin.
The processing to build the network from, say, 10^12 cells will be time-consuming. The processing will also be complex, with many subtleties arising from differences between rows in terms of the number of occupied cells and the occurrence-numbers they hold. A simple example is that it might be helpful to take account of a null cell being less definitive for an infrequent word than for a frequent word.
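One way to make ‘overlap’ concrete is a similarity measure over rows. The sketch below uses cosine similarity over occurrence counts – the rows and counts are invented for illustration. Using counts rather than 1/0 occupancy goes some way towards the subtlety mentioned above: a null cell drags down the score less for an infrequent word than for a frequent one.

```python
import math

# Hypothetical rows (patterns) for two words: occupied cells mapped
# to occurrence counts, as might be harvested from a corpus.
robin   = {("subject", "fly"): 40, ("subject", "tweet"): 25,
           ("modifier", "redbreast"): 5}
penguin = {("subject", "swim"): 30, ("subject", "waddle"): 12,
           ("subject", "fly"): 1}

def cosine(row_a, row_b):
    """Overlap between two patterns: 0.0 (disjoint) to 1.0 (identical)."""
    shared = set(row_a) & set(row_b)
    dot = sum(row_a[c] * row_b[c] for c in shared)
    norm = (math.sqrt(sum(v * v for v in row_a.values())) *
            math.sqrt(sum(v * v for v in row_b.values())))
    return dot / norm if norm else 0.0

print(cosine(robin, penguin))   # small but non-zero: both can fly (just)
```

Real processing would need far more care – weighting by word frequency, smoothing sparse rows, and so on – but the shape of the computation is this.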
Another problem likely to need solving is a tendency of words to link to too-specific nodes in the network. For example, ‘bird’ is used most often in typical, rather than definitive, ways – with ‘fly’, ‘tweet’ etc – and would therefore be linked to a node subordinate to the node that generic-meaning ‘bird’ ought to represent directly. Thus it would be at the same level in the taxonomy as ‘robin’ and ‘penguin’. The issue is not about the existence of the node but about ensuring appropriate words attach to it. Typical uses of ‘bird’ are penguin-excluding and so some special significance must be given to predicative uses of nouns such as a robin is a bird and a penguin is a bird.
All this effort should be rewarded if the corpus used to populate the matrix is large enough to cover every accessible aspect of human knowledge. A few parts of knowledge may be ineffable or as yet have nowhere been expressed, but any resulting omissions in the network would not mar its usefulness for computational linguistics.
The computer-acquired network could allow investigation of activation beyond the P / M / C / R / C / M / P arrangement and into the concepts represented by the Ms. That is a worthy research objective in itself, but a better understanding could also lead to faster parsing, because the need for laborious disambiguation is avoided if the appropriate concept is already activated. Parsing delivers bundles of propositions as described in LS8, but propositions can remain active from earlier in the discourse and its circumstances. A parse could therefore deliver more than just the structure of a single sentence. For example, an instance of the pronoun it in a sentence could not only be assigned a syntactic role but also have its antecedent confidently identified.
The process so far is:
very large corpus
===> matrix of junctions
===> patterns per word
===> inferred network of concepts
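The last of those steps – patterns to network – can be sketched as agglomerative clustering: repeatedly link the two most-overlapping words under a new concept node whose pattern is their shared cells. Everything below is a toy (invented words, 1/0 occupancy, one greedy merge step); the real task over ~10^12 cells would be vastly subtler.

```python
# Toy patterns: the cells a word co-occurs with (occupancy only).
patterns = {
    "robin":   {"fly", "tweet", "nest", "wing"},
    "penguin": {"swim", "waddle", "nest", "wing"},
    "trout":   {"swim", "spawn", "fin"},
}

def jaccard(a, b):
    """Overlap of two occupancy patterns: shared cells / all cells."""
    return len(a & b) / len(a | b)

def merge_closest(patterns):
    """One greedy step: find the most-overlapping pair of words and
    propose a superordinate concept node whose pattern is the overlap."""
    pairs = [(jaccard(patterns[x], patterns[y]), x, y)
             for x in patterns for y in patterns if x < y]
    _, x, y = max(pairs)
    node = f"({x}|{y})"                      # inferred concept node
    return node, patterns[x] & patterns[y]   # its pattern: shared cells

node, shared = merge_closest(patterns)
print(node, shared)
```

Iterating this (with the new node replacing its children in the pool) grows a binary taxonomy bottom-up; here the birds merge first, leaving the fish to join higher up – the sort of recognisable taxonomy the experiment would look for.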
Suppose this has been done successfully for English. What if it is then done for another language? If the inferred network for the second language contains pretty much the same concepts interrelated in the same ways, then there would be a means for translation – given that the sets of language knowledge are capable of working in the opposite direction and thus do production as well as comprehension (see LS55):
shared network of concepts
===> conceptual propositions
===> word junctions in either language
Adding a further language enables two-way translation with all the languages already in the scheme without requiring changes to those others. Cumulative effort is linear in the number of languages, not quadratic as it would be for direct pairwise systems. This must be the Holy Grail for computational linguists.
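The arithmetic behind that claim is worth making explicit: with a shared network, each language needs one mapping (language ↔ network), whereas direct translation needs a system per pair of languages.

```python
def via_network(n):
    """Mappings needed with a shared concept network: one per language."""
    return n

def pairwise(n):
    """Systems needed for direct pairwise translation: one per pair."""
    return n * (n - 1) // 2

for n in (2, 5, 10, 50):
    print(n, via_network(n), pairwise(n))
# e.g. 50 languages: 50 mappings versus 1225 pairwise systems
```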
But remember the concepts in an inferred network are the Ms in P / C / M words – as first discussed in LS7. So is it too optimistic to expect two languages to give the same concepts and relationships? The answer could be that it’s one world with one language-using species so there is essentially one network with perhaps a few peripheral, cultural variations.
The snag is that the stratagem proposed here relies on language, and so any network derived from one language is likely to reflect the idiosyncrasies of that language – unless phonological words (Ps in earlier diagrams) are sufficiently fine-grained in their scope as to be universal. But it’s not at all clear whether the granularity in human language processing is fine enough for that. Coarser granularity is suggested both by its being more efficient (fewer junctions and propositions to be processed) and by the perennial debate about cause-and-effect between language, thought and culture.
Even if there is correspondence between two inferred networks, it is unlikely to be immediately obvious. Some methodology for aligning networks would need to be developed. Perhaps a panel of bilinguals would agree a set of word-pairs that are exactly equivalent, giving fixed points to start comparing the networks. Comparison could then be largely algorithmic.
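The ‘largely algorithmic’ comparison could work by propagation from the fixed points: score a candidate word-pair by how many of the English word’s neighbours, translated via the seed pairs, are also neighbours of the French candidate. The networks, words and seed pairs below are invented for illustration.

```python
# Toy inferred networks: each word mapped to its neighbours in the net.
net_en = {"bird": {"fly", "wing"}, "fish": {"swim", "fin"}}
net_fr = {"oiseau": {"voler", "aile"}, "poisson": {"nager", "nageoire"}}

# Seed pairs agreed by the bilingual panel: fixed points for alignment.
seeds = {"fly": "voler", "wing": "aile", "swim": "nager"}

def alignment_score(en_word, fr_word):
    """Fraction of the English word's seed-translated neighbours
    that are also neighbours of the French candidate."""
    mapped = {seeds[n] for n in net_en[en_word] if n in seeds}
    return len(mapped & net_fr[fr_word]) / len(mapped) if mapped else 0.0

print(alignment_score("bird", "oiseau"))   # 1.0
print(alignment_score("bird", "poisson"))  # 0.0
```

Each newly confirmed pair could then join the seed set, letting the alignment spread outwards from the fixed points.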
Assuming it is then judged feasible, the overall task of aligning the networks would be laborious. It would require the whole procedure to be repeated with more elaborate phonological chunking and language knowledge.
Optimistically, completing work on the first pair of languages will provide a network that gets quite close to universality, and will provide better guidance for researchers working on other languages. The effort on aligning each subsequent language should fall – perhaps exponentially.
Deus ex machina
Finally, a look beyond the horizons of current computational linguistics. What has been sketched above is only to use the inferred network for substitution. But there must be other types of transformation. If the mechanisms behind those transformations can be discovered, like those behind language, other ways to harness the network become possible – in a nutshell, getting the network to do reasoning.
This may sound nightmarish but there are all sorts of benign tasks which could be performed on textual and even spoken material. Fact checking, consistency checking, logic checking, identification of rhetorically dubious usage etc could all be done in an objective fashion and at high speed – eventually in real time. Perhaps this would open up a future in which the proles have some technology to protect them from the linguistic malpractice of politicians, priests, professionals, advertisers, salesmen, journalists – and even academics.
I’ll likely be dead before all this happens. But happen it will. Wouldn’t it be grand if NG were the key to a bit of real progress in linguistics?