
NG1. Why Noam Chomsky should read this blog

56 years is a long time to be doing something.  Beethoven’s age at death.  And Hitler’s.  But modern linguistics, born with Chomsky’s Syntactic Structures, is now 57.  Is it dead too, or just languishing?  Either way, it hasn’t delivered the killer theory.  OK, there’s a lot else to linguistics but, as Rutherford said, that’s stamp-collecting, not physics.

Noam Chomsky in 2004

To be explanatory and valid generally, shouldn’t the theory of language – the physics – be based on mental architecture?

‘Architecture’ means the way data is stored and processed at micro-level – much as a computer might be described as having ‘16-bit architecture’.  Also significant here is ‘structure’.  The pairing could mislead because, in a literal building, it is the structure that enables the architecture.  Here it’s better to think of structure as the model and architecture as the Lego® bricks used to build it.

Mental architecture should accommodate (a) cognitively valuable stuff and (b) phonologically usable stuff, encoding (a) into (b) and decoding (b) into (a).  Of course, (a) would include much besides what is shared using language.

There are many theories.  Multiple descriptive theories could logically coexist if each were drawn from a different subset of language use.  But no more than one explanatory, mental architecture-based theory can be true.  So, of Minimalism and CG, HPSG, SFG, TAG, WG and all the other ~G spots – which is the one?

Probably none of them.  Wouldn’t one true theory have led computational linguists to mimic mental processes in their software?  They still prefer their clever maths.

You could agree with the above and then object that, while mental architecture is practically unknown, even the best language theory must be provisional.  But research in mental architecture and in language should be synergistic.  A bit of speculation on one side could lead to better speculation on the other, and so on.  Since language is the part of cognition that has the most detailed data, linguists shouldn’t wait for neuroscience but should grab the initiative.

We should start again, avoiding a pitfall that makes theories implausible.  Specifically, we shouldn’t model human language processing as if it ran on a stored-program computer – complex processes manipulating complex data structures.  A computer may perform a human-like task, but that doesn’t mean a human could perform the task in the same way.  Doing so would need addressable storage – for which neuroscience provides no evidence.

Presume addressability and you can postulate fantastical properties for lexical items and elaborate rules about the interplay of those properties.  That is why there are so many theories endlessly competing.

Instead, we should look for an autonomous, homogeneous architecture – no separation of process from data, no ‘ghost in the machine’.  You may wonder how there could then be, for example, movement of sentence constituents.  That persistent feature of Chomsky’s evolving position is indeed impossible to reconcile with such an architecture.  My hunch is that ‘movement’ is at best a misleading metaphor.
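To see what ‘movement’ costs in architectural terms, here is a toy sketch (my own construction, not any published formalism): moving a constituent is just pointer surgery on a tree, and pointer surgery only works because every node has an address you can hold on to – exactly the RAM-like facility that neuroscience gives no evidence for.

```python
# Toy sketch (mine, not any published grammar formalism): 'movement' of a
# constituent modelled as pointer surgery on a parse tree.  It works only
# because each node has an address (here, a Python object reference) that
# the process can hold on to while re-linking.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

# A structure for 'Lucy kissed who'.
who = Node("who")
vp = Node("VP", [Node("kissed"), who])
s = Node("S", [Node("Lucy"), vp])

# 'Wh-movement': detach 'who' from the VP and re-attach it at the front.
vp.children.remove(who)       # needs the addresses of both VP and 'who'
s.children.insert(0, who)     # and of the root

print([c.label for c in s.children])   # ['who', 'Lucy', 'VP']
```

Drop the addresses and the surgery becomes impossible to state – which is the sense in which movement presupposes addressable storage.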

I have more ideas and will detail them later in this blog.  If you agree, disagree or have other ideas, please post your thoughts here.  Together we can get to the theory that academia has somehow avoided (perhaps because orthodoxy is the best way to get funding and build a career).  A slog – but for 57 weeks, not years.


The blog will not routinely cite academic literature.  This is partly to achieve a less formal style.  But it is mainly because there is little out there to support the radical ideas.

The arguments are built from quite simple premises.  Jargon cannot be avoided altogether but readers will find Wikipedia gives enough support.

The blogger

Network Grammar will reflect the mindset of a 1960s programmer, not a linguistics prof.  I spent 39 years building software.  Having never screwed up, and having done some innovative things, I looked around for an amusing computer project to occupy my retirement.  ‘Computable meaning’ resonated nicely with Turing (1936) and promised to keep me busy for a while.  No, I didn’t know what I was talking about … meaning that can be computed from natural language and then computed with?

As a start, I went back to uni to find out about language.  Fun but frustrating. Linguistics doesn’t have all the answers and what it does offer lacks plausibility.  My work since has therefore been on the basic issues that have consumed thousands of man-years since Chomsky (1957).

The ideas that have emerged seem promising.  I have approached several scholars but sparked no interest.  That would be less vexing if one of them had said ‘Your ideas will not work because A, B and C’.  Then, if A, B or C were good, I could get on with something completely different.

Likely my problem is that I am implying to these profs ‘Your ideas will not work because X, Y and Z’.  But I am still …


  1. KA says:


    Given how the blog develops further, I understand how central the following claim is to getting your project off the ground: “That would need addressable storage – for which neuroscience provides no evidence.” That may be so, but cognitive science provides lots of evidence for it. So… can you please elaborate on why you think the arguments in C. R. Gallistel and A. P. King, Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience, Wiley-Blackwell, 2009, don’t go through? It seems to me that that book is an extended argument that addressable storage is necessary all over the place when brains compute stuff.

    Thank you for your reply.

    • Mr Nice-Guy says:

      Thanks, KA. Good challenge. Not having read Gallistel and King, I’m simply going on the description Amazon uses, in particular ‘…suggests that the architecture of the brain is structured precisely for learning and for memory, and integrates the concept of an addressable read/write memory mechanism into the foundations of neuroscience.’

      Broadly, cognitive science (including G&K?) uses a top-down methodology: underlying mechanisms are inferred from observable phenomena. In contrast, LanguidSlog works bottom-up, hypothesising a mental architecture and building from it the mechanisms for language. I don’t address the whole of cognition, but if the approach applies generally it would make the evolution of language in Homo sapiens easier to explain.

      Actually the sequence in which the ideas emerged is not as set out in the blog. What came first was the organisation of language knowledge (see LS7 et seq). This was the reaction of an old-time IT man to the unconvincing accounts in mainstream linguistics. I would be happy to defend this against anyone. It is the most important idea.

      From that came the realisation that conventional PSG representations of sentences had no relevance to (human) language processing. Then it dawned on me that if PSG structures did participate in sentence processing, RAM-like addressable storage would be needed. Finally I found that neuroscience (at least as represented by Wikipedia) provided no evidence for addressable storage.

      ‘No addressable storage’ seemed the simplest and most striking message and so it’s the focus for the earliest LanguidSlog posts. But it’s not ‘central’ to the whole thing. By the way, no one has yet debunked the ‘John kissed Lucy’ argument.

      As for G&K, I admire the fact that they acknowledge the need for addressable storage in order to support the mechanisms they have inferred. (Someone in mainstream linguistics should have done that long ago.) But have they tried hard enough to find an alternative that fits neuroscience better?

      I claim to have done so and LanguidSlog is working its way through some representative bits of syntax. Frankly I haven’t yet worked out all the answers for islands, scope effects etc. But I’m pretty confident – and I also have a good story to tell (soon) on acquisition.

      Please remember, the organisation of language knowledge is the important thing. My idea for this will stand even if G&K are correct and even if (for example) the claims about the potential of recurrent neural networks to implement PSGs or any other grammar are correct.

  2. KA says:

    Thank you for your time and your response, Mr Nice-Guy.

    The way I understood the logic of the early posts was that linguistic theory got the wrong end of the stick and must look for a different way of doing syntax because addressable memory is not something brains have. If argued carefully and substantiated, that might have swayed some linguists, especially if combined with the beginnings of a theory that can do justice to language. It’s certainly been a popular and successful argument in those parts of the neural networking community who believe that the mind doesn’t use symbols but stays at a subsymbolic level.

    It might be interesting to note that there is new neuroscientific evidence that the idea that memory is located in synaptic strength (alone) is not correct: F. Johansson, D.-A. Jirenhed, A. Rasmussen, R. Zucca, and G. Hesslow, ‘Memory trace and timing mechanism localized to cerebellar Purkinje cells’, PNAS, 111(41):14930–14934, 2014. If that finding holds up, what is plausible in terms of brain architecture all of a sudden shifts dramatically.

    But if you say that the memory isn’t central, then all we need to do is look at your theory of syntax on its own merits and you need to convince us that you have the better theory of syntax. That’s a different enterprise and it means you can’t claim the moral high ground of having the only theory that accords with what we know about neuroscience.

    There is one paragraph in your reply, though, that really rubs me the wrong way: “As for G[allistel]&K[ing], I admire the fact that they acknowledge the need for addressable storage in order to support the mechanisms they have inferred. (Someone in mainstream linguistics should have done that long ago.)”

    The parenthetical seems deeply unfair to me. It has long been known that to get language off the ground, you need devices with at least certain kinds of memory. Chomsky 1957 showed that finite state automata, i.e., devices without any memory, are woefully inadequate to generate natural languages even when viewed strictly as string sets. He also argued that context-free rewrite rules are insufficient. Context-free rewrite rules require pushdown automata, i.e., devices with a pushdown stack. As far as I can see, pushdown stacks of arbitrary depth are just as implausible as fully addressable memory according to the synaptic-strength theory of memory that neuroscientists adhere to. You can’t exactly say that linguists have hidden these facts, since they are considered fundamental results in mathematical linguistics and can be found in any textbook on the matter, as a look at standard treatments like B. H. Partee, A. G. B. ter Meulen, and R. E. Wall, Mathematical Methods in Linguistics, Kluwer Academic, Dordrecht/Boston, 1990, or more recent ones like M. Kracht, The Mathematics of Language, Studies in Generative Grammar 63, Mouton de Gruyter, Berlin/New York, 2003, will show you. So please don’t go around saying that mainstream linguists have hidden the fact that these devices need memory that current neuroscience deems implausible.
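    To make the memory hierarchy concrete, here is a toy sketch (mine, not from the cited textbooks) of recognising the context-free stringset a^n b^n. A single counter – the simplest form of pushdown memory – suffices, while a finite state machine, having no memory beyond its states, cannot track n for arbitrary n:

```python
# Toy sketch: recognising a^n b^n (equal numbers of a's, then b's).
# The counter 'depth' plays the role of a pushdown stack; a finite state
# machine has no such counter and so cannot recognise this stringset.

def accepts_anbn(s):
    depth = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:            # an 'a' after a 'b' is fatal
                return False
            depth += 1            # push
        elif ch == "b":
            seen_b = True
            depth -= 1            # pop
            if depth < 0:         # more b's than a's so far
                return False
        else:
            return False
    return depth == 0 and (seen_b or s == "")

print(accepts_anbn("aaabbb"))   # True
print(accepts_anbn("aabbb"))    # False
```

    And, per the Shieber results discussed below, natural languages need more even than this.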

    The rub is, of course, that in the 80s Shieber convinced the community (S. Shieber, ‘Evidence against the context-freeness of natural language’, Linguistics and Philosophy, 8:333–343, 1985) that Chomsky was right in claiming that natural languages require more than just context-free power. Again, the textbooks above discuss this. More than context-free power entails memory beyond a pushdown stack. None of this is a secret. The basic fact that more than just a pushdown stack is needed has been in graduate-level linguistics textbooks for over a quarter of a century.

    These results are relevant for *your* project as well. They are based on languages viewed as stringsets, i.e., on a view of language that abstracts away entirely from trees. If you were building a device with no memory at all, you would be building a finite state machine. And linguists wouldn’t listen, because finite state machines cannot characterize human languages. Of course, you aren’t building a finite state machine. Your machine has a pushdown stack to allow you to scan back an arbitrary number of words to find one that isn’t in a relevant relation yet – see your bonus post in reply to HK’s challenges. But if that is all your machine has, it is not sufficient. GPSG, the most sophisticated attempt to make context-free phrase structure rules work for natural language, was abandoned when Shieber’s results came out. Linguists will just throw Shieber at you (or M. Kracht, ‘The emergence of syntactic structure’, Linguistics and Philosophy, 30(1):47–95, 2007) and walk away. You need more than a pushdown stack. Incidentally, your table notation suggests that you have data structures, i.e., addressable memory, unless I have misunderstood something.

    In a word, memory is a red herring.

    Thanks again for bearing with me.

    • Mr Nice-Guy says:

      Great stuff, KA. Thanks – and keep it coming!

      First of all, please don’t take offence at that parenthetical comment. The important word in the paragraph is ‘acknowledge’. My comment was fair because I have never encountered ‘addressable storage’ in the linguistics literature, and you, who have evidently read far more than I have, cite a book on cognitive science published as recently as 2009.

      Whether or not linguists have given sufficient prominence to addressable storage is a matter of opinion. What seems indisputable is (1) that most syntactic theories depend on addressable storage if they are to be explanatory; and (2) that addressable storage is not proven in neuroscience, merely inferred from theories in cognitive science and linguistics.

      May I ask whether you think that human language processing uses a stored program?

      Regarding what’s not ‘central’ in the blog, I specified ‘no addressable storage’, not ‘memory’ – which, as you later say, is a red herring. Please don’t dismiss LanguidSlog just because you can’t accept ‘no addressable storage’. I nonetheless claim some moral high ground for trying to build a theory without shaky foundations.

      I also claim MHG for a theory in which the lexicon is clearly defined and deals with the ‘many to many’ problem, kicked-the-bucket idioms, etc. Please tell me if that’s been done before. If it has, then once again crucial stuff has somehow eluded me.

      There’s so much in your comment … Let me pick up just a couple of points before finishing.

      I don’t like ‘[My] machine has a pushdown stack to allow [me] to scan back an arbitrary number of words to find one that isn’t in a relevant relation yet’. For me that’s too much like processing by a stored-program computer. Also I think that activation remaining on an earlier word should do the trick. (Somewhere I tentatively suggest that activation is consumed only when a word occurs in a junction as dependent. Reconciling that with the facts about long-range dependencies is one of the challenges I still face.)

      ‘[My] table notation suggests that [I] have data structures, i.e. addressable memory’. Yes, I do have data structures – the triangles in LS7 for example. But my assumption is PROGRESSION through the network. The next concept is always linked to the current one, so no address is required.
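      A crude sketch of what I mean by progression (my toy code, not the LS7 machinery itself): each concept carries a link to the next, so traversal only ever touches the current node and follows its link; nothing is fetched by address.

```python
# Crude sketch of 'progression' through a network (my toy code, not the
# LS7 machinery itself): each concept links to the following one, so the
# current position is all the state there is -- no address lookup occurs.

class Concept:
    def __init__(self, name):
        self.name = name
        self.next = None          # link to the following concept

# Chain the concepts for 'John kissed Lucy'.
john, kissed, lucy = Concept("John"), Concept("kissed"), Concept("Lucy")
john.next, kissed.next = kissed, lucy

current = john
visited = []
while current is not None:
    visited.append(current.name)  # only ever touch the current node
    current = current.next        # follow the link; no address needed

print(visited)   # ['John', 'kissed', 'Lucy']
```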

      Thanks again.
