Not to be too obtuse, but isn't WordNet (you know, the project that inspired the creation of ImageNet) "an ImageNet for language"? It seems kind of weird to bring up ImageNet within the context of NLP and not mention WordNet once.
WordNet (as you probably know) is a database that groups English words into sets of synonyms ("synsets"). If you consider WordNet as a clustering of high-level classes, then you could argue that ImageNet is the "WordNet for vision", meaning a clustering of object classes.
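To make the "sets of synonyms" idea concrete, here is a rough toy sketch. The `SYNSETS` table, its entries, and the helper names are all made up for illustration; they only imitate the general shape of a synset lookup and are not the real WordNet data or API.

```python
# Toy stand-in for WordNet-style synsets. The identifiers and members below
# are invented for illustration and are NOT copied from the real database.
SYNSETS = {
    "car.n.01": {"car", "auto", "automobile", "motorcar"},
    "dog.n.01": {"dog", "domestic_dog"},
    "frank.n.02": {"frank", "hot_dog", "dog", "wiener"},
}


def synsets_for(word):
    """Return the sorted ids of every synonym set that contains `word`."""
    return sorted(name for name, lemmas in SYNSETS.items() if word in lemmas)


def are_synonyms(a, b):
    """Two words count as synonyms if they share at least one synonym set."""
    return any(a in lemmas and b in lemmas for lemmas in SYNSETS.values())
```

Note that one word can sit in several sets (here "dog" is in two), which is why the lookup returns a list rather than a single class.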
The article uses a different meaning of ImageNet, namely ImageNet as a pretraining task that can be used to learn representations that will likely be beneficial for many other tasks in the problem space. In this sense, you could use WordNet as an "ImageNet for language", e.g. by learning word representations based on the WordNet definitions. This is something people have done, but there are far more effective approaches.
I hope this helped and was not too convoluted.
I don't think WordNet has been much of a thing in NLP, and certainly nothing like what ImageNet has been in CV. WordNet only encodes simple word-to-word relationships. "NLP" tends to denote more syntactic, phrase- or sentence-level text analysis; bag-of-words tools like WordNet or TF-IDF are not often considered "true" NLP, but might be called text mining instead.
The phrase "ImageNet moment" is generally used to refer to the success of deep learning in the ILSVRC 2012 competition, which used the ImageNet dataset. That is the sense in which this article uses it.