For a serious NLP pipeline, you usually separate the analysis into layers.
A sentence contains multiple kinds of information simultaneously:
“What are the words?”
Example:
running -> run
“How do the words function grammatically?”
Includes:
Example:
running = VERB, present participle
dogs = plural noun
“How are words structurally connected?”
Includes:
Example:
John ate pizza
subject = John
verb = ate
object = pizza
This is sentence structure.
“What does it mean?”
Includes:
Example:
bank = financial institution
vs
bank = river edge
“What is implied by context?”
Includes:
Example:
Can you open the window?
Literal: question
Pragmatic meaning: request
“How do sentences connect together?”
Includes:
Example:
John entered the room.
He sat down.
"He" = John
“How does this connect to world knowledge?”
Includes:
Example:
doctor -> medical profession
hospital -> place of work
“How common or probable is this?”
Includes:
Example:
strong tea
is more probable than
powerful tea
“How is meaning represented numerically?”
Includes:
Example:
king - man + woman ≈ queen
“Can meaning be represented formally?”
Includes:
Example:
All humans are mortal.
Socrates is human.
=> Socrates is mortal.
“What emotion or tone exists?”
Includes:
Example:
I love this
= positive sentiment
“How does it sound?”
Includes:
Useful for:
“What structured data can be extracted?”
Includes:
Example:
Apple acquired Beats in 2014
Extract:
(Apple, acquired, Beats)
“What conclusions can be inferred?”
Includes:
Modern LLM systems combine many of these implicitly.
Traditional NLP systems usually build them explicitly as separate pipelines.
A full advanced NLP stack often looks like:
Raw Text
→ Tokenization
→ POS Tagging
→ Parsing
→ NER
→ Coreference
→ Semantic Parsing
→ Embeddings
→ Knowledge Linking
→ Reasoning
→ Generation
WordNet mainly helps in:
SpaCy mainly helps in:
Transformers mainly help in: