Generalizing Language Models: Token-Class Learning

1 The Masking Process

By replacing specific identifiers (names, places, organizations) with Functional Labels, we force the model to learn linguistic structure rather than memorize individual names.

Original Text	Generalized Version
"Ace accommodation, how can I help?"	"[ORG], how can I help?"
"I'd like to stay on the Gold Coast."	"I'd like to stay in [LOCATION]."
"Who am I speaking to?"	"Who am I speaking to?"
"Miss Mackinlay. Sylvia Mackinlay."	"[TITLE] [SURNAME]. [FIRSTNAME] [SURNAME]."

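The substitution shown in the table can be sketched as a simple lookup-and-replace. This is a minimal illustration, not a prescribed implementation: the `LABELS` table and the `generalize` function are hypothetical names, and the sketch assumes the identifiers to mask are already known in advance.

```python
import re

# Hypothetical lookup table (assumption): surface string -> functional label.
LABELS = {
    "Ace accommodation": "[ORG]",
    "Gold Coast": "[LOCATION]",
    "Miss": "[TITLE]",
    "Sylvia": "[FIRSTNAME]",
    "Mackinlay": "[SURNAME]",
}


def generalize(text, labels=LABELS):
    """Replace every known identifier in `text` with its functional label."""
    # Try longer identifiers first so multi-word names win over their parts.
    keys = sorted(labels, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: labels[m.group(1)], text)


print(generalize("Miss Mackinlay. Sylvia Mackinlay."))
# [TITLE] [SURNAME]. [FIRSTNAME] [SURNAME].
```

Text with no known identifiers, such as "Who am I speaking to?", passes through unchanged. A fixed lookup table only covers identifiers listed ahead of time, which is why the strategies in the next section detect them programmatically.
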
2 Programmatic Strategies

A. OOV Dictionary

Matches each word against a standard dictionary word list (e.g., Oxford); words not found are out-of-vocabulary (OOV).

✅ Keep: Words found in dictionary.
❌ Mask: Capitalized non-dictionary words.
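
The keep/mask rule above can be sketched as follows. This is an illustrative sketch under stated assumptions: `DICTIONARY` is a tiny stand-in for a real word list, `mask_oov` is a hypothetical function name, and the `[ENTITY]` label is borrowed from the POS example because this strategy does not name its own label.

```python
import re

# Tiny stand-in for a real dictionary word list (assumption: lowercase entries).
DICTIONARY = {"who", "am", "i", "speaking", "to", "miss", "my", "name", "is"}


def mask_oov(text, dictionary=DICTIONARY, label="[ENTITY]"):
    """Mask capitalized words that are absent from `dictionary`."""
    def replace(match):
        word = match.group(0)
        if word.lower() in dictionary:
            return word   # Keep: word found in dictionary.
        if word[0].isupper():
            return label  # Mask: capitalized non-dictionary word.
        return word       # Lowercase unknown words are left alone.

    return re.sub(r"[A-Za-z]+", replace, text)


print(mask_oov("Miss Mackinlay. Sylvia Mackinlay."))
# Miss [ENTITY]. [ENTITY] [ENTITY].
```

Because the lookup is case-insensitive, capitalized dictionary words at the start of a sentence are kept. The same property means names built from dictionary words (for example "Gold Coast") are not masked by this rule alone.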

B. POS Tagging

Uses NLP libraries to identify parts of speech.

if tag == "NNP":  # NNP: Penn Treebank tag for a singular proper noun
    replace_with("[ENTITY]")
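
A slightly fuller version of the rule above might look like this. It is a sketch, not a definitive implementation: `mask_proper_nouns` is a hypothetical name, and the input is assumed to be a list of (token, tag) pairs, which is the shape that taggers such as NLTK's `pos_tag` return. The example tags are written by hand so the sketch runs without downloading a tagger model.

```python
def mask_proper_nouns(tagged_tokens, label="[ENTITY]"):
    """Replace tokens tagged NNP (singular proper noun) with `label`.

    `tagged_tokens` is a list of (token, tag) pairs.
    """
    return [label if tag == "NNP" else token for token, tag in tagged_tokens]


# Hand-tagged example (assumption: a real tagger would produce these pairs).
tagged = [
    ("My", "PRP$"), ("name", "NN"), ("is", "VBZ"),
    ("Sylvia", "NNP"), ("Mackinlay", "NNP"), (".", "."),
]
masked = mask_proper_nouns(tagged)
print(masked)
# ['My', 'name', 'is', '[ENTITY]', '[ENTITY]', '.']
```

Unlike the dictionary approach, this relies on sentence context rather than a word list, so its accuracy depends on the tagger used.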

Why This Works for Any Text

The model stops learning names and starts mastering Contextual Triggers.

Trigger Phrase	Expectation
"My name is..."	[NAME]
"I live in..."	[LOCATION]
"Arriving on..."	[DATE]
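
The trigger-to-expectation pairing listed above can be made concrete with a small lookup. This only illustrates the pattern a trained model is meant to pick up. It is not how a model represents the pattern internally, and `TRIGGERS` and `expected_label` are hypothetical names.

```python
import re

# Trigger phrase -> label expected to follow it (pairs taken from the text above).
TRIGGERS = {
    "my name is": "[NAME]",
    "i live in": "[LOCATION]",
    "arriving on": "[DATE]",
}


def expected_label(text, triggers=TRIGGERS):
    """Return the label for the first trigger (in table order) found in `text`.

    Returns None when no trigger phrase occurs.
    """
    lowered = text.lower()
    for phrase, label in triggers.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            return label
    return None


print(expected_label("Hello, my name is Sylvia."))
# [NAME]
```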

3 Implementation Example

Rental Agent: Good morning! [ORG], how can I help you?

Client: I'd like to stay in [LOCATION].

Rental Agent: Certainly, who am I speaking to?

Client: [TITLE] [SURNAME].

Key Result: By learning the "scaffolding" (the ordinary English words around each label), your model can handle names it has never seen, because it recognizes the pattern, not the specific person.