Most practical systems treat a sentence as:

Then they compare or recombine at multiple levels.

Example:

Sentence A:

“the expensive salmon”

Sentence B:

“I bought for my own dinner”

A practical system usually does NOT combine raw text blindly.

Instead it extracts structures like:

Grammar:

Semantics:

Then it searches for compatible continuations.


  1. SIMPLEST METHOD: Token probability

LLMs mainly work using:

P(next_token | previous_tokens)

Example:

Input:

“the expensive salmon”

Possible continuations:

The model ranks possibilities by probability.

This already creates recombination behavior.
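As a rough illustration of ranking continuations by probability, here is a minimal sketch using a bigram count model. It is not how an LLM is implemented: it conditions on only the single previous token rather than the full context, and the corpus, `follow_counts`, and `next_token_probs` are made-up names and data for this example.

```python
from collections import Counter, defaultdict

# Toy corpus; these sentences are invented for illustration only.
corpus = [
    "the expensive salmon was fresh",
    "the expensive salmon was delicious",
    "the expensive salmon tasted great",
    "the cheap salmon was frozen",
]

# Count how often each token follows each previous token.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        follow_counts[prev][nxt] += 1


def next_token_probs(prev_token):
    """Return (token, probability) pairs, most likely first.

    Approximates P(next_token | previous_tokens) using only the
    last token as context (a bigram simplification).
    """
    counts = follow_counts.get(prev_token, Counter())
    total = sum(counts.values())
    if total == 0:
        return []
    return [(tok, n / total) for tok, n in counts.most_common()]


ranked = next_token_probs("salmon")
for token, prob in ranked:
    print(f"{token}: {prob:.2f}")
```

Even this tiny model can extend "the cheap salmon" with "tasted", a pairing that never appears in the corpus as a whole phrase, which is the recombination effect described above.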


  1. EMBEDDING SIMILARITY: Most practical modern method

Convert sentence → vector.

Example:

“The expensive salmon” → [0.12, -0.44, 0.88, …]

“I bought seafood for dinner” → [0.10, -0.39, 0.81, …]

Then compare using cosine similarity.
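A minimal sketch of the comparison step, in plain Python. The two vectors above are truncated, so this example uses only the three components shown; real embeddings have hundreds of dimensions. The `unrelated` vector is a hypothetical value added for contrast.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# First three components of the example vectors (the rest are elided above).
salmon = [0.12, -0.44, 0.88]
seafood = [0.10, -0.39, 0.81]
# Hypothetical vector for an unrelated sentence, for contrast only.
unrelated = [-0.70, 0.55, 0.05]

sim_close = cosine_similarity(salmon, seafood)
sim_far = cosine_similarity(salmon, unrelated)
print(f"salmon vs seafood:   {sim_close:.3f}")
print(f"salmon vs unrelated: {sim_far:.3f}")
```

Values near 1 mean the vectors point in almost the same direction; values near 0 or below suggest little semantic relation.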

Common uses:

Best practice:

Common tools:


  1. STRUCTURE MATCHING: Very important for generation

Systems often learn reusable templates.

Example:

Pattern:

“the [ADJ] [NOUN] I bought for my own [NOUN]”

Can generate, for example:

“the expensive salmon I bought for my own dinner”

This is compositional generation.
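A small sketch of template filling, assuming the pattern is written with Python `str.format` placeholders instead of the `[ADJ]`/`[NOUN]` notation. The slot names and filler words are illustrative assumptions, not taken from any real system.

```python
from itertools import product

# The pattern above, rewritten with named placeholders.
TEMPLATE = "the {adj} {noun} I bought for my own {purpose}"

# Hypothetical slot fillers; a real system would learn these from data.
slots = {
    "adj": ["expensive", "fresh"],
    "noun": ["salmon", "bread"],
    "purpose": ["dinner", "lunch"],
}


def fill_template(template, slots):
    """Generate every sentence obtainable by filling each slot once."""
    names = list(slots)
    combos = product(*(slots[name] for name in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


sentences = fill_template(TEMPLATE, slots)
for sentence in sentences:
    print(sentence)
```

Two fillers per slot already yield eight sentences, most of which never appeared verbatim anywhere, which is why generated candidates still need validation.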

Very important concept:


  1. VALIDATION: “How do we know it makes sense?”

Practical systems use multiple checks.

  1. Grammar validation
  1. Semantic plausibility
  1. World knowledge
  1. Statistical likelihood: if many similar patterns appeared during training, the probability increases.
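The idea of running several checks before accepting a sentence can be sketched as a list of check functions that must all pass. Both checks here are deliberately crude stand-ins: `no_repeated_words` substitutes for a real grammar validator, and `bigram_coverage` approximates statistical likelihood by counting word pairs seen in a tiny reference set. The reference sentences, function names, and the 0.8 threshold are all assumptions for illustration.

```python
# Tiny reference set standing in for training data (illustrative only).
reference = [
    "the expensive salmon I bought for my own dinner",
    "the fresh bread I bought for my own lunch",
]

# Collect every adjacent word pair seen in the reference sentences.
seen_bigrams = set()
for sentence in reference:
    tokens = sentence.lower().split()
    seen_bigrams.update(zip(tokens, tokens[1:]))


def no_repeated_words(sentence):
    """Crude surface check: non-empty and no word immediately repeated."""
    tokens = sentence.lower().split()
    return bool(tokens) and all(a != b for a, b in zip(tokens, tokens[1:]))


def bigram_coverage(sentence, threshold=0.8):
    """Likelihood proxy: share of word pairs already seen in the reference."""
    tokens = sentence.lower().split()
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return False
    seen = sum(1 for pair in bigrams if pair in seen_bigrams)
    return seen / len(bigrams) >= threshold


CHECKS = [no_repeated_words, bigram_coverage]


def validate(sentence):
    """Accept a sentence only if every check passes."""
    return all(check(sentence) for check in CHECKS)


accepted = validate("the expensive bread I bought for my own dinner")
rejected = validate("dinner own the the salmon bought")
print(accepted, rejected)
```

Keeping the checks in a list makes it easy to add stronger ones later, such as a parser-based grammar check or a knowledge-base lookup, without changing `validate`.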

  1. BEST PRACTICES: Most practical systems

  1. Use embeddings for semantic similarity, NOT raw string matching.

Bad:

Good:


  1. Separate:

This improves recombination quality.


  1. Use chunk-level recombination, NOT random word mixing.

Good chunks:

Example:


  1. Validate generated text before accepting.

Typical checks:


  1. Store latent representations instead of memorizing sentences.

Modern systems prefer: sentence → embedding/vector

rather than: sentence → raw lookup table
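To make the contrast concrete, here is a sketch comparing an exact lookup table with a nearest-neighbour search over vectors. The `embed` function is a toy word-count vector, assumed here only so the example runs without external libraries; it captures word overlap, not meaning, and a real system would use a learned sentence encoder instead.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy embedding: word counts. A stand-in for a learned encoder."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[word] * v[word] for word in u if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


stored = [
    "the expensive salmon",
    "I bought seafood for dinner",
    "the train was late",
]

# Raw lookup table: only exact string matches succeed.
lookup_table = {sentence: sentence for sentence in stored}

# Vector index: each sentence is stored alongside its vector.
index = [(sentence, embed(sentence)) for sentence in stored]


def nearest(query):
    """Return the stored sentence whose vector is closest to the query."""
    query_vec = embed(query)
    return max(index, key=lambda item: cosine(query_vec, item[1]))[0]


query = "expensive salmon for dinner"
exact_hit = lookup_table.get(query)  # no exact match exists
closest = nearest(query)
print(exact_hit, "|", closest)
```

The lookup table returns nothing for an unseen phrasing, while the vector index still returns the most similar stored sentence.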


  1. ADVANCED SYSTEMS

More advanced architectures combine:

This allows:

instead of simple copying.


  1. IMPORTANT REALIZATION

Human language itself is highly compositional.

Humans also recombine learned fragments:

So:

“new sentence”

usually means:

NOT: