A token is basically a unit of text: often a whole word, but also a piece of a word, a punctuation mark, or any other frequently occurring chunk of characters, depending on the tokenizer.

LLMs don’t treat text as isolated words. Idiomatic meaning, like knowing that “a hole in one” in “Dave shot a hole in one at the golf course” is a golf term and not literally a hole, gets picked up later by the attention layers; the tokenizer itself just splits the text into frequent chunks, so common words usually come out as one token each, like “{Dave}{ shot}{ a}{ hole}{ in}{ one}{ at}{ the}{ golf}{ course}”, while rarer words get broken into pieces.

The first step is to “tokenize” the text, meaning to split it into individual tokens. Different models use different tokenizers (BPE, WordPiece, SentencePiece, etc.), so the same text can be split up differently from model to model.
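You can see this directly with a tokenizer library. A quick sketch, assuming the tiktoken package (the tokenizer used by OpenAI’s models; other models ship their own):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Dave shot a hole in one at the golf course")
print(ids)                             # one integer ID per token
print([enc.decode([i]) for i in ids])  # e.g. ['Dave', ' shot', ' a', ' hole', ...]
```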

Then the LLM maps each token to an embedding vector and runs those vectors through layers of attention heads (matrix operations in which every token weighs its relationship to every other token in the input), and uses that process to generate a response via next-token prediction.
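Stripped of all the engineering, the core attention operation is short enough to sketch. A toy version in numpy with made-up sizes (4 tokens, 8-dimensional embeddings); real models use thousands of dimensions and many heads per layer:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scores[i, j]: how strongly token i attends to token j
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

# toy example: 4 tokens, embedding dimension 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                        # token embedding vectors
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)            # (4, 8): one updated vector per token
```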

It’s a bit more complex than that, of course: tensor math, billions of weighted parameters spread across stacked layers with large hidden sizes, plus matmuls, causal masks, softmax, and dropout. There’s also the “context window,” which caps how many tokens the model can process at a time. But that’s the gist of it.
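The “mask” part is worth a quick illustration: during generation the model is causal, meaning a token can only attend to the tokens before it. A minimal sketch of a causal mask in numpy, again with toy sizes:

```python
import numpy as np

T = 4  # pretend the context window is 4 tokens
scores = np.random.default_rng(1).normal(size=(T, T))

# causal mask: token i may only attend to tokens j <= i
mask = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[mask] = -np.inf  # masked positions get weight 0 after softmax

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
# each row of `weights` sums to 1 and is zero above the diagonal
```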

    But a token is just the basic unit that gets run through those processes.