r0sten t1_iyt6r8t wrote

It rhymed bacteria with delirious... which sort of works.

This is a text-based process, so it doesn't know what words sound like, but it can extrapolate from poems in its corpus. Still, how many poems rhyming bacteria with delirious are there??

5

-ZeroRelevance- t1_iytvl5z wrote

It can’t really rhyme due to how the model perceives words. Everything is broken up into tokens, which may represent entire words, parts of words, or even individual letters. This inconsistency makes it extremely hard for LLMs to pick up on the spelling patterns that quality poetry depends on, and essentially forces the model to rote-learn common rhyming pairs. There is more discussion on that here.

However, this isn’t a hopeless problem. The obvious solution to me, which is also discussed in the prior link, is simply encoding each letter as its own token. This does lead to several improvements, but it’s ultimately a tradeoff between length and quality, because encoding each character individually means you need far more tokens (3-4x) to represent an equivalent amount of text.
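To make the point concrete, here's a toy sketch (not any real LLM tokenizer; the vocabulary is invented for illustration) of greedy subword segmentation. Note how the rhyming endings get buried inside different multi-character tokens, while character-level encoding exposes them at the cost of roughly 4x as many tokens:

```python
def subword_tokens(word, vocab):
    """Greedy longest-match segmentation over a given subword vocabulary."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Fall back to a single character if no vocab entry matches.
            tokens.append(word[i])
            i += 1
    return tokens

# Hypothetical subword vocabulary, chosen purely for this example.
vocab = {"bac", "teria", "deli", "rious"}

for word in ("bacteria", "delirious"):
    sub = subword_tokens(word, vocab)
    chars = list(word)
    print(f"{word}: subword={sub} ({len(sub)} tokens) "
          f"vs char-level ({len(chars)} tokens)")
```

The model sees `["bac", "teria"]` and `["deli", "rious"]` as opaque IDs with no visible shared letters, so any rhyme association has to be memorized rather than read off the spelling; the character-level encoding makes the endings visible but costs 8-9 tokens per word instead of 2.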

3