New Training Methods #2
Currently the chat template we use for training takes a list of dictionaries with two fields: `role` (which can be `user` or `assistant`) and `content`, which is the message text itself. With a corpus of 100,000+ messages this seems to have worked well enough for now, but the problem is that Miku gets confused about who she's talking to mid-conversation, because the usernames of the individuals who sent the messages in the training data are effectively anonymized. Therefore, when she addresses you, your identity is nothing more than an educated guess. Perhaps it would be a better idea to set the role to each actor's Discord username, and then do the same during inference, so she can accurately identify us.

Additionally, the intent of training Miku on our chat logs was to make a bot that knows all our inside jokes and is politically incorrect like we are, essentially a fifth member of the clique. To that end, I trained her by randomly selecting 20% of all message sequences (consecutive messages sent by the same person at the same time) and marking them as "assistant" messages, and that seems to work fairly well. She nevertheless has to pretend to have some surface-level characteristics of the real Hatsune Miku, such as being female. For that part I rely on prompting, but from time to time she slips up by adopting one of our names, or referencing something only men can do, like getting one's dick sucked. Sometimes, despite being instructed otherwise, she will allude to a link or image attached to her message when there is none, because she isn't trained to produce either of those things. This is obviously an open question, and the solution may involve some combination of prompting and RLHF (#1).
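As a rough illustration of the username-as-role idea (the usernames and messages below are placeholders, not real data from the corpus):

```python
# Current format: everyone collapses into "user"/"assistant", so identities are lost.
current_messages = [
    {"role": "user", "content": "miku, who posted that cursed image yesterday?"},
    {"role": "assistant", "content": "no idea, probably you"},
]

# Proposed format: the role carries each sender's Discord username, and the same
# usernames are used at inference time so she can tell us apart.
proposed_messages = [
    {"role": "alice#1234", "content": "miku, who posted that cursed image yesterday?"},
    {"role": "miku", "content": "that was bob#5678, obviously"},
]
```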
Finally, Miku occasionally says some nonsense that isn't relevant to the conversation. This is likely due to a combination of the temperature (0.9), the nature of the training data, and her context length when doing inference. The training data is inherently out of order, because messages aren't always responded to right away like in a face-to-face conversation. Thankfully, Discord has had a reply feature for a few years now, which could be used to reconstruct some sort of semantic ordering. Fixing this may require devising a new chat template that embeds the contents of the replied-to message, if there is one. I've also seen a chat template that just includes the username of the person being replied to, as if they are being pinged.
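A minimal sketch of what embedding the replied-to message could look like (the field names and bracketed format here are assumptions, not the actual template):

```python
# Hypothetical message shape: {"author": ..., "content": ..., "reply_to": message_id or None}
def format_with_reply(msg, messages_by_id):
    """Prepend the quoted message, if any, so the model sees what is actually
    being responded to instead of relying on raw channel order."""
    text = msg["content"]
    ref = messages_by_id.get(msg.get("reply_to"))
    if ref is not None:
        # Alternative: include only ref["author"], as if they were being pinged.
        text = f'[replying to {ref["author"]}: {ref["content"]}]\n{text}'
    return {"role": msg["author"], "content": text}
```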
To generate context, the bot pulls the last five messages and filters out those that don't have text. When a conversation is taking place about a person, the subject of the conversation may be referred to by pronouns only, and Miku will quickly forget precisely who she's talking about. The fix for this will likely require long-term memory of some sort, which will be covered in its own issue (#3). Another flaw with generating context this way is that her own messages will be included, which quickly leads to a cycle of garbage-in-garbage-out if she slips up even once. This could hopefully be avoided once I add human feedback, and then I can just filter the bad messages out of the context.
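Roughly, the current context assembly plus the planned feedback filter might look like this (`fetch_recent` and `is_flagged_bad` are hypothetical helpers, not existing code):

```python
def build_context(channel_id, bot_name="miku", limit=5):
    """Pull recent messages, keep only those with text, and (once human
    feedback exists) drop the bot's own messages that were flagged as bad."""
    context = []
    for msg in fetch_recent(channel_id):           # newest first; hypothetical helper
        if not msg["content"].strip():             # skip image/link-only messages
            continue
        if msg["author"] == bot_name and is_flagged_bad(msg):  # planned feedback filter
            continue
        context.append({"role": msg["author"], "content": msg["content"]})
        if len(context) == limit:
            break
    return list(reversed(context))                 # oldest first for the prompt
```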
Done!
87e9cc39e0
Tried it. This doesn't actually seem to work as well as I thought, so I switched back. This was before I added in the reply-inclusive chat template, so maybe combining the two could help, idk.
This seems like the perfect use case for RAG, but there remains the question of how we will populate the knowledge base with accurate info.
Also, I would like to try merging model checkpoints at some point. What I'm doing now is training a single QLoRA on a merged dataset of our own chats + ToxicQA (to make her capable of saying super edgy stuff), but it seems more common for people to merge the models after training.
Besides ToxicQA, some kind of (normal) instruction-following dataset like Alpaca might also make her not so stupid.
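For reference, a rough sketch of the two approaches, assuming the HF `datasets` stack (file names are placeholders, not the actual training script):

```python
from datasets import load_dataset, concatenate_datasets

# Option A (what I'm doing now): one QLoRA trained on a merged dataset.
chats  = load_dataset("json", data_files="our_chats.jsonl", split="train")
toxic  = load_dataset("json", data_files="toxicqa.jsonl", split="train")
alpaca = load_dataset("json", data_files="alpaca.jsonl", split="train")
merged = concatenate_datasets([chats, toxic, alpaca]).shuffle(seed=42)

# Option B (the more common route): train a separate adapter per dataset and
# merge them after training, e.g. with peft's add_weighted_adapter, or merge
# full checkpoints with a tool like mergekit.
```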
probably could reduce the amount of training data coming from our chats relative to the other datasets, or train for fewer steps - might be some overfitting going on
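A quick sketch of those two knobs, again assuming the HF `datasets`/`transformers` stack (the file name, 30% fraction, and step count are made-up numbers):

```python
from datasets import load_dataset
from transformers import TrainingArguments

# Down-sample the chat logs so they make up less of the mix.
chats = load_dataset("json", data_files="our_chats.jsonl", split="train")
chats_small = chats.shuffle(seed=42).select(range(int(0.3 * len(chats))))

# And/or cap the number of optimizer steps instead of running full epochs.
args = TrainingArguments(
    output_dir="miku-qlora",
    max_steps=1000,
    learning_rate=2e-4,
    per_device_train_batch_size=4,
)
```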