New Training Methods #2

Open
opened 2024-04-06 04:34:00 +00:00 by james · 5 comments
Owner

Currently the chat template we use for training takes a list of dictionaries with two fields: `role` (which can be `user` or `assistant`) and `content`, which is the message text itself. With a corpus of 100,000+ messages this seems to have worked well enough for now, but the problem is Miku gets confused about who she's talking to mid-conversation, because the usernames of the individuals that sent the messages in the training data are effectively anonymized. Therefore, when she addresses you, your identity is nothing more than an educated guess. Perhaps it would be a better idea to set the role to each actor's Discord username, and then do the same during inference, so she can accurately identify us.
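
Roughly what I mean, as a sketch (the usernames and the Jinja template below are just illustrative; a custom template would be needed anyway, since stock chat templates only know `user`/`assistant`/`system` roles):

```python
# Current format: every author collapses into "user" or "assistant".
current = [
    {"role": "user", "content": "who broke the build lol"},
    {"role": "assistant", "content": "wasn't me"},
]

# Proposed format: the role carries the sender's Discord username, so the
# model sees (and can be told at inference time) exactly who said what.
proposed = [
    {"role": "james", "content": "who broke the build lol"},
    {"role": "Miku", "content": "wasn't me"},
]

# Sketch of a custom Jinja chat template that just passes the role through:
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>{{ m['content'] }}</s>"
    "{% endfor %}"
)
```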

Additionally, the intent of training Miku on our chat logs was to make a bot that knows all our inside jokes and is politically incorrect like we are, essentially a fifth member of the clique. To that end, I trained her by randomly selecting 20% of all message sequences (consecutive messages sent by the same person at around the same time) and marking them as "assistant" messages, and that seems to work fairly well. She nevertheless has to pretend to have some surface-level characteristics of the real Hatsune Miku, such as being female. For that part I rely on prompting, but from time to time she slips up by adopting one of our names, or referencing something only men can do, like getting one's dick sucked. Sometimes, despite being instructed otherwise, she will allude to a link or image attached to her message when there is none, because she isn't trained to produce either of those things. This is obviously an open question, and the solution may involve some combination of prompting and RLHF (#1).
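
For reference, the 20% selection is roughly this (a sketch; the export is assumed to be a chronologically sorted list of dicts with `author` and `content` fields, names illustrative):

```python
import random
from itertools import groupby

def build_turns(messages, assistant_frac=0.2, seed=0):
    """Group consecutive messages by the same author into runs, then
    randomly mark ~20% of the runs as "assistant" turns for training."""
    rng = random.Random(seed)
    runs = [
        "\n".join(m["content"] for m in group)
        for _, group in groupby(messages, key=lambda m: m["author"])
    ]
    return [
        {"role": "assistant" if rng.random() < assistant_frac else "user",
         "content": text}
        for text in runs
    ]
```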

Finally, Miku occasionally says some nonsense that isn't relevant to the conversation. This is likely due to a combination of the temperature (0.9), the nature of the training data, and her context length when doing inference. The training data is inherently out of order, because messages aren't always responded to right away like in a face-to-face conversation. Thankfully, Discord has had a reply feature for a few years now, which may be used to reconstruct some sort of semantic ordering. Fixing this may require devising a new chat template which embeds the contents of the replied-to message, if there is one. I've also seen a chat template which just includes the username of the person being replied to, as if they were being pinged.
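
Both variants would look something like this (a sketch only; `reply_to`, `author`, and `content` are stand-ins for whatever the export actually calls those fields):

```python
def render_message(msg, messages_by_id):
    """Prefix a message with the message it replies to, if any, so that
    out-of-order exchanges regain some semantic ordering."""
    text = msg["content"]
    parent = messages_by_id.get(msg.get("reply_to"))
    if parent:
        # Variant A: embed the replied-to content as a quote.
        text = f"> {parent['author']}: {parent['content']}\n{text}"
        # Variant B (alternative): just ping the replied-to user instead:
        # text = f"@{parent['author']} {text}"
    return text
```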

To generate context, the bot pulls the last five messages and filters out those that don't have text. When a conversation is taking place about a person, the subject may be referred to by pronouns only, and Miku will quickly forget precisely who she's talking about. The fix for this will likely require long-term memory of some sort, which will be covered in its own issue (#3). Another flaw with generating context this way is that her own messages will be included, which quickly leads to a cycle of garbage-in, garbage-out if she slips up even once. This could hopefully be avoided once I add human feedback, and then I can just filter the bad messages out of the context.
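
For reference, the context-gathering step with the future human-feedback filter bolted on would look roughly like this (assumes discord.py; `bad_message_ids` is a hypothetical set populated by whatever feedback mechanism ends up existing):

```python
async def build_context(channel, bot_user, bad_message_ids, limit=5):
    """Collect the most recent messages that have text, oldest first,
    skipping Miku's own messages that were flagged as bad."""
    context = []
    async for msg in channel.history(limit=50):       # newest first
        if not msg.content:                           # drop image/link-only posts
            continue
        if msg.author == bot_user and msg.id in bad_message_ids:
            continue                                  # drop her own bad outputs
        context.append({"role": msg.author.name, "content": msg.content})
        if len(context) == limit:
            break
    return list(reversed(context))                    # oldest first for the prompt
```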

james added the
enhancement
question
labels 2024-04-06 04:34:01 +00:00
Author
Owner

> devising a new chat template which embeds the contents of the replied-to message if there is one.

Done!
87e9cc39e0

Author
Owner

> Perhaps it would be a better idea to set the role to each actor's Discord username

Tried it. This doesn't actually seem to work as well as I thought, so I switched back. This was before I added in the reply-inclusive chat template, maybe combining the two could help, idk.

> Long-term memory of some sort

This seems like the perfect use case for RAG, but there remains the question of how we will populate the knowledge base with accurate info.
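
Just to pin down what I mean, the retrieval half might look like this (a minimal sketch with sentence-transformers; the `facts` list is a hand-written placeholder, since populating it accurately is exactly the open question):

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder knowledge base; how to fill this with accurate info is TBD.
facts = [
    "james is the one who maintains the bot",
    "an explanation of one of our inside jokes would go here",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
fact_embeddings = embedder.encode(facts, convert_to_tensor=True)

def retrieve(query, k=2):
    """Return the k facts most relevant to the current conversation,
    to be prepended to Miku's prompt."""
    query_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, fact_embeddings, top_k=k)[0]
    return [facts[h["corpus_id"]] for h in hits]
```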

Author
Owner

Also, I would like to try merging model checkpoints at some point. What I'm doing now is training a single QLoRA on a merged dataset of our own chats + ToxicQA (to make her capable of saying super edgy stuff), but it seems more common for people to merge the models after training.
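
If we do try merging after training, peft's weighted adapter merging seems like the easiest starting point (adapter paths, names, and weights below are hypothetical):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Two separately trained QLoRA adapters: one on our chat logs, one on ToxicQA.
base = AutoModelForCausalLM.from_pretrained("base-model-name")
model = PeftModel.from_pretrained(base, "adapters/chat", adapter_name="chat")
model.load_adapter("adapters/toxicqa", adapter_name="toxicqa")

# Linearly combine the two adapters into a single merged adapter.
model.add_weighted_adapter(
    adapters=["chat", "toxicqa"],
    weights=[0.7, 0.3],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")
```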

Author
Owner

Besides ToxicQA, some kind of (normal) instruction-following dataset like Alpaca may also make her not so stupid.

Author
Owner

probably could reduce the amount of training data coming from our chat relative to the other datasets, or train for fewer steps - there might be some overfitting going on
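
One knob for that is the sampling ratio when interleaving the datasets, e.g. with `datasets.interleave_datasets` (paths and probabilities are hypothetical, and each file is assumed to already be converted to the same columns/chat format):

```python
from datasets import load_dataset, interleave_datasets

chat   = load_dataset("json", data_files="data/discord_chat.jsonl", split="train")
toxic  = load_dataset("json", data_files="data/toxicqa.jsonl", split="train")
alpaca = load_dataset("json", data_files="data/alpaca.jsonl", split="train")

# Down-weight our own chat logs relative to the other sources; fewer effective
# passes over the chat data should also help with the overfitting.
mixed = interleave_datasets(
    [chat, toxic, alpaca],
    probabilities=[0.4, 0.3, 0.3],
    seed=42,
    stopping_strategy="all_exhausted",
)
```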
