diff --git a/README.md b/README.md
index 6d1c17e..39dea30 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,62 @@
 # miku
 
-Discord bot/companion for the group chatte, powered by the GPT-J language model and modified with a soft prompt to understand all of our esoteric, elaborate inside jokes.
+Discord bot/companion for the group chatte, powered by the GPT-~~J~~ Neo language model and modified with a soft prompt to understand all of our esoteric, elaborate inside jokes.
+
+## Setup
+
+Python 3.8+ and PyTorch required. CUDA strongly recommended.
+
+The `c1-1.3B` model used in development should work without needing beefy hardware. It has been tested on a 1050 Ti 4 GB.
+
+Set up a virtual environment:
+
+Linux/macOS
+
+```shell
+python3 -m venv venv
+source venv/bin/activate
+```
+
+Windows
+
+```shell
+py -3 -m venv venv
+.\venv\Scripts\activate
+```
+
+Install required packages:
+
+```shell
+pip install -r requirements.txt
+```
+
+Copy `.env.example` to `.env` and fill in the bot's `TOKEN`.
+
+For chat scraping, you will also need to get your own `USER_TOKEN`.
+
+* In Discord, hit Ctrl+Shift+I to open up developer tools
+* Go to the Network tab and filter by XHR requests
+* Open a new channel, or scroll up, or do something else that will trigger an authenticated request
+* Click on one that looks suitable (e.g. `messages?limit=50`)
+* Under the "Request" tab, copy the contents of the `Authorization` request header.
+
+## Usage
+
+Scrape the messages from the chat channel you wish to use for a soft prompt. You will be prompted for the channel ID, which you can get by having developer mode on in Discord and right-clicking, or copying the last part of the URL in the browser.
+
+```shell
+cd src
+python -m scraper
+```
+
+Train the soft prompt (TODO)
+
+Run the Hatsune Miku bot. The first time you do this, it will download the model, which is ~5 GB.
+
+```shell
+python -m miku
+```
+
+## Final Remarks
+
+sukima nuts
diff --git a/requirements.txt b/requirements.txt
index 92700e5..7679e18 100644
Binary files a/requirements.txt and b/requirements.txt differ
diff --git a/src/scraper/scraper.py b/src/scraper/scraper.py
index 26251f6..b396a9e 100644
--- a/src/scraper/scraper.py
+++ b/src/scraper/scraper.py
@@ -113,7 +113,7 @@ def boot(token: str):
     if not token:
         token = input('Enter your Discord user token (Authorization request header): ')
     channel = input('Enter channel ID: ')
-    default_export = Path.cwd().parent / 'chats'
+    default_export = Path.cwd().parent.parent / 'chats'
     export = input('Enter path to export transcripts (default "chats"): ')
     scraper = Scraper(token, channel, Path(export) if export else default_export)
     scraper.scrape()
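
Review note on the `default_export` change: both the old and the new expression depend on the directory the scraper is launched from, so the default `chats` location moves if the command is run from somewhere else. A minimal sketch of one alternative follows, anchoring the default to the module file instead of the working directory. The helper names `default_export_dir` and `choose_export_dir` are hypothetical, and the sketch assumes the intended target is a `chats` directory at the repository root, which the diff does not confirm.

```python
from pathlib import Path


def default_export_dir(module_file: str, levels_up: int = 2) -> Path:
    """Return the `chats` directory `levels_up` levels above the module's folder.

    For src/scraper/scraper.py, levels_up=2 points at the repository root.
    That layout is an assumption, not something the diff states.
    """
    return Path(module_file).resolve().parents[levels_up] / 'chats'


def choose_export_dir(user_input: str, module_file: str) -> Path:
    """Use the path typed by the user, or fall back to the default."""
    return Path(user_input) if user_input else default_export_dir(module_file)
```

Inside `boot`, a call such as `choose_export_dir(export, __file__)` could stand in for the `Path(export) if export else default_export` expression, so the result would not change with the current working directory.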