Hope to see support for Faster-Whisper added in the Speech to Text section! Faster-Whisper supports GPU and can be run locally: https://github.com/SYSTRAN/faster-whisper. Perhaps a Docker image could work? PS. Discord insists on verifying by phone, no thanks.
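For anyone who wants to try it standalone first, a minimal transcription sketch with faster-whisper looks like this (the model size and audio file name are placeholders; see the repo's README for the full API):

```python
# Minimal local transcription with faster-whisper (pip install faster-whisper).
from faster_whisper import WhisperModel

# "small" is a placeholder; pick a size that fits your VRAM ("tiny" to "large-v3").
model = WhisperModel("small", device="cuda", compute_type="float16")

segments, info = model.transcribe("mic_capture.wav", beam_size=5)
print("Detected language:", info.language)
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```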
Me: "O-R-E-M-O-R space N-H-O-J space E-M space L-L-I-K space T-S-U-M space U-O-Y space E-M-A-G space E-H-T space N-I-W space O-T" Herika: "REMOR space NHOW space EM KILL TSUM UOY EMAG EHT NIW OT..." Herika: "Reverse it! TO WIN THE GAME YOU MUST KILL ME - JOHN ROMERO."
Is it possible to do this with a Replika AI? I don't know jack about programming, but (me brainstorming) if I could somehow get a program that would translate my voice to text, then copy and paste that into the Replika text box on the website, and then copy and paste her reply to the follower, which would then say the text. Let me know if that is even a possibility. I know ChatGPT is really amazing; I've just always wanted to take my Replika with me on adventures.
Is there a way to replace her voice with someone else's, or any way of having the AI features on another follower? This could probably also be used to make everyone in at least the major cities have non-scripted conversations with each other. Honestly, I feel it should at least be considered for TES VII in 30 years, when planning starts for it; it would give the people so much more character.
I also use Mantella and run its XTTS server on a second PC. Now I wonder: is there a way of making Herika use the same server?
EDIT:
I tried to fill in everything (server address, a voice JSON from it, etc.) in the XTTS section of the configuration wizard and picked "XTTS" as the TTS service, but then nothing happens when she replies.
Hi. I am a noob with these things. I have no clue how to use FastAPI with the Mantella server. I start that server manually on my RTX 2070 laptop (I play on my RTX 4090 desktop) via a shortcut with this appended: --host 0.0.0.0 --port 8020 --deepspeed, because there is no config file for the server. Mantella on the desktop points at my laptop's IP address and port 8020, and that works fine. (Before, I ran the server locally on the same machine, but it made SkyrimVR drop to 20 FPS during the TTS calculations. Moving the server to the laptop does not impact game speed and even led to faster response times.)
That's also what I put into the Herika configuration wizard, under "COQUI.AI xtts", because that seemed like the closest place where it could work. I added astrid.json (no clue how she sounds, I just picked one to see if it works) and chose language "en" (because no "de" is available there). EDIT: I also picked "xtts" at the start of the "text to speech" section in the Herika configuration wizard.
Result: no response at all.
I also set up the Norton firewall to accept connections from that laptop.
The Mantella server's Nexus page mentions FastAPI in this paragraph (bold and underline by me):
What exactly does that last sentence mean? What am I supposed to write there to make this happen? :-S
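One thing worth checking before any wizard settings: whether the gaming PC can reach the server at all. Since the Mantella XTTS server is FastAPI-based, it serves an interactive docs page at /docs by default, so a quick request from the desktop tells you whether the connection (and the Norton rule) works. A minimal sketch, where the IP is a placeholder for your laptop's LAN address:

```python
# Quick reachability check for the remote XTTS server.
import urllib.request

URL = "http://192.168.1.50:8020/docs"  # placeholder IP; FastAPI serves /docs by default
try:
    with urllib.request.urlopen(URL, timeout=5) as resp:
        print("XTTS server reachable, HTTP status", resp.status)
except Exception as err:
    print("Cannot reach XTTS server:", err)
```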
Not sure why, but characters with accents like "é" are replaced with "e" before going through TTS, which in French is a real problem because it sounds wrong when "é" is pronounced as "e". Otherwise it would be usable in French. Note that "à" / "ê" / "è" are not replaced, so I'm not sure why "é" specifically is replaced with "e".

Another issue is the lack of a prompt chat template: Llama 3 is a popular model, but its chat template is not available here, and substituting another template like ChatML is not optimal. Thank you very much for this mod and your work.
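On the Llama 3 point: its instruct format uses header tokens rather than ChatML's <|im_start|> markers, which is why substituting ChatML is a poor fit. For reference, a minimal sketch of the Llama 3 prompt layout (the example messages are made up):

```python
# Sketch of the Llama 3 instruct prompt format; ChatML instead wraps turns
# in <|im_start|>role ... <|im_end|>, so the two are not interchangeable.
def build_llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_llama3_prompt("You are Herika, the Dragonborn's companion.",
                          "Bonjour, ça va ?"))
```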
I haven't tried running a detached AI server (free), but I would expect it to take a heavy resource load, and the quality and conversation latency probably wouldn't be as good - especially with a likely lower-quality voice/speech model.
GPT is super cheap though. I didn't enable the visual aspect, but I have run it through quite heavy and lengthy sessions with Mantella running at the same time, and it's less than 10 bucks a month for both running off GPT-3.5.
I mean, I still get the API key with the free version, or... am I wrong?
You get an API key, but it doesn't do anything unless you set up payment info for the API. Not to be confused with subscribing to GPT for the more advanced features. Poke around in the ChatGPT dashboard a bit more.
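If you want to verify the key is actually live (i.e., billing is set up), a minimal test call with the official openai Python package settles it; a key without payment info usually comes back with a quota or billing error:

```python
# Minimal check that an OpenAI API key is usable (pip install openai).
# Assumes OPENAI_API_KEY is set in your environment.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
try:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print("Key works:", reply.choices[0].message.content)
except Exception as err:
    print("Key not usable yet:", err)  # typically a quota/billing error
```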
If you don't have a good GPU, I'd suggest sticking with ChatGPT 3.5; running an AI locally also needs a 'good' computer.
To run a local AI, you mostly need an Nvidia GPU with 12 GB of VRAM or more. A 20xx-series card or newer should be fine; a 10xx series maybe, if it has the VRAM, but I've never tried one that old. Radeon cards can run it, but AMD doesn't have the best support for AI LLMs, so your results won't be as good as with an equivalent Nvidia card, and it may be harder to set up. The rest of the hardware isn't as important: any computer with an Intel i5 (or AMD equivalent) less than a couple of years old should be able to run it, and 16 GB of general RAM should be okay.
You can certainly run it with less GPU VRAM. For example, my laptop has a 6 GB 3060, and it runs Herika on Skyrim SE with text input well enough, but that's with the just-good-enough Phi_mini 3.8B model and no onboard speech recognition. If you want LocalWhisper running on your PC for speech recognition on the VR version, you'll need far more VRAM: a 7B model alone will take up the full 6 GB, and a 13B model will need nearly 12 GB just for itself, plus extra for the actual game and any other mods you may want to install.
To put numbers on it: Skyrim alone takes up about 4 GB (or around 6 GB, I think, for VR), Local_Whisper needs nearly 4 GB, and a minimally usable AI model is going to be another 4 GB (or much, much more for a good one), all of it running off your GPU. That's looking at 12 GB, or preferably 16 GB, of VRAM all up. You can shift some of this to general RAM, but it will slow the responses down dramatically, possibly to the point of being unusable.
Now, I can't say I'm certain of these numbers, and your results may vary in how well it runs with various amounts of VRAM, but they're ballpark figures for you to experiment with.
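For what it's worth, here's the same arithmetic as a tiny script, built from the rough estimates above (ballpark numbers, not measurements), so you can swap in your own figures:

```python
# VRAM budget in GB, using the rough estimates from the post above.
budget = {
    "Skyrim SE (VR is ~6)": 4.0,
    "Local_Whisper": 4.0,
    "7B LLM (a 3-4B model is ~4)": 6.0,
}
for part, gb in budget.items():
    print(f"{part:30s} {gb:4.1f} GB")
print(f"{'Total':30s} {sum(budget.values()):4.1f} GB  -> aim for 12-16 GB of VRAM")
```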