IF YOU ARE NEW TO RUNNING OFFLINE AI MODELS FOR THE LOVE OF GOD READ THIS:
KoboldCPP is a program used for running offline LLMs (AI models).
However, it does not include any LLMs itself, so we will have to download one separately.
Running KoboldCPP and other offline AI services uses up a LOT of computer resources.
We only recommend using this feature if you have a powerful GPU or a second computer to offload the work to.
Ideally you want an NVIDIA GPU for the best performance.
You will most likely have to spend some time testing different models and performance settings to get the best result with your machine. This is still "experimental" technology.
Basic Terminology:
LLM: Large Language Model, the backbone tech of AI text generation.
7B, 13B etc: How many billions of parameters an LLM has. More parameters = "smarter" (subjective), but also more resource intensive.
HuggingFace: Website which hosts a whole heap of free LLMs.
Why run an offline LLM instead of using ChatGPT?
- It's free! Apart from having the expensive hardware to run it in the first place...
- Less censorship with offline LLM models.
KoboldCPP Download
- Go here: https://github.com/LostRuins/koboldcpp and on the right-hand side click the latest release.
- Download the latest koboldcpp.exe file and place it on your desktop.
- Well done, you have KoboldCPP installed! Now we need an LLM.
LLM Download
- Currently KoboldCPP supports both .ggml (soon to be outdated) and .gguf models. For this tutorial we are going to download an LLM called MythoMax. You can use any other compatible LLM; check the Discord's #llm channel to find more.
- Go here: https://huggingface.co/TheBloke/MythoMax-L2-13B-GGML and click Files and Versions.
- Any of the .bin files will work; in this case, download mythomax-l2-13b.ggmlv3.q5_K_M.bin.
Note on qN levels: Different models come in different qN (quantization) levels. The higher the q number, the less compressed the model is: it uses more VRAM but produces better quality text. We found that q4_K_M and q5_K_M are a good sweet spot. Do not go lower than q4_K_S.
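As a rough, back-of-the-envelope way to size these files (approximate numbers only): file size ≈ parameter count × bits per weight ÷ 8. A 13B model at q4_K_M (~4.8 bits per weight) works out to roughly 8 GB, while q5_K_M (~5.5 bits per weight) comes to roughly 9 GB, and you want at least that much VRAM/RAM (plus headroom for context) to run it comfortably.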
TL;DR: download a larger .bin file for big smart AI.
- Once you have downloaded the file, place it on your desktop, or wherever you want to store these files.
KoboldCPP Setup
- Run koboldcpp.exe as Admin.
- Once the menu appears there are 2 Presets we can pick from. Use the one that matches your GPU type:
1. CuBLAS = Best performance for NVIDIA GPUs
2. CLBlast = Best performance for AMD GPUs
- For GPU Layers enter "43". This is how many layers of the LLM will be offloaded to the GPU. Different LLMs have different maximum layer counts (7B models use 35 layers, 13B models use 43, etc.). If your computer is choking when generating an AI response, you can tone this number down. (A command-line equivalent of these settings is sketched just after this setup list.)
- Make sure Launch Browser and Streaming Mode are enabled.
- Click Browse and select the LLM file we downloaded earlier.
- Your menu should look like this (I am using an NVIDIA GPU):
[screenshot of the KoboldCPP launcher menu with the settings above]
- If everything looks good hit Launch.
- A web browser interface should pop up. Just leave the command terminal running and we are all set to connect this to the mod.
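For reference, you can also skip the launcher menu and pass the same settings on the command line. This is only a sketch; flag names vary between KoboldCPP releases (as noted in the comments below, newer builds dropped the Streaming Mode toggle and stream by default), so run koboldcpp.exe --help to check your version:
koboldcpp.exe --model mythomax-l2-13b.ggmlv3.q5_K_M.bin --gpulayers 43 --usecublas --stream
(For AMD, replace --usecublas with --useclblast 0 0.)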
- In the configuration menu for the mod, all we need to do is point to the URL of the KoboldCPP server running on your computer. Depending on where you are hosting the Herika server, there are 2 ways to point to it.
- UWAMP = Just set the configuration as: $KOBOLDCPP_URL="http://localhost:5001";
- DwemerDistro = Because it's hosted on another "Virtual Machine" on your PC, you cannot simply point it to localhost. You will need to use your computer's private IP address. This is easy to find:
Open up a Command Prompt as Admin
Run this command: ipconfig
Under your primary WiFi/Ethernet adapter you should see an IPv4 Address; copy that. (It should look something like 192.168.x.x or 172.16.x.x.)
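The output should include a block like this (your adapter name and address will differ):
Ethernet adapter Ethernet:
   IPv4 Address. . . . . . . . . . . : 192.168.81.32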
Your KoboldCPP_URL configuration should look like this: $KOBOLDCPP_URL="http://192.168.81.32:5001"; (Replace with your own IP address)
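A quick sanity check before touching the game: with KoboldCPP running, try opening that URL in a browser, or (assuming your KoboldCPP build exposes the standard KoboldAI-compatible API) query it from a terminal, replacing the IP with your own:
curl http://192.168.81.32:5001/api/v1/model
If that returns the name of the loaded model, the Herika server should be able to reach KoboldCPP at the same URL.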
- Launch the game, open up the MCM menu and under $SPG set a hotkey for switching between AI models.
- Press the hotkey and you should be in KoboldCPP mode.
- If everything is set up correctly you should get a response from Herika! You can check the KoboldCPP command terminal to see more information about the AI generation.
- Make sure to play around with the KoboldCPP settings and other LLMs to find the best performance for your computer!
Comments
Will the program work well?
I was running into some trouble integrating with the latest KoboldCpp (1.50.1), however. I looked at the terminal for Kobold and saw that it was throwing a Python error indicating an expected string type was getting set to null, and I could also see that the prompt was missing from the payload. I stepped through koboldcpp.php and realized that there are some additional required variables for the connector in the server configuration that are not initialized by default (assuming I followed the instructions correctly).
So, if anyone else missed this too, this was my configuration that ultimately worked:
$CONNECTORS=["koboldcpp"];
...
$CONNECTOR["koboldcpp"]["url"]="http://YOUR_IPV4_ADDRESS:5001";
$CONNECTOR["koboldcpp"]["max_tokens"]=100;
$CONNECTOR["koboldcpp"]["temperature"]=0.98;
$CONNECTOR["koboldcpp"]["rep_pen"]=1.04;
$CONNECTOR["koboldcpp"]["top_p"]=0.9;
$CONNECTOR["koboldcpp"]["MAX_TOKENS_MEMORY"]=512;
$CONNECTOR["koboldcpp"]["template"]="alpaca"; // ADDED - A template is required in the php code for the prompt to be provided in the payload to kobold
$CONNECTOR["koboldcpp"]["use_default_badwordsids"]=false; // ADDED - Was defaulting to null if not provided
$CONNECTOR["koboldcpp"]["eos_token"]=""; // ADDED - Defaults to null, unless another flag is used to set it to "\n", which then throws the NoneType issue trying to invoke .encode()
I'm using the 13B Q8_0 MythoMax GGUF as my LLM (it uses >13 GB of VRAM). A really great model, but then I don't have enough VRAM left for SkyrimVR when running at 43 GPU layers on my 4090(!!), so I cut it back to 28. Unfortunately, for whatever reason, that is the difference between 3-5s responses and 30s+ responses. There's also a 13B Q5_K_M version of the model that uses *only* >9 GB of VRAM, which is a bit faster on 28 layers... I felt like I got less random responses on the Q8.
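(Rough arithmetic on why the layer count hurts so much, approximate numbers only: the Q8_0 file is ~13.8 GB, so offloading 28 of its 43 layers keeps about 28/43 ≈ 65% of the weights, roughly 9 GB, on the GPU; the remaining layers are evaluated on the CPU for every token, which is where the 30s+ responses come from.)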
Basically, all the information about the addresses of the WebUIs, the right IP address to fill into SimpleGateWayer.ini, and the path to the conf.php file can be found in the terminal window that opens when you run the Dwemer Distro!
One hint: very often, using "localhost" instead of the real IP address of your local machine causes trouble with the firewall, since the server application is running on a virtual machine. So better to use the real IP address.
Hope that helps.
Just use it.
You may want to try the ROCm port of KoboldCPP if you have an AMD card, rather than CLBlast; it's much faster. On Windows you shouldn't need to install the ROCm SDK, as the relevant DLLs should be included with the Adrenalin drivers or baked into the .exe. However, only 7000-series and high-end 6000-series cards have full ROCm support (mid-range or lower-end 6000 cards *might* work, but might not).
When the option for Streaming Mode in the launcher was removed, apparently it was actually made the default, so I don't think you need to use 1.42.1 to get it.