Video Guide

IF YOU ARE NEW TO RUNNING OFFLINE AI MODELS FOR THE LOVE OF GOD READ THIS:

KoboldCPP is a program used for running offline LLMs (AI models).
However, it does not include any LLMs itself, so we will have to download one separately.
Running KoboldCPP and other offline AI services uses up a LOT of computer resources.
We only recommend using this feature if you have a powerful GPU or a second computer to offload the work to.
Ideally you want an NVIDIA GPU for the best performance.
You will most likely have to spend some time testing different models and performance settings to get the best results on your machine. This is still "experimental" technology.

Basic Terminology:
LLM: Large Language Model, the backbone technology of AI text generation.
7B, 13B, etc.: How many billions of parameters an LLM has. More parameters = "smarter" (subjective) but also more resource intensive.
HuggingFace: Website that hosts a huge range of free LLMs.

Why run an offline LLM instead of using ChatGPT?
  • It's free! Apart from needing the expensive hardware to run it in the first place...
  • Less censorship with offline LLM models.


KoboldCPP Download

  • Go here: https://github.com/LostRuins/koboldcpp and on the right-hand side click the latest release.
  • Download the latest koboldcpp.exe file and place it on your desktop.
  • Well done, you have KoboldCPP installed! Now we need an LLM.

LLM Download

  • Currently KoboldCPP supports both .ggml (soon to be outdated) and .gguf models. For this tutorial we are going to download an LLM called MythoMax, but you can use any other compatible LLM. Check the Discord's #llm channel to find more.
  • Go here: https://huggingface.co/TheBloke/MythoMax-L2-13B-GGML and click Files and Versions.
  • Any of the .bin files will work; in this case, download mythomax-l2-13b.ggmlv3.q5_K_M.bin.

    Note on qN levels:
    Models come in different quantization (qN) levels. The higher the Q number, the more VRAM the model uses and the better the quality of the text generation. We found that q4_K_M and q5_K_M are a good sweet spot. Do not go lower than q4_K_S.
    TL;DR: download a larger .bin file for a bigger, smarter AI.


  • Once you have downloaded the file place it on your desktop, or wherever you want to store these files. 
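The qN trade-off above can be estimated with simple arithmetic. This is a rough rule of thumb only: the bits-per-weight figures below are assumed approximations for each quant level, not official numbers, and real files carry some extra overhead.

```python
def approx_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Estimate a quantized model's file size in GB (and a floor for its VRAM use)."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

# Assumed effective bits-per-weight for common quant levels:
for name, bits in [("q4_K_M", 4.8), ("q5_K_M", 5.7), ("q8_0", 8.5)]:
    print(f"13B {name}: ~{approx_size_gb(13, bits):.1f} GB")
```

For a 13B model at q5_K_M this works out to roughly 9 GB, which matches the ballpark size of the MythoMax download above.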


KoboldCPP Setup

  • Run koboldcpp.exe as Admin.
  • Once the menu appears there are 2 presets we can pick from. Use the one that matches your GPU type.
1. CuBLAS = Best performance for NVIDIA GPUs
2. CLBlast = Best performance for AMD GPUs
  • For GPU Layers enter "43". This is how many layers of the LLM will be offloaded to your GPU. Different LLMs have different maximum layer counts (7B models use 35 layers, 13B models use 43 layers, etc.). If your computer is choking when generating AI responses, you can tone this down.
  • Make sure Launch Browser and Streaming Mode are enabled.
  • Click Browse and select the LLM file we downloaded earlier.
  • Your menu should look like this (I am using an NVIDIA GPU):

  • If everything looks good hit Launch.
  • A web browser interface should pop up. Just leave the command terminal running and we are all set to connect this to the mod.
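The same settings chosen in the menu can also be passed as command-line flags, which is handy for relaunching with a fixed configuration. The flag names below (`--model`, `--gpulayers`, `--usecublas`, `--port`) are taken from koboldcpp's CLI as an assumption; verify them against `koboldcpp.exe --help` for your version.

```python
def build_launch_command(model_path: str, gpu_layers: int = 43,
                         backend_flag: str = "--usecublas",  # CLBlast users need a different flag
                         port: int = 5001) -> list[str]:
    """Assemble an argument list for launching KoboldCPP non-interactively."""
    return ["koboldcpp.exe", "--model", model_path,
            "--gpulayers", str(gpu_layers), backend_flag,
            "--port", str(port)]

# Mirrors the menu choices from this guide: CuBLAS preset, 43 GPU layers, port 5001.
cmd = build_launch_command("mythomax-l2-13b.ggmlv3.q5_K_M.bin")
print(" ".join(cmd))
```

You could pass this list to `subprocess.Popen` on the machine hosting the model, or simply run the printed command in a terminal.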
    
Herika Setup

  • In the mod's configuration menu, all we need to do is point it to the URL of the KoboldCPP server running on your computer. Depending on where you are hosting the Herika Server, there are 2 ways to point to it.

  • UWAMP = Just set the configuration as: $KOBOLDCPP_URL="http://localhost:5001";

  • DwemerDistro = Because it is hosted in a virtual machine on your PC, you cannot simply point it to localhost. You will need to use your computer's private IP address. This is easy to find:
    Open a Command Prompt as Admin.
    Run this command: ipconfig
    Under your primary Wi-Fi/Ethernet adapter you should see an IPv4 Address; copy that. (It should look something like 192.168.x.x or 172.16.x.x.)
    Your KoboldCPP_URL configuration should look like this: $KOBOLDCPP_URL="http://192.168.81.32:5001"; (replace with your own IP address)


  • Launch the game, open the MCM menu, and under $SPG set a hotkey for switching between AI models.
  • Press the hotkey and you should be in KoboldCPP mode.
  • If everything is set up correctly you should get a response from Herika! You can check the KoboldCPP command window to see more information about the AI generation.
  • Make sure to play around with the KoboldCPP settings and other LLMs to find the best performance for your computer!
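To check the server works before involving the mod, you can hit KoboldCPP's Kobold-compatible API directly. The endpoint `/api/v1/generate` and the field names below follow the standard Kobold API; the sampling values are illustrative defaults, and the exact response shape should be confirmed against your KoboldCPP version.

```python
import json
import urllib.request

def build_payload(prompt: str, max_length: int = 100) -> dict:
    """JSON body for KoboldCPP's /api/v1/generate endpoint."""
    return {
        "prompt": prompt,
        "max_length": max_length,  # cap on generated tokens
        "temperature": 0.7,        # randomness of the output
        "rep_pen": 1.04,           # repetition penalty
        "top_p": 0.9,              # nucleus sampling cutoff
    }

def generate(prompt: str, url: str = "http://localhost:5001") -> str:
    """Send one prompt to a running KoboldCPP server and return the reply text."""
    req = urllib.request.Request(
        url + "/api/v1/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["results"][0]["text"]

if __name__ == "__main__":
    print(generate("You are Herika, the Dragonborn's companion. Say hello."))
```

Replace `localhost` with your private IP if you are testing from another machine or the DwemerDistro VM.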

Article information

Written by rang97

16 comments

  1. Bidencalypse
    You guys like to provide outdated tutorials. Aight aight i'll tryhard a bit myself. I mean i won't try to update that crap. That's your job, i'll just try to do the configuration myself.
  2. Brianobliviahn
    can you use "gpt for all" like say the Hermes model or should I just get kobold? Thanks
  3. XxPikachoxX
    I have a RTX 4060, 
    Will the program work well?
  4. TheElderLord1337
    where do i add $KOBOLDCPP_URL = "http://localhost:5001";
  5. Luxembroom
    Edit: nevermind, problem solved!
  6. Goreblood
    I was running into issues with the latest oobabooga in which I was getting 403's when Dwemer was hitting the ip:port it was running it's openAPI server extension on. So I moved on to just using KoboldCpp instead.

    I was running into some trouble integrating with the latest KoboldCpp (1.50.1), however. I looked at the terminal for kobold and saw that it was throwing a python error indicating an expected string type was getting set to null, and I also could see the prompt was missing from the payload. I stepped through the koboldcpp.php and realized that there are some additionally required variables for the connector in the server configuration that were not initialized by default (assuming I followed the instructions correctly).

    So, if anyone else missed this too, this was my configuration that ultimately worked:
    $CONNECTORS=["koboldcpp"];
    ...
    $CONNECTOR["koboldcpp"]["url"]="http://YOUR_IPV4_ADDRESS:5001";
    $CONNECTOR["koboldcpp"]["max_tokens"]=100;
    $CONNECTOR["koboldcpp"]["temperature"]=0.98;
    $CONNECTOR["koboldcpp"]["rep_pen"]=1.04;
    $CONNECTOR["koboldcpp"]["top_p"]=0.9;
    $CONNECTOR["koboldcpp"]["MAX_TOKENS_MEMORY"]=512;
    $CONNECTOR["koboldcpp"]["template"]="alpaca"; // ADDED - A template is required in the php code for the prompt to be provided in the payload to kobold
    $CONNECTOR["koboldcpp"]["use_default_badwordsids"]=false; // ADDED - Was defaulting to null if not provided
    $CONNECTOR["koboldcpp"]["eos_token"]="";  // ADDED - Defaults to null, unless another flag is used to set it to "\n", which then throws the NoneType issue trying to invoke .encode()

    I'm using the 13b Q8_0 MythoMax GGUF as my LLM (uses >13gb VRAM).. a really great model, but then I don't have enough VRAM left for SkyrimVR when running at 43 gpu layers, on my 4090(!!), I cut it back to 28. Unfortunately for whatever reason that is the difference of 3-5s responses to 30s+ responses. There's also 13b Q5_K_M version of the model that uses *only* >9gb VRAM, which is a bit faster on 28 layers... I felt like I got less random responses on the Q8.
    1. youlostchris
      Same issue with OOBA... why were they using the same port? that makes no sense at all. I know this is months later... but wtf
  7. mhrrdd2004
    Can anyone help me? Where is the configuration menu? 
    1. Freak42
      Which configuration menu?


      • KoboldCPP -> config menu pops up after launch automatically
      • Server application -> there is a webui -> address can be found under "Server Control Pane" in terminal window that pops up after start of "Dwemer Distro" (run.bat)

      • MIMIC3 (if used) -> there is a webui -> address can be found under "MIMIC3 Configuration" in terminal window that pops up after start of "Dwemer Distro"
      • HERIKA plugin -> You need to edit the SimpleGateWayer.ini to be found in the SKSE folder of your Skyrim and fill in the right IP Address (to be found in the above mentioned terminal window)


      Basically every information about the addresses of the WEBUIs, the right IP-Address to fill into the SimpleGateWayer.ini or the path to the conf.php file can be found in the terminal window, that opens, when you run Dwemer Distro!

      One hint: Very often using "localhost" instead of the real IP-adress of you local machine causes trouble with the firewall since the server application is running on a virtual machine. So better use the real IP-address.

      Hope that helps.
  8. MartinRequiem
    the description is outdated,  says use latest kobold.exe  that is 1.53 , there is no "streamer mode" option on that   ?????
    1. caesarhannibal
      Same issue
    2. Freak42
      Streaming mode is active by default in newer versions of KoboldCPP.
      Just use it.
  9. yclept
    Disclaimer: I have no experience with Herika, I just have some with KoboldCPP

    You may want to try the ROCm port of KoboldCPP if you have AMD card, rather than CLBlas, it's much faster. On windows you shouldn't need to install the ROCm SDK, as the the relevant dll's should be be included with the Adrenalin Drivers or baked into the .exe. However only 7000 and high end 6000 cards have full ROCm support (mid range or lower end 6000 *might* work, but might not).

    When the option for streaming mode in the launcher was removed, apparently it was actually made the default, so you I don't think you need to use 1.42.1 to get it.
  10. mrpeaceful
    I'm using koboldcpp 1.42.1 and I put in $KOBOLDCPP_URL="http://localhost:5001"; for the url but it keeps saying malformed?