Install LLaMA under Windows

I just wanted to start playing with something similar to ChatGPT. I have a Windows 10 PC based on an Intel i9-13900K (pretty much top of the line in terms of both single-core and multi-core performance) and 64 GB of DRAM (a bit more than most people have, but I understood from the beginning that these LLMs need colossal amounts of memory to store their parameters and run).

So, here is how to proceed (thanks to the valuable information from Xanny.eth).

WSL and Linux environment

Install and set up WSL by opening a PowerShell window and typing:

wsl --install

It will take a few minutes to set up, but it is straightforward and needs no input; you just have to reboot once at the end.
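After the reboot, a quick sanity check from PowerShell confirms that WSL is installed and lists any distributions it already knows about:

wsl --list --verbose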

Then install Ubuntu 22.04 LTS on the Windows PC. It is a free application from the Microsoft Store and should install right away.
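If you prefer staying in PowerShell, the same thing can be done from the command line (the exact distribution name can be verified with wsl --list --online):

wsl --install -d Ubuntu-22.04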

When this is done, launch Ubuntu from the Start menu. It will open a terminal window and ask you to choose a username and password. Enter them (and do not forget them).

LLaMA dependencies

If one is not already open, open an Ubuntu terminal window and type:

sudo apt update
sudo apt install make cmake build-essential python3 python3-pip git unzip

Then,

python3 -m pip install torch numpy sentencepiece

You now have a full set of background dependencies in place.
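A quick sanity check that the three Python packages are importable (it simply prints the installed torch and numpy versions):

python3 -c "import torch, numpy, sentencepiece; print(torch.__version__, numpy.__version__)"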

Building LLaMA itself

It is quite simple. Type the following:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

This should be it.
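The build produces a main binary in the repository root (at least in the versions of llama.cpp current at the time of writing); you can confirm it is there and list its options with:

ls -l main
./main --help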

Getting the model weights

The real difficulty is getting the model weights (the parameters produced by training). The difficulty comes from two aspects:

  1. The larger the model you want to use, the more memory you will need to run it. The alpaca-native weights (apparently the most powerful ones easily available, about the same quality as GPT-3) require more than 16 GB of DRAM; I observed about 32 GB of DRAM in use when running them alongside a bunch of other things on my computer, like a couple of browsers and a mail client. A quick way to check what Ubuntu actually sees is shown after this list.
  2. The alpaca-native weights hold about 7 billion parameters (a 4+ GB file to download). But they keep moving because they appear to be subject to repeated DMCA notices (the exact license of this file seems… complicated: quite probably open source, but this is being challenged by Meta and others). So the best you can do is go to Pastebin to get the BitTorrent magnet link and use it to download the file.
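As mentioned in point 1, it is worth checking how much memory Ubuntu actually sees: WSL 2 only exposes a fraction of the host's RAM by default (adjustable through a .wslconfig file on the Windows side):

free -h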

Then the ggml-alpaca-7b-native-q4.bin file needs to be placed in the llama.cpp directory.
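If you downloaded the file with a Windows torrent client, remember that WSL mounts the Windows drives under /mnt. A copy along these lines (replace <your-user> with your actual Windows username) puts it where llama.cpp expects it:

cp /mnt/c/Users/<your-user>/Downloads/ggml-alpaca-7b-native-q4.bin ~/llama.cpp/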

Running LLaMA

Let the drums roll: you only have to run this command in Ubuntu:

./main --color -i -ins -n 512 -p "You are a helpful AI who will assist, provide information, answer questions, and have conversations." -m ggml-alpaca-7b-native-q4.bin

And here is John Smith, your personal AI chat assistant.

A few more recommendations

I noticed a few things that you may want to play with after the first run.

The -p option (followed by a text string) may be critical because it sets up the background environment of your chat AI. This is an initializing prompt, not visible to the user, but it deeply influences everything that follows. It is similar to what Microsoft or OpenAI apply behind the scenes in ChatGPT or Bing, in order to “give it a personality” or “to censor it”. You can play with it to restrain your AI or to give it added freedom.
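For example, only the -p string changes between the helpful assistant above and a deliberately terse one (the prompt text is just an illustration to experiment with):

./main --color -i -ins -n 512 -p "You are a terse assistant who answers every question in a single sentence." -m ggml-alpaca-7b-native-q4.bin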

The -n 512 option sets the number of tokens LLaMA will predict for each response. A larger value allows longer, possibly better-developed answers (or not), at the expense of CPU time.

The -t option (32 by default here) defines the number of threads used by LLaMA's computations. I recommend setting it to the number of threads/cores of your CPU to avoid wasting effort, as shown below.
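Ubuntu reports the number of logical processors with nproc, so you can pass it to -t directly instead of hard-coding a value:

nproc
./main --color -i -ins -n 512 -t $(nproc) -p "You are a helpful AI who will assist, provide information, answer questions, and have conversations." -m ggml-alpaca-7b-native-q4.bin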

