Friday, 24 January 2025

Large Language Models - Running LLMs on a Local Machine

Large Language Models (LLMs) have revolutionized AI by enabling machines to understand and generate human-like text. While major companies offer these models over the cloud, many users are exploring the benefits of running LLMs locally on their own machines, especially when privacy, cost efficiency, and data control are key considerations. Although running LLMs has traditionally required powerful GPUs, tools like Ollama make it possible to run models locally, even with just a CPU.

Let's explore how to use Ollama on a local machine via the command-line interface (CLI), without writing any code, and look at the advantages of running LLMs locally using only a CPU.


Why run LLMs locally on your CPU?

Running LLMs on a CPU offers several key benefits, especially when you do not have access to high-end GPUs:

1. Cost Efficiency

  • Cloud-based APIs can incur significant costs, particularly if you're running models regularly. By running the model locally, you avoid these recurring usage charges.

2. Privacy and Security

  • Keeping all data processing local ensures that sensitive information doesn’t have to leave your machine, protecting your privacy and offering full control over your data.

3. Flexibility and Control

  • Running an LLM locally on your machine gives you the freedom to customize it for specific use cases without being constrained by cloud service terms or API limitations. You can use it in your preferred workflow and modify it as needed.

4. Lower Latency

  • A local installation avoids the delays of network calls to cloud services; response time depends only on your own hardware.

Ollama


Ollama is a simple, user-friendly tool for running pre-trained language models locally on your machine. It is optimized for both CPU and GPU usage, so you can use it even if your machine doesn't have powerful GPU hardware. It abstracts away the complexity of setting up and running models, offering a clean command-line interface (CLI) that makes it easy to get started.
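Since everything is driven from the terminal, a quick way to see what the tool offers once it is installed is its built-in help:

    # Print the list of available subcommands and flags
    ollama --help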

Setting up Ollama:

Step 1:
  • Visit the Ollama website to download the installer for your platform (Windows, macOS, or Linux).
  • Follow the installation instructions provided for your operating system.
Once installed, you’ll have access to the ollama command directly in your terminal.
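On Linux, for example, the install can typically be done with a single command in the terminal (check the download page for the current instructions, as the script may change):

    # Install Ollama on Linux using the official install script
    curl -fsSL https://ollama.com/install.sh | sh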

Step 2: 
  • Open the terminal and verify the Ollama installation with the command below
    • ollama --version

      This confirms that Ollama is installed properly.
  • Pull a model that can run locally, such as Llama 3.2 (the 1B-parameter variant is a good fit for CPU-only machines)
    • ollama pull llama3.2:1b 


  • List the models downloaded to your local system
    • ollama list

  • Run the model and start a conversation with Ollama, as shown in the example session below
    • ollama run <modelname>  (for example, ollama run llama3.2:1b)
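Put together, a first session on the command line might look roughly like the following (model names, sizes, and timings will vary depending on your machine):

    # Verify the installation
    ollama --version

    # Download a small model that runs comfortably on a CPU
    ollama pull llama3.2:1b

    # Confirm the model is now available locally
    ollama list

    # Start an interactive chat; type /bye to exit
    ollama run llama3.2:1b

Once the prompt appears, type a question and press Enter; the model's response is printed directly in the terminal.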


Useful Ollama Commands:


Command        Description
ollama serve   Starts the Ollama server on your local system
ollama show    Displays details about a specific model, such as its architecture, parameters, and license
ollama run     Runs the specified model and opens an interactive chat session (downloading it first if needed)
ollama list    Lists all the downloaded models
ollama ps      Shows the currently running models
ollama stop    Stops the specified running model
ollama pull    Downloads the specified model to the local system
ollama rm      Removes the specified model from your system
/bye           Exits the chat session started by ollama run
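For example, a short housekeeping session with these commands might look like this (llama3.2:1b is simply the model pulled earlier; substitute whichever model you have downloaded):

    # Show details about a downloaded model
    ollama show llama3.2:1b

    # See which models are currently loaded in memory
    ollama ps

    # Unload a running model
    ollama stop llama3.2:1b

    # Remove a model you no longer need to free disk space
    ollama rm llama3.2:1b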



Challenges and Limitations of running LLMs locally on CPUs

While running LLMs locally with Ollama has many advantages, there are some limitations to keep in mind:

  • Performance on CPUs: Running large models on CPUs can be slower than using GPUs. Although Ollama is optimized for CPUs, you may still experience slower response times, especially with more complex tasks.

  • Memory Usage: LLMs can consume a lot of memory, and running them locally may require a significant amount of RAM. Ensure that your machine has at least 16 GB of RAM for decent performance. Larger models will require more memory, which could lead to slowdowns or crashes on systems with limited resources.

  • Model Size: The largest open models, such as 70B-parameter variants of Llama, are generally not practical to run on a CPU due to their size and resource requirements. Ollama offers smaller models that are more feasible to run on CPU-based machines, but for the largest models you may still need a GPU for acceptable performance; see the short example below for picking a CPU-friendly variant.
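A practical approach is to start with a small variant and move up only if your hardware keeps up; the tags below are examples, and the approximate sizes may differ from what the Ollama library currently lists:

    # Smaller variants are far friendlier to CPU-only machines
    ollama pull llama3.2:1b    # about 1.3 GB on disk
    ollama pull llama3.2:3b    # about 2 GB on disk

    # Check the download size of each local model before running it
    ollama list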


Conclusion:

Ollama makes running LLMs locally on a CPU simple and accessible, offering an easy-to-use CLI for tasks like text generation and question answering, without needing code or complex setups. It lets you run models efficiently on various hardware, providing privacy, cost savings, and full data control.