Large Language Models (LLMs) have revolutionized AI by enabling machines to understand and generate human-like text. While major companies offer these models over the cloud, many users are exploring the benefits of running LLMs locally on their own machines, especially when privacy, cost efficiency, and data control are key considerations. Running LLMs has traditionally required powerful GPUs, but tools like Ollama make it possible to run models locally, even with just a CPU. Let's explore how to use Ollama on a local machine via the command-line interface (CLI), without writing any code, and discuss the advantages of running LLMs locally using only CPUs.
Why run LLMs locally on your CPU?
Running LLMs on a CPU offers several key benefits,
especially when you do not have access to high-end GPUs:
1. Cost Efficiency
- Cloud-based APIs can incur significant costs, particularly if you're running models regularly. By running the model locally, you eliminate those recurring costs.
2. Privacy and Security
- Keeping all data processing local ensures that sensitive information doesn’t have to leave your machine, protecting your privacy and giving you full control over your data.
3. Flexibility and Control
- Running an LLM locally on your machine gives you the freedom to customize it for specific use cases without being constrained by cloud service terms or API limitations. You can use it in your preferred workflow and modify it as needed.
4. No Network Latency
- With a local installation, you avoid the delays that come from network calls to cloud services; response time depends only on your own hardware.
Ollama
Ollama is a simple, user-friendly tool that allows you
to run pre-trained language models locally on your machine. Ollama is optimized
for both CPU and GPU usage, meaning you can run it even if your
machine doesn’t have powerful GPU hardware. It abstracts the complexity of
setting up models and running them, offering a clean command-line interface
(CLI) that makes it easy to get started.
Setting up Ollama:
Step 1:
- Visit the Ollama website to download the installer for your platform (Windows, macOS, or Linux).
- Follow the installation instructions provided for your operating system.
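On Linux, for example, the download page currently offers a one-line install script that you can run in a terminal (check ollama.com for the up-to-date command before running it):
curl -fsSL https://ollama.com/install.sh | sh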
Once installed, you’ll have access to the ollama command directly in your terminal.
Step 2:
- Open the terminal and verify the installation with the command below:
ollama --version
If a version number is printed, Ollama is installed properly.
- Pull a locally deployable model, such as Llama 3.2 (example commands for these steps follow this list).
- List the models downloaded on the local system.
- Run the model and start a conversation with Ollama.
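For example, the commands below pull the Llama 3.2 model, list the downloaded models, and start an interactive chat. The llama3.2 tag is the name used in the Ollama model library at the time of writing; substitute any other model tag you prefer.
ollama pull llama3.2
ollama list
ollama run llama3.2
Inside the chat session, type a prompt and press Enter; type /bye to exit.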
Useful Ollama Commands:
| Command | Description |
| --- | --- |
| ollama serve | Starts the Ollama server on your local system. |
| ollama show | Displays details about a specific model, such as its parameters and configuration. |
| ollama run | Runs the specified model, downloading it first if it is not already on your system, and opens an interactive chat. |
| ollama list | Lists all the downloaded models. |
| ollama ps | Shows the currently running models. |
| ollama stop | Stops the specified running model. |
| ollama pull | Downloads the specified model to the local system. |
| ollama rm | Removes the specified model from your system. |
| /bye | Exits the interactive model conversation. |
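As a short illustrative sequence (again assuming the llama3.2 tag, and noting that ollama stop requires a reasonably recent Ollama release), you could inspect a model, check what is currently loaded, and then stop it:
ollama show llama3.2
ollama ps
ollama stop llama3.2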
Challenges and Limitations of Running LLMs Locally on CPUs
While running LLMs locally with Ollama has many advantages,
there are some limitations to keep in mind:
- Performance on CPUs: Running large models on CPUs can be slower than using GPUs. Although Ollama is optimized for CPUs, you may still experience slower response times, especially with more complex tasks.
- Memory Usage: LLMs can consume a lot of memory, and running them locally may require a significant amount of RAM. Ensure that your machine has at least 16 GB of RAM for decent performance. Larger models will require more memory, which could lead to slowdowns or crashes on systems with limited resources.
- Model Size: Very large models, with tens or hundreds of billions of parameters, may not be practical to run on a CPU due to their size and resource requirements. Ollama offers smaller models that are more feasible to run on CPU-based machines (see the example after this list), but for the largest models you may still need a GPU for optimal performance.
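If memory is a constraint, one practical option is to pull a smaller variant of a model family. As a sketch, assuming the 1B-parameter tag of Llama 3.2 is still published in the Ollama library:
ollama pull llama3.2:1b
ollama run llama3.2:1b
Smaller variants trade some answer quality for lower RAM usage and faster responses on a CPU.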
Conclusion:
Ollama makes running LLMs locally on a CPU simple and accessible, offering an easy-to-use CLI for tasks like text generation and question answering, without needing code or complex setups. It lets you run models efficiently on various hardware, providing privacy, cost savings, and full data control.