Yla - Your Local AI Assistant


Overview

Yla is a lightweight, privacy-focused AI chatbot application that runs entirely locally using Ollama. It lets users interact with powerful open-source large language models without sending data to external servers, ensuring complete data privacy and eliminating exposure to cloud-side data breaches.

Key Features

Yla offers a compelling set of features focused on privacy, flexibility, and control:

  • Local AI Control: Select from multiple configured models
  • Privacy First: All processing happens on your machine
  • No Internet Dependencies: Works completely offline
  • Custom Personalities: Create specialized assistants with Modelfiles
  • Parameter Experimentation: Resend messages with different temperature, top_p, and top_k settings
  • Model Extensibility: Easily add new models as they become available on Ollama
  • Reasoning Visualization: View model reasoning process when wrapped in the <think> tag
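
The parameter-experimentation feature above boils down to rebuilding the request body with different sampling settings before resending it. A minimal sketch (helper name and defaults are illustrative, mirroring the config.js example later in this document, not Yla's actual source):

```javascript
// Illustrative: rebuild a chat request with overridden sampling settings
// so the same message can be resent with, e.g., a higher temperature.
function buildRequest(model, messages, overrides = {}) {
    return {
        model,
        messages,
        temperature: overrides.temperature ?? 0.7, // 0-2: higher = more creative
        top_p: overrides.top_p ?? 0.9,             // nucleus sampling cutoff
        top_k: overrides.top_k ?? 40,              // candidate-token limit
    };
}

// A resend then simply POSTs the rebuilt body to the local Ollama endpoint:
// fetch("http://localhost:11434/v1/chat/completions", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(buildRequest("Yla:latest", messages, { temperature: 1.5 })),
// });
```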

Technical Implementation

Yla is built with simplicity and efficiency in mind:

  • Lightweight architecture using just HTML, CSS, and JavaScript
  • No frameworks like React required - just Ollama, a simple HTTP server, and a browser
  • Communicates with the local Ollama server via REST API
  • Configurable through a straightforward JSON configuration file
  • Supports model validation and dynamic downloading
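
The REST round trip can be sketched in a few lines, assuming Ollama's OpenAI-compatible chat endpoint on the default port 11434 (the helper names here are illustrative, not taken from Yla's code):

```javascript
// Pull the assistant's text out of an OpenAI-style response body.
function extractReply(data) {
    return data.choices[0].message.content;
}

// Send one user message to the local Ollama server and return the reply.
async function chat(model, userText) {
    const res = await fetch("http://localhost:11434/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model,
            messages: [{ role: "user", content: userText }],
            stream: false,
        }),
    });
    return extractReply(await res.json());
}
```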

Model Management

Yla provides comprehensive model management capabilities:

  • Automatic Validation: Checks installed models on application launch
  • Visual Selection: Choose models from an intuitive carousel interface
  • Download Integration: Pull missing models directly from the UI
  • State Persistence: Selected model remembered until page refresh
  • Status Indicators:
    • Available models - full color, clickable
    • Missing models - grayed out with warning and download option
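
The launch-time validation step can be sketched as a comparison between the configured model names and whatever the local Ollama server reports via its models endpoint (function names are illustrative; the response shape assumes Ollama's OpenAI-compatible `{ data: [{ id: ... }] }` format):

```javascript
// Split configured models into available (installed) and missing,
// driving the full-color vs. grayed-out carousel states.
function partitionModels(configuredNames, installedNames) {
    const installed = new Set(installedNames);
    return {
        available: configuredNames.filter((n) => installed.has(n)),
        missing: configuredNames.filter((n) => !installed.has(n)),
    };
}

// At launch, fetch the installed list and partition against the config.
async function validateModels(configuredNames) {
    const res = await fetch("http://localhost:11434/v1/models");
    const { data } = await res.json();
    return partitionModels(configuredNames, data.map((m) => m.id));
}
```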

Custom Model Configuration

One of Yla's powerful features is the ability to create and use specialized AI assistants:

Creating Custom Models

Users can create personalized AI assistants by defining Modelfiles with specific instructions and parameters:

```
# Example: technical-assistant.txt
FROM deepseek-r1:7b
SYSTEM """
You are an expert technical assistant.
Respond in markdown format with detailed explanations.
"""
PARAMETER num_ctx 32768
```

Configuration System

Yla uses a flexible configuration system to manage models and their parameters:

```javascript
// config.js
const config = {
    models: [
        {
            name: "Yla:latest",        // Must match Ollama model name
            num_ctx: 65536,            // Context window size
            temperature: 0.7,          // 0-2 (0=precise, 2=creative)
            top_k: 40,                 // 1-100
            top_p: 0.9,                // 0-1
            systemMessage: "Friendly general assistant",
            size: "4.1GB"
        }
    ],

    api: {
        endpoint: "http://localhost:11434/v1/chat/completions",
        available_models: "http://localhost:11434/v1/models"
    }
};
```

Configuration Reference

| Field         | Description                              | Example                 |
|---------------|------------------------------------------|-------------------------|
| name          | Ollama model name (exact match required) | "my-expert:latest"      |
| description   | Brief description (optional)             | "smart coding assistant"|
| size          | Size of the model (informational only)   | "4.1GB"                 |
| num_ctx       | Context window size (tokens)             | 32768                   |
| systemMessage | Hidden behavior instructions             | "You are an expert..."  |
| temperature   | Response creativity (0.3-1.8)            | 0.7                     |
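
The systemMessage field describes "hidden" instructions because they are prepended as a system turn the user never sees. One way this could work, purely as an illustration (the helper is hypothetical, not Yla's code):

```javascript
// Prepend the model's configured system message, if any, as a hidden
// system turn before the conversation is sent to Ollama.
function withSystemMessage(modelConfig, messages) {
    if (!modelConfig.systemMessage) return messages;
    return [{ role: "system", content: modelConfig.systemMessage }, ...messages];
}
```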

Technical Architecture

Yla's architecture is elegantly simple:

  • Frontend: HTML/CSS/JavaScript user interface
  • Backend: Ollama server running locally on port 11434
  • Communication: REST API calls to Ollama's endpoints
  • Storage: Local file system for models and configurations
  • Deployment: Simple HTTP server (Python's http.server or similar)

Privacy & Security Benefits

Yla offers significant advantages for privacy-conscious users:

  • Complete data sovereignty with all processing happening locally
  • No data sent to external servers or cloud services
  • Works offline, making it suitable for sensitive environments
  • Open-source approach enables security auditing
  • Avoids subscription costs and usage tracking

Performance Considerations

For optimal performance with Yla:

  • Choose model size based on available hardware resources
  • Adjust context window size (num_ctx) for memory constraints
  • Consider smaller model variants for lighter hardware (e.g., 7B vs 32B)
  • Close memory-intensive applications when using larger models

Demo

GitHub Repository

View the code on GitHub