Yla - Your Local AI Assistant


Overview

Yla is a lightweight, privacy-focused AI chatbot application that runs entirely locally using Ollama. It lets users interact with powerful open-source large language models without sending data to external servers, ensuring complete data privacy and eliminating exposure to cloud-side data breaches.

Key Features

Yla offers a compelling set of features focused on privacy, flexibility, and control:

  • Local AI Control: Select from multiple configured models
  • Privacy First: All processing happens on your machine
  • No Internet Dependencies: Works completely offline
  • Custom Personalities: Create specialized assistants with Modelfiles
  • Parameter Experimentation: Resend messages with different temperature, top_p, and top_k settings
  • Model Extensibility: Easily add new models as they become available on Ollama
  • Reasoning Visualization: View model reasoning process when wrapped in the <think> tag
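
The parameter-experimentation feature above boils down to rebuilding the request body with different sampling settings before resending it. A minimal sketch (helper name and defaults are illustrative, mirroring the config.js example later in this document, not Yla's actual source):

```javascript
// Illustrative: rebuild a chat request with overridden sampling settings
// so the same message can be resent with, e.g., a higher temperature.
function buildRequest(model, messages, overrides = {}) {
    return {
        model,
        messages,
        temperature: overrides.temperature ?? 0.7, // 0-2: higher = more creative
        top_p: overrides.top_p ?? 0.9,             // nucleus sampling cutoff
        top_k: overrides.top_k ?? 40,              // candidate-token limit
    };
}

// A resend then simply POSTs the rebuilt body to the local Ollama endpoint:
// fetch("http://localhost:11434/v1/chat/completions", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(buildRequest("Yla:latest", messages, { temperature: 1.5 })),
// });
```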

Technical Implementation

Yla is built with simplicity and efficiency in mind:

  • Lightweight architecture using just HTML, CSS, and JavaScript
  • No frameworks like React required - just Ollama, a simple HTTP server, and a browser
  • Communicates with the local Ollama server via REST API
  • Configurable through a straightforward JSON configuration file
  • Supports model validation and dynamic downloading
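
The REST round trip can be sketched in a few lines, assuming Ollama's OpenAI-compatible chat endpoint on the default port 11434 (the helper names here are illustrative, not taken from Yla's code):

```javascript
// Pull the assistant's text out of an OpenAI-style response body.
function extractReply(data) {
    return data.choices[0].message.content;
}

// Send one user message to the local Ollama server and return the reply.
async function chat(model, userText) {
    const res = await fetch("http://localhost:11434/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model,
            messages: [{ role: "user", content: userText }],
            stream: false,
        }),
    });
    return extractReply(await res.json());
}
```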

Model Management

Yla provides comprehensive model management capabilities:

  • Automatic Validation: Checks installed models on application launch
  • Visual Selection: Choose models from an intuitive carousel interface
  • Download Integration: Pull missing models directly from the UI
  • State Persistence: Selected model remembered until page refresh
  • Status Indicators:
    • Available models - full color, clickable
    • Missing models - grayed out with warning and download option
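
The launch-time validation step can be sketched as a comparison between the configured model names and whatever the local Ollama server reports via its models endpoint (function names are illustrative; the response shape assumes Ollama's OpenAI-compatible `{ data: [{ id: ... }] }` format):

```javascript
// Split configured models into available (installed) and missing,
// driving the full-color vs. grayed-out carousel states.
function partitionModels(configuredNames, installedNames) {
    const installed = new Set(installedNames);
    return {
        available: configuredNames.filter((n) => installed.has(n)),
        missing: configuredNames.filter((n) => !installed.has(n)),
    };
}

// At launch, fetch the installed list and partition against the config.
async function validateModels(configuredNames) {
    const res = await fetch("http://localhost:11434/v1/models");
    const { data } = await res.json();
    return partitionModels(configuredNames, data.map((m) => m.id));
}
```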

Custom Model Configuration

One of Yla's powerful features is the ability to create and use specialized AI assistants:

Creating Custom Models

Users can create personalized AI assistants by defining Modelfiles with specific instructions and parameters:

```
# Example: technical-assistant.txt
FROM deepseek-r1:7b
SYSTEM """
You are an expert technical assistant.
Respond in markdown format with detailed explanations.
"""
PARAMETER num_ctx 32768
```

Configuration System

Yla uses a flexible configuration system to manage models and their parameters:

```javascript
// config.js
const config = {
    models: [
        {
            name: "Yla:latest",        // Must match Ollama model name
            num_ctx: 65536,            // Context window size
            temperature: 0.7,          // 0-2 (0=precise, 2=creative)
            top_k: 40,                 // 1-100
            top_p: 0.9,                // 0-1
            systemMessage: "Friendly general assistant",
            size: "4.1GB"
        }
    ],

    api: {
        endpoint: "http://localhost:11434/v1/chat/completions",
        available_models: "http://localhost:11434/v1/models"
    }
};
```

Configuration Reference

| Field         | Description                              | Example                 |
|---------------|------------------------------------------|-------------------------|
| name          | Ollama model name (exact match required) | "my-expert:latest"      |
| description   | Brief description (optional)             | "smart coding assistant"|
| size          | Size of the model (informational only)   | "4.1GB"                 |
| num_ctx       | Context window size (tokens)             | 32768                   |
| systemMessage | Hidden behavior instructions             | "You are an expert..."  |
| temperature   | Response creativity (0.3-1.8)            | 0.7                     |
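
The systemMessage field describes "hidden" instructions because they are prepended as a system turn the user never sees. One way this could work, purely as an illustration (the helper is hypothetical, not Yla's code):

```javascript
// Prepend the model's configured system message, if any, as a hidden
// system turn before the conversation is sent to Ollama.
function withSystemMessage(modelConfig, messages) {
    if (!modelConfig.systemMessage) return messages;
    return [{ role: "system", content: modelConfig.systemMessage }, ...messages];
}
```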

Technical Architecture

Yla's architecture is elegantly simple:

  • Frontend: HTML/CSS/JavaScript user interface
  • Backend: Ollama server running locally on port 11434
  • Communication: REST API calls to Ollama's endpoints
  • Storage: Local file system for models and configurations
  • Deployment: Simple HTTP server (Python's http.server or similar)

Privacy & Security Benefits

Yla offers significant advantages for privacy-conscious users:

  • Complete data sovereignty with all processing happening locally
  • No data sent to external servers or cloud services
  • Works offline, making it suitable for sensitive environments
  • Open-source approach enables security auditing
  • Avoids subscription costs and usage tracking

Performance Considerations

For optimal performance with Yla:

  • Choose model size based on available hardware resources
  • Adjust context window size (num_ctx) for memory constraints
  • Consider smaller model variants for lighter hardware (e.g., 7B vs 32B)
  • Close memory-intensive applications when using larger models

Demo

GitHub Repository

View the code on GitHub