Course Outline
AI Sovereignty and Local LLM Deployment
- Risks associated with cloud LLMs: data retention, training on user inputs, and exposure to foreign jurisdictions.
- Ollama architecture: model server, registry, and OpenAI-compatible API.
- Comparison with vLLM, llama.cpp, and Text Generation Inference.
- Model licensing terms for Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Setup
- Installing Ollama on Linux with CUDA and ROCm support (a post-install smoke test is sketched after this list).
- CPU-only fallback options and AVX/AVX2 optimization.
- Docker deployment and persistent volume mapping.
- Multi-GPU setup strategies and VRAM allocation.
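To confirm that a fresh install, native or containerized, is actually serving requests, a minimal smoke test like the one below can be run against the local endpoint. It assumes Ollama's default port 11434 on localhost and the documented /api/version route; adjust the host if you remapped ports in Docker.

```python
# Post-install smoke test. Assumes Ollama's default port (11434) on
# localhost; change OLLAMA_HOST if you remapped the port in Docker.
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"

def smoke_test() -> None:
    # The root path returns a plain-text "Ollama is running" banner.
    with urllib.request.urlopen(OLLAMA_HOST, timeout=5) as resp:
        print(resp.read().decode().strip())

    # /api/version reports the installed server version as JSON.
    with urllib.request.urlopen(f"{OLLAMA_HOST}/api/version", timeout=5) as resp:
        print("server version:", json.load(resp).get("version", "unknown"))

if __name__ == "__main__":
    smoke_test()
```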
Model Management
- Pulling models from the Ollama registry (e.g., ollama pull llama3); see the listing sketch after this list.
- Importing GGUF models from HuggingFace and TheBloke.
- Understanding quantization levels: trade-offs between Q4_K_M, Q5_K_M, and Q8_0.
- Model switching and limits on concurrent model loading.
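A quick way to audit what is actually on disk after pulling or importing models is to query the local /api/tags listing. The sketch below assumes the default endpoint; the "details" fields (such as quantization_level) reflect recent Ollama releases and may differ in older ones.

```python
# Sketch: enumerate locally pulled models to see which quantization is
# actually installed. Assumes the default Ollama endpoint.
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"

def list_local_models() -> None:
    with urllib.request.urlopen(f"{OLLAMA_HOST}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    for model in payload.get("models", []):
        details = model.get("details", {})
        print(
            model.get("name"),
            f"{model.get('size', 0) / 1e9:.1f} GB",
            details.get("quantization_level", "unknown"),  # e.g. Q4_K_M
        )

if __name__ == "__main__":
    list_local_models()
```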
Custom Modelfiles
- Writing Modelfile syntax: FROM, PARAMETER, SYSTEM, and TEMPLATE directives (see the example sketched after this list).
- Tuning temperature, top_p, and repeat_penalty.
- Engineering system prompts for role-specific behavior.
- Creating and publishing custom models to the local registry.
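As an illustration of the directives above, the following sketch writes a hypothetical role-specific Modelfile and registers it locally with ollama create. The model name, parameter values, and system prompt are example choices, not recommendations.

```python
# Sketch: build a role-specific model from a Modelfile and register it in
# the local store. Directive names (FROM, PARAMETER, SYSTEM) follow the
# Modelfile syntax covered above; all values here are placeholders.
import pathlib
import subprocess

MODELFILE = """\
FROM llama3
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
SYSTEM You are an internal support assistant. Answer only from company policy.
"""

def create_model(name: str = "support-assistant") -> None:
    path = pathlib.Path("Modelfile")
    path.write_text(MODELFILE)
    # Equivalent to running: ollama create support-assistant -f Modelfile
    subprocess.run(["ollama", "create", name, "-f", str(path)], check=True)

if __name__ == "__main__":
    create_model()
```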
API Integration
- Utilizing the OpenAI-compatible /v1/chat/completions endpoint (see the streaming client sketched after this list).
- Implementing streaming responses and JSON mode.
- Integrating with LangChain, LlamaIndex, and custom applications.
- Managing authentication and rate limiting via a reverse proxy.
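A minimal streaming client against the local endpoint might look like the sketch below. It assumes the openai Python package and an already-pulled llama3 model; the API key is a placeholder because Ollama does not validate it, so real authentication belongs at the reverse proxy.

```python
# Sketch: stream a chat completion from the local OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="unused",                      # required by the SDK, ignored by Ollama
)

stream = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Summarize our data-retention policy."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```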
Performance Optimization
- Sizing the context window and managing the KV cache (see the timing sketch after this list).
- Handling batch inference and parallel requests.
- Allocating CPU threads and understanding NUMA architecture.
- Monitoring GPU utilization and memory pressure.
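To see how these settings interact in practice, the sketch below sends a request through the native /api/generate route with an explicit num_ctx and derives a rough tokens-per-second figure from the response timing fields. The option names (num_ctx, num_thread) and timing fields (eval_count, eval_duration) follow the documented Ollama API but may change between releases; treat the numbers as indicative only.

```python
# Sketch: non-streaming generate call with an explicit context window,
# followed by a rough throughput estimate from the timing fields.
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"

def timed_generate(prompt: str) -> None:
    body = json.dumps({
        "model": "llama3",
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": 8192,   # larger context -> larger KV cache in VRAM
            "num_thread": 8,   # CPU threads for layers not offloaded to GPU
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        result = json.load(resp)
    tokens = result.get("eval_count", 0)
    seconds = result.get("eval_duration", 1) / 1e9  # reported in nanoseconds
    print(f"{tokens} tokens in {seconds:.1f}s ({tokens / seconds:.1f} tok/s)")

if __name__ == "__main__":
    timed_generate("Explain KV-cache memory usage in one paragraph.")
```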
Security and Compliance
- Ensuring network isolation for model serving endpoints.
- Implementing input filtering and output moderation pipelines.
- Audit logging of prompts and completions.
- Verifying model provenance and hash integrity.
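One concrete provenance check is to hash an imported GGUF file and compare it against the digest published by the model provider, as in the sketch below; the file path and expected digest are placeholders.

```python
# Sketch: verify a downloaded GGUF file against a provider-published SHA-256
# digest before importing it into Ollama.
import hashlib
import pathlib

def sha256_of(path: pathlib.Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, expected: str) -> None:
    actual = sha256_of(pathlib.Path(path))
    if actual != expected.lower():
        raise SystemExit(f"hash mismatch: expected {expected}, got {actual}")
    print("model file verified")

if __name__ == "__main__":
    # Both arguments are examples; use the digest from your provider's release notes.
    verify("llama3-q4_k_m.gguf", "0" * 64)
```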
Requirements
- Intermediate knowledge of Linux and container administration.
- A high-level understanding of machine learning and transformer models.
- Familiarity with REST APIs and JSON.
Target Audience
- AI engineers and developers seeking to replace cloud LLM APIs.
- Organizations with data sensitivity issues that prevent the use of cloud models.
- Government and defense teams requiring air-gapped language models.
14 hours