Trilium Catalyst is a multi-source content ingestion engine within the Trilium AI ecosystem. It takes audio files, YouTube videos, live voice recordings, and other forms of spoken content, automatically transcribes them into text, refines them with AI, and archives everything into your Trilium Notes knowledge base with a single click.
Picture this: you just watched a two-hour YouTube tech talk and don’t feel like taking notes word by word. Just drop the video link into Trilium Catalyst — it’ll extract the subtitles (or download the audio and transcribe it if there are none), have AI strip out filler words, fix typos, organize everything into clean paragraphs, and save it to Trilium Notes. The whole process takes one button click.
Trilium Catalyst isn’t just another “speech-to-text” tool. It’s a complete automated pipeline — from voice capture to AI refinement to knowledge archival.
🏗️ Product Positioning: The Content Ingestion Layer of the Trilium AI Ecosystem
Within the Trilium AI plugin ecosystem, Trilium Catalyst focuses on solving one core problem: how to efficiently transform non-text knowledge into structured notes.
┌───────────────────────────────────────────────────────────────┐
│ Trilium AI Plugin Ecosystem │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Trilium AI │ │ Trilium AI │ │ Trilium Catalyst │ │
│ │ Chat │ │ Agent │ │ │ │
│ │ │ │ │ │ 🎵 Audio Transcr. │ │
│ │ 💬 AI Chat │ │ 🤖 Autonomous │ │ 📺 YouTube Transc.│ │
│ │ 🧱 Workflows │ │ 📂 KB Ops │ │ 🎙️ Live Recording │ │
│ │ 🔌 Multi-model│ │ 🧠 Memory │ │ 🤖 AI Refinement │ │
│ └──────┬───────┘ └──────┬───────┘ │ 💾 Auto-archiving │ │
│ │ │ └────────┬─────────┘ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Trilium Notes Knowledge Base · AI Model Layer │ │
│ │ Whisper ASR · Google/OpenAI/Ollama · YouTube API │ │
│ └──────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
Trilium Catalyst runs as a sub-plugin of Trilium AI Chat, sharing its AI model integration layer and Trilium save capabilities. All you need is the Trilium WP core plugin and Trilium AI Chat installed — then add Catalyst on top to unlock the full multi-source content ingestion workflow.
🎵 Audio Transcription — Never Let a Recording Go to Waste
Trilium Catalyst ships with a high-accuracy transcription engine powered by Whisper, capable of converting local audio files and live recordings into text automatically.
📁 Local File Transcription
Drag and drop or select your audio files — Catalyst handles the entire pipeline of transcription, AI refinement, and saving:
- 🎶 Broad format support — WAV, MP3, M4A, FLAC, OGG, AAC, WMA, and WebM — virtually every common audio format covered
- 📦 Multi-file batch processing — Select up to 10 files at once. The system uses a “process-and-save” approach: each file is archived to Trilium the moment it’s done, so you don’t have to wait for the entire batch
- 🎛️ Multiple accuracy tiers — From Base (fastest) to Large V2 (highest accuracy), four Whisper model tiers let you balance speed and precision however you like
- 📊 Real-time file preview — Instantly see file names, sizes, and format validation after selection — unsupported file types are flagged in red right away
🎙️ Live Voice Recording
No need to record first, export, then upload — Catalyst comes with a full browser-based voice recorder built right into the page:
- ⏸️ Pause & resume — Pause at any time during recording and pick back up when you’re ready. No pressure to get it all in one take
- ⏱️ No time limit — Record for as long as you need. You’ll never get cut off mid-sentence
- 🔒 Local processing — Audio data is processed entirely on your machine — it never touches a third-party server
The best part is the dual-mode option: after recording, you can either send the transcribed text to the AI chat window for a conversation, or run it through the standard “AI refinement → save to Trilium” archival flow. The former is perfect for quickly dictating a question; the latter is ideal for turning a spoken piece into a polished note.
🎙️ Recording Complete
│
├──► 💬 Send to Chat ──► AI Chat Window (voice-to-text input)
│
└──► 📝 AI Refine & Save ──► Whisper Transcription ──► AI Refinement ──► 💾 Trilium Notes
📺 YouTube Smart Transcription — One Link, Everything Extracted
YouTube is the world’s largest video knowledge base. But video content has a major pain point: you can’t search, cite, or organize it the way you can with text. Trilium Catalyst solves this with a smart processing pipeline.
🔄 Intelligent Fallback: Ensuring You Always Get Text
Not every YouTube video has subtitles. Catalyst features a two-tier intelligent fallback strategy that guarantees you’ll get full text content regardless of whether the video has subtitles:
📎 Paste YouTube Link
│
▼
🔍 Step 1: Attempt subtitle extraction
│
├── ✅ Success ──► Retrieve subtitle text (manual + auto-generated, multilingual)
│ │
│ ▼
│ 🤖 AI Refinement ──► 💾 Save to Trilium
│
└── ❌ Failure ──► Automatic fallback
│
▼
🔄 Step 2: Download video audio
│
▼
🎵 Whisper Audio Transcription
│
▼
🤖 AI Refinement ──► 💾 Save to Trilium
Subtitles first: If the video has subtitles — whether manually added by the creator or auto-generated by YouTube — Catalyst extracts them directly. It’s fast and produces high-quality results.
Audio transcription as fallback: If subtitle extraction fails (no subtitles, empty subtitle content, API connection issues, etc.), Catalyst doesn’t give up — it automatically downloads the video’s audio track and runs Whisper speech recognition to make sure you still get text output.
This fallback process is fully transparent to the user. After processing, the results page clearly indicates which method was actually used (subtitle extraction or audio transcription), along with timing and details for each step.
🌍 Multilingual Subtitle Support
YouTube subtitle extraction supports Chinese, English, and auto-detect language preferences. The system prioritizes your selected language, and if it’s unavailable, automatically falls back to other available languages.
🏢 Channel Batch Processing
Want to transcribe a bunch of videos from a single channel? No need to paste links one by one. Catalyst’s channel batch processing feature lets you:
- Enter a channel address — Paste a channel URL or @username
- Fetch the video list — The system automatically pulls the channel’s videos, showing titles, durations, upload dates, and view counts
- Pick and choose — Check the videos you’re interested in — select all or cherry-pick as you like
- Batch process with one click — Hit start and the system runs smart transcription on each video sequentially, with real-time progress updates
┌─────────────────────────────────────────────────────────┐
│ 🏢 YouTube Channel Batch Processing │
│ │
│ 📊 50 videos found │
│ ┌──────────────────────────────────────────────────┐ │
│ │ ☑ Intro to Deep Learning (45:32) 📅 2025-12-01│ │
│ │ ☑ Transformer Architecture (1:23:15) 📅 2025-11-28│ │
│ │ ☐ Vlog: Weekend Routine (12:05) 📅 2025-11-25│ │
│ │ ☑ PyTorch Hands-on Tutorial (58:42) 📅 2025-11-20│ │
│ │ ... │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ [✅ Select All] [❌ Deselect All] [🚀 Start (3)] │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ 📊 Progress: ██████████░░░░░ 67% │ │
│ │ Success: 2 | Fallback: 0 | Failed: 0 | Pending: 1│ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
During batch processing, results for each video are appended in real time — you can see exactly which videos were handled via subtitle extraction, which went through the intelligent fallback, and which ran into issues. When it’s all done, everything is neatly tucked away in your Trilium Notes.
🤖 AI-Powered Refinement — From Raw Transcript to Readable Note
Raw speech-to-text output is typically riddled with filler words, repetitions, typos, and awkward sentence breaks. Save it as-is and you’ll barely want to read it later. Trilium Catalyst’s AI refinement layer is built specifically to fix this.
✨ What Does the AI Do?
After transcription, the AI processes the raw text through the following steps:
- 🧹 Strip filler words — “um,” “uh,” “you know,” “like,” and other verbal fillers are automatically cleaned out
- 🔄 Remove repetitions — Redundant phrases and sentences common in speech are consolidated
- ✏️ Fix transcription errors — Homophones and near-sound errors introduced by Whisper are intelligently corrected
- 📐 Paragraph organization — Disjointed text is restructured into logically coherent paragraphs for better readability
- 📝 Auto-generate titles — The AI produces an appropriate title based on the content
🔌 Flexible AI Model Selection
The AI refinement feature hooks directly into Trilium AI Chat’s multi-model integration layer, supporting every AI service you’ve already configured:
- Google Gemini — e.g., Gemini 2.5 Pro, excels at processing long-form text
- OpenAI-compatible APIs — GPT-4o, Claude, DeepSeek, and more, connected through a unified interface
- Ollama local models — Run entirely on your own machine — your data never leaves your server
You can freely select your preferred AI provider and model in the settings page, and even customize the refinement prompt. If the default output doesn’t match your expectations, write your own prompt to fine-tune exactly how the AI behaves.
🔧 Optional, Not Mandatory
AI refinement is a toggle. You can enable or disable it in the settings. When turned off, raw transcription text is saved directly to Trilium Notes with zero AI processing — perfect for when you just need a quick archive and don’t care about text quality.
When enabled, notes saved to Trilium include both the AI-refined version and the original transcript, so you can always cross-reference against the raw text.
💾 Auto-Archive to Trilium Notes — The Final Mile for Knowledge Capture
Everything processed by Catalyst is automatically saved to your Trilium Notes knowledge base. You never need to manually copy and paste a single word.
📋 Note Structure
Every note saved to Trilium contains rich metadata:
# 🎵 Audio Transcription: Meeting_Recording_20260314 (AI-Refined)
**Processed at:** 2026-03-14 15:30:25
**Content type:** Audio transcription
**File size:** 12.5 MB
**Transcription time:** 8.42s
**Transcription engine:** Systran/faster-whisper-medium
**AI refinement time:** 3.15s
**AI provider:** OpenAI-compatible API
**AI model:** gemini-2.5-pro
**Tags:** content-processing, trilium-ai, ai-processed, audio-transcription
## 📝 AI-Refined Content
(Clean, polished text after AI refinement…)
---
## 📋 Original Content
(Raw Whisper transcription text…)
*Auto-generated by Trilium Catalyst v4.10.0*
🏷️ Smart Tagging System
Catalyst automatically applies tags based on the content’s source and processing method:
| Processing Method | Auto-applied Tags |
|---|---|
| Local audio transcription | audio-transcription |
| YouTube subtitle extraction | youtube-subtitle |
| YouTube intelligent fallback | youtube-fallback + audio-transcription |
| YouTube audio transcription | youtube-audio-transcription |
| AI-refined | ai-processed |
You can also define custom global tags in the settings (e.g., content-processing, trilium-ai) — all notes will automatically carry these tags. This lets you quickly filter and search for all Catalyst-generated content in Trilium Notes.
⚡ Process-and-Save
When batch-processing audio files, Catalyst doesn’t wait until every file is done before saving — it saves each file to Trilium the moment its transcription and AI refinement are complete. This means even if your network drops or the browser crashes mid-batch, everything that’s already been processed is safe.
🎛️ Admin Settings — Full Control at Your Fingertips
Catalyst’s settings page (WordPress Admin → Trilium AI → Content Processor) gives you granular control over every feature:
┌────────────────────────────────────────────────────────┐
│ 🎯 Trilium Catalyst Settings │
│ │
│ 🎵 Audio Transcription Settings │
│ ├── Whisper API Endpoint │
│ ├── Transcription Model (Base / Small / Medium / Large V2)│
│ └── Quality Mode (Fast / Balanced / High Quality) │
│ │
│ 📺 YouTube Processing Settings │
│ ├── YouTube API Server Address │
│ └── Default Subtitle Language (Chinese / English / Auto)│
│ │
│ 🤖 AI Refinement Settings │
│ ├── Enable / Disable AI Refinement │
│ ├── AI Provider (Google / OpenAI-compatible / Ollama) │
│ ├── AI Model Name │
│ └── Custom Refinement Prompt │
│ │
│ 📁 Trilium Save Settings │
│ ├── Auto-organize by Year/Month Folders │
│ ├── Custom Note Tags │
│ └── Process-and-Save Mode │
│ │
│ 📊 Integration Status Panel │
│ ├── ✅ Trilium WP Core Plugin │
│ ├── ✅ TriliumAI Chat Plugin │
│ ├── ✅ TriliumAI API Manager │
│ ├── ✅ Whisper API Server │
│ └── ✅ YouTube Media API Server │
│ ├── 📺 Subtitle Extraction: ✅ Available │
│ └── 🎵 Audio Download: ✅ Available │
└────────────────────────────────────────────────────────┘
The integration status panel checks all dependent services in real time, giving you an at-a-glance view of which features are available and which need configuration.
📊 Feature Overview
| Capability | Description |
|---|---|
| 🎵 Local Audio Transcription | Supports WAV, MP3, M4A, FLAC, OGG, AAC, WMA, and WebM; batch processing up to 10 files |
| 🎙️ Live Voice Recording | Browser-based recording with pause/resume, no time limit, dual-mode (chat / archive) |
| 📺 YouTube Subtitle Extraction | Auto-extracts manual + auto-generated subtitles; supports Chinese, English, and auto-detect |
| 🔄 YouTube Intelligent Fallback | Automatically downloads audio → Whisper transcription when subtitle extraction fails |
| 🏢 YouTube Channel Batch Processing | Fetch channel video list, pick and choose, batch smart transcription with real-time progress |
| 🤖 AI-Powered Refinement | Strips filler words, removes repetitions, fixes errors, organizes paragraphs, generates titles |
| 🔌 Multi-model AI Support | Google Gemini · OpenAI-compatible · Ollama local models, powered by Trilium AI Chat’s integration layer |
| 💾 Auto-save to Trilium | Process-and-save, rich metadata, smart tagging, year/month folder organization |
| 🎛️ Whisper Model Selection | Base / Small / Medium / Large V2 — four tiers balancing speed and accuracy |
| 📝 Custom Prompts | Fully customizable AI refinement prompts for precise output control |
| 📱 Responsive Frontend | Smooth experience on both desktop and mobile, with automatic dark mode support |
| ⚡ Async AJAX Processing | No page refreshes, real-time status feedback, 1-hour extended timeout support |
🚀 Getting Started
📋 Prerequisites
- ✅ Trilium WP Core Plugin — Provides connectivity to Trilium Notes
- ✅ Trilium AI Chat Plugin — Provides AI model integration and note-saving capabilities
- ✅ Whisper API Service — e.g., faster-whisper-server, provides audio transcription
- ✅ YouTube Media API Service (optional) — Provides subtitle extraction and audio download capabilities
📥 Installation
1️⃣ Verify dependencies — Make sure Trilium WP and Trilium AI Chat are installed and activated, and that the Trilium Notes ETAPI connection is working properly.
2️⃣ Install Catalyst — Upload trilium-catalyst-4.10.0.zip and activate it. The plugin will automatically register itself under the Trilium AI admin menu.
3️⃣ Configure services — In the “Trilium AI → Content Processor” settings page, enter your Whisper API and YouTube API endpoints, and select your preferred AI model.
4️⃣ Start using it — Add the shortcode to any WordPress page or post:
[trilium_content_processor]
The full Catalyst frontend will appear on the page — complete with “Audio Transcription,” “YouTube Transcription,” and “YouTube Batch Processing” tabs. You’re ready to go.
⚙️ Technical Highlights
🔄 Zero-memory streaming downloads — YouTube audio downloads use cURL streaming writes, piping audio data straight from the network to disk without passing through PHP’s memory buffer. This means even multi-hundred-megabyte audio tracks from long videos won’t cause PHP memory overflow.
⏱️ Timeout-free processing — Frontend AJAX requests use a uniform 1-hour timeout, and backend PHP execution time is set to 2 hours. Combined with Nginx’s fastcgi_read_timeout configuration, this completely eliminates timeout issues when processing long audio and video content.
🔒 Security by design — All AJAX requests use WordPress nonce verification. File download paths go through realpath safety checks to prevent directory traversal attacks. Temporary files are automatically cleaned up after 1 hour.
🧩 Loosely coupled architecture — Subtitle extraction, audio download, Whisper transcription, AI refinement, and Trilium save are all independent, swappable modules. If any single step fails, it won’t affect work already completed by other steps.
🔮 Synergy with the Trilium AI Ecosystem
Notes generated by Trilium Catalyst aren’t isolated — once they’re in Trilium Notes, they become knowledge assets that the entire Trilium AI ecosystem can work with:
🔍 Searchable by AI Agent — With Trilium AI Agent installed, the AI agent can search and read all notes generated by Catalyst. Ask something like “Find that YouTube video about Transformer architecture I transcribed last month,” and the Agent will locate the note in your knowledge base and return the content.
🧱 Orchestrable via workflows — Using Trilium AI Chat’s Gutenberg Block workflow capabilities, you can design AI analysis workflows that perform secondary analysis, summarization, or translation on content collected by Catalyst.
🧠 Accumulative memory — Through OpenClaw’s persistent memory system, the AI agent can remember knowledge content you’ve captured via Catalyst and reference it in future conversations.
🎵 Audio / 📺 YouTube ──► Trilium Catalyst ──► 💾 Trilium Notes
│
┌─────────────────────────┤
▼ ▼
🔍 AI Agent Search 🧱 Workflow Analysis
│ │
▼ ▼
🧠 Memory Accumulation 📊 Deep Insights
💡 Why Choose Trilium Catalyst?
🎯 Focused on content ingestion, refined to perfection — This isn’t a Swiss Army knife that tries to do everything. It’s a purpose-built tool that does one thing exceptionally well: turning sound into text. Intelligent fallback, batch processing, process-and-save — every design decision ensures your content is never lost or overlooked.
🔄 Fully automated pipeline — From audio input to note archival, zero manual work in between. Click a button (or paste a link), grab a coffee, and come back to find your notes neatly organized and saved.
🧠 AI makes content truly usable — Raw speech-to-text output has almost no value for direct reading or searching. The AI refinement layer transforms “machine-generated text streams” into “notes humans can comfortably read” — and that’s the crucial step that turns content into actual knowledge.
🏠 Fully self-hostable — Whisper, the YouTube API service, and Ollama models can all run on your own server. From recording to transcription to AI refinement to saving, no data in the entire pipeline needs to pass through any third-party cloud service.
🔗 Ecosystem synergy — As part of the Trilium AI ecosystem, content captured by Catalyst can be searched by AI Agent, orchestrated through workflows, and accumulated in AI memory. It’s not a standalone tool — it’s one of the gateways into your AI-powered knowledge system.
🧑💻 About
Trilium Catalyst (current version v4.10.0) is developed and maintained by SatoshiWP as a sub-plugin within the Trilium AI plugin ecosystem. Released under the GPL v2+ open-source license.
Dependencies: Trilium WP · Trilium AI Chat
💡 Trilium Catalyst — There’s too much valuable knowledge spoken out loud in this world that simply disappears once heard. Let every talk, every video, every flash of inspiration become a text asset in your knowledge base — searchable and understandable by AI. This isn’t just “speech-to-text.” It’s an automated pipeline from sound to knowledge.