TL;DR: I built a Chrome extension that intelligently surfaces your most relevant notes while you browse any website. Give Klue a try on the Chrome Web Store.
I've been a power user of many note-taking apps over the years. Joplin, Obsidian, Apple Notes, Notion, OneNote, OneTab, etc — honestly, I lost count. The pattern was always the same: save something that feels important, then never see it again. Or stumble across it six months later with zero memory of why I saved it.
What bothered me wasn't the apps themselves; they're all well-built. It was the moment of friction. I'm deep into a report about UK sub-prime markets, and somewhere in my brain there's a note about 2020 sub-primes I saved months ago. But finding it means stopping everything, opening the app, guessing which tag past-me used (was it something generic like #finance? something specific like #primes?), and scrolling through pages of stuff. By the time I find it, the connection I was chasing is gone.
Lately I'd been reading about vector search making this kind of "meaning-based" retrieval actually feasible. Transformers.js has been able to run embedding models in the browser for years now. And with the carnival of APIs out there, LLMs no longer require a PhD to set up. I wanted to test whether I could build something that just worked, so I gave myself a weekend to find out.
Breaking Down the Problem
Before writing any code, I tried to figure out who actually has this problem and what everyone else is building. Turns out: a lot of people, and a lot of tools.
The Market Split
I had spent a few days beforehand looking at what's out there. The landscape basically splits two ways:
Storage-focused tools like Obsidian, Notion and Evernote let you build elaborate systems. You can customise everything, which is powerful, but I found myself spending more time organising than actually using my notes. Obsidian's web clipper asks you to pick a vault, set template properties, add web page properties, all before you can save anything. It's very flexible. Also very slow when you just want to clip something quick.
AI-native tools like Readwise get the synthesis part. Readwise's AI integration is clever. You can chat with your articles. Popular tools also now auto-summarise YouTube videos. But everything routes through cloud LLMs, which makes sense for their model but felt uncomfortable for research notes or anything work-related. Your queries, your data, all going to OpenAI's servers.
The gap I saw: Everything treats clipping like a library. You go to the tool to find things. Nothing proactively surfaces "hey, you have 3 notes related to what you're reading right now." The intelligence is reactive, not ambient.
The User Personas
I developed three archetypes, not for documentation but as decision filters for myself, based on workflows I'd recorded myself doing. Every time I was about to build something, I'd ask: does this actually help one of these people?
1. Deep Diver Researcher — Academic or journalist working on long-form synthesis. Pain: Loses the source of quotes. Can't find that one paragraph they swear they saved. The test: Can they write a synthesis paper citing 50 pages using only this extension?
2. Hobby Programmer — Developer researching solutions to specific bugs or implementations. Pain: Saves code snippets with explanations, then can't find them three months later when hitting the same bug. The test: Can they solve problems from their own notes faster than re-Googling or asking AI?
3. Avid Hiker — Likes to build mood boards and reference hiking routes to plan for. Pain: Screenshots pile up in Downloads. Pinterest is too public. Text-only clippers don't help. The test: Can they design a hiking trip itinerary using only the #hiking tag?
These became the lens for every decision. Ghost tags (which we'll cover later) shipped because the Deep Diver needed serendipity and the Hobby Programmer needed precision — they solved for both. Live context in chat shipped because all three personas needed to compose context dynamically without starting over.
The Setup
I had a weekend to prototype so I structured my AI-assisted workflows like a pod team, except the team was LLM agents with distinct roles:
- Me (PM): Making product decisions, writing specs, filtering everything through the three personas
- GEMINI.md (Orchestrator): Reading Linear issues, exploring the codebase, writing technical specs, reviewing code
- CLAUDE.md (Engineer): Reading specs, implementing with TDD, checking off boxes, committing code for review
- Advisory skills: Principal Designer persona for UX critiques, ML Principal Engineer for architecture advice
The Workflow:
I initially considered using one agent for everything, but that would have meant settling for lower odds of one-shotting features with okay architecture and okay code. Claude is token-heavy when it thinks and writes code, and quickly runs into the 5-hour usage limits. Gemini has a large 1 million token context window, which is useful for managing larger thinking tasks, and more forgiving usage limits.
Design System
Before building anything, I needed to answer: what outcome do I need, what should this feel like for the user, how do I keep the momentum of browsing the web without being intrusive?
Design Philosophy:
I really liked Linear's "intent-driven" philosophy: users don't "view lists," they "act on intents." So not "Bookmarks," but "Read Later." Not "Tags," but "Research Stack."
Why It Fit the Personas:
- Hobby Programmer: Keyboard shortcuts everywhere, compact UI, fast navigation
- Deep Diver: Dense information display, clear visual hierarchy for scanning 50+ notes
- Avid Hiker: Subtle animations, unobtrusive
Everything shipped with full design tokens documented in DESIGN_SYSTEM.md which helped make the codebase maintainable.
Solution Architecture
The AI Harness
The personas needed some machine learning infra to help them answer "what connects these notes?". I evaluated three options.
Option 1: Gemini Nano (Local)
Stable Chrome now ships with built-in on-device AI. It's privacy-first, near zero latency, and completely offline after the first prompt. Sadly, it requires at least 22GB of free storage plus 16GB of RAM or a GPU with 4GB of VRAM, which my laptop barely met. Most of the addressable market doesn't have these specs, and I can't build for hardware most people don't own.
Option 2: Google AI APIs (Early Preview)
Google's preview AI APIs looked promising and had better hardware requirements than Nano but the API surface changed weekly — locking the extension into Google's ecosystem with experimental APIs meant breaking changes every month.
Option 3: Provider Abstraction
Relying on a single provider also meant a single point of failure, so I built an abstraction layer (popularly called an AI harness these days) with pluggable providers. I settled on OpenRouter's free model router for v1. This keeps AI features free, with a BYOK (bring your own key) option, and still keeps the door open for Gemini Nano when the ecosystem matures.
```javascript
class AIHarness {
  async initialize(providerName = 'openrouter') { /* load and warm up the provider */ }
  async sendMessage(text, context, onChunk, onComplete) { /* delegate to the active provider */ }
}
```
Providers live in ai-harness/providers/: openrouter.js (shipped in v1), gemini.js (placeholder), claude.js (placeholder). Each provider normalises messages, handles streaming, implements fallback logic. The chat UI doesn't know which provider it's talking to.
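To make the pluggable-provider idea concrete, here's a minimal sketch of the pattern. Everything below (the `MockProvider` class, the `register` method, the fallback loop) is illustrative, not the actual Klue source: the point is that the harness tries the preferred provider first and falls back to any other registered one on failure, while the caller only ever sees the streaming `onChunk` callback.

```javascript
// Illustrative sketch of a provider-abstraction harness with fallback.
// Names and shapes are assumptions, not the real Klue implementation.

class MockProvider {
  constructor(name, { failing = false } = {}) {
    this.name = name;
    this.failing = failing;
  }
  // Streams the reply word-by-word via onChunk, then resolves with the full text.
  async sendMessage(text, context, onChunk) {
    if (this.failing) throw new Error(`${this.name} unavailable`);
    const reply = `[${this.name}] echo: ${text}`;
    for (const word of reply.split(' ')) onChunk(word + ' ');
    return reply;
  }
}

class AIHarness {
  constructor() { this.providers = new Map(); }
  register(provider) { this.providers.set(provider.name, provider); }

  // Try the preferred provider first, then any other registered provider.
  async sendMessage(text, context, onChunk, preferred = 'openrouter') {
    const order = [preferred, ...[...this.providers.keys()].filter(n => n !== preferred)];
    let lastError;
    for (const name of order) {
      const provider = this.providers.get(name);
      if (!provider) continue;
      try {
        return await provider.sendMessage(text, context, onChunk);
      } catch (err) { lastError = err; }
    }
    throw lastError ?? new Error('no providers registered');
  }
}
```

Because the fallback lives in the harness, the chat UI stays oblivious: if OpenRouter errors out, the next registered provider answers the same call.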
With AI tooling maturing rapidly in 2025, I was itching to test eval frameworks, token-usage tracking, concept-drift monitoring, and orchestration for multi-step agentic loops. But I had to keep it simple: observability is for scale, and v1 is about validating the core loop first.
Local RAG Stack
To actively surface relevant notes based on context, I couldn't simply send the current page and note contents to an LLM on every user action: it's both computationally expensive and slow. The Deep Diver researcher also can't send proprietary work to a cloud LLM.
I needed a local-first, zero-API-calls approach for embeddings. Privacy by architecture, not by promise. Your notes should never leave your machine unless you explicitly use chat.
Embedding and Clustering
I settled on the classic all-MiniLM-L6-v2 sentence transformer. It turns text into a numerical 'fingerprint' that represents its meaning. Larger model variants (768-dim) were more accurate but 4x bigger. Smaller models (128-dim) were faster but felt too imprecise. 384-dim hit the sweet spot.
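For a sense of what "fingerprint" comparison means in practice: with Transformers.js the model call looks roughly like the commented lines below (real library, but treat the exact options as approximate), and comparing two embeddings is just cosine similarity, implemented here as plain JavaScript.

```javascript
// With Transformers.js, loading and running the model looks roughly like:
//   import { pipeline } from '@xenova/transformers';
//   const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
//   const { data } = await embed(text, { pooling: 'mean', normalize: true });
// The result is a 384-dimensional vector. Two vectors are compared with
// cosine similarity: 1 means "same meaning", 0 means unrelated.

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Since the model can emit L2-normalised vectors, the division is redundant in that case and the score reduces to a dot product, which is what makes scanning hundreds of notes cheap.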
Search
For search, I tried a simple similarity search (raw cosine similarity), but it couldn't keep up with 1,000+ notes. Then I looked into FAISS (via WASM), but it was too bulky to load quickly in a browser. I discovered Orama, which gave hybrid search (text + vector) with a tiny footprint and IndexedDB persistence out of the box.
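What "hybrid" search means is worth unpacking. Orama handles this internally; the toy ranker below is my own greatly simplified illustration (not Orama's API or Klue's code) of the core idea: blend a lexical keyword-match score with a vector-similarity score, weighted by a tunable `alpha`.

```javascript
// Toy illustration of hybrid ranking: blend keyword overlap with vector
// similarity. Purely illustrative; Orama implements this far more robustly.

function keywordScore(query, text) {
  const terms = query.toLowerCase().split(/\s+/);
  const words = new Set(text.toLowerCase().split(/\s+/));
  const hits = terms.filter(t => words.has(t)).length;
  return terms.length ? hits / terms.length : 0;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// alpha = 1 -> purely semantic; alpha = 0 -> purely lexical.
function hybridRank(query, queryVec, notes, { alpha = 0.5 } = {}) {
  return notes
    .map(note => ({
      note,
      score: alpha * cosine(queryVec, note.embedding) +
             (1 - alpha) * keywordScore(query, note.text),
    }))
    .sort((a, b) => b.score - a.score);
}
```

The blend is what lets a query like "loans" surface a note that says "sub-primes" (semantic side) without losing exact-term matches (lexical side).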
I implemented a queue to process one note at a time, because initial versions let embedding requests run concurrently, which crashed the Chrome service worker after indexing about 50 notes. The queue processes notes more slowly, but it's stable for v1.
The final stack:
- Transformers.js: runs `Xenova/all-MiniLM-L6-v2` in the browser (384-dimensional embeddings, 25MB model)
- Orama: In-memory vector database with hybrid search (keyword + semantic)
- IndexedDB: Persistence layer (bypasses chrome.storage's 5MB limit)
- Service Worker: Sequential task queue to prevent memory spikes
UX: The Context Surfacing Problem
While setting up the backend, I was also trying to address the core UX problem: how do users tell the app what context to use? Took a few tries to get this right.
Iteration 1: Context Pills
My first attempt had fragmented context management: a header pill showed "Related Notes," a dropdown hid tag filters, and chat had no explicit context control. Users couldn't compose specific context like "analyse This Page + #research notes."
Iteration 2: Stack Unification
I killed the header pill and moved everything into "Stack" chips — a single source of truth for context. I used semantic search for suggestion pills instead. When viewing a page with no saved notes, we extract the page's title and text, generate embeddings, query the vector DB for similar notes, and then pull their tags.
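That suggestion-pill flow (ghost tags) can be sketched as a small function: embed the page, find the nearest saved notes above a similarity floor, and surface their tags ranked by how often they occur. The function name, thresholds, and note shape below are illustrative assumptions, not Klue's actual code.

```javascript
// Illustrative ghost-tag suggester: given the current page's embedding and
// the saved notes (each with an embedding and tags), return candidate tags
// from the most similar notes, most frequent first.

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function suggestGhostTags(pageEmbedding, notes, { topK = 5, minSim = 0.3 } = {}) {
  const nearest = notes
    .map(n => ({ tags: n.tags, sim: cosine(pageEmbedding, n.embedding) }))
    .filter(n => n.sim >= minSim)      // drop unrelated notes entirely
    .sort((a, b) => b.sim - a.sim)
    .slice(0, topK);

  // Count tag occurrences across the nearest notes.
  const counts = new Map();
  for (const { tags } of nearest)
    for (const tag of tags) counts.set(tag, (counts.get(tag) || 0) + 1);

  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([tag]) => tag);
}
```

The `minSim` floor is what keeps a hiking page from suggesting your #finance tags, though as noted later, the occasional nonsense suggestion still slips through.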
The strategy:
- Stack = one state driving both Library filtering AND chat context
- Chips for everything: "This Page," `#tags`, "Starred," "Read Later"
- AND logic: Stack filters + Search
- Live context: Changing filters updates the next chat message, doesn't reset the conversation
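The "one state driving both" idea can be sketched as follows. The stack shape, field names, and helper functions here are illustrative assumptions, not Klue's source: the same stack object filters the Library and builds the context string injected into the next chat message, so toggling a chip changes what the model sees without resetting the conversation.

```javascript
// Illustrative "Stack as single source of truth": one state object drives
// both Library filtering and the context injected into the next chat message.

function applyStack(stack, notes, searchTerm = '') {
  return notes.filter(note => {
    const tagOk = stack.tags.every(t => note.tags.includes(t)); // AND logic
    const starOk = !stack.starredOnly || note.starred;
    const termOk = !searchTerm ||
      note.text.toLowerCase().includes(searchTerm.toLowerCase());
    return tagOk && starOk && termOk;
  });
}

// Rebuilt from the current stack on every send, so changing a chip updates
// the NEXT message without touching conversation history.
function buildChatContext(stack, notes, pageText) {
  const parts = applyStack(stack, notes).map(n => n.text);
  if (stack.thisPage && pageText) parts.unshift(pageText);
  return parts.join('\n---\n');
}
```

Deriving the chat context from the stack on each send, rather than snapshotting it at conversation start, is what makes the "live context" behaviour fall out for free.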
Users today treat classic chat UIs as a workspace. They want to bring new tools into the conversation, not start over.
Iteration 3: Stack Context Bar
I refined the stack with a fixed 'This Page' button, a new pop-up context menu for more filters and sorting, and a scrollable horizontal bar. The tags appear as light grey chips in the Stack. Click one, it becomes an active filter, surfacing the related notes you forgot you had.
If Stack is active and search returns nothing, we show a helper: "Searching in context. Search all notes?" One click clears Stack filters but keeps the search term.
What Shipped
Shipped v1.0.0 to Chrome Web Store over the weekend.
Core Features:
- Chrome side panel with Library + Chat views
- Semantic search: vector embeddings, hybrid with keyword search via Orama
- Basic note creation and editing tools
- AI chat with smart context injection (Stack filters + "This Page" content)
- Smart metadata extraction (title, description, auto-tags from content)
- Ghost tags (semantic suggestions from similar notes)
- Linear-inspired design system (color tokens, spacing scales, full motion system)
- Privacy-first architecture (local embeddings, only chat hits OpenRouter)
Is it perfect? No. The image handling is basic. The "This Page" filter sometimes gets confused on sites with complex URL parameters. Ghost tags occasionally suggest connections that make zero sense. But the core loop works: I can find that 2020 sub-primes note by typing "loans" into search. That's the thing I needed to work.
The Tests:
- Deep Diver: Can cite old notes without remembering exact keywords. ✓
- Hobby Programmer: Solves problems from personal solution library faster than re-Googling. ✓
- Avid Hiker: Images and metadata save, but no gallery view yet. Partial.
I've been using it daily since shipping. That's the real test for me.
What's next?
A shipped product isn't a launched product. I still need to convince people to try this and bring their challenges. There's more feedback to capture from real users, and more testing to refine the experience. The first three personas were educated guesses; now I'll have actual data.
If you found what I'm trying to build interesting, give Klue a try on the Chrome Web Store.