Builds & Breakthroughs #1
Each week, we share what AI models just unlocked and what builders are doing with it.
Build: Peek – Your AI money ops room
Builder: Sherry Jiang
What it does
AI workspace that turns raw finance data into concrete money moves—no spreadsheets.
“Instead of dashboards, we turn their transaction data into behavioral nudges that drive action - delivered through Gen Z-friendly interfaces like reels and chat.”
Breakthroughs
Cheaper tokens + larger context windows kill RAG. Full ledgers now sit in-prompt, and Peek ingests 24 months of transactions on day one.
“Cost drop + larger context window helps us achieve a lot of things. Analyzing 2 years’ worth of transactions in the first 5 minutes of users onboarding onto Peek (after connecting Plaid) was not possible with our version 1 because Gemini wasn’t as good back then. We had to scope down requests to 1-2 months of transaction data. It also means we save cost on infrastructure, since we don’t need things like RAG and everything can be loaded into the context window directly.”
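Roughly what that pattern looks like, as a minimal sketch (not Peek's actual code): the full 24-month ledger is serialized into a single prompt, assuming the @google/generative-ai Node SDK and a hypothetical transactions array pulled from Plaid, so there is no chunking, embedding, or retrieval step to maintain.

```ts
// Sketch: whole-ledger-in-prompt analysis. SDK usage is standard; the Txn shape, prompt, and
// model string are illustrative assumptions, not Peek's actual code.
import { GoogleGenerativeAI } from "@google/generative-ai";

interface Txn { date: string; merchant: string; amount: number; category?: string }

async function analyzeLedger(transactions: Txn[]): Promise<string> {
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
  const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });

  // 24 months of transactions go in directly; no RAG pipeline to build or keep in sync.
  const prompt = [
    "You are a personal-finance analyst. Here are 24 months of transactions as JSON:",
    JSON.stringify(transactions),
    "Identify recurring charges, spending trends, and three concrete money moves.",
  ].join("\n");

  const result = await model.generateContent(prompt);
  return result.response.text();
}
```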
Backend flipped to “thin shell.” The LLM now runs portfolio math and savings logic and drives the UI; every layer is exposed as a callable tool.
“We had to preprocess everything - portfolio math, savings calculations, even basic categorization logic ran on the backend, because the models couldn’t handle it cleanly.”
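The shape of that "thin shell," sketched with OpenAI-style function calling (tool names, schemas, and the stubbed handlers are illustrative, not Peek's actual API): the server only declares what the model may call and dispatches those calls; the analysis itself lives in the model.

```ts
// Thin-shell sketch: the backend exposes data access and UI actions as tools, then dispatches
// whatever the model decides to call. Names and handlers are hypothetical, not Peek's code.
import OpenAI from "openai";

const openai = new OpenAI();

const tools: OpenAI.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_transactions",
      description: "Return the user's transactions for a date range",
      parameters: {
        type: "object",
        properties: { from: { type: "string" }, to: { type: "string" } },
        required: ["from", "to"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "show_savings_card",
      description: "Render a savings suggestion card in the app UI",
      parameters: {
        type: "object",
        properties: { title: { type: "string" }, monthlyAmount: { type: "number" } },
        required: ["title", "monthlyAmount"],
      },
    },
  },
];

// Stubbed handlers: in a real app these would hit the database and push a UI event.
const handlers: Record<string, (args: any) => Promise<unknown>> = {
  get_transactions: async ({ from, to }) => [{ date: from, merchant: "example", amount: 0 }],
  show_savings_card: async (card) => ({ rendered: true, card }),
};

async function step(messages: OpenAI.ChatCompletionMessageParam[]) {
  const res = await openai.chat.completions.create({ model: "gpt-4o-mini", messages, tools });
  const call = res.choices[0].message.tool_calls?.[0];
  if (call) return handlers[call.function.name](JSON.parse(call.function.arguments));
  return res.choices[0].message.content;
}
```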
Groq’s Llama 8B cleans thousands of descriptions in under 10s and answers chats in under 5s.
“Users are impatient - they are expecting <5 second chat responses + <5 min total time to finish a task. There’s something to be said about UX when users can quickly get analysis done across large data sets. Newer models like Gemini help us achieve that.”
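A hedged sketch of the speed path, assuming the groq-sdk npm client (the prompt, output format, and batch size are our illustration): raw descriptions go out in batched calls to llama-3.1-8b-instant and come back as clean categories.

```ts
// Sketch: fast cleanup/categorization of raw bank descriptions on Groq's llama-3.1-8b-instant.
// Prompt wording and batching strategy are illustrative assumptions.
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

async function categorize(descriptions: string[]): Promise<string[]> {
  const completion = await groq.chat.completions.create({
    model: "llama-3.1-8b-instant",
    messages: [
      {
        role: "user",
        content:
          "Normalize each raw bank description below to a clean merchant name and category. " +
          "Return a JSON array with one entry per input line:\n" +
          descriptions.join("\n"),
      },
    ],
  });
  return JSON.parse(completion.choices[0].message.content ?? "[]");
}

// Usage: split the ledger into a few hundred descriptions per call and run the calls in parallel.
```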
Stack
Models: Gemini 2.5 Pro, GPT-4o-mini, Llama-3.1-8B-Instant (Groq)
Memory: Zep long-term store injected on demand
Infra: Next.js on Vercel, serverless functions, Plaid & Alpaca APIs
Architecture: Backend = thin controller.
Still janky
Native, in-model memory is thin; external stores add fragility.
"Native memory support from model providers is still lacking. Right now we use external memory services (like Zep) and manually inject context - adds overhead, increases fragility, and creates a single point of failure.”
TypeScript agent tooling & eval libs trail Python.
"Too much of the good tooling lives in Python. understandable - ML engineers live there. but the web runs on TypeScript. if you're building agentic UX or real-time LLM interactions, you're stuck reinventing wheels or writing your own wrappers.”
Finance-specific evals are home-grown (catching hallucinated CAGR or bogus advice).
"Eval infra for financial correctness is almost nonexistent. We're hacking together our own tools to catch hallucinations in money-specific domains (e.g., miscalculated CAGR, incorrect portfolio suggestions). not ideal.”
Unlocks
Data-to-Action agents: Instead of dashboards, the agent pushes prescriptive steps (“cancel X”, “invest Y”).
Full-dataset in-prompt: Cheap tokens + big windows let you drop RAG and feed entire ledgers (or logs, telemetry, CRM history) straight to the model.
Thin-shell backends: Off-load domain math (CAGR, cohort LTV) to the LLM; server code becomes a lightweight controller.
Promptable UI surfaces: Every screen is exposed as a callable tool, so the model can edit state directly—no middle-layer glue.
Try it: Peek
Build: Metrica – Your AI Endurance Coach
Builder: Max Grigoryev
What it does
Adaptive training + nutrition plans for swim, bike, run, and strength—rebalanced daily from your performance.
“It doesn’t just build a training plan — it adapts in real time to how you feel.”
Breakthroughs
Embedded sports-science expertise: GPT-4.1 understands endurance, HRV, and recovery at coach level.
“New OpenAI GPT-4.1 model contains a lot of research data around endurance in general, heart rate behavior, and recovery, and it can align the plans based on individual factors.”
Long-horizon planning: Larger context windows enable 12-week periodization without prompt chaos.
“Previous versions were good, but not as good as the current one at long-term planning.”
Full log in context: Cheaper tokens let Metrica load an athlete’s entire training history on day one.
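Roughly what long-horizon planning over the full log can look like, sketched against an OpenAI-style client (the Session shape, prompt wording, and helper are our illustration, not Metrica's code): the complete history goes into one prompt and the model returns a 12-week periodized plan.

```ts
// Sketch: 12-week periodized plan generated from the athlete's entire training history in one
// prompt. The Session type, prompt, and model call are illustrative, not Metrica's code.
import OpenAI from "openai";

interface Session {
  date: string;
  sport: "swim" | "bike" | "run" | "strength";
  durationMin: number;
  avgHr?: number;
  rpe?: number;
}

const openai = new OpenAI();

async function buildTwelveWeekPlan(history: Session[], goal: string) {
  const prompt = [
    `Athlete goal: ${goal}`,
    `Complete training history (${history.length} sessions), oldest first:`,
    JSON.stringify(history), // cheap tokens + long context: no summarizing or retrieval pass
    "Produce a 12-week periodized plan (base / build / peak / taper) as JSON, one object per week.",
  ].join("\n");

  const res = await openai.chat.completions.create({
    model: "gpt-4.1",
    messages: [{ role: "user", content: prompt }],
  });
  return res.choices[0].message.content;
}
```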
Stack
Infra: Supabase + Vercel
Models: Claude 3.5 Sonnet, GPT-4.1 (primary for training logic), Gemini 2.5 for coding
Coming soon: OpenAI Codex for dev assist, Whoop + Garmin integration (pending API access)
Still janky
Long-term memory & progression tuning still brittle.
“Generally speaking - context windows, but it improved with Sonnet 4, and we’re also going to try OpenAI’s Codex, which would address some of the coding challenges.”
Needs full wearable data for deeper personalization.
Unlocks
Embedded expertise: GPT-4.1 can simulate coach-level domain knowledge that previously had to be off-loaded to backend code.
Dynamic re-tuning: AI-generated plans can re-tune to subjective states (e.g. fatigue, motivation); see the sketch after this list.
Whole-history context: Long-horizon planning is now viable with richer embedded knowledge + larger context windows.
Pro-grade guidance at consumer latency: Sub-second replies turn “expert report” into an interactive advisor.
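And the dynamic re-tuning unlock, sketched as a daily adjustment call that folds a morning check-in into the planned session (the CheckIn shape, system prompt, and helper names are our assumptions):

```ts
// Sketch of daily re-tuning: today's planned session plus a subjective check-in (and HRV, when
// a wearable is connected) go back to the model for an adjusted session. Names are illustrative.
import OpenAI from "openai";

interface CheckIn {
  fatigue: 1 | 2 | 3 | 4 | 5;     // 5 = exhausted
  motivation: 1 | 2 | 3 | 4 | 5;  // 5 = fired up
  hrvMs?: number;                 // morning HRV, if available
}

async function adjustTodaysSession(openai: OpenAI, plannedSession: string, checkIn: CheckIn) {
  const res = await openai.chat.completions.create({
    model: "gpt-4.1",
    messages: [
      { role: "system", content: "You are an endurance coach. Keep the week's overall load target intact." },
      {
        role: "user",
        content:
          `Planned session: ${plannedSession}\n` +
          `Today's check-in: ${JSON.stringify(checkIn)}\n` +
          "Adjust intensity or duration if fatigue is high or HRV is suppressed, and explain the change in one sentence.",
      },
    ],
  });
  return res.choices[0].message.content;
}
```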
Try it: Metrica
Build: Nelima – Your Agentic Personal Assistant
Builder: James Kachamila
What it does
Full agent that chains scraping, scheduling, and editing across tools like a power VA.
“She pulls KPIs from your database, spins up cold email campaigns, chains tasks across tools, and just gets it done — all from a prompt.”
Breakthroughs
Human-paced interaction: GPT-4o’s sub-second reasoning and voice/vision I/O finally make agentic workflows feel like working with a human teammate.
Custom SWARM: A SWARM router hands each sub-task to the best model (Claude 3.5, Gemini 2.5, or a fine-tuned LLaMA 3), boosting accuracy while reducing cost; see the routing sketch below.
“The combo of GPT-4o’s real-time reasoning, memory, and multi-modal input/output made it finally viable to run agentic workflows that feel human-like, responsive, and somewhat reliable at scale. Note that we do use other models as well.”
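A toy version of that routing table (task kinds and model choices are illustrative, not Nelima's actual configuration): each sub-task kind maps to the model that handles it best, with a general-purpose default.

```ts
// Sketch of a SWARM-style router: pick a model per sub-task, falling back to a default.
// The table below is illustrative, not Nelima's actual routing logic.
type TaskKind = "scrape" | "summarize" | "code" | "chat";

const MODEL_FOR_TASK: Record<TaskKind, string> = {
  scrape: "llama-3-8b-finetune",   // cheap fine-tuned extractor for structured scraping output
  summarize: "claude-3-5-sonnet",  // long documents and reports
  code: "gemini-2.5-pro",          // code generation and edits
  chat: "gpt-4o",                  // low-latency, multimodal front line
};

function modelForTask(kind: TaskKind): string {
  return MODEL_FOR_TASK[kind] ?? "gpt-4o";
}

console.log(modelForTask("scrape")); // "llama-3-8b-finetune"
```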
Stack
Node.js backend + MongoDB (w/ vector store), Docker
Deployed across Azure + GCP (Cloud Storage + Compute)
BrightData for proxy routing, SerpAPI for scraping, µWebSockets for real-time comms
AI orchestration: GPT-4o, Claude 3.5, Gemini 2.5, LLaMA 3 (QLoRA), Voyage AI for embedding search
Still janky
Long-chain drift: The agent still forgets context and breaks on extended workflows.
Needs: durable memory, stronger tool-calling APIs, and models that honor system prompts even in huge context windows.
“We still run into friction with long-term memory and cross-task continuity, especially when chaining very complex actions over time. Many AI models lack persistent state or structured memory, which limits reliability in complex workflows.”
Unlocks
Human-paced multimodal execution: sub-second voice, vision, and text I/O, enabling continuous, conversational task chains.
Model-specialist orchestration: a SWARM router assigns each sub-task to GPT-4o, Claude 3.5, Gemini 2.5, or a fine-tuned LLaMA 3, balancing speed, quality, and cost.
Tool-chain autonomy: structured function calls let the agent scrape data, update databases, schedule events, and edit documents without manual glue code.
Infra-level reliability: retries, fallbacks, and persistent state maintain context across long, multi-step workflows.
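The last two unlocks combined, as a rough sketch (OpenAI-style tool calling wrapped in a retry loop; the tool, model choice, and retry policy are our assumptions, not Nelima's implementation):

```ts
// Sketch: one step of an agent loop. Ask the model, dispatch any tool calls, retry on transient
// failures. Tool names, model, and retry policy are illustrative.
import OpenAI from "openai";

const openai = new OpenAI();

const tools: OpenAI.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "scrape_page",
      description: "Fetch a URL and return its main text content",
      parameters: {
        type: "object",
        properties: { url: { type: "string" } },
        required: ["url"],
      },
    },
  },
];

async function runStep(messages: OpenAI.ChatCompletionMessageParam[], maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const res = await openai.chat.completions.create({ model: "gpt-4o", messages, tools });
      const msg = res.choices[0].message;

      if (msg.tool_calls?.length) {
        // Dispatch each call to its real handler (scraper, database, calendar), append the
        // results as tool messages, and loop again so the model can continue the chain.
      }
      return msg;
    } catch (err) {
      if (attempt === maxRetries) throw err; // out of retries: fall back or surface the failure
    }
  }
}
```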
Demo: Watch here
Try it: sellagen.com/nelima
X: @nelima_ai
Reddit: use case post
Your Turn
Built something on the frontier recently? Hit reply with the details. We’ll feature the best one next edition.