Patrick John Kelly

The Human-in-the-Loop Platform

In November 2024, I launched a human-in-the-loop platform for AI automations. The idea was simple: AI workflows in tools like Make and Zapier produce unpredictable output, and businesses need a way to review and approve that output before it goes live.

The product was a hosted web app. Your automation would send an API request with an arbitrary JSON payload - a draft email, a product listing, a customer response, anything. The platform returned a UUID link to a review interface that rendered your data in a clean, editable UI. A human could review the content, edit fields, and hit approve or deny. The decision then went back to your automation, which had been waiting on it, and execution continued.

No predefined schemas. No configuration. Whatever JSON you sent, we’d render it as a reviewable form. That flexibility was the whole point - Make and Zapier agencies build hundreds of different automations, and a rigid approval UI would’ve been useless to them.
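Schema-less rendering roughly means flattening whatever JSON arrives into field descriptors the UI can turn into inputs. A hedged sketch of that idea (the function name, the `(path, input_type, value)` tuple shape, and the type mapping are my assumptions, not the platform's renderer):

```python
def json_to_fields(data, path=""):
    """Flatten arbitrary JSON into (path, input_type, value) form fields.

    Illustrative sketch: nested objects and lists recurse into dotted /
    indexed paths; leaf values map to a plausible input widget type.
    """
    fields = []
    if isinstance(data, dict):
        for key, value in data.items():
            fields.extend(json_to_fields(value, f"{path}.{key}" if path else key))
    elif isinstance(data, list):
        for i, item in enumerate(data):
            fields.extend(json_to_fields(item, f"{path}[{i}]"))
    elif isinstance(data, bool):  # checked before int: bool is an int subclass
        fields.append((path, "checkbox", data))
    elif isinstance(data, (int, float)):
        fields.append((path, "number", data))
    else:
        fields.append((path, "text", data))
    return fields

fields = json_to_fields({"title": "Blue mug", "price": 12.5, "tags": ["home", "kitchen"]})
```

The point of the path-based flattening is that edits made in the form can be written back to the exact spot in the original payload, so the automation receives the same structure it sent.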

Traction

First paid user within 30 days of launch. Make.com reached out and asked us to build a native module for their platform - they wanted human-in-the-loop as a first-class step in their automation builder. We were booking sales calls every day of the week from agencies that were building AI automations for their clients and needed a way to keep humans in the loop before anything customer-facing went out.

The product-market fit signal was clear. Agencies were already solving this problem with ugly workarounds - Slack messages with approve/deny buttons, email-based reviews, manual copy-paste steps. A purpose-built approval layer that worked with any data structure was obviously better.

The insight

The conversations I had during those early weeks shaped how I think about AI systems.

Every agency I talked to had the same underlying concern: their clients didn’t trust AI output enough to let it run autonomously. Not because the AI was always wrong - but because when it was wrong, the consequences were real. A bad product description goes live on Shopify. A poorly worded customer email gets sent. A social post goes out with the wrong tone.

The standard response in the AI industry is to treat human oversight as a bottleneck to be eliminated. Make the model better, add more guardrails, automate the approval away. But that framing misses what’s actually happening in most businesses.

For the agencies I talked to, human review wasn’t slowing them down - it was the feature that made AI adoption possible in the first place. Their clients would never have agreed to AI-generated content going straight to production. The approval step wasn’t friction. It was the trust layer that let them use AI at all.

The core insight: for most businesses, quality matters more than speed. A 30-second human review that catches one bad output per week is worth more than saving 30 seconds per automation run. The math isn’t even close.
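To make that trade concrete, here is the back-of-envelope version. Every number is an illustrative assumption - run volume, review time, and cleanup cost vary wildly by business:

```python
# Assumed weekly numbers for a single automation (illustrative only).
runs_per_week = 200          # automation executions
review_seconds = 30          # human review time per run
bad_outputs_caught = 1       # bad outputs caught per week
cleanup_hours_per_bad = 3    # assumed cost to fix one mistake after it goes live

review_cost_hours = runs_per_week * review_seconds / 3600
avoided_cost_hours = bad_outputs_caught * cleanup_hours_per_bad
```

Under these assumptions the review layer costs under two hours a week and avoids three - and that is before counting the reputational cost of a bad email or listing reaching a customer, which is the part clients actually cared about.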

Why I moved on

I didn’t stop because the product wasn’t working - the traction was real, and we shipped the Make.com integration. I stopped because I realized human-in-the-loop is a feature, not a product. The real value was as a native capability inside automation platforms like Make and Zapier, not as a standalone SaaS.

That realization, combined with the practical reality of maintaining Frontly’s entire business as a solo founder while trying to commercialize a separate product with a 6-12 month enterprise sales cycle, made the decision clear. The right move was to stop building the standalone product and let the platforms build this themselves - which, increasingly, they are.

What stayed with me

Building the HITL platform taught me that the most interesting problems in AI aren’t model quality problems. They’re systems problems. How do you integrate AI into existing workflows without breaking trust? How do you give humans meaningful oversight without creating so much friction that nobody uses the automation? How do you handle the fact that every business has different data structures, different approval criteria, different risk tolerances?

These are engineering problems, not research problems. And they’re the problems that determine whether AI actually gets adopted in the real world - not benchmark scores, not parameter counts, not capability demos.