Recce | Blog

Introducing Data Renegades: The Podcast for the Real Stories Behind Data Tools

Written by Dori Wilson | Dec 12, 2025 4:22:47 AM

The best conversations about data infrastructure tend to happen in private: at conferences, over drinks, or in DMs. Someone will tell you why they actually built the thing, what broke, and what they'd never do again. Those conversations shape how we think, but they rarely reach a wider audience.

We wanted to make these conversations public, so we made a podcast.

Data Renegades is a podcast featuring the engineers behind tools like Airflow, Django, Datasette, and Flink. It shares the unfiltered stories of how those tools were built.

The Real Stories, Not the Conference Talks

The premise: sit down with the engineers who built the tools data teams use every day and get the real story.

"Out of 100 engineers at Lyft, maybe three could actually use Flink without hand-holding." - Micah Wylde

The version where Max Beauchemin admits he'd murder someone if he had to write another line of SQL. The version where Simon Willison explains how Bellingcat used his software to track Russian intelligence agents through their late-night food delivery orders.

That stuff rarely gets written down. We wanted to capture it before it disappears.

Meet Your Hosts

CL Kao, Founder & CEO of Recce: Built version control systems before Git existed. In 2012, started g0v, a civic tech community in Taiwan that grew to 13,000 people digitizing government data. One project got 10,000 strangers to transcribe campaign finance records in 24 hours, exposing politicians who used shadow companies to skirt donation limits. He's been thinking about how tools reshape workflows for 20 years.

Dori Wilson, Head of Data & Growth at Recce: Started in economics research at the Federal Reserve Bank of San Francisco, then moved into data roles at Uber, Mux, and startups. Her questions are less about architecture and more about teams: How do you prove value when you're seen as a cost center? How do you get stakeholders to trust the numbers?

The Episodes

Episode 1: Introducing Data Renegades

CL and Dori kick things off with the podcast's origin, the g0v story, Software 3.0, and a core thesis:

"Code review is becoming data review. When your software invokes LLMs, you stop critiquing prompts and start evaluating input-output pairs. It's a data problem now." - CL Kao

Listen if you're thinking about: How agents change developer workflows, what happens when software becomes non-deterministic, and why data engineering mindsets matter for AI-driven development.

Episode 2: Data Journalism Unleashed with Simon Willison

Simon co-created Django at a newspaper in Kansas. He later built Datasette to make publishing structured data as simple as publishing a blog post. The Wall Street Journal uses it for CEO compensation tracking. Bellingcat used it to identify FSB agents from leaked food delivery records.

He runs Friday office hours where anyone can book 25 minutes to show him what they're building, and he thinks most data practitioners will be using coding agents for cleanup work within a year.

Listen if you care about: Data journalism, documentation culture, LLM applications for data extraction and enrichment, and fixing your team's broken wiki.

Episode 3: Building Tools that Shape Data with Max Beauchemin

Max created Airflow and Superset at Airbnb. Drove around Silicon Valley pitching Airflow to 50+ companies before it reached Apache. He started Preset to take a hammer to proprietary BI. These days he spends most of his time in Claude Code, building projects he claims would take a 12-person team a year.

"Analytics engineering faces higher AI disruption risk than software engineering. If you're not using agents daily, you're falling behind." - Max Beauchemin

Listen if you care about: Open source community building, functional data engineering, entity-centric data modeling, and what 20x productivity with AI actually looks like.

Episode 4: Streaming Made Practical with Micah Wylde

Micah built real-time fraud systems at Sift Science, ran streaming infrastructure at Lyft, and created Arroyo after realizing Flink's complexity was a product problem. Arroyo was his attempt to start over with SQL as a first-class citizen.

He's now at Cloudflare building data infrastructure on object storage. His prediction: in five years, giving all your data to one cloud vendor will look insane. Also, someone needs to replace Debezium—the Java ecosystem is too heavy for what should be lightweight data movement.

Listen if you care about: Real-time streaming, Apache Flink alternatives, CDC pain, escaping cloud lock-in, and positioning data teams as product teams.

Subscribe Now

First four episodes are live. New episodes drop every two weeks. Listen at:

Feedback or Have a Guest in Mind?

We're looking for the engineers behind the tools data teams rely on. If you have guests you want us to talk to or questions you want answered, email us or find us on LinkedIn.