SQL, SQL Everywhere

RedMug

March 29, 2026 · 6 min read

There's a running joke in data circles: SQL is the programming language everyone claims to know and no one claims to be an expert in. It shows up on virtually every job posting. It powers dashboards at billion-dollar companies and weekend side projects alike. You can learn the basics in an afternoon, and today you can ask an AI to write it for you in seconds.

So here's the uncomfortable question — if SQL is effectively commoditized, where does data expertise actually live?

I've been sitting with that question for a while, and I think the answer is quieter than most people expect: the moat isn't the SQL. It's what the SQL knows.

The Syntax Is Solved

Let's be honest about where we are. The mechanical act of writing SQL — joins, aggregates, window functions, CTEs — is not the hard part anymore. It hasn't been the hard part for a long time. Stack Overflow solved most of it years ago. LLMs are finishing the job. If you hand a capable model a schema and a question in plain English, you'll get syntactically correct, often pretty good SQL back in seconds.

This is a good thing. The tedium of translating intent into correct syntax was never where the value was anyway.

But here's what the model doesn't know, and what no amount of fine-tuning on public data will give it:

It doesn't know your business.

The Query That Took Three Years to Write

I want to describe a type of SQL query that I think every experienced data practitioner has encountered or written. It's a query that, on the surface, looks almost embarrassingly simple. A few joins, maybe a WHERE clause or two, a CASE expression that seems arbitrary. But every single line of it represents a decision that was made after something went wrong, or after a conversation with someone who understood the business in a way the data didn't yet reflect.

Lines like:

WHERE status NOT IN ('VOID', 'TEST', 'MIGRATED_LEGACY')

That MIGRATED_LEGACY value? It came from a system conversion in 2021. Those records are technically valid in the source table but should never appear in any operational report. That filter didn't exist on day one. It exists now because someone spent three hours on a call with the controller explaining why a number was wrong, and afterward went and added one line to the query.
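
To see how much one line can move a number, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the amounts are invented for illustration; only the three status values come from the filter above.

```python
import sqlite3

# Hypothetical in-memory table; names and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (status, amount) VALUES (?, ?)",
    [
        ("POSTED", 100),
        ("POSTED", 250),
        ("VOID", 40),
        ("TEST", 5),
        ("MIGRATED_LEGACY", 900),
    ],
)

# The query a newcomer (or a model) would plausibly write.
naive_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# The same query with the hard-won filter applied.
filtered_total = conn.execute(
    "SELECT SUM(amount) FROM orders "
    "WHERE status NOT IN ('VOID', 'TEST', 'MIGRATED_LEGACY')"
).fetchone()[0]

print(naive_total, filtered_total)
```

With this made-up data the unfiltered total is 1295 and the filtered total is 350. Both queries run cleanly; only one of them matches what the business means.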

Or:

AND fiscal_period != accounting_period

This join condition looks wrong to anyone who hasn't lived through the month-end close process at your specific organization. It's not wrong. It's load-bearing. Remove it and the numbers are off by tens of thousands of dollars in ways that take weeks to trace.

These aren't bugs. They're archaeology. They're compressed institutional memory, encoded in SQL.

Where the Moat Actually Forms

The information asymmetry in data work doesn't live in knowing how to write a subquery. It lives in:

Edge cases that only surfaced through time. You don't know that the GL module double-posts in certain intercompany scenarios until you've worked through a period-end close and had someone flag it. Once you know, you know. And that knowledge sits in a filter clause that makes no sense to anyone reading the query cold.

Business definitions that are more contested than they appear. "Revenue" is a seven-letter word that means at least four different things depending on which team you ask. The version of revenue in your query reflects a negotiation that happened, maybe explicitly in a meeting, maybe implicitly over months of iteration, and the result is encoded in logic that looks arbitrary unless you were in the room.

Organizational context that shapes what the data means. Which cost centers should be excluded from a departmental rollup? Which vendors are actually subsidiaries that need to be treated differently? Which accounts were deprecated but still receive legacy postings? This isn't in the schema. It's in people's heads, and eventually — if you're doing your job well — it ends up in your SQL.

The silent rules nobody wrote down. Every data environment has them. The report that's "correct" only if you run it after a certain batch job completes. The table that looks authoritative but is actually superseded by another table for anything after a certain date. The fact that one business unit's fiscal year doesn't align with the rest of the company's. These things get discovered, not documented.
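
One of these silent rules, the table that is superseded after a certain date, might end up encoded roughly like this. This is a sketch under assumptions: the ledger_legacy and ledger_current tables, the cutover date, and the amounts are all hypothetical, and Python's built-in sqlite3 module is used only so the example can be run as-is.

```python
import sqlite3

CUTOVER = "2021-07-01"  # hypothetical cutover date, stored as ISO text

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ledger_legacy  (posted_on TEXT, amount INTEGER);
    CREATE TABLE ledger_current (posted_on TEXT, amount INTEGER);
    -- The legacy table still receives a stray posting after the cutover.
    INSERT INTO ledger_legacy  VALUES ('2021-05-10', 100), ('2021-08-02', 999);
    INSERT INTO ledger_current VALUES ('2021-08-02', 300), ('2021-09-15', 50);
    """
)

# Naive: treat both tables as equally authoritative.
naive_total = conn.execute(
    """
    SELECT SUM(amount) FROM (
        SELECT amount FROM ledger_legacy
        UNION ALL
        SELECT amount FROM ledger_current
    )
    """
).fetchone()[0]

# Informed: legacy rows count only before the cutover, current rows only
# on or after it. ISO date strings compare correctly as plain text.
total = conn.execute(
    """
    SELECT SUM(amount) FROM (
        SELECT amount FROM ledger_legacy  WHERE posted_on <  :cutover
        UNION ALL
        SELECT amount FROM ledger_current WHERE posted_on >= :cutover
    )
    """,
    {"cutover": CUTOVER},
).fetchone()[0]

print(naive_total, total)
```

Nothing in the schema says which table wins for which dates. The two date predicates are the only place that rule is written down.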

Why AI Makes This More Valuable, Not Less

Here's the counterintuitive thing: the rise of AI-assisted SQL generation increases the value of embedded business knowledge rather than diminishing it.

When generating SQL becomes cheap and fast, the bottleneck shifts entirely to knowing what to generate, and whether the output is actually correct for your context, not just syntactically valid. An AI can produce a query that runs cleanly and returns results with complete confidence, while entirely missing the MIGRATED_LEGACY filter that makes those results meaningful.

The practitioner who can look at that output and immediately say "this is wrong, here's why, here's the fix" — that's not a SQL skill. That's a business knowledge skill that happens to be expressed in SQL.

This is the moat. And it compounds in a way that pure technical skill doesn't. Every quarter you work closely with a finance team, an operations team, or a clinical team, you accumulate more of it. The query you write in year three is qualitatively different from the query you wrote in year one, not because your syntax improved, but because you now know things about that business that took three years of proximity to learn.

What This Means in Practice

A few implications worth sitting with:

Documentation is more important than it's ever been. If your institutional knowledge lives only in undocumented queries, you have a moat — but it's also a single point of failure. The goal should be to externalize it: comments in the SQL, data dictionaries, runbooks, onboarding guides. The moat should belong to the team, not just to the person who's been there longest.

Business fluency is the career differentiator. If you're early in a data career and thinking about where to invest your learning time, spend as much time understanding the business you support as you do learning technical tools. The tools change. Business context compounds.

"Just write a query" is a misleading ask. When a stakeholder asks for a simple number, the complexity isn't in retrieving it — it's in knowing which definition of that number they actually need, which edge cases to exclude, and whether the source data is trustworthy for this particular use case. That's the real work.

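As a sketch of what externalizing that knowledge through comments in the SQL could look like, the query below carries its own history. The orders table and the meanings given for VOID and TEST are assumptions for illustration; the MIGRATED_LEGACY note paraphrases the story told earlier. It is wrapped in Python's built-in sqlite3 module only so it can be run.

```python
import sqlite3

# Each excluded status is annotated with the reason it is excluded.
REPORT_SQL = """
SELECT SUM(amount) AS operational_total
FROM orders
WHERE status NOT IN (
    'VOID',            -- assumed meaning: cancelled, never posted
    'TEST',            -- assumed meaning: test records in production
    'MIGRATED_LEGACY'  -- from the 2021 system conversion: valid in the
                       -- source table, excluded from operational reports
)
"""

# Hypothetical data to show the commented query still runs unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("POSTED", 100), ("MIGRATED_LEGACY", 900), ("TEST", 5)],
)
operational_total = conn.execute(REPORT_SQL).fetchone()[0]
print(operational_total)
```

The comments cost nothing at run time, and the next person to read the query does not need to have been on the call to know why that third value is there.
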
SQL isn't going anywhere. It's too good at what it does, too deeply embedded in too many systems, and now too accessible to too many people for any single alternative to displace it. Its ubiquity is exactly what makes the knowledge embedded in it so durable.

The syntax is a solved problem. The business logic is the work.

And the business logic, it turns out, takes years to write.