Curated SQL Posts

Accelerated Database Recovery in tempdb for SQL Server 2025

Rebecca Lewis looks into a feature:

Two weeks ago I covered the Resource Governor changes in SQL Server 2025 — specifically, capping how much tempdb data space a workload group can consume. That was the data-file side. For the log side, SQL Server 2025 now lets you enable Accelerated Database Recovery (ADR) on tempdb. Enable it and cancelled queries stop grinding, the tempdb log stops bloating, and recovery gets faster. Sounds like an easy yes — but you’ve got to read the fine print.

Click through for that fine print.
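For reference, enabling ADR on a user database has long been a single ALTER DATABASE statement; the tempdb variant in SQL Server 2025 reportedly uses the same syntax, though you should check the post's fine print for prerequisites and caveats before turning it on:

```sql
-- Hedged sketch: ADR syntax as used for user databases; the tempdb
-- form is new in SQL Server 2025 and may carry extra requirements
-- (see the linked post before enabling in production).
ALTER DATABASE tempdb
    SET ACCELERATED_DATABASE_RECOVERY = ON;
```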

A Gotcha with pg_column_size()

Laetitia Avrot wants to know how large a column is:

Thanks to my colleague Ozair, who sent me a JIRA ticket saying “I need to drop that huge column, what are the consequences?” My first question was: how huge? That’s when the rabbit hole opened.

It looks simple. It is simple. Just use the administrative function pg_column_size(). Until you have toasted attributes. Then it gets interesting.

Read on for a demonstration.
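As a rough illustration of the gotcha (hypothetical table): `pg_column_size()` reports the stored size of the datum, which for a TOASTed or compressed value can be far smaller than the logical value, while `octet_length()` reports the full uncompressed byte length:

```sql
-- Highly compressible payload: likely to be compressed/TOASTed on disk.
CREATE TABLE toast_demo (id int, payload text);
INSERT INTO toast_demo VALUES (1, repeat('x', 1000000));

-- stored_bytes (on-disk size) can be a tiny fraction of logical_bytes.
SELECT pg_column_size(payload) AS stored_bytes,
       octet_length(payload)  AS logical_bytes
FROM toast_demo;
```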

Shortcut Transformations now GA in Microsoft Fabric

Pernal Shah transforms some data:

Organizations today manage data across multiple storage systems, often in formats like CSV, Parquet, and JSON. While this data is readily available, turning it into analytics-ready tables typically requires building and maintaining complex ETL pipelines.

Shortcut transformations remove that complexity.

With Shortcut transformations, you can convert structured files referenced through OneLake shortcuts into Delta tables without building pipelines or writing code.

This currently works for CSV, Parquet, and JSON data and does cut out a very common step for raw-layer transformation.

When Multi-Column Statistics (Mostly) Aren’t

Brent Ozar digs into multi-column statistics:

The short answer: in the real world, only the first column works. When SQL Server needs data about the second column, it builds its own stats on that column instead (assuming they don’t already exist), and uses those two statistics together – but they’re not really correlated.

For the longer answer, let’s take a large version of the Stack Overflow database, create a two-column index on the Users table, and then view the resulting statistics:

Click through for a dive into row estimation when you have multiple columns in a WHERE clause and how statistics come into play.
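The shape of that experiment looks something like the following (index and column names here are illustrative; the post picks its own). The key point is that a two-column index produces one multi-column statistic, but its histogram only covers the leading column:

```sql
-- Two-column index => one multi-column statistic, but the histogram
-- describes only the leading column (LastAccessDate here).
CREATE INDEX IX_LastAccessDate_DisplayName
    ON dbo.Users (LastAccessDate, DisplayName);

DBCC SHOW_STATISTICS ('dbo.Users', IX_LastAccessDate_DisplayName);
```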

TOP(1) with Ties

Andy Brownsword can’t stop at one:

Having TOP (1) return multiple rows feels wrong… but that’s what WITH TIES can do.

For a long time I used patterns like this to get the first record in a group:

Andy goes on to explain how WITH TIES works in T-SQL, shows an alternative to using a common table expression + window function to narrow down to the first logical group, and digs into when you might not want to use that alternative.
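The trick reads roughly like this (hypothetical Orders table): order by a window function so that every "first row per group" ties at rank 1, and `WITH TIES` keeps them all:

```sql
-- Earliest order per customer: each partition's first row ranks 1,
-- so TOP (1) WITH TIES returns one row per CustomerID.
SELECT TOP (1) WITH TIES
       CustomerID, OrderID, OrderDate
FROM dbo.Orders
ORDER BY ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate);
```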

Highlighting Rows in DAX via Visual Calculations

Marco Russo and Alberto Ferrari point out a row:

When it comes to visuals, users may want specific cells highlighted in order to spot important information quickly. While browsing the forums, we came across an interesting requirement that can easily be solved with a DAX measure: highlight an entire row based on the value in the last column of the visual only. In our example, we highlight Wide World Importers because it has the maximum value (71,904.98) in the last year (2026).

I’ve had a need for this several times in the past, so it’s nice to see you can do it via visual calculations.

So, let’s use this in Power BI Report Server. Oh, wait, you can’t. But if you’re not shackled to that train wreck, click through for a nice solution.

Goodhart’s Law and KPIs

Alexander Arvidsson talks metrics:

In 1974, the Soviet Union set production quotas for nails measured in tons.

Factories responded rationally. They produced a small number of enormous, completely useless nails. Problem solved. Quota met.

Moscow reconsidered and switched to quotas measured in number of nails.

Factories responded rationally. They produced millions of tiny, flimsy nails—too small to hold anything together.

The Soviet nail industry was, by any measurable standard, thriving. It was also producing almost nothing of value.

Click through for a detailed description of the problem and Alexander’s push for an answer. I typically summarize Goodhart’s Law in my own words as “any system can be gamed.” I think Alexander’s tips on good measurement architecture are sound (and he even mentions gaming systems here), but even with all of those in place, you can still end up with a monkey’s paw of a KPI suite. Even an example like “ensure that users who log in are getting measurable value from the product, as evidenced by X” falls apart in the end, because we maximize for X while hollowing out Y and Z, which nobody is watching while we focus on X. Take, say, “ensure that users who log in are getting measurable value from the product, as evidenced by average order value.” Now we maximize for AOV, streamlining the experience for people making large orders and relentlessly pushing more cart items at the user. The other points aim to mitigate the worst of these results, but it’s an off-the-cuff example of how no system is perfect. Some can still be superior to others, though.

Page-Level Caching in PostgreSQL with PAX

Laetitia Avrot has a two-parter on PAX. The first part is from a couple of months ago and sets the stage:

Picture this: you walk into a library. Each book is a database record. PostgreSQL’s traditional storage (NSM – N-ary Storage Model) stores complete books on each shelf: chapter 1, chapter 2, chapter 3, all bound together.

Here’s the problem that keeps me up at night: when you need only chapter 3 from 1,000 books, you must pull each complete book off the shelf, flip through chapters 1 and 2 that you’ll never read, grab chapter 3, and move to the next book.

You’re wasting time. You’re wasting energy. You’re wasting cache bandwidth.

But it’s not all roses:

PAX looks elegant on paper: minipages, cache locality, column-oriented access inside an 8KB page. But the moment you ask how this actually would work with Postgres, the complexity arrives fast. NULL values, variable-length types, MVCC, boundary shifts. Let’s go through it all.

To be clear, this is not a product (today). It’s thinking about how to apply the results of an academic paper to Postgres and all of the tricky challenges that can come from what seems to be an easy idea.

ANY_VALUE() in Fabric Data Warehouse

Jovan Popovic notes a feature going GA:

Fabric Data Warehouse now supports the ANY_VALUE() aggregate, making it easier to write readable, efficient T-SQL when you want to group by a key but still return descriptive columns that are functionally the same for every row in the group.

Right now, this is only available in the Fabric Data Warehouse, so no Azure SQL DB, Managed Instance, or box product support at this time.
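A quick sketch of why this helps (hypothetical tables): instead of adding every functionally dependent column to the GROUP BY, you pick one representative value per group:

```sql
-- Without ANY_VALUE, c.CustomerName would have to appear in the
-- GROUP BY, even though it is identical for every row in the group.
SELECT o.CustomerID,
       ANY_VALUE(c.CustomerName) AS CustomerName,
       SUM(o.Amount) AS TotalSales
FROM dbo.Orders AS o
JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID
GROUP BY o.CustomerID;
```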
