Press "Enter" to skip to content

Curated SQL Posts

When Order Matters

The Rtask people tell a story:

You have inherited (or written) a data pipeline originally coded in SAS. It processes administrative billing records: matching line items against reference tables, applying time-varying coefficients, deduplicating based on business identifiers, computing running counters. Classic ETL work.

The migration to R goes well. You use {DBI} to open a DuckDB connection, load your source files as lazy tables via {arrow} or dplyr::tbl(), build the transformations with {dbplyr}, and collect the result at the very end. Your code is readable, your tests compare the R output to the SAS reference, and they pass (maybe using {datadiff}).

Then you run the pipeline again.

The numbers are different.

Give yourself 100 points if you answered “Because you need an ORDER BY clause” during the explanation. They also cover a few other places where DuckDB interactions in R can cause issues. Most of this is straightforward for data platform people, but can cause consternation for developers. H/T R-Bloggers.

Leave a Comment

Microsoft Fabric Mirroring and SQL Server 2025

Meagan Longoria takes a peek at mirroring in Microsoft Fabric:

Mirroring of SQL Server databases in Microsoft Fabric was first released in public preview in March 2024. Mirrored databases promise near-real-time replication without the need to manage and orchestrate pipelines, copy jobs, or notebooks. John Sterrett blogged about them last year here. But since that initial release, the mechanism under the hood has evolved significantly.

Read on to see how this behaves for versions of SQL Server prior to 2025, and how it changes in 2025.

Leave a Comment

Making Row-Level Security Faster

Brent Ozar speeds up some operations:

The official Azure SQL Dev’s Corner blog recently wrote about how to enable soft deletes in Azure SQL using row-level security, and it’s a nice, clean, short tutorial. I like posts like that because the feature is pretty cool and accomplishes a real business goal. It’s always tough deciding where to draw the line on how much to include in a blog post, so I forgive them for not including one vital caveat with this feature.

Click through for that caveat, as well as how you can mitigate its performance impact.

Leave a Comment

Testing Implicit Conversion and Performance in SQL Server

Louis Davidson runs some tests:

If you have ever done any performance tuning of queries in SQL Server, no doubt one of the first thing you have heard is that your search argument data types need to match the columns that you are querying. Not one thing in this blog is going to dispute that. Again, the BEST case is that if your column is an nvarchar, your search string matches that column datatype. But why is this? I will do my best to make this pretty clear, especially why it doesn’t always matter.

Read on as Louis lays out the explanation.

Leave a Comment

Major Security Update for SQL Server

Rebecca Lewis digs into an important patch:

Yesterday was Patch Tuesday, and this month we’ve got a good one. CVE-2026-21262 was already publicly disclosed before Microsoft shipped the fix – and it lets an authenticated SQL Server user escalate straight to sysadmin. SQL Server 2016 through 2025, Windows and Linux. No physical access required. No user interaction required. Just a valid login and a network path to your instance. Go patch!

If you’re a SQL Server DBA or consultant and you’re reading this before patching, stop reading and go patch.

Read on for more information about the vulnerability and how to make sure you’re on the latest CU or GDR for SQL Server.

Also, the fact that there are fixes going back to 2016 doesn’t mean that 2014 and earlier are fine. It just means that Microsoft is serious about not patching versions 10+ years out of date.

Leave a Comment

LOB Logical Reads and Columnstore Indexes

Brent Ozar notices a difference:

Forever now, FOREVER, it’s been a standard thing where I can say, “When you’re measuring storage performance during index and query tuning, you should always use logical reads, not physical reads, because logical reads are repeatable, and physical reads aren’t. Physical reads can change based on what’s in cache, what other queries are running at the time, your SQL Server edition, and whether you’re getting read-ahead reads. Logical reads just reflect exactly the number of pages read, no matter where the data came from (storage or cache), so as long as that number goes down, you’re doing a good job.”

But this is not always the case, as Brent demonstrates.

Leave a Comment

Dynamic Data Masking Isn’t Security

John Sterrett provides a reminder:

SQL Server Dynamic Data Masking (DDM) is one of those SQL Server features that is commonly misused as a primary security feature used in production. Since it landed in SQL Server 2016, I’ve seen teams throw it at compliance requirements and call it a day, only to find out later that their “masked” data was completely readable by anyone willing to spend 20 minutes in SSMS.

Let me show you some data masking limitations in SQL Server when used for security, and what you should be doing instead (or alongside it) if you’re serious about Zero Trust data security.

I think John is a bit kinder to the value of Dynamic Data Masking than I am. Even so, this article shows several major problems it brings to the fore.

Leave a Comment

Eventstream Not Sending Data to KQL Database after Resuming Fabric Capacity

Olivier Van Steenlandt troubleshoots an issue:

To continue the development of my mobile app, whose core ability is to scan barcodes of consumable articles and send them over for analytics, I’m resuming my capacity, starting to scan barcodes again, sending them to my Eventstream, and finally saving them in my KQL database.

After a couple of minutes, I wanted to validate all the scanned results in my KQL database and navigate to my scanned_barcode table.

Read on to see how Olivier diagnosed and corrected the problem.

Leave a Comment

Data Extraction from Unstructured Data with Fabric AI Functions

Sandeep Pawar demonstrates functionality:

Most enterprise data lives in free text – tickets, contracts, feedback, clinical notes, and more. It holds critical information but doesn’t fit into the structured tables that pipelines expect. Traditionally, extracting structure meant rule-based parsers that break with every format to change, or custom NLP models that take weeks to build. LLMs opened new possibilities, but on their own they bring inconsistent outputs, no type of enforcement, and results that vary between runs. What production workflows need is LLM intelligence with structured-output guarantees, delivered inside the data platform teams already use.

Microsoft Fabric AI Functions deliver exactly that. Functions like ai.summarize, ai.classify, ai.translate, and ai.extract let you transform and enrich unstructured data at scale with a single line of code – no model deployment or ML infrastructure needed. For the full list, see Transform and enrich data with AI functions.

Click through for an example. The tricky part of this is, because answers won’t be deterministic, you have to do a lot of testing and verification to ensure things are working reasonably well.

Leave a Comment