Press "Enter" to skip to content

Curated SQL Posts

Building a Lakehouse Table Health Pipeline

Gilbert Quevauvilliers builds a pipeline:

In this post I will show how I used GitHub Copilot CLI / Agent mode in VS Code to create a Microsoft Fabric pipeline that checks Lakehouse table health and only optimizes tables that require maintenance. I’ll also show the prompts I used, the issues I ran into, and how Copilot helped me resolve them.

Recently Microsoft announced Lakehouse table health. The post showed how you can check can check the health of your Lakehouse table in a very simple way.

Read on to see what this feature can do, as well as how Gilbert was able to gin up (with the help of GitHub Copilot) a pipeline to optimize lakehouse tables. What I appreciate is that Gilbert also included the errors in the process, rather than making it seem like everything with perfectly the first time around.

Leave a Comment

SSMS Tools in VS Code

Denny Cherry has a new VS Code extension:

Are you a user of VS Code, but you wish that the MSSQL extension had more of the SQL Server Management Studio features? Good news, there’s now a VS Code Plugin that gives you those capabilities in VS Code by using the “SSMS Tools for SQL Server” extension.

Click through to see what it includes. There’s a pretty decent amount of existing SSMS functionality in this extension, especially on the database administrator side, where Azure Data Studio and VSCode historically have not focused.

Leave a Comment

Error 1408 with Availability Group Automatic Seeding

Sean Gallardy troubleshoots an error:

This used to be a very hot topic around SQL Server 2016 when automatic seeding first came out, then everyone learned how automatic seeding was just manual seeding but done for you and the topic went away. A decade later, it’s somehow back to being a hot topic. I guess all that is old is new again.

Error 1408 states that the database doesn’t have enough log to find a starting point with the primary/principal – which, let’s just use AG terminology from this point on because mirroring has been dead for 14 years and counting.

Mirroring has been dead and yet it lives on in undead form. Regardless of that, Sean provides some good information around what automatic seeding actually means and what you can do if you get stuck with Error 1408.

Leave a Comment

An Introduction to Spindle

Jonathan Stewart has an open-source tool:

A client wanted to see a feature that I had previously built for another customer. I obviously couldn’t just use real client data. When I used Faker and other tools like that, I got data that looked fine on its own but fell apart when I needed relational integrity such as needing an order to point to a customer which points to an address. AdventureWorks, Contoso, WideWorldImporters and the like are great for what they are, but they are either too small, too clean or not relevant to the demo.

I needed data that would not only look the part for the subject, but also follow relationships, have proper distributions, and so on.

So I built Spindle.

I’ve used Spindle for work purposes. It’s a good product for generating artificial data, either from one of the default data domains (e.g., HR, insurance, health care, manufacturing) or based on an existing database schema.

Leave a Comment

Too Many Tables and PostgreSQL

Laurenz Albe uncovers an oddity:

Recently, I helped a customer investigate database problems. It turned out that these problems could be traced back to too many tables in the database. Since this may come as a surprise to many users, I thought it worth the while to write about it.

Click through to see what was happening, as well as a demonstration of the problem. Granted, when you start talking about tens of thousands of tables, something has gone catastrophically wrong in your database architecture.

Leave a Comment

Share Groups and Sub-Optimal Performance

Jack Vanlightly creates a problem:

In this post we’re going to see how share.acquire.mode=record_limit combined with:

  • fewer consumers than partitions
  • and various cases of “partition skew”

…can result in subpar performance with share groups. 

I stumbled on these issues when running large sets of dimensional tests with Dimster’s explore-limits mode, which finds the highest sustainable throughput while staying within a target end-to-end latency target. There was a specific subset of the tests that explore-limits mode would consistently fail to complete, and they all happened to be with record_limit and a consumer count lower than the partition count. In this test, we’ll understand why Dimster had such a hard time with this combination.

Click through for the details, as well as how to mitigate this sort of scenario.

Leave a Comment

Saving Unity Catalog Tables in Microsoft OneLake

Gerhard Brueckl pushes boundaries:

Microsoft and Databricks recently announced the next step of their collaboration and integration. It is now possible to store Databricks Unity Catalog tables directly in Microsoft OneLake. Here are the official announcement from Microsoft: https://community.fabric.microsoft.com/t5/Fabric-Updates-Blog/Extending-interoperability-Azure-Databricks-can-now-store-Unity/ba-p/5199741

Both parties have been working together to make this possible: Microsoft introduced the new item type Azure Databricks Storage and Databricks added support for OneLake for Unity Catalog External Locations (which can then be used to store the actual data). The UC External Location would then simply point to the storage endpoint provided by the Azure Databricks Storage item in Microsoft Fabric.

Click through to see what Gerhard found, as well as the results of some experimentation.

Leave a Comment

A Use for Secondary XML Indexes

Hugo Kornelis follows up from an earlier post:

Welcome back to my plansplaining blog series, where I dive deep into the details of non-obvious execution plans. This part is also a sort of follow up on my post from two weeks ago, when I wrote about the structure and usage of XML indexes, and had to admit that I had not been able to come up with good use cases for all types of secondary XML index.

That very same day, I received an email from Mikael (Micke) Eriksson, who pointed me to this question and answer on Stack Exchange for Database Administrators. I then modified that example a bit, to come up with an execution plan that I consider interesting enough that I want to describe it here.

Click through for the example and a dive into what the plan does.

Leave a Comment

Fabric Warehouse CU Metering and Workloads

Nikola Ilic digs into the numbers:

If you are using Microsoft Fabric, you’ve probably received the same email notification as me:)

I’m talking about the one about Fabric Data Warehouse and SQL analytics endpoint metering.

Starting in August 2026, Warehouse CU consumption is moving away from the current per-query CPU-time model and toward a per-workspace, virtual-node time model.

Read on to see what will change and how that should affect the way you think about this infrastructure.

Leave a Comment

The Basics of Query Folding with Power BI

Andy Brownsword explains one performance improvement technique for Power BI data transformations:

As a database developer, when I started using Power BI, I was concerned about it retrieving reams of data only to perform transformations downstream. The Power Query editor misleads us into thinking the retrieval and transformations are applied sequentially.

Thanks to Query Folding, that’s not usually the case. And that gives us more power to extract performance from the database.

This only works in situations where there’s something downstream to perform that processing, like a relational database. And one of the areas where you can affect performance, either positively or negatively, is in organizing operations such that you have a stretch of foldable operations. That way, all of it can run as one operation in the database.

Leave a Comment