Curated SQL Posts

Diskless Topics in Apache Kafka

Paul Brebner extends a metaphor:

I’ve been tracking the progress of Apache Kafka “Diskless Topics” for a while now. It’s a topic that sparks curiosity—mostly because the name itself sounds like an oxymoron. How can a topic be diskless? Where does the data go? 

With the recent voting on KIP-1150, I decided it was time to dive deep into the architectural changes. There are several related Kafka Improvement Proposals (KIPs) floating around, but KIP-1150 is dependent on KIP-1163 and KIP-1164, and the designs are still in flux. Consider this blog post a “theory” in the true scientific sense: a best-guess model based on current evidence that will almost certainly evolve. 

Click through for your moment of zen.

Creating Fabric Linked Service Parameters for ADO Deployment

Koen Verbeeck glues together several technologies:

Quite the title, so let me set the stage first. You have an Azure Data Factory instance (or Azure Synapse Pipelines) and you have a couple of linked services that point to Fabric artifacts such as a lakehouse or a warehouse. You want to deploy your ADF instance with an Azure DevOps build/release pipeline to another environment (e.g. acceptance or production) and this means the linked services need to change as well, because in those environments the lakehouse or warehouse is in a different workspace (and also has a different object ID).

When you want to deploy ADF, you typically use the ARM template that ADF automatically creates when you publish (when your instance is linked with a git repo). More information about this setup can be found in the documentation. To parameterize certain properties of a linked service, you can use custom parameterization of the ARM template. Anyway, long story short, I tried to parameterize the properties of the Fabric linked service. 

Read on to see how that went, as well as what you need to do to solve this issue.
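As a rough illustration of what custom parameterization looks like, here is a minimal sketch that generates a fragment for ADF's `arm-template-parameters-definition.json` file. The property names `workspaceId` and `artifactId` are my assumption about what a Fabric linked service stores, and the helper function is hypothetical. The linked post covers how this attempt actually went, which this sketch does not reproduce.

```python
import json


def linked_service_parameterization(property_names):
    """Build a custom-parameterization fragment for linked services.

    The "=" value asks ADF to expose the property as an ARM template
    parameter and keep the current value as that parameter's default.
    The "*" key applies the rule to every linked service type.
    """
    return {
        "Microsoft.DataFactory/factories/linkedServices": {
            "*": {
                "properties": {
                    "typeProperties": {name: "=" for name in property_names}
                }
            }
        }
    }


# Assumed property names for a Fabric lakehouse or warehouse linked service.
definition = linked_service_parameterization(["workspaceId", "artifactId"])
print(json.dumps(definition, indent=2))
```

In a real setup, the fragment would be merged into the existing definition file in the root of the collaboration branch rather than generated from scratch.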

Scan Types in PostgreSQL

Warda Bibi lays out four classes of scan in PostgreSQL:

To understand how PostgreSQL scans data, we first need to understand how PostgreSQL stores it.

  • A table is stored as a collection of 8KB pages (by default) on disk.
  • Each page has a header, an array of item pointers (also called line pointers), and the actual tuple data growing from the bottom up.
  • Each tuple has its own header containing visibility info: xmin, xmax, cmin/cmax, and infomask bits.

There are different ways PostgreSQL can read data from disk. Depending on the query and available indexes, it can choose from several scan strategies:

  1. Sequential Scan 
  2. Index Scan
  3. Index-Only Scan
  4. Bitmap Index Scan

Read on for a description of those types, as well as when it makes sense for the database engine to select a particular scan type.
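To make the four strategies a bit more concrete, here is a toy model in Python. It is a sketch under heavy simplification, not how PostgreSQL is implemented: pages are short lists, the "index" is a sorted list of `(value, page, slot)` entries, and the index-only scan simply assumes every page is all-visible. All function names are made up for illustration.

```python
ROWS_PER_PAGE = 4  # toy value; real pages are 8KB by default


def build_heap(values):
    """Split values into fixed-size 'pages' in insertion order."""
    return [values[i:i + ROWS_PER_PAGE]
            for i in range(0, len(values), ROWS_PER_PAGE)]


def build_index(heap):
    """Sorted (value, page_no, slot) entries: a stand-in for a B-tree."""
    return sorted((v, p, s)
                  for p, page in enumerate(heap)
                  for s, v in enumerate(page))


def seq_scan(heap, lo, hi):
    """Read every page in physical order and filter each row."""
    rows = [v for page in heap for v in page if lo <= v <= hi]
    return rows, len(heap)


def index_scan(heap, index, lo, hi):
    """Walk matching index entries in key order; one heap visit per match."""
    rows, heap_visits = [], 0
    for v, p, s in index:
        if lo <= v <= hi:
            rows.append(heap[p][s])
            heap_visits += 1  # the same page may be visited repeatedly
    return rows, heap_visits


def index_only_scan(index, lo, hi):
    """Answer from the index alone (assumes all pages are all-visible)."""
    return [v for v, _, _ in index if lo <= v <= hi], 0


def bitmap_scan(heap, index, lo, hi):
    """Collect matching page numbers first, then read each page once."""
    pages = sorted({p for v, p, _ in index if lo <= v <= hi})
    rows = [v for p in pages for v in heap[p] if lo <= v <= hi]
    return rows, len(pages)


heap = build_heap([7, 1, 9, 3, 2, 8, 4, 6, 5, 0, 11, 10])
index = build_index(heap)
print(seq_scan(heap, 1, 4))           # ([1, 3, 2, 4], 3)
print(index_scan(heap, index, 1, 4))  # ([1, 2, 3, 4], 4)
print(index_only_scan(index, 1, 4))   # ([1, 2, 3, 4], 0)
print(bitmap_scan(heap, index, 1, 4)) # ([1, 3, 2, 4], 2)
```

The second number in each result is the count of heap page reads in this toy model. It hints at the trade-off: the index scan returns rows in key order but may touch the same page several times, while the bitmap approach reads each matching page once at the cost of losing that order.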

Alerting People in Microsoft Teams from Data Factory Pipelines

Andy Brownsword sends a message:

Whether running Data Factory, Synapse, or Fabric pipelines, things go wrong – and the de facto response is to send an email. We’ve looked at sending emails from pipelines before, but at scale they can become noise and are easy to ignore.

A more effective option is to surface alerts where collaboration already exists, such as Teams.

In this post we’re going to start looking at using Teams and consolidate notifications into a channel. This functionality gives team members visibility, the ability to update in threads, and the option to tag people for a tighter response loop than typical emails bring.

Click through for the process.
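For a sense of what a pipeline might post to a channel, here is a minimal sketch that builds an Adaptive Card message body in Python. This is my own illustration, not necessarily the approach in the linked post: the function name and the fields shown are hypothetical, and the exact payload a given Teams webhook or workflow expects depends on how it is configured.

```python
import json


def build_teams_alert(pipeline_name, run_id, error_message):
    """Wrap a small Adaptive Card in a Teams message envelope."""
    card = {
        "type": "AdaptiveCard",
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "version": "1.4",
        "body": [
            {
                "type": "TextBlock",
                "weight": "Bolder",
                "text": f"Pipeline failed: {pipeline_name}",
            },
            {
                "type": "FactSet",
                "facts": [
                    {"title": "Run ID", "value": run_id},
                    {"title": "Error", "value": error_message},
                ],
            },
        ],
    }
    return {
        "type": "message",
        "attachments": [
            {
                "contentType": "application/vnd.microsoft.card.adaptive",
                "content": card,
            }
        ],
    }


# Hypothetical values; a pipeline would supply these from its own run context.
payload = build_teams_alert("LoadSales", "abc-123", "Copy activity timed out")
print(json.dumps(payload, indent=2))
```

The sketch stops at building the JSON. Sending it would be an HTTP POST from the pipeline to whatever endpoint the channel exposes.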

Where the Buck Stops

Louis Davidson talks slop:

I loathe the phrase AI Slop. I have said it before: I don’t like the phrase because it is generally attributed to some content that a person has posted. I blame the poster, not the generator. We all use AI these days, just as people have used tractors to farm, computers to do accounting work, and CGI to produce movies. These are all tools.

But when I sign my name to something, it is really and truly mine. In this blog, I will discuss this and more. So as the title says, don’t blame AI, Google, a person’s teachers in grade school, nope. Blame the person who said, “This is good enough to put out in my name”, or in other words, the person in the byline. For this post and video, that is Louis Davidson.

I understand where Louis is going with this and it’s fair. When you publish something, the person ultimately responsible looks suspiciously like the picture on your driver’s license. But I think it can serve as a useful descriptive term for a category of garbage output without removing agency from the perpetrator.

Performance Studio

Erik Darling has a new free tool:

Stop clicking through SSMS execution plans like it’s 2005. Performance Studio is a free, open-source plan analyzer that tells you what’s wrong, where it’s wrong, and how bad it is, whether from the command line, a desktop GUI, an SSMS extension, or an AI assistant.

Built by someone who has stared at more execution plans than any reasonable person should.

Click through for some of its capabilities, as well as how to get your hands on a copy.

The Challenge of Using Questions as Slide Titles

Simon Rowe explains a challenge:

The importance of an effective slide title cannot be overstated. Positioned in prime real estate at the top of the page, it is often where an audience’s eyes will land first. With that in mind, it is worth investing time to craft a title that introduces the content below and establishes a clear purpose. Too often, this valuable space is used for purely descriptive statements. Let’s look at an example.

Read on to see one example, showing how the change of titles and a bit of thought around the use of color as an identifying feature can make a big difference for viewers.

Permanently Empty Statistics

Guy Glantser takes us through an edge case:

Many SQL Server DBAs rely on automated statistics maintenance solutions such as Ola Hallengren’s maintenance scripts. These scripts typically update statistics only when the modification counter exceeds a threshold.

But there is a corner case that can cause statistics to remain empty forever, and many DBAs are not aware of it.

Read on to see how you can end up with no statistics at all on a table.
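The threshold rule itself is easy to sketch. The Python below is a simplified model of a counter-only decision, with hypothetical names and numbers. It does not reproduce the specific scenario in the linked post or the logic of any particular maintenance script. It only shows that a rule looking at the modification counter alone has no way of noticing that a statistic is empty.

```python
from dataclasses import dataclass


@dataclass
class StatInfo:
    name: str
    rows_sampled: int          # 0 models an "empty" statistic
    modification_counter: int  # row changes since the last statistics update


def needs_update(stat, threshold):
    """Counter-only rule: update when modifications exceed the threshold."""
    return stat.modification_counter > threshold


stats = [
    StatInfo("busy_stat", rows_sampled=50_000, modification_counter=12_000),
    StatInfo("quiet_stat", rows_sampled=50_000, modification_counter=10),
    StatInfo("empty_stat", rows_sampled=0, modification_counter=0),
]

to_update = [s.name for s in stats if needs_update(s, threshold=1_000)]
print(to_update)  # ['busy_stat']
```

In this model, `empty_stat` is skipped on every run for as long as its counter stays below the threshold, regardless of the fact that it holds no data.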

The State of Vector Indexes in SQL Server 2025

Rebecca Lewis separates marketing hype from reality:

Microsoft’s entire marketing pitch for SQL Server 2025 is ‘the AI-ready database.’ It went GA on November 18, 2025. We are now four months in. Here is what is actually GA, what is still behind a preview flag, and what that means if you are evaluating this for production.

Read on for a list, as well as a summary of Erik Darling’s great work on the topic.

My take on this is that vector indexes are where columnstore indexes were in SQL Server 2012: a neat idea, but not ready for prime time. It took until 2016 before columnstore indexes were actually worthwhile (primarily, the introduction of clustered columnstore indexes and the ability to rebuild indexes), so we’ll see if it takes as long for vector indexes to get all of the necessary functionality.

What’s New in SSIS 2025

Koen Verbeeck actually gets an article’s length out of this:

There’s a new version of SQL Server released and we’re mainly an on-premises SQL Server shop. We’ve been using Integration Services (SSIS) for years now for all our ETL and data integration needs. With Microsoft’s focus on cloud (Azure and Fabric), does it make sense to upgrade our SSIS packages? Are there any new features?

Click through for the answer, though “stuff that’s gone away” or “stuff that you have to change because of drivers” make up almost 100% of this.
