Press "Enter" to skip to content

Curated SQL Posts

What’s New in Cassandra 6

Mariah McLaughlin lays out some of the new features in the latest version of Cassandra:

Accord is a general-purpose transaction framework that uses a leaderless consensus protocol to provide highly available transactions, and it underpins transactions in Cassandra 6. The goal is broader transactional support across multiple keys, with strict serializable isolation and without a central bottleneck.

This matters because multi-key consistency is hard to handle cleanly in application code. Once a workflow spans more than one partition, the application often ends up doing coordination work that really belongs in the database.

Accord enables ACID behavior on transactional tables, which lets developers coordinate multi-step, multi-partition changes with stronger correctness guarantees, reducing the amount of custom consistency logic they have to build in the application.

Click through for more information on this, as well as a few other significant features.
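To give a sense of the shape of this, here is a sketch of what a multi-partition Accord transaction looks like in CQL, following the CEP-15 proposal. The accounts table is invented for illustration, and the exact syntax and table options may differ in the shipped release:

    -- Hypothetical table opted in to Accord transactions
    CREATE TABLE accounts (
        id int PRIMARY KEY,
        balance int
    ) WITH transactional_mode = 'full';

    -- One transaction reads and conditionally updates rows that
    -- live on two different partitions, coordinated by Accord's
    -- leaderless protocol with strict serializable isolation
    BEGIN TRANSACTION
        LET src = (SELECT balance FROM accounts WHERE id = 1);
        IF src.balance >= 100 THEN
            UPDATE accounts SET balance -= 100 WHERE id = 1;
            UPDATE accounts SET balance += 100 WHERE id = 2;
        END IF
    COMMIT TRANSACTION;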


Scoring the Quality of Binary Classification with SQL Server

Sebastiao Pereira quantifies a result:

Machine Learning (ML) is a way of teaching computers to learn from data instead of being explicitly programmed. Performance metrics are essential tools for understanding how well a model actually works. They tell you not just how accurate the model is, but how reliable, fair, and useful it will be in real-world applications. In other words, without them, machine learning would be trial-and-error guesswork.

Binary classification is when each sample is labeled as one of two mutually exclusive classes, such as positive or negative.

How do you implement the binary classification performance metric in SQL Server without using external tools?

Click through for a series of metrics to determine how well a binary classification process performed. This post doesn’t include details on how to perform the classification, just what to do once you have the results.
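To give a flavor of the approach, here is a minimal sketch (not Sebastiao’s code) that derives the confusion matrix and a few standard metrics from a hypothetical dbo.Predictions table, where actual and predicted each hold 1 for positive and 0 for negative:

    -- Tally the confusion matrix from scored results
    WITH counts AS (
        SELECT
            SUM(CASE WHEN actual = 1 AND predicted = 1 THEN 1 ELSE 0 END) AS tp,
            SUM(CASE WHEN actual = 0 AND predicted = 1 THEN 1 ELSE 0 END) AS fp,
            SUM(CASE WHEN actual = 0 AND predicted = 0 THEN 1 ELSE 0 END) AS tn,
            SUM(CASE WHEN actual = 1 AND predicted = 0 THEN 1 ELSE 0 END) AS fn
        FROM dbo.Predictions
    )
    SELECT
        1.0 * (tp + tn) / NULLIF(tp + tn + fp + fn, 0) AS accuracy,
        1.0 * tp / NULLIF(tp + fp, 0)                  AS [precision],
        1.0 * tp / NULLIF(tp + fn, 0)                  AS recall,
        2.0 * tp / NULLIF(2 * tp + fp + fn, 0)         AS f1_score
    FROM counts;

The NULLIF() calls guard against division by zero when a class never shows up in the predictions.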


Eleven CVEs for PostgreSQL

Christophe Pettus takes us through just shy of a dozen security issue fixes:

PostgreSQL 18.4, 17.10, 16.14, 15.18, and 14.23 are out as of May 14, 2026. The release fixes eleven security issues and more than sixty bugs. That is not a typo. Eleven CVEs is the largest single-release security batch I can remember, and three of them are CVSS 8.8 with practical exploitation paths. Patch this week. If you can patch tomorrow, patch tomorrow.

Click through for a rundown.


Partitioning and Columnstore Indexes

Erik Darling puts together a great combination for a very large dataset:

So, today we’re going to talk about partitioning in columnstore, because there are important differences between partitioned columnstore tables and partitioned rowstore tables. One of columnstore’s superpowers is the ability to use metadata about which row groups contain which data, letting it skip entire segments that do not contain relevant data.

I agree with Erik’s point that you need around 500 million rows before this capability really shines, but if you pick the right partition key, you get one of those rare and coveted performance improvements from partitioning.
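As a quick sketch of the setup (the table, partition function, and column names here are invented for illustration, not Erik’s):

    -- Hypothetical partition function and scheme: one partition per year
    CREATE PARTITION FUNCTION pf_SaleYear (date)
        AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

    CREATE PARTITION SCHEME ps_SaleYear
        AS PARTITION pf_SaleYear ALL TO ([PRIMARY]);

    -- Clustered columnstore index built on the partition scheme.
    -- Partition elimination prunes whole partitions, and row group
    -- (segment) metadata lets SQL Server skip segments whose min/max
    -- values rule out the predicate.
    CREATE TABLE dbo.Sales (
        SaleDate   date  NOT NULL,
        CustomerID int   NOT NULL,
        Amount     money NOT NULL,
        INDEX cci_Sales CLUSTERED COLUMNSTORE
    ) ON ps_SaleYear (SaleDate);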


When R^2 Misleads

Holger von Jouanne-Diedrich explains a common quality metric for regression analyses:

A high R^2 can make a regression model look impressively accurate — but this number can be deceptive. If you want to understand why a high R^2 is not always a sign of a good model, read on!

Click through for that explanation. This post does a fantastic job of explaining the technical reasons why a high R^2 might not be indicative of a good model specification. But I’d add one other piece to the puzzle: what constitutes a high R^2 depends very much on the domain. For example, if you are performing a regression of some process in physics, an R^2 of 0.90 is probably so low as to indicate you’ve made a horrible mistake somewhere.

By contrast, an R^2 of 0.90 in the context of a social studies analysis would get you laughed out of the room for obviously faking the data or misunderstanding the specification to get a number that high.
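For reference, the definition makes the limitation easy to see: R^2 only measures how much of the sample’s variance around its mean the fitted model soaks up, so it says nothing about whether the functional form is correct or whether the model will generalize:

    R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
        = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

In particular, adding regressors can never decrease R^2 on the training sample, which is one reason adjusted R^2 exists.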


Metadata-Driven Frameworks for Change Detection in Microsoft Fabric

Kevin Chant builds a table:

I had various options for this month’s contribution, given my experience with various change detection solutions, including Azure Synapse Link for SQL Server 2022, which I covered in previous posts (including one that covered some excessive file tests for Azure Synapse Link for SQL Server 2022).

In the end, I decided to cover developing metadata-driven frameworks for Microsoft Fabric, because it is such a hot topic for multiple reasons. One of them is the growing availability of open-source, metadata-driven frameworks for Microsoft Fabric.

Read on for three such frameworks and some advice on how to use them.
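To make “metadata-driven” concrete: these frameworks typically drive ingestion from a control table rather than from hard-coded pipelines, so adding a new source is a row insert instead of new pipeline work. A minimal sketch of what such a control table might look like (this schema is invented for illustration and is not taken from any of the frameworks Kevin covers):

    -- Hypothetical control table: one row per source object to load;
    -- an orchestrating pipeline or notebook reads this and generates
    -- the copy and merge activities at runtime
    CREATE TABLE dbo.IngestionControl (
        SourceSystem    varchar(50)  NOT NULL,
        SourceObject    varchar(200) NOT NULL,
        TargetLakehouse varchar(100) NOT NULL,
        TargetTable     varchar(200) NOT NULL,
        LoadType        varchar(20)  NOT NULL, -- 'full' or 'incremental'
        WatermarkColumn varchar(100) NULL,     -- used for incremental loads
        IsEnabled       bit          NOT NULL,
        CONSTRAINT PK_IngestionControl
            PRIMARY KEY (SourceSystem, SourceObject)
    );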


Creating Better Scatterplots

Ruben Van de Voorde embraces the second dimension:

Scatterplots are in a weird place in Power BI reports. They’re incredibly good at their core business: showing how two metrics relate across many things, like products, customers, or suppliers.

But they can miss the landing in a few ways. Sometimes the relationship itself matters but the chart asks the reader to do too much inference: “Why should I care about a product’s Gross Margin % vs. Shipping Weight?” Other times, the reader can’t tell what the dots actually are. A reader asking “What does one dot represent?” is the clearest tell, sometimes followed by “Can’t this be a table instead of these dots?”

Click through for musings about scatterplots, their bubble plot cousins, and what’s available in DAX and Power BI to make them work for you.


Semantic Sonar: Canary Testing Power BI Semantic Models

Jens Vestergaard has a new tool:

Quarterly sales season always brings the same ritual: a business user opens Power BI, clicks into the sales dashboard, and waits for the numbers to load. It has been a while since anyone looked at this particular report. The numbers look off. Someone hits refresh. The spinner turns, but the data does not budge. The last refresh was weeks ago.

That is when the emails start. “Can you check why the sales model hasn’t updated?” The support thread grows. IT investigates. The answer is usually not in the data. The refresh failed at 3 AM. Or the data source changed its connection string last Tuesday. Or someone revoked the service principal without telling anyone.

By the time the root cause is found, the report has been open in meetings for hours, and the numbers have been wrong for days.

Read on for a free tool that can prevent this sort of issue.

I think the biggest IT superpower is the ability to consistently troubleshoot a problem from limited information. The second-biggest superpower is the ability to discover issues before the people who rely on your services do. Then you get them fixed, and nobody ever knows there was a potential problem.


Tracking Data Changes

Louis Davidson shows a technique:

A few months ago, I wrote a post about comparing sets of data in SQL. That post focused on a type of challenge that is often a one-off. The techniques listed (along with a tool like Redgate’s SQL Data Compare) aren’t the point of this post, but they are related, because when you do change detection from a source, it is super important to check your results occasionally. Having a copy of the complete source to compare to your destination (even if it is just checksums of the data) is important.

The techniques that Meagan is asking about this month are more ETL-related, where you watch a stream of data and sync changes as they are made. Typically, you don’t want to compare all the rows in a set, just the ones that have changed: at some frequency, give me the changes to one set of data and keep another set up to date with it.

I’m still partial to using HASHBYTES() on the fields I care about because I don’t trust modified dates unless I know the only way to access that table is via stored procedure, and all of the stored procedures handle updating the modified date correctly.
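A minimal sketch of that approach (table and column names invented for illustration): hash the business columns on the source and compare to a hash stored on the destination:

    -- Compute a per-row hash over the columns of interest. CONCAT
    -- treats NULL as an empty string, and the explicit separators
    -- keep adjacent values from running together and colliding.
    SELECT
        s.CustomerID,
        HASHBYTES('SHA2_256',
            CONCAT(s.FirstName, '|', s.LastName, '|', s.Email)) AS RowHash
    FROM src.Customers AS s;
    -- Any row whose RowHash differs from the hash stored on the
    -- destination is a change that needs to be synced.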

For row comparisons, I’d also look at EXCEPT. You do need to run it in both directions, so the pattern is more like (A EXCEPT B) UNION ALL (B EXCEPT A), but it works great and natively handles any NULLs along the way.
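Here is that pattern spelled out against the same hypothetical tables. Unlike a join on every column, EXCEPT treats two NULLs as equal, so no ISNULL() gymnastics are required:

    -- Symmetric difference: rows in source but not destination,
    -- plus rows in destination but not source
    (SELECT CustomerID, FirstName, LastName, Email FROM src.Customers
     EXCEPT
     SELECT CustomerID, FirstName, LastName, Email FROM dst.Customers)
    UNION ALL
    (SELECT CustomerID, FirstName, LastName, Email FROM dst.Customers
     EXCEPT
     SELECT CustomerID, FirstName, LastName, Email FROM src.Customers);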


Lessons Learned from Change Data Capture

Deborah Melkin shares some lessons from working with Change Data Capture:

This is definitely something I’ve had some experience with recently. It inspired my “Change Tracking in SQL Server 2025: Exploring Change Event Streaming vs CDC” session that I presented at SQLCON 26. We had been having problems troubleshooting various issues around Change Data Capture (CDC), and someone suggested that I take a look at Change Event Streaming, which is new for SQL Server 2025. One of the great things about putting together sessions like this is that I was able to create a very simple POC to understand how it all works. It also helped me work out some key takeaways for both of these.

The Change Event Streaming lessons are fairly limited (for good reason), but Deb shares some nice tips on working with CDC.
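For anyone who has not worked with CDC before, the basic moving parts are small. A minimal sketch against a hypothetical dbo.Orders table:

    -- Enable CDC at the database level
    EXEC sys.sp_cdc_enable_db;

    -- Track one table; SQL Server creates a capture instance and a
    -- change table under the cdc schema, fed by the log reader
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Orders',  -- hypothetical table
        @role_name     = NULL;       -- no gating role

    -- Read the changes between two log sequence numbers using the
    -- function generated for the capture instance
    DECLARE @from binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Orders'),
            @to   binary(10) = sys.fn_cdc_get_max_lsn();

    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, N'all');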
