Press "Enter" to skip to content

Curated SQL Posts

Migrating testthat to testit

Yihui Xie explains how to switch test frameworks in R:

Back in 2013, I wrote about testing R packages when I first released testit. Thirteen years later, I still believe that unit testing should be nothing more than “tell me if something unexpected happened.” Recently I converted a large testthat test suite to testit, and I thought I’d share a practical guide for anyone considering the same move.

Click through for that guide.

Leave a Comment

PostgreSQL Removing MD5 Hashing for Authentication

Lukas Vileikis covers the consequences:

In late 2024, a message by Nathan Bossart hit the database spotlight. Within it, he proposed a “multi-year, incremental approach to remove MD5 password support from PostgreSQL.”

Before we dive in completely, let’s establish one important thing first: what exactly is MD5?

One thing I strongly disagree with: Lukas’s comment that “A decade or so ago, when computing power was far smaller than it is now, MD5 was considered an ‘okay’ hashing mechanism.” There were MD5 rainbow tables readily available 15 years ago and people already realized MD5 was not good for password hashing, even with a salt. To the extent that these platform vendors thought it was “okay” a decade ago, they were already way out of date.

Leave a Comment

Recovering from a Full Transaction Log File

Jeff Iannucci sneaks in a fix:

We received an emergency call from a client that noted that their SQL Server instances was unresponsive. (This was an Amazon RDS instance, although that didn’t play much into the ultimate root cause.) The client had some technical staff already looking at the issue, and when we joined the call we were informed that the transaction log for their main production database was completely full, and all transactional activity in the database had stopped.

Read on to see how Jeff and team were able to fix it.

Leave a Comment

Comparing Postgres Kubernetes Operators: CloudNativePG vs Crunchy PGO

Gabriele Bartolini makes a comparison:

For years, I resisted writing a direct comparison between CloudNativePG and Crunchy PGO. It felt like the wrong kind of article to write from where I sit. But after several years of both projects maturing, and particularly since Crunchy Data was acquired by Snowflake, I have been asked with increasing frequency how the two operators compare. I now think the time is right. Last week, I wrote Recipe 24 to answer the practical question of how to migrate. This post attempts something harder: an honest assessment of why the two operators differ and what those differences mean for teams choosing a long-term platform for PostgreSQL on Kubernetes.

I will acknowledge Crunchy’s legacy, explain the architectural choices that I believe make CloudNativePG the stronger foundation, point to data where it exists, and flag the areas where my view is unavoidably subjective. I will not pretend this is a neutral document.

Gabriele is a maintainer on CloudNativePG, so take that into consideration. I do appreciate the upfront statement of bias and think the post is well worth reading.

Leave a Comment

A Primer on Partitioned Views

Erik Darling talks about an old-style way of partitioning in SQL Server:

Erik Darling here with Darling Data. And we’re going to finish off this Friday by talking about partitioned views. And look, there are a lot of things I could say about partitioned views that are great and grand and that have come in handy for me over the years in ways that I’m like, wow, thank you partitioned views. Thank you for not being normal table partitioning. Thank you for existing. 

Read on to see how they work, how you can write into them, things that might prevent you from writing into partitioned views directly, and why you probably don’t want writable partitioned views anyhow.

Leave a Comment

Exceeding the Capacity Limit for Power BI Dataset Refreshes

Chris Webb explains an error:

If you have a lot of Power BI semantic models that are scheduled to refresh at the same time in the Service then you may find that some of them fail with the following error:

You’ve exceeded the capacity limit for dataset refreshes. Try again when fewer datasets are being processed.

[Note: “dataset” is the old name for a Power BI semantic model. Someone should update the error message.]

Read on to see what can cause this error and what you can do about it.

Leave a Comment

What’s New in Cassandra 6

Mariah McLaughlin lays out some of the new features in the latest version of Cassandra:

Accord is a general-purpose transaction framework that uses a leaderless consensus protocol to have highly available transactions and is used in Cassandra 6. The goal is broader transactional support across multiple keys, with strict serializable isolation and without a central bottleneck.

This matters because multi-key consistency is hard to handle cleanly in application code. Once a workflow spans more than one partition, the application often ends up doing coordination work that really belongs in the database.

Accord enables ACID behavior on transactional tables, which lets developers coordinate multi-step, multi-partition changes with stronger correctness guarantees, reducing the amount of custom consistency logic they have to build in the application.

Click through for more information on this, as well as a few other significant features.

Leave a Comment

Scoring the Quality of Binary Classification with SQL Server

Sebastiao Pereira quantifies a result:

Machine Learning (ML) is a way of teaching computers to learn from data instead of being explicitly programmed. Performance metrics are essential tools for understanding how well a model actually works. They tell you not just how accurate the model is, but how reliablefair, and useful it will be in real-world applications. In other words, without them, machine learning would be a trial-and-error guesswork.

Binary classification is when each sample is labeled as one of two mutually exclusive classes, referenced to a categorization, like positive or negative.

How do you implement the binary classification performance metric in SQL Server without using external tools?

Click through for a series of metrics to determine how well a binary classification process performed. This post doesn’t include details on how to perform the classification, just what to do once you have the results.

Leave a Comment

Eleven CVEs for PostgreSQL

Christophe Pettus takes us through just shy of a dozen security issue fixes:

PostgreSQL 18.4, 17.10, 16.14, 15.18, and 14.23 are out as of May 14, 2026. The release fixes eleven security issues and more than sixty bugs. That is not a typo. Eleven CVEs is the largest single-release security batch I can remember, and three of them are CVSS 8.8 with practical exploitation paths. Patch this week. If you can patch tomorrow, patch tomorrow.

Click through for a rundown.

Leave a Comment

Partitioning and Columnstore Indexes

Erik Darling puts together a great combination for a very large dataset:

 So, today we’re going to talk about partitioning in columnstore because there are important differences between partitioned columnstore tables and partitioned rowstore tables. One of the sort of superpowers that columnstore has is the ability to use metadata about which row groups have which data in them, and it can skip entire segments that do not contain relevant data.

I agree with Erik’s point that you do need around 500 million or so rows before this capability really shines, but if you do pick the right partition key, you get one of those rare and coveted performance improvements from partitioning.

Leave a Comment