Press "Enter" to skip to content

Curated SQL Posts

Probabilistic Time Series Cross-Validation in R

Thierry Moudiki checks an interval:

A previous post introduced the crossvalidation package for R. This time, the focus is on probabilistic forecasting — evaluating not just how accurate point forecasts are, but how well-calibrated prediction intervals are, using empirical coverage rates and Winkler scores – and crossvalidation.

Click through for the code and not much additional commentary. H/T R-Bloggers.

Leave a Comment

Cross-Database Access after Tampering with Indexed View Metadata

Fabiano Amorim describes a security concern:

This article describes a restore-boundary weakness involving indexed views. An attacker prepares a database backup on an attacker-controlled instance, tampers with the persisted definition of an indexed view, and delivers that database through an otherwise ordinary backup-and-restore workflow.

After the restore, SQL Server evaluates the preserved metadata during indexed-view optimizer-driven execution. Data from databases the attacker cannot directly query may still be pulled into the attacker’s own restored database through trusted internal processing paths. This is a clear cross-database confidentiality problem.

It’s an interesting post. The scope of damage is somewhat limited considering that the attacker would need legitimate permissions to the instance, but something to keep in mind nonetheless.

Leave a Comment

Fuzzy Matching in SQL Server 2025

John Deardurff takes a look at a new capability in SQL Server 2025:

Data rarely arrives in perfect condition. Typos, regional spelling differences, and inconsistent formats make exact matching unreliable in real-world scenarios. That’s where fuzzy matching comes in; and SQL Server 2025 introduces powerful built-in functions to handle it directly in T-SQL.

None of the functions are particularly novel, but it is nice to have them directly available in SQL Server, especially because Integration Services (where some of this functionality lived) has been on life support for a decade.

Leave a Comment

ForEach Loops in Powershell

Garry Bargsley continues a series on Powershell capabilities:

Welcome back to PowerShell Strikes Back. We’re three weeks in, and the training is paying off. In Week 1, we learned that quotes are not interchangeable. In Week 2, we put variables to work – storing server names, config values, service objects, and boolean results. If you’ve been following along and running the examples in your own environment, you’re already writing better PowerShell than you were a month ago.

This week, we tackle the concept that transforms a script from a one-time operation into an actual tool: the ForEach loop.

Garry also ties in error handling, which is important during loop iteration.

Leave a Comment

Performing ELT with Python and DuckDB

Jamal Hansen shows off a capable in-memory analytic database:

This is a real-world example of a common data engineering pattern. You may have heard of ETL (Extract, Transform, Load), where data is transformed before it reaches its destination. What we are actually building today is the more modern variant, ELT: Extract, Load, Transform.

Read on for the process. I like DuckDB a lot and this is one of the types of use cases in which it excels.

Leave a Comment

T-SQL Tuesday 198 Round-Up

Meagan Longoria wraps up another T-SQL Tuesday:

Thank you to everyone who participated in T-SQL Tuesday #198! When I wrote the invitation post, I intentionally kept the prompt broad because change detection looks different depending on your source system, your infrastructure, your data volumes, and what you need to do with the changes once you have them. The responses covered SQL Server internals, Microsoft Fabric and Synapse, hashing strategies, metadata-driven frameworks, and Synapse workspace diffing with Python. Here’s a summary of each contribution.

Read on for links to eight responses.

Leave a Comment

Migrating testthat to testit

Yihui Xie explains how to switch test frameworks in R:

Back in 2013, I wrote about testing R packages when I first released testit. Thirteen years later, I still believe that unit testing should be nothing more than “tell me if something unexpected happened.” Recently I converted a large testthat test suite to testit, and I thought I’d share a practical guide for anyone considering the same move.

Click through for that guide.

Leave a Comment

PostgreSQL Removing MD5 Hashing for Authentication

Lukas Vileikis covers the consequences:

In late 2024, a message by Nathan Bossart hit the database spotlight. Within it, he proposed a “multi-year, incremental approach to remove MD5 password support from PostgreSQL.”

Before we dive in completely, let’s establish one important thing first: what exactly is MD5?

One thing I strongly disagree with: Lukas’s comment that “A decade or so ago, when computing power was far smaller than it is now, MD5 was considered an ‘okay’ hashing mechanism.” There were MD5 rainbow tables readily available 15 years ago and people already realized MD5 was not good for password hashing, even with a salt. To the extent that these platform vendors thought it was “okay” a decade ago, they were already way out of date.

Leave a Comment

Recovering from a Full Transaction Log File

Jeff Iannucci sneaks in a fix:

We received an emergency call from a client that noted that their SQL Server instances was unresponsive. (This was an Amazon RDS instance, although that didn’t play much into the ultimate root cause.) The client had some technical staff already looking at the issue, and when we joined the call we were informed that the transaction log for their main production database was completely full, and all transactional activity in the database had stopped.

Read on to see how Jeff and team were able to fix it.

Leave a Comment

Comparing Postgres Kubernetes Operators: CloudNativePG vs Crunchy PGO

Gabriele Bartolini makes a comparison:

For years, I resisted writing a direct comparison between CloudNativePG and Crunchy PGO. It felt like the wrong kind of article to write from where I sit. But after several years of both projects maturing, and particularly since Crunchy Data was acquired by Snowflake, I have been asked with increasing frequency how the two operators compare. I now think the time is right. Last week, I wrote Recipe 24 to answer the practical question of how to migrate. This post attempts something harder: an honest assessment of why the two operators differ and what those differences mean for teams choosing a long-term platform for PostgreSQL on Kubernetes.

I will acknowledge Crunchy’s legacy, explain the architectural choices that I believe make CloudNativePG the stronger foundation, point to data where it exists, and flag the areas where my view is unavoidably subjective. I will not pretend this is a neutral document.

Gabriele is a maintainer on CloudNativePG, so take that into consideration. I do appreciate the upfront statement of bias and think the post is well worth reading.

Leave a Comment