
Curated SQL Posts

Data Type Precedence in SQL Server

Louis Davidson has a type:

There is one topic in query and equation writing that is constantly a minor issue for SQL programmers: implicit data type conversions. Whenever you don’t specifically state the datatype of an expression, like when you write SELECT 1;, it can feel like a bit of a mystery what the datatypes of your literal values are. Like in this case, what is 1? You probably know from experience that this is an integer, but then what happens when you compare CAST(1 as bit) to the literal 1? Is that literal 1 now a bit? Or is it still an integer?

Perhaps even more importantly, why does this query succeed?

Click through to learn more.
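
As a rough illustration of the bit-versus-integer question, here is a minimal Python sketch of how data type precedence picks a winner. The list is an abridged slice of SQL Server's documented precedence order, and the function name winning_type is my own invention, not anything in SQL Server.

```python
# Abridged slice of SQL Server's data type precedence list, highest first.
# The full documented list has more entries; the relative order here follows it.
PRECEDENCE = [
    "datetime2", "datetime", "date",
    "float", "decimal",
    "bigint", "int", "smallint", "tinyint", "bit",
    "nvarchar", "varchar",
]
RANK = {name: position for position, name in enumerate(PRECEDENCE)}


def winning_type(left: str, right: str) -> str:
    """Return the type both operands end up as: the one with higher precedence.

    winning_type is a hypothetical helper for illustration only.
    """
    return left if RANK[left] <= RANK[right] else right


# CAST(1 AS bit) compared to the literal 1: the bit is converted to int.
bit_vs_int = winning_type("bit", "int")
# A varchar such as '1' compared to an int is converted to int as well.
varchar_vs_int = winning_type("varchar", "int")
```

Precedence only decides the direction of the conversion. The conversion itself still has to succeed at run time, which is why comparing a non-numeric string to an integer raises an error.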


Porting Statistics in PostgreSQL

Radim Marek imports production statistics:

In the previous article we covered how the PostgreSQL planner reads pg_class and pg_statistic to estimate row counts, choose join strategies, and decide whether an index scan is worth it. The message was clear: when statistics are wrong, everything else goes with it.

But there was one thing we didn’t talk about. Statistics are specific to the database cluster that generated them. The primary way to populate them is `ANALYZE` which requires the actual data.

Click through to see how Postgres handles this. It’s quite similar to SQL Server’s DBCC CLONEDATABASE in practice, it seems.
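
To see why wrong statistics hurt so much, here is a simplified Python sketch of an equality row estimate built from values that live in pg_class (reltuples) and pg_stats (null_frac, n_distinct, most_common_vals, most_common_freqs). It loosely follows the planner's approach and assumes a positive n_distinct; the function name and sample numbers are mine, not from the article.

```python
def estimate_equality_rows(reltuples, null_frac, n_distinct,
                           mcv_values, mcv_freqs, value):
    """Estimate rows matching `column = value` (simplified sketch).

    Assumes n_distinct is a positive count, not the negative
    fraction-of-rows form that pg_stats can also report.
    """
    if value in mcv_values:
        # Most-common values carry their own measured frequency.
        selectivity = mcv_freqs[mcv_values.index(value)]
    else:
        # Spread the remaining non-null rows evenly over the other values.
        remaining = 1.0 - null_frac - sum(mcv_freqs)
        other_distinct = n_distinct - len(mcv_values)
        selectivity = remaining / other_distinct if other_distinct > 0 else 0.0
    # The planner never estimates fewer than one row.
    return max(1, round(reltuples * selectivity))


# Hypothetical column: 10 distinct values, two of them very common.
stats = dict(null_frac=0.0, n_distinct=10,
             mcv_values=["a", "b"], mcv_freqs=[0.5, 0.3])

fresh = estimate_equality_rows(10_000, value="a", **stats)  # stats match the data
stale = estimate_equality_rows(100, value="a", **stats)     # reltuples from an empty-ish copy
rare = estimate_equality_rows(10_000, value="z", **stats)   # value not in the MCV list
```

With the same query, the stale copy estimates 50 rows where the production-sized statistics estimate 5000, and that gap is what pushes the planner toward a different plan.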


Avoid Hard-Coding Linked Server Names

Greg Low provides some good advice:

I’m not a great fan of linked servers in SQL Server but they are often necessary. If I’m working with the latest version of SQL Server, I really prefer to use External Data Sources and External Tables. But not everyone is on the latest version. In the meantime, what I see all the time is people hardcoding server names like this:

SDUPROD2022.WWIDB.Payroll.Employees

That makes your code really hard to manage.

Read on for several options. At a prior company quite a while ago, we went with DNS entries and they worked reasonably well.
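
One common way to keep the server name out of your code is to hide it behind a synonym that gets created per environment. The Python sketch below generates that CREATE SYNONYM statement from a lookup table. I am not claiming this is one of Greg's options; the environment map, the dev server name SDUDEV2022, and the synonym name are all hypothetical.

```python
# Map each environment to its linked server name.
# SDUPROD2022 comes from the example above; SDUDEV2022 is a hypothetical name.
LINKED_SERVERS = {
    "dev": "SDUDEV2022",
    "prod": "SDUPROD2022",
}


def synonym_ddl(environment, synonym_schema, synonym_name,
                database, schema, obj):
    """Build a CREATE SYNONYM statement pointing at the right linked server."""
    server = LINKED_SERVERS[environment]
    return (
        f"CREATE SYNONYM [{synonym_schema}].[{synonym_name}] "
        f"FOR [{server}].[{database}].[{schema}].[{obj}];"
    )


prod_statement = synonym_ddl("prod", "dbo", "RemoteEmployees",
                             "WWIDB", "Payroll", "Employees")
```

Application code then queries dbo.RemoteEmployees everywhere, and only the deployment script knows which server sits behind it.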


Preview-Only Steps in Microsoft Fabric Dataflows

Chris Webb covers a new feature:

I have been spending a lot of time recently investigating the new performance-related features that have rolled out in Fabric Dataflows over the last few months, so expect a lot of blog posts on this subject in the near future. Probably my favourite of these features is Preview-Only steps: they make such a big difference to my quality of life as a Dataflows developer.

The basic idea (which you can read about in the very detailed docs here) is that you can add steps to a query inside a Dataflow that are only executed when you are editing the query and looking at data in the preview pane; when the Dataflow is refreshed these steps are ignored. This means you can do things like add filters, remove columns or summarise data while you’re editing the Dataflow in order to make the performance of the editor faster or debug data problems. It’s all very straightforward and works well.

First up, that feature is pretty interesting, though I could see things breaking if you test only in the preview pane. Second, what Chris does with it is quite clever.
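
To make the idea concrete, here is a small conceptual model in Python of a query whose steps can be flagged as preview-only. This is not how Fabric implements the feature, and every name in it (Step, run, the mode strings) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

Rows = List[dict]


@dataclass
class Step:
    name: str
    transform: Callable[[Rows], Rows]
    preview_only: bool = False  # hypothetical flag mirroring the feature


def run(steps: List[Step], rows: Rows, mode: str) -> Rows:
    """Apply steps in order; mode is 'preview' or 'refresh'."""
    for step in steps:
        if step.preview_only and mode == "refresh":
            continue  # preview-only steps are ignored during a refresh
        rows = step.transform(rows)
    return rows


source = [{"id": i} for i in range(1000)]
steps = [
    Step("Keep first 10 rows", lambda rows: rows[:10], preview_only=True),
    Step("Add doubled column",
         lambda rows: [{**r, "doubled": r["id"] * 2} for r in rows]),
]

preview_rows = run(steps, source, "preview")  # fast, small sample
refresh_rows = run(steps, source, "refresh")  # full data set
```

The same query produces 10 rows in the editor and 1000 on refresh, which is exactly the convenience and exactly the testing risk.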


Troubleshooting Bad Request in ADF Pipelines

Koen Verbeeck said something bad:

A while ago I blogged about a use case where a pipeline fails during debugging with a BadRequest error, even though it validates successfully. If you’re wondering, this is the helpful error message that you get:

Click through for an image of the 400 Bad Request message, how Koen fixed it originally, and then a different scenario in which that 400 message popped up.

Ultimately, a 400 Bad Request comes down to “You sent me information that doesn’t make sense and I can’t fulfill your request, so fix it, dummy.” The 4xx status codes are very rude and insulting. Especially 418: that thing has a mouth like a sailor’s.


Diskless Topics in Apache Kafka

Paul Brebner extends a metaphor:

I’ve been tracking the progress of Apache Kafka “Diskless Topics” for a while now. It’s a topic that sparks curiosity—mostly because the name itself sounds like an oxymoron. How can a topic be diskless? Where does the data go? 

With the recent voting on KIP-1150, I decided it was time to dive deep into the architectural changes. There are several related Kafka Improvement Proposals (KIPs) floating around, but KIP-1150 is dependent on KIP-1163 and KIP-1164, and the designs are still in flux. Consider this blog post a “theory” in the true scientific sense: a best-guess model based on current evidence that will almost certainly evolve. 

Click through for your moment of zen.


Creating Fabric Linked Service Parameters for ADO Deployment

Koen Verbeeck glues together several technologies:

Quite the title, so let me set the stage first. You have an Azure Data Factory instance (or Azure Synapse Pipelines) and you have a couple of linked services that point to Fabric artifacts such as a lakehouse or a warehouse. You want to deploy your ADF instance with an Azure DevOps build/release pipeline to another environment (e.g. acceptance or production) and this means the linked services need to change as well because in those environments the lakehouse or warehouse is in a different workspace (and also has a different object ID).

When you want to deploy ADF, you typically use the ARM template that ADF automatically creates when you publish (when your instance is linked with a Git repo). More information about this setup can be found in the documentation. To parameterize certain properties of a linked service, you can use custom parameterization of the ARM template. Anyway, long story short, I tried to parameterize the properties of the Fabric linked service.

Read on to see how that went, as well as what you need to do to solve this issue.


Scan Types in PostgreSQL

Warda Bibi lays out four classes of scan in PostgreSQL:

To understand how PostgreSQL scans data, we first need to understand how PostgreSQL stores it.

  • A table is stored as a collection of 8KB pages (by default) on disk.
  • Each page has a header, an array of item pointers (also called line pointers), and the actual tuple data growing from the bottom up.
  • Each tuple has its own header containing visibility info: xmin, xmax, cmin/cmax, and infomask bits.
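
The page layout in the list above can be sketched as a toy slotted page in Python. The 24-byte page header and 4-byte line pointer sizes match PostgreSQL, but the class itself is a simplification of my own: it ignores alignment, tuple headers, and the special space.

```python
PAGE_SIZE = 8192          # default PostgreSQL page size
HEADER_SIZE = 24          # page header
LINE_POINTER_SIZE = 4     # one item pointer


class Page:
    """Toy slotted page: pointers grow forward, tuple data grows backward."""

    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.lower = HEADER_SIZE   # end of the line pointer array
        self.upper = PAGE_SIZE     # start of the tuple data
        self.line_pointers = []    # (offset, length) per item

    def free_space(self):
        return self.upper - self.lower

    def insert(self, tuple_bytes):
        """Store a tuple; return its 1-based item number, or None if full."""
        needed = len(tuple_bytes) + LINE_POINTER_SIZE
        if needed > self.free_space():
            return None
        self.upper -= len(tuple_bytes)
        self.data[self.upper:self.upper + len(tuple_bytes)] = tuple_bytes
        self.line_pointers.append((self.upper, len(tuple_bytes)))
        self.lower += LINE_POINTER_SIZE
        return len(self.line_pointers)

    def read(self, item_number):
        offset, length = self.line_pointers[item_number - 1]
        return bytes(self.data[offset:offset + length])


page = Page()
first_item = page.insert(b"hello")
```

An index entry only needs a page number and an item number to find a row, because the line pointer records where the tuple actually sits inside the page.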

There are different ways PostgreSQL can read data from disk. Depending on the query and available indexes, it can choose from several scan strategies:

  1. Sequential Scan 
  2. Index Scan
  3. Index-Only Scan
  4. Bitmap Index Scan

Read on for a description of those types, as well as when it makes sense for the database engine to select a particular scan type.
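
As a taste of how the engine weighs these options, here is the sequential scan cost calculation in Python, using PostgreSQL's default cost parameters. Index scan costing is considerably more involved, so this sketch covers only the sequential case; the function name is mine.

```python
# PostgreSQL default planner cost parameters.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(relpages, reltuples, filter_clauses=0):
    """Total cost of a sequential scan: read every page, process every tuple.

    Each filter clause adds one operator evaluation per tuple.
    """
    page_cost = relpages * SEQ_PAGE_COST
    tuple_cost = reltuples * (CPU_TUPLE_COST + filter_clauses * CPU_OPERATOR_COST)
    return page_cost + tuple_cost


# A table with 10,000 rows stored in 358 pages.
no_filter = seq_scan_cost(358, 10_000)
one_filter = seq_scan_cost(358, 10_000, filter_clauses=1)
```

Note that adding a WHERE clause makes the sequential scan slightly more expensive rather than cheaper, since every row still gets read and now also gets checked. That is the opening an index scan needs.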


Alerting People in Microsoft Teams from Data Factory Pipelines

Andy Brownsword sends a message:

Whether running Data Factory, Synapse, or Fabric pipelines, things go wrong – and the de facto response is to send an email. We’ve looked at sending emails from pipelines before, but at scale they can become noise and are easy to ignore.

A more effective option is to surface alerts where collaboration already exists, such as Teams.

In this post we’re going to start looking at using Teams and consolidate notifications into a channel. This functionality gives team members visibility, the ability to update in threads, and the option to tag people for a tighter response loop than typical emails bring.

Click through for the process.
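
For a sense of what a Teams alert looks like on the wire, here is a minimal Python sketch that builds an Adaptive Card message and posts it to a webhook. I am assuming a Teams webhook that accepts this message shape; the function names and the pipeline details are hypothetical, and Andy's pipeline-based approach may differ.

```python
import json
import urllib.request


def build_alert_payload(pipeline_name, run_id, error_message):
    """Wrap a pipeline failure in an Adaptive Card message (assumed shape)."""
    text = (f"Pipeline **{pipeline_name}** failed (run {run_id}): "
            f"{error_message}")
    return {
        "type": "message",
        "attachments": [{
            "contentType": "application/vnd.microsoft.card.adaptive",
            "content": {
                "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
                "type": "AdaptiveCard",
                "version": "1.4",
                "body": [{"type": "TextBlock", "text": text, "wrap": True}],
            },
        }],
    }


def send_alert(webhook_url, payload):
    """POST the payload as JSON; webhook_url is supplied by the caller."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical failure details for illustration.
payload = build_alert_payload("LoadSales", "run-123", "Copy activity timed out")
```

Keeping the payload builder separate from the sender makes it easy to check the message format without touching a real channel.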
