
Curated SQL Posts

Read Efficiency in PostgreSQL Queries

Michael Christofides explains what’s happening under the covers:

A lot of the time in database land, our queries are I/O constrained. As such, performance work often involves reducing the number of page reads. Indexes are a prime example, but they don’t solve every issue (a couple of which we’ll now explore).

The way Postgres handles consistency while serving concurrent queries is by maintaining multiple row versions in both the main part of a table (the “heap”) and the indexes (docs). Old row versions take up space, at least until they are no longer needed, and the space can be reused. This extra space is commonly referred to as “bloat”. Below we’ll look into both heap bloat and index bloat, how they can affect query performance, and what you can do to both prevent and respond to issues.

Read on for a detailed explanation.
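As a rough back-of-the-envelope sketch of why bloat costs page reads, consider a sequential scan, which has to read every heap page whether the row versions on it are live or dead. The row counts and row size below are hypothetical, and the model deliberately ignores page headers, alignment, and fillfactor; the only fixed fact is PostgreSQL's default 8 kB block size.

```python
import math

PAGE_SIZE_BYTES = 8192  # PostgreSQL's default block size


def heap_pages(row_versions: int, bytes_per_row: int) -> int:
    """Rough count of heap pages needed to hold the given row versions.

    Simplification: ignores page headers, item pointers, alignment,
    and fillfactor, so treat the result as an order-of-magnitude figure.
    """
    rows_per_page = PAGE_SIZE_BYTES // bytes_per_row
    return math.ceil(row_versions / rows_per_page)


# Hypothetical table: these numbers are illustrative, not from the article.
live_rows = 1_000_000
dead_rows = 1_000_000   # one old version per live row, space not yet reused
bytes_per_row = 128

compact = heap_pages(live_rows, bytes_per_row)
bloated = heap_pages(live_rows + dead_rows, bytes_per_row)

print(f"pages without dead versions: {compact}")
print(f"pages with dead versions:    {bloated}")
print(f"read amplification:          {bloated / compact:.1f}x")
```

Under these assumptions the same million live rows cost twice as many page reads to scan, which is the kind of effect the linked post measures on real tables.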


Using Database Properties to Assist Generative AI Solutions

Brent Ozar makes use of extended properties:

You can add database instructions as extended properties at the database or object level, and when Copilot works with those objects, it’ll read your instructions and use them to shape its advice.

For example, you can add a database-level property called a “constitution” with your company’s coding standards, like this:

Andy Brownsword has another example:

The new Database Instructions are text stored against database objects to add more context about the object and how it should be used. A simple example:

We find a Sales table with a Price column. Is that the price for a single unit or the line total? Does that include or exclude VAT? What about discounts?

This is where context is king, and Database Instructions allow us to annotate these details and remove the ambiguity.

Database properties are a criminally underused part of SQL Server—in part because there wasn’t great tooling around how to display or work with these properties—and if this forces people to be a bit thoughtful in design and after-the-fact documentation on database objects, so much the better.


Code Pages in PostgreSQL on Windows

Kellyn Gorman tells a story:

Running PostgreSQL on Windows feels deceptively simple for anyone with a Windows laptop who just wants a local database to test or demo on. Just a few clicks and I’ve installed it, started the service, opened psql, and I’m up and running.
Except… not quite.
Because if you’ve ever seen this message:

Console code page set to 1252 for psql compatibility

You’ve stepped into one of the more subtle, frustrating challenges of running PostgreSQL on Windows. So, let’s talk about why this happens, what it is and why it matters more than you might think…

This has several downstream problems, as Kellyn points out. Read on to see how you can fix the issue.
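To make the general mechanism concrete, here is a minimal Python sketch of what goes wrong when the two sides of a connection disagree about the encoding. It does not involve psql or PostgreSQL at all, and the sample string is hypothetical; it only shows how the same bytes read differently under UTF-8 and Windows code page 1252. Whether this matches the exact symptoms in Kellyn's post is an assumption.

```python
# Hypothetical sample value containing one non-ASCII character.
text = "café"

# Case 1: bytes written as UTF-8 but interpreted as code page 1252.
# The two-byte UTF-8 sequence for "é" is shown as two separate characters.
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# Case 2: bytes written as code page 1252 but interpreted as UTF-8.
# The single byte for "é" is not a valid UTF-8 sequence, so decoding fails.
cp1252_bytes = text.encode("cp1252")
try:
    cp1252_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print("strict UTF-8 decode failed:", failed)
```

The first case silently produces garbled text, while the second raises an error, so a mismatch can surface either as corrupted data or as a failed operation depending on the direction.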


When Warehouse Beats Lakehouse

Gilbert Quevauvilliers runs a test:

After my previous blog post on the different semantic model options, and while working with a Fabric customer, I got to thinking about which is faster and which consumes less capacity when ingesting data into Power BI: the SQL Endpoint to a Lakehouse or a query from the Warehouse.

Below you will find the information which I found very interesting indeed.

For both the Lakehouse and Warehouse source CSVs, there was a total of 237,245,585 rows.

Click through for the numbers, and a scenario in which the warehouse loads data faster than a lakehouse.


Load Testing Microsoft Fabric Redux

Reitse Eskens goes back to the well:

For those of you who have been following this blog for a long time, you may have read the posts on Fabric where I’m comparing the F64 Trial with the F2, and other shenanigans. Because Fabric keeps evolving, and new releases keep coming that improve or change the behaviour, I felt it was only fair to give Fabric a new run for its capacities.

The idea is not to create a solution that works as quickly as possible. It’s not the goal to tune Fabric, nor to get the most excitement for your Euro. The main goal of this blog (and the session that I’m presenting on this topic) is to show you the differences, the error messages and where to look when you get lost. Because, for all intents and purposes, error handling is still tricky, and it seems to be very hard to get rid of “Something went wrong” messages.

I appreciate Reitse’s localization of the well-known phrase “most win for your Yen.”

Click through for plenty of graphs and lots of testing.


Measuring Page Load Times in Power BI

Chris Webb breaks out the stopwatch:

If you’re performance tuning a Power BI report, the most important thing you need to measure – and the thing your users certainly care about most – is how long it takes for a report page to load. Yet this isn’t something that is available anywhere in Power BI Desktop or in the Service (though you can use browser dev tools to do this), and developers often concentrate on tuning just the individual DAX queries generated by the report instead. Usually that’s all you need to do, but running multiple DAX queries concurrently can affect the performance of each one, and there are other factors (for example geocoding in map visuals or displaying images) that affect report performance, so if you do not look at overall page render times then you might miss them. In this post I’ll show you how you can measure report page load times, and the times taken for other forms of report interaction, using Performance Analyzer in the Service and Power Query.

Read on to see how.


License Types now Workspace Types in Microsoft Fabric

Nicky van Vroenhoven notices a change in language:

Just a quick post because I noticed a change in the Fabric UI, specifically in the Workspace settings.

I am working on a demo for my Power BI Gebruikersdagen session, and wanted to switch a workspace to Fabric capacity. I noticed that the setting License type has changed, and is now called Workspace type.

Read on to see where this has changed and a few more notes from Nicky.


Snapshot Reporting in Microsoft Fabric via Fabric Pipelines

Kenneth Omorodion builds a Fabric pipeline:

In a previous tip, I described how we can implement snapshot reporting using Microsoft Fabric Dataflow Gen2. In this article, I will describe how to achieve the same using Microsoft Fabric Pipelines. I previously described how important snapshot reporting can be in Business Intelligence reporting. Some reasons why developers/engineers might prefer to leverage a Fabric pipeline instead of a Dataflow Gen2 include considerations around cost efficiency and data volumes.

My strong preference is still to do this in code (notebooks, Spark jobs), but at least Dataflows Gen2 aren’t literally 100x slower than the alternatives anymore.


Memory-Optimized Storage Structures in SQL Server

Hugo Kornelis digs into another storage structure:

After discussing traditional on-disk rowstore storage in part 1 and columnstores in part 2, it is now time to turn our eye towards memory-optimized storage structures in SQL Server.

Memory-optimized storage was introduced in SQL Server 2014, as part of a project that was codenamed “Hekaton” and later renamed to in-memory OLTP. Whereas columnstore indexes were specifically targeted towards large scale analytical work, Hekaton and memory-optimized tables are specifically geared towards high volume OLTP workloads. By fully eliminating locks and latches, and using precompiled machine code where possible, the processing time of transactions is significantly reduced, allowing for throughput numbers that were previously impossible to achieve.

Read on to learn much more about how SQL Server manages memory-optimized data and the types of operations that are permissible on this internal storage.


Running PostgreSQL Tasks as Background Operations

Vibhor Kumar describes a PostgreSQL extension:

That’s the promise of pg_background: execute SQL asynchronously in background worker processes inside PostgreSQL, so your client session can move on—while the work runs in its own transaction. 

It’s a deceptively simple superpower:

  • Kick off a long-running query (or maintenance) without holding the client connection open
  • Run “autonomous transaction”-style side effects (commit/rollback independent of the caller)
  • Monitor, wait, detach, or cancel explicitly
  • Keep the operational model “Postgres-native” instead of adding another job system  

Read on to learn more about it, including tips on how to use it and some examples of when you might want to use it.
