Return Zero

What’s worse than an N+1: Accidental Cartesian Products

Mr Silva

Object-Relational Mapping (ORM) tools are widely used in modern web development for their convenience in database interactions. However, they can sometimes lead to unexpected and severe performance issues. One such issue is the generation of queries that result in cartesian products, potentially returning an enormous number of rows.

Understanding the Cartesian Product Problem

A cartesian product occurs when a query joins tables in a way that creates a combination of every row from one table with every row from another, without proper filtering conditions. This can happen when:

  1. Join conditions are missing or incorrect.
  2. Multiple many-to-many relationships are queried simultaneously without proper constraints.

The result is a query that returns far more rows than intended, often growing exponentially with the size of the involved tables.

A Real-World Example

Let’s consider the following database schema and query:

CREATE TABLE users (
  id serial PRIMARY KEY,
  name TEXT
);

CREATE TABLE movies (
  id serial PRIMARY KEY,
  name TEXT
);

CREATE TABLE songs (
  id serial PRIMARY KEY,
  name TEXT
);

CREATE TABLE watched_movies (
  user_id int,
  movie_id int,
  FOREIGN KEY (user_id) REFERENCES users (id),
  FOREIGN KEY (movie_id) REFERENCES movies (id)
);

CREATE TABLE songs_heard (
  user_id int,
  song_id int,
  FOREIGN KEY (user_id) REFERENCES users (id),
  FOREIGN KEY (song_id) REFERENCES songs (id)
);

Let us assume we have only one user in the users table and that user has seen 10 movies and heard 25 songs. Your ORM’s find clause may look something like this,

const result = await User.findAll({
  include: [
    {
      model: Movie,
    },
    {
      model: Song,
    }
  ]
});

Most ORMs aren’t really smart and willl not know the context of the query and may generate SQL code that looks like the one below,

SELECT 
    usr.id AS uid, usr.name AS uname, s.name AS song, m.name AS OrmRecord 
FROM 
    users usr
JOIN 
    watched_movies wm ON wm.user_id = usr.id
JOIN 
    movies m ON m.id = wm.movie_id
JOIN 
    songs_heard sh ON sh.user_id = usr.id
JOIN 
    songs s ON s.id = sh.song_id;

At first glance, You might think this query is really good, it is a single query and it does not have the N+1 problem, but what you might forget is the amount of rows it returned internally for the ORM to then map into objects.

In this case it produces a cartesian product between watched movies and heard songs for each user. it returns 250 rows. You can see it return every combination of user, song and movie.
Imagine we pulled in 1000 users in a list instead with similar number of songs and movies viewed, that would be quarter of a million rows or 250k for a single fetch. It might be more than enough for your OS or cluster orchestrator to kill the process immediately.

How ORMs Can Lead to This Problem

ORMs might generate such queries when:

  1. Developers use eager loading on multiple relationships without considering their interaction.
  2. The ORM’s query builder is used to join multiple many-to-many relationships without proper constraints.
  3. Complex object graphs are fetched without careful consideration of the underlying SQL generated.

Mitigating the Cartesian Product Problem

When using an ORM and your models are frequently changing, you may not always notice performance issues right away. One common indicator is a sudden spike in memory usage that occurs that is reproducable with similar or repeated requests. To diagnose this, look for instances where both high application memory consumption and elevated database memory usage occur simultaneously, and try correlating these spikes with specific requests in your logs.

If you’re working in a team, make sure everyone is aware of this potential issue. Most ORMs provide SQL logging tools, which can help you track and analyze the SQL queries being generated. Regularly reviewing these logs, especially for complex queries, can reveal inefficiencies that might otherwise go unnoticed.

Update 04-Oct-2024:

I was asked by a few to include some points on how to solve the example presented. This seems to be more of a common problem with ORMs than I expected.

I can see two solutions to this:

users = Users.findMany({ where: inArray('id',[1,2,3....]) })
wathchedMovies = WatchedMovies.findAll( {where: inArray('user_id',[1,2,3...]) })
songsHeard = SongsHeard.findAll( {where: inArray('user_id',[1,2,3...]) })


Fetch all the movies and songs of users in individual queries, Then manually map the movies and songs to a user. These works really well when the movies and songs are usually common among users.

alternatively, a lot of databases support JSON Objects and you could rewrite it as a single query too,

SELECT 
    usr.id AS uid, 
    usr.name AS uname, 
    COALESCE(m.movies, '[]') AS movies, 
    COALESCE(s.songs, '[]') AS songs
FROM 
    users usr
LEFT JOIN (
    SELECT wm.user_id, json_agg(json_build_object('id', m.id, 'name', m.name)) AS movies 
    FROM watched_movies wm
    JOIN movies m ON m.id = wm.movie_id
    GROUP BY wm.user_id
) m ON m.user_id = usr.id
LEFT JOIN (
    SELECT sh.user_id, json_agg(json_build_object('id', s.id, 'name', s.name)) AS songs 
    FROM songs_heard sh
    JOIN songs s ON s.id = sh.song_id
    GROUP BY sh.user_id
) s ON s.user_id = usr.id;

Tags:
Subscribe
Notify of
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top