Combining DISTINCT and group_concat() with custom delimiters in SQLite3
In this blog post, I'll introduce two useful tools in SQLite3: group_concat()
and DISTINCT
.
I show how they can be used individually, and that you need to watch out when combining them with custom delimiters.
I assume you have a basic understanding of SQL, but mostly you will need to know about the concepts of a table with data and simple SELECT
queries.
But first, we need a table to work with. For that, we'll create a table of hobbit names, and fill it with a few hobbits from Tolkien's fictional universe:
CREATE TABLE hobbits (
first_name VARCHAR(64),
last_name VARCHAR(64)
);
INSERT INTO hobbits VALUES
("Frodo", "Baggins"),
("Bilbo", "Baggins"),
("Sam", "Gamgee"),
("Pippin", "Took"),
("Merry", "Brandybuck")
;
To simply select the first and last names of each of the hobbits, the following query can be used:
SELECT first_name, last_name FROM hobbits;
first_name | last_name |
---|---|
Frodo | Baggins |
Bilbo | Baggins |
Sam | Gamgee |
Pippin | Took |
Merry | Brandybuck |
Great! But we can do more interesting things, as demonstrated next.
Concatenating
One of the tools in SQL is the group_concat()
function, which returns a string that is the concatenation of all non-null values of the given result set.
For example, this query concatenates all of the hobbits' first names:
SELECT group_concat(first_name) FROM hobbits;
Frodo,Bilbo,Sam,Pippin,Merry
You can see that, by default, the values are delimited by commas (,
).
To customise this, it is possible to provide an additional argument that sets the delimiter:
SELECT
group_concat(first_name, " and ")
FROM
hobbits
;
Frodo and Bilbo and Sam and Pippin and Merry
That's a little more readable.
Distinct values
Another tool in SQL is the DISTINCT
keyword, which can remove duplicate entries from result sets.
The keen-eyed (or well-read) among you may have observed (or remembered) that Frodo and Bilbo share their last names! This means that when selecting the last names, we would see a duplicate entry. And indeed:
SELECT last_name FROM hobbits;
last_name |
---|
Baggins |
Baggins |
Gamgee |
Took |
Brandybuck |
To remove the duplicate Bagginses from the result set, you can specify that the result set should be made distinct with the DISTINCT
keyword:
SELECT DISTINCT last_name FROM hobbits;
last_name |
---|
Baggins |
Gamgee |
Took |
Brandybuck |
As expected, only a single Baggins remains in the result.
Concatenating distinct values
So, what if we want to combine group_concat()
and DISTINCT
?
We can!
This will concatenate all the distinct last names:
SELECT
group_concat(DISTINCT last_name)
FROM
hobbits
;
Baggins,Gamgee,Took,Brandybuck
Fabulous! We're wielding these tools like a pro! Feeling overly confident, we may even try to customise the delimiter for extra points:
SELECT
group_concat(DISTINCT last_name, " and ")
FROM
hobbits
;
Parse error near line 2: DISTINCT aggregates must
have exactly one argument
Uh oh.
It looks like the way we use the DISTINCT
keyword confuses the query parser.
It seems like the last_name, " and "
part is interpreted as two arguments for the DISTINCT
keyword?
Let's try to fix that with parentheses so that the DISTINCT
keyword only applies to last_name
:
SELECT
group_concat((DISTINCT last_name), " and ")
FROM
hobbits
;
Parse error near line 39: near "DISTINCT": syntax
error
SELECT group_concat((DISTINCT last_name),
" and ") FROM hobbits ;
error here ---^
Nope! It seems like this is not allowed.
Consulting the SELECT
grammar, it is indeed the case that it is required for the DISTINCT
keyword to directly follow the opening parenthesis of the group_concat()
(or other) function.
It looks like the DISTINCT
keyword is only allowed in certain contexts, and a special case has been made for adding it in function invocations.
This special case prohibits extra arguments in the function invocation.
The Solution
So, how do we solve this? Well, an answer on the SQLite Forum provided a solution! It uses a subquery as a workaround:
SELECT
group_concat(distinct_last_name, " and ")
FROM (
SELECT DISTINCT
last_name AS distinct_last_name
FROM hobbits
);
Baggins and Gamgee and Took and Brandybuck
The inner query creates an intermediate result set that we can use in the outer query. This workaround definitely feels like... well, a workaround. If you find a more elegant solution to this, please let me know!
Conclusion
Apparently, the group_concat()
and DISTINCT
tools do not work together as I expected when using a custom delimiter.
As Bilbo Baggins would put it: I don't know half of SQL as well as I should like; and I like less than half of it half as well as it deserves.