SQL Joins: Returning Multiple Versions of a Relationship in the same Row
Last updated:Imagine the following situation: you're building a system and so you've modeled persons
and attributes
like so:
And your persons
table looks like this:
And your attributes
table like this:
And this is the relationship table (because it's an N-to-N relationship)
This means that john is fat and talkative and that mary is tall, rich and talkative.
Great news - but wait a second: imagine you had thousands of rows rather than just two and your boss has requested that you represent all that data with only one person per row. How would you do it?
How do you display results with only one person per row?
Attempt 1: Simple join
SELECT p.value AS NAME, a.value AS ATTRIBUTE
FROM persons p
JOIN persons_x_attributes pa ON pa.person_id=p.id
JOIN attributes a ON a.id=pa.attribute_id
FAIL: That's the same info, but the boss said one person per row!
Attempt 2: multiple joins
One naive way of having a go at this would be to join with the same table multiple times - as many as there are different results for an attribute.
SELECT
p.value AS NAME,
a1.value AS TALL,
a2.value AS FAT,
a3.value AS RICH,
a4.value AS TALKATIVE
FROM persons p
-- join to find out whether person is tall
LEFT JOIN persons_x_attributes pa1 ON pa1.person_id = p.id
LEFT JOIN attributes a1 ON a1.id = pa1.attribute_id AND a1.value='tall'
-- another one for 'fat'
LEFT JOIN persons_x_attributes pa2 ON pa2.person_id = p.id
LEFT JOIN attributes a2 ON a2.id = pa2.attribute_id AND a2.value='fat'
-- another join to find out if person is rich
LEFT JOIN persons_x_attributes pa3 ON pa3.person_id = p.id
LEFT JOIN attributes a3 ON a3.id = pa3.attribute_id AND a3.value='rich'
-- null means person is not talkative
LEFT JOIN persons_x_attributes pa4 ON pa4.person_id = p.id
LEFT JOIN attributes a4 ON a4.id = pa4.attribute_id AND a4.value='talkative'
(some rows suppressed)
FAIL: The info is there, but is there any way to do away with all the noise?
Attempt 3: Multiple joins and group concat
Yes, there is:
SELECT
p.value AS NAME,
GROUP_CONCAT(DISTINCT a1.value) AS TALL,
GROUP_CONCAT(DISTINCT a2.value) AS FAT,
GROUP_CONCAT(DISTINCT a3.value) AS RICH,
GROUP_CONCAT(DISTINCT a4.value) AS TALKATIVE
FROM persons p
-- first join
LEFT JOIN persons_x_attributes pa1 ON pa1.person_id = p.id
LEFT JOIN attributes a1 ON a1.id = pa1.attribute_id AND a1.value = 'tall'
-- second join
LEFT JOIN persons_x_attributes pa2 ON pa2.person_id = p.id
LEFT JOIN attributes a2 ON a2.id = pa2.attribute_id AND a2.value = 'fat'
-- third join
LEFT JOIN persons_x_attributes pa3 ON pa3.person_id = p.id
LEFT JOIN attributes a3 ON a3.id = pa3.attribute_id AND a3.value = 'rich'
-- fourth join
LEFT JOIN persons_x_attributes pa4 ON pa4.person_id = p.id
LEFT JOIN attributes a4 ON a4.id = pa4.attribute_id AND a4.value = 'talkative'
-- only one name per row
GROUP BY NAME
Great success!
GROUP_CONCAT
is a special aggregate function that sucks up all results by concatenating them (this is a useful way to get rid of NULL
s in your data).
Note that, instead of GROUP_CONCAT, you can also use good ole SUM
if your data is composed of integers rather than strings; NULLs don't interfere with SUMming on SQL.
References