SQL Joins: Returning Multiple Versions of a Relationship in the same Row

Last updated:

Imagine the following situation: you're building a system and so you've modeled persons and attributes like so:

database-diagram

And your persons table looks like this: persons-table

And your attributes table like this:

attributes-table

And this is the relationship table (because it's an N-to-N relationship)

persons-x-attributes-table

This means that john is fat and talkative and that mary is tall, rich and talkative.

Great news - but wait a second: imagine you had thousands of rows rather than just two and your boss has requested that you represent all that data with only one person per row. How would you do it?

How do you display results with only one person per row?

Attempt 1: Simple join

SELECT p.value AS NAME, a.value AS ATTRIBUTE 
FROM persons p
JOIN persons_x_attributes pa ON pa.person_id=p.id
JOIN attributes a ON a.id=pa.attribute_id

attempt-1

FAIL: That's the same info, but the boss said one person per row!

Attempt 2: multiple joins

One naive way of having a go at this would be to join with the same table multiple times - as many as there are different results for an attribute.

SELECT 
    p.value AS NAME, 
    a1.value AS TALL, 
    a2.value AS FAT, 
    a3.value AS RICH, 
    a4.value AS TALKATIVE
FROM persons p
-- join to find out whether person is tall
LEFT JOIN persons_x_attributes pa1 ON pa1.person_id = p.id
LEFT JOIN attributes a1 ON a1.id = pa1.attribute_id AND a1.value='tall'
-- another one for 'fat'
LEFT JOIN persons_x_attributes pa2 ON pa2.person_id = p.id
LEFT JOIN attributes a2 ON a2.id = pa2.attribute_id AND a2.value='fat'
-- another join to find out if person is rich
LEFT JOIN persons_x_attributes pa3 ON pa3.person_id = p.id
LEFT JOIN attributes a3 ON a3.id = pa3.attribute_id AND a3.value='rich'
-- null means person is not talkative
LEFT JOIN persons_x_attributes pa4 ON pa4.person_id = p.id
LEFT JOIN attributes a4 ON a4.id = pa4.attribute_id AND a4.value='talkative'

attempt-2

(some rows suppressed)

FAIL: The info is there, but is there any way to do away with all the noise?

Attempt 3: Multiple joins and group concat

Yes, there is:

SELECT 
    p.value AS NAME, 
    GROUP_CONCAT(DISTINCT a1.value) AS TALL, 
    GROUP_CONCAT(DISTINCT a2.value) AS FAT, 
    GROUP_CONCAT(DISTINCT  a3.value) AS RICH, 
    GROUP_CONCAT(DISTINCT a4.value) AS TALKATIVE
FROM persons p
-- first join
LEFT JOIN persons_x_attributes pa1 ON pa1.person_id = p.id
LEFT JOIN attributes a1 ON a1.id = pa1.attribute_id AND a1.value =  'tall'
-- second join
LEFT JOIN persons_x_attributes pa2 ON pa2.person_id = p.id
LEFT JOIN attributes a2 ON a2.id = pa2.attribute_id AND a2.value =  'fat'
-- third join
LEFT JOIN persons_x_attributes pa3 ON pa3.person_id = p.id
LEFT JOIN attributes a3 ON a3.id = pa3.attribute_id AND a3.value =  'rich'
-- fourth join
LEFT JOIN persons_x_attributes pa4 ON pa4.person_id = p.id
LEFT JOIN attributes a4 ON a4.id = pa4.attribute_id AND a4.value =  'talkative'
-- only one name per row
GROUP BY NAME

attempt-3

Great success!

GROUP_CONCAT is a special aggregate function that sucks up all results by concatenating them (this is a useful way to get rid of NULLs in your data).

Note that, instead of GROUP_CONCAT, you can also use good ole SUM if your data is composed of integers rather than strings; NULLs don't interfere with SUMming on SQL.

References

Dialogue & Discussion