Lego Project

Table of Contents

Description
Meta-graph
Import database
- Schema definition
- The import scripts
Some queries

Description

This project aims to create a referencial of all the lego sets, with theirs pieces. So after, if you have a partial set, you will know wich piece to search to have a complete one.

All the information comes from the website rebrickable.com, where you can download all the lego information under a CC-BY-SA : http://rebrickable.com/downloads

Meta-graph

Import database

The dataset is composed of 4 files :

colors.csv : id,descr
pieces.csv : piece_id,descr,category
sets.csv : set_id,year,pieces,t1,t2,t3,descr
set_pieces.csv : set_id,piece_id,num,color,type (type : 1=normal, 2=spare)

Just put those files into NEO4J_HOME/import folder.

Schema definition

Constraints

Execute those commands with NEO4J_HOME/bin/neo4j-shell tools :

CREATE CONSTRAINT ON (n:PieceCategory) ASSERT n.name IS UNIQUE;
CREATE CONSTRAINT ON (n:Color) ASSERT n.id IS UNIQUE;
CREATE CONSTRAINT ON (n:Set) ASSERT n.id IS UNIQUE;
CREATE CONSTRAINT ON (n:PieceType) ASSERT n.id IS UNIQUE;

Indexes

Execute those command with NEO4J_HOME/bin/neo4j-shell tools :

CREATE INDEX ON :Set(name);
CREATE INDEX ON :PieceType(name);
CREATE INDEX ON :Color(name);
CREATE INDEX ON :Theme(name);

The import scripts

You can execute those scripts one at a time into the browser (http://localhost:7474/browser) or via the neo4j-shell tool.

Colors

LOAD CSV WITH HEADERS FROM 'file:///colors.csv' AS row
  CREATE (c:Color { id:row.id, name:row.descr });

Pieces Type

LOAD CSV WITH HEADERS FROM 'file:///pieces.csv' AS row
  CREATE (pt:PieceType { id:row.piece_id, name:row.descr })
  MERGE (c:PieceCategory { name:row.category })
  CREATE UNIQUE (pt)-[:HAS_CATEGORY]->(c);

Sets

Firstly, we create the RootTheme node :

CREATE (:RootTheme);

Then we import the sets.csv files :

LOAD CSV WITH HEADERS FROM 'file:///sets.csv' AS row
  MATCH (rt:RootTheme)

  CREATE (s:Set { id:row.set_id, name:row.descr, year:toInt(row.year), nbPieces:0})

  MERGE (rt)-[:HAS_CHILD]->(t1:Theme { name: coalesce(row.t1, '@@')})
  MERGE (t1)-[:HAS_CHILD]->(t2:Theme { name: coalesce(row.t2, '@@')})
  MERGE (t2)-[:HAS_CHILD]->(t3:Theme { name: coalesce(row.t3, '@@')})

  CREATE (s)-[:HAS_THEME]->(t3);

Note that sometimes t1, t2 & t3 can be null, so in the above script we make some trick to import them with some dummy value. But this mean that after the execution, we have some clean-up todo :

Clean-up level 3

MATCH (rt:RootTheme)-[:HAS_CHILD]->(t1:Theme)-[:HAS_CHILD]->(t2:Theme)-[:HAS_CHILD]->(t3:Theme)
WHERE t3.name CONTAINS '@@'
WITH t2, t3
  MATCH (t3)<-[r:HAS_THEME]-(s:Set)
  DELETE r
  CREATE (t2)<-[:HAS_THEME]-(s);

Clean-up level 2

MATCH (rt:RootTheme)-[:HAS_CHILD]->(t1:Theme)-[:HAS_CHILD]->(t2:Theme)
WHERE t2.name CONTAINS '@@'
WITH t1, t2
  MATCH (t2)<-[r:HAS_THEME]-(s:Set)
  DELETE r
  CREATE (t1)<-[:HAS_THEME]-(s);

Delete orphelan theme node

MATCH (t:Theme)
WHERE
  size((t)-[:HAS_CHILD]->()) = 0 AND
  size((t)<-[:HAS_THEME]-(:Set)) = 0
WITH t
  DETACH DELETE t;

Sets pieces

To have better performances, we split the script into three parts :

The creation of the Piece node (unique per type and color):

LOAD CSV WITH HEADERS FROM 'file:///set_pieces.csv' AS row
  MATCH (pt:PieceType {id:row.piece_id})
  MATCH (c:Color {id:row.color})
  WITH pt, collect(DISTINCT c) AS colors
    WITH pt, colors
      UNWIND colors AS color
        CREATE (p:Piece)
        CREATE (p)-[:IS_TYPE_OF]->(pt)
        CREATE (p)-[:HAS_COLOR]->(color);

And the link between the Set and its Piece :

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///set_pieces.csv' AS row
WITH row WHERE row.type = '1'

  MATCH (s:Set {id:row.set_id})
  MATCH (:Color {id:row.color})<-[:HAS_COLOR]-(p:Piece)-[:IS_TYPE_OF]->(:PieceType {id:row.piece_id})

  CREATE (s)-[r:IS_COMPOSED_OF {occurrence:toInt(row.num)}]->(p)

  SET s.nbPieces = s.nbPieces + toInt(row.num);

Merge the link between the Set and Piece for spare pieces :

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///set_pieces.csv' AS row
WITH row WHERE row.type = '2'

  MATCH (:Set {id:row.set_id})-[r:IS_COMPOSED_OF]->(p:Piece),
        (:Color {id:row.color})<-[:HAS_COLOR]-(p)-[:IS_TYPE_OF]->(:PieceType {id:row.piece_id})

  SET r.spare = toInt(row.num);

Some queries

Piece queries

Number of piece type

MATCH (pt:PieceType)
RETURN count(pt) AS nbPieceType;

Number of piece

MATCH (p:Piece)
RETURN count(p) AS nbPiece;

Number of pieces type per category

MATCH (p:PieceType)-[:HAS_CATEGORY]->(c:PieceCategory)
RETURN c.name AS category, count(p) AS count
ORDER BY category ASC;

Get piece type info

MATCH (c:Color)<-[:HAS_COLOR]-(:Piece)-[:IS_TYPE_OF]->(pt:PieceType {name:"Technic Beam 1 x 11 Thick" })-[:HAS_CATEGORY]->(c:PieceCategory)
RETURN pt.name AS name, c.name AS category, COLLECT(c.name) AS colors;

Fabrications years of a piece

MATCH (s:Set)-[:IS_COMPOSED_OF]->(:Piece)-[:IS_TYPE_OF]->(:PieceType {name:"Technic Beam 1 x 11 Thick" })
RETURN DISTINCT s.year AS year  ORDER BY year ASC;

Colors available for a piece type

MATCH (p:Piece)-[:IS_TYPE_OF]->(pt:PieceType {name:"Technic Beam 1 x 11 Thick" }),
      (p)-[:HAS_COLOR]->(c:Color)
RETURN DISTINCT c.name AS color
ORDER BY color ASC;

List of set where a piece type appears by year

MATCH (s:Set)-[:IS_COMPOSED_OF]->(:Piece)-[:IS_TYPE_OF]->(:PieceType {name:"Technic Beam 1 x 11 Thick" })
RETURN DISTINCT s.name AS set, s.year AS year
ORDER BY year DESC, set ASC;

List of set where piece appears with its number of occurrence

This query can be usefull if you have a missing piece and you want to search a partial set where it’s frequent to find it.

MATCH (s:Set)-[r:IS_COMPOSED_OF]->(p:Piece),
      (p)-[:IS_TYPE_OF]->(:PieceType {name:"Technic Beam 1 x 11 Thick" }),
      (p)-[:HAS_COLOR]->(:Color { name:"Black"})
RETURN s AS set, r.occurrence AS occurrence
ORDER BY occurrence DESC, set.name ASC;

Set queries

Number of sets

MATCH (s:Set)
RETURN count(*) AS nbSet;

Number of set per top-theme

MATCH (:RootTheme)-[:HAS_CHILD]->(t),
      (t)-[:HAS_CHILD*]->(st)-[:HAS_THEME]-(s:Set)
RETURN t.name AS name, count(s)+ size((t)<-[:HAS_THEME]-(:Set)) AS count
ORDER BY name ASC;

Number of set per year

MATCH (s:Set)
RETURN s.year AS year, count(s)
ORDER BY year ASC;

Is there some sub-set of a set ?

Can I do an other (sub)set with a set ?

MATCH (s1:Set)-[r1:IS_COMPOSED_OF]->(p:Piece)<-[r2:IS_COMPOSED_OF]-(s2:Set)
WHERE s1.name = "The Kwik-E-Mart" AND
      r2.occurrence <= (r1.occurrence + coalesce(r1.spare, 0))
WITH s1, s2, collect(id(p)) AS pieces
  WHERE size(pieces) = size((s2)-[:IS_COMPOSED_OF]->(:Piece))
  RETURN s2.name;

Give me some pieces and I will construct the world

Initialize piece set for a person

Here we will create a user node, and attach to it the list of its pieces.

Firstly we will create a new constraint based on user email address :

CREATE CONSTRAINT ON (n:Person) ASSERT n.email IS UNIQUE;

Then we create the user :

CREATE (me:Person { name:'Simard', firstname:'Benoit', email:'benoit@lego.com'});

For the example, let’s consider that Benoit has all the piece to make the Taj Mahal

MATCH (me:Person {email:'benoit@lego.com'}),
      (s:Set)-[r:IS_COMPOSED_OF]->(p:Piece)
WHERE s.name = 'Taj Mahal'
WITH me, r.occurrence AS nb, p
        CREATE (me)-[:HAS_PIECE {occurrence:nb}]->(p);

Give me all sets that can I do

With a given number of piece (type and number), what set can I do ?

// look at piece in common where user have enought number of pieces
MATCH (me:Person {email:'benoit@lego.com'})-[r:HAS_PIECE]->(p:Piece)<-[r2:IS_COMPOSED_OF]-(s:Set)
WHERE r2.occurrence <= r.occurrence
WITH s, count(p) AS nbPiecesType
  // we look if the set is complete
  WHERE nbPiecesType = size((s)-[:IS_COMPOSED_OF]->(:Piece))
  RETURN s;

Give me all sets that can pretty much do

With a given number of piece (type and number), what set can I pretty much do (missing at max 10% of pieces type) ?

MATCH (me:Person {email:'benoit@lego.com'}),
      (s:Set)-[r:IS_COMPOSED_OF]->(p:Piece)
OPTIONAL MATCH (me)-[r2:HAS_PIECE]->(p)
WITH s,
     p,
     CASE
       WHEN (r2.occurrence - r.occurrence) IS null THEN r.occurrence
       WHEN (r2.occurrence - r.occurrence) < 0 THEN (r.occurrence - r2.occurrence)
       ELSE 0
     END AS missing
     WITH s, sum(missing) AS nbMissingPieces
       WHERE nbMissingPieces > 0 AND
             (nbMissingPieces/toFloat(s.nbPieces) < 0.1
       RETURN s, nbMissingPieces
       ORDER BY nbMissingPieces ASC;