Conditional Traversals With Gremlin

The problem

I recently did a spike for CreditUnionFindr that used a graph to determine if the user is eligable for a credit union’s Credit Union Field of Membership. We tried a several approaches but settled on graph-based approach.

The concept was simple: if you can traverse from the user to a credit union, they are eligable. The majority of the Fields of Membership (FOMs) are simple, and a graph solution was trivial: Max works at ThoughtWorks which qualifies for TW Credit Union

However, some FOMs are more complex. We were concerned about Glass Ceilings so we needed to be sure our graph was flexible.

One problem we hit was conditional traversals. Occasionally we needed to do something like this: Max works at ThoughtWorks which qualifies for TW Credit Union if employment duration is greater than 1 year

I found a few examples of conditional traversals in graphs, but the traverser always knew the condition and often the depth where it occurred. If that was the case, we could use this naive solution for the above graph:

EligabilityGraph.groovy
1
2
3
4
def dumbSolve(Object id) {
    def results = g.v(id).outE.filter{it.duration > 365}.inV.outE.inV.paths {it.name} {it.description}
    solutions = results.toList()
}

This wouldn’t have worked well for us. If we had to know all the conditions (and depths) ahead of time our solution would not be as simple as we originally envisioned. We were looking for a simple traversal that could solve a more complex graph like this: This graph contains two unconditional qualifications and two conditional qualifications. The two conditions (not shown) could be different.

The concept was still simple - traverse from the user to credit unions, avoiding nodes with false conditions. We needed the conditions to be closures to do this without overcomplicating our traversal.

Our solution

We came up with something like this:

EligibilityGraph.groovy
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def solve(Object id) {
    def results =
//            Find the User node and save it as u
        g.v(id).as('u').sideEffect{u = it}
//            Save the edge before the condition as l
            .outE.sideEffect{l = it}
//            Save the edge to be filtered as x
            .inV.outE.sideEffect{x = it}
//            Filter based on the "condition" closure
            .filter {it.condition == null || this.evaluate("${it.condition}")}
//            Keep going until we find a Credit Union
            .inV.loop('u') {it.object.type != 'CU' && it.loops < 50}
//            Format our results
            .paths {it.name} {it.description}
    solutions = results.toList()
}

The example above is Gremlin Groovy. You do need a minor trick in Groovy. The class containing the solve method needs to be able to call evaluate with the current context. I accomplished this by extending GroovyShell.

You should be able to use this technique with various languages and databases by using the Rexster Gremlin Extension or the Neo4J Gremlin Plugin.

Sample Test

Graphs conveniently give us paths, so we can get information about the final result, or about the path to the result. In this case, we can easily turn the path into a human readable explanation of eligibility. Hopefully that makes the example test below easy to follow.

EligibilityGraph.groovy
1
2
3
4
5
6
7
8
9
List<String> getDisplayPaths() {
    if(solutions == null) throw new IllegalStateException("Solve first")
    solutions.collect{it.join(' ')}
}

List<String> getEligibleCUs() {
    if(solutions == null) throw new IllegalStateException("Solve first")
    solutions.collect{it[-1]}.unique()
}
ComplexGraphTest.groovy
1
2
3
4
5
6
7
8
9
10
11
void testMaxWrongDegree() {
        g.e('Max_Drexel').degree = 'BSCS'
        def results = eg.solve('Max')
        def paths = eg.getDisplayPaths()
        def creditUnions = eg.getEligibleCUs()
        assertTrue(paths.contains('Max works at ThoughtWorks which qualifies for TW Credit Union'))
        assertTrue(paths.contains('Max lives at NYC which qualifies for Big Apple Credit Union'))
        assertTrue(paths.contains('Max lives in Manhattan which qualifies for Big Apple Credit Union'))
        assertEquals(3, results.size)
        assertEquals(2, creditUnions.size)
    }

You can find the full code and more tests on GitHub

Summary

Someone asked “How much logic should we be putting into the queries we run through Gremlin?” in the Gremlin group and the consensus was “minimal”. This technique goes against that somewhat. On the other hand we’ve actually pushed most of the logic out of the traversal - we just moved the logic into the graph rather than the post-processing of results.

The best use cases for graphs called out in NoSQL Distilled are Social graphs; Routing, Dispatch and Location based services, and Recommendation engines. Our spike fell outside those categories, so the techniques that worked for us might not make sense for more typical graph projects.

Comments