The problem
I recently did a spike for CreditUnionFindr that used a graph to determine if the user is eligable for a credit union’s Credit Union Field of Membership. We tried a several approaches but settled on graph-based approach.
The concept was simple: if you can traverse from the user to a credit union, they are eligable. The majority of the Fields of Membership (FOMs) are simple, and a graph solution was trivial:
However, some FOMs are more complex. We were concerned about Glass Ceilings so we needed to be sure our graph was flexible.
One problem we hit was conditional traversals. Occasionally we needed to do something like this:
I found a few examples of conditional traversals in graphs, but the traverser always knew the condition and often the depth where it occurred. If that was the case, we could use this naive solution for the above graph:
1 2 3 4 |
|
This wouldn’t have worked well for us. If we had to know all the conditions (and depths) ahead of time our solution would not be as simple as we originally envisioned. We were looking for a simple traversal that could solve a more complex graph like this: This graph contains two unconditional qualifications and two conditional qualifications. The two conditions (not shown) could be different.
The concept was still simple - traverse from the user to credit unions, avoiding nodes with false conditions. We needed the conditions to be closures to do this without overcomplicating our traversal.
Our solution
We came up with something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
The example above is Gremlin Groovy. You do need a minor trick in Groovy. The class containing the solve method needs to be able to call evaluate with the current context. I accomplished this by extending GroovyShell.
You should be able to use this technique with various languages and databases by using the Rexster Gremlin Extension or the Neo4J Gremlin Plugin.
Sample Test
Graphs conveniently give us paths, so we can get information about the final result, or about the path to the result. In this case, we can easily turn the path into a human readable explanation of eligibility. Hopefully that makes the example test below easy to follow.
1 2 3 4 5 6 7 8 9 |
|
1 2 3 4 5 6 7 8 9 10 11 |
|
You can find the full code and more tests on GitHub
Summary
Someone asked “How much logic should we be putting into the queries we run through Gremlin?” in the Gremlin group and the consensus was “minimal”. This technique goes against that somewhat. On the other hand we’ve actually pushed most of the logic out of the traversal - we just moved the logic into the graph rather than the post-processing of results.
The best use cases for graphs called out in NoSQL Distilled are Social graphs; Routing, Dispatch and Location based services, and Recommendation engines. Our spike fell outside those categories, so the techniques that worked for us might not make sense for more typical graph projects.