Build a Smart Recommendation Engine with EDG and Neo4j by Separating Taxonomy from Data for Scalable, Maintainable AI-Powered Insights
This tutorial demonstrates how to build a graph-based recommendation engine by combining TopQuadrant’s EDG, which manages a structured taxonomy, with Neo4j, which stores and queries the instance data. The approach uses semantic relationships to produce recommendations based on hierarchical knowledge.

The process begins with setting up free trial versions of both EDG and Neo4j. After installation, the next step is to configure EDG to integrate with a Neo4j instance. This is done through the Product Configuration Parameters in EDG, where you add the Neo4j database URL and credentials. Once configured, you can link the two systems and push your taxonomy to Neo4j.

The taxonomy used in this example is a hierarchy of STEM categories, such as Mathematical Software, Computer Science, and Mathematics. It is imported into EDG as a TriG or ZIP file and then pushed to Neo4j using the cloud push button. The integration creates every concept in the taxonomy as a node in Neo4j, with parent–child links represented as relationships in the graph.

Next, a set of fake academic journal articles is imported into Neo4j via a CSV file. Each article has a field called topicUri that references a concept in the taxonomy. A Cypher query matches each article to its corresponding concept in the graph and creates a TAGGED_WITH relationship, linking the instance data (articles) to the semantic structure (taxonomy).

With the data in place, the recommendation engine is built from graph queries. The first query identifies the category of a given article, such as "Advances in Mathematical Software Studies #7" being tagged with "Mathematical Software." To generate recommendations, a more advanced query finds other articles that share a common parent category.
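The tagging step and the shared-parent lookup described above can be modeled in plain Python to make the logic concrete. This is a minimal sketch, not the Cypher used in the tutorial: the dictionaries stand in for Neo4j nodes and relationships, and the `ex:` URIs, the "Algebra" category, every article title except "Advances in Mathematical Software Studies #7", and all function names are hypothetical. It also assumes that "shared parent categories" means shared ancestors at any depth, which is only one possible reading.

```python
# Hypothetical taxonomy: child concept URI -> parent concept URI.
# In the tutorial these are concept nodes and parent-child relationships pushed from EDG.
PARENT = {
    "ex:MathematicalSoftware": "ex:ComputerScience",
    "ex:ComputersAndSociety": "ex:ComputerScience",
    "ex:Algebra": "ex:Mathematics",
    "ex:ComputerScience": "ex:STEM",
    "ex:Mathematics": "ex:STEM",
}
CONCEPTS = set(PARENT) | set(PARENT.values())

# Hypothetical rows from the article CSV; topicUri references a taxonomy concept.
ARTICLES = [
    {"title": "Advances in Mathematical Software Studies #7",
     "topicUri": "ex:MathematicalSoftware"},
    {"title": "Computers and Society Review #2",
     "topicUri": "ex:ComputersAndSociety"},
    {"title": "Algebra Notes #4", "topicUri": "ex:Algebra"},
]


def tag_articles(articles, concepts):
    """Stand-in for creating TAGGED_WITH: keep rows whose topicUri is a known concept."""
    return {a["title"]: a["topicUri"] for a in articles if a["topicUri"] in concepts}


def ancestors(concept, parent):
    """Collect every category above `concept` by walking parent links (cycle-safe)."""
    found = set()
    while concept in parent and parent[concept] not in found:
        concept = parent[concept]
        found.add(concept)
    return found


def recommend(title, tagged, parent):
    """Rank other articles by the number of ancestor categories they share with `title`."""
    own = ancestors(tagged[title], parent)
    scores = {}
    for other, concept in tagged.items():
        if other == title:
            continue
        shared = len(own & ancestors(concept, parent))
        if shared:
            scores[other] = shared
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Calling `recommend("Advances in Mathematical Software Studies #7", tag_articles(ARTICLES, CONCEPTS), PARENT)` ranks the Computer Science sibling above the Mathematics article, because the first shares two ancestor categories and the second only one.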
For example, even if no other article is tagged with "Mathematical Software," the system can recommend articles tagged with other branches of Computer Science, like "Computers and Society," because they are both children of the same parent. The recommendation score is based on the number of shared parent categories, encouraging diversity in suggestions. The results are returned in order of relevance, with the most similar articles appearing first.

A key benefit of this architecture is the ability to evolve the taxonomy without disrupting downstream systems. For instance, if "Mathematical Software" is reclassified as a branch of Mathematics instead of Computer Science, a simple drag-and-drop in EDG is all that’s needed. After pushing the updated taxonomy to Neo4j, the recommendation engine automatically reflects the change. Articles that were once recommended based on computer science are now recommended based on mathematics, demonstrating the power of centralized, governed metadata.

This approach offers several advantages. It enables inference across categories, aligns multiple data systems through a single source of truth, simplifies change management, and allows each tool to focus on its strengths: EDG for metadata governance and Neo4j for high-performance graph analytics. While the setup requires more components and a steeper learning curve than simpler alternatives, it is especially valuable in complex, multi-team environments where consistency, scalability, and semantic intelligence are critical. The use of open standards like RDF, SHACL, and SPARQL helps reduce long-term technical lock-in, even though the tools themselves are commercial.

In conclusion, this architecture shows how separating taxonomy management from data storage creates a more flexible, maintainable, and intelligent system. It transforms static data into a dynamic knowledge graph capable of supporting powerful AI-driven applications like recommendation engines, search, and knowledge discovery.
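The reclassification scenario described earlier, where "Mathematical Software" moves from Computer Science to Mathematics, can be illustrated with a small in-memory model. This is a sketch under stated assumptions rather than the tutorial's implementation: plain dictionaries replace the EDG taxonomy and the Neo4j graph, recommendations use only the direct parent, and the `ex:` URIs, the "Algebra" category, and the two extra article titles are hypothetical.

```python
# Hypothetical taxonomy before the change: child concept -> direct parent.
TAXONOMY_BEFORE = {
    "ex:MathematicalSoftware": "ex:ComputerScience",
    "ex:ComputersAndSociety": "ex:ComputerScience",
    "ex:Algebra": "ex:Mathematics",
}

# Instance data (article title -> tagged concept); it never changes in this sketch.
TAGGED = {
    "Advances in Mathematical Software Studies #7": "ex:MathematicalSoftware",
    "Computers and Society Review #2": "ex:ComputersAndSociety",
    "Algebra Notes #4": "ex:Algebra",
}


def reparent(taxonomy, concept, new_parent):
    """Return a copy of the taxonomy with `concept` moved under `new_parent`."""
    updated = dict(taxonomy)
    updated[concept] = new_parent
    return updated


def recommend(title, tagged, taxonomy):
    """List other articles whose concept has the same direct parent as `title`'s concept."""
    own_parent = taxonomy.get(tagged[title])
    if own_parent is None:
        return []  # a concept without a parent has no sibling categories
    return sorted(
        other
        for other, concept in tagged.items()
        if other != title and taxonomy.get(concept) == own_parent
    )


# Stand-in for the drag-and-drop in EDG followed by a push to Neo4j.
TAXONOMY_AFTER = reparent(TAXONOMY_BEFORE, "ex:MathematicalSoftware", "ex:Mathematics")

before = recommend("Advances in Mathematical Software Studies #7", TAGGED, TAXONOMY_BEFORE)
after = recommend("Advances in Mathematical Software Studies #7", TAGGED, TAXONOMY_AFTER)
```

Only the taxonomy differs between the two calls. The article tags are untouched, yet `before` contains the Computer Science article and `after` contains the Mathematics article, which is the behavior the architecture relies on.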
