Architecture Overview

Recommender systems power everything from Netflix to Spotify. But what happens when you need to compare fundamentally different approaches—content-based filtering, collaborative filtering, graph embeddings, and deep learning—all within a single production system?

This is the story of building a multi-modal Steam game recommender that serves 15+ different algorithms through a unified API, processing millions of user-game interactions in a graph database.

The Core Challenge: One System, Multiple Paradigms

Most recommendation tutorials focus on a single approach. Real production systems need to experiment with many. Our architecture handles this with a modular pipeline in which the different recommendation paradigms coexist:

Raw Steam Data (PostgreSQL)
    ↓ 
 ETL Pipeline (to_graph service)
    ↓
 Neo4j Graph Database
    ↓
 Model Training (model service) 
    ↓
 FastAPI Serving Layer (api service)
    ↓
 REST Recommendations

The beauty lies in the separation of concerns. Each service handles one responsibility, yet they compose into a system that can serve content-based recommendations at 9am and deep learning embeddings at 9:01am.

Service Architecture: Microservices

The ETL Foundation

The pg_persistor service handles raw Steam API ingestion. Nothing fancy—just reliable PostgreSQL storage for millions of user interactions, game metadata, and social connections.

The to_graph service transforms this relational data into Neo4j’s graph format. The system offers two approaches, each optimised for different scenarios.

Transaction-Based Import (Live Data)

# `pd` is pandas, imported at module level
def transfer_data_batch(self, query_pg: str, query_neo4j: str, data_type: str):
    """Stream rows from PostgreSQL into Neo4j in fixed-size batches."""
    for batch in pd.read_sql(query_pg, con=self.pg_connection, chunksize=10_000):
        records = batch.to_dict(orient="records")
        with self.graph.session() as session:
            session.run(query_neo4j, records=records)

This approach streams data directly from PostgreSQL to Neo4j in 10K record batches. It’s perfect for incremental updates and smaller datasets where you need transaction guarantees.
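The Neo4j side of each transfer is a parameterised Cypher statement. The exact queries aren't shown here, but a hypothetical `query_neo4j` for ownership data, matching the `records=records` parameter above, could look like this:

# Hypothetical example of a `query_neo4j` value: UNWIND the $records
# batch and MERGE nodes and relationships idempotently. The field
# names (user_id, app_id) are illustrative.
ownership_query = """
UNWIND $records AS r
MERGE (u:User {id: r.user_id})
MERGE (a:App {id: r.app_id})
MERGE (u)-[:OWNS]->(a)
"""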

Parquet-Based Import (Bulk Data)

import polars as pl

def write_playtimes():
    # Bulk path: load preprocessed Parquet files straight into Neo4j,
    # bypassing PostgreSQL. `driver` (the Neo4j driver) and
    # `playtime_query` are defined at module level.
    df = pl.read_parquet("Games_filtered.parquet")
    for batch in df.iter_slices(n_rows=5_000):
        records = batch.to_dicts()
        with driver.session() as session:
            session.run(playtime_query, records=records)

For initial loads of 50M+ records, the system bypasses PostgreSQL entirely. Raw data gets converted to Parquet files, then loaded using Polars for maximum throughput.

Why Two Approaches? Transaction-based import handles ongoing updates reliably but caps at ~10K records/second. Parquet import hits 50K+ records/second but requires preprocessing.

The Model Training Engine

The model service is where algorithms come alive. Each recommendation approach gets its own module, but they all inherit from a common Model base class:

class Model:
    def _project(self) -> Graph:
        """Create the GDS in-memory graph projection."""
        G, _ = self.gds.graph.project(
            self.proj.graph_name,
            self.proj.node_projection,
            self.proj.relationship_projection,
        )
        return G

    def _write_sim_to_db(self, G: Graph):
        """Write similarities back to Neo4j (implemented by each subclass)."""
        raise NotImplementedError

    def _post_clean(self):
        """Drop the in-memory projection once results are persisted."""
        self.gds.graph.get(self.proj.graph_name).drop()

    def run(self):
        G = self._project()
        self._write_sim_to_db(G)
        self._post_clean()

This pattern means adding a new algorithm is as simple as implementing _write_sim_to_db(). The projection, cleanup, and orchestration are handled automatically.
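For instance, a hypothetical Jaccard-based model would only supply the write step (the class name and parameters below are illustrative, not taken from the repo):

class JaccardSimilarityModel(Model):
    # Sketch: inherit projection and cleanup, customise only the write step
    def _write_sim_to_db(self, G: Graph):
        self.gds.nodeSimilarity.write(
            G,
            writeRelationshipType="SIMILAR_JACCARD",
            writeProperty="score",
            topK=20,
        )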

The API Layer: 15+ Algorithms in One Factory

The API service uses a factory pattern to serve any trained model:

from typing import Type

class ModelFactory:
    def __init__(self):
        # Maps a model name to its recommender class
        self._model: dict[str, Type[Recommendation]] = {}

    def register_model(self, model_name: ModelName, model: Type[Recommendation]):
        self._model[model_name] = model

    def get_model(self, model_name: str) -> Recommendation:
        # Instantiate a fresh recommender per request
        return self._model[model_name]()

# Registration happens at startup
model_factory.register_model("apps_content_based_knn", RecGamesContentBasedKNN)
model_factory.register_model("apps_collaborative_weighted", RecGamesCollaborativeWeighted)
model_factory.register_model("apps_fastrp_direct", RecGamesFastRP)

Now any HTTP client can switch algorithms with a single parameter:

curl "localhost/recommendations/games?user_id=123&model=apps_fastrp_direct"
curl "localhost/recommendations/games?user_id=123&model=apps_content_based_knn"

Four Paradigms, One Graph

Content-Based: Features as Graph Nodes

┌─────────┐     ┌──────────┐     ┌─────────┐
│  User   │────▶│   Game   │────▶│ Feature │
│  Alice  │ OWNS│ Cyberpunk│ HAS │   RPG   │
└─────────┘     └──────────┘     └─────────┘
     │                               ▲
     └───────────LIKES───────────────┘
      (inherited from owned games)

Most content-based systems use feature matrices. We model features as graph nodes, creating richer representations:

def _prepare_features(self):
    # Games connect to their features (genres, developers).
    # Label Genre nodes as Features, then link apps to them.
    self.gds.run_cypher('''
        MATCH (a:App)-[:HAS_GENRE]->(g:Genre)
        SET g:Feature
        MERGE (a)-[:FEATURE]->(g)
    ''')

    # Users inherit features from owned games
    self.gds.run_cypher('''
        MATCH (u:User)-[:OWNS]->(a:App)-[:FEATURE]->(f:Feature)
        WITH u, f, count(a) AS weight
        MERGE (u)-[:LIKES {weight: weight}]->(f)
    ''')

Now users and games exist in the same feature space. KNN similarity becomes a graph algorithm rather than a matrix operation.
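A sketch of how that similarity step could run, assuming a projection over the LIKES and FEATURE relationships created above (graph and relationship names are illustrative):

# Project users, games, and features, then score node pairs by
# overlapping Feature neighbours (Jaccard similarity).
G, _ = self.gds.graph.project(
    "content_based",
    ["User", "App", "Feature"],
    ["LIKES", "FEATURE"],
)
self.gds.nodeSimilarity.write(
    G,
    writeRelationshipType="CONTENT_SIMILAR",
    writeProperty="score",
    topK=10,
)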

Collaborative Filtering: Graph Algorithms at Scale

User-Item Bipartite Graph → Item-Item Similarities

┌───────┐    ┌──────┐    ┌───────┐
│ User1 │───▶│Game A│◀───│ User2 │
└───────┘    └──────┘owns└───────┘
     │          ║          │
 owns│       similar       │owns
     ▼          ║          ▼
┌──────┐ ═══════╬═══════ ┌──────┐
│Game B│        ║        │Game C│
└──────┘     Jaccard     └──────┘
              Index

Traditional collaborative filtering computes user-user or item-item similarities in memory. Neo4j GDS can handle graphs with billions of relationships:

# Project the user-item bipartite graph into GDS memory
G, _ = gds.graph.project(
    "user_item",
    ["User", "App"],
    {"PLAYED": {"orientation": "UNDIRECTED"}},
)
 
# Compute item-item similarities using Jaccard
results = gds.nodeSimilarity.write(
    G,
    writeRelationshipType="ITEM_SIMILAR",
    writeProperty="score",  # required alongside writeRelationshipType
    similarityCutoff=0.1,
    topK=20,
)

The algorithm runs in parallel across the entire graph, writing similarities back as new relationships. No similarity matrix to materialise, no application-side memory to exhaust.

FastRP: Universal Embeddings Across Entity Types

Multi-Entity Graph → Unified Embedding Space

┌──────┐   ┌──────┐   ┌───────┐   ┌────────┐
│ User │   │ Game │   │Friend │   │ Group  │
└──────┘   └──────┘   └───────┘   └────────┘
    │         │          │           │
    └─────────┼──────────┼───────────┘
              ▼FastRP    ▼
    ┌─────────────────────────────────────┐
    │    128-Dimensional Vector Space     │
    │  [0.2, -0.1, 0.5, ..., 0.3, -0.7]   │
    │     Cross-Type Similarities         │
    └─────────────────────────────────────┘

The breakthrough insight: users, games, groups, and friends can all be embedded in the same vector space using Fast Random Projection.

def fastrp(self, G) -> None:
    # Store a 128-dim embedding on every node of the in-memory
    # projection; the fixed seed keeps runs reproducible.
    self.gds.fastRP.mutate(
        G,
        embeddingDimension=128,
        randomSeed=42,
        mutateProperty="embedding",
        iterationWeights=[1.0, 1.0, 1.0],  # 3 propagation iterations
    )

This creates 128-dimensional embeddings for every node. Now you can recommend:

  • Games similar to other games
  • Users similar to other users
  • Games similar to users (cross-type recommendations)
  • Friends who like similar games

All from the same embedding space.
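One way to turn those vectors into relationships, sketched here against the mutated projection from above, is GDS k-nearest-neighbours over the embedding property (relationship name illustrative):

# Link every node to its 10 nearest neighbours in embedding space.
# Since all entity types share the space, results include
# cross-type pairs such as User ↔ App.
self.gds.knn.write(
    G,
    nodeProperties=["embedding"],
    topK=10,
    writeRelationshipType="SIMILAR_EMBEDDING",
    writeProperty="score",
)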

Deep Learning: Two-Tower Architecture

Two-Tower Neural Architecture

 User Features          Item Features
┌─────────────┐       ┌─────────────┐
│ Age: 25     │       │ Genre: RPG  │
│ Country: US │       │ Price: $60  │
│ Playtime: H │       │ Rating: 9.1 │
└─────────────┘       └─────────────┘
       │                     │
       ▼                     ▼
┌─────────────┐       ┌─────────────┐
│ User Tower  │       │ Item Tower  │
│   Neural    │       │   Neural    │
│   Network   │       │   Network   │
└─────────────┘       └─────────────┘
       │                     │
       ▼                     ▼
┌──────────────┐      ┌──────────────┐
│User Embedding│      │Item Embedding│
│  [64 dims]   │      │  [64 dims]   │
└──────────────┘      └──────────────┘
       │                     │
       └──────────┬──────────┘
                  ▼
            Dot Product
         (Recommendation Score)

The PyTorch service handles neural approaches. The two-tower architecture learns separate embeddings for users and items:

import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class TwoTowerModel(pl.LightningModule):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.user_tower = FeatureLayer(config.user_features)  # embeds user features
        self.item_tower = FeatureLayer(config.item_features)  # embeds item features

    def forward(self, batch):
        user_emb = self.user_tower(batch['user_features'])
        item_emb = self.item_tower(batch['item_features'])
        return torch.mm(user_emb, item_emb.t())  # all-pairs dot-product scores

    def training_step(self, batch, batch_idx):
        scores = self(batch)
        loss = F.binary_cross_entropy_with_logits(scores, batch['targets'])
        return loss

    def configure_optimizers(self):
        # Required Lightning hook; the optimizer choice here is illustrative
        return torch.optim.Adam(self.parameters(), lr=1e-3)

The model trains on implicit feedback (playtime > 0 = positive, else negative) and can incorporate rich features from the graph database.
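Training then goes through the standard Lightning loop. A minimal, hypothetical invocation (the config object and dataloader construction are omitted):

model = TwoTowerModel(config)  # config: a ModelConfig instance
trainer = pl.Trainer(max_epochs=10, accelerator="auto")
trainer.fit(model, train_dataloaders=train_loader)  # implicit-feedback batches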

Docker Orchestration: 7 Services, One Command

The entire system runs with docker-compose up:

services:
  postgres:    # Raw data storage
  neo4j:       # Graph database with GDS plugins
  pg_persistor: # Data ingestion
  to_graph:    # ETL pipeline  
  model:       # Algorithm training
  api:         # FastAPI serving
  pytorch:     # Deep learning experiments

Each service is independently scalable. Need more API throughput? Scale the API service. Training large embeddings? Scale the model service.
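Scaling a hot service is a single flag, assuming the service itself is stateless:

docker compose up -d --scale api=3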

When to Use What: Decision Framework

Content-Based excels with:

  • Cold start users (no interaction history)
  • Explainable recommendations (“Because you liked RPGs”)
  • Rich item metadata (genres, developers, tags)

Collaborative Filtering works best with:

  • Dense interaction matrices
  • Users with established preferences
  • Implicit feedback signals (playtime, purchases)

FastRP Embeddings shine for:

  • Cross-domain recommendations (games → friends → groups)
  • Scalable similarity computation
  • Multi-entity recommendation spaces

Deep Learning handles:

  • Complex feature interactions
  • Large-scale datasets (millions of users)
  • Rich side information (user demographics, item features)

The system lets you A/B test these approaches against real user behaviour, not synthetic benchmarks.

Real-World Impact

This architecture powers recommendations for a Steam dataset with:

  • 50M+ user-game interactions
  • 2M+ unique games
  • 200K+ active users
  • 15+ different algorithms

Response times stay under 100ms because similarities are pre-computed and cached in Neo4j. The graph database becomes both the feature store and the serving layer.
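A serving-time read then reduces to a graph lookup. A hedged sketch, reusing the ITEM_SIMILAR relationships written earlier (property and label names assumed):

# Recommend unowned games via precomputed ITEM_SIMILAR edges.
recs = gds.run_cypher("""
    MATCH (u:User {id: $user_id})-[:OWNS]->(:App)-[s:ITEM_SIMILAR]->(rec:App)
    WHERE NOT (u)-[:OWNS]->(rec)
    RETURN rec.id AS app_id, sum(s.score) AS score
    ORDER BY score DESC
    LIMIT 10
""", params={"user_id": user_id})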

What’s Next?

The modular design makes extending the system straightforward:

  • Add new algorithms by implementing the Model interface
  • Incorporate new data sources through the ETL pipeline
  • Scale individual services based on demand
  • A/B test recommendation strategies in production

Most importantly, you can compare approaches fairly—same data, same evaluation metrics, same serving infrastructure. This is how you move from recommendation research to recommendation systems.