When neural networks learn what linear algebra cannot

Having explored matrix factorisation, along with graph-based content-based, collaborative, and embedding approaches, we now venture into the realm of deep learning for recommendation systems. This article examines how neural architectures, particularly Two-Tower models and Neural Collaborative Filtering (NCF), can capture complex user-item interactions whilst leveraging the rich feature ecosystem we’ve built with Neo4j and PyTorch Lightning.

The Deep Learning Paradigm Shift

Traditional matrix factorisation assumes linear relationships between latent factors. Deep learning breaks this assumption, enabling the modelling of non-linear interactions that better capture real-world user behaviour patterns. For Steam’s complex ecosystem—where game preferences depend on intricate combinations of genres, social connections, and temporal factors—neural approaches offer compelling advantages.

graph TD
    A[Traditional MF] --> B[Linear Interactions]
    C[Deep Learning] --> D[Non-linear Interactions]
    B --> E[Limited Expressiveness]
    D --> F[Rich Feature Combinations]
    F --> G[Better Steam Recommendations]
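
To make the contrast concrete, the two scoring functions can be written side by side (using $p_u$ and $q_i$ for the latent user and item factors):

$$\hat{y}_{ui} = p_u^{\top} q_i \qquad \text{(matrix factorisation: strictly linear in the latent factors)}$$

$$\hat{y}_{ui} = f_{\theta}(p_u, q_i) \qquad \text{(neural model: } f_{\theta} \text{ can represent non-linear feature combinations)}$$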

Two-Tower Architecture: Scaling Deep Recommendations

The Two-Tower architecture addresses the fundamental challenge of recommendation systems: efficiently computing similarities between users and items at scale whilst maintaining the expressiveness of deep neural networks.

Architectural Foundation

graph LR
    A[User Features] --> B[User Tower]
    C[Item Features] --> D[Item Tower]
    B --> E[User Embedding]
    D --> F[Item Embedding]
    E --> G[Dot Product Similarity]
    F --> G
    G --> H[Recommendation Score]

The elegance lies in the separation: user and item embeddings are computed independently, enabling pre-computation and efficient serving at scale.

Mathematical Framework

For a user $u$ with feature vector $x_u$ and an item $i$ with feature vector $x_i$, the Two-Tower model learns:

$$s(u, i) = f_{\text{user}}(x_u)^{\top} \, g_{\text{item}}(x_i)$$

Where:

  • $f_{\text{user}}$ is the user tower mapping user features to embeddings
  • $g_{\text{item}}$ is the item tower mapping item features to embeddings
  • The dot product captures preference alignment

Implementation with PyTorch Lightning

import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.nn import functional as F
 
class TwoTowerModel(pl.LightningModule):
    def __init__(self, user_features_dim, item_features_dim, 
                 embedding_dim=128, hidden_dims=[256, 128]):
        super().__init__()
        
        # User tower
        self.user_tower = self._build_tower(
            user_features_dim, embedding_dim, hidden_dims
        )
        
        # Item tower
        self.item_tower = self._build_tower(
            item_features_dim, embedding_dim, hidden_dims
        )
        
        self.save_hyperparameters()
    
    def _build_tower(self, input_dim, output_dim, hidden_dims):
        """Build a tower with batch normalisation and dropout"""
        layers = []
        prev_dim = input_dim
        
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.2)
            ])
            prev_dim = hidden_dim
        
        # Final projection to embedding space
        layers.append(nn.Linear(prev_dim, output_dim))
        
        return nn.Sequential(*layers)
    
    def forward(self, user_features, item_features):
        user_embeddings = self.user_tower(user_features)
        item_embeddings = self.item_tower(item_features)
        
        # L2 normalise for cosine similarity
        user_embeddings = F.normalize(user_embeddings, p=2, dim=1)
        item_embeddings = F.normalize(item_embeddings, p=2, dim=1)
        
        return user_embeddings, item_embeddings
    
    def training_step(self, batch, batch_idx):
        user_features, item_features, labels = batch
        
        user_emb, item_emb = self(user_features, item_features)
        scores = torch.sum(user_emb * item_emb, dim=1)
        
        # Binary cross-entropy for implicit feedback
        loss = F.binary_cross_entropy_with_logits(scores, labels.float())
        
        self.log('train_loss', loss)
        return loss
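
A minimal end-to-end training sketch, with placeholder feature dimensions and toy random data standing in for the real Steam pipeline; since the base class above does not define an optimiser, the sketch adds a simple one:

from torch.utils.data import DataLoader, TensorDataset

class TwoTowerSteam(TwoTowerModel):
    # The base class above omits an optimiser; a minimal choice for this sketch
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Illustrative dimensions only; real values come from the feature pipeline
model = TwoTowerSteam(user_features_dim=64, item_features_dim=32)

# Toy data: random features with binary implicit-feedback labels
dataset = TensorDataset(
    torch.randn(1024, 64),           # user features
    torch.randn(1024, 32),           # item features
    torch.randint(0, 2, (1024,)),    # 1 = observed interaction, 0 = sampled negative
)
loader = DataLoader(dataset, batch_size=256, shuffle=True)

trainer = pl.Trainer(max_epochs=5, accelerator='auto')
trainer.fit(model, loader)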

Feature Engineering for Steam Data

The Two-Tower architecture’s strength lies in its ability to incorporate diverse features from our Neo4j ecosystem:

graph TD
    A[Neo4j Features] --> B[User Features]
    A --> C[Item Features]
    
    B --> D[Demographics]
    B --> E[Play Patterns]
    B --> F[Social Connections]
    
    C --> G[Game Metadata]
    C --> H[Genre Vectors]
    C --> I[Developer Info]
    
    D --> J[User Tower]
    E --> J
    F --> J
    
    G --> K[Item Tower]
    H --> K
    I --> K

def extract_steam_features(user_id, item_id, graph_session):
    """Extract rich features from Neo4j for Two-Tower model"""
    
    # User features from graph
    user_query = """
    MATCH (u:USER {steamid: $user_id})
    OPTIONAL MATCH (u)-[:PLAYED]->(games:APP)
    OPTIONAL MATCH (u)-[:FRIENDS]->(friends:USER)
    OPTIONAL MATCH (u)-[:MEMBER_OF]->(groups:GROUP)
    
    RETURN u.membership_duration as membership_duration,
           u.user_tot_playtime as total_playtime,
           count(DISTINCT games) as games_owned,
           count(DISTINCT friends) as friend_count,
           count(DISTINCT groups) as group_count,
           collect(DISTINCT games.genre_onehot) as genre_preferences
    """
    
    # Item features from graph
    item_query = """
    MATCH (app:APP {appid: $item_id})
    OPTIONAL MATCH (app)-[:HAS_GENRE]->(genres:GENRE)
    OPTIONAL MATCH (app)-[:DEVELOPED_BY]->(dev:DEVELOPER)
    
    RETURN app.app_tot_playtime as popularity,
           app.type_onehot as type_vector,
           collect(DISTINCT genres.name) as genres,
           dev.name as developer
    """
    
    user_data = graph_session.run(user_query, user_id=user_id).single()
    item_data = graph_session.run(item_query, item_id=item_id).single()
    
    return user_data, item_data
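
The records returned above still need to be converted into dense tensors before they can feed the towers. A hedged sketch of that conversion, where the exact fields and the genre_vocab mapping are assumptions about the schema rather than fixed parts of it:

def records_to_tensors(user_data, item_data, genre_vocab):
    """Illustrative conversion of Neo4j records into dense feature tensors."""
    # Scalar user features, with missing values guarded to 0.0
    user_vec = torch.tensor([
        float(user_data['membership_duration'] or 0.0),
        float(user_data['total_playtime'] or 0.0),
        float(user_data['games_owned']),
        float(user_data['friend_count']),
        float(user_data['group_count']),
    ])
    
    # Multi-hot genre encoding for the item (genre_vocab maps genre name -> index)
    genre_vec = torch.zeros(len(genre_vocab))
    for genre in item_data['genres']:
        if genre in genre_vocab:
            genre_vec[genre_vocab[genre]] = 1.0
    
    item_vec = torch.cat([
        torch.tensor([float(item_data['popularity'] or 0.0)]),
        genre_vec,
    ])
    
    return user_vec, item_vec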

Neural Collaborative Filtering: Beyond Linear Interactions

Neural Collaborative Filtering (NCF) extends traditional collaborative filtering by replacing the dot product with neural networks, enabling the capture of complex user-item interaction patterns.

The NCF Framework

graph TD
    A[User ID] --> B[User Embedding]
    C[Item ID] --> D[Item Embedding]
    
    B --> E[Generalised MF]
    D --> E
    
    B --> F[Multi-Layer Perceptron]
    D --> F
    
    E --> G[Concatenation]
    F --> G
    
    G --> H[Final Prediction Layer]
    H --> I[Preference Score]

The key insight: combine the expressiveness of neural networks with the proven effectiveness of embedding-based collaborative filtering.

Mathematical Formulation

NCF models the user-item interaction as:

$$\hat{y}_{ui} = f(p_u, q_i \mid \Theta)$$

Where:

  • $p_u$ and $q_i$ are user and item embeddings
  • $f$ is a neural network with parameters $\Theta$
  • The function $f$ can capture arbitrary user-item interactions

Implementation Architecture

class NeuralCollaborativeFiltering(pl.LightningModule):
    def __init__(self, n_users, n_items, embedding_dim=64, 
                 mlp_layers=[128, 64, 32], dropout=0.2):
        super().__init__()
        
        # Embedding layers
        self.user_embedding_mf = nn.Embedding(n_users, embedding_dim)
        self.item_embedding_mf = nn.Embedding(n_items, embedding_dim)
        
        self.user_embedding_mlp = nn.Embedding(n_users, embedding_dim)
        self.item_embedding_mlp = nn.Embedding(n_items, embedding_dim)
        
        # MLP layers
        mlp_input_dim = embedding_dim * 2
        self.mlp_layers = nn.ModuleList()
        
        for i, layer_size in enumerate(mlp_layers):
            if i == 0:
                self.mlp_layers.append(nn.Linear(mlp_input_dim, layer_size))
            else:
                self.mlp_layers.append(nn.Linear(mlp_layers[i-1], layer_size))
        
        # Final prediction layer
        final_input_dim = embedding_dim + mlp_layers[-1]
        self.prediction = nn.Linear(final_input_dim, 1)
        
        self.dropout = nn.Dropout(dropout)
        self.save_hyperparameters()
    
    def forward(self, user_ids, item_ids):
        # MF component
        user_emb_mf = self.user_embedding_mf(user_ids)
        item_emb_mf = self.item_embedding_mf(item_ids)
        mf_output = user_emb_mf * item_emb_mf
        
        # MLP component
        user_emb_mlp = self.user_embedding_mlp(user_ids)
        item_emb_mlp = self.item_embedding_mlp(item_ids)
        
        mlp_input = torch.cat([user_emb_mlp, item_emb_mlp], dim=-1)
        mlp_output = mlp_input
        
        for layer in self.mlp_layers:
            mlp_output = F.relu(layer(mlp_output))
            mlp_output = self.dropout(mlp_output)
        
        # Combine MF and MLP
        final_input = torch.cat([mf_output, mlp_output], dim=-1)
        prediction = torch.sigmoid(self.prediction(final_input))
        
        return prediction.squeeze()
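
Training follows the same implicit-feedback recipe as the Two-Tower model. A minimal sketch of a training step for this class, assuming batches of (user_ids, item_ids, labels); note that forward() already applies a sigmoid, so plain binary cross-entropy is used rather than the with-logits variant:

# A possible training_step for NeuralCollaborativeFiltering (add inside the class)
def training_step(self, batch, batch_idx):
    user_ids, item_ids, labels = batch
    
    predictions = self(user_ids, item_ids)
    
    # forward() already applies a sigmoid, so plain BCE is appropriate here
    loss = F.binary_cross_entropy(predictions, labels.float())
    
    self.log('train_loss', loss)
    return loss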

Advanced Feature Integration Strategies

Incorporating Graph-Derived Features

The true power emerges when combining neural architectures with graph-derived features from our Neo4j infrastructure:

class HybridNCF(pl.LightningModule):
    def __init__(self, n_users, n_items, graph_feature_dim, 
                 embedding_dim=64):
        super().__init__()
        
        # Traditional embeddings
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        
        # Graph feature projections
        self.user_graph_projection = nn.Linear(
            graph_feature_dim, embedding_dim
        )
        self.item_graph_projection = nn.Linear(
            graph_feature_dim, embedding_dim
        )
        
        # Attention mechanism for feature fusion
        self.user_attention = nn.MultiheadAttention(
            embedding_dim, num_heads=4
        )
        self.item_attention = nn.MultiheadAttention(
            embedding_dim, num_heads=4
        )
    
    def forward(self, user_ids, item_ids, user_graph_features, 
                item_graph_features):
        # Standard embeddings
        user_emb = self.user_embedding(user_ids)
        item_emb = self.item_embedding(item_ids)
        
        # Graph-derived features
        user_graph_emb = self.user_graph_projection(user_graph_features)
        item_graph_emb = self.item_graph_projection(item_graph_features)
        
        # Attention-based fusion
        user_fused, _ = self.user_attention(
            user_emb.unsqueeze(0), 
            user_graph_emb.unsqueeze(0),
            user_graph_emb.unsqueeze(0)
        )
        
        item_fused, _ = self.item_attention(
            item_emb.unsqueeze(0),
            item_graph_emb.unsqueeze(0), 
            item_graph_emb.unsqueeze(0)
        )
        
        return user_fused.squeeze(0), item_fused.squeeze(0)
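
The forward pass above returns fused user and item representations rather than a final score. One possible way to close the loop, sketched here as an illustration rather than the definitive design, is a simple dot-product head over the fused embeddings:

# Illustrative scoring on top of the fused embeddings (add inside HybridNCF)
def score(self, user_ids, item_ids, user_graph_features, item_graph_features):
    user_fused, item_fused = self.forward(
        user_ids, item_ids, user_graph_features, item_graph_features
    )
    # Dot product between fused user and item representations, squashed to [0, 1]
    return torch.sigmoid((user_fused * item_fused).sum(dim=-1))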

Training Strategies and Optimisation

Negative Sampling for Implicit Feedback

Steam’s data is predominantly implicit (play time, purchases), requiring sophisticated negative sampling strategies:

import numpy as np

class ImplicitFeedbackDataset(torch.utils.data.Dataset):
    def __init__(self, interactions, n_negatives=4):
        self.interactions = interactions
        self.n_negatives = n_negatives
        self.item_pool = set(interactions['item_id'].unique())
        
        # Pre-compute user item sets for efficient negative sampling
        self.user_items = interactions.groupby('user_id')['item_id'].apply(set).to_dict()
    
    def __len__(self):
        return len(self.interactions)
    
    def __getitem__(self, idx):
        row = self.interactions.iloc[idx]
        user_id, pos_item_id = row['user_id'], row['item_id']
        
        # Positive sample
        samples = [(user_id, pos_item_id, 1.0)]
        
        # Negative samples
        user_items = self.user_items.get(user_id, set())
        available_items = self.item_pool - user_items
        
        neg_items = np.random.choice(
            list(available_items), 
            size=min(self.n_negatives, len(available_items)),
            replace=False
        )
        
        for neg_item in neg_items:
            samples.append((user_id, neg_item, 0.0))
        
        return samples
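
Because each __getitem__ call returns a positive interaction together with its sampled negatives, the DataLoader needs a small custom collate function to flatten those lists into batched tensors. A minimal sketch, assuming the ids are already integer-indexed:

def implicit_collate_fn(batch):
    """Flatten lists of (user_id, item_id, label) triples into training tensors."""
    users, items, labels = [], [], []
    for samples in batch:
        for user_id, item_id, label in samples:
            users.append(user_id)
            items.append(item_id)
            labels.append(label)
    return (
        torch.tensor(users, dtype=torch.long),
        torch.tensor(items, dtype=torch.long),
        torch.tensor(labels, dtype=torch.float),
    )

# Usage sketch
# loader = torch.utils.data.DataLoader(dataset, batch_size=256, collate_fn=implicit_collate_fn)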

Learning Rate Scheduling and Regularisation

# Defined inside the LightningModule (e.g. the Two-Tower or NCF model above)
def configure_optimizers(self):
    # AdamW with modest weight decay for regularisation
    optimizer = torch.optim.AdamW(
        self.parameters(), 
        lr=1e-3, 
        weight_decay=1e-5
    )
    
    # Cosine annealing; T_max should match the planned number of training epochs
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, 
        T_max=100,
        eta_min=1e-6
    )
    
    # No 'monitor' key needed: CosineAnnealingLR does not track a validation metric
    return {
        'optimizer': optimizer,
        'lr_scheduler': {'scheduler': scheduler}
    }

Production Deployment Patterns

Model Serving Architecture

graph TD
    A[Real-time Request] --> B[Feature Store]
    B --> C[User Tower Cache]
    B --> D[Item Tower Cache]
    
    C --> E[Similarity Computation]
    D --> E
    
    E --> F[Top-K Selection]
    F --> G[Post-processing]
    G --> H[Recommendations]

Efficient Serving Implementation

class TwoTowerServing:
    def __init__(self, model_path, feature_store, user_feature_dim, item_feature_dim):
        self.model = torch.jit.load(model_path)
        self.feature_store = feature_store
        # Raw input dimensions of each tower, needed to build the dummy inputs below
        self.user_feature_dim = user_feature_dim
        self.item_feature_dim = item_feature_dim
        self.user_cache = {}
        self.item_embeddings = None
        
    def precompute_item_embeddings(self, item_features):
        """Pre-compute all item embeddings for efficiency"""
        with torch.no_grad():
            _, item_embeddings = self.model(
                torch.zeros(1, self.user_feature_dim),  # Dummy user input
                item_features
            )
            self.item_embeddings = item_embeddings
    
    def get_recommendations(self, user_id, k=10):
        """Get top-k recommendations for user"""
        # Get or compute user embedding
        if user_id not in self.user_cache:
            user_features = self.feature_store.get_user_features(user_id)
            with torch.no_grad():
                user_emb, _ = self.model(
                    user_features.unsqueeze(0),
                    torch.zeros(1, self.item_feature_dim)  # Dummy item input
                )
                self.user_cache[user_id] = user_emb.squeeze(0)
        
        user_embedding = self.user_cache[user_id]
        
        # Compute similarities
        similarities = torch.mm(
            user_embedding.unsqueeze(0), 
            self.item_embeddings.t()
        ).squeeze(0)
        
        # Get top-k
        top_k_scores, top_k_indices = torch.topk(similarities, k)
        
        return list(zip(top_k_indices.tolist(), top_k_scores.tolist()))
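
A usage sketch for the serving class, where the feature-store method names, the model path, and the example user id are illustrative assumptions:

# Illustrative wiring; method names and values are assumptions, not a fixed API
serving = TwoTowerServing(
    model_path='two_tower_scripted.pt',
    feature_store=feature_store,
    user_feature_dim=64,
    item_feature_dim=32,
)
serving.precompute_item_embeddings(feature_store.get_all_item_features())
recommendations = serving.get_recommendations(user_id='76561198000000000', k=10)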

Evaluation and Monitoring

See here for a deep-dive into recommender system metrics.

class RecommenderEvaluator:
    def __init__(self, model, test_data):
        self.model = model
        self.test_data = test_data
    
    def evaluate_ranking_metrics(self, k_values=[5, 10, 20]):
        """Evaluate ranking metrics at different K values"""
        metrics = {}
        
        for k in k_values:
            precision_scores = []
            recall_scores = []
            ndcg_scores = []
            
            for user_id, true_items in self.test_data.items():
                pred_items = self.model.recommend(user_id, k)
                pred_set = set(pred_items)
                true_set = set(true_items)
                
                # Precision@K
                precision = len(pred_set & true_set) / len(pred_set)
                precision_scores.append(precision)
                
                # Recall@K
                recall = len(pred_set & true_set) / len(true_set)
                recall_scores.append(recall)
                
                # NDCG@K
                ndcg = self.calculate_ndcg(true_items, pred_items, k)
                ndcg_scores.append(ndcg)
            
            metrics[f'precision@{k}'] = np.mean(precision_scores)
            metrics[f'recall@{k}'] = np.mean(recall_scores)
            metrics[f'ndcg@{k}'] = np.mean(ndcg_scores)
        
        return metrics
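
The evaluator references a calculate_ndcg helper that is not shown above. One possible binary-relevance implementation, included here as a sketch:

# A possible NDCG helper for RecommenderEvaluator (binary relevance, add inside the class)
def calculate_ndcg(self, true_items, pred_items, k):
    true_set = set(true_items)
    
    # Discounted cumulative gain over the top-k predictions
    dcg = sum(
        1.0 / np.log2(rank + 2)
        for rank, item in enumerate(pred_items[:k])
        if item in true_set
    )
    
    # Ideal DCG: all relevant items ranked at the top
    ideal_hits = min(len(true_set), k)
    idcg = sum(1.0 / np.log2(rank + 2) for rank in range(ideal_hits))
    
    return dcg / idcg if idcg > 0 else 0.0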

Integration with Existing Pipeline

The deep learning models integrate seamlessly with our existing Neo4j and matrix factorisation infrastructure:

graph TD
    A[Neo4j Features] --> B[Feature Engineering]
    C[Matrix Factorisation] --> B
    B --> D[Deep Learning Models]
    D --> E[Ensemble Predictions]
    
    F[Cold Start Handler] --> D
    G[Real-time Serving] --> E
    H[A/B Testing] --> G
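
One straightforward way the ensemble step can combine classical and neural predictions is a weighted blend. A minimal sketch, with the blending weight treated as a tunable assumption rather than a fitted value:

def ensemble_score(mf_score, neural_score, alpha=0.5):
    """Blend matrix-factorisation and deep-model scores; alpha is a tunable weight."""
    return alpha * mf_score + (1.0 - alpha) * neural_score

# Usage sketch: assumes both scores are on comparable scales (e.g. both in [0, 1])
# final = ensemble_score(mf_prediction, ncf_prediction, alpha=0.3)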

Looking Ahead: Future Directions

Deep learning for recommendations continues evolving rapidly. Promising directions include:

  • Graph Neural Networks for better structural understanding
  • Transformer architectures for sequential recommendation
  • Multi-task learning for unified user understanding
  • Causal inference for unbiased recommendation

In our next article, we’ll explore how to systematically evaluate and compare these diverse approaches, establishing robust metrics that align with business objectives whilst maintaining scientific rigour.

Conclusion

Deep learning transforms recommendation systems from pattern matching to genuine understanding of user preferences. The Two-Tower architecture provides scalable serving whilst NCF enables complex interaction modelling. When combined with our rich Neo4j feature ecosystem, these approaches deliver recommendation systems that are both performant and interpretable.

The Steam recommender system demonstrates how classical techniques, graph databases, and modern deep learning can work in harmony, each contributing their strengths to create a robust, production-ready recommendation engine.