Using TensorFlow to Predict Clicks

Predicting a user’s likelihood to click on a given product using the AliCCP dataset.

Paul Bruffett
5 min read · Mar 12, 2023

I’ll use the TensorFlow Recommenders library and compare it with a native TensorFlow implementation.

In a previous post I explored this dataset and used TensorFlow Recommenders to build user/product embeddings. In this post I’ll extend that to build a Deep & Cross Network to predict how likely a user is to click on an item.

This dataset, once prepared, has 14 fields including the item and user unique identifiers. We’ll use several of the other user-associated features to build embeddings that will underpin a set of fully connected layers.

Feature Cardinality

I won’t cover the pre-processing; that’s part of the pipeline detailed in the previous post. Instead I’ll assume you’ve run the pre-processing step and are referencing the path to that data.
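As a quick sanity check on the cardinality of these features, the distinct counts can be reproduced from the pre-processed data once it is loaded as the df_train DataFrame used in the snippets below (a minimal sketch, not part of the original pipeline):

import pandas as pd

# Sketch: count distinct values per feature to gauge embedding vocabulary sizes.
feature_cols = ['item_id', 'user_id', 'user_categories', 'user_item_categories',
                'user_intentions', 'user_item_intentions', 'user_shops', 'user_item_brands']
print(df_train[feature_cols].nunique().sort_values(ascending=False))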

Feature Engineering

See this notebook.

We’ll read in the dataset and filter to the most common users and products;

import pandas as pd

# df_train is the pre-processed AliCCP training frame from the previous post.
# Filter out long-tail products and reduce the size of the dataset.
df_train['counter'] = 1
df_agg = df_train[['item_id', 'counter']].groupby('item_id').count()
df_agg = df_agg.sort_values("counter", ascending=False)[:10000]
df_train = pd.merge(df_agg, df_train, on='item_id', how='left')
print(len(df_train))

# Keep only the most active users.
df_train['counter'] = 1
cust_agg = df_train[['user_id', 'counter']].groupby('user_id').count()
cust_agg = cust_agg.sort_values("counter", ascending=False)[:8000]
df_train = pd.merge(cust_agg, df_train, on='user_id', how='left')
print(len(df_train))

Starting with just under 42 million records, we end up with 1.47 million. The constraints can be relaxed to include more data, but this subset trains quickly and exercises the architecture.

Next we select our features, converting the identifiers from object to integer and filling any nulls;

train = df_train[['item_id', 'user_id', 'user_categories', 'user_item_categories',
                  'user_intentions', 'user_item_intentions', 'user_shops', 'user_item_brands', 'click']]

# the identifier columns are stored as objects; coerce them to integers
features = ['item_id', 'user_id']
for i in features:
    train[i] = pd.to_numeric(train[i], errors='coerce', downcast="integer")
train = train.fillna(0)

print(len(train))

Next we convert the dataset into a dictionary so that we can map each feature to its embedding in the model;

import numpy as np
import tensorflow as tf

items = train['item_id'].unique()
items = tf.convert_to_tensor(items, dtype=tf.int64)

# convert the DataFrame to a Dataset of dictionaries keyed by feature name
train = tf.convert_to_tensor(train, dtype=tf.int64)
train = tf.data.Dataset.from_tensor_slices(train)

ratings = train.map(lambda x: {
    "item_id": x[0], "user_id": x[1], "user_categories": x[2], "user_item_categories": x[3],
    "user_intentions": x[4], "user_item_intentions": x[5], "user_shops": x[6],
    "user_item_brands": x[7], "click": x[8]})

tf.random.set_seed(42)
shuffled = ratings.shuffle(100000, seed=42, reshuffle_each_iteration=False)

# 80/20 train/validation split
split_record = round(len(train) * .8)
train = shuffled.take(split_record)
test = shuffled.skip(split_record).take(len(shuffled) - split_record)

We also split the data into a train and validation set at 80/20.

Finally, to finish preparation, we build a vocabulary for each of the features to be used for embeddings;

feature_names = ["item_id", "user_id", "user_categories", "user_item_categories",
                 "user_intentions", "user_item_intentions", "user_shops", "user_item_brands"]

vocabularies = {}

for feature_name in feature_names:
    print(feature_name)
    vocab = shuffled.batch(1_000_000).map(lambda x: x[feature_name])
    vocabularies[feature_name] = np.unique(np.concatenate(list(vocab)))

These vocabularies will be used to populate the embedding layers in our model.
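As an illustration of how a vocabulary is used (this mirrors the lookup-plus-embedding pattern inside the model below), a minimal sketch for a single feature, with an illustrative embedding width of 32:

# Sketch: map raw integer ids to contiguous indices, then to dense vectors.
vocab = vocabularies["item_id"]
lookup = tf.keras.layers.IntegerLookup(vocabulary=vocab, mask_value=None)
embed = tf.keras.layers.Embedding(len(vocab) + 1, 32)  # +1 for the OOV index
example_ids = tf.constant([vocab[0], vocab[1]], dtype=tf.int64)
print(embed(lookup(example_ids)).shape)  # (2, 32)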

TFRS DCN Model

First we’ll build a DCN using the TFRS library;

import tensorflow_recommenders as tfrs

class DCN(tfrs.Model):

    def __init__(self, use_cross_layer, deep_layer_sizes, projection_dim=None):
        super().__init__()

        self.embedding_dimension = 32

        self._all_features = ["item_id", "user_id", "user_categories", "user_item_categories",
                              "user_intentions", "user_item_intentions", "user_shops", "user_item_brands"]

        self._embeddings = {}

        # Compute embeddings for int features.
        for feature_name in self._all_features:
            vocabulary = vocabularies[feature_name]
            self._embeddings[feature_name] = tf.keras.Sequential(
                [tf.keras.layers.IntegerLookup(
                    vocabulary=vocabulary, mask_value=None),
                 tf.keras.layers.Embedding(len(vocabulary) + 1,
                                           self.embedding_dimension)
                 ])

        if use_cross_layer:
            self._cross_layer = tfrs.layers.dcn.Cross(
                projection_dim=projection_dim,
                kernel_initializer="glorot_uniform")
        else:
            self._cross_layer = None

        self._deep_layers = [tf.keras.layers.Dense(layer_size, activation="relu")
                             for layer_size in deep_layer_sizes]

        self._logit_layer = tf.keras.layers.Dense(1, activation='sigmoid')

        #self.task = tfrs.tasks.Ranking(loss=tf.keras.losses.MeanSquaredError(), metrics=[tf.keras.metrics.RootMeanSquaredError("RMSE")])
        self.task = tfrs.tasks.Ranking(loss=tf.keras.losses.MeanSquaredError(),
                                       metrics=[tf.keras.metrics.BinaryCrossentropy("BCE")])

    def call(self, features):
        # Concatenate embeddings
        embeddings = []
        for feature_name in self._all_features:
            embedding_fn = self._embeddings[feature_name]
            embeddings.append(embedding_fn(features[feature_name]))

        x = tf.concat(embeddings, axis=1)

        # Build Cross Network
        if self._cross_layer is not None:
            x = self._cross_layer(x)

        # Build Deep Network
        for deep_layer in self._deep_layers:
            x = deep_layer(x)

        return self._logit_layer(x)

    def compute_loss(self, features, training=False):
        labels = features.pop("click")
        scores = self(features)
        return self.task(
            labels=labels,
            predictions=scores,
        )

Here we build embeddings from each of the vocabularies we prepared earlier, then add a cross layer, deep layers, and a logit layer that predicts click likelihood.

This model defines a custom call and a custom loss computation: call iterates over the embedding layers, looks up each feature in the dictionary we prepared, and feeds the concatenated result through the cross and deep layers.
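The fit call below references cached_train and a learning_rate that are set up in the notebook; as in the TFRS examples they are presumably built by batching and caching the splits. A minimal sketch (actual values may differ):

# Sketch: batch and cache the splits before fitting.
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
learning_rate = 0.01  # assumed value; set in the notebook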

model = DCN(use_cross_layer=True,
            deep_layer_sizes=[192, 192],
            projection_dim=None)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate))
# seems to be a bug: loss stops improving after the first epoch (discussed below)
model.fit(cached_train, epochs=8, verbose=True)

Here, when we compile the model we only specify the optimizer; the loss and evaluation metric are defined in the model class itself, with compute_loss calculating and returning the loss.

This model reaches a binary cross-entropy (BCE) of roughly 0.47. The quirk with this architecture is that after the first epoch there is no further progress, and each subsequent epoch completes almost instantly. The same behavior shows up on the IMDB reference example and in our pure Keras implementation.
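To read the BCE off the held-out split, the model can be evaluated like any Keras model; for example (using the cached_test sketch above):

# Sketch: evaluate the trained DCN on the cached test split.
metrics = model.evaluate(cached_test, return_dict=True)
print(metrics)  # includes the "BCE" metric defined in the Ranking task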

Bonus: No TFRS Base Class

Here we’ll implement the model using the tf.keras.Model base class and only two embeddings, resulting in an implementation much more similar to the reference IMDB retrieval architecture;
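The user_model and item_model passed into this class are the embedding towers carried over from the previous post. A minimal sketch of how they might be defined from the vocabularies built above (dimensions assumed to be 32):

# Sketch: integer-lookup + embedding towers for users and items.
embedding_dimension = 32
user_model = tf.keras.Sequential([
    tf.keras.layers.IntegerLookup(vocabulary=vocabularies["user_id"], mask_value=None),
    tf.keras.layers.Embedding(len(vocabularies["user_id"]) + 1, embedding_dimension),
])
item_model = tf.keras.Sequential([
    tf.keras.layers.IntegerLookup(vocabulary=vocabularies["item_id"], mask_value=None),
    tf.keras.layers.Embedding(len(vocabularies["item_id"]) + 1, embedding_dimension),
])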

from typing import Dict, Text

class KerasBaseDCN(tf.keras.Model):

    def __init__(self, user_model, item_model, deep_layer_sizes, projection_dim=None):
        super().__init__()
        self.item_model: tf.keras.Model = item_model
        self.user_model: tf.keras.Model = user_model

        self._deep_layers = [tf.keras.layers.Dense(layer_size, activation="relu")
                             for layer_size in deep_layer_sizes]

        self._logit_layer = tf.keras.layers.Dense(1, activation='sigmoid')

        self.task = tfrs.tasks.Ranking(loss=tf.keras.losses.MeanSquaredError(),
                                       metrics=[tf.keras.metrics.BinaryCrossentropy("BCE")])

    def train_step(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:

        # Set up a gradient tape to record gradients.
        with tf.GradientTape() as tape:

            # Loss computation.
            user_embeddings = self.user_model(features["user_id"])
            positive_item_embeddings = self.item_model(features["item_id"])
            x = tf.concat([user_embeddings, positive_item_embeddings], axis=1)
            # Build Deep Network
            for deep_layer in self._deep_layers:
                x = deep_layer(x)
            x = self._logit_layer(x)
            loss = self.task(predictions=x, labels=features['click'])

            # Handle regularization losses as well.
            regularization_loss = sum(self.losses)

            total_loss = loss + regularization_loss

        gradients = tape.gradient(total_loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        metrics = {metric.name: metric.result() for metric in self.metrics}
        metrics["loss"] = loss
        metrics["regularization_loss"] = regularization_loss
        metrics["total_loss"] = total_loss

        return metrics

    def test_step(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:

        # Loss computation.
        user_embeddings = self.user_model(features["user_id"])
        positive_item_embeddings = self.item_model(features["item_id"])
        x = tf.concat([user_embeddings, positive_item_embeddings], axis=1)
        # Build Deep Network
        for deep_layer in self._deep_layers:
            x = deep_layer(x)
        x = self._logit_layer(x)
        loss = self.task(predictions=x, labels=features['click'])

        # Handle regularization losses as well.
        regularization_loss = sum(self.losses)

        total_loss = loss + regularization_loss

        metrics = {metric.name: metric.result() for metric in self.metrics}
        metrics["loss"] = loss
        metrics["regularization_loss"] = regularization_loss
        metrics["total_loss"] = total_loss

        return metrics

model = KerasBaseDCN(user_model, item_model, deep_layer_sizes=[192, 192])
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))

cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()

model.fit(cached_train, epochs=15)

Here we are still using the TFRS Ranking task; otherwise it is all ‘regular’ TensorFlow. Instead of call and compute_loss, we define train_step and test_step directly.

This implementation also uses only two embeddings, user_id and item_id. It reaches a BCE of 0.136, considerably lower than the other model, which includes more attributes.

Keras with no TFRS

See this notebook.

Data processing is very similar, but we are not turning the dataset into a TensorFlow Dataset; instead we will fit against the DataFrame.

This implementation demonstrates three embeddings, but the approach could be scaled to use more.
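Since we fit directly against the DataFrame, it is worth checking the label balance first; click data is usually heavily skewed toward the negative class, which matters when interpreting the loss numbers. A one-liner on the DataFrame:

# Sketch: check how imbalanced the click label is before training
print(train['click'].value_counts(normalize=True))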

import tensorflow as tf
from tensorflow import keras

hidden_units = (192, 20)
embedding_size = 32

# Each instance consists of three inputs: a user id, an item id and a user category
user_id_input = keras.Input(shape=(1,), name='user_id')
item_id_input = keras.Input(shape=(1,), name='item_id')
user_categories_input = keras.Input(shape=(1,), name='user_categories')

# Note: these embeddings are sized by row count, which assumes the raw ids
# are smaller than the number of rows in the filtered dataset
user_embedding = keras.layers.Embedding(len(train['user_id']) + 1, embedding_size,
                                        input_length=1, name='user_embedding')(user_id_input)
item_embedding = keras.layers.Embedding(len(train['item_id']) + 1, embedding_size,
                                        input_length=1, name='item_embedding')(item_id_input)
user_categories = keras.layers.Embedding(len(train['user_categories']) + 1, embedding_size,
                                         input_length=1, name='user_categories_embedding')(user_categories_input)

# Concatenate the embeddings (and remove the useless extra dimension)
concatenated = keras.layers.Concatenate()([user_embedding, item_embedding, user_categories])
out = keras.layers.Flatten()(concatenated)

# Add one or more hidden layers
for n_hidden in hidden_units:
    out = keras.layers.Dense(n_hidden, activation='relu')(out)

# A single output: the predicted click probability
out = keras.layers.Dense(1, activation='sigmoid', name='prediction')(out)

model = keras.Model(
    inputs=[user_id_input, item_id_input, user_categories_input],
    outputs=out,
)
model.summary(line_length=88)

Here we’ve built the model to take three explicit inputs rather than parse and iterate over a dictionary of inputs. We also declare the outputs, and the loss function is supplied outside the model definition when compiling;

model.compile(
    tf.keras.optimizers.Adam(0.01),
    loss='MAE',
    metrics=[tf.keras.metrics.BinaryCrossentropy("BCE")],
)

model.fit(
    [train['user_id'], train['item_id'], train['user_categories']],
    train['click'],
    batch_size=5000,
    epochs=20)

The model fits and can be scaled with additional embeddings or hidden layers.
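Once trained, scoring new user/item pairs follows the same three-input pattern; for example, a quick sketch against the first few training rows:

# Sketch: score click likelihood for a handful of rows
preds = model.predict([train['user_id'][:5], train['item_id'][:5], train['user_categories'][:5]])
print(preds.ravel())  # probabilities from the sigmoid output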


Paul Bruffett

Enterprise Architect specializing in data and analytics.