Building deep retrieval models

XianxinMao 2021-07-30 23:06:07

In the featurization tutorial we incorporated multiple features into our models, but the models consist of only an embedding layer. We can add more dense layers to our models to increase their expressive power. In general, deeper models are capable of learning more complex patterns than shallower models. For example, our user model incorporates user ids and timestamps to model user preferences at a point in time. A shallow model (say, a single embedding layer) may only be able to learn the simplest relationships between those features and movies: a given movie is most popular around the time of its release, and a given user generally prefers horror movies to comedies. To capture more complex relationships, such as user preferences evolving over time, we may need a deeper model with multiple stacked dense layers.

Of course, complex models also have their disadvantages. The first is computational cost, as larger models require both more memory and more computation to fit and serve. The second is the requirement for more data: in general, more training data is needed to take advantage of deeper models. With more parameters, deep models might overfit or even simply memorize the training examples instead of learning a function that can generalize. Finally, training deeper models may be harder, and more care needs to be taken in choosing settings like regularization and learning rate.

Finding a good architecture for a real-world recommender system is a complex art, requiring good intuition and careful hyperparameter tuning. For example, factors such as the depth and width of the model, activation function, learning rate, and optimizer can radically change the performance of the model. Modelling choices are further complicated by the fact that good offline evaluation metrics may not correspond to good online performance, and that the choice of what to optimize for is often more critical than the choice of model itself.

Nevertheless, effort put into building and fine-tuning larger models often pays off. In this tutorial, we will illustrate how to build deep retrieval models using TensorFlow Recommenders. We'll do this by building progressively more complex models to see how this affects model performance.

import os
import tempfile

%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

import tensorflow_recommenders as tfrs

plt.style.use('seaborn-whitegrid')

In this tutorial we will use the models from the featurization tutorial to generate embeddings. Hence we will only be using the user id, timestamp, and movie title features.

ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "timestamp": x["timestamp"],
})
movies = movies.map(lambda x: x["movie_title"])

We also do some housekeeping to prepare feature vocabularies.

timestamps = np.concatenate(list(ratings.map(lambda x: x["timestamp"]).batch(100)))

max_timestamp = timestamps.max()
min_timestamp = timestamps.min()

timestamp_buckets = np.linspace(
    min_timestamp, max_timestamp, num=1000,
)

unique_movie_titles = np.unique(np.concatenate(list(movies.batch(1000))))
unique_user_ids = np.unique(np.concatenate(list(ratings.batch(1_000).map(
    lambda x: x["user_id"]))))

Model definition

Query model

We start with the user model defined in the featurization tutorial as the first layer of our model, tasked with converting raw input examples into feature embeddings.

class UserModel(tf.keras.Model):

  def __init__(self):
    super().__init__()

    self.user_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.StringLookup(
            vocabulary=unique_user_ids, mask_token=None),
        tf.keras.layers.Embedding(len(unique_user_ids) + 1, 32),
    ])
    self.timestamp_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.Discretization(timestamp_buckets.tolist()),
        tf.keras.layers.Embedding(len(timestamp_buckets) + 1, 32),
    ])
    self.normalized_timestamp = tf.keras.layers.experimental.preprocessing.Normalization()

    self.normalized_timestamp.adapt(timestamps)

  def call(self, inputs):
    # Take the input dictionary, pass it through each input layer,
    # and concatenate the result.
    return tf.concat([
        self.user_embedding(inputs["user_id"]),
        self.timestamp_embedding(inputs["timestamp"]),
        # Reshape the normalized scalar to rank 2 so it can be
        # concatenated with the rank-2 embeddings above.
        tf.reshape(self.normalized_timestamp(inputs["timestamp"]), (-1, 1)),
    ], axis=1)

Defining deeper models will require us to stack more layers on top of this first input. A progressively narrower stack of layers, separated by an activation function, is a common pattern:

                            +----------------------+
                            |      128 x 64        |
                            +----------------------+
                                       | relu
                          +--------------------------+
                          |        256 x 128         |
                          +--------------------------+
                                       | relu
                        +------------------------------+
                        |          ... x 256           |
                        +------------------------------+

Since the expressive power of deep linear models is no greater than that of shallow linear models, we use ReLU activations for all but the last hidden layer. The final hidden layer does not use any activation function: using an activation function would limit the output space of the final embeddings and might negatively impact the performance of the model. For instance, if ReLUs are used in the projection layer, all components in the output embedding would be non-negative.
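
A quick illustration of that first point, using plain NumPy rather than anything from the tutorial: composing two linear maps is itself a single linear map, so stacking activation-free dense layers adds parameters but no expressive power.

# Hypothetical weight matrices for two stacked linear (no-activation) layers.
W1 = np.random.randn(4, 8)
W2 = np.random.randn(8, 2)
x = np.random.randn(1, 4)

# Applying the two layers in sequence equals a single layer with weights W1 @ W2.
print(np.allclose((x @ W1) @ W2, x @ (W1 @ W2)))  # True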

We're going to try something similar here. To make experimentation with different depths easy, let's define a model whose depth (and width) is defined by a set of constructor parameters.

class QueryModel(tf.keras.Model):
  """Model for encoding user queries."""

  def __init__(self, layer_sizes):
    """Model for encoding user queries.

    Args:
      layer_sizes:
        A list of integers where the i-th entry represents the number of units
        the i-th layer contains.
    """
    super().__init__()

    # We first use the user model for generating embeddings.
    self.embedding_model = UserModel()

    # Then construct the layers.
    self.dense_layers = tf.keras.Sequential()

    # Use the ReLU activation for all but the last layer.
    for layer_size in layer_sizes[:-1]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size, activation="relu"))

    # No activation for the last layer.
    for layer_size in layer_sizes[-1:]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size))

  def call(self, inputs):
    feature_embedding = self.embedding_model(inputs)
    return self.dense_layers(feature_embedding)

The layer_sizes parameter gives us the depth and width of the model. We can vary it to experiment with shallower or deeper models.
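
For example (a hypothetical sanity check, not part of the original tutorial), we can instantiate a two-layer tower and confirm that its output dimensionality matches the last entry of layer_sizes:

# Illustrative only: a 64 -> 32 query tower applied to a single example.
query_model = QueryModel([64, 32])
for row in ratings.batch(1).take(1):
  query_embedding = query_model({
      "user_id": row["user_id"],
      "timestamp": row["timestamp"],
  })
  print(query_embedding.shape)  # (1, 32): the size of the final layer.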

Candidate model

We can adopt the same approach for the movie model. Again, we start with the MovieModel from the featurization tutorial:

class MovieModel(tf.keras.Model):

  def __init__(self):
    super().__init__()

    max_tokens = 10_000

    self.title_embedding = tf.keras.Sequential([
      tf.keras.layers.experimental.preprocessing.StringLookup(
          vocabulary=unique_movie_titles, mask_token=None),
      tf.keras.layers.Embedding(len(unique_movie_titles) + 1, 32)
    ])

    self.title_vectorizer = tf.keras.layers.experimental.preprocessing.TextVectorization(
        max_tokens=max_tokens)

    self.title_text_embedding = tf.keras.Sequential([
      self.title_vectorizer,
      tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
      tf.keras.layers.GlobalAveragePooling1D(),
    ])

    self.title_vectorizer.adapt(movies)

  def call(self, titles):
    return tf.concat([
        self.title_embedding(titles),
        self.title_text_embedding(titles),
    ], axis=1)

And expand it with hidden layers:

class CandidateModel(tf.keras.Model):
  """Model for encoding movies."""

  def __init__(self, layer_sizes):
    """Model for encoding movies.

    Args:
      layer_sizes:
        A list of integers where the i-th entry represents the number of units
        the i-th layer contains.
    """
    super().__init__()

    self.embedding_model = MovieModel()

    # Then construct the layers.
    self.dense_layers = tf.keras.Sequential()

    # Use the ReLU activation for all but the last layer.
    for layer_size in layer_sizes[:-1]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size, activation="relu"))

    # No activation for the last layer.
    for layer_size in layer_sizes[-1:]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size))

  def call(self, inputs):
    feature_embedding = self.embedding_model(inputs)
    return self.dense_layers(feature_embedding)

Combined model

With both QueryModel and CandidateModel defined, we can put together a combined model and implement our loss and metrics logic. To make things simple, we'll enforce that the model structure is the same across the query and candidate models.

class MovielensModel(tfrs.models.Model):

  def __init__(self, layer_sizes):
    super().__init__()
    self.query_model = QueryModel(layer_sizes)
    self.candidate_model = CandidateModel(layer_sizes)
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.candidate_model),
        ),
    )

  def compute_loss(self, features, training=False):
    # We only pass the user id and timestamp features into the query model. This
    # is to ensure that the training inputs would have the same keys as the
    # query inputs. Otherwise the discrepancy in input structure would cause an
    # error when loading the query model after saving it.
    query_embeddings = self.query_model({
        "user_id": features["user_id"],
        "timestamp": features["timestamp"],
    })
    movie_embeddings = self.candidate_model(features["movie_title"])

    # Computing factorized top-K metrics is expensive, so we skip it during
    # training and only compute it when evaluating.
    return self.task(
        query_embeddings, movie_embeddings, compute_metrics=not training)

Training the model

Prepare the data

We first split the data into a training set and a testing set.

tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

cached_train = train.shuffle(100_000).batch(2048)
cached_test = test.batch(4096).cache()

Shallow model

We're ready to try out our first, shallow, model!

num_epochs = 300

model = MovielensModel([32])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

one_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")

This gives us a top-100 accuracy of around 0.27. We can use this as a reference point for evaluating deeper models.

Deeper model

What about a deeper model with two layers?

model = MovielensModel([64, 32])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

two_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = two_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")

The accuracy here is 0.29, quite a bit better than that of the shallow model.

We can plot the validation accuracy curves to illustrate this:
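
A minimal sketch of that plot, using the history objects recorded above (validation ran every 5 epochs, matching validation_freq=5):

num_validation_runs = len(one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"])
epochs = [(x + 1) * 5 for x in range(num_validation_runs)]

plt.plot(epochs, one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"], label="1 layer")
plt.plot(epochs, two_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"], label="2 layers")
plt.title("Accuracy vs epoch")
plt.xlabel("epoch")
plt.ylabel("Top-100 accuracy")
plt.legend()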

Even early in training, the larger model has a clear and stable lead over the shallow model, suggesting that adding depth helps the model capture more nuanced relationships in the data. However, even deeper models are not necessarily better. The following model extends the depth to three layers:

model = MovielensModel([128, 64, 32])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

three_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = three_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")

Code link: https://codechina.csdn.net/csdn_codechina/enterprise_technology/-/blob/master/NLP_recommend/Building%20deep%20retrieval%20models.ipynb
