Regression means predicting a number, rather than one of the categories. We will predict the score based on the embedding of the review's text. We split the dataset into a training and a testing set for all of the following tasks, so we can realistically evaluate performance on unseen data. The dataset is created in the Get_embeddings_from_dataset Notebook.
We're predicting the score of the review, which is a number between 1 and 5 (1-star being negative and 5-star positive).