An Efficient Content-Based Movie Recommendation System Using TF-IDF and Cosine Similarity: Design, Implementation, and Evaluation

Authors

  • Leena Raut
  • Payal Agrawal
  • Salomi Gautam

Keywords

Content-based filtering, Cosine similarity, Favourite list, Movie recommendation system, Movie trailer, Streamlit, TF-IDF, Watchlist

Abstract

With the rapid growth of online streaming platforms, users are often overwhelmed by the number of available movies, and finding content that matches individual preferences has become difficult. This paper presents the design and implementation of a content-based movie recommendation system in Python. The system analyzes movie metadata (genre, cast, keywords, and description) to generate personalized recommendations. TF-IDF vectorization converts this textual data into numerical feature vectors, and cosine similarity measures how similar any two movies are. Based on these similarity scores, the system recommends the movies that most closely match user preferences. To enhance the user experience, the system also integrates trailer access, a watchlist, and a favourites list. An interactive user interface built with Streamlit lets users explore recommendations in real time. The proposed system is efficient, scalable, and suitable for both academic and practical applications, providing accurate recommendations along with improved user engagement.
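The pipeline described in the abstract (TF-IDF vectors built from movie metadata, then ranked by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a smoothed IDF formula chosen for illustration, and a made-up four-movie catalogue. The names `MOVIES`, `tfidf_vectors`, `cosine_similarity`, and `recommend` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: titles and metadata strings are invented for illustration.
MOVIES = {
    "Space Quest": "sci-fi adventure space astronaut alien",
    "Galaxy Wars": "sci-fi action space alien battle",
    "Love in Paris": "romance drama paris love",
    "Paris Nights": "romance comedy paris love",
}


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [text.lower().split() for text in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumed variant, not taken from the paper).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency is the count normalized by document length.
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies=MOVIES, top_n=2):
    """Rank all other movies by similarity to `title`, highest first."""
    titles = list(movies)
    vectors = tfidf_vectors([movies[t] for t in titles])
    query = vectors[titles.index(title)]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Calling `recommend("Space Quest")` ranks "Galaxy Wars" first, because the two metadata strings share the terms `sci-fi`, `space`, and `alien`, while the two romance titles share no terms with the query and score zero. A production system would more likely rely on a library vectorizer and a precomputed similarity matrix than on this per-query loop.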

Published

2026-03-28

How to Cite

Raut, L., Agrawal, P., & Gautam, S. (2026). An Efficient Content-Based Movie Recommendation System Using TF-IDF and Cosine Similarity: Design, Implementation, and Evaluation. Journal of Big Data Analytics and Business Intelligence, 3(1), 40–49. Retrieved from https://matjournals.net/engineering/index.php/JoBDABI/article/view/3293