Data Science

Data Science and Recommender Systems with Andrew

How’s it going, 365 fam? If we haven’t had the pleasure of meeting yet, my name is Andrew from DataLeap, and you lucky kings, queens, and non-binary-eens are currently watching 365 Data Use Cases by 365 Data Science. I’m sure you’ve already smacked that like button, but just in case you haven’t yet, I’ll sip my tea and give you a chance to do so. Now, you did that so well that I think you’d have no problem heading over to my channels, too. If you enjoy comedy and compelling storytelling with corgis, then head over to DataLeap, my channel, and the DataLeap Andrew Show, my second channel. So, what’s Andrew from DataLeap’s favorite use case?

Recommender systems! When you think of FAANG companies, what pops into your mind? Netflix and its newest dating show suggestion, Google and YouTube’s home page, Facebook and Instagram’s feeds. For B2C companies like Amazon, selling you a recommended product is how they lift their bottom line: sales revenue. But for companies that rely on content, connections, and curation, giving you a recommended video or video game keeps you on the platform for longer.

This is beneficial to them in a more subtle way: usually it involves increasing click-through rate (CTR), which increases watch time or total session time, and that lifts ad revenue and in-app purchases. Hope I didn’t lose you in the data jargon. Let me just clear it out of my system. [clearing throat] KPI! OKR! SGD! Before we get too technical, let’s think about what data Google can even use to build its recommender system. Let’s consider a use case: the infamous YouTube Algorithm. Now, YouTube has an idea of what videos it can recommend to you based on your behavior on and off YouTube.

But it also knows what kind of user you’re similar to based on the demographic information and other information you’ve willingly shared with Google. If your friend Claire went to high school with you, has a similar search history full of French cuisine, and just shared a video about eclairs, you might see that very same video on your home page, because Google’s software is aware that you compare to Claire and her eclectic eclair shares. So, take care where you share your personal preferences. Let’s understand recommendations from the source, from the King Regent of Recommendations: Google. Google provides free machine learning courses that can help you pass their developer interviews. In the section about recommendation systems, Google outlines the value of helping a user choose from among millions of applications on the Google Play store or billions of YouTube videos.
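(Side note: if you want to picture how “you compare to Claire” might work under the hood, here’s a minimal sketch using made-up interest vectors and a simple cosine-similarity check. It’s only an illustration of the idea, not Google’s actual system.)

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two user interest vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical interest vectors: [french_cuisine, baking, gaming, travel]
you    = np.array([0.9, 0.7, 0.1, 0.3])
claire = np.array([0.8, 0.9, 0.0, 0.2])

if cosine_similarity(you, claire) > 0.8:
    # A very similar user just shared an eclair video,
    # so that video becomes a candidate for your home page too.
    print("Recommend: 'How to Make Perfect Eclairs'")
```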

You’ve come this far, so let’s get technical. One common architecture for recommendation systems has three stages: (1) candidate generation, (2) scoring, and (3) re-ranking. In the candidate generation stage (step 1), the system considers a large group of videos, called a corpus. The goal is to reduce billions of videos to mere hundreds, and doing that quickly is the key. Rigor and speed compete with each other, especially since a system can have multiple nominators, each voting for what it thinks a good candidate is, which slows down performance but can broaden the kinds of candidates that come out. Stage 2, scoring, uses a more precise model on fewer than 100 candidates, and this helps the user see the best recommendations first.
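To make that two-stage split concrete, here’s a tiny, hypothetical sketch of a candidate generator plus scorer. The embeddings, corpus size, and the 300-candidate cutoff are all invented for illustration; this is not the real YouTube pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy corpus: 32-dimensional embeddings for 100,000 videos (a real corpus holds billions).
corpus = rng.normal(size=(100_000, 32))
user_embedding = rng.normal(size=32)

# Stage 1: candidate generation -- fast and approximate.
# One nominator doing cheap dot-product retrieval, keeping only a few hundred items.
rough_scores = corpus @ user_embedding
candidate_ids = np.argpartition(rough_scores, -300)[-300:]

# Stage 2: scoring -- slower and more precise, but run only over the ~300 survivors.
def precise_score(video_id: int) -> float:
    # Stand-in for a heavier model that would use many more features per video.
    return float(rough_scores[video_id]) + rng.normal(scale=0.1)

ranked = sorted(candidate_ids, key=precise_score, reverse=True)
print("Top 10 videos to show first:", ranked[:10])
```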

The scoring stage does take in your personal data, just like the eclair example from earlier, but it also considers the following: “local” vs. “distant” items, based on geographic information, for example; popular or trending items; and a social graph, meaning what your friends are interacting with or liking (I’ll sketch how these signals might be combined right below). Now, hold on, you might ask: why not just let the candidate generator also do the scoring step? Well, there’s an important reason why: some systems, like I said, rely on multiple candidate generators, and those scores might not be compatible with each other.
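As promised, here’s a rough sketch of how a scorer might weigh those extra signals. The feature names and weights are made up for illustration; a real scoring model would learn them from data.

```python
def score_candidate(features: dict) -> float:
    """Hypothetical scorer combining the signals mentioned above (weights invented)."""
    return (
        2.0 * features["personal_relevance"]   # your own history (eclairs!)
        + 1.0 * features["is_local"]           # "local" vs. "distant" geography
        + 0.5 * features["trending_score"]     # popular or trending items
        + 1.5 * features["friends_engaged"]    # social graph: friends who liked it
    )

video = {
    "personal_relevance": 0.8,
    "is_local": 1.0,
    "trending_score": 0.4,
    "friends_engaged": 0.6,
}
print(score_candidate(video))  # 1.6 + 1.0 + 0.2 + 0.9 = 3.7
```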

That incompatibility is reason one. Reason two: with a smaller pool of candidates, we can use a better model that incorporates more features. This ups the training time a ton, but our focus has shifted. In the first stage, it was about cutting the fat and making sure only the cream of the crop rose to the top, without really considering what each individual user might actually like. In the scoring stage, we take a lot more information into account so that the user sees the best videos first. This goes hand in hand with the final stage: re-ranking. This is where the model considers what the user has disliked in the past, how fresh the content is, and so on. It’s how YouTube increases its diversity, freshness, and fairness (fingers crossed). Check out Google’s free ML courses as a resource for cracking the data science interview, and as a reward for making it this far into the video.
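And here’s a minimal sketch of what a re-ranking pass could look like, assuming hypothetical fields like "channel" and "uploaded"; real systems handle dislikes, freshness, and diversity with far more nuance than this.

```python
from datetime import datetime, timedelta

def rerank(scored_videos, disliked_channels, now=None):
    """Hypothetical re-ranking pass: drop disliked sources, boost fresh uploads.

    Each video is a dict with 'title', 'channel', 'score', and 'uploaded'.
    This illustrates the idea only; it is not YouTube's actual logic.
    """
    now = now or datetime.now()
    reranked = []
    for video in scored_videos:
        if video["channel"] in disliked_channels:
            continue  # respect what the user has disliked in the past
        # Small bump for content uploaded within the last week (freshness).
        freshness_boost = 0.2 if now - video["uploaded"] < timedelta(days=7) else 0.0
        reranked.append({**video, "score": video["score"] + freshness_boost})
    return sorted(reranked, key=lambda v: v["score"], reverse=True)

videos = [
    {"title": "Eclair Masterclass", "channel": "BakeLab", "score": 3.7,
     "uploaded": datetime.now() - timedelta(days=2)},
    {"title": "Old Clickbait", "channel": "SpamTV", "score": 4.0,
     "uploaded": datetime.now() - timedelta(days=400)},
]
print(rerank(videos, disliked_channels={"SpamTV"}))
```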

If you want more, my information is in the description below, pinned in the comments, and in the little “i” in the corner of the screen. Over on DataLeap, I bring complex concepts down to earth, demystifying the life of a 6-figure Silicon Valley FAANG data scientist. Head on over if you want a free resume template and a bespoke guide on how to get to the 6-figure data science job in just 6 months. While you’re there, subscribe to keep up to date with the best free industry education out there, plus interviews about personal growth and personal finance. And hey, start with my $0 Data Science bible, which boils down everything you need to know into just 3 weeks, going day by day. Let’s leap together.
