Ranking at Rocket
Ranking is my single favorite problem in data science.
In terms of theory, the core ranking problem is very hard: given a set of `n` items, return the permutation of those items that minimizes some penalty function. Since there are `n!` permutations of `n` items, this "core" problem is intractable, and everyone has to make simplifying assumptions. In my opinion, learning the different ways to simplify the ranking problem is instructive for learning modeling in general. And the ranking research space moves fast - every year there is something new!
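To make the combinatorial explosion concrete, here is a minimal sketch of the brute-force version of the problem. The relevance scores and the discounted-gain penalty are made up for illustration (a stand-in for a real loss like NDCG), but the `n!` blowup is the real obstacle:

```python
import itertools

# Toy relevance scores for n = 4 items.
relevance = [0.2, 0.9, 0.5, 0.7]

def penalty(order):
    # Negative position-discounted gain: lower penalty = better ordering.
    return -sum(rel / (pos + 1) for pos, rel in enumerate(order))

best = min(itertools.permutations(relevance), key=penalty)
print(best)  # (0.9, 0.7, 0.5, 0.2), after checking all 4! = 24 orderings
# At n = 20 there are 20! (about 2.4e18) permutations: brute force is hopeless.
```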
In terms of engineering, ranking is a high-throughput problem with tight latency requirements. Most models predict on one input, but ranking requires predictions on a whole set of inputs. Depending on your setup, the dataset to predict on can be very large; for each item you could be pulling a lot of relevant information to inform your ranking decision. And no matter what the computational requirements are, you have to do it fast - the user is waiting for you to present a recommendation!
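One standard way to meet that latency budget is to score all candidates in a single vectorized call rather than one model call per item. This is only a sketch: the random `features` and the linear `weights` below are hypothetical stand-ins for a real feature store and a trained ranker.

```python
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_features = 500, 32
features = rng.normal(size=(n_candidates, n_features))  # one row per candidate hotel
weights = rng.normal(size=n_features)                   # stand-in for a trained model

scores = features @ weights      # one matrix-vector product, not 500 model calls
ranking = np.argsort(-scores)    # candidate indices, best first
print(ranking[:10])
```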
I've spent a lot of time working on sub-problems in ranking at Rocket and at NYU, and I'm happy to share some presentable work here.
Slide deck of the first ranking model deployed at Rocketmiles
In this blog post, I talk about the ranking model in layman's terms. How does it learn? How can we interpret the model's definition of a "good" ranking?
Benchmark of deep learning methods for recommender systems on the Rocket hotel ranking dataset
In this blog post, I talk about LambdaMART, a listwise ranking model based on gradient boosting that has been used at Airbnb and Microsoft. What is its objective function optimizing for? How can we update it to handle display bias? And in what situation would you use it over a collaborative filtering approach?
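For readers who want to try LambdaMART directly, LightGBM ships it as the `lambdarank` objective. The sketch below trains on synthetic data; the query sizes, feature counts, and hyperparameters are illustrative assumptions, not the settings from the benchmark above.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
n_queries, per_query, n_features = 100, 10, 16
X = rng.normal(size=(n_queries * per_query, n_features))
y = rng.integers(0, 5, size=n_queries * per_query)  # graded relevance labels 0-4
group = [per_query] * n_queries                     # number of items per query

ranker = lgb.LGBMRanker(
    objective="lambdarank",  # LambdaMART: gradient boosting with lambda gradients
    metric="ndcg",
    n_estimators=100,
)
ranker.fit(X, y, group=group)

scores = ranker.predict(X[:per_query])  # score one query's candidate list
print(np.argsort(-scores))              # best-first ordering of those candidates
```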