Hi everyone, I’m @hurutoriya, a Machine Learning Engineer.
On May 9th, 2018, we held the closing ceremony for the Kaggle competition hosted by Mercari: the Mercari Price Suggestion Challenge.
In the Mercari Price Suggestion Challenge, participants competed to improve Mercari’s price suggestion algorithm. For this, we published the product data of items sold through our US app.
For the closing ceremony, we invited the top three contestants from Russia, Poland, and China, as well as top-ranking contestants from Japan. In a panel-discussion format, we asked each contestant to explain their solution and share what they think about Kaggle. This challenge also had a Kernel-only rule (with constraints on computing resources and calculation time), which made for a very interesting competition.
We uploaded videos of the panel discussion, so read on to check them out!
Price suggestion at Mercari
The importance of price suggestion at Mercari
First, let me explain why price suggestion is important here at Mercari. On the Mercari app, users can choose whatever price they want when listing an item.
Here are some examples which show why this becomes a problem:
- Some items on Mercari cannot be sold because their listing prices are too high compared to the market price.
- Conversely, if the listing price is lower than the market price, customers lose out.
As a workaround, users can search Mercari for an item they plan to list, effectively performing their own market research. However,
- this is a lot of effort for the user, and
- this idea may not occur to new Mercari users.
Therefore, listing becomes easier if we automatically display a suitable price for users when they list an item.
Explanation regarding price suggestion
Currently, @lain_m21, an intern in the ML team, is helping us with the price suggestion system at Mercari JP.
Tasks involving price suggestion are evaluated by the Root Mean Squared Logarithmic Error (RMSLE) metric.
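This is defined as:
\[ \epsilon = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(p_i + 1) - \log(a_i + 1) \right)^2} \]
- $\epsilon$: RMSLE score
- $n$: Number of predicted data cases
- $p_i$: Predicted price
- $a_i$: Real price
- $\log(x)$: Natural logarithm of x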
The lower the score is, the higher the accuracy of the price suggestion function.
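As a rough illustration, the metric can be computed in a few lines of Python. This is a minimal sketch that assumes the standard RMSLE definition, which compares log(predicted price + 1) with log(real price + 1); the function name `rmsle` and its argument names are chosen here for illustration and are not taken from Mercari's code.

```python
import math


def rmsle(predicted, actual):
    """Root Mean Squared Logarithmic Error between two equal-length price lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one price is required")
    # math.log1p(x) computes log(x + 1), matching the +1 inside each logarithm.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    # Average the squared log errors, then take the square root.
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because the error is taken on log-transformed prices, it reflects the ratio between the predicted and real price rather than their absolute difference, so a 500 JPY miss on a cheap item is penalized more heavily than the same miss on an expensive one.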
RMSLE is not an intuitive metric, so for the purpose of this explanation let’s try visualizing it.
Below is a diagram showing the margin of error for RMSLE scores corresponding to a price of 3,000 JPY.
The horizontal axis represents RMSLE values. The vertical axis represents the range of estimated prices corresponding to each RMSLE value. This is a logarithmic metric, so for an RMSLE value of 1.0:
- Real price: 3,000 JPY
- Range of estimated prices: 1,103 JPY ~ 8,156 JPY
As you can see, this isn’t quite accurate enough to use in an actual product.
The top contestant of the Kaggle competition achieved an RMSLE value of 0.3875 with their model, using a 75-line kernel. At that level of accuracy:
- Real price: 3000 JPY
- Range of estimated prices: 2051 JPY ~ 4387 JPY
Results with this level of accuracy could be incorporated into the actual product.
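The price ranges quoted above can be reproduced with a short calculation. This is a sketch that assumes every prediction is off by exactly the RMSLE value in log(price + 1) space, which appears to be how the ranges were derived; the helper name `price_range` is hypothetical.

```python
import math


def price_range(real_price, rmsle_value):
    """Return (low, high) prices whose log(price + 1) error equals rmsle_value."""
    log_real = math.log1p(real_price)
    # math.expm1(y) computes exp(y) - 1, the inverse of log1p.
    low = math.expm1(log_real - rmsle_value)
    high = math.expm1(log_real + rmsle_value)
    return low, high


for score in (1.0, 0.38):
    low, high = price_range(3000, score)
    # Truncate to whole yen for display.
    print(f"RMSLE {score}: {int(low)} JPY ~ {int(high)} JPY")
```

Under these assumptions, a score of 1.0 gives the 1,103 JPY ~ 8,156 JPY range, and a score of 0.38 gives the 2,051 JPY ~ 4,387 JPY range, which suggests the second range was computed from the winning score rounded to two decimal places. Note that the range is asymmetric around the real price because the error is multiplicative rather than additive.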
The machine learning team was very impressed with the concise, polished solution from the first-place team, which reads almost like code golf at under 100 lines.
Even though the code is under 100 lines, it is a very sophisticated piece of Python that implements an MLP model. We found that even when using the newest product data at Mercari US, this solution significantly enhanced the price suggestion function.
Their final solution, along with an explanation of it, has been published on GitHub.
Our team is focusing on the price suggestion function for the Japanese version of Mercari. Because the dataset we provided on Kaggle was in English, many problems arise when we try to apply these solutions to the Japanese version of Mercari. As a result, even the model provided by the winning team was unable to achieve the same score in the Japanese version of the app.
Nonetheless, the Mercari Price Suggestion Challenge provided an opportunity for rich debate by many Kagglers discussing various methods and ideas, which will contribute greatly to improving the Mercari JP price suggestion function. As the price suggestion function improves, we plan to update our readers with the technical details right here on the engineering blog.
Please stay tuned!