Mercari Engineering Blog

We're the software engineers behind Mercari. Check out our blog to see the tech that powers our marketplace.

セキュリティエンジニアへの道:私のキャリアチェンジ物語 / The Road to Becoming a Security Engineer: My Story of Career Change

* English version follows after the Japanese

こんにちは。メルカリのProduct Securityチームでセキュリティエンジニアをしている@gloriaです。ブログを書くのが随分お久しぶりなのですが、前にQAと自動化テストについて記事をフォローしていた方がいらっしゃったら、当時に自動化テストエンジニアとして書いたISTQBテスト自動化エンジニア認定資格STARWESTカンファレンス、とAQA POP TALKの記事を読んだことがあるかもしれません。



メルペイの Infrastructure as Code について、 HashiCorp Certified: Terraform Associate を受験した SRE に聞いてみた

こんにちは。メルペイ Engineering Office チームの kiko です。先月、 HashiCorp Certified: Terraform Associate がリリースされましたね。早速 @tjun さん(メルペイ SRE, Engineering Manager )と @keke さん(メルペイ SRE )が受験していました。というわけで、今回はこのお二人に、試験の話とメルペイの Infrastructure as Code について聞いてみました。


  • メルペイでは、クラウドのリソースや Datadogのダッシュボードなど様々なことをTerraformでコード化して管理している
  • Terraform Associate Certificationは、Terraformをメルペイと同じように使っている人で英語ある程度できる人なら受かるのでは
  • 今後は Infrastructure as Code で品質と生産性を高めていきたい

Infrastructure as Code とはなにか

kiko : メルペイの話を聞く前に。まずは、 Infrastructure as Code とはなんぞやというのを教えて下さい。

tjun : Infrastructure as Code とは、システムのインフラをコードで記述しようというものです。たとえば GCP(Google Cloud Platform)上のリソースはWebのコンソールからぽちぽちすると手動で作ることができるんですが、コードで管理することによって他のメンバーがレビューできるようになったり、バージョン管理できて前の構成に戻せたり、他のチームの設定を見て参考にできたり、などのメリットがあります。

keke: KubernetesでYAMLを書くのも Infrastructure as Code の一つと言えそうですね。また、マイクロサービス共通の設定みたいなことも、コード化することで適用することができます。

kiko : なるほど。コードにするメリットは分かったのですが、コード化するのって大変じゃないんですか?

keke : 何かちょっと試したいときは手動で作ることもありますが、コード化してCI/CDによる自動化がされていると手作業によるミスも防げるので、一度仕組みを作ってしまえば後は楽です。


Machine learning system in patterns

Japanese here

Hi, I’m Yusuke Shibui, a member of the Image Search and Edge AI team in Mercari Japan. I publicized design patterns for implementing a machine learning model into a production environment. The patterns are available in GitHub as OSS, and I welcome you to take a look if you are interested! The repository is open for anybody to raise a PR or issue. If you have your own machine learning system pattern, please feel free to share your ideas with us!

Why do we need design patterns for machine learning systems?

For machine learning models to demonstrate their true value, they must be implemented and used in a production service or internal system. In order to do so, we design a workflow and operation for machine learning as well as related systems to develop an integrated system, but this also uses skills other than machine learning. For instance, while data retrieval, preprocessing, inference, postprocessing, and training pipeline are used from a machine learning perspective, we also need UI/UX design, backend engineering, quality assurance, monitoring and alerts from a software engineering and DevOps perspective. Even if a machine learning engineer or researcher takes great effort to develop a model to achieve the highest accuracy on the ImageNet dataset, they probably cannot acquire software engineering skills. Some of them may not need those skills in their career. On the other hand, software engineers, including application developers, backend engineers and SREs, might not be familiar with machine learning. Those having both machine learning and software engineering skills or experience are rare.

Example of overall system design of `Cat or Dog` application

The machine learning design patterns aim to fill in the gaps between machine learning engineers and software engineers to integrate the machine learning lifecycle and software engineering manners. One of the goals of developing a system with machine learning is to make a business impact with the model in production. This requires understanding of machine learning specific technologies and implementing them into a business system in an operable structure. The aim of the machine learning design patterns is to organize patterns of practices for “making a machine learning model and workflow into a production system”.

There is a famous diagram in a paper, “Hidden Technical Debt in Machine Learning Systems”, that depicts components required to run a core machine learning model. The core is located in the center as a small black square, and is surrounded by various systems.

Hidden Technical Debt in Machine Learning Systems

The paper emphasizes that in order to use machine learning effectively in production, we need to develop and operate non-machine learning components. It does not say machine learning is unimportant, but to make machine learning productive in a system, we need many other resources. We have to implement machine learning logic into a web service or system in order to make our business succeed with machine learning. Note that all the systems require business evaluation and system operation. For a plain system, we certainly utilize an infrastructure and resource management, and may optionally use monitoring, alerts and backups. For a machine learning system, we need a prediction server as well as an interface and connection to it. In addition, we can have rather advanced DevOps with a data logging, analytics, visualization, and training pipeline. We do not need to make all of them at once, but to make a machine learning model into our production system, we have to design and develop resource utilization with consideration of cost effectiveness and related systems.

Structure of the machine learning design patterns

I wrote the patterns in the following categories:

  • Training patterns: ways to design a training pipeline

  • QA patterns: test and quality assurance

  • Serving patterns: designs to develop a model into a prediction system

  • Operation patterns: model and system operations

  • Lifecycle patterns: compositions of other patterns

Each category consists of explanations to realize a machine learning lifecycle, from training, QA, and serving to operation, in a production system. The lifecycle pattern is aimed to design an overall system integrating training, QA, serving and operation.

Each pattern includes:

  • Use case: when to use the pattern

  • Architecture: what issue the pattern is trying to solve and how the system is structured

  • Diagram: overview of the system

  • Pros: advantages of the pattern

  • Cons: disadvantages of the pattern

  • Consideration: things to watch out for when using the pattern

I also wrote some anti-patterns to explain what should not be made.

What I want the patterns to become

I tried to make the initial format of the design patterns not be so dependent on a certain platform, language or library. They are written to be able to be deployed on major cloud services and Kubernetes clusters. Choosing which cloud or Kubernetes cluster to use depends on each company’s or system’s scheme, and I made the pattern quite abstract so that they are applicable to as many systems as possible.

I joined Mercari JP in July 2018, as a MLOps engineer, formerly SysML engineer, and had chances to develop and operate various machine learning systems. I guess I have taken part, in some ways, in most of the machine learning systems running in production. Some projects I worked on from their initial stage, some for model development, and some after release. There are some that I joined from production troubleshooting :)>. This machine learning design pattern documentation is the result of my experience with those projects. Since Mercari is a web service based business and uses cloud and Kubernetes, I assume the patterns might be biased to those conditions. I would like to enhance the patterns to compose in major cloud services, such as AWS, GCP and Microsoft Azure, and make workflows based on KubeFlow, to achieve more general and practical patterns and have more fun with machine learning!


English here

メルカリで写真検索とEdge AIチームに所属している澁井(しぶい)です。機械学習のモデルを本番サービスに組み込むための設計やワークフローをパターンにして公開しました。 GithubでOSSとして公開しているので、興味ある方はぜひご笑覧ください! PRやIssueも受け付けています。私の作ったパターン以外にも、有用なパターンやアンチパターンがあれば共有してみてください!


機械学習モデルが価値を発揮するためには本番サービスや社内システムで利用される必要があります。そのためには機械学習のワークフローおよび関連ソフトウェアと運用を設計し、ソフトウェアに組み込むのですが、これらの開発には機械学習以外のスキルや知見が用いられます。たとえば、設計にもデータ取得、前処理、モデルによる推論、後処理、エンドユーザへの見せ方やUI/UX、学習パイプライン、リリース判定基準、監視通報など様々な知見が必要になります。 機械学習エンジニアや研究者がImageNetを使って正解率の高い画像認識モデルを開発していていても、ソフトウェアエンジニアリングのスキルを身につけることは難しいでしょう。中にはシステム開発やソフトウェアエンジニアリングに興味がない方もいるでしょうし、職務上不要な場合もあるかもしれません。他方でソフトウェアエンジニア(アプリエンジニア、バックエンドエンジニア、SRE含む)が機械学習に詳しいと限りませんし、機械学習とシステム開発の両方でスキルや経験を持ったエンジニアは至極稀でしょう。



The Importance of Web Accessibility

Hello, this is Sahil Khokhar from the Web UX team at Mercari JP. I joined Mercari as a Software Engineer in October last year. Today, I want to divert the attention of viewers towards the extremely critical topic of Accessibility and its importance. In this blog I'll focus on Web Accessibility but please note that this concept can be extended to all the client facing applications such as iOS, Android and Web.

As technology continues to advance, the requirements of people who use the web also increase and advance, therefore making accessibility more important than ever. According to Tim Berners-Lee,

"The power of the web is in its universality; Access by everyone regardless of disability is an essential aspect."

This quote highlights the importance of the internet amongst the people that suffer from some form of disability.

The internet has various resources that are useful for disabled people, but those sources can often be difficult for them to operate. Designing the website in a way that people with all kinds of disabilities can use them, makes those websites successful and helps them in fulfilling their goals. The web needs to eliminate the barriers that cause usage problems for some people.