Yansen Zhu

Data Science & NLP Engineer

Jinhua, China

About

Data Science and NLP Engineer with a track record of developing and deploying search and recommendation algorithms. Applies machine learning, causal inference, and data-driven strategy to improve user engagement, monetization, and core business metrics at leading tech companies. Has delivered projects in e-commerce search, live stream recommendations, and intelligent Q&A systems, each producing measurable gains in key performance indicators.

Work

Himalaya (喜马拉雅)

Data Science Engineer

Sep 2022 → Present

Summary

As a Data Science Engineer at Himalaya, I lead strategic initiatives in dual-track monetization (paid content and advertising) and live stream recommendations, increasing user engagement and advertising revenue through algorithmic solutions.

Highlights

Designed and implemented a personalized live stream recommendation system, optimizing CTR and CVR models (XGBoost, DeepFM, ESMM) across multiple recommendation scenarios and achieving offline AUCs of up to 0.85 and online UV CTR lifts of up to 8%.

Conducted in-depth analysis of existing online experiments, providing actionable insights and new experiment designs that aligned goals across business units and improved penetration.

Benchmarked competitor paywall strategies and recommended placing the paywall 5-15% earlier, measured by chapter count, to optimize user retention and ARPU.

Performed post-hoc A/B test analysis with causal inference techniques (Rule Grouping, K-Prototypes, Causal Forest) across 7M DAU, identifying key user segments and increasing ad revenue by $7,500/day with a 1.5% penetration uplift.

Developed and shared internal best practices for A/B testing and causal inference, including a comprehensive MVP checklist, strengthening data-driven decision-making across the company.

Vipshop (唯品会)

Search Algorithm Engineer

Jun 2021 → Jul 2022

Summary

Led the Personalized Suggest (search query suggestion) project, improving analysis of user purchase intent and key sales metrics through personalized search suggestions.

Highlights

Analyzed homepage search bar clickstream data and built CPV tables for offline model training, resolving critical data quality issues.

Iteratively refined sample selection and negative sampling strategies over multiple rounds of feature engineering to establish a robust search relevance baseline.

Implemented and optimized deep learning models (MMoE, ESMM), increasing UV conversion rate by 0.5% and Suggest usage rate by 0.8% in search scenarios.

Mininglamp Technology (明略科技)

Search Algorithm Engineer (NLP)

Jul 2019 → Jun 2021

Summary

As a Search Algorithm Engineer (NLP), I developed an intelligent Q&A system for subway work orders and helped build a comprehensive natural language processing platform.

Highlights

Developed NER models (BiLSTM-CRF, BERT-BiLSTM-CRF) for subway work order analysis, improving average F1 by 5 percentage points and reaching an overall F1 of 0.85 across 24 entity categories.

Improved data annotation quality and consistency through comparative analysis and standardized annotation guidelines, so that model performance improved substantially as annotated data volume grew.

Developed five core NLP modules (word segmentation, syntactic analysis, keyword extraction, named entity recognition, TF-IDF) and packaged them behind stable APIs for integration.

Refactored module APIs around RegisterModel and BaseNLPModel classes, simplifying function calls and providing a unified external interface for parameter setting, training, and prediction.

Packaged the intelligent Q&A system and the NLP platform, including their Python and Java services, as Docker images, simplifying integration and deployment across business applications.

Education

Nanjing University of Information Science & Technology

Sep 2016 → Jun 2019

Master's Degree

Control Engineering

Publications

A method and device for determining recommended materials based on session information

May 2021

Published by China National Intellectual Property Administration (CNIPA)

Summary

Patent application for a method and device that uses session information to determine and deliver personalized material recommendations, improving user engagement. (Application No. CN202110599819.4, under examination)

An integrated platform and method for natural language processing

Sep 2020

Published by China National Intellectual Property Administration (CNIPA)

Summary

Patent application for an integrated natural language processing platform and method that streamlines NLP tasks and makes language data analysis more efficient. (Application No. CN202010922615.5, under examination)

A method and device for named entity recognition

Nov 2019

Published by China National Intellectual Property Administration (CNIPA)

Summary

Patent for a method and device that identifies and extracts named entities from text, improving information extraction for NLP applications. (Application No. CN201911112724.4, granted)

Languages

Mandarin Chinese

English

Skills

Machine Learning

Search Algorithms, Recommendation Algorithms, A/B Testing, Causal Inference, XGBoost, DeepFM, ESMM, MMoE, Logistic Regression (LR), Model Optimization, Model Deployment.

Natural Language Processing (NLP)

Named Entity Recognition (NER), Text Classification, Word Segmentation, Syntactic Analysis, Keyword Extraction, TF-IDF, BiLSTM-CRF, BERT-BiLSTM-CRF, NLP Platform Development, Intelligent Q&A Systems.

Programming & Tools

Python, Java, Docker, Hive, Neo4j, Brat, SQL.

Data Analysis & Engineering

User Profiling, Marketing Algorithms, Data Preprocessing, Data Quality, Data Annotation, Experiment Analysis, KPI Tracking, Data-Driven Strategy.

Project Management & Strategy

Project Scoping, Business Acumen, Solution Design, Cross-functional Collaboration, MVP Development, Stakeholder Management.