Beauty Beyond Words II

Ingredient-Attribute Research - Fall 2023

Project Overview

This research addresses the beauty industry's need for transparent and explainable methods to map product ingredients to skin-related attributes. We developed a BERT-based machine learning model that achieved a balanced F1 score of 0.61 and precision of 0.75 for predicting product attributes from ingredients and descriptions.

Our approach combined explicit model architectures with three explainability techniques (Integrated Gradients, SHAP, and LIME) to uncover the relationships between ingredients and product attributes. Using publicly available Amazon product metadata and reviews alongside a curated skincare glossary, we built a scalable pipeline to extract and analyze ingredient-attribute relationships in beauty products.

The model successfully identified key attributes such as acne, hydration, and sensitive skin with high precision, while explainability methods revealed significant ingredients like salicylic acid for acne treatment and petroleum jelly for hydration. This work contributes to developing more transparent and effective tools for beauty product analysis and recommendation systems.

Model Performance: Confusion Matrices

Classification performance for key skin attributes:

Acne

                     Actual positive     Actual negative
Predicted positive   TP 680 (35.4%)      FP  87 (4.5%)
Predicted negative   FN 405 (21.1%)      TN 748 (39.0%)

Oily Skin

                     Actual positive     Actual negative
Predicted positive   TP 641 (31.7%)      FP 140 (6.9%)
Predicted negative   FN 444 (22.0%)      TN 795 (39.4%)
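
The per-attribute precision, recall, and F1 implied by the two matrices above can be recomputed from the four counts. The sketch below is a minimal illustration, not the project's evaluation code; the `Confusion` class and the `MATRICES` name are hypothetical. The results are per-attribute values, so they are not expected to match the overall figures listed under Performance Metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from one binary confusion matrix (hypothetical helper)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def precision(self) -> float:
        # Share of positive predictions that were correct.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of actual positives that the model found.
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Counts copied from the Acne and Oily Skin matrices.
MATRICES = {
    "Acne": Confusion(tp=680, fp=87, fn=405, tn=748),
    "Oily Skin": Confusion(tp=641, fp=140, fn=444, tn=795),
}

for name, cm in MATRICES.items():
    print(f"{name}: n={cm.total} "
          f"precision={cm.precision():.3f} "
          f"recall={cm.recall():.3f} "
          f"f1={cm.f1():.3f}")
```

The division-by-zero case (no positive predictions or no actual positives) is not handled, since it does not arise for these counts.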

Performance Metrics

Precision 0.75 (accuracy of positive predictions)
F1 Score 0.61 (balance of precision and recall)
Recall 0.63 (coverage of actual positives)
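
How these three figures were aggregated across attributes is not stated. The harmonic mean of 0.75 and 0.63 is about 0.68 rather than 0.61, so the F1 score was presumably averaged over attributes instead of being derived from the other two numbers. The sketch below assumes macro-averaging (the unweighted mean of each per-attribute metric), which is one common convention and not necessarily the one used here. It runs on the Acne and Oily Skin counts only, so it does not reproduce the figures above; all function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one attribute."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(counts: Dict[str, Dict[str, int]]) -> Tuple[float, float, float]:
    """Unweighted mean of each metric across attributes."""
    per_attribute = [prf(**c) for c in counts.values()]
    precisions, recalls, f1s = zip(*per_attribute)
    return mean(precisions), mean(recalls), mean(f1s)


def harmonic_mean(p: float, r: float) -> float:
    return 2 * p * r / (p + r)


# Counts from the Acne and Oily Skin confusion matrices.
COUNTS = {
    "Acne": {"tp": 680, "fp": 87, "fn": 405},
    "Oily Skin": {"tp": 641, "fp": 140, "fn": 444},
}

macro_p, macro_r, macro_f1 = macro_average(COUNTS)
print(f"macro precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")

# Check on the reported figures: F1 implied by precision 0.75 and recall 0.63.
print(f"harmonic mean of 0.75 and 0.63 = {harmonic_mean(0.75, 0.63):.3f}")
```

Under macro-averaging, the averaged F1 generally differs from the harmonic mean of the averaged precision and recall, and the gap grows when attributes vary widely in difficulty.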

Explainability Analysis

Ingredient importance scores by explainability method (LIME results shown):

LIME: Top Ingredients for Acne

Salicylic Acid 0.825
Glycolic Acid 0.753
Benzoyl Peroxide 0.701
Tea Tree Oil 0.654
Sulfur 0.612

LIME: Top Ingredients for Hydration

Hyaluronic Acid 0.871
Shea Butter 0.786
Glycerin 0.742
Ceramides 0.689
Squalane 0.652
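
Rankings like these come from perturbing a product's inputs and observing how the model's prediction changes. LIME does this by fitting a local surrogate model over many random perturbations. The sketch below uses a simpler relative of that idea, leave-one-out occlusion, and is not LIME itself. The scoring function is a toy stand-in for the trained classifier, and its weights are invented for illustration, not taken from the project.

```python
from typing import Callable, Dict, List, Tuple

# Invented weights for a toy "acne" scorer; NOT values from the real model.
TOY_WEIGHTS = {
    "salicylic acid": 0.5,
    "benzoyl peroxide": 0.3,
    "glycerin": 0.05,
}


def toy_acne_score(ingredients: List[str]) -> float:
    """Stand-in for model inference: a capped sum of ingredient weights."""
    return min(1.0, sum(TOY_WEIGHTS.get(name, 0.0) for name in ingredients))


def occlusion_importance(
    ingredients: List[str],
    score_fn: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Importance of each ingredient = score drop when it is removed."""
    baseline = score_fn(ingredients)
    importance = {}
    for idx, name in enumerate(ingredients):
        reduced = ingredients[:idx] + ingredients[idx + 1:]
        importance[name] = baseline - score_fn(reduced)
    return importance


def rank(importance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort ingredients from most to least important."""
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


product = ["water", "salicylic acid", "glycerin", "benzoyl peroxide"]
ranking = rank(occlusion_importance(product, toy_acne_score))
for name, score in ranking:
    print(f"{name} {score:.3f}")
```

Occlusion needs only one model call per ingredient, but unlike LIME or SHAP it does not account for interactions between ingredients that are removed together.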

Pipeline Architecture

Flowchart representing the multi-stage pipeline for extracting ingredients and attributes:

[Figure: ingredient-attribute pipeline architecture flowchart]
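
The flowchart's individual stages are not reproduced in the text. One stage the overview implies is matching raw ingredient lists against the curated skincare glossary. The sketch below shows what such a stage might look like, assuming a glossary that maps each canonical ingredient to its known spellings. The glossary entries, function names, and delimiter rules are all hypothetical, not the project's implementation.

```python
import re
from typing import Dict, List

# Hypothetical glossary: canonical ingredient -> accepted spellings/synonyms.
GLOSSARY: Dict[str, List[str]] = {
    "salicylic acid": ["salicylic acid"],
    "glycerin": ["glycerin", "glycerine", "glycerol"],
    "petrolatum": ["petrolatum", "petroleum jelly"],
}


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def split_ingredients(raw: str) -> List[str]:
    """Split a raw ingredient string on commas and semicolons."""
    return [normalize(part) for part in re.split(r"[,;]", raw) if part.strip()]


def extract_ingredients(raw: str, glossary: Dict[str, List[str]] = GLOSSARY) -> List[str]:
    """Return canonical glossary ingredients found in `raw`, in order, deduplicated."""
    lookup = {
        normalize(alias): canonical
        for canonical, aliases in glossary.items()
        for alias in aliases
    }
    found: List[str] = []
    for item in split_ingredients(raw):
        canonical = lookup.get(item)
        if canonical is not None and canonical not in found:
            found.append(canonical)
    return found


raw_label = "Water, Petroleum Jelly; GLYCERINE,  Salicylic  Acid, glycerin"
print(extract_ingredients(raw_label))
```

Exact matching after normalization keeps the stage transparent and easy to audit, at the cost of missing misspellings and ingredients absent from the glossary.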

Key Takeaways

0.75 Precision on Attribute Prediction
3 Explainability Methods Compared
100+ Key Ingredients Identified
10K+ Products Analyzed

  • Developed a scalable, interpretable machine learning pipeline for analyzing beauty product ingredients and their attributes
  • Demonstrated that transformer-based models can predict product attributes with high precision (0.75) and a reasonable F1 score (0.61)
  • Applied multiple explainability techniques to uncover meaningful connections between ingredients and skin benefits
  • Identified opportunities for improving recommendation systems through transparent ingredient-attribute mapping

Collaborators

Shreya Sriram: NLP & Models
Priyanshi Gupta: Explainability Analysis