Beauty Beyond Words II

Ingredient-Attribute Research - Fall 2023

Project Overview

This research addresses the beauty industry's need for transparent and explainable methods to map product ingredients to skin-related attributes. We developed a BERT-based machine learning model that achieved a balanced F1 score of 0.61 and precision of 0.75 for predicting product attributes from ingredients and descriptions.

Our approach combined explicit model architectures with explainability techniques, including Integrated Gradients, SHAP, and LIME, to uncover the relationships between ingredients and product attributes. Using publicly available Amazon product metadata and reviews alongside a curated skincare glossary, we built a scalable pipeline to extract and analyze ingredient-attribute relationships in beauty products.
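To make one of the named techniques concrete, here is a minimal sketch of Integrated Gradients on a toy logistic model. It is not the project's BERT model or code: the feature names, weights, bias, and all-zeros baseline are assumptions chosen purely for illustration, and the path integral is approximated with a midpoint Riemann sum.

```python
import math

def predict(x, weights, bias):
    """Toy logistic model: probability that a product has the attribute."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x, weights, bias):
    """Analytic gradient of the logistic output with respect to each input."""
    p = predict(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]

def integrated_gradients(x, baseline, weights, bias, steps=100):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    sums = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point, weights, bias)):
            sums[i] += g
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - b) * s / steps for xi, b, s in zip(x, baseline, sums)]

# Hypothetical ingredient-presence features with toy (unfitted) weights.
features = ["salicylic acid", "glycerin", "fragrance"]
weights = [2.0, -0.5, 0.1]
bias = -1.0
x = [1.0, 1.0, 0.0]          # product contains the first two ingredients
baseline = [0.0, 0.0, 0.0]   # "no ingredients" reference input

attributions = integrated_gradients(x, baseline, weights, bias)
for name, score in zip(features, attributions):
    print(f"{name}: {score:+.4f}")
```

A useful sanity check is the completeness property: the attributions should sum to approximately the difference between the model output at the input and at the baseline. An ingredient that is absent from both gets an attribution of zero.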

The model successfully identified key attributes such as acne, hydration, and sensitive skin with high precision, while explainability methods revealed significant ingredients like salicylic acid for acne treatment and petroleum jelly for hydration. This work contributes to developing more transparent and effective tools for beauty product analysis and recommendation systems.

Model Performance: Confusion Matrices

Classification performance for key skin attributes:

Acne

Outcome	Count	Share
True Positive	680	35.4%
False Positive	(not listed)	4.5%
False Negative	405	21.1%
True Negative	748	39.0%

Oily Skin

Outcome	Count	Share
True Positive	641	31.7%
False Positive	140	6.9%
False Negative	444	22.0%
True Negative	795	39.4%
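As a quick consistency check, the Oily Skin percentages can be recomputed from the four counts. The sketch below uses only the counts reported above; the function name `cell_shares` is illustrative and not taken from the project code.

```python
def cell_shares(tp, fp, fn, tn):
    """Return each confusion-matrix cell as a percentage of all examples."""
    total = tp + fp + fn + tn
    return {
        "TP": round(100 * tp / total, 1),
        "FP": round(100 * fp / total, 1),
        "FN": round(100 * fn / total, 1),
        "TN": round(100 * tn / total, 1),
    }

# Oily Skin counts as reported in the confusion matrix.
oily = cell_shares(tp=641, fp=140, fn=444, tn=795)
print(oily)
```

The four counts sum to 2,020 examples, and the recomputed shares match the reported 31.7%, 6.9%, 22.0%, and 39.4%.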

Performance Metrics

Metric	Value	Meaning
Precision	0.75	Accuracy of positive predictions
F1 Score	0.61	Balance of precision and recall
Recall	0.63	Coverage of actual positives
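For a single attribute, these three metrics follow directly from the confusion-matrix counts. The sketch below applies the standard definitions to the Oily Skin counts; the helper name `precision_recall_f1` is an assumption for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of positive predictions that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Oily Skin counts as reported in the confusion matrix.
p, r, f1 = precision_recall_f1(tp=641, fp=140, fn=444)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

For Oily Skin this gives roughly 0.82 precision, 0.59 recall, and 0.69 F1. These per-attribute values differ from the overall figures above, which presumably aggregate across all attributes; the averaging method is not stated here.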

Explainability Analysis

Compare ingredient importance across different explainability methods:

LIME: Top Ingredients for Acne

Ingredient	Score
Salicylic Acid	0.825
Glycolic Acid	0.753
Benzoyl Peroxide	0.701
Tea Tree Oil	0.654
Sulfur	0.612

LIME: Top Ingredients for Hydration

Ingredient	Score
Hyaluronic Acid	0.871
Shea Butter	0.786
Glycerin	0.742
Ceramides	0.689
Squalane	0.652

Integrated Gradients: Acne

Token	Attribution Score	Ingredient
##tino	0.8290	Retinol (from retino)
sulfur	0.4662	Sulfur
##eth	0.4478	Ethanol
zinc	0.4271	Zinc

Integrated Gradients: Hydration

Token	Attribution Score	Ingredient
##eth	0.9065	Ethanol
##anu	0.5271	Manuka (honey)
petroleum	0.4896	Petroleum
jelly	0.3519	Petroleum jelly
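Tokens prefixed with `##` are WordPiece continuation pieces, so a token-level score has to be mapped back to a whole word before it can be read as an ingredient. The sketch below shows one possible way to do that by summing scores over the pieces of each word. The tokenization and scores are toy values for illustration, not results from this study, and summing is an assumed aggregation choice (taking the maximum or the mean are equally plausible alternatives).

```python
def merge_wordpieces(tokens, scores):
    """Merge '##' continuation pieces into whole words, summing their scores."""
    words = []
    for token, score in zip(tokens, scores):
        if token.startswith("##") and words:
            # Continuation piece: append to the previous word, add its score.
            word, total = words[-1]
            words[-1] = (word + token[2:], total + score)
        else:
            # Start of a new word.
            words.append((token, score))
    return words

# Hypothetical tokenization with toy attribution scores.
tokens = ["re", "##tino", "##l", "and", "zinc"]
scores = [0.125, 0.5, 0.25, 0.0, 0.5]

merged = merge_wordpieces(tokens, scores)
for word, score in merged:
    print(f"{word}\t{score:.3f}")
```

Word-level scores produced this way can then be matched against a glossary of ingredient names, which is one way multi-token ingredients such as "petroleum jelly" could be handled.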

Pipeline Architecture

Figure: Flowchart of the multi-stage pipeline for extracting ingredients and attributes.

Key Takeaways

Precision on Attribute Prediction: 0.75
Explainability Methods Compared: Integrated Gradients, SHAP, LIME
Key Ingredients Identified: 100+
Products Analyzed: 10K+

Developed a scalable, interpretable machine learning pipeline for analyzing beauty product ingredients and their attributes
Demonstrated that transformer-based models can predict product attributes with high precision (0.75) and a moderate F1 score (0.61)
Applied multiple explainability techniques to uncover meaningful connections between ingredients and skin benefits
Identified opportunities for improving recommendation systems through transparent ingredient-attribute mapping

Collaborators

Shreya Sriram: NLP & Models
Priyanshi Gupta: Explainability Analysis
