accenture research 〰️ data science

I spent the majority of my time at Accenture working with the Data Science Team to deliver productized NLP and ML tools, support development of data pipelines and orchestration, and develop advanced, text-based insights for thought leadership. In this role, I typically used Python with SQL and cloud computing in GCP (BigQuery, Compute Engine, App Engine, Vertex AI, etc.) to build and deploy APIs, pipelines, and more. Many of my data science projects were internal and can’t be shared here, but here are the ones that can…

🌳 random forests to identify drivers

To understand which attributes (sustainability, trust, fun, etc.) consumers value when interacting with businesses, we ran a random forest drivers analysis on survey data. We trained and tuned random forest models with the attributes (more specifically, survey statements aligned to each attribute) as the inputs and likelihood to stay as the target. We then extracted Gini importance scores to rank the attributes by how strongly they influence a consumer’s likelihood to stay.
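For readers who want to see the shape of a drivers analysis, here is a minimal sketch using scikit-learn. It is not the project code: the survey data is synthetic, the attribute names are placeholders, and treating likelihood to stay as a binary target is an assumption made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical attribute names; the real survey used statements aligned to each attribute.
attributes = ["sustainability", "trust", "fun"]

# Synthetic survey responses: 1-5 agreement ratings, one column per attribute.
X = rng.integers(1, 6, size=(500, len(attributes)))

# Synthetic target (assumption): "likely to stay" is driven mostly by the trust rating.
score = 0.2 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.5, 500)
y = (score > np.median(score)).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# feature_importances_ holds the impurity-based (Gini) importance of each input.
ranked = sorted(
    zip(attributes, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Gini importances are normalized to sum to 1, so they read as each attribute's relative share of the model's splitting power rather than as an effect size.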

📃 nlp with text embeddings for classification

To arrive at the company perspective on how these attributes (sustainability, trust, fun, etc.) are valued, we ran an NLP analysis on earnings calls to quantify how often executives discuss each attribute. After tagging a training set of calls, we applied pre-trained BERT text embeddings and used cosine similarity to classify over 5,000 relevant earnings calls. Through this analysis, we demonstrated the disconnect between what consumers care about and what companies are investing in.
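One common way to classify with embeddings and cosine similarity is to average the tagged examples for each label into a centroid, then assign new text to the nearest centroid. The sketch below shows that idea under loud assumptions: the 3-dimensional vectors are toy stand-ins for real BERT embeddings, and the centroid approach is an illustration, not necessarily how the project scored its calls.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Toy stand-ins for embeddings of tagged training passages (hypothetical values).
tagged = {
    "sustainability": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "trust": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.0]],
    "fun": [[0.0, 0.1, 0.9], [0.1, 0.2, 0.8]],
}
centroids = {label: centroid(vectors) for label, vectors in tagged.items()}

def classify(embedding):
    """Return the label whose centroid is most similar to the embedding."""
    return max(
        centroids,
        key=lambda label: cosine_similarity(embedding, centroids[label]),
    )

# Stand-in for the embedding of an untagged earnings-call passage.
new_passage = [0.15, 0.85, 0.05]
predicted = classify(new_passage)
print(predicted)
```

In practice the vectors would come from a pre-trained model and have hundreds of dimensions, but the similarity and assignment steps stay the same.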


⚙️ data engineering in google cloud

Accenture supported Fast Company on this piece: we provided an index score for each company based on 20+ metrics. These metrics were pulled from our BigQuery database in GCP and required significant work in SQL and Python to organize and aggregate.

🔤 natural language processing fuzzy match

A common issue when working with company datasets is that one company may go by many names. For this project, I used fuzzy matching in Python to join database records in cases where manual matching wasn’t possible.
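As a rough illustration of the idea, here is a small fuzzy-matching sketch built on Python's standard-library `difflib`. The project's actual library, threshold, and cleaning rules aren't stated here, so the `normalize` and `best_match` helpers, the suffix list, the 0.85 cutoff, and the company list are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "plc"}

def normalize(name):
    """Lowercase, drop punctuation, and remove common legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)

def best_match(name, candidates, threshold=0.85):
    """Return the most similar candidate, or None if nothing clears the threshold."""
    target = normalize(name)
    scored = [
        (SequenceMatcher(None, target, normalize(candidate)).ratio(), candidate)
        for candidate in candidates
    ]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None

# Illustrative canonical names, standing in for the database's company table.
canonical = ["Accenture", "Alphabet", "Microsoft"]

for raw in ["Accenture PLC", "Microsft Corp.", "Tesla"]:
    print(raw, "->", best_match(raw, canonical))
```

Returning `None` below the threshold keeps weak matches out of the join, so they can be reviewed by hand instead of being merged silently.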