NLP of Leadership Language Paper
Clemson University — 2019-2021
Technologies Used
Python and JupyterLab
FastText
pandas/scikit-learn/matplotlib
Labelbox
Major Accomplishments
Published a paper that won the 2022 Ralph W. Tyler Award
Student leader of the project and first-listed author on the paper
Produced a model that could automatically label survey data for the Career Center, a novelty in their field
In this project, I led a team of roughly three students to build an ML model that could label data at scales too large for manual review, with accuracy similar to that of human labelers. The results revealed discrepancies in leadership language between student interns and their mentors. There had always been a gap between how students numerically rated their skills and how their mentors rated them, but our model highlighted a gap in how students wrote about their abilities, too.
I programmed almost everything for this project and handled nearly all the machine learning. I also set up Labelbox, our labeling software, so that each of us could label our share of the data the Career Center provided. Beforehand, I had split that data into individual sentences, since this proved to be the best “chunk size” for labeling.
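A minimal sketch of the sentence-splitting step, assuming a plain regex boundary rule (a sentence ends at `.`, `!`, or `?` followed by whitespace). The project's actual splitter isn't described here, and `split_into_sentences` and `explode_responses` are hypothetical names. It turns each free-text survey response into numbered sentence rows that could then be loaded into a labeling tool such as Labelbox.

```python
import re

# Assumed boundary rule: a sentence ends at ".", "!" or "?" followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_into_sentences(response):
    """Split one free-text survey response into trimmed, non-empty sentences."""
    return [s.strip() for s in _BOUNDARY.split(response) if s.strip()]


def explode_responses(responses):
    """Yield (response_id, sentence_index, sentence) rows, one per sentence."""
    for response_id, text in enumerate(responses):
        for i, sentence in enumerate(split_into_sentences(text)):
            yield response_id, i, sentence


if __name__ == "__main__":
    # Made-up example responses, for illustration only
    survey = ["I led our team meeting. It went well!", "Did I communicate clearly? Mostly."]
    for row in explode_responses(survey):
        print(row)
```

A naive rule like this will mis-split abbreviations such as "Dr." or "e.g."; a dedicated sentence tokenizer would handle those cases better.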
On the machine learning side, everyone labeled the data, while I cleaned it, used FastText to turn the text into vectors, and trained and cross-validated the logistic regression model. Cleaning involved removing duplicate and empty sentences and choosing the correct label for each sentence, since each labeler had labeled only part of the dataset.
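The cleaning and modeling steps can be sketched end to end as follows. This is a dependency-free illustration under several assumptions, not the project's code: `clean`, `embed`, `train_lr`, `predict`, and `cross_validate` are hypothetical names; labels are assumed binary; duplicate sentences are merged by majority vote; and `embed` is only a stand-in for FastText that averages hashed character n-grams (the subword idea FastText builds on) rather than using trained vectors. In practice, the `fasttext` package's `get_sentence_vector` and scikit-learn's `LogisticRegression` with `cross_val_score` could take the place of the hand-written parts.

```python
import math
import random
import zlib
from collections import Counter

DIM = 64  # hypothetical embedding size for the stand-in vectorizer


def clean(rows):
    """Normalize whitespace, drop empty or unlabeled rows, and merge duplicate
    sentences, keeping the most common label (assumed rule; ties go to the label seen first)."""
    votes = {}
    for text, label in rows:
        text = " ".join(text.split())
        if not text or label is None:  # empty sentence, or not labeled in this row
            continue
        votes.setdefault(text, Counter())[label] += 1
    return [(text, counts.most_common(1)[0][0]) for text, counts in votes.items()]


def embed(text, n=3):
    """Stand-in for FastText: per-word average of hashed character n-grams,
    then averaged over words. Not a trained FastText model."""
    vec = [0.0] * DIM
    words = text.lower().split()
    for word in words:
        padded = f"<{word}>"  # boundary markers, as in subword models
        grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
        for gram in grams:
            vec[zlib.crc32(gram.encode()) % DIM] += 1.0 / len(grams)
    return [v / max(len(words), 1) for v in vec]


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X, y, epochs=200, lr=0.5, l2=1e-3):
    """Binary logistic regression fit by batch gradient descent with an L2 penalty."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * (g / m + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def predict(w, b, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def cross_validate(X, y, k=5, seed=0):
    """Shuffled k-fold cross-validation; returns the accuracy of each fold."""
    if len(X) < k:
        raise ValueError("need at least k examples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(k):
        held_out = set(idx[f::k])
        train = [i for i in idx if i not in held_out]
        w, b = train_lr([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(w, b, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores


if __name__ == "__main__":
    # Toy, made-up sentences with hypothetical binary labels
    toy = [
        ("I organized the volunteer schedule.", 1),
        ("I led the weekly team meeting.", 1),
        ("I coordinated the project timeline.", 1),
        ("The office was near the library.", 0),
        ("My commute took twenty minutes.", 0),
        ("The weather was cold in January.", 0),
    ]
    data = clean(toy)
    X = [embed(text) for text, _ in data]
    y = [label for _, label in data]
    print(cross_validate(X, y, k=3))
```

Shuffling before assigning folds keeps any single fold from being dominated by one labeler's portion if rows arrive grouped by labeler, and reporting per-fold accuracy shows how stable the model is across splits.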
At the end of each of my two senior semesters, my group presented our work at an open campus forum for students and researchers. The paper was completed after I graduated and moved on, but thankfully I had already contributed my technical pieces. I’m proud to say our paper won the 2022 Ralph W. Tyler Award, which recognizes distinguished excellence for an outstanding article in the field of Cooperative Education and Internships.