Abusive Language Analyzer

HackGT 6 Hackathon Project – 2019

Technologies Used

Python

Jupyter Notebook

AWS

Pandas/numpy/sklearn/matplotlib

FastText

Selenium

Major Accomplishments

Trained a working text classification model and graphed its predictions

Kept a clean division of labor in which everyone knew their tasks

My team and I built a web app that analyzes text for abusive language. Users could upload text, images, or sound clips; for images and audio, AWS ML services extracted the text first. A logistic regression model then output how abusive it judged the text to be. We were inspired to build this after hearing a story of emotional abuse, and we wanted to use technology to help people catch the warning signs of abuse.

I handled all the machine learning for this project. I trained a logistic regression model on a labeled dataset of aggressive language that I found online. I used FastText to convert the text into numeric vectors that served as predictors for the model. I then applied cross-validation and picked the best-performing model for our web app. Finally, I made predictions with that model and visualized them as bar graphs with matplotlib. As an aside, we originally set out to scrape Reddit for training data, but we scrapped that idea.
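To make that pipeline concrete, here is a minimal sketch of the same steps: text to vectors, logistic regression, cross-validation, then a chart of predictions. It is not the project's code. It uses only the Python standard library so it runs anywhere, which means several stand-ins: the tiny hand-made `WORD_VECTORS` table replaces FastText embeddings, a small gradient-descent `train_logreg` replaces sklearn's logistic regression, and a printed text bar replaces the matplotlib graph. All names, the example sentences, and the candidate `l2` values are assumptions for illustration; the write-up does not say which settings were compared during cross-validation.

```python
import math

# Hypothetical 2-d word vectors; real FastText embeddings are learned and far larger.
WORD_VECTORS = {
    "stupid": [0.9, 0.1], "worthless": [0.8, 0.2], "hate": [0.9, 0.0],
    "shut": [0.7, 0.3], "thanks": [0.1, 0.9], "great": [0.0, 0.8],
    "love": [0.1, 1.0], "help": [0.2, 0.7], "you": [0.5, 0.5], "are": [0.5, 0.5],
}
DIM = 2

# Made-up labeled examples (1 = abusive, 0 = not abusive).
DATA = [
    ("you are stupid", 1), ("thanks you are great", 0),
    ("you are worthless", 1), ("love you", 0),
    ("hate you", 1), ("thanks for the help", 0),
    ("shut up stupid", 1), ("you are great", 0),
]


def sentence_vector(text):
    """Average the vectors of known words (zero vector if none are known)."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, x):
    """Probability that the vector x belongs to the abusive class."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, l2=0.0, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * DIM, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def cross_val_accuracy(X, y, k, l2):
    """Mean accuracy over k contiguous folds (any remainder samples are never held out)."""
    fold = len(X) // k
    scores = []
    for i in range(k):
        held_out = range(i * fold, (i + 1) * fold)
        X_tr = [x for j, x in enumerate(X) if j not in held_out]
        y_tr = [t for j, t in enumerate(y) if j not in held_out]
        model = train_logreg(X_tr, y_tr, l2=l2)
        hits = sum((predict_proba(model, X[j]) >= 0.5) == bool(y[j]) for j in held_out)
        scores.append(hits / fold)
    return sum(scores) / k


X = [sentence_vector(text) for text, _ in DATA]
y = [label for _, label in DATA]

# Pick the regularization strength with the best cross-validated accuracy.
candidates = [0.0, 0.01, 0.1]
scores = {l2: cross_val_accuracy(X, y, k=4, l2=l2) for l2 in candidates}
best_l2 = max(scores, key=scores.get)
model = train_logreg(X, y, l2=best_l2)

# Text stand-in for the bar graph: one bar per input, length proportional to the score.
for text in ["stupid worthless", "thanks love"]:
    p = predict_proba(model, sentence_vector(text))
    print(f"{text:<18} {'#' * round(p * 20)} {p:.2f}")
```

Averaging word vectors is a common way to turn word embeddings into a single sentence vector, which is why the sketch uses it; with a real embedding model the rest of the flow would stay the same, only with higher-dimensional inputs.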

I also served as team lead for our group. I assigned tasks to each person, submitted our project once we finished it, and did most of the talking when we pitched our app to the judges.
