Data Science Student Team (DSST)
- Brandon Trahms
- Derek Borders
Summary
Project Goals
- Identifying common organizations that the subs partner with.
- Use text mining tactics to pull out key words from the text description to identify which are the common organizations that the subcontractors partner with (financial aid, EOP etc) without having to do a manual review of all entries.
- Explore trends over time with regard to who campuses partner with during different parts of the academic year
- Create a model to predict partner level
Timeline
Details regarding deliverables to be added by DSST
Expert CFO staff member fills in level of partnership for 10-20 records to serve as a time estimate for data labeling by 2/11/2022.
- We will use that time estimate to decide if a smaller subset is necessary to reduce the burden of the CFO staff member, while allowing DS team to start working on the code for the prediction model.
- The remainder of the records should be rated by 2/25/2022
10-15 minute project check ins during class time approximately every two weeks starting 3/10/2022. The DS team will provide update the client on their progress with opportunity to ask questions from both parties.
The CFO team will review a scientific poster created by the DS team for presentation at the College of Natural Sciences Poster Session, date somewhere in late April or Early May.
The DS team will give a final stakeholder presentation to CFO team members/leadership during the week of May 15th, 2022 (Finals week).
Project Updates
Weekly updates done by DSST via PR
03/02/2022
Milestones
- Open dialogue with client.
- Obtain initial few labeled rows.
03/09/2022
Milestones
- Read up on literature
- Reviewed past project work
- Completed Data Security Training
- Followed up with client
03/23/2022
Milestones
- Followed up to get proper keyword list
- Refactored prior work and data to work and knit without box access.
- Reviewed sentiment analysis techniques.
03/30/2022
Milestones
- First pass at classification using basic filtration.
- Created first list of entries for clarification by client.
04/06/2022
Milestones
- First zoom share out.
- Sent out first list of entries for clarification by client.
- Client returned first review list.
- Start working on index column and versioning to track verified and predicted labels and list evolution over time.
- Laid groundwork for sentiment analysis (created function for predicting class given proper arguments).
04/13/2022
Milestones
- More refactoring to clean up project code.
- More groundwork for sentiment analysis.
- Start working on abstract for poster.
- Created two potential training sets using keyword filters.
- Exclusive version with no overlap
- Inclusive version with overlap
- Discuss some poster ideas and appropriate visualizations.
Ongoing
Current Goals
(This week)
- First + second pass sentiment analysis.
- Create new clarification list. Target most ambiguous entries after sentiment analysis.
- Refine abstract for poster & get approved.
Next Steps
(Longer Term)
- Create model to classify individual meeting instances using keywords and sentiment analysis techniques.
- Use model to label all activity rows in dataset.
- Find least certain generated labels.
- Get sponsor clarification for most ambiguous rows.
- Input results, update model, repeat 2-3 times until model seems adequate.
- Predict (infer?) level of partnerships over time (subcontractor + partner by quarter).
- Track overall performance of subcontractors and partners.