Education
University of North Carolina at Chapel Hill
- Master of Science in Computer Science (August 2024 - May 2026)
University of North Carolina at Chapel Hill
- Bachelor of Science in Computer Science (Highest Honors), Bachelor of Arts in Linguistics (August 2020 - May 2024)
- Thesis: Priming: Multi-Stage Pretraining using Formal Languages with Ascending Complexity
- Overall GPA: 3.97/4.0, Highest Distinction, Dean’s List (all semesters offered)
Publications
Internships
- Large Language Model Research and Development Intern, Lenovo (May 2024 - August 2024)
- Researched and developed a pipeline for automatically categorizing DPO data and selecting the most appropriate judge model for each category.
- Developed an automatic scraper to locate and format open-source images with corresponding text captions, creating a corpus of reliable multi-modal ground-truth training data.
- Compiled a detailed report on dataset compositions of open-source language models.
- Verified English-Chinese translations of DPO pairs to ensure consistency in the bilingual dataset.
- Software Defined Networking & Software Engineering Intern, Nokia Shanghai Bell (May 2023 - August 2023)
- Developed and tested automated redundancy switching tools for the 7750 Service Router (SR 7750) and Fortinet 3800 and 3900E firewalls, which the team fully adopted to replace manual configuration.
- Developed an experimental machine learning pipeline to extract correlation patterns in failed system health checks.
- Gave a department-wide seminar titled "Understanding Large Language Models" to an 80-person non-technical audience, including department-level business executives.
Research Experience
- Student Researcher, Learning from Language Lab, UNC Chapel Hill (May 2024 - August 2024)
- Researched and developed a pipeline for automatically categorizing DPO data and selecting the most appropriate judge model for each category.
- Developed an automatic scraper to locate and format open-source images with corresponding text captions, creating a corpus of reliable multi-modal ground-truth training data.
- Compiled a detailed report on dataset compositions of open-source language models.
- Verified English-Chinese translations of DPO pairs to ensure consistency in the bilingual dataset.