Angelica Kim

B.A. in Computer Science and Statistics, Amherst College (Class of 2024).

My coursework and research experience are centered on machine learning and statistical data analysis.
I interned at Boston Consulting Group X and an agritech startup, where I developed machine learning models to solve business problems.

Work Experience

Industry

Data Science Intern

September 2022 - May 2023

  • Achieved a 20% improvement in the accuracy of the agritech startup's crop price forecasting ensemble model by engineering 150 new features from 10 years of wholesale agricultural transaction, weather, crop production cycle, and trade data using Pandas, NumPy, SQL, AWS S3, and Docker.
  • Identified causal predictors of crop prices that informed the startup's crop trading strategy by implementing an attention-based convolutional neural network in PyTorch.

Research Analyst

May 2022 - August 2022

  • Applied random forest and principal component analysis with scikit-learn to a database of 15M+ policyholders, discovering 50 key customer characteristics predictive of new insurance product purchases for a leading property & casualty insurer.
  • Implemented incremental learning to train models on the large-scale policyholder database, circumventing memory constraints in the computational environment and cutting training time by 30%.
  • Predicted potential connections among 8M+ policyholders with the Adamic-Adar index in NetworkX, enabling personalized customer touchpoints informed by each policyholder's network and facilitating timely purchase decisions aligned with individual needs.
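
For readers unfamiliar with the Adamic-Adar index, here is a minimal sketch of the idea in plain Python. It is an illustration only, not the code used in the project: the function name `adamic_adar_scores` and the toy graph are hypothetical. NetworkX offers the same measure as `networkx.adamic_adar_index`. Each unlinked pair is scored by summing 1 / log(degree) over the neighbours the pair shares, so a rare shared contact counts for more than a hub.

```python
import math
from itertools import combinations


def adamic_adar_scores(adjacency):
    """Score every unlinked node pair by the Adamic-Adar index.

    `adjacency` maps each node to the set of its neighbours in an
    undirected graph (hypothetical input format for this sketch).
    """
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already connected, nothing to predict
        shared = adjacency[u] & adjacency[v]
        # A shared neighbour touches both u and v, so its degree is at
        # least 2 and log(degree) is strictly positive.
        score = sum(1.0 / math.log(len(adjacency[w])) for w in shared)
        if score > 0:
            scores[(u, v)] = score
    return scores


# Hypothetical toy network: "a" and "d" are not linked but share "b" and "c".
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
toy_scores = adamic_adar_scores(toy_graph)
```

Pairs with the highest scores are the most plausible missing links. On a graph with millions of nodes, candidate pairs would need to be restricted (for example to nodes two hops apart) rather than enumerated exhaustively as in this sketch.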

Research

Amherst College Computer Science Department

Gregory S. Call Fellow

June 2023 - Present

Conducted honors thesis research under the guidance of Prof. Matteo Riondato on identifying invariant causal mechanisms using nonparametric hypothesis tests to improve the robustness of time series forecasting in nonstationary environments.
Paper >

Amherst College Data* Mammoths

Student Researcher

January 2023 - August 2023

Developed a Markov chain Monte Carlo algorithm in Java for efficient sampling of binary matrices with fixed margins from a specified null distribution to facilitate hypothesis testing, as a researcher at Data* Mammoths, an NSF-funded research group led by Prof. Matteo Riondato.
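
To give a flavour of this kind of sampler, here is a minimal sketch of the classical checkerboard swap chain, written in Python for brevity. It is not the algorithm developed in the project: it targets the uniform distribution over matrices with the given row and column sums, and the name `swap_chain_sample` and its parameters are assumptions made for this example.

```python
import random


def swap_chain_sample(matrix, steps, seed=0):
    """Run a checkerboard swap chain on a 0/1 matrix.

    Every accepted move preserves all row sums and column sums.
    Rejected proposals leave the state unchanged, which keeps the
    chain symmetric. Requires at least two rows and two columns.
    """
    rng = random.Random(seed)
    current = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(current), len(current[0])
    for _ in range(steps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        a, b = current[r1][c1], current[r1][c2]
        c, d = current[r2][c1], current[r2][c2]
        # A swappable 2x2 submatrix looks like [[1, 0], [0, 1]] or
        # [[0, 1], [1, 0]]; flipping it keeps the margins fixed.
        if a == d and b == c and a != b:
            current[r1][c1], current[r1][c2] = b, a
            current[r2][c1], current[r2][c2] = d, c
    return current


# Hypothetical observed matrix, used only to exercise the sampler.
observed = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]
sampled = swap_chain_sample(observed, steps=1000)
```

In a hypothesis test, a statistic computed on many such samples would form the null distribution against which the observed matrix's statistic is compared.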

Amherst College Computer Science Department

Teaching Assistant

September 2022 - December 2022

Held weekly office hours and provided one-on-one support for 50+ students in Introduction to Computer Science II, covering Java and object-oriented programming concepts.

Amherst College Statistics Department

Gregory S. Call Academic Intern

September 2022 - December 2022

Conducted literature review on data-generating processes of individually randomized group-treatment trials (IRGT), supporting Professor Brittany Bailey’s research on tackling the inevitable problems of nested clinical trials such as within-group correlation and missing data.

Course Projects

#Pandas

#NumPy

#XGB

#Optuna

#SLURM

Enhancing Mean-Variance Portfolio Optimization with Stock Price Forecasting Using XGBoost

  • Authored a research paper that explains the mathematical underpinnings of XGBoost, with an emphasis on regularization techniques, and demonstrates its application in stock price forecasting to enhance mean-variance portfolio optimization.
  • Accelerated hyperparameter tuning by implementing Bayesian optimization with pruning strategies and parallelization on an HPC cluster (NVIDIA DGX A100 GPU) using SLURM for job scheduling, achieving a 3x speedup compared to grid search.
  • Conducted backtesting analysis to evaluate various portfolio strategies, with the XGBoost-based asset allocation yielding the highest Sharpe ratio of 3 and an annualized return of 60% in a simulated environment with no transaction costs.
Paper >
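
As a reminder of how the Sharpe ratio quoted above is typically computed, here is a minimal sketch using only the Python standard library. The function name `annualized_sharpe`, the default of 252 trading periods per year, and the zero risk-free rate are assumptions for this example, not details taken from the paper.

```python
import math
import statistics


def annualized_sharpe(period_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a series of per-period returns.

    `risk_free_rate` is an annual rate and is spread evenly across
    periods. Volatility is the sample standard deviation of the
    excess returns, so at least two observations are required.
    """
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in period_returns]
    mean_excess = statistics.fmean(excess)
    volatility = statistics.stdev(excess)
    return mean_excess / volatility * math.sqrt(periods_per_year)
```

A backtest would pass in the strategy's daily (or other per-period) returns. Because the simulation described above ignores transaction costs, a ratio computed this way is an upper bound on what a live strategy would realize.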

#PostgreSQL

#Linux

#Tableau

#Figma

Amherst College Facilities Database

  • Collaborated with the Amherst College Facilities Department in a team of three to design and build a PostgreSQL database on Linux, enabling comprehensive monitoring of campus facility conditions and priority calculation for repair and replacement projects.
  • Deployed a Tableau dashboard integrated with the database to inform facility investment decisions, featuring replacement value projection and dynamic prioritization of facilities projects based on user-specified metrics.

#Java

#SLURM

Optimization of Bellman-Ford Algorithm using Parallel Computing

Designed and tested a parallel algorithm for finding single-source shortest paths in large graphs on an HPC cluster using SLURM for job scheduling. Multithreading and memory access optimizations in Java achieved a 5x speedup over the iterative approach (ranked #1 in the class competition for runtime performance).
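
For context, the sequential baseline can be sketched as follows. This is a textbook Bellman-Ford in Python, not the project's Java implementation. The edge-list format and the function name `bellman_ford` are assumptions for this example.

```python
def bellman_ford(num_vertices, edges, source):
    """Single-source shortest paths on a directed, weighted graph.

    `edges` is a list of (u, v, weight) tuples with vertices numbered
    from 0. Returns a list of distances, with float("inf") for
    unreachable vertices. Raises ValueError if a negative-weight
    cycle is reachable from `source`.
    """
    dist = [float("inf")] * num_vertices
    dist[source] = 0.0
    # At most |V| - 1 relaxation passes are needed.
    for _ in range(num_vertices - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            break  # distances have converged early
    # One extra pass: any further improvement implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return dist


# Hypothetical toy graph for illustration.
toy_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
toy_dist = bellman_ford(4, toy_edges, source=0)
```

The inner loop over edges is the natural target for parallelization, since the edges can be split across threads within a pass. A parallel version then has to handle concurrent updates to the same distance entry.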

#TensorFlow

#Keras

#Pandas

#NumPy

Agricultural Crop Price Prediction

Built LSTM models on an HPC cluster (NVIDIA DGX A5000 GPU) to predict Korean wholesale crop prices, achieving a 60% improvement in accuracy relative to an ARIMA model.

#R Shiny

#dplyr

#ggplot2

Health Care Equity

  • Led the deployment of a data analysis blog with a two-member team for a Data Science course project, investigating the causes of health inequities in the United States and assessing the efficacy of public health insurance.
  • Performed spatial data visualization, clustering, and text analysis using R and Shiny.

Publication

#R

#QGIS

#Tableau

Analysis of Spatial Density Utilizing The Big Data of Floating Population in Seoul

  • Won 2nd place in the Cultural Heritage Administration Public Policy Competition 2015 for proposing a policy to enhance accessibility to cultural heritage sites by optimizing public transportation routes based on the spatial density of foreign tourists, using R, QGIS, and Tableau.
  • Published a paper on spatiotemporal analysis of foreign tourists' mobility patterns and the implications for urban planning and tourism management in the Free and Open Source Software for Geospatial 2015 Conference Proceedings.
  • Awards & Presentations:
    • Women in Data Science Cambridge 2021 Conference Presentation
    • Free and Open Source Software for Geospatial 2015 Conference Best Presentation Award
    • Cultural Heritage Administration Public Policy Competition 2015 2nd Place

© Copyright Angelica Kim