Welcome!

I am a data scientist with experience analyzing large data sets, working with imbalanced data, designing streaming applications, and building custom purpose-driven models. I currently work in cybersecurity at Vectra AI and formerly worked in experimental particle physics as part of the CMS Collaboration.

I wrote the HTML and CSS for this site, so if there are bugs or typos, please let me know!

I have experience in:

  • Working with streaming data in the context of developing real-time cyber attacker detection algorithms.
  • Analyzing imbalanced datasets, specifically in cybersecurity, where malicious samples are hard to come by.
  • Standard Python coding practices, including Black formatting, type hinting, and unit testing (a small example follows this list).
  • Source code management via Git, including using GitHub workflows + Docker for CI/CD that automates testing, compiling, and deployment of projects.
  • Developing Python packages to improve workflows for myself and others on my team, including using Docker to build services that support development work.
  • Investigating customer issues in on-prem and cloud environments, working with support teams to address customer concerns.
  • Following the latest developments in large language model (LLM) research.
  • Giving presentations on technical topics in various forums ranging from rooms of engineers to company leadership to conference attendees.
  • Performing and publishing rigorous scientific research.
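
As a small, self-contained illustration of the coding practices mentioned above (the function, its test, and the numbers are hypothetical, formatted the way Black would format them):

```python
# Hypothetical example: a type-hinted helper plus a unit test.
import unittest


def precision(true_positives: int, false_positives: int) -> float:
    """Return precision, guarding against an empty denominator."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


class TestPrecision(unittest.TestCase):
    def test_typical_case(self) -> None:
        self.assertAlmostEqual(precision(8, 2), 0.8)

    def test_no_predictions(self) -> None:
        self.assertEqual(precision(0, 0), 0.0)


if __name__ == "__main__":
    unittest.main()
```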

Principles that guide my work:

  • Data insights are most interesting for developing products, not marketing them.
  • A product is only as good as its user experience.
  • Jupyter Notebooks should be limited to 15-minute windows of EDA (I may upset some people with that one...).

Highlights from my work in cybersecurity:

  • I've worked on detection algorithms in both network and cloud environments (AWS and M365 specifically).
  • On the network side, I've developed algorithms to detect DCSync and DCShadow attacks via the DRSUAPI. I've also worked on logic to detect reconnaissance activity via LDAP.
  • On the cloud side, I've worked on algorithms that detect suspicious SharePoint downloading activity as well as reconnaissance in AWS environments using credentials stolen from insecure EC2 instances.
  • I've also participated in efforts to understand the underlying behavior of Command-and-Control channels.
  • As part of a small team, I investigated using LLMs in the product's interface, including the feasibility of natural-language-to-SQL query systems and the potential for fine-tuning open-source models via methods such as LoRA and QLoRA. For this work, I also built an ORM-based package for defining, running, and visualizing repeatable experiments on LLMs and their prompts (a rough sketch of that idea follows this list).
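
That package isn't public here, so the sketch below is only a rough guess at what ORM-backed experiment definitions could look like, using SQLAlchemy; table and column names are hypothetical, not the actual schema:

```python
# Hypothetical sketch of ORM-backed, repeatable LLM experiments.
# Table and column names are illustrative, not the real package's schema.
from sqlalchemy import Column, Float, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Prompt(Base):
    __tablename__ = "prompts"
    id = Column(Integer, primary_key=True)
    template = Column(Text, nullable=False)  # e.g. "Translate {question} into SQL"


class Experiment(Base):
    __tablename__ = "experiments"
    id = Column(Integer, primary_key=True)
    model_name = Column(String, nullable=False)  # e.g. an open-source model fine-tuned with LoRA
    prompt_id = Column(Integer, ForeignKey("prompts.id"))
    prompt = relationship("Prompt")
    results = relationship("Result", back_populates="experiment")


class Result(Base):
    __tablename__ = "results"
    id = Column(Integer, primary_key=True)
    experiment_id = Column(Integer, ForeignKey("experiments.id"))
    completion = Column(Text)
    score = Column(Float)  # whatever evaluation metric a run records
    experiment = relationship("Experiment", back_populates="results")


# Persisting every run makes experiments repeatable and easy to query or plot later.
engine = create_engine("sqlite:///llm_experiments.db")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
```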

Highlights from my work in particle physics:

  • I published independent research performed with data collected by the Compact Muon Solenoid (CMS) experiment, an international collaboration made up of thousands of researchers.
  • My analysis increased the sensitivity to detect a specific particle decay by a factor of 10. The result was published in the Journal of High Energy Physics and is publicly available at arXiv:2104.12853. I also helped draft a CMS Physics Briefing for the analysis as a way to highlight the work to the public.
  • I created and developed two software tools (TIMBER and 2D Alphabet) that provide fast, user-friendly Python interfaces to commonly used tools, algorithms, and statistical models in particle physics.
  • I completed a two-year tenure as the designated statistics software expert for a group of 40-60 people, where I was responsible for approving statistical models and their software implementations for 10-15 different analyses per year.

Old Projects

TIMBER (Tree Interface for Making Binned Events with RDataFrame) is an easy-to-use Python library that can quickly process CMS datasets with plug-and-play C++ modules, reducing computation time by up to a factor of 20.

The primary class builds a directed acyclic graph (the "tree") from successive data manipulations so internal methods can leverage data provenance. The interface makes analysis development quicker and encourages better coding practices so that analysis code can be more easily shared and understood.
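
To make the "tree of manipulations" idea concrete without quoting TIMBER's real code, here is a toy sketch of the pattern; the class, method names, and expressions are illustrative only, not the actual TIMBER API:

```python
# Toy illustration of a DAG of data manipulations with provenance tracking.
# Illustrative only; not the actual TIMBER API.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One manipulation (a cut or a new column) plus its parent in the tree."""

    name: str
    action: str
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)

    def lineage(self) -> list[str]:
        """Walk back to the root so any node knows its full provenance."""
        steps, node = [], self
        while node is not None:
            steps.append(f"{node.name}: {node.action}")
            node = node.parent
        return list(reversed(steps))


class Analyzer:
    """Each Cut/Define appends a node; branches share their common ancestors."""

    def __init__(self, dataset: str):
        self.active = Node("load", f"open {dataset}")

    def Define(self, name: str, expression: str) -> Node:
        return self._extend(Node(name, f"new column: {expression}", parent=self.active))

    def Cut(self, name: str, expression: str) -> Node:
        return self._extend(Node(name, f"filter: {expression}", parent=self.active))

    def _extend(self, node: Node) -> Node:
        self.active.children.append(node)
        self.active = node
        return node


a = Analyzer("events.root")
a.Define("mass", "invariant_mass(jet1, jet2)")
a.Cut("mass_window", "mass > 1000")
print(a.active.lineage())  # full provenance of the surviving events
```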

2D Alphabet is the software implementation of a novel modeling technique of the same name that I developed. It constructs a binned likelihood from 2D parametric distributions constrained by simulation, and built-in methods can also fit and test the model and produce publication-ready figures of the post-fit results.

The exposed API also allows for custom models to be built from the fundamental pieces of the 2D Alphabet framework.
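
The real model is built with CMS statistics tooling, so the snippet below is only a generic illustration of what "a binned likelihood over a 2D parametric distribution" means; the toy data and the parametric shape are made up:

```python
# Generic binned Poisson likelihood over a 2D histogram (toy example only;
# this is not the 2D Alphabet implementation).
import numpy as np

rng = np.random.default_rng(0)
x_edges = np.linspace(0.0, 1.0, 11)  # first kinematic variable
y_edges = np.linspace(0.0, 1.0, 11)  # second kinematic variable
observed = rng.poisson(lam=20, size=(10, 10))  # toy "data" counts per 2D bin


def expected_counts(params: np.ndarray) -> np.ndarray:
    """A made-up smooth 2D parametric shape: norm * (1 + a*x + b*y) per bin."""
    norm, a, b = params
    xc = 0.5 * (x_edges[:-1] + x_edges[1:])
    yc = 0.5 * (y_edges[:-1] + y_edges[1:])
    X, Y = np.meshgrid(xc, yc, indexing="ij")
    return norm * (1.0 + a * X + b * Y)


def nll(params: np.ndarray) -> float:
    """Negative log of the product of per-bin Poisson probabilities."""
    mu = expected_counts(params)
    # The constant log(n!) terms are dropped; they do not depend on the parameters.
    return float(np.sum(mu - observed * np.log(mu)))


# Minimizing nll over (norm, a, b) fits the parametric shape to the 2D histogram.
print(nll(np.array([20.0, 0.1, -0.1])))
```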

An online tool to track and manage job applications and interviews. It stores dates/times, notes, resume versions, and other information, with sort and filter functionality to make it easier to track hundreds of job applications.

Developed in Django and deployed with AWS Elastic Beanstalk (EOL: April 2023).
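
As a hedged sketch of how such a tracker might model an application in Django (field names are my guesses, not the app's actual schema):

```python
# Hypothetical Django model for one tracked job application (lives in an app's models.py).
# Field names are illustrative guesses, not the real schema.
from django.db import models


class Application(models.Model):
    class Status(models.TextChoices):
        APPLIED = "applied"
        INTERVIEWING = "interviewing"
        REJECTED = "rejected"
        OFFER = "offer"

    company = models.CharField(max_length=200)
    role = models.CharField(max_length=200)
    applied_at = models.DateTimeField()
    status = models.CharField(max_length=20, choices=Status.choices, default=Status.APPLIED)
    resume_version = models.CharField(max_length=100, blank=True)
    notes = models.TextField(blank=True)

    class Meta:
        ordering = ["-applied_at"]  # newest applications first by default


# Sorting and filtering hundreds of applications is then a queryset away, e.g.:
# Application.objects.filter(status=Application.Status.INTERVIEWING).order_by("company")
```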

A Dash web application that uses Plotly to provide an interactive interface for browsing ROOT files.

In addition to basic browsing functionality similar to ROOT's TBrowser, Better ROOT Browser includes utilities, such as template-morphing sliders, that make EDA tasks interactive.
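
A minimal sketch of that architecture, assuming uproot for reading the file; the layout and names are illustrative, not the actual Better ROOT Browser code:

```python
# Minimal Dash + Plotly sketch of a ROOT-file histogram browser.
# Assumes uproot for I/O; illustrative only, not the actual tool.
import plotly.graph_objects as go
import uproot
from dash import Dash, Input, Output, dcc, html

FILE = uproot.open("example.root")  # hypothetical input file
HIST_KEYS = [key for key, cls in FILE.classnames().items() if cls.startswith("TH1")]

app = Dash(__name__)
app.layout = html.Div(
    [
        dcc.Dropdown(id="hist", options=HIST_KEYS, value=HIST_KEYS[0] if HIST_KEYS else None),
        dcc.Graph(id="plot"),
    ]
)


@app.callback(Output("plot", "figure"), Input("hist", "value"))
def draw(key: str) -> go.Figure:
    """Redraw the selected histogram whenever the dropdown changes."""
    counts, edges = FILE[key].to_numpy()
    centers = 0.5 * (edges[:-1] + edges[1:])
    return go.Figure(go.Bar(x=centers, y=counts))


if __name__ == "__main__":
    app.run(debug=True)
```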

A tool to automate the various processing steps needed to generate CMS Fast Simulation (FastSim) samples and to validate results across FastSim versions (in particular, when changes need to be actively tested). The main infrastructure automates the consecutive processing steps and the submissions to the CRAB batch system, waiting for CRAB to finish processing jobs before proceeding to subsequent steps. The package also brings together existing tools to create comparisons across simulation samples.
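
The CRAB-specific calls are deliberately abstracted away in the sketch below; submit_step() and check_status() are hypothetical stand-ins, and only the wait-then-proceed chaining is the point:

```python
# Sketch of "submit a step, wait for CRAB to finish, then run the next step".
# submit_step() and check_status() are hypothetical stand-ins for the real CRAB calls.
import time


def submit_step(name: str) -> str:
    """Pretend to submit one processing step and return a task identifier."""
    print(f"submitting {name}")
    return f"task-{name}"


def check_status(task_id: str) -> str:
    """Pretend to query the batch system; a real implementation would ask CRAB."""
    return "COMPLETED"


def run_chain(steps: list[str], poll_seconds: float = 600.0) -> None:
    """Run steps in order, blocking on each one until its jobs finish."""
    for step in steps:
        task = submit_step(step)
        while check_status(task) not in ("COMPLETED", "FAILED"):
            time.sleep(poll_seconds)
        print(f"{step} finished; moving on to the next step")


run_chain(["step1", "step2", "step3"], poll_seconds=1.0)
```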

A simple tool to batch-track CRAB jobs using the CRABClient API. It skips checking any jobs that are fully completed and recommends resubmitting with more memory if logs indicate this could help.
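
A rough sketch of that decision logic; the status strings and log text here are stand-ins, since the real tool reads them through the CRABClient API:

```python
# Sketch of the per-task decision logic with hypothetical inputs;
# the real tool gets statuses and logs from the CRABClient API.
def review_task(name: str, status: str, log_tail: str, finished: set[str]) -> str:
    """Decide what, if anything, to report for one CRAB task."""
    if name in finished or status == "COMPLETED":
        finished.add(name)  # never re-check tasks that are already done
        return f"{name}: completed (will be skipped next time)"
    if "memory" in log_tail.lower():  # hypothetical log hint
        return f"{name}: failures look memory-bound; consider resubmitting with more RAM"
    return f"{name}: {status}"


done: set[str] = set()
print(review_task("sample_2018", "FAILED", "job killed: memory limit exceeded", done))
```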

Other Experiences

Primary Author

  • "Search for a heavy resonance decaying to a top quark and a W boson at √s = 13 TeV in the fully hadronic final state," CMS Collaboration, JHEP, 2021

Collaborator

  • "Search for a heavy resonance decaying into a top quark and a W boson in the lepton+jets final state at √s = 13 TeV," CMS Collaboration, JHEP, 2022
  • "Search for a massive scalar resonance decaying to a light scalar and a Higgs boson in the four b quarks final state with boosted topology," Phys. Lett. B (2023)

"Search for heavy BSM particles coupling to third generation quarks at CMS"

  • SUSY Conference, Spring 2019
  • PHENO Conference, Spring 2019
  • Lake Louise Winter Institute, Early 2019

"A search for an excited bottom quark decaying to a top quark and W boson in pp collisions at s = 13 TeV"

  • American Physical Society (APS), April 2018

Helped to lead a week-long group exercise in which students new to CMS data analysis worked together to recreate my analysis. Helped to edit and provide approachable Python code, gave presentations explaining key statistics concepts, and answered student questions throughout the week. My group from the 2021 session won best analysis!

  • Using personal hardware (PC + HTC Vive), provided a virtual reality exhibit at the annual JHU Physics Fair, which is free and open to the Baltimore community.
  • In 2018, the experience gave visitors a tour of the ATLAS detector at the LHC via ATLASrift.
  • In 2019, the experience allowed visitors to step "inside" the Belle II detector at the SuperKEKB accelerator to watch slow motion electron-positron collisions and the particle showers that result via Belle II VR.
  • Volunteered as a tutor for high school AP Physics students struggling with remote schooling during the COVID-19 pandemic.
  • Served as a revision-based writing tutor at the Rutgers Plangere Writing Center for 2.5 years.

Classical Mechanics (Freshman physics majors)

  • Every Fall from 2017 to 2020

General Physics Lecture and Lab

  • Fall 2016, Spring 2017

Mentored both undergraduate and graduate students as they began research in the JHU experimental particle physics group, assisting with a variety of topics ranging across physics concepts, coding tools, and statistical analysis.