Senior Data Engineer – Princeton Accelerator
Princeton University, Bridging Divides Initiative
Location
Remote
Type
Contract
Posted
Jan 21, 2026
Mission
What you will drive
Core responsibilities:
- Optimize existing Databricks pipelines for cost and performance
- Expand ingestion scope to include YouTube transcripts, multi-media sampling, and new metadata sources
- Design ML-ready, gold-layer datasets for researchers
- Establish engineering standards, CI/CD pipelines, and documentation practices
Impact
The difference you'll make
The datasets you help build will accelerate social science research on social media, helping researchers understand how platforms shape our collective information environment.
Profile
What makes you a great fit
Required qualifications:
- Deep Databricks experience with PySpark, Delta Lake, cost optimization, and cluster tuning
- Experience designing and building pipelines at scale (10+ TB)
- Deep experience with CI/CD, testing, and building maintainable systems
- Clear communicator who can work with researchers and junior engineers
Benefits
What's in it for you
Competitive rate commensurate with experience. Fully remote, approximately 30 hours/week, from January through June 2026.
About
Inside Princeton University, Bridging Divides Initiative
The Accelerator at Princeton's School of Public and International Affairs (SPIA) is building a first-of-its-kind research platform—a living dataset of social media activity that helps researchers understand how platforms shape the information environment.