Poolside

Data Research Engineer - Pre-training

EMEA

Remote

posted 2 hours ago

About Poolside

We build the models. You build the future. AGI for the enterprise, starting with software agents.

Poolside is on a mission to create Artificial General Intelligence and aims to lead the field. We want to accelerate software development with systems that improve the developer experience. Our team is a diverse mix of research, engineering, and business professionals who work closely together toward these goals.

About the Company

Poolside is dedicated to building a future where AI drives economic value and scientific advancement. We focus on reshaping the developer experience with cutting-edge technologies and deploy these systems in secure enterprise environments. Our team is distributed across Europe and North America, gathering monthly in Paris for collaboration and connection.

The Role

As a member of our data team, you will be responsible for enhancing the quality of datasets used for training our models. This hands-on position involves improving pretraining datasets through your expertise and experimentation. You will collaborate with various teams to identify data needs that align with model capabilities and use cases. Staying updated on research in dataset design and pretraining is crucial, as you will lead original research initiatives and implement technical solutions in production.

Responsibilities

  • Stay informed on the latest research related to LLMs and data quality.
  • Familiarize yourself with relevant open-source datasets and models.
  • Design and implement complex data generation pipelines, ensuring high diversity and resource optimization.
  • Collaborate with the Pretraining, Post-training, Evals, and Product teams to get quick feedback on model quality.
  • Conduct and analyze data ablations or training experiments to enhance dataset quality using quantitative insights.

Requirements

  • Strong background in machine learning and engineering.
  • Experience with Large Language Models (LLMs), including transformer architectures and learning processes.
  • Knowledge of data ablations, scaling laws, and mid/post-training techniques.
  • Familiarity with building large-scale pretraining datasets, including data curation, deduplication, and tokenization.
  • Excellent programming skills in Python and strong prompt engineering capabilities.
  • Experience with large-scale GPU clusters and distributed data pipelines.
  • Strong focus on data quality. Research experience, including authorship of scientific papers on relevant topics, is a plus.

What We Offer

  • Fully remote work with flexible hours.
  • 37 days of vacation and holidays per year.
  • Health insurance allowance for you and your dependents.
  • Company-provided equipment.
  • Well-being, continuous learning, and home office allowances.
  • Frequent team gatherings.
  • A diverse and inclusive culture focused on people.

If you are passionate about data quality and want to contribute to groundbreaking AI development, we encourage you to apply and join our mission.

Required skills

  • Software Development
  • Python
  • Programming
  • AI
  • Prompt Engineering
  • Machine Learning

English level

Professional
