Description
We are seeking a skilled Machine Learning Engineer I to join our team in the SinAI Assurance Lab. The Machine Learning Engineer will play a key role in Machine Learning Operations and will be responsible for designing, maintaining, and optimizing the data infrastructure and model validation pipelines that ensure all AI systems, Generative and Non-Generative, deployed across the Mount Sinai Health System (MSHS) are rigorously validated for compliance, performance, and patient safety.
You will work closely with AI product teams, clinical and technical stakeholders, DevOps engineers, and the AI Governance Committee to engineer scalable data flows that support model validation, real-time monitoring, and simulation-based testing environments.
Responsibilities
General Data Engineering
- Build and maintain robust ETL pipelines for structured and unstructured clinical data from EHR, imaging, and text sources.
- Design systems to automate data preparation, lineage tracking, and reproducibility for AI model inputs and outputs.
- Develop data infrastructure for benchmarking and stress-testing models in clinical simulation environments.
- Collaborate with DevOps and cloud teams to ensure deployment pipelines meet compliance and performance standards.
- Set up and monitor model tracking infrastructure for evaluation metrics and drift detection.
- Assist in the development of standards and procedures affecting data management, design, and maintenance, and document all standards and procedures.
AI Assurance & Governance
- Engineer and maintain pipelines that support pre-deployment model validation and post-deployment monitoring.
- Collaborate with Data Scientists and Clinical Product Owners to validate data integrity, reproducibility, and fairness in AI workflows.
- Ensure compliance with HIPAA, ethical guidelines, and institutional governance policies on sensitive health data use.
- Build dashboards and tools that provide observability across the ML lifecycle: data, models, outcomes.
Stakeholder Engagement & Others
- Effectively communicate technical findings related to model and data integrity to governance teams, clinical stakeholders, and leadership.
- Maintain clear and well-organized documentation of data workflows, platform architecture, and validation processes.
- Help write internal reports on data infrastructure resilience, validation system status, and operational risk.
- Stay informed on industry best practices in data engineering and healthcare-focused machine learning.
- Maintain a flexible attitude: be willing to work with multiple technologies and languages with an open mind and without technology bias, and show a continuous interest in updating skill sets and knowledge of trends in the big data technology space.
- Work closely with cross-functional teams including data scientists, healthcare providers, and IT professionals to understand data requirements, develop solutions, and support data-driven decision-making.
- Other duties as assigned
Qualifications
Requirements
- Bachelor's degree in Computer Science, Statistics, Mathematics, or related field; Master's degree in a quantitative discipline (e.g., Statistics, Operations Research, Bioinformatics, Economics, Computational Biology, Computer Science, Information Technology, Mathematics, Physics) is preferred.
- 2+ years of experience in data engineering, software engineering, or machine learning.
- Proficient in Python and SQL
- Proficiency in at least one cloud computing platform (e.g., AWS, Azure, GCP)
- Intermediate knowledge of Machine Learning
- Familiarity with ML lifecycle management tools (e.g., MLflow, Kubeflow, Airflow)
- Experience with the deployment and operationalization of ML systems
- Experience with monitoring tools for AI model tracking
- Understanding of DevOps principles, CI/CD pipelines, and containerization (e.g., Docker, Kubernetes)
- Experience with version control systems (e.g., Git)
- Knowledge of big data technologies (e.g., Hadoop, Spark)
- Strong problem-solving skills and ability to work in cross-functional teams