Support data preprocessing and feature engineering pipelines under the direction of senior engineers: clean, normalize, and validate HRSA fraud-related datasets; prepare data for class-imbalance handling (SMOTE, undersampling); and manage train/validation/test splits.
Assist in the development, training, and evaluation of supervised fraud classification models; compute and document standard evaluation metrics (accuracy, precision, recall, F1 score, AUC-ROC, confusion matrices) for government review in EPLC-required model evaluation reports.
Maintain and monitor ML experiment tracking using MLflow or equivalent tooling approved for the IRMS environment; log hyperparameter configurations, training runs, and evaluation results with enough documentation to reproduce each run.
Support model drift detection and retraining pipelines: run scheduled evaluation jobs, flag performance degradation against established baselines, and escalate findings to the AI/ML Lead Engineer and Fraud AI/ML SME.
Assist the NLP/NER pipeline team (Rohit) with data transformation tasks: convert NER pipeline outputs into feature-compatible schemas for downstream ML models, and validate entity extraction quality against labeled reference sets.
Develop and maintain Jupyter notebook-based model exploration and reporting artifacts for use in EPLC deliverables, sprint reviews, and government demonstrations.
Support integration testing of UiPath Maestro agents: prepare model inference payloads, validate agent input/output schemas, and help test the interface between ML model inference APIs and the persona-based agent layer.
Implement and maintain data pipeline scripts (Python/Pandas/NumPy) for batch data ingestion, feature store updates, and batch model scoring runs within the IRMS security boundary.
Follow and enforce IRMS boundary data handling procedures: ensure no PII/PHI is processed outside approved environments; maintain developer/test environment segregation per HHS security policy.
Produce supporting artifacts for EPLC deliverables: training data specifications, model evaluation appendices, data dictionary updates, and sprint retrospective documentation as directed by the PM and AI/ML Lead.
Participate in code reviews; adhere to OWASP secure coding standards, NIST SP 800-160 engineering principles, and Node’s internal CI/CD quality gates.
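The split, class-imbalance, and evaluation-metric duties above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline. It assumes scikit-learn, NumPy, and pandas, uses synthetic data in place of any HRSA records, and uses hypothetical names for the label column (`is_fraud`) and the helper functions. It applies simple random undersampling; SMOTE would require the separate imbalanced-learn package. Resampling touches the training split only, so validation and test metrics reflect the real class ratio.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score,
    precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split


def stratified_three_way_split(df, label, val_frac=0.15, test_frac=0.15, seed=42):
    """Hypothetical helper: train/validation/test split preserving the label ratio."""
    train_val, test = train_test_split(
        df, test_size=test_frac, stratify=df[label], random_state=seed
    )
    # Rescale so val_frac is a fraction of the full dataset, not of train_val.
    val_rel = val_frac / (1.0 - test_frac)
    train, val = train_test_split(
        train_val, test_size=val_rel, stratify=train_val[label], random_state=seed
    )
    return train, val, test


def random_undersample(df, label, seed=42):
    """Hypothetical helper: downsample every class to the minority-class count."""
    n_min = df[label].value_counts().min()
    parts = [grp.sample(n=n_min, random_state=seed) for _, grp in df.groupby(label)]
    return pd.concat(parts).sample(frac=1.0, random_state=seed)  # shuffle rows


def evaluate(y_true, y_pred, y_score):
    """Compute the metrics named for model evaluation reports."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "auc_roc": roc_auc_score(y_true, y_score),
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }


# Synthetic, imbalanced stand-in data (roughly 5% positives); not real records.
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 1.8).astype(int)
features = ["feat_a", "feat_b", "feat_c"]
data = pd.DataFrame(X, columns=features).assign(is_fraud=y)

train, val, test = stratified_three_way_split(data, "is_fraud")
train_bal = random_undersample(train, "is_fraud")  # never resample val/test

model = LogisticRegression().fit(train_bal[features], train_bal["is_fraud"])
scores = model.predict_proba(test[features])[:, 1]
preds = (scores >= 0.5).astype(int)
report = evaluate(test["is_fraud"], preds, scores)
print(report)
```

Keeping the test split untouched by resampling is the key design choice here; balancing before splitting would let duplicated or synthetic minority rows leak into evaluation and inflate the reported metrics.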
Requirements
Bachelor’s degree in Computer Science, Data Science, Mathematics, Statistics, or a closely related field; recent graduates with strong applied ML coursework or project portfolios will be considered.
1–3 years of hands-on experience (including internships, graduate research, or project work) in machine learning, data science, or data engineering with Python.
Proficiency in the Python ML stack (scikit-learn, Pandas, NumPy) and familiarity with at least one deep learning framework (TensorFlow or PyTorch) for model evaluation and inference tasks.
Demonstrated experience with standard ML evaluation workflows: train/validation/test split design, cross-validation, metric computation, and results documentation.
Experience with Jupyter notebooks for data exploration, model evaluation, and technical reporting.
Familiarity with Git-based version control and CI/CD principles; ability to work within a structured sprint cadence with documented deliverable commitments.
Demonstrated ability to handle sensitive data responsibly; understanding of data governance, access control, and the importance of environment segregation in a regulated or government setting.
Strong written communication skills: ability to produce clear, organized technical documentation suitable for government review.
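As a rough calibration of the cross-validation experience listed above, the sketch below shows one common pattern for imbalanced fraud labels: stratified k-fold cross-validation with undersampling applied only inside each training fold. It assumes scikit-learn and NumPy and uses synthetic data. The helper names `undersample_indices` and `cv_f1_scores` are hypothetical and do not refer to any project code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def undersample_indices(y, rng):
    """Hypothetical helper: indices that balance a binary label by dropping majority rows."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n = min(len(pos), len(neg))
    keep = np.concatenate([rng.choice(pos, n, replace=False),
                           rng.choice(neg, n, replace=False)])
    return rng.permutation(keep)


def cv_f1_scores(X, y, n_splits=5, seed=0):
    """Stratified k-fold CV; resampling touches only the training fold."""
    rng = np.random.default_rng(seed)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        # Map balanced positions within the fold back to global row indices.
        bal = train_idx[undersample_indices(y[train_idx], rng)]
        model = LogisticRegression().fit(X[bal], y[bal])
        scores.append(f1_score(y[val_idx], model.predict(X[val_idx])))
    return np.array(scores)


# Synthetic, imbalanced stand-in data; not real records.
rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=3000) > 1.8).astype(int)
scores = cv_f1_scores(X, y)
print(f"F1 per fold: {np.round(scores, 3)}; mean = {scores.mean():.3f}")
```

Each validation fold keeps the original class ratio, so the per-fold F1 values estimate performance on data that looks like production rather than on a balanced sample.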