Role Overview

Lead and perform hands-on analysis and modeling: formulate intervention hypotheses and design experiments, assess data needs and available sources, determine the best analytical approach, perform exploratory data analysis (EDA), and generate features (e.g., through identification, derivation, and aggregation)
Collaborate with mission stakeholders to define, frame, and scope mission challenges where big data interventions may offer important mitigations, and develop sound project plans with key milestones, detailed deliverables, work-tracking protocols, and risk mitigation strategies
Demonstrate proficiency in extracting, cleaning, and transforming CBP transactional and mission data associated with an identified problem space to build predictive models, and develop the appropriate supporting documentation
Leverage knowledge of a variety of statistical and machine learning techniques to define and develop algorithms, and to train, evaluate, and deploy predictive analytics models that directly inform mission decisions
Execute projects such as identifying patterns and anomalies in large datasets; automated text and data classification and categorization; entity recognition, resolution, and extraction; and named entity matching
Brief project management, technical design, and outcomes to technical and non-technical audiences, including senior government stakeholders, throughout the model development and project lifecycle, through both written reports and in-person briefings

Requirements

7-12 years of relevant experience
Experience in applying advanced analytics solutions to solve complex business problems
Experience with programming languages, including R, Python, JavaScript, and Visual Basic
Experience with creating VBA applications and macros to structure, manage, and wrangle key datasets
Experience with core data science libraries such as Pandas, NumPy, Matplotlib, and Plotly
Experience with the Anaconda distribution of Python for package management and deployment
Familiarity with command-line shell programming (PowerShell, cmd, etc.)
Proficiency with SQL programming
Familiarity with RESTful APIs, web scraping, and processing unstructured data
Knowledge of visualization and presentation tools such as Tableau, Power BI, and Jupyter Notebooks
Knowledge of cloud platforms such as AWS or Google Cloud
Proficiency using Git for version control, collaboration, and code review
Familiarity with environment management and containerization tools (e.g., Docker, virtual environments)
Experience with natural language processing (NLP), computational linguistics, entity extraction, named entity recognition (NER), name matching, and disambiguation
Experience constructing and executing queries to extract data in support of EDA and model development
Experience with unsupervised and supervised machine learning techniques and methods
Experience working with large-scale (e.g., terabyte and petabyte) structured and unstructured datasets and databases
Experience performing data mining, analysis, and training set construction

Tech Stack

AWS
Cloud
Docker
JavaScript
NumPy
Pandas
Python
SQL
Tableau
VBA

Benefits

Health insurance
Flexible work arrangements
Professional development opportunities
Remote work options

Data Scientist