Responsible for integrating enterprise-wide data from Cofense’s cybersecurity applications, microservices, and other disparate data sources.
Work with Architects and Cloud Systems Engineers to design the Data Platform and Architecture
Experience in data modeling and building data pipelines for multi-product and/or multi-department organizations
Develop data pipelines on cloud platforms such as Azure/AWS using well-defined tool frameworks
Develop ETL code to stream data from disparate (structured and semi-structured) SaaS product data stores to Data Lake/Data Warehouse using Python, Azure/AWS Data Lake services
Ability to write complex SQL scripts and automate them using Python
Develop test cases and unit tests for key implementations of Data Platform by adhering to software engineering best practices and standards
Secure data end-to-end by complying with data privacy rules while developing processes to move data across applications/data sources and the Data Lake/Data Warehouse, as well as while delivering data through SQL clients and BI tools.
Experience building ad hoc reporting APIs and a service layer on top of underlying OLTP and OLAP databases
Help integrate the Data Platform with BI tools such as Power BI, Tableau, and Splunk
Ability to develop and interpret Entity Relationship Diagrams (ERD) across data sets in relational database systems as well as non-relational Data stores
Able to perform data mining and identify trends, patterns, and anomalies in complex data sets across multiple data sources/systems, and present results without ambiguity.
Develop data transformations to generate facts, summaries, and key metrics by applying business rulesets and aggregations using Python, SQL, and other transformation tools
Able to review and re-engineer current processes related to data ingestion, transformation, and statistical analysis
Collaborate with business users across Cofense’s departments to define requirements, prioritize project work, and deliver it on time.
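To illustrate the kind of work described above (writing SQL transformations that apply business rulesets to produce summary metrics, automated with Python), here is a minimal, hypothetical sketch. Table and column names are illustrative only, not Cofense's actual schema; an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw event table standing in for a SaaS product data store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_events (event_date TEXT, department TEXT, reports INTEGER)"
)
conn.executemany(
    "INSERT INTO report_events VALUES (?, ?, ?)",
    [
        ("2024-01-01", "finance", 3),
        ("2024-01-01", "hr", 2),
        ("2024-01-02", "finance", 5),
    ],
)

# A business ruleset applied as a SQL aggregation: total reports per day,
# the kind of fact/summary generation a downstream BI tool would consume.
summary = conn.execute(
    """
    SELECT event_date, SUM(reports) AS total_reports
    FROM report_events
    GROUP BY event_date
    ORDER BY event_date
    """
).fetchall()
print(summary)  # [('2024-01-01', 5), ('2024-01-02', 5)]
```

In practice the same pattern scales up: the Python layer schedules and parameterizes the SQL, while the warehouse engine does the heavy aggregation.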
Requirements
Bachelor's degree in Computer Science, Math, Data Analytics, Data Science, or BI, or demonstrated industry experience, preferred
Over 5 years of proven experience in data architecture, data modeling, and lifecycle management
Hands-on experience with relational and cloud databases, including Azure SQL Database, Microsoft SQL Server, Amazon Aurora, and Amazon Redshift
Strong background in Python development and data technologies
Practical experience designing and developing ETL data pipelines and applications using SQL and Python
Strong expertise in writing complex SQL queries for data transformation and automating processes using Python
Experience building and consuming RESTful APIs using Python libraries
Proficient in integrating data platforms with BI tools such as Power BI and Splunk, including dashboard and report development
Solid experience working with Unix/Linux environments, including SSH tunneling and writing/interpreting Bash scripts
Advanced proficiency in Python 3, with hands-on experience using NumPy, SciPy, Scikit-learn, and Pandas
Experience leveraging APIs to extract data and load it into databases using Python.
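The last requirement above (leveraging APIs to extract data and load it into databases using Python) can be sketched as follows. The JSON payload below stands in for the response body a real call, e.g. via `urllib.request.urlopen`, would return; the endpoint, field names, and table are hypothetical, and SQLite stands in for the target database.

```python
import json
import sqlite3

# Stand-in for the JSON body returned by a REST endpoint such as
# https://api.example.com/users (illustrative URL, not a real service).
payload = '[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]'
records = json.loads(payload)

# Load the extracted records into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (:id, :name)", records)

loaded = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(loaded)  # 2
```

Named-style placeholders (`:id`, `:name`) let the list of dicts parsed from JSON be passed straight to `executemany`, which keeps the extract-to-load step short and injection-safe.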
Tech Stack
Amazon Redshift
AWS
Azure
Cloud
Cyber Security
ETL
Linux
MS SQL Server
NumPy
Pandas
Python
Scikit-learn
Splunk
SQL
Tableau
Unix
Benefits
Equal employment opportunity
We do not discriminate against employees or applicants for employment on any legally recognized basis