Cofense is a cybersecurity platform focused on stopping phishing threats. The Data Engineer will integrate enterprise-wide data from various cybersecurity applications and develop data pipelines using cloud technologies while ensuring data privacy and compliance.
Responsibilities:
- Work with architects and cloud systems engineers to design the data platform and architecture
- Work hands-on with SQL and NoSQL OLTP databases and OLAP data warehousing technologies, especially AWS Aurora
- Model data and build data pipelines for multi-product and/or multi-department organizations
- Develop data pipelines on cloud technologies such as Azure/AWS using well-defined tool frameworks
- Develop ETL code to stream data from disparate (structured and semi-structured) SaaS product data stores to the data lake/data warehouse using Python and Azure/AWS data lake services
- Write complex SQL scripts and automate them using Python
- Develop test cases and unit tests for key implementations of the data platform, adhering to software engineering best practices and standards
- Secure data end to end by complying with data privacy rules, both when developing processes to move data between applications/data sources and the data lake/data warehouse and when delivering data through SQL clients and BI tools
- Build ad hoc reporting APIs and service layers on top of the underlying OLTP and OLAP databases
- Help integrate the data platform with BI tools such as Power BI, Tableau, and Splunk
- Develop and interpret Entity Relationship Diagrams (ERDs) across data sets in relational database systems as well as non-relational data stores
- Mine data to identify trends, patterns, and anomalies in complex data sets across multiple data sources/systems, and present results without ambiguity
- Develop data transformations that generate facts, summaries, and key metrics by applying business rulesets and aggregations using Python, SQL, and other transformation tools
- Review current processes for data ingestion, transformation, and statistical analysis, and re-engineer them
- Collaborate with business users across Cofense's departments to define requirements, prioritize project work, and deliver on time
- Other duties as assigned
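To illustrate the kind of SQL-plus-Python automation these responsibilities describe, here is a minimal sketch of a Python-driven transformation that applies a business rule and aggregates raw events into a summary table. The table names, columns, and rule are hypothetical, and SQLite stands in for a production warehouse such as Aurora or Redshift:

```python
import sqlite3

def run_daily_summary(conn):
    """Apply a business ruleset and aggregate raw events into a summary table."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS daily_summary (
            event_date     TEXT PRIMARY KEY,
            total_events   INTEGER,
            distinct_users INTEGER
        );
        DELETE FROM daily_summary;  -- idempotent full refresh
        INSERT INTO daily_summary
        SELECT event_date, COUNT(*), COUNT(DISTINCT user_id)
        FROM raw_events
        WHERE status = 'valid'      -- hypothetical business rule: only valid events
        GROUP BY event_date;
    """)
    return conn.execute(
        "SELECT * FROM daily_summary ORDER BY event_date"
    ).fetchall()

# Demo with in-memory data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (event_date TEXT, user_id TEXT, status TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?,?,?)", [
    ("2024-01-01", "a", "valid"),
    ("2024-01-01", "b", "valid"),
    ("2024-01-01", "a", "invalid"),
    ("2024-01-02", "a", "valid"),
])
rows = run_daily_summary(conn)  # [("2024-01-01", 2, 2), ("2024-01-02", 1, 1)]
```

In a production pipeline, the same pattern would typically be wrapped in an orchestration tool and run against the cloud database's own driver rather than SQLite.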
Requirements:
- US citizenship (required to support FedRAMP compliance)
- Expert SQL skills for data transformations, statistical analysis, and troubleshooting across multiple database platforms (MySQL, PostgreSQL, Redshift, Azure SQL Data Warehouse, etc.)
- Expert at writing complex SQL scripts and automating them using Python
- Knowledge of data management on NoSQL databases such as DynamoDB and MongoDB; familiarity with big data tools (Hadoop, Spark) and messaging tools (Kafka, Kinesis, SQS, Azure Queues) is a huge plus
- Strong analytical skills: good at finding trends, outliers, and anomalies in data, and able to articulate complex information or data points clearly to business users, management, and individual contributors
- Enthusiasm for working with large volumes of data across disparate data sources and databases
- Has a strong sense of engineering craftsmanship and takes pride in the code they write
- Is intellectually curious, with a burning desire to learn; is self-driven, actively looks for ways to contribute, and knows how to get things done
- Is relentlessly customer-focused, toward both internal and external customers
- Sees big-picture impact and relationships among and across work units
- Identifies complex technical problems and resolves them with minimal help
- Over 5 years of proven experience in data architecture, data modeling, and lifecycle management
- Hands-on experience with relational and cloud databases, including Azure SQL Database, Microsoft SQL Server, Amazon Aurora, and Amazon Redshift
- Strong background in developing Python and data technologies
- Practical experience designing and developing ETL data pipelines and applications using SQL and Python
- Strong expertise in writing complex SQL queries for data transformation and automating processes using Python
- Experience building and consuming RESTful APIs using Python libraries
- Proficient in integrating data platforms with BI tools such as Power BI and Splunk, including dashboard and report development
- Solid experience working with Unix/Linux environments, including SSH tunneling and writing/interpreting Bash scripts
- Advanced proficiency in Python 3, with hands-on experience using NumPy, SciPy, Scikit-learn, and Pandas
- Experience leveraging APIs to extract data and load it into databases using Python
- Bachelor's degree in Computer Science or Math, Data Analytics, Data Sciences, BI or demonstrated industry experience
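As a small illustration of the analytical skills listed above (finding outliers and anomalies with pandas), here is a minimal z-score outlier-flagging sketch. The data, column names, and threshold are hypothetical, not from the posting:

```python
import pandas as pd

def flag_outliers(series: pd.Series, z_thresh: float) -> pd.Series:
    """Flag points more than z_thresh standard deviations from the mean."""
    z = (series - series.mean()) / series.std(ddof=0)  # population std
    return z.abs() > z_thresh

# Hypothetical daily event counts; the last value is an anomaly.
events = pd.DataFrame({"count": [10, 12, 11, 9, 10, 11, 95]})
# A lower threshold than the usual 3 is used because the sample is tiny.
events["outlier"] = flag_outliers(events["count"], z_thresh=2.0)
# Only the final row (95) is flagged.
```

In practice the same flags would feed a report or dashboard so the anomaly can be presented to business users without ambiguity.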