Scalence L.L.C. is a Fortune 500 technology, engineering, and science solutions leader addressing complex challenges in various sectors. They are seeking a Data Automation Engineer to design and implement AI-driven automation solutions across AWS and Azure environments, focusing on building scalable data pipelines and automating processes for analytics and customer engagement.
Responsibilities:
- Design and maintain data pipelines in AWS using S3, RDS/SQL Server, Glue, Lambda, EMR, DynamoDB, and Step Functions
- Develop ETL/ELT processes to move data from multiple data systems including DynamoDB → SQL Server (AWS) and between AWS ↔ Azure SQL systems
- Integrate AWS Connect CRM data into the enterprise data pipeline for analytics and operational reporting
- Engineer and enhance ingestion pipelines with Apache Spark, Flume, and Kafka for real-time and batch processing into Apache Solr and AWS OpenSearch platforms
- Leverage Generative AI services and frameworks (AWS Bedrock, Amazon Q, Azure OpenAI, Hugging Face, LangChain) to:
  - Create automated processes for vector generation and embeddings from unstructured data
  - Automate data quality checks, metadata tagging, and lineage tracking
  - Enhance ingestion/ETL with LLM-assisted transformation and anomaly detection
  - Build conversational BI interfaces that allow natural language access to Solr and SQL data
- Develop AI-powered copilots for pipeline monitoring and automated troubleshooting
- Implement SQL Server stored procedures, indexing, query optimization, profiling, and execution plan tuning to maximize performance
- Apply CI/CD best practices using GitHub, Jenkins, or Azure DevOps for both data pipelines and GenAI model integration
- Ensure security and compliance through IAM, KMS encryption, VPC isolation, RBAC, and firewalls
- Support Agile DevOps processes with sprint-based delivery of pipeline and AI-enabled features
Requirements:
- BS in Computer Science or a related field with 2+ years of data engineering and automation experience
- Hands-on experience with SQL, SSIS, Python, Spark, Bash, PowerShell, and the AWS/Azure CLIs
- Experience with AWS services like S3, RDS/SQL Server, Glue, Lambda, EMR, DynamoDB
- Familiarity with Apache Flume, Kafka, and Solr for large-scale data ingestion and search
- Familiarity with LLM and GenAI frameworks using AWS Bedrock, Azure OpenAI, or open-source platforms and tools
- Experience with integrating REST API calls in data pipelines and workflows
- Familiarity with JIRA, GitHub / Azure DevOps / Jenkins for SDLC and CI/CD automation
- Strong troubleshooting and performance optimization skills in SQL, Spark, or other data engineering solutions
- Experience operationalizing Generative AI (GenAI Ops) pipelines, including model deployment, monitoring, retraining, and lifecycle management for LLMs and AI-enabled data workflows
- Good communication and presentation skills
- Ability to obtain Federal government Public Trust clearance
- Certifications: AWS Data Engineer, AWS AI/ML Specialty, Azure AI Engineer, Databricks Certified Data Engineer
- Experience implementing RAG pipelines, embeddings, and vector search with Solr, OpenSearch, FAISS, Pinecone, or pgvector/SQL Server vector types
- Experience with GenAI-powered coding tools such as Claude Code, OpenAI Codex, and VS Code
- Experience with multi-cloud data integration (AWS ↔ Azure SQL)
- Familiarity with Microsoft BizTalk and SSIS for SQL Server ETL workflows
- Knowledge of data lineage/governance tools (Purview, Unity Catalog, AWS Glue Catalog)
- Familiarity with Infrastructure-as-Code (Terraform, CloudFormation, Bicep) for automated deployments
- Experience with compliance frameworks (FedRAMP, PCI-DSS, HIPAA)