ArcheSys Inc is a technology firm specializing in cloud solutions and services for clients across various industries. The company is seeking a highly motivated Fullstack Engineer to design, develop, and maintain Grafana dashboards and data pipelines, and to improve system visibility and reliability through AWS and DevOps practices.

Responsibilities:

Design, develop, and maintain comprehensive, intuitive, and real-time Grafana dashboards that visualize key operational metrics, business KPIs, and application logs
Collaborate with SRE, development, and product teams to gather requirements and translate complex data into clear, actionable visualizations
Optimize Grafana dashboards for performance, scalability, and usability, ensuring quick loading times and effective data presentation
Implement alerting rules within Grafana to proactively notify teams of anomalies and potential issues
Design and implement robust ETL/ELT pipelines to extract, transform, and load data from various sources (e.g., Prometheus, Splunk, CloudWatch, RDS, OpenTelemetry, custom APIs) into data stores consumable by Grafana
Write and optimize complex queries (SQL, PromQL, Splunk SPL, etc.) to ensure data accuracy and efficiency
Develop and maintain APIs to facilitate data exchange and integration between different system components and monitoring tools
Implement data quality checks, performance tuning (indexing, partitioning), and backup/restore strategies for data sources
Design, deploy, and manage scalable and resilient AWS infrastructure to support Grafana instances, data sources, and related services
Utilize AWS services such as EC2, ECS/EKS, Lambda, S3, RDS, CloudWatch, Kinesis, DynamoDB, and others to build and optimize our observability platform
Implement security best practices within the AWS environment, including IAM roles, security groups, and network configurations
Design, implement, and maintain robust CI/CD pipelines for automating the build, testing, and deployment of Grafana dashboards, underlying data pipelines, and infrastructure as code
Utilize tools like AWS CodePipeline, Jenkins, GitLab CI, or similar for continuous integration and continuous deployment
Develop and maintain Infrastructure as Code (IaC) using Terraform, CloudFormation, or Ansible for managing all AWS resources
Automate operational tasks, monitoring deployments, and testing processes to improve efficiency and reliability
Apply SRE principles to ensure the reliability, scalability, and performance of our monitoring and observability infrastructure
Participate in on-call rotations, responding to alerts and incidents related to dashboard functionality, data accuracy, and performance
Conduct root cause analysis (RCA) for incidents and implement corrective actions to prevent recurrence
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services, ensuring dashboards reflect these metrics accurately
Work closely with cross-functional teams (development, operations, product) to understand monitoring needs and provide expert guidance on observability best practices
Create and maintain comprehensive documentation detailing dashboard designs, data sources, query logic, AWS architecture, and operational procedures
Contribute to code reviews, promote best practices, and mentor junior team members

Requirements:

Bachelor's degree in Computer Science, Software Engineering, or a related technical field, or equivalent practical experience
4-7 years of experience in a Fullstack Development, Data Engineering, or SRE role with a strong focus on monitoring, observability, and AWS infrastructure
Proven hands-on experience designing, developing, and maintaining complex Grafana dashboards
Strong proficiency in at least one backend programming language (e.g., Python, Go, Java, Node.js)
Extensive experience with various data sources for Grafana (e.g., Prometheus, Loki, Splunk, SQL databases, CloudWatch)
Deep hands-on experience with AWS cloud services, including but not limited to EC2, ECS/EKS, Lambda, S3, RDS, CloudWatch, Kinesis, DynamoDB
Proven experience designing and implementing robust CI/CD pipelines and DevOps automation using tools like AWS CodePipeline, Jenkins, GitLab CI, or similar
Strong experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible
Solid understanding of SRE principles, including SLOs, SLIs, error budgets, toil reduction, and incident management
Experience with containerization technologies (Docker, Kubernetes)
Excellent analytical and problem-solving skills with a keen eye for detail
Strong communication and interpersonal skills, with the ability to articulate complex technical concepts clearly to diverse audiences
Ability to work independently and collaboratively in a fast-paced, dynamic environment

Preferred:

AWS Certifications (e.g., Solutions Architect, DevOps Engineer)
Experience with other observability tools (e.g., Datadog, New Relic, OpenTelemetry)
Knowledge of distributed tracing concepts and tools (e.g., Jaeger, Tempo)
Experience with machine learning for anomaly detection in time-series data
Contributions to open-source projects related to Grafana or observability

Fullstack Engineer - U.S. Citizen (Remote)
