Greystar is a leading global real estate platform offering expertise in property management and development. They are seeking an experienced Azure Infrastructure and Site Reliability Engineer to design, implement, and manage Azure-based infrastructure, ensuring system scalability and performance while collaborating across teams.
Responsibilities:
- Design, implement, and manage Azure solutions to meet technical and operational needs
- Maintain comprehensive network and server documentation, including infrastructure diagrams, server configurations, standard operating procedures, and incident reports
- Optimize Azure resource configuration for performance, cost, and security
- Monitor the health and reliability of Azure resources, ensuring high availability
- Continuously monitor network/server performance, using advanced network management and server administration tools to identify issues proactively
- Monitor and manage the health, performance, and availability of our applications running on Azure (including ADF pipelines)
- Collaborate with development teams to align Azure architecture with application requirements
- Provide guidance on best practices for Azure resource provisioning, scaling, and configuration
- Enable teams to leverage Azure services, including Databricks, for analytics and data workflows
- Detect and analyze network and server anomalies, security threats, and performance bottlenecks. Initiate incident response procedures and coordinate with relevant teams for swift resolution
- Investigate and resolve infrastructure and network/server-related issues, escalate complex problems to higher-level teams, and maintain detailed incident documentation
- Set up and manage Azure Databricks environments for big data processing and advanced analytics
- Support and optimize Databricks pipelines for data engineers and scientists
- Effectively troubleshoot and resolve Databricks-related challenges
- Develop and maintain Infrastructure as Code (IaC) scripts using Terraform
- Implement DevOps practices, including CI/CD pipelines, automated testing, and monitoring
- Streamline workflows by collaborating with IT and operations teams
- Act as the primary point of contact for Azure-related issues within the project
- Investigate, diagnose, and resolve complex technical issues in collaboration with development and operations teams
- Implement preventive measures to minimize downtime and disruptions
- Analyze trends and metrics to identify areas for improvement and optimization
- Identify and implement cost optimization opportunities within Azure infrastructure and services
- Conduct regular reviews of Azure spending using cost management tools
- Stay updated on emerging Azure technologies, AI, and cloud computing trends to drive innovation
- Identify opportunities to improve processes, tools, and systems to enhance efficiency and scalability
Requirements:
- Proven experience as an Azure Architect or Senior Systems Engineer with hands-on Azure solution implementation
- Proficiency with monitoring tools such as Datadog, Azure Monitor, Application Insights, Log Analytics, or similar
- Strong expertise in Azure Databricks, including setup, configuration, and optimization
- Proficiency in Azure services such as App Services, Function Apps, Cosmos DB, Storage, Networking, Azure DevOps, and Security
- Expert in troubleshooting and resolving Azure-related issues
- Hands-on experience with Infrastructure-as-Code (IaC) tools like Terraform, ARM templates, or Bicep
- Strong scripting skills in PowerShell, Python, or similar
- Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams
- Bachelor's degree in computer science, information technology, business management information systems, or equivalent experience. Advanced degree preferred
- Experience with DevOps practices and CI/CD pipelines in an Azure environment
- Experience with cloud cost management strategies
- Azure certifications (e.g., Azure Solutions Architect Expert, Azure Data Engineer Associate) are highly desirable
- Knowledge of network security practices and regulatory compliance