Feb 24, 2023
-
Level D&I Solutions
-
Remote
Contractor
Defense
Engineering
Government
Information Technology
Other
Job Description
Click the link below to be directed to our website for the application process:
https://jobs.leveldi.com/job/senior-site-reliability-engineer-remote/
Our client is seeking a remote Senior Site Reliability Engineer who will provide hands-on design, analysis, development, and troubleshooting of highly distributed, large-scale production systems and event-driven services spanning on-prem and AWS-based hosting. The majority of the work will be taking applications that are not currently running in a containerized Kubernetes environment and moving them into one.
Ideal Candidate Profile:
- Experience as a Software Developer working on n-Tier AWS architecture.
- Adept at authoring, reviewing, and deploying Terraform.
- Well-versed in Docker container theory and containerization strategies for application code
- Hands-on experience in configuring and deploying on AWS EKS
- Expertise in AWS Well-Architected principles and design strategy
Requirements:
- Strong Linux administration/build/management skills
- Development experience in at least one of these languages: Java, Go, C#, or Python, with strong skills in reading, understanding, and writing code in that language
- Demonstrated expertise building and managing highly scaled production infrastructure in on-prem and AWS-based environments
- Extensive experience troubleshooting n-tier architectures with diverse sets of technologies strongly desired (e.g., load balancers, web/app/caching/database servers, queues, threading, memory, CPU, heap, storage, network, OS)
- Strong experience using application and infrastructure monitoring systems (like Splunk, CloudWatch, Datadog, New Relic, Sumo Logic, ELK)
- Excellent presentation and communication skills
- Mastery of infrastructure automation technologies (like Terraform, Puppet, Ansible, Chef)
- Expertise with software development lifecycles based on continuous deployment (e.g., CI/CD)
- Experience with common middleware (e.g., Apache, NGINX, IIS, Tomcat, JBoss)
- Experience with SQL databases (e.g., PostgreSQL, Oracle, MySQL)
- Expertise with SDLC branching, SCM, and code deployment systems (Git/Gitflow, Jenkins, CircleCI, Travis CI, etc.)
- Expertise in container/container-fleet-orchestration technologies (like Docker, Vagrant, Mesosphere)
- BS Degree in Computer Science (or related technical field and/or equivalent industry experience)
Cloud Engineering:
- Ensure repeatability, traceability, and transparency of our infrastructure automation (infrastructure-as-code, monitoring-as-code)
- Participate in continual learning of the AWS ecosystem, game day scenarios, and professional conferences
- Collaborate with development teams to design solutions for enterprise applications using our software stack
- Actively monitor AWS cost and use optimizer tools to maximize ROI while maintaining Service Level Objectives
Observability Engineering:
- Own reliability, uptime, system security, cost, operations, capacity, and resiliency, along with performance analysis of each
- Define, monitor, and report on service level indicators for application workloads
- Support on-call rotations for operational duties that have not been addressed with automation, with an eye for correcting issues that result in on-call alarms
- Maintain telemetry that improves visibility into our applications and their performance and business metrics, and keeps operational workload in check
- Develop, communicate, collaborate on, and monitor standard processes to promote the long-term health and sustainability of operational development tasks
DevSecOps:
- Support healthy software development practices, including complying with agile software development methodology and building standards for code reviews, work packaging, and continuous delivery
- Partner with Cybersecurity to develop plans and automation to respond to new risks and vulnerabilities
Systems Engineering:
- Collaborate with Systems Admins to coordinate middleware, network, storage, database, Windows, Linux, and VMware maintenance
- Automate legacy on-prem system maintenance and migrate to the cloud via thoughtful redesign
Resiliency Engineering:
- Collaborate with dev teams to identify failure points and blast radius of systems
- Validate effectiveness of monitoring and observability configurations
- Coordinate failure injection testing
- Observe and document steady-state production levels and growth patterns
- Plan and forecast for seasonal growth, communicate trend lines to leadership, and enhance infrastructure scaling plans to accommodate 2x planned load
- Coordinate improvements of existing software and infrastructure to meet resiliency goals
Additional Qualifications:
- Experience moving legacy workloads to Kubernetes
- Experience migrating production workloads between AWS accounts
- Experience working in SaaS businesses with AWS infrastructure and Terraform
- Software Engineer: 5 years' experience with DevOps; SRE: 2 years' hands-on experience
- Looking for someone with an engineering background who understands how to read code and can code in different languages. These could be Java and P2P, as they are the two stacks the client uses right now
- Relatively deep understanding of containers and Kubernetes
- Understanding of how containers work and how to build them
- AWS knowledge: knowing infrastructure, code, and using Terraform
- Potentially AWS account migration (moving items from one AWS account to another)
- Familiar with Control Tower and Terraform, and understands or is familiar with all AWS services
- Candidate needs to be proactive and inquisitive, with the capacity to think
- Not looking for someone who is just a task taker
States
AL, AK, AZ, AR, CA, CO, CT, DE, DC, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME, MD, MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ, NM, NY, NC, ND, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VT, VA, WA, WV, WI, WY
Security Clearance
NO Security Clearance