As Director, Reliability Engineering (RE), you will inspire a diverse team of reliability engineers with your leadership and passion for increasing system reliability. You will coordinate RE engagement needs across the organization, focusing on anticipating, identifying, and resolving some of the most complex production issues impacting the business. You will provide a single interface to engineering leadership and partner with Product and UX leadership to ensure overall product success.
The ideal candidate is a thoughtful people manager and leader who is interested in building scalable infrastructure, adding system resiliency, improving developer productivity, and automating everything that can (and should) be automated. You will oversee a team of reliability engineers responsible for overall system health, availability, and performance, for reducing operational issues, and for the long-term strategy of our infrastructure. You will report to the VP, Technology Services and be based in our Pittsburgh office.
Establish and enhance the vision for RE at DSG and associated brands. Develop the strategy and long-term roadmap for our reliability function and technology infrastructure:
Evaluate RE requirements and build out the RE team to scale
Drive the reliability of DSG’s business-critical services in a complex distributed ecosystem
Develop a set of support practices for all Vertical (business-facing) Product Teams, as well as Foundational (shared services including Platform, Infrastructure, Security, D&A) technology domains
Partner with domain teams to steer product roadmaps and ensure reliability is built in
Serve as an extension of these domains by discovering ways to improve support operations. Creating scalable engineering solutions will be at the heart of what you do.
As the primary leader of RE at DSG, you will study and understand RE industry best practices and help to elevate DSG’s status within the broader RE community.
Oversee day-to-day Reliability Engineering activities across all Brands and Channels:
Implement best practices to improve scalability of our systems across Store, Omni, eComm, Marketing, Supply Chain, and Corporate Tech as well as horizontal Foundational domains
Establish consistent reliability processes for all Digital and traditional Channels as we support more Brands, Vendors, Products, features, and technology platforms
Build an ecosystem of Observability to aid in detection, triage, diagnosis, and ultimate resolution of business- and technology-impacting events
Establish and monitor KPIs for reliability, throughput, quality, and controls; deliver dashboards that provide operational and executive views
Perform 24×7 Level 2 support functions for all critical applications, systems, and products
Own system uptime, monitoring/alerting, CI/CD, cloud networking, security, and overall performance
Be a hands-on contributor to projects, including some coding, code reviews, and architectural discussions
Partner with Software Engineering to maximize product and platform reliability through code, tools, and monitoring improvements
Lead the transformation of system reliability, resiliency, and performance for all DSG products and services to the next generation.
Implement Self-Healing solutions to address failures and faults and reduce business impact
Lead the Test Engineering team. Leverage test automation, end-to-end testing, and exploratory testing to detect issues and flaws before they result in business disruption
By thoughtfully setting strategies for reducing toil, you will improve the athlete and teammate experiences and enable our engineering and support organizations to run highly reliable services.
Staff Management and Financial Planning:
Perform staff oversight and financial management for all aspects of functions described in this job description.
Create, implement, and enhance an organization that best supports these responsibilities and delivers world-class operations and support functions to this Fortune 500 company.
Control and manage a budget that leverages technology and automation to deliver seamless and reliable technology execution.
As a technology leader, participate in overall technology strategy, goal setting, and future vision activities.
Our teammates know that there is an athlete behind every in-store and eCommerce transaction. We go beyond the expected to build technology that makes the DICK’S Sporting Goods experience innovative and hassle-free.
COMMITTED TO INCLUSION & DIVERSITY.
We actively seek to create an inclusive and diverse workforce, reflecting the communities we serve. Doing so strengthens our ability to serve all our athletes and drive innovation and growth.
HAVE A PASSION FOR SPORTS.
We believe that sports make people better and we’re determined to be the best sports company in the world. Whether you’re an athlete or sports enthusiast, we bring our passion for the game into everything we do.
GET BETTER EVERY DAY.
The journey is never over. We know that to be the best, we must get a little better each day. We focus on delivering 1% more in everything we do.
What we’re looking for
Bachelor’s degree in Computer Science or a related technical field, or equivalent practical experience.
10 years of experience with system design, algorithms, data structures, analysis, and software design.
10 years of experience managing a distributed team of engineers
Experience growing and building teams
Experience managing technology infrastructure and conducting technical deep dives into code
5+ years of site reliability engineering, DevOps, or related infrastructure experience
3+ years of engineering management experience
2+ years of retail and/or e-commerce experience
Experience with modern architectures and cloud native design
Experience with cloud infrastructure (Azure, GCP, etc.)
Experience with data streaming platforms such as Apache Kafka and other utility services
Experience with PCF or similar PaaS providers
Proficiency in data collection and display toolsets (e.g., ELK, Prometheus)
Familiarity with and exposure to Extreme Programming techniques
Prior experience with test engineering and automation tools