Site Reliability Engineer - USDS

TikTok • Sydney • 6d ago

This is a Site Reliability Engineer - USDS role with TikTok based in Sydney, NSW, AU.

Role Seniority - mid level

More about the Site Reliability Engineer - USDS role at TikTok

Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you’ll have the opportunity to manage the complex challenges of scale, while using expertise in coding, algorithms, complexity analysis, and large-scale system design.

We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We encourage close collaboration while promoting self-direction.

Responsibilities:
- Develop and maintain automation procedures to maximize system efficiency and minimize human intervention.
- Work closely with software engineering teams to design, deploy and operate elements to ensure that systems are functionally robust.
- Ensure system scalability to handle growth in web traffic and data.
- Implement monitoring tools and set up metrics to keep track of system health and performance.
- Participate in on-call rotations, assist with incident management, and diagnose, resolve, and prevent production issues.
- Conduct performance tests to find and address system bottlenecks.
- Collaborate with teams across the organization to define Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
- Practice sustainable user support, incident response, and blameless postmortems.

In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Minimum Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field with 3 years of experience.
- Proven work experience as a Site Reliability Engineer, Systems Engineer, or similar software engineering role.
- Proficient knowledge of high-level programming languages (e.g. Python, Go, Java, and Shell script).
- Experience in network architecture, database modeling, cloud systems and large-scale distributed systems.
- Strong understanding of Linux operating systems and open-source technologies.

Preferred Qualifications:
- Experience in MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, etc.
- Knowledge of monitoring tools and methodologies (such as Prometheus, Grafana).
- Excellent problem-solving skills, strategic thinking, and a strong ability to debug complex systems.
- Exceptional communication skills and the ability to effectively collaborate with cross-functional teams.

Before we jump into the responsibilities of the role: no matter what you come in knowing, you’ll be learning new things all the time and the TikTok team will be there to support your growth. Please consider applying even if you don't meet 100% of what’s outlined.

Key Responsibilities
- Developing and maintaining automation procedures
- Working closely with software engineering teams
- Implementing monitoring tools

Key Strengths
- High-level programming languages
- Network architecture and cloud systems
- Linux operating systems
- Monitoring tools and methodologies
- Containerization technologies
- Big data technologies

A Final Note: This is a role with TikTok, not with Hatch.