
Site Reliability Engineer

Trade Nation

The Site Reliability Engineer ensures the reliability, availability, and performance of web services and applications within the Python development squad. This role bridges development and operations, focusing on building scalable systems, automating processes using Python, and maintaining high service uptime. SREs work closely with Python developers, QA engineers, and product teams to embed reliability into every stage of the software lifecycle, leveraging Python as the primary language for tooling, scripting, and automation.

Who we are

Trade Nation is a global CFD and spread betting broker. We help traders make better decisions through clear market insights, transparent pricing, and a fairer approach to trading.

Since 2014, we’ve grown into a market-leading, low-cost broker with our headquarters in London and offices across Europe, South Africa, Asia-Pacific, and key offshore regions including the Caribbean and Indian Ocean. Our platform is available in 14 languages, making it accessible to traders worldwide.

Built on transparency and trust, and driven by our people, our focus is simple: helping customers trade more effectively. We do that by keeping costs low, cutting unnecessary complexity and using technology to put traders first.

Your team

Our Kuala Lumpur office is a bright, modern space in the heart of the city. The team here is expanding, making it our second-largest hub, and we’re keen to welcome like-minded people into a collaborative environment of professionals with a passion for innovation and driving change.

Our commitments to each other

We have each other’s backs

There when we need each other most

We challenge each other

Be more creative, more curious, more bold

We thrive together

Taking our work to the next level

We form strong bonds

Through team building and social events

We don’t judge

Instead, we teach and are open to learning

We step up

Taking ownership and supporting each other to do the same

Responsibilities

  • System Design & Maintenance: Design, implement, and maintain scalable, secure, and reliable systems, with Python-based services and microservices as the primary stack.
  • Monitoring & Observability: Build and manage monitoring, alerting, and logging systems using Python-native tooling (e.g. Prometheus clients, OpenTelemetry SDK, structlog); proactively identify and resolve performance issues.
  • Automation & Tooling: Develop and maintain automation tools and internal libraries in Python to streamline operations, reduce manual intervention, and support CI/CD pipelines.
  • Collaboration: Partner with Python development squads to ensure new features are designed with reliability in mind; conduct code reviews for reliability-critical paths; participate in Agile ceremonies.
  • Incident Management: Conduct root cause analysis for incidents and implement corrective actions to prevent recurrence; participate in on-call rotations for critical systems; maintain runbooks in version-controlled Python projects.
  • Continuous Improvement: Drive initiatives to improve system performance, reliability, and scalability through Python best practices, including profiling, benchmarking, and dependency management.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • Minimum 2 years in SRE, DevOps, or similar roles.
  • Strong Python proficiency, including async frameworks (asyncio, FastAPI), ORM frameworks (Django), testing (pytest), packaging (Poetry/pip), and scripting.
  • Experience with cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes, Docker).
  • Familiarity with Infrastructure-as-Code tools such as Terraform or CloudFormation.
  • Strong problem-solving skills and ability to work effectively under pressure.
  • Excellent communication and collaboration skills for cross-functional teamwork.