Site Reliability Engineer

Engineering | Full Time | Toronto, ON | $85,000 - $100,000 CAD per year

Build the tech behind live events

PheedLoop's mission is to help organizers turn ordinary events into unforgettable experiences with event technology that is bold, intuitive, and built to bring people together. From conferences and trade shows to campus events and summits, we help teams run smarter events with tools that feel seamless for both planners and attendees. We bring together check-in kiosks, badge printing, mobile apps, and engagement tools into a connected ecosystem that unifies every part of the event experience. We move fast, we solve real problems, and we care deeply about every organizer trusting us with their biggest moments.

We’re proud of what we’ve built and even more excited for what’s ahead. If you’re someone who embraces challenges, stays motivated to learn, and is ready to raise the bar for the industry, you’ll fit right in. Here, you won’t just work with cutting-edge technology; you’ll work alongside truly exceptional people with sharp minds, big ambition, and a drive to build something meaningful. Learn more about what it’s like to work at PheedLoop by visiting our Careers Page.

What you’ll build every day

Keep production humming by designing, scaling, and hardening the infrastructure that thousands of users depend on every day. Own uptime, performance, and reliability as first-class product features.
Build and evolve CI/CD pipelines, infrastructure-as-code, and automation that take the toil out of deployments and let engineers ship with confidence. Turn manual runbooks into self-healing systems.
Partner with backend and frontend teams to translate reliability goals into concrete SLOs, error budgets, and architectural improvements. Drive capacity planning, load testing, and performance tuning across services.
Lead incident response when things break, from first page to root cause analysis. Run blameless postmortems, ship the follow-up fixes, and make sure the same fire never starts twice.
Instrument the platform end-to-end with logging, metrics, tracing, and alerting so issues get caught before users feel them. Tune signal-to-noise so on-call is sustainable, not soul-crushing.
Contribute to peer reviews on infrastructure changes and application code that touches reliability-sensitive paths. Mentor junior engineers and share what you learn with the wider team.
Keep a close eye on cloud spend and drive cost optimization initiatives: right-sizing resources, eliminating waste, and architecting for efficiency so reliability and budget both stay in the green.

Skills we are looking for

Solid hands-on experience with a major cloud platform (AWS or GCP) and container orchestration; AWS Fargate strongly preferred. Comfortable operating production workloads at scale.
Fluent with infrastructure-as-code (Terraform strongly preferred) and configuration management. You treat infrastructure like software: versioned, reviewed, and tested.
Strong scripting and automation skills in Python or Bash. Able to jump into application code (Python/Django a plus) to debug issues across the stack, not just the platform layer.
Deep familiarity with observability tooling such as CloudWatch. You know the difference between a good alert and a noisy one.
Working knowledge of relational databases (Postgres), including replication, backups, query performance, and incident recovery. Exposure to caching layers and message queues is a plus.
Comfortable with Git-based workflows, code review culture, and shipping changes through modern CI/CD systems (GitHub Actions, GitLab CI, CircleCI, or similar).
3+ years in an SRE, DevOps, or production engineering role, with real on-call experience and a track record of shipping reliability improvements that moved the numbers.
Bachelor's degree in Computer Science or related field, or equivalent real-world experience. Strong written and verbal English communication — you can explain a complex outage to engineers and non-engineers alike.
Calm under pressure, methodical in a crisis, and genuinely curious about how systems fail. Team-first mindset, sharp problem-solving instincts, and a bias toward automating the boring stuff away.

Perks that hit different

🩺 Enjoy 100% employer-paid health coverage, because your well-being matters.

🚇 Work from an office directly connected to the TTC subway, making your commute smooth and stress-free.

🍜 Join team lunches, learning opportunities, and regular outings that make growth and networking part of the job.

🚀 Be part of an ambitious, high-performance culture surrounded by people who love building big things.

The Kind of People We’re Looking For

We’re looking for ambitious people who work hard, think big, solve problems fast, and know how to enjoy the journey along the way. If you’re someone who wants to grow personally and professionally, you’ll feel right at home here. On a small team like ours, your impact won’t be “someday”; it’ll be immediate. You’ll take on real responsibility early, move quickly, and help shape products used by events around the world. Your work at PheedLoop will matter from day one.

Please note: This role is an existing vacancy. We do not use artificial intelligence or automated decision-making tools in our hiring process.
