Senior Technical Manager, Problem Management
The Hong Kong Jockey Club
Founded in 1884, The Hong Kong Jockey Club (“the Club”) is a world-class racing club that acts continuously for the betterment of our society. The Club has a unique integrated business model, comprising racing and racecourse entertainment, a membership club, responsible sports wagering and lottery, and charities and community contribution. Through this model, the Club generates economic and social value for the community and supports the HKSAR Government in combatting illegal gambling.
Who are we?
We are the IT Division of HKJC, a vibrant community of over 1,500 dedicated professionals working collaboratively across Hong Kong and Shenzhen.
Our team is a diverse mix of individuals from various backgrounds, from all across the world. We embrace our humanity, recognizing that each of us brings unique strengths and perspectives. This diversity not only enriches our work environment but also drives our innovation and creativity as we strive to achieve our collective goals.
What do we do?
We design, build, and operate the technology that powers the Club. Our primary focus is on delivering the service that supports our hospitality, racing and wagering operations, to ensure that our customers and members enjoy exceptional experiences.
We also deliver the changes necessary to drive business growth through new products and services. In addition, we are committed to safeguarding the Club, protecting it from external threats and providing a secure and resilient technological environment.
The Department
The IT Infrastructure and Platform Operations Department is responsible for the design, implementation, and management of the infrastructure that supports the Club’s IT systems, and leads the Service Management capabilities that ensure the smooth running of these systems.
This department ensures that all technological resources operate efficiently and effectively to support business objectives. Key responsibilities include:
- Design and operate processes and controls that ensure IT service availability, performance, and resilience are aligned with business expectations.
- Manage the 24x7 IT Operations Centre.
- Manage the Club’s exploitation of the public cloud.
- Manage the complete lifecycle of the Club’s IT network and the technology within our data centres.
- Provide the roadmaps, standards, and capabilities that enable our IT infrastructure to remain current (eligible for vendor support) and secure (patched and remediated against CVEs).
- Provide the Club’s colleague collaboration technology suite, including desktop and laptop computers, mobile devices, collaboration tools, carrier contracts, and associated support functions.
The Job
- Problem Identification & Root Cause Analysis:
Lead discussions with technical teams to gather data on incident trends, hardware/software failures, and resource use. Analyse incident records to identify patterns and potential problems. Conduct thorough investigations using root cause techniques like 5 Whys, Fishbone Diagram, & Fault Tree Analysis. Employ data analytics & AIOps tools to detect anomalies and recurring issues relevant to the Club's IT service demands. Communicate documented findings to stakeholders. During analysis, consider service management’s four dimensions: People, Process, Technology, and Supplier.
- Problem Control:
Consider all contributory causes, including factors affecting incident duration & impact. Drive the reduction of operational 'Toil' by identifying manual, repetitive workarounds and partnering with DevOps/Engineering teams to automate them. Conduct error control to find permanent solutions, prioritizing engineering fixes that enhance system resilience.
- Collaboration:
Work closely with SMEs, developers & stakeholders for seamless problem resolution. Facilitate inter-team communication for unified management approaches. Establish effective meeting rhythms with clear agendas, action items, and delivery timelines. Engage external vendors/service providers as needed, maintaining open, timely communication. Collaborate with incident managers, recognising complementary but sometimes conflicting processes. Interface with risk, change, knowledge management & continual improvement teams.
- Incident Washup Calls:
Prepare and moderate washup calls after local horse racing events. Ensure communication and coordination to identify/address issues. Set urgency, drive troubleshooting, and facilitate root cause/impact discussions. Document follow-up actions (further analysis, emergency fixes, preventive measures) and track assignments to completion. Develop and implement remediation plans collaboratively, using configuration changes, software releases, or infrastructure enhancements. Summarise key findings, decisions, and next steps clearly for senior management.
- Training:
- Reporting:
- Continuous Improvement:
Continuously enhance problem management processes for better service quality & efficiency. Stay current with industry trends and best practices. Conduct regular reviews/audits to find improvement areas. Champion a culture of Site Reliability Engineering within the department, promoting 'blameless' post-incident reviews to encourage transparency. Leverage BMC Helix platform advancements & AI features to improve ITSM/ITOM by simplifying, automating, & aligning with industry standards for reliability and efficiency. Implement the roadmap for platform migration, building service models, connecting critical business activities to configuration items, & enhancing monitoring. Use AI-driven insights and predictive management to reduce MTTD & MTTR and improve service reliability & operational efficiency. Apply SRE methodologies to shift from reactive firefighting to proactive reliability engineering.
About You
- A degree or higher qualification in Computer Science, Engineering, or a relevant discipline
- Minimum 15 years of work experience in an IT environment, with 8 or more years of experience in project management of medium to large-scale IT Infrastructure projects
- Track record of relevant experience in IT infrastructure/operations implementation projects
- Strong technical knowledge and experience in IT service management, incident management, and problem management. Excellent analytical and problem-solving skills to identify root causes and develop effective solutions.
- Strong verbal and written communication skills to effectively collaborate with IT teams, business users, and stakeholders. Ability to manage multiple projects and tasks simultaneously, ensuring deadlines are met and objectives are achieved.
- ITIL Foundation certification is required; advanced ITIL certifications are a plus. Proven track record in managing and resolving complex IT issues.
- Experience with AI and machine learning applications in ITSM, including predictive analytics and automated remediation.
- Familiarity with the latest BMC Helix platform and its capabilities, including ServiceOps, AIOps, and ITOM technologies. Ability to drive the adoption of these technologies to improve service management processes and outcomes.
Apply Now!
We offer competitive salary and benefits packages, a dynamic working environment and development opportunities.
Add horsepower to your career today. Click the “Apply Now” button to create an account and submit your application.
Equal Opportunity and Inclusive Hiring
We are an equal opportunity employer and strive to create an inclusive workplace for all. Applicants from diverse backgrounds are welcome to apply. If you have any special needs or require accommodations during the interview process, please e-mail us via careers@hkjc.org.hk. Personal data provided by job applicants will be used strictly in accordance with the Club's notice to employees and job applicants relating to the Personal Data (Privacy) Ordinance. A copy of this notice will be provided immediately upon request.