Full Time Job

Senior Software Engineer - AI Infrastructure


Beijing, CN 09-14-2022
  • Paid
  • Full Time
  • Senior (5-10 years) Experience
Job Description


The Disney / Hulu AI Infra team provides the distributed computing and storage systems that support the ML Platform. We are building two things. The first is an auto-scalable computing platform that supports heterogeneous hardware (CPU/GPU). The second is a distributed low-latency storage system for nearline/online ML data. The ML computing platform will apply Parameter Server (PS), AllReduce, or other distributed mechanisms to distributed training, making both offline training and online learning significantly more efficient. We encourage creative thinking to help deliver excellent software. We also organize regular tech-sharing sessions within and across teams. We are committed to building a workplace where we enjoy our work and keep improving how efficiently we deliver. If you love AI infrastructure technologies, have a strong drive to do something outstanding, and enjoy sharing knowledge, this is the right position for you.

Responsibilities
• Design and build the computing platform, optimize the development experience of feature teams, and continuously improve the efficiency of delivering business value
• Drive technology evolution based on our business requirements and cutting-edge industry technologies
• Proactively collaborate with feature teams to optimize our computing platform in creative ways
• Promote standard software engineering methodologies and continuously improve the team's development and delivery capabilities

Qualifications
• BS or higher in Computer Science or a related field, with 5+ years of experience in infrastructure or backend development
• Proficient in Docker or other container technologies
• Proficient in Kubernetes
• Proficient in design patterns
• Experienced in Go/Java/C/C++/Python, etc.
• Proficient in microservices and distributed systems
• Familiar with machine learning domains, including ML frameworks such as TensorFlow, LightGBM, PyTorch, etc.
• Good English reading and communication skills
• Excellent teamwork and cross-team collaboration skills

Preferred Qualifications
• Familiar with cloud-native technologies
• Familiar with Istio/Kubeflow/NUMA/RDMA/gRPC/Alluxio, etc.
• Familiar with distributed low-latency storage systems, including low-latency key-value stores and distributed file systems
• Familiar with Agile / Lean
• Familiar with AWS
• Familiar with Hadoop, Spark, Flink, etc.

Jobcode: Reference SBJ-r70b46-44-197-198-214-42 in your application.