Location
Noida, Uttar Pradesh
Full job description
LOCATION: IN-PERSON
SHIFT TIME: OVERNIGHT 8:30pm - 6:30am IST
Job Description:
Own the end-to-end path from open-source model → production endpoint on AWS. You’ll provision GPU infra (EKS/EC2), containerize model servers (vLLM/TGI/TensorRT-LLM), ship autoscaling and observability, and drive tokens/s up and p95 latency down while keeping costs predictable.
Key Responsibilities:
- Build & harden AWS GPU stacks (EKS or EC2 ASG): VPC, subnets, SGs, IAM, ECR, ALB/NLB.
- Containerize and operate vLLM/TGI/TensorRT-LLM with streaming, batching, KV-cache.
- Implement autoscaling (HPA/Cluster Autoscaler), blue/green & canary deploys.
- Codify everything with Terraform + Helm; wire CI/CD (GitHub Actions).
- Add observability: Prometheus/Grafana, CloudWatch, logs/traces, SLOs, on-call alerts.
- Optimize cost/perf: quantization (AWQ/GPTQ/FP8), spot strategies, node packing, caching.
- Secure secrets (KMS/Secrets Manager), private ECR, least-privilege IAM, image signing.
- Publish runbooks and incident playbooks; partner with research to productionize new models.
Minimum Qualifications:
- 4+ years DevOps/SRE/Platform or ML-infra experience with AWS (EC2/EKS/ECR/ALB/ASG/IAM/S3).
- Kubernetes, Helm, Docker expertise; Infrastructure-as-Code (Terraform) in production.
- Hands-on with at least one model server (vLLM or TGI); CUDA/NVIDIA drivers/NCCL basics.
- Python or Bash for tooling; CI/CD pipelines; strong debugging skills.
Nice to have:
- Triton Inference Server, TensorRT-LLM, SageMaker endpoints.
- Redis/RocksDB caches, MoE model serving, multi-region DR.
- Experience with Llama-3, Mistral, Qwen, Mixtral; GGUF/ONNX export for edge.
Success in 30/60/90:
- 30 days: Terraform’d EKS + first GPU endpoint (vLLM) with Grafana dashboard & runbook.
- 60 days: Autoscaling & canary deploys in place; first cost/perf report and savings plan.
- 90 days: Multi-model routing, SLOs with alerts, incident playbooks, perf regression tests.
Interview Process
Intro screen → Technical deep-dive → Team panel → Founder chat → Offer
Notes
Beware of fake consultants. Do not pay any amount to anyone. Jobpana does not charge anyone any amount.