Compensation: $100,000–150,000, a share of the equity pool, and other benefits listed below.

About Us

Oh is pioneering hyper-realistic, uncensored AI-driven content through advanced multi-modal language models, focusing on seamless, real-time experiences for a wide audience. We specialize in building and refining custom AI models to power highly personalized user interactions at scale, leveraging the latest open-source innovations and continuously optimizing for performance and reliability.

Job Overview

As the Lead LLM Development Engineer, you’ll own the technical roadmap and development of Oh’s Llama 3.1 70B model, as well as the integration of other open-source models. You’ll oversee all aspects of model fine-tuning, optimization, deployment, safety, and resource management on platforms such as Runpod GPU clusters. This role demands technical depth in LLM architecture, dataset management, and efficient deployment strategies for real-time user interaction across text, audio, and image modalities.

Key Responsibilities

• LLM Fine-Tuning and Optimization: Fine-tune and optimize the Llama 3.1 70B model using custom and synthetic datasets, ensuring model accuracy, responsiveness, and scalability.
• Deployment on GPU Infrastructure: Develop and implement scalable, memory-efficient model deployments on platforms like Runpod, managing GPU resources effectively.
• Cross-Model Integration: Adapt and implement other open-source models for additional capabilities, such as image prompt generation, safety filtering, and reasoning improvements.
• Dataset Curation and Synthetic Data Generation: Curate, clean, and generate datasets for training and refining models, with a focus on enhancing accuracy, relevance, and diversity in interactions.
• Real-Time System Integration: Optimize model architectures and deployment for low-latency, real-time performance in text and audio outputs, ensuring high-quality, fluid user experiences.
• Safety and Moderation Controls: Embed robust safety and moderation mechanisms within the models to manage content responsibly, particularly in the uncensored context.
• Performance Monitoring and Diagnostics: Establish and maintain tools for monitoring model performance, troubleshooting issues, and ensuring reliability and continuity in production environments.
• Documentation and Knowledge Sharing: Document all model processes, tuning workflows, deployment protocols, and troubleshooting methods for team reference and future scaling.

Technical Skills & Requirements

• Code Quality: A solid background in software engineering is essential, particularly in code architecture, optimization, clarity, and testing.
• Programming Languages: Strong proficiency in Python is required, particularly for model training, fine-tuning, and pipeline development. Familiarity with Bash for scripting and automation is also valuable.
• Deep Learning Frameworks: Extensive experience with PyTorch, including Hugging Face’s key libraries (e.g., Transformers, Diffusers, PEFT, Accelerate), for managing and fine-tuning large models such as Llama 3.1.
• Deployment and Resource Management:
  • Experience deploying large models on GPU platforms (e.g., Runpod, AWS EC2 GPU instances, or other GPU cloud providers).
  • Proficiency in CUDA for optimizing GPU performance and troubleshooting resource allocation.
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes) for scalable deployments.
• Data Handling and Preprocessing: Strong skills in data wrangling, curation, and preprocessing for NLP tasks, including libraries like Pandas and Dask for efficient data manipulation. Experience with synthetic data generation techniques is highly desirable.
• Real-Time System Optimization: Proven expertise in optimizing models for real-time, low-latency response, particularly in conversational AI applications, including techniques like mixed-precision training (e.g., FP16) and model quantization.
• Memory and Compute Efficiency: Familiarity with model optimization and compression techniques, such as pruning, distillation, and batching strategies, to optimize memory usage and manage compute constraints on large LLMs.
• Safety and Moderation: Practical experience embedding safety layers in LLMs, including custom token filters, controlled response generation, and integration of ethical AI principles for content moderation.
• Monitoring and Diagnostics: Proficiency with monitoring and diagnostics tools such as Prometheus, Grafana, and logging frameworks (e.g., the ELK stack) to maintain high uptime and system health.

Preferred Qualifications

• Open-Source Model Adaptation: Prior experience fine-tuning and integrating other open-source models for a variety of applications, such as conversational AI, image generation, and reasoning.
• Cloud Automation: Experience managing CI/CD pipelines (e.g., GitLab CI, Jenkins) for deploying and maintaining models in production.
• Data Management and Versioning: Familiarity with data versioning tools like DVC (Data Version Control) and Git for model and data versioning.
• Model Monitoring: Knowledge of model monitoring tools (e.g., Weights & Biases, TensorBoard) and an understanding of model decay for proactive tuning and updates.

What We Offer

• Competitive Compensation: A competitive salary and benefits
• Equity Pool: A share of the company’s equity pool
• Flexible Work: Options for remote work and flexible hours
• Growth & Leadership: Opportunities for rapid career growth, technical leadership, and specialization
• Innovative Environment: Join a cutting-edge team working on some of the most advanced uncensored and companionship AI applications in an exciting, emerging field.