Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices.
Featured Models
nvidia / parakeet-1.1b-rnnt-multilingual-asr
High accuracy and optimized performance for transcription in 25 languages
black-forest-labs / FLUX.1-dev
FLUX.1 is a state-of-the-art suite of image generation models
nvidia / llama-3.1-nemotron-ultra-253b-v1
Superior inference efficiency with highest accuracy for scientific and complex math reasoning, coding, tool calling, and instruction following.
meta / llama-4-maverick-17b-128e-instruct
A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.
meta / llama-4-scout-17b-16e-instruct
A multimodal, multilingual 16-expert MoE model with 17B active parameters.
nvidia / cosmos-predict1-7b
Generates physics-aware video world states from text and image prompts for physical AI development.
nvidia / cosmos-predict1-5b
Generates future frames of a physics-aware world state from just an image or short video prompt for physical AI development.
nvidia / sparsedrive
End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.
nvidia / llama-3.3-nemotron-super-49b-v1
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
nvidia / llama-3.1-nemotron-nano-8b-v1
Leading reasoning and agentic AI accuracy model for PC and edge.
nvidia / magpie-tts-multilingual
Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.
nvidia / nv-embedcode-7b-v1
The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.
deepseek-ai / deepseek-r1-distill-llama-8b
Distilled version of Llama 3.1 8B using reasoning data generated by DeepSeek R1 for enhanced performance.
nvidia / nemoretriever-table-structure-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia / nemoretriever-graphic-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia / nemoretriever-page-elements-v2
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
colabfold / msa-search
Generates a multiple sequence alignment from a query sequence and a protein sequence database search.
google / gemma-3-27b-it
Cutting-edge open multimodal model excelling in high-quality reasoning from images.
google / gemma-3-1b-it
A lightweight, multilingual, advanced SLM text model for edge computing and resource-constrained applications
nvidia / nemoretriever-parse
Cutting-edge vision-language model excelling in retrieving text and metadata from images.
deepseek-ai / deepseek-r1-distill-qwen-32b
Distilled version of Qwen 2.5 32B using reasoning data generated by DeepSeek R1 for enhanced performance.
deepseek-ai / deepseek-r1-distill-qwen-14b
Distilled version of Qwen 2.5 14B using reasoning data generated by DeepSeek R1 for enhanced performance.
deepseek-ai / deepseek-r1-distill-qwen-7b
Distilled version of Qwen 2.5 7B using reasoning data generated by DeepSeek R1 for enhanced performance.
microsoft / phi-4-mini-instruct
Lightweight multilingual LLM powering AI applications in latency-bound, memory/compute-constrained environments
microsoft / phi-4-multimodal-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
openai / whisper-large-v3
Robust Speech Recognition via Large-Scale Weak Supervision.
nvidia / canary-1b-asr
Multilingual model supporting speech-to-text recognition and translation.
nvidia / canary-0.6b-turbo-asr
Multilingual model supporting speech-to-text recognition and translation.
mistralai / mistral-small-24b-instruct
Latency-optimized language model excelling in code, math, general knowledge, and instruction-following.
deepseek-ai / deepseek-r1
State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.
nvidia / llama-3.1-nemoguard-8b-topic-control
Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
nvidia / nemoguard-jailbreak-detect
Industry-leading jailbreak classification model for protection from adversarial attempts
nvidia / llama-3.1-nemoguard-8b-content-safety
Leading content safety model for enhancing the safety and moderation capabilities of LLMs
igenius / colosseum_355b_instruct_16k
Multilingual LLM trained on NVIDIA DGX Cloud, designed for mission-critical use cases in regulated industries including financial services, government, and heavy industry
tiiuae / falcon3-7b-instruct
Instruction-tuned LLM achieving SoTA performance on reasoning, math, and general knowledge
igenius / italia_10b_instruct_16k
Multilingual LLM with emphasis on European languages supporting regulated use cases including financial services, government, and heavy industry
qwen / qwen2.5-7b-instruct
Chinese and English LLM targeting language, coding, mathematics, and reasoning tasks.
nvidia / cosmos-nemotron-34b
Multimodal vision-language model that understands text, images, and video and creates informative responses
qwen / qwen2.5-coder-32b-instruct
Advanced LLM for code generation, reasoning, and fixing across popular programming languages.
qwen / qwen2.5-coder-7b-instruct
Powerful mid-size code model with a 32K context length, excelling in coding in multiple languages.
writer / palmyra-creative-122b
Powerful LLM designed for creative thinking and writing.
nvidia / llama-3.2-nv-embedqa-1b-v2
Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.
nvidia / llama-3.2-nv-rerankqa-1b-v2
Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.
meta / llama-3.3-70b-instruct
Advanced LLM for reasoning, math, general knowledge, and function calling
university-at-buffalo / cached
Context-aware chart extraction that can detect 18 classes of basic chart elements, excluding plot elements.
nvidia / nv-yolox-page-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
nvidia / audio2face-3d
Converts streamed audio to facial blendshapes for realtime lipsyncing and facial performances.
nvidia / conformer-ctc-asr
Automatic speech recognition model that transcribes speech in lowercase English with record-setting accuracy and performance
nvidia / fourcastnet
FourCastNet predicts global atmospheric dynamics of various weather and climate variables.
hive / deepfake-image-detection
Advanced AI model that detects faces and identifies deepfake images.
nvidia / nemotron-4-mini-hindi-4b-instruct
A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language.
ibm / granite-guardian-3.0-8b
Detects jailbreaking, bias, violence, profanity, sexual content, and unethical behavior
ibm / granite-3.0-8b-instruct
Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI
ibm / granite-3.0-3b-a800m-instruct
Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification
nvidia / llama-3.1-nemotron-70b-instruct
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses.
zyphra / zamba2-7b-instruct
Efficient hybrid state-space model designed for conversational and reasoning tasks.
institute-of-science-tokyo / llama-3.1-swallow-70b-instruct-v0.1
Sovereign AI model trained on Japanese language that understands regional nuances.
institute-of-science-tokyo / llama-3.1-swallow-8b-instruct-v0.1
Sovereign AI model trained on Japanese language that understands regional nuances.
nvidia / studiovoice
Enhance speech by correcting common audio degradations to create studio quality speech output.
nvidia / mistral-nemo-minitron-8b-8k-instruct
State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation.
nvidia / llama-3.1-nemotron-70b-reward
Leaderboard-topping reward model supporting RLHF for better alignment with human preferences.
meta / llama-3.2-3b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
meta / llama-3.2-11b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
meta / llama-3.2-90b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
meta / llama-3.2-1b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
nvidia / llama-3.1-nemotron-51b-instruct
Unique language model that delivers unmatched accuracy-efficiency performance.
qwen / qwen2-7b-instruct
Chinese and English LLM targeting language, coding, mathematics, and reasoning tasks.
abacusai / dracarys-llama-3.1-70b-instruct
Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.
deepmind / alphafold2-multimer
Predicts the 3D structure of a protein from its amino acid sequence.
nvidia / consistory
Generates consistent characters across a series of images without requiring additional training.
hive / ai-generated-image-detection
Robust image classification model for detecting and managing AI-generated content.
deepmind / alphafold2
Predicts the 3D structure of a protein from its amino acid sequence.
yentinglin / llama-3-taiwan-70b-instruct
Sovereign AI model finetuned on Traditional Mandarin and English data using the Llama-3 architecture.
tokyotech-llm / llama-3-swallow-70b-instruct-v0.1
Sovereign AI model trained on Japanese language that understands regional nuances.
microsoft / phi-3.5-vision-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from images.
ai21labs / jamba-1.5-mini-instruct
Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.
ai21labs / jamba-1.5-large-instruct
Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.
nvidia / nemotron-mini-4b-instruct
Optimized SLM for on-device inference and fine-tuned for roleplay, RAG, and function calling
nvidia / mistral-nemo-minitron-8b-base
State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation.
microsoft / phi-3.5-moe-instruct
Advanced LLM based on Mixture of Experts architecture to deliver compute-efficient content generation
microsoft / phi-3.5-mini-instruct
Lightweight multilingual LLM powering AI applications in latency-bound, memory/compute-constrained environments
rakuten / rakutenai-7b-instruct
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.
rakuten / rakutenai-7b-chat
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.
nvidia / nv-grounding-dino
Grounding DINO is an open-vocabulary zero-shot object detection model.
nvidia / radtts-hifigan-tts
Natural, high-fidelity, English voices for personalizing text-to-speech services and voiceovers
nvidia / megatron-1b-nmt
Enable smooth global interactions in 36 languages.
nvidia / fastpitch-hifigan-tts
Expressive and engaging English voices for Q&A assistants, brand ambassadors, and service robots
nvidia / parakeet-ctc-1.1b-asr
Record-setting accuracy and performance for English transcription.
nvidia / parakeet-ctc-0.6b-asr
State-of-the-art accuracy and speed for English transcriptions.
ipd / proteinmpnn
ProteinMPNN is a deep learning model for predicting amino acid sequences for protein backbones.
microsoft / florence-2
Vision foundation model capable of performing diverse computer vision and vision language tasks.
writer / palmyra-fin-70b-32k
Specialized LLM for financial analysis, reporting, and data processing
google / shieldgemma-9b
Guardrail model to ensure that responses from LLMs are appropriate and safe
google / gemma-2-2b-it
Advanced small language generative AI model for edge applications
GettyImages / edify-image
Getty Images' API service for 4K image generation. Trained on NVIDIA Edify using Getty Images' commercially safe creative libraries.
nvidia / eyecontact
Estimates the gaze angles of a person in a video and redirects the gaze to make it frontal.
nvidia / audio2face-2d
Create facial animations using a portrait photo and synchronize mouth movement with audio.
nvidia / usdvalidate
Verify compatibility of OpenUSD assets with instant RTX render and rule-based validation.
thudm / chatglm3-6b
Supports Chinese and English languages to handle tasks including chatbot, content generation, coding, and translation.
mistralai / mamba-codestral-7b-v0.1
Model for writing and interacting with code across a wide range of programming languages and tasks.
baichuan-inc / baichuan2-13b-chat
Supports Chinese and English chat, coding, math, instruction following, and quiz solving
meta / llama-3.1-405b-instruct
Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks.
meta / llama-3.1-70b-instruct
Powers complex conversations with superior contextual understanding, reasoning and text generation.
meta / llama-3.1-8b-instruct
Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.
nv-mistralai / mistral-nemo-12b-instruct
Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU.
nvidia / nv-rerankqa-mistral-4b-v3
Multilingual text reranking model.
nvidia / nv-embedqa-e5-v5
English text embedding model for question-answering retrieval.
nvidia / nv-embedqa-mistral-7b-v2
Multilingual text question-answering retrieval, transforming textual information into dense vector representations.
microsoft / phi-3-medium-128k-instruct
Cutting-edge lightweight open language model excelling in high-quality reasoning.
bigcode / starcoder2-7b
Advanced programming model for code completion, summarization, and generation
bigcode / starcoder2-15b
Advanced programming model for code completion, summarization, and generation
google / gemma-2-27b-it
Cutting-edge text generation model for text understanding, transformation, and code generation.
google / gemma-2-9b-it
Cutting-edge text generation model for text understanding, transformation, and code generation.
nvidia / llama3-chatqa-1.5-70b
Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.
nvidia / llama3-chatqa-1.5-8b
Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.
nvidia / nemotron-4-340b-reward
Grades responses on five attributes: helpfulness, correctness, coherence, complexity, and verbosity.
nvidia / nemotron-4-340b-instruct
Creates diverse synthetic data that mimics the characteristics of real-world data.
mistralai / mistral-7b-instruct-v0.3
This LLM follows instructions, completes requests, and generates creative text.
stabilityai / stable-diffusion-3-medium
Advanced text-to-image model for generating high quality images
writer / palmyra-med-70b-32k
Leading LLM for accurate, contextually relevant responses in the medical domain.
writer / palmyra-med-70b
Leading LLM for accurate, contextually relevant responses in the medical domain.
nvidia / nv-embed-v1
Generates high-quality numerical embeddings from text inputs.
upstage / solar-10.7b-instruct
Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.
mediatek / breeze-7b-instruct
LLM for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese.
nvidia / visual-changenet
Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask
google / codegemma-1.1-7b
Advanced programming model for code generation, completion, reasoning, and instruction following.
ibm / granite-34b-code-instruct
Software programming LLM for code generation, completion, explanation, and multi-turn conversation.
ibm / granite-8b-code-instruct
Software programming LLM for code generation, completion, explanation, and multi-turn conversation.
nvidia / retail-object-detection
EfficientDet-based object detection network to detect 100 specific retail objects from an input video.
ipd / rfdiffusion
A generative model of protein backbones for protein binder design.
microsoft / phi-3-small-8k-instruct
Cutting-edge lightweight open language model excelling in high-quality reasoning.
microsoft / phi-3-small-128k-instruct
Long-context, cutting-edge lightweight open language model excelling in high-quality reasoning.
microsoft / phi-3-medium-4k-instruct
Cutting-edge lightweight open language model excelling in high-quality reasoning.
microsoft / phi-3-vision-128k-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from images.
aisingapore / sea-lion-7b-instruct
LLM to represent and serve the linguistic and cultural diversity of Southeast Asia
microsoft / phi-3-mini-4k-instruct
Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills.
databricks / dbrx-instruct
A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG.
snowflake / arctic-embed-l
Optimized community model for text embedding.
microsoft / phi-3-mini-128k-instruct
Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills.
mistralai / mixtral-8x22b-instruct-v0.1
An MoE LLM that follows instructions, completes requests, and generates creative text.
meta / llama3-70b-instruct
Powers complex conversations with superior contextual understanding, reasoning and text generation.
meta / llama3-8b-instruct
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.
google / recurrentgemma-2b
Language model based on a novel recurrent architecture for faster inference when generating long sequences.
google / codegemma-7b
Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion.
nvidia / embed-qa-4
GPU-accelerated generation of text embeddings used for question-answering retrieval.
nvidia / rerank-qa-mistral-4b
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
stabilityai / stable-diffusion-xl
Generate images and stunning visuals with realistic aesthetics.
mistralai / mistral-7b-instruct-v0.2
This LLM follows instructions, completes requests, and generates creative text.
nvidia / deepvariant
Run Google's DeepVariant optimized for GPU. Switch models for high accuracy on all major sequencers.
stabilityai / stable-video-diffusion
Stable Video Diffusion (SVD) is a generative diffusion model that leverages a single image as a conditioning frame to synthesize video sequences.
stabilityai / sdxl-turbo
A fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation
mistralai / mixtral-8x7b-instruct-v0.1
An MoE LLM that follows instructions, completes requests, and generates creative text.