NVIDIA/TensorRT-LLM/perf-torch-cuda-graphs
> Apply CUDA Graphs to PyTorch workloads: API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, m...
How to get this skill
Agent Skill by NVIDIA. Download or clone it, then install it in your agent.
Setup & Installation
- Clone the repository:
  git clone https://github.com/NVIDIA/skills.git
- Copy the skill folder (which contains SKILL.md) into your agent skills folder, e.g. .claude/skills/.
- Restart or reload the agent to auto-discover the skill.
- Check SKILL.md for any special instructions or requirements.
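The copy step above can be sketched as a small shell helper. This is a minimal sketch, not part of the skill itself: the function name `install_skill` is hypothetical, and the default destination `.claude/skills` is taken from the example in the steps above. The location of the skill folder inside the cloned repository is not stated here, so it is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the skill): copy one skill folder
# into an agent skills directory.
# Usage: install_skill <path-to-skill-folder> [skills-dir]
install_skill() {
    src="${1%/}"                      # drop a trailing slash so cp copies the folder itself
    dest_root="${2:-.claude/skills}"  # default taken from the example in the steps above

    # A skill folder is expected to contain SKILL.md.
    if [ ! -f "$src/SKILL.md" ]; then
        echo "error: $src does not contain SKILL.md" >&2
        return 1
    fi

    mkdir -p "$dest_root"
    cp -R "$src" "$dest_root/"
    echo "installed $(basename "$src") into $dest_root"
}
```

After cloning the repository, the helper would be called with the path to the skill folder inside the checkout (wherever it lives in that repository) and, optionally, a different skills directory. Checking for SKILL.md first avoids copying an unrelated folder by mistake.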
Related skills
NVIDIA/TensorRT-LLM/ad-accuracy-debug
> Debug AutoDeploy accuracy regressions vs a reference score (PyTorch backend or published baseline).
NVIDIA/TensorRT-LLM/ad-add-fusion-transformation
> Claude Code skill (trtllm-agent-toolkit): implement or extend TensorRT-LLM AutoDeploy fusion transforms under transform/library/ in a TensorRT-LLM checkout.
NVIDIA/TensorRT-LLM/ad-conf-check
> Check whether AutoDeploy YAML configs were actually applied by analyzing server logs and optionally graph dumps (AD_DUMP_GRAPHS_DIR).
NVIDIA/TensorRT-LLM/ad-graph-dump
> Enable and interpret TensorRT-LLM AutoDeploy FX graph text dumps via AD_DUMP_GRAPHS_DIR.