# TritonNvidiaGPUOps ### `triton_nvidia_gpu.alloc_mbarrier` (triton::nvidia_gpu::AllocMBarrierOp) _Allocate a vector of mbarriers_ Syntax: ``` operation ::= `triton_nvidia_gpu.alloc_mbarrier` attr-dict `:` type($result) ``` Allocate and initialize a vector of mbarriers. The size of the vector is implied in the returned type. Each mbarrier is initialized as: 1, the current phase initialized to 0. 2, the expected arrival count initialized to 'count'. 3, the pending arrival count initialized to 'count'. 4, the tx-count initialized to 0. Example: case a. when created in vector: %1 = triton_nvidia_gpu.alloc_mbarrier { count = 1 } : tensor<4xi64> case b. when created in scalar: %1 = triton_nvidia_gpu.alloc_mbarrier { count = 1 } : !tt.ptr Traits: VerifyTensorLayoutsTrait Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{MemoryEffects::Allocate on ::mlir::SideEffects::DefaultResource} #### Attributes:
AttributeMLIR TypeDescription
count::mlir::IntegerAttr32-bit signless integer attribute
#### Results: | Result | Description | | :----: | ----------- | | `result` | ptr or tensor of 64-bit signless integer values ### `triton_nvidia_gpu.cluster_arrive` (triton::nvidia_gpu::ClusterArriveOp) Syntax: ``` operation ::= `triton_nvidia_gpu.cluster_arrive` attr-dict ``` Traits: VerifyTensorLayoutsTrait #### Attributes:
AttributeMLIR TypeDescription
relaxed::mlir::IntegerAttr1-bit signless integer attribute
### `triton_nvidia_gpu.cluster_wait` (triton::nvidia_gpu::ClusterWaitOp) Syntax: ``` operation ::= `triton_nvidia_gpu.cluster_wait` attr-dict ``` Traits: VerifyTensorLayoutsTrait ### `triton_nvidia_gpu.consumer_release` (triton::nvidia_gpu::ConsumerReleaseOp) Syntax: ``` operation ::= `triton_nvidia_gpu.consumer_release` $token `,` $idx attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait #### Operands: | Operand | Description | | :-----: | ----------- | | `token` | tensor of values | `idx` | 32-bit signless integer ### `triton_nvidia_gpu.consumer_wait` (triton::nvidia_gpu::ConsumerWaitOp) Syntax: ``` operation ::= `triton_nvidia_gpu.consumer_wait` $token `,` $idx attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait #### Operands: | Operand | Description | | :-----: | ----------- | | `token` | tensor of values | `idx` | 32-bit signless integer ### `triton_nvidia_gpu.create_mutex` (triton::nvidia_gpu::CreateMutexOp) Syntax: ``` operation ::= `triton_nvidia_gpu.create_mutex` attr-dict `:` type($result) ``` Traits: VerifyTensorLayoutsTrait Interfaces: InferTypeOpInterface #### Results: | Result | Description | | :----: | ----------- | | `result` | ### `triton_nvidia_gpu.create_token` (triton::nvidia_gpu::CreateTokenOp) Syntax: ``` operation ::= `triton_nvidia_gpu.create_token` attr-dict `:` type($result) ``` Traits: VerifyTensorLayoutsTrait #### Attributes:
AttributeMLIR TypeDescription
num::mlir::IntegerAttr32-bit signless integer attribute
#### Results: | Result | Description | | :----: | ----------- | | `result` | tensor of values ### `triton_nvidia_gpu.dot_async` (triton::nvidia_gpu::DotAsyncOp) _Dot async_ Syntax: ``` operation ::= `triton_nvidia_gpu.dot_async` $a`,` $b`,` $c attr-dict `:` type($a) `*` type($b) `->` type($d) ``` $d = matrix_multiply($a, $b) + $c Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{} #### Attributes:
AttributeMLIR TypeDescription
allowTF32::mlir::BoolAttrbool attribute
maxNumImpreciseAcc::mlir::IntegerAttr32-bit signless integer attribute
#### Operands: | Operand | Description | | :-----: | ----------- | | `a` | tensor of floating-point values or tensor of integer values | `b` | tensor of floating-point values or tensor of integer values | `c` | tensor of floating-point values or tensor of integer values #### Results: | Result | Description | | :----: | ----------- | | `d` | tensor of floating-point values or tensor of integer values ### `triton_nvidia_gpu.dot_wait` (triton::nvidia_gpu::DotWaitOp) _Dot wait_ Syntax: ``` operation ::= `triton_nvidia_gpu.dot_wait` $inputs attr-dict `:` type($inputs) ``` This operation defining the waiting action for a async dot, MMAv3 .e.g. The subsequent operations should not execute until this operation completes waiting. Traits: VerifyTensorLayoutsTrait Interfaces: InferTypeOpInterface #### Attributes:
AttributeMLIR TypeDescription
pendings::mlir::IntegerAttr32-bit signless integer attribute
#### Operands: | Operand | Description | | :-----: | ----------- | | `inputs` | variadic of tensor of floating-point values or tensor of integer values #### Results: | Result | Description | | :----: | ----------- | | `outputs` | variadic of tensor of floating-point values or tensor of integer values ### `triton_nvidia_gpu.extract_mbarrier` (triton::nvidia_gpu::ExtractMBarrierOp) _Extract a mbarrier from a vector of mbarriers_ Syntax: ``` operation ::= `triton_nvidia_gpu.extract_mbarrier` $tensor `[` $index `]` attr-dict `:` type($tensor) `,` type($index) `->` type($result) ``` Extract a mbarrier from a vector of mbarriers Example: %1 = triton_nvidia_gpu.extract_mbarrier %mbarriers[%idx] : tensor<4xi64>, index -> !tt.ptr Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{} #### Operands: | Operand | Description | | :-----: | ----------- | | `tensor` | tensor of 64-bit signless integer values | `index` | 32-bit signless integer #### Results: | Result | Description | | :----: | ----------- | | `result` | ptr ### `triton_nvidia_gpu.fence_async_shared` (triton::nvidia_gpu::FenceAsyncSharedOp) _Fence proxy async_ Syntax: ``` operation ::= `triton_nvidia_gpu.fence_async_shared` attr-dict ``` Traits: VerifyTensorLayoutsTrait #### Attributes:
AttributeMLIR TypeDescription
bCluster::mlir::BoolAttrbool attribute
### `triton_nvidia_gpu.get_agent_id` (triton::nvidia_gpu::GetAgentIdOp) Syntax: ``` operation ::= `triton_nvidia_gpu.get_agent_id` attr-dict `:` type($result) ``` Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{} #### Results: | Result | Description | | :----: | ----------- | | `result` | 32-bit signless integer ### `triton_nvidia_gpu.get_canonical_warp_id` (triton::nvidia_gpu::GetCanonicalWarpId) Syntax: ``` operation ::= `triton_nvidia_gpu.get_canonical_warp_id` attr-dict `:` type($result) ``` Returns the one dimensional warpId when it's used for producing warp uniform values. Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{} #### Results: | Result | Description | | :----: | ----------- | | `result` | 32-bit signless integer ### `triton_nvidia_gpu.get_cluster_cta_id` (triton::nvidia_gpu::GetClusterCTAIdOp) Syntax: ``` operation ::= `triton_nvidia_gpu.get_cluster_cta_id` attr-dict `:` type($result) ``` Returns the one dimensional cluster_cta_id. Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{} #### Results: | Result | Description | | :----: | ----------- | | `result` | 32-bit signless integer ### `triton_nvidia_gpu.get_mutex_role_id` (triton::nvidia_gpu::GetMutexRoleIdOp) Syntax: ``` operation ::= `triton_nvidia_gpu.get_mutex_role_id` attr-dict `:` type($result) ``` Traits: VerifyTensorLayoutsTrait Interfaces: InferTypeOpInterface #### Attributes:
AttributeMLIR TypeDescription
num::mlir::IntegerAttr32-bit signless integer attribute
#### Results: | Result | Description | | :----: | ----------- | | `result` | 32-bit signless integer ### `triton_nvidia_gpu.get_thread_id` (triton::nvidia_gpu::GetThreadIdOp) Syntax: ``` operation ::= `triton_nvidia_gpu.get_thread_id` attr-dict `:` type($result) ``` Returns the one dimensional threadId. Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{} #### Results: | Result | Description | | :----: | ----------- | | `result` | 32-bit signless integer ### `triton_nvidia_gpu.insert_slice_async_v2` (triton::nvidia_gpu::InsertSliceAsyncV2Op) Syntax: ``` operation ::= `triton_nvidia_gpu.insert_slice_async_v2` operands attr-dict `:` type(operands) `->` type($result) ``` Traits: AttrSizedOperandSegments, ResultsAreSharedEncoding, VerifyTensorLayoutsTrait Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource} #### Attributes:
AttributeMLIR TypeDescription
cache::mlir::triton::CacheModifierAttr
allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6{{% markdown %}}Enum cases: * none (`NONE`) * ca (`CA`) * cg (`CG`) * wb (`WB`) * cs (`CS`) * wt (`WT`){{% /markdown %}}
evict::mlir::triton::EvictionPolicyAttr
allowed 32-bit signless integer cases: 1, 2, 3{{% markdown %}}Enum cases: * evict_normal (`NORMAL`) * evict_first (`EVICT_FIRST`) * evict_last (`EVICT_LAST`){{% /markdown %}}
isVolatile::mlir::BoolAttrbool attribute
axis::mlir::IntegerAttr32-bit signless integer attribute
#### Operands: | Operand | Description | | :-----: | ----------- | | `src` | ptr or tensor of ptr values | `dst` | tensor of floating-point values or tensor of integer values or tensor of ptr values | `index` | 32-bit signless integer | `mbar` | ptr | `mask` | tensor of 1-bit signless integer values or 1-bit signless integer | `other` | floating-point or tensor of floating-point values or integer or tensor of integer values or ptr or tensor of ptr values or ptr #### Results: | Result | Description | | :----: | ----------- | | `result` | tensor of floating-point values or tensor of integer values or tensor of ptr values ### `triton_nvidia_gpu.lock` (triton::nvidia_gpu::LockOp) Syntax: ``` operation ::= `triton_nvidia_gpu.lock` $mutex attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait #### Operands: | Operand | Description | | :-----: | ----------- | | `mutex` | ### `triton_nvidia_gpu.mbarrier_arrive` (triton::nvidia_gpu::MBarrierArriveOp) _Mbarrier arrive_ Syntax: ``` operation ::= `triton_nvidia_gpu.mbarrier_arrive` operands attr-dict `:` type(operands) ``` This operation defining the arriving action for a mbarrier. txCount: An optional attribute that set tx-count. This Op will be lowered into mbarrier.arrive.expect_tx if the optional attribute exist. trackAsyncOp: If true, this op will be lowered into cp.async.mbarrier.arrive.noinc. pred: Only perform arrive action when pred is true. remoteCtaId: if set, perform an remote arrive action. Example: triton_nvidia_gpu.mbarrier_arrive %0 {trackAsyncOp = false} : !tt.ptr Traits: AttrSizedOperandSegments, VerifyTensorLayoutsTrait Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource} #### Attributes:
AttributeMLIR TypeDescription
trackAsyncOp::mlir::IntegerAttr1-bit signless integer attribute
txCount::mlir::IntegerAttr32-bit signless integer attribute
#### Operands: | Operand | Description | | :-----: | ----------- | | `mbarrier` | ptr | `pred` | 1-bit signless integer | `remoteCtaId` | 32-bit signless integer ### `triton_nvidia_gpu.mbarrier_wait` (triton::nvidia_gpu::MBarrierWaitOp) _Mbarrier wait_ Syntax: ``` operation ::= `triton_nvidia_gpu.mbarrier_wait` $mbarrier `,` $phase attr-dict `:` type($mbarrier) ``` This operation defining the waiting action for a mbarrier. The subsequent operations should not execute until this operation completes waiting. Example: triton_nvidia_gpu.mbarrier_wait %0, %1 : !tt.ptr Traits: VerifyTensorLayoutsTrait Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource} #### Operands: | Operand | Description | | :-----: | ----------- | | `mbarrier` | ptr | `phase` | 1-bit signless integer ### `triton_nvidia_gpu.bar_arrive` (triton::nvidia_gpu::NamedBarrierArriveOp) _Named barrier arrive_ Syntax: ``` operation ::= `triton_nvidia_gpu.bar_arrive` $bar `,` $numThreads attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait #### Operands: | Operand | Description | | :-----: | ----------- | | `bar` | 32-bit signless integer | `numThreads` | 32-bit signless integer ### `triton_nvidia_gpu.bar_wait` (triton::nvidia_gpu::NamedBarrierWaitOp) _Named barrier wait_ Syntax: ``` operation ::= `triton_nvidia_gpu.bar_wait` $bar `,` $numThreads attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait #### Operands: | Operand | Description | | :-----: | ----------- | | `bar` | 32-bit signless integer | `numThreads` | 32-bit signless integer ### `triton_nvidia_gpu.producer_acquire` (triton::nvidia_gpu::ProducerAcquireOp) Syntax: ``` operation ::= `triton_nvidia_gpu.producer_acquire` $token `,` $idx attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait #### Operands: | Operand | Description | | :-----: | ----------- | | `token` | tensor of values | `idx` | 32-bit signless integer ### `triton_nvidia_gpu.producer_commit` (triton::nvidia_gpu::ProducerCommitOp) Syntax: ``` operation ::= `triton_nvidia_gpu.producer_commit` $token `,` $idx attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait #### Operands: | Operand | Description | | :-----: | ----------- | | `token` | tensor of values | `idx` | 32-bit signless integer ### `triton_nvidia_gpu.reg_alloc` (triton::nvidia_gpu::RegAllocOp) _Register allocation_ Syntax: ``` operation ::= `triton_nvidia_gpu.reg_alloc` $regCount attr-dict ``` Traits: VerifyTensorLayoutsTrait #### Attributes:
AttributeMLIR TypeDescription
regCount::mlir::IntegerAttr32-bit signless integer attribute
### `triton_nvidia_gpu.reg_dealloc` (triton::nvidia_gpu::RegDeallocOp) _Register deallocation_ Syntax: ``` operation ::= `triton_nvidia_gpu.reg_dealloc` $regCount attr-dict ``` Traits: VerifyTensorLayoutsTrait #### Attributes:
AttributeMLIR TypeDescription
regCount::mlir::IntegerAttr32-bit signless integer attribute
### `triton_nvidia_gpu.store_async` (triton::nvidia_gpu::StoreAsyncOp) _Store asynchronous by a tensor pointer_ Syntax: ``` operation ::= `triton_nvidia_gpu.store_async` operands attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface) Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource} #### Attributes:
AttributeMLIR TypeDescription
cache::mlir::triton::CacheModifierAttr
allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6{{% markdown %}}Enum cases: * none (`NONE`) * ca (`CA`) * cg (`CG`) * wb (`WB`) * cs (`CS`) * wt (`WT`){{% /markdown %}}
#### Operands: | Operand | Description | | :-----: | ----------- | | `dst` | ptr | `src` | tensor of floating-point values or tensor of integer values or tensor of ptr values ### `triton_nvidia_gpu.unlock` (triton::nvidia_gpu::UnlockOp) Syntax: ``` operation ::= `triton_nvidia_gpu.unlock` $mutex attr-dict `:` type(operands) ``` Traits: VerifyTensorLayoutsTrait #### Operands: | Operand | Description | | :-----: | ----------- | | `mutex` |