TritonNvidiaGPUOps¶
triton_nvidia_gpu.alloc_mbarrier (triton::nvidia_gpu::AllocMBarrierOp)¶
Allocate a vector of mbarriers
Syntax:
operation ::= `triton_nvidia_gpu.alloc_mbarrier` attr-dict `:` type($result)
Allocate and initialize a vector of mbarriers. The size of the vector is implied in the returned type. Each mbarrier is initialized as: 1, the current phase initialized to 0. 2, the expected arrival count initialized to ‘count’. 3, the pending arrival count initialized to ‘count’. 4, the tx-count initialized to 0.
Example:
case a. when created in vector:
case b. when created in scalar:
Traits: VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Allocate on ::mlir::SideEffects::DefaultResource}
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Results:¶
Result |
Description |
|---|---|
|
ptr or tensor of 64-bit signless integer values |
triton_nvidia_gpu.cluster_arrive (triton::nvidia_gpu::ClusterArriveOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.cluster_arrive` attr-dict
Traits: VerifyTensorLayoutsTrait
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
relaxed | ::mlir::IntegerAttr | 1-bit signless integer attribute |
triton_nvidia_gpu.cluster_wait (triton::nvidia_gpu::ClusterWaitOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.cluster_wait` attr-dict
Traits: VerifyTensorLayoutsTrait
triton_nvidia_gpu.consumer_release (triton::nvidia_gpu::ConsumerReleaseOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.consumer_release` $token `,` $idx attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Operands:¶
Operand |
Description |
|---|---|
|
tensor of values |
|
32-bit signless integer |
triton_nvidia_gpu.consumer_wait (triton::nvidia_gpu::ConsumerWaitOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.consumer_wait` $token `,` $idx attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Operands:¶
Operand |
Description |
|---|---|
|
tensor of values |
|
32-bit signless integer |
triton_nvidia_gpu.create_mutex (triton::nvidia_gpu::CreateMutexOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.create_mutex` attr-dict `:` type($result)
Traits: VerifyTensorLayoutsTrait
Interfaces: InferTypeOpInterface
Results:¶
Result |
Description |
|---|---|
|
triton_nvidia_gpu.create_token (triton::nvidia_gpu::CreateTokenOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.create_token` attr-dict `:` type($result)
Traits: VerifyTensorLayoutsTrait
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
num | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Results:¶
Result |
Description |
|---|---|
|
tensor of values |
triton_nvidia_gpu.dot_async (triton::nvidia_gpu::DotAsyncOp)¶
Dot async
Syntax:
operation ::= `triton_nvidia_gpu.dot_async` $a`,` $b`,` $c attr-dict `:` type($a) `*` type($b) `->` type($d)
$d = matrix_multiply($a, $b) + $c
Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
allowTF32 | ::mlir::BoolAttr | bool attribute |
maxNumImpreciseAcc | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:¶
Operand |
Description |
|---|---|
|
tensor of floating-point values or tensor of integer values |
|
tensor of floating-point values or tensor of integer values |
|
tensor of floating-point values or tensor of integer values |
Results:¶
Result |
Description |
|---|---|
|
tensor of floating-point values or tensor of integer values |
triton_nvidia_gpu.dot_wait (triton::nvidia_gpu::DotWaitOp)¶
Dot wait
Syntax:
operation ::= `triton_nvidia_gpu.dot_wait` $inputs attr-dict `:` type($inputs)
This operation defining the waiting action for a async dot, MMAv3 .e.g. The subsequent operations should not execute until this operation completes waiting.
Traits: VerifyTensorLayoutsTrait
Interfaces: InferTypeOpInterface
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
pendings | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:¶
Operand |
Description |
|---|---|
|
variadic of tensor of floating-point values or tensor of integer values |
Results:¶
Result |
Description |
|---|---|
|
variadic of tensor of floating-point values or tensor of integer values |
triton_nvidia_gpu.extract_mbarrier (triton::nvidia_gpu::ExtractMBarrierOp)¶
Extract a mbarrier from a vector of mbarriers
Syntax:
operation ::= `triton_nvidia_gpu.extract_mbarrier` $tensor `[` $index `]` attr-dict `:` type($tensor) `,` type($index) `->` type($result)
Extract a mbarrier from a vector of mbarriers
Example:
Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:¶
Operand |
Description |
|---|---|
|
tensor of 64-bit signless integer values |
|
32-bit signless integer |
Results:¶
Result |
Description |
|---|---|
|
ptr |
triton_nvidia_gpu.get_agent_id (triton::nvidia_gpu::GetAgentIdOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_agent_id` attr-dict `:` type($result)
Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:¶
Result |
Description |
|---|---|
|
32-bit signless integer |
triton_nvidia_gpu.get_canonical_warp_id (triton::nvidia_gpu::GetCanonicalWarpId)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_canonical_warp_id` attr-dict `:` type($result)
Returns the one dimensional warpId when it’s used for producing warp uniform values.
Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:¶
Result |
Description |
|---|---|
|
32-bit signless integer |
triton_nvidia_gpu.get_cluster_cta_id (triton::nvidia_gpu::GetClusterCTAIdOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_cluster_cta_id` attr-dict `:` type($result)
Returns the one dimensional cluster_cta_id.
Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:¶
Result |
Description |
|---|---|
|
32-bit signless integer |
triton_nvidia_gpu.get_mutex_role_id (triton::nvidia_gpu::GetMutexRoleIdOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_mutex_role_id` attr-dict `:` type($result)
Traits: VerifyTensorLayoutsTrait
Interfaces: InferTypeOpInterface
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
num | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Results:¶
Result |
Description |
|---|---|
|
32-bit signless integer |
triton_nvidia_gpu.get_thread_id (triton::nvidia_gpu::GetThreadIdOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_thread_id` attr-dict `:` type($result)
Returns the one dimensional threadId.
Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:¶
Result |
Description |
|---|---|
|
32-bit signless integer |
triton_nvidia_gpu.insert_slice_async_v2 (triton::nvidia_gpu::InsertSliceAsyncV2Op)¶
Syntax:
operation ::= `triton_nvidia_gpu.insert_slice_async_v2` operands attr-dict `:` type(operands) `->` type($result)
Traits: AttrSizedOperandSegments, ResultsAreSharedEncoding, VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
cache | ::mlir::triton::CacheModifierAttr | allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6{{% markdown %}}Enum cases: * none (`NONE`) * ca (`CA`) * cg (`CG`) * wb (`WB`) * cs (`CS`) * wt (`WT`){{% /markdown %}} |
evict | ::mlir::triton::EvictionPolicyAttr | allowed 32-bit signless integer cases: 1, 2, 3{{% markdown %}}Enum cases: * evict_normal (`NORMAL`) * evict_first (`EVICT_FIRST`) * evict_last (`EVICT_LAST`){{% /markdown %}} |
isVolatile | ::mlir::BoolAttr | bool attribute |
axis | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:¶
Operand |
Description |
|---|---|
|
ptr or tensor of ptr values |
|
tensor of floating-point values or tensor of integer values or tensor of ptr values |
|
32-bit signless integer |
|
ptr |
|
tensor of 1-bit signless integer values or 1-bit signless integer |
|
floating-point or tensor of floating-point values or integer or tensor of integer values or ptr or tensor of ptr values or ptr |
Results:¶
Result |
Description |
|---|---|
|
tensor of floating-point values or tensor of integer values or tensor of ptr values |
triton_nvidia_gpu.lock (triton::nvidia_gpu::LockOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.lock` $mutex attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Operands:¶
Operand |
Description |
|---|---|
|
triton_nvidia_gpu.mbarrier_arrive (triton::nvidia_gpu::MBarrierArriveOp)¶
Mbarrier arrive
Syntax:
operation ::= `triton_nvidia_gpu.mbarrier_arrive` operands attr-dict `:` type(operands)
This operation defining the arriving action for a mbarrier. txCount: An optional attribute that set tx-count. This Op will be lowered into mbarrier.arrive.expect_tx if the optional attribute exist. trackAsyncOp: If true, this op will be lowered into cp.async.mbarrier.arrive.noinc. pred: Only perform arrive action when pred is true. remoteCtaId: if set, perform an remote arrive action.
Example:
triton_nvidia_gpu.mbarrier_arrive %0 {trackAsyncOp = false} : !tt.ptr
Traits: AttrSizedOperandSegments, VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
trackAsyncOp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
txCount | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:¶
Operand |
Description |
|---|---|
|
ptr |
|
1-bit signless integer |
|
32-bit signless integer |
triton_nvidia_gpu.mbarrier_wait (triton::nvidia_gpu::MBarrierWaitOp)¶
Mbarrier wait
Syntax:
operation ::= `triton_nvidia_gpu.mbarrier_wait` $mbarrier `,` $phase attr-dict `:` type($mbarrier)
This operation defining the waiting action for a mbarrier. The subsequent operations should not execute until this operation completes waiting.
Example:
triton_nvidia_gpu.mbarrier_wait %0, %1 : !tt.ptr
Traits: VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}
Operands:¶
Operand |
Description |
|---|---|
|
ptr |
|
1-bit signless integer |
triton_nvidia_gpu.bar_arrive (triton::nvidia_gpu::NamedBarrierArriveOp)¶
Named barrier arrive
Syntax:
operation ::= `triton_nvidia_gpu.bar_arrive` $bar `,` $numThreads attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Operands:¶
Operand |
Description |
|---|---|
|
32-bit signless integer |
|
32-bit signless integer |
triton_nvidia_gpu.bar_wait (triton::nvidia_gpu::NamedBarrierWaitOp)¶
Named barrier wait
Syntax:
operation ::= `triton_nvidia_gpu.bar_wait` $bar `,` $numThreads attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Operands:¶
Operand |
Description |
|---|---|
|
32-bit signless integer |
|
32-bit signless integer |
triton_nvidia_gpu.producer_acquire (triton::nvidia_gpu::ProducerAcquireOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.producer_acquire` $token `,` $idx attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Operands:¶
Operand |
Description |
|---|---|
|
tensor of values |
|
32-bit signless integer |
triton_nvidia_gpu.producer_commit (triton::nvidia_gpu::ProducerCommitOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.producer_commit` $token `,` $idx attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Operands:¶
Operand |
Description |
|---|---|
|
tensor of values |
|
32-bit signless integer |
triton_nvidia_gpu.reg_alloc (triton::nvidia_gpu::RegAllocOp)¶
Register allocation
Syntax:
operation ::= `triton_nvidia_gpu.reg_alloc` $regCount attr-dict
Traits: VerifyTensorLayoutsTrait
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
regCount | ::mlir::IntegerAttr | 32-bit signless integer attribute |
triton_nvidia_gpu.reg_dealloc (triton::nvidia_gpu::RegDeallocOp)¶
Register deallocation
Syntax:
operation ::= `triton_nvidia_gpu.reg_dealloc` $regCount attr-dict
Traits: VerifyTensorLayoutsTrait
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
regCount | ::mlir::IntegerAttr | 32-bit signless integer attribute |
triton_nvidia_gpu.store_async (triton::nvidia_gpu::StoreAsyncOp)¶
Store asynchronous by a tensor pointer
Syntax:
operation ::= `triton_nvidia_gpu.store_async` operands attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}
Attributes:¶
| Attribute | MLIR Type | Description |
|---|---|---|
cache | ::mlir::triton::CacheModifierAttr | allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6{{% markdown %}}Enum cases: * none (`NONE`) * ca (`CA`) * cg (`CG`) * wb (`WB`) * cs (`CS`) * wt (`WT`){{% /markdown %}} |
Operands:¶
Operand |
Description |
|---|---|
|
ptr |
|
tensor of floating-point values or tensor of integer values or tensor of ptr values |
triton_nvidia_gpu.unlock (triton::nvidia_gpu::UnlockOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.unlock` $mutex attr-dict `:` type(operands)
Traits: VerifyTensorLayoutsTrait
Operands:¶
Operand |
Description |
|---|---|
|