# TritonNvidiaGPUOps<!-- Autogenerated by mlir-tblgen; don't manually edit -->
### `triton_nvidia_gpu.alloc_mbarrier` (triton::nvidia_gpu::AllocMBarrierOp)

_Allocate a vector of mbarriers_


Syntax:

```
operation ::= `triton_nvidia_gpu.alloc_mbarrier` attr-dict `:` type($result)
```

Allocate and initialize a vector of mbarriers. The size of the vector is implied in the returned type.
Each mbarrier is initialized as:
1, the current phase initialized to 0.
2, the expected arrival count initialized to 'count'.
3, the pending arrival count initialized to 'count'.
4, the tx-count initialized to 0.

Example:

case a. when created in vector:
%1 = triton_nvidia_gpu.alloc_mbarrier { count = 1 } : tensor<4xi64>

case b. when created in scalar:
%1 = triton_nvidia_gpu.alloc_mbarrier { count = 1 } : !tt.ptr<i64>


Traits: VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Allocate on ::mlir::SideEffects::DefaultResource}

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>count</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | ptr or tensor of 64-bit signless integer values

### `triton_nvidia_gpu.cluster_arrive` (triton::nvidia_gpu::ClusterArriveOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.cluster_arrive` attr-dict
```


Traits: VerifyTensorLayoutsTrait

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>relaxed</code></td><td>::mlir::IntegerAttr</td><td>1-bit signless integer attribute</td></tr>
</table>

### `triton_nvidia_gpu.cluster_wait` (triton::nvidia_gpu::ClusterWaitOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.cluster_wait` attr-dict
```


Traits: VerifyTensorLayoutsTrait

### `triton_nvidia_gpu.consumer_release` (triton::nvidia_gpu::ConsumerReleaseOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.consumer_release` $token `,` $idx attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `token` | tensor of  values
| `idx` | 32-bit signless integer

### `triton_nvidia_gpu.consumer_wait` (triton::nvidia_gpu::ConsumerWaitOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.consumer_wait` $token `,` $idx attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `token` | tensor of  values
| `idx` | 32-bit signless integer

### `triton_nvidia_gpu.create_mutex` (triton::nvidia_gpu::CreateMutexOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.create_mutex` attr-dict `:` type($result)
```


Traits: VerifyTensorLayoutsTrait

Interfaces: InferTypeOpInterface

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | 

### `triton_nvidia_gpu.create_token` (triton::nvidia_gpu::CreateTokenOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.create_token` attr-dict `:` type($result)
```


Traits: VerifyTensorLayoutsTrait

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>num</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | tensor of  values

### `triton_nvidia_gpu.dot_async` (triton::nvidia_gpu::DotAsyncOp)

_Dot async_


Syntax:

```
operation ::= `triton_nvidia_gpu.dot_async` $a`,` $b`,` $c attr-dict `:` type($a) `*` type($b) `->` type($d)
```

$d = matrix_multiply($a, $b) + $c

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>allowTF32</code></td><td>::mlir::BoolAttr</td><td>bool attribute</td></tr>
<tr><td><code>maxNumImpreciseAcc</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `a` | tensor of floating-point values or tensor of integer values
| `b` | tensor of floating-point values or tensor of integer values
| `c` | tensor of floating-point values or tensor of integer values

#### Results:

| Result | Description |
| :----: | ----------- |
| `d` | tensor of floating-point values or tensor of integer values

### `triton_nvidia_gpu.dot_wait` (triton::nvidia_gpu::DotWaitOp)

_Dot wait_


Syntax:

```
operation ::= `triton_nvidia_gpu.dot_wait` $inputs attr-dict `:` type($inputs)
```

This operation defining the waiting action for a async dot, MMAv3 .e.g.
The subsequent operations should not execute until this operation completes waiting.

Traits: VerifyTensorLayoutsTrait

Interfaces: InferTypeOpInterface

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>pendings</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `inputs` | variadic of tensor of floating-point values or tensor of integer values

#### Results:

| Result | Description |
| :----: | ----------- |
| `outputs` | variadic of tensor of floating-point values or tensor of integer values

### `triton_nvidia_gpu.extract_mbarrier` (triton::nvidia_gpu::ExtractMBarrierOp)

_Extract a mbarrier from a vector of mbarriers_


Syntax:

```
operation ::= `triton_nvidia_gpu.extract_mbarrier` $tensor `[` $index `]` attr-dict `:` type($tensor) `,` type($index) `->` type($result)
```

Extract a mbarrier from a vector of mbarriers

Example:

%1 = triton_nvidia_gpu.extract_mbarrier %mbarriers[%idx] : tensor<4xi64>, index -> !tt.ptr<i64>


Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `tensor` | tensor of 64-bit signless integer values
| `index` | 32-bit signless integer

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | ptr

### `triton_nvidia_gpu.fence_async_shared` (triton::nvidia_gpu::FenceAsyncSharedOp)

_Fence proxy async_


Syntax:

```
operation ::= `triton_nvidia_gpu.fence_async_shared` attr-dict
```


Traits: VerifyTensorLayoutsTrait

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>bCluster</code></td><td>::mlir::BoolAttr</td><td>bool attribute</td></tr>
</table>

### `triton_nvidia_gpu.get_agent_id` (triton::nvidia_gpu::GetAgentIdOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.get_agent_id` attr-dict `:` type($result)
```


Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | 32-bit signless integer

### `triton_nvidia_gpu.get_canonical_warp_id` (triton::nvidia_gpu::GetCanonicalWarpId)

Syntax:

```
operation ::= `triton_nvidia_gpu.get_canonical_warp_id` attr-dict `:` type($result)
```

Returns the one dimensional warpId when it's used for producing warp uniform values.

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | 32-bit signless integer

### `triton_nvidia_gpu.get_cluster_cta_id` (triton::nvidia_gpu::GetClusterCTAIdOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.get_cluster_cta_id` attr-dict `:` type($result)
```

Returns the one dimensional cluster_cta_id.

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | 32-bit signless integer

### `triton_nvidia_gpu.get_mutex_role_id` (triton::nvidia_gpu::GetMutexRoleIdOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.get_mutex_role_id` attr-dict `:` type($result)
```


Traits: VerifyTensorLayoutsTrait

Interfaces: InferTypeOpInterface

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>num</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | 32-bit signless integer

### `triton_nvidia_gpu.get_thread_id` (triton::nvidia_gpu::GetThreadIdOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.get_thread_id` attr-dict `:` type($result)
```

Returns the one dimensional threadId.

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | 32-bit signless integer

### `triton_nvidia_gpu.insert_slice_async_v2` (triton::nvidia_gpu::InsertSliceAsyncV2Op)

Syntax:

```
operation ::= `triton_nvidia_gpu.insert_slice_async_v2` operands attr-dict `:` type(operands) `->` type($result)
```


Traits: AttrSizedOperandSegments, ResultsAreSharedEncoding, VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>cache</code></td><td>::mlir::triton::CacheModifierAttr</td><td><details><summary>allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6</summary>{{% markdown %}}Enum cases:
* none (`NONE`)
* ca (`CA`)
* cg (`CG`)
* wb (`WB`)
* cs (`CS`)
* wt (`WT`){{% /markdown %}}</details></td></tr>
<tr><td><code>evict</code></td><td>::mlir::triton::EvictionPolicyAttr</td><td><details><summary>allowed 32-bit signless integer cases: 1, 2, 3</summary>{{% markdown %}}Enum cases:
* evict_normal (`NORMAL`)
* evict_first (`EVICT_FIRST`)
* evict_last (`EVICT_LAST`){{% /markdown %}}</details></td></tr>
<tr><td><code>isVolatile</code></td><td>::mlir::BoolAttr</td><td>bool attribute</td></tr>
<tr><td><code>axis</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `src` | ptr or tensor of ptr values
| `dst` | tensor of floating-point values or tensor of integer values or tensor of ptr values
| `index` | 32-bit signless integer
| `mbar` | ptr
| `mask` | tensor of 1-bit signless integer values or 1-bit signless integer
| `other` | floating-point or tensor of floating-point values or integer or tensor of integer values or ptr or tensor of ptr values or ptr

#### Results:

| Result | Description |
| :----: | ----------- |
| `result` | tensor of floating-point values or tensor of integer values or tensor of ptr values

### `triton_nvidia_gpu.lock` (triton::nvidia_gpu::LockOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.lock` $mutex attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `mutex` | 

### `triton_nvidia_gpu.mbarrier_arrive` (triton::nvidia_gpu::MBarrierArriveOp)

_Mbarrier arrive_


Syntax:

```
operation ::= `triton_nvidia_gpu.mbarrier_arrive` operands attr-dict `:` type(operands)
```

This operation defining the arriving action for a mbarrier.
txCount:
    An optional attribute that set tx-count. This Op will be lowered into
    mbarrier.arrive.expect_tx if the optional attribute exist.
trackAsyncOp:
    If true, this op will be lowered into cp.async.mbarrier.arrive.noinc.
pred:
    Only perform arrive action when pred is true.
remoteCtaId:
    if set, perform an remote arrive action.

Example:

triton_nvidia_gpu.mbarrier_arrive %0 {trackAsyncOp = false} : !tt.ptr<i64>


Traits: AttrSizedOperandSegments, VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>trackAsyncOp</code></td><td>::mlir::IntegerAttr</td><td>1-bit signless integer attribute</td></tr>
<tr><td><code>txCount</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `mbarrier` | ptr
| `pred` | 1-bit signless integer
| `remoteCtaId` | 32-bit signless integer

### `triton_nvidia_gpu.mbarrier_wait` (triton::nvidia_gpu::MBarrierWaitOp)

_Mbarrier wait_


Syntax:

```
operation ::= `triton_nvidia_gpu.mbarrier_wait` $mbarrier `,` $phase attr-dict `:` type($mbarrier)
```

This operation defining the waiting action for a mbarrier.
The subsequent operations should not execute until this operation completes waiting.

Example:

triton_nvidia_gpu.mbarrier_wait %0, %1 : !tt.ptr<i64>


Traits: VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `mbarrier` | ptr
| `phase` | 1-bit signless integer

### `triton_nvidia_gpu.bar_arrive` (triton::nvidia_gpu::NamedBarrierArriveOp)

_Named barrier arrive_


Syntax:

```
operation ::= `triton_nvidia_gpu.bar_arrive` $bar `,` $numThreads attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `bar` | 32-bit signless integer
| `numThreads` | 32-bit signless integer

### `triton_nvidia_gpu.bar_wait` (triton::nvidia_gpu::NamedBarrierWaitOp)

_Named barrier wait_


Syntax:

```
operation ::= `triton_nvidia_gpu.bar_wait` $bar `,` $numThreads attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `bar` | 32-bit signless integer
| `numThreads` | 32-bit signless integer

### `triton_nvidia_gpu.producer_acquire` (triton::nvidia_gpu::ProducerAcquireOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.producer_acquire` $token `,` $idx attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `token` | tensor of  values
| `idx` | 32-bit signless integer

### `triton_nvidia_gpu.producer_commit` (triton::nvidia_gpu::ProducerCommitOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.producer_commit` $token `,` $idx attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `token` | tensor of  values
| `idx` | 32-bit signless integer

### `triton_nvidia_gpu.reg_alloc` (triton::nvidia_gpu::RegAllocOp)

_Register allocation_


Syntax:

```
operation ::= `triton_nvidia_gpu.reg_alloc` $regCount attr-dict
```


Traits: VerifyTensorLayoutsTrait

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>regCount</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

### `triton_nvidia_gpu.reg_dealloc` (triton::nvidia_gpu::RegDeallocOp)

_Register deallocation_


Syntax:

```
operation ::= `triton_nvidia_gpu.reg_dealloc` $regCount attr-dict
```


Traits: VerifyTensorLayoutsTrait

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>regCount</code></td><td>::mlir::IntegerAttr</td><td>32-bit signless integer attribute</td></tr>
</table>

### `triton_nvidia_gpu.store_async` (triton::nvidia_gpu::StoreAsyncOp)

_Store asynchronous by a tensor pointer_


Syntax:

```
operation ::= `triton_nvidia_gpu.store_async` operands attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>cache</code></td><td>::mlir::triton::CacheModifierAttr</td><td><details><summary>allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6</summary>{{% markdown %}}Enum cases:
* none (`NONE`)
* ca (`CA`)
* cg (`CG`)
* wb (`WB`)
* cs (`CS`)
* wt (`WT`){{% /markdown %}}</details></td></tr>
</table>

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `dst` | ptr
| `src` | tensor of floating-point values or tensor of integer values or tensor of ptr values

### `triton_nvidia_gpu.unlock` (triton::nvidia_gpu::UnlockOp)

Syntax:

```
operation ::= `triton_nvidia_gpu.unlock` $mutex attr-dict `:` type(operands)
```


Traits: VerifyTensorLayoutsTrait

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `mutex` |
Attribute	MLIR Type	Description
`allowTF32`	::mlir::BoolAttr	bool attribute
`maxNumImpreciseAcc`	::mlir::IntegerAttr	32-bit signless integer attribute
Attribute	MLIR Type	Description
`cache`	::mlir::triton::CacheModifierAttr	allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6 {{% markdown %}}Enum cases: * none (`NONE`) * ca (`CA`) * cg (`CG`) * wb (`WB`) * cs (`CS`) * wt (`WT`){{% /markdown %}}
`evict`	::mlir::triton::EvictionPolicyAttr	allowed 32-bit signless integer cases: 1, 2, 3 {{% markdown %}}Enum cases: * evict_normal (`NORMAL`) * evict_first (`EVICT_FIRST`) * evict_last (`EVICT_LAST`){{% /markdown %}}
`isVolatile`	::mlir::BoolAttr	bool attribute
`axis`	::mlir::IntegerAttr	32-bit signless integer attribute
Attribute	MLIR Type	Description
`trackAsyncOp`	::mlir::IntegerAttr	1-bit signless integer attribute
`txCount`	::mlir::IntegerAttr	32-bit signless integer attribute