vllm.model_executor.layers.hybrid_attn_layer ¶
HybridAttentionLayer ¶
Bases: Attention, AttentionLayerBase
Attention layer that fuses sliding-window KV with an SSM history branch.
This layer is a thin wrapper around the standard Attention module that:
- Forces the use of HybridAttentionBackend for its attention backend.
- Owns a HybridSSMAdapter instance representing the history branch.
- Reuses Attention.get_kv_cache_spec so it continues to expose either a SlidingWindowSpec or a FullAttentionSpec for its KV cache.
Source code in vllm/model_executor/layers/hybrid_attn_layer.py
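A minimal sketch of how those pieces could fit together, assuming HybridAttentionBackend and HybridSSMAdapter are in scope as named in this page; this is only an illustration of the responsibilities listed above, not the layer's actual source (model_config plumbing is omitted for brevity):

# Illustrative sketch only, not the real source. HybridAttentionBackend and
# HybridSSMAdapter are assumed to be importable; their paths are not shown
# on this page.
from vllm.attention import Attention

class _HybridAttentionLayerSketch(Attention):
    def __init__(self, num_heads, head_size, scale, *,
                 ssm_state_size, ssm_conv_kernel_size,
                 ssm_intermediate_size, cache_config=None,
                 prefix="", **extra_impl_args):
        super().__init__(num_heads, head_size, scale,
                         cache_config=cache_config, prefix=prefix,
                         **extra_impl_args)
        # History branch owned by the layer (model_config omitted here).
        self.ssm_adapter = HybridSSMAdapter(
            hidden_size=num_heads * head_size,
            ssm_state_size=ssm_state_size,
            conv_kernel_size=ssm_conv_kernel_size,
            intermediate_size=ssm_intermediate_size,
            cache_config=cache_config,
            prefix=f"{prefix}.ssm",
        )

    def get_attn_backend(self):
        # Force the hybrid backend instead of the auto-selected one.
        return HybridAttentionBackend

    # get_kv_cache_spec is deliberately not overridden, so the layer keeps
    # exposing a SlidingWindowSpec or FullAttentionSpec via Attention.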
ssm_adapter instance-attribute ¶
ssm_adapter = HybridSSMAdapter(
hidden_size=num_heads * head_size,
ssm_state_size=ssm_state_size,
conv_kernel_size=ssm_conv_kernel_size,
intermediate_size=ssm_intermediate_size,
model_config=model_config,
    cache_config=cache_config,
prefix=f"{prefix}.ssm",
)
__init__ ¶
__init__(
num_heads: int,
head_size: int,
scale: float,
num_kv_heads: int | None = None,
*,
ssm_state_size: int,
ssm_conv_kernel_size: int,
ssm_intermediate_size: int,
cache_config: CacheConfig | None = None,
prefix: str = "",
**extra_impl_args,
) -> None
Source code in vllm/model_executor/layers/hybrid_attn_layer.py
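A hypothetical construction call matching the signature above; every value below is a placeholder chosen for illustration, and cache_config stands for an already-built vllm.config.CacheConfig (or None):

# Hypothetical usage; all sizes are placeholders, not real model values.
layer = HybridAttentionLayer(
    num_heads=32,
    head_size=128,
    scale=128 ** -0.5,
    num_kv_heads=8,
    ssm_state_size=16,
    ssm_conv_kernel_size=4,
    ssm_intermediate_size=8192,
    cache_config=cache_config,  # an existing CacheConfig, or None
    prefix="model.layers.0.self_attn",
)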
get_attn_backend ¶
get_attn_backend() -> type[AttentionBackend]
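Per the class docstring the hybrid backend is always selected; a hedged usage check, assuming HybridAttentionBackend is importable alongside this layer:

# Assumption: HybridAttentionBackend is in scope; `layer` is the instance
# constructed in the example above.
assert layer.get_attn_backend() is HybridAttentionBackend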
get_kv_cache_spec ¶
get_kv_cache_spec(vllm_config: VllmConfig) -> KVCacheSpec
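Because the implementation is inherited from Attention, the spec the layer advertises depends on whether it was configured with a sliding window. An illustrative check, assuming the v1 spec classes live in vllm.v1.kv_cache_interface and that vllm_config is a fully built VllmConfig:

from vllm.v1.kv_cache_interface import FullAttentionSpec, SlidingWindowSpec

# The layer reuses Attention.get_kv_cache_spec, so the result is one of the
# two specs named in the class docstring.
spec = layer.get_kv_cache_spec(vllm_config)
assert isinstance(spec, (SlidingWindowSpec, FullAttentionSpec))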