
Step 3.5 Flash

Step 3.5 Flash is Stepfun's lightweight model designed for fast text generation, featuring a 262K token context window and high throughput performance.

Context 262K
Tier Lightweight
Input from $0.100 / 1M tokens, across 1 provider

API Pricing

Provider   Input / 1M   Output / 1M   Speed     TTFT     Updated
—          $0.100       $0.300       248 t/s   709 ms   4/14/2026

Prices updated daily. Last check: 4/14/2026
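At the listed rates, per-request cost is simple to estimate. A minimal sketch (the rates below are copied from the table above and may change; the 50K/1K example is illustrative):

```python
# Rough cost estimate for Step 3.5 Flash at the listed rates.
# Rates change; check the pricing table above for current values.

INPUT_PER_M = 0.100   # USD per 1M input tokens
OUTPUT_PER_M = 0.300  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: summarizing a 50K-token document into a 1K-token summary
print(round(estimate_cost(50_000, 1_000), 4))  # → 0.0053
```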

Model Details

General

Creator
Stepfun
Family
Step
Tier
Lightweight
Context Window
262K
Modalities
Text

Capabilities

Tool Calling
No
Open Source
No

Strengths & Limitations

  • Fast token generation at 169.5 tokens per second
  • Large 262K token context window for processing lengthy documents
  • Quick response initiation with 812ms time-to-first-token
  • Lightweight architecture optimized for speed over complexity
  • Suitable for high-throughput text processing workflows
  • Streamlined feature set reduces overhead for basic text tasks
  • No tool calling or function execution capabilities
  • Text-only modality: no image or multimodal input support
  • Proprietary model with no open-source weights available
  • Positioned as the lightweight tier within the Step family
  • Fewer advanced reasoning capabilities than flagship Step models

Key Features

262,144 token context window
Text input and output processing
Streaming response generation
Fast token generation (169.5 tokens/second)
Quick response initiation (812ms TTFT)
Lightweight model architecture
High-throughput text processing
API access through multiple providers
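The benchmark figures above translate into a simple end-to-end latency estimate: time-to-first-token plus generation time for the remaining tokens. A rough planning sketch using the quoted numbers (real latency varies by provider and load):

```python
# Back-of-envelope response-time estimate from the benchmark figures
# quoted above (812 ms TTFT, 169.5 tokens/s). Treat as approximate.

TTFT_S = 0.812        # time to first token, seconds
TOKENS_PER_S = 169.5  # sustained generation speed

def estimated_latency(output_tokens: int) -> float:
    """Seconds until a response of `output_tokens` finishes streaming."""
    return TTFT_S + output_tokens / TOKENS_PER_S

# Example: a 500-token reply
print(f"{estimated_latency(500):.2f}s")  # → 3.76s
```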

About Step 3.5 Flash

Step 3.5 Flash is a lightweight language model developed by Stepfun, positioned as the fast-generation option within the Step model family. It emphasizes speed and efficiency over maximum capability, making it suitable for applications that require quick responses and high throughput.

The model supports a 262,144-token context window, allowing it to process substantial amounts of text in a single request. Performance benchmarks show Step 3.5 Flash generating tokens at 169.5 tokens per second with a time-to-first-token of 812 milliseconds.

The model handles text-only interactions and does not include tool calling, keeping its feature set streamlined for core text generation tasks. Step 3.5 Flash targets use cases where response speed and processing efficiency are prioritized over complex reasoning or multimodal capabilities. Its combination of a large context window and fast generation makes it practical for content processing, summarization, and other text-heavy workflows where quick turnaround is essential.
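Before sending a long document, it is worth checking whether it plausibly fits in the 262,144-token window. A minimal sketch, assuming a rough 4-characters-per-token heuristic for English text (this is an assumption, not Stepfun's tokenizer; use the provider's tokenizer for exact counts):

```python
# Quick fit check against the 262,144-token context window.
# CHARS_PER_TOKEN is a rough English-text heuristic (assumption),
# not Stepfun's actual tokenizer.

CONTEXT_WINDOW = 262_144
CHARS_PER_TOKEN = 4  # rough heuristic

def fits_in_context(text: str, reserved_output: int = 4_096) -> bool:
    """Approximate whether `text` plus a reserved output budget fits."""
    approx_tokens = len(text) / CHARS_PER_TOKEN
    return approx_tokens + reserved_output <= CONTEXT_WINDOW

# ~600K characters ≈ 150K tokens: comfortably inside the window
print(fits_in_context("hello " * 100_000))  # → True
```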

Common Use Cases

Step 3.5 Flash is designed for applications requiring fast text generation and high throughput processing. Its large context window combined with quick generation speeds makes it well-suited for document summarization, content processing pipelines, customer service automation, and real-time text analysis. The model works effectively for workflows that need to process substantial text volumes quickly, such as content moderation, text classification at scale, or generating responses in chat applications where speed is prioritized. Its lightweight nature makes it cost-effective for high-volume deployments where complex reasoning capabilities are not required.

Frequently Asked Questions

How much does Step 3.5 Flash cost per million tokens?

Step 3.5 Flash pricing varies by provider and may include different rates for input and output tokens. Check the pricing table above for current rates across all available providers offering this model.

What is Step 3.5 Flash best used for?

Step 3.5 Flash excels at high-throughput text processing tasks where speed is important. With its 169.5 tokens per second generation rate and large 262K context window, it's ideal for document summarization, content processing pipelines, customer service automation, and real-time text analysis where quick responses matter more than complex reasoning.

Does Step 3.5 Flash support tool calling or multimodal inputs?

No, Step 3.5 Flash is designed as a streamlined text-only model without tool calling capabilities or support for images and other modalities. This focused approach contributes to its fast performance characteristics and makes it suitable for pure text generation tasks.