alireza/vosk-datacleaner


Alireza 0d151529f0 optimization

2025-08-02 17:46:06 +03:30


192-Core Optimization Guide

This guide explains how to tune the audio processing pipeline so that all 192 CPU cores stay as close to full utilization as possible.

🚀 Quick Start

Install dependencies:

pip install -r requirements_optimized.txt

Run the optimized pipeline:
```
./run_optimized_192cores.sh
```
Monitor performance:
```
python monitor_performance.py
```

📊 Key Optimizations Implemented

1. Asynchronous Processing

aiohttp for concurrent HTTP requests
asyncio for non-blocking I/O operations
ProcessPoolExecutor for CPU-intensive tasks
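
A minimal sketch of how these three pieces might fit together, using only the standard library. The names `run_pipeline`, `fake_fetch`, and `square` are hypothetical and not taken from the repository: `fake_fetch` stands in for an aiohttp request and `square` for the CPU-heavy step. The executor is passed in, so a `ProcessPoolExecutor` can replace the thread pool used here for the demo.

```python
import asyncio
from concurrent.futures import Executor, ThreadPoolExecutor


async def run_pipeline(items, fetch, cpu_work, executor: Executor, max_concurrent: int):
    """Fetch each item concurrently, then hand the result to a worker pool."""
    loop = asyncio.get_running_loop()
    semaphore = asyncio.Semaphore(max_concurrent)  # caps in-flight requests

    async def handle(item):
        async with semaphore:
            payload = await fetch(item)  # non-blocking I/O
        # The CPU-bound step runs in the pool so the event loop stays responsive.
        return await loop.run_in_executor(executor, cpu_work, payload)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(handle(item) for item in items))


async def fake_fetch(item):
    """Hypothetical stand-in for an HTTP request."""
    await asyncio.sleep(0)
    return item * 2


def square(value):
    """Hypothetical stand-in for CPU-intensive work."""
    return value * value


if __name__ == "__main__":
    # Swap in ProcessPoolExecutor(max_workers=...) for real CPU-bound work.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(asyncio.run(run_pipeline(range(5), fake_fetch, square, pool, 2)))
```

The semaphore only limits the I/O stage; the pool size separately limits how many CPU tasks run at once.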

2. Parallel Processing Strategy

# Configuration for 192 cores
NUM_CORES = 192
BATCH_SIZE = 32  # Increased for better throughput
MAX_CONCURRENT_REQUESTS = 48  # 192/4 for optimal concurrency
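
If the same script has to run on machines with a different core count, the values above could be derived rather than hard-coded. The sketch below assumes the cores/4 ratio from the comment above carries over to other machines, which the guide does not state; `derive_settings` is a hypothetical helper.

```python
import os


def derive_settings(num_cores=None):
    """Scale the guide's 192-core settings to another core count."""
    cores = num_cores or os.cpu_count() or 1
    return {
        "NUM_CORES": cores,
        "BATCH_SIZE": 32,  # value used in this guide
        # Same cores/4 ratio as the guide; never drop below one request.
        "MAX_CONCURRENT_REQUESTS": max(1, cores // 4),
    }


print(derive_settings(192))
```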

3. Memory-Efficient Processing

Streaming data processing
Chunked batch processing
Parallel file I/O operations
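
Streaming and chunked batching can be illustrated with a small generator. This is a sketch, not the repository's code: `chunked` and `process_stream` are hypothetical names, and the default sizes (1000 and 32) are the ones quoted elsewhere in this guide.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items without materializing the input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def process_stream(samples, chunk_size=1000, batch_size=32):
    """Walk a stream chunk by chunk, then batch by batch; return the batch count."""
    batches = 0
    for chunk in chunked(samples, chunk_size):
        for batch in chunked(chunk, batch_size):
            batches += 1  # a real pipeline would submit `batch` here
    return batches
```

Only one chunk is held in memory at a time, so peak memory depends on `chunk_size` rather than on the dataset size.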

4. System-Level Optimizations

CPU governor set to performance mode
Increased file descriptor limits
Process priority optimization
Environment variables for thread optimization
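
Two of these settings can also be applied from inside Python. The sketch below is an assumption about how that might look, not what `run_optimized_192cores.sh` does: the guide does not name the environment variables, so the three thread-count variables here (read by OpenMP, MKL, and OpenBLAS) are illustrative, as are the helper names and the 65536 default.

```python
import os
import resource  # Unix only


def raise_fd_limit(desired=65536):
    """Raise the soft open-file limit toward `desired`, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the OS refused; keep the current limit
    return target


def limit_library_threads(threads=1):
    """Keep numeric libraries from oversubscribing cores inside each worker."""
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[name] = str(threads)
```

`limit_library_threads` only has an effect if it runs before the numeric libraries are imported, since they typically read these variables at load time.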

🔧 Configuration Details

Batch Processing

Batch Size: 32 samples per batch
Concurrent Requests: 48 simultaneous API calls
Process Pool Workers: 192 parallel processes

Memory Management

Chunk Size: 1000 samples per chunk
Streaming: enabled (streaming=True) for large datasets
Parallel Sharding: 50 shards for optimal I/O

Network Optimization

Connection Pool: 48 concurrent connections
Timeout: 120 seconds per request
Retry Logic: Built-in error handling
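
The guide does not show how the retry logic is implemented. One common pattern is a per-attempt timeout with exponential backoff, sketched here with the standard library only. `call_with_retries` is a hypothetical helper; a real client would probably also catch its HTTP library's own exception types.

```python
import asyncio


async def call_with_retries(make_call, attempts=3, base_delay=1.0, timeout=120.0):
    """Await make_call() with a per-attempt timeout and exponential backoff."""
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, OSError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** attempt)
```

`make_call` is a zero-argument function returning a fresh coroutine, because a coroutine object cannot be awaited twice.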

📈 Performance Monitoring

Real-time Monitoring

python monitor_performance.py

Metrics Tracked

CPU utilization per core
Memory usage
Network I/O
Disk I/O
Load average

Performance Targets

CPU Utilization: >90% across all cores
Memory Usage: <80% of available RAM
Processing Rate: >1000 samples/second
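
These targets can be checked mechanically once the numbers are collected. The sketch below assumes that ">90% across all cores" refers to the mean over cores, which is one possible reading; `check_targets` is a hypothetical helper and does not collect the measurements itself.

```python
def check_targets(cpu_percent_per_core, memory_percent, samples_per_second):
    """Return human-readable misses against the guide's performance targets."""
    misses = []
    mean_cpu = sum(cpu_percent_per_core) / len(cpu_percent_per_core)
    if mean_cpu <= 90:
        misses.append(f"mean CPU utilization {mean_cpu:.1f}% is not above 90%")
    if memory_percent >= 80:
        misses.append(f"memory usage {memory_percent:.1f}% is not below 80%")
    if samples_per_second <= 1000:
        misses.append(f"rate {samples_per_second:.0f} samples/s is not above 1000")
    return misses


print(check_targets([95.0] * 192, 70.0, 1500.0))
```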

🛠️ Troubleshooting

Low CPU Utilization (<50%)

Increase batch size:
```
BATCH_SIZE = 64  # or higher
```
Increase concurrent requests:
```
MAX_CONCURRENT_REQUESTS = 96  # 192/2
```
Check I/O bottlenecks:
- Monitor disk usage
- Check network bandwidth
- Verify API response times

High Memory Usage (>90%)

Reduce batch size:
```
BATCH_SIZE = 16  # or lower
```
Enable streaming:
```
ds = load_dataset(..., streaming=True)
```
Process in smaller chunks:
```
CHUNK_SIZE = 500  # reduce from 1000
```

Network Bottlenecks

Reduce concurrent requests:

MAX_CONCURRENT_REQUESTS = 24  # reduce from 48

Increase timeout:

timeout=aiohttp.ClientTimeout(total=300)

Use connection pooling:

connector=aiohttp.TCPConnector(limit=MAX_CONCURRENT_REQUESTS)
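
The three fragments above could be combined into one session as follows. This is a sketch under assumptions: `fetch_statuses` is a hypothetical function, the URLs are supplied by the caller, and aiohttp is imported inside the function only so that the module still loads where it is not installed.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 24  # reduced from 48


async def fetch_statuses(urls):
    """GET every URL through one pooled session and return the status codes."""
    import aiohttp  # third-party dependency

    # The connector caps simultaneous connections; the timeout covers each
    # request from start to finish.
    connector = aiohttp.TCPConnector(limit=MAX_CONCURRENT_REQUESTS)
    timeout = aiohttp.ClientTimeout(total=300)
    async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:

        async def status(url):
            async with session.get(url) as response:
                return response.status

        return await asyncio.gather(*(status(url) for url in urls))
```

Sharing a single session across all requests is what lets the connection pool be reused.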

🔄 Advanced Optimizations

1. Custom Process Pool Configuration

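# For CPU-intensive tasks
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# 'spawn' re-imports the main module in each worker, so run this under
# an `if __name__ == "__main__":` guard.
with ProcessPoolExecutor(
    max_workers=NUM_CORES,
    mp_context=mp.get_context('spawn')
) as executor:
    results = list(executor.map(process_function, data))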
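
With many small tasks, the `chunksize` argument of `executor.map` reduces inter-process overhead by sending items to workers in groups. The sketch below shows one possible heuristic, a few chunks per worker; `pick_chunksize`, `parallel_map`, and the factor of 4 are assumptions rather than values from this guide.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def pick_chunksize(n_items, n_workers, chunks_per_worker=4):
    """Choose a map() chunksize that gives each worker a few chunks."""
    return max(1, math.ceil(n_items / (n_workers * chunks_per_worker)))


def parallel_map(func, items, n_workers):
    """Map a picklable, module-level `func` over `items` in worker processes."""
    items = list(items)
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        chunksize = pick_chunksize(len(items), n_workers)
        return list(executor.map(func, items, chunksize=chunksize))
```

For 192,000 items on 192 workers this heuristic gives chunks of 250 items. Several chunks per worker, rather than one, helps even out the load when some items take longer than others.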

2. Memory-Mapped Files

import mmap

def process_large_file(filename):
    with open(filename, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Process memory-mapped file
            pass
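
A slightly fuller sketch of the same idea reads the mapped file in fixed-size chunks. `iter_mapped_chunks` and the 1 MiB default are hypothetical choices. Empty files are skipped because `mmap` cannot map a zero-length file.

```python
import mmap
import os


def iter_mapped_chunks(filename, chunk_size=1 << 20):
    """Yield successive byte chunks of a file through a read-only memory map."""
    if os.path.getsize(filename) == 0:
        return  # nothing to map
    with open(filename, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, len(mm), chunk_size):
                # Slicing copies the range into a bytes object, so each
                # chunk stays valid after the map is closed.
                yield mm[start:start + chunk_size]
```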

3. NUMA Optimization (for multi-socket systems)

# Bind processes to specific NUMA nodes
numactl --cpunodebind=0 --membind=0 python script.py
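
CPU pinning can also be done from inside a worker process on Linux. The sketch below is only a partial analogue of the `numactl` command: it restricts which CPUs the process runs on but does not bind memory to a node. `pin_to_cpus` is a hypothetical helper, and which CPU ids belong to which NUMA node depends on the machine.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the current process to `cpus`; return the resulting CPU set."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity control requires Linux")
    os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process
    return os.sched_getaffinity(0)
```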

4. GPU Acceleration (if available)

# Use GPU for audio processing if available
import torch

# Fall back to the CPU so that `device` is always defined
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Move audio tensors and models to `device` before processing

📊 Expected Performance

Baseline Performance

192 cores: 100% utilization target
Processing rate: 1000-2000 samples/second
Memory usage: 60-80% of available RAM
Network throughput: 1-2 GB/s (8-16 Gbps, so the upper end needs more than the 10 Gbps minimum link)
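
A quick calculation shows what these figures imply together. If all 192 cores are fully busy, each second provides 192 core-seconds of CPU time, so the processing rate fixes the average CPU time spent per sample. `per_sample_budget_ms` is a hypothetical helper for this arithmetic.

```python
def per_sample_budget_ms(cores, samples_per_second):
    """Average core time per sample, in milliseconds, if every core stays busy."""
    return cores * 1000 / samples_per_second


for rate in (1000, 2000):
    budget = per_sample_budget_ms(192, rate)
    print(f"{rate} samples/s -> {budget} ms of core time per sample")
```

At 1000 samples/s this works out to 192 ms of core time per sample, and at 2000 samples/s to 96 ms. If a sample needs less CPU time than that, the cores cannot all be saturated at the stated rate, and the bottleneck is more likely I/O or the API.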

Optimization Targets

CPU Efficiency: >95%
Memory Efficiency: >85%
I/O Efficiency: >90%
Network Efficiency: >80%

🎯 Monitoring Commands

System Resources

# CPU usage (htop -p expects a comma-separated PID list)
htop -p "$(pgrep -d, -f 'python.*batch_confirm')"

# Memory usage
free -h

# Network I/O
iftop

# Disk I/O
iotop

Process Monitoring

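# Process tree (pstree takes a single PID; -o picks the oldest match)
pstree -p "$(pgrep -o -f 'python.*batch_confirm')"

# Resource usage per process (the brackets keep grep from matching itself)
ps aux | grep '[p]ython'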

🔧 System Requirements

Minimum Requirements

CPU: 192 cores (any architecture)
RAM: 256 GB
Storage: 1 TB SSD
Network: 10 Gbps

Recommended Requirements

CPU: 192 cores (AMD EPYC or Intel Xeon)
RAM: 512 GB
Storage: 2 TB NVMe SSD
Network: 25 Gbps

🚨 Important Notes

Memory Management: Monitor memory usage closely
Network Limits: Ensure sufficient bandwidth
API Limits: Check Vosk service capacity
Storage I/O: Use fast storage for temporary files
Process Limits: Increase system limits if needed

📞 Support

If you encounter issues:

Check the performance logs
Monitor system resources
Adjust configuration parameters
Review the troubleshooting section

For optimal performance, ensure your system meets the recommended requirements and follow the monitoring guidelines.
