optimization

2025-08-02 17:46:06 +03:30
parent 474245fe83
commit 0d151529f0
7 changed files with 1209 additions and 0 deletions
--- a/vosk/test_files/OPTIMIZATION_GUIDE.md
+++ b/vosk/test_files/OPTIMIZATION_GUIDE.md
@@ -0,0 +1,243 @@
+# 192-Core Optimization Guide
+
+This guide explains how to optimize your audio processing pipeline to utilize 192 CPU cores at 100% capacity.
+
+## 🚀 Quick Start
+
+1. **Install dependencies:**
+   ```bash
+   pip install -r requirements_optimized.txt
+   ```
+
+2. **Run the optimized pipeline:**
+   ```bash
+   ./run_optimized_192cores.sh
+   ```
+
+3. **Monitor performance:**
+   ```bash
+   python monitor_performance.py
+   ```
+
+## 📊 Key Optimizations Implemented
+
+### 1. **Asynchronous Processing**
+- **aiohttp** for concurrent HTTP requests
+- **asyncio** for non-blocking I/O operations
+- **ProcessPoolExecutor** for CPU-intensive tasks
+
+### 2. **Parallel Processing Strategy**
+```python
+# Configuration for 192 cores
+NUM_CORES = 192
+BATCH_SIZE = 32  # Increased for better throughput
+MAX_CONCURRENT_REQUESTS = 48  # 192/4 for optimal concurrency
+```
+
+### 3. **Memory-Efficient Processing**
+- Streaming data processing
+- Chunked batch processing
+- Parallel file I/O operations
+
+### 4. **System-Level Optimizations**
+- CPU governor set to performance mode
+- Increased file descriptor limits
+- Process priority optimization
+- Environment variables for thread optimization
+
+## 🔧 Configuration Details
+
+### Batch Processing
+- **Batch Size**: 32 samples per batch
+- **Concurrent Requests**: 48 simultaneous API calls
+- **Process Pool Workers**: 192 parallel processes
+
+### Memory Management
+- **Chunk Size**: 1000 samples per chunk
+- **Streaming**: True for large datasets
+- **Parallel Sharding**: 50 shards for optimal I/O
+
+### Network Optimization
+- **Connection Pool**: 48 concurrent connections
+- **Timeout**: 120 seconds per request
+- **Retry Logic**: Built-in error handling
+
+## 📈 Performance Monitoring
+
+### Real-time Monitoring
+```bash
+python monitor_performance.py
+```
+
+### Metrics Tracked
+- CPU utilization per core
+- Memory usage
+- Network I/O
+- Disk I/O
+- Load average
+
+### Performance Targets
+- **CPU Utilization**: >90% across all cores
+- **Memory Usage**: <80% of available RAM
+- **Processing Rate**: >1000 samples/second
+
+## 🛠️ Troubleshooting
+
+### Low CPU Utilization (<50%)
+1. **Increase batch size:**
+   ```python
+   BATCH_SIZE = 64  # or higher
+   ```
+
+2. **Increase concurrent requests:**
+   ```python
+   MAX_CONCURRENT_REQUESTS = 96  # 192/2
+   ```
+
+3. **Check I/O bottlenecks:**
+   - Monitor disk usage
+   - Check network bandwidth
+   - Verify API response times
+
+### High Memory Usage (>90%)
+1. **Reduce batch size:**
+   ```python
+   BATCH_SIZE = 16  # or lower
+   ```
+
+2. **Enable streaming:**
+   ```python
+   ds = load_dataset(..., streaming=True)
+   ```
+
+3. **Process in smaller chunks:**
+   ```python
+   CHUNK_SIZE = 500  # reduce from 1000
+   ```
+
+### Network Bottlenecks
+1. **Reduce concurrent requests:**
+   ```python
+   MAX_CONCURRENT_REQUESTS = 24  # reduce from 48
+   ```
+
+2. **Increase timeout:**
+   ```python
+   timeout=aiohttp.ClientTimeout(total=300)
+   ```
+
+3. **Use connection pooling:**
+   ```python
+   connector=aiohttp.TCPConnector(limit=MAX_CONCURRENT_REQUESTS)
+   ```
+
+## 🔄 Advanced Optimizations
+
+### 1. **Custom Process Pool Configuration**
+```python
+# For CPU-intensive tasks
+with ProcessPoolExecutor(
+    max_workers=NUM_CORES,
+    mp_context=mp.get_context('spawn')
+) as executor:
+    results = executor.map(process_function, data)
+```
+
+### 2. **Memory-Mapped Files**
+```python
+import mmap
+
+def process_large_file(filename):
+    with open(filename, 'rb') as f:
+        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
+            # Process memory-mapped file
+            pass
+```
+
+### 3. **NUMA Optimization** (for multi-socket systems)
+```bash
+# Bind processes to specific NUMA nodes
+numactl --cpunodebind=0 --membind=0 python script.py
+```
+
+### 4. **GPU Acceleration** (if available)
+```python
+# Use GPU for audio processing if available
+import torch
+
+if torch.cuda.is_available():
+    device = torch.device('cuda')
+    # Move audio processing to GPU
+```
+
+## 📊 Expected Performance
+
+### Baseline Performance
+- **192 cores**: 100% utilization target
+- **Processing rate**: 1000-2000 samples/second
+- **Memory usage**: 60-80% of available RAM
+- **Network throughput**: 1-2 GB/s
+
+### Optimization Targets
+- **CPU Efficiency**: >95%
+- **Memory Efficiency**: >85%
+- **I/O Efficiency**: >90%
+- **Network Efficiency**: >80%
+
+## 🎯 Monitoring Commands
+
+### System Resources
+```bash
+# CPU usage
+htop -p $(pgrep -f "python.*batch_confirm")
+
+# Memory usage
+free -h
+
+# Network I/O
+iftop
+
+# Disk I/O
+iotop
+```
+
+### Process Monitoring
+```bash
+# Process tree
+pstree -p $(pgrep -f "python.*batch_confirm")
+
+# Resource usage per process
+ps aux | grep python
+```
+
+## 🔧 System Requirements
+
+### Minimum Requirements
+- **CPU**: 192 cores (any architecture)
+- **RAM**: 256 GB
+- **Storage**: 1 TB SSD
+- **Network**: 10 Gbps
+
+### Recommended Requirements
+- **CPU**: 192 cores (AMD EPYC or Intel Xeon)
+- **RAM**: 512 GB
+- **Storage**: 2 TB NVMe SSD
+- **Network**: 25 Gbps
+
+## 🚨 Important Notes
+
+1. **Memory Management**: Monitor memory usage closely
+2. **Network Limits**: Ensure sufficient bandwidth
+3. **API Limits**: Check Vosk service capacity
+4. **Storage I/O**: Use fast storage for temporary files
+5. **Process Limits**: Increase system limits if needed
+
+## 📞 Support
+
+If you encounter issues:
+1. Check the performance logs
+2. Monitor system resources
+3. Adjust configuration parameters
+4. Review the troubleshooting section
+
+For optimal performance, ensure your system meets the recommended requirements and follow the monitoring guidelines.