# 192-Core Optimization Guide

This guide explains how to optimize your audio processing pipeline to utilize 192 CPU cores at 100% capacity.

## 🚀 Quick Start

1. **Install dependencies:**

   ```bash
   pip install -r requirements_optimized.txt
   ```

2. **Run the optimized pipeline:**

   ```bash
   ./run_optimized_192cores.sh
   ```

3. **Monitor performance:**

   ```bash
   python monitor_performance.py
   ```

## 📊 Key Optimizations Implemented

### 1. **Asynchronous Processing**

- **aiohttp** for concurrent HTTP requests
- **asyncio** for non-blocking I/O operations
- **ProcessPoolExecutor** for CPU-intensive tasks

### 2. **Parallel Processing Strategy**

```python
# Configuration for 192 cores
NUM_CORES = 192
BATCH_SIZE = 32               # Increased for better throughput
MAX_CONCURRENT_REQUESTS = 48  # NUM_CORES / 4 for optimal concurrency
```

### 3. **Memory-Efficient Processing**

- Streaming data processing
- Chunked batch processing
- Parallel file I/O operations

### 4. **System-Level Optimizations**

- CPU governor set to performance mode
- Increased file descriptor limits
- Process priority optimization
- Environment variables for thread optimization

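The in-process parts of these tunings can be sketched from Python; the governor change (`cpupower frequency-set -g performance`) and process priority (`nice`/`renice`) are applied from a root shell instead. The limit value below is an assumption, not the pipeline's actual setting:

```python
import os
import resource

# Pin numeric libraries to one thread each: parallelism comes from the 192
# worker processes, not from intra-process BLAS/OpenMP thread pools.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"

# Raise the soft open-file limit toward the hard limit; each worker keeps
# sockets and temporary files open.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = hard if hard != resource.RLIM_INFINITY else 65536
resource.setrlimit(resource.RLIMIT_NOFILE, (min(65536, target), hard))
```

Setting the thread-count variables before importing numpy/torch matters, since those libraries read them at import time.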
## 🔧 Configuration Details

### Batch Processing

- **Batch Size**: 32 samples per batch
- **Concurrent Requests**: 48 simultaneous API calls
- **Process Pool Workers**: 192 parallel processes

### Memory Management

- **Chunk Size**: 1000 samples per chunk
- **Streaming**: True for large datasets
- **Parallel Sharding**: 50 shards for optimal I/O

### Network Optimization

- **Connection Pool**: 48 concurrent connections
- **Timeout**: 120 seconds per request
- **Retry Logic**: Built-in error handling

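The values above can be collected in one place so they stay consistent across the pipeline. A minimal sketch (the class and field names are assumptions, not the pipeline's actual config object):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineConfig:
    # Values from the sections above; one worker process per core.
    num_cores: int = 192
    batch_size: int = 32
    max_concurrent_requests: int = 48   # num_cores / 4
    chunk_size: int = 1000
    num_shards: int = 50
    request_timeout_s: int = 120

CONFIG = PipelineConfig()
```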
## 📈 Performance Monitoring

### Real-time Monitoring

```bash
python monitor_performance.py
```

### Metrics Tracked

- CPU utilization per core
- Memory usage
- Network I/O
- Disk I/O
- Load average

### Performance Targets

- **CPU Utilization**: >90% across all cores
- **Memory Usage**: <80% of available RAM
- **Processing Rate**: >1000 samples/second

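A stdlib-only sketch of the kind of snapshot such a monitor can report (a full monitor would also sample per-core CPU, memory, network and disk I/O, e.g. via psutil; the function name is an assumption, not `monitor_performance.py`'s actual API):

```python
import os
import time

def snapshot() -> dict:
    # One-minute load average divided by core count approximates overall
    # CPU saturation: near 1.0 means every core is busy.
    load1, _load5, _load15 = os.getloadavg()
    cores = os.cpu_count() or 1
    return {
        "timestamp": time.time(),
        "cores": cores,
        "load_1m": load1,
        "load_per_core": load1 / cores,
    }

snap = snapshot()
```

Sampling this in a loop and logging it alongside the processing rate makes it easy to spot the moment utilization drops below target.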
## 🛠️ Troubleshooting

### Low CPU Utilization (<50%)

1. **Increase batch size:**

   ```python
   BATCH_SIZE = 64  # or higher
   ```

2. **Increase concurrent requests:**

   ```python
   MAX_CONCURRENT_REQUESTS = 96  # NUM_CORES / 2
   ```

3. **Check I/O bottlenecks:**
   - Monitor disk usage
   - Check network bandwidth
   - Verify API response times

### High Memory Usage (>90%)

1. **Reduce batch size:**

   ```python
   BATCH_SIZE = 16  # or lower
   ```

2. **Enable streaming:**

   ```python
   ds = load_dataset(..., streaming=True)
   ```

3. **Process in smaller chunks:**

   ```python
   CHUNK_SIZE = 500  # reduced from 1000
   ```

### Network Bottlenecks

1. **Reduce concurrent requests:**

   ```python
   MAX_CONCURRENT_REQUESTS = 24  # reduced from 48
   ```

2. **Increase the timeout:**

   ```python
   timeout=aiohttp.ClientTimeout(total=300)
   ```

3. **Use connection pooling:**

   ```python
   connector=aiohttp.TCPConnector(limit=MAX_CONCURRENT_REQUESTS)
   ```

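Connector limits cap open sockets, but request concurrency is usually gated with a semaphore as well, so a burst of tasks cannot queue unbounded work against the API. A stdlib-only sketch of that gating pattern (the aiohttp call is stubbed out with `fake_transcribe`, an assumed name):

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 48

async def fake_transcribe(i: int) -> int:
    # Stand-in for the real aiohttp POST to the transcription service.
    await asyncio.sleep(0)
    return i

async def bounded(sem: asyncio.Semaphore, i: int) -> int:
    async with sem:  # at most MAX_CONCURRENT_REQUESTS requests in flight
        return await fake_transcribe(i)

async def main() -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    return await asyncio.gather(*(bounded(sem, i) for i in range(100)))

results = asyncio.run(main())
```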
## 🔄 Advanced Optimizations

### 1. **Custom Process Pool Configuration**

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# For CPU-intensive tasks; 'spawn' gives workers a clean interpreter
# instead of inheriting threads and open sockets from the parent.
with ProcessPoolExecutor(
    max_workers=NUM_CORES,
    mp_context=mp.get_context('spawn')
) as executor:
    results = list(executor.map(process_function, data))
```

### 2. **Memory-Mapped Files**

```python
import mmap

def process_large_file(filename):
    with open(filename, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Process the memory-mapped file without loading it into RAM
            pass
```

### 3. **NUMA Optimization** (for multi-socket systems)

```bash
# Bind processes to specific NUMA nodes
numactl --cpunodebind=0 --membind=0 python script.py
```

### 4. **GPU Acceleration** (if available)

```python
# Use the GPU for audio processing if available
import torch

if torch.cuda.is_available():
    device = torch.device('cuda')
    # Move audio tensors and models to `device`
```

## 📊 Expected Performance

### Baseline Performance

- **192 cores**: 100% utilization target
- **Processing rate**: 1000-2000 samples/second
- **Memory usage**: 60-80% of available RAM
- **Network throughput**: 1-2 GB/s

### Optimization Targets

- **CPU Efficiency**: >95%
- **Memory Efficiency**: >85%
- **I/O Efficiency**: >90%
- **Network Efficiency**: >80%

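As a back-of-envelope check, the pipeline-wide rate target translates to a modest per-core budget, which is what makes full utilization plausible for CPU-bound audio work:

```python
# Per-core throughput implied by the targets above.
CORES = 192
RATE_LOW, RATE_HIGH = 1000, 2000   # samples/second, whole pipeline

per_core_low = RATE_LOW / CORES    # roughly 5.2 samples/s per core
per_core_high = RATE_HIGH / CORES  # roughly 10.4 samples/s per core
```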
## 🎯 Monitoring Commands

### System Resources

```bash
# CPU usage (htop takes a comma-separated PID list)
htop -p "$(pgrep -d',' -f 'python.*batch_confirm')"

# Memory usage
free -h

# Network I/O
iftop

# Disk I/O
iotop
```

### Process Monitoring

```bash
# Process tree (pgrep -o selects the oldest match, i.e. the parent process)
pstree -p "$(pgrep -o -f 'python.*batch_confirm')"

# Resource usage per process
ps aux | grep python
```

## 🔧 System Requirements

### Minimum Requirements

- **CPU**: 192 cores (any architecture)
- **RAM**: 256 GB
- **Storage**: 1 TB SSD
- **Network**: 10 Gbps

### Recommended Requirements

- **CPU**: 192 cores (AMD EPYC or Intel Xeon)
- **RAM**: 512 GB
- **Storage**: 2 TB NVMe SSD
- **Network**: 25 Gbps

## 🚨 Important Notes

1. **Memory Management**: Monitor memory usage closely
2. **Network Limits**: Ensure sufficient bandwidth
3. **API Limits**: Check Vosk service capacity
4. **Storage I/O**: Use fast storage for temporary files
5. **Process Limits**: Increase system limits if needed

## 📞 Support

If you encounter issues:

1. Check the performance logs
2. Monitor system resources
3. Adjust configuration parameters
4. Review the troubleshooting section

For optimal performance, ensure your system meets the recommended requirements and follow the monitoring guidelines.