# 192-Core Optimization Guide

This guide explains how to optimize your audio processing pipeline to utilize 192 CPU cores at 100% capacity.

## 🚀 Quick Start

1. **Install dependencies:**

   ```bash
   pip install -r requirements_optimized.txt
   ```

2. **Run the optimized pipeline:**

   ```bash
   ./run_optimized_192cores.sh
   ```

3. **Monitor performance:**

   ```bash
   python monitor_performance.py
   ```

## 📊 Key Optimizations Implemented

### 1. **Asynchronous Processing**

- **aiohttp** for concurrent HTTP requests
- **asyncio** for non-blocking I/O operations
- **ProcessPoolExecutor** for CPU-intensive tasks

### 2. **Parallel Processing Strategy**

```python
# Configuration for 192 cores
NUM_CORES = 192
BATCH_SIZE = 32               # Increased for better throughput
MAX_CONCURRENT_REQUESTS = 48  # 192 / 4 for optimal concurrency
```

### 3. **Memory-Efficient Processing**

- Streaming data processing
- Chunked batch processing
- Parallel file I/O operations

### 4. **System-Level Optimizations**

- CPU governor set to performance mode
- Increased file descriptor limits
- Process priority optimization
- Environment variables for thread optimization

## 🔧 Configuration Details

### Batch Processing

- **Batch Size**: 32 samples per batch
- **Concurrent Requests**: 48 simultaneous API calls
- **Process Pool Workers**: 192 parallel processes

### Memory Management

- **Chunk Size**: 1000 samples per chunk
- **Streaming**: `True` for large datasets
- **Parallel Sharding**: 50 shards for optimal I/O

### Network Optimization

- **Connection Pool**: 48 concurrent connections
- **Timeout**: 120 seconds per request
- **Retry Logic**: built-in error handling

## 📈 Performance Monitoring

### Real-time Monitoring

```bash
python monitor_performance.py
```

### Metrics Tracked

- CPU utilization per core
- Memory usage
- Network I/O
- Disk I/O
- Load average

### Performance Targets

- **CPU Utilization**: >90% across all cores
- **Memory Usage**: <80% of available RAM
- **Processing Rate**: >1000 samples/second

## 🛠️ Troubleshooting
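The troubleshooting steps below mostly adjust `BATCH_SIZE` and `MAX_CONCURRENT_REQUESTS`. As a point of reference, here is a minimal, stdlib-only sketch of the semaphore-bounded concurrency pattern those parameters control; `call_api` is a hypothetical stand-in for the real aiohttp request, with `asyncio.sleep` simulating network I/O:

```python
import asyncio

NUM_CORES = 192
MAX_CONCURRENT_REQUESTS = 48  # NUM_CORES / 4, as in the configuration above

async def call_api(sample, semaphore):
    # Hypothetical stand-in for an aiohttp request to the transcription
    # service; the semaphore caps how many calls are in flight at once.
    async with semaphore:
        await asyncio.sleep(0)  # simulate network I/O
        return sample * 2       # simulate a per-sample result

async def process_batch(batch):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    # gather() schedules every sample at once; the semaphore, not the
    # task count, bounds actual concurrency.
    return await asyncio.gather(*(call_api(s, semaphore) for s in batch))

print(asyncio.run(process_batch([1, 2, 3])))  # [2, 4, 6]
```

Raising `MAX_CONCURRENT_REQUESTS` widens the semaphore; raising `BATCH_SIZE` gives each `process_batch` call more tasks to keep that semaphore saturated.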
### Low CPU Utilization (<50%)

1. **Increase the batch size:**

   ```python
   BATCH_SIZE = 64  # or higher
   ```

2. **Increase concurrent requests:**

   ```python
   MAX_CONCURRENT_REQUESTS = 96  # 192 / 2
   ```

3. **Check for I/O bottlenecks:**
   - Monitor disk usage
   - Check network bandwidth
   - Verify API response times

### High Memory Usage (>90%)

1. **Reduce the batch size:**

   ```python
   BATCH_SIZE = 16  # or lower
   ```

2. **Enable streaming:**

   ```python
   ds = load_dataset(..., streaming=True)
   ```

3. **Process in smaller chunks:**

   ```python
   CHUNK_SIZE = 500  # reduced from 1000
   ```

### Network Bottlenecks

1. **Reduce concurrent requests:**

   ```python
   MAX_CONCURRENT_REQUESTS = 24  # reduced from 48
   ```

2. **Increase the timeout:**

   ```python
   timeout = aiohttp.ClientTimeout(total=300)
   ```

3. **Use connection pooling:**

   ```python
   connector = aiohttp.TCPConnector(limit=MAX_CONCURRENT_REQUESTS)
   ```

## 🔄 Advanced Optimizations

### 1. **Custom Process Pool Configuration**

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# For CPU-intensive tasks
with ProcessPoolExecutor(
    max_workers=NUM_CORES,
    mp_context=mp.get_context('spawn')
) as executor:
    results = executor.map(process_function, data)
```

### 2. **Memory-Mapped Files**

```python
import mmap

def process_large_file(filename):
    with open(filename, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Process the memory-mapped file here; pages are loaded on
            # demand, so the file does not have to fit in RAM.
            pass
```

### 3. **NUMA Optimization** (for multi-socket systems)

```bash
# Bind processes to specific NUMA nodes
numactl --cpunodebind=0 --membind=0 python script.py
```

### 4. **GPU Acceleration** (if available)

```python
# Use the GPU for audio processing if one is available
import torch

if torch.cuda.is_available():
    device = torch.device('cuda')
    # Move audio processing to the GPU
```

## 📊 Expected Performance

### Baseline Performance

- **192 cores**: 100% utilization target
- **Processing rate**: 1000-2000 samples/second
- **Memory usage**: 60-80% of available RAM
- **Network throughput**: 1-2 GB/s

### Optimization Targets

- **CPU Efficiency**: >95%
- **Memory Efficiency**: >85%
- **I/O Efficiency**: >90%
- **Network Efficiency**: >80%

## 🎯 Monitoring Commands

### System Resources

```bash
# CPU usage (htop -p expects a comma-separated PID list,
# hence pgrep's -d, delimiter flag)
htop -p $(pgrep -d, -f "python.*batch_confirm")

# Memory usage
free -h

# Network I/O
iftop

# Disk I/O
iotop
```

### Process Monitoring

```bash
# Process tree
pstree -p $(pgrep -f "python.*batch_confirm")

# Resource usage per process
ps aux | grep python
```

## 🔧 System Requirements

### Minimum Requirements

- **CPU**: 192 cores (any architecture)
- **RAM**: 256 GB
- **Storage**: 1 TB SSD
- **Network**: 10 Gbps

### Recommended Requirements

- **CPU**: 192 cores (AMD EPYC or Intel Xeon)
- **RAM**: 512 GB
- **Storage**: 2 TB NVMe SSD
- **Network**: 25 Gbps

## 🚨 Important Notes

1. **Memory Management**: Monitor memory usage closely
2. **Network Limits**: Ensure sufficient bandwidth
3. **API Limits**: Check Vosk service capacity
4. **Storage I/O**: Use fast storage for temporary files
5. **Process Limits**: Increase system limits if needed

## 📞 Support

If you encounter issues:

1. Check the performance logs
2. Monitor system resources
3. Adjust configuration parameters
4. Review the troubleshooting section

For optimal performance, ensure your system meets the recommended requirements and follow the monitoring guidelines.
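The load-average metric tracked above can also be checked programmatically. The following is a minimal, stdlib-only sketch (a simplified stand-in for the fuller `monitor_performance.py`, and Unix-only since it relies on `os.getloadavg`):

```python
import os

def load_per_core():
    """Return the 1/5/15-minute load averages normalized by core count.

    On a fully utilized 192-core machine, values near 1.0 mean the CPU
    utilization target is being met; values well below that point to the
    low-CPU-utilization case in the troubleshooting section.
    """
    cores = os.cpu_count() or 1
    one, five, fifteen = os.getloadavg()  # Unix-only
    return {"1m": one / cores, "5m": five / cores, "15m": fifteen / cores}

print(load_per_core())
```

Running this periodically alongside `htop` gives a quick scalar summary of whether the pool of 192 workers is actually keeping the machine busy.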