# Western Blot Band Detection - Synthetic Dataset Generator

Generate synthetic Western blot images with annotations for training object detection models using **PyTorch** and **HuggingFace**.

## Features

- **Row-aligned bands**: Bands appear in consistent horizontal rows across lanes
- **Row tilt**: Each row can tilt slightly upward or downward
- **Ladder position**: Marker can be first (left) or last (right) lane
- **One annotation per blob**: Overlapping annotations are merged
- **Irregular loading**: Variable band intensity across lanes
- **Faint bands**: Thin/faint bands are not annotated, so the model learns to ignore them
- **Dual format**: YOLO (.txt) and COCO (.json) annotations

## Installation

```bash
pip install torch torchvision
pip install transformers          # For DETR
pip install opencv-python numpy matplotlib pyyaml
pip install pycocotools           # For COCO evaluation
```

## Quick Start

### 1. Preview samples

```bash
python visualize_samples.py --num 6 --output preview.png
```

### 2. Generate dataset

```bash
# Small dataset for testing
python generate_dataset.py --output ./dataset --train 100 --val 20 --test 20

# Full dataset for training
python generate_dataset.py --output ./dataset --train 1000 --val 200 --test 200
```

### 3. Train model

**Option A: Faster R-CNN (torchvision)**

```bash
python train.py --data ./dataset --model fasterrcnn --epochs 50
```

**Option B: RetinaNet (torchvision)**

```bash
python train.py --data ./dataset --model retinanet --epochs 50
```

**Option C: DETR (HuggingFace Transformers)**

```bash
python train.py --data ./dataset --model detr --epochs 50
```

### 4. Run inference

```bash
# With Faster R-CNN
python predict.py --model ./output/best_model.pt --image your_blot.png --type fasterrcnn

# With DETR
python predict.py --model ./output/best_model --image your_blot.png --type detr
```

## Dataset Structure

```
dataset/
├── data.yaml                # YOLO config
├── train/
│   ├── images/              # Training images (.png)
│   ├── labels/              # YOLO annotations (.txt)
│   └── annotations.json     # COCO format (for PyTorch)
├── val/
│   ├── images/
│   ├── labels/
│   └── annotations.json
└── test/
    ├── images/
    ├── labels/
    └── annotations.json
```

## Annotation Formats

### YOLO format (labels/*.txt)

One line per band; all values are normalized to the `[0, 1]` range relative to image width and height:

```
class_id x_center y_center width height
0 0.332500 0.512000 0.177000 0.075000
```

### COCO format (annotations.json)

```json
{
  "images": [{"id": 0, "file_name": "...", "width": 800, "height": 500}],
  "annotations": [{"id": 0, "image_id": 0, "bbox": [x, y, w, h], "category_id": 0}],
  "categories": [{"id": 0, "name": "band"}]
}
```

## Model Comparison

| Model | Framework | Speed | Accuracy | Best For |
|-------|-----------|-------|----------|----------|
| Faster R-CNN | torchvision | Medium | High | General use |
| RetinaNet | torchvision | Fast | Good | Speed-focused |
| DETR | HuggingFace | Slow | Highest | Best accuracy |

## Customization

### Generator parameters

```python
from western_blot_generator import generate_western_blot_with_annotations

img, annotations = generate_western_blot_with_annotations(
    width=800,
    height=500,
    num_lanes=4,
    include_ladder=True,
    ladder_position='auto',          # 'first', 'last', or 'auto'
    num_protein_rows=(2, 6),         # Min/max protein rows
    bands_per_row_probability=0.7,   # Chance each lane has a band
    row_tilt_range=(-15, 15),        # Row tilt in pixels
    skew_range=(-3, 3),              # Global skew (degrees)
    irregular_loading=True,
    add_faint_bands=True,
    min_annotate_intensity=0.4,      # Min intensity to annotate
    min_annotate_height=12,          # Min height to annotate
)
```

## Files

| File | Description |
|------|-------------|
| `western_blot_generator.py` | Core generator with merged annotations |
| `generate_dataset.py` | Generate YOLO + COCO format dataset |
| `train.py` | PyTorch/HuggingFace training script |
| `predict.py` | Inference script |
| `visualize_samples.py` | Preview generated images |

## Training Tips

1. **Start small**: Generate 100-200 images first to verify the pipeline works
2. **GPU recommended**: Training is much faster with CUDA
3. **Batch size**: Reduce it if you run out of memory (try 2 or 1)
4. **Epochs**: 50-100 are usually sufficient for synthetic data
5. **Fine-tuning**: Consider fine-tuning on real Western blots afterwards

## Example Usage in Code

```python
import torch
from PIL import Image
from torchvision import transforms as T

# Load trained model
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = fasterrcnn_resnet50_fpn(weights=None)  # `pretrained=` is deprecated
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, 2)  # background + band
model.load_state_dict(torch.load('output/best_model.pt', map_location='cpu'))
model.eval()

# Run inference
image = Image.open('your_blot.png').convert('RGB')
image_tensor = T.ToTensor()(image).unsqueeze(0)

with torch.no_grad():
    predictions = model(image_tensor)[0]

# Filter predictions by confidence
conf_threshold = 0.5
mask = predictions['scores'] > conf_threshold
boxes = predictions['boxes'][mask]
scores = predictions['scores'][mask]
print(f"Detected {len(boxes)} bands")
```
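The YOLO labels described above store normalized center coordinates. A minimal sketch of converting a label file back to pixel-space boxes; `parse_yolo_labels` is an illustrative helper, not part of this repo:

```python
from pathlib import Path

def parse_yolo_labels(label_path, img_width, img_height):
    """Convert normalized YOLO rows to (class, x_min, y_min, width, height) in pixels."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        if not line.strip():
            continue
        cls, xc, yc, w, h = line.split()
        w_px = float(w) * img_width
        h_px = float(h) * img_height
        # YOLO stores box centers; shift by half the size to get the top-left corner
        x_min = float(xc) * img_width - w_px / 2
        y_min = float(yc) * img_height - h_px / 2
        boxes.append((int(cls), x_min, y_min, w_px, h_px))
    return boxes
```

For the sample row shown above at 800×500 this yields a band box roughly 142 px wide and 38 px tall.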
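The COCO `annotations.json` files can be consumed with nothing more than the standard `json` module. A sketch that groups bounding boxes by image file name; `load_coco_annotations` is an illustrative helper, not a repo script:

```python
import json
from collections import defaultdict

def load_coco_annotations(path):
    """Return {file_name: [[x, y, w, h], ...]} from a COCO annotations file."""
    with open(path) as f:
        coco = json.load(f)
    # Map image ids to file names, then bucket each bbox under its image
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        boxes_by_image[id_to_name[ann["image_id"]]].append(ann["bbox"])
    return dict(boxes_by_image)
```

For full COCO-style evaluation (mAP), `pycocotools` from the install list is the standard route; this helper is only for quick inspection or custom data loaders.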
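The filtered `boxes` and `scores` from the inference example can be visualized with OpenCV, which is already in the install list. `draw_detections` is a hypothetical helper that expects a BGR `numpy` image and `[x_min, y_min, x_max, y_max]` boxes (the format torchvision detectors return):

```python
import cv2
import numpy as np

def draw_detections(image, boxes, scores, color=(0, 0, 255)):
    """Return a copy of the image with boxes and confidence scores drawn on it."""
    out = image.copy()
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        cv2.rectangle(out, (int(x1), int(y1)), (int(x2), int(y2)), color, 2)
        # Put the score just above the box, clamped so it stays inside the image
        cv2.putText(out, f"{score:.2f}", (int(x1), max(int(y1) - 4, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, color, 1)
    return out
```

Tensors from the model should be moved to NumPy first (e.g. `boxes.cpu().numpy()`), and a PIL image converted with `cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)` before drawing.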