
Advanced Computer Vision with TensorFlow - Building Real-World Image Recognition Systems#

Computer vision has revolutionized industries from healthcare to autonomous vehicles. With TensorFlow, building sophisticated image recognition systems has become more accessible than ever. This comprehensive guide will take you beyond basic image classification to advanced computer vision techniques used in production systems.

What We’ll Build#

In this tutorial, you’ll learn to create:

  • Custom Image Classifiers using transfer learning
  • Object Detection Systems for real-time applications
  • Image Segmentation Models for pixel-level analysis
  • Style Transfer Networks for artistic applications
  • Production-Ready CV Pipelines with optimization and deployment

Prerequisites#

Before diving in, ensure you have:

  • TensorFlow 2.x installed (see our getting started guide)
  • Basic CNN knowledge from the previous tutorial
  • 8GB+ RAM (16GB recommended for large models)
  • GPU support (highly recommended for training)
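Before you start, a quick sanity check (a minimal sketch) confirms the TensorFlow version and whether a GPU is visible:

import tensorflow as tf

print("TensorFlow version:", tf.__version__)  # should report 2.x
gpus = tf.config.list_physical_devices('GPU')
print("GPUs available:", gpus if gpus else "none - training will run on the CPU")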

Advanced CNN Architectures#

1. Understanding Modern CNN Architectures#

Let’s explore the evolution of CNN architectures:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
import numpy as np

# ResNet block - solving the vanishing gradient problem
def residual_block(x, filters, kernel_size=3, stride=1, conv_shortcut=True):
    """A bottleneck residual block with a skip connection."""
    if conv_shortcut:
        shortcut = layers.Conv2D(4 * filters, 1, strides=stride)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    else:
        shortcut = x
    x = layers.Conv2D(filters, 1, strides=stride)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(filters, kernel_size, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(4 * filters, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([shortcut, x])
    x = layers.Activation('relu')(x)
    return x

# Attention mechanism for CNNs
def attention_block(x, filters):
    """Squeeze-and-Excitation attention block."""
    # Squeeze: global average pooling to a per-channel descriptor
    gap = layers.GlobalAveragePooling2D()(x)
    # Excitation: bottleneck fully connected layers
    fc1 = layers.Dense(filters // 16, activation='relu')(gap)
    fc2 = layers.Dense(filters, activation='sigmoid')(fc1)
    # Reshape to (1, 1, C) and rescale the feature map channel-wise
    attention = layers.Reshape((1, 1, filters))(fc2)
    x = layers.Multiply()([x, attention])
    return x

# Modern CNN with attention
def create_advanced_cnn(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Initial convolution
    x = layers.Conv2D(64, 7, strides=2, padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.MaxPooling2D(3, strides=2, padding='same')(x)
    # Residual blocks with increasing filters
    for filters in [64, 128, 256, 512]:
        x = residual_block(x, filters)
        x = attention_block(x, filters * 4)
    # Classification head
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return keras.Model(inputs, outputs)

# Example usage
model = create_advanced_cnn((224, 224, 3), 1000)
print(f"Model parameters: {model.count_params():,}")

2. EfficientNet - Optimized Architecture#

# Using pre-trained EfficientNet
def create_efficientnet_model(num_classes, input_shape=(224, 224, 3)):
    base_model = keras.applications.EfficientNetB0(
        weights='imagenet',
        include_top=False,
        input_shape=input_shape
    )
    # Freeze base model initially
    base_model.trainable = False
    model = keras.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.BatchNormalization(),
        layers.Dropout(0.2),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model, base_model

# Fine-tuning strategy
def setup_fine_tuning(model, base_model, learning_rate=1e-5):
    # Unfreeze the top layers of the base model
    base_model.trainable = True
    # Fine-tune from this layer onwards
    fine_tune_at = len(base_model.layers) - 20
    # Freeze all the layers before the `fine_tune_at` layer
    for layer in base_model.layers[:fine_tune_at]:
        layer.trainable = False
    # Recompile with a low learning rate for fine-tuning; sparse loss matches
    # the integer labels produced by image_dataset_from_directory below
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

Transfer Learning Mastery#

1. Custom Dataset Preparation#

def create_dataset_from_directory(data_dir, image_size=(224, 224), batch_size=32):
    """Create tf.data datasets from a directory structure (one folder per class)."""
    dataset = keras.utils.image_dataset_from_directory(
        data_dir,
        validation_split=0.2,
        subset="training",
        seed=123,
        image_size=image_size,
        batch_size=batch_size
    )
    val_dataset = keras.utils.image_dataset_from_directory(
        data_dir,
        validation_split=0.2,
        subset="validation",
        seed=123,
        image_size=image_size,
        batch_size=batch_size
    )
    return dataset, val_dataset

# Advanced data augmentation pipeline
def create_augmentation_pipeline():
    """Create a sophisticated data augmentation pipeline."""
    data_augmentation = keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        layers.RandomContrast(0.1),
        layers.RandomBrightness(0.1),
        # Custom color augmentations: tf.image color ops expect floats in
        # [0, 1], so scale down and back up around them
        layers.Lambda(lambda x: tf.image.random_hue(x / 255.0, 0.02) * 255.0),
        layers.Lambda(lambda x: tf.image.random_saturation(x / 255.0, 0.7, 1.3) * 255.0),
    ])
    return data_augmentation

# Preprocessing pipeline
def preprocess_dataset(dataset, augment=True):
    """Optimize a dataset for training."""
    AUTOTUNE = tf.data.AUTOTUNE
    # Cache and shuffle before augmenting so each epoch sees fresh augmentations
    dataset = dataset.cache()
    dataset = dataset.shuffle(1000)
    if augment:
        augmentation = create_augmentation_pipeline()
        dataset = dataset.map(
            lambda x, y: (augmentation(x, training=True), y),
            num_parallel_calls=AUTOTUNE
        )
    # The Keras EfficientNet models rescale inputs internally, so we only
    # cast to float32 and keep pixel values in [0, 255]
    dataset = dataset.map(
        lambda x, y: (tf.cast(x, tf.float32), y),
        num_parallel_calls=AUTOTUNE
    )
    dataset = dataset.prefetch(AUTOTUNE)
    return dataset

# Complete transfer learning pipeline
def train_custom_classifier(data_dir, num_classes, epochs=20):
    # Load and preprocess data
    train_ds, val_ds = create_dataset_from_directory(data_dir)
    train_ds = preprocess_dataset(train_ds, augment=True)
    val_ds = preprocess_dataset(val_ds, augment=False)
    # Create model
    model, base_model = create_efficientnet_model(num_classes)
    # Initial training (frozen base)
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    # Callbacks
    callbacks = [
        keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.2, patience=3),
        keras.callbacks.ModelCheckpoint('best_model.h5', save_best_only=True)
    ]
    # Train frozen model
    print("Training with frozen base model...")
    history1 = model.fit(
        train_ds,
        epochs=epochs // 2,
        validation_data=val_ds,
        callbacks=callbacks
    )
    # Fine-tuning
    print("Fine-tuning model...")
    model = setup_fine_tuning(model, base_model)
    history2 = model.fit(
        train_ds,
        epochs=epochs // 2,
        validation_data=val_ds,
        callbacks=callbacks
    )
    return model, history1, history2
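With these pieces in place, training on a custom image folder is a single call. A usage sketch (the path, class count, and file names are placeholders for your dataset):

model, history_frozen, history_finetuned = train_custom_classifier(
    data_dir='data/flowers',   # one subfolder per class
    num_classes=5,
    epochs=20
)
model.save('flower_classifier.h5')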

2. Advanced Transfer Learning Techniques#

# Multi-scale feature extraction
def create_multiscale_model(base_model, num_classes):
    """Extract features from multiple layers of an EfficientNet backbone."""
    # Intermediate layer outputs (EfficientNetB0 layer names)
    layer_names = [
        'block4a_expand_activation',  # 28x28
        'block6a_expand_activation',  # 14x14
        'top_activation'              # 7x7
    ]
    layers_outputs = [base_model.get_layer(name).output for name in layer_names]
    # Create feature extraction model
    feature_extractor = keras.Model(
        inputs=base_model.input,
        outputs=layers_outputs
    )
    # Multi-scale processing
    inputs = keras.Input(shape=(224, 224, 3))
    features = feature_extractor(inputs)
    # Process each scale
    processed_features = []
    for feature in features:
        # Global average pooling for each scale
        gap = layers.GlobalAveragePooling2D()(feature)
        dense = layers.Dense(256, activation='relu')(gap)
        processed_features.append(dense)
    # Concatenate multi-scale features
    combined = layers.Concatenate()(processed_features)
    combined = layers.Dropout(0.5)(combined)
    outputs = layers.Dense(num_classes, activation='softmax')(combined)
    return keras.Model(inputs, outputs)

# Domain adaptation techniques
def create_domain_adaptive_model(source_model, target_classes):
    """Adapt a model from a source domain to a target domain."""
    # Take the features just before the original classification layer
    base_features = source_model.layers[-2].output
    # Add a domain classifier (for adversarial training)
    domain_classifier = layers.Dense(1, activation='sigmoid', name='domain')(base_features)
    # Add a new task classifier
    task_classifier = layers.Dense(target_classes, activation='softmax', name='task')(base_features)
    # Create a multi-output model
    adapted_model = keras.Model(
        inputs=source_model.input,
        outputs=[task_classifier, domain_classifier]
    )
    return adapted_model
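Since the adapted model has two output heads, it needs one loss per head when compiled. A minimal compile sketch, assuming source_model is a previously trained classifier (the 0.1 weight on the domain head is an illustrative choice to keep it from dominating the task loss, not from the original):

adapted = create_domain_adaptive_model(source_model, target_classes=10)
adapted.compile(
    optimizer='adam',
    loss={'task': 'sparse_categorical_crossentropy',
          'domain': 'binary_crossentropy'},
    loss_weights={'task': 1.0, 'domain': 0.1},  # illustrative weighting
    metrics={'task': 'accuracy'}
)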

Object Detection with TensorFlow#

1. YOLO-style Object Detection#

# Custom YOLO implementation
def create_yolo_model(input_shape, num_classes, num_anchors=3):
    """Simplified YOLO architecture."""
    inputs = keras.Input(shape=input_shape)
    # Backbone (feature extractor)
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(256, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(512, 3, padding='same', activation='relu')(x)
    # Detection head
    # Output: (batch, grid_h, grid_w, anchors * (5 + num_classes))
    # 5 = x, y, w, h, confidence
    outputs = layers.Conv2D(
        num_anchors * (5 + num_classes),
        1,
        activation='linear'
    )(x)
    return keras.Model(inputs, outputs)

# YOLO loss function
def yolo_loss(y_true, y_pred, num_classes=80, num_anchors=3):
    """Simplified YOLO loss function."""
    # Reshape predictions to (batch, grid_h, grid_w, anchors, 5 + classes)
    grid_h, grid_w = tf.shape(y_pred)[1], tf.shape(y_pred)[2]
    y_pred = tf.reshape(y_pred, (-1, grid_h, grid_w, num_anchors, 5 + num_classes))
    # Split predictions
    pred_xy = tf.sigmoid(y_pred[..., :2])   # Center coordinates
    pred_wh = y_pred[..., 2:4]              # Width and height
    pred_conf = tf.sigmoid(y_pred[..., 4])  # Confidence
    pred_class = y_pred[..., 5:]            # Class scores
    # Split ground truth
    true_xy = y_true[..., :2]
    true_wh = y_true[..., 2:4]
    true_conf = y_true[..., 4]
    true_class = y_true[..., 5:]
    # Object mask: localization and class losses only apply where an object exists
    obj_mask = true_conf[..., tf.newaxis]
    xy_loss = tf.reduce_sum(obj_mask * tf.square(true_xy - pred_xy))
    # Clamp predicted w/h before the square root to avoid NaNs
    wh_loss = tf.reduce_sum(
        obj_mask * tf.square(tf.sqrt(true_wh) - tf.sqrt(tf.maximum(pred_wh, 1e-8)))
    )
    conf_loss = tf.reduce_sum(tf.square(true_conf - pred_conf))
    class_loss = tf.reduce_sum(obj_mask * tf.square(true_class - pred_class))
    total_loss = xy_loss + wh_loss + conf_loss + class_loss
    return total_loss

# Non-Maximum Suppression
def non_max_suppression(boxes, scores, max_outputs=50, iou_threshold=0.5):
    """Apply NMS to filter overlapping boxes."""
    selected_indices = tf.image.non_max_suppression(
        boxes, scores, max_outputs, iou_threshold
    )
    selected_boxes = tf.gather(boxes, selected_indices)
    selected_scores = tf.gather(scores, selected_indices)
    return selected_boxes, selected_scores, selected_indices
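To see what non_max_suppression does, here is a quick sketch with hand-made boxes in normalized [y1, x1, y2, x2] format (the coordinates and scores are illustrative):

boxes = tf.constant([
    [0.10, 0.10, 0.50, 0.50],   # box A
    [0.12, 0.11, 0.52, 0.49],   # near-duplicate of A with a lower score
    [0.60, 0.60, 0.90, 0.90],   # box B elsewhere in the image
])
scores = tf.constant([0.9, 0.75, 0.8])
kept_boxes, kept_scores, kept_idx = non_max_suppression(boxes, scores)
print(kept_idx.numpy())  # [0 2] - the near-duplicate of A is suppressed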

2. Using TensorFlow Object Detection API#

# Requires TensorFlow Hub: pip install tensorflow-hub
# (the webcam demo below also needs opencv-python)
import tensorflow_hub as hub

def load_detector(model_url):
    """Load a pre-trained object detection model from TensorFlow Hub."""
    detector = hub.load(model_url)
    return detector

def detect_objects(detector, image_path, min_score=0.3):
    """Detect objects in an image."""
    # Load and preprocess image
    image = tf.io.read_file(image_path)
    image = tf.image.decode_image(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = image[tf.newaxis, ...]
    # Run detection. Output keys vary by model; 'detection_class_entities'
    # is provided by the Open Images detectors on TF Hub.
    results = detector(image)
    # Filter by confidence score
    scores = results['detection_scores'][0].numpy()
    boxes = results['detection_boxes'][0].numpy()
    classes = results['detection_class_entities'][0].numpy()
    # Filter detections
    valid_detections = scores >= min_score
    return {
        'boxes': boxes[valid_detections],
        'scores': scores[valid_detections],
        'classes': classes[valid_detections]
    }

# Real-time object detection
def real_time_detection(detector, camera_index=0):
    """Real-time object detection from a webcam."""
    import cv2
    cap = cv2.VideoCapture(camera_index)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Convert BGR (OpenCV) to RGB
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Prepare for detection
        input_tensor = tf.convert_to_tensor(rgb_frame)
        input_tensor = input_tensor[tf.newaxis, ...]
        input_tensor = tf.cast(input_tensor, tf.float32) / 255.0
        # Detect objects
        detections = detector(input_tensor)
        # Draw bounding boxes
        frame = draw_detections(frame, detections)
        cv2.imshow('Object Detection', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()

def draw_detections(image, detections, min_score=0.3):
    """Draw bounding boxes and labels on an image."""
    import cv2
    h, w, _ = image.shape
    scores = detections['detection_scores'][0].numpy()
    boxes = detections['detection_boxes'][0].numpy()
    classes = detections['detection_class_entities'][0].numpy()
    for i in range(len(scores)):
        if scores[i] >= min_score:
            # Convert normalized [y1, x1, y2, x2] to pixel coordinates
            y1, x1, y2, x2 = boxes[i]
            x1, y1, x2, y2 = int(x1 * w), int(y1 * h), int(x2 * w), int(y2 * h)
            # Draw bounding box
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # Draw label
            label = f"{classes[i].decode('utf-8')}: {scores[i]:.2f}"
            cv2.putText(image, label, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    return image
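Tying the helpers together with one of the Open Images detectors on TensorFlow Hub. A sketch: the module URL is a public one, but output keys and batching conventions vary between Hub models, so you may need to call .signatures['default'] and adjust the indexing; the image path is a placeholder.

detector = load_detector("https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1")
results = detect_objects(detector, "street_scene.jpg", min_score=0.4)
for box, score, cls in zip(results['boxes'], results['scores'], results['classes']):
    print(f"{cls.decode('utf-8')}: {score:.2f} at {box}")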

Image Segmentation#

1. U-Net for Semantic Segmentation#

def conv_block(x, filters, kernel_size=3):
    """Convolutional block for U-Net."""
    x = layers.Conv2D(filters, kernel_size, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(filters, kernel_size, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    return x

def create_unet(input_shape, num_classes):
    """U-Net architecture for semantic segmentation."""
    inputs = keras.Input(shape=input_shape)
    # Encoder (downsampling)
    conv1 = conv_block(inputs, 64)
    pool1 = layers.MaxPooling2D()(conv1)
    conv2 = conv_block(pool1, 128)
    pool2 = layers.MaxPooling2D()(conv2)
    conv3 = conv_block(pool2, 256)
    pool3 = layers.MaxPooling2D()(conv3)
    conv4 = conv_block(pool3, 512)
    pool4 = layers.MaxPooling2D()(conv4)
    # Bottleneck
    conv5 = conv_block(pool4, 1024)
    # Decoder (upsampling with skip connections)
    up6 = layers.Conv2DTranspose(512, 2, strides=2, padding='same')(conv5)
    up6 = layers.Concatenate()([up6, conv4])
    conv6 = conv_block(up6, 512)
    up7 = layers.Conv2DTranspose(256, 2, strides=2, padding='same')(conv6)
    up7 = layers.Concatenate()([up7, conv3])
    conv7 = conv_block(up7, 256)
    up8 = layers.Conv2DTranspose(128, 2, strides=2, padding='same')(conv7)
    up8 = layers.Concatenate()([up8, conv2])
    conv8 = conv_block(up8, 128)
    up9 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(conv8)
    up9 = layers.Concatenate()([up9, conv1])
    conv9 = conv_block(up9, 64)
    # Output layer: per-pixel class probabilities
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(conv9)
    return keras.Model(inputs, outputs)

# Dice loss for segmentation
def dice_loss(y_true, y_pred, smooth=1e-6):
    """Dice loss function for segmentation."""
    y_true_f = tf.keras.backend.flatten(y_true)
    y_pred_f = tf.keras.backend.flatten(y_pred)
    intersection = tf.keras.backend.sum(y_true_f * y_pred_f)
    dice = (2. * intersection + smooth) / (
        tf.keras.backend.sum(y_true_f) + tf.keras.backend.sum(y_pred_f) + smooth
    )
    return 1 - dice

# IoU metric for segmentation
def iou_metric(y_true, y_pred, num_classes):
    """Mean Intersection over Union metric."""
    ious = []
    for cls in range(num_classes):
        y_true_cls = tf.equal(y_true, cls)
        y_pred_cls = tf.equal(tf.argmax(y_pred, axis=-1), cls)
        intersection = tf.reduce_sum(tf.cast(y_true_cls & y_pred_cls, tf.float32))
        union = tf.reduce_sum(tf.cast(y_true_cls | y_pred_cls, tf.float32))
        iou = intersection / (union + 1e-10)
        ious.append(iou)
    return tf.reduce_mean(ious)
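Putting the segmentation pieces together. A compile sketch: it assumes one-hot encoded masks, which is what dice_loss expects, and the class count is illustrative.

NUM_CLASSES = 3  # illustrative
unet = create_unet((256, 256, 3), NUM_CLASSES)

def combined_loss(y_true, y_pred):
    # Cross-entropy drives per-pixel classification; dice counters class imbalance
    ce = tf.reduce_mean(keras.losses.categorical_crossentropy(y_true, y_pred))
    return ce + dice_loss(y_true, y_pred)

unet.compile(optimizer='adam', loss=combined_loss, metrics=['accuracy'])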

2. Instance Segmentation with Mask R-CNN#

# Using TensorFlow Hub for Mask R-CNN
def load_mask_rcnn():
    """Load a pre-trained Mask R-CNN model."""
    model_url = "https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1"
    model = hub.load(model_url)
    return model

def instance_segmentation(model, image_path):
    """Perform instance segmentation."""
    # Load image; TF2 Object Detection API models expect a uint8 batch
    image = tf.io.read_file(image_path)
    image = tf.image.decode_image(image, channels=3)
    image = tf.expand_dims(image, 0)
    # Run inference
    results = model(image)
    return {
        'detection_boxes': results['detection_boxes'][0].numpy(),
        'detection_classes': results['detection_classes'][0].numpy().astype(int),
        'detection_scores': results['detection_scores'][0].numpy(),
        'detection_masks': results['detection_masks'][0].numpy()
    }

def visualize_instance_segmentation(image, results, min_score=0.3):
    """Visualize instance segmentation results."""
    from matplotlib.patches import Rectangle
    fig, ax = plt.subplots(1, figsize=(12, 8))
    ax.imshow(image)
    boxes = results['detection_boxes']
    classes = results['detection_classes']
    scores = results['detection_scores']
    masks = results['detection_masks']
    colors = plt.cm.Set3(np.linspace(0, 1, len(boxes)))
    for i, (box, cls, score, mask) in enumerate(zip(boxes, classes, scores, masks)):
        if score >= min_score:
            # Draw bounding box (boxes are normalized [y1, x1, y2, x2])
            y1, x1, y2, x2 = box
            h, w = image.shape[:2]
            x1, y1, x2, y2 = x1 * w, y1 * h, x2 * w, y2 * h
            rect = Rectangle((x1, y1), x2 - x1, y2 - y1,
                             linewidth=2, edgecolor=colors[i], facecolor='none')
            ax.add_patch(rect)
            # OD API masks are box-relative; resizing to the full image is a
            # quick approximation that works for visualization
            mask_resized = tf.image.resize(mask[..., None], [h, w])
            mask_resized = tf.squeeze(mask_resized) > 0.5
            colored_mask = np.zeros((h, w, 4))
            colored_mask[..., :3] = colors[i][:3]
            colored_mask[..., 3] = mask_resized.numpy() * 0.5
            ax.imshow(colored_mask)
            # Add label
            ax.text(x1, y1 - 10, f'Class {cls}: {score:.2f}',
                    bbox=dict(facecolor=colors[i], alpha=0.8))
    ax.axis('off')
    plt.title('Instance Segmentation Results')
    plt.show()

Style Transfer and GANs#

1. Neural Style Transfer#

# Helper used below (the original called load_img without defining it)
def load_img(path, max_dim=512):
    """Load an image as float32 in [0, 1], longest side scaled to max_dim."""
    img = tf.io.read_file(path)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    scale = max_dim / tf.reduce_max(shape)
    new_shape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    return img[tf.newaxis, :]

def load_style_transfer_models():
    """Load a pre-trained VGG19 and the layers used for style/content."""
    # VGG19 for feature extraction
    vgg = keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False
    # Layers for content and style representation
    content_layers = ['block5_conv2']
    style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                    'block4_conv1', 'block5_conv1']
    return vgg, content_layers, style_layers

def gram_matrix(input_tensor):
    """Calculate the Gram matrix for style representation."""
    result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
    input_shape = tf.shape(input_tensor)
    num_locations = tf.cast(input_shape[1] * input_shape[2], tf.float32)
    return result / num_locations

class StyleContentExtractor(keras.Model):
    """Return {'style': ..., 'content': ...} feature dicts for an image in [0, 1]."""
    def __init__(self, vgg, content_layers, style_layers):
        super().__init__()
        outputs = [vgg.get_layer(name).output
                   for name in style_layers + content_layers]
        self.vgg = keras.Model([vgg.input], outputs)
        self.vgg.trainable = False
        self.style_layers = style_layers
        self.content_layers = content_layers

    def call(self, inputs):
        # VGG19 expects caffe-style preprocessed inputs in [0, 255]
        preprocessed = keras.applications.vgg19.preprocess_input(inputs * 255.0)
        outputs = self.vgg(preprocessed)
        style_outputs = outputs[:len(self.style_layers)]
        content_outputs = outputs[len(self.style_layers):]
        style_dict = {name: gram_matrix(out)
                      for name, out in zip(self.style_layers, style_outputs)}
        content_dict = {name: out
                        for name, out in zip(self.content_layers, content_outputs)}
        return {'style': style_dict, 'content': content_dict}

def style_content_loss(outputs, style_targets, content_targets,
                       style_weight=1e-2, content_weight=1e4):
    """Calculate the combined style and content loss."""
    style_outputs = outputs['style']
    content_outputs = outputs['content']
    # Style loss
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name] - style_targets[name]) ** 2)
                           for name in style_outputs.keys()])
    style_loss *= style_weight / len(style_outputs)
    # Content loss
    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name] - content_targets[name]) ** 2)
                             for name in content_outputs.keys()])
    content_loss *= content_weight / len(content_outputs)
    total_loss = style_loss + content_loss
    return total_loss

@tf.function
def train_step(image, extractor, style_targets, content_targets, optimizer):
    """Single optimization step for style transfer."""
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs, style_targets, content_targets)
    grad = tape.gradient(loss, image)
    optimizer.apply_gradients([(grad, image)])
    image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))
    return loss

def neural_style_transfer(content_path, style_path, epochs=100):
    """Perform neural style transfer."""
    # Load and preprocess images
    content_image = load_img(content_path)
    style_image = load_img(style_path)
    # Initialize the optimization variable with the content image
    image = tf.Variable(content_image)
    # Set up the feature extraction model
    vgg, content_layers, style_layers = load_style_transfer_models()
    extractor = StyleContentExtractor(vgg, content_layers, style_layers)
    # Extract target features
    style_targets = extractor(style_image)['style']
    content_targets = extractor(content_image)['content']
    # Optimization
    optimizer = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
    for epoch in range(epochs):
        loss = train_step(image, extractor, style_targets, content_targets, optimizer)
        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Loss: {float(loss):.4f}")
    return image
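To keep the result, convert the optimized variable back to an 8-bit image. A small sketch using PIL (the file names are placeholders):

import PIL.Image

stylized = neural_style_transfer('content.jpg', 'style.jpg', epochs=100)
array = (stylized.numpy()[0] * 255).astype(np.uint8)
PIL.Image.fromarray(array).save('stylized.png')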

2. Generative Adversarial Networks (GANs)#

def create_generator(latent_dim, img_shape):
    """Create the generator network for a GAN (28x28 grayscale output)."""
    model = keras.Sequential([
        layers.Dense(128 * 7 * 7, input_dim=latent_dim),
        layers.Reshape((7, 7, 128)),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha=0.01),
        layers.Conv2DTranspose(128, 4, strides=2, padding='same'),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha=0.01),
        layers.Conv2DTranspose(128, 4, strides=2, padding='same'),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha=0.01),
        layers.Conv2D(1, 7, activation='tanh', padding='same')
    ])
    return model

def create_discriminator(img_shape):
    """Create the discriminator network for a GAN."""
    model = keras.Sequential([
        layers.Conv2D(64, 3, strides=2, padding='same', input_shape=img_shape),
        layers.LeakyReLU(alpha=0.01),
        layers.Dropout(0.25),
        layers.Conv2D(128, 3, strides=2, padding='same'),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha=0.01),
        layers.Dropout(0.25),
        layers.Conv2D(256, 3, strides=2, padding='same'),
        layers.BatchNormalization(),
        layers.LeakyReLU(alpha=0.01),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])
    return model

class GAN(keras.Model):
    """Complete GAN implementation."""
    def __init__(self, discriminator, generator, latent_dim):
        super().__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super().compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn
        self.d_loss_metric = keras.metrics.Mean(name="d_loss")
        self.g_loss_metric = keras.metrics.Mean(name="g_loss")

    @property
    def metrics(self):
        return [self.d_loss_metric, self.g_loss_metric]

    def train_step(self, real_images):
        batch_size = tf.shape(real_images)[0]
        # Generate fake images
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
        generated_images = self.generator(random_latent_vectors)
        # Combine real and fake images (fake labeled 1, real labeled 0)
        combined_images = tf.concat([generated_images, real_images], axis=0)
        labels = tf.concat([tf.ones((batch_size, 1)), tf.zeros((batch_size, 1))], axis=0)
        # Add noise to the labels for better training stability
        labels += 0.05 * tf.random.uniform(tf.shape(labels))
        # Train the discriminator
        with tf.GradientTape() as tape:
            predictions = self.discriminator(combined_images)
            d_loss = self.loss_fn(labels, predictions)
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(zip(grads, self.discriminator.trainable_weights))
        # Train the generator (it tries to make the discriminator output "real", i.e. 0)
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
        misleading_labels = tf.zeros((batch_size, 1))
        with tf.GradientTape() as tape:
            predictions = self.discriminator(self.generator(random_latent_vectors))
            g_loss = self.loss_fn(misleading_labels, predictions)
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(zip(grads, self.generator.trainable_weights))
        # Update metrics
        self.d_loss_metric.update_state(d_loss)
        self.g_loss_metric.update_state(g_loss)
        return {"d_loss": self.d_loss_metric.result(), "g_loss": self.g_loss_metric.result()}

Model Optimization and Deployment#

1. Model Quantization#

def quantize_model(model, representative_dataset):
    """Quantize a model for mobile deployment."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Enable optimizations
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Set representative dataset for calibration
    converter.representative_dataset = representative_dataset
    # Enable full integer quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    quantized_model = converter.convert()
    return quantized_model

def representative_data_gen():
    """Generate representative data for quantization calibration."""
    for _ in range(100):
        yield [np.random.random((1, 224, 224, 3)).astype(np.float32)]

# Model pruning
def prune_model(model, target_sparsity=0.5):
    """Prune a model to reduce its size."""
    import tensorflow_model_optimization as tfmot
    # Define pruning parameters
    pruning_params = {
        'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
            initial_sparsity=0.0,
            final_sparsity=target_sparsity,
            begin_step=0,
            end_step=1000
        )
    }
    # Apply pruning
    model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
    return model_for_pruning

# Knowledge distillation
class Distiller(keras.Model):
    """Knowledge distillation for model compression."""
    def __init__(self, student, teacher):
        super().__init__()
        self.teacher = teacher
        self.student = student

    def compile(self, optimizer, metrics, student_loss_fn, distillation_loss_fn,
                alpha=0.1, temperature=3):
        super().compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.alpha = alpha
        self.temperature = temperature

    def train_step(self, data):
        x, y = data
        # Forward pass of the teacher (frozen)
        teacher_predictions = self.teacher(x, training=False)
        with tf.GradientTape() as tape:
            # Forward pass of the student
            student_predictions = self.student(x, training=True)
            # Hard-label loss plus temperature-softened distillation loss
            student_loss = self.student_loss_fn(y, student_predictions)
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
                tf.nn.softmax(student_predictions / self.temperature, axis=1)
            )
            loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss
        # Compute gradients
        trainable_vars = self.student.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics
        self.compiled_metrics.update_state(y, student_predictions)
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss, "distillation_loss": distillation_loss})
        return results
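Once converted, the quantized bytes run directly in the TFLite interpreter. A quick sketch, reusing representative_data_gen for calibration and random int8 input just to exercise the model:

tflite_bytes = quantize_model(model, representative_data_gen)

interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# With full integer quantization the input tensor is int8
sample = np.random.randint(-128, 128, size=input_details['shape'], dtype=np.int8)
interpreter.set_tensor(input_details['index'], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details['index'])
print(prediction.shape)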

2. Model Serving and Deployment#

# TensorFlow Serving deployment
def create_serving_signature(model):
    """Create a serving signature for TensorFlow Serving."""
    @tf.function
    def serve_fn(input_image):
        # Preprocess input
        processed_input = tf.cast(input_image, tf.float32) / 255.0
        # Run prediction (the models in this guide already end in a softmax
        # layer, so the outputs are class probabilities)
        probabilities = model(processed_input)
        # Post-process output
        class_ids = tf.argmax(probabilities, axis=-1)
        return {
            'class_ids': class_ids,
            'probabilities': probabilities
        }
    # Define input specification
    input_spec = tf.TensorSpec(shape=[None, 224, 224, 3], dtype=tf.uint8)
    # Create concrete function
    concrete_function = serve_fn.get_concrete_function(input_spec)
    return concrete_function

def export_for_serving(model, export_path):
    """Export a model for TensorFlow Serving."""
    # Create serving signature
    serving_fn = create_serving_signature(model)
    # Save model with signature
    tf.saved_model.save(
        model,
        export_path,
        signatures={'serving_default': serving_fn}
    )
    print(f"Model exported to: {export_path}")

# Edge deployment with TensorFlow Lite
def deploy_to_edge(model, model_path):
    """Deploy a model to edge devices."""
    # Convert to TensorFlow Lite
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Allow falling back to select TF ops for unsupported operations
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    tflite_model = converter.convert()
    # Save model
    with open(model_path, 'wb') as f:
        f.write(tflite_model)
    return tflite_model

# TensorFlow.js deployment
def export_for_web(model, export_path):
    """Export a model for web deployment."""
    import tensorflowjs as tfjs
    tfjs.converters.save_keras_model(model, export_path)
    print(f"Model exported for web to: {export_path}")

# Cloud deployment with TensorFlow Extended (TFX)
def create_tfx_pipeline(model, data_path, serving_model_dir):
    """Create a TFX pipeline for production deployment."""
    from tfx import v1 as tfx
    # Define pipeline components
    example_gen = tfx.components.CsvExampleGen(input_base=data_path)
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs['examples']
    )
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs['statistics']
    )
    trainer = tfx.components.Trainer(
        module_file='trainer.py',
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100)
    )
    pusher = tfx.components.Pusher(
        model=trainer.outputs['model'],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(
                base_directory=serving_model_dir
            )
        )
    )
    # Create the pipeline
    pipeline = tfx.dsl.Pipeline(
        pipeline_name='computer_vision_pipeline',
        pipeline_root='pipeline_root',
        components=[example_gen, statistics_gen, schema_gen, trainer, pusher]
    )
    return pipeline
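Once a model is exported and served, TensorFlow Serving exposes a REST endpoint on port 8501 by default. A sketch of querying it (assumes the server was started with --model_name=cv_model; the name is a placeholder):

import json
import requests

def query_serving(image_batch, url="http://localhost:8501/v1/models/cv_model:predict"):
    """Send a batch of images to a TensorFlow Serving REST endpoint."""
    payload = json.dumps({"instances": image_batch.tolist()})
    response = requests.post(url, data=payload)
    response.raise_for_status()
    return response.json()["predictions"]

# Example: one random 224x224 RGB image
batch = np.random.randint(0, 256, size=(1, 224, 224, 3), dtype=np.uint8)
predictions = query_serving(batch)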

Performance Monitoring and MLOps#

1. Model Performance Monitoring#

# Data drift detection
def detect_data_drift(reference_data, current_data, threshold=0.1):
    """Detect data drift using a per-feature Kolmogorov-Smirnov test."""
    from scipy.stats import ks_2samp
    drift_scores = []
    for i in range(reference_data.shape[1]):
        # Two-sample Kolmogorov-Smirnov test
        statistic, p_value = ks_2samp(
            reference_data[:, i],
            current_data[:, i]
        )
        drift_scores.append({
            'feature': i,
            'ks_statistic': statistic,
            'p_value': p_value,
            'drift_detected': p_value < threshold
        })
    return drift_scores

# Model performance tracking
class ModelMonitor:
    """Monitor model performance in production."""
    def __init__(self, model, reference_data):
        self.model = model
        self.reference_data = reference_data
        self.prediction_history = []
        self.performance_history = []

    def log_prediction(self, input_data, prediction, ground_truth=None):
        """Log a model prediction (assumes `prediction` is a probability vector)."""
        log_entry = {
            'timestamp': tf.timestamp(),
            'input_shape': input_data.shape,
            'prediction': prediction,
            'confidence': float(tf.reduce_max(prediction))
        }
        if ground_truth is not None:
            log_entry['ground_truth'] = ground_truth
            log_entry['correct'] = bool(tf.equal(
                tf.argmax(prediction),
                tf.argmax(ground_truth)
            ))
        self.prediction_history.append(log_entry)

    def calculate_drift(self, current_batch):
        """Calculate data drift for the current batch."""
        return detect_data_drift(self.reference_data, current_batch)

    def generate_report(self):
        """Generate a performance report."""
        if not self.prediction_history:
            return "No predictions logged"
        total_predictions = len(self.prediction_history)
        correct_predictions = sum(1 for p in self.prediction_history
                                  if p.get('correct', False))
        accuracy = correct_predictions / total_predictions
        avg_confidence = np.mean([p['confidence'] for p in self.prediction_history])
        return {
            'total_predictions': total_predictions,
            'accuracy': accuracy,
            'average_confidence': avg_confidence,
            'low_confidence_predictions': sum(1 for p in self.prediction_history
                                              if p['confidence'] < 0.7)
        }

# A/B testing for model comparison
class ModelABTest:
    """A/B test framework for model comparison."""
    def __init__(self, model_a, model_b, traffic_split=0.5):
        self.model_a = model_a
        self.model_b = model_b
        self.traffic_split = traffic_split
        self.results_a = []
        self.results_b = []

    def predict(self, input_data):
        """Route traffic between the two models."""
        if np.random.random() < self.traffic_split:
            prediction = self.model_a(input_data)
            self.results_a.append(prediction)
            return prediction, 'model_a'
        else:
            prediction = self.model_b(input_data)
            self.results_b.append(prediction)
            return prediction, 'model_b'

    def statistical_significance(self, metric_a, metric_b):
        """Test the statistical significance of the results."""
        from scipy.stats import ttest_ind
        t_stat, p_value = ttest_ind(metric_a, metric_b)
        return {
            't_statistic': t_stat,
            'p_value': p_value,
            'significant': p_value < 0.05,
            'winner': 'model_a' if np.mean(metric_a) > np.mean(metric_b) else 'model_b'
        }
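A short usage sketch for the A/B harness, assuming model_a and model_b are compiled models and input_batch is a preprocessed batch (the per-request correctness lists are illustrative; in production you would collect them alongside ground truth):

ab_test = ModelABTest(model_a, model_b, traffic_split=0.5)

# Route an incoming request
prediction, which_model = ab_test.predict(input_batch)

# After collecting per-request correctness for each arm:
accuracy_a = [1, 0, 1, 1, 1, 0, 1, 1]  # illustrative
accuracy_b = [1, 1, 1, 1, 0, 1, 1, 1]  # illustrative
print(ab_test.statistical_significance(accuracy_a, accuracy_b))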

Conclusion#

Advanced computer vision with TensorFlow opens up endless possibilities for solving real-world problems. From transfer learning for quick prototyping to sophisticated object detection and segmentation systems, the techniques covered in this guide provide a solid foundation for building production-ready computer vision applications.

Key Takeaways:#

  1. Transfer Learning is often the best starting point for most CV tasks
  2. Modern Architectures like EfficientNet provide excellent performance/efficiency trade-offs
  3. Object Detection and Segmentation enable more complex visual understanding
  4. Model Optimization is crucial for deployment to resource-constrained environments
  5. MLOps Practices ensure reliable operation in production

Next Steps:#

  • Experiment with different architectures on your specific datasets
  • Explore domain-specific applications (medical imaging, satellite imagery, etc.)
  • Implement real-time processing pipelines
  • Study the latest research in computer vision
  • Build end-to-end applications with proper monitoring

The computer vision field is rapidly evolving, with new architectures and techniques emerging regularly. The foundation you’ve built with TensorFlow will serve you well as you continue to explore this exciting domain.


Ready to apply these techniques? Check out our TensorFlow getting started guide for the fundamentals, then start building your own computer vision applications!

Author: Antonio Roth · Published: 2025-08-28 · License: CC BY-NC-SA 4.0