The HED network described in the paper is a general-purpose edge detection network. Following the paper, the image produced at every scale participates in the cost computation. The corresponding code:

```python
input_queue_for_train = tf.train.string_input_producer([FLAGS.csv_path])
image_tensor, annotation_tensor = input_image_pipeline(dataset_root_dir_string, input_queue_for_train, FLAGS.batch_size)
dsn_fuse, dsn1, dsn2, dsn3, dsn4, dsn5 = hed_net(image_tensor, FLAGS.batch_size)

cost = class_balanced_sigmoid_cross_entropy(dsn_fuse, annotation_tensor) + \
       class_balanced_sigmoid_cross_entropy(dsn1, annotation_tensor) + \
       class_balanced_sigmoid_cross_entropy(dsn2, annotation_tensor) + \
       class_balanced_sigmoid_cross_entropy(dsn3, annotation_tensor) + \
       class_balanced_sigmoid_cross_entropy(dsn4, annotation_tensor) + \
       class_balanced_sigmoid_cross_entropy(dsn5, annotation_tensor)
```

A network trained this way detects edge lines that are somewhat thick. To obtain thinner edge lines, an optimized scheme was found after repeated experiments:

```python
input_queue_for_train = tf.train.string_input_producer([FLAGS.csv_path])
image_tensor, annotation_tensor = input_image_pipeline(dataset_root_dir_string, input_queue_for_train, FLAGS.batch_size)
dsn_fuse, _, _, _, _, _ = hed_net(image_tensor, FLAGS.batch_size)

cost = class_balanced_sigmoid_cross_entropy(dsn_fuse, annotation_tensor)
```

That is, the images produced at the individual scales no longer participate in the cost computation; only the final fused image is used. The figure below compares the results of the two cost formulations; the right side shows the optimized version.
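For context, the sketch below shows how a cost like the fused-only one above might be minimized in a standard TF 1.x queue-based training loop. It is only an illustration: the optimizer choice, learning rate, and FLAGS.max_steps are assumptions, not values from the original project.

```python
import tensorflow as tf

# Hypothetical training loop; optimizer, learning rate and step count are assumed.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
train_step = optimizer.minimize(cost)  # 'cost' is the fused-only loss defined above

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Start the threads that feed tf.train.string_input_producer.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for step in range(FLAGS.max_steps):  # FLAGS.max_steps is an assumed flag
        _, loss_value = sess.run([train_step, cost])
    coord.request_stop()
    coord.join(threads)
```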
One more point: following the HED paper, the cost must not be the commonly used mean-squared-error style cost; a cost-sensitive loss function should be used instead. The code:

```python
def class_balanced_sigmoid_cross_entropy(logits, label, name='class_balanced_cross_entropy'):
    """
    The class-balanced cross entropy loss, as in
    `Holistically-Nested Edge Detection <https://arxiv.org/abs/1504.06375>`_.
    This is more numerically stable than class_balanced_cross_entropy.

    :param logits: the logits.
    :param label: the ground truth in {0, 1}, of the same shape as logits.
    :returns: a scalar. class-balanced cross entropy loss
    """
    y = tf.cast(label, tf.float32)

    count_neg = tf.reduce_sum(1. - y)  # the number of 0 in y
    count_pos = tf.reduce_sum(y)       # the number of 1 in y (less than count_neg)
    beta = count_neg / (count_neg + count_pos)

    pos_weight = beta / (1 - beta)
    cost = tf.nn.weighted_cross_entropy_with_logits(logits, y, pos_weight)
    cost = tf.reduce_mean(cost * (1 - beta), name=name)

    return cost
```

Bilinear initialization of the transposed convolution layers

I had been stuck on this issue for quite a while when experimenting with the FCN network. FCN requires that the kernels of the transposed convolution (deconvolution) layers be initialized as a bilinear upsampling kernel rather than with the usual random normal initialization, and that a very small learning rate be used; only then does the model converge more easily. The HED paper does not explicitly require initializing the transposed convolution layers this way, but during training I found that the model converges more easily with this kind of initialization. The code (a hedged usage sketch for these weights follows at the end of this section):

```python
import numpy as np


def get_kernel_size(factor):
    """
    Find the kernel size given the desired factor of upsampling.
    """
    return 2 * factor - factor % 2


def upsample_filt(size):
    """
    Make a 2D bilinear kernel suitable for upsampling of the given (h, w) size.
    """
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)


def bilinear_upsample_weights(factor, number_of_classes):
    """
    Create weights matrix for transposed convolution with bilinear filter initialization.
    """
    filter_size = get_kernel_size(factor)

    weights = np.zeros((filter_size,
                        filter_size,
                        number_of_classes,
                        number_of_classes), dtype=np.float32)

    upsample_kernel = upsample_filt(filter_size)

    for i in range(number_of_classes):
        weights[:, :, i, i] = upsample_kernel

    return weights
```

Cold start of the training process

Unlike VGG, the HED network does not enter a converged state easily, nor does it easily reach the desired ideal state, for two main reasons:
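Referring back to the bilinear-initialization section above, the sketch below shows how such weights would typically seed a tf.nn.conv2d_transpose kernel. It is illustrative only: the variable names, tensor shapes, and the upsampling factor are assumptions, not taken from the original HED code.

```python
import tensorflow as tf

# Illustrative values; the real network's shapes and upsampling factor may differ.
upsample_factor = 2
number_of_classes = 1
batch_size, in_h, in_w = 4, 32, 32

# Stand-in for a feature map produced by an earlier layer.
feature_map = tf.placeholder(tf.float32, [batch_size, in_h, in_w, number_of_classes])

# Seed the transposed-convolution kernel with the bilinear weights defined above.
initial_kernel = bilinear_upsample_weights(upsample_factor, number_of_classes)
deconv_filter = tf.Variable(initial_kernel, name='deconv_bilinear_filter')

upsampled = tf.nn.conv2d_transpose(
    feature_map, deconv_filter,
    output_shape=[batch_size, in_h * upsample_factor, in_w * upsample_factor, number_of_classes],
    strides=[1, upsample_factor, upsample_factor, 1],
    padding='SAME')
```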