Submitted by MichelMED10 t3_ysah21 in MachineLearning
Hey,
In timm's implementation of stochastic depth (https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/layers/drop.py), the tensor is scaled by the probability of keeping the block. I don't understand why this is done, especially since it isn't mentioned in the paper.
Can anyone explain this to me, please?
Thanks!
The code:
def drop_path(x, drop_prob: float = 0., training: bool = False, scale_by_keep: bool = True):
    # Identity at inference time or when nothing is dropped (early return as in the linked file).
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    # One Bernoulli draw per example in the batch, broadcast over all remaining dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    random_tensor = x.new_empty(shape).bernoulli_(keep_prob)
    # Rescale the surviving paths by 1/keep_prob.
    if keep_prob > 0.0 and scale_by_keep:
        random_tensor.div_(keep_prob)
    return x * random_tensor
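For context, the scaling looks like the same trick as inverted dropout: dividing the kept samples by keep_prob keeps the expected magnitude of the output equal to the input, so nothing has to be rescaled at inference. A minimal sketch to check this numerically (assumes PyTorch and the drop_path above; the numbers are illustrative):

import torch

# With scale_by_keep=True the sample mean of the output stays near the input
# mean, since E[mask / keep_prob] = 1.
torch.manual_seed(0)
x = torch.ones(100000, 4)
out = drop_path(x, drop_prob=0.2, training=True, scale_by_keep=True)
print(out.mean().item())   # ~1.0: kept rows are scaled by 1/0.8 = 1.25

# Without the rescale the mean shrinks by keep_prob.
out = drop_path(x, drop_prob=0.2, training=True, scale_by_keep=False)
print(out.mean().item())   # ~0.8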
killver t1_ivzoqe1 wrote
Why don't you ask in his repo?