“分离()"之间的区别和“with torch.nograd()&quo

Difference between "detach()" and "with torch.no_grad()" in PyTorch?

This article explains the difference between "detach()" and "with torch.no_grad()" in PyTorch. It should be a useful reference for readers facing the same question, so let's go through it together.

Problem description

I know about two ways to exclude elements of a computation from the gradient calculation backward:

Method 1: using with torch.no_grad()

with torch.no_grad():
    y = reward + gamma * torch.max(net.forward(x))
loss = criterion(net.forward(torch.from_numpy(o)), y)
loss.backward()

Method 2: using .detach()

y = reward + gamma * torch.max(net.forward(x))
loss = criterion(net.forward(torch.from_numpy(o)), y.detach())
loss.backward()

Is there a difference between these two? Are there benefits/downsides to either?

Recommended answer

tensor.detach() creates a tensor that shares storage with tensor but does not require grad. It detaches the output from the computational graph, so no gradient will be backpropagated along this variable.
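A minimal sketch (the variables here are illustrative, not taken from the question's code) showing that the detached tensor shares data with the original but is cut out of the graph, so no gradient flows through it:

import torch

x = torch.tensor([2.0], requires_grad=True)
y = x * 3
z = y.detach()          # shares storage with y, but requires_grad is False

print(y.requires_grad)  # True  -- y is part of the graph
print(z.requires_grad)  # False -- z is detached from the graph

loss = (y + z).sum()    # gradients flow through y but not through z
loss.backward()
print(x.grad)           # tensor([3.]) -- only the y branch contributed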

The wrapper with torch.no_grad() temporarily sets all of the requires_grad flags to false. torch.no_grad says that no operation should build the graph.
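A small illustration of the scope of the context manager (the scalar tensor is an assumed stand-in, not from the question): everything computed inside the block is excluded from graph building.

import torch

x = torch.tensor([2.0], requires_grad=True)

with torch.no_grad():
    y = x * 3
print(y.requires_grad)   # False -- no operation inside the block is recorded

y = x * 3                # outside the block, tracking resumes as usual
print(y.requires_grad)   # True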

The difference is that detach() applies only to the single tensor on which it is called, whereas torch.no_grad() affects all operations taking place within the with statement. Also, torch.no_grad will use less memory because it knows from the beginning that no gradients are needed, so it does not need to keep intermediate results.
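To make the difference in scope concrete, here is a short sketch (the tensors a and b are illustrative): .detach() cuts only the one tensor it is called on, while torch.no_grad() excludes every operation inside the block and records no intermediate results at all.

import torch

a = torch.tensor([1.0], requires_grad=True)
b = torch.tensor([2.0], requires_grad=True)

# .detach() only cuts the branch through a; the b branch is still tracked
y = a.detach() * 2 + b * 3
y.backward()
print(a.grad, b.grad)    # None tensor([3.]) -- only b receives a gradient

# torch.no_grad() excludes every operation in the block, so no graph
# (and no intermediate buffers) is kept
with torch.no_grad():
    y = a * 2 + b * 3
print(y.requires_grad)   # False -- calling y.backward() here would raise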

From here.

这篇关于“分离()"之间的区别和“with torch.nograd()"在 PyTorch 中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
