Initialization

Yoshua Bengio - One of the best experts on this subject based on the ideXlab platform.

  • How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
    Neural Information Processing Systems, 2019
    Co-Authors: Devansh Arpit, Victor Campos, Yoshua Bengio
    Abstract:

    Residual networks (ResNets) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not previously been studied for weight-normalized networks; in practice, initialization methods designed for un-normalized networks are used as a proxy. Similarly, initialization for ResNets has been studied only for un-normalized networks, often under simplified settings that ignore the shortcut connection. To address these issues, we propose a novel parameter initialization strategy that avoids the explosion or vanishing of information across layers in weight-normalized networks, with and without residual connections. The proposed strategy is based on a theoretical analysis using a mean field approximation. We run over 2,500 experiments and evaluate our proposal on image datasets, showing that the proposed initialization outperforms existing methods in generalization performance, robustness to hyper-parameter values, and variance between seeds, especially as networks get deeper, where existing methods fail to even start training. Finally, we show that using our initialization in conjunction with learning rate warmup reduces the performance gap between weight-normalized and batch-normalized networks.

  • How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
    arXiv: Machine Learning, 2019
    Co-Authors: Devansh Arpit, Victor Campos, Yoshua Bengio
    Abstract:

    Residual networks (ResNets) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not previously been studied for weight-normalized networks; in practice, initialization methods designed for un-normalized networks are used as a proxy. Similarly, initialization for ResNets has been studied only for un-normalized networks, often under simplified settings that ignore the shortcut connection. To address these issues, we propose a novel parameter initialization strategy that avoids the explosion or vanishing of information across layers in weight-normalized networks, with and without residual connections. The proposed strategy is based on a theoretical analysis using a mean field approximation. We run over 2,500 experiments and evaluate our proposal on image datasets, showing that the proposed initialization outperforms existing methods in generalization performance, robustness to hyper-parameter values, and variance between seeds, especially as networks get deeper, where existing methods fail to even start training. Finally, we show that using our initialization in conjunction with learning rate warmup reduces the performance gap between weight-normalized and batch-normalized networks.
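
    A minimal sketch of an initialization in this spirit, assuming PyTorch and its torch.nn.utils.weight_norm reparameterization w = g * v / ||v||. The gain below is chosen so the effective weight matches the He variance 2/fan-in at initialization; it is an illustrative stand-in, not the exact constants the paper derives via mean field theory:

        import math
        import torch
        import torch.nn as nn
        from torch.nn.utils import weight_norm

        def init_weightnorm_linear(layer: nn.Linear) -> nn.Linear:
            # Weight norm reparameterizes each output row as w = g * v / ||v||,
            # so the row norm equals g. Drawing v from N(0, 1) (a symmetric
            # distribution) and setting g = sqrt(2) gives the row norm that
            # He initialization yields in expectation, since
            # E||w_row||^2 = fan_in * (2 / fan_in) = 2 (illustrative constants).
            wn = weight_norm(layer)  # adds weight_g and weight_v parameters
            with torch.no_grad():
                wn.weight_v.normal_(0.0, 1.0)
                wn.weight_g.fill_(math.sqrt(2.0))
                if wn.bias is not None:
                    wn.bias.zero_()
            return wn

        layer = init_weightnorm_linear(nn.Linear(512, 512))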

David Rolnick - One of the best experts on this subject based on the ideXlab platform.

  • How to Start Training: The Effect of Initialization and Architecture
    Neural Information Processing Systems, 2018
    Co-Authors: Boris Hanin, David Rolnick
    Abstract:

    We identify and study two common failure modes for early training in deep ReLU nets. For each, we give a rigorous proof of when it occurs and how to avoid it, for fully connected, convolutional, and residual architectures. We show that the first failure mode, exploding or vanishing mean activation length, can be avoided by initializing weights from a symmetric distribution with variance 2/fan-in and, for ResNets, by correctly scaling the residual modules. We prove that the second failure mode, exponentially large variance of activation length, never occurs in residual nets once the first failure mode is avoided. In contrast, for fully connected nets, we prove that this failure mode can occur and is avoided by keeping the sum of the reciprocals of the layer widths constant. We demonstrate empirically that our theoretical results predict when networks are able to start training. In particular, we note that many popular initializations fail our criteria, whereas correct initialization and architecture allow much deeper networks to be trained.

  • How to Start Training: The Effect of Initialization and Architecture
    arXiv: Machine Learning, 2018
    Co-Authors: Boris Hanin, David Rolnick
    Abstract:

    We investigate the effects of initialization and architecture on the start of training in deep ReLU nets. We identify two common failure modes for early training in which the mean and variance of activations are poorly behaved. For each failure mode, we give a rigorous proof of when it occurs at initialization and how to avoid it. The first failure mode, exploding or vanishing mean activation length, can be avoided by initializing weights from a symmetric distribution with variance 2/fan-in. The second failure mode, exponentially large variance of activation length, can be avoided by keeping the sum of the reciprocals of the layer widths constant. We demonstrate empirically that our theoretical results predict when networks are able to start training. In particular, we note that many popular initializations fail our criteria, whereas correct initialization and architecture allow much deeper networks to be trained.
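
    The two abstracts above reduce to a concrete recipe: draw weights from a symmetric distribution with variance 2/fan-in, and in ResNets scale down the residual branches. A minimal sketch, assuming PyTorch, whose kaiming_normal_ with mode="fan_in" draws exactly from N(0, 2/fan-in); the 1/sqrt(depth) branch scale is one common choice, not necessarily the paper's exact prescription:

        import torch
        import torch.nn as nn

        def init_relu_net_(model: nn.Module) -> None:
            # N(0, 2/fan_in) is a symmetric distribution with variance 2/fan-in,
            # the condition stated above for avoiding exploding or vanishing
            # mean activation length in ReLU nets.
            for m in model.modules():
                if isinstance(m, (nn.Linear, nn.Conv2d)):
                    nn.init.kaiming_normal_(m.weight, mode="fan_in", nonlinearity="relu")
                    if m.bias is not None:
                        nn.init.zeros_(m.bias)

        class ScaledResidual(nn.Module):
            # Downscaling each residual branch keeps activation length under
            # control as depth grows; 1/sqrt(depth) is an illustrative choice.
            def __init__(self, branch: nn.Module, depth: int):
                super().__init__()
                self.branch = branch
                self.scale = depth ** -0.5

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return x + self.scale * self.branch(x)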

Paul R Moorcroft - One of the best experts on this subject based on the ideXlab platform.

  • Using Lidar and Radar measurements to constrain predictions of forest ecosystem structure and function
    Ecological Applications, 2011
    Co-Authors: A. S. Antonarakis, Sassan S. Saatchi, Robin L. Chazdon, Paul R Moorcroft
    Abstract:

    Insights into vegetation and aboveground biomass dynamics within terrestrial ecosystems have come almost exclusively from ground-based forest inventories that are limited in their spatial extent. Lidar and synthetic-aperture Radar are promising remote-sensing-based techniques for obtaining comprehensive measurements of forest structure at regional to global scales. In this study we investigate how Lidar-derived forest heights and Radar-derived aboveground biomass can be used to constrain the dynamics of the ED2 terrestrial biosphere model. Four-year simulations initialized with Lidar and Radar structure variables were compared against simulations initialized from forest-inventory data and output from a long-term potential-vegetation simulation. Both height and biomass initializations from Lidar and Radar measurements significantly improved the representation of forest structure within the model, eliminating the bias of too many large trees that arose in the potential-vegetation-initialized simulation. The Lidar and Radar initializations decreased the proportion of larger trees estimated by the potential vegetation by approximately 20-30%, matching the forest inventory. This resulted in improved predictions of ecosystem-scale carbon fluxes and structural dynamics compared to predictions from the potential-vegetation simulation. The Radar initialization produced biomass values that were 75% closer to the forest inventory, with Lidar initializations producing canopy height values closest to the forest inventory. Net primary production values for the Radar and Lidar initializations were around 6-8% closer to the forest inventory. Correcting the Lidar and Radar initializations for forest composition resulted in improved biomass and basal-area dynamics as well as leaf-area index. Correcting the Lidar and Radar initializations for forest composition and fine-scale structure by combining the remote-sensing measurements with ground-based inventory data further improved predictions, suggesting that further improvements of structural and carbon-flux metrics will also depend on obtaining reliable estimates of forest composition and accurate representation of the fine-scale vertical and horizontal structure of plant canopies.
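
    The initializations described above amount to translating remotely sensed structure into a model's initial state. A hypothetical sketch of the idea, assuming a generic height-diameter allometry H = a * DBH**b; the constants and function names are placeholders, not the allometries or the ED2 interface actually used in the study:

        import numpy as np

        def height_to_dbh(height_m: np.ndarray, a: float = 0.64, b: float = 1.37) -> np.ndarray:
            # Invert an assumed allometry H = a * DBH**b to recover stem
            # diameter from Lidar canopy height (placeholder constants).
            return (height_m / a) ** (1.0 / b)

        lidar_heights = np.array([12.0, 18.5, 25.3])  # canopy heights in meters
        initial_dbh = height_to_dbh(lidar_heights)    # initial size structure for the model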

Marcos C. Tijanes - One of the best experts on this subject based on the ideXlab platform.

Devansh Arpit - One of the best experts on this subject based on the ideXlab platform.

  • How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
    Neural Information Processing Systems, 2019
    Co-Authors: Devansh Arpit, Victor Campos, Yoshua Bengio
    Abstract:

    Residual networks (ResNets) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not previously been studied for weight-normalized networks; in practice, initialization methods designed for un-normalized networks are used as a proxy. Similarly, initialization for ResNets has been studied only for un-normalized networks, often under simplified settings that ignore the shortcut connection. To address these issues, we propose a novel parameter initialization strategy that avoids the explosion or vanishing of information across layers in weight-normalized networks, with and without residual connections. The proposed strategy is based on a theoretical analysis using a mean field approximation. We run over 2,500 experiments and evaluate our proposal on image datasets, showing that the proposed initialization outperforms existing methods in generalization performance, robustness to hyper-parameter values, and variance between seeds, especially as networks get deeper, where existing methods fail to even start training. Finally, we show that using our initialization in conjunction with learning rate warmup reduces the performance gap between weight-normalized and batch-normalized networks.

  • How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
    arXiv: Machine Learning, 2019
    Co-Authors: Devansh Arpit, Victor Campos, Yoshua Bengio
    Abstract:

    Residual networks (ResNets) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not previously been studied for weight-normalized networks; in practice, initialization methods designed for un-normalized networks are used as a proxy. Similarly, initialization for ResNets has been studied only for un-normalized networks, often under simplified settings that ignore the shortcut connection. To address these issues, we propose a novel parameter initialization strategy that avoids the explosion or vanishing of information across layers in weight-normalized networks, with and without residual connections. The proposed strategy is based on a theoretical analysis using a mean field approximation. We run over 2,500 experiments and evaluate our proposal on image datasets, showing that the proposed initialization outperforms existing methods in generalization performance, robustness to hyper-parameter values, and variance between seeds, especially as networks get deeper, where existing methods fail to even start training. Finally, we show that using our initialization in conjunction with learning rate warmup reduces the performance gap between weight-normalized and batch-normalized networks.
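
    The abstract's closing point, combining this initialization with learning rate warmup, is commonly implemented as a linear ramp over the first updates. A minimal sketch, assuming PyTorch optimizers; the schedule shape and length are assumptions, not the paper's exact recipe:

        import torch

        def linear_warmup(optimizer: torch.optim.Optimizer, step: int,
                          warmup_steps: int, base_lr: float) -> None:
            # Ramp the learning rate linearly from near zero to base_lr over
            # the first warmup_steps updates, then hold it at base_lr.
            lr = base_lr * min(1.0, (step + 1) / warmup_steps)
            for group in optimizer.param_groups:
                group["lr"] = lr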