Graph out-of-distribution generalization with controllable data augmentation
IEEE Transactions on Knowledge and Data Engineering, 2024•ieeexplore.ieee.org
Graph Neural Networks (GNNs) have demonstrated extraordinary performance in classifying graph properties. However, due to the selection bias between training and testing data (e.g., training on small graphs and testing on large graphs, or training on dense graphs and testing on sparse graphs), distribution deviation is widespread. More importantly, we often observe a hybrid structure distribution shift in both scale and density, even under a one-sidedly biased data partition. The spurious correlations arising from this hybrid distribution deviation degrade the performance of previous GNN methods and cause large instability across datasets. To alleviate this problem, we propose OOD-GMixup, which jointly manipulates the training distribution with controllable data augmentation in metric space. Specifically, we first extract graph rationales to eliminate spurious correlations due to irrelevant information. Second, we generate virtual samples by perturbing the graph-rationale representation domain to obtain potential OOD training samples. Finally, we propose OOD calibration, which measures the distribution deviation of virtual samples by leveraging Extreme Value Theory, and further actively controls the training distribution by emphasizing the impact of virtual OOD samples. Extensive studies on several real-world graph classification datasets demonstrate the superiority of our proposed method over state-of-the-art baselines.
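The pipeline sketched in the abstract (perturb representations to create virtual samples, then score their deviation with Extreme Value Theory) can be illustrated as follows. This is a minimal sketch under assumptions not stated in the abstract: Gaussian perturbation of embeddings, Euclidean distance to the training mean as the deviation measure, and a peaks-over-threshold generalized Pareto fit as the EVT component; the paper's actual rationale extraction, metric, and calibration may differ.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)

# Hypothetical graph-rationale embeddings of training graphs (N x d).
train_emb = rng.normal(size=(500, 16))

# Step 2 (sketch): virtual samples via Gaussian perturbation in the
# representation domain. The perturbation scale is an assumed knob.
sigma = 0.5
virtual = train_emb + sigma * rng.normal(size=train_emb.shape)

# Step 3 (sketch): measure each virtual sample's deviation from the
# training distribution; Euclidean distance to the mean is a stand-in.
mu = train_emb.mean(axis=0)
dist = np.linalg.norm(virtual - mu, axis=1)

# Peaks-over-threshold: fit a generalized Pareto distribution to the
# tail of the deviations, a standard Extreme Value Theory tool.
threshold = np.quantile(dist, 0.9)
excess = dist[dist > threshold] - threshold
shape, loc, scale = genpareto.fit(excess, floc=0.0)

# Use the fitted tail to score "OOD-ness": samples deeper in the tail
# (smaller survival probability) get larger training weights, which
# emphasizes the impact of virtual OOD samples.
tail_prob = genpareto.sf(np.maximum(dist - threshold, 0.0),
                         shape, loc=loc, scale=scale)
weights = 1.0 - tail_prob  # in [0, 1]; 0 for clearly in-distribution samples
```

The weights would then reweight the virtual samples' loss terms during GNN training, steering the effective training distribution toward the tail.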