Doesn't that leak node-level features from the train set into the test/validation set, though? If 90% of my channels are in the train set and the top-performing feature is "positional encoding", then the model can basically learn my node's positional encoding and whether my channels tend to be balanced or not. Then it just extrapolates that to my channels in the test/validation set.

Group by channel ID and then split. 

https://dplyr.tidyverse.org/reference/group_split.html
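For anyone not working in R, here is a rough sketch of the same idea in plain Python: bucket the rows by channel ID first, then assign whole buckets to train or test. The row layout, the `channel_id` and `side` field names, and the `group_split` helper are all made up for illustration; nothing here is taken from the actual research pipeline.

```python
import random
from collections import defaultdict

def group_split(rows, key, test_fraction=0.2, seed=0):
    """Split rows so that all rows sharing the same key land in the same set."""
    # Bucket rows by the grouping key (here: the channel ID).
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Shuffle the group IDs, not the rows, so a group is never torn apart.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test

# Hypothetical data: 10 channels, one row per side of each channel.
rows = [
    {"channel_id": f"c{i}", "side": side}
    for i in range(10)
    for side in ("a", "b")
]

train, test = group_split(rows, key="channel_id")

train_ids = {r["channel_id"] for r in train}
test_ids = {r["channel_id"] for r in test}
print(len(train), len(test), train_ids & test_ids)
```

Note that this only keeps both sides of a channel together. A node still has channels on both sides of the split, so it does not address the node-level leak raised above.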

random_

lightning

Amboss Targets Bitcoin Scalability with New Machine Learning Research

ambosstech

- How many nodes and channels were in the dataset used to train this?
- I think it is basically impossible to do a proper train/test/validation split here. For example, if you have one side of the channel in the training set, then the other side might be in the test set, and you have leaked the answer for your model to learn. You can't fully sequester this information. How did you try to deal with this?
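To make the second point concrete, here is a toy Python simulation of a naive row-level split. It assumes (my assumption, the dataset layout isn't described here) that each channel shows up as two rows, one per side, and that the two balance shares sum to 1, so seeing one side gives away the other. The field names are hypothetical.

```python
import random

rng = random.Random(0)

# Hypothetical rows: every channel appears twice, once per side.
# Assumption: the two sides' balance shares sum to 1.
rows = []
for i in range(1000):
    share = rng.random()
    rows.append({"channel_id": i, "side": "a", "balance_share": share})
    rows.append({"channel_id": i, "side": "b", "balance_share": 1 - share})

# Naive 80/20 split over rows, ignoring which channel a row belongs to.
rng.shuffle(rows)
cut = int(len(rows) * 0.8)
train, test = rows[:cut], rows[cut:]

# A test row is "leaked" if the other side of its channel is in train.
train_ids = {r["channel_id"] for r in train}
leaked = [r for r in test if r["channel_id"] in train_ids]
leak_rate = len(leaked) / len(test)
print(f"{leak_rate:.0%} of test rows have their counterpart in train")
```

With an 80/20 row split, most test rows end up with their counterpart sitting in the training set, which is the leak described above.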