Frequent "SSH could not read data: Error waiting on socket" errors #851

Open
opened 2021-11-05 04:00:32 -05:00 by hiddeco · 1 comment
hiddeco commented 2021-11-05 04:00:32 -05:00 (Migrated from github.com)

We use `libgit2` and `git2go` a bit differently from most other public projects I have seen: we have a long-running process that continuously spawns new (temporary) repository objects to perform clone operations. For some time now, a group of users has been complaining about frequent (but not constant) `SSH could not read data: Error waiting on socket` errors during clone attempts.

To rule out our (initially hasty) use of `libgit2` and `git2go` as the cause, we have spent the past weeks making sure we work with `git2go` in the way it likes to be handled: for example, by ensuring that we properly `Free` objects, and [by informing the transports that they should close connections once they reach a defined timeout](https://github.com/fluxcd/source-controller/pull/477) (in the hope that this would signal the C library to do something socket-related).
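A minimal sketch of the timeout idea, for readers who want something concrete to reason about. It is not the code from the linked PR and does not call `git2go` at all: `cloneFunc`, `cloneWithTimeout`, `fakeClone`, `timedOut` and `demo` are hypothetical names, and `fakeClone` only simulates a transfer. The assumption is that the real clone operation periodically invokes a callback that can return an error to abort.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// cloneFunc is a hypothetical stand-in for a clone operation. It receives a
// check function that it should call periodically (for example from a
// transfer-progress callback) and must abort when check returns an error.
type cloneFunc func(check func() error) error

// cloneWithTimeout runs clone and hands it a check function that starts
// returning an error once the timeout has elapsed or parent is cancelled.
func cloneWithTimeout(parent context.Context, timeout time.Duration, clone cloneFunc) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	check := func() error {
		// ctx.Err() is nil until the deadline passes or the parent is cancelled.
		return ctx.Err()
	}
	return clone(check)
}

// fakeClone simulates a transfer of `steps` chunks that each take `perStep`.
// It is illustration only; a real clone would do network I/O here.
func fakeClone(steps int, perStep time.Duration) cloneFunc {
	return func(check func() error) error {
		for i := 0; i < steps; i++ {
			if err := check(); err != nil {
				return fmt.Errorf("clone aborted after %d of %d steps: %w", i, steps, err)
			}
			time.Sleep(perStep)
		}
		return nil
	}
}

// timedOut reports whether err was caused by the deadline expiring.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// demo runs one clone that finishes well within its timeout and one that
// cannot, and returns both results.
func demo() (fast error, slow error) {
	fast = cloneWithTimeout(context.Background(), time.Second, fakeClone(3, time.Millisecond))
	slow = cloneWithTimeout(context.Background(), 20*time.Millisecond, fakeClone(100, 5*time.Millisecond))
	return fast, slow
}

func main() {
	fast, slow := demo()
	fmt.Println("fast clone error:", fast)
	fmt.Println("slow clone timed out:", timedOut(slow))
}
```

Note that this only shows how an abort signal can be propagated from Go; whether aborting this way makes the C library close or clean up the underlying socket is exactly the open question in this issue.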

The error remains, however, and I am not sure in which direction to look next. Given this, and because the [Go native transport still does not work properly in combination with authentication](https://github.com/libgit2/git2go/issues/836#issuecomment-951129507), it would be great to get some guidance that at least explains why this error occurs and whether anything can be done about it. If it requires a PR, I would be happy to submit one.

Our git2go wrapping code (for more context) can be found in: https://github.com/fluxcd/source-controller/tree/main/pkg/git/libgit2

cbanciu667 commented 2022-05-20 10:30:02 -05:00 (Migrated from github.com)

Hi,

Any updates on this issue? It has been affecting the fluxcd image automation controller for some time.

Reference: jcarr/git2go#851