I don't know how much inference demand downtime there is right now, but judging by the "batch" APIs that have been promoted for a while now, demand must be non-constant.
I guess in an ideal world it would work like multi-tasking: moment-to-moment context switching (maybe not at the microsecond level, but perhaps at the seconds level). So run an inference request, then guess a few hundred million hashes, then switch back to inference... the goal being to keep the NPU constantly occupied. A rough sketch of that loop is below.
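A minimal sketch of that seconds-level time slicing, assuming hypothetical `run_inference_batch` and `mine_hashes` hooks into the NPU runtime and a SHA256d kernel (neither is a real API):

```python
import queue
import time

SLICE_SECONDS = 2  # seconds-level quantum, per the comment above

def serve_or_mine(inference_queue: queue.Queue, run_inference_batch, mine_hashes):
    """Keep the accelerator busy: drain inference work when it exists,
    otherwise burn the idle slice guessing hashes."""
    while True:
        deadline = time.monotonic() + SLICE_SECONDS
        try:
            request = inference_queue.get_nowait()
        except queue.Empty:
            request = None
        if request is not None:
            # Inference slice: serve requests until the quantum expires.
            while True:
                run_inference_batch(request)
                if time.monotonic() >= deadline:
                    break
                try:
                    request = inference_queue.get_nowait()
                except queue.Empty:
                    break
        else:
            # Mining slice: guess hashes until the quantum expires.
            while time.monotonic() < deadline:
                mine_hashes(n=100_000)
```

The seconds-level quantum matters: it amortizes the cost of swapping weights and kernel state on and off the accelerator, which would dominate at microsecond granularity.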
A couple of years ago I spoke to a former Broadcom designer who, for lulz, developed a PCIe board in two versions: one with SHA256d ASICs and one with a bespoke tensor unit for inference. His thinking was that he could put sets of both in a container and, during NPU demand downtime, mine Bitcoin as grid booking permitted.
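For anyone unfamiliar with the term: SHA256d is just SHA-256 applied twice, the hash Bitcoin's proof-of-work grinds over block headers. A toy version of the mining inner loop (the header bytes and target here are placeholders, not real chain data, so this range may or may not yield a hit):

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used in Bitcoin's proof-of-work."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Mining reduces to varying a nonce in the 80-byte header until the
# double-hash, read as a little-endian integer, falls below the target.
header = bytes(76)  # placeholder for version, prev hash, merkle root, time, bits
target = 2**236     # illustrative difficulty only (~1 hit per 2**20 guesses)
for nonce in range(1_000_000):
    candidate = header + nonce.to_bytes(4, "little")
    if int.from_bytes(sha256d(candidate), "little") < target:
        print("share found at nonce", nonce)
        break
```

The loop is embarrassingly parallel and memoryless, which is exactly why it makes a good filler workload: it can be preempted at any hash boundary with no state to save.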