I don't know how much idle time inference hardware actually has right now, but the fact that providers have been pushing "batch" APIs for a while suggests demand isn't constant.
In an ideal world it would work like multi-tasking: moment-to-moment context switching (maybe not at the microsecond level, but at the seconds level). Run an inference request, then guess a few hundred million hashes, then switch back to inference... the goal being to keep the NPU constantly occupied. Something like the sketch below.
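Roughly what I'm picturing, as a toy scheduler (pure sketch: `run_inference_request`, `inference_queue`, and the batch size are made-up names, and the CPU-side SHA-256 is just a stand-in for whatever hashing the accelerator would actually do):

```python
import hashlib
import queue

HASH_BATCH = 500_000             # hashes per filler slice; tune so one slice is ~seconds of work
inference_queue = queue.Queue()  # hypothetical queue of pending inference requests

def run_inference_request(req):
    """Placeholder for whatever actually drives the NPU for one request."""
    ...

def grind_hashes(start_nonce, count, prefix=b"block-header"):
    """Filler work: guess `count` hashes, return the next nonce to resume from."""
    for nonce in range(start_nonce, start_nonce + count):
        hashlib.sha256(prefix + nonce.to_bytes(8, "little")).digest()
    return start_nonce + count

def scheduler_loop():
    nonce = 0
    while True:
        try:
            # Inference always wins: drain any pending request first.
            req = inference_queue.get_nowait()
            run_inference_request(req)
        except queue.Empty:
            # No inference work queued, so burn a slice on hash guessing
            # before checking the queue again.
            nonce = grind_hashes(nonce, HASH_BATCH)
```

The key knob is the slice size: big enough that switching overhead is negligible, small enough that a new inference request never waits more than a second or two.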