The most difficult part of this is signal processing on the node server, which is exactly what the Web Audio API is designed for. Doing it yourself would be extremely complicated: you have to manage node connections and then resolve the values of all the node outputs with some kind of efficient dependency resolution, and that has to happen at some sample rate where each pass is just a slice of signal output in the stream, so the sample rate (and the final quality of the signal) depends on how quickly each pass can run on your hardware. This is what the Web Audio API does for you, with hardware acceleration and all that.
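To make that concrete, here's a minimal sketch of what doing it by hand implies, assuming each node is just an object with an `inputs` array and a `process()` function. Every name here (`topoSortFrom`, `renderQuantum`, the node shape) is made up for illustration; a real engine would process whole sample blocks per node, reuse buffers, and handle feedback, timing, and parameter automation.

```js
// Order the graph so every node runs after the nodes it depends on.
// Assumes no feedback loops.
function topoSortFrom(destination) {
  const ordered = [];
  const visited = new Set();
  const visit = (node) => {
    if (visited.has(node)) return;
    visited.add(node);
    node.inputs.forEach(visit);
    ordered.push(node);
  };
  visit(destination);
  return ordered; // destination ends up last
}

// Render one block (one "pass"): evaluate the whole graph once per sample.
// Each call produces one slice of the stream and has to finish faster than
// real time, or the output glitches.
function renderQuantum(destination, blockSize = 128) {
  const ordered = topoSortFrom(destination);
  const out = new Float32Array(blockSize);
  for (let i = 0; i < blockSize; i++) {
    for (const node of ordered) {
      // each node produces one sample from its inputs' latest outputs
      node.output = node.process(node.inputs.map((n) => n.output));
    }
    out[i] = destination.output;
  }
  return out;
}

// Tiny usage example: a sine source feeding a gain node.
let t = 0;
const osc = { inputs: [], process: () => Math.sin((t++ / 44100) * 2 * Math.PI * 440) };
const gain = { inputs: [osc], process: ([x]) => x * 0.5 };
console.log(renderQuantum(gain).length); // 128 samples
```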
In a perfect world, the node server is processing the audio signal chain and streaming those values out through the MIDI stream, while the client UI just receives messages over the websocket about the state of the machine on the server and updates the UI accordingly. The websocket introduces potential latency, even though it's probably <1ms with everything running locally. Because the client sits on the far side of that hop from the MIDI stream source on the server, you don't want the client producing the signals that get streamed out to other DAWs, etc.: if it did, the sample rate would be limited by websocket network transfer latency. That's terrible.
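A rough sketch of that split, using the ws package (the port, intervals, and state shape below are placeholders): the signal/MIDI loop runs on its own schedule, and the websocket only mirrors state out to the UI, so UI latency never touches the stream.

```js
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });
let machineState = { step: 0 };

// Tight loop: this is where the signal chain / MIDI output would live.
// It never waits on the websocket.
setInterval(() => {
  machineState.step = (machineState.step + 1) % 16;
  // midiOutput.sendMessage([...]) would go here
}, 10);

// Loose loop: mirror machine state out to any connected UI clients.
// If this lags, the UI lags; the MIDI stream doesn't care.
setInterval(() => {
  const payload = JSON.stringify(machineState);
  wss.clients.forEach((client) => {
    if (client.readyState === 1 /* OPEN */) client.send(payload);
  });
}, 50);
```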
So in this perfect world, there's a nodejs implementation of the Web Audio API that can process and stream complex, node-based audio signal chains with hardware acceleration.
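If such a library existed and mirrored the browser API, building the chain on the server might look something like this ('node-web-audio-api' is a placeholder module name here, not a confirmed dependency):

```js
const { AudioContext } = require('node-web-audio-api'); // placeholder module

const ctx = new AudioContext();
const osc = ctx.createOscillator(); // signal source node
const gain = ctx.createGain();      // another node in the chain

osc.frequency.value = 220;
gain.gain.value = 0.5;

// connect() builds the graph; the engine handles dependency resolution,
// sample-accurate scheduling, and rendering for us.
osc.connect(gain);
gain.connect(ctx.destination);
osc.start();
```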
The node-midi library is a js wrapper for a C++ library called RtMidi. I might look for other C++ libraries that similarly provide audio signal chain processing and see about js wrappers.
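For reference, the MIDI side already looks like this with node-midi (the virtual port name and the CC message below are arbitrary examples):

```js
const midi = require('midi');

const output = new midi.Output();
console.log('hardware MIDI outputs:', output.getPortCount());

// A virtual port shows up as a MIDI device that a DAW can subscribe to.
output.openVirtualPort('node-server-out');
output.sendMessage([0xb0, 22, 64]); // CC 22, value 64, channel 1
output.closePort();
```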