uh ..

So, when you say MUD, I think you mean Multi-User Dungeon?

In a MUD you've got a room, 1-4 exits, a handful of mobs (like 3 or 4), a few items on the ground, and the other players.
The mobs don't move either. The room is essentially one tile.

In Angband you've got hundreds of items, traps, and mobs, some hidden from view but known. You've got monster qualities as well, so it's not just "ooze" but "purple ooze", which affects how you approach that mob.

So ... lots of things to convert into audio. Not just each mob and item, but their locations too.

Frankly, I cannot imagine having to convert a vault to audio and repeat that every turn. I can't even imagine someone listening to that audio and actually understanding what is happening.
To really drive the point home: I cannot imagine a single turn in an Angband vault being converted into audio, and someone sitting through all of it and coming away with a visual understanding of what they just heard.

And then repeat that every turn.
"I can take this dracolich"
