The 'zh sound' /ʒ/ is voiced (the vocal cords vibrate during its production) and is the counterpart of the unvoiced 'sh sound' /ʃ/, which is made with the same tongue and lip position.
To create the /ʒ/, air is forced between a wide groove in the center of the front of the tongue and the back of the tooth ridge. The sides of the blade of the tongue may touch the side teeth. The lips are kept slightly tense, and may protrude somewhat during the production of the sound.
The /ʒ/ is a continuous consonant, meaning that it can be held for a few seconds with smooth, even pronunciation for the entire duration.
Reference: https://pronuncian.com/pronounce-zh-sound