This looks like one I recapped 10 years ago, and I had the same problem. What you're up against is technology from the 1930s, known as the split-sound design. The sound IF stages are fed signal from somewhere near the tuner and amplify it at its own IF frequency, 4.5 MHz away from the video IF carrier. This differs from the later intercarrier system, where the sound carrier shows up as a 4.5 MHz difference frequency in the video-demodulator stage, where it is easily snagged and of consistent amplitude.
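To put numbers on that 4.5 MHz spacing, here is a rough sketch of the frequency arithmetic. The channel 3 carrier frequencies are the standard NTSC ones, but the 25.75 MHz picture IF and the high-side oscillator are assumptions for illustration only; check the service data for your set's actual IFs.

```python
# Illustrative only: NTSC channel 3 carriers with an ASSUMED picture IF.
PICTURE_CARRIER_MHZ = 61.25  # channel 3 picture carrier
SOUND_CARRIER_MHZ = 65.75    # channel 3 sound carrier, 4.5 MHz above picture
PICTURE_IF_MHZ = 25.75       # assumption; look up the real value for your set


def if_frequencies(picture_rf, sound_rf, picture_if):
    """Return (local_oscillator, sound_if, spacing) in MHz.

    Assumes the local oscillator runs above the incoming channel
    (high-side injection), so each IF is oscillator minus carrier.
    """
    local_oscillator = picture_rf + picture_if
    sound_if = local_oscillator - sound_rf
    spacing = picture_if - sound_if
    return local_oscillator, sound_if, spacing


lo_mhz, sound_if_mhz, spacing_mhz = if_frequencies(
    PICTURE_CARRIER_MHZ, SOUND_CARRIER_MHZ, PICTURE_IF_MHZ
)
print(f"Local oscillator: {lo_mhz:.2f} MHz")
print(f"Sound IF:         {sound_if_mhz:.2f} MHz")
print(f"Picture/sound IF spacing: {spacing_mhz:.2f} MHz")
```

In a split-sound set the sound strip has to be tuned to that separate sound IF, so it depends on the oscillator and alignment being right. An intercarrier set only needs the 4.5 MHz difference, which stays put no matter where the oscillator drifts.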
In your case, the AGC is probably driving the front-end gain too low, because of the strong VCR signal, for the sound section to get enough signal. I recommend you do this: 1) review the video IF response curve at the sound takeoff point to make sure the sound-carrier level isn't down too far; 2) recheck the sound IF down to the last resistor and realign per spec; then 3) use an attenuator with the VCR or whatever you're inputting. With the input level set just past what's needed for good picture contrast, you shouldn't have to adjust it again.
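If it helps when picking an attenuator, this is a small sketch of how much signal is left after a pad of a given dB value. The 2000 microvolt source level is a made-up placeholder, not a measurement of your VCR, and the math assumes a properly matched pad.

```python
# Hypothetical starting level; measure or look up your source's real output.
SOURCE_LEVEL_UV = 2000.0


def attenuated_level_uv(input_uv, pad_db):
    """Voltage left after a matched attenuator of pad_db decibels."""
    return input_uv * 10 ** (-pad_db / 20)


# Common pad values to try, smallest first.
for pad_db in (3, 6, 10, 20):
    remaining = attenuated_level_uv(SOURCE_LEVEL_UV, pad_db)
    print(f"{pad_db:>2} dB pad: {remaining:7.1f} uV remaining")
```

Every 6 dB roughly halves the voltage and every 20 dB cuts it to a tenth, so start small and add attenuation until the picture contrast is just past good.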
Your video alignment looks really good from the picture!