As artificial intelligence technology advances, data-driven methods for reconstructing and animating virtual agents achieve increasing realism. However, little research has examined how these data-driven methods, combined with voice cues, affect user perceptions. We use advanced data-driven methods to reconstruct stylized agents and pair them with synthesized voices to study their effects on users' trust and other perceptions (e.g., social presence and empathy). In an experiment with 27 participants, we find that stylized virtual agents enhance user trust to a degree comparable to realistic-style agents, while voice has a negligible effect on trust. Additionally, older agents are more likely to be trusted. Agent style also plays a key role in participants' perceived realism, and audio-visual matching significantly enhances perceived empathy. These results provide new insights into designing trustworthy virtual agents and lend further support to audio-visual integration theory.