Automatic discrimination of speech and music is an important tool in many multimedia applications. To discriminate speech from music, the author uses three features: HZCRR (High Zero-Crossing Rate Ratio), LSTER (Low Short-Time Energy Ratio), and SF (Spectrum Flux), with the K-Nearest Neighbor algorithm for training and classification. The data consist of music segments covering different kinds of music, such as Vietnamese music, rock, pop songs, and country music, together with Vietnamese speech segments from male and female speakers. The main objective of the research is to discriminate between two classes of audio signal: speech and music. The results show fairly high accuracy: about 88 percent for speech and 92 percent for music. In future work, the author would like to extend the system to classify more classes of audio signals.
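As an illustration of how the three features and the K-Nearest Neighbor step could fit together, the following is a minimal sketch and not the author's implementation. The abstract gives no formulas, so the definitions here are assumptions based on commonly used formulations: HZCRR counts frames whose zero-crossing rate exceeds 1.5 times the window average, LSTER counts frames whose short-time energy falls below 0.5 times the window average, and SF averages the squared difference of log-magnitude spectra between adjacent frames. All function names, the frame size, the small constant `delta`, and `k=3` are hypothetical choices. A naive DFT is used only to keep the sketch free of external dependencies.

```python
import cmath
import math
from collections import Counter


def frames(signal, size):
    """Split a signal into non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def zcr(frame):
    """Zero-crossing rate: fraction of adjacent sample pairs that change sign."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def ste(frame):
    """Short-time energy: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def hzcrr(fs):
    """Share of frames whose ZCR exceeds 1.5x the mean ZCR (assumed threshold)."""
    rates = [zcr(f) for f in fs]
    mean = sum(rates) / len(rates)
    return sum(r > 1.5 * mean for r in rates) / len(rates)


def lster(fs):
    """Share of frames whose energy is below 0.5x the mean energy (assumed threshold)."""
    energies = [ste(f) for f in fs]
    mean = sum(energies) / len(energies)
    return sum(e < 0.5 * mean for e in energies) / len(energies)


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrum_flux(fs, delta=1e-6):
    """Mean squared log-spectrum change between adjacent frames (needs >= 2 frames)."""
    logs = [[math.log(a + delta) for a in magnitude_spectrum(f)] for f in fs]
    diffs = [
        sum((p - q) ** 2 for p, q in zip(cur, prev)) / len(cur)
        for prev, cur in zip(logs, logs[1:])
    ]
    return sum(diffs) / len(diffs)


def features(signal, frame_size=32):
    """Return the (HZCRR, LSTER, SF) feature vector for one audio segment."""
    fs = frames(signal, frame_size)
    return (hzcrr(fs), lster(fs), spectrum_flux(fs))


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In this sketch, `train` would be a list of `(feature_vector, label)` pairs built by calling `features` on labeled speech and music segments, and `knn_predict(train, features(segment))` would return the predicted label for a new segment. In practice the three features have different scales, so normalizing them before computing distances may be advisable.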