HTML Filtering - 桃花源 - OSCHINA
A practical utility method for stripping HTML from a string:
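import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Removes <script> and <style> blocks, then all remaining tags, then collapses whitespace.
public static String removeTag(String htmlStr) {
    if (htmlStr == null) {
        return ""; // null input yields an empty string
    }
    // Assumption: the script and style literals are cut off in the source;
    // the two patterns below are a plausible reconstruction, not the original text.
    String regEx_script = "<script[^>]*?>[\\s\\S]*?</script>"; // script block
    String regEx_style = "<style[^>]*?>[\\s\\S]*?</style>"; // style block
    String regEx_html = "<[^>]+>"; // HTML tag
    // Other characters: whitespace runs, &nbsp; entities (assumed; the source shows
    // a bare space here), and bracketed numeric markers such as [12] (assumed intent).
    String regEx_space = "\\s+|&nbsp;|\\[[0-9]+\\]";

    Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
    Matcher m_script = p_script.matcher(htmlStr);
    htmlStr = m_script.replaceAll("");

    Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
    Matcher m_style = p_style.matcher(htmlStr);
    htmlStr = m_style.replaceAll("");

    Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
    Matcher m_html = p_html.matcher(htmlStr);
    htmlStr = m_html.replaceAll("");

    Pattern p_space = Pattern.compile(regEx_space, Pattern.CASE_INSENSITIVE);
    Matcher m_space = p_space.matcher(htmlStr);
    htmlStr = m_space.replaceAll(" "); // each match becomes a single space

    return htmlStr;
}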
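
A minimal, self-contained sketch of how a filter like this might be exercised. The class name `HtmlFilterDemo` and the sample inputs are hypothetical, and the script and style patterns are assumed rather than taken from the original. The patterns are compiled once as constants instead of on every call.

```java
import java.util.regex.Pattern;

public class HtmlFilterDemo {
    // Assumed patterns: script blocks, style blocks, any remaining tag, whitespace runs.
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*?>[\\s\\S]*?</script>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*?>[\\s\\S]*?</style>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern SPACE = Pattern.compile("\\s+");

    public static String removeTag(String html) {
        if (html == null) {
            return "";
        }
        // Order matters: drop script/style bodies before stripping bare tags,
        // otherwise their contents would survive as plain text.
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        return SPACE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String html = "<p>Hello</p>\n<script>alert(1)</script><b>world</b>";
        System.out.println(removeTag(html)); // prints "Hello world"

        // Limitation: plain text containing '<' ... '>' is treated as a tag.
        System.out.println(removeTag("a < b and c > d")); // prints "a d"
    }
}
```

The second call shows why regex stripping is best suited to producing plain-text previews: anything between a `<` and the next `>` is removed, whether or not it is really a tag.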