Python Notes

Python Regular Expressions

Stripping HTML Tags

For Python 2

from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

For Python 3

from html.parser import HTMLParser

class MLStripper(HTMLParser):
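    def __init__(self):
        # Let the base class set up parser state; convert_charrefs=True makes
        # handle_data receive text with entities such as &amp; already decoded.
        super().__init__(convert_charrefs=True)
        self.fed = []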
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush text the parser may still be buffering, e.g. a trailing "AT&T"
    return s.get_data()

Source: http://stackoverflow.com/questions/753052/strip-html-from-strings-in-python
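
Since this note is filed under regular expressions, a regex-only variant is sketched here for comparison. It is not part of the linked answer: the names TAG_RE and strip_tags_regex are made up for illustration, and the pattern is a naive assumption that every tag runs from a "<" to the next ">". The sample strings show where that assumption holds and where it breaks.

```python
import re

# Hypothetical helper, not from the source: treats anything between
# "<" and the next ">" as a tag and deletes it.
TAG_RE = re.compile(r'<[^>]+>')

def strip_tags_regex(text):
    """Remove tag-like spans from text with a single regex substitution."""
    return TAG_RE.sub('', text)

# Simple, well-formed markup is handled as expected.
print(strip_tags_regex('<p>Hello <b>world</b></p>'))   # Hello world

# Entities are left untouched, because nothing here decodes them.
print(strip_tags_regex('<p>Tom &amp; Jerry</p>'))      # Tom &amp; Jerry

# A ">" inside a quoted attribute ends the match too early,
# so part of the attribute leaks into the output.
print(repr(strip_tags_regex('<a title="a > b">link</a>')))  # ' b">link'
```

The regex is fine for quick cleanup of markup you control, but it does not understand quoting, comments, or entities, which is the reason to prefer the HTMLParser-based version for arbitrary input.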

Date: 2015-3-4

Author: manan

Created: 2016-01-19 Tue 19:06

