HTML link extractor

Just a handy regex for extracting the URL and link text of each link in an HTML page.

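      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      // group 1: anything between "<a" and "href", group 2: the URL, group 3: the link text
      String pattern = "<a(.+?)href=\"([^\"]+)\"[^>]*>(.+?)</a>";
      Pattern r = Pattern.compile(pattern);
      Matcher m = r.matcher(table);  // table is the String holding the HTML source
      while (m.find()) {
        String ahreflink = m.group(0);  // the whole <a ...>...</a> element
        String url = m.group(2);
        String linktext = m.group(3);
      }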

The ? after each .+ makes it a lazy match: it matches only as far as the next “href” or the next “</a>”, instead of running on to the last one on the page.
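To see the lazy match at work, here is a minimal sketch that wraps the same regex in a small helper and runs it on a string with two links. The class name LinkExtractor, the method extractLinks and the example.com URLs are made up for illustration; only the pattern comes from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Same pattern as in the post, compiled once and reused.
    private static final Pattern LINK =
        Pattern.compile("<a(.+?)href=\"([^\"]+)\"[^>]*>(.+?)</a>");

    // Hypothetical helper: returns one {url, linkText} pair per anchor found.
    static List<String[]> extractLinks(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(new String[] { m.group(2), m.group(3) });
        }
        return links;
    }

    public static void main(String[] args) {
        // Two anchors on one line; the second has an attribute before href.
        String html = "<p><a href=\"https://example.com/a\">First</a> and "
                    + "<a class=\"x\" href=\"https://example.com/b\">Second</a></p>";
        for (String[] link : extractLinks(html)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
        // prints:
        // https://example.com/a -> First
        // https://example.com/b -> Second
    }
}
```

Because both groups are lazy, each anchor is found separately. With a greedy .+ in their place, the first group would run on to the last “href” in the string and the two links would collapse into a single match.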
