How to Scrape Tmall Images with the HtmlUnit Library

最编程 2024-05-01 13:16:57
...
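```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlImage;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.StringUtils;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.List;

public class Crawler {
    public static void main(String[] args) {
        String targetUrl = "https://www.tmall.com/";
        // The proxy host must be a bare hostname or IP address, not a URL.
        // "proxy.example.com" is a placeholder (an assumption): substitute a host
        // and port obtained from your proxy provider, such as the service at
        // https://www.duoip.cn/get_proxy.
        String proxyHost = "proxy.example.com";
        int proxyPort = 8000;

        // try-with-resources closes the WebClient even if an exception is thrown.
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME, proxyHost, proxyPort)) {
            // Do not abort the page load when one of the page's scripts fails.
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = webClient.getPage(targetUrl);
            // Collect every <img> element on the page.
            List<HtmlImage> images = page.getByXPath("//img");

            for (HtmlImage image : images) {
                String src = image.getSrcAttribute();
                if (StringUtils.isBlank(src)) {
                    continue; // skip <img> tags without a usable src
                }
                // Resolve relative and protocol-relative URLs against the page URL.
                URL imageUrl = page.getFullyQualifiedUrl(src);
                String imageUrlStr = imageUrl.toString();
                if (StringUtils.startsWith(imageUrlStr, "https")) {
                    String filename = imageUrlStr.substring(imageUrlStr.lastIndexOf("/") + 1);
                    File file = new File("images/" + filename);
                    // This download is a plain URL fetch; it does not use the
                    // WebClient's proxy settings.
                    FileUtils.copyURLToFile(imageUrl, file);
                    System.out.println("Downloaded image: " + filename);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```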
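
The crawler above names each file by taking everything after the last `/` in the image URL. Image URLs on CDN-backed sites often carry a query string or end in a slash, which can produce an awkward or empty filename. The following is a minimal sketch of a more defensive variant that uses only the JDK. The class name `ImageNames`, the method `fileNameFrom`, and the fallback name `image` are hypothetical and not part of the original article.

```java
import java.net.URI;

// Hypothetical helper (not from the original article): derives a local
// filename from an image URL while ignoring the query string and fragment.
final class ImageNames {

    // Fallback used when the URL path has no final segment (an assumption).
    private static final String FALLBACK = "image";

    public static String fileNameFrom(String imageUrl) {
        // URI.getPath() returns only the path component, so "?x=1" and "#top"
        // are dropped. It returns null for opaque URIs such as "mailto:...".
        String path = URI.create(imageUrl).getPath();
        if (path == null) {
            return FALLBACK;
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? FALLBACK : name;
    }

    public static void main(String[] args) {
        // prints "pic.jpg"
        System.out.println(fileNameFrom("https://img.example.com/a/b/pic.jpg?x=1#top"));
    }
}
```

In the crawler, the `substring` line could be replaced with a call such as `ImageNames.fileNameFrom(imageUrlStr)`. Note that `URI.create` throws `IllegalArgumentException` for URLs containing illegal characters such as unescaped spaces, so a real crawler would likely want to catch that and skip the image.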