[Java/Selenium, jsoup] Capturing Screenshots and Collecting Text with Java

당근당근이

2021. 12. 23.

https://kingname.tistory.com/221

[Python] Taking screenshots with Python Selenium

In my previous post, I took screenshots with Python. This time, I'll rewrite that Python code in Java.

First, add the libraries below. They are available from the Maven Repository; the declarations here use Gradle syntax.

// selenium
implementation group: 'org.seleniumhq.selenium', name: 'selenium-java', version: '4.1.0'
// jsoup
implementation group: 'org.jsoup', name: 'jsoup', version: '1.14.3'

The code below is a direct conversion of the Python code, so it behaves the same way as in the previous post.

    // Imports used by the methods below.
    // FileUtils is assumed to be the Apache Commons IO class, which must be on the classpath.
    import java.io.File;
    import java.io.IOException;
    import java.time.Duration;
    import java.time.LocalDateTime;
    import org.apache.commons.io.FileUtils;
    import org.jsoup.Jsoup;
    import org.openqa.selenium.By;
    import org.openqa.selenium.Dimension;
    import org.openqa.selenium.OutputType;
    import org.openqa.selenium.TakesScreenshot;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.chrome.ChromeDriverService;
    import org.openqa.selenium.chrome.ChromeOptions;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public NotionVO saveNotionScreenShotAndGetText(String url) {
        WebDriver driver = null;
        int totalWidth = 1920;
        int totalHeight = 1027;
        int timeOutInSeconds = 10;
        String htmlSource = "";
        String saveImagePath = "";
        try {
            driver = getWebChromeDriver();
            driver.get(url);

            changeWindowSize(driver, totalWidth, totalHeight);

            // Wait until the target element is visible
            // (replace the placeholder with the class name of a real element)
            waitVisibilityOfElementLocated(driver, "element-class-name", timeOutInSeconds);
            sleep(Duration.ofSeconds(3L));

            // Read the full content height from the element with class "scroller"
            WebElement element = driver.findElement(By.className("scroller"));
            totalHeight = Integer.parseInt(element.getAttribute("scrollHeight"));

            // Resize the Chrome window so the whole page fits in one screenshot
            changeWindowSize(driver, totalWidth, totalHeight);
            sleep(Duration.ofSeconds(2L));

            // Path where the screenshot is saved
            saveImagePath = IMAGE_SAVE_DIR + "notion_" + totalHeight + ".png";
            saveScreenshot(driver, saveImagePath);

            // Rendered HTML source
            htmlSource = driver.getPageSource();
        } catch (Exception e) {
            e.printStackTrace();
            return NotionVO.builder().url(url).build();
        } finally {
            if (driver != null) {
                // quit() closes every window and also stops the chromedriver process;
                // close() would only close the current window
                driver.quit();
            }
        }

        // Extract the page text with jsoup
        String content = Jsoup.parse(htmlSource).text();

        return NotionVO.builder().content(content)
                .contentLength(content.length())
                .regDt(LocalDateTime.now())
                .ner(nerAnalyzerService.getNerFromText(content))
                .screenshotFilePath(saveImagePath)
                .url(url)
                .build();
    }
	
    // Create the Chrome driver
    private WebDriver getWebChromeDriver() {
        ChromeDriverService service = new ChromeDriverService.Builder()
                .usingDriverExecutable(new File(DRIVER_PATH))
                .usingAnyFreePort()
                .build();
        ChromeOptions options = getChromeOptions();
        return ChromeDriver.builder()
                .withDriverService(service)
                .addAlternative(options)
                .build();
    }

    // Chrome driver options
    private ChromeOptions getChromeOptions() {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
        options.addArguments("--disable-gpu");
        // Chrome expects the window size as "width,height"
        options.addArguments("--window-size=1920,1080");
        options.addArguments("--lang=ko_KR");
        return options;
    }

    private void sleep(Duration duration) {
        try {
            Thread.sleep(duration.toMillis());
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still see the interruption
            Thread.currentThread().interrupt();
        }
    }
	
    private void waitVisibilityOfElementLocated(WebDriver driver, String className, int timeOutInSeconds) {
        // Selenium 4 takes the timeout as a Duration; the (driver, long) constructor is deprecated
        WebDriverWait webDriverWait = new WebDriverWait(driver, Duration.ofSeconds(timeOutInSeconds));
        webDriverWait.until(ExpectedConditions.visibilityOfElementLocated(By.className(className)));
    }

    private void changeWindowSize(WebDriver driver, int totalWidth, int totalHeight) {
        driver.manage().window().setSize(new Dimension(totalWidth, totalHeight));
    }

    private void saveScreenshot(WebDriver driver, String saveImagePath) {
        try {
            File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            FileUtils.copyFile(screenshot, new File(saveImagePath));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
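
The fixed sleep calls in the code above wait a set number of seconds whether or not the page has finished growing. As a rough sketch of an alternative, the helper below polls a value until two consecutive reads match or a timeout passes. The class name StableValueWaiter, the method waitUntilStable, and the simulated heights are all made up for illustration; they are not part of the original code or of Selenium.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper: not part of the original code or of Selenium.
class StableValueWaiter {

    // Polls the supplier until two consecutive reads are equal or the timeout passes.
    // Returns the last value that was read.
    static <T> T waitUntilStable(Supplier<T> supplier, Duration pollInterval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        T previous = supplier.get();
        while (System.nanoTime() < deadline) {
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                return previous;
            }
            T current = supplier.get();
            if (Objects.equals(previous, current)) {
                return current;
            }
            previous = current;
        }
        return previous;
    }

    public static void main(String[] args) {
        // Simulated page height that keeps growing for the first few reads
        int[] heights = {1000, 1800, 2400, 2400, 2400};
        int[] index = {0};
        Supplier<Integer> fakeScrollHeight =
                () -> heights[Math.min(index[0]++, heights.length - 1)];

        int stable = waitUntilStable(fakeScrollHeight, Duration.ofMillis(10), Duration.ofSeconds(2));
        System.out.println(stable); // prints 2400
    }
}
```

In the Selenium code, the supplier could presumably be something like () -> element.getAttribute("scrollHeight"), so the window is resized only once the reported height stops changing.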

With the code above, you can capture a screenshot of a given web page and collect its text.
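
One fragile spot is Integer.parseInt(element.getAttribute("scrollHeight")): if the attribute comes back null or is not a plain integer, the whole method falls into the catch block and no screenshot is saved. Here is a minimal sketch of a more forgiving parse, assuming it is acceptable to fall back to the initial window height. The ScrollHeightParser class and the parseHeightOrDefault method are hypothetical names, not part of the original code.

```java
// Hypothetical helper: not part of the original code.
class ScrollHeightParser {

    // Returns the parsed height, or defaultHeight when the raw value
    // is null, not an integer, or not positive.
    static int parseHeightOrDefault(String rawHeight, int defaultHeight) {
        if (rawHeight == null) {
            return defaultHeight;
        }
        try {
            int parsed = Integer.parseInt(rawHeight.trim());
            return parsed > 0 ? parsed : defaultHeight;
        } catch (NumberFormatException e) {
            return defaultHeight;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseHeightOrDefault("3540", 1027)); // prints 3540
        System.out.println(parseHeightOrDefault(null, 1027));   // prints 1027
        System.out.println(parseHeightOrDefault("auto", 1027)); // prints 1027
    }
}
```

With this fallback, a bad height value would still produce a screenshot at the default window size instead of aborting the whole collection.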
