For Reference: A Python Data-Crawling Program Written by AI

I asked an AI the following question and am recording the answer it gave. I have not learned Python yet, so I do not know whether this program actually works.

Could you write a program that regularly backs up the data from https://companiesmarketcap.com/ every day at 7:00 AM Korean time?

The Python data-crawling program written by AI

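This program scrapes the data from https://companiesmarketcap.com/ every day at 7:00 AM Korean time and saves it to the file companies_market_cap.csv. You can change the time_string variable to any time you like. For example, setting time_string = "12:00" backs up the data at noon. Note that the schedule library reads the clock of the computer running the script, so the job fires at 7:00 AM Korean time only if that computer is set to Korean Standard Time.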
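Because the schedule library works from the local system clock, a machine in another time zone needs a different time_string to hit 7:00 AM in Korea. The following is a minimal sketch of that conversion, not part of the AI's answer. The helper names next_run_kst and local_time_string are made up for this illustration, and Korean Standard Time is modeled as a fixed UTC+9 offset, which assumes Korea keeps no daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Korean Standard Time modeled as a fixed UTC+9 offset (assumes no daylight saving)
KST = timezone(timedelta(hours=9), "KST")


def next_run_kst(now, hour=7, minute=0):
    """Return the next hour:minute in Korean time strictly after `now`."""
    now_kst = now.astimezone(KST)
    candidate = now_kst.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now_kst:
        # Today's slot has already passed, so use tomorrow's
        candidate += timedelta(days=1)
    return candidate


def local_time_string(run_at):
    """Format `run_at` in the system's local time zone as HH:MM."""
    return run_at.astimezone().strftime("%H:%M")


if __name__ == "__main__":
    run_at = next_run_kst(datetime.now(tz=KST))
    print("Next run (KST):", run_at.isoformat())
    print("time_string for this machine:", local_time_string(run_at))
```

The second printed value is what time_string would have to be on the machine running the script for the job to fire at 7:00 AM in Korea.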

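To run this program you need the requests, BeautifulSoup, and schedule libraries. BeautifulSoup is installed under the package name beautifulsoup4 and imported as bs4.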
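Before starting a script that is meant to run for days, it can help to confirm that those libraries are importable. The sketch below is one possible check, using only the standard library. The function name missing_modules and the list REQUIRED_MODULES are assumptions made for this example, not part of the AI's answer.

```python
import importlib.util

# Import names used by the script; BeautifulSoup is imported as "bs4"
REQUIRED_MODULES = ["requests", "bs4", "schedule"]


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

Any module reported as missing can be installed with pip under its package name (requests, beautifulsoup4, schedule).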

This program is for reference only. Test it before real use and modify it where necessary.
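One part that can be tested without touching the website is the CSV output. The sketch below writes a placeholder row to an in-memory CSV and reads it back, using the same column order as the program's header row. The function name rows_to_csv_text and the sample values are invented for this illustration and are not real market data.

```python
import csv
import io

# Column titles and dictionary keys in the order the program writes them
HEADER = ["Rank", "Name", "Market Cap", "Price", "Today", "Volume", "P/E Ratio"]
FIELDS = ["rank", "name", "market_cap", "price", "today", "volume", "ratio"]


def rows_to_csv_text(data):
    """Write company dictionaries to CSV text, header row first."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    for company in data:
        writer.writerow([company[field] for field in FIELDS])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder row, not real market data
    sample = [{"rank": "1", "name": "Example Corp", "market_cap": "$1.00 T",
               "price": "$100.00", "today": "0.50%", "volume": "1,000",
               "ratio": "10.0"}]
    text = rows_to_csv_text(sample)
    # Read the text back to confirm that values containing commas survive
    parsed = list(csv.reader(io.StringIO(text, newline="")))
    assert parsed[0] == HEADER
    assert parsed[1][5] == "1,000"
    print(text)
```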

I have written a Python program that can regularly back up the data from https://companiesmarketcap.com/ every day at 7:00 AM Korean time. Here is the code.

Code

# Import requests, BeautifulSoup, schedule and time libraries
import requests
from bs4 import BeautifulSoup
import schedule
import time

# Define the URL of the website
url = "https://companiesmarketcap.com/"

# Define a function to scrape the data from the website
def scrape_data(url):
    # Send a GET request; the timeout keeps a stalled connection from
    # blocking the scheduler loop indefinitely
    try:
        response = requests.get(url, timeout=30)
    except requests.RequestException:
        # Network errors are reported to the caller as None
        return None
    # Return None if the response status code is not 200 (OK)
    if response.status_code != 200:
        return None
    # Parse the response content using BeautifulSoup
    soup = BeautifulSoup(response.content, "html.parser")
    # Find the table element that contains the data. The id "companies"
    # comes from the AI's answer and has not been verified against the live page.
    table = soup.find("table", id="companies")
    if table is None:
        # No matching table was found, so treat this as a failed scrape
        return None
    # Dictionary keys for the 7 cells expected in each data row
    fields = ["rank", "name", "market_cap", "price", "today", "volume", "ratio"]
    # Create an empty list to store the data
    data = []
    # Loop through each table row
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Skip header rows and any row that does not have exactly 7 cells
        if len(cells) == 7:
            # Build one dictionary per company from the stripped cell text
            values = [cell.text.strip() for cell in cells]
            data.append(dict(zip(fields, values)))
    # Return the data list
    return data

# Define a function to save the data to a CSV file
def save_data(data, filename):
    # Import the csv library
    import csv
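    # Open the file in write mode; newline="" lets the csv module control
    # line endings, and UTF-8 keeps non-ASCII company names intact
    with open(filename, "w", newline="", encoding="utf-8") as file: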
        # Create a csv writer object
        writer = csv.writer(file)
        # Write the header row
        writer.writerow(["Rank", "Name", "Market Cap", "Price", "Today", "Volume", "P/E Ratio"])
        # Loop through each company in the data list
        for company in data:
            # Write the data row
            writer.writerow([company["rank"], company["name"], company["market_cap"], company["price"], company["today"], company["volume"], company["ratio"]])

# Define a function to schedule the scraping and saving tasks
def schedule_tasks(url, filename, time_string):
    # Define a task to scrape and save the data
    def task():
        # Print a message
        print(f"Scraping and saving data from {url} at {time.ctime()}")
        # Scrape the data from the website
        data = scrape_data(url)
        # Check that the scrape succeeded and returned at least one row
        if data:
            # Save the data to the file
            save_data(data, filename)
            # Print a message
            print(f"Data saved to {filename}")
        else:
            # Print a message
            print(f"Failed to scrape data from {url}")
    # Schedule the task to run at the specified time every day
    schedule.every().day.at(time_string).do(task)
    # Loop indefinitely
    while True:
        # Run the pending tasks
        schedule.run_pending()
        # Sleep for 1 second
        time.sleep(1)

# Define the main function
def main():
    # Define the URL of the website
    url = "https://companiesmarketcap.com/"
    # Define the filename of the CSV file
    filename = "companies_market_cap.csv"
    # Define the time string in HH:MM format
    time_string = "07:00"  # 7 AM on the system clock (Korean time only if the machine is set to KST)
    # Schedule the scraping and saving tasks
    schedule_tasks(url, filename, time_string)

# Call the main function
if __name__ == "__main__":
    main()
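The program opens companies_market_cap.csv in write mode, so each daily run replaces the previous day's file. If the goal is to keep every day's backup, one option is to put the date in the filename. The sketch below shows that idea. The helper name dated_filename is hypothetical, and Korean time is again modeled as a fixed UTC+9 offset.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Korean Standard Time modeled as a fixed UTC+9 offset (assumes no daylight saving)
KST = timezone(timedelta(hours=9), "KST")


def dated_filename(base, now=None):
    """Return `base` with the Korean-time date inserted before the extension."""
    # Fall back to the current time when no timestamp is supplied
    now = (now or datetime.now(tz=KST)).astimezone(KST)
    path = Path(base)
    return str(path.with_name(f"{path.stem}_{now:%Y-%m-%d}{path.suffix}"))


if __name__ == "__main__":
    print(dated_filename("companies_market_cap.csv"))
```

Inside task(), the call save_data(data, filename) could then become save_data(data, dated_filename(filename)), which would leave one CSV file per day.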
