Using ChatGPT (Text Formatting and Text Summarization) with the ChatGPT API and Python on Windows

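[Summary] This page shows how to format and summarize text with ChatGPT and Python on Windows. Text that is too long to process in a single request is split into chunks, each chunk is processed, and the results are joined to produce the final output. The formatting program `arrange.py` tidies up a given text file using OpenAI's ChatGPT 3.5 turbo. The summarization program `summary.py` summarizes a given text file using ChatGPT 3.5 turbo, aiming for a length of 500 characters or fewer. Both programs take the input file and the output file as command-line arguments, and both require an OpenAI API key. The page also explains the Python environment needed to run the programs, how to save them, and how to specify file names. The Python programs also run on Ubuntu.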

[Contents]

Preparation
Text formatting program (using the ChatGPT API and Python)
Text summarization program (using the ChatGPT API and Python)

Preparation

Installing Python (on Windows)

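[Related pages on this site]

Installing Python 3.10, related packages, and a Python development environment on Windows: explained on a separate page.

Installing Anaconda3 on Windows: explained on a separate page.

Python summary: collected on a separate page.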

Installing openai

On Windows, run Command Prompt as administrator

How to run Command Prompt as administrator: explained on a separate page.

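python -m pip install -U --ignore-installed pip
python -m pip install -U "openai<1"

The programs on this page call openai.ChatCompletion, which was removed in openai 1.0, so a pre-1.0 release of the package is installed here.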

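Text formatting program (using the ChatGPT API and Python)

This program gives ChatGPT 3.5 turbo a prompt that asks it to tidy up a Japanese document, and saves the final result to a file. It is intended for cleaning up text whose formatting was damaged when it was extracted from HTML files, PDFs, PowerPoint files, and similar sources. When the contents of the specified file are long, the text is split into chunks and each chunk is processed by ChatGPT 3.5 turbo. An OpenAI API key is required to use this program.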
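
The split, process, and join approach described above can be sketched without any API call. This is a minimal illustration, not the program's own code: `split_into_chunks` and `process_long_text` are hypothetical names, the 1400-character default is an assumption, and `process_chunk` stands in for the request to ChatGPT.

```python
def split_into_chunks(text, max_len=1400):
    """Group lines into chunks of at most max_len characters.

    A single line that is too long is cut into smaller pieces first.
    """
    step = max_len - 1  # leave room for the newline added to each piece
    chunks = []
    current = ""
    for line in text.split("\n"):
        # Cut an over-long line into pieces that fit in one chunk.
        pieces = [line[i:i + step] for i in range(0, len(line), step)] or [""]
        for piece in pieces:
            candidate = piece + "\n"
            # Start a new chunk when this piece would overflow the current one.
            if current and len(current) + len(candidate) > max_len:
                chunks.append(current)
                current = ""
            current += candidate
    if current.strip():
        chunks.append(current)
    return chunks


def process_long_text(text, process_chunk, max_len=1400):
    """Apply process_chunk to every chunk and join the results."""
    return "\n".join(process_chunk(c) for c in split_into_chunks(text, max_len))


# Stand-in for the API call: strip and upper-case each chunk.
print(process_long_text("ab\ncd", lambda c: c.strip().upper(), max_len=4))
```

Splitting on line boundaries first keeps sentences intact where possible; only a line that cannot fit in one chunk is cut at a fixed position.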

Prepare an OpenAI API key

[Related external pages]

OpenAI API keys page
https://platform.openai.com/account/api-keys
Pricing conditions and usage history can be checked here.
https://platform.openai.com/account/billing/limits

On Windows, open Command Prompt

Start an editor

cd %HOMEPATH%
notepad arrange.py

In the editor, save the following program

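'''
Tidies up Japanese text by giving ChatGPT 3.5 turbo a prompt that asks it
to correct and reformat the text, and saves the final result to a file.
When the contents of the specified file are long, the text is split into
chunks and each chunk is processed by ChatGPT 3.5 turbo.
An API key is required to use this program.
[Usage]
python arrange.py --input input.txt --output output.txt --api_key your_api_key
'''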

import argparse
import openai
import sys
import textwrap
import time


MAX_CHUNK_LENGTH = 1500
DEBUG_PRINT = False


def get_arguments():
    parser = argparse.ArgumentParser(description='ChatGPT Text Refinement.')
    parser.add_argument('--input', type=str, required=True,
                        help='Input file path')
    parser.add_argument('--output', type=str, required=True,
                        help='Output file path')
    parser.add_argument('--api_key', type=str, required=True,
                        help='OpenAI API Key')
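    # type=bool would treat any non-empty string, even '0', as True
    parser.add_argument('--remove_url_and_source_code',
                        type=lambda s: s.lower() in ('1', 'true', 'yes'),
                        default=False,
                        help='Remove URL and source code from text (1 to enable)')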
    parser.add_argument('--model', type=str, default="gpt-3.5-turbo", 
                        help='GPT model')

    args = parser.parse_args()
    return args


def send_messages(model, content, chunk):
    messages = [
        {"role": "system", "content": content},
        {"role": "user", "content": chunk}
    ]

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print(messages)

    try:
        # Send the request to the API
        response = openai.ChatCompletion.create(
            model=model,
            messages=messages,
        )
    except openai.error.OpenAIError as e:
        print(f"Failed to send request to OpenAI API: {str(e)}")
        sys.exit(1)

    return response['choices'][0]['message']['content']


def handle_chunk(model, content, current_chunk):
    results = []
    # If the chunk is still too long, split it at the MAX_CHUNK_LENGTH boundary
    chunks = textwrap.wrap(
        current_chunk, width=MAX_CHUNK_LENGTH, break_long_words=True)
    for chunk in chunks:
        # Send the request to the API
        response = send_messages(model, content, chunk)
        if DEBUG_PRINT:
            print("----------------------------------------------------")
            print("response,", response)
        # Append the response to the list
        results.append(response)
        # API rate limit: aim to stay under 40k tokens and 200 requests per minute
        time.sleep(20)
    return results


def request(model, content, text):
    sentences = text.split("\n")

    results = []
    current_chunk = ""

    for sentence in sentences:
        # Keep adding lines until the chunk reaches the target length
        if len(current_chunk) + len(sentence) < (MAX_CHUNK_LENGTH - 100):
            current_chunk += sentence + "\n"
        else:
            if len(current_chunk) > 0:
                results += handle_chunk(model, content, current_chunk)
            # Reset the current chunk and start it with the next line
            current_chunk = sentence + "\n"

    # Process the final chunk
    if current_chunk:
        results += handle_chunk(model, content, current_chunk)

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print("results,", results)
    # Join the pieces into the final result
    final_result = "\n".join(results)
    return final_result


def main():
    args = get_arguments()

    # Set the OpenAI API key
    openai.api_key = args.api_key

    # File names come from the command-line arguments
    filename = args.input
    output_filename = args.output

    # Get the remove_url_and_source_code value from the command-line arguments
    remove_url_and_source_code = args.remove_url_and_source_code

    # Get the GPT model name from the command-line arguments
    model = args.model

    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
    except FileNotFoundError:
        print(f"The file {filename} was not found.")
        sys.exit(1)

    if remove_url_and_source_code:
        prompt = "Please examine the provided Japanese text, and enhance its format by correcting any grammar, spelling, syntax, and punctuation errors, while maintaining the original meaning. If you identify any clear redundancies or disorganized sections, restructure them to improve readability, while ensuring the original intent remains intact. Please exclude any source code and URL in the output. Do not translate or predict subsequent sentences. The output should be in Japanese."
    else:
        prompt = "Please examine the provided Japanese text, and enhance its format by correcting any grammar, spelling, syntax, and punctuation errors, while maintaining the original meaning. If you identify any clear redundancies or disorganized sections, restructure them to improve readability, while ensuring the original intent remains intact. Do not translate or predict subsequent sentences. The output should be in Japanese."

    final_result = request(
        model,
        prompt, text)

    try:
        # Write the result to the file
        with open(output_filename, 'w', encoding='utf-8') as output_file:
            output_file.write(final_result)
        print("\033[32mResult saved. Input file:", filename, "Output file:", output_filename, "\033[0m")
    except IOError as e:
        print("\033[31mFailed to write to the file. Error:", str(e), "\033[0m")
        sys.exit(1)


if __name__ == "__main__":
    main()
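
The program above calls `openai.ChatCompletion.create`, the interface of the pre-1.0 `openai` package. For readers who have openai 1.0 or later installed, the same request could look roughly like the sketch below. The function names `build_messages` and `send_messages_v1` are hypothetical and not part of the program; only the `OpenAI` client class and its `chat.completions.create` method come from the library.

```python
def build_messages(system_prompt, chunk):
    """Build the two-message conversation sent with each request."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": chunk},
    ]


def send_messages_v1(api_key, model, system_prompt, chunk):
    """Send one chunk using the client interface of openai 1.0 or later."""
    # Imported inside the function so that build_messages can be used
    # even when the openai package is not installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(system_prompt, chunk),
    )
    # The response is an object, so fields are attributes, not dict keys.
    return response.choices[0].message.content
```

A call such as `send_messages_v1(your_api_key, "gpt-3.5-turbo", prompt, chunk)` would return the reply text for one chunk; error handling and the pause between requests are left out of this sketch.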

Running the Python program

On Windows, use python (the Python launcher is py)
On Ubuntu, use python3

A Python development environment (Jupyter Qt Console, Jupyter Notebook, JupyterLab, nteract, Spyder, PyCharm, PyScripter, and so on) is also convenient.

Python summary: collected on a separate page.

The program was saved under the file name arrange.py, so run it with a command such as "python arrange.py".

Replace input.txt with the name of the file to process.

Replace output.txt with the name of the file in which to save the result.

Replace your_api_key with your OpenAI API key.

To remove source code and URLs during processing, add "--remove_url_and_source_code 1".

The default model is "gpt-3.5-turbo". To use a different model, add "--model MODEL_NAME".

python arrange.py --input input.txt --output output.txt --api_key your_api_key
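
A caution about options that take a boolean value, such as `--remove_url_and_source_code 1`: when argparse converts the value with plain `bool`, every non-empty string becomes True, so `0` would enable the option as well. The sketch below shows one way to interpret such values explicitly. `str_to_bool` is a hypothetical helper written for this illustration, and the accepted spellings are an assumption.

```python
import argparse


def str_to_bool(value):
    """Interpret common true/false spellings given on the command line."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes", "on"):
        return True
    if lowered in ("0", "false", "no", "off"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--remove_url_and_source_code", type=str_to_bool, default=False)

# "1" enables the option, "0" disables it, and omitting it keeps the default.
print(parser.parse_args(["--remove_url_and_source_code", "1"]).remove_url_and_source_code)  # True
print(parser.parse_args(["--remove_url_and_source_code", "0"]).remove_url_and_source_code)  # False
print(parser.parse_args([]).remove_url_and_source_code)  # False
```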

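Checking the results

Beginning of the text file before processing

(rest omitted)
Beginning of the resulting text file
In the result, the text is reformatted so that it reads as well-formed prose.

(rest omitted)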

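Text summarization program (using the ChatGPT API and Python)

This program reads the specified text file, gives ChatGPT 3.5 turbo a prompt that asks for a summary, and saves the final result to a file. When the contents of the specified file are long, the text is split into chunks, each chunk is processed by ChatGPT 3.5 turbo, and the results are joined. If the resulting summary is still long, summarization is repeated.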
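
The repeated summarization described above can be sketched as a loop that stops once the text is short enough. This is an illustration under stated assumptions, not the program's own code: `summarize_until_short` and `fake_summarize` are hypothetical names, `summarize` stands in for the request to ChatGPT, and the `max_rounds` guard and the no-progress check are additions that keep the loop from running forever.

```python
def summarize_until_short(text, summarize, limit=500, max_rounds=5):
    """Summarize, then re-summarize until the text fits in `limit` characters.

    Stops early when a round fails to shorten the text, or after max_rounds.
    """
    result = summarize(text)
    for _ in range(max_rounds):
        if len(result) <= limit:
            break
        shorter = summarize(result)
        if len(shorter) >= len(result):
            break  # no progress; keep the shortest result so far
        result = shorter
    return result


def fake_summarize(text):
    """Stand-in summarizer for demonstration: keep the first half of the text."""
    return text[: max(1, len(text) // 2)]


# 3000 characters shrink to 1500, then 750, then 375, which is within the limit.
print(len(summarize_until_short("あ" * 3000, fake_summarize)))  # 375
```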

Prepare an OpenAI API key

OpenAI API keys page

https://platform.openai.com/account/api-keys

Pricing conditions and usage history can be checked here.

https://platform.openai.com/account/billing/limits

On Windows, open Command Prompt

Start an editor

cd %HOMEPATH%
notepad summary.py

In the editor, save the following program

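'''
Reads the specified text file, gives ChatGPT 3.5 turbo a prompt that asks
for a summary, and saves the final result to a file.
When the contents of the specified file are long, the text is split into
chunks, each chunk is processed by ChatGPT 3.5 turbo, and the results are joined.
If the resulting summary is still long, summarization is repeated.
[Usage]
python summary.py --input input.txt --output output.txt --api_key your_api_key
'''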

import argparse
import openai
import sys
import textwrap
import time


MAX_CHUNK_LENGTH = 1500
DEBUG_PRINT = False
CHARACTERS = 500

def get_arguments():
    parser = argparse.ArgumentParser(description='ChatGPT Text Summarization.')
    parser.add_argument('--input', type=str, required=True,
                        help='Input file path')
    parser.add_argument('--output', type=str, required=True,
                        help='Output file path')
    parser.add_argument('--api_key', type=str, required=True,
                        help='OpenAI API Key')
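    # type=bool would treat any non-empty string, even '0', as True
    parser.add_argument('--remove_url_and_source_code',
                        type=lambda s: s.lower() in ('1', 'true', 'yes'),
                        default=False,
                        help='Remove URL and source code from text (1 to enable)')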
    parser.add_argument('--model', type=str, default="gpt-3.5-turbo", 
                        help='GPT model')

    args = parser.parse_args()
    return args


def send_messages(model, content, chunk):
    messages = [
        {"role": "system", "content": content},
        {"role": "user", "content": chunk}
    ]

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print(messages)

    try:
        # Send the request to the API
        response = openai.ChatCompletion.create(
            model=model,
            messages=messages,
        )
    except openai.error.OpenAIError as e:
        print(f"Failed to send request to OpenAI API: {str(e)}")
        sys.exit(1)

    return response['choices'][0]['message']['content']


def handle_chunk(model, content, current_chunk):
    results = []
    # If the chunk is still too long, split it at the MAX_CHUNK_LENGTH boundary
    chunks = textwrap.wrap(
        current_chunk, width=MAX_CHUNK_LENGTH, break_long_words=True)
    for chunk in chunks:
        # Send the request to the API
        response = send_messages(model, content, chunk)
        if DEBUG_PRINT:
            print("----------------------------------------------------")
            print("response,", response)
        # Append the response to the list
        results.append(response)
        # API rate limit: aim to stay under 40k tokens and 200 requests per minute
        time.sleep(20)
    return results


def request(model, content, text):
    sentences = text.split("\n")

    results = []
    current_chunk = ""

    for sentence in sentences:
        # Keep adding lines until the chunk reaches the target length
        if len(current_chunk) + len(sentence) < (MAX_CHUNK_LENGTH - 100):
            current_chunk += sentence + "\n"
        else:
            if len(current_chunk) > 0:
                results += handle_chunk(model, content, current_chunk)
            # Reset the current chunk and start it with the next line
            current_chunk = sentence + "\n"

    # Process the final chunk
    if current_chunk:
        results += handle_chunk(model, content, current_chunk)

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print("results,", results)
    # Join the pieces into the final result
    final_result = "\n".join(results)
    return final_result


def main():
    args = get_arguments()

    # Set the OpenAI API key
    openai.api_key = args.api_key

    # File names come from the command-line arguments
    filename = args.input
    output_filename = args.output

    # Get the remove_url_and_source_code value from the command-line arguments
    remove_url_and_source_code = args.remove_url_and_source_code

    # Get the GPT model name from the command-line arguments
    model = args.model

    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
    except FileNotFoundError:
        print(f"The file {filename} was not found.")
        sys.exit(1)

    prompt1 = "Please provide a summary of the supplied Japanese text, excluding any source code and URL. Do not translate or predict any future sentences. The output should be a single paragraph and it should be in Japanese."
    prompt2 = "Please provide a summary of the supplied Japanese text. Do not translate or predict any future sentences. The output should be a single paragraph and it should be in Japanese."

    if remove_url_and_source_code:
        prompt = prompt1
    else:
        prompt = prompt2

    final_result = request(
        model,
        prompt, text)

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print("final_result,", final_result)

    while len(final_result) > CHARACTERS:
        # Still too long, so summarize again.
        text = final_result
        final_result = request(
            model,
            prompt, text)

        if DEBUG_PRINT:
            print("----------------------------------------------------")
            print("final_result,", final_result)

        if len(final_result) >= len(text):
            # The summary stopped shrinking; stop to avoid looping forever.
            break
        
    # Ask ChatGPT to polish the whole text
    text = final_result
    final_result = request(
        model,
        "As a professional proofreader, your task is to refine and correct the provided Japanese text while preserving its original meaning. The refined output should not translate or predict any future sentences, and it should be consolidated into a single paragraph. Importantly, the output must remain in Japanese and not exceed " + str(CHARACTERS) + " characters.", text)

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print("final_result,", final_result)


    try:
        # Write the result to the file as a single line
        with open(output_filename, 'w', encoding='utf-8') as output_file:
            output_file.write(final_result.replace('\r', '').replace('\n', ''))
        print("\033[32mResult saved. Input file:", filename, "Output file:", output_filename, "\033[0m")
    except IOError as e:
        print("\033[31mFailed to write to the file. Error:", str(e), "\033[0m")
        sys.exit(1)


if __name__ == "__main__":
    main()

Running the Python program

On Windows, use python (the Python launcher is py)
On Ubuntu, use python3

A Python development environment (Jupyter Qt Console, Jupyter Notebook, JupyterLab, nteract, Spyder, PyCharm, PyScripter, and so on) is also convenient.

Python summary: collected on a separate page.

The program was saved under the file name summary.py, so run it with a command such as "python summary.py".

Replace output.txt with the name of the file to process.

Replace summary.txt with the name of the file in which to save the result.

Replace your_api_key with your OpenAI API key.

To remove source code and URLs during processing, add "--remove_url_and_source_code 1".

The default model is "gpt-3.5-turbo". To use a different model, add "--model MODEL_NAME".

python summary.py --input output.txt --output summary.txt --api_key your_api_key

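Checking the results

Beginning of the text file before processing

(rest omitted)
The resulting text file
The result is a summary of the input. To change the target length, adjust "CHARACTERS = 500" in the program.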

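This site is the web page of the Kunihiko Kaneko Laboratory.

As a rule, materials are published under "Creative Commons BY NC SA". PDF files, PowerPoint files, and similar materials state "Creative Commons BY NC SA" and carry the logo (some files have not been updated yet; we ask for your understanding).

The conditions for redistribution, the prevention of plagiarism, and related matters for those who use the published materials are explained on a separate page. Please check that page before redistributing or modifying the materials.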

Contact: Kunihiko Kaneko (金子邦彦)