This is a blog about this site, which introduces the lesser-known Emanuel Bach (C. P. E. Bach).

I Built This Site by Making Full Use of ChatGPT

I built this site with extensive help from ChatGPT. I also use a program called HTML Cleaner and a content management program that I wrote myself. This article describes how I use each of them.

Little information about Emanuel Bach is available online, so ChatGPT often gives incorrect answers when asked about him. Its statements about opus numbers, keys, and tempos in particular are mostly inaccurate. By steering clear of those details, I have still been able to use ChatGPT to build this site.

Explanations by ChatGPT

Emanuel Bach wrote numerous collections of works, each containing about 4 to 6 pieces. When I create an entry for such a collection, I often have ChatGPT write an explanation of it. I have also had it write explanations of individual works, Bach's family, and related musicians, and answer questions about them. Hallucinations are a real risk here, and I have not yet checked everything thoroughly.

Incorporating Information from CD Booklets -- OCR and Translation

To incorporate information from the booklet of Spányi's CD, I first scanned it and had ChatGPT read the resulting images. In other words, I used ChatGPT as an OCR tool. Various OCR options exist, including the one built into Google Drive, but ChatGPT seems to read text better than those. It is relatively robust against noise and can fill in unreadable parts (though what it fills in may be wrong).

Next, I had ChatGPT translate the resulting English text into Japanese. It can also output the OCR result directly in Japanese, but having it write out the English first lets me check the accuracy of the Japanese translation against it. Some pages on the site quote the resulting Japanese text.

I also had ChatGPT summarize long texts before publishing them on the site, partly to avoid the copyright issues that a verbatim translation could raise. I sometimes asked for a summary explicitly, but ChatGPT tends to summarize long texts on its own even without such an instruction.

Utilizing the Structure of Texts Written by ChatGPT

ChatGPT returns its results as rich text, but if you paste that directly into a text editor, the structure of the text is lost. I therefore pass it through a program called HTML Cleaner, which converts it to HTML while keeping the markup as simple as possible.

Some inconveniences remain. First, I have instructed ChatGPT to use “.” and “,” as punctuation marks, but it quickly reverts to “。” and “、”. It also frequently inserts unnecessary horizontal rules (<hr />). I use a Python program to convert the punctuation and remove the rules.
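A minimal sketch of this kind of cleanup is shown below. It is an illustration only, not the actual program: the function name clean_chatgpt_html is made up for the example, and it assumes the target punctuation is the plain “.” and “,” and that the horizontal rules appear as bare <hr>, <hr/>, or <hr /> tags without attributes.

```python
import re

# Assumed mapping: Japanese punctuation that ChatGPT reverts to,
# replaced by the marks requested in the instructions.
PUNCTUATION = str.maketrans({"。": ".", "、": ","})

# Matches <hr>, <hr/> and <hr /> in any letter case, together with
# the whitespace (including the line break) that follows the tag.
HR_TAG = re.compile(r"<hr\s*/?>\s*", re.IGNORECASE)


def clean_chatgpt_html(html: str) -> str:
    """Normalize punctuation and drop horizontal rules (hypothetical helper)."""
    html = html.translate(PUNCTUATION)
    return HR_TAG.sub("", html)


if __name__ == "__main__":
    sample = "<p>これは、テストです。</p>\n<hr />\n<p>次の段落</p>"
    print(clean_chatgpt_html(sample))
```

Using str.translate keeps the punctuation mapping in one table, so other replacements can be added without touching the function body.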

For content management, I use a custom Python program of about 500 lines (the main site and the blog are managed by separate programs). Something like WordPress would be one option, but in my experience it does not behave quite the way I want and does not cover all my needs. For example, when an opus number appears in the text, I want it to link automatically to the relevant entry. That is not easy to achieve with WordPress, whereas in a content management program written in Python such processing is easy to implement.

Each page is generated statically. There is no custom search engine; Google is used instead. This could be seen as laziness, but it also avoids creating security holes. I keep a complete set of pre-publication files on my MacBook, which allows quick testing. Once the tests pass, I transfer the files to the public website.
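The automatic linking of opus numbers mentioned above could be sketched roughly as follows. This is not the actual content management program. It assumes, purely for illustration, that works are cited by Wotquenne numbers written as “Wq. 23”, and the ENTRY_PAGES table, its file paths, and the function name link_work_numbers are all hypothetical.

```python
import re

# Hypothetical catalogue: work number -> page that holds its entry.
ENTRY_PAGES = {
    "Wq. 23": "works/wq23.html",
    "Wq. 183": "works/wq183.html",
}

# Assumed notation: "Wq." followed by an optional space and digits.
WORK_NUMBER = re.compile(r"Wq\.\s?(\d+)")


def link_work_numbers(text: str) -> str:
    """Wrap each known work number in a link to its entry page."""

    def replace(match: re.Match) -> str:
        key = f"Wq. {match.group(1)}"  # normalized form used as the lookup key
        url = ENTRY_PAGES.get(key)
        if url is None:
            # Unknown number: leave the text untouched.
            return match.group(0)
        return f'<a href="{url}">{match.group(0)}</a>'

    return WORK_NUMBER.sub(replace, text)


if __name__ == "__main__":
    print(link_work_numbers("See Wq. 23 and Wq. 999."))
```

The sketch runs on plain text before pages are generated. Text that already contains links would need extra handling so that a number inside an existing anchor is not linked twice.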

Correcting Texts Written by ChatGPT

The texts ChatGPT writes about Emanuel Bach and related figures often contain errors, and some are almost entirely wrong. I do not use the nonsensical parts. Where a text I do use contains errors, I add notes marked “[correction]” or “[supplement]”. However, only some of the errors have been corrected this way.

Information about Emanuel Bach available online is still insufficient, and the same is true of many other early Classical and Baroque composers. I believe ChatGPT's training data on them is correspondingly thin, which leads to hallucinations. Although I try to keep errors out as described above, I believe many still remain on this site, so please read with caution.

Dasyn.com デイシン
Created: 2024-12-24 23:22   Edited: 2025-01-17