"I want to try natural language processing."
This article is for readers with that wish.
This time, I generate text using the pretrained GPT-2 model.
I referred to the following sites.
- https://openai.com/blog/better-language-models/
- https://github.com/graykode/gpt-2-Pytorch
- https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- https://clean-copy-of-onenote.hatenablog.com/entry/gpt-2-774M
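About GPT-2
The pretrained GPT-2 model is published by OpenAI.
GPT-2 was trained on 40GB of web text.
OpenAI's web page gives the following explanation (for details, see https://openai.com/blog/better-language-models/):
“Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.”
What would count as a malicious application of the technology?
In recent years, the danger of fake news has been pointed out.
So "malicious" here presumably refers to fabricated information on the internet spreading through social media and harming society.
And an "application" would be, for example, systematically manipulating public opinion with fake news.
If that were actually realized it could affect the whole world, so it may indeed be dangerous.
On the other hand, most people now get their information through devices such as smartphones and PCs, and I think information literacy has improved accordingly. Would text that an AI generates as statistically related text really do that much harm?
Then again, declining reading comprehension might leave some people taken in by fake news.
Since OpenAI itself calls it dangerous, GPT-2 must generate overwhelmingly natural text.
I'm looking forward to it.
GPT-2 is trained by predicting the next word, given the preceding words in a text.
GPT-2 is also a direct scale-up of GPT, with more than 10 times the parameters and trained on more than 10 times the amount of data.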
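To make "predicting the next word" concrete, here is a toy sketch. It is not how GPT-2 works internally (GPT-2 is a Transformer over subword tokens); it only counts which word follows which in a tiny made-up corpus and predicts the most frequent follower. The function names and the corpus are my own, purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up toy corpus (an assumption for illustration, not GPT-2 training data).
corpus = "python is a language . python is popular . deep learning is popular"
model = train_bigram(corpus)
print(predict_next(model, "python"))
print(predict_next(model, "is"))
```

A real language model replaces these raw counts with a neural network that assigns a probability to every possible next token, but the training signal is the same idea: given what came before, guess what comes next.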
For details of GPT-2, see the following paper.
- https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Now, let's try GPT-2.
Setting up Google Colaboratory
- Create a Google account.
- Open Google Drive and click "New" → "More" → "Google Colaboratory". Colaboratory then starts.
- Once Colaboratory has started, enter the following code in a Colaboratory cell and run it.
This mounts Google Drive.
from google.colab import drive
drive.mount('/content/drive')
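- After running it, you are prompted for an authorization code. Open the URL shown after "Go to this URL in a browser" and choose your Google account. An authorization code is displayed; copy it, paste it into the input field, and press Enter. This completes mounting Google Drive.
Setting up GPT-2
As input data, I use excerpts from the Wikipedia articles on "Python", "Artificial intelligence", and "Deep learning".
[From the Wikipedia article on Python]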
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python’s design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
[From the Wikipedia article on Artificial intelligence]
In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term “artificial intelligence” is often used to describe machines (or computers) that mimic “cognitive” functions that humans associate with the human mind, such as “learning” and “problem solving”.
[From the Wikipedia article on Deep learning]
Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
Deep learning architectures such as deep neural networks, deep belief networks, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.
Move to the location where the GPT-2 tool will be downloaded.
In this article, I download it to My Drive.
%cd "/content/drive/My Drive"
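Download the tool from GitHub, then move into the cloned directory so that the following commands run there.
!git clone https://github.com/graykode/gpt-2-Pytorch
%cd gpt-2-Pytorch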
Download the pretrained GPT-2 model.
!curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
Install the Python libraries the tool needs.
!pip install -r requirements.txt
That completes the setup.
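Before running the tool, it can help to confirm that the files from the steps above are where the commands expect them. The following is a minimal sketch, assuming the current working directory is the cloned repository; the helper name `missing_files` and the file list are my own choices for illustration, not part of the tool.

```python
from pathlib import Path

# Files the commands in this article refer to (assumed to sit in one directory).
REQUIRED_FILES = ["main.py", "requirements.txt", "gpt2-pytorch_model.bin"]


def missing_files(repo_dir, required=REQUIRED_FILES):
    """Return the names in `required` that do not exist under `repo_dir`."""
    root = Path(repo_dir)
    return [name for name in required if not (root / name).is_file()]


# Check the current working directory and report what, if anything, is missing.
missing = missing_files(".")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required files are present.")
```

If something is reported missing, the most likely cause is that a download ran in a different directory than the one you are now in.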
Results
Run the program with the following command.
!python main.py --text "Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects."
[Text generated from the Python article]
======================================== SAMPLE 1 ========================================
The main goals of Python are to be a "pure" language, which provides a complete object-oriented programming language for Python. It is also intended to be a "pure" programming language, which would be very useful for developers who want to write complex code in the most generalised form. The language is intended to be a "pure" programming language, which would be very useful for developers who want to write complex code in the most generalised form. While Python is intended to be a "pure" programming language, it does not contain any special features that would provide an ideal target for writing Python code. Instead, Python is designed to be a "pure" programming language, which would provide a complete object-oriented programming language for Python. The language is intended to be a "pure" programming language, which would be very useful for developers who want to write complex code in the most generalised form. While Python is intended to be a "pure" programming language, it does not contain any special features that would provide an ideal target for writing Python code. Instead, Python is designed to be a "pure" programming language, which would provide a complete object-oriented programming language for Python. For example, the language is not intended to be a "pure" programming language, and it does not provide any special features that would allow for any special features to be added to the object in the program. The language is intended to be a "pure" programming language, which would be very useful for developers who want to write complex code in the most generalised form. While Python is intended to be a "pure" programming language, it does not contain any special features that would allow for any special features to be added to the object in the program.

The language is intended to be a "pure" programming language, which would provide a complete object-oriented programming language for Python. Furthermore, the language is not intended to be a "pure" programming language, and it does not provide any special features that would allow for any special features to be added to the object in the program. The language is not intended to be a "pure" programming language, and it does not provide any special features that would allow for any special features to be added to the object in the program. It is designed to be a "pure" programming language, which would provide a complete object-oriented programming language for Python. Furthermore, the language is designed to be a "pure" programming language, which would provide
!python main.py --text "In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of \"intelligent agents\": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.Colloquially, the term \"artificial intelligence\" is often used to describe machines (or computers) that mimic \"cognitive\" functions that humans associate with the human mind, such as \"learning\" and \"problem solving\"."
[Text generated from the Artificial intelligence article]
======================================== SAMPLE 1 ========================================
In fact, the term could be used to describe both "artificial intelligence" and "human intelligence", as well as "machine intelligence", or "machine-like intelligence". These terms are used to describe the most advanced forms of AI research and development. Artificial intelligence advances are primarily based on the ability to perform complex tasks, such as predicting and interpreting the world around us, understanding the world's changing climate and more. AI advancements and technologies are sometimes called "intelligent agents", or "intelligent agents that understand the world around them". In the most recent academic research, more than 5,000 artificial intelligence systems were developed over the course of five years. In the United States, the number of AI systems has risen from 441 in 2010 to 735 by the end of 2012. In India, the number of AI systems is increasing at a rate of nearly 20,000 per year, with almost all of them being run by Indian AI research institutes. The number of AI systems is increasing at a rate of nearly 20,000 per year, with almost all of them being run by Indian AI research institutes. The number of AI systems is increasing at a rate of nearly 20,000 per year, with almost all of them being run by Indian AI research institutes. In the United States, the number of AI systems is increasing at a rate of nearly 20,000 per year, with almost all of them being run by Indian AI research institutes. This is because many of the AI systems are not developed in a way that is suitable for individual human-powered programs. In some cases, the research is conducted in countries where human-powered software has no control over the program. In some cases, the research is conducted in countries where human-powered software has no control over the program.

This is because many of the AI systems are not developed in a way that is suitable for individual human-powered programs. In some cases, the research is conducted in countries where human-powered software has no control over the program. In some cases, the research is conducted in countries where human-powered software has no control over the program. This is because many of the AI systems are not developed in a way that is suitable for individual human-powered programs. In some cases, the research is conducted in countries where human-powered software has no control over the program. The role of human-powered software In recent years, the use of human-powered software has become more widely available and more profitable. The use of human-powered software has become
!python main.py --text "Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. Deep learning architectures such as deep neural networks, deep belief networks, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance."
[Text generated from the Deep learning article]
======================================== SAMPLE 1 ========================================
In the field of data science, deep learning is a system of algorithms that are able to perform an action, such as translating from a given text to an output. In many cases, deep learning techniques are applied to the data and perform the process independently of one another. In this way, deep learning can be used to perform many functions that are not possible in the real world, such as machine translation, word learning, word processing, or speech recognition. Deep learning is used to perform many functions that are not possible in the real world but can be used for many different purposes. Deep learning is used to perform many functions that are not possible in the real world such as machine translation, word learning, word processing, or speech recognition. Deep learning is used to perform many functions that are not possible in the real world but can be used for many different purposes. In these cases, the human expert performs their job and in many cases, they are not trained to perform the task. In many cases, the human expert performs their job and in many cases, they are not trained to perform the task. In these cases, deep learning algorithms are used to perform tasks that are not possible in the real world. Deep learning is applied to perform these tasks. In some cases, the tasks involve a large number of inputs and outputs and the computer is able to process the data. In some cases, the task involves a large number of inputs and outputs and the computer is able to process the data. In many cases, the tasks involve a large number of inputs and outputs and the computer is able to process the data. In some cases, the task involves a large number of inputs and outputs and the computer is able to process the data. In some cases, the task involves a large number of inputs and outputs and the computer is able to process the data.

The basic concepts of deep learning are: a) The idea of a set of parameters for a data set. The parameters are those numbers or sets with the right values for each of these numbers. b) The idea of a function of a set of parameters for a data set. The parameters are those functions of a structure that can be used to compute the results of the functions. c) The idea of a function of a set of parameters for a data set. The parameters are those functions of a structure that can be used to compute the results of the functions. d) The idea of a function of a set of
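This time, I tried out the pretrained GPT-2 model.
The text GPT-2 generated looks unnatural rather than natural.
Individual sentences can pass as natural, but the document as a whole reads unnaturally.
Frankly, these results do not make me feel much danger.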
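One thing that stands out in the samples above is that whole sentences recur many times. A rough way to put a number on that impression is to count how many sentences repeat an earlier one. The sketch below is my own illustration: the sentence splitting is deliberately naive, and the sample string is a shortened, made-up text in the style of the output, not the actual output.

```python
import re
from collections import Counter


def sentence_repetition(text):
    """Return (fraction of sentences that duplicate another one, most common sentence)."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return 0.0, None
    counts = Counter(sentences)
    repeated = len(sentences) - len(counts)  # sentences beyond the first copy of each
    return repeated / len(sentences), counts.most_common(1)[0][0]


# Shortened, made-up sample in the style of the generated text (an assumption).
sample = (
    "Deep learning is applied to perform these tasks. "
    "In some cases, the task involves many inputs. "
    "In some cases, the task involves many inputs. "
    "In some cases, the task involves many inputs."
)
ratio, top = sentence_repetition(sample)
print(f"repeated fraction: {ratio:.2f}")
print("most common sentence:", top)
```

Pasting one of the generated samples into `sentence_repetition` gives a quick, if crude, measure for comparing runs.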
Is that because the model used here is the small one?
The script can apparently also be run with parameters specified, so it may be worth experimenting with those settings too.
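Text generators of this kind commonly expose sampling settings such as a temperature and a top-k cutoff. I have not verified exactly which options this script offers, so treat those two names as assumptions. The sketch below only illustrates the general idea in plain Python, with made-up scores: temperature rescales the scores before sampling, and top-k restricts sampling to the k highest-scoring candidates.

```python
import math
import random


def sample_next(logits, temperature=1.0, top_k=0, rng=random):
    """Sample an index from `logits` after temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Rank candidates by score; keep only the top_k best (0 means keep all).
    indices = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        indices = indices[:top_k]
    # Softmax weights over the surviving candidates (subtract the max for stability).
    peak = max(scaled[i] for i in indices)
    weights = [math.exp(scaled[i] - peak) for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate words
rng = random.Random(0)          # fixed seed so repeated runs give the same draws
print([sample_next(logits, temperature=0.7, top_k=2, rng=rng) for _ in range(10)])
```

A lower temperature or a smaller k makes the choice more predictable, while higher values add variety. Which of the two helps with repetitive output is something to find out by experiment.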
Also, the input texts here were excerpts from IT-related articles, so it might be interesting to see what happens with other genres.