
Introduction to Machine Learning


Lately I've been taking an interest in machine learning and wanted to see if I couldn't get some discussion started about it. I understand that there are a lot of novice programmers in the community, so here is an introduction to what this technology is and why it's so relevant to game development. Feel free to ask questions or start a discussion. Good research paper shares are always welcome.

 

What is Machine Learning?
A machine learning algorithm is a problem-solving method in which a machine is taught to recognize patterns in datasets and act on that data, without the programmer having to specify exactly how the system should interpret its input. Broadly, machine learning algorithms cover two types of problems: optimization problems and categorization (classification) problems.

An example of an optimization problem is the Travelling Salesman Problem: I have a set of cities, and I must stop at each of them at least once, so I need to determine the best route through those cities. You could attack this problem with conventional programming by enumerating all possible routes and picking the one with the lowest travel time. As you can imagine, this method does not scale; with a large set of cities, the process would take an extremely long time, since the number of routes grows factorially. So, instead of using a conventional programming method to determine the best route, I could use an unguided learning algorithm, such as a genetic algorithm, to find a *very good* route instead.

 

Unguided Learning Algorithm: An algorithm in which a machine is not taught through labeled learning data, and instead learns from its environment and from patterns within the data itself.
Genetic Algorithm: An unguided/reinforcement learning algorithm in which a system learns over iterations; each iteration, the weakest members of the population are discarded and mutations are introduced into the survivors' offspring. Genetic algorithms are modeled on Darwin's theory of evolution.
Population: (In a genetic algorithm) The current learning iteration's set of candidate solutions.
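To make the genetic algorithm idea concrete, here is a minimal sketch of a GA for the Travelling Salesman Problem. Everything here (the city coordinates, population size, mutation rate, generation count) is made up for illustration, not tuned for real use:

```python
import math
import random

# Hypothetical city coordinates; any (x, y) pairs work.
CITIES = [(0, 0), (3, 1), (6, 0), (7, 4), (4, 6), (1, 5)]

def route_length(route):
    """Total length of a closed tour visiting the cities in 'route' order."""
    return sum(
        math.dist(CITIES[route[i]], CITIES[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )

def crossover(a, b):
    """Order crossover: copy a slice from parent a, fill the rest from b."""
    i, j = sorted(random.sample(range(len(a)), 2))
    child = a[i:j]
    child += [c for c in b if c not in child]
    return child

def mutate(route, rate=0.1):
    """With probability 'rate', swap two random cities in the route."""
    route = route[:]
    if random.random() < rate:
        i, j = random.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route

def evolve(generations=200, pop_size=50):
    population = [random.sample(range(len(CITIES)), len(CITIES))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the population by fitness (shorter tours are better)...
        population.sort(key=route_length)
        survivors = population[:pop_size // 2]
        # ...then refill it with mutated offspring of the survivors.
        children = [mutate(crossover(*random.sample(survivors, 2)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=route_length)

best = evolve()
```

Note that the result is a *very good* route, not a provably optimal one; that trade-off is the whole point.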

 

To give an example of a categorization problem: I need to programmatically determine whether a shape in an image is a circle or a square. To solve this problem, I will need a type of Guided Learning Algorithm. With a guided learning algorithm, before introducing the problem to the system, I feed it a set of data called "learning data". Learning data is a dataset (much like the one your problem will arrive in) in which every record is labeled with its expected output before it is fed to the system. The system then learns patterns in the input data and records which output label goes with which input. When you introduce the problem, the system turns to its memory of training data and determines which label best fits the unlabeled record being introduced.
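As a toy illustration of guided learning, here is a tiny nearest-neighbour classifier. The "circularity" feature and its values are hypothetical; a real image classifier would first extract features from pixels:

```python
import math

# Labeled "learning data": each record is (features, label). The feature here
# is a hypothetical circularity measure (4*pi*area / perimeter**2), which is
# ~1.0 for circles and ~0.785 for squares.
training_data = [
    ((1.00,), "circle"),
    ((0.98,), "circle"),
    ((0.79,), "square"),
    ((0.77,), "square"),
]

def classify(features):
    """1-nearest-neighbour: return the label of the closest training record."""
    nearest = min(training_data,
                  key=lambda rec: math.dist(rec[0], features))
    return nearest[1]

print(classify((0.99,)))  # -> circle
print(classify((0.80,)))  # -> square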

When you use a machine learning algorithm, you are removing strict logic from the system, and are therefore trading the "perfect accuracy" of a conventionally programmed system for performance and ease of problem solving. What I mean by this: take the shape categorization example above. The system is only trained to tell the difference between a circle and a square. What happens if I introduce the image of a triangle? The system can only answer "circle" or "square", so neither label is correct. Introducing data like this is what we call "uncertainty". It is important to be aware that there is no perfect system when it comes to machine learning; a common rule of thumb is that a "functional" ML system reaches an accuracy of roughly 96% (more or less, depending on the task).

Gaming in the world of Machine Learning

It is actually a pretty big trend to develop machine learning algorithms that play games. You could use machine learning to determine the best route to speedrun a game, or, even more interesting, you could use the game as a virtual environment for teaching an AI what to do in the real world. Sentdex, for example, is a streamer who created a bot for Grand Theft Auto V to teach an AI how to drive a car. This is actually very common: game worlds provide a simplified and easily manipulated environment, often with physics much like the real world. This gives us excellent environments in which to test and develop artificial intelligence.

I am currently planning an algorithm that will play the survival game Conan Exiles. I chose this game because it has multiplayer chat, voice chat, and online play, and the interface makes it easy to determine the current task in the game. I can use this game as an environment to help an AI learn to play a game that essentially has no fixed rules (the player can do whatever the player wants, sandbox style). The AI could also use the same environment to communicate and play with other players, which opens up a whole new input channel for teaching the AI to play. I am currently writing the chat algorithms, and you can join the discussion on my Discord if you're interested in more.

If you're looking for a good server host for running machine learning algorithms on the GPU, I recommend Paperspace. It's a host designed for indie machine learning researchers: they host Jupyter notebooks and virtual servers with GPU access, and they also provide services for running jobs on the GPU without needing to rent an entire server.


One of the best applications of machine learning, imo, is data science. I wouldn't mind if you could do a writeup on that, since it's quickly becoming a hobby of mine; I think it's a great use of machine learning and conveys many aspects of it (although not all data science is done using machine learning).

2 hours ago, IAskQuestionsTooMuchButHey said:

One of the best applications of machine learning, imo, is data science. I wouldn't mind if you could do a writeup on that, since it's quickly becoming a hobby of mine; I think it's a great use of machine learning and conveys many aspects of it (although not all data science is done using machine learning).

Data science, at its core, is the process of collecting and operating on sets of data; usually you're running statistical algorithms on datasets. The way I see it, machine learning IS an application of data science, and data science is basically a branch of software engineering, in most cases a simpler one. Most data scientists use Python (or JavaScript/Node.js) for ease of operation. These languages, being as high level as they are, make it very easy to manipulate, curate, and operate on data. So when I say machine learning is the application of data science, I mean that when you write a machine learning algorithm, you are writing an application that uses statistical algorithms to operate on datasets (which is what data science is).

Edit: As for a tutorial, all I can give is a high-level explanation of how I collect data, what I collect, how I format it, and how I operate on it.

Take my Discord chatbot, Scarlett, for example, which you've seen. Before I can write a bot that can hold discussions, I must collect the information that will teach it how to converse. There's a general idea in data science that powerful data exists in the "wild" (our everyday application use, real-world conversations, and the applications and websites people actually use). To collect data, I leave the bot in a channel and it watches for new messages coming in. When a message comes in, I operate on the incoming data before storing it, and I format it in a generic way that I know can be used not just for one task but for many. So my data is stored like this:
 

{
    _id: "some unique id",
    content: "The contents of the message",
    parent: "The message that came before this one, if any",
    comment_id: "The id of the data supplied by the input application, if any",
    source: "Where the data came from",
    tag: "Just something I use to add sub-categories onto a source",
    author: "Who created the data (if available)"
}


This is not the final format that goes into my training model. It is a generic format that carries a bit more information than I actually need, because I see the extra fields as valuable and reusable. When I go to actually feed this information to my bot, I transform the data into the appropriate training format:

{"comment": "The parent comment", "response": "The reply to the comment"}

This is the only data I actually need to get my bot doing what I want. I can also apply filters based on the extra information stored in the database. For example, if I wanted a bot that learned only from data that came from my Discord philosophy channel, I would use this query when selecting data to convert into a training model:

{"tag": "philosophy", "source": "discord-collector"}

Or, if I wanted to train the bot only on comments that you have made in my Discord server (to make the bot talk more like you), I would use this query instead:

{"tag": "lobby", "source": "discord-collector", "author": "<Phen's Discord UUID>"}

When I operate on the data itself, I feed the training model to the AI, which tells it "this is the input, and this is the response I expect". When you introduce new input to that bot, it uses the inputs it knows to generate a likely response to the incoming comment. Formatting the data, acting on it, running the training model: all of that is data science.
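The filter-and-transform step described above might look like the following sketch. The records, field values, and queries are hypothetical examples in the same shape as the formats shown, and the Mongo-style query matching is reduced to plain equality checks:

```python
# Hypothetical stored records in the generic format described above.
records = [
    {"_id": "1", "content": "What's the meaning of life?", "parent": None,
     "source": "discord-collector", "tag": "philosophy", "author": "alice"},
    {"_id": "2", "content": "42, obviously.", "parent": "1",
     "source": "discord-collector", "tag": "philosophy", "author": "bob"},
    {"_id": "3", "content": "gg", "parent": None,
     "source": "discord-collector", "tag": "lobby", "author": "alice"},
]

def matches(record, query):
    """True if every field in the query equals the record's field (an AND)."""
    return all(record.get(k) == v for k, v in query.items())

def to_training_pairs(records, query):
    """Filter records by the query, then join each reply to its parent."""
    by_id = {r["_id"]: r for r in records}
    pairs = []
    for r in records:
        if not matches(r, query):
            continue
        parent = by_id.get(r["parent"])
        if parent is not None:
            pairs.append({"comment": parent["content"],
                          "response": r["content"]})
    return pairs

pairs = to_training_pairs(records, {"tag": "philosophy",
                                    "source": "discord-collector"})
# -> [{"comment": "What's the meaning of life?", "response": "42, obviously."}]
```

In a real database you would push the query down to the database itself rather than filtering in application code, but the shape of the transformation is the same.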


Update with a new ML Process:

If you have ever played Magic: The Gathering in an official tournament, you will know that before the tournament begins, you must write down your decklist on a sheet of paper, so the house knows what cards you're using and that all the cards you are using are legal within the current rotating sets. After the tournament, the house sends the decklists back to Wizards of the Coast, where the data gets digitally recorded and published on websites like http://www.mtgtop8.com.

Mtgtop8 lists the rank each player finished at in that tournament, the cards they were using, and how many copies of each card they used. This data can be fed into a genetic algorithm to generate a new deck based on winning statistics. Here is the algorithm I'm setting up currently.

Right now I am scraping two sets of data. The first is a list of events, which includes the player name, a link to the deck, and the rank the player finished at. Data from the event table looks like this:
(screenshot: a row from the event data table)

 

The second set of data I am ripping is the card data for each deck used in an event, which can be ripped from the deck_url. The card data looks like this in the database:
(screenshot: card records in the database)

Once all my data has been ripped, I will need a system for scoring the decks the algorithm generates.
The score of a deck can be based on the ranks of its individual cards.

Here is how I am converting ranks to scores: I simply assign each rank group a value. The better the rank, the higher the point value.
 

def convert_rank_to_score(rank):
    """Map an event finish (as recorded by mtgtop8) to a point value."""
    if rank == '1':
        return 100
    if rank == '2':
        return 80
    if rank in ('3', '3-4', '4'):
        return 60
    if rank == '5-8':
        return 40
    if rank == '9-16':
        return 20
    # Unranked or unrecognized finishes score nothing.
    return 0

I can calculate the total score of the deck with the following formulas:
card_score = sum of the event rank scores for the card being scored
total_card_score = sum of all card_scores in the deck
pair_score = sum of the rank scores of the events in which both cards of a pair appeared
total_pair_score = sum of all pair_scores

deck_score = total_card_score + total_pair_score
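Those formulas might look like the following in code. The card names and per-event scores are invented for illustration; in practice they would come from the scraped tables after running the ranks through convert_rank_to_score:

```python
from itertools import combinations

# Hypothetical scraped data: for each card, the rank scores of the events
# it appeared in (already converted from ranks to point values).
card_event_scores = {
    "Lightning Bolt": {"event1": 100, "event2": 40},
    "Counterspell":   {"event1": 100, "event3": 60},
    "Giant Growth":   {"event3": 60},
}

def card_score(card):
    """Sum of all event rank scores for one card."""
    return sum(card_event_scores[card].values())

def pair_score(a, b):
    """Sum of rank scores for events in which BOTH cards appeared."""
    shared = card_event_scores[a].keys() & card_event_scores[b].keys()
    return sum(card_event_scores[a][e] for e in shared)

def deck_score(deck):
    total_card = sum(card_score(c) for c in deck)
    total_pair = sum(pair_score(a, b) for a, b in combinations(deck, 2))
    return total_card + total_pair

score = deck_score(["Lightning Bolt", "Counterspell", "Giant Growth"])
```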

After I score a deck, and before sending it off to the next generation, I mutate the deck by removing the cards with the weakest scores and the weakest pairings.
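A minimal sketch of that mutation step might look like this. The card names and replacement pool are made up, and for simplicity this version drops only the weakest individual scores, not the weakest pairings:

```python
import random

def mutate_deck(deck, scores, replacement_pool, kill_count=2):
    """Drop the weakest-scoring cards and replace them with random candidates.

    'scores' maps card -> score; 'replacement_pool' is the set of legal cards
    the next generation can draw from.
    """
    # Keep the strongest cards, dropping the bottom 'kill_count'.
    survivors = sorted(deck, key=lambda c: scores[c], reverse=True)[:-kill_count]
    # Refill the deck from cards not already present.
    candidates = [c for c in replacement_pool if c not in survivors]
    return survivors + random.sample(candidates, kill_count)

scores = {"Lightning Bolt": 140, "Counterspell": 160,
          "Giant Growth": 60, "Fireball": 20}
new_deck = mutate_deck(["Lightning Bolt", "Counterspell",
                        "Giant Growth", "Fireball"],
                       scores,
                       replacement_pool=["Shock", "Opt", "Ponder"])
```

The mutated deck is then rescored in the next generation, so weak replacements get culled again.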

This scoring system will generate decks with the most used cards and the most used pairings. It will also stop too many off-color cards from bleeding into the deck, because each card's pairings are taken into account; since each card in the deck can have many pairings (one card could be in up to 60 pairs), the system tends to output decks whose cards play well together. The card title also includes the number of copies used in the deck, which will let the system determine the right number of each card to use.

As I make progress, I will continue to post my findings and will display some results of generated decks and how they play in competition.

