Using Alteryx to Discover the World of Data Through Football Statistics

In my previous blog post, which can be found here, I showed how you could extract data from the ESPN Sports website using Python and suggested how you could use that data to enter the wonderful world of data.

This post is a little different. I'm using Alteryx to ingest the data, so I thought I would talk through what I did and hopefully inspire some Alteryx users or show some benefits of Alteryx to non-Alteryx users. This Alteryx Workflow does not extract as much data as the Python code I wrote before, but it does have the potential to do so.

First, a brief overview of Alteryx: it is a user-friendly data analytics platform designed to simplify data preparation, blending, and analysis from various sources through its intuitive drag-and-drop interface. Some key features Alteryx offers are:

Data preparation and blending: Users can clean, transform, and merge data from multiple sources to create comprehensive datasets for analysis.
Advanced analytics and automation: Alteryx integrates with advanced analytical tools and machine learning algorithms, and enables workflow automation to increase efficiency.
Seamless integration and collaboration: The platform easily connects with popular data visualization tools and offers server and cloud-based solutions for team collaboration and data-driven decision-making.

I would be wrong not to mention the Alteryx Community which is an invaluable resource for Alteryx users, offering a variety of benefits that include:

Knowledge sharing: Members can ask questions, share best practices, and learn from the experiences of others to improve their skills and troubleshoot issues.
Networking opportunities: The community allows professionals to connect, collaborate, and expand their professional networks within the field.
Access to resources: Users can deepen their understanding of Alteryx through the wealth of resources available, such as blogs, webinars, tutorials, and documentation.

If you want to get to know Alteryx, join the community.

Each Alteryx Workflow, like the one you can see below, is a series of tasks or tools joined together in a flow.

I have created an Analytic App here. There are a couple of different types of workflow, but essentially this one takes some input from the user and then executes the workflow.

The Inputs allow a user to re-run certain sections of the workflow by connecting checkboxes to the containers inside the workflow. The figure on the left shows how the input looks in the workflow, and the one on the right shows how it is presented to the user at run time.

The next section creates a list of dates, appends each one to an existing URL, and then checks the resulting page for <a> tags, extracting the URL from the href attribute if it includes certain words. Once it has the list of URLs it saves them to a file, which can be called upon later rather than scraping again.
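For readers who think in code, here is a rough Python sketch of that step. It is not what the Alteryx tools do internally, just the same idea written out. The base URL, the YYYYMMDD date format and the keywords are all placeholders I have made up for illustration.

```python
from datetime import date, timedelta
from html.parser import HTMLParser

# Hypothetical values for illustration; the real URL and words live in the workflow.
BASE_URL = "https://example.com/fixtures?date="
KEYWORDS = ("match", "commentary")


def date_urls(start, end, base_url=BASE_URL):
    """Return one URL per day from start to end (inclusive), date appended as YYYYMMDD."""
    days = (end - start).days
    return [
        base_url + (start + timedelta(days=i)).strftime("%Y%m%d")
        for i in range(days + 1)
    ]


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags whose URL contains any keyword."""

    def __init__(self, keywords):
        super().__init__()
        self.keywords = keywords
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep the link only if it has a URL and that URL mentions a keyword.
        if href and any(word in href.lower() for word in self.keywords):
            self.links.append(href)


def extract_links(html, keywords=KEYWORDS):
    """Return the matching links found in one page of HTML, in page order."""
    parser = LinkCollector(keywords)
    parser.feed(html)
    return parser.links
```

Calling `date_urls(date(2023, 1, 30), date(2023, 2, 1))` gives three URLs, one per day, and `extract_links` can then be run over each downloaded page.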

Pause here for a second to reflect on something I just said. I think it's important when doing this type of work to be respectful of the website. I have requests set to around 1 per second and I save the data where possible so if I need to re-run I don't have to call the site again, reducing the interactions with the site. Unpause!
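If you were doing this in Python rather than Alteryx, a small throttle like the sketch below is one way to keep to roughly one request per second. The `Throttle` class is my own illustrative helper, not part of any library, and the one-second default simply mirrors the setting mentioned above.

```python
import time


class Throttle:
    """Make sure at least min_interval seconds pass between successive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: pause for the remainder.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The idea is to call `throttle.wait()` immediately before every request to the site, so the pause only happens when requests would otherwise come too quickly.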

The next section of the Alteryx Workflow extracts the Match and Commentary pages of each game. Using the list from above, it calls each URL and saves the whole page into a file. This is good for 2 reasons:

If you change your code and rerun you don't hit the site again for the same page.
If much later you see you want to get some other data, you will be able to use that snapshot to extract as it was when you originally downloaded it.

When this runs, it checks to see if any files already exist and doesn't re-extract those pages. There is an option to override this so as to collect all pages again. At the end there is a count tool. This is just there to give the next process an entry to start from; the count itself isn't actually used.
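The skip-if-already-downloaded logic could look something like this in Python. This is a minimal sketch under my own assumptions: `download_pages`, the file names and the `fetch` callable are all hypothetical, and `fetch` stands in for whatever actually requests the page.

```python
from pathlib import Path


def download_pages(urls, out_dir, fetch, overwrite=False):
    """Save each page to out_dir, skipping files that already exist unless overwrite is set.

    urls maps a file name to its URL; fetch is any callable that takes a URL
    and returns the page text. Returns the file names that were actually fetched.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name, url in urls.items():
        target = out_dir / name
        if target.exists() and not overwrite:
            continue  # already have a snapshot, so leave the site alone
        target.write_text(fetch(url), encoding="utf-8")
        fetched.append(name)
    return fetched
```

Running it twice with the same inputs only hits the site on the first run; passing `overwrite=True` plays the role of the override option and collects everything again.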

Here the workflow reads the location of all the match and commentary files and saves them into a list. The filter at the end splits the list into commentary and match files so we can handle them differently.
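As a rough Python equivalent of that list-and-filter step, assuming (and this is only my assumption) that the saved files end in .html and that commentary files have "commentary" somewhere in their name:

```python
from pathlib import Path


def split_saved_pages(folder):
    """List the saved .html files and split them into (match, commentary) lists.

    The naming convention is hypothetical: any file with "commentary" in its
    name is treated as commentary, everything else as a match page.
    """
    files = sorted(Path(folder).glob("*.html"))
    commentary = [f for f in files if "commentary" in f.name.lower()]
    match = [f for f in files if "commentary" not in f.name.lower()]
    return match, commentary
```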

Here is the main part: the section that uses lots of RegEx to read through the pages, find the different sections of each page, extract them, clean them, enrich them and create three outputs: Team Details, Match Details and Team Stats. This now allows me to start creating insight and analysis on the league and teams.
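To give a flavour of the RegEx approach, here is a tiny Python sketch. The HTML in `SAMPLE` is invented purely for this example and will not match the real pages, and the pattern and field names are mine rather than the ones in the workflow.

```python
import re

# Hypothetical markup, made up for this sketch; the real pages are laid out differently.
SAMPLE = """
<div class="team home"><span class="name"> Team A </span><span class="score">2</span></div>
<div class="team away"><span class="name">Team B</span><span class="score">1</span></div>
"""

# Named groups keep the pattern readable and make the extracted fields self-describing.
TEAM_RE = re.compile(
    r'<div class="team (?P<side>home|away)">\s*'
    r'<span class="name">(?P<name>[^<]+)</span>\s*'
    r'<span class="score">(?P<score>\d+)</span>'
)


def parse_teams(html):
    """Find each team block, then clean it: trim the name and convert the score to an int."""
    return [
        {"side": m["side"], "name": m["name"].strip(), "score": int(m["score"])}
        for m in TEAM_RE.finditer(html)
    ]
```

The same find, extract and clean pattern would be repeated for each section of the page, with each set of results written to its own output.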

This was a bit of an overview of what is possible with Alteryx, with a comparison to my original code in Python; do check that out again here. If you are still reading and want the Alteryx workflow, you can download it from GitHub here. I would love to hear if you have any other ideas or suggestions for this blog, as I'm just starting to feel my way.
