Data in the wild can be messy, malformed, and/or generally ill-suited to the specifications of statistical analyses and machine-learning techniques. In this intermediate-level workshop, you'll learn how to use Python to clean, reshape, and transform data prior to analysis. Topics covered may include:
- Editing strings with regular expressions
- Converting data between wide and long formats
- Dealing with null values
- Grouping and aggregating data
- Working with time series/datetime types
- Encoding categorical values
- Importing from and exporting to common formats
Attendees will understand the importance of clean, well-formed datasets; practice using Python to clean, reshape, and format data in various ways; and become familiar with best practices in writing Python code. It will be helpful to have had prior exposure to Python, such as through the "Introduction to Python" workshop or Python Camp. No installation of Python is needed.
This workshop is part of the Using Programming and Code for Research workshop series for anyone who wants to get started with, or learn more about, using programming languages such as Python and R, or other applications. These tools can help you collect, manipulate, clean, analyze, and visualize research data, or automate repetitive tasks. If you need personalized assistance with a data analysis, programming, or coding project, consider booking a consultation with one of our librarian-experts. Learn more about our services for programming and coding and for working with data.
All sessions are free to GW students, faculty, staff, and alumni. GW has an institutional commitment to ensuring that all of our programs and events are accessible to all individuals. If you require any accommodations to participate in this event, please contact firstname.lastname@example.org at least 3 business days prior to the event.