The availability of training data can be a roadblock for machine learning, NLP, and generative AI projects, particularly in specialized domains. While much data is freely available, many specialized sources are restricted by licenses and terms of use. In this workshop, we will walk through the copyright and licensing issues that can restrict or prevent data from being used for machine learning, and the areas to watch in this fast-moving landscape. We’ll also give an overview of the datasets licensed through the MIT Libraries and the sources available for text and data mining. Finally, we’ll touch on best practices for documenting the data you use. There will be time at the end for specific questions.

The workshop will be held over Zoom, and the link will be emailed to registrants. It is open to all, but note that many of the data resources discussed are restricted to MIT affiliates.

Date:
Wednesday, January 21, 2026
Time:
4:00pm - 5:00pm

Registration is required. There are 100 seats available.

Event Organizer

Katie Zimmerman

Director, Copyright Strategy

Phoebe Ayers

Librarian for EECS, IDSS, and Math

Research Data Management Services

contact me