The availability of training data can be a roadblock for machine learning, NLP, and generative AI projects, particularly in specialized domains. While much data is freely available, many sources of specialized data are restricted by licenses and terms of use. In this workshop, we will walk through the copyright and licensing issues that can restrict or prevent data from being used for machine learning, and point out areas to watch in this fast-moving landscape. We’ll also give an overview of datasets licensed through the MIT Libraries and the sources available for text and data mining. Finally, we’ll touch on best practices for documenting the data that you use. There will be time at the end for specific questions.

Presented by the MIT Libraries. In-person workshop; registration required. Open to all, but note that many of the data resources discussed are limited to MIT affiliates.

See the related workshop "Managing your machine learning data" on January 31 if you have ML/AI data that you want to share.

Date: Friday, January 24, 2025
Time: 11:00am - 12:30pm
Location: 14S-130 (The Nexus)

Registration is required. There are 34 seats available.

Event Organizer

Katie Zimmerman

Director, Copyright Strategy

Phoebe Ayers

Librarian for EECS, IDSS, and Math

Research Data Management Services
