The Group:
Morningstar’s Research group provides independent analysis on individual securities, funds, markets, and portfolios. The Research group also provides data on hundreds of thousands of investment offerings, including stocks, mutual funds, and similar vehicles, along with real-time global market data on millions of equities, indexes, futures, options, commodities, and precious metals, in addition to foreign exchange and Treasury markets. Morningstar is one of the largest independent sources of fund, equity, and credit data and research in the world, and our advocacy for investors’ interests is the foundation of our company.
The Role:
As a Data Scientist, you will be a leading contributor to the implementation of Artificial Intelligence (AI) within Data Collections software applications, APIs, and other data products. This role requires significant interaction with both upstream and downstream stakeholders across Technology, Data, Products, Sales/Service, and Research.
The Data Scientist will transition approved Data Collections AI products from a prototype phase to a scalable and consumable service. Often, these services must be integrated into Morningstar’s platform of financial products, so that our clients can use these software tools in the investment decision-making process.
We are looking for an individual who possesses strong technical development skills, an ability to follow analyst requirements and technical specifications for robust code, and a passion for investment research.
This position reports to the Tech Manager of the Data Collections AI team and is based in our Chicago office. We follow a hybrid policy of 3 days onsite and 2 days remote work. Candidates must be currently authorized to work permanently in the United States; this position does not sponsor H-1B visas.
Responsibilities:
- Automate manual data collection processes by applying cutting-edge solutions to NLP problems, e.g., text classification, NER tasks
- Collaborate with upstream data analysts to clarify business needs, define project scope, design ML/AI solutions, and iteratively improve workflows and data storage practices
- Implement ML/AI solutions from start to finish and collaborate with peer engineering teams for model deployment
- Design innovative ways to improve automation rates for data collection
- Research the latest technologies and propose new solutions to existing problems
- Participate in team brainstorming sessions, provide guidance to MLDAs (machine learning data analysts), and contribute to the codebase
- Introduce and follow good development practices, innovative frameworks, and technology solutions
- Follow best practices in daily work, such as estimation, planning, reporting, and process improvement
Requirements:
- No minimum industry experience is required if you have a Ph.D. in engineering, computer science, statistics, or a related field
- If you have no prior industry experience, you must demonstrate ML/AI knowledge and skills through research and/or side projects in NLP-related fields
- If you have a master’s degree or below, you must have 2+ years of industry experience in a data science role featuring NLP tasks
- Fluent in Python and related packages like NumPy, pandas, scikit-learn, NLTK, PyTorch, TensorFlow, etc.
- Sound knowledge of common ML/AI algorithms (e.g., linear/logistic regression, random forest, gradient boosting) and Deep Learning algorithms in particular (e.g., transformers, BERT, open-source LLMs)
- Experience with SQL
- Great communication and presentation skills
- Able to work independently and proactively
- Experience with generative AI is preferred
- Experience with fine-tuning LLMs is preferred
- Experience with DevOps tools (e.g., SageMaker, Git, Jenkins) is desirable
- Experience developing and deploying solutions using services in the AWS ecosystem (Lambda, SageMaker, EC2, RDS, EMR) is desirable
- Intermediate knowledge of statistical methods is desirable
- Familiarity with common data cleaning and munging techniques
- Familiarity with data visualization
- Familiarity with statistical methods, e.g., linear/logistic regression, optimization
- Familiarity with mutual fund, fixed income, and equity data is a plus
Morningstar’s hybrid work environment gives you the opportunity to work remotely and collaborate in person each week. We’ve found that we’re at our best when we’re purposely together on a regular basis, at least three days each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you’ll have tools and resources to engage meaningfully with your global colleagues.