
Starting your AI/ML Project: from research to engineering

Things you need to do before starting a real-world AI/ML engineering project

2024.02.17 · 11 min read · by Zhenlin Wang

Welcome to the world of AI and ML engineering! While the realms of research and academia provide a solid foundation, transitioning your knowledge into practical applications demands a comprehensive understanding of the various components and considerations specific to the industry.

In this guide, we’ll navigate through crucial aspects you need to address before embarking on a real-world AI/ML project. From data handling to project management and system scalability, we’ll delve into the high-level overview and major concerns that accompany the shift from research to production.

High level overview

Project details

When moving from a research-based project to a real-world engineering project, the first thing is always to make sure there is impact. The outcome should benefit some stakeholders; otherwise, from the community’s point of view, your work is no different from carefully crafted school results and becomes a self-absorbed show. Hence, consider the following steps before you even start the project:

  1. Goals: Clearly Outlining Objectives. Establishing clear goals is foundational to project success. Define the primary objectives that the project aims to achieve.

    Actions to take:

    • Conduct stakeholder meetings to gather input.
    • Clearly articulate short-term and long-term goals.
    • Prioritize goals based on business impact.
  2. User Experience: Focusing on User-Centric Design. User experience directly influences the success of a project. Ensure that the system is designed with the end user in mind.

    Actions to take:

    • Conduct user research to understand needs.
    • Develop user personas and journey maps.
    • Prioritize features that enhance usability.
  3. Performance Constraints: Defining Limitations. Identify and understand any performance constraints that could impact the system’s functionality or responsiveness.

    Actions to take:

    • Define performance benchmarks.
    • Identify potential bottlenecks and limitations.
    • Explore monitoring options for performance tracking.
  4. Evaluation: Establishing Metrics for Success. Evaluation metrics provide a quantitative measure of project success. Clearly define the criteria for assessing performance. Evaluation can be deterministic, but it very often requires human evaluation and input to provide an accurate judgement.

    Actions to take:

    • Define key performance indicators (KPIs).
    • Establish a framework for continuous evaluation.
    • Incorporate feedback loops for improvement.
  5. Personalization: Tailoring Experiences. Personalization enhances user engagement. Determine whether personalization features are relevant to your project.

    Actions to take:

    • Assess the feasibility of personalization.
    • Implement personalization algorithms.
    • Balance personalization with privacy considerations.
  6. Project Constraints: Identifying Limiting Factors. Every project operates within constraints. Identify and understand the constraints that may impact project execution. These include cost, manpower, time, regulation, infrastructure, and even location. Carefully reviewing potential constraints is of paramount importance as you grow the project in the long run. It is something you rarely plan for in research, where the inherent constraints are already given and fixed.
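As one way to make the “define performance benchmarks” action from item 3 concrete, the sketch below times a prediction function and checks a percentile latency against a budget. It is only an illustration under stated assumptions: the function names (`measure_latencies_ms`, `percentile`, `meets_latency_budget`), the 95th percentile default, and the idea of a single latency budget are hypothetical choices, not something prescribed by this post.

```python
import math
import time
from typing import Callable, List, Sequence


def measure_latencies_ms(predict: Callable[[object], object],
                         inputs: Sequence[object]) -> List[float]:
    """Call `predict` once per input and return each call's latency in ms."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def meets_latency_budget(latencies_ms: Sequence[float],
                         budget_ms: float,
                         q: float = 95.0) -> bool:
    """True if the q-th percentile latency is within the (assumed) budget."""
    return percentile(latencies_ms, q) <= budget_ms
```

In practice you would pass your own model’s inference function as `predict`, run it over a representative sample of requests, and agree on the budget with stakeholders. A tail percentile is used here rather than the mean because a few slow requests can dominate the user experience while barely moving the average.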

Key Components

In general, it is beneficial to consider the following aspects when starting your project.

Major concerns when shifting from research to production

Transitioning from research-oriented machine learning projects to production-ready systems involves navigating several critical concerns. These considerations play a pivotal role in ensuring the successful deployment and operation of machine learning models in real-world scenarios.

  1. Objectives: Aligning with Business Goals. The objectives of a research project may differ from the goals of a production system. Ensuring alignment with business objectives is crucial for delivering tangible value. Hence, you should clearly define and prioritize the business objectives that the machine learning model aims to achieve. Regularly reassess alignment to adapt to evolving business needs.

  2. Computational Priority: Efficiency in Production. Research models may prioritize accuracy over computational efficiency, leading to challenges in deployment, where low latency and resource efficiency are critical. You need to optimize models for efficient inference, considering factors such as model size, inference speed, and resource utilization. Strike a balance between accuracy and computational demands.

  3. Data: Ensuring Quality and Accessibility. Research datasets may not fully represent the complexities of real-world production data, and ensuring data accessibility is essential for ongoing model performance. It is critical to curate high-quality, diverse datasets that closely reflect production scenarios. Implement robust data pipelines and monitoring to ensure data quality and availability.

  4. Fairness: Mitigating Bias and Ethical Concerns. Biases present in research data or models may lead to unfair outcomes in production, raising ethical concerns and potential negative impacts. Prioritize fairness and ethical considerations in model development. Implement measures to detect and mitigate biases, and regularly evaluate model fairness.

  5. Interpretability: Enhancing Model Explainability. Complex research models may lack interpretability, making it challenging to explain predictions to stakeholders and ensure transparency. Integrating interpretability techniques into model development to enhance understanding is very useful in production applications. Use methods such as feature importance analysis and model-agnostic interpretability tools.
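To illustrate the feature importance analysis mentioned in item 5, here is a minimal, model-agnostic sketch of permutation importance written with only the standard library. It shuffles one feature column at a time and records how much accuracy drops. The function names, the choice of accuracy as the score, and the defaults for `n_repeats` and `seed` are assumptions made for this example. Production work would more likely rely on an established library.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int],
             rows: Sequence[Row],
             labels: Sequence[int]) -> float:
    """Fraction of rows for which the prediction matches the label."""
    correct = sum(1 for r, y in zip(rows, labels) if predict(r) == y)
    return correct / len(labels)


def permutation_importance(predict: Callable[[Row], int],
                           rows: Sequence[Row],
                           labels: Sequence[int],
                           n_repeats: int = 5,
                           seed: int = 0) -> List[float]:
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    n_features = len(rows[0])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [list(r[:j]) + [v] + list(r[j + 1:])
                        for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of zero, because shuffling it cannot change any prediction. A feature the model relies on shows a positive drop. Since the approach only calls `predict`, it works for any model, which is what makes it useful when explaining behaviour to stakeholders.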

Requirements for MLsys

Finally, let’s talk about what should be considered to gauge the quality of an ML system, the backbone of your real-world project’s outcome. Here are five critical requirements for ML systems:

  1. Scalability: The system’s ability to handle an increasing amount of data, workload, or user requests while maintaining performance.

    Considerations:

    • Evaluate the system’s scalability under varying workloads and data volumes.
    • Assess the capability to efficiently scale both training and inference processes.
    • Consider distributed computing for parallel processing and efficient resource utilization.
  2. Maintainability: The ease with which the ML system can be managed, updated, and modified over time.

    Considerations:

    • Implement modular and well-documented code to facilitate easy maintenance.
    • Incorporate version control for both code and models.
    • Establish a monitoring and logging system for tracking system health and performance.
    • Regularly update dependencies and address technical debt.
  3. Adaptability: The ability of the ML system to adapt to changes in data distributions, user requirements, or environmental factors.

    Considerations:

    • Design models that can be retrained or fine-tuned with new data.
    • Implement continuous learning techniques for adapting to evolving patterns.
    • Consider automated retraining pipelines based on changing data characteristics.
    • Ensure flexibility in feature engineering and model configurations.
  4. Reliability: The consistency and accuracy of the ML system’s predictions or outcomes over time.

    Considerations:

    • Implement rigorous testing procedures to validate model performance.
    • Establish robust error handling mechanisms to handle unexpected situations.
    • Monitor and address issues related to data quality, outliers, and changing distributions.
    • Consider implementing fallback strategies for critical applications.
  5. Traceability: The ability to trace and understand the decision-making process of the ML system, including the origin of data, model training, and inference.

    Considerations:

    • Maintain comprehensive documentation of data sources, preprocessing steps, and model architectures.
    • Implement model versioning to trace changes over time.
    • Record and monitor model predictions, including explanations for interpretability.
    • Establish an audit trail for regulatory compliance and accountability.
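The adaptability and reliability items above both mention reacting to changing data distributions. One possible way to detect such a change is sketched below: a population stability index (PSI) computed between a reference sample of a feature and a recent production sample. The function names, the equal-width binning, and the 0.2 threshold (a commonly quoted rule of thumb) are assumptions for illustration, not requirements from this post.

```python
import math
from typing import List, Sequence


def population_stability_index(expected: Sequence[float],
                               actual: Sequence[float],
                               n_bins: int = 10,
                               eps: float = 1e-6) -> float:
    """PSI between a reference sample and a recent sample of one feature."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample must contain more than one value")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range fall into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected: Sequence[float],
                     actual: Sequence[float],
                     threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds the (assumed) threshold."""
    return population_stability_index(expected, actual) > threshold
```

A check like this could run on a schedule for each important feature and feed an automated retraining pipeline. The right threshold and bin count depend on the data, so treat the defaults as starting points to tune.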

These requirements collectively contribute to the overall quality and success of an ML system. Striking a balance between scalability, maintainability, adaptability, reliability, and traceability is essential for building robust, effective, and sustainable machine learning solutions in real-world projects. Regularly reassess and update these considerations to keep pace with evolving project needs and industry best practices.

Closing…

Each of these aspects can be explored further, and I will keep publishing blog posts on these parts. In the meantime, check out these blogs that provide more details about the Key Components listed above.