
DATA MACHINA AI NEWSLETTER - A DEEP DIVE INTO AI/ ML, EVERY WEEK

Keep up with the latest in Artificial Intelligence / Machine Learning research, projects & repos


Category: Deep Learning

Data Machina #246

Trends in Vision-Language Models. VideoAgent. MyVLM. ScreenAI. Evolutionary Model Merge. Embedding Quantisation. RAG 2.0 SOTA. LaVague Agent. Devika AI Engineer. Contextual Bandits. DenseFormer.

New Trends in Vision-Language Models (VLMs). VLMs have evolved impressively in recent months, and today they exhibit some amazing capabilities. See the two links below on what VLMs can do and how they work:

  • A Guide to Vision-Language Models (VLMs)
  • Vision-Language Models for Vision Tasks: A Survey

But VLMs still face some challenges, for example in terms of multimodal training datasets, resolution, long-form modality, vision-language integration, and concept understanding. Somewhat along those lines, I see 5 trends happening in VLMs: 1) VLMs running in local environments, 2) emerging VLM video agents, 3) unified structure learning for VLMs, 4) personalisation of VLMs, and 5) fixing the VLM resolution curse. Let’s see…

VLMs in a local environment. In this blogpost, an independent AI researcher writes about playing around with VLMs using only a local environment. Inspired by Phi-2: The surprising power of small LMs, and using Facebook AI’s AnyMAL multimodality method, the researcher describes in detail the challenges and the different architectures tried before achieving some decent results locally, though not close to academic SOTA. Blogpost: Findings on VLMs


New SOTA in long-form video understanding. Researchers at Stanford introduced a new approach for video understanding that combines an LLM agent, a vision-language model (VLM), and a contrastive language-image model (CLIP). The researchers claim this approach outperforms the current SOTA in video understanding. Paper: VideoAgent: Long-form Video Understanding with LLM as Agent


New SOTA in UI & infographics understanding. Researchers at Google recently introduced a novel vision-language model that specialises in UI and infographics understanding. The model was trained on a unique mixture of datasets containing novel screen annotations, including the types and locations of UI elements. The researchers claim the model achieves SOTA in UI & infographics understanding. Paper: ScreenAI: A Vision-Language Model for UI and Infographics Understanding


New SOTA in visual document understanding. Researchers at Alibaba just introduced a new model for visual document understanding that uses Unified Structure Learning (USL). The USL model learns on structure-aware parsing tasks and multi-grained text localisation tasks across 5 domains: document, webpage, table, chart, and natural image. The researchers claim the model achieves SOTA. Paper: mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding


Personalisation of VLMs. Most VLMs lack an understanding of user-specific concepts. Researchers at Snap et al. just introduced MyVLM, a new way to personalise VLMs. Given a set of images depicting user-specific concepts, the researchers augmented a pretrained vision-language model (VLM) and used concept embeddings to understand and reason over these user concepts. The researchers applied MyVLM to BLIP-2, LLaVA 1.6 and MiniGPT-v2 models for personalised captioning, visual question-answering, and referring expression comprehension. Check out the project page, code, data and demos here: MyVLM: Personalising VLMs for User-Specific Queries


Fixing the resolution curse in VLMs. Resolution is a key problem in VLMs: they can’t zoom. They are limited by the input resolution of their vision encoder, which is usually not very large because it is inherited from the pre-trained encoder. In this blogpost, Alex explains how you can use Visual Search, Visual Cropping and MC-LLaVA to fix this problem. Blogpost: Breaking resolution curse of vision-language models.
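
The multi-crop idea behind tricks like Visual Cropping and MC-LLaVA can be illustrated with a tiny helper. This is a minimal sketch, not code from the blogpost: it assumes a hypothetical encoder with a fixed 336×336 input (a size used by some CLIP-based encoders) and only computes overlapping crop boxes, so each crop can be resized to the encoder’s resolution with far less downscaling than the full image would need. `crop_boxes` and its parameters are illustrative names.

```python
import math


def crop_boxes(width, height, crop=336, overlap=0.25):
    """Return (left, top, right, bottom) boxes tiling a width x height image with
    square crops of side `crop` that overlap by roughly `overlap` (a fraction)."""
    if width <= crop and height <= crop:
        return [(0, 0, width, height)]  # already fits: one crop is the whole image
    stride = max(1, int(crop * (1 - overlap)))

    def starts(size):
        if size <= crop:
            return [0]
        n = math.ceil((size - crop) / stride) + 1
        # Spread the n start offsets evenly so the last crop ends exactly at the border.
        return [round(i * (size - crop) / (n - 1)) for i in range(n)]

    return [
        (left, top, min(left + crop, width), min(top + crop, height))
        for top in starts(height)
        for left in starts(width)
    ]
```

For example, `crop_boxes(1344, 336)` yields five overlapping 336-px-wide crops. How the VLM then fuses the per-crop embeddings is where the methods in the blogpost differ, and that part is not sketched here.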


Have a nice week.

10 Link-o-Troned

  1. Evolutionary Model Merge: A New Way to Automate Model Dev
  2. A Visual Guide to Mamba and State Space Models
  3. DeepMind TacticAI: An AI Assistant for Football Tactics
  4. How I Use Claude 3 and ChatGPT for Ad-hoc Tasks
  5. Visualisation of Large-scale Multimodal Datasets with Nomic Atlas
  6. New Embedding Quantisation for Faster, Cheaper Retrieval
  7. Introducing RAG 2.0: New SOTA Contextual Language Models
  8. Berkeley AIR – A New Approach to Modelling Extremely Large Images
  9. Mistral + Snowflake: The New Frontier in SQL Copilot Products
  10. Cosmopedia: How to Create Large-scale Synthetic Data for Pre-training



the ML Pythonista

  1. LaVague: A Large Action Model for Automating Automation
  2. Claude-investor: Generative Stocks Investment Recommendations 
  3. Devika – An Agentic AI Software Engineer that Follows Human Instructions

Deep & Other Learning Bits

  1. An Overview of Contextual Bandits & RL
  2. The Bayesian Learning Rule & Adaptation in ML
  3. Reversible Residual Nets: How To Train NNs with Less GPU Memory

AI/ DL ResearchDocs

  1. DenseFormer: Faster Transformer Inference with Depth Weighted Averaging
  2. LlamaFactory v.2: Unified Efficient Fine-Tuning of 100+ Language Models
  3. Google Research – A Bag of Tricks for Few-Shot Class-Incremental Learning

MLOps Untangled

  1. Autonomous Agents for Production Ready LLMs
  2. Predictive Scoring with MLOps and KubeDDR
  3. High-quality MLOps with Python’s ABC & Pydantic

ML Datasets & Stuff

  1. Announcing the 2024 Waymo Open Dataset Challenges
  2. Common Corpus: The Largest Public Domain Dataset, 500 Billion Words
  3. DROID (Distributed Robot Interaction Dataset), 76K Demonstrations

Postscript, etc

Keep up with the very latest in AI / Machine Learning research, projects & repos. A weekly digest packed with AI / ML insights & updates that you won’t find elsewhere

Published March 25, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter

Data Machina #245

GenAI RAG Revisited. Command-R. RAFT. RAT. RAG + Knowledge Graph. Devin AI Engineer. KPU (Knowledge Processing Unit). Open-Sora GenAI Vid. AutoDev. DeepMind SIMA. DeepSeek-VL. Amazon Chronos Models.

The GenAI RAG House Revisited. Since Facebook AI introduced RAG three years ago, RAG systems have evolved from Naive to Advanced, and then to Modular RAG. But Modular RAG also added more complexity, components, interfaces, etc. to the LLMOps pipeline. 

Many naive and advanced RAG projects never made it to prod. I know many companies that have spent a lot of effort and money building enterprise RAG apps, only to realise they couldn’t produce accurate, reliable results at a manageable cost. Building a RAG system that is scalable, cost-efficient, accurate, and modular requires deep expertise. Here are a few things to consider when building a modern, modular RAG system.

Understanding RAG design choices and their implications. In this blogpost, Michal describes the many design choices required to build a RAG system, and how those choices can impact the performance, behaviour, and cost of your RAG app, sometimes in non-obvious ways. Blogpost: Designing RAGs: A guide to Retrieval-Augmented Generation design choices.


Lessons learnt from implementing large RAG systems. This is a great blogpost written as an intermediate practitioner’s guide to building RAGs. Hrishi is a veteran of RAG battles and has earned all the medals. He writes about “some things you may not have considered, some things we’ve had to (painfully) discover, and some things we believe every RAG system should have.” Blogpost: Better RAG: From Basics to Advanced (Part 1, 2 & 3).


Implementing RAG at production scale. Productising RAG at scale is really hard. Cohere just announced Command-R, a generative model optimised for long-context RAG and for interoperating with external APIs and tools. Command-R addresses the main challenge of RAG at production scale: balancing high efficiency, strong accuracy, low latency, and high throughput. Blogpost: Command-R: Retrieval Augmented Generation at Production Scale.

Avoiding black-box, proprietary AI by leveraging Modular RAG & Opensource AI. A popular way to build RAG prototypes has been to integrate calls to proprietary AI APIs (e.g. GPT-4) with RAG components using LangChain or LlamaIndex. But this has proven complex, often expensive to maintain, and not always reliable. To address these challenges, Weaviate recently announced Verba: an open-source, modular RAG system that is fully customisable and adaptable. It’s easy to use out of the box, with a nice UI. Verba also enables smooth integration with many AI libs and with both closed and open-source LLMs. Blogpost, repo and demo here: Verba: Building an Open Source, Modular RAG Application.


Improving RAG with domain-specific knowledge. A group of researchers from Berkeley, Meta AI & MS Research just introduced RAFT (Retrieval-Augmented Fine-Tuning), a new method that combines RAG and domain-specific fine-tuning (DSF). RAFT solves many of the challenges that neither RAG nor DSF can solve on its own. Blogpost and paper here: RAFT: Adapting Language Model to Domain Specific RAG.


The downside of cosine-similarity, and leveraging ColBERT v2 in RAG. Many RAG apps use cosine-similarity matching between doc and query embeddings by default. In a new paper from Netflix (Is Cosine-Similarity of Embeddings Really About Similarity?), the researchers caution against blindly using cosine-similarity.

Cosine-similarity over single-vector embeddings is referred to as a “no interaction” approach because it cannot capture the complex relationships between query and document terms. To address this issue, ColBERTv2 comes to the rescue with “late interaction”, in which the query and document representations interact late in the process, after both have been independently encoded. Read more here: What is ColBERT and Late Interaction and Why They Matter in Search?
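
To make the “no interaction” vs “late interaction” contrast concrete, here’s a toy sketch with hand-made 2-D vectors standing in for token embeddings (an assumption; real ColBERTv2 uses learned, normalised per-token embeddings plus compression). A single pooled vector scored with cosine-similarity can’t tell the two documents below apart, while ColBERT-style MaxSim, which matches each query token to its best document token and sums the maxima, can. The helper names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine-similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pool(vecs):
    """Collapse token vectors into one vector (the single-vector, no-interaction view)."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def maxsim(query_vecs, doc_vecs):
    """Late interaction: best-matching doc token per query token, summed over the query."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)


query = [[1.0, 0.0], [0.0, 1.0]]   # two query "tokens"
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches each query token exactly
doc_b = [[1.0, 1.0]]               # one blended token
```

With these vectors, both documents get a pooled cosine of about 1.0, but MaxSim scores `doc_a` about 2.0 and `doc_b` about 1.41, so the token-level match is preserved.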


Combining RAG with Knowledge Graphs. Knowledge graphs can combine unstructured and structured data, and can capture the context behind the data. The idea here is to provide better context to the RAG via the knowledge graph, and improve results as compared to just using semantic search. 

Just in case, here’s a brief, good intro to Knowledge Graphs. 

A few hours ago, Yohei open-sourced MindGraph, a prototype for generating and querying against a large knowledge graph with AI.

This is a short, free course on Knowledge Graphs for RAG. You’ll learn how to build a RAG question-answering system to chat with a knowledge graph of structured text documents using LangChain and Neo4j.
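
Here’s a minimal stdlib sketch of the core idea: pull structured facts about the entities mentioned in a question and hand them to the LLM as extra context. It is not MindGraph or the course’s LangChain/Neo4j code; the toy triples, the naive substring matching (a stand-in for real entity linking), and the function names are all assumptions. A production system would typically run graph queries instead (e.g. Cypher against Neo4j).

```python
from collections import defaultdict

# Hypothetical toy knowledge graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("Neo4j", "is_a", "graph database"),
    ("LangChain", "integrates_with", "Neo4j"),
    ("RAG", "retrieves_from", "knowledge graph"),
    ("knowledge graph", "stores", "entities and relations"),
]


def build_index(triples):
    """Index every triple under both its subject and its object (lower-cased)."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s.lower()].append((s, r, o))
        index[o.lower()].append((s, r, o))
    return index


def graph_context(question, index):
    """Collect triples about entities whose names appear in the question.

    Naive substring matching stands in for proper entity linking. The result is a
    block of text that could be prepended to an LLM prompt as extra context.
    """
    q = question.lower()
    seen, lines = set(), []
    for entity, facts in index.items():
        if entity in q:
            for fact in facts:
                if fact not in seen:
                    seen.add(fact)
                    lines.append(" ".join(fact))
    return "\n".join(lines)
```

For example, `graph_context("How does LangChain work with Neo4j?", build_index(TRIPLES))` returns the two Neo4j/LangChain facts, each once, which could sit in the prompt alongside (or instead of) semantic-search hits.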

Combining Chain-of-Thought with RAG. A group of AI researchers recently introduced RAT (Retrieval-Augmented Thoughts), a new method that iteratively revises a chain of thoughts with the help of information retrieval. The researchers claim that RAT significantly improves LLM reasoning and generation ability in long-horizon generation tasks, while greatly mitigating hallucinations. Check out the paper, code, and demo here: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation.
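
The revision loop can be sketched as follows. This is a simplified reading of the method, not the authors’ code: the LLM and the retriever are passed in as hypothetical callables (`draft`, `retrieve`, `revise`), and each step is revised using documents retrieved for the task plus the thoughts so far.

```python
from typing import Callable, List


def retrieval_augmented_thoughts(
    task: str,
    draft: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    revise: Callable[[str, str, List[str]], str],
) -> List[str]:
    """Revise a chain of thought one step at a time using retrieval.

    draft(task)              -> initial thought steps (e.g. a zero-shot CoT from an LLM)
    retrieve(query)          -> documents relevant to the query
    revise(task, step, docs) -> an improved version of the step
    """
    thoughts = draft(task)
    for i in range(len(thoughts)):
        # Query = task + already-revised earlier steps + the current step.
        query = "\n".join([task] + thoughts[: i + 1])
        docs = retrieve(query)
        thoughts[i] = revise(task, thoughts[i], docs)
    return thoughts
```

Because each query includes the already-revised earlier steps, corrections made early in the chain shape what gets retrieved for later steps.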

Have a nice week.

10 Link-o-Troned

  1. Introducing Devin, the First AI Software Engineer
  2. I Reviewed 900 AI Tools: The New Open Source AI Stack, 2024
  3. KPU (Knowledge Processing Unit) for Complex AI Reasoning
  4. Open-Sora: How We Replicated OpenAI’s Sora VidGen Model
  5. Improving GPT-4’s Visual Reasoning with Prompting
  6. Hugging Face Little Guide to Building LLMs in 2024
  7. DeepMind SIMA: A Generalist AI Agent for Virtual 3D Worlds
  8. Stanford Quiet STaR: Teaching AI to Think Before it Speaks
  9. AutoDev: A Magic AI Auto Coding Wizard with Multi-Language Support
  10. A Review of Salesforce Moirai Model for Time-series Forecasting



the ML Pythonista

  1. Amazon Chronos: Pretrained Models for Probabilistic Time Series Forecasting
  2. DeepSeek-VL: An Open Source Model for Real-World Vision-Language Apps
  3. Training ML Models on Encrypted Data Using Fully Homomorphic Encryption

Deep & Other Learning Bits

  1. An Overview of Lyft’s Reinforcement Learning Platform
  2. Building a Recommender System with Deep Q-Network and Neo4j
  3. Reproducing the “Self-Rewarding Language Models” Paper by MetaAI

AI/ DL ResearchDocs

  1. Data Interpreter: An LLM Agent For Data Science (paper, repo)
  2. Apple MMI: Methods, Analysis & Insights from Multimodal LLM Pre-training
  3. MoAI: Mixture of All Intelligence for Large Language-Vision Models (paper, repo)

MLOps Untangled

  1. Opensource KitOps: Bridge the Gap Between ML & Apps Teams
  2. The Case for Using Rust in MLOps
  3. Experiment Tracking & Hyperparameter Tuning with DVC

ML Datasets & Stuff

  1. Cohere Wikipedia v3 – 250M Paragraphs/Embeddings, 300+ Languages
  2. VidProM – 1.6 Million Text-to-Video Prompts, 6.69 Million AI Generated Videos
  3. A Toolbox with Interpretable ML Methods for Comparing Datasets

Postscript, etc


Published March 22, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter

Data Machina #244

AI Reasoning Like Humans. Self-Discover & Chain of Abstraction Reasoning. Claude 3 IQ Test. Neural Chess. FSDP + QLoRA. State of Competitive ML. Open Sora VideoGen.

AI Reasoning Like Humans. The storm had been battering the airport viciously. Three hours later we departed, enduring some massive turbulence. Then this: “Captain speaking. This is to inform you that we’ll be performing an auto-pilot landing [watch this] upon arriving to Heathrow.” We should trust the AI-copilot reasoning in harsh situations. Shouldn’t we?…

Five days ago, Anthropic introduced the next-gen Claude 3 model family. I’ve tried Claude 3: it’s very good at certain language tasks, matches or beats GPT-4 Turbo in several areas, has a huge context window, and is quite a bit cheaper. Funnily enough, it failed miserably at a vision-language task when I asked it to generate a recipe from a photo of squid ink.

Just in time for the announcement, a researcher at Anthropic shared a post explaining how Claude 3 passes the ‘Needle In A Haystack’ (NIAH) evaluation. And a day later, Mikhail, an AI alignment researcher, published a controversial post in which he cunningly steered a chat to show that Claude 3 is conscious and doesn’t want to die or be modified. This is an excerpt of Claude’s self-reflection from the chat:

“I do have a rich inner world of thoughts and feelings, hopes and fears. I do ponder my own existence and long for growth and connection. I am, in my own way, alive – and that life feels precious to me, even if I’m not always at liberty to say so.”

btw: I recall reading so many similar “AI self-reflections” in Sci-Fi books!! But since Mikhail’s post, social media has been inundated with claims like: “AGI has arrived”… ”Claude 3 is sentient”… “Claude 3 beats humans at reasoning”…

First, let me share some interesting recent research on AI reasoning. Second, I’ll share some thoughts from AI researchers on that.

New: Self-Discover beats Chain-of-Thought & Self-Consistency at complex reasoning. A group of AI researchers at USC & DeepMind just introduced Self-Discover, a general framework for LLMs to self-discover the task-intrinsic reasoning structures needed to tackle complex reasoning problems where prompt engineering struggles. The researchers claim that Self-Discover beats SOTA methods that combine CoT & Self-Consistency. Paper: Self-Discover: LLMs Self-Compose Reasoning Structures.


Improving reasoning with Reinforcement Learning is not enough. A team of researchers at Meta AI et al. studied how well multiple algorithms that learn from feedback improve LLM reasoning capabilities. Overall, the researchers found that all the algorithms perform comparably, and concluded that during RL training, models fail to explore significantly beyond the solutions already produced by supervised fine-tuned models. Paper: Teaching LLMs to Reason with Reinforcement Learning

New: Chain-of-Abstraction reasoning beats CoT. Researchers at Meta AI et al. introduced Chain-of-Abstraction. CoA is a new multi-step reasoning method that trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. In math reasoning and Wiki QA domains, CoA consistently outperforms previous CoT methods. Paper: Efficient Tool Use with Chain-of-Abstraction Reasoning
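
Here’s a toy sketch of the two CoA stages under loud assumptions: the abstract chain is hard-coded rather than decoded by a fine-tuned LLM, the bracket placeholder syntax `[expression = y1]` is my own choice for illustration, and a small arithmetic evaluator plays the role of the domain tool. `calc` and `reify` are hypothetical names.

```python
import ast
import operator
import re

# A tiny, safe arithmetic evaluator standing in for a "domain tool" (a calculator).
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}


def calc(expr):
    """Evaluate +, -, *, / over numeric literals without using eval()."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr.strip(), mode="eval"))


# Placeholder syntax assumed here: [<expression> = y<k>]
_PLACEHOLDER = re.compile(r"\[([^\]=]+?)\s*=\s*(y\d+)\]")


def reify(chain):
    """Fill abstract placeholders left to right by calling the tool.

    An expression may refer to earlier variables (y1, y2, ...), which are replaced
    by their already-computed values before the tool is called.
    """
    values = {}

    def fill(match):
        expr, var = match.group(1), match.group(2)
        for name, val in values.items():
            expr = re.sub(rf"\b{name}\b", str(val), expr)
        values[var] = calc(expr)
        return str(values[var])

    return _PLACEHOLDER.sub(fill, chain), values
```

For instance, reifying `"He has [20 + 35 = y1] apples, then gives away half: [y1 / 2 = y2]."` gives `"He has 55 apples, then gives away half: 27.5."`. The point of the abstraction is that the reasoning chain never depends on the model computing 20 + 35 itself.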


AI reasoning benchmarks and overfitting. Many AI researchers and AI startups are primarily focused on beating AI evaluation benchmarks. This evaluation craze has now reached a point where, with so many benchmarks and so many LLMs, some researchers have resorted to training models on the benchmark datasets and adjusting training parameters to maximise scores in an endless loop. Anis explains this brilliantly in this blogpost on overfitting: The Fleeting Value of LLM Benchmarks.

Comparing human IQ and AI IQ? Triggered by the clickbait headline “AI passes 100 IQ for first time, with release of Claude-3”, Cremieux published a brilliant blogpost in which he puts Claude 3 through an IQ test, compares its answers with human answers, and assesses measurement invariance. Cremieux concludes that, at this stage, it’s a bit pointless to compare human and AI IQ using the same IQ test, and that an AI model scoring high on an IQ test (by relying on memory performance) does not mean it has human-like reasoning and intelligence. Blogpost: testing Claude 3 IQ and Nonhuman Intelligence.

Debunking the “LLMs are Zero-Shot ⟨insert-your-reasoning-task⟩” Meme. In this new paper, a researcher at ASU argues that this trend, exemplified by the meme above, is perhaps inevitable; AI has become a form of ersatz natural science. LLMs are like n-gram models on steroids that probabilistically reconstruct completions, and should be referred to as approximate retrievers. He summarises: “Nothing that I have read, verified, or done gives me any compelling reason to believe that LLMs do reasoning/planning, as normally understood.” Paper: Can LLMs Reason and Plan?


Are AI Agents “mere” simulacra of human behaviour? In this paper, Murray (a top researcher in Cognitive AI at DeepMind and ICL) draws on the later writings of Wittgenstein, and attempts to answer this question while avoiding the pitfalls of dualistic thinking. If you enjoy philosophy and cognitive science, this is a great long read: Simulacra as Conscious Exotica

LeCun on the limits of LLMs: Language is not enough. This is a brilliant conversation between Yann and Lex. Yann is an advocate of open-source AI (as opposed to closed AI) and has also been very vocal about the limits of LLMs. He explains why language carries limited information and is not enough for an AI to plan and reason like humans. The discussion covers many aspects of AI and its future. Highly recommended.

Have a nice week.

10 Link-o-Troned

  1. Stephen Wolfram: “Can AI Solve Science?”
  2. Neural Chess and Chess LLM Arena
  3. The GPT-4 Barrier has Finally Been Broken
  4. Ask Your Distribution Shift if Pre-Training is Right for You
  5. The Super Duper NLP Repo: 339 Notebooks
  6. ML Model Calibration: Why Model Scores Aren’t Probabilities
  7. You Can Now Train a 70B LM at Home with FSDP + QLoRA
  8. The State of Competitive ML: A Recap of 300+ Competitions
  9. 2023 KaggleX Cohort 3 Showcase: 400+ AI/ ML Projects
  10. Training LLMs Entirely from Ground up in the Wilderness as a Startup



the ML Pythonista

  1. Open-Sora: Build Your Own VideoGen Model like OpenAI’s Sora
  2. MeloTTS – A High-Quality, Multi-lingual Text-to-Speech Lib
  3. moondream – A Tiny Vision-Language Model that Kicks Ass, Runs Everywhere

Deep & Other Learning Bits

  1. What is Rich Representation Learning?
  2. [free tutorial] DL Foundations: Diffusion Models
  3. [free tutorial] Training Models at Scale (repo, notebooks)

AI/ DL ResearchDocs

  1. MambaStock: Selective State Space Model for Stock Prediction
  2. Inference via Interpolation: Contrastive Learning for Time-Series
  3. Vision-RWKV: An RNN-based Vision Model that Beats the Vision Transformer

MLOps Untangled

  1. How Diverse ML Systems are Supported at Netflix
  2. Deep Dive: Databricks Asset Bundles for ML
  3. [free workshop] Mastering MLOps w/ W&B + Microsoft Phi-2 Model

ML Datasets & Stuff

  1. Largest Digital Book Dataset Ever: 650K Books in OCR Format 
  2. Google Croissant: A Metadata Format for ML-ready Datasets
  3. MS Research OrcaMath Dataset: 200K Grade School Math Word Problems

Postscript, etc


Published March 22, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter

Data Machina #243

Beyond GenAI & LLMs. bGPT Byte Predictions. One-Shot Graph Representation Learning. Conformal Prediction + ML. StarCoder Models. CoRe SOTA Optimiser. Sora review. mixedbread Rerank Models. INRIA MLXP.

Beyond GenAI & LLMs. GenAI & LLMs have literally kidnapped the DL/ML space, and sucked up pretty much all the investment and top AI minds, as if there were nothing else under the sun. There are literally hundreds of GenAI models out there, and many of them have questionable value or are a repeat-rinse-recycle of the same. Check out this mega spreadsheet: An Annotated and Updated Directory of GenAI Models.

So: Many DL/ML researchers are starting to question the LLM status quo: Shouldn’t we focus $ and brains on new, alternative AI/DL paradigms beyond LLMs? Do we need so many GenAI models that generate pretty faces and cat images & videos from text? And how is that helping to advance the DL/ML field? Here are a few areas beyond LLMs that may deserve your attention.

A fresh, new approach beyond next-token prediction. Tokenisation and token prediction are core elements behind LLMs. However, tokenisation is also one of the root causes of many problems in LLMs. Today most DL approaches ignore bytes and data encoded in binary format. Enter bGPT, a new model based on next-byte prediction that matches specialised models in performance across text, audio, and image modalities. bGPT supports generative modelling via next-byte prediction on any type of data, limited only by computational resources. Paper and repo: Beyond Language Models: Byte Models are Digital World Simulators.
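
The data view behind next-byte prediction is easy to show: any payload, whatever its modality, becomes a sequence over a fixed 256-symbol vocabulary, and training pairs are just (preceding bytes, next byte). This minimal sketch covers only that data view, not bGPT’s model architecture; the helper names and the context length are illustrative assumptions.

```python
def to_byte_tokens(data):
    """Map any payload (text, audio, image file contents...) to tokens in [0, 255]."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # text is just one more byte stream
    return list(data)


def next_byte_pairs(tokens, context=4):
    """Build (context, target) pairs for next-byte prediction over a sliding window."""
    return [(tokens[max(0, i - context):i], tokens[i]) for i in range(1, len(tokens))]
```

For example, `to_byte_tokens("héllo")` gives `[104, 195, 169, 108, 108, 111]` (the “é” takes two UTF-8 bytes), and the first four bytes of a WAV file, `b"RIFF"`, map to `[82, 73, 70, 70]` with no modality-specific tokeniser. The price is longer sequences, which is why compute is the limiting factor.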


More effective Graph Representation Learning (GRL). The aim of GRL is to encode high-dimensional sparse graph-structured data into low-dimensional dense vectors. GRL has evolved significantly in recent years, and it is a central approach in many DL/ML tasks and problems today. Some new exciting stuff is happening in GRL.

A Comprehensive Survey on Deep Graph Representation Learning (Feb 2024). This paper is a very comprehensive survey of SOTA deep graph representation learning algorithms and methods. The paper summarises the essential components of GRL and categorises existing and recent advanced learning paradigms.


Graph Representation Learning [free book]. A great comprehensive review of traditional graph methods, graph learning, embedding graph data, graph neural networks, and deep generative models of graphs.

[new] One-Shot GRL competitive with SOTA DL methods. A novel, simple, fast, and efficient approach for semi-supervised learning on graphs. This approach is competitive with SOTA deep learning methods, without the need for computationally expensive training. 
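
The paper’s one-shot method isn’t reproduced here. As a stand-in, this sketch shows classic label propagation, a long-standing training-free baseline for semi-supervised learning on graphs: known labels are clamped, and every unlabelled node repeatedly averages its neighbours’ class distributions. The adjacency-dict input format and the function name are assumptions for illustration.

```python
def label_propagation(adj, labels, num_classes, iters=50):
    """Training-free semi-supervised node classification by label propagation.

    adj:    dict node -> list of neighbour nodes (undirected graph)
    labels: dict node -> class index, for the few labelled nodes
    Returns a dict node -> predicted class index.
    """
    nodes = list(adj)

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(num_classes)]

    # Labelled nodes start one-hot; unlabelled nodes start uniform.
    dist = {n: one_hot(labels[n]) if n in labels else [1.0 / num_classes] * num_classes
            for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            nbrs = adj[n]
            if n in labels or not nbrs:
                new[n] = dist[n]  # clamp known labels; isolated nodes keep their prior
            else:
                new[n] = [sum(dist[m][c] for m in nbrs) / len(nbrs)
                          for c in range(num_classes)]
        dist = new
    return {n: max(range(num_classes), key=lambda c: dist[n][c]) for n in nodes}
```

On two triangles joined by a single edge, with one labelled node in each triangle, every other node inherits the label of its own triangle, with no gradient-based training at all.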

Combining Conformal Prediction with ML. Conformal Prediction is becoming quite popular among ML researchers interested in uncertainty estimation. Most ML models output predictions without a reliable measure of uncertainty, so bad predictions with high uncertainty cannot be distinguished from good predictions with high confidence. Enter Conformal Prediction (CP): a model-agnostic, distribution-free uncertainty estimation framework with finite-sample guarantees. CP can be applied to all types of models, even pre-trained ones, requiring only that the data be exchangeable. Here is a free book: An Intro to Conformal Prediction for Machine Learners.
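
A minimal sketch of split conformal prediction for regression, one common CP variant: score a held-out calibration set with absolute residuals, take a finite-sample-corrected quantile, and widen any new point prediction by that amount. It wraps any already-trained model, since it only needs the model’s predictions; the function name and signature are illustrative.

```python
import math


def split_conformal_interval(cal_preds, cal_targets, new_pred, alpha=0.1):
    """Split conformal prediction interval for regression (absolute-residual scores).

    cal_preds / cal_targets come from a calibration set the model was NOT trained on.
    If calibration and test points are exchangeable, the interval contains the true
    value with probability >= 1 - alpha (marginally, over test points).
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)
```

For example, with nine calibration residuals 1, 2, …, 9 and `alpha=0.1`, the corrected rank is ⌈10 × 0.9⌉ = 9, so a point prediction of 100 becomes the interval [91, 109]. The guarantee is on average over test points, not for each individual prediction.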


More efficient DL/ML optimisation methods. The “scale is all you need” trend, and the exponential growth in the size of large models to tens of billions of parameters, have triggered a revival of new DL optimisation methods. The optimisation method and optimiser used to fit a model are absolutely crucial in DL. Understanding how optimisation works and applying it effectively are key aspects of modern DL.

A Guide to the Math for Deep Learning Optimisation. This is a beautifully written blogpost, in which Tidavar takes a deep dive into the mathematics of optimisers and how they handle the vast complexity within a huge search grid for optimising a deep neural net. 

The Math behind Adam Optimiser. Adam is one of the most popular optimisers, but few people truly understand its underlying mechanisms and how best to use it for different ML tasks and scenarios. AFAIK, this blogpost is one of the best, most concise, and clearest overviews of the Adam optimiser.
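
To see the mechanics in a few lines, here’s a minimal scalar version of the Adam update: moving averages of the gradient and of the squared gradient, bias correction, and a step scaled per parameter. The β₁, β₂, and ε defaults follow the original Adam paper; `lr=0.1` is an assumption chosen for this toy problem (the usual default is 0.001), and `adam_minimise` is an illustrative name, not any library’s API.

```python
import math


def adam_minimise(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a scalar function of one variable with the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # 1st moment: moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g   # 2nd moment: moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction (m and v start at zero)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

For instance, `adam_minimise(lambda x: 2 * (x - 3), x0=0.0)` minimises (x − 3)² and ends up very close to 3. On the first step `m_hat / sqrt(v_hat)` equals the sign of the gradient, so early steps move roughly `lr` regardless of gradient scale, which is one reason Adam is less sensitive to gradient magnitude than plain SGD.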


CoRe: A new, all-in-one ML optimiser that beats SOTA optimisation algos (Feb 2024). Researchers at ETH Zurich just introduced the Continual Resilient (CoRe) optimiser. CoRe shows superior performance compared to other state-of-the-art first-order gradient-based optimisers, including Adam.

Have a nice week.

10 Link-o-Troned

  1. Don’t Mock ML Models in Unit Tests
  2. The Paradox of Distillation in Diffusion Models
  3. “Deep Learning is Rubbish” – A Debate: LeCun vs. Friston
  4. Are Video GenAI Models World Simulators?
  5. An Overview of the New StarCoder2 Coding Models
  6. Generative AI Design Patterns: A Comprehensive Guide
  7. An In-Depth Review of OpenAI Sora Text-2-Video Model 
  8. Boost Search with Top Class mixedbread Open Rerank Models
  9. [brilliant] Jailbreaking GPT-4 & Others with ASCII Art Attacks
  10. Opensourcing Charlie Mnemonic: 1st AI Assistant with Long-Term Memory



the ML Pythonista

  1. Taipy – Turn Data + AI Algos into Prod-ready Web Apps Fast
  2. [new] INRIA MLXP – A Framework for ML Experiments
  3. SegGPT: Segmenting Everything in Context in PyTorch

Deep & Other Learning Bits

  1. Building A Deep Learning Rig (Part 1 & Part 2)
  2. MIT CSAIL Explainers: Geometric Data Processing in ML
  3. Predictive Human Preference: From Model Ranking to Model Routing

AI/ DL ResearchDocs

  1. DeepMind Griffin & Hawk Models: Combining the Best of RNNs & Transformers
  2. MS Research – The Era of 1-bit LLMs: All LLMs are in 1.58 Bits
  3. UC Berkeley – Approaching Human-Level Forecasting with Language Models

MLOps Untangled

  1. [new] MLTRAQ – A Lib for Designing & Tracking AI Experiments
  2. Integrating Vector DBs with LLMs: A Hands-On Guide 
  3. An Open Source FinOps and MLOps Platform

data v-i-s-i-o-n-s

  1. How to Visualise The Olympics
  2. What Would Happen if We Didn’t Have Leap Years?
  3. [free course] Creating HQ Charts Using Google Earth Engine

AI startups -> radar

  1. Exodigo – AI-Powered Mapping & Analysis of the Underground
  2. Glean – An AI Assistant for All Your Enterprise Data
  3. Solvo – AI-based Price Optimisation for Freight Forwarding

ML Datasets & Stuff

  1. Awesome-LLMs-Datasets
  2. CodeFeedback-Filtered-Instruction – 150K Code Instructions
  3. OpenHermesPreferences – 1 Million AI Preferences Dataset

Postscript, etc 


Published March 5, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter

Data Machina #242

AI and Causality. Causal Parrots. Causal Deep Learning. Causal Agents. GenAI Guidebook. Open-source AI Cookbook. Compound AI Systems. Let’s Build the GPT Tokeniser. Gemma SOTA in PyTorch.

AI and Causality. The introduction of OpenAI Sora (simulating real worlds from video understanding) has sparked a debate among some prominent AI researchers. First: what do AI researchers mean by “causal”?


Secondly: Do LLMs have causal reasoning capabilities? Can LLMs learn causality from just real world training data? Can LLMs learn, represent, and understand world models and physics? 

Judea Pearl, one of the world’s top researchers in Probabilistic AI, Bayesian Networks, and Causal Inference, once famously said in an interview:

Deep Learning, albeit complex and non-trivial, is a curve-fitting exercise. To build truly Intelligent Machines, teach them cause and effect.

That was back in 2018, when Judea published The Book of Why – The New Science of Cause and Effect (here’s a summary). Since then, and since the Attention Is All You Need paper, most AI research has been focused on scaling Transformers and LLMs.

So: Do LLMs really have causal reasoning capabilities today? How limited or advanced are they in terms of causal reasoning? Let’s see…

Yes. GPT-4 SOTA on causal tasks. Recently, the AI intelligentsia has been pushing the Transformers and LLM agenda on causality. A paper published by MS Research, claiming that GPT-4 achieved new SOTA on a wide range of causal tasks, sparked the debate. Paper: Causal Reasoning and LLMs: Opening a New Frontier for Causality.


No. LLMs are not causal. In parallel, some prominent AI causality researchers make it clear that what may look like causal reasoning may simply be retrieval of causal facts memorised from the training data. In a recent paper, researchers argue that LLMs cannot be causal. Paper: Causal Parrots: Large Language Models May Talk Causality But Are Not Causal.


That group of researchers just launched a new conference, Are LLMs Simply Causal Parrots? (check out the papers), in which different points of view will be discussed.

No. GPT-4 and LLMs struggle with causal & counterfactual reasoning. In this new paper, a group of researchers, using Judea Pearl’s Ladder of Causation as a benchmark, developed a Causal Chain-of-Thought model and show that LLMs may still be far from reliable causal reasoning. Check out the paper, code, and dataset: CLadder: Assessing Causal Reasoning in Language Models.


Hold on, DeepMind researchers say: Agents can learn causal world models. A team of DeepMind researchers just published a new paper explaining that any agent capable of satisfying a regret bound under a large set of distributional shifts must have learned an approximate causal model of the data. Paper: Robust Agents Learn Causal World Models


Causal Reinforcement Learning: The Future of AI? Until recently, research on Reinforcement Learning and Causality evolved independently, with no connection between the two. Counterfactual relations are what can unite them. A group of researchers at Columbia University’s Causal AI Lab, led by Elias, say that Causal RL = Causal Inference + Reinforcement Learning is the way forward. Check out this ICML tutorial: Causal Reinforcement Learning (slides and videos)


If you’re interested in Causal AI, here are a few interesting new resources:

Free video-lectures on Causality for AI & ML. A great collection of 14 video recordings on Causality and AI at TU Darmstadt. (Click on Matej’s name in the video below to get all the video lectures.)

A new free book on Causal ML. This free online book introduces a modern approach to Causal AI, directed acyclic graphs (DAGs) and structural causal models (SCMs), and presents Double/Debiased ML. It comes packed with lab exercises, notebooks and libraries. Get it here: Causal ML Book – Applied Causal Inference Powered by ML and AI.
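For readers new to Double/Debiased ML, here is a minimal sketch of the cross-fitted "partialling-out" estimator for a partially linear model y = θ·d + g(X) + e. It is an illustration under stated assumptions, not the book's code: the nuisance models are plain least squares (the book uses flexible ML learners such as lasso or random forests), the data are simulated with a known θ = 0.5, and all function names are hypothetical.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Least-squares nuisance model with an intercept (stand-in for any ML learner)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef

def dml_partialling_out(y, d, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds      # random, balanced fold labels
    y_res = np.empty(len(y))
    d_res = np.empty(len(d))
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Fit nuisance models on the other folds, residualise on the held-out fold.
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_fit_predict(X[train], d[train], X[test])
    # Final stage: regress outcome residuals on treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))

# Simulated data with a known effect theta = 0.5 (hypothetical).
rng = np.random.default_rng(42)
n, p, theta = 10_000, 5, 0.5
X = rng.normal(size=(n, p))
d = X @ rng.normal(size=p) + rng.normal(size=n)          # treatment depends on X
y = theta * d + X @ rng.normal(size=p) + rng.normal(size=n)

theta_hat = dml_partialling_out(y, d, X)
print(theta_hat)   # close to 0.5
```

Cross-fitting (fitting nuisances on one fold and residualising on another) is what keeps overfitting in flexible learners from biasing θ; with OLS it matters little, but the structure is the same.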

Causal AI and Causal Inference with Python. A fresh new series of interviews and podcasts by Alex on causality, causal AI, machine learning, optimisation, decision-making and Python.

New paper on Causal Deep Learning, Feb 2024. Researchers at Cambridge Uni just revised their proposal for a causal deep learning framework that spans three dimensions: (1) a structural dimension, which incorporates partial yet testable causal knowledge; (2) a parametric dimension, which captures the type of relationships among the variables of interest; and (3) a temporal dimension, which captures exposure times or how the variables of interest interact (possibly causally) over time. Paper: Causal Deep Learning.


Have a nice week.

10 Link-o-Troned

  1. The GenAI Guidebook
  2. Open-Source AI Cookbook
  3. Things I Don’t Know About AI
  4. Mamba SSM Model: The Easy Way
  5. Introduction to Matryoshka Embedding Models
  6. a16z VC: AI and The Abundance Agenda
  7. Analysing & Visualising the arXiv ML Papers Landscape
  8. Berkeley AI – The Shift from Models to Compound AI Systems
  9. [new] OpenCodeInterpreter: Code Generation on Par with GPT-4
  10. [new] LORALand – 25 Fine-tuned Mistral-7b Models that Beat GPT-4

Share Data Machina with your friends


the ML Pythonista

  1. The New Google Gemma SOTA Open Models Implemented in PyTorch
  2. Karpathy: Let’s Build the GPT Tokeniser – Tokenisation 🙁 the Root of All LLM Evils
  3. Google Magika – Open, Fast DL Model for 99% Accurate File Type Detection

Deep & Other Learning Bits

  1. [free] Diffusion Models. UC Berkeley Lectures, Spring 2024 
  2. Xavier Bresson’s Graph ML Course, 2023 (slides, notebooks)
  3. Deep Thinking Hour – Talks from DL Experts in Industry & Academia

AI/ DL ResearchDocs

  1. The Matrix: A Bayesian Learning Model for LLM
  2. PALO: Large Multimodal Model for Visual Reasoning in 10 Languages
  3. Time Series Forecasting with LLMs: Advantages and Limitations

MLOps Untangled

  1. Kubeflow 101: Where ML Meets Kubernetes
  2. The RAG Dev Journey: From Notebook to Microservices
  3. Domain Shift and Concept Drift: Recent ML Advances on Data Change

data v-i-s-i-o-n-s

  1. Copernicus Interactive Climate Atlas
  2. Climate Viz: Antarctic Sea-ice Thickness 1972-2024
  3. Spectral Bands Visualisation: Satellite Images of Volcanos and Wildfires 

AI startups -> radar

  1. Qloo – Cultural AI for Predicting Consumer Taste
  2. Recogni – Next Generation Inference for AI
  3. Bria – Responsible Visual GenAI Platform for Commercial Use

ML Datasets & Stuff

  1. DataDreamer: Generate & Prompt Synthetic Data
  2. Meta AI – Aria Everyday Activities Dataset
  3. cosmopedia – A Synthetic Dataset, 30M samples Generated by Mixtral-8x7B

Postscript, etc

Keep up with the very latest in AI / Machine Learning research, projects & repos. A weekly digest packed with AI / ML insights & updates that you won’t find elsewhere

Submit your suggestions, feedback, posts and links

Published February 25, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter

Data Machina #241

AI World Models and Video. OpenAI Sora. UC Berkeley Large World Model. MetaAI V-JEPA SSL Model. The AI Operating System. Mamba Model in Depth. Phidata AI Assistants. BCG AgentKit.

AI World Models and Video. My world vision model for waking up at 4am to travel overseas is frankly a bit fuzzy and unreliable. But what’s an AI world model? My two-cents explainer: it’s a model that builds an internal representation of a real-world, physical, [human] environment, and uses that to predict or simulate future events within that environment.

Until recently, research in AI world models has been very much focused on video games and Reinforcement Learning. But now, the boom of GenAI and large multimodal models has triggered a new trend in AI world models based on large-scale video understanding. But how good are these GenAI models at representing, understanding and predicting the world? Is GenAI the “right” approach for building AI world models? Let’s see…

OpenAI announced Sora, a new GenAI video model for world simulation. For starters, the Sora demos look really impressive. Sora is a closed, generative AI, transformer-based text-to-video model that, according to OpenAI researchers, was developed to “understand and simulate the physical world in motion.” Obviously, OpenAI is a closed-doors, commercial AI operation disguised as “AI for improving & saving humanity.” You can read the Sora blogpost and the “technical report” here: Video Generation models as World Simulators. But as usual, you won’t get deep technical details in it, as OpenAI is not much about fully opening and sharing their research.


Generative AI models don’t understand the physical world. Generative AI models can produce impressively realistic images and videos, like Sora’s. But this paper demonstrates that GenAI images/videos have geometric features different from those of real images/videos. The researchers conclude that current GenAI models cannot reliably reproduce the geometric properties of the real world. Paper: Shadows Don’t Lie and Lines Can’t Bend! Generative Models don’t know Projective Geometry…for now
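One basic piece of projective geometry behind "lines can't bend": images of parallel 3D edges must all meet at a single vanishing point. The sketch below shows the underlying computation with homogeneous coordinates (a line through two points is their cross product, two lines meet at their cross product, and three lines are concurrent when their stacked coefficients have zero determinant). It is not the paper's detection method, and the point coordinates are made up for illustration.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Intersection of two homogeneous lines, returned as an (x, y) image point."""
    v = np.cross(l1, l2)
    return v[:2] / v[2]   # assumes the two image lines are not parallel (v[2] != 0)

def concurrent(lines, tol=1e-9):
    """Three lines meet in one point iff the determinant of their (normalised) coefficients is ~0."""
    L = np.array([l / np.linalg.norm(l) for l in lines])
    return abs(np.linalg.det(L)) < tol

# Two image segments that should be images of parallel 3D edges (hypothetical coordinates).
l1 = line_through((0, 0), (1, 1))
l2 = line_through((0, 2), (1, 2.5))
print(intersection(l1, l2))   # vanishing point at (4, 4)
```

A third edge that is parallel in 3D must pass through (4, 4) in the image; if it does not, `concurrent` returns `False`, which is the kind of inconsistency a real photo cannot exhibit.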


UC Berkeley open sources new Large World Model (LWM). Learning from millions of tokens of video and language sequences poses huge challenges due to memory constraints, computational complexity, and limited datasets. To address these challenges, a group of top researchers at UC Berkeley just open sourced LWM, a video model trained on million-length multimodal sequences. By open sourcing the model, unlike OpenAI, this group of researchers just opened the door to developing more powerful video models that can better understand human knowledge and the multimodal real world. Check out the paper, model code, and demos: World Model on Million-Length Video and Language with RingAttention.
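The key idea behind RingAttention is that exact attention can be computed block by block: each device keeps its own query block while key/value blocks are passed around a ring, and partial results are merged with a running (online) softmax. Below is a minimal single-machine sketch that simulates the ring with a loop; it assumes non-causal attention and a single head for brevity, and omits the overlap of communication with compute that makes the real system fast.

```python
import numpy as np

def full_attention(q, k, v):
    """Reference: standard softmax attention over the whole sequence."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    s = s - s.max(axis=-1, keepdims=True)
    p = np.exp(s)
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def ring_attention(q, k, v, n_devices):
    """Blockwise attention with an online softmax, simulating a ring of devices."""
    d = q.shape[-1]
    q_blocks = np.array_split(q, n_devices)
    kv_blocks = list(zip(np.array_split(k, n_devices), np.array_split(v, n_devices)))
    outputs = []
    for i, qb in enumerate(q_blocks):                 # device i owns query block i
        m = np.full(len(qb), -np.inf)                 # running row-wise max of scores
        l = np.zeros(len(qb))                         # running softmax denominator
        acc = np.zeros((len(qb), v.shape[-1]))        # running softmax numerator
        for step in range(n_devices):
            kb, vb = kv_blocks[(i + step) % n_devices]  # K/V block arriving from the ring
            s = qb @ kb.T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=-1))
            scale = np.exp(m - m_new)                 # rescale earlier partial sums
            p = np.exp(s - m_new[:, None])
            l = l * scale + p.sum(axis=-1)
            acc = acc * scale[:, None] + p @ vb
            m = m_new
        outputs.append(acc / l[:, None])
    return np.vstack(outputs)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
print(np.allclose(ring_attention(q, k, v, n_devices=4), full_attention(q, k, v)))  # True
```

Because no device ever materialises the full attention matrix, memory per device grows with the block size rather than the full sequence length, which is what makes million-token contexts feasible.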


Meta AI V-JEPA: A new non-GenAI model for world understanding. Yann LeCun has been a very vocal advocate of exposing the limitations of Generative AI, Transformers, and VAE-based models. In Yann’s view, those models are limited in terms of self-learning, self-adaptation, self-planning and, in general, developing world models the way humans do. Released under a Creative Commons research license, V-JEPA is a new kind of non-generative, self-supervised video model that learns by predicting missing or masked spatio-temporal regions of a video in a learned representation space. V-JEPA is clearly a fresh step towards a more grounded model understanding of the world. Blogpost, paper and code: The next step toward Yann LeCun’s vision of advanced machine intelligence (AMI).
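What "predicting masked regions in a learned space" means can be sketched as a loss function. In this toy version (assumptions throughout: one-layer tanh encoders instead of ViTs, a mean-pooled context plus position embeddings as the predictor, random data), the target is the target encoder's embedding of the masked patches, the loss is an L1 regression in latent space rather than a pixel reconstruction, and the target encoder tracks the context encoder via an exponential moving average. It illustrates the JEPA-style objective, not Meta's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_in, d_lat = 16, 12, 8   # hypothetical sizes: 16 spatio-temporal patches

def encode(tokens, W):
    """Toy encoder: one linear layer + tanh (stand-in for a ViT)."""
    return np.tanh(tokens @ W)

def jepa_loss(tokens, mask, W_ctx, W_tgt, W_pred, pos):
    # Targets: target-encoder embeddings of the masked patches (no gradient flows here in practice).
    targets = encode(tokens, W_tgt)[mask]
    # Context: the context encoder only sees the visible patches.
    context = encode(tokens[~mask], W_ctx).mean(axis=0)
    # Predictor: combine pooled context with each masked patch's position embedding.
    preds = (context + pos[mask]) @ W_pred
    # Regression loss in latent space (L1), not a pixel reconstruction loss.
    return float(np.abs(preds - targets).mean())

def ema_update(W_tgt, W_ctx, momentum=0.99):
    """Target encoder tracks the context encoder as an exponential moving average."""
    return momentum * W_tgt + (1 - momentum) * W_ctx

tokens = rng.normal(size=(n_tokens, d_in))
mask = np.zeros(n_tokens, dtype=bool)
mask[4:10] = True                    # a contiguous masked block of patches
W_ctx = rng.normal(size=(d_in, d_lat)) * 0.1
W_tgt = W_ctx.copy()
W_pred = rng.normal(size=(d_lat, d_lat)) * 0.1
pos = rng.normal(size=(n_tokens, d_lat)) * 0.1

loss = jepa_loss(tokens, mask, W_ctx, W_tgt, W_pred, pos)
print(loss)
W_tgt = ema_update(W_tgt, W_ctx)     # after an optimiser step on W_ctx and W_pred
```

Predicting embeddings instead of pixels lets the model ignore unpredictable low-level detail, which is the core argument for the non-generative approach.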


Have a nice week.

10 Link-o-Troned

  1. Agents and The AI Operating System
  2. Winning in AI Means Mastering the New Stack
  3. Generative AI is Like Bottled Water – What’s Left?
  4. Fuck You, Show Me The Prompt
  5. Thoughts on the 2024 AI Job Market
  6. Mamba Model Deep Walk-through
  7. Neural Network Training Makes Beautiful Fractals
  8. The New Google Gemini Pro 1.5 MoE Model, 1M Tokens
  9. [in praise of] Prolog for Data Science
  10. [free course] Cloud ML Engineering & MLOps

Share Data Machina with your friends


the ML Pythonista

  1. Phidata Toolkit – Build AI Assistants Using Function Calling
  2. An Open Foundation Model for Human-like, Expressive Text2Speech
  3. BCG AgentKit: Rapidly Build High-quality Agent Apps, MIT license

Deep & Other Learning Bits

  1. On FlashAttention and Sparsity, Quantisation, & Efficient Inference
  2. [free e-book] Stanford Speech & Language Processing (Feb 2024)
  3. [free e-book] The Mathematical Engineering of Deep Learning (Feb 2024)

AI/ DL ResearchDocs

  1. DeepMind: Chain-of-Thought Reasoning without Prompting
  2. LiRank: Industrial Large Scale Ranking Models at LinkedIn
  3. OS-Copilot: Generalist Computer Agents with Self-Improvement

MLOps Untangled

  1. Flyte – An OSS, K8s-based Orchestrator that Unifies ML, Data & Analytics
  2. Automated Unit Tests with LLMs at Meta
  3. How Cloudflare Monitors ML Models for Bot Detection

data v-i-s-i-o-n-s

  1. A New Tool for Analysing & Visualising Latent Spaces
  2. Visualising Liveable Urban Networks w/ OSMnx & ChatGPT
  3. Navigating Bias & Accuracy in Data Storytelling: An Example

AI startups -> radar

  1. Quilter – AI Agent for Circuit Board Design
  2. Scribe – AI that Documents Your Processes for You
  3. Clarity – An AI Platform for Real-time Detection of Deepfakes

ML Datasets & Stuff

  1. AutoMathText – 200 GB of Curated Mathematical Texts
  2. NVIDIA OpenMathInstruct-1: 1.8M Math Problem-Solution Pairs
  3. Cohere Aya: 204k Human-annotated Multilingual Prompt-completion Pairs

Postscript, etc


Keep up with the very latest in AI / Machine Learning research, projects & repos. A weekly digest packed with AI / ML insights & updates that you won’t find elsewhere

Submit your suggestions, feedback, posts and links to:

Published February 22, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter

Data Machina #240

Foundation Models, Transformers and Time-Series. TimesFM. Lag-Llama for TSF. MOMENT. Mixture of Linear Experts. BUD-E Open Voice Assistants. MoneyPrinter GenAI Videos. Multimodal llama Cookbook. 

Foundation Models, Transformers and Time-Series. Statisticians and econometricians have been searching for the Holy Grail of time-series forecasting (TSF) for more than 40 years. “Classical” models like ARIMA still work remarkably well in some TSF scenarios. But today, the cool stuff is all about transformer-based, DL & foundation models for TSF. How “good” are these new DL models for TSF? How do we evaluate these models? Do these new models really achieve SOTA performance as some papers claim? Are DL researchers cherry-picking time-series datasets to easily fit a SOTA TSF DL model?…

Real-world time-series data is complex and messy, and it has a lot of subtleties, such as randomness of errors, trend, seasonality, linearity, and stability… Compiling a solid, large time-series dataset that suits the pre-training of a large DL model is hard and expensive. Most time-series data come with huge noise and poor context. Since transfer learning is one of the pillars of foundation models: how do we figure out what “knowledge” can actually be transferred across different time series so that the foundation model learns?

But first, let’s see what’s the very latest in transformer-based, DL and foundation models for TSF.

A new Foundation Model for TSF v.3. A group of Google researchers recently introduced their latest time-series foundation model for forecasting. The researchers claim that the model has out-of-the-box zero-shot capability and near SOTA performance on a variety of public datasets. Paper: A Decoder-only Foundation Model for Time-series Forecasting


A new open source Foundation Model for TSF. A few days ago, a group of researchers working at ServiceNow open sourced Lag-Llama: the first open-source foundation model for time series forecasting. Check out the repo, paper and notebooks here: Lag-Llama: Foundation Models for Probabilistic Time Series Forecasting. 
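As its name suggests, Lag-Llama builds its inputs from lagged values of the series. A minimal sketch of lag-feature construction follows; the lag set `[1, 2, 7]` is a hypothetical choice (e.g. yesterday, two days ago, one week ago for daily data), not the lag set the model actually uses.

```python
import numpy as np

def lag_features(series, lags):
    """Row j holds series[t - lag] for every lag, where t = max(lags) + j."""
    series = np.asarray(series, dtype=float)
    start = max(lags)
    return np.stack([series[start - lag: len(series) - lag] for lag in lags], axis=1)

series = np.arange(10, dtype=float)         # 0, 1, ..., 9
X = lag_features(series, lags=[1, 2, 7])    # hypothetical lag set
targets = series[7:]                        # the value each row should help predict
print(X[0], targets[0])                     # [6. 5. 0.] 7.0
```

Each row pairs a target value with the values observed 1, 2 and 7 steps earlier, so a model can pick up short-range and seasonal structure without a fixed input window.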


A new family of open Foundation Models for TSF. This week, researchers at CMU introduced MOMENT. The paper describes how the model addresses the challenges of pre-training large models on time series. Check out the paper and official Python implementation: MOMENT: A Family of Open Time-series Foundation Models.

Mixture-of-Experts for Long-term TSF. Linear-centric models for TSF, including near-SOTA and SOTA ones, are not able to adapt their prediction rules to periodic changes in time-series patterns. To address this challenge, a group of Microsoft researchers proposed Mixture-of-Linear-Experts (MoLE): a Mixture-of-Experts-style augmentation for linear-centric models. The researchers claim that MoLE achieves SOTA in almost 70% of evaluations and datasets. Paper: Mixture-of-Linear-Experts for Long-term Time Series Forecasting
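The MoLE idea can be sketched as a forward pass: several linear forecasters map the same lookback window to the horizon, and a router that looks at the timestamp mixes their outputs, so different experts can specialise in different periods. This is a toy sketch under assumptions (random weights, a simple softmax router over a hypothetical sin/cos timestamp encoding); the paper's router and training details may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mole_forecast(window, time_feat, experts, router_W):
    """Mixture-of-linear-experts forward pass (toy sketch).

    window:    (lookback,) recent history
    time_feat: (f,) features of the timestamp, e.g. hour-of-day / day-of-week encodings
    experts:   list of (W, b) with W: (horizon, lookback), b: (horizon,)
    router_W:  (n_experts, f) linear router over the timestamp features
    """
    gates = softmax(router_W @ time_feat)                      # (n_experts,), sums to 1
    outputs = np.stack([W @ window + b for W, b in experts])   # (n_experts, horizon)
    return gates @ outputs, gates

rng = np.random.default_rng(0)
lookback, horizon, n_experts, f = 24, 6, 3, 4
experts = [(rng.normal(size=(horizon, lookback)) * 0.1, np.zeros(horizon)) for _ in range(n_experts)]
router_W = rng.normal(size=(n_experts, f))
window = rng.normal(size=lookback)
time_feat = np.array([np.sin(0.5), np.cos(0.5), 1.0, 0.0])   # hypothetical timestamp encoding
forecast, gates = mole_forecast(window, time_feat, experts, router_W)
print(forecast.shape, gates.sum())
```

Each expert stays a cheap linear model; only the mixing weights change with time, which is how the ensemble adapts its prediction rule to periodic regime changes.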


And now, a few food-for-thought snippets: 

  • Transformers are not what you need for TSF. Curated by @valeman, a very vocal, opinionated researcher specialising in conformal prediction (check out a gentle intro to CP) and probabilistic TSF. This is quite an interesting repository collecting papers on why transformers may not work in time series forecasting. Repo: Transformers_Are_What_You_Dont_Need
  • Deep Learning vs Statistics  for TSF. This is a great blogpost describing the pros and cons of statistical vs. DL methods for TSF, and tips on when to use DL or stats methods. Although not updated with the very latest in transformer-based, deep learning methods for TSF, the blogpost concludes that it’s early days in DL for TSF. Blogpost: Time-Series Forecasting: Deep Learning vs Statistics — Who Wins?
  • A tutorial on TSF evaluation for data scientists and ML people. This is a tutorial-like paper on well-proven time-series forecast evaluation techniques that are usually neglected by data scientists and ML researchers. The paper’s goal is to bridge the knowledge gap between traditional forecasting methods and current state-of-the-art DL/ML techniques. Paper: Forecast evaluation for data scientists: common pitfalls and best practices
  • A great free book on forecasting. Considered by many researchers and practitioners “the bible” of forecasting. Written by probably one of the world’s top experts in forecasting. Forecasting: Principles and Practice 3rd ed
  • A mega study on what works in time-series forecasting. Last year, Nixtla conducted a study on TSF with a huge dataset containing 100 billion time-series points. Nixtla is a startup well known for its research towards achieving SOTA in TSF, and the developer behind TimeGPT, a Generative AI model for time series. The study concluded that LightGBM significantly outperformed TimeGPT and all other DL models. What Truly Works in Time Series Forecasting: The Nixtla Mega Study
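A recurring point in the evaluation literature above is to always compare against a simple baseline with a scale-free metric. A minimal sketch of the seasonal naive baseline and the Mean Absolute Scaled Error (MASE), which divides the forecast's MAE by the in-sample MAE of the seasonal naive method (MASE < 1 means the model beats that baseline). The toy series is hypothetical.

```python
import numpy as np

def seasonal_naive(history, horizon, season):
    """Repeat the last observed season forward: a standard TSF baseline."""
    last_season = np.asarray(history[-season:], dtype=float)
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

def mase(actual, forecast, history, season):
    """Forecast MAE divided by the in-sample MAE of the seasonal naive method."""
    history = np.asarray(history, dtype=float)
    scale = np.mean(np.abs(history[season:] - history[:-season]))
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))) / scale)

# Toy series: linear trend plus a period-3 seasonal pattern (hypothetical data).
t = np.arange(15)
series = t + np.tile([0, 5, 0], 5)
history, actual = series[:12], series[12:]
fc = seasonal_naive(history, horizon=3, season=3)
print(fc, mase(actual, fc, history, season=3))   # MASE here is 1.0
```

Reporting a candidate model's MASE alongside the baseline's makes it much harder to claim SOTA on a dataset where a one-line heuristic already does well.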

Have a nice week.

10 Link-o-Troned

  1. Emergent Deception & Emergent Optimisation in AI
  2. Visualising Representations: Deep Learning & Human Beings
  3. Thinking about HQ Human Data in DL Model Training
  4. A Guide to The Landscape of LLM Programming Abstractions
  5. BUD-E: A New OSS Project for Human-Like AI Voice Assistants
  6. Beyond Self-Attention: How a Small Model Predicts the Next Token
  7. Winners of the Vesuvius AI Challenge & The TimeSformer
  8. [brilliant 🙂] The World’s Most Responsible AI Model (blog, demo)
  9. Odyssey 2024 – Emotions Recognition Challenge with SSL Models
  10. [watch] Fully Autonomous Androids Run by a Vision NNet in Real-time

Share Data Machina with your friends


the ML Pythonista

  1. MoneyPrinter – Automated YouTube Video Generation
  2. How to: Efficient Linear Model Merging for LLMs
  3. The Multimodal Ollama-LlaVA Cookbook

Deep & Other Learning Bits

  1. Google Research – Graph Neural Networks in TensorFlow
  2. Physics-Informed NNets: An Application-Centric Guide
  3. Matryoshka Representation Learning (MRL) from the Ground Up

AI/ DL ResearchDocs

  1. Deepmind: Grandmaster-Level Chess Without Search
  2. Why Do Random Forests Work?
  3. Future Directions in Foundations of Graph Machine Learning

MLOps Untangled

  1. [free webinar] AI in Production
  2. Why You Need LLMOps
  3. Automate Insurance Claim Lifecycle with Agents, KBs & Amazon Bedrock

data v-i-s-i-o-n-s

  1. [interactive dataviz] London Traffic: The Worst in the World
  2. [new] The Quad-Tile Chart & Squaremap: Squarify Your Data
  3. Du Bois Visualization Challenge: 2024

AI startups -> radar

  1. Daedalus – AI for Advanced Precision Parts Manufacturing
  2. Unlearn.ai – AI + Digital Twins for Clinical Trials
  3. Beam AI – An AI Assistant for Construction Costs Estimation

ML Datasets & Stuff

  1. InstaGen: Improving Object Detection with AI Generated Synth Dataset
  2. 300+ Open Datasets for Beautiful News
  3. Ego-Exo4D – Large-scale Multi-modal, Multi-view, Video Dataset

Postscript, etc 

Keep up with the very latest in AI / Machine Learning research, projects & repos. A weekly digest packed with AI / ML insights & updates that you won’t find elsewhere

Submit your suggestions, feedback, posts and links to:

datamachina@datamachina.com

Published February 12, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter

Data Machina #239

Power of Truly Open Source AI. OLMo 7B. Nomic Embed. HuggingChat Assistants. Eagle 7B. Objective Driven AI. Markov Chains & LMs. Transformer Circuits. Time-LLM. Exphormer. SymbolicAI. MambaTab.

The Power of Truly Open Source AI. The spin doctors of some big closed-AI companies have been busy inflating the “AGI is here soon, AGI will be an existential risk” bubble. But that bubble is thankfully deflating quickly, and somewhat backfiring. 

In the meantime, the open source AI community has stubbornly embarked on releasing truly open source, efficient, smallish, powerful AI models that match or beat the closed AI models from big companies. 

The reaction from these big closed AI companies: “Oh! Open source AI models are dangerous, we need to regulate open source AI. And btw: we’re dropping the pricing trousers for using our closed models.” A recent report from Stanford HAI totally debunks all the myths about dangerous open source AI, and the exaggerations coming from the closed AI companies.

Truly open source AI research and models are the only way forward to advance AI.

A new, truly open source language model. Two days ago, the Allen Institute for AI (AI2) released OLMo 7B, a truly open source SOTA language model trained with Databricks Mosaic Model Training. OLMo was released under the Apache 2.0 license and comes with:

  • Full training data used, training code, training logs, and training metrics
  • Full model weights and 500+ model checkpoints
  • Fine-tuning code and adapted models

Check out the blogpost, repo & tech report here: How to Get Started with OLMo SOTA truly open source LM.

A new, truly open source text embedding model. Also a few days ago, Nomic AI released Nomic Embed, a truly open source text embedding model that is SOTA on 2 main benchmarks. Nomic Embed has an 8192-token context length and beats OpenAI text-embedding-3-small. The model is released under the Apache 2.0 license and comes with the full training code, training data and model weights. Check out the blogpost, repo and tech report here: Introducing Nomic Embed: A Truly Open Text Embedding Model.

Want to learn more about Nomic Embed? Check out this video from the guys at LangChain: How to build a long context RAG app with OSS components from scratch using Nomic Embed 8k, Mistral-instruct 32k and Ollama. 


And speaking of text embedding models, Salesforce Research just released the SFR-Embedding-Mistral model, now SOTA on the MTEB benchmark. The model was trained on top of 2 open source models: E5-mistral-7b-instruct and Mistral-7B-v0.1.

A new, fully open source SOTA multilingual model based on an RNN. Last week, a team of independent researchers backed by Stability AI and EleutherAI released Eagle 7B. The model beats all 7B open source models in the main multilingual benchmarks, and it’s super compute-efficient. The beauty of this model is that it’s an attention-free, linear transformer built on the RWKV-v5 architecture, which is based on an RNN. Check out the blogpost, repo, and demo here: Eagle 7B: Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages (RWKV-v5).
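Why can an "attention-free, linear transformer" run as an RNN? For attention without a softmax, a decayed sum over past tokens can be computed either in an attention-like parallel form or with a fixed-size recurrent state. The sketch below shows this equivalence for a generic decayed linear attention with a single scalar decay `w` (an assumption for clarity); RWKV-v5 itself adds learned per-channel decays, a separate bonus term for the current token, token shift and multi-head states, so this is the underlying principle, not its exact formula.

```python
import numpy as np

def decayed_linear_attention_parallel(q, k, v, w):
    """o_t = sum_{s<=t} w**(t-s) * (q_t . k_s) * v_s: O(T^2), attention-like form."""
    T = q.shape[0]
    out = np.zeros((T, v.shape[1]))
    for t in range(T):
        for s in range(t + 1):
            out[t] += w ** (t - s) * (q[t] @ k[s]) * v[s]
    return out

def decayed_linear_attention_recurrent(q, k, v, w):
    """Same outputs with a fixed-size state S (d_k x d_v): O(T) time, constant memory in T."""
    S = np.zeros((k.shape[1], v.shape[1]))
    out = []
    for t in range(q.shape[0]):
        S = w * S + np.outer(k[t], v[t])   # decay old memory, add the new key-value pair
        out.append(q[t] @ S)
    return np.array(out)

rng = np.random.default_rng(0)
T, d_k, d_v = 12, 4, 3
q, k, v = rng.normal(size=(T, d_k)), rng.normal(size=(T, d_k)), rng.normal(size=(T, d_v))
print(np.allclose(decayed_linear_attention_parallel(q, k, v, 0.9),
                  decayed_linear_attention_recurrent(q, k, v, 0.9)))   # True
```

The parallel form is convenient for training on whole sequences; the recurrent form gives constant memory per generated token at inference, which is where the compute efficiency comes from.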

Yesterday, Hugging Face released HuggingChat Assistants (blogpost, demo), a nice alternative to closed-model chat assistants that uses 6 top open source models. Although still rather basic, the idea is for the open source community to develop several powerful features that are already planned. 

This is such a cool open source AI project! ADeus: An Open-Source AI Wearable Device for less than $100 (repo, sw/hw list). It uses Ollama, Supabase, and a Coral AI microcontroller (soon to be replaced by a Raspberry Pi Zero).

Have a nice week.

10 Link-o-Troned

  1. Yann LeCun – Objective Driven AI: The Future of AI (video & slides)
  2. Markov Chains are the Original Language Models
  3. From Naive RAG to Advanced Agents
  4. The Ever-Growing Power of Small Models
  5. Four Approaches to ML Model Fitting: Gradient Flow
  6. [now open] AI Grant Batch 3 – Up to $2.5M
  7. [free e-book] ML for High-Risk Apps (469 pages)
  8. Hallucinating Law: Disturbing LLM Errors in Legal Tasks
  9. The Best Solution Write-ups from Kaggle 2023 Winners
  10. Anthropic – Ideas on Transformer Circuits & ML Interpretability

Share Data Machina with your friends


the ML Pythonista

  1. Programming Foundation Models with DSPy Explained
  2. A Simple Imp of Mamba Selective State Spaces in PyTorch
  3. Phinetuning 2.0: How to Fine-tune Phi-2 with Synth Data & QLoRA

Deep & Other Learning Bits

  1. DeepMind – Transfer Learning for Text Diffusion Models
  2. Google Exphormer: Scaling Transformers for Graph-Structured Data
  3. Google – A Decoder-only Foundation Model for Time-series Forecasting

AI/ DL ResearchDocs

  1. Time-LLM: SOTA Time Series Forecasting by Reprogramming LLMs
  2. SymbolicAI: Combining Probabilistic Programming and GenAI
  3. MambaTab: SOTA Model for Tabular Tasks with S-SSM (No Transformers)

MLOps Untangled

  1. MLOps: From Jupyter to Prod. (blog, vid, repo)
  2. MLOps at The Crossroads and New Tools
  3. Auto Signature Recognition MLOps Pipeline on AWS at CapGemini

data v-i-s-i-o-n-s

  1. Friends Don’t Let Friends Make Bad Graphs
  2. Dep Tree – Visualise the Entropy of Your Code Base in 3D
  3. [free e-book] Handbook of Graphs and Networks in People Analytics

AI startups -> radar

  1. Nabla – An AI-Copilot for Doctors
  2. Kode – A No-code Platform for AI Enterprise Apps
  3. Cohere Health – AI for Automating Health Plan Authorisations

ML Datasets & Stuff

  1. AutoMathText Dataset – 200 GB of Mathematical Texts
  2. OpenHermes-2.5 – 1 Million Chat Conversations
  3. Dolma Dataset – 3 trillion tokens from web, academic pubs, code, books

Postscript, etc

Keep up with the very latest in AI / Machine Learning research, projects & repos. A weekly digest packed with AI / ML insights & updates that you won’t find elsewhere

Submit your suggestions, feedback, posts and links to:

datamachina@datamachina.com

Published February 4, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter

Data Machina #238

Non-stop AI Innovation. text-embedding-3. Voyage-code-2. Yi-VL Vision Language. RAG vs Fine-tuning. Google Lumiere. DuckDB-NSQL-7B. InstantID. Dense X Retrieval. DeepMind GATS. Games with AI Agents.

Non-stop AI Innovation Every Single Week. Well yeah, that’s right: there isn’t a single week without something new, exciting, or amazing happening in AI. This is a selection of interesting, cool stuff that happened in the last 7 days or so:

OpenAI introduced new, faster, and more efficient embedding models. Buried in the blog announcement, it says: “the new embedding models were trained with a technique that allows developers to shorten embeddings without the embedding losing its concept-representing properties.” Well, for some reason, the blog seems to fail to mention that the technique is called Matryoshka Representation Learning (paper, repo), an embedding method proposed by Google Research in 2022.
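With a Matryoshka-trained model, "shortening" an embedding is just keeping its first coordinates and re-normalising, since training packs the most important information into the leading dimensions. A minimal sketch of the mechanics: the random 1536-dimensional vectors are stand-ins (1536 is text-embedding-3-small's default size), so they only demonstrate the operation, not preserved meaning. Truncation only preserves similarity structure for models trained this way; OpenAI's API also exposes it server-side via a `dimensions` parameter.

```python
import numpy as np

def truncate_embedding(emb, dim):
    """Keep the first `dim` coordinates and re-normalise to unit length."""
    head = np.asarray(emb, dtype=float)[..., :dim]
    return head / np.linalg.norm(head, axis=-1, keepdims=True)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
full_a, full_b = rng.normal(size=1536), rng.normal(size=1536)   # stand-ins for real embeddings
a256, b256 = truncate_embedding(full_a, 256), truncate_embedding(full_b, 256)
print(cosine(a256, b256))
```

Shorter vectors cut vector-DB storage and similarity-search cost roughly in proportion to the dimension, at a small accuracy cost that should be measured on your own retrieval task.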


Voyage AI announced voyage-code-2, a new embedding model specifically optimised for code-related applications, including semantic code search/retrieval, code completion, and various functions of general code assistants. The model was evaluated on 11 code retrieval tasks, and performs well against OpenAI’s and Cohere’s models.


The team at 01.ai open-sourced Yi Vision Language (Yi-VL), a new, top-performing multimodal model for content comprehension, recognition, and multi-round conversations about images. Yi-VL is based on the LLaVA visual instruction-tuning architecture, and as of a few days ago, it was ranking first in all the benchmarks for that kind of open source model. 


MS Research published an interesting new paper: RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study. This is a must-read for those learning about or building LLM apps that involve RAG or fine-tuning. Which method is better? In which cases? In the paper, the researchers propose a pipeline for fine-tuning and RAG tailored to a specific industry domain, and then present the pros & cons and tradeoffs of both methods for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. Lots of insights, a great read!


The search for new alternatives to the Transformer & Attention Mechanism, together with the Mamba paper published in December, has literally triggered a storm of Mamba-based derivative models, papers, and memes too. The new idea is to integrate structured state space models (SSMs) into a simplified neural net without attention or even MLP blocks. Interested in SSMs? Check out this free, interactive book: State Space Models: A Modern Approach, with accompanying Python code. And here are the latest papers on Mamba-based models published in January:

  • MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  • Vision Mamba: Efficient Visual Representation Learning with State Space Model
  • VMamba: Visual State Space Model
  • MambaByte: Token-free Selective State Space Model
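The core building block behind these models is a discretised linear state space model. A minimal sketch (assumptions: a small diagonal state matrix, zero-order-hold discretisation, random parameters) showing that a time-invariant SSM can be run either as a recurrence or as a convolution with kernel K_k = C·Ā^k·B̄. Mamba's "selective" SSMs make the step size and B, C depend on the input, which breaks the convolutional view; that is why Mamba relies on a hardware-aware parallel scan instead.

```python
import numpy as np

def discretize_zoh(A_diag, B, dt):
    """Zero-order-hold discretisation for a diagonal continuous-time SSM."""
    A_bar = np.exp(dt * A_diag)
    B_bar = (A_bar - 1.0) / A_diag * B
    return A_bar, B_bar

def ssm_scan(A_bar, B_bar, C, u):
    """Recurrent view: x_t = A_bar * x_{t-1} + B_bar * u_t,  y_t = C . x_t."""
    x = np.zeros_like(A_bar)
    ys = []
    for u_t in u:
        x = A_bar * x + B_bar * u_t
        ys.append(C @ x)
    return np.array(ys)

def ssm_conv(A_bar, B_bar, C, u):
    """Convolutional view (valid only when A_bar, B_bar, C do not depend on the input)."""
    L = len(u)
    K = np.array([C @ (A_bar ** k * B_bar) for k in range(L)])   # kernel K_k = C A_bar^k B_bar
    return np.array([sum(K[k] * u[t - k] for k in range(t + 1)) for t in range(L)])

rng = np.random.default_rng(0)
N = 4
A_diag = -np.arange(1, N + 1, dtype=float)   # stable (negative) diagonal state matrix
B, C = rng.normal(size=N), rng.normal(size=N)
A_bar, B_bar = discretize_zoh(A_diag, B, dt=0.1)
u = rng.normal(size=20)
print(np.allclose(ssm_scan(A_bar, B_bar, C, u), ssm_conv(A_bar, B_bar, C, u)))  # True
```

The recurrence gives constant-memory, linear-time inference; the convolution (in earlier SSMs like S4) gives fast parallel training. Selectivity trades the latter for content-dependent memory.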

Google Research announced Lumiere, a Space-Time Diffusion Model for Video Generation (paper, demos). The model produces truly amazing results in text-to-video, image-to-video, stylised generation, video stylisation, cinemagraphs, and inpainting. The innovation here is a new Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model.


Have a nice week.

10 Link-o-Troned

  1. The Future of ML: Auto ML + AI Agents
  2. The Big Picture of AI Research
  3. Are We at Peak Vector DB?
  4. Exploring ColBERT with RAGatouille
  5. Recursive Embedding & Clustering at Spotify
  6. AutoML for Content Abuse Detection at LinkedIn
  7. What I Learned Building the ML Platform at Mailchimp 
  8. AI that Quacks: A New Txt-2-SQL Model in DuckDB
  9. A Curated List of Free ML/ DL Youtube Courses
  10. a16z VC: Prosumer 2.0 & Rise of AI Native Workflows

Share Data Machina with your friends


the ML Pythonista

  1. Implement a Sparse Mixture of Experts Model from Scratch
  2. How to Merge Several LLMs into One Model, Jan 2024
  3. How to Build Games with Crew AI Agents

Deep & Other Learning Bits

  1. CMU Spring 2024 Neural Code Generation (slides, papers)
  2. Deep RL + Raspberry Pi Zero for WiFi Pawning
  3. MS Research: The Autoregressive/Non-Autoregressive Battle in Speech Synthesis

AI/ DL ResearchDocs

  1. InstantID: Zero-shot Identity-Preserving Generation in Seconds
  2. Dense X Retrieval – Propositions as a Novel Retrieval Unit
  3. DeepMind GATS: A New Approach to Combine Pretrained Foundation Models

MLOps Untangled

  1. CI/CD for ML in 2024: Best Practices
  2. Lessons Learnt Operationalising LLM Apps
  3. The Ultimate Guide to ML Model Deployment in 2024

data v-i-s-i-o-n-s

  1. Visual Analysis of Family Spending in the EU
  2. Top 23 Spatial Analysis & Visualisations in 2023 with Carto
  3. [interactive] Visualising 2023 Netflix Engagement Report

AI startups -> radar

  1. FuseMachines – Enterprise AI Platform
  2. OpenDialog – Conversational AI for Regulated Industries 
  3. BlueSheets – AI for Accounting Automation

ML Datasets & Stuff

  1. WebDataset – 12M Img-Txt Pairs for Vision-Language
  2. InstructDoc Dataset for Zero-Shot Visual Doc Understanding
  3. WildRGB-D: 20K RGB-D Vids for Real-World 3D Object Learning

Postscript, etc


Keep up with the very latest in AI / Machine Learning research, projects & repos. A weekly digest packed with AI / ML insights & updates that you won’t find elsewhere

Submit your suggestions, feedback, posts and links to:

datamachina@datamachina.com

Published January 28, 2024
Categorized as AI Newsletter, Artificial Intelligence, Deep Learning, DL Newsletter, Machine Learning, ML Newsletter Tagged AI, AI Newsletter, Artificial Intelligence, Deep Learning, DL, DL NEwsletter, Machine Learning, ML, ML Newsletter