MAY

MAY 2023 AI/Tech updates:

1. Nvidia released a 2B-parameter model trained on 1.1T tokens

2. Brain activity decoder can reveal stories in people’s minds

3. ArK: Augmented Reality with Knowledge Interactive Emergent Ability

4. GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

5. Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi — A Chattier Chatbot

6. Generalizing Dataset Distillation via Deep Generative Prior

7. Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

8. Introducing LLaVA Lightning: Train a lite, multimodal GPT-4 with just $40 in 3 hours

9. Unlimiformer: Long-Range Transformers with Unlimited Length Input

10. Learning Physically Simulated Tennis Skills from Broadcast Videos

11. 7B OpenLLaMA model that has been trained with 200 billion tokens on the RedPajama dataset

12. AG3D: Learning to Generate 3D Avatars from 2D Image Collections

13. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

14. TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis

15. Microsoft’s Bing chatbot gets smarter with restaurant bookings, image results, smart and persistent chat history, video answers, and even plug-ins & multimodality

16. CLIP ViT-L/14 model with 79.2% zero-shot accuracy on ImageNet

17. OpenAI's Shap-E: Generating Conditional 3D Implicit Functions

18. StarCoder is a 15B LLM for code with 8k context and trained only on permissive data in 80+ programming languages (open source)

19. MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks

20. AutoML-GPT: Automatic Machine Learning with GPT

21. Nvidia Real-Time Neural Appearance Models

22. Personalize Segment Anything Model with One Shot

23. MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs (up to 65k tokens context length & trained on 1T tokens)

24. The first RedPajama models are here! The 3B and 7B models are now available under Apache 2.0 license, including instruction-tuned and chat versions

25. Dolphin is a chatbot that can interact with videos, spanning from video understanding to generation/editing

26. Composite Motion Learning with Task Control

27. Multi-Space Neural Radiance Fields

28. Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

29. Announcing Nyric, an AI world-generation platform for digital communities. Build the world of your dreams in seconds

30. ImageBind: Meta's latest multimodal embedding, covering not only the usual suspects (text, image, audio), but also depth, thermal (infrared), and IMU signals (open source)

31. Introducing LeMUR, short for Leveraging Large Language Models to Understand Recognized Speech

32. OpenAI applies GPT-4 to interpretability: automatically proposing explanations for GPT-2's 300k neurons

33. TidyBot: Personalized Robot Assistance with Large Language Models

34. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

35. MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

36. Announcing the Haven-1 and Vast-1 missions to low-Earth orbit. Launched by SpaceX, Haven-1 is scheduled to be the world’s first commercial space station

37. Hugging Face's Transformer Agents - Control 100,000+ HF models by talking to Transformers and Diffusers

38. Bard available in over 180 countries and territories including India and upgraded to PaLM 2

39. Google introduces PaLM 2, which is coming to more than 25 Google products

40. Google's Generative AI is coming to search

41. Google IO 2023 wrapped up

42. Synthesia research releases HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion

43. Anthropic expanded Claude’s context window to 100,000 tokens of text, corresponding to around 75K words

44. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

45. Assisted Generation: a new direction toward low-latency text generation (3x faster)

46. Artificial intelligence identifies anti-aging drug candidates targeting 'zombie' cells

47. OpenAI: We’re rolling out web browsing and Plugins to all ChatGPT Plus users over the next week! Moving from alpha to beta, they allow ChatGPT to access the internet and to use 70+ third-party Plugins

48. 100k context windows now available on Poe: we are excited to start a beta test for Claude-instant-100k

49. MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

50. HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation

51. Consensus and subjectivity of skin tone annotation for ML fairness

52. Microsoft's TinyStories: How Small Can Language Models Be and Still Speak Coherent English

53. Google: Using reinforcement learning for dynamic planning in open-ended conversations

54. Multiple fully Tesla-made Bots now walking around & learning about the real world

55. Introducing Phoenix: a revolutionary humanoid general-purpose robot designed for work

56. Google's SoundStorm: Efficient Parallel Audio Generation - Produces audio of comparable quality to AR models while being two orders of magnitude faster

57. Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

58. FitMe: Deep Photorealistic 3D Morphable Model Avatars

59. Understanding 3D Object Interaction from a Single Image

60. Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation

61. AutoRecon: Automated 3D Object Discovery and Reconstruction

62. Dr. LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation

63. The next iteration of Perplexity has arrived: Copilot, your interactive AI search companion (powered by GPT-4)

64. “slick” RLHF-alternative without RL

65. LDM3D: Latent Diffusion Model for 3D (text to 360 image)

66. Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

67. BlockadeLabs announced a Sketch mode for its Skybox AI image generator, which creates environments based on the lines you draw and your text prompt

68. LIMA: Less Is More for Alignment

69. Any-to-Any Generation via Composable Diffusion - capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities

70. Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity

71. Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields

72. RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture

73. Meta's MMS: Massively Multilingual Speech - Can do speech-to-text and text-to-speech in over 1,100 languages

74. Announcing Windows Copilot: integrating the power of Bing Chat across all of Windows and all your apps

75. Google introduces Product Studio, a tool that lets merchants create product imagery using generative AI

76. Adobe just added their first Generative AI tool to Photoshop

77. Intel Announces Aurora genAI, Generative AI Model With 1 Trillion Parameters

78. Introducing Microsoft Fabric: Data analytics for the era of AI

79. Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models

80. Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

81. QLoRA: Efficient Finetuning of Quantized LLMs

82. Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks

83. Introducing #ClinicalCamel 🐪: an innovative open-source conversational #LLM engineered specifically for healthcare

84. SiamMAE: Siamese Masked Autoencoders for self-supervised representation learning from videos

85. Microsoft just announced a new generative actions engine in Power Virtual Agents, its chatbot builder

86. Bard will include images in responses for relevant prompts, including when you specifically ask for images

87. Stable Diffusion “Reimagine XL” model

88. Alexandria, an open-source initiative to embed the internet, starting with Arxiv

89. Neuralink received its first human clinical trial approval from the FDA

90. New open-source LLMs called Falcon, which come in two sizes: 7B trained on 1.5T tokens and 40B trained on 1T tokens

91. ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

92. Voyager: An Open-Ended Embodied Agent with Large Language Models (LLM playing Minecraft)

93. A state-of-the-art fMRI-to-image approach that retrieves and reconstructs images from brain activity

94. A Neural Space-Time Representation for Text-to-Image Personalization (similar to Dreambooth)

95. ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing

96. Think Before You Act: Decision Transformers with Internal Working Memory

97. PandaGPT: One Model To Instruction-Follow Them All

98. Break-A-Scene: Extracting Multiple Concepts from a Single Image

99. Photoswap: Personalized Subject Swapping in Images

100. Generating Images with Multimodal Language Models

101. Nvidia showcases real-time AI conversation in a game using voice

102. AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation

103. StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

104. OpenAI: We trained an AI using process supervision (rewarding the thought process rather than the outcome) to achieve a new state of the art in mathematical reasoning

105. UAE’s Falcon 40B, World’s Top-Ranked AI Model from Technology Innovation Institute is Now Royalty-free

106. Large sequence models for software development activities

107. Japan Goes All In: Copyright Doesn’t Apply To AI Training
