APRIL

6. Introducing the Luma✨Unreal Engine alpha! Fully volumetric Luma NeRFs running realtime on Windows in UE 5 for incredible cinematic shots and experiences

7. Baize, an open-source chat model trained with ChatGPT self-chat data. Releasing 150k high-quality dialogs with 7B, 13B and 30B models

7. Berkeley just released Koala-13B - an open-source chatbot trained by fine-tuning LLaMA on web dialogue

9. Meta's Segment Anything Model (SAM) — a step toward the first foundation model for image segmentation

10. Open-sourcing “Baby AGI”, a pared-down version of the “Task-Driven Autonomous Agent” at 105 lines of code

11. Latent Video Diffusion Models for High-Fidelity Long Video Generation

12. Google's TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

13. Kandinsky 2.1: A new open source image generation model

14. Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

15. OpenAI's Blog - Our approach to AI Safety

16. Introducing DribbleBot: A robot that can dribble a soccer ball on diverse natural terrains

17. SegGPT: Segmenting Everything In Context

18. DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

19. Excited to introduce: StackLLaMA - An end-to-end tutorial for training LLaMA with RLHF on preference data such as the StackExchange questions

20. Introducing ChatArena - a Python library of multi-agent language game environments that facilitates communication and collaboration between multiple large language models (LLMs)

21. Google AI: How Project Starline improves remote communication

22. Nvidia's Generative Novel View Synthesis with 3D-Aware Diffusion Models

23. Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

24. Generative Agents: Interactive Simulacra of Human Behavior (One step closer to The Matrix/West World)

25. InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

26. Grounded-Segment-Anything: Marrying Grounding DINO with Segment Anything & Stable Diffusion & BLIP - Automatically Detect, Segment and Generate Anything with Image and Text Inputs

27. OpenAGI: When LLM Meets Domain Experts

28. Anyone can build an AI app with no code in 5 minutes using Imagica. Plus, it's multimodal: feed it text, images, video, and 3D Models

29. Introducing: MemoryGPT. It’s ChatGPT but with long term memory. It will remember the things you say and will be able to personalize your conversation based on that

30. Caption-Anything - a versatile image processing tool that combines the capabilities of Segment Anything, Visual Captioning, and ChatGPT

31. Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

32. BabyAGI + Langchain Tools

33. Neural Lens Modeling

34. Teaching Large Language Models to Self-Debug

35. DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

36. Databricks just released Dolly 2.0: the first open-source, instruction-following LLM licensed for commercial use

37. OpenAI released an implementation of Consistency Models

38. Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators

39. Emergent autonomous scientific research capabilities of large language models

40. ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

41. DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

42. Unveiling Personal Assistant - HyperWriteAI's groundbreaking AI agent that can use a web browser like a human

43. Announcing Amazon Bedrock—giving customers the easiest way to build and scale generative AI applications with access to the leading foundation models

44. Microsoft DeepSpeed Chat offers an end-to-end RLHF pipeline to train ChatGPT-like Models

45. Meta's Animated Drawings, a first-of-its-kind #OpenSource project of annotated amateur drawings aimed at helping researchers easily create their own drawing-to-animation experiences or products

46. Segment Everything Everywhere All at Once

47. Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

48. Expressive Text-to-Image Generation with Rich Text

49. Elon Musk has created a new artificial intelligence company called X.AI that is incorporated in Nevada

50. OpenAssistant Conversational AI Release

51. A New Paella: Simple & Efficient Text-to-Image Generation

52. MiniGPT-4, an open-sourced model performing complex vision-language tasks like GPT-4!

53. Inpaint Anything: Segment Anything Meets Image Inpainting

54. Google's Delta Denoising Score

55. Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

56. Meta's DINOv2: Learning Robust Visual Features without Supervision

57. Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe

58. Elon Musk to create TruthGPT in rival to ChatGPT

59. Synthetic Data from Diffusion Models Improves ImageNet Classification (Generative Models Improve Themselves)

60. LLaVA - Visual Instruction Tuning

61. MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

62. Tool Learning with Foundation Models

63. Google AI - Video Generation Beyond a Single Clip

64. AnthropicAI's Claude-v1.3

65. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (Text to Video)

66. Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model

67. Generative Disco: Text-to-Video Generation for Music Visualization

68. NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

69. Today, Airbus announced its LOOP space station module designed for long term space missions

70. Stability AI announces StableLM - the first of their large language models, starting with 3B and 7B param models, with 15-65B to follow!

71. Whisper JAX ⚡️ is a highly optimised Whisper implementation for both GPU and TPU (70x faster than Whisper)

72. Bark - an Open Source Audio Generation Model

73. h2oGPT is out. A new 20 billion parameter instruction-following large language model licensed for commercial use

74. AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation

75. Reference-based Image Composition with Sketch via Structure-aware Diffusion Model

76. Reference-guided Controllable Inpainting of Neural Radiance Fields

77. Pretrained Language Models as Visual Planners for Human Assistance

78. Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

79. NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

80. DeepMind and the Brain team from Google Research will become a new unit: Google DeepMind

81. Google's Bard adds coding & debug abilities in 20+ languages

82. 3DCoMPaT++: a richly annotated, multimodal 2D/3D dataset of more than 10 million stylized 3D shapes

83. Ask-Anything, a tool for chatting about video with ChatGPT, MiniGPT-4 and StableLM

84. Scaling Transformer to 2 Million tokens and beyond with RMT

85. Introducing Chatbot Arena - Which LLM is better?

86. This AI Can Design Complex Proteins Perfectly Tailored to Our Needs

87. Track Anything: Segment Anything Meets Videos

88. Segment Anything in 3D with NeRFs

89. HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video

90. AutoNeRF: Training Implicit Scene Representations with Autonomous Agents

91. Speed Is All You Need: Optimizing Stable Diffusion For Mobile Devices (under 12 seconds)

92. HuggingChat: open source alternative to ChatGPT

93. Stability AI's Image Upscaling API - upscale any image without losing any sharpness

94. Towards Realistic Generative 3D Face Models

95. Patch-based 3D Natural Scene Generation from a Single Example

96. AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

97. Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

98. TextMesh: Generation of Realistic 3D Meshes From Text Prompts

99. Replit announces replit-code-v1-3b: a code language model that is 2.7B parameters and Open Source

100. Code for IF by deepfloydai is up - a text to image diffusion model (open source)

101. TextDeformer: Geometry Manipulation using Text Guidance

102. Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

103. DeepMind: Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

104. Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System

105. Microsoft Designer expands preview with new AI design features

106. Introducing Eleven Multilingual v1: Our New Speech Synthesis Model

107. ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
