Optimized PPA with Cadence Tensilica FloatingPoint DSPs



Applications such as audio and video processing, radar, communications, drive control, virtual reality (VR), augmented reality (AR), and, more recently, artificial intelligence (AI) algorithms rely heavily on digital signal processors (DSPs), which take the digital form of physical analog signals and manipulate it mathematically according to application-specific algorithms. Many DSPs support only fixed-point representation, which provides the precision required by most applications; floating point, on the other hand, offers a more natural and accurate way of representing real-world data, which is essentially analog. Floating point also simplifies the development and porting of DSP code, since the IEEE 754 arithmetic representation is widely supported by compilers, development tools, and modeling tools.

“Tensilica has been in the DSP area for a very long time. We have offered proven products for voice, audio, radar, LiDAR, and computer vision (including artificial intelligence). This is a new family,” said Ted Chua, director of product management and marketing for Tensilica DSPs at Cadence. “Within the Tensilica portfolio, these are our first DSPs specifically designed to support floating-point arithmetic. The new Cadence Tensilica FloatingPoint DSP family delivers scalable performance for a wide range of compute-intensive applications with extremely low power consumption. The low-power DSP IP optimizes power, performance, and area (PPA), with area savings of up to 40% for mobile, automotive, consumer, and other compute-intensive applications, and provides an easy programming environment for seamless code migration.” There are applications, such as motor control, where floating point can do a much better job than fixed point, Chua added.

The Tensilica FloatingPoint DSP family extends the Tensilica DSP IP portfolio, Chua said. The new floating-point, PPA-optimized cores range from small, ultra-low-power devices to ultra-high-performance devices, providing energy-efficient solutions for the most challenging applications, including battery-powered devices, artificial intelligence (AI), machine learning (ML), motor and drive control, sensor fusion, augmented reality (AR), and virtual reality (VR). Based on the Tensilica Xtensa 32-bit RISC microarchitecture, the new family (Fig. 1) includes four cores: the Tensilica FloatingPoint KP1 DSP, FloatingPoint KP6 DSP, FloatingPoint KQ7 DSP, and FloatingPoint KQ8 DSP. The new DSPs not only scale from a 128-bit vector width up to a 1,024-bit vector width, but can also be configured to enable only the capabilities required by a specific application, from energy-efficient solutions for battery-powered devices to high-performance computing (HPC).

Figure 1: Tensilica FloatingPoint DSP family

The new DSP cores share a common instruction set architecture (ISA) with the optional vector floating-point units (VFPUs) of existing Tensilica DSPs and feature vector widths scalable from 128-bit SIMD to 1,024-bit SIMD on both the Tensilica Xtensa LX and NX platforms. Compared with Tensilica fixed-point DSPs using the VFPU option, performance is improved, with a 25% increase in fused multiply-add (FMA) operation throughput. Performance can be further improved using the Tensilica Instruction Extension (TIE) language, Cadence’s proprietary Verilog-like language that lets designers describe custom instructions that are automatically compiled and recognized by the Xtensa toolchain.
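To make the fixed-point versus floating-point contrast (and the FMA operation mentioned above) concrete, the plain-C sketch below compares a Q15 fixed-point multiply-accumulate, with its manual rescaling and saturation, against the equivalent IEEE 754 single-precision operation using the standard fmaf() fused multiply-add. It is a generic illustration, not Tensilica code, and the Q15 helper is purely hypothetical.

```c
/* Generic illustration (not Tensilica-specific): why floating point
 * simplifies DSP code compared with Q15 fixed point. */
#include <math.h>    /* fmaf(): single-precision fused multiply-add */
#include <stdint.h>
#include <stdio.h>

/* Q15 multiply-accumulate: the programmer must track scaling and saturate. */
static int16_t mac_q15(int16_t acc, int16_t a, int16_t b)
{
    int32_t prod = ((int32_t)a * (int32_t)b) >> 15;   /* rescale Q30 -> Q15 */
    int32_t sum  = (int32_t)acc + prod;
    if (sum >  32767) sum =  32767;                   /* saturate high */
    if (sum < -32768) sum = -32768;                   /* saturate low  */
    return (int16_t)sum;
}

/* IEEE 754 float multiply-accumulate: one fused operation, no manual scaling. */
static float mac_f32(float acc, float a, float b)
{
    return fmaf(a, b, acc);   /* acc + a*b with a single rounding */
}

int main(void)
{
    /* Accumulate 0.5 * 0.25 onto 0.1 in both representations. */
    int16_t q = mac_q15((int16_t)(0.1f * 32768), (int16_t)(0.5f * 32768),
                        (int16_t)(0.25f * 32768));
    float   f = mac_f32(0.1f, 0.5f, 0.25f);
    printf("Q15: %f  float: %f\n", q / 32768.0f, f);
    return 0;
}
```

The fixed-point version has to track the binary point and guard against overflow by hand; the floating-point version delegates both to the hardware, which is essentially the programming-convenience argument made above.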
In addition, the FloatingPoint DSPs offer up to 40% area savings compared with a similar class of fixed-point DSPs equipped with VFPUs.

Figure 2: Tensilica FloatingPoint KQ7 and KQ8 block diagram

As shown in Figure 2, the scalable Tensilica FloatingPoint DSP family gives SoC designers the flexibility to meet their PPA budget envelope. For power-sensitive applications, the FloatingPoint KP1 DSP provides an extremely low-power solution suitable for battery-powered devices. The FloatingPoint KP6 DSP offers a convenient compromise between performance and area, delivering excellent performance per unit area. For high-performance applications, the FloatingPoint KQ7 and KQ8 DSPs provide the highest operation throughput in the family. In addition, the common ISA simplifies code portability and migration. The FloatingPoint DSPs also support custom interfaces, such as queues and ports, simplifying communication and integration with external hardware blocks or matching the interfaces of existing third-party IP.

The most challenging applications are evolving rapidly and moving from the cloud to the edge. Computer vision, IoT sensors, self-driving cars, and smart devices are just a few examples in which artificial intelligence (AI) algorithms are pushed to the edge, giving embedded systems enhanced, more autonomous decision-making capabilities. All of these applications need a family of floating-point DSP cores that can meet different market requirements, reduce time to market, and improve power, performance, and silicon area to keep product costs competitive.

“Today, edge AI inference is primarily performed using fixed-point accelerators. A floating-point DSP provides an option to implement AI inference or training in a floating-point format, and we all know that neural-network training is done using a floating-point representation,” Chua said.

As mentioned earlier, configurability is another key factor in the Tensilica FloatingPoint DSP family. “Our Tensilica DSPs are configurable, which means designers can select only the hardware features they need, without burning unnecessary power,” Chua commented. Among the most useful options is scatter-gather support, which allows the designer to load data from non-contiguous memory locations into a vector format. “The floating-point unit within the DSP is a vector machine. For data that is not stored in sequential memory locations, the scatter-gather feature allows you to load dispersed data into a single vector format, which improves overall performance,” Chua added (a conceptual sketch of these semantics appears below).

On the software-development side, the Tensilica FloatingPoint DSPs come with a full suite of software tools, including a high-performance C/C++ compiler with automatic vectorization and support for the DSPs’ VLIW pipelines, along with a linker, assembler, debugger, profiler, and graphical visualization tools. A useful tool is the instruction set simulator (ISS), which allows designers to quickly simulate and evaluate performance. When working with large systems or lengthy test suites, the Tensilica TurboXim simulation option is claimed to run 40 to 80 times faster than the ISS for efficient software development and functional verification. Tensilica Xtensa SystemC (XTSC) and C-based Xtensa Modeling Protocol (XTMP) models are available for full-chip simulation.
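The scalar C model below illustrates the scatter-gather semantics Chua describes: a gather collects floats from non-contiguous addresses into one contiguous vector, work is done on the vector, and a scatter writes the lanes back. The vector length of eight and the helper names are hypothetical, and the actual Tensilica vector gather/scatter instructions and intrinsics are not shown here.

```c
/* Conceptual sketch only: the semantics of a scatter-gather vector load/store.
 * The real Tensilica FloatingPoint DSPs would perform this with vector
 * gather/scatter instructions; no Cadence-specific intrinsics are used. */
#include <stdio.h>

#define VLEN 8   /* hypothetical vector length in float elements */

/* Gather: collect floats from non-contiguous addresses into one vector. */
static void gather_f32(float vec[VLEN], const float *base, const int idx[VLEN])
{
    for (int i = 0; i < VLEN; i++)
        vec[i] = base[idx[i]];        /* one scattered element per lane */
}

/* Scatter: write vector lanes back to non-contiguous addresses. */
static void scatter_f32(float *base, const int idx[VLEN], const float vec[VLEN])
{
    for (int i = 0; i < VLEN; i++)
        base[idx[i]] = vec[i];
}

int main(void)
{
    float mem[64];
    for (int i = 0; i < 64; i++) mem[i] = (float)i;

    /* Strided indices, e.g. one channel of interleaved sensor data. */
    int   idx[VLEN] = { 0, 8, 16, 24, 32, 40, 48, 56 };
    float v[VLEN];

    gather_f32(v, mem, idx);                       /* dispersed data -> one vector */
    for (int i = 0; i < VLEN; i++) v[i] *= 2.0f;   /* vectorizable work */
    scatter_f32(mem, idx, v);                      /* results back to original layout */

    printf("mem[8] = %f\n", mem[8]);               /* prints 16.0 after doubling */
    return 0;
}
```

On the actual hardware, each of these loops would typically map to a single vector gather or scatter operation, which is where the performance benefit Chua mentions comes from.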
Pin-level XTSC enables co-simulation with SystemC and RTL-level accelerator blocks for fast and accurate simulation. The Tensilica FloatingPoint DSPs support all major back-end EDA flows and are complemented by optimized software libraries, including the Eigen library, the NatureDSP library, a SLAM (simultaneous localization and mapping) library, and math libraries, making porting and migration of floating-point code much easier.

“With our family of floating-point DSPs, we offer a software tool suite that is common with all the other Tensilica DSPs. For any developer familiar with the Tensilica software tools, there is really no learning curve, because it is exactly the same toolset,” Chua said.

This article was originally published on the sister site Embedded.com.

